Improving Aviation Performance through Applying Engineering Psychology

Advances in Aviation Psychology, Volume 3

Edited by Michael A. Vidulich and Pamela S. Tsang

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2019 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-1385-8863-9 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Names: International Symposium on Aviation Psychology (19th : 2017 : Dayton, Ohio) | Vidulich, Michael A., editor. | Tsang, Pamela S., editor.
Title: Improving aviation performance through applying engineering psychology / edited by Michael A. Vidulich and Pamela S. Tsang.
Description: Boca Raton : Taylor & Francis, a CRC title, part of the Taylor & Francis imprint, a member of the Taylor & Francis Group, the academic division of T&F Informa, plc, 2019. | Series: Advances in aviation psychology ; volume 3 | “This volume is a collection of expanded papers selected from the 19th International Symposium on Aviation Psychology (ISAP) that was held May 8-11, 2017.” | Includes bibliographical references.
Identifiers: LCCN 2018045980 | ISBN 9781138588639 (hardback : acid-free paper) | ISBN 9780429492181 (e-book)
Subjects: LCSH: Aeronautics--Human factors--Congresses. | Aviation

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

In the spirit of Wilbur and Orville Wright

It is possible to fly without motors, but not without knowledge & skill.

—Wilbur Wright

From his first letter (13 May 1900) to Octave Chanute. In Marvin W. McFarland (Ed.) The Papers of Wilbur and Orville Wright: 1899–1905 (1953), Vol. 1, p. 13.

Contents

Preface
Editors
Contributors

Section I Perceptual and Cognitive Challenges in Aviation

1. Comprehensive Approach to Pilot Disorientation Countermeasures
Bob Cheung

2. Influences of Fatigue and Alcohol on Cognitive Performance
Hans-Juergen Hoermann

3. Avionics Touch Screen in Turbulence: Simulator Design and Selected Human–Machine Interface Metrics
Sylvain Hourlier, Sandra Guérard, and Xavier Servantie

Section II Modeling for Aviation Psychology

4. Prospective Comments on Performance Prediction for Aviation Psychology
Kevin A. Gluck, Tiffany S. Jastrzembski, and Michael A. Krusmark

5. Analysis of Work Dynamics for Objective Function Allocation in Manned Spaceflight Operations
Martijn IJtsma, Lanssie M. Ma, Karen M. Feigh, and Amy R. Pritchett

Section III Neuroergonomics

6. A Neuroergonomics Approach to Human Performance in Aviation
Frédéric Dehais and Daniel Callan

7. Eye Movements Research in Aviation: Past, Present, and Future
Leandro L. Di Stasi and Carolina Diaz-Piedra


8. Human Performance Assessment: Evaluation of Wearable Sensors for Monitoring Brain Activity
Kurtulus Izzetoglu and Dale Richards

Section IV Applications

9. Cold Bay Alaska Engine Change
Michael Hagler

10. Operational Issues in Aviation Psychology
Kathy Fox, Helena (Reidemar) Cunningham, Michael Hagler, Daniel Handlin, and Richard J. Ranaudo

11. Standardized Scenarios for Researchers
Jerry M. Crutchfield and Angel M. Millan

Index

Preface

This volume is a collection of expanded papers selected from the 19th International Symposium on Aviation Psychology (ISAP) that was held May 8–11, 2017. Selected authors were invited to expand on their presentations and, having the benefit of the interactions at the symposium, provide a state-of-the-art treatment of their topics. The first ISAP was held in recognition of the unique and difficult challenges posed by the aviation environment to the field of applied psychology. Dr. Richard Jensen convened the First Symposium on Aviation Psychology at Ohio State University in Columbus, Ohio, in 1981. In the foreword to the proceedings, the goals were clearly laid out: “The objective of this symposium was to critically examine the impact of high technology on the role, responsibility, authority, and performance of human operators in modern aircraft and air traffic control systems.” This was a very ambitious objective for a small conference held in America’s heartland. Nevertheless, the first ISAP was a resounding success! There were 210 attendees at this first gathering, and the Proceedings contained 43 papers and abstracts. Among the central challenges of aviation considered and debated were cockpit display and control design, automation, selection, workload, and performance assessment. The meeting was also successful in attracting participants from the varied communities that have a stake in aviation psychology, including academia, the military, government regulatory agencies, and industry (including airframe manufacturers and airlines) from around the world. A clear outcome of the first ISAP was the recognition that many challenges would remain and would require diligent research in the future.
It was also decided that a regular symposium on aviation psychology would provide a forum for encouraging focus on the evolving challenges of aviation psychology, consolidating findings, and sharpening the debates central to the advance of a safe aviation environment. Consequently, a symposium has been held biennially since 1981. In 2003, the symposium was hosted in Dayton, Ohio, in conjunction with celebrations of the 100th anniversary of the Wright Brothers’ first flight. Since 2009, the symposium has been managed through a collaboration between the Department of Psychology at Wright State University and the Air Force Research Laboratory (AFRL) at Wright-Patterson Air Force Base. The present volume is a direct outgrowth of the 19th ISAP held at Wright State University in 2017. The years separating the 1st and 19th symposia have witnessed both the enduring challenges and the rapidly changing technological advances confronting aviation psychology, as well as the evolving theoretical and
methodological psychological paradigms for meeting these challenges. The conference has continued to focus on the objectives outlined in the First Symposium and to attract broad participation that spans research and operational communities and includes a strong international contingent. The present volume highlights the inherently intricate involvement of human interaction with a vast and complex aviation system in order to accomplish a mission that the human is ill-equipped to accomplish without significant technological support. For example, care must be taken to ensure that the demands placed on any individual or team do not exceed their capabilities. Consequently, interface design is a major concern: the information needed by the human operator(s) must be presented in easy-to-process formats at an efficient rate of . Importantly, the synergy of human capabilities (some innate and many acquired via training) and the information provided via the human’s senses and the system’s displays must provide an understanding that can support effective decision making and control. In order to validate the success of interface designs and training regimens, aviation psychology has had to develop assessment tools that measure mental workload and situation awareness in relation to their impacts on operational effectiveness. To optimally support the human, the system must at times use automation to take action without direct human control. However, this must be carefully managed so as not to disrupt the human’s understanding of what is happening. It has become clear that advances in automation change, but do not diminish, the importance of the role of humans in aviation systems. Consistent with the vision for ISAP, keynote speaker Dr.
Kenneth Ford presented an excellent overview of the parallels between the historical development of aviation technology and the current development of artificial intelligence (AI), as well as speculations about the future of aviation and AI working together. Keynote speaker Dr. David Woods explored the necessity of using resilience engineering as a counter to the challenges posed by the complexities of increasingly autonomous systems. And keynote speaker Dr. Bob Cheung provided an overview of the multi-dimensional problem of spatial disorientation in flight and the multi-faceted solutions needed to address it. The first chapter of this book provides more in-depth coverage of his presentation. The remaining chapters were selected from among the technical papers presented during the meeting. They reflect both the emerging and the enduring challenges facing aviation psychology today. The chapter topics range from perceptual and cognitive challenges that people face within aviation, such as spatial disorientation, fatigue, or turbulence, to new physiological assessment and modeling tools that improve the understanding of human reactions to aviation challenges. Issues addressing the concerns of practitioners and cutting-edge research are both represented in the chapters. We are especially proud to include a chapter whose lead author was the winner of the Stanley N. Roscoe Best Student Paper Competition for the 19th ISAP. Martijn IJtsma coauthored the chapter on “Development of an Objective Function Allocation Method for Manned Spaceflight Operations.” We congratulate Martijn on his accomplishment. Despite the dramatic changes in the technologies present within the aviation system, many of the challenges confronted by the chapters in this volume were foreshadowed within the 1981 proceedings. This is not surprising because operational effectiveness and safety still depend on coordination between technologies and humans. Developing the human–machine synergy is the enduring challenge of aviation psychology, and the chapters of the current volume are excellent examples of some of the best contemporary approaches for addressing that challenge.

Editors

Michael A. Vidulich is a senior scientist at the Air Force Research Laboratory’s Human Effectiveness Directorate’s Applied Neuroscience Branch. He served as the technical advisor for the Warfighter Interface Division from 2006 to 2013. He is also a member of the adjunct faculty of the Wright State University Department of Psychology, where he has taught since 1989. Previously, he was a research psychologist at National Aeronautics and Space Administration (NASA) Ames Research Center. He received a BA (Psychology) from the State University College of New York at Potsdam, an MA (Psychology) from Ohio State University, and a PhD from the University of Illinois at Urbana-Champaign. His research specializes in cognitive metrics for human–machine interface evaluation and adaptation. He co-edited the volume, Principles and Practice of Aviation Psychology, with Pamela Tsang, and the volumes Advances in Aviation Psychology—Volumes 1 and 2 with Pamela Tsang and John Flach.

Pamela S. Tsang is a professor of psychology at Wright State University in Dayton, Ohio. She received her AB from Mount Holyoke College and her PhD from the University of Illinois at Urbana-Champaign. Previously, she was a National Research Council post-doctoral fellow at National Aeronautics and Space Administration (NASA) Ames Research Center. Her research interests are attention and performance, extralaboratory-developed expertise, cognitive aging, and aviation psychology. She is interested in applications of her research in a wide variety of domains, which include aviation, surface transportation, and medicine. She co-edited the volume, Principles and Practice of Aviation Psychology, with Michael A. Vidulich.


Contributors

Daniel Callan received his PhD from the University of Wisconsin-Madison in 1998. He conducted research at the Advanced Telecommunications Research Institute Computational Neuroscience Laboratories in Kyoto, Japan, from 1998 to 2013. Since 2013, he has been a senior researcher at the Center for Information and Neural Networks (CiNet) at the National Institute of Information and Communications Technology of Japan and a guest associate professor at Osaka University. He also currently holds a guest associate researcher position at Institut Superieur de l’Aeronautique et de l’Espace (ISAE) in Toulouse, France. His research uses multiple brain imaging methods (EEG, MEG, fMRI, fNIRS) and stimulation techniques (tDCS, tACS, TMS) to determine the neural processes underlying cognitive and mental states in complex real-world situations, with the goal of developing neuroergonomic-based technology to enhance human performance and well-being. Much of this research is in the realm of Aviation/Aerospace Cerebral Experimental Sciences, in which perceptual, motor, cognitive, and mental states are investigated using flight simulation, as well as during operation of real manned and unmanned aerial vehicles.

Bob Cheung, PhD, FAsMA, retired as the senior scientist of the Joint Operational Human Sciences Centre, Defence Research and Development , Research Centre, Department of National Defence, Canada. He is an adjunct professor of physiology, Faculty of Medicine, University of Toronto. His research interests include spatial disorientation, G transition effects, motion disturbance, visual-vestibular performance under altered gravitoinertial environments, and neuroplasticity. He has published over 100 refereed scientific journal papers, book chapters, scientific reports, and NATO documents. He served as the subject-matter expert in motion disturbance and spatial disorientation domestically and in international forums (NATO, TTCP Asian Defence Technology).

Jerry M. Crutchfield is an engineering research psychologist for the Federal Aviation Administration’s National Airspace System (NAS) Human Factors Safety Research Laboratory at the Civil Aerospace Medical Institute in Oklahoma City. He has over two decades of experience researching air traffic control-related human factors. Currently, his primary tasks are to serve as a human factors representative on the Safety Risk Management Panel for remote tower systems and to manage research conducted in two air traffic control simulation labs. The Air Traffic Control Advanced Research Radar Simulator and Air Traffic Control Advanced Research Tower Simulator labs provide high-fidelity en route, tower, and TRACON simulation capabilities and measurement of human performance, eye movement, and electroencephalography in support of human factors research in the air traffic control domain. His most recent projects include standardized scenario development, NextGen en route/TRACON controller common information requirements, and the characterization of air traffic controller visual scanning behavior.

Helena (Reidemar) Cunningham has held the position of director of human factors in the ALPA Air Safety Organization Human Factors & Training Group since January 1, 2012. She currently flies the Boeing 717 at Delta Air Lines. Before that, she flew the Boeing 757/767, and she was a DC-9 first officer instructor for 10 years. She is also an adjunct professor at the University of Central Missouri, teaching in its master’s degree program. She served six years in the Illinois Army National Guard. Captain Cunningham has participated in human factors-specific industry and academic research and numerous other projects for the past 20 years. She served as a CIRP coordinator for five years and as Human Factors Subcommittee chair for the past 19 years at Northwest Airlines and Delta Air Lines ALPA. She was co-chair of the “A Practical Guide for Effective Pilot Monitoring” working group, whose deliverable document was published by the Flight Safety Foundation in 2014. Captain Cunningham earned a BS in aviation management from Southern Illinois University; dual-specialization MAs in aeronautical science from Embry-Riddle Aeronautical University–Daytona Beach; and a PhD ABD in safety engineering. She was elected a fellow of the Royal Aeronautical Society in London in 2014. She received the 2014 Air Safety Award in July 2015 from the Air Line Pilots Association, International. She currently resides in Ann Arbor, Michigan, with her husband and children.

Frédéric Dehais received a PhD in cognitive science from Université Fédérale de Toulouse (France). He has been a full professor at ISAE-SUPAERO since 2011 and holds the AXA chair “Neuroergonomics for flight safety.” He leads the Human Factors and Neuroergonomics Department, a team of 18 permanent and non-permanent members with interdisciplinary expertise in neuroscience, signal processing, computer science, and human factors. His department receives substantial grants from DoD and from European and national research institutions and has developed strong collaborations with major aeronautical firms, such as , Dassault Aviation, Honeywell, Air France, and . His department has provided expertise to the French safety board for civil aviation (BEA) and is now fully recognized by this authority as a reference center of expertise. His research deals with understanding the neural correlates of human error in aviation, designing cognitive countermeasures, and implementing brain-computer interfaces in realistic settings, such as motion flight simulators and actual aircraft. His work has also led to four international patents that are currently implemented in civilian aircraft. He founded the European 2fNIRS portable brain imaging conference with Prof. S. Perrey and the International Neuroergonomics conference with Prof. H. Ayaz.

Leandro L. Di Stasi earned his bachelor’s and master’s degrees in experimental psychology from the University of Padua (Italy) and obtained his PhD in behavioral neuroscience in 2011 from the University of Granada (Spain). From 2012 to 2014, Dr. Di Stasi worked at the Barrow Neurological Institute (AZ) under the US Fulbright Visiting Scholar Program. From 2014 to 2016, he worked at the Mind, Brain, and Behavior Research Center (Spain) under the EU Marie-Curie Excellence Scholar Program. Following his postdoctoral studies, he was appointed an assistant professor of psychology at the University of Granada (Spain) in 2017. His primary research interests focus on the application of eye movement technologies to improve operators’ performance in safety-critical systems, such as military and healthcare organizations. With Dr. Diaz-Piedra, Leandro leads the Neuroergonomics and Operator Performance Lab.

Carolina Diaz-Piedra is a lecturer at the University of Granada’s School of Psychology (Spain). She holds a BS in psychology and an MSc in research designs in health from the University of Granada. She received her PhD degree in psychology in 2013 from the University of Granada for her work on sleep disturbances in chronic conditions, primarily in chronic pain patients. From 2014 to 2016, she carried out her postdoctoral studies at Arizona State University (Phoenix, AZ), where she worked as a visiting scholar in the College of Health Solutions. Her research interests focus on psychophysiology and health psychology, especially the improvement of sleep health through evidence-based, cost-effective behavioral interventions and the impact of sleep disturbances on performance and well-being in operators of safety-critical systems. With Dr. Di Stasi, Carolina leads the Neuroergonomics and Operator Performance Lab.

Karen M. Feigh is an associate professor at Georgia Institute of Technology’s School of Aerospace Engineering. Dr. Feigh previously worked on fast-time air traffic simulation, conducted ethnographic studies of airline and fractional ownership operation control centers, and designed expert systems for air traffic control towers and NextGen concepts. She is also experienced in conducting human-in-the-loop experiments for concept validation. Her research interests fall into two broad categories: Decision Support System Design and Computational Cognitive Modeling for Engineering Design. Dr. Feigh’s research spans dynamic socio-technical settings, including airline operations, air transportation systems, UAV and MAV ground control stations, mission control centers, and command and control centers. More generally, her research interests include adaptive automation design and the measurement of, and design for, different cognitive states. She serves as an associate editor for IEEE Transactions on Human-Machine Systems and the Journal of the American Society. She currently serves on the National Academies’ Aeronautics and Space Engineering Board. She holds a BS in aerospace engineering from Georgia Institute of Technology, an MPhil in aeronautics from Cranfield University, UK, and a PhD in industrial and systems engineering from Georgia Institute of Technology.

Kathy Fox, chair of the Transportation Safety Board of Canada, has been involved in aviation for more than 40 years, including sport parachuting, commercial aviation, and a 33-year career in air traffic control. Kathy retired from NAV CANADA in 2007 as VP, Operations. She holds an airline transport pilot license and a flight-instructor rating and has flown over 5000 hours. Kathy has received numerous accolades and was inducted into Canada’s Aviation Hall of Fame on June 9, 2016. She has been a member of the Transportation Safety Board since 2007 and was appointed TSB Chair in 2014.

Kevin A. Gluck is a Principal Cognitive Scientist with the Air Force Research Laboratory (AFRL). He began his career with the Air Force in 1993, first as a summer intern and then as a contractor research assistant at Lackland Air Force Base, supporting intelligent tutoring systems research. He became a government civilian scientist in 1996, and during that period he was awarded a PALACE Knight Graduate Training Fellowship while pursuing his PhD in Cognitive Psychology at Carnegie Mellon University. After completing his doctorate, Kevin joined AFRL at the Mesa Research Site in Arizona, eventually relocating to Wright-Patterson Air Force Base, Ohio. In portions of 2010 and 2011, he held a “Gastwissenschaftler” (Visiting Scientist) position at the Max Planck Institute for Human Development in Berlin, Germany. In 2011, he was honored to receive the Governing Board Award for Distinguished Service to the Cognitive Science Society and, in 2014, began a six-year term on the Cognitive Science Society Governing Board. Kevin has authored or co-authored more than 70 journal articles, book chapters, and conference papers; is co-inventor on two US patents; and has played a lead role in the organization and management of 13 international conferences and workshops. He is the Core Research Area Lead for Personalized Learning and Readiness Sciences. Kevin’s enduring research interests and activities focus on computational and mathematical models of cognitive processes to explain and improve human performance.

Sandra Guérard received her engineering degree from the Institut Français de Mécanique Avancée in 2002 and her PhD in mechanical engineering from the University of 6 in 2005. From 2005 to 2011, she was an associate professor in the Biomechanics Laboratory in Paris, where her main research focused on biomechanical modeling of the neuro-musculoskeletal system. Since 2011, she has been an associate professor at Institut de Mécanique et d’Ingénierie in Bordeaux. Her research interests include the durability of materials and structures subjected to complex dynamic loadings, focusing in particular on the development of original experimental devices. She has published several articles on the dynamic behavior of materials and on biomechanics. She has been a member of the governing board of the DYMAT association since 2015.

Michael Hagler is currently a duty manager at Delta Air Lines, managing two shifts of aircraft maintenance teams. He joined Delta following a 22-year maintenance career with American Airlines during which he held positions as a line maintenance technician, lead mechanic, production supervisor, shift manager, and finally as station manager of Seattle Aircraft Maintenance. At Delta he has partnered with researchers at the University of Washington on the cognitive workload of maintainers. He is deeply involved in maintenance education and works with regional and national education organizations. He understands the need for and value of human factors, as well as the many human factors challenges in the maintenance environment. He created and currently administers an industry-leading aircraft maintenance intern program, a partnership between Delta and South Seattle College. The intent of this program is to allow students to gain real-world, hands-on experience in an airline line maintenance environment, augmenting their situational and human factors awareness as well as enhancing their skill sets.

Daniel Handlin began his 50 years of flying experience in 1966 at Logan County Airport in Lincoln, Illinois. As he said, “When I stepped into that airplane I had no idea I had begun to build flight time to be an airline pilot, but 22 minutes later I sure did…” Dan soloed on his 15th birthday and got his private license on his 17th birthday. He took his dream of flying and playing college football to the Air Force Academy, where he graduated in 1973. He graduated at the top of his USAF Pilot Training Class and, a few years later, while still in his twenties, became a C-141 Aircraft Commander Flight Examiner and Squadron Flight Safety Officer. His career has spanned 17 different cockpits, ranging from an Aeronca Champ to an Airbus 320. Dan also completed a 27-year USAF career, on both active and reserve duty, retiring in 2001. Among his career accomplishments, he was the primary architect of the first procedures manuals (Boeing 727) for the NWA/Republic merger. He also built and managed a Northwest Airlines FAR compliance audit in preparation for an FAA 30-day, 30-inspector NASIP inspection, which produced a near-flawless result with no monetary fines. In another initiative, he designed, built, and managed a highly successful, culture-changing ideas campaign called Northwest NOW! that saved NWA over $100 million from 13,000 employee ideas received in just 90 days, a game changer in avoiding bankruptcy. And while he was a line pilot, he was assigned to NWA Headquarters as a director of human resources with the chief task of renovating and modernizing the HR department. Captain Handlin’s current priority has been the development of a new, systemic, transformative approach to airline safety called the Airline Flight Operating System (AFOS). AFOS was presented at the 2015 and 2016 ALPA Safety Conferences and is now being presented in multiple safety venues worldwide.

Hans-Juergen Hoermann is a senior human factors scientist at the German Aerospace Center (DLR) in Hamburg. He received his PhD in applied psychology from Free University in Berlin in 1987 and has accumulated over 30 years of research experience in aviation human factors. His research interests cover human performance and alertness management of flight crewmembers, Cockpit Resource Management (CRM) training, and pilot aptitude testing. Hans has published over one hundred research papers, most recently addressing the impact of present and future levels of automation on the human role in the cockpit. In collaboration with airline training organizations, he developed several training courses specifically for Crew Resource Management and Flight Instruction. As a technical fellow for Safety and Human Factors, Hans spent more than three years at Boeing’s Research & Technology Center in Madrid. He was awarded a fellowship of the Royal Aeronautical Society in 2006 and is a registered Aviation Psychologist in the European Association for Aviation Psychology. Hans serves as domain editor for the International Journal of Aviation Psychology. In his free time, he enjoys flying single-engine airplanes as a private pilot.

Sylvain Hourlier, MD, MSc in human factors, retired from the Air Force as a Flight Surgeon Colonel after serving as a Human Factors senior research scientist at the French Defense Institute for Biomedicine (IRBA) for 15 years. His major work covered the Rafale Pilot assistant, Military ATC, Tigre Helicopter HMI, and operational military night vision. He was in charge of Airbus Zero-G onboard medical security and flew more than 1300 parabolas between 2004 and 2009. He joined Thales in 2009 as a human factors specialist for the Avionics Division in Merignac, near Bordeaux. He is also a scientific coordinator and co-manager of HEAL (Human Engineering for Aerospace Laboratory), a THALES/University of Bordeaux joint research initiative. In 2016, he was appointed HF Senior Expert for the Avionics GBU. His main interest is cognitive resources management and its application in the aerospace domain, military and civil. His current work covers ecological cognitive strategies and AI applications for next-generation cockpit displays.

Martijn IJtsma is a PhD student in the Daniel Guggenheim School of Aerospace Engineering at Georgia Tech. Currently, he works with Dr. Amy Pritchett and Dr. Karen M. Feigh in the school’s Cognitive Engineering Center (CEC). Martijn received his BSc degree in aerospace engineering with honors in 2013 and his MSc degree in aerospace engineering with honors in 2016, both at the Delft University of Technology (TU Delft) in the Netherlands. His research interests include human–robot/automation teaming and function allocation, as well as the design of automation support for decision-making in complex work environments. At TU Delft, his thesis research was on the development of an adaptive automation system to support air traffic controllers’ decision-making. His research at the CEC focuses on the use of computational simulation to study function allocation in human–automation/robot teams, initially in the air traffic control domain and currently in manned spaceflight operations.

Kurtulus Izzetoglu, PhD, is an associate research professor of Biomedical Engineering at Drexel University, Philadelphia, US. His areas of expertise include human performance, learning, training, functional brain imaging, medical sensor development, and biomedical signal processing. He has conducted human performance and training research for the past seven years and has been principal investigator on grants from the DoD, FAA, and corporate partners. He led and managed a multi-year other transactions agreement (iOTA) between Drexel and the FAA William J. Hughes Technical Center and conducted various human factors (human-in-the-loop) studies on air traffic controllers as well as on pilots. He has been institutional principal investigator for Drexel University’s efforts for the FAA Center of Excellence for Technical Training and Human Performance–COE Solutions for Operational Aviation Research (SOAR) and for the FAA Center of Excellence for Unmanned Aircraft Systems (UAS)–Alliance for System Safety of UAS through Research Excellence (ASSURE). His current research projects include: (1) cognitive workload assessment of air traffic controllers, pilots, and unmanned aircraft pilots and operators, (2) cognitive baselining and index development, (3) the role of neurotechnology in improving pilot training, and (4) sensor development for optical brain imaging, hematoma, brain edema, and local tissue oxygenation. He holds a PhD in biomedical engineering from Drexel University. During his PhD studies, he worked on a functional near-infrared spectroscopy (fNIRS) system to identify neuromarkers during changes in the cognitive state of mental engagement at both high and low levels of neural activation.

Tiffany S. Jastrzembski is a senior cognitive scientist with the Air Force Research Laboratory. She completed her undergraduate studies in cognitive psychology at Carnegie Mellon University and earned her master’s and PhD degrees in the same field under the advisement of Dr. Neil Charness at Florida State University. She began her research with the Air Force Research Laboratory in 2005 as a summer intern, investigating and developing cognitive models capable of handling the dynamics of human memory, continued on as a National Research Council postdoctoral researcher, and ultimately became a government civilian scientist in 2007. She now holds a Matrix position within the laboratory, meaning she applies the science and technology developed within her home Cognitive Models and Agents Branch to applications of interest at the School of Aerospace Medicine (USAFSAM). She has made noteworthy contributions in highly applied medical domains spanning cardiopulmonary resuscitation, virtual reality laparoscopic surgery, critical care nursing, and aeromedical evacuation, and is currently expanding her work to a broad range of domains encompassing language learning, sharpshooting, and manufacturing safety. She received the Air Force Research Laboratory’s Early Career Award and the American Psychological Association’s New Investigator Award, holds a patent on the Predictive Performance Optimizer software tool, and has published over 60 refereed papers.

Michael A. Krusmark is a principal research scientist with L3 Technologies at the Air Force Research Laboratory, Human Performance Wing, Cognitive Models and Agents Branch. Mr. Krusmark holds a bachelor of science degree in psychology (1990) and a master of arts degree in cognitive psychology (1997), both from Arizona State University. He has 20 years of research experience on a wide range of projects aimed at developing and validating computational and mathematical models that replicate and extend the capacities of human cognition, and at demonstrating the applicability of these capabilities in Air Force training domains. A primary focus of this work has been the development of the Predictive Performance Optimizer, a patented technology for personalizing training that Mr. Krusmark co-invented.

Lanssie M. Ma is a fourth-year PhD student at Georgia Tech in the Daniel Guggenheim School of Aerospace Engineering and Computational Science and Engineering. Currently, she works with Dr. Karen M. Feigh at the Cognitive Engineering Center. She received a BA in computer science from the University of California, Berkeley in 2014 and an MS in computational science and engineering from the Georgia Institute of Technology. Prior to graduate school, she worked at Berkeley in the CITRIS Invention Lab on interactive and wearable computing. Lanssie has completed internships at IBM, working on back-end and system relay networking, and at Unity Technologies, working on augmented reality interaction with the HoloLens. Her interests range from human–computer interaction to wearable computing and space exploration robotics. She currently explores human–robot teaming, focusing on the interaction between team members in various space exploration scenarios.

Angel M. Millan is a human factors engineer at Mitsubishi Aircraft Corporation (MITAC) working on the development and certification of the Mitsubishi Regional Jet. Before joining MITAC, he worked for the Federal Aviation Administration as a researcher at the National Airspace System (NAS) Human Factors Safety Research Laboratory in Oklahoma City. He has worked on a variety of projects and programs involving human–system integration for air traffic control (ATC) systems and flight decks, with special emphasis on NextGen technologies. After obtaining his PhD, Dr. Millan received a National Research Council Research Associateship Award from the National Academy of Engineering. His main areas of interest are human performance, mathematical modeling, automation, ergonomics, biomechanics, and safety. Dr. Millan volunteers as a reviewer for the International Journal of Aerospace Psychology. He holds a PhD in industrial engineering from the University of Central Florida, a master of science in aeronautics from Embry-Riddle Aeronautical University, and a bachelor of science in aeronautical engineering from Universidad Nacional Experimental de la Fuerza Armada (UNEFA), Venezuela.

Amy R. Pritchett is the head of the Aerospace Engineering Department and professor at Penn State. Dr. Pritchett received an SB, SM, and ScD in aeronautics and astronautics from MIT in 1992, 1994, and 1997, respectively. She has led numerous research projects sponsored by industry, NASA, and the FAA and has also served via IPA as director of NASA’s Aviation Safety Program, responsible for planning and executing the program ($75–82M/year) conducted at four NASA research centers, sponsoring roughly 200 research agreements, and serving on several executive committees, including the OSTP Aeronautic Science and Technology Sub-committee and the executive committees of CAST and ASIAS. She has authored over 170 scholarly publications in conference proceedings and in journals such as Human Factors, Journal of Aircraft, and Air Traffic Control Quarterly. She has also won the RTCA William H. Jackson Award and, as part of CAST, the Collier Trophy, and the AIAA has named a scholarship for her. Professor Pritchett is the editor-in-chief of the Journal of Cognitive Engineering and Decision Making.

Richard J. Ranaudo is a private aerospace consultant who provides flight test support, test piloting, flight training, and university-sponsored short courses. He has extensive operational and flight test experience in military, government, and civil aviation, has flown over 35 different aircraft types, and has logged more than 13,000 hours. Mr. Ranaudo received his initial flight training from the US Air Force and served as a fighter pilot and advanced jet instructor pilot. In 1973, he became a NASA research pilot and over the next 25 years conducted a variety of aero-propulsion, aircraft performance, flying qualities, aircraft icing, auditory, and microgravity flight research programs. In 1994, Mr. Ranaudo was appointed head of the Aircraft Operations Branch at the NASA Glenn Research Center. In 1998, he joined the Bombardier Flight Test Center in Wichita, Kansas, as manager and senior experimental test pilot for Canadair Flight Test Programs. Mr. Ranaudo joined the faculty of the University of Tennessee Space Institute (UTSI) as an assistant research professor in the Aviation Systems Program. He retired from full-time teaching in 2010 but continues to direct the UTSI-sponsored Human Engineering Principles for Flight Deck Evaluation short course. He is also a lecturer and flight/simulator instructor for a short course offered by Embry-Riddle Aeronautical University. Mr. Ranaudo has a BS degree in civil engineering from the University of Connecticut and an MS degree in aeronautical and astronautical engineering from the Ohio State University. He has authored over 20 technical publications. Mr. Ranaudo holds an FAA Airline Transport Pilot (ATP) license and several type ratings in large jet, turbo-propeller, and reciprocating engine transport aircraft.

Dale Richards has a background in cognitive psychology and human–computer interaction. After completing his University of Wales PhD scholarship, he joined QinetiQ (formerly the Defence Evaluation and Research Agency), working primarily on defense programs. He is a chartered psychologist, chartered scientist, and an associate fellow of the British Psychological Society. During his time at QinetiQ, Dr. Richards applied human factors knowledge across programs ranging from pervasive networks and ubiquitous computing to commercial flight decks and advanced displays for Future Offensive Air Systems (FOAS). For several years, Dr. Richards was the human factors lead for the UK MoD Applied Research Programme (ARP) Autonomy and Mission Management for unmanned systems. This work eventually led to the successful demonstration of a fast jet controlling multiple unmanned air systems, for which he led the design of the displays implemented in the fast jet cockpit (Tornado F2A). Dr. Richards was also human factors lead for QinetiQ on the first two phases of the UK civil Unmanned Air Vehicle programme ASTRAEA (Autonomous Systems Technology Related Airborne Evaluation and Assessment). Since joining Coventry University, Dr. Richards has worked on commercial flight deck displays and visual displays. He has continued his research in human–autonomous systems, leading research on human–agent teaming for unmanned systems, single operator control of swarming UAVs, and the application of small UAVs in urban environments. He also has several research projects surrounding the human element of autonomous cars.

Xavier Servantie is an aeronautical engineer who has worked for Thales since 2001. He has been a human factors specialist in the aeronautical domain (fixed and rotary wing) for the last 12 years. Specializing in cockpit ergonomics within Thales, he has managed major HF studies on innovative cockpit systems: touch screens and trackballs, but also navigation and piloting displays, head-up displays, and head-mounted displays for augmented reality. He has managed numerous test campaigns in live and simulated environments. His main interest today is the HF evaluation of breakthrough avionics technology. He is a private pilot in his spare time and is currently based in the Thales Avionics Innovation Hub. Xavier Servantie lives in Bordeaux, France.

Section I

Perceptual and Cognitive Challenges in Aviation

1 Comprehensive Approach to Pilot Disorientation Countermeasures

Bob Cheung

CONTENTS
Comprehensive Solutions to SD
Research
The Size, Shape of Semicircular Canals and Maneuverability
Influence of Neuroplasticity and Specific Orientation Neurons to Orientation
Training
Spatial Orientation Training in the Military Flight Simulator
In-Flight Training
Technological Developments for Pilot Disorientation Countermeasures
Night Vision Devices
Heads Up or Head-Mounted Displays
Automated Ground Collision Avoidance Software (Auto GCAS)
Degraded Visual Environments (DVE)
Conclusions
References

In peacetime, the most life-threatening aeromedical problems that an air force might encounter are spatial disorientation (SD), G-induced loss of consciousness (G-LOC), and, to a lesser extent, hypoxia. Spatial disorientation, in general, is defined as the failure to perceive, or to perceive correctly, the position, motion, and attitude of the aircraft or oneself within a fixed frame of reference. On or near Earth, the fixed frame of reference is the veridical vertical and the Earth’s horizontal surface. Unlike G-LOC or hypoxia, SD occurs in less well-defined environments and is influenced by physiological and perceptual limitations. Assessment of the role played by SD in any mishap may have to rely on circumstantial evidence and is always open to investigator and labelling bias. Mishap analysis often reveals multiple causal factors leading to the final event. New flight display technologies might also contribute to SD susceptibility. The complexity of SD in the flight environment demands a “wide-angle” holistic approach. This chapter offers some suggestions for future investigation in SD research, training, and technological development, along with examples that demonstrate the necessity of a coordinated effort from operators, scientists, engineers, and research sponsors to lessen the impact of SD on flight safety.

Spatial orientation is essential for the survival of all animals. For example, when our ancestors were hunters and gatherers, they were able to go out, find food, and return with it to their families. For pilots, the ability to correctly perceive and maintain spatial orientation is essential for flight safety, survival, effective performance, and mission accomplishment. Spatial orientation is said to be partly a subconscious activity, like breathing, which demonstrates its psychophysiological significance in daily activities. However, it is also a well-learned, well-developed perceptual, behavioral, and motor response that develops as we interact with our environment from an early age. This subconscious activity can be likened to entrainment, the constant synchronization of organisms to an external rhythm. The mechanism of spatial orientation is based on the neural integration of concordant and redundant visual, vestibular, and somatosensory inputs and their critical interpretation against an internal model established from past experience and training. It incorporates perception, the physiology of various sensory systems, and the characteristics of their respective neural substrates.
Among perceptual theories spanning the passive-to-active spectrum, Gregory (1980) proposed that perceptions are not simple reproductions of sensory data from the eyes or ears but have to be constructed by the brain, a construction involving the collaboration of many subsystems in the brain and constantly informed by memory, probability, and expectation. He further proposed that the brain plays with ideas: what we call perceptions are really perceptual hypotheses that the brain constructs and plays with. In other words, sensory receptors receive information from the environment, which is then interpreted and compared with stored information regarding orientation, based on previous training and experience, to arrive at a construct of current reality. In military aviation, SD is not a new phenomenon. The Air and Space Interoperability Council (ASIC) of the Five Eyes nations (Australia, Canada, New Zealand, the United Kingdom, and the United States), formerly the Air Standardization Coordination Committee (ASCC), provides a more formal definition of SD that encompasses SD in formation flying:

“Spatial Disorientation (SD) is a term used to describe a variety of incidents occurring in flight where the aviator fails to perceive correctly the position, motion or attitude of the aircraft or of him or herself within the fixed co-ordinate system provided by the surface of the earth and the gravitational vertical. In addition, errors in perception by the aviator of his or her position, motion or attitude with respect to his or her aircraft, or of his or her own aircraft relative to other aircraft, may also be embraced within a broader definition of spatial disorientation in flight” (ASCC AIR STD 61/117/1, p.1).

Research and development over the last eight decades has revealed that spatial orientation depends on the timely integration and interpretation of various sensory cues with concordant and overlapping functional ranges. These sensory cues include visual inputs (focal and ambient vision for object identification and visual guidance, respectively), vestibular inputs for detecting angular and linear acceleration, somatosensory inputs (proprioception, touch) for detecting inertial force and linear acceleration, and auditory inputs for localizing sound sources. The mechanism of spatial orientation in the dynamic flight environment has been discussed in detail elsewhere (Benson, 1978; Cheung, 2004). In general, SD in flight can result from inadequate, ambiguous, or inconsistent sensory inputs, generally referred to as input errors. SD can also result from the erroneous perception and misinterpretation of correct sensory inputs, referred to as central errors (Benson, 1978). Input and central errors are not mutually exclusive. Lackner (2014) provided a model of spatial orientation that predicts the illusory changes in perceived visual and self-orientation that occur on exposure to an altered gravitational environment. It should also be noted that accident analysis often reveals multiple causal factors leading to the final event. A loss of awareness of the flight path and failure to detect a dangerous flight path could lead to unrecognized SD. Spatial disorientation in flight is a multi-dimensional problem that is influenced by the dynamic operational flight environment, so pilots should be made aware of the contributing factors that precipitate SD; the resulting misperception constitutes only a small part of the mishap scenario. Past and recent mishap statistics suggest that the occurrence of SD-implicated mishaps has not changed dramatically.
For example, in the Royal Canadian Air Force (RCAF) between 1982 and 1992, 23% of all category A accidents (those involving loss of life and/or an aircraft damaged beyond repair), involving 24 fatalities, were SD-implicated (Cheung, Money, Wright & Bateman, 1995). A follow-up survey (n = 112) based on the ASIC/ASCC SD Survey Questionnaire (Cheung, unpublished survey 2010) revealed that, for the rating of “Severity of most recent SD incidents,” there were 73 minor, 10 significant, and 2 severe incidents, while 27 returned surveys provided no response. For “Severity of worst ever SD incidents,” there were 52 minor, 25 significant, and 8 severe incidents, and 27 provided no response. More recently in the US, 31% of all helicopter accidents with fatalities (2002–2011) were SD-related (Gaydos, Harrigan & Bushby, 2012), and SD contributes to 25%–33% of all mishaps, with a fatality rate of almost 100% (Gibb, Ercoline & Scharff, 2011).
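The survey counts above can be checked with simple arithmetic. The following minimal Python sketch uses only the figures quoted in the text. It confirms that each item's categories sum to the 112 returned surveys and computes the share of answering pilots (excluding the 27 non-responses) who rated an incident significant or severe. Grouping "significant" and "severe" together is an illustrative assumption made here, not a category reported by the survey, and the function names are hypothetical.

```python
# Severity ratings from the RCAF follow-up survey (n = 112) quoted in the text.
SURVEY = {
    "most recent": {"minor": 73, "significant": 10, "severe": 2, "no response": 27},
    "worst ever": {"minor": 52, "significant": 25, "severe": 8, "no response": 27},
}


def answered(counts):
    """Number of returned surveys that gave a severity rating for this item."""
    return sum(n for rating, n in counts.items() if rating != "no response")


def serious_share(counts):
    """Percentage of answering pilots who rated the incident significant or severe."""
    return 100 * (counts["significant"] + counts["severe"]) / answered(counts)


for item, counts in SURVEY.items():
    # Each item's categories should add back up to the 112 returned surveys.
    assert sum(counts.values()) == 112
    print(f"{item}: {answered(counts)} answered, "
          f"{serious_share(counts):.1f}% significant or severe")
```

Under this grouping, about 14.1% of the 85 answering pilots rated their most recent incident significant or severe, compared with about 38.8% for their worst ever incident.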

Comprehensive Solutions to SD

The complexity of SD necessitates multifaceted solutions. As with other aeromedical problems, such as G-LOC and hypoxia, a comprehensive solution to limit SD requires research, training, and technological advancement. A three-pronged research and development (R&D) approach to SD countermeasures was proposed by Gillingham (1992) and can be summarized as follows:

1. Elucidate the psychophysiological mechanisms of spatial orientation and disorientation.
2. Develop SD awareness training methods for both ground-based and in-flight application.
3. Design and test concepts for flight instruments and displays that contribute to enhanced spatial orientation.

Continued basic research has contributed to the elucidation of the psychophysiological mechanisms of spatial orientation and disorientation. However, because we are dealing with a dynamic operational problem, the practical application of basic science to operational purposes should be emphasized and advanced. Various type-specific SD awareness training programs can be implemented in the short term, but they should go beyond the demonstration of sensory inadequacies and misperceptions. Training should teach pilots how to anticipate SD and how to develop spatial strategies for a multitude of SD scenarios. Effective training also requires proper evaluation of training performance and efficiency, as well as constant updating with the latest information. Display designs, layouts, and configurations that help pilots determine the true orientation of the aircraft may be the eventual solution, but any such system would require validation in a type-specific flight simulator and proof in flight through successful mission operations.

As mentioned above, the complexity of SD clearly requires a multi-dimensional approach: some SD occurrences may be reduced through targeted active simulation training, while others will require advanced display technologies and sensor suites. The proposed R&D approaches should not, however, be considered mutually exclusive. Our current understanding of the psychophysiological mechanisms of SD will not only enhance SD awareness training but also, as an interim measure, facilitate the development of appropriate spatial strategies to reduce predisposition to SD. Further understanding of these mechanisms also facilitates the appropriate design of future flight instrument displays, alternate sensor technologies, and specific concepts for enhanced spatial orientation in flight.
This chapter summarizes some of the key advances made under the headings of research, training, and technology and proposes further research and development requirements that are necessary to reduce SD occurrences.

Research

The human-in-space program has greatly improved our understanding of the anatomy, morphology, and limitations of various sensory systems. The individual sensory components compensate for each other’s deficiencies. For example, under visual meteorological conditions (VMC), when unambiguous external visual cues are available, low-frequency (<1–2 Hz) visual signals provide reliable orientation information. In a degraded visual environment (DVE) and in instrument meteorological conditions (IMC), the vestibular inputs provide more reliable orientation information at higher frequencies. The morphology of the three semicircular canals enables their three ampullary organs to mechanically decompose any three-dimensional angular movement of the head into three individual components, with one component carried by each canal’s afferent nerve. Angular motion in the maximum response direction of a given semicircular canal will also excite the other two semicircular canals to some degree. The utricular and saccular otoliths detect linear accelerations, including tilt from the gravitational vertical. The auditory system plays a role in identifying the direction of sound and uses that information to discern our own position, motion, and attitude. The somatosensory system complements the dynamic component of the otolith for sensing linear acceleration. It influences the interpretation of other sensory signals through expectation, and, unfortunately, at times the somatosensory system reinforces false sensations. Although the vestibular system provides an instantaneous registration of angular and linear acceleration, including orientation with respect to gravity, vision is often referred to as the predominant sensory input for spatial orientation because it dominates our conscious awareness. Indeed, it is often said that 80% of the information that a pilot needs in flight is acquired visually (Stott, 2013).
However, the contribution of visual-vestibular interaction to spatial orientation should not be overlooked. The flight environment is a dynamic force environment and is distinctly different from the terrestrial environment in which we were conceived, developed, and learned to function every day. Anatomically and physiologically, our sensory receptors developed and evolved for effective functioning under 1 Gz. For example, the force experienced on Earth as gravity is constant in direction and magnitude. Under the constant gravitational pull of 1 Gz, the direction of gravity (where down is) indicates alignment with the vertical. In a fast jet, during increased linear acceleration, pilots often misperceive the resultant force vector of the inertial force and gravity as the true vertical. As a result, they perceive that the aircraft is climbing too steeply and instinctively push the control column forward, pitching the aircraft nose down to a dangerous attitude. The false sensation of pitch is compounded by the inherent inability of the otoliths to distinguish between gravity and linear acceleration, owing to their anatomical and morphological design for functioning in the terrestrial environment. Our visual world is relatively Earth-stable in the terrestrial environment; however, the visual environment of flight can be deceptive. For example, with an increase in altitude, the visual sense of speed rapidly diminishes, and judging height above ground becomes difficult. Objects appear to be farther away as a result of the loss of visual discrimination. Similarly, distance estimation becomes difficult at low level and over mountainous or smooth terrain. A distant mountain will be slightly bluer in hue and hazier in appearance. Inadequate or absent visual cues can create a false horizon.
Pilots tend to falsely perceive a distant city to be on flat terrain and arc below the desired approach slope. Moreover, in flight the interaction between the visual world and the force environment changes. Therefore, one can argue that at least some SD occurrences result from the inappropriate application of perceptual skills, well learned and developed in a terrestrial environment, to a dynamic and changing gravito-inertial flight environment. Basic laboratory research into sensory receptor function and perceptual mechanisms is essential. However, it is important to advance the practical consequences of basic research; that is, can technology concepts and applications be formulated for feasibility studies in flight? Regarding research into spatial disorientation, a number of areas are open to further investigation. They are briefly discussed below.
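The otolith ambiguity described in the Research section can be illustrated with a simple worked example. In this minimal sketch, the pilot is assumed to take the resultant of gravity and a steady forward linear acceleration a as the vertical, so the apparent nose-up pitch is θ = arctan(a/g). This idealization of what is commonly called the somatogravic illusion ignores canal, visual, and somatosensory cues as well as the time course of the perception. The function name is hypothetical.

```python
import math

G = 9.81  # standard gravity in m/s^2 (rounded)


def apparent_pitch_up_deg(forward_accel):
    """Tilt (degrees) of the gravito-inertial resultant away from the true vertical.

    Assumes wings-level flight with a steady forward linear acceleration
    `forward_accel` (m/s^2) and a pilot who treats the resultant as 'down'.
    """
    return math.degrees(math.atan2(forward_accel, G))


for g_units in (0.0, 0.5, 1.0):
    angle = apparent_pitch_up_deg(g_units * G)
    print(f"{g_units:.1f} g forward -> apparent nose-up pitch of {angle:.1f} deg")
```

Under these assumptions, a sustained forward acceleration of 0.5 g tilts the apparent vertical by about 26.6 degrees and 1 g by 45 degrees. This is the direction of error that prompts the forward push on the control column.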

The Size, Shape of Semicircular Canals and Maneuverability

The semicircular canals detect angular acceleration. Their size and shape are taken to reflect the biomechanical sensitivity of the semicircular canals (Muller, 1994; Oman & Young, 1972) and influence afferent sensitivity (Blanks, Estes & Markham, 1975; Curthoys, Markham & Curthoys, 1977). Qualitative studies with fluid-filled models (Muller & Verhagen, 2002) suggested that the overall impulse in the plane of the anterior semicircular canal is increased by additional endolymph flow during pitch movements. The major determinant of the sensitivity of each semicircular canal is the radius of curvature (R), which is often used to express the circumferential arc length of the semicircular canals. The orientations of all six canals determine the relative sensitivity of the vestibular system to angular accelerations in three dimensions (Figure 1.1). The ipsilateral semicircular canals are orthogonal to each other. The corresponding left and right semicircular canal pairs are symmetrical and share equivalent angles. The contralateral synergistic semicircular canals are coplanar.

FIGURE 1.1 The human vestibular systems showing the relative geometry of the lateral, anterior (superior), and posterior semicircular canals.

However, recent high-resolution X-ray computed tomography (CT) revealed substantial deviations from these assumptions in 39 mammalian species, and the degree to which the semicircular canals deviate from orthogonality is negatively correlated with the estimated vestibular sensitivity (Berlin, Kirk & Rowe, 2013). There is a consistent association between increases in the radius of curvature (R) of the semicircular canals and the inferred locomotor and flying agility across a wide sample of mammals and birds. For example, Jones and Spells (1963) noted that the semicircular canals of birds, as a group, were larger relative to body size than those of other tetrapods (bipedal dinosaur subgroups). Highly maneuverable birds (pigeons, falcons, raptors) have greater R relative to body size than less maneuverable ducks and geese (Gray, 1907; Hadziselimovic & Savkovic, 1964). Reconstructions of the labyrinth based on CT scans and canal geometry variables of 178 extant birds and 15 non-avian theropods reveal an increase in the anterior and lateral semicircular canal circumferential arc length in highly agile birds. In addition, all three semicircular canals typically undergo torsional excursions out of their respective planes (Sipla, 2007), which suggests enhanced performance during off-planar (off-vertical and off-horizontal plane) accelerations. The ability of owls (Money & Correia, 1972) and pigeons (Bilo & Bilo, 1983) to stabilize their heads during turns and loops while maneuvering in space has long been observed. In pigeons, the anterior semicircular canal is approximately three times longer than the lateral and posterior semicircular canals and also has an unusual shape: it has a major and a minor plane that fit into the canal’s circumferential course length (Dickman, 1996). In addition, the plane of the anterior canal deviates from orthogonality with the other canal planes. It appears that the anterior canal afferents synthesize contributions from the major and minor plane components of the bony anterior canal structure to produce a resultant sensitivity vector positioned between the canal planes. In contrast, the lateral and posterior semicircular canals of the pigeon are nearly orthogonal, and their afferents have average sensitivity vectors that are largely coincident with the normal of the innervated canal plane (Dickman, 1996). The degree to which the bony anatomy of an animal’s semicircular canals reflects vestibular-afferent responses remains uncertain (Graf & Vidal, 1996; Spoor, Wood & Zonneveld, 1996), because the dimensions of the membranous semicircular canals cannot be resolved directly by CT. Nevertheless, the bony canal reflects the length and diameter of the lumen and the orientation of the membranous semicircular canals. Furthermore, analysis of the influence of semicircular canal morphology on endolymph flow dynamics has been extended to cases where the size, shape, and curvature of the canal lumen change continuously through the duct, utricle, and ampulla (Oman, Marcus & Curthoys, 1987). Unlike birds, human beings did not evolve to fly, but they possess an ability to master their environment.
From the first powered, sustained, and controlled airplane flights by the Wright brothers in 1903 to the current fifth-generation fighter aircraft, humans have been able to perform agile flight maneuvers in three-dimensional space. Micro CT image reconstruction of the bony labyrinth from 40 human temporal bones suggested that the R of the lateral semicircular canal was 20% smaller than those of the anterior and posterior semicircular canals. Using high-resolution CT scans and mathematical modelling of the semicircular canals in living humans (34 ears, including 14 complete pairs), Bradshaw et al. (2010) found the R of the lateral, anterior, and posterior semicircular canals to be 2.69 ± 0.20 mm, 3.43 ± 0.23 mm, and 3.32 ± 0.19 mm, respectively. The angles between the anterior and posterior, anterior and lateral, and posterior and lateral semicircular canals were 92.1, 84.4, and 86.2 degrees, respectively (Lee et al., 2013). The relative symmetry of the shape and size of the human semicircular canals and the deviations of the three canals from orthogonality may represent another inherent but poorly understood vestibular limitation during extreme flight maneuvers, including those in aircraft mishaps, that involve high-frequency, high-amplitude pitch rotational accelerations.

Extreme flight maneuvers usually combine basic aerobatic maneuvers such as the aileron roll, in which the aircraft is rolled 360 degrees around its longitudinal axis (stimulating primarily the posterior semicircular canals), and the loop, a 360-degree turn in the vertical plane (stimulating primarily the anterior semicircular canals). The Cuban Eight is a maneuver that combines portions of the loop and the roll.
The entry requires a high-G pull (> +3 Gz) until the aircraft approaches the vertical; as the aircraft reaches the inverted position, a half roll flown as a point roll (zero G) is commenced to avoid barreling off the reference point. Other extreme flight maneuvers, such as the Immelmann turn and the Split-S, combine high-G pitch maneuvers with roll maneuvers, which primarily stimulate the anterior and the posterior canals, respectively. For example, the Immelmann turn requires an ascending high-G pull through a half loop (stimulating primarily the anterior semicircular canals) that finishes with a half roll out (stimulating primarily the posterior semicircular canals), resulting in level flight in the exact opposite direction at a higher altitude. Similarly, the Split-S maneuver, also called a reversed Immelmann turn, requires rolling inverted (stimulating primarily the posterior semicircular canals) and finishes with a descending half loop (stimulating primarily the anterior semicircular canals) to wings level at a lower altitude.

It is important to note that a mishap very rarely has a single cause. However, it is possible that the lack of sensitivity of the anterior and posterior canal afferents affects the performance of extreme flight maneuvers. In 1984 and 1985, two F-20s crashed following an Immelmann turn while practicing for an air show. The visual environments for the two accidents were very similar, with an indistinct horizon and a lack of contrast between the sky and the ground (Cheung, Ercoline & Metz, 2002). From 2002 to 2016, there were 15 air show accidents that involved the aforementioned maneuvers (Cheung & Ercoline, 2017). They ranged from failure to pull out of a dive to failure to pull out of a 45-degree bank. A number of the aircraft crashed after performing a Split-S shortly after takeoff, or while performing a half Cuban Eight and snap rolls.
Our knowledge of vestibular thresholds and the perception of rotation is based primarily on isolated rotation about the vertical or off-vertical axis in the laboratory, which stimulates mainly the lateral semicircular canals and the utricular otoliths. In humans, the dynamics of the semicircular canals in response to yaw, pitch, and roll motion were studied indirectly, using subjective cupulometry and objective measurement of slow-phase eye angular velocity to measure the time course of decay (Melvill Jones, Barry & Kowalsky, 1964). The stimulus consisted of rotating the subject about the vertical axis with the subject's head positioned in the yaw, pitch, or roll orientation with respect to the turntable axis. This type of stimulus is constrained by gravity and influenced by potential contributions from the otoliths and the neck afferents. To further our understanding of the mechanism of spatial orientation in flight, we need to move beyond single-axis rotation, because the flight environment involves angular acceleration in the roll, pitch, and yaw planes, often interacting with linear acceleration simultaneously or sequentially. For example, executing rapid roll maneuvers prior to or following a G transition may lead to loss of attitude awareness and reduced G tolerance (Cheung et al., 2002). Psychophysical investigation of high-frequency rotation in the off-vertical and off-horizontal planes in flight is warranted to elucidate the response characteristics of the vestibular system during extreme flight maneuvers. The influence of repeated training on these response characteristics also needs to be investigated.
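The human canal measurements quoted earlier in this section can be checked against each other with a few lines of arithmetic. The Python sketch below uses only the radii reported by Bradshaw et al. (2010) and the inter-canal angles reported by Lee et al. (2013); the function names are illustrative. Because these values come from different samples (and the 20% micro-CT figure from a third), the comparison is a rough consistency check, not a pooled analysis.

```python
# Radii of curvature (mm) reported by Bradshaw et al. (2010) for living humans.
radii_mm = {"lateral": 2.69, "anterior": 3.43, "posterior": 3.32}

# Inter-canal angles (degrees) reported by Lee et al. (2013).
angles_deg = {
    ("anterior", "posterior"): 92.1,
    ("anterior", "lateral"): 84.4,
    ("posterior", "lateral"): 86.2,
}


def lateral_shortfall_pct(radii):
    """Percentage by which the lateral canal radius falls short of the
    mean radius of the two vertical (anterior and posterior) canals."""
    vertical_mean = (radii["anterior"] + radii["posterior"]) / 2
    return 100.0 * (1.0 - radii["lateral"] / vertical_mean)


def deviation_from_orthogonal(angles):
    """Absolute departure of each inter-canal angle from 90 degrees,
    rounded to the 0.1-degree precision of the reported angles."""
    return {pair: round(abs(angle - 90.0), 1) for pair, angle in angles.items()}


if __name__ == "__main__":
    print(f"Lateral canal shortfall: {lateral_shortfall_pct(radii_mm):.1f}%")
    for (a, b), dev in deviation_from_orthogonal(angles_deg).items():
        print(f"{a}-{b}: {dev:.1f} degrees from orthogonal")
```

Run as a script, it reports a lateral-canal shortfall of about 20.3%, in line with the micro-CT figure, and departures from orthogonality of 2.1, 5.6, and 3.8 degrees, the largest being between the anterior and lateral canals.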

Influence of Neuroplasticity and Specific Orientation Neurons on Orientation

Recent neuroscience research suggests that the human brain is not truly organized in terms of systems that each process a single sensory modality, such as vision, balance, touch, or hearing, but rather processes information about spatial relationships, movements, and shapes (Pascual-Leone & Hamilton, 2001). For example, young children with retinoblastoma (which causes vision loss from an early age) exhibit a larger volume of auditory cortex (Hoover, Harris & Steeves, 2012; Nys et al., 2014), which demonstrates an adaptive reorganization of neurons that integrates the function of two or more sensory systems (cross-modal plasticity). Furthermore, specific neurons involved in navigation have been found. For example, “Head Direction Cells” in the rat’s anterior thalamic region (Taube, Muller & Ranck, 1990; Taube, 1998) and “Path Cells” in the entorhinal cortex of neurosurgical patients (Jacobs, Kahana, Ekstrom, Mollison & Fried, 2010) fire selectively: Head Direction Cells when the head is oriented in particular directions, and Path Cells when subjects travel either clockwise or counterclockwise. The behavior of these cells was also found to be influenced by landmarks, as well as by motor and vestibular information concerning how the head moves through space. “Grid Cells” in the entorhinal cortex of rats act as the brain’s Global Positioning System (GPS), indicating where the animal is relative to where it started (Hafting, Fyhn, Molden, Moser, & Moser, 2005; Moser, Kropff & Moser, 2008; Sargolini et al., 2006). Finally, “Place Cells” in the human hippocampus activate when we move into a specific location, so that groups of Place Cells form a map of the environment (O’Keefe & Burgess, 2005). How neuroplasticity and specific orientation neurons influence the mechanisms of pilot orientation in flight remains to be investigated.

Training

Unlike research and technology development, spatial orientation training is relatively inexpensive and can be implemented without hesitation, as our knowledge of the mechanisms of spatial disorientation in flight has greatly improved since the advent of human-controlled flight. The key questions in SD countermeasures training can be summed up as: “When should you teach what, to whom, and how?” The scope of SD countermeasures training is limited not only by the goals and objectives of the training but also by administrative, logistical, and resource constraints. As noted by Lawson et al. (2017), in some countries there appears to be a general lack of continuity and sustainment in such training. Specifically, training strategies have not always been improved and updated at the training command on the basis of state-of-the-art knowledge. Very often, the important role of the subject-matter expert in providing training strategies for spatial disorientation countermeasures has been largely neglected. There have been numerous changes and developments in training methods and technologies since the early passive demonstrations of sensory misperceptions. The most common visual and motion stimuli include Coriolis cross-coupling sensations, the correlated nystagmus, and motor responses on a Barany chair rotating about a single vertical axis. Training recommendations from various NATO (North Atlantic Treaty Organization) and ASIC countries have been summarized by Lawson et al. (2017). A formal approach that provides factual knowledge in a didactic manner is effective and appropriate for ab initio pilots or pilot candidates. The identification of unrecognized, recognized, or incapacitating SD serves a fundamental role in the teaching and training of SD countermeasures.
In some cases, advanced technology will not help if the pilot does not recognize that he/she is disoriented (an example is provided below). Demonstration of sensory misperceptions is part of the training; the emphasis should be on demonstrating the inadequacies and limitations of the sensory systems. Overemphasis on the resulting misperception might inadvertently have led to the belief that exposure to the demonstration enables one to prevent or avoid SD mishaps (Cheung, 2013). Similarly, classifying SD mishaps solely on the basis of specific sensory misperceptions runs the risk of ignoring the other contributing factors that precipitate the false perception. The apparent misconception about the dominant role of sensory illusions in SD mishap investigation and spatial orientation training has been discussed in detail by Cheung (2013) and Stott (2013). The typical visual, vestibular, visual-vestibular, somatosensory, and cognitive misperceptions are important elements, but not the sole element. More importantly, the resulting physiological, perceptual, and performance responses to the misperception should be highlighted. They are important foundations on which to build in designing spatial strategies to limit SD occurrences. The sensory systems involved in orientation constantly interact with each other according to their frequency response ranges relative to the available physical motion and optic flow, and some misperceptions cross the boundaries between the various sensory and cognitive systems. Of all the sensory systems, the vestibular system is perhaps the most often misunderstood. It has a resting discharge and does not follow the on/off response of the visual system or the response characteristics of the highly adaptive somatosensory system. Vestibular input cannot be “turned off,” but it can be influenced by other sensory inputs, depending on the frequency range.
From birth, there is not a single instant when the vestibular system does not provide input to our orientation, even when we are standing still. Over 80 years ago, Jones and Ocker (1935) offered pilots the following advice:

“You must simply come to understand your ears, not only for the correct information which that gives to you, but for the incorrect information which they may give you, when flying blind.” Jones & Ocker, 1935, p. 420

With advances in computing software, commercial SD demonstrators or trainers have become increasingly sophisticated, with wide field-of-view visual displays and multiple degrees of freedom. They can reproduce some SD scenarios in a controllable and repeatable manner. However, one should be aware that any ground-based device is limited by the ever-present 1 G of Earth’s gravity. For example, a false sensation of pitch cannot be demonstrated truthfully on the ground without being contaminated by Coriolis cross-coupling forces. In some commercial devices, the false sensation of pitch is produced by actually pitching the cockpit, which might contribute to negative training. An experienced and well-informed instructor would relate the limitations of our sensory systems to flight involving rotation about single or multiple axes, rather than presenting SD demonstrators as “a device designed to induce sensory illusions.” In general, demonstrations of some of the typical misperceptions, in an appropriate context and at an appropriate point in the training cycle, remain relevant as one component of spatial orientation training. Different types of SD mishaps require different remedies, whether these are based on the Barany chair, sophisticated motion devices, or simple spatial strategies developed for the specific mission to be flown. The experiential training model is based on the common premise that individuals learn best by doing and that learning also takes place when they are confronted with reality. Therefore, experienced pilots at an operational training unit or in a squadron will appreciate learning the characteristics of disorientation related to their specific aircraft type and the potential “SD traps” that may appear in specific mission profiles.
To select the best solution to counteract specific types of SD, or specific SD scenarios that occur in type-specific aircraft or under specific mission requirements, understanding the nature of the various SD scenarios remains of paramount importance and should be emphasized. As in most accidents and incidents, there is seldom a single cause for the mishap. A loss of awareness of the flight path and a failure to detect a dangerous flight path could lead to SD. In other words, pilots should be made aware of the contributing factors (including inappropriate in-flight strategies) that precipitate the resulting misperception and SD occurrence. This can be achieved by giving pilots the knowledge and skills to assess the risk of SD during the pre-mission briefing and to identify mission profiles, in conjunction with weather conditions, that are conducive to SD. These procedures will allow pilots to anticipate the potential for SD.

Spatial Orientation Training in the Military Flight Simulator

Simulator-based training for spatial orientation has been suggested by UK and US Army Aviation (Estrada, Braithwaite, Gilreath, Johnson, & Manning, 1998; Johnson, Estrada, Braithwaite & Manning, 1999), and it was extensively reviewed by a NATO symposium on Spatial Disorientation in Military Vehicles in 2002 and by the NATO Task Group (TG 039) on Brownout Mitigation in 2008. The topic was thoroughly discussed in the textbook on SD in aviation (Braithwaite, Ercoline & Brown, 2004). Estrada, Adam, and Leduc (2002) proposed the high-fidelity, operationally oriented military flight simulator as a potential training device for SD countermeasures. They observed that training in a flight simulator improved overall situational awareness (SA) and crew coordination skills. In addition, military simulator-based training can enhance pilots’ ability to incorporate aircraft type-specific mission training profiles (e.g., workload, multi-ship), better prepare pilots to recognize factors that make SD more likely, and improve decision-making skills. More recently, simulation-based training using military flight simulators has been further developed by UK and US Army Aviation. In general, subjective training results are reported to be favorable (Powell-Dunford, Bushby & Leland, 2016). Although customized active simulator-based training at multiple stages of a pilot’s career can no doubt be used as one of the methods for spatial awareness training, it requires significant time and resources, and demonstrating its effectiveness and impact on accident statistics would require robust objective evidence.

In-Flight Training

ASIC has developed in-flight demonstrations of the limitations of the orientation senses and of SD for fast jets, transport, and rotary-wing aircraft, which introduced a new level of realism to spatial orientation training (Braithwaite, 1997). In-flight demonstration can bring together, in type-specific aircraft, the flying conditions and mission scenarios that can predispose pilots to disorientation. In addition, it allows the teaching of spatial strategies and flight maneuvers that can be used to ameliorate the resulting misperceptions and deviations from the intended flight path. However, in-flight demonstration is costly, although some of it can be performed en route to the flying range. It is also limited by administrative and logistical constraints and by the need for acceptance from commanding officers. Nevertheless, in-flight SD demonstration and training can change a pilot’s attitude towards SD, fostering acceptance that SD is part of the risk of flying and that it cannot be entirely prevented. It consolidates the notion that SD can happen to experienced pilots as well as to novices; no one is immune. It also reaffirms the concept that SD is an inappropriate application of learned physiological and perceptual responses to an abnormal force environment.

Technological Developments for Pilot Disorientation Countermeasures

A comprehensive history of instrument flight was described by Ercoline (2017). Under certain circumstances, enhanced technology in orientation displays and environmental sensors is the necessary and eventual solution for SD countermeasures. To highlight the importance of technology-based SD countermeasures, the following outlines some of the advantages and limitations of a few technological advances that have been developed to combat SD.

Night Vision Devices

As mentioned earlier, vision is often referred to as the predominant sensory input for spatial orientation because it is the most prominent in our conscious awareness. However, there are occasions where visual cues are inadequate or unavailable, for example, when flying at night. Night vision devices (NVDs), such as forward-looking infrared (FLIR) cameras and night vision goggles (NVGs), provide a tremendous advantage for night operations, as they produce images at light levels approaching total darkness. They improve pilots’ nighttime SA, and many night operations would have been impossible without them. However, NVDs do not turn night into day. The amplification of whatever light is available lowers visual acuity, provides a degraded image, and does not afford color discrimination. In addition, NVGs have 4–8 times lower contrast sensitivity, which affects depth perception. The narrower field of view (FOV) degrades motion detection, affects spatial awareness, decreases the user’s sense of presence, and influences vestibular sensitivity (Cheung, 2007). Later generations of NVGs, called panoramic night vision goggles (PNVGs), are available and double the user’s FOV to around 95 degrees; however, the increased separation between sensors could create hyperstereopsis that could affect ease of use and performance at certain distances. Ironically, flying with NVDs has been recognized as a contributor to pilot disorientation: Gaydos et al. (2012) reported that 65% of SD-implicated rotary-wing mishaps occurred while some form of NVD was in use.

Heads-Up or Head-Mounted Displays

There are instances when visibility becomes insufficient and pilots need to transition from flight in VMC to flight in IMC. To eliminate the need to change accommodation (refocusing), to reduce head-down time, and to provide continuous knowledge of real and virtual information in the far domain, various heads-up displays (HUDs) and head-mounted displays (HMDs) with scene references, projected flight path, and similar symbology have been developed. However, some issues have not been fully resolved, such as non-intuitive conformality, clutter, symbology location, small FOV, and the lack of standardization of heads-up displays. Similarly, for any helmet-mounted display, in addition to some of the limitations seen in HUDs, there is a conflicting frame of reference between the apparent motion of nose-referenced flight symbology and the off-axis view of the outside world. Furthermore, head-tracking accuracy, repeatability, and latency could influence attitude awareness. Some HMDs have an inherent parallax error, in which objects appear to be in different locations when viewed from different angles. In addition, head orientation while wearing an HMD (for example, lateral and vertical translation, or roll tilt away from the azimuth and elevation axes) could lead to a loss of attitude awareness.

Automated Ground Collision Avoidance Software (Auto GCAS)

In recent years, ground collision avoidance technology for the fighter communities, such as Auto GCAS, has been implemented in a number of existing airframes (e.g., F-16D). Auto GCAS employs flight control logic that uses a Digital Terrain Elevation Database (DTED) to calculate the aircraft’s position relative to the ground. When the flight control system senses that the aircraft is on a collision course outside normal parameters, it commands the aircraft to roll wings level and execute a +5 Gz pull-up to recover. There is some evidence that Auto GCAS may prevent aircraft crashes resulting from GLOC, hypoxia, and cockpit decompression. However, as a remedy for SD mishaps, the pilot must be able to recognize that he/she is disoriented and then activate the Pilot Activated Recovery System (PARS) to recover from the unusual attitude. It has also been reported that, at least in earlier versions, if the throttle is in the idle position upon activation, the aircraft will lose maneuverability and controllability.
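The trigger logic described above can be illustrated with a deliberately simplified model. The Python sketch below is hypothetical and is not the fielded Auto GCAS algorithm: the roll rate, the safety buffer, the constant-radius pull-up, and the assumption that the dive angle stays fixed during the roll are all illustrative assumptions, and the function names are invented for this example. Only the sequence of rolling wings level and then pulling up at +5 Gz comes from the text. In a real system, the height above terrain would be derived from a terrain database such as DTED rather than passed in directly.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def predicted_height_loss(speed_mps, dive_angle_deg, bank_deg,
                          roll_rate_dps=180.0, load_factor=5.0):
    """Hypothetical estimate of the height lost during an automatic recovery:
    roll wings level, then pull up at a constant load factor (default 5 G)."""
    if load_factor <= 1.0:
        raise ValueError("load_factor must exceed 1 G to pull out of a dive")
    # Only a descending flight path needs recovering; treat climbs as level.
    gamma = math.radians(max(dive_angle_deg, 0.0))
    # Phase 1 (assumption): roll to wings level at a fixed roll rate while
    # the dive angle stays constant.
    t_roll = abs(bank_deg) / roll_rate_dps
    loss_roll = speed_mps * math.sin(gamma) * t_roll
    # Phase 2 (assumption): pull up along a circular arc back to level flight,
    # using the largest radius on the arc, R = V^2 / (g * (n - 1)), and
    # ignoring any change in airspeed.
    radius = speed_mps ** 2 / (G * (load_factor - 1.0))
    loss_pull = radius * (1.0 - math.cos(gamma))
    return loss_roll + loss_pull


def recovery_required(height_above_terrain_m, speed_mps, dive_angle_deg,
                      bank_deg, buffer_m=60.0):
    """Trigger when the predicted loss plus a safety buffer (hypothetical
    value) would use up the height available above the terrain."""
    loss = predicted_height_loss(speed_mps, dive_angle_deg, bank_deg)
    return loss + buffer_m >= height_above_terrain_m


if __name__ == "__main__":
    # 200 m/s, 30-degree dive, 90 degrees of bank.
    print(round(predicted_height_loss(200.0, 30.0, 90.0)))   # 187
    print(recovery_required(1000.0, 200.0, 30.0, 90.0))      # False
    print(recovery_required(200.0, 200.0, 30.0, 90.0))       # True
```

Using the largest pull-up radius errs toward overestimating the height lost: along the arc the radial acceleration is g(n − cos θ), which is never smaller than g(n − 1). A practical system would need to model the recovery trajectory far more completely.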

Degraded Visual Environments (DVE)

Changing operational environments, such as flying in poor weather conditions (for example, when there is snow, sand, dust, fog, or smog) or NVG flight on low-illumination nights (< 1.5 mLux) in “good” weather conditions, are commonly referred to as degraded visual environments (DVE). Typical DVE conditions include “brownout” when flying in the desert and “whiteout/snowball” when flying over snowy terrain. Flying in DVE presents a different challenge. For example, when rotary-wing aircraft fly in the desert, DVE predisposes pilots to SD during critical phases of flight such as takeoff and approach to landing. There might be a total loss of external visual reference due to rotor downwash. In some circumstances, misleading cues can be more dangerous than the absence of cues. For example, circulating dust and sand could provoke a visually induced sensation of self-motion (circular or linear vection) when, in fact, the helicopter is stationary or translating/drifting in a different direction. The loss of external visual references requires the pilot to transition from VMC to IMC flying. However, there is an inherent perceptual delay in reacquiring orientation cues when transitioning from VMC to IMC, and this delay will be lengthened if the pilot is disoriented (Cheung, Hofer, Heskin & Smith, 2004). Because of the limitations of the otolith organs, there is a lack of correct feedback about lateral, longitudinal, and vertical drift. Unfortunately, most instrument displays (especially in legacy aircraft) were developed for cruise flight only. The available displays in legacy helicopters do not contain sufficient information, such as drift, ground slope, terrain features, landing point location, obstacle clearance, and moving obstacle detection. There is an increased risk of mishaps due to unrecognized descent rates, unintended drift, and collision with the ground and other aircraft.
The information bandwidth is insufficient to communicate the necessary orientation cues to the pilot in a timely manner. Therefore, during the critical phases of flight and when close to the ground in DVE, there is little tolerance for error or for corrections. The eventual solution to flying in DVE requires a multifactorial approach that goes beyond ground-based and in-flight procedural training of pilots in handling DVE conditions. For example, improved platform-specific symbology displays designed specifically for the critical phases of flight, together with improved handling qualities, such as those provided by Fly-by-Wire (FBW) technology and Digital Automatic Flight Control Systems (DAFCS), could ameliorate the lack of correct feedback about drift. In addition, advanced sensors that can penetrate fine particulates, such as millimeter wave radar, LADAR (Laser Detection and Ranging), RADAR (Radio Detection and Ranging), and LIDAR (Light Detection and Ranging), are essential during brownout/whiteout. Dust-penetrating sensor systems may allow the pilot to “see through” dust and particulates and re-establish an external visual reference. However, no single sensor technology can provide the capability to “see through” DVE and provide high-resolution vision over the wide range of requirements for safe helicopter operations in various operational modes. Some level of sensor fusion is necessary, and the cost could be prohibitive. Other longer-term solutions include improved understanding and characterization of particulates to enable physical and chemical abatement of dust and sand. Of all the possible solutions for DVE flight, effective cueing in the form of symbology displays is the enabler of all the aforementioned approaches.
The criteria for effective cueing include minimal cognitive processing, minimal latency, and an increase in overall SA without increasing workload, while allowing division of attention without coning of attention. In keeping with these criteria, the design of any orientation display must address the specific inadequacies of the sensory systems, pilots’ preferences, and ease of training, implementation, and egress. The orientation cueing system should have a very “natural feel,” similar to typical helicopter references during a VMC approach. As an example of symbology concept development, a 3D conformal system, with reference to an existing legacy display, and a 2D system with improved scaling specifically designed for DVE were evaluated both in a modified Griffon flight simulator (Cheung et al., 2015a) and in a Bell 412HP helicopter, similar to the RCAF CH146 Griffon (Cheung et al., 2015b). The 3D conformal system (HDTS-DVE, Elbit Systems Ltd) uses an augmented reality principle in which symbols are placed accurately on the real world ahead of the aircraft and viewed through a helmet-mounted display (i.e., earth-referenced symbols that mimic real-world perspective cueing). The 2D system with improved scaling (BOSS, AMRDEC, US Army) provides a scaled indication of acceleration, drift, ground speed, rate of descent, and rate of closure towards a pre-planned landing point. For a detailed description and pictorial presentation of the two symbology display systems, please refer to the aforementioned journal articles by Cheung et al. (2015a, 2015b).
Based on the results of subjective and objective evaluation, the data indicated that the conformal 3D landing grid, with virtual towers, a horizontal grid, and a designated landing zone, overcame the vestibular inadequacies (lack of feedback about longitudinal, lateral, and vertical drift) and provided the necessary orientation cues to land the aircraft safely without external visual references. Specifically, the vertical towers provide an intuitive cue of yaw motion, lateral drift, and, to a lesser extent, longitudinal drift. The virtual vertical speed with respect to the virtual RADALT (Radar Altimeter) on the vertical tower provides information on the rate of descent. A “guiding (dynamic) caret” aligned the aircraft with the static caret and ensured that the aircraft arrived at the designated landing point; the guiding caret also allowed horizontal drift to be determined. The seemingly more intuitive 3D virtual reference appears to shorten the latency in reacquiring orientation cues (especially lateral drift) when transitioning from VMC to IMC, although the exact mechanism requires further laboratory investigation. Therefore, the 3D conformal system with an effective interface could obviate the need for an expensive upgrade to heavily augmented digital flight control systems.
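The scaled drift, ground-speed, descent-rate, and closure-rate indications described for the 2D system, like the drift cues provided by the conformal towers, all derive from the aircraft’s velocity relative to the ground and to the landing point. The published evaluations do not give the underlying computations, so the Python sketch below is a hypothetical illustration rather than the BOSS or HDTS-DVE implementation. It assumes a local north-east-down navigation frame, a heading measured clockwise from north, and illustrative names (`drift_cues`, `DriftCues`), and shows only how such cues could be derived from a navigation solution.

```python
import math
from dataclasses import dataclass


@dataclass
class DriftCues:
    longitudinal_mps: float     # positive = drifting forward along the nose
    lateral_mps: float          # positive = drifting to the right
    ground_speed_mps: float     # horizontal speed over the ground
    rate_of_descent_mps: float  # positive = descending
    closure_rate_mps: float     # positive = closing on the landing point


def drift_cues(pos_ne, vel_ned, heading_deg, landing_ne):
    """Hypothetical cue computation in a local north-east-down (NED) frame.

    pos_ne, landing_ne: (north, east) positions in metres.
    vel_ned: (north, east, down) ground velocity in m/s.
    heading_deg: aircraft heading, degrees clockwise from north.
    """
    v_n, v_e, v_d = vel_ned
    psi = math.radians(heading_deg)
    # Rotate the horizontal velocity from earth axes into heading axes.
    longitudinal = v_n * math.cos(psi) + v_e * math.sin(psi)
    lateral = -v_n * math.sin(psi) + v_e * math.cos(psi)
    # Closure rate: component of velocity along the line to the landing point.
    d_n = landing_ne[0] - pos_ne[0]
    d_e = landing_ne[1] - pos_ne[1]
    rng = math.hypot(d_n, d_e)
    closure = (v_n * d_n + v_e * d_e) / rng if rng > 0.0 else 0.0
    return DriftCues(longitudinal, lateral, math.hypot(v_n, v_e), v_d, closure)


if __name__ == "__main__":
    # Heading east while drifting north at 1 m/s and descending at 0.5 m/s.
    print(drift_cues((0.0, 0.0), (1.0, 0.0, 0.5), 90.0, (0.0, 50.0)))
```

For example, an aircraft heading east while drifting north at 1 m/s has a lateral drift of about −1 m/s, that is, a drift to the left, which is the kind of cue the otolith organs cannot reliably provide near the ground.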

Conclusions

The essential elements of the multifaceted approach to disorientation countermeasures include basic and applied research; ground-based, simulator-based, and in-flight training; and technology development in orientation displays and in detection and ranging sensors. Any research and development must follow through the various stages of Technology Readiness Level (TRL) assessment, from “Basic principles observed and reported” to the point where the “Actual concept and system are flight proven through mission operations” (TRA Guidance, 2011). The most important element in ensuring success is to bring together basic scientists, applied scientists, engineers, operators, commanders, and research sponsors to bridge the gap between research, technological advances, applications, and operations. Specifically, closer coordination between research scientists, training commands, and industry is imperative.

References

Air Standardization Coordinating Committee (ASCC). (1997). Aviation Medicine/Physiological Training of Aircrew in Spatial Disorientation. AIR STD 61/117/1. Arlington, VA: ASCC.
Benson, A. J. (1978). Spatial disorientation—General aspects. In J. Ernsting, & P. King (Eds.), Aviation medicine (pp. 2772–2796). Boston, MA: Butterworth.
Berlin, J. C., Kirk, E. C., & Rowe, T. B. (2013). Functional implications of ubiquitous semicircular canal non-orthogonality in mammals. PLoS ONE, 8(11), e79585. doi:10.1371/journal.pone.0079585.
Bilo, D., & Bilo, A. (1983). Neck flexion related activity of flight control muscles in the flow-stimulated pigeon. Journal of Comparative Physiology, 153, 111–122.
Blanks, R. H. I., Estes, M. S., & Markham, C. H. (1975). Physiological characteristics of vestibular first-order neurons in the cat. II. Response to angular acceleration. Journal of Neurophysiology, 38, 1250–1268.
Bradshaw, A. P., Curthoys, I. S., Todd, M. J., Magnussen, J. S., Taubman, D. S., Aw, S. T., & Halmagyi, G. M. (2010). A mathematical model of human semicircular canal geometry: A new basis for interpreting vestibular physiology. Journal of the Association for Research in Otolaryngology, 11, 145–159.
Braithwaite, M. G. (1997). The British Army Air Corps spatial disorientation demonstration sortie. Aviation, Space & Environmental Medicine, 68, 342–345.
Braithwaite, M. G., Ercoline, W. R., & Brown, L. (2004). Spatial disorientation instruction, demonstration, and training. In F. Previc, & W. Ercoline (Eds.), Spatial disorientation in aviation. Progress in Astronautics and Aeronautics (Vol. 203, pp. 323–377). Reston, VA: American Institute of Aeronautics and Astronautics.
Cheung, B. (2004). Spatial orientation—Nonvisual spatial orientation mechanisms. In F. Previc, & W. Ercoline (Eds.), Spatial disorientation in aviation. Progress in Astronautics and Aeronautics (Vol. 203, pp. 37–94). Reston, VA: American Institute of Aeronautics and Astronautics.
Cheung, B. (2007, April). Vestibular suppression while using NVGs. NATO RTA HFM 141 Symposium on “Human Factors and Medical Aspects of Day/Night All Weather Operations: Current Issues and Future Challenges,” Crete, Greece.
Cheung, B. (2013). Spatial disorientation—More than just illusion. Aviation, Space & Environmental Medicine, 84, 1211–1214.

Cheung, B., & Ercoline, W. (2017) Why can’t we orient like birds? Implication for human spatial disorientation in flight [Abstract]. Aerospace Medicine & Human Performance, 88(3), 289. Cheung, B., Craig, G., Steels, B., Sceviour, R., Cosman, V., Jennings, S., & Holst, P. (2015b). In-flight study of helmet mounted symbology system concepts in degraded visual environment. Aerospace Medicine & Human Performance, 86, 714–722. Cheung, B., Ercoline, W., & Metz, P. (2002, April). G-transition induced loss of orienta- tion and reduced G threshold. NATO RTA HFM Symposium of Spatial Disorientation in Military Vehicles—Causes, consequences and cures. A Coruña, Spain. Cheung, B., Hofer, K., Heskin, R., & Smith, A. (2004). Physiological and behavioral responses to false sensation of pitch. Aviation, Space & Environmental Medicine, 75, 657–665. Cheung, B., McKinley, A., Steels, B., Sceviour, R., Cosman, V., Jennings, S., & Holst, P. (2015a). Simulator study of helmet mounted symbology system concepts in degraded visual environment. Aerospace Medicine & Human Performance, 86, 588–598. Cheung, B. (2010). Internal survey on spatial disorientation accidents and incidents in the Royal Canadian Air Force. Unpublished manuscript. Cheung, B., Money, K., Wright, H., & Bateman, W. (1995). Spatial disorientation impli- cated accidents in Canadian Forces 1982–1992. Aviation, Space & Environmental Medicine, 66, 579–585. Curthoys, I. S., Markham, C. H., & Curthoys, E. J. (1977). Semicircular duct and ampulla dimensions in cat, guinea pig and man. Journal of Morphology, 151, 17–34. Dickman, J. D. (1996). Spatial orientation of semicircular canals and afferent sensitiv- ity vectors in pigeons. Experimental Brain Research, 111, 8–20. Ercoline W. R. (2017). The history of instrument flight. In M. A. Vidulich, P. S. Tsang, & J. Flach (Eds.), Advances in aviation psychology (Vol. 2, pp. 3–38). London, UK: Taylor & Francis Group. Estrada, A., Adam, G. E., & Leduc P. A. (2002, April). 
Use of simulator spatial disori- entation awareness training scenarios by the U.S. Army and National Guard. NATO RTA HFM Symposium of Spatial Disorientation in Military Vehicles—Causes, consequences and cures. A Coruña, Spain. Estrada, A., Braithwaite, M. G., Gilreath, S. R., Johnson, P. A., & Manning, J. C. (1998). Spatial disorientation awareness training scenarios for U. S. Army aviators in visual flight simulators (USAARLReport No. 98–17). Fort Rucker, AL: U.S. Army Aeromedical Research Laboratory. Gaydos, S. J., Harrigan, M. J., & Bushby, A. J. (2012). Ten years of spatial disorientation in US Army rotary-wing operations. Aviation, Space & Environmental Medicine, 83, 739–745. Gibb, R., Ercoline, W., & Scharff, L. (2011). Spatial disorientation: Decades of pilots fatalities. Aviation, Space & Environmental Medicine, 82, 717–724. Gillingham, K. K. (1992). The spatial disorientation problem in the United States Air Force. Journal of Vestibular Research, 2, 297–306. Graf, W., & Vidal, P. P. (1996). Semicircular canal size and upright stance are not inter- related. Journal of Human Evolution, 30, 175–181. Gray, A. A. (1907). The labyrinth of animal (Vol. 1). London, UK: Churchill. Gregory, R. J. (1980). Perception as hypotheses. Philosophical Transaction of the Royal Society London. Series B, Biological Science, The Psychology of Vision, 290(1038), 181–197. 22 Improving Aviation Performance

Hadziselimovic, H., & Savkovic, L. J. (1964). Appearance of semicircular canals in birds in relation to mode of life. Acta Anatomica, 57, 306–315. Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., Moser, E. I. (2005). Microstructure of a spatial map in the entorhinal cortex. Nature, 436, 801–806. Hoover, A. E., Harris, L. R., & Steeves, J. K. E. (2012). Sensory compensation in sound localization in people with one eye. Experimental Brain Research, 216, 565–574. Jacobs, J., Kahana, M. J., Ekstrom, A. D., Mollison, M. V., & Fried, I. (2010). A sense of direction in human entorhinal cortex. Proceedings of the National Academy of Sciences of the United States of America, 107, 6487–6492. Johnson, P. A., Estrada, A., Braithwaite, M. G., & Manning, J. C. (1999). Assessment of sim- ulated spatial disorientation scenarios in training U.S. Army aviators (USAARL Report No. 2000-06). Fort Rucker, AL: U.S. Army Aeromedical Research Laboratory. Jones, G. M., & Spells, K. M. (1963). A theoretical and comparative study of the func- tional dependence of the semicircular canal upon its physical dimensions. Proceedings of the Royal Society, Series B, 157(968), 403–419. Jones, I. H., & Ocker, W. C. (1935). Flying blind, a study in the physiology of the VIII nerve. The Larynoscope, XLV, 405–420. Lackner, J. R. (2014). An earth bound perspective on orientation illusion experienced in aerospace flight. In M. A. Vidulich, P. S.Tsang, & J. Flach (Eds.),Advances in aviation psychology (Vol. 1, pp. 29–48). New York: Taylor & Francis Group. Lawson, B. D., Curry, I. P., Muth, E. R., Hayes, A. M., Milam, L. S., & Brill, J. C. (2017). Training as a countermeasure for spatial disorientation (SD) mis- haps: Have opportunities for improvement been missed? Educational Notes Paper NATO-STO-EN-HFM-265. Lee, J. Y., Shin, K. J., Kim, J. N., Yoo, J. Y., Song, W. C., & Koh, K. S. (2013). A Morphometric study of the semicircular canals using micro-CT images in three-dimensional reconstruction. 
The Anatomical Record, 296, 834–839. Melvill Jones, G., Barry, W., & Kowalsky, N. (1964) Dynamics of the semicircular canals compared in yaw, pitch and roll. Aerospace Medicine, 35, 984–989. Money, K., & Correia M. (1972). The vestibular system of the owl. Comparative Biochemistry and Physiology Part A: Physiology, 42, 355–358. Moser, E. I., Kropff, E., &Moser, M. B. (2008). Place cells, grid cells, and the brain’s spatial representation system. Annual Review Neurosciences, 31, 69–89. Muller, M. (1994). Semicircular duct dimensions and sensitivity of the vertebrate ves- tibular system. Journal of Theoretical Biology, 167, 239–256. Muller, M., & Verhagen, J. H. G. (2002). Optimization of the mechanical performance of a two duct semicircular duct system. Part 2: Excitation of endolymph move- ments. Journal of Theoretical Biology, 216, 425–442. Nys, J., Aerts, J., Ytebrouck, E., Vreysen, S., Laeremans, A., & Arckens, L. (2014). The cross-modal aspect of mouse visual cortex plasticity induced by monocular enucleation is age dependent. Journal of Comparative Neurology, 522, 950–970. O’Keefe, J., & Burgess, N. (2005). Dual phase and rate coding in hippocampal place cells: Theoretical significance and relationship to entorhinal grid cells. Hippocampus, 15, 853–866. Oman, C. M., & Young, L. R. (1972). The physiological range of pressure difference and cupula deflections in the human semicircular canal: Theoretical consider- ations. Acta Otolaryngologica (Stockholm), 74, 324–331. Comprehensive Approach to Pilot Disorientation Countermeasures 23

Oman, C. M., Marcus, E. N., & Curthoys, I. S. (1987). The influence of semicircular canal morphology on endolymph flow dynamics. An anatomically descriptive mathematical model. Acta Otolaryngology, 103, 1–13. Pascal-Leone, A., & Hamilton, R. (2001). The metamodal organization of the brain. In C. Casanova, & M. Ptito (Eds.), Progress in brain research, Chapter 7, (Vol. 134, pp. 1–19). Elsevier Science B.V. Powell-Dunford, N., Bushby, A., & Leland, R. (2016). Spatial disorientation training in the rotor wing flight simulator. Aerospace Medicine & Human Performance, 87, 890–893. Sargolini, F., Fyhn, M., T Hafting, T., McNaughton, B. L., Witter, M. P., Moser, M.-B., & Moser, E. I. (2006). Conjunctive representation of position, direction, and veloc- ity in entorhinal cortex. Science, 312, 758–762. Sipla, J. S. (2007). The semicircular canals of birds and non-avian theropod dinosaurs. Ph. D. Dissertation in Anatomical Sciences, Stony Brook University, New York. Spoor, F., Wood, B., & Zonneveld, F. (1996). Evidence for a link between human semi- circular canal size and bipedal behaviour. Journal of Human Evolution, 30, 183–187. Stott, R. (2013). Orientation and disorientation in aviation. Extreme Physiology & Medicine, 2(2), 1–11. Taube, J. S. (1998). Head direction cells and the neurophysiological basis for a sense of direction. Progress in Neurobiology, 55, 225–256. Taube, J. S., Muller, R. U., & Ranck. J. B. (1990). Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. Journal of Neuroscience, 10, 420–435. Technology Readiness Assessment (TRA) Guidance, United States Department of Defense. (2011, April). Retrieved from https://www.acq.osd.mil/chieftechnologist/ publications/docs/.

2 Influences of Fatigue and Alcohol on Cognitive Performance

Hans-Juergen Hoermann

CONTENTS
Fatigue Risk Factors in Aviation
Prevalence of Fatigue in Aviation
Risk of Alcohol Intoxication in Aviation
Comparison of Fatigue and Alcohol Effects on Performance
Method and Procedure
Results
Discussion
Conclusions
References

In air transportation, passengers and airline managers trust that all personnel involved in daily operations are always competent, well rested, and regularly and sufficiently trained to perform their job tasks at the required level of safety and efficiency. To achieve this, the respective organizations need to provide the conditions, facilities, and guidance that support the mental and physical fitness of their employees when they go to work. By the same token, it is the individual’s responsibility to keep their proficiency up to date and to recover from work during rest periods so that they are fit for their next duty period. The mental and physical condition of operators can be affected by various circumstances. Among them, excessive levels of fatigue, alcohol, and other drugs are known to seriously impair cognition, motor control, risk perception, communication, and well-being, at least temporarily. If these influences persist chronically, they can even severely degrade general health and medical fitness to fly (Ansieu, Marquie, Tucker, & Folkard, 2013; Jackson & Earl, 2006). Aviation authorities have defined legal limits for the consumption of drugs and alcohol prior to or during duty times. However, no such limits have been established for the level of fatigue. While blood-alcohol concentration can be measured, for example, with breathalyzers, no direct measure of an operator’s current level of fatigue is available. To prevent fatigue-related performance degradation in flight crews, maximum legal flight duty times and minimum rest times have been set instead (European Commission, 2014; Federal Aviation Administration, 2012). Still, debate continues as to whether flight time limitations are sufficiently conservative to account for individual differences. For example, these regulations do not take into account the possibility that a flight crew member may not be well rested when commencing duty (Moebus Aviation, 2008; National Research Council, 2011).

This chapter presents an overview of specific safety risks related to operator fatigue and alcohol intoxication in aviation. It then reports empirical findings from a laboratory study that compared the effects of alcohol and fatigue on cognitive performance and on self-assessments of performance.

Fatigue Risk Factors in Aviation

The International Civil Aviation Organization (ICAO) has defined fatigue as “a physiological state of reduced mental or physical performance capability resulting from sleep loss, extended wakefulness, circadian phase, and/or workload (mental and/or physical activity) that can impair a person’s alertness and ability to perform safety-related operational duties” (ICAO, 2016, p. xv). According to the most common bio-mathematical models of fatigue risk, fatigue can be explained through the interaction of homeostatic influences (sleep pressure, which builds up with time awake) and circadian influences (circadian phase) (e.g., Borbély, 1982). Some models add sleep inertia (Åkerstedt & Folkard, 1990), task-related factors (e.g., time-on-task, workload; Tritschler & Bond, 2010), individual factors (e.g., lifestyle, chronotype; Van Dongen, Baynard, Maislin, & Dinges, 2004), or cumulative effects (Van Dongen, Maislin, Mullington, & Dinges, 2003) to the prediction of fatigue risk. A large body of field and laboratory research documents specific fatigue risk factors, especially for pilots, cabin crewmembers, and air traffic controllers (e.g., Caldwell & Caldwell, 2016; Nealley & Gawron, 2015; Sallinen & Hublin, 2015). A non-exhaustive list of external fatigue risk factors for flight crews includes the following:

• Alternating shift work, sometimes without a regular pattern
• Crossing multiple time zones within a short time
• Standby and reserve duties, changing rosters
• Commuting times between home base and departure/arrival airport
• Reduced sleep quality due to hotel overnights
• Complex tasks being executed under time pressure or after long periods of passive monitoring
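The bio-mathematical models mentioned at the start of this section share a two-process core in which homeostatic sleep pressure interacts with a circadian rhythm (e.g., Borbély, 1982). The Python sketch below illustrates only that interaction. All parameter values, the cosine shape of the circadian term, and the `fatigue_index` function (which simply subtracts circadian alertness from sleep pressure) are assumptions chosen for illustration; they are not the calibration of Borbély’s model or of any validated fatigue risk tool, and the index has arbitrary units.

```python
import math

# Illustrative parameters only (assumed); not calibrated values from
# Borbely (1982) or from any operational fatigue model.
TAU_RISE_H = 18.2      # time constant of sleep-pressure build-up while awake (h)
TAU_DECAY_H = 4.2      # time constant of sleep-pressure dissipation during sleep (h)
C_AMPLITUDE = 0.12     # amplitude of the circadian alertness rhythm
C_PEAK_CLOCK_H = 18.0  # clock time at which circadian alertness peaks (h)


def homeostatic_pressure(s0: float, hours: float, awake: bool) -> float:
    """Process S: rises toward 1 while awake, decays toward 0 while asleep."""
    if awake:
        return 1.0 - (1.0 - s0) * math.exp(-hours / TAU_RISE_H)
    return s0 * math.exp(-hours / TAU_DECAY_H)


def circadian_alertness(clock_h: float) -> float:
    """Process C: a 24-hour cosine that peaks at C_PEAK_CLOCK_H."""
    return C_AMPLITUDE * math.cos(2.0 * math.pi * (clock_h - C_PEAK_CLOCK_H) / 24.0)


def fatigue_index(s: float, clock_h: float) -> float:
    """Toy combination: high sleep pressure plus low circadian alertness -> higher index."""
    return s - circadian_alertness(clock_h)


if __name__ == "__main__":
    s_on_waking = 0.2  # assumed sleep pressure on waking at 07:00
    for hours_awake in (8, 16, 22):
        clock = (7 + hours_awake) % 24
        s = homeostatic_pressure(s_on_waking, hours_awake, awake=True)
        print(f"{hours_awake:2d} h awake, {clock:02d}:00 -> index {fatigue_index(s, clock):.2f}")
```

Under these assumptions, the index after 22 hours awake at 05:00 is higher than after 8 hours awake at 15:00, because long wakefulness and an unfavorable circadian phase push in the same direction, as they do on night duties.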

It is commonly agreed that a healthy adult can fully satisfy the basic need for sleep with an average of approximately eight hours of good-quality sleep per day (Caldwell & Caldwell, 2016; Duffy, Zitting, & Czeisler, 2015; Van Dongen et al., 2003). The influences enumerated above can disturb natural sleep regulation in different ways. The first two of the listed factors can cause desynchronization of the circadian rhythm, meaning that the physiological sleep-wake cycle is out of synchrony with the external cues of day and night. For example, Samel and his colleagues at the German Aerospace Center (DLR) found that, during a layover on long-haul night flights across the equator, flight crews slept on average two hours less than their eight-hour baseline sleep prior to the flight (Samel et al., 1997). Because of the night shift, the pilots went to bed after arrival at about 8 a.m. (CET, home-base time) and got up again at about 2 p.m. (CET). Even though the night duty had kept them awake for about 22 hours, their sleep during this daytime layover was significantly shorter and disturbed by frequent awakenings. Consequently, critical fatigue ratings above 12 points on the Samn and Perelli Fatigue Checklist (Samn & Perelli, 1982) were observed three times more often during the return flight than during the outbound flight. Electroencephalography (EEG) recordings also showed twice as many microsleeps, in the form of spontaneous alpha activity, during the return flight. Findings from long-haul transmeridian flights were very similar. While none of the pilots on westbound daytime flights gave critical subjective fatigue ratings, on the eastbound night flight from the US to Europe 20% of the pilots rated their fatigue above the level at which, according to Samn and Perelli, performance deficits become probable.
Fatigue problems are exacerbated if aircrews have to conduct multiple transmeridian flights in succession without sufficient acclimatization in between (Samel, Wegmann, & Vejvoda, 1995). With regard to the other factors, research has shown that sleep taken when on “standby” or “on-call duty” is shorter and of poorer quality than normal sleep (Torsvall & Åkerstedt, 1988). For short-haul operations, studies have found that the number of sectors flown and the duty length were the most important influences on fatigue (Powell, Spencer, Holland, Broadbent, & Petrie, 2007). This was recently confirmed by Honn, Satterfield, McCauley, Caldwell, and Van Dongen (2016), who noted, however, that the fatiguing effect of multiple take-offs and landings was only modest. Dinges, Graeber, Rosekind, Samel, and Wegmann (1996) identified airport standby reserve as a critical factor that can lead to a sleep deficit among aircrew members if it is not treated as equivalent to duty time. A crew member on airport standby is required to be available at the airport for flight duty assignment. Because crew rest facilities at the airport are either missing or far below standard, flight duty times have to be reduced after a period of prior airport reserve. Even for on-call standby, Goerke and Soll (2014) found that pilots of a regional airline reported a significant reduction in sleep quality, in addition to potential friction in their families. Niederl (2007) found in a longitudinal study of 28 short-haul pilots that their sleep periods were one to two hours shorter during hotel overnights than at home. As a result of shift work and persistently poorer sleep quality during layovers, sleep debt may accumulate and affect performance in the form of chronic fatigue, even if recent rest periods were of the required duration (Van Dongen et al., 2003).
A significant proportion of airline pilots need to commute prior to duty. According to a survey of 17,519 commercial pilots conducted by the National Research Council (NRC, 2011), more than half of the pilots reported a home-to-airport distance of over 150 miles, and about a quarter more than 750 miles. If commuting time is counted as off-duty time, commuting pilots might already be fatigued when commencing duty. If commuting results in inadequate sleep, the pilot has to decide whether he or she is still fit to fly, within the context of the flight schedule, the airline’s policy, and contractual implications. However, the NRC report (2011) concluded that it is uncertain whether safety problems related to pilot commuting can be solved through regulation, because realistic data on the actual effects of different commuting practices are lacking. Though it sounds reasonable to expect higher levels of fatigue as a result of physical and/or mental exertion, research findings on task-related factors of fatigue (such as time-on-task, workload, complexity, and monotony) are not consistent. For example, Powell et al. (2007) found a linear relationship between subjective fatigue scores in short-haul operations and both duty length and the number of sectors flown. This was confirmed by Honn et al. (2016). However, in Honn et al.’s study, the fatiguing effect of multiple take-offs and landings was only modest, and evaluations of flight performance did not differ significantly between multi-sector and single-sector duty days in the same group of pilots. Similarly, in night missions in a flight simulator, Hoermann, Gontar, and Haslbeck (2015) found that mental workload was related only to subjective fatigue ratings and not to objective performance-based indicators such as the Psychomotor Vigilance Task (PVT) (Dinges & Powell, 1985). Compared with the effects of high workload, the effects of low workload are fairly clear.
Wickens, Hutchins, Laux, and Sebok (2015) confirmed that complex task performance is less degraded by fatigue than simple task performance. The authors proposed that task complexity could raise the individual’s arousal and partly compensate for a fatigue-induced lack of alertness. The effects of task monotony (e.g., through undemanding routine activity or constant/repetitive stimulation) gain more weight the longer the task lasts and the higher the current level of fatigue. Grech, Neal, Yeo, Humphreys, and Smith (2009) provided evidence from the maritime sector for a non-linear relationship between workload and fatigue: as proposed by Spencer, Robertson, and Folkard (2006), they found that both low and high workloads were associated with the highest fatigue. As a factor that augments fatigue, task monotony is already present for many operators across various transportation modes (Sallinen & Hublin, 2015) and could become even more critical in the future as levels of automation keep rising (Gawron et al., 2011). Many laboratory studies with hundreds of subjects have provided empirical evidence that cognitive performance can be adversely affected by restricted sleep. A few seminal reviews and meta-analyses provide an overview of the general findings (Caldwell & Caldwell, 2016; Lim & Dinges, 2010; Wickens et al., 2015). Lim and Dinges reported moderate to large effect sizes for the cognitive domains of simple and complex attention, processing speed, working memory, and short-term memory, but no significant effects on reasoning. These results were largely confirmed by Wickens et al., who concluded that simple cognitive tasks were more affected by disrupted sleep than complex cognitive tasks. This evidence supports the notion that vigilance is the fundamental process impaired by sleep deprivation (Lim & Dinges, 2010).
Flight simulator studies, for example by Caldwell, Caldwell, Brown, and Smith (2004), Petrilli, Thomas, Dawson, and Roach (2006), and Previc, Lopez, Ercoline, Daluz, Workman, Evans, and Dillon (2009), have likewise demonstrated significant degradation of experienced pilots’ flying performance on main tasks, including crew decision-making, after 24 hours or more of continuous wakefulness.
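The Psychomotor Vigilance Task (PVT), named earlier in this section as an objective fatigue indicator, is a simple reaction-time test whose sessions are commonly summarized by counting lapses and computing response speed. The Python sketch below shows one such summary. The 500 ms lapse criterion and the 100 ms false-start cutoff are conventions commonly used in the PVT literature and are assumed here rather than taken from this chapter; the reaction times in the example are invented for illustration.

```python
from statistics import median

# Scoring thresholds assumed for illustration (common PVT conventions).
LAPSE_MS = 500        # valid reaction times >= 500 ms are counted as lapses
FALSE_START_MS = 100  # responses faster than 100 ms are treated as false starts


def pvt_summary(reaction_times_ms):
    """Summarize one PVT session from a list of reaction times in milliseconds."""
    valid = [rt for rt in reaction_times_ms if rt >= FALSE_START_MS]
    if not valid:
        raise ValueError("no valid responses in session")
    return {
        "false_starts": len(reaction_times_ms) - len(valid),
        "lapses": sum(1 for rt in valid if rt >= LAPSE_MS),
        "median_rt_ms": median(valid),
        # Mean response speed (1/RT, in responses per second); drops with slow responses.
        "mean_speed": sum(1000.0 / rt for rt in valid) / len(valid),
    }


if __name__ == "__main__":
    rested = [250, 270, 260, 300, 280]               # hypothetical session
    sleep_deprived = [320, 610, 90, 450, 720, 380]   # hypothetical session
    for label, session in (("rested", rested), ("sleep-deprived", sleep_deprived)):
        print(label, pvt_summary(session))
```

Reporting lapses alongside median reaction time reflects the idea that fatigue shows up as occasional long attention failures, which a median alone would hide.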

Prevalence of Fatigue in Aviation

A few years ago, the European Cockpit Association (2012) summarized several surveys on pilot fatigue carried out between 2010 and 2012 in eight countries across Europe. More than 6,000 commercial pilots were asked to participate. As an indicator of the prevalence of fatigue problems, about half of the surveyed pilots reported having involuntarily fallen asleep in the cockpit at least once while flying. In the UK, one-third of these pilots mentioned that the other pilot was also asleep when they woke up. Airline managers are often unaware of this critical problem, because only 20%–30% of the European pilots said that they would actually file a report if they felt too tired for duty. Similar figures were found in a NASA survey of about 1,500 corporate pilots: as reported by Rosekind, Co, Gregory, and Miller (2000), about 65% of these pilots admitted to having dozed off during flight operations because of severe fatigue.

In a study of 162 short-haul pilots of low-cost carriers and scheduled airlines, Jackson and Earl (2006) found that even though severe fatigue was reported more frequently by low-cost airline pilots, fatigue problems cannot be attributed solely to the work schedules of low-cost carriers. The main factor behind higher fatigue scores was that these pilots regularly flew into their “discretion hours” (extension of the flight duty period at the commander’s discretion), and flying into discretion time occurred equally often in low-cost and scheduled airlines. However, another major survey on safety culture among 7,239 European pilots found that pilots of low-cost carriers and cargo airlines had less trust in their management’s support for effectively managing fatigue (Reader, Parand, & Kirwan, 2016). Although the reduction of fatigue-related accidents has been on the National Transportation Safety Board (NTSB) Most Wanted List of safety improvements for many years, the number of accidents in which pilot fatigue has been clearly identified as a contributing factor is relatively small, though the trend is increasing. Marcus and Rosekind (2015, 2017) reviewed 182 major NTSB investigations in different modes of transportation completed between 2001 and 2012 and found that fatigue was cited in 20% of the investigations as a probable cause, a contributing factor, or a finding (proportions ranging from 40% of highway investigations and 23% of aviation investigations to 4% of marine investigations). One problem is that, unlike alcohol, drugs, and medication, an operator’s level of fatigue cannot be determined with a simple blood test.
Even when it is concluded that a crew was fatigued at the time of an accident or incident, for example from the cockpit voice recorder, the preceding rest time, or the length of the duty period, it remains difficult to prove that fatigue-related performance impairments had a direct influence on the course of events (Caldwell & Caldwell, 2016; Goode, 2003; Rosekind et al., 2001; Sumwalt, 2017). Aviation incidents and accidents involving fatigue are regularly reviewed by the NTSB (e.g., Rosekind, 2014; Sumwalt, 2017). Several incidents have been reported in which both pilots were asleep and radio contact was lost for several minutes. In February 2008, for example, a Bombardier CL-600 jet with 40 passengers flew on autopilot on a constant heading at cruising altitude past its destination airport in Hawaii while air traffic control (ATC) tried repeatedly for 18 minutes to contact the crew. Because the airplane had already been fueled for the return flight to Honolulu the same day, the crew and passengers landed safely with only a slight delay. Contributing to the incident were the captain’s undiagnosed obstructive sleep apnea (OSA) and a busy work schedule with several consecutive days of early-morning departures (NTSB, 2009). In 2009, the NTSB cited pilot fatigue as a causal factor in an incident in which a Boeing 767 mistakenly landed at night on a taxiway instead of the parallel runway in Atlanta. The captain had already been awake for 11 hours before departing the previous night. Because one of the other crewmembers fell ill during the flight, he could not leave the cockpit to take his customary rest during the 10-hour flight. Nevertheless, the flight crew elected to continue to the destination, resulting in 22 hours of time awake for the captain and 14 hours for the first officer. Fortunately, there was no other traffic on that taxiway at the time of landing (NTSB, 2010).
Although the company’s flight operations manual stated that it was the pilot’s responsibility to report for duty properly rested and that no pilot should feel pressured to fly if not well rested, subsequent interviews with the crew revealed that the company had no formal fatigue risk management program in place and that the pilots had been unaware of information about sleep and fatigue management prior to the incident. In 2011, the NTSB investigated a number of incidents in which aircraft lost radio contact because the air traffic controller had fallen asleep. These cases usually occurred during night shifts, as in March 2011, when two incoming domestic flights into Washington D.C.’s Reagan National Airport had to land without any assistance from the tower controller. Working his fourth consecutive night shift that week, the lone controller fell asleep for approximately 25 minutes (NTSB, 2011). A survey of over 3,000 US air traffic controllers found that the average amount of sleep between midnight shifts was only 5.5 hours (Orasanu et al., 2012). Unlike the practice in other nations such as Germany or Japan, the Federal Aviation Administration (FAA) currently does not permit controllers to nap, not even during breaks. In response to these incidents, the FAA eliminated single-controller shifts at several 24-hour airports and required a minimum rest period of nine hours between shifts. A few cases have also been reported in which pilot performance was adversely affected after a long in-flight nap. So-called sleep inertia can lead to deficits in post-sleep performance for up to 15 minutes or longer after awakening (Ferrara & De Gennaro, 2000).
On an overnight flight from Toronto to Zurich in 2011, a Boeing 767 went into an abrupt dive when one of the crew members pushed the control column forward to avoid what he perceived as an imminent collision with opposite-direction traffic, which was in fact 1,000 feet below, as displayed on the traffic alert and collision avoidance system (TCAS) instrument. The pilot had just woken up from a 75-minute nap and, still affected by sleep inertia, was startled by outside visual cues. Fourteen passengers and two flight attendants were injured (Transportation Safety Board of Canada, 2012). In 2010, after the crew failed to abort an unstabilized approach into Mangalore, India, a Boeing 737 overran the runway and exploded, killing 152 passengers and six crew members. The investigating authority concluded that the captain had slept for a prolonged period during the night flight from Dubai to Mangalore. Sleep inertia probably impaired his judgment when he woke up shortly before the approach (Gokhale, 2010). On an early-morning return flight from Palma de Mallorca to Munich, the crew of an Airbus 330, suffering from severe fatigue, made a precautionary landing after declaring “Pan-Pan-Pan” to ATC. After the emergency call, the crew received priority and direct approach vectors from ATC and performed an uneventful automatic landing. The crew had been on duty for over 10 hours and stated that their attentiveness toward ATC had become limited because they were no longer able to acknowledge instructions correctly. They reported that it had become impossible to add up numbers correctly and increasingly difficult to think clearly (German Federal Bureau of Aircraft Accident Investigation BFU, 2012).

Risk of Alcohol Intoxication in Aviation

In the aftermath of the tragic 2015 accident in which the co-pilot of an Airbus 320 apparently deliberately crashed the aircraft into the French Alps, the European Aviation Safety Agency (EASA) proposed several preventive safety measures. Among these measures are systematic drug and alcohol testing of flight crewmembers upon employment, accompanied by random alcohol screening of flight and cabin crew within the ramp inspection program (European Aviation Safety Agency, 2016). Citing the low positive rate of random alcohol testing among commercial flight crews in the US, some authors have questioned its usefulness as a safety barrier (e.g., Horne, 2015; Li, Baker, Qiang, Rebok, & McCarthy, 2007). Others have argued that random testing has a deterrent effect. In fact, the number of commercial aviation accidents in which one of the pilots was intoxicated appears to be very small. A UK review of medical-cause fatal accidents in commercial air transport between 1980 and 2011 estimated that only about one accident per 100 million flight hours can be categorized as a medical-cause accident, and that fewer than half of those cases involved drugs or alcohol (Mitchell & Lillywhite, 2013). However, more data are required to draw firm conclusions, especially for non-fatal accidents and incidents. An FAA report that also covered general aviation revealed that of 1,353 pilots fatally injured in aviation accidents between 2004 and 2008, 7% had a postmortem blood alcohol level at or above the FAA limit of 0.04%. Most of these pilots (94%) were operating small non-commercial aircraft at the time of the accident, and none was flying under Part 121 rules (Canfield, Dubowski, Chaturvedi, & Whinnery, 2012). As Damkot and Osga (1978) had already observed, general aviation pilots in particular appear to have a somewhat lax attitude toward alcohol and flying.
In a survey the authors estimated that between 27% and 32% of the responding 362 general aviation pilots had no safety concerns about flying with a blood alcohol concentration (BAC) above the legal thresh- old. Among the reasons for drinking alcohol are, according to Hawkins (1987), social after-flight habits and winding down from stress or facilitating sleep (see also Cook, 1997; Graeber, 1988). Influences of Fatigue and Alcohol on Cognitive Performance 33

Besides acute alcohol intoxication while on duty, hangover effects may also cause subtle deficits in crewmembers' performance. Simons (2015) cited several studies in which BAC-levels between 0.07% and 0.10% the evening before resulted in performance impairment in simulated flying and radio communication tasks the morning after. This effect is often referred to as Post Alcohol Impairment (Cook, 1997). The effects of alcohol on human behavior are generally well understood: alcohol significantly impairs pilots' performance, even at very low BAC-levels (Cook, 1997). Studies of performance under the influence of alcohol have been conducted in real aircraft (Billings, Wick, Gerke, & Chase, 1973), in flight simulators (Davenport & Harris, 1992; Mundt & Ross, 1993; Ross & Mundt, 1996), and in the laboratory (Dry, Burns, Nettelbeck, Farquharson, & White, 2012; Hindmarch, Kerr, & Sherwood, 1991). Billings et al. concluded that even at a BAC-level of 0.04%, piloting performance would be incompatible with flight safety because of an increasing rate of procedural errors. Although flight simulator studies appear somewhat inconsistent with respect to the effects of low BAC-levels (< 0.10%) on flight performance, Mundt and Ross (1993) and Ross and Mundt (1996) concluded in their reviews that pilot performance was impaired especially when workload was high or unexpected problems occurred. Cook (1997) summarized the evidence as suggesting "that a pilot's performance may not be impaired by alcohol whilst he is engaged primarily in familiar tasks, but that his performance might deteriorate significantly when faced with novel or unexpected circumstances" (p. 543). In addition, there is evidence that sustained G-forces and hypoxia exacerbate the effects of alcohol on task performance (Cook, 1997). In laboratory settings, Dry et al. (2012) and Hindmarch et al. (1991) examined alcohol effects on a broader range of cognitive and psychomotor tests.
At alcohol doses of 0.10% and 0.07%, Hindmarch et al. found the largest effect sizes of performance impairment for recognition reaction times, secondary task reaction times, and psychomotor performance, whereas short-term memory and speed of motor responses were less affected, in part not significantly. Dry et al. (2012) found that the areas of performance most sensitive to lower BAC-levels of about 0.05% were the processing speed of visual performance, false alarms in a sustained attention task, and working memory; other measures were either unaffected up to 0.10% BAC (errors of omission in a sustained attention task, strategic problem solving) or deteriorated only at higher doses (> 0.08%; psychomotor functions). The speed-accuracy trade-off under alcohol also seems to be dose-related (Tiplady, Drummond, Cameron, Gray, Hendry, Sinclair, & Wright, 2001): at higher BAC-levels, fast performance comes at the cost of substantially more errors. According to the review by Modell and Mountz (1990), further negative effects of alcohol on behavior include that pilots could not accurately judge their own degree of performance impairment even 14 hours after alcohol ingestion, and that natural sleep stages are disturbed when alcohol is used as a sleep aid, which can reduce the restorative effect of sleep and lead to daytime sleepiness (Graeber, 1988).

Comparison of Fatigue and Alcohol Effects on Performance
In this final section, we report findings of a laboratory study that compared the effects of alcohol and fatigue on cognitive performance and on self-assessments of performance. The intention is to examine whether fatigue and alcohol affect the same or different cognitive functions. In an earlier study with 40 subjects, Dawson and Reid (1997) found that the impairment of psychomotor performance in an unpredictable tracking task after 17 hours of wakefulness was equivalent to the impairment at 0.05% BAC. After 24 hours of sustained wakefulness, psychomotor performance was equivalent to that observed at a BAC-level of 0.10%. Williamson and Feyer (2000) administered a broader spectrum of cognitive and psychomotor tasks to 39 subjects, applying sleep deprivation and alcohol as treatments in a randomized crossover design. The test battery comprised the Mackworth Clock vigilance test, a simple reaction time test, tracking, dual tasking (tracking plus simple reaction times), a symbol-digit test, spatial memory search, a memory search test, and grammatical reasoning. For most tests, performance after about 17–19 hours of wakefulness corresponded to performance at a BAC-level of 0.05%. Vigilance showed the earliest effects, after 17 hours, and dual tasking the latest, after 19 hours. Performance in grammatical reasoning and memory search remained relatively stable after sleep deprivation. However, the question remains whether the effects of alcohol and fatigue differ only quantitatively. Described below is an experimental sleep deprivation study conducted at the DLR to examine the influences of sustained wakefulness and alcohol on a number of cognitive and psychomotor tasks (Elmenhorst et al., 2013).
The goal was to compare the levels of performance after each treatment and, additionally, to analyze the accuracy of prospective and retrospective self-assessments of task performance.
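Equivalence statements of this kind (for example, 17 hours awake corresponding to 0.05% BAC) amount to locating the point on a sleep-deprivation performance curve where the decrement first matches the decrement measured at a given BAC. The studies cited do not necessarily compute it this way; the Python sketch below simply assumes linear interpolation between measurement times, and the function name and all example numbers are hypothetical, not data from any of these studies.

```python
from typing import Optional, Sequence


def equivalent_hours_awake(
    hours_awake: Sequence[float],
    decrement: Sequence[float],
    alcohol_decrement: float,
) -> Optional[float]:
    """Return the first time awake (hours) at which the sleep-deprivation
    decrement reaches `alcohol_decrement`, or None if it never does.

    Hypothetical helper. `decrement[i]` is the performance loss relative to
    baseline (e.g. 0.10 = 10% worse) after `hours_awake[i]` hours; hours must
    be ascending. Values between measurements are linearly interpolated.
    """
    if len(hours_awake) != len(decrement):
        raise ValueError("hours_awake and decrement must have the same length")
    if not hours_awake:
        return None
    if decrement[0] >= alcohol_decrement:
        return float(hours_awake[0])
    for i in range(1, len(hours_awake)):
        if decrement[i] >= alcohol_decrement:
            h0, h1 = hours_awake[i - 1], hours_awake[i]
            d0, d1 = decrement[i - 1], decrement[i]
            # Here d0 < alcohol_decrement <= d1, so d1 - d0 > 0.
            return h0 + (alcohol_decrement - d0) / (d1 - d0) * (h1 - h0)
    return None


# Hypothetical decrements measured every three hours after waking at 7 a.m.
hours = [2, 5, 8, 11, 14, 17, 20, 23, 26]
decrements = [0.00, 0.00, 0.01, 0.02, 0.04, 0.07, 0.10, 0.14, 0.18]
print(equivalent_hours_awake(hours, decrements, alcohol_decrement=0.085))
```

With these toy numbers, a decrement of 0.085 lies halfway between the values at 17 and 20 hours, so the equivalent time awake is about 18.5 hours. Interpolation keeps the sketch transparent; a regression over the whole curve, as some studies use, would smooth out measurement noise instead.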

Method and Procedure
Participants and experimental protocol. In a sleep laboratory, the cognitive and psychomotor performance of 46 volunteers (20 female; mean age 26.5 years, SD = 5.1) was monitored over 12 consecutive days and nights. All participants were mentally and physically fit, with no recent history of sleep disorders, medication use, or drug or alcohol abuse. After one adaptation day and two baseline days, participants' sleep was restricted to induce different levels of performance impairment: total sleep deprivation (TSD, 38 hours awake), partial sleep deprivation (PSD, one night with sleep restricted to four hours), and partial sleep deprivation after moderate alcohol intake (PSA, sleep restricted to four hours after reaching a BAC-level of 0.07%). Alcohol was administered in the form of gin between 4 p.m. and 5 p.m.; the maximum BAC-level of 0.07% (SD = 0.01) was measured at 6 p.m. Additionally, every other day at 12 p.m., the oxygen level of the inhaled air was reduced to 15%; these hypoxia effects are not discussed here. The interventions were administered in a randomized crossover design, with two recovery nights of eight hours time in bed (TIB) between conditions. The participants were accommodated in the sleep laboratory in six sessions with up to eight subjects at a time (Figure 2.1). During time out of bed, participants performed a battery of performance tests at three-hour intervals (63 times in total). Before the study, all participants were adjusted to a standardized sleep-wake cycle (TIB 11 p.m. to 7 a.m.) and received a comprehensive briefing and intensive training in all performance measures.

FIGURE 2.1 Experimental protocol for the interventions during the sleep deprivation experiment. Each row represents one session with up to eight subjects. ARR = Arrival, ADAP = Adaptation, B = Baseline, TSD = Total sleep deprivation, R = Recovery, PSD = Partial sleep deprivation, PSA = Partial sleep deprivation with prior alcohol consumption.

Performance tests. The computerized test battery included a ten-minute version of the Psychomotor Vigilance Task (PVT) (Dinges & Powell, 1985), the Unstable Tracking Task (UTT) (Santucci et al., 1989), a mental concentration test (MCO), a spatial orientation test (SPO), and a perceptual speed test (PSP). Each test session took approximately 60 minutes (Hoermann, Mischke, Elmenhorst, & Benderoth, 2016).
The MCO involves symbol-digit conversions and simple arithmetic in a four-step procedure. The SPO requires left-right distinctions and mental figure rotations. The PSP measures the ability to rapidly read the pointer positions of two instruments and to count objects presented visually for about two seconds. Numbers of correct responses (C) and numbers of commission errors (E) were counted for the MCO, SPO, and PSP. PVT performance was measured by the median reaction time (RT) and the number of lapses (L, reaction times longer than 500 ms), and UTT performance by tracking deviations (DV) and the number of control losses (LC). Thus, every test provides measures of speed (quantity of performance) and accuracy (quality of performance).

Self-assessment of fatigue and performance. Before each test session, subjective fatigue was assessed using the Subjective Fatigue Check card (FAT) (Samn & Perelli, 1982). Subjects reported their current level of fatigue by rating ten different mental states. Unlike the original, we used an inverted total score ranging from 10 to 30 (instead of 20 to 0), so that higher values indicate higher levels of fatigue. Samn and Perelli categorized the total score into four fatigue levels related to performance capabilities: class I, severe fatigue (27 to 30; performance definitely impaired); class II, moderate to severe fatigue (23 to 26; some performance impairment probable); class III, mild fatigue (19 to 22; performance impairment possible); and class IV, sufficiently alert (10 to 18; no performance impairment). The score ranges given here have been converted to the 10 to 30 scale. Self-assessment of performance (SAP): Subjects were asked to assess their perceived level of performance on a number of anchored 6-point Likert scales ranging from 1 = Min to 6 = Max.
Four of these self-rating scales each related directly to one of the administered performance tests: UTT, MCO, SPO, and PSP. Self-ratings for the PVT were not available because performance feedback was displayed after each PVT response. Subjects assessed their expected performance level immediately before each test session and, retrospectively, after each session. No additional feedback on test performance was provided. A total score for self-confidence of performance (SAP) was calculated by averaging all prospective self-assessments.
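The scoring rules above are simple enough to state as code. The Python sketch below computes the two PVT measures (median RT and number of lapses, a lapse being a reaction time longer than 500 ms) and maps a FAT total onto the four Samn-Perelli classes on the inverted 10 to 30 scale. The helper names are hypothetical, and the conversion from the original 20-to-0 total assumes a simple linear inversion (30 minus the original score), which maps 20 to 10 and 0 to 30 as described in the text.

```python
from statistics import median
from typing import Sequence, Tuple

LAPSE_THRESHOLD_MS = 500  # lapse = reaction time longer than 500 ms


def pvt_summary(reaction_times_ms: Sequence[float]) -> Tuple[float, int]:
    """Return (median reaction time, number of lapses) for one PVT session."""
    if not reaction_times_ms:
        raise ValueError("no reaction times recorded")
    lapses = sum(1 for rt in reaction_times_ms if rt > LAPSE_THRESHOLD_MS)
    return median(reaction_times_ms), lapses


def invert_samn_perelli(original_total: int) -> int:
    """Map the original 20-to-0 total (higher = more alert) onto the
    10-to-30 scale (higher = more fatigued); assumes a linear inversion."""
    if not 0 <= original_total <= 20:
        raise ValueError("original total must lie between 0 and 20")
    return 30 - original_total


def samn_perelli_class(fat_score: int) -> str:
    """Classify a 10-to-30 FAT score into the four Samn-Perelli categories."""
    if not 10 <= fat_score <= 30:
        raise ValueError("FAT score must lie between 10 and 30")
    if fat_score >= 27:
        return "I: severe fatigue"
    if fat_score >= 23:
        return "II: moderate to severe fatigue"
    if fat_score >= 19:
        return "III: mild fatigue"
    return "IV: sufficiently alert"


print(pvt_summary([250, 300, 620, 280, 510]))       # (300, 2)
print(samn_perelli_class(invert_samn_perelli(9)))   # III: mild fatigue
```

The median is used rather than the mean because a few very long reaction times (lapses) would otherwise dominate the speed measure; lapses are counted separately as the accuracy measure.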

Results
First, we illustrate the development of subjective fatigue (FAT) and self-confidence of performance (SAP) over three days and two nights, with participants deprived of sleep during the first night (Figure 2.2, top). Over these three days, subjective fatigue progressed, according to the classification of Samn and Perelli, from the lowest category, sufficiently alert (Day 1, 9 a.m. to 6 p.m.), via mild fatigue (first night, 12 a.m. to 6 a.m.) to moderate fatigue after 26 hours of continuous wakefulness. After the recovery night with eight hours TIB, participants felt sufficiently alert, as on the day before the sleep deprivation (Day 3, 9 a.m. to 6 p.m.). The total score for the self-assessments of performance developed correspondingly.

FIGURE 2.2 Development of subjective fatigue (FAT) and performance self-assessments (SAP) during the interventions with total sleep deprivation (TSD; top) and with partial sleep deprivation plus alcohol intoxication (PSA; bottom). The enlarged data markers “O” indicate the peak BAC-level of 0.07%. The error bars depict standard errors of means.

In Figure 2.2 (bottom), three days and two nights are depicted, with alcohol administered in the afternoon of the first day and sleep restricted to four hours during the following night. During the second night, participants had the usual eight hours TIB. In contrast to total sleep deprivation, alcohol intoxication initially left subjective fatigue unaffected, whereas the performance self-assessment clearly dropped in relation to the alcohol intake. With some delay, approximately three hours after the peak BAC-level, subjective fatigue increased to mild fatigue (at 9 p.m.). Subjective fatigue appears to remain slightly elevated after the recovery night with eight hours TIB (Figure 2.2, bottom).

Figures 2.3 and 2.4 allow a comparison of the extent of performance impairment produced by the different interventions. Each of the ten charts covers three days and two nights. In Figure 2.3, the participants had to stay awake during the first night (TSD). In Figure 2.4, they stayed in bed for only four hours during the first night (PSA); before this partial sleep restriction, they drank enough alcohol to reach a BAC-level of about 0.07%, which is indicated by the enlarged round data markers. The enlarged square data markers on day 2 indicate the treatment with slight hypoxia. For each of the five tests, two performance measures are shown: a quantitative parameter (reaction times for the PVT, tracking deviations for the UTT, number of correct responses for the MCO, SPO, and PSP) and a qualitative parameter (number of lapses for the PVT, number of control losses for the UTT, error rates for the MCO, SPO, and PSP).

FIGURE 2.3 Development of performance scores over three days and two nights. First night was totally sleep restricted (TSD), second night was for recovery. Enlarged data markers “◻” indicate the time of reduced oxygen in inhaled air (day 2, 12 p.m.). The error bars depict standard errors of means.

FIGURE 2.4 Development of performance scores over three days and two nights. First night was partially sleep restricted after consuming alcohol in the afternoon of day 1 (PSA), second night was for recovery. Enlarged data markers indicate the peak BAC-level “O” and the time of reduced oxygen in inhaled air “◻” (day 2, 12 p.m.). The error bars depict standard errors of means.

Performance impairment after total sleep deprivation appears considerably larger than after partial sleep deprivation. Alcohol intoxication seems to affect the tests with a psychomotor component (PVT and UTT) more clearly than the cognitive tests (MCO, SPO, PSP). The extent of the impairment was evaluated with the analyses of variance described below. To determine the statistical significance of the effects of sleep deprivation and alcohol on cognitive and psychomotor performance, we conducted four multivariate analyses of variance with repeated measures (MANOVA). The effect of total sleep deprivation (TSD) was analyzed with one within-subjects factor: time of measurement at 9 a.m. on the day before TSD (TSD 1, 9 a.m.) versus 9 a.m. after 26 hours awake (TSD 2, 9 a.m.).

Two sets of dependent variables were entered: (a) the quantitative performance scores and (b) the qualitative performance scores. In the MANOVAs for the effect of alcohol on performance, the within-subjects factor was the time of measurement at 6 p.m. at the maximum BAC-level of 0.07% (PSA 1, 6 p.m.) versus the 6 p.m. measurement two days later, after one full recovery night following the additional intervention with partial sleep deprivation (PSA 3, 6 p.m.). The same sets of dependent variables as for the TSD effects were used in these two MANOVAs. All four MANOVAs yielded significant multivariate F-tests for the within-subjects factor: both TSD and alcohol (ALC) significantly impaired quantitative and qualitative aspects of cognitive and psychomotor performance. To compare the magnitude of the effects on the different cognitive and psychomotor functions, the results of all univariate and multivariate F-tests are compiled in Tables 2.1 and 2.2. The largest effects of sleep deprivation were observed for the PVT and the MCO, whereas alcohol intoxication affected the PVT and the UTT most strongly. These results suggest that fatigue adversely affected executive functions, especially attentional control, whereas alcohol had larger effects on the psychomotor functions measured by the PVT and UTT (reaction times and manual tracking).

TABLE 2.1 Multivariate and Univariate Tests of Significance for the Effects of Total Sleep Deprivation at 9 a.m.

Dependent Variable | Significance | Effect Size
a) Performance quantity
PVT_RT | F(1, 44) = 76.05, p < 0.001 | ηp² = 0.63
UTT_DV | F(1, 44) = 57.65, p < 0.001 | ηp² = 0.57
SPO_C | F(1, 44) = 124.48, p < 0.001 | ηp² = 0.74
MCO_C | F(1, 44) = 112.99, p < 0.001 | ηp² = 0.72
PSP_C | F(1, 44) = 17.95, p < 0.001 | ηp² = 0.29
Multivariate test | F(5, 40) = 29.42, p < 0.001 | ηp² = 0.79
b) Performance quality
PVT_L | F(1, 45) = 38.50, p < 0.001 | ηp² = 0.46
UTT_LC | F(1, 45) = 12.52, p = 0.001 | ηp² = 0.22
SPO_E | F(1, 45) = 13.31, p = 0.001 | ηp² = 0.23
MCO_E | F(1, 45) = 30.14, p < 0.001 | ηp² = 0.40
PSP_E | F(1, 45) = 10.40, p = 0.002 | ηp² = 0.19
Multivariate test | F(5, 41) = 10.10, p < 0.001 | ηp² = 0.55

TABLE 2.2 Multivariate and Univariate Tests of Significance for the Effects of Alcohol Intoxication at 6 p.m.

Dependent Variable | Significance | Effect Size
a) Performance quantity
PVT_RT | F(1, 45) = 26.16, p < 0.001 | ηp² = 0.37
UTT_DV | F(1, 45) = 21.24, p < 0.001 | ηp² = 0.32
SPO_C | F(1, 45) = 17.50, p < 0.001 | ηp² = 0.28
MCO_C | F(1, 45) = 11.32, p = 0.002 | ηp² = 0.20
PSP_C | F(1, 45) = 9.30, p = 0.004 | ηp² = 0.17
Multivariate test | F(5, 41) = 5.86, p < 0.001 | ηp² = 0.42
b) Performance quality
PVT_L | F(1, 45) = 3.55, p > 0.05, n.s. | ηp² = 0.07
UTT_LC | F(1, 45) = 0.70, p > 0.05, n.s. | ηp² = 0.02
SPO_E | F(1, 45) = 6.25, p = 0.016 | ηp² = 0.12
MCO_E | F(1, 45) = 3.55, p > 0.05, n.s. | ηp² = 0.07
PSP_E | F(1, 45) = 9.88, p = 0.003 | ηp² = 0.18
Multivariate test | F(5, 41) = 4.43, p = 0.003 | ηp² = 0.35
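The effect sizes in Tables 2.1 and 2.2 follow from the F statistics through the standard identity ηp² = F·df_effect / (F·df_effect + df_error). The Python sketch below (function name hypothetical) applies it. Assuming that the univariate tests of the two-level time factor have df = (1, 45) for n = 46 and that the multivariate tests have df = (5, 41), it reproduces the tabled values for alcohol intoxication, which makes it a quick consistency check on reported results.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return f_value * df_effect / (f_value * df_effect + df_error)


# Table 2.2 (alcohol intoxication at 6 p.m.), with the df assumed above.
print(round(partial_eta_squared(26.16, 1, 45), 2))  # PVT_RT: 0.37
print(round(partial_eta_squared(9.88, 1, 45), 2))   # PSP_E: 0.18
print(round(partial_eta_squared(5.86, 5, 41), 2))   # multivariate, quantity: 0.42
```

Because the identity depends on the degrees of freedom, recomputing ηp² is also a simple way to detect misreported df or effect sizes that have slipped out of alignment with their F values.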

Because fatigue and alcohol affect performance differently, it seems simplistic to state that the effects of sleep deprivation correspond to an “x-level” of alcohol intoxication; the equivalence depends on the performance area in question. The stronger the effect of alcohol on a given function, the longer the time awake needed to reach an equivalent performance decrement. For example, according to our results shown in Figures 2.3 and 2.4, approximately 17 hours of wakefulness corresponded to performance at a BAC-level of 0.07% for perceptual speed, mental concentration, and spatial orientation, whereas 21 or 22 hours were required for sustained vigilance and manual tracking.

The final question to be examined is whether the participants themselves realized that their performance was impaired when fatigued or under the influence of alcohol. Using correlation analyses, we examined how closely self-assessments made before and after each test corresponded to actual test performance under both interventions and under baseline conditions. For the PVT, no self-assessments were available. To avoid confusion about the direction of the relations, the UTT deviation scores were converted into accuracy scores (AC). Thus, for the quantitative performance measures, higher scores indicate better performance; for the qualitative performance measures, larger scores indicate greater impairment. All correlations for the four tests are shown in Tables 2.3 and 2.4. Subjective fatigue ratings are not reported here because, for both interventions (TSD and ALC), their correlations with performance were negligible and mostly not significant.

TABLE 2.3 Correlations between Self-Assessment of Performance and Actual Test Performance under Baseline Conditions and after Total Sleep Deprivation (TSD 2, 9 a.m.)

Self-Assessment | Baseline 9 a.m.: Pre | Post | rmean | Total Sleep Deprivation 9 a.m.: Pre | Post | rmean
UTT_AC | 0.14 | 0.35a | 0.25 | 0.26 | 0.55b | 0.41
UTT_LC | −0.25 | −0.28 | −0.27 | −0.19 | −0.55b | −0.38
SPO_C | 0.15 | 0.20 | 0.17 | 0.44b | 0.48b | 0.46
SPO_E | −0.38b | −0.37a | −0.37 | −0.37b | −0.25 | −0.31
MCO_C | 0.04 | 0.08 | 0.06 | 0.43b | 0.44b | 0.43
MCO_E | −0.27 | −0.16 | −0.21 | −0.44b | −0.28 | −0.36
PSP_C | 0.35a | 0.46b | 0.40 | 0.20 | 0.34a | 0.27
PSP_E | −0.36a | −0.35a | −0.36 | −0.35a | −0.27 | −0.31
rmean Correct | 0.17 | 0.28 | | 0.33 | 0.45 |
rmean Errors | −0.32 | −0.29 | | −0.34 | −0.34 |
Note: a p < 0.05, b p < 0.01 (two-tailed). Mean correlations > 0.30 are in bold.

TABLE 2.4 Correlations between Self-Assessment of Performance and Actual Test Performance under Baseline Conditions and after Alcohol Intoxication (PSA 1, 6 p.m.)

Self-Assessment | Baseline 6 p.m.: Pre | Post | rmean | Alcohol Intoxication 6 p.m.: Pre | Post | rmean
UTT_AC | 0.29a | 0.21 | 0.25 | 0.18 | 0.44b | 0.31
UTT_LC | −0.20 | −0.34a | −0.27 | 0.04 | −0.23 | −0.10
SPO_C | 0.09 | 0.05 | 0.07 | 0.07 | 0.15 | 0.11
SPO_E | −0.29a | −0.28 | −0.29 | 0.17 | 0.07 | 0.12
MCO_C | 0.09 | 0.07 | 0.08 | 0.08 | 0.19 | 0.14
MCO_E | −0.32a | −0.23 | −0.27 | −0.08 | 0.05 | −0.02
PSP_C | 0.17 | 0.36a | 0.27 | 0.16 | 0.38b | 0.27
PSP_E | −0.27 | −0.37a | −0.32 | −0.31a | −0.47b | −0.39
rmean Correct | 0.16 | 0.17 | | 0.12 | 0.29 |
rmean Errors | −0.27 | −0.30 | | −0.05 | −0.15 |
Note: a p < 0.05, b p < 0.01 (two-tailed). Mean correlations > 0.30 are in bold.

The self-assessments showed a systematic difference. Whereas only a few coefficients were significant under baseline conditions and under alcohol intoxication, most correlations increased substantially after sleep deprivation. This holds not only for the retrospective assessments (whose correlations are generally larger) but also for the prospective self-assessments made before the corresponding test. Participants thus seemed to be aware of their impaired performance capabilities when fatigued, but not when under the influence of alcohol. These findings are based only on the laboratory setting of our study. However, if they could be replicated in real-life settings, they would have important implications for the self-control and decision-making of human operators. If confidence in one's own performance is unrealistic under the influence of alcohol, the risk of a proposed course of action cannot be assessed adequately. For fatigue, this issue seems less critical, at least as long as no outside pressure affects the decision process.
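Tables 2.3 and 2.4 report mean correlations (rmean) without stating how the coefficients were averaged. A common choice is to average in Fisher's z space (z = atanh r) and transform back; the Python sketch below implements that, alongside the plain arithmetic mean for comparison (both function names are hypothetical). For the UTT_LC row after sleep deprivation (−0.19 and −0.55), the Fisher-z mean rounds to −0.38, the value shown in Table 2.3, whereas the arithmetic mean is −0.37. This fits Fisher averaging but, given rounding in the tabled coefficients, does not prove it was the method used.

```python
import math
from typing import Sequence


def mean_r_fisher(correlations: Sequence[float]) -> float:
    """Average Pearson correlations via Fisher's z transformation."""
    if not correlations:
        raise ValueError("need at least one correlation")
    if any(not -1.0 < r < 1.0 for r in correlations):
        raise ValueError("correlations must lie strictly between -1 and 1")
    mean_z = sum(math.atanh(r) for r in correlations) / len(correlations)
    return math.tanh(mean_z)


def mean_r_arithmetic(correlations: Sequence[float]) -> float:
    """Plain arithmetic mean of the coefficients, shown for comparison."""
    if not correlations:
        raise ValueError("need at least one correlation")
    return sum(correlations) / len(correlations)


# Pre- and post-test correlations for UTT_LC after sleep deprivation (Table 2.3).
utt_lc_tsd = [-0.19, -0.55]
print(round(mean_r_fisher(utt_lc_tsd), 2))      # -0.38
print(round(mean_r_arithmetic(utt_lc_tsd), 2))  # -0.37
```

Fisher's transformation is preferred for averaging because the sampling distribution of r is skewed near ±1, while z is approximately normal; the difference between the two methods grows as the coefficients being averaged diverge.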

Discussion
The results of our laboratory study confirm that fatigue considerably disrupts cognitive and psychomotor performance in all administered tests. The tests of spatial orientation, mental concentration, and perceptual speed are validated aptitude tests used in pilot and air traffic controller selection (e.g., Goeters, 1998); performance deterioration in these tests is therefore of particular relevance to the aviation domain. The largest fatigue effects were found for the PVT and the MCO. As a test of sustained attention, the PVT has proven its sensitivity to sleep deprivation in many previous studies (Lim & Dinges, 2010). In our study, reaction times increased by about 30 ms (15% slower than under baseline conditions), and the number of attention lapses jumped from one to nine per 10 minutes after 26 hours without sleep. The MCO is a self-paced test that requires mental control to focus attention on relevant information while inhibiting distracting stimuli. After sleep deprivation, average MCO performance dropped by nearly 40%, while the number of commission errors tripled. To explain the effects of sleep deprivation on cognitive performance, Lim and Dinges (2008, 2010) referred to the vigilance hypothesis, which proposes that the cognitive function primarily affected is sustained attention and that a lower level of vigilance or situation awareness could account for most of the performance changes under fatigue. However, the large effects we found for the MCO do not fit this picture, because the MCO clearly measures executive attention. It therefore seems that two separate aspects of attention, the alertness level and attentional control, deteriorated most when participants were fatigued. At a BAC-level of 0.07%, performance also declined in all five tests. Most affected were PVT reaction times and UTT manual tracking performance.
Because PVT reaction times are also related to psychomotor skills (Hoermann, Uken, & Voss, 2012), we conclude that, compared with fatigue, alcohol has a relatively larger impact on psychomotor performance.

Another difference is that, under alcohol, the qualitative measures (number of lapses, control losses, commission errors) deteriorated significantly only for the SPO and the PSP. All effect sizes for sleep deprivation (Table 2.1) were clearly larger than those for alcohol intoxication (Table 2.2). More specifically, test performance for the SPO, MCO, and PSP after 17 hours awake corresponded to the average performance at a BAC-level of 0.07%. For the PVT and UTT, wakefulness had to be sustained for 21 or 22 hours to impair performance to a similar extent as alcohol intoxication at a BAC of 0.07%. Our findings do not indicate the speed-accuracy trade-off reported by Tiplady et al. (2001). Had participants under the influence of alcohol increased performance quantity at the expense of errors, a different pattern of effects would have emerged in Table 2.2. Instead, for both interventions, sleep deprivation and alcohol, the decrements in response speed (PVT), control precision (UTT), and number of correct results (MCO, SPO, PSP) were more salient and, according to the calculated effect sizes, larger than the increases in errors. There were no indications in our data that participants speeded up their performance at the cost of errors when fatigued or under alcohol influence. A remarkable difference was found in the accuracy of the self-assessments of test performance. When asked to assess their expected individual performance before a test, participants could do so with a reasonable degree of accuracy when fatigued, but not when under the influence of alcohol. We conclude that fatigue does not seem to distort a realistic self-image of performance capabilities. This is consistent with a study on self-monitoring of performance during simulated night shifts by Dorrian, Lamond, Holmes, Burgess, Roach, Fletcher, and Dawson (2003).
Even after 26 hours of wakefulness, most correlations between expected and achieved test results were significant, whereas no such relationship was found for the alcohol condition. In real-life settings, such unrealistic views of one's own performance could lead to severely biased judgment and risk management. In an earlier flight simulator study, Morrow, Leirer, Yesavage, and Tinklenberg (1991) found that at least younger pilots (mean age = 25.3 years) had a biased view of their flying ability when under the influence of different BAC-levels. For the subjective fatigue ratings, we did not find significant correlations with the performance scores under the different conditions. However, the intraindividual development of the mean FAT-scores reflects a pattern compatible with the expected homeostatic and circadian influences. Between 3 a.m. and 6 a.m. during the total sleep deprivation condition, FAT-scores crossed the line from mild to moderate/severe fatigue, which corresponds to 20 to 23 hours awake. However, especially for the cognitive tests, significant performance decrements already occurred after 16 hours of sustained wakefulness. It therefore seems advisable for operators on duty not to rely only on their subjective feeling of fatigue before taking precautionary action, but also to treat the actual time awake as a serious decision gate.

Conclusions
The number of aviation accidents in which fatigue was involved in one way or another is rising (Marcus & Rosekind, 2015). Increasing levels of automation at the future workplaces of pilots and air traffic controllers, together with the present lack of qualified staff, will most probably lead to longer duty times with more monotonous periods that provide little stimulation for these highly skilled employees. Resilience against fatigue cannot be trained or gained through experience (Caldwell & Caldwell, 2016). As a safety risk in aviation, fatigue has received a high level of public attention for decades, but the regulatory countermeasures implemented so far have yet to prove their effectiveness. In our comparative study, we examined only the effects of acute fatigue after short-term sleep deprivation on performance; we did not consider the additional aggravating influences of individual differences or accumulated fatigue. Nevertheless, the acute effects, especially on attention, after 16 or 17 hours of wakefulness were equivalent to the effects of alcohol intoxication far above the legal limits. With the implementation of Fatigue Risk Management Systems (FRMS) (International Civil Aviation Organization, 2016), aviation organizations have a powerful tool to identify safety gaps in their daily operations. Provided that the company culture supports employees in reporting their observations and encountered fatigue issues to management, FRMS can become an influential countermeasure that prevents operators' fatigue from reaching critical thresholds. However, FRMS are not a "silver bullet." It remains the responsibility of individuals to use their off-duty time to recover from work and stress and to realistically assess their actual level of fitness when reporting for duty.
For example, non-negotiable limits for subjective fatigue, as well as for alcohol and drugs, included as specific items on a pilot's "personal preflight checklist" can be a powerful strategy to prevent these threats from leading to errors and/or undesired states during operation.
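The personal checklist idea can be made concrete as a simple go/no-go rule. The Python sketch below is purely illustrative: the class, the function names, and every default limit are hypothetical personal choices, not regulatory values. The 16-hour default echoes the finding above that significant decrements appeared after about 16 hours awake, the fatigue limit uses the upper bound of the "sufficiently alert" Samn-Perelli class on the 10 to 30 scale, and the zero alcohol limit is an assumed personal policy.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class PersonalLimits:
    # Hypothetical personal limits; not taken from any regulation.
    max_hours_awake: float = 16.0  # projected time awake at end of duty
    max_fat_score: int = 18        # upper bound of Samn-Perelli class IV
    max_bac_percent: float = 0.0   # assumed zero-tolerance personal policy


def hours_awake(wake_time: datetime, at_time: datetime) -> float:
    """Hours elapsed between waking up and a given time."""
    return (at_time - wake_time).total_seconds() / 3600.0


def preflight_findings(
    projected_hours_awake: float,
    fat_score: int,
    bac_percent: float,
    impairing_substances: bool,
    limits: PersonalLimits = PersonalLimits(),
) -> List[str]:
    """Return the list of exceeded limits; an empty list means 'go'."""
    findings = []
    if projected_hours_awake > limits.max_hours_awake:
        findings.append(
            f"time awake {projected_hours_awake:.1f} h exceeds {limits.max_hours_awake:.1f} h"
        )
    if fat_score > limits.max_fat_score:
        findings.append(f"fatigue score {fat_score} exceeds {limits.max_fat_score}")
    if bac_percent > limits.max_bac_percent:
        findings.append(f"BAC {bac_percent:.3f}% exceeds {limits.max_bac_percent:.3f}%")
    if impairing_substances:
        findings.append("impairing drugs or medication taken")
    return findings


# Woke at 7 a.m.; a duty ending at 3 a.m. the next day means 20 hours awake.
wake = datetime(2019, 1, 1, 7, 0)
duty_end = datetime(2019, 1, 2, 3, 0)
print(preflight_findings(hours_awake(wake, duty_end), fat_score=17,
                         bac_percent=0.0, impairing_substances=False))
```

Checking the projected time awake at the end of duty, rather than at report time, reflects the point above that time awake should serve as a decision gate independent of how alert one feels.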

References

Åkerstedt, T., & Folkard, S. (1990). A model of human sleepiness. In J. A. Horne (Ed.), Sleep ’90. Bochum, Germany: Pontenagel Press.
Ansieu, D., Marquié, J. C., Tucker, P., & Folkard, S. (2013). Longitudinal study of the effects of shift work on health: Analysis of VISAT (ageing, health and work) data. Report submitted to the IOSH Research Committee. Leicester, UK: Institute of Occupational Safety and Health.
Billings, C. E., Wick, R. L., Gerke, R. J., & Chase, R. C. (1973). Effects of ethyl alcohol on pilot performance. Aerospace Medicine, 44, 379–382.
Borbély, A. A. (1982). A two process model of sleep regulation. Human Neurobiology, 1, 195–204.
Caldwell, J. A., & Caldwell, J. L. (2016). Fatigue in aviation: A guide to staying awake at the stick (2nd ed.). New York: Routledge.
Caldwell, J. A., Caldwell, J. L., Brown, D. L., & Smith, J. K. (2004). The effects of 37 hours of continuous wakefulness on the physiological arousal, cognitive performance, self-reported mood, and simulator flight performance of F-117A pilots. Military Psychology, 16, 163–181.
Canfield, D. V., Dubowski, K. M., Chaturvedi, A. K., & Whinnery, J. E. (2012). Drugs and alcohol found in civil aviation pilot fatalities from 2004 to 2008. Aviation, Space, and Environmental Medicine, 83, 764–770.
Cook, C. C. H. (1997). Alcohol and aviation. Addiction, 92, 539–555.
Damkot, D. K., & Osga, G. A. (1978). Survey of pilot’s attitudes and opinions about drinking and flying. Aviation, Space, and Environmental Medicine, 49, 390–394.
Davenport, M., & Harris, D. (1992). The effect of low blood alcohol levels on pilot performance in a series of simulated approach and landing trials. The International Journal of Aviation Psychology, 2, 271–280.
Dawson, D., & Reid, K. (1997). Fatigue, alcohol and performance impairment. Nature, 388, 235.
Dinges, D. F., Graeber, R. C., Rosekind, M. R., Samel, A., & Wegmann, H. (1996). Principles and guidelines for duty and rest scheduling in commercial aviation. NASA Technical Memorandum No. 110404. Moffett Field, CA: NASA Ames Research Center.
Dinges, D. F., & Powell, J. W. (1985). Microcomputer analysis of performance on a portable, simple visual RT task during sustained operations. Behavior Research Methods, Instruments & Computers, 17, 652–655.
Dorrian, J., Lamond, N., Holmes, A. L., Burgess, H. J., Roach, G. D., Fletcher, A., & Dawson, D. (2003). The ability to self-monitor performance during a week of simulated night shifts. Sleep, 26, 871–877.
Dry, M. J., Birns, N. R., Nettelbeck, T., Farquharson, A. L., & White, J. M. (2012). Dose-related effects of alcohol on cognitive functioning. PLoS ONE, 7(11): e50977. doi:10.1371/journal.pone.0050977.
Duffy, J. F., Zitting, K.-M., & Czeisler, C. A. (2015). The case for addressing operator fatigue. Reviews of Human Factors and Ergonomics, 10, 29–78.
Elmenhorst, E.-M., Hoermann, H.-J., Oeltze, K., Pennig, S., Rolny, V., Verjvoda, M., Schießl, C. et al. (2013). Validierung eines fitness-for-duty tests zur steigerung der sicherheit in luftfahrt und verkehr [Validation of a fitness-for-duty test to increase safety in aviation and traffic]. Research Report DLR-Project “FIT”. Germany: DLR.
European Aviation Safety Agency. (2016). Aircrew medical fitness. Implementation of the recommendations made by the EASA-led Task Force on the accident of the . EASA Opinion No 14/2016. Retrieved on December 1st, 2018 from https://www.easa.europa.eu/document-library/opinions/opinion-142016.
European Cockpit Association. (2012). Pilot fatigue barometer. Retrieved on December 1st, 2018 from https://www.eurocockpit.be/sites/default/files/eca_barometer_on_pilot_fatigue_12_1107_f.pdf.

European Commission. (2014). Commission regulation (EU) No 83/2014 of 29 January 2014. Official Journal of the European Union, L 28, 17–29.
Federal Aviation Administration (FAA). (2012). 14 CFR Parts 117-121. Flight crew member duty and rest requirements. Federal Register, 77(2), 330–403.
Ferrara, M., & De Gennaro, L. (2000). The sleep inertia phenomenon during the sleep-wake transition: Theoretical operational issues. Aviation, Space, and Environmental Medicine, 71, 843–848.
Gawron, V. J., Kaminski, M. A., Serber, M. L., Payton, G. M., Hadjimichael, M., Jarrott, W. M., Neal, T. A. et al. (2011). Human performance and fatigue research for controllers—Revised. (MITRE Technical Report MTR100316R1). McLean, VA: The MITRE Corporation.
German Federal Bureau of Aircraft Accident Investigation (BFU). (2012, May). Accidents and severe incidents during operation of civil aircraft (in German). Braunschweig, Germany: BFU Bulletin 2012–2005.
Goerke, P., & Soll, H. (2014). Standby-dienst: Rufbereitschaft in der Luftfahrt [Standby duty: On-call duty in aviation]. In S. Fietze, M. Keller, N. Friedrich, & J. Dettmers (Eds.), Rufbereitschaft—Wenn die Arbeit in der Freizeit ruft [On-call duty: When work calls during leisure time]. Munich, Germany: Rainer Hampp-Verlag.
Goeters, K.-M. (Ed.) (1998). Aviation psychology: A science and a profession. Aldershot, UK: Ashgate.
Gokhale, B. N. (2010). Report on accident to Air India Express Boeing 737-800 aircraft VT-AXV on 22nd May 2010 at Mangalore. New Delhi, India: Government of India, Ministry of Civil Aviation.
Goode, J. H. (2003). Are pilots at risk of accidents due to fatigue? Journal of Safety Research, 34(3), 309–313.
Graeber, R. C. (1988). Aircrew fatigue and circadian rhythmicity. In E. L. Wiener & D. C. Nagel (Eds.), Human factors in aviation (pp. 305–344). San Diego, CA: Academic Press.
Grech, M. R., Neal, A., Yeo, G. B., Humphreys, M., & Smith, S. (2009). An examination of the relationship between workload and fatigue within and across consecutive days of work: Is the relationship static or dynamic? Journal of Occupational Health Psychology, 14(3), 231–242.
Hawkins, F. H. (1987). Human factors in flight. Aldershot, England: Gower Technical Press.
Hindmarch, I., Kerr, J. S., & Sherwood, N. (1991). The effects of alcohol and other drugs on psychomotor performance and cognitive function. Alcohol & Alcoholism, 26, 71–79.
Hoermann, H.-J., Gontar, P., & Haslbeck, A. (2015, May). Effects of workload on measures of sustained attention during a flight simulator night mission. Paper presented at the 18th International Symposium on Aviation Psychology. Dayton, OH: Wright State University.
Hoermann, H.-J., Mischke, M., Elmenhorst, E.-M., & Benderoth, S. (2016, September). Differential effects of sleep deprivation on cognitive performance. Proceedings of the 32nd EAAP conference (pp. 439–447), Cascais, Portugal.
Hoermann, H.-J., Uken, T., & Voss, F.-R. (2012, September). The psychomotor vigilance tests—A measure of trait or state? Proceedings of the 30th EAAP conference (pp. 13–17). Villasimius, Italy.

Honn, K., Satterfield, B., McCauley, P., Caldwell, J. L., & Van Dongen, H. P. A. (2016). Fatiguing effect of multiple take-offs and landings in regional airline operations. Accident Analysis and Prevention, 86, 199–208.
Horne, J. (2015, December). Drugs and alcohol testing: Identifying real risk and effectively reducing that risk. Paper presented at the EASA aircrew medical fitness workshop. Cologne, Germany.
International Civil Aviation Organization. (2016). Manual for oversight of fatigue management approaches. ICAO Doc 9966 (2nd ed.). Montreal, Canada: International Civil Aviation Organization.
Jackson, C. A., & Earl, L. (2006). Prevalence of fatigue among commercial pilots. Occupational Medicine, 56, 263–268.
Li, G., Baker, S. P., Qiang, Y., Rebok, G. W., & McCarthy, M. L. (2007). Alcohol violations and aviation accidents: Findings from the U.S. mandatory alcohol testing program. Aviation, Space, and Environmental Medicine, 78, 510–513.
Lim, J., & Dinges, D. F. (2008). Sleep deprivation and vigilant attention. Annals of the New York Academy of Sciences, 1129, 305–322.
Lim, J., & Dinges, D. F. (2010). A meta-analysis of the impact of short-term sleep deprivation on cognitive variables. Psychological Bulletin, 136, 375–389.
Marcus, J. H., & Rosekind, M. R. (2015). Fatigue in aviation: NTSB findings and safety recommendations. Aerospace Medicine and Human Performance, 86, 174.
Marcus, J. H., & Rosekind, M. R. (2017). Fatigue in transportation: NTSB investigations and safety recommendations. Injury Prevention, 23, 232–238.
Mitchell, S. J., & Lillywhite, M. (2013). Medical cause fatal commercial air transport accidents: Analysis of UK CAA worldwide accident database 1980–2011 (Abstract). Aviation, Space, and Environmental Medicine, 84, 346.
Modell, J. G., & Mountz, J. M. (1990). Drinking and flying: The problem of alcohol use by pilots. New England Journal of Medicine, 323, 455–461.
Moebus Aviation. (2008). Scientific and medical evaluation of flight time limitations. (Final Report, TS.EASA.2007.OP.08). Zurich, Switzerland: Moebus Aviation.
Morrow, D., Leirer, V., Yesavage, J., & Tinklenberg, J. (1991). Alcohol, age, and piloting: Judgment, mood, and actual performance. International Journal of the Addictions, 26, 669–683.
Mundt, J. C., & Ross, L. E. (1993). Methodological issues for evaluation of alcohol and other drug effects: Examples from flight-simulator performance. Behavior Research Methods, Instruments, & Computers, 25, 360–365.
National Research Council. (2011). The effects of commuting on pilot fatigue. Committee on the Effects of Commuting on Pilot Fatigue, Board on Human-Systems Integration, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
National Transportation Safety Board. (2009, August). Safety recommendation A-09-61 through -66. Washington, DC.
National Transportation Safety Board. (2010, September). Aviation incident final report. Incident number OPS10IA001. NTSB.
National Transportation Safety Board. (2011, March). NTSB investigating air traffic controller service interruption at Washington’s National Airport. NTSB News Release on incident number OPS11IA401. Washington, DC: NTSB.
Nealley, M. A., & Gawron, V. J. (2015). The effect of fatigue on air-traffic controllers. The International Journal of Aviation Psychology, 24, 14–47.

Niederl, T. (2007). Untersuchungen zu kumulativen psychischen und physiologischen Effekten des fliegenden Personals auf der Kurzstrecke [Investigations of cumulative psychological and physiological effects in short-haul flight crews]. (Research Report FB 2007-17). Cologne, Germany: German Aerospace Center.
Orasanu, J., Parke, B., Kraft, N., Tada, Y., Hobbs, A., Anderson, B., McDonnel, L., & Dilchinos, V. (2012, December). Evaluating the effectiveness of schedule changes for air traffic service (ATS) providers: Controller alertness and fatigue monitoring study. Technical report DOT/FAA/HFD-13/001. Washington, DC: U.S. Department of Transportation, Federal Aviation Administration.
Petrilli, R. M., Thomas, M. J. W., Dawson, D., & Roach, G. D. (2006). The decision-making of commercial airline crews following an international pattern. Paper presented at the Seventh International AAvPA Symposium, Manly, Australia.
Powell, D. M. C., Spencer, M. B., Holland, D., Broadbent, E., & Petrie, K. J. (2007). Pilot fatigue in short-haul operations: Effects of number of sectors, duty length, and time of day. Aviation, Space, and Environmental Medicine, 78, 698–701.
Previc, F. H., Lopez, N., Ercoline, W. R., Daluz, C. M., Workman, A. J., Evans, R. H., & Dillon, N. A. (2009). The effects of sleep deprivation on flight performance, instrument scanning, and physiological arousal in pilots. The International Journal of Aviation Psychology, 19, 326–346.
Reader, T. W., Parand, A., & Kirwan, B. (2016). European pilots’ perceptions of safety culture in European Aviation. Project report FSS_P5_LSE_D5.4 of Future Sky Safety. Retrieved from https://www.futuresky-safety.eu/wp-content/uploads/2016/12/FSS_P5_LSE_D5.4_v2.0.pdf.
Rosekind, M. K. (2014, March). International flight ops: An NTSB perspective on fatigue challenges. Paper presented at the NBAA international operators conference, Tampa, FL.
Rosekind, M. K., Co, E. L., Gregory, K. B., & Miller, D. L. (2000). Crew factors in flight operations XIII: A survey of fatigue factors in corporate/executive aviation operations. NASA/TM-2000-209610. Moffett Field, CA: National Aeronautics and Space Administration, Ames Research Centre.
Rosekind, M. R., Gregory, K. B., Miller, D. L., Co, E. L., Lebacqz, J. V., & Brenner, M. (2001). Examining fatigue factors in accident investigations: Analysis of Guantanamo Bay aviation accident. Washington, DC: NTSB. 94–04 (pp. 133–141). Washington, DC: National Transportation Safety Board.
Ross, L. E., & Mundt, J. C. (1996). Methodological issues in research on the effects of alcohol on pilot performance. The International Journal of Aviation Psychology, 6, 95–106.
Sallinen, M., & Hublin, C. (2015). Fatigue-inducing factors in transportation operators. Reviews of Human Factors and Ergonomics, 10, 138–173.
Samel, A., Wegmann, H. M., & Vejvoda, M. (1995). Jet-lag and sleepiness in aircrew. Journal of Sleep Research, 4, 30–36.
Samel, A., Wegmann, H. M., Vejvoda, M., Drescher, J., Gundel, A., Manzey, D., & Wenzel, J. (1997). Two-crew operations: Stress and fatigue during long-haul night operations. Aviation, Space, and Environmental Medicine, 68, 679–687.
Samn, S. W., & Perelli, L. P. (1982). Estimating aircrew fatigue: A technique with applications to airlift operations. Technical Report SAM-TR-82-21. Brooks AFB, TX: USAF School of Medicine.
Santucci, G., Boer, L., Farmer, E., Goeters, K. M., Grisset, J., Schwartz, E., Wetherall, A., Wilson, G., & Yates, R. (1989). Human performance assessment methods. AGARDograph No. 309. Neuilly-sur-Seine, France: NATO-AGARD.

Simons, R. (2015, December). Alcohol, drugs, medication testing in class 1 pilots. Paper presented at the EASA aircrew medical fitness workshop. Cologne, Germany.
Spencer, M. B., Robertson, K. A., & Folkard, S. (2006). The development of a fatigue risk index for shiftworkers. Research report 446 for the Health and Safety Executive. Norwich, UK: HSE Books.
Sumwalt, R. L. (2017). Investigating fatigue in transportation accidents: A board member’s perspective. Presentation at the Investigating human fatigue factors course, Ashburn, VA, 02/12/2017. Retrieved from https://www.ntsb.gov/news/speeches/RSumwalt/Documents/sumwalt-20171012.pdf.
Tiplady, B., Drummond, G. B., Cameron, E., Gray, E., Hendry, J., Sinclair, W., & Wright, P. (2001). Ethanol, errors, and the speed–accuracy trade-off. Pharmacology, Biochemistry, and Behavior, 69, 635–641.
Torsvall, L., & Åkerstedt, T. (1988). Disturbed sleep while being on-call: An EEG study of ships’ engineers. Sleep, 11, 35–38.
Transportation Safety Board of Canada. (2012). Aviation investigation report: Pitch excursion, Air Canada Boeing 767-333, C-GHLQ, North Atlantic Ocean, 55°00’N 029°00’W, January 14, 2011. (TSB Publication No. A11f0012). Ottawa, Canada.
Tritschler, K., & Bond, S. (2010). The influence of workload factors on flight crew fatigue. 63rd Milan, Italy: Annual IASS, Information at a Glance.
Van Dongen, H. P. A., Baynard, M. D., Maislin, G., & Dinges, D. F. (2004). Systematic interindividual differences in neurobehavioral impairment from sleep loss: Evidence of trait-like differential vulnerability. Sleep, 27, 423–433.
Van Dongen, H. P. A., Maislin, G., Mullington, J. M., & Dinges, D. F. (2003). The cumulative cost of additional wakefulness: Dose-response effects on neurobehavioral functions and sleep physiology from chronic sleep restriction and total sleep deprivation. Sleep, 26, 117–126.
Wickens, C. D., Hutchins, S. D., Laux, L., & Sebok, A. (2015). The impact of sleep disruption on complex cognitive tasks: A meta-analysis. Human Factors, 57, 930–946.
Williamson, A. M., & Feyer, A. M. (2000). Moderate sleep deprivation produces impairments in cognitive and motor performance equivalent to legally prescribed levels of alcohol intoxication. Occupational and Environmental Medicine, 57, 649–655.

3 Avionics Touch Screen in Turbulence: Simulator Design and Selected Human–Machine Interface Metrics

Sylvain Hourlier, Sandra Guérard, and Xavier Servantie

CONTENTS
Preparing for Cockpit Touch Screens
Characterizing Aeronautical Turbulence
Origin of Turbulence
Intensity of Turbulence
Developing Turbulence Simulation Capability
Step_1 Evaluation
Step_1 Description
Step_1 Results
Discussion on Step_1 Simulated Turbulence Evaluation
Step_2 Evaluation
Step_2 Description
Step_2 Results
Discussion on Step_2 Simulated Turbulence Evaluation
Characterizing HMI Requirements
Evaluation Results
Discussion of Performance Results for the Evaluated Touch Interactions
Discussion of Results Per Type of Basic Interaction
Conclusions
References

Today’s children will be our future cockpit users, and they will be extremely familiar with interactive surfaces. Since the advent of the iPhone, touch interaction has taken over the cell phone industry. Nowadays, children spontaneously try to interact with any screen they come across, as if it “obviously” had to be a touch screen. Moreover, the growth of touch technology for interaction is undisputed. DisplaySearch, a market analysis firm, forecast the market to grow to over $16 billion by 2016 and $31.9 billion by 2018. This growth is being driven by increased demand from applications such as iPads and other tablet PCs (personal computers), smart phones, and emerging notebook PC designs (Hsieh, 2010). More recently, another analyst confirmed the trend, reporting that the touch screen market grew from $1.5 billion in annual revenues in 2008 to over $6 billion in 2011 (Blanco, 2012). Touch screens are expected to become unavoidable as an interface in all Command and Control situations, including the aircraft cockpit.

Preparing for Cockpit Touch Screens

Facing such an inevitable trend, any HMI (Human–Machine Interface) manufacturer should look into the future of touch technology in its field. The ODICIS (One DIsplay for a Cockpit Interactive Solution) concept demonstrator was presented at the Le Bourget Air Show in 2011. It was offered as a long-term (2035) prospective look at the future of cockpit design and featured a large, single, seamless, freeform touch screen to accommodate all of the crew’s interactions. Since then, the concept has matured (Hourlier, 2015) into the Avionics 2020 full touch screen cockpit (Figure 3.1), a concept slated for development by 2020. It includes multiple seamless touch screens (contiguous screens without margins) in an integrated approach to meet pilots’ HMI demands. Analyzing basic interactions can be costly and tedious, especially when there are no established reference simulation devices to start with. We had to develop a specific means of simulating turbulence, then collate repetitive basic interactions that had to be analyzed to verify their efficiency. Yet the effort proved very rewarding. It yielded design rules with real impact that enabled efficient touch screen interactions in turbulent conditions.

FIGURE 3.1 Avionics 2020 full touch screens cockpit. (Courtesy of Thales Avionics, La Défense, France.)

Yet implementing touch technology in an airliner cockpit means complying with Part 25 aircraft certification. The process is thorough and specifies that the design of systems should take into account aeronautical effects (such as turbulence) and the way they affect the efficiency of pilots’ interactions. In 2014, we consulted with European certification authorities on the certification process for touch screens. It then became obvious that an applicant must demonstrate that its technology (i.e., touch screens) enables the crew to perform the same tasks in all relevant aeronautical environments, with at least the same level of performance as obtained with previous (non-touch) systems. The pilots we interviewed at the time often judged the usability of cockpit touch screens doubtful. Among the specific aeronautical environments that had to be addressed, turbulence (up to “severe”) therefore appeared to be the issue that most urgently needed in-depth evaluation. Hence, a human factors evaluation and its associated Means of Compliance (MOC) were needed to prepare for certification. An MOC is a way of demonstrating that a technology complies with the certification requirements. Many means of compliance are possible: formal demonstration, simulation study, flight test verification, and so on. For us, only a realistic simulation would alleviate the usability risk of touch displays in turbulence and refine design recommendations for interactions with touch technology (HMI design and physical installation). Consequently, a human factors plan was designed to: (1) characterize levels of turbulence in a way that they could be reproduced in simulation, (2) develop a simulation capable of producing these levels and a number of representative turbulence “patterns,” and (3) select and validate the turbulence levels and patterns with end-users.

Characterizing Aeronautical Turbulence

Before choosing a simulation device, we had to establish the characteristics of what needed to be simulated. The characterization of aeronautical turbulence relies on its origin and its intensity. This section describes first the origin of turbulence, then its intensity (level).

Origin of Turbulence

Even with limited flight experience, one can relate to the term “turbulence” in flight. Usually the captain orders passengers to return to their seats and fasten their seat belts because of upcoming turbulence. Atmospheric turbulence is defined as “small-scale, irregular air motions characterized by winds that vary in speed and direction” (Aeronautical Turbulence, 2015). More precisely, turbulence can be defined as small-scale, short-term, random, and frequent changes to the velocity of air. In other words, conditions are said to be turbulent when there are rapid changes to the air’s speed, its direction of movement, or both. Finally, turbulence is distinct from vibration, because turbulence is chaotic by nature rather than cyclic. There are four fundamental causes of aeronautical turbulence (Wagtendonk, 2003):

• Thermal: When the earth’s surface is sufficiently warm, vertical currents of air form, creating turbulent conditions.
• Mechanical: This turbulence is caused by the interference of surface features (mountains, tall buildings, trees, etc.) with the horizontal flow of air. The amount of turbulence depends on wind speed and on the size and shape of the obstruction.
• Shear: When the direction or speed of the wind changes dramatically within a short horizontal or vertical distance, turbulence is created.
• Wake turbulence: An aircraft creates wake turbulence in its trail as it flies through the air.

Intensity of Turbulence

The Federal Aviation Administration (FAA) (2017) has categorized turbulence into four levels according to the aircraft’s reaction and what occupants may feel inside the aircraft. These four levels are described in Table 3.1.
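Besides the four intensity levels, the FAA criteria reproduced in Table 3.1 qualify how much of the time turbulence is present: occasional (less than 1/3 of the time), intermittent (1/3 to 2/3), and continuous (more than 2/3). The sketch below encodes that rule for illustration only. The function name is hypothetical. Counting exactly 1/3 or 2/3 as "intermittent" is an assumption, because the criteria do not spell out the boundaries.

```python
from fractions import Fraction


def duration_qualifier(turbulent_s: float, observed_s: float) -> str:
    """Map the share of time spent in turbulence to an FAA-style qualifier.

    Hypothetical helper: exactly 1/3 or 2/3 is treated as 'INTERMITTENT'
    (assumed boundary handling). Fraction avoids float rounding at the edges.
    """
    if observed_s <= 0:
        raise ValueError("observation period must be positive")
    if not 0 <= turbulent_s <= observed_s:
        raise ValueError("turbulent time must lie within the observation period")
    share = Fraction(turbulent_s) / Fraction(observed_s)
    if share < Fraction(1, 3):
        return "OCCASIONAL"
    if share <= Fraction(2, 3):
        return "INTERMITTENT"
    return "CONTINUOUS"


if __name__ == "__main__":
    # 45 s of turbulence over a 60 s observation window is 3/4 of the time.
    print(duration_qualifier(45, 60))
```

With the boundaries handled this way, the three qualifiers form a complete partition of the interval from 0 to 1. Every observation window therefore receives exactly one label.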

Developing Turbulence Simulation Capability

The objective of the FAA’s description is to recognize turbulence by its effects in order to enable reporting. Our objective, however, was to analyze the effect of various controlled turbulence levels (in a simulator) on touch screen usability. We therefore needed a more precise metric for the extent of displacement and acceleration that occur under different levels of turbulence. Figure 3.2 represents the relationship between displacement and acceleration. The dotted line characterizes the effects at 1 Hz: for a sinusoidal motion at 1 Hz, a displacement amplitude of about 25 cm corresponds to a peak acceleration of 9.81 m/s² (1 G). Using this relationship, the various levels of turbulence were approximated with regard to the maximum acceleration and the maximum displacement. We focused on the effects of vibration between 0.2 and 7 Hz, because this range has the predominant effect on the control of hand/arm movements (Berthoz, 1981). This frequency range helps to define the displacement range that a turbulence simulator must be capable of in order to produce the desired levels of turbulence. At one end, for a frequency of 0.2 Hz, one would need 12 meters of displacement to reach an acceleration of 2 G. At the other end, the higher the frequency, the flatter the line: at 7 Hz, one would reach 2 G with a displacement of only 1 cm. This preliminary analysis enabled us to focus our search for a suitable simulation platform. Such a platform should be able to reproduce large displacements at low frequencies. Vibration pods are therefore no solution, because they produce small displacements at high frequencies.

FIGURE 3.2 Level of turbulence as a function of acceleration and displacement.

TABLE 3.1 Description of Turbulence Levels and Consequences from an Occupant Point of View

Light
Aircraft consequences: Turbulence that momentarily causes slight, erratic changes in the aircraft’s altitude and/or attitude (pitch, roll, yaw). Should be reported as LIGHT TURBULENCE. Turbulence that causes slight, rapid, and somewhat rhythmic bumpiness without appreciable changes in altitude or attitude. Should be reported as LIGHT CHOP.
Occupants’ consequences: Occupants may feel a slight strain against seat belt or shoulder straps. Unsecured objects may be displaced slightly. Food service may be conducted and little or no difficulty is encountered in walking.

Moderate
Aircraft consequences: Turbulence that is similar to Light Turbulence but of greater intensity. Changes in altitude and/or attitude occur, but the aircraft remains in positive control at all times. It usually causes variations in indicated airspeed. Should be reported as MODERATE TURBULENCE. Turbulence that is similar to Light Chop but of greater intensity. It causes rapid bumps or jolts without appreciable changes in aircraft altitude or attitude. Should be reported as MODERATE CHOP.
Occupants’ consequences: Occupants feel definite strains against seat belt or shoulder straps. Unsecured objects are dislodged. Food service and walking are difficult.

Severe
Aircraft consequences: Turbulence that causes large, abrupt changes in altitude and/or attitude. It usually causes large variations in indicated airspeed. Aircraft may be momentarily out of control. Should be reported as SEVERE TURBULENCE.
Occupants’ consequences: Occupants are forced violently against seat belts or shoulder straps. Unsecured objects are tossed about. Food service and walking are impossible.

Extreme
Aircraft consequences: Turbulence in which the aircraft is violently tossed about and is practically impossible to control. It may cause structural damage. Should be reported as EXTREME TURBULENCE.
Occupants’ consequences: Occupants may be injured.

Duration: Occasional, less than 1/3 of the time; Intermittent, 1/3 to 2/3; Continuous, more than 2/3.

Note: Adapted from the FAA’s Turbulence Reporting Criteria Table.

The best evaluation would be flight tests in real turbulence. However, the cost would rapidly become unacceptable, and the environment is not always predictable, because finding areas of turbulence is not always feasible. Moreover, the turbulence encountered in flight cannot be controlled in terms of similarity and level. Our second-best solution was the full flight simulator (FFS). We tried one at our Thales Training and Simulation (TTS) facility in the north of Paris. However, several problems prevented it from being a usable turbulence testbed. First, FFSs should technically be able to produce the levels of turbulence we needed. Nevertheless, these platforms are not designed to do so and would need costly adaptation to fully reproduce them. Second, there is little layout adaptability, because the TTS simulators are dedicated to training on specific aircraft or helicopters and are not readily reconfigurable as research systems. Our solution was to use a Hexapod simulator. There are many types of Hexapods, and only the high-end ones are able to reproduce the levels of movement characteristic of aeronautical turbulence. We needed three axes of linear acceleration (X, Y, and Z) and three angular accelerations. All of them had to have a displacement capability consistent with what is needed to reproduce realistic in-flight turbulence. To complete our initial analysis, we started collecting in-flight data on a Socata TBM700 aircraft. We used an SBG IG-500N GPS-enhanced miniature Attitude and Heading Reference System (AHRS) that delivers attitude and position measurements. It was installed near the center of gravity of the aircraft to collect movement and acceleration data (3 angular + 3 linear) at 20 Hz.

Hexapod limits integration (technical evaluation). The in-flight recordings provided geo-referenced flight paths with 20 Hz sampling of accelerations (3 angular + 3 linear) on each given path. Because the Hexapod was fixed to the ground and cannot interpret geo-referenced movements, the data first had to be transformed.
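The displacement and acceleration figures quoted earlier in this section (about 25 cm for 1 G at 1 Hz, 12 m for 2 G at 0.2 Hz, and 1 cm for 2 G at 7 Hz) follow from simple harmonic motion. For x(t) = A·sin(2πft), the peak acceleration is A·(2πf)². The sketch below assumes purely sinusoidal motion, even though real turbulence is chaotic rather than cyclic. The function names are illustrative. The sketch also derives, under the same assumption, the frequency below which a ±40 cm stroke, rather than a ±1 G acceleration limit, becomes the binding constraint of a motion platform.

```python
import math

G = 9.81  # standard gravity in m/s^2


def peak_acceleration(amplitude_m: float, freq_hz: float) -> float:
    """Peak acceleration (m/s^2) of x(t) = A * sin(2*pi*f*t)."""
    omega = 2 * math.pi * freq_hz
    return amplitude_m * omega ** 2


def amplitude_for(accel_g: float, freq_hz: float) -> float:
    """Displacement amplitude (m) needed to reach accel_g (in G) at freq_hz."""
    omega = 2 * math.pi * freq_hz
    return accel_g * G / omega ** 2


def crossover_frequency(max_disp_m: float, max_accel_g: float) -> float:
    """Frequency (Hz) at which the stroke and acceleration limits coincide.

    Below it, a sinusoid hits the displacement limit before the acceleration limit.
    """
    return math.sqrt(max_accel_g * G / max_disp_m) / (2 * math.pi)


if __name__ == "__main__":
    for g_level, f in [(1.0, 1.0), (2.0, 0.2), (2.0, 7.0)]:
        print(f"{g_level:.0f} G at {f} Hz needs {amplitude_for(g_level, f):.3f} m")
    print(f"±0.40 m / ±1 G crossover: {crossover_frequency(0.40, 1.0):.2f} Hz")
```

Under this idealization, a ±40 cm, ±1 G platform can only reach its full 1 G below roughly 0.8 Hz by using less than its full stroke. This is one way to see why low-frequency, large-amplitude content drives the platform requirements.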
The mathematical transformation produced the linear accelerations along X, Y, and Z and the three angular accelerations around a stabilized geo-reference that would be the center of the Hexapod. These constituted the turbulence profiles shown in Figure 3.3. Since the sampling rate of the Hexapod was 100 Hz, the in-flight data were interpolated using cubic interpolation to fit the Hexapod’s requirements. The selected Hexapod was the property of the Ecole Nationale Supérieure des Arts et Métiers (ENSAM) ParisTech, located on the campus of Bordeaux-Talence (ENSAM Bordeaux-Talence).

FIGURE 3.3 Example of turbulence profile: (top) position displacement in mm, (bottom) rotation angle in degrees.

We verified that the profiles were within the capacity of the machine, with a maximum displacement of ±40 cm and a maximum acceleration of ±1 G. First, to take gravity into account, the in-flight measurements were filtered using Bessel filters. The rotations were filtered using a 4th-order low-pass filter with a 10 Hz cut-off, and the translations using an 8th-order low-pass filter with a 0.6 Hz cut-off. The order was intentionally increased to obtain trajectory profiles within the capacity of the Hexapod. The cut-off frequency was kept as low as possible (0.6 Hz) in order to restore the best spectrum of turbulence movement without overshooting the amplitude limitations of the Hexapod.

Profile adaptation (expert evaluation). Adjustments were made with the help of a flight test pilot. A temporal selection was then made to define a profile of suitable duration. The profile had to be long enough for the pilot to appreciate the turbulence level, but short enough to avoid causing muscular fatigue. The hardest aspect of turbulence to reproduce was the chaotic sequence of variations in all axes (3 angular and 3 linear). Our choice therefore focused on the most hectic, nonsymmetrical zone of the profile in order to preserve that chaotic sequence. The first defined profile was a root profile (referred to as P3) of 45 s. It was the closest to the raw in-flight profiles we had gathered. From this root profile, new profiles were generated to address all levels of turbulence. The P6 profile was defined by using the maximum range of displacements and by dilating or compressing parts of the sample P3. Accelerations were also added or reduced, mostly in the Z and Y directions, since front-back accelerations are rare in aircraft. The P3 and P6 profiles were run several times at ¼ displacement (¼D), then at ½ displacement (½D), then at full displacement (1D). For safety reasons, four other profiles (namely P1, P2, P4, and P5) were then generated and tested on an empty seat.
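The resampling step described above can be sketched as follows. It upsamples a 20 Hz channel to the Hexapod's 100 Hz rate, then produces the ¼D, ½D, and 1D versions of a displacement profile and checks them against the ±40 cm stroke. The chapter does not say which cubic scheme was used, so a Catmull-Rom cubic is assumed here. A natural cubic spline would be an equally valid reading. Endpoint handling by repeating the first or last sample is also an assumption. The Bessel low-pass filtering stage is not shown. All function names are hypothetical.

```python
from typing import List, Sequence


def catmull_rom_resample(samples: Sequence[float], src_hz: float, dst_hz: float) -> List[float]:
    """Resample a uniformly sampled channel with Catmull-Rom cubic interpolation.

    The output spans the same time interval as the input and passes through
    every original sample. Endpoints reuse the first/last sample (assumption).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    duration_s = (n - 1) / src_hz
    n_out = int(round(duration_s * dst_hz)) + 1
    out = []
    for k in range(n_out):
        t = k * src_hz / dst_hz          # position measured in input-sample units
        i = min(int(t), n - 2)           # index of the left neighbour
        u = t - i                        # fractional position in [0, 1]
        p0 = samples[max(i - 1, 0)]
        p1 = samples[i]
        p2 = samples[i + 1]
        p3 = samples[min(i + 2, n - 1)]
        out.append(0.5 * (2 * p1
                          + (-p0 + p2) * u
                          + (2 * p0 - 5 * p1 + 4 * p2 - p3) * u ** 2
                          + (-p0 + 3 * p1 - 3 * p2 + p3) * u ** 3))
    return out


def scale_profile(displacement_m: Sequence[float], factor: float) -> List[float]:
    """Scale a displacement profile, e.g. by 0.25 (1/4 D), 0.5 (1/2 D) or 1.0 (1D)."""
    return [factor * x for x in displacement_m]


def within_stroke(displacement_m: Sequence[float], limit_m: float = 0.40) -> bool:
    """True if every sample stays within the +/- limit_m stroke (assumed 0.40 m)."""
    return all(abs(x) <= limit_m for x in displacement_m)


if __name__ == "__main__":
    heave_20hz = [0.0, 0.05, 0.12, 0.08, -0.03, 0.0]   # hypothetical heave samples (m)
    heave_100hz = catmull_rom_resample(heave_20hz, src_hz=20, dst_hz=100)
    for factor in (0.25, 0.5, 1.0):
        run = scale_profile(heave_100hz, factor)
        print(factor, len(run), within_stroke(run))
```

Because the interpolant passes through the original samples, every fifth output value at 100 Hz equals a 20 Hz input sample. Only the in-between values are synthesized.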
The six selected profiles (Table 3.2) were designed to cover the desired turbulence levels to be used in the Hexapod. With the profiles defined, our test pilot performed progressive runs to collect initial subjective assessments of the profiles (see an example of a profile in Figure 3.3). The objective was to produce representative simulator profiles for the light, moderate, and severe levels of turbulence.

Subjective evaluation setup of simulated turbulence profiles. Two evaluations took place at ENSAM Bordeaux-Talence with two objectives. The first was a pilot assessment of the levels of turbulence produced by the Hexapod. The second was an in-depth evaluation of touch screen interaction performance under various levels of turbulence. The pilot evaluations of the turbulence simulations were conducted in two separate steps (Step 1 and Step 2). The second step was needed to address a potential bias in the first evaluation’s experimental design and to increase the corpus of pilot evaluators. The Hexapod (±1 G, ±40 cm XYZ displacements, and 3-axis angular acceleration) was fitted with a specific “cage” replicating the AV2020 cockpit design (Figure 3.4). The design of the cage was contracted to ENSAM Bordeaux-Talence with detailed specifications to ensure the realism of the multiple screen positions.

TABLE 3.2 Tested Turbulence Profiles (Acceleration Vector Norm in m/s²)
Profile | P1 | P2 | P3 | P4 | P5 | P6
Maximum | 1.38 | 2.29 | 5.51 | 4.12 | 5.52 | 8.11
Mean | 0.35 | 0.65 | 1.33 | 1.28 | 1.53 | 2.60
Median | 0.31 | 0.57 | 1.15 | 1.11 | 1.32 | 2.29
Level sought | Less than light | Light/moderate | Moderate high | Moderate low | Moderate high | Severe
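The statistics in Table 3.2 are summaries of the acceleration vector norm over each profile. The sketch below shows one way to compute them from three-axis linear acceleration samples. Two points are assumptions not confirmed by the chapter: that only the three linear axes enter the norm, and whether gravity was removed beforehand. The sample values are hypothetical. At 100 Hz, a 45 s profile would contain 4,500 such samples.

```python
import math
import statistics
from typing import Dict, Iterable, Tuple


def norm_statistics(accels: Iterable[Tuple[float, float, float]]) -> Dict[str, float]:
    """Maximum, mean, and median of the Euclidean norm of (ax, ay, az) in m/s^2."""
    norms = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in accels]
    if not norms:
        raise ValueError("profile contains no samples")
    return {
        "maximum": max(norms),
        "mean": statistics.mean(norms),
        "median": statistics.median(norms),
    }


if __name__ == "__main__":
    # Hypothetical linear accelerations (m/s^2) for a few samples of a profile.
    profile = [(0.2, -0.1, 0.9), (0.0, 0.3, -1.2), (0.1, 0.0, 0.4)]
    print(norm_statistics(profile))
```

Reporting the median alongside the mean, as Table 3.2 does, helps to show whether a few large jolts dominate a profile. In the table, the maximum exceeds the mean several times over for every profile.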

FIGURE 3.4 The Hexapod at ENSAM Bordeaux-Talence with the test bench on top. (Courtesy of Sylvain Hourlier and Xavier Servantie.)

Six 45-second profiles were pre-set on the Hexapod and could be played on demand. Table 3.2 shows the acceleration levels (maximum, mean, and median) of the six selected profiles. Note that, at first inspection, Profiles P3, P4, and P5 might appear to be the same, but pilots appeared able to differentiate them in terms of their realism and level of turbulence. This is yet another reason to perform a human factors evaluation with pilots.

Step_1 Evaluation

Step_1 Description

Subjects. Five pilots performed the first evaluation. All were right-handed men; two were aged between 40 and 49 and three between 50 and 59. One had between 100 and 500 hours of piloting experience, while the other four had over 2000 hours.

Run. A typical run comprised the comparative evaluation of the six turbulence profiles, followed by the touch-screen evaluation under turbulence. A run took approximately 1.5 hours on average, with a pause in the middle to accommodate the test subjects, as the experience was somewhat tiring. The protocol for the subjective evaluation of the turbulence level was quite simple. The pilots were run through each profile and asked to evaluate the realism of the profile as a “turbulence one could encounter in an aircraft,” and to rate the “level of turbulence.” An example of the questionnaire is shown in Table 3.3. Each response was measured as the distance from the 0 end of the scale to the cross marked by the subject; these distances were rescaled to 0–10 for realism and 0–4 for level.

TABLE 3.3 Format Used to Collect the Subjective Evaluation of Simulated Turbulence Profiles

Turbulence Profile Played: ______

Does it Feel Like Real In-Flight Turbulence?
Not at all 0______________10 Perfectly

Please Estimate the Level of this Turbulence Profile
           Light   Moderate  Severe
0________1________2________3________4

Note: Pilots marked a cross on each horizontal scale; the position of the cross was measured to score their response.
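Converting a measured cross position into a score is a simple linear rescaling. The sketch below assumes each printed scale has a known length in millimetres; the 100 mm length and the two marks are hypothetical, since the printed length is not given.

```python
def rescale_mark(mark_mm, scale_length_mm, scale_max):
    """Linearly map the distance from the scale's 0 end to [0, scale_max]."""
    if scale_length_mm <= 0:
        raise ValueError("scale length must be positive")
    if not 0 <= mark_mm <= scale_length_mm:
        raise ValueError("cross lies outside the printed scale")
    return scale_max * mark_mm / scale_length_mm


SCALE_LENGTH_MM = 100.0  # hypothetical printed length of each scale

realism = rescale_mark(77.0, SCALE_LENGTH_MM, 10)  # realism scale: 0-10
level = rescale_mark(63.5, SCALE_LENGTH_MM, 4)     # level scale: 0-4
print(realism, level)
```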

Step_1 Results

Though our sample of pilots was small, our results show considerable coherence and little variability.

Estimated “realism” of the simulated turbulence profiles. As shown in Table 3.4, all profiles received a realism (rendering quality) rating above 5 on a 0 to 10 scale, with 10 being the most realistic. In addition, 5 of the 6 profiles were rated 7.7 or higher, with little dispersion in the ratings.

Estimated level of the simulated turbulence profiles. Table 3.5 provides the pilot-estimated levels of turbulence. As shown in Figure 3.5, Profiles P1 and P2 were scored as light turbulence, Profiles P3, P4, and P5 as moderate, and the last profile, P6, as severe.

Discussion on Step_1 Simulated Turbulence Evaluation

The pilots interviewed all agreed on the quality and representativeness of the Hexapod as a means to reproduce turbulence. The Hexapod movements were judged similar to real turbulence with a high level of confidence, except at the lowest level (P1), which was rated less realistic than the others. Though P1 could still be accepted as a representative level of turbulence, it had already been set aside from our short list.

TABLE 3.4 Pilot-Rated Realism of Turbulence Rendering by Hexapod (Step_1)

Turbulence Test Level   P1    P2    P3    P4    P5    P6
Mean                    5.6   7.7   7.9   8.2   9.2   9.2
SD                      1.0   1.4   1.4   1.4   0.5   0.5

Note: Rating (0: not good, 10: excellent).

TABLE 3.5 Pilot-Estimated Levels of Turbulence (Step_1)

Turbulence Test Level   P1     P2     P3     P4     P5     P6
Mean                    1.15   1.87   2.54   2.39   2.61   3.17
SD                      0.42   0.37   0.21   0.18   0.13   0.26

Note: Rating (between 0 and 1: no turbulence; between 1 and 2: light; between 2 and 3: moderate; between 3 and 4: severe).
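The banding defined in the note can be applied mechanically to the Table 3.5 means. In the sketch below, assigning a rating that falls exactly on an integer boundary to the higher band is an assumption, since the note does not say which side such values belong to; applied to the Step_1 means, it yields the same classification as the Step_1 results (P1 and P2 light, P3 to P5 moderate, P6 severe).

```python
# Step_1 mean level ratings, copied from Table 3.5.
STEP1_LEVEL_MEANS = {"P1": 1.15, "P2": 1.87, "P3": 2.54, "P4": 2.39, "P5": 2.61, "P6": 3.17}


def turbulence_band(rating):
    """Map a 0-4 level rating to the bands defined in the Table 3.5 note.

    Assumption: a rating exactly on an integer boundary (e.g., 2.0) is
    assigned to the higher band.
    """
    if not 0 <= rating <= 4:
        raise ValueError("rating must lie on the 0-4 scale")
    if rating < 1:
        return "no turbulence"
    if rating < 2:
        return "light"
    if rating < 3:
        return "moderate"
    return "severe"


bands = {profile: turbulence_band(m) for profile, m in STEP1_LEVEL_MEANS.items()}
print(bands)
```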

FIGURE 3.5 Estimated level of simulated turbulence profiles (Step_1).

The Step_1 protocol presented the turbulence runs progressively, from the lowest to the highest level, to minimize motion sickness. Our main concern after this first evaluation was that this progressive order could have suggested to the subjects a progressive increase in turbulence level and biased their responses. To address this potential bias, a second evaluation was designed with another group of pilots.

Step_2 Evaluation

Step_2 Description

Subjects. Six pilots performed this second evaluation. All were right-handed men; two were aged between 20 and 29 years, two between 40 and 49, one between 50 and 59, and one over 60. One had less than 100 hours of piloting experience, two had between 100 and 500 hours, two had between 500 and 2000 hours, and one had over 2000 hours.

Run. The runs comprised the same six turbulence profiles, which, for technical reasons, had to be played in the same order for all runs: P3 → P1 → P5 → P2 → P4 → P6. Each pilot ran this order only once, so they could not anticipate the influence of the predefined order. The same protocol as in Step_1 was used: the six pilots ran each profile in the predefined order and were asked to evaluate the realism of the profile as a “turbulence one could encounter in an aircraft” and to rate the “level of turbulence.” The same questionnaire was used as in the Step_1 evaluation (Table 3.3).

Step_2 Results

As the Step_2 evaluation was designed to address the concern about the influence of run order, the general method was duplicated except for presentation order; the results can therefore be analyzed either separately or aggregated. The results from our second sample of pilots, though slightly different from the Step_1 results, still show strong overall agreement.

Estimated “realism” of the simulated turbulence profiles. As shown in Table 3.6, all Step_2 profiles had a “realism” rating above 5 out of 10, four profiles (P2, P4, P5, and P6) were rated higher than 6, and the dispersion of the ratings was somewhat higher than in the Step_1 trials.

Estimated level of the simulated turbulence profiles. As shown in Table 3.7, Profiles P1, P2, and P3 were judged to be light, Profiles P4 and P5 moderate, and the last profile, P6, severe. Although the dispersion of results was higher in the Step_2 evaluation, the overall rating of levels was consistent with the Step_1 evaluation.

Discussion on Step_2 Simulated Turbulence Evaluation

As seen in Table 3.6, the realism of the profiles was always above the mid-range value of 5, with a notable difference between the first profile and the last three. Interestingly, the finding that the lower turbulence levels were rendered less convincingly than the higher levels was replicated. This reinforces the impression from the Step_1 evaluation: the lower the turbulence, the harder it is to simulate. Most importantly, it appears that presenting the turbulence runs in a non-progressive sequence did not invalidate our Step_1 results. Statistical tests were run to evaluate the impact of flying experience on the estimated realism and/or level of the profiles, but the effects were not significant.

TABLE 3.6 Pilot-Rated Realism of Turbulence Rendering by Hexapod (Step_2)

Turbulence Test Level   P1    P2    P3    P4    P5    P6
Mean                    6.0   6.1   5.2   7.8   7.3   7.6
SD                      1.5   2.8   3.1   1.3   2.0   1.9

Note: Estimated realism of turbulence (0: not good, 10: excellent).

TABLE 3.7 Pilot-Estimated Levels of Turbulence (Step_2)

Turbulence Test Level   P1     P2     P3     P4     P5     P6
Mean                    1.27   1.66   1.98   2.38   2.86   3.37
SD                      0.78   0.58   1.13   0.90   0.66   0.74

Note: Rating (between 0 and 1: no turbulence; between 1 and 2: light; between 2 and 3: moderate; between 3 and 4: severe).
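Because the two steps used the same method apart from presentation order, their level ratings can also be aggregated. The sketch below shows one way to do so, assuming each pilot rated each profile once, so that each step's mean is weighted by its number of pilots (5 and 6). This pooled view is only an illustration; it is not an analysis reported above.

```python
# Mean level ratings from Table 3.5 (Step_1, 5 pilots) and Table 3.7 (Step_2, 6 pilots).
STEP1 = {"P1": 1.15, "P2": 1.87, "P3": 2.54, "P4": 2.39, "P5": 2.61, "P6": 3.17}
STEP2 = {"P1": 1.27, "P2": 1.66, "P3": 1.98, "P4": 2.38, "P5": 2.86, "P6": 3.37}
N_STEP1, N_STEP2 = 5, 6


def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Weighted mean of two group means, weighting each by its number of raters."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


pooled = {p: pooled_mean(STEP1[p], N_STEP1, STEP2[p], N_STEP2) for p in STEP1}
for profile, value in pooled.items():
    print(f"{profile}: {value:.2f}")
```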

Characterizing HMI Requirements

To characterize HMI requirements in turbulence, we used only the first three levels (none, light, and moderate) and excluded the severe level. Our usability analysis of the tasks to be performed under various levels of turbulence revealed that, in severe turbulence, pilots keep their hands on the throttle and stick and focus on leaving the turbulent environment, not on interacting with the HMI. We focused on basic gestures: press, release, double tap, and long press. For each, we measured the following precision and timing dimensions:

• Distance between effective press and target
• Distance between effective release and target
• Distance between two taps while performing a double tap
• Time between two taps while performing a double tap
• Involuntary moves while performing a long press

Evaluation Results

The evaluation results below served as specifications for Thales’ product development. They are shown in a condensed format (bullets and short descriptions) to ease design reference searches. For each type of interaction, we present the triggering action for launching a trial, the recorded dimension, the sample collected per turbulence level, a figure illustrating the touch instruction, and a figure reproducing the results.

Press performance:

• Triggering action: A validated press will load the next trial.
• Recorded dimension: Distance (in mm) from finger to target center at each press.
• Sample collected: 646 trials for each level of turbulence (none, light, and moderate).
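The recorded press dimension, and one way to turn it into a cumulative success-rate curve as a function of activation-zone size, can be sketched as follows. The circular activation zone, the coordinate format, and the sample presses are assumptions for illustration.

```python
import math


def press_errors_mm(presses, target_center):
    """Euclidean distance (mm) from each press point to the target center."""
    tx, ty = target_center
    return [math.hypot(x - tx, y - ty) for x, y in presses]


def empirical_success_rate(errors_mm, zone_radius_mm):
    """Share of presses landing within a circular activation zone of the given radius."""
    if not errors_mm:
        raise ValueError("no presses recorded")
    return sum(e <= zone_radius_mm for e in errors_mm) / len(errors_mm)


# Hypothetical press coordinates in mm, relative to a target at the origin.
errors = press_errors_mm([(3.0, 4.0), (0.0, 1.0), (6.0, 8.0), (0.0, 0.0)], (0.0, 0.0))
print(errors, empirical_success_rate(errors, 5.0))
```

Evaluating `empirical_success_rate` over a range of radii gives the kind of cumulative curve that a parametric model can later be fitted to.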

Figures 3.6 and 3.7 show, respectively, how the interaction was recorded and the subsequent results for various levels of turbulence.

FIGURE 3.6 Touch instruction for press evaluation.

FIGURE 3.7 Performance with regard to distance between finger and target for single press at various levels of turbulence.

Release performance:

• Triggering action: A validated release will load the next trial.
• Recorded dimension: Distance (in mm) of finger to release target center at release time.
• Sample collected: 638 trials for no turbulence, 649 for light turbulence, and 575 for moderate turbulence.

Figures 3.8 and 3.9 show, respectively, how the interaction was recorded and the subsequent results for various levels of turbulence. Double tap performance:

• Triggering action: A validated double tap will load the next trial.
• Recorded dimension: Distance (in mm) from finger at the first tap to finger at the second tap.
• Sample collected: 228 trials for no turbulence, 523 for light turbulence, and 509 for moderate turbulence.

FIGURE 3.8 Touch instruction for release evaluation.

FIGURE 3.9 Performance with regard to distance between finger and target center at release time for various levels of turbulence.

Figures 3.10 and 3.11 show, respectively, how the interaction was recorded and the subsequent results for various levels of turbulence. Because the system must distinguish a double tap from an unintended second touch, it was also important to analyze the timing of double taps for discrimination purposes. Results are shown in Figure 3.12.

FIGURE 3.10 Touch instruction for double tap evaluation.

FIGURE 3.11 Performance with regard to distance between finger at first tap to finger at the second tap for various levels of turbulence.

FIGURE 3.12 Performance with regard to time between two taps for effective double tap recognition at various levels of turbulence.

• Recorded dimension: Time (in seconds) between two taps for an effective double tap.
• Sample collected: 228 trials for no turbulence, 523 for light turbulence, and 509 for moderate turbulence.
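Recorded inter-tap times can be turned into a double-tap timeout by picking the interval that covers a chosen share of effective double taps. The sketch below uses a nearest-rank percentile, which is one simple choice among several; the sample intervals are hypothetical.

```python
import math


def nearest_rank_quantile(values, q):
    """Smallest recorded value such that at least a fraction q of values are <= it."""
    if not values:
        raise ValueError("no recorded intervals")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical inter-tap intervals (s) of effective double taps.
intervals_s = [0.21, 0.25, 0.30, 0.33, 0.41, 0.45, 0.52, 0.58, 0.61, 0.72]
print(nearest_rank_quantile(intervals_s, 0.5), nearest_rank_quantile(intervals_s, 1.0))
```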

Long press performance:

• Triggering action: A validated long press will load the next trial. The target needed a continuous 2-second press to issue feedback.
• Recorded dimension: Maximum recorded distance (in mm) from finger to target during the 2 seconds.
• Sample collected: 307 trials for no turbulence, 306 for light turbulence, and 186 for moderate turbulence.

Figures 3.13 and 3.14 show, respectively, how the interaction was recorded and the subsequent results for various levels of turbulence.

FIGURE 3.13 Touch instruction for long press evaluation.

FIGURE 3.14 Performance of a long press interaction, maximum unwanted move for various levels of turbulence.

Discussion of Performance Results for the Evaluated Touch Interactions

Before reviewing the results for each type of touch interaction, a general point can be made about all the data. Because the curves in the figures seem to follow a Weibull distribution (Weibull distribution, 2017), a mathematical analysis was performed to approximate these results. The general formulation for this kind of law is:

$$V = 1 - e^{-\left(d/\alpha\right)^{\beta}} \qquad \text{(Equation 3.1, Weibull distribution)}$$

where V is the probability of success, d is the distance between target and finger, and α (scale) and β (shape) are two constants that depend on the level of turbulence (see Table 3.8 for the single-press values). As shown in Figure 3.15, the estimates obtained from a Weibull distribution with the Table 3.8 constants match the collected data for all three levels of turbulence. This suggests that the results are coherent and follow a mathematical law. Such a mathematical model of human performance enables the prediction of usability outcomes for any type of touch interaction, once the α and β constants have been determined for that interaction.

TABLE 3.8 α and β Constants Depending on Level of Turbulence for Single Press

Turbulence Level   α (mm)       β
No                 3.61900396   1.74132224
Light              5.9091823    1.63635047
Moderate           8.51799839   1.33989697

FIGURE 3.15 Comparison of estimated performance to actual collected performance for single press evaluation.
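Once success rates have been tabulated against distance, α and β can be estimated in several ways. The sketch below uses the classic Weibull-plot linearization with ordinary least squares, a common textbook approach assumed here for illustration; the fitting method that produced Table 3.8 is not stated. The usage example regenerates points from the "No" turbulence constants and checks that the fit recovers them.

```python
import math


def weibull_cdf(d, alpha, beta):
    """Equation 3.1: V = 1 - exp(-(d / alpha) ** beta)."""
    return 1.0 - math.exp(-((d / alpha) ** beta))


def fit_weibull(distances, success_rates):
    """Estimate (alpha, beta) by least squares on the Weibull-plot linearization.

    ln(-ln(1 - V)) = beta * ln(d) - beta * ln(alpha), so a straight-line fit of
    y = ln(-ln(1 - V)) against x = ln(d) gives slope beta and intercept
    -beta * ln(alpha). Points with V <= 0 or V >= 1 are skipped because the
    transform is undefined there.
    """
    pts = [(math.log(d), math.log(-math.log(1.0 - v)))
           for d, v in zip(distances, success_rates) if d > 0 and 0 < v < 1]
    if len(pts) < 2:
        raise ValueError("need at least two usable points")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    beta = sxy / sxx
    alpha = math.exp(mx - my / beta)  # from intercept = my - beta * mx = -beta * ln(alpha)
    return alpha, beta


# Regenerate points from the "No" turbulence constants of Table 3.8 and refit.
ALPHA_NO, BETA_NO = 3.61900396, 1.74132224
ds = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
vs = [weibull_cdf(d, ALPHA_NO, BETA_NO) for d in ds]
alpha_hat, beta_hat = fit_weibull(ds, vs)
print(round(alpha_hat, 4), round(beta_hat, 4))
```

With real trial data the points scatter around the line, and a maximum-likelihood fit is a common alternative.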

Discussion of Results Per Type of Basic Interaction

All interactions described in this chapter were collected on 15-inch touch screens with a finger rest running along all four sides of each screen. As all the results discussed here refer only to the size of active touch zones, the relationship between visual target size and performance remains to be determined. Though training readily conveys efficient and inefficient touch strategies whatever the visual size of the target for a given active zone, readability issues can remain. Some vibrations (which are part of turbulence profiles) can blur vision, and smaller symbols may not be recognized even though they are certified for non-turbulent conditions. The mechanism operates not only at screen level but also at eye level. The eyeball can resonate at around 18 Hz (Ohlbaum, 1976), and such vibration levels can be found in helicopters with certain combinations of turbine speed, rotor blade configuration, and rotation speed. These configurations mostly appear at takeoff and are well known to helicopter pilots. In fact, since that experiment, we have extended our evaluations to simulated helicopter profiles.

For a given performance objective (% of errors) and a given turbulence level, engineers can read the appropriate values from the charts or use the formulas for any type of basic interaction. As a corollary, when an activation zone is constrained by design, we can predict the resulting performance (error rate). The optimal size of a “single tap” activation zone for a button is easily calculated for a targeted error rate and turbulence level. A virtual touch keyboard, for instance, could benefit from these figures to become turbulence resistant. Notably, it is not the size of the target that matters, but the size of the activation zone and thus the size of the surrounding inactive zone.
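Both uses described above, sizing a zone for a target error rate and predicting the error rate of a zone constrained by design, follow from Equation 3.1 once 1 − V is taken as the error rate: solving 1 − V = e^{−(d/α)^β} for d gives d = α(−ln(1 − V))^{1/β}. A minimal sketch using the single-press constants of Table 3.8 follows; treating the activation zone as a circle of radius d around the target center is a simplifying assumption, and the 5% target is only an example.

```python
import math

# Single-press constants from Table 3.8: level -> (alpha in mm, beta).
SINGLE_PRESS = {
    "none": (3.61900396, 1.74132224),
    "light": (5.9091823, 1.63635047),
    "moderate": (8.51799839, 1.33989697),
}


def predicted_error_rate(zone_radius_mm, level):
    """1 - V from Equation 3.1: share of presses expected to miss the zone."""
    alpha, beta = SINGLE_PRESS[level]
    return math.exp(-((zone_radius_mm / alpha) ** beta))


def required_zone_radius(target_error_rate, level):
    """Invert Equation 3.1: d = alpha * (-ln(error)) ** (1 / beta)."""
    if not 0 < target_error_rate < 1:
        raise ValueError("error rate must be strictly between 0 and 1")
    alpha, beta = SINGLE_PRESS[level]
    return alpha * (-math.log(target_error_rate)) ** (1.0 / beta)


for level in SINGLE_PRESS:
    r = required_zone_radius(0.05, level)
    print(f"{level}: radius for 5% errors = {r:.1f} mm")
```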
By comparing the results in Figures 3.7 and 3.9, one can deduce that, for single taps, precision is much better (up to four times) at release than at press. Thus, it is better to trigger a button event at release than at press, even if the press lands outside the activation zone of the target button. Beyond buttons, this should help structure all keyboard design for touch screens in turbulent environments.

The “double tap” analysis enabled us to build a model of the distance between two taps with regard to turbulence level. These data help limit the validated second tap to an acceptable activation zone, as the system must be able to discern between a double tap and two single taps. Combining this with the timing model of the double tap greatly improves the effective detection of a double tap. For example, limiting detection to less than 13 mm around the target and less than 650 ms between taps discriminates between a double tap and an error with 90% accuracy in moderate turbulence. Depending on the criticality of the interaction, designers can adjust its efficiency for optimum safety, as they can choose the accuracy level on the chart. An alternate interpretation of the same data gives the time to wait before concluding that an interaction is a single tap.
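The two recommendations above can be sketched as simple event checks: activate a button from the release point rather than the press point, and accept a double tap only when the second tap falls within 13 mm and 650 ms of the first. The 13 mm and 650 ms values are the moderate-turbulence example quoted above; the circular zone, the `Touch` event structure, and measuring distance from the first tap (the recorded double-tap dimension) rather than from a target center are assumptions of this hypothetical sketch.

```python
import math
from dataclasses import dataclass

# Thresholds quoted in the text for moderate turbulence (90% discrimination).
DOUBLE_TAP_MAX_DISTANCE_MM = 13.0
DOUBLE_TAP_MAX_INTERVAL_S = 0.650


@dataclass
class Touch:
    """Hypothetical touch event: position in mm, timestamp in seconds."""
    x_mm: float
    y_mm: float
    t_s: float


def activates_on_release(release: Touch, center_xy, zone_radius_mm):
    """Trigger the button from the release point only; the press point is ignored."""
    cx, cy = center_xy
    return math.hypot(release.x_mm - cx, release.y_mm - cy) <= zone_radius_mm


def is_double_tap(first: Touch, second: Touch,
                  max_distance_mm=DOUBLE_TAP_MAX_DISTANCE_MM,
                  max_interval_s=DOUBLE_TAP_MAX_INTERVAL_S):
    """Accept a second tap only if it is close enough in both space and time.

    Assumption: distance is measured from the first tap, matching the
    recorded double-tap dimension.
    """
    distance = math.hypot(second.x_mm - first.x_mm, second.y_mm - first.y_mm)
    interval = second.t_s - first.t_s
    return distance < max_distance_mm and 0 <= interval < max_interval_s


# Hypothetical events (mm, s).
print(activates_on_release(Touch(4.0, 3.0, 0.0), (0.0, 0.0), 6.0))
print(is_double_tap(Touch(0.0, 0.0, 0.0), Touch(3.0, 4.0, 0.30)))
print(is_double_tap(Touch(0.0, 0.0, 0.0), Touch(3.0, 4.0, 0.70)))
```

Keeping the thresholds as parameters lets a designer swap in values read from the charts for a different turbulence level or accuracy target.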

For double taps, the delay between two taps increases with turbulence level. It also appeared that, as the displayed target size decreased, the acceptable time between the two taps of a double tap increased. This is still under scrutiny and should be confirmed through further data analysis. We observed that having a finger resting on the touch surface helped stabilize the hand and thus reduced the effect of turbulence.

The “long press” analysis helped distinguish voluntary from involuntary movements when dragging is the objective of the interaction. We built a model of the involuntary movement of a finger that is meant to stay on a given spot, with regard to turbulence level. It allowed us to recommend a movement threshold to apply before actually activating a “drag” rather than treating the interaction as a mere “long press” under turbulence.
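The drag versus long-press decision described above can be sketched as a threshold on movement from the touch-down point. The 2-second hold comes from the long-press trials; the per-level threshold values below are hypothetical placeholders, since the recommended values are not reported here, and the sample format is an assumption.

```python
import math

LONG_PRESS_HOLD_S = 2.0  # hold duration used in the long-press trials

# Hypothetical movement thresholds (mm) per turbulence level; placeholders only.
DRAG_THRESHOLD_MM = {"none": 3.0, "light": 6.0, "moderate": 9.0}


def classify_hold(samples, level):
    """Classify a touch as 'drag', 'long press', or 'short touch'.

    samples: list of (t_s, x_mm, y_mm) from touch-down to lift-off.
    Movement is measured from the touch-down point; exceeding the level's
    threshold before the hold completes is treated as an intentional drag.
    """
    t0, x0, y0 = samples[0]
    threshold = DRAG_THRESHOLD_MM[level]
    for t, x, y in samples:
        if math.hypot(x - x0, y - y0) > threshold:
            return "drag"
        if t - t0 >= LONG_PRESS_HOLD_S:
            return "long press"
    return "short touch"  # lifted before the hold duration elapsed


jitter = [(0.0, 0.0, 0.0), (0.5, 1.5, -2.0), (1.0, -2.5, 1.0), (2.0, 2.0, 2.0)]
slide = [(0.0, 0.0, 0.0), (0.3, 4.0, 0.0), (0.6, 12.0, 0.0)]
print(classify_hold(jitter, "light"), classify_hold(slide, "moderate"))
```

Raising the threshold with turbulence level reflects the larger involuntary movements expected in rougher conditions.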

Conclusions

In the conservative field of avionics HMI, establishing touch screens as a reliable interaction device was a challenge, especially given a turbulent aeronautical environment that seemed an insurmountable barrier. Evidence-based design (Ulrich et al., 2008) proved an appropriate approach to overcome this challenge. We needed hard data to show the feasibility of touch interactions under turbulent conditions and were inspired by the applicability of evidence-based design beyond architecture and health care (Tannen, 2009). By designing a suitable environmental simulator that enabled objective data collection, we issued numerous recommendations for turbulence-compliant interactions. Hexapod movements (except at the lowest level) were judged similar to real turbulence with a high level of confidence. Therefore, the Hexapod was judged suitable for simulating light to severe turbulence profiles. The pilots interviewed all agreed on the quality and representativeness of the Hexapod and the three selected profiles as a means to reproduce aeronautical turbulence for technology testing.

Testing many basic interactions enabled the collection of predictive data for each level of turbulence and for a given performance target. The data profiles can even be described mathematically, which could enhance their predictive value for future design under turbulent conditions. This research supports the credibility of touch-screen usage under light and moderate turbulence. Complete HMIs were designed using these recommendations and then tested and validated using these profiles. This would not have been possible before, since no equivalent data existed on the subject.

This project was about the development and use of a dedicated means of compliance for touch-screen use in turbulence. Beyond the foreseeable development of further tests and evaluations, one could speculate about the usability of such a simulator and turbulence profiles for training purposes. Indeed, over recent years we have witnessed, within our user population, strategies and hand-positioning techniques that evolved toward better performance. Some of these could be worth training for, or at least teaching as complementary means of performance improvement.

As a last remark, our overall methodology was quite simple: observe the characteristics of the problematic environment, collect and compute representative samples, search for means capable of reproducing those samples, test candidate profiles with specialists of the environment to grade its effect, validate with professionals, and finally, test the technology in the novel simulated environment. What proved most challenging was balancing testing expectations against the resources available. How much realism would be enough for efficient testing in the problematic environment? The answer to that question lies with the end users’ acceptability rating of our technology. Only their validation could justify our design choices.

References

Aeronautical turbulence. (2015). Encyclopedia Britannica. Retrieved on October 27, 2017 from https://global.britannica.com/science/atmospheric-turbulence.

Berthoz, A. (1981). Effets des vibrations sur l’homme. In J. Scherrer (Ed.), Précis de physiologie du travail, 2ème éd. (pp. 341–372). Paris, France: Masson.

Blanco, R. (2012). Nothing touches this market… The penny sleuth. Retrieved on October 27, 2017 from http://pennysleuth.com/nothing-touches-this-market.

F.A.A. Regulations. (2017). Aeronautical information manual. TBL 7-1-10. Retrieved on October 27, 2017 from https://www.faa.gov/air_traffic/publications/media/aim.pdf.

Hourlier, S. (2015). Human factors drivers behind next generation AV2020 cockpit display (No. 2015-01-2537). SAE Technical Paper. Retrieved on October 27, 2017 from https://doi.org/10.4271/2015-01-2537.

Hsieh, C. (2010). Touch panel market research. Retrieved on October 27, 2017 from http://www.displaysearch.com/cps/rde/xchg/displaysearch/hs.xsl/touch_panel_market_analysis.asp.

Ohlbaum, M. K. (1976). Mechanical resonant frequency of the human eye ‘In Vivo’ (No. AMRL-TR-75-113). Wright-Patterson AFB, Dayton, OH: Air Force Aerospace Medical Research Lab.

Tannen, R. (2009). Designers, take a look at evidence-based design for health care. Retrieved on October 27, 2017 from https://www.fastcompany.com/1418953/designers-take-look-evidence-based-design-health-care.

Ulrich, R. S., Zimring, C., Zhu, X., DuBose, J., Seo, H. B., Choi, Y. S., Joseph, A., et al. (2008). A review of the research literature on evidence-based healthcare design. HERD: Health Environments Research & Design Journal, 1(3), 61–125.

Wagtendonk, W. (2003). Meteorology for professional pilots. Bay of Plenty, New Zealand: Aviation Theory Centre (NZ) Ltd. Retrieved on October 27, 2017 from http://aviationknowledge.wikidot.com/aviation:atmospheric-turbulence.

Weibull distribution. (2017). Wikipedia. Retrieved on October 27, 2017 from https://en.wikipedia.org/wiki/Weibull_distribution.

Section II

Modeling for Aviation Psychology

4 Prospective Comments on Performance Prediction for Aviation Psychology

Kevin A. Gluck, Tiffany S. Jastrzembski, and Michael A. Krusmark

CONTENTS

Key Questions
What?
How?
When?
Progress on Performance Prediction
Evaluating the Theoretical Adequacy and Applied Potential of PPE
Predictive Performance Optimizer (PPO)
PPO Application—CPR Study Example
Emerging and Prospective Applications
Medical Applications
Manufacturing Safety
Virtual Training
Pilot Training
Enduring Challenges and Prospects for Performance Prediction in Aviation Psychology
Validated Measures
Item Sequencing within Curricula
Earlier Predictions
Robustness to High Schedule Variability
Quantifying Certainty in Predictions
Conclusion
Acknowledgments
References

“... one important variable for both trainers and researchers to consider when evaluating instructional methods is delayed performance.” (Clark & Wittrock, 2000, p. 79)

Nearly everything about aviation changed over the course of the first century of flight. Even as the aviation industry began to mature, technological advancements brought continuous change to the human experience of aviation, in and out of the cockpit. These changes have continued in recent decades, resulting in an increasingly important role for principled guidance from aviation psychology (Tsang & Vidulich, 2003) and an increasing requirement for people to train and retrain on an evolving collection of complex sociotechnical aviation systems.

The sense that the only constant is change is not restricted to aviation; it pervades all of contemporary society. In the context of that broad experience and its resulting implications for education and training, it is important to note that there is always a delay between the last training opportunity and the next performance opportunity on the job.

The reality of delayed performance creates a demand signal for new technologies that improve adaptive training and our ability to deliver it when and where it is most needed. More specifically, delayed performance has motivated our investments in recent years in the science and technology of future performance prediction. The research approach and resulting capability are domain agnostic, though we note prospective connections to aviation in the content that follows. The chapter begins by positioning the research within the key questions students and instructors must always ask about training: what, how, and when?

Key Questions

Instructors face the important and daunting task of helping people improve their knowledge and skill. Successfully achieving this requires decisions across an array of relevant dimensions, and these dimensions have been the focus of an increasingly sophisticated, expanding set of practical issues and scientific questions in training research.

What?

The first of these dimensions is: what to train? This is the obvious starting point for any practical effort to train people, as it identifies the fundamental goal. We must know what, precisely, we want people to know or do better, so that the training effort is appropriately focused.

From a historical perspective, the competition for survival among our distant ancestors placed a premium on physical performance in the naturally occurring world. The emergence of higher order cognitive functions, such as language and theory of mind, then made instruction possible. Whichever person, couple, group, or community most effectively taught each other how to gather, hunt, cook, defend, and attack possessed a survival advantage over the others. Thus, the earliest answers to what to train focused on the skills necessary for survival, and evolutionary pressures selected for mechanisms that improved an agent’s ability both to learn new tasks and to selectively convey task-relevant information to others. Eventually, the creation of complex engineered systems, combined with an emerging appreciation that people themselves are an important focus of scientific inquiry, produced research efforts on methods for the formal analysis of what to train. These have taken an assortment of forms in different sub-disciplines and communities of interest, including job analysis (Gael, 1988), work analysis (Vicente, 1999; Wilson, Bennett, Gibson, & Alliger, 2012), and task analysis (Schraagen, Chipman, & Shalin, 2000). Regardless of the specific method, the goal is to identify what to train within a particular domain.

How?

A second question is: how to train? For millennia, answering this question was comparatively easy because there were not many options. Training was done one-to-one or one-to-many through demonstration, verbal instruction, and apprenticeship. The advent of the information age in the mid-twentieth century changed this completely. It became possible to represent human information processing in computer software, leading to a general interest in artificial intelligence and its applications in education and training. Other computing-related technological innovations in knowledge representation, databases, visual displays, and processors led to an explosion of ideas and increasingly attractive options in simulations, games, virtual reality, and augmented reality. Every aspect of instruction is open for exploration, producing an enormous combinatoric space of options (Koedinger, Booth, & Klahr, 2013) available for the research community to investigate. A couple of important dimensions of this instructional design space are the type and timing of feedback (Kyllonen, 2000; Shute, 2008) and how to blend live, virtual, and constructive training (Colegrove, Rowe, Alliger, Garrity, & Bennett, 2009).

Progress toward scientific consensus on these and other challenging instructional design issues is hampered by real-world education and training realities, such as small sample sizes, uncontrolled confounds, limited availability of valid, objective, quantitative measures in some domains, and socio-political and organizational resistance to change. However, researchers and forward-thinking practitioners across government, industry, and academia continue to explore the space of these design issues, which all concern how to train for maximal benefit. “What?” and “How?” are crucial questions that must be addressed in order to train at all.
In recent years, a third important consideration has emerged in an attempt to further refine and improve the implementation of state-of-the-art training programs: when to train?

When?

Any rigorous, scientific, evidence-based attempt to answer this question challenges decades of simple, convenient, but often insufficient standard practices for addressing the question of when. Examples of these standard practices include:

One-and-Done. Using a checklist of knowledge to be recalled or skills to be demonstrated, a trainee attempts the activity until successfully doing it one time, at which point the training is done. In the most extreme (and inadvisable) case, this is also a one-time activity. The trainee has proven it is possible for them to recall or do the target of the instruction; therefore, the training program is over. Forever.

Fixed Calendar Repetition. This is generally implemented as a variant of one-and-done that repeats on a permanently fixed interval. Another name for this practice could be Once-in-a-Fixed-While. Here the trainee demonstrates it is possible for them to recall or do the target of the instruction, usually just once, and then there is no requirement for any additional training or retraining until the fixed calendar interval has passed. This interval is generally measured in months or years. Annual retraining is common, as with most required training courses for government personnel. Biennial (every other year) retraining requirements are also extremely common. One of the best-known examples is the traditional 2-year fixed repetition interval for cardiopulmonary resuscitation (CPR) certification.

The advantages of these standard, traditional scheduling practices are clear: they are easy to describe in a policy, easy for employees to remember, and easy for training managers to implement. Let’s say Anne takes a new job at a physical therapy clinic, where the policy requires all employees to be CPR certified. Anne doesn’t have her CPR certification, so she completes the required class and demonstrates a level of proficiency judged or measured to be sufficient for certification.
Now Anne is certified in CPR for the next 2 years and is eligible to remain on the job throughout that period. Two years later, precisely 730 days after she earned her certification, it expires. She is flagged in the employee database as requiring re-certification, Anne and her training manager receive a notification to this effect, and she signs up for the next available re-certification course. No problem. The answer to the question of when is simply whatever interval the established policy requires, be it every 3, 6, 12, 24, or n months.

The assumption implicitly reified in this kind of policy is that throughout the 2-year period Anne will be able to perform proficiently if a cardiac event requires her to use her CPR skills. Per the policy, this is true on Day 1 post-certification and is assumed to be just as true on Day 729. Day 730, however, triggers a binary re-categorization of eligibility and assumed proficiency. Figure 4.1 shows a representation of this policy dynamic. This is all well and good, as long as we ignore nearly everything we have learned in the last century about the internal dynamics of the human cognitive system.

Prospective Comments on Performance Prediction for Aviation Psychology 83

FIGURE 4.1 Binary certification state switching and implied proficiency. The same policy is always applied invariantly to all people.

If we think critically about this kind of fixed calendar system, however, its problems are apparent. The problems center on the fact that all fixed calendar training and certification programs ignore realities of human cognition and performance that are now very well documented. The most general of these realities is individual differences. Every study ever run with human subjects shows differences across individual people. These manifest in a variety of ways: differences in initial performance, in rate of acquisition of knowledge or skill, in rate of forgetting, and in the maximal level of performance achieved during the time period available in the study. The lesson here is that real people are really different, in myriad ways. Any policy that is uniformly applied to all people fails to take their fundamental individual differences into account.

Another important disconnect between Fixed Calendar Repetition training policies and the people they are intended for is that these policies fail to acknowledge what we know about the general shape of learning curves and forgetting curves. Any layperson can look at the assumed shape of the proficiency profile in Figure 4.1 and recognize that it is a bizarre departure from reality. We all know from personal experience that we forget and get worse at things over time in the absence of additional study or practice. The assumed fixed proficiency interval after certification is implausible on its face, as is the assumption that a single additional, arbitrarily chosen calendar day causes proficiency to fall off a cliff. We know this even without consulting the scientific literature. As it turns out, however, there is an abundance of relevant science on this issue reaching back more than a century.
Learning and forgetting curves do not take the shape of the dramatic step-function changes in performance implied by the binary state switching in these policies. The evidence from the cognitive science literature shows that learning and forgetting dynamics are better represented by power functions (Newell & Rosenbloom, 1981) or exponential functions (Heathcote, Brown, & Mewhort, 2000) than by step functions. There is also considerable evidence for spacing effects (Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006) that influence the rate of forgetting and of re-learning (Walsh et al., 2018) as a function of the temporal spacing of learning events. Armed with this evidence about the underlying dynamics of human cognition, we are in a position to consider more sophisticated and empirically grounded alternatives to today’s standard practices. For instance, rather than applying the same fixed calendar interval to all learners, we can imagine a future in which decisions about when to train are made adaptively, on the basis of what is best for the individual learner, in order to get them to or above a target level of proficiency by a particular goal date, or to keep them above that level of proficiency for as long as possible. Adaptive scheduling of learning events requires integrating these dynamics into a quantitative model. Such models now exist, and we have the technology to support personalized scheduling at scale. In the next section, we describe an ongoing research and development effort that is turning that imagined future into a current reality.
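To make the contrast concrete, the sketch below compares the step function implied by a 730-day certification policy with power and exponential forgetting curves. It is a minimal Python illustration; the function names and the parameter values a, b, and k are hypothetical choices for this example, not estimates from any study.

```python
import math


def step_policy(day: float, expiry_days: int = 730) -> float:
    """Proficiency implied by a fixed-calendar policy: full until expiry, then zero."""
    return 1.0 if day < expiry_days else 0.0


def power_retention(day: float, a: float = 0.9, b: float = 0.15) -> float:
    """Power-function forgetting, a * (1 + day) ** -b (hypothetical a and b)."""
    return a * (1.0 + day) ** -b


def exponential_retention(day: float, a: float = 0.9, k: float = 0.002) -> float:
    """Exponential forgetting, a * exp(-k * day) (hypothetical a and k)."""
    return a * math.exp(-k * day)


if __name__ == "__main__":
    for day in (1, 365, 729, 730):
        print(f"day {day:>3}: step={step_policy(day):.1f} "
              f"power={power_retention(day):.3f} "
              f"exponential={exponential_retention(day):.3f}")
```

Both continuous curves decline gradually and barely differ between Day 729 and Day 730, whereas the step function drops from full proficiency to none on a single day.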

Progress on Performance Prediction

Over the past decade, research scientists within the Air Force Research Laboratory’s (AFRL’s) Cognitive Models and Agents branch have been developing and iteratively refining the Predictive Performance Equation (PPE). PPE has been validated across a variety of domains, from simpler laboratory tasks, such as paired-associate learning, to more complex and critical applied domains, such as medical skills training. PPE is based on three fundamental findings about the human memory system. First, performance increases with the amount of practice (the power law of learning). Second, performance drops with the time elapsed since practice occurred (the power law of forgetting). Third, memory and skill retention improve when practice is distributed over time (the spacing effect). These dynamics emerge in PPE through a series of mathematical equations, which are formally documented in recent publications (Walsh, Gluck, Gunzelmann, Jastrzembski, & Krusmark, 2018; Walsh et al., 2018). Here, we represent them in a pseudo-equation form for ease of interpretation. The effects of practice and elapsed time on activation are multiplicative, as shown in Equation 4.1:

$$\text{Activation} = N^{\text{learning rate}} \cdot T^{-\text{decay rate}} \tag{4.1}$$

N is the amount of practice. T is a function of the elapsed time since practice occurred.
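Equation 4.1 can be written directly as a function. In this minimal sketch the time unit and the parameter values are hypothetical; it only shows that activation rises with practice, falls with elapsed time, and that the two effects multiply.

```python
import math


def activation(n_practice: int, elapsed: float,
               learning_rate: float, decay_rate: float) -> float:
    """Equation 4.1: Activation = N ** learning_rate * T ** (-decay_rate)."""
    if n_practice < 1 or elapsed <= 0:
        raise ValueError("need at least one practice event and positive elapsed time")
    return n_practice ** learning_rate * elapsed ** -decay_rate


# Hypothetical values chosen only for illustration.
LEARNING_RATE, DECAY_RATE = 0.1, 0.3

base = activation(1, 1.0, LEARNING_RATE, DECAY_RATE)  # 1.0
more_practice = activation(8, 1.0, LEARNING_RATE, DECAY_RATE)
more_practice_later = activation(8, 30.0, LEARNING_RATE, DECAY_RATE)

# The practice factor and the time factor combine multiplicatively.
assert math.isclose(more_practice_later,
                    more_practice * activation(1, 30.0, LEARNING_RATE, DECAY_RATE))
print(base, more_practice, more_practice_later)
```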

To capture the idea that spaced practice produces more stable knowledge, the complete history of lags between successive practice opportunities (lag_i) is used to calculate the decay rate, as shown in Equation 4.2:

$$\text{Decay rate} = \text{Decay intercept} + \text{Decay slope} \cdot \operatorname{average}\!\left(\frac{1}{\log(\text{lag}_i)}\right) \tag{4.2}$$

The decay rate is a linear function of the average, over successive practice repetitions, of one over the natural logarithm of the lag between them. The effects of training history are scaled by a decay slope parameter and offset by a decay intercept parameter. PPE treats performance as a logistic function of activation, as shown in Equation 4.3:

$$\text{Performance} = \frac{1}{1 + \exp\!\left(\dfrac{\text{threshold} - \text{activation}}{\text{scalar}}\right)} \tag{4.3}$$

The threshold and scalar parameters control the shape of the logistic function. PPE contains four free parameters: decay intercept, decay slope, threshold, and scalar. These parameters relate to the efficiency and effectiveness of psychological processes that vary across individuals. PPE’s four free parameters are estimated to maximize the correspondence between the model’s output and an individual’s past performance. In this way, PPE can account for individual differences in learning and retention. PPE also accounts for differences in training schedules among individuals. Features of a training schedule, such as the amount of practice (N), elapsed time between practice and tests (T), and spacing between repetitions (lag_i), are represented in PPE’s equations. In this way, PPE can account for the effects of different training schedules on learning and retention. PPE is related to, but distinct from, two existing models of learning and retention. Like Anderson and Schunn’s (2000) General Performance Equation (GPE), it accounts for the effects of practice and decay using a power law of learning and a power law of forgetting. Unlike the GPE, however, it also accounts for the effects of spacing. PPE is conceptually related to the Pavlik and Anderson (2005) ACT-R model of the spacing effect. In both models, spaced practice is beneficial because it reduces the decay rate.

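Equations 4.2 and 4.3 can be sketched the same way. This illustration assumes that lags are measured in hours and that every lag exceeds one hour, so the logarithm stays positive; it also assumes that with a single practice event (no lags) the decay rate falls back to the intercept. Those choices, the function names, and the parameter values are ours rather than the chapter’s.

```python
import math
from statistics import mean


def decay_rate(lags_hours, intercept: float, slope: float) -> float:
    """Equation 4.2: intercept + slope * average(1 / ln(lag_i)).

    Assumes lags are in hours and each lag exceeds 1 hour, so ln(lag) > 0.
    With no lags (a single practice event) this sketch returns the intercept.
    """
    if not lags_hours:
        return intercept
    if any(lag <= 1 for lag in lags_hours):
        raise ValueError("each lag must exceed 1 hour")
    return intercept + slope * mean(1.0 / math.log(lag) for lag in lags_hours)


def performance(activation: float, threshold: float, scalar: float) -> float:
    """Equation 4.3: logistic mapping from activation to a 0-1 performance score."""
    return 1.0 / (1.0 + math.exp((threshold - activation) / scalar))


massed = decay_rate([24, 24, 24], intercept=0.1, slope=0.5)     # daily practice
spaced = decay_rate([168, 168, 168], intercept=0.1, slope=0.5)  # weekly practice
print(f"massed decay {massed:.3f}, spaced decay {spaced:.3f}")
print(performance(0.5, threshold=0.5, scalar=0.1))  # activation at threshold -> 0.5
```

Longer lags give a smaller average of 1/ln(lag) and therefore a lower decay rate, which is how spaced practice yields more stable knowledge in this formulation.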
Evaluating the Theoretical Adequacy and Applied Potential of PPE

We are interested in the validity of PPE both as a mathematical theory and as a component of an innovative cognitive technology. To evaluate the theoretical adequacy and applied potential of PPE, we fit the model to data from ten classic studies of knowledge acquisition and retention (Walsh et al., 2018). These studies demonstrate a set of five robust phenomena related to the effects of amount of practice, elapsed time since practice, and temporal distribution of practice on learning and retention (Table 4.1). Most fundamentally, including more time between practice repetitions (i.e., spacing) slows initial learning but improves retention. Additionally, the benefits of spacing increase with the amount of time before retention is tested and with the number of practice repetitions. PPE accounts for these and the other basic phenomena revealed by studies of the spacing effect.

TABLE 4.1 Evaluation of PPE’s Theoretical Adequacy and Application Potential

| Theoretical Criteria | Applied Criteria |
|---|---|
| Role of spacing on retention | Account for effects of training variables on learning and retention |
| Relationship between retention interval and optimal spacing interval | Operate on timescales relevant to education and training |
| Increased benefit of spacing with amount of practice | Make precise predictions and valid prescriptions |
| Attenuation of spacing effect with re-learning | Applicable to a variety of tasks and performance measures |
| Superadditive learning gains from repetition | Tractable computational run time |

In addition to satisfying key theoretical criteria, PPE satisfies the set of applied criteria specified in Table 4.1. PPE accounts for the effects of multiple variables related to amount of practice, elapsed time since practice, number of training sessions, and the distribution of training sessions over time. Additionally, it accounts for learning and retention across timescales ranging from minutes to years (e.g., Bahrick, 1979). As described in the previous sections, PPE can fit existing data, predict learning and retention for different schedules, and predict future retention. The model is applicable to tasks involving both declarative knowledge and procedural skill. Declarative knowledge is fact-based, verbalizable, “know-what” knowledge, such as the fact that Dayton is the Birthplace of Aviation. Procedural skill is action-based, “know-how” knowledge, such as automatically sequencing through an instrument cross-check when that activity is highly practiced. Finally, PPE’s equations are instantiated in computationally efficient computer code. This final feature enables real-time refresher training prescriptions.
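Putting the three equations together gives a rough way to see the spacing phenomena summarized above. The sketch predicts performance 30 days after the last of four practice sessions, spaced either one day or one week apart. Several details are our assumptions, not specifications from the chapter: times are in hours; T is taken as an average of the times since each practice event, weighted toward recent events with a hypothetical exponent of 0.6; the learning rate is held at a hypothetical 0.1; and all other parameter values are invented for illustration rather than fitted to data.

```python
import math
from statistics import mean


def ppe_performance(practice_hours, test_hour, *, intercept=0.1, slope=0.5,
                    threshold=0.25, scalar=0.05, learning_rate=0.1,
                    weight_exp=0.6):
    """Sketch of a PPE-style prediction of performance (0-1) at test_hour.

    All times are in hours (an assumption). Parameter values are hypothetical;
    in PPE the four free parameters (intercept, slope, threshold, scalar)
    would be fitted to an individual's past performance.
    """
    times = sorted(practice_hours)
    ages = [test_hour - t for t in times]  # time since each practice event
    if not ages or min(ages) <= 0:
        raise ValueError("the test must come after at least one practice event")

    # Assumed form of T: mean age, weighted toward more recent events.
    weights = [age ** -weight_exp for age in ages]
    elapsed = sum(w * age for w, age in zip(weights, ages)) / sum(weights)

    # Equation 4.2: longer lags between repetitions lower the decay rate.
    lags = [later - earlier for earlier, later in zip(times, times[1:])]
    if any(lag <= 1 for lag in lags):
        raise ValueError("each lag must exceed 1 hour")
    if lags:
        decay = intercept + slope * mean(1 / math.log(lag) for lag in lags)
    else:
        decay = intercept  # single practice event: assumed fallback

    act = len(times) ** learning_rate * elapsed ** -decay   # Equation 4.1
    return 1 / (1 + math.exp((threshold - act) / scalar))  # Equation 4.3


RETENTION_HOURS = 30 * 24        # test 30 days after the last session
massed = [0, 24, 48, 72]         # four sessions one day apart
spaced = [0, 168, 336, 504]      # four sessions one week apart

p_massed = ppe_performance(massed, massed[-1] + RETENTION_HOURS)
p_spaced = ppe_performance(spaced, spaced[-1] + RETENTION_HOURS)
print(f"predicted 30-day retention: massed {p_massed:.2f}, spaced {p_spaced:.2f}")
```

With these made-up parameters the weekly schedule yields the higher predicted retention, mirroring the qualitative spacing benefit; actual magnitudes would depend entirely on parameters fitted to an individual’s data.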

Predictive Performance Optimizer (PPO)

PPE has been integrated into a patented cognitive technology called the Predictive Performance Optimizer (PPO) (Jastrzembski, Rodgers, & Gluck, 2009; Jastrzembski, Rodgers, Gluck, & Krusmark, 2013, 2014). PPO can be used to capture and dynamically assess individual and team performance, predict future performance, and prescribe schedules for future training events that enhance learning stability and maximize retention. PPO allows users to visually assess and compare the impact of potential training regimens with respect to performance effectiveness goals and projected training budgets. The goal of PPO is to provide teachers, trainers, and learners of all types with a new type of adaptive training assistance for performance tracking, prediction, and prescription. The design of PPO was motivated by the functional requirements of three different types of prospective users: researcher/analyst, training manager, and resource decision-maker. For the researcher/analyst, PPO can be used to explore and understand the mechanics of different PPE variants. For the training manager, PPO can be used to create and evaluate training regimens and to explore how the temporal dynamics of scheduling can affect the rate of learning and forgetting, performance levels, and the stability of learning. Lastly, for resource decision-makers, PPO offers an empirically grounded and psychologically plausible approach to training regimen construction and resource optimization. Overall, PPO can be used to address questions such as:

• When will a given level of effectiveness be achieved under a given training structure?
• How much training is needed to achieve or sustain a given level of effectiveness?
• When should the training be delivered?
• What practice schedule best meets training goals given resource and time constraints?
• Will a specific budget be adequate for the necessary amount of training?

PPO Application—CPR Study Example

An overarching objective in our work to create and improve PPO has been to develop a technology that can prescribe personalized training regimens, so that individuals receive training when they need it rather than according to the typical pre-planned, one-size-fits-all, fixed interval approach. To evaluate how close we are to achieving this objective, we have collaborated with the American Heart Association (AHA) and Laerdal Medical to evaluate the ability of PPO to prescribe personalized training schedules that maximize the likelihood that individual trainees will perform adequate compressions at all times (Jastrzembski et al., 2017). The collaboration involved a large multi-year, multi-site, longitudinal study to evaluate the validity of PPO for prescribing cardiopulmonary resuscitation (CPR) refresher training.

In the study, nursing students learned how to perform CPR compressions during four initial training sessions, which were spaced 1 day, 1 week, 1 month, or 3 months apart. Each session consisted of a pretest on CPR compressions, training with real-time performance feedback, and a posttest. Performance scores range from 0% to 100%, with a minimum threshold of 75% required for successful performance. Scores are calculated using a weighted combination of CPR subskills, including hand placement, compression depth, release height, and compression rate. The weighted performance score was the input data used to calibrate parameters in PPO. Following the initial training phase of the study, participants moved to the sustainment phase, in which they returned every 3 months, every 6 months, or at a PPO-prescribed retention interval, for a total of 1 year. During the sustainment phase of the study, participants in the PPO condition were prescribed the date for their next session immediately following the current session’s training.
PPO-prescribed training dates are based on forecasted performance decay trajectories and are scheduled for the date on which the person is predicted to cross below the 75% minimum proficiency threshold. The following PPO process was used to generate prescribed return dates. Following each session, PPO retrieved the performance data from that session and all previous sessions. It used those data in a calibration procedure that identified the best-fitting model parameter values (those that optimized the correspondence between the data and the model). These parameter values were then used to generate out-of-sample predictions of CPR performance into the future, from which PPO identified the date on which performance dropped to the minimum proficiency level. Each time an individual completed a new session, they were provided with a PPO schedule recommendation. Thus, following each additional session, PPO was re-calibrated to all of the available data, and a new return date was prescribed. Using this process, if an individual’s CPR performance was consistently poor and below criterion, PPO prescribed refresher training more often (the minimum interval for the study was 7 days), and as performance improved and began to consistently exceed criterion, PPO prescribed longer retention intervals (the maximum interval for the study was 180 days).

Data collection for the CPR study was completed during the writing of this chapter, and initial results are encouraging. During the 1-year sustainment phase of the study, results suggest that the risk and cost profile under PPO was better than the corresponding profiles of the fixed 3- and 6-month retention interval conditions. Here, we define risk as the estimated total time spent below the target proficiency level. This is an important measure because, in real-world medical settings, risk in the system increases to the extent that people are on the job but not actually proficient at target skills.
We define cost as the total time spent in training and assessment. This is an important measure because time is money and attention is limited. In real medical settings, any time spent in training or testing is time not spent attending to patients. Relative to either of the competing fixed calendar intervals, PPO prescribed more practice sessions when individuals needed more practice, thus reducing risk, and prescribed fewer practice sessions when individuals needed less practice, thus reducing cost. The potential value of adaptive training tools such as PPO may be best illustrated by estimating the long-term implications of such an approach. Based on PPO parameter estimates generated for each participant in the CPR study, we projected individual performance trajectories 20 years into the future under PPO-prescribed and multiple fixed-calendar schedules. The PPO-prescribed training schedules were truly adaptive, such that PPO prescribed new training sessions when predicted performance fell below the set performance threshold. The minimum PPO-prescribed interval between training sessions was 1 week, and the maximum interval was unbounded. The performance history of each individual and the resulting PPO parameter values that best fit this history create a unique learning and decay signature for each individual. These signatures predict how much performance will increase with each additional training session and how quickly performance will decay over time. So, based on an individual’s signature, if PPO predicted that performance would fall below criterion after 1 week, or if performance was previously below threshold and remained so, then PPO prescribed a new training session in 1 week and continued to prescribe additional training sessions every week until performance stabilized above criterion.
If initial performance was high and above criterion, or after performance stabilized above criterion, PPO prescribed training when it predicted that performance would decay to criterion. Because performance decay is a function of the lag between sessions, lags became longer with an increasing number of sessions as performance stabilized. For the fixed calendar schedules, PPO predicted performance at consistently spaced intervals for 20 years. Intervals ranged from every month to every 2 years. In the fixed conditions, depending on the person’s unique learning and decay signature, they may receive more or less training than would be required to maintain proficiency. Figure 4.2 shows the average overall risk and cost associated with each of the fixed and PPO-prescribed training schedules. Risk is represented as the mean number of years that participants’ performance was predicted to be below criterion. Cost is the total number of training sessions during the 20-year period. For the fixed interval schedules, the cost and risk associated with each schedule form a smooth Pareto frontier representing the trade space between risk and cost: as the number of training sessions increases, the number of years at risk decreases. The PPO schedule falls below this frontier. Over a forecasted 20-year period, by using each individual’s unique learning and decay signature, PPO schedules training so that overall risk and cost fall below the smooth frontier generated by the fixed interval schedules. To be clear, this is an average result. Some individuals would need more training and others would need less.

FIGURE 4.2 Comparison of PPO and fixed calendar cost and risk extrapolated over 20 years.

It is also an analytic extrapolation into the future, not a completed 20-year longitudinal study. Importantly, however, this analysis suggests that personalizing the scheduling of learning events with PPO provides a better enterprise-wide combination of risk reduction and cost reduction than any fixed calendar interval.
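The logic of this kind of comparison, prescribing a session when forecast performance drops below criterion and then counting time below criterion (risk) and sessions (cost), can be sketched as a small simulation. To keep it short, the sketch does not use PPE: it substitutes a hypothetical toy model in which performance decays exponentially after each session and the decay slows as sessions accumulate, with k0 and stab standing in for an individual learner’s signature. The same toy curve serves as both forecast and ground truth, so the adaptive rule here assumes a perfectly calibrated model. The 0.75 criterion and the 7-day minimum interval follow the studies described above, and the optional max_gap mirrors the CPR study’s 180-day cap; everything else is invented for illustration.

```python
import math

PEAK = 0.95        # hypothetical post-session performance
THRESHOLD = 0.75   # minimum proficiency criterion, as in the CPR study


def forecast(days_since: int, sessions: int, k0: float, stab: float) -> float:
    """Toy stand-in for a fitted model (not PPE): exponential decay whose rate
    shrinks with each completed session. k0 and stab form a hypothetical
    learner signature."""
    k = k0 / (1.0 + stab * sessions)
    return PEAK * math.exp(-k * days_since)


def simulate(interval, k0, stab, horizon_days=20 * 365, min_gap=7, max_gap=None):
    """Return (risk_days, sessions) over the horizon.

    interval=None selects the adaptive rule: train on the first day the
    forecast falls below THRESHOLD, but no sooner than min_gap days after the
    previous session (and no later than max_gap, if given). Otherwise train
    every `interval` days. Day 0 is always a training session.
    """
    sessions, last, risk_days = 0, None, 0
    for day in range(horizon_days):
        if last is None:
            train = True
        else:
            gap = day - last
            if interval is None:
                due = forecast(gap, sessions, k0, stab) < THRESHOLD
                capped = max_gap is not None and gap >= max_gap
                train = gap >= min_gap and (due or capped)
            else:
                train = gap >= interval
        if train:
            sessions, last = sessions + 1, day
        elif forecast(day - last, sessions, k0, stab) < THRESHOLD:
            risk_days += 1  # a day on the job below criterion
    return risk_days, sessions


learner = {"k0": 0.02, "stab": 0.5}
print("adaptive:", simulate(None, **learner))
for interval in (30, 180, 365, 730):
    print(f"every {interval} days:", simulate(interval, **learner))
```

With these invented values the adaptive rule ends with zero days below criterion and far fewer sessions than a monthly schedule, while sparser fixed schedules trade fewer sessions for more days at risk. The numbers say nothing about the CPR data; they only show how a risk and cost comparison of this kind can be computed.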

Emerging and Prospective Applications

One-size-fits-all, subjectively assessed, fixed-calendar approaches to training have been demonstrably inadequate across an array of domains and contexts (e.g., Madden, 2006; Woollard et al., 2006). They result in both inefficient use of resources and ineffective dosing of training experiences. They also result in an inability to detect lack of competency, a workforce that is likely unable to sustain skills between training opportunities, and individuals who may be unable to apply required skills when they are most needed. These approaches also come at the cost of potentially over-training higher performers (wasting the time and resources of those who are steadfastly competent) and under-training struggling performers (allowing those individuals to perform “at risk,” or below minimum standards). In industries where “at-risk” performance can lead to lethal outcomes, as in medicine or aircraft operations, it is critical to know if and when an individual is competent to perform a specific procedure. As such, it is beneficial to leverage state-of-the-art cognitive science and technology to provide guidance regarding the durability of individuals’ skills and to prescribe, in a principled way, when additional training ought to be delivered to ensure that competence is sustained. Through the application of PPO, we have demonstrated its utility in achieving varied and specialized training objectives more prudently, in a manner that is both efficient (savings in time and dollars) and effective (reduction in performance risk), by matching training needs to individual differences based on the dynamics of human learning and forgetting. Evidence from our research on personalized CPR training (described in the previous section) has opened the door to an array of other applications to which stakeholders have asked that PPO be applied.
Because PPO is a computational cognitive modeling tool, it requires that learning and forgetting curves be observable in performance measures within specific tasks. As such, the first step towards applying PPO to a new domain is to ensure that minimal performance measurement standards are met (e.g., quantitative, objective data that discriminate learning across time and across individuals of varying skill levels). As it stands, there are relatively few domains where this type of data already exists.

Medical Applications

Fortunately, the medical community is increasingly shifting from subjective assessment to proficiency-based criteria for specific medical tasks (e.g., Cook, Pedley, & Thakore, 2006; Stefanidis, Korndorffer, Markley, Sierra, & Scott, 2006). Given this cultural willingness to change, we have partnered with the United States Air Force School of Aerospace Medicine (USAFSAM) to help revamp performance measures for high-risk, low-volume tasks, including trauma assessment, intracranial pressure (ICP) monitoring, and advanced cardiac life support (ACLS). We have developed, and are continuing to validate, proficiency-based metrics for each of these areas and are working to investigate PPO’s applicability for helping to (1) ensure that Air Force medical care providers initially demonstrate skill proficiency in the schoolhouse and (2) sustain those skills as they move into the squadron and deployed environments, using a learning management system (LMS) infrastructure. Use of an LMS affords PPO the opportunity to track individual performance histories and automatically prescribe training refreshers as a function of predicted skill decay. To date, we have integrated PPO into both commercial and industry-specific LMSs, enabling tailored training for both hardware-based simulation training (e.g., human patient simulators [manikins]) and virtual simulation training (e.g., synthetic task environments). We are also harnessing the affordances of LMS tracking to develop a semi-autonomous trauma training simulation system, in which a higher-fidelity human patient simulator (e.g., a manikin) will be instrumented to provide critical objective performance measures for provider-centric performance effectiveness scores.
PPO may then allow medical care providers to train to proficiency in a range of specific medical tasks, under minimally supervised conditions, and receive automated prescriptions regarding when to return to sustain each specific competency. This type of approach easily lends itself to other domains where simulation environments are used to train students and embedded measurement systems have been (or can be) developed to track and predict proficiency.

Manufacturing Safety

One such emerging domain, with complex and sophisticated hardware, is manufacturing safety. This application uses instrumented platforms that provide operators with feedback on the safety of their performance. PPO’s role in these platforms is to personalize learning by reducing real-time, automated performance feedback when individuals have demonstrated consistent competency in a specific skill, and by determining which knowledge gaps need to be remediated as a function of observed performance lapses. The system may also work in the opposite direction: if knowledge gaps are observed and remediated, the operator would then be given more targeted messaging on the platform itself to detect whether skills need to be remediated as well. This domain provides a unique opportunity to integrate both knowledge-based and performance-based learning measures, and to optimally remediate issues observed in each of those respective domains.

Virtual Training

The use of virtual simulation for training holds widespread appeal because of its cost-effective sustainability and its ability to be accessed on demand, anytime, and anywhere, in a distributed fashion. The affordances of this training approach lend themselves perfectly to personalized scheduling. Given the complexity of skills required in Air Force domains and contexts, the decreased Air Force manpower resulting in increased on-the-job demands, and the fiscal and time limitations associated with live training opportunities, distributed learning using virtual simulation may be a viable way to reduce skill decay, keep skills current, and bridge the gap between classroom and live training. Through a collaboration with USAFSAM, we developed a 3-dimensional, dynamic, proficiency-based, technology-enhanced simulation to train Air Force medical professionals on the Alaris MedSystem III Infusion pump, a piece of equipment that is flight-approved, used only in aeromedical evacuation (AE) environments, and associated with high failure rates in the field (Correspondence with Air Mobility Command, 2015).

This simulation was designed to offload classroom study and practice to a virtual prerequisite, so that students’ limited classroom time can be optimally geared towards training more complex scenarios, and so that medical care providers can refresh this skill on demand, anytime, and anywhere. We have successfully integrated PPO into USAFSAM’s LMS to deliver automated, personalized prescriptions for future training to help individuals either acquire or sustain proficiency. The proof of concept we have demonstrated with regard to integrating PPO into an LMS holds wide implications for calendar-based training throughout the Air Force, the Department of Defense, and the civilian and commercial sectors more broadly. We are now working with the Air Force to overhaul Total Force Training, beginning with resiliency, sexual assault prevention and response (SAPR), and suicide prevention. These training modules are currently taught in the classroom on an annual basis and have no knowledge checks or assessments (subjective or objective). We are identifying areas of low-hanging fruit where performance may be objectively measured in a virtual environment, leveraging the power of PPO personalization in an LMS. Similar to the development of objective metrics in the medical domain, we anticipate providing learners with a sequence of varied scenarios in which knowledge and skills must be applied in a proceduralized and measurable way. Pilot work to investigate the validity of the proposed metrics is forthcoming.

Pilot Training

Finally, an emerging opportunity lies in the realm of virtual reality study for undergraduate pilots, through a collaboration with the Air Education and Training Command’s (AETC) Undergraduate Pilot Training—Next (UPT-Next) program. Students will be given virtual reality headsets, which they are expected to use to practice basic piloting techniques offline, outside of the classroom. We are working with AETC to lay a performance measurement foundation to which PPO may ultimately be applied. As it currently stands, the program has developed a data engine in a brute-force manner and has thereby amassed massive amounts of data that it is trying to make sense of. Given our success with applications in the medical domain, we have proposed using theory-driven critical path analyses to identify objective performance metrics and are beginning to work with subject-matter experts to determine how specific pilot procedures could be quantified and objectively measured so that learning may be better assessed, tracked, and predicted. Assuming that valid measures are identified, PPO will then provide principled guidance to students about when to study and practice most effectively outside of class. A similar effort is underway in collaboration with AETC and the Defense Language Institute to explore the application of PPO to linguist training.

Enduring Challenges and Prospects for Performance Prediction in Aviation Psychology

As should be apparent from the previous section on emerging applications, PPO is intentionally designed as a domain-general technology. This is not to suggest that it is without limits or that it is as easily applied to one domain as to any other, but it was not specifically created for use in any one context. Whether the domain is medicine, manufacturing, or maneuvering an aircraft, the same general cognitive technology should be applicable. Therefore, at least in theory, and probably in practice, human performance prediction is possible throughout the areas of knowledge and skill in aviation. We can only make this statement prospectively, of course, because it has not yet been tried in all areas of aviation. However, there are already some lessons learned and enduring challenges that should inform expectations and can drive requirements. We finish the chapter with a description of those.

Validated Measures

Quantitative performance predictions require quantitative performance measures, and nothing within the technology we have developed dictates what those measures should be. It is up to subject-matter experts from within the domain of interest, perhaps in collaboration with those trained in cognitive/work/mission analysis methods, to determine what to train, how to train, and what constitutes a valid measure of performance. We strongly advocate objective measures of performance, both to avoid the social pressure artifacts often seen in subjective ratings and because objective measures frequently offer a finer resolution scale and therefore greater measurement sensitivity. Canonical examples include time, distance, and accuracy, all of which offer a finer resolution measurement of performance than a 5- or 7-point Likert scale or a pass/fail assessment. To the extent that a validated, objective measure of the aviation knowledge or skill of interest exists, the probability increases that PPO can be used to make a usefully accurate prediction about future performance.

Item Sequencing within Curricula

The original development of PPO assumed a single piece of knowledge or a single skill as the focus of optimization. As various unanticipated applications emerged, however, domain experts, training managers, and other researchers challenged us to use the technology for item sequencing within curricula. This is a different goal, one that orients the calculations in PPO around a combination of the “what?” and “when?” questions to ask something like “what portion of the curriculum should I study next/now?”

Vocabulary learning in second language acquisition is an example context for this kind of question, and the general challenge is one of curriculum management. This challenge is not at all unique to PPO or to the domain of second language learning. It is a general challenge for any education or training context in which there is a desire for an adaptive curriculum. We do not yet have documented successes of this sort with PPO, but new research and development is underway, with some good progress.

Earlier Predictions

A persistent challenge in performance prediction is that at least some data must be available to calibrate the model’s free parameters before the model can be used to predict into the future. Conceptually, for each individual prediction, we separate these data into a calibration set and a prediction target. Data already in hand (the calibration set) are used to find the best-fitting parameters to account for that known performance profile, and those parameters drive an extrapolation into the future at time t. We frequently find that at least four calibration data points are required for good-quality predictions. The problem is that in many real-world domains, and especially in domains where this kind of performance prediction technology might be most valuable, data collection can be very expensive. Undergraduate students in psychology department participant pools are often free, whereas first responders, surgeons, linguists, and pilots certainly are not. This places a premium on making accurate predictions with as small a calibration set as possible. Research is underway to explore options for achieving this (Collins et al., 2017). In the meantime, good performance predictions in aviation are likely to require a minimum of four calibration data points.
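As a concrete illustration of the calibrate-then-extrapolate step, the sketch below fits a deliberately simple curve to a four-point calibration set and extrapolates it to a future time t. It is not PPO’s model: the power-function form (score = a * t**b), the function names, and the data are assumptions made purely for illustration.

```python
import math


def fit_power_law(times, scores):
    """Fit score = a * time**b by least squares on log-transformed data.

    Hypothetical illustration only: PPO's actual model is not reproduced here.
    """
    if len(times) != len(scores):
        raise ValueError("times and scores must be paired")
    if len(set(times)) < 2:
        raise ValueError("need at least two distinct calibration times")
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in scores]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope on log-log axes
    a = math.exp(mean_y - b * mean_x)   # intercept, back-transformed
    return a, b


def predict(a, b, t):
    """Extrapolate the calibrated curve to a future time t."""
    return a * t ** b


# Calibration set: four observations (the chapter's rule of thumb).
calib_times = [1, 2, 4, 8]  # e.g., days since training (hypothetical)
calib_scores = [2.0 * t ** -0.5 for t in calib_times]
a, b = fit_power_law(calib_times, calib_scores)
print(round(predict(a, b, 16), 6))
```

Because this toy curve has two free parameters, two distinct calibration times are the bare minimum for it to be fitted at all; the four-point rule of thumb above concerns prediction quality, not solvability.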

Robustness to High Schedule Variability

An often-overlooked feature of much of the scientific literature on learning, retention, and spacing effects is that the time lags between learning events, and between learning and subsequent testing, tend to be fairly uniform, at least within a particular study. Indeed, because much of that literature originates from brief laboratory studies, the time lags are also often quite short, with all of the learning and testing occurring during one or two sessions within an hour. The real world is quite different. Time lags can be very long. Training may unfold over multiple sessions. Learning and retention schedules can be highly variable. It is commonly the case, for instance, that an initial learning session lasts just a few minutes and is followed by a lag of days or weeks before the learner returns to that material for additional study or testing. Subsequent sessions may also follow lags that vary substantially. Very high temporal variability within a domain presents a challenge to quantitative prediction methods that are sensitive to timing details, such as PPO, but it may also be an affordance from the perspective of practical implementation of adaptive scheduling.

Quantifying Certainty in Predictions

Accurate predictions of future performance are necessary but not sufficient for using PPO in an applied setting. To properly assess risk, the training manager must also know the degree of certainty about those predictions (Jastrzembski, Addis, Krusmark, Gluck, & Rodgers, 2010). Research is currently underway to identify and test methods for computing prediction intervals that quantify the range of performance predictions expected at future points in time given uncertainty in the data.
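The chapter does not describe how PPO computes its prediction intervals (see Jastrzembski et al., 2010). As a generic stand-in, the sketch below uses a percentile bootstrap: it resamples the calibration set with replacement, refits a hypothetical power curve (score = a * t**b), and reads the interval off the spread of extrapolated predictions. The model, the names, and the defaults (2,000 resamples, 95% level) are assumptions, not PPO’s method.

```python
import math
import random


def fit_power_law(times, scores):
    """Least-squares fit of score = a * time**b on log-log axes; returns (a, b)."""
    if len(set(times)) < 2:
        raise ValueError("need at least two distinct calibration times")
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in scores]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return math.exp(my - b * mx), b


def bootstrap_interval(times, scores, t_future, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap interval for the extrapolated score at t_future."""
    if len(set(times)) < 2:
        raise ValueError("need at least two distinct calibration times")
    rng = random.Random(seed)
    pairs = list(zip(times, scores))
    preds = []
    while len(preds) < n_boot:
        # Resample calibration points with replacement, then refit.
        sample = [rng.choice(pairs) for _ in pairs]
        sample_times = [t for t, _ in sample]
        if len(set(sample_times)) < 2:
            continue  # degenerate resample: slope is undefined
        a, b = fit_power_law(sample_times, [s for _, s in sample])
        preds.append(a * t_future ** b)
    preds.sort()
    tail = (1 - level) / 2
    return preds[int(tail * (n_boot - 1))], preds[int((1 - tail) * (n_boot - 1))]


lo, hi = bootstrap_interval([1, 2, 4, 8], [2.1, 1.3, 1.05, 0.68], t_future=16)
print(lo, hi)
```

With only four calibration points there are just 35 distinct resamples, so an interval of this kind is coarse.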

Conclusion

In this chapter, we summarize research and describe applications for a new cognitive technology, PPO, which is being used in exploratory applications to personalize the delivery of training. PPO provides teachers, trainers, and learners with a tool they can use to make more informed decisions about when training should be delivered. PPO has been developed as a domain-general tool. It is most relevant for mission-critical, high-consequence tasks whose performance depends on knowledge and skills that are not routinely practiced or used but must be maintained. In many contexts, knowledge and skill acquisition is incremental and cumulative, so what you learn today is needed for what you do next week. When knowledge and skills are frequently practiced in this way, the value of PPO may be more limited, although we have begun to explore applications in these contexts as well. However, when knowledge and skills are not routinely practiced, our research suggests that PPO reduces both risk and cost. This is especially relevant to domains like aviation, where both the consequences of poor performance and the costs of training are so high.

Acknowledgments

Writing of this chapter was funded by the Air Force Research Laboratory. The authors thank AFRL for supporting the performance prediction research described in this chapter, and also thank the American Heart Association and Laerdal Medical for the collaborative field study validation of PPO in the domain of CPR skills. Michael Collins and Siera Martinez provided constructive feedback on an earlier version of this chapter, which we very much appreciated. We thank the editors, Pamela Tsang and Michael Vidulich, for their vision, commitment, and patience, and for the opportunity to contribute this chapter.

References

Anderson, J. R., & Schunn, C. D. (2000). Implications of the ACT-R learning theory: No magic bullets. In R. Glaser (Ed.), Advances in instructional psychology: Educational design and cognitive science (Vol. 5, pp. 1–34). Mahwah, NJ: Erlbaum.
Bahrick, H. P. (1979). Maintenance of knowledge: Questions about memory we forgot to ask. Journal of Experimental Psychology: General, 108, 296–308.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380.
Clark, R., & Wittrock, M. C. (2000). Psychological principles in training. In S. Tobias & J. D. Fletcher (Eds.), Training and retraining: A handbook for business, industry, government, and the military (pp. 51–84). New York: Macmillan.
Colegrove, C. M., Rowe, L. J., Alliger, G., Garrity, M., & Bennett, W., Jr. (2009). Defining the training mix: Sorties, sims, and distributed mission operations. Proceedings of the interservice/industry training, simulation, and education conference. Orlando, FL.
Collins, M. G., Gluck, K. A., Walsh, M. M., & Krusmark, M. A. (2017). Using prior data to inform initial performance predictions on individual students. Proceedings of the 39th annual conference of the cognitive science society. London, UK.
Cook, R., Pedley, D., & Thakore, S. (2006). A structured competency-based training programme for junior trainees in emergency medicine: The “Dundee Model.” Emergency Medicine Journal, 23, 18–22.
Gael, S. (Ed.). (1988). The job analysis handbook for business, industry, and government. Hoboken, NJ: Wiley.
Heathcote, A., Brown, S., & Mewhort, D. J. K. (2000). The power law repealed: The case for an exponential law of practice. Psychonomic Bulletin and Review, 7, 185–207.
Jastrzembski, T. S., Addis, K., Krusmark, M., Gluck, K. A., & Rodgers, S. (2010). Prediction intervals for performance prediction. Proceedings of the 10th international conference on cognitive modeling (pp. 109–114). Philadelphia, PA.
Jastrzembski, T. S., Rodgers, S., & Gluck, K. A. (2009). Improving military readiness: A state-of-the-art cognitive tool to predict performance and optimize training effectiveness. Proceedings of the interservice/industry training, simulation, and education conference (I/ITSEC) annual meetings (pp. 1498–1508). Orlando, FL: National Training Systems Association.
Jastrzembski, T., Rodgers, S., Gluck, K. A., & Krusmark, M. A. (2013). U.S. Patent No. 8568145B2. Washington, DC: U.S. Patent and Trademark Office.
Jastrzembski, T., Rodgers, S., Gluck, K. A., & Krusmark, M. A. (2014). U.S. Patent No. 8777628B2. Washington, DC: U.S. Patent and Trademark Office.
Jastrzembski, T., Walsh, M. M., Krusmark, M., Kardong-Edgren, S., Oermann, M., Dufour, K., Millwater et al. (2017). Personalizing training to acquire and sustain competence through use of a cognitive model. In D. D. Schmorrow & C. M. Fidopiastis (Eds.), Augmented cognition. Enhancing cognition and behavior in complex environments (pp. 148–161). Cham, Switzerland: Springer International Publishing AG.
Koedinger, K. R., Booth, J. L., & Klahr, D. (2013). Instructional complexity and the science to constrain it. Science, 342, 935–937.
Kyllonen, P. (2000). Training assessment. In S. Tobias & J. D. Fletcher (Eds.), Training and retraining: A handbook for business, industry, government, and the military (pp. 525–549). New York, NY: Macmillan.
Madden, C. (2006). Undergraduate nursing students’ acquisition and retention of CPR knowledge and skills. Nurse Education Today, 26, 218–227.
Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 1–55). Hillsdale, NJ: Erlbaum.
Pavlik, P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29, 559–586.
Schraagen, J. M., Chipman, S. F., & Shalin, V. L. (Eds.). (2000). Cognitive task analysis. Mahwah, NJ: Erlbaum.
Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78, 153–189.
Stefanidis, D., Korndorffer, J. R., Markley, S., Sierra, R., & Scott, D. (2006). Proficiency maintenance: Impact of ongoing simulator training on laparoscopic skill retention. Journal of the American College of Surgeons, 202, 599–603.
Tsang, P. S., & Vidulich, M. A. (Eds.). (2003). Principles and practice of aviation psychology. Mahwah, NJ: Erlbaum.
Vicente, K. (1999). Cognitive work analysis: Toward safe, productive, and healthy computer-based work. Mahwah, NJ: Erlbaum.
Walsh, M. M., Gluck, K. A., Gunzelmann, G., Jastrzembski, T., & Krusmark, M. (2018). Evaluating the theoretical adequacy and applied potential of computational models of the spacing effect. Cognitive Science, 42(S3), 644–691.
Walsh, M. M., Gluck, K. A., Gunzelmann, G., Jastrzembski, T., Krusmark, M., Myung, J. I., Pitt, M. A., & Zhou, R. (2018). Mechanisms underlying the spacing effect in learning: A comparison of three computational models. Journal of Experimental Psychology: General, 147(9), 1325–1348. doi:10.1037/xge0000416
Wilson, M. A., Bennett, W., Jr., Gibson, S. G., & Alliger, G. M. (Eds.). (2012). The handbook of work analysis: Methods, systems, applications and science of work measurement in organizations. New York, NY: Routledge.
Woollard, M., Whitfield, R., Newcombe, R., Colquhoun, M., Vetter, N., & Chamberlain, D. (2006). Optimal refresher training intervals for AED and CPR skills: A randomised controlled trial. Resuscitation, 71, 237–247.

5 Analysis of Work Dynamics for Objective Function Allocation in Manned Spaceflight Operations

Martijn IJtsma, Lanssie M. Ma, Karen M. Feigh, and Amy R. Pritchett

CONTENTS
Background: Human–Machine Function Allocation
Computational Simulation Framework
Work Models
Agent Models
Work Dynamics
Metrics
Case Study
Results
Discussion
Future Work
Conclusions
Acknowledgments
References

Future manned spaceflight missions are likely to see significant changes in concepts of operations. In particular, communication delays from Earth will be on the order of minutes, requiring a shift from operations centered on the Mission Control Center (MCC) to crew-centered operations. With this shift, the crew will need to make decisions and execute tasks autonomously, without support from the ground. The astronauts can also potentially be supported by a variety of robotic capabilities. These two factors will require novel function allocations that shift many decisions from MCC to the astronauts on the vehicle, and that shift many on-board tasks from astronauts to robotic aids.

Several challenges must be considered in function allocation for the space domain. First, the astronauts’ workload can be a critical factor. Second, as robots will be working together with the astronauts, their interaction needs to be considered carefully. Possible concerns include, among others, long idle times for astronauts as they wait for others to complete their tasks, high monitoring demands on astronauts, and excessive communication requirements to coordinate work.

The design trade-space for the range of potential function allocations in spaceflight operations is difficult to predict. Thus, there is a need for methods that can help evaluate different function allocations early in the design of potential space missions and robotic capabilities. To this end, this chapter outlines extensions to a fast-time computational simulation framework called Work Models that Compute (WMC) that has previously been applied to analyze function allocation in aviation (Pritchett, Kim, & Feigh, 2014a, 2014b). With these extensions, WMC can provide quantitative and objective evaluation of human–robot function allocations in envisioned spaceflight operations. After reviewing the literature on human–machine function allocation in aviation and space, we detail the WMC framework. Then, we apply WMC to a case study analyzing six different function allocations in an on-orbit maintenance scenario. Throughout the case study, we limit the number of tasks required simultaneously of each astronaut in order to relate different function allocation strategies to taskload saturation.

Background: Human–Machine Function Allocation

Function allocation is the process of distributing tasks over multiple agents. Feigh and Pritchett (2014) and Woods (1985) have noted that function allocation needs to consider two dimensions: the allocation of authority and the allocation of responsibility. Authority defines which agent will perform the work, whereas responsibility identifies who is accountable for its outcome. In manned missions to date (e.g., the International Space Station and the Apollo missions), most of the authority and responsibility for decision-making has been allocated to the MCC, whereas the authority and responsibility for tasks requiring hands-on manipulation in space have been allocated to the crew. There are two main reasons behind this allocation. First, the command structure is rooted in military operations, in which decision-making and responsibility are centralized (Diggelen, Bradshaw, Grant, Johnson, & Neerincx, 2009). Second, the MCC has the expertise and manpower to evaluate many decision options in detail (Watts et al., 1996).

When we travel to destinations farther from Earth, communication delays will become a critical factor, especially for tasks that require real-time communication. Lester and Thronson (2011) have shown that tasks demanding real-time human cognition and challenging perceptual-motor control can only be reliably performed at latencies under 300–400 ms, a threshold they term the “cognitive horizon.” Additionally, significant delays can complicate the natural communication between space crews and MCC and degrade the performance of such distributed teams (Fischer & Mosier, 2014).

Shifting functions from MCC to astronauts and robots onboard the spacecraft will mitigate communication delays but introduces the challenge of managing this work with limited resources. Here, one of the concerns is excessive workload for some astronauts.
Perceived workload limits the number of tasks that can be allocated to any astronaut, both in the short term (e.g., too many tasks at the same time might lead to cognitive overload) and in the long term (e.g., an astronaut may become too fatigued under sustained high workload). Similarly, robotic or automation agents might be limited in the number of tasks they can perform simultaneously due to constraints in the machine’s processing power or its physical capabilities (e.g., a robotic arm can only grab one object at a time).

One of the challenges in evaluating function allocation for the space domain is that the interplay among the availability of agents and resources, the interdependencies of tasks, and time-delayed communication results in emergent work patterns that are difficult to predict from a static analysis of function allocations alone. These emergent patterns, referred to in this chapter as work dynamics, have a significant impact on the design trade-space for function allocation. A good function allocation method, thus, does not just consider the constraints of the agents, the resources, or the work, but takes into account the interplay of these constraints.

A good function allocation method should, furthermore, recognize that function allocation is not just a matter of divvying up tasks; in a multi-agent system, distributing the work requires the agents to coordinate, which results in additional teamwork (Feigh & Pritchett, 2014). This teamwork is a result of the function allocation. For example, in an authority-responsibility double bind (Woods, 1985), the agent that is responsible for an action needs to monitor the agent that is authorized to perform the action (Bhattacharyya & Pritchett, 2017; Pritchett & Bhattacharyya, 2016). Likewise, when a human and a robotic agent are involved in joint activity through tele-operation or command sequencing, teamwork is required (Fong, Zumbado, Currie, Mishkin, & Akin, 2013).
In the space domain, earlier work on function allocation ranges from optimizing cost or reliability in human–robot teaming (Shah, Saleh, & Hoffman, 2007) to 1-G full-scale testing of different function allocations in a space assembly task (Rehnmark, Currie, Ambrose, & Culbert, 2013). Rodriguez and Weisbin (2003) developed an analytic model for task allocation that breaks down work into functional primitives, which are allocated to agents based on a trade-off between estimated performance and resource requirements. Howard (2006) describes a task allocation method that uses performance metrics and performance evaluation to determine an “optimal” allocation of tasks that minimizes mental workload and maximizes system performance.

In aviation, there is a notable body of work on pilot-automation function allocation. Some of the methods used here include the Functions Allocation Methods (FAME) framework (Bye, Hollnagel, & Brendeford, 1999) and the Playbook paradigm (Miller & Parasuraman, 2003). An important contribution of the FAME framework is that, to reassign functions fluently, models of function allocation should employ a common representation of the work, independent of the performing agent. Although, in the strict sense of the word, the Playbook paradigm is not a method for evaluating function allocations, Miller and Parasuraman contributed the idea that work can be allocated to an automation agent at different levels of task decomposition, moving away from more aggregate descriptors such as the levels-of-automation paradigm. Unfortunately, these methods fall short in considering the interplay between the constraints of the work and those of the agents, and thus do not adequately address the need to analyze emergent effects in function allocation.

More recent work has, therefore, focused on the application of computational simulation (Pritchett et al., 2014a, 2014b). The simulation framework WMC developed in these works is particularly aimed at identifying the emergent effects of work. Pritchett et al. (2014b) used WMC to analyze function allocations between a flight crew and automation. This framework was later applied to study the distribution of authority and responsibility in novel concepts of operations in the air traffic management system (Pritchett, Bhattacharyya, & IJtsma, 2016).

In this chapter, the WMC framework is extended to better capture the work dynamics and constraints that characterize future concepts of spaceflight operations, such as the limited availability of resources and agents, and requirements for coordination between agents that are spatially separated (communication delays) and have very different capabilities (astronauts versus robots).

Computational Simulation Framework

The WMC computational simulation framework was originally developed to evaluate concepts of operation in dynamic environments (Pritchett, Feigh, Kim, & Kannan, 2014). The framework consists of two types of models: work models and agent models. These models and a subset of the possible interactions between them are represented in Figure 5.1.

Work Models

In the WMC framework, work is modeled separately from the agent models, which allows for fluid reallocation of work without having to modify agent-specific code. Work models consist of resources, which represent the work environment, and actions, which describe the work’s interaction with and on the environment. Depending on the desired level of detail in the analysis, resources and actions can be fairly high-level or detailed descriptions of the environment and the work. The work model is shown as a dashed block in Figure 5.1.

FIGURE 5.1 Relationships between the two types of models. Auth. denotes authority for an action and Resp. the responsibility. PR and IR represent physical and information resources, respectively.

Resources. Resources represent the information and physical entities in the work environment. Information resources (IR), shown as pentagons in Figure 5.1, describe the state of the environment. For example, the speed of a vehicle or the location of an agent can be modeled as an information resource. Physical resources (PR), shown as the circle in Figure 5.1, represent artifacts in the environment that can be manipulated and/or used by performing work. At a lower level, a physical resource consists of information resources that specify the artifact’s characteristics. For example, a wrench can be modeled as a physical resource, which has information resources specifying its availability, location, and other attributes.

Actions. Work is modeled through three types of relationships with resources (see the arrows between actions and resources in Figure 5.1): actions can get information resources, emulating the retrieval of information from the environment, and set information resources, simulating manipulation of the environment. Actions can additionally use physical resources, which makes those resources unavailable to other actions for the duration of the usage. The duration of each action can be specified, allowing action durations to depend on which agent is performing the action.

A work model consists of taskwork actions, which are all the actions that need to be performed to fulfill the scenario’s objectives, irrespective of the function allocation. A function allocation is then defined by denoting an authorized agent and a responsible agent for each taskwork action.
Once the function allocation is specified, WMC automatically engenders, during the simulation, the teamwork actions required to coordinate the taskwork allocated to the agents (see Figure 5.2).

Teamwork required to verify another agent’s actions is a consequence of an authority-responsibility mismatch, in which one agent is authorized to perform an action but another is responsible for its outcome. When such a mismatch is detected in the function allocation, WMC automatically engenders a monitoring and/or confirmation action that is then allocated to the responsible agent. Monitoring actions occur in parallel with their parent actions, whereas confirmation actions are scheduled after the parent action and require the authorized agent to wait for confirmation before it can continue with its next action.

The desired type of verification (monitoring or confirmation) can be specified on a per-action basis within the function allocation. For example, certain critical actions can be identified that warrant confirmation from a human operator before a robot is allowed to continue. Likewise, monitoring might be the preferred verification method when the responsible agent has real-time access to the authorized agent’s state, whereas confirmation may be more appropriate when there are communication delays.

Teamwork associated with joint activity is required when one agent does not have the ability to complete an action independently. Using constructs defined in human–robot interaction (Fong et al., 2013), when a robot cannot complete an action alone, the simulation can engender a control or a command action to be executed by the responsible human agent. Control actions emulate the direct (tele-)operation of a robot, in which case the control action needs to occur in parallel with the parent action. Command actions need to be executed before their parent actions, which makes them more suitable for time-delayed communication between the controlling and executing agents.
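To make these allocation rules concrete, here is a minimal sketch of how a taskwork action’s authority and responsibility might be recorded and how verification and joint-activity teamwork could be derived from them. The class, field, and agent names (including “Robonaut-1”) are hypothetical; this is not WMC’s code, which the chapter does not show.

```python
from dataclasses import dataclass
from typing import List, Optional

# When each kind of engendered teamwork action runs relative to its parent.
TIMING = {
    "monitor": "parallel",  # verification alongside the parent action
    "confirm": "after",     # verification after the parent; performer waits
    "control": "parallel",  # direct tele-operation of a robot
    "command": "before",    # commands specified before the robot acts
}


@dataclass
class TaskworkAction:
    name: str
    authorized: str                       # agent that performs the action
    responsible: str                      # agent accountable for the outcome
    verification: str = "monitor"         # applies only on an authority-responsibility mismatch
    joint_activity: Optional[str] = None  # "control" or "command" if the performer cannot act alone


@dataclass
class TeamworkAction:
    kind: str
    parent: str
    agent: str   # teamwork is allocated to the responsible agent
    timing: str


def engender_teamwork(action: TaskworkAction) -> List[TeamworkAction]:
    """Derive the teamwork implied by one taskwork action's allocation."""
    teamwork = []
    if action.authorized != action.responsible:
        if action.verification not in ("monitor", "confirm"):
            raise ValueError(f"unknown verification type: {action.verification}")
        teamwork.append(TeamworkAction(action.verification, action.name,
                                       action.responsible, TIMING[action.verification]))
    if action.joint_activity is not None:
        if action.joint_activity not in ("control", "command"):
            raise ValueError(f"unknown joint activity: {action.joint_activity}")
        teamwork.append(TeamworkAction(action.joint_activity, action.name,
                                       action.responsible, TIMING[action.joint_activity]))
    return teamwork


if __name__ == "__main__":
    remove = TaskworkAction("remove broken panel", authorized="Robonaut-1",
                            responsible="IV", verification="confirm",
                            joint_activity="command")
    for t in engender_teamwork(remove):
        print(t.kind, t.timing, t.agent)
```

For a humanoid robot commanded and verified by the IV astronaut, the sketch yields a confirmation action after the parent and a command action before it, both allocated to IV.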

FIGURE 5.2 Taskwork and engendered teamwork actions for verification and joint activity.

Agent Models

The WMC framework can use any type of agent model that accepts calls from the simulation framework to execute actions during run-time. In this chapter, we apply a performance model with simple heuristics associated with taskload saturation and the availability of resources.

First, this performance model specifies a taskload saturation limit on the number of actions that an agent can execute simultaneously. Whenever an agent is at its taskload saturation limit, newly incoming actions are delayed until capacity is freed up. The agent model also accounts for the saturation of responsible agents: when an incoming action has associated teamwork actions that require simultaneous execution (monitoring actions for authority-responsibility mismatches and control actions for direct tele-operation), the agent model checks whether the responsible agent is available too. Thus, the agent model only executes an action when both the authorized and the responsible agent are available.

Second, when passed an action to execute, the agent model checks whether all required physical resources, such as tools, are available. If not, the agent model delays the action until those resources become available again.
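The two heuristics can be expressed as a single admission check. The sketch below is a hypothetical rendering (the names and the integer taskload limit are assumptions, not WMC’s interface); an action that fails the check would be delayed by the caller rather than dropped.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Agent:
    name: str
    taskload_limit: int  # maximum number of actions executed simultaneously
    active: int = 0      # actions currently in progress

    def has_capacity(self) -> bool:
        return self.active < self.taskload_limit


def can_execute(authorized: Agent,
                responsible: Optional[Agent],
                concurrent_teamwork: bool,
                required_resources: Iterable[str],
                resources_in_use: Set[str]) -> bool:
    """Apply the taskload and resource heuristics to one incoming action."""
    # Heuristic 1: the performing agent must be below its saturation limit...
    if not authorized.has_capacity():
        return False
    # ...and so must the responsible agent, if monitoring or control
    # has to run alongside the action.
    if concurrent_teamwork and responsible is not None and responsible is not authorized:
        if not responsible.has_capacity():
            return False
    # Heuristic 2: every required physical resource (e.g., a toolset) must be free.
    return not any(r in resources_in_use for r in required_resources)


ev = Agent("EV", taskload_limit=1)
iv = Agent("IV", taskload_limit=1, active=1)   # IV is already saturated
rms = Agent("RMS", taskload_limit=1)
print(can_execute(ev, None, False, ["toolset"], set()))   # tools free
print(can_execute(rms, iv, True, [], set()))              # IV cannot also monitor
```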

Work Dynamics

When a scenario is simulated in WMC, the simulation framework advances the simulation time by stepping through the actions in the work model in time sequence, calling the corresponding agent model to execute each action at the appropriate time. The work dynamics then follow from how actions are executed. A number of mechanisms are at play. First, sequential relationships between actions are modeled as actions scheduling their follow-up actions. Thus, when an agent executes an action, known follow-up actions (including confirmation actions) are automatically added to the simulation’s action list. Second, relationships requiring some actions to occur before another are detected by the simulation’s core, which automatically adds the earlier actions to the action list ahead of the parent action and then executes them in the correct order. Third, the agent models apply constraints due to taskload saturation and resource availability. If actions are delayed by the agent model, they are rescheduled for the earliest next feasible time. Finally, the agent models add any required teamwork actions to the action list.
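A minimal discrete-event loop illustrates the first and third mechanisms: follow-up actions are queued when their predecessor finishes, and an action whose agent is at its taskload limit is rescheduled for the earliest time capacity frees up. Prerequisite detection, resource checks, and teamwork are omitted, and all names are hypothetical; this is a sketch, not WMC’s scheduler.

```python
import heapq
import itertools


def simulate(initial, follow_ups, agent_of, duration, taskload_limit):
    """Step through actions in time order, delaying actions of saturated agents.

    initial:        list of (start_time, action_name) scheduled at the outset
    follow_ups:     action_name -> follow-up actions scheduled when it finishes
    agent_of:       action_name -> agent authorized to perform it
    duration:       action_name -> duration
    taskload_limit: agent -> maximum simultaneous actions (assumed >= 1)
    Returns a trace of (action_name, agent, start, end) tuples.
    """
    order = itertools.count()  # tie-breaker: earlier-queued entries pop first
    queue = [(t, next(order), name) for t, name in initial]
    heapq.heapify(queue)
    in_progress = {agent: [] for agent in taskload_limit}  # end times per agent
    trace = []
    while queue:
        now, _, name = heapq.heappop(queue)
        agent = agent_of[name]
        # Drop actions that have finished by the current time.
        in_progress[agent] = [end for end in in_progress[agent] if end > now]
        if len(in_progress[agent]) >= taskload_limit[agent]:
            # Saturated: reschedule for the earliest time capacity frees up.
            heapq.heappush(queue, (min(in_progress[agent]), next(order), name))
            continue
        end = now + duration[name]
        in_progress[agent].append(end)
        trace.append((name, agent, now, end))
        for nxt in follow_ups.get(name, []):
            heapq.heappush(queue, (end, next(order), nxt))
    return trace


if __name__ == "__main__":
    for row in simulate(
        initial=[(0, "inspect A"), (0, "inspect B")],
        follow_ups={"inspect A": ["store tools"]},
        agent_of={"inspect A": "EV", "inspect B": "EV", "store tools": "EV"},
        duration={"inspect A": 5, "inspect B": 3, "store tools": 1},
        taskload_limit={"EV": 1},
    ):
        print(row)
```

With a limit of 1 for the EV astronaut, “inspect B” is delayed until time 6, after “inspect A” and its follow-up have finished; raising the limit to 2 lets both inspections start at time 0.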

Metrics

The default output of WMC is a detailed trace of the execution of the actions, specifying when each action is performed and by whom. From these traces, WMC can compute a range of function allocation metrics. These metrics include, among others, mission duration, idle time, and total time on task, which can provide insight into the efficiency of work and into how long agents are occupied and thus unavailable for other missions. Taskload measures quantify how busy agents are, both in aggregate and through time. When the agent models do not impose a taskload saturation limit, such measures can identify when the default taskload imposed on any agent might create excessive workload.

Additionally, any instance in which one agent gets an information resource that has previously been set by another agent is logged, indicating information transfer between the agents. Likewise, instances in which an agent requires a physical resource that has previously been used by another agent are logged as transfers of physical resources.
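Several of these metrics can be computed directly from an execution trace. The sketch assumes a hypothetical trace format of (action, agent, start, end) tuples and a time-ordered log of (agent, op, resource) events; neither format is taken from WMC.

```python
from collections import defaultdict


def mission_duration(trace):
    """Time from the first action start to the last action end."""
    return (max(end for _, _, _, end in trace)
            - min(start for _, _, start, _ in trace))


def time_on_task(trace):
    """Total action time per agent (concurrent actions counted separately)."""
    totals = defaultdict(float)
    for _, agent, start, end in trace:
        totals[agent] += end - start
    return dict(totals)


def idle_time(trace):
    """Mission duration minus the time each agent had any action in progress.

    Only agents that appear in the trace are reported.
    """
    total = mission_duration(trace)
    spans = defaultdict(list)
    for _, agent, start, end in trace:
        spans[agent].append((start, end))
    idle = {}
    for agent, intervals in spans.items():
        intervals.sort()
        busy = 0.0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:  # gap: close the current busy block
                busy += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                # overlapping or contiguous: extend it
                cur_end = max(cur_end, end)
        busy += cur_end - cur_start
        idle[agent] = total - busy
    return idle


def information_transfers(events):
    """Log (setter, getter, resource) whenever an agent gets a resource last set by another."""
    last_setter = {}
    transfers = []
    for agent, op, resource in events:
        if op == "set":
            last_setter[resource] = agent
        elif op == "get" and resource in last_setter and last_setter[resource] != agent:
            transfers.append((last_setter[resource], agent, resource))
    return transfers


trace = [("a", "EV", 0, 5), ("b", "EV", 5, 6), ("c", "IV", 2, 4), ("d", "IV", 3, 7)]
print(mission_duration(trace), time_on_task(trace), idle_time(trace))
```

Idle time here is the mission duration minus the union of an agent’s action intervals, so overlapping actions are not double-counted, whereas time on task sums durations and does count concurrent actions separately.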

Case Study

To demonstrate the methodology, we analyze several function allocations for an on-orbit maintenance mission involving several robots and human astronauts. The analysis presented in this case study would be typical of an early exploration of the function allocation trade-space. Thus, we start with simple work and agent models. Later in the design process, when the concept of operation is crystallizing, higher fidelity models can be tested in WMC to provide a more detailed evaluation.

This scenario focuses on three panels on the outside of the spacecraft that need to be inspected and possibly replaced. A simple hierarchical task analysis (HTA) was performed to decompose the work into functions and actions, as seen in Figure 5.3. These functions and actions were subsequently modeled in a WMC work model. Other work analysis methods can be used if they are deemed a better fit with the purpose of the analysis or the scenario of interest.

In the HTA, we made several assumptions with regard to temporal and resource constraints within this scenario. Inspection is assumed to be a linear sequence of obtaining the required tools, applying these tools, and subsequently storing them away, as shown in Figure 5.4. Each action in this sequence requires the use of a toolset, which will be unavailable to other actions at that time. Each panel is assumed to be in a different location, and thus traversal actions are required to move the inspecting agent from one panel to the next. This also means that inspection and repair actions for two different panels cannot be performed by the same agent simultaneously.

Replacement of a broken panel entails obtaining the required tools, removing and disposing of the broken panel, retrieving and emplacing a new panel from the inventory, and finally storing away the tools. The precedence relationships between these actions result in two sequences that are connected to each other, as seen in Figure 5.5.
The first sequence consists of getting the repair tools, removing the broken panel, and disposing of it (4.1 – 4.3 – 4.5). The second sequence consists of getting a new panel, emplacing it, and storing away the tools (4.2 – 4.4 – 4.6). The sequences are interconnected because emplacement of a new panel can only start once the broken panel has been removed. Furthermore, the actions 4.1, 4.3, 4.4, and 4.6 all require a toolset that cannot be used simultaneously by multiple actions.

FIGURE 5.3 Hierarchical task analysis for the on-orbit maintenance scenario.

FIGURE 5.4 Precedence relationships for inspection actions.

FIGURE 5.5 Precedence relationships for repair actions.

The actions are modeled in WMC, together with physical resources representing the toolsets, the panels to be inspected, and new panels for replacement. Information resources include characteristics of the physical resources, for example, the condition of a panel and whether it is screwed on or not. There are six agents to which work can be allocated: an extra-vehicular astronaut (EV), an intra-vehicular astronaut (IV), a Remote Manipulator System (RMS) (similar to the Canadarm), two humanoid robots (e.g., Robonaut), and the Mission Control Center (MCC). Based on reasonably assumed capabilities of each agent, we defined six candidate function allocations, as shown in Table 5.1. FA1 and FA2 are reasonably attainable with current capabilities, whereas FA3 to FA5 assume more futuristic capabilities of the robots. FA4-A and FA4-B differ in the allocation of responsibility for the robotic operations: in FA4-A, the IV astronaut is responsible, and in FA4-B, the MCC is responsible. A communication delay of 10 seconds is assumed between the MCC and the maintenance site, which could be representative of a spacecraft early in a transfer orbit to Mars. Differences in capabilities between the astronauts and the robots require a human (EV, IV, or MCC) to be involved with all taskwork performed by robots. In modeling this form of teamwork, we assumed the RMS can be controlled by a human operator through direct tele-operation. The humanoid robots instead require commands to be specified prior to starting each action. Thus, WMC will generate control actions for the RMS's operation and command actions for the humanoid operations. All robots require a human to verify each of their actions.
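The repair precedence relationships and the shared toolset described above can be encoded compactly. The Python sketch below, which uses the standard-library `graphlib` module (Python 3.9 or later), is a hypothetical encoding rather than WMC's input format. It groups the repair actions into "waves" whose predecessors are all complete, ignoring durations and agents, and flags any wave in which more than one action would need the single toolset.

```python
from graphlib import TopologicalSorter

# Predecessors of each repair action, as described in the text (Figure 5.5).
REPAIR_PRECEDENCE = {
    "4.1 Get repair tools": set(),
    "4.2 Get new panel": set(),
    "4.3 Remove broken panel": {"4.1 Get repair tools"},
    "4.4 Emplace new panel": {"4.2 Get new panel", "4.3 Remove broken panel"},
    "4.5 Dispose of broken panel": {"4.3 Remove broken panel"},
    "4.6 Store repair tools": {"4.4 Emplace new panel"},
}
# Actions that need the single toolset, which only one action can use at a time.
TOOLSET_USERS = {"4.1 Get repair tools", "4.3 Remove broken panel",
                 "4.4 Emplace new panel", "4.6 Store repair tools"}


def parallel_waves(precedence):
    """Group actions into waves whose predecessors are all complete (durations ignored)."""
    sorter = TopologicalSorter(precedence)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in parallel_waves(REPAIR_PRECEDENCE):
    toolset_demand = sum(action in TOOLSET_USERS for action in wave)
    print(wave, "toolset conflict" if toolset_demand > 1 else "ok")
```

In this encoding, getting a new panel (4.2) can proceed alongside the toolset-bound actions, and disposal (4.5) can proceed alongside emplacement (4.4), without competing for the toolset. This suggests why allocating panel handling to a separate agent can shorten the mission.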
In this case study, we assumed that real-time monitoring is the preferred verification method, unless there is a time delay (for FA4-B), in which case the MCC needs to provide posterior confirmation before the robots are allowed to start their next action. We furthermore assumed that humans are responsible for their own actions and do not need to be monitored by other agents. Some of the above assumptions can be revisited and evaluated for their validity in a more detailed evaluation. For example, a more detailed analysis can take into account how astronauts need to monitor or confirm each other's actions, to simulate the cross-checking and redundancy built into actual operations. Possible concerns with taskload saturation can be analyzed in two ways. First, we can set a taskload saturation limit for each agent and analyze the resulting work dynamics of each candidate function allocation. In such test cases, the work dynamics are governed by interactions between the limits of the agent model and the demands of the work model. Second, without a taskload limit in the agent models, the simulation results reflect the inherent taskload given to the agents, identifying possible workload spikes and drops caused by the taskload inherently imposed on each agent.
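The effect of a taskload saturation limit can be illustrated with a small discrete-event sketch. This is not WMC's simulation engine. It assumes that every action starts as soon as its predecessors are finished and its agent has spare capacity, with ties broken alphabetically. It models teamwork as a hypothetical 10 s command from the IV astronaut before, and a 10 s posterior confirmation after, each 60 s robot action. The work model and all durations are invented.

```python
import heapq


def simulate(actions, limit=None):
    """Greedy discrete-event simulation of a work model.

    actions: dict mapping action name -> (agent, duration_s, set of predecessor names)
    limit:   maximum number of concurrent actions per agent (None = no limit)
    Returns (start, end) dicts mapping action name -> time in seconds.
    """
    start, end = {}, {}
    running = []  # min-heap of (finish_time, action_name)
    busy = {}     # agent -> number of actions currently in progress
    now = 0.0
    while len(end) < len(actions):
        # Start every action whose predecessors are done and whose agent has capacity.
        for name in sorted(actions):
            agent, duration, preds = actions[name]
            ready = name not in start and all(p in end for p in preds)
            if ready and (limit is None or busy.get(agent, 0) < limit):
                start[name] = now
                busy[agent] = busy.get(agent, 0) + 1
                heapq.heappush(running, (now + duration, name))
        if not running:
            raise ValueError("no action can start: check the precedence relationships")
        # Advance the clock to the next completion and release every action ending then.
        now = running[0][0]
        while running and running[0][0] == now:
            _, name = heapq.heappop(running)
            end[name] = now
            busy[actions[name][0]] -= 1
    return start, end


def two_robot_work_model():
    """Hypothetical work: two robots doing two 60 s actions each, supervised by the IV."""
    actions = {}
    for robot in ("A", "B"):
        prev_conf = set()
        for i in (1, 2):
            cmd, act, conf = f"cmd{robot}{i}", f"{robot}{i}", f"conf{robot}{i}"
            actions[cmd] = ("IV", 10, prev_conf)           # IV commands the robot
            actions[act] = (f"Robot{robot}", 60, {cmd})    # robot taskwork
            actions[conf] = ("IV", 10, {act})              # IV confirms completion
            prev_conf = {conf}
    return actions


work = two_robot_work_model()
for limit in (1, 2, None):
    _, end = simulate(work, limit)
    print(f"taskload limit {limit}: mission duration {max(end.values()):.0f} s")
```

Under these assumptions, a limit of one makes the IV astronaut the bottleneck and lengthens the mission, whereas a limit of two already lets the IV astronaut serve both robots and gives the same duration as no limit.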
For this case study, the agents are simulated with three different taskload saturation levels: the agents are capable of performing one (taskload limit = 1), two (taskload limit = 2), or infinitely many (no taskload limit) actions at a time. Thus, eighteen conditions were run in WMC: six function allocation candidates with three taskload saturation levels.

TABLE 5.1 Candidate Function Allocations for the On-Orbit Maintenance Scenario. The Function Allocations Are Described in the Form <Authorized Agent/Responsible Agent>. EV and IV Are the Extra-Vehicular and Intra-Vehicular Astronauts, Respectively; RMS Is the Remote Manipulator System; Hum1 and Hum2 Denote the Two Humanoid Robots; and MCC Is the Mission Control Center.

                              Current Capabilities  Future Capabilities
Actions                       FA1        FA2        FA3        FA4-A      FA4-B      FA5
1.1 Prepare                   EV/EV      EV/EV      EV/EV      Hum1/IV    Hum1/MCC   Hum1/IV
1.2 Leave dock                EV/EV      EV/EV      EV/EV      Hum1/IV    Hum1/MCC   Hum1/IV
2.1 Traverse                  EV/EV      EV/EV      EV/EV      Hum1/IV    Hum1/MCC   Hum1/IV
3.1 Get inspection tools      EV/EV      EV/EV      Hum1/IV    Hum1/IV    Hum1/MCC   Hum2/IV
3.2 Apply inspection tools    EV/EV      EV/EV      EV/EV      Hum1/IV    Hum1/MCC   Hum1/IV
3.3 Store inspection tools    EV/EV      EV/EV      Hum1/IV    Hum1/IV    Hum1/MCC   Hum2/IV
4.1 Get repair tools          EV/EV      EV/EV      Hum1/IV    Hum2/IV    Hum2/MCC   Hum1/IV
4.2 Get new panel             EV/EV      RMS/IV     RMS/IV     RMS/IV     RMS/MCC    Hum2/IV
4.3 Remove broken panel       EV/EV      EV/EV      EV/EV      Hum2/IV    Hum2/MCC   Hum2/IV
4.4 Emplace new panel         EV/EV      EV/EV      EV/EV      Hum2/IV    Hum2/MCC   Hum1/IV
4.5 Dispose of broken panel   EV/EV      RMS/IV     RMS/IV     RMS/IV     RMS/MCC    Hum1/IV
4.6 Store repair tools        EV/EV      EV/EV      Hum1/IV    Hum2/IV    Hum2/MCC   Hum2/IV
5.1 Enter dock                EV/EV      EV/EV      EV/EV      Hum1/IV    Hum1/MCC   Hum1/IV

Results Figure 5.6 shows time traces for each agent under function allocations FA1–FA3 with a taskload saturation limit of one. Figure 5.6a can be considered the baseline condition, as it consists of a linear sequence of all the actions conducted by a single agent. Figure 5.6b shows that the total mission duration can be decreased by allocating the handling of the panels to the RMS robot, mainly because the panel handling can then occur simultaneously with the removal and emplacement of the panels. This reduction in mission time comes at the cost of involving the IV astronaut in the mission, who now has to supervise the RMS through control and monitoring actions. However, because the IV astronaut can only perform one action at a time in this test condition, the monitoring actions are delayed until the robot's action and the associated control action have been completed. Introducing a second robot to take over the handling of tools from the EV astronaut does not further decrease the mission duration; on the contrary, as shown in Figure 5.6c, the mission duration increases drastically, even compared to the baseline function allocation in which the EV astronaut executes all the work. This can mainly be attributed to the IV astronaut being overloaded with the monitoring, command, and control actions needed to supervise two robots simultaneously. These actions, and their follow-on actions, are continuously pushed back, resulting in long idle times as the robots wait for input from the IV astronaut. With this function allocation, the IV astronaut is clearly the bottleneck. Furthermore, because the actions performed by the EV astronaut require resources (the toolsets) that are also required for the Humanoid's actions, there is no added benefit in delegating the tool handling actions to the Humanoid; the EV astronaut and the Humanoid are often waiting on each other, again resulting in long idle times. This delegation also significantly increases the number of required resource exchanges, during which the EV astronaut and the Humanoid need to physically interact to hand over the toolsets.

FIGURE 5.6 Time traces for three function allocations, each with a taskload limit of one. Subfigure (a) shows FA1 with the EV astronaut performing all of the work, (b) shows FA2 with the RMS handling panels, and (c) shows FA3 with the RMS handling panels and Humanoid I managing tools.

To further demonstrate the effect of taskload saturation limits, Figure 5.7 shows the results for function allocation FA4-A at the three taskload saturation levels. Comparing FA4-A (Figure 5.7a) with FA1–FA3 (Figure 5.6) under the same taskload limit shows that the mission duration increases further, because the third robot introduced in FA4 (Humanoid II) also needs to be commanded and monitored by the IV astronaut. If the agents can perform two actions simultaneously (taskload limit of two, Figure 5.7b), the work patterns change significantly, because the IV astronaut can monitor one robot while commanding another. Only when the work of the three robots lines up, which happens at about 2100 seconds, does Humanoid II idle because the IV astronaut is unavailable. Figure 5.7c shows the work patterns with no taskload saturation limit, where the work dynamics result purely from the work model. The robots still occasionally idle while the IV astronaut is providing commands. When the work of the robotic agents lines up, for example around 1450 and 1800 seconds, when both humanoid robots are executing actions that need to be confirmed while the IV astronaut also needs to control the RMS, large peaks in the IV astronaut's taskload can be observed. Furthermore, because the IV astronaut's taskload is no longer a bottleneck, Humanoid I can now also perform actions simultaneously (previously, the IV astronaut could not provide commands in time). The other two robots, however, each perform only one action at a time, even though no limit is imposed on their taskload. This is because these function allocations assign parallel actions to different agents, allowing simultaneous work that reduces the mission duration.

However, parallel activity is not always possible. FA5 illustrates the allocation of parallel actions to the same agent, as seen in Figure 5.8. The two Humanoid robots are now indeed executing multiple actions at once. Still, because of the sequencing requirements of the repair actions, one Humanoid needs to wait on the other, and we could just as well distribute the work over the two robots instead of one.

FIGURE 5.7 Time traces of FA4-A with three different taskload saturation limits. Subfigures show results for (a) a taskload saturation limit of one, (b) a taskload saturation limit of two, and (c) no taskload limit.

FIGURE 5.8 Time trace of FA5 with no taskload saturation limit.

Figure 5.9 summarizes three time-related metrics for all test conditions: the mission duration, the total time on task, and, as a subpart of the total time on task, the idle time. As would be expected, higher assumed taskload saturation levels result in reduced mission durations, as agents can do more actions simultaneously. This goes hand in hand with a reduction in idle time and total time on task. As more actions are allocated to robots (from left to right in the figures), the increased teamwork and the interdependencies between this teamwork and the robots' taskwork result in significant increases in mission duration, idle time, and total time on task. A visual example of how these work dynamics increase these metrics is shown in Figure 5.10, which is an adapted version of Figure 5.5 in which command and confirmation actions have been added in sequence. In addition to the fact that more work now needs to be performed, the teamwork does not allow for simultaneous execution of the existing taskwork.

FIGURE 5.9 Overview of (a) mission duration and (b) total time-on-task, by function allocation and for all agents combined.

FIGURE 5.10 Example of the effect of teamwork actions on the precedence relationships between repair actions.

Here, it is worthwhile to introduce a concept from the literature on operations research and project management that is applicable in this context: the critical path. The critical path is defined as the sequence of actions that determines the minimum duration of an operation (Armstrong-Wright, 1969). If any action on the critical path is delayed, the mission duration is extended by the duration of that delay. Thus, depending on the critical path in the work, the sequencing of teamwork and taskwork can affect the mission duration even without taskload saturation. The effects of sequencing in the human–robot teamwork, however, become more pronounced as we impose stricter taskload saturation limits: the absolute differences in mission duration and idle time between the conditions with taskload limits of one and two grow as we move from the function allocations that assign more work to a single EV astronaut to those that assign more work to robots. From analysis of the time traces, this interaction between allocating more actions to robots and the taskload saturation limits can be attributed to the responsible agent becoming the bottleneck. Thus, the combination of more teamwork and stricter taskload saturation limits can have a profound effect on the time efficiency of the operations. Figure 5.11 shows the total number of information requirements and physical resource transfers for the various conditions.
Because these metrics depend on which agent last manipulated an information or physical resource, different taskload saturation levels have very little impact on them as long as the general sequence of actions does not change. The only difference between the taskload limits occurs for FA4-B, which can be explained by the fact that, when the taskload saturation limit is one, the first repair is delayed to a point where it interleaves with the second repair. Thus, whereas a panel is otherwise exchanged between the agents between the two repairs, this exchange is no longer necessary when the two repairs are interleaved.
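Because these exchange metrics depend only on which agent last handled a resource, a rough indication of the toolset handovers can be read directly from Table 5.1. The Python sketch below counts, for a single inspection and a single repair, how often consecutive uses of the same toolset are performed by different agents. It is a simplification, not how WMC logs transfers: it ignores panel handling, handovers between panels, and timing, so it does not reproduce the counts in Figure 5.11.

```python
# Authorized agent for each toolset-using action, transcribed from Table 5.1.
ALL_EV = {"3.1": "EV", "3.2": "EV", "3.3": "EV",
          "4.1": "EV", "4.3": "EV", "4.4": "EV", "4.6": "EV"}
FA4_ROBOTS = {"3.1": "Hum1", "3.2": "Hum1", "3.3": "Hum1",
              "4.1": "Hum2", "4.3": "Hum2", "4.4": "Hum2", "4.6": "Hum2"}
TOOLSET_ALLOCATIONS = {
    "FA1": ALL_EV,
    "FA2": ALL_EV,  # FA2 differs from FA1 only in panel handling (4.2, 4.5)
    "FA3": {"3.1": "Hum1", "3.2": "EV", "3.3": "Hum1",
            "4.1": "Hum1", "4.3": "EV", "4.4": "EV", "4.6": "Hum1"},
    "FA4-A": FA4_ROBOTS,
    "FA4-B": FA4_ROBOTS,  # same performing agents; only responsibility differs
    "FA5": {"3.1": "Hum2", "3.2": "Hum1", "3.3": "Hum2",
            "4.1": "Hum1", "4.3": "Hum2", "4.4": "Hum1", "4.6": "Hum2"},
}
# Order in which each toolset is used, following the precedence relationships.
INSPECTION_SEQUENCE = ["3.1", "3.2", "3.3"]
REPAIR_SEQUENCE = ["4.1", "4.3", "4.4", "4.6"]


def handovers(allocation, sequence):
    """Count changes of agent between consecutive uses of the same toolset."""
    users = [allocation[action] for action in sequence]
    return sum(prev != nxt for prev, nxt in zip(users, users[1:]))


for fa, allocation in TOOLSET_ALLOCATIONS.items():
    print(f"{fa:6} inspection: {handovers(allocation, INSPECTION_SEQUENCE)}  "
          f"repair: {handovers(allocation, REPAIR_SEQUENCE)}")
```

Under this simplification, only FA3 and FA5 require toolset handovers, because they split the actions that share a toolset between agents.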

FIGURE 5.11 Number of (a) information resource exchanges and (b) physical resource requirements for the function allocation candidates.

Comparing across the function allocations, the differences in information requirements can be explained by the increased number of teamwork actions that require information exchange, both for the input that the human gives to the robot and for the state information that the robot provides to the human for monitoring and confirmation. The number of physical resource exchanges is high for FA3 and FA5 because, in both function allocations, actions sharing the same resource have been divided between agents, in particular the inspection actions.
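The critical path concept introduced above can be made concrete for the repair actions of Figure 5.5. In the following Python sketch, the durations are hypothetical, and agents, teamwork actions, and taskload limits are ignored. The sketch computes each action's earliest finish time in topological order and then traces back the chain of latest-finishing predecessors.

```python
from graphlib import TopologicalSorter

# Hypothetical nominal durations (s) for the repair actions; not taken from the case study.
DURATIONS = {"4.1": 60, "4.2": 120, "4.3": 180, "4.4": 180, "4.5": 90, "4.6": 60}
# Predecessors of each repair action, following the precedence relationships of Figure 5.5.
PRECEDENCE = {"4.1": set(), "4.2": set(), "4.3": {"4.1"},
              "4.4": {"4.2", "4.3"}, "4.5": {"4.3"}, "4.6": {"4.4"}}


def critical_path(durations, precedence):
    """Return (minimum total duration, critical path), ignoring agents and taskload limits."""
    finish, via = {}, {}
    for action in TopologicalSorter(precedence).static_order():
        preds = sorted(precedence.get(action, ()))
        # The latest-finishing predecessor determines when this action can start.
        via[action] = max(preds, key=finish.__getitem__, default=None)
        start = 0 if via[action] is None else finish[via[action]]
        finish[action] = start + durations[action]
    # Walk back from the last action to finish along the latest-finishing predecessors.
    node, path = max(finish, key=finish.__getitem__), []
    while node is not None:
        path.append(node)
        node = via[node]
    return max(finish.values()), path[::-1]


print(critical_path(DURATIONS, PRECEDENCE))
# Delay removal (4.3, on the critical path) by 30 s, or disposal (4.5, off it) by 100 s.
late_removal = {**DURATIONS, "4.3": DURATIONS["4.3"] + 30}
late_disposal = {**DURATIONS, "4.5": DURATIONS["4.5"] + 100}
print(critical_path(late_removal, PRECEDENCE)[0], critical_path(late_disposal, PRECEDENCE)[0])
```

With these durations, delaying removal of the broken panel (4.3), which lies on the critical path, extends the mission by exactly that delay. Disposal (4.5), by contrast, has 150 s of slack and can be delayed by up to that amount without affecting the mission duration.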

Discussion The case study described here is for demonstration purposes, but the general principles can be followed to perform larger analyses of similar scenarios and to arrive at an informed trade-off as to which function allocation is most suitable. How the different metrics should be weighed against each other is ultimately up to the designer. Similar scenarios that can be analyzed using WMC include work environments in which the timing of actions is governed within the work itself (e.g., by temporal and resource constraints) or by an outside dynamical process (see, e.g., Pritchett et al., 2016, for a case study in which airplane dynamics determine the timing of actions). The case study shows that the likely taskload saturation limits of agents are an important factor to take into account, as they significantly affect the emergent work dynamics that govern the effectiveness of a function allocation. The taskload saturation limit was particularly limiting for human operators monitoring multiple robots at the same time. The excessive monitoring loads here led to robots having to wait for the IV astronaut to become available to provide commands. From this analysis, one might thus conclude that a second human agent may be needed to bear some of the monitoring load when more than one robot is operating outside the spacecraft. The work dynamics were largely governed by the required human–robot teamwork. This shows the importance of carefully considering the desired methods for verification and control of each robot. Other forms of human–robot interaction can also be examined. For example, how would the metrics improve if the human operator could provide commands to the robot proactively instead of reactively? What would be the effect of allowing the robot to continue to its next action without confirmation for certain less safety-critical actions?
Beyond the initial evaluation provided here, the simulation framework can be applied throughout the process of designing function allocations. Identifying the emergent patterns and steering design decisions based on a good understanding of the implications of different function allocations will ease the design process in later stages. In such an iterative evaluation, we believe there is a benefit in first modeling coarser tasks and simulating them in WMC to gain fundamental insights early. These insights, and the WMC models, can then be elaborated upon in coordination with human-in-the-loop studies and with the development of robotic capabilities.

Future Work In the future, development of our methodology will include further work on the front and back ends of our computational simulation framework, including more formal modeling of the input to WMC as well as standardized post-processing methods for the output. With a more streamlined process from input to output, we can start performing more multi-dimensional analyses and systematically explore many different function allocations. Planned internal extensions to the WMC framework include the simulation of the dynamics of taskwork and teamwork in greater detail and at higher levels of fidelity, supporting more detailed evaluations of function allocations and of the procedures and mechanisms supporting them. We will continue to consider new forms of teamwork between robots and humans that are of interest to function allocation designers, focusing particularly on the differences in capabilities of robots. A second avenue of research is to apply the WMC framework to study dynamic function allocation. To be useful, a function allocation should be flexible enough to be changed during operation if circumstances require. In addition, there is a large body of research looking into “smarter” systems that change the function allocation (particularly between human and machine) based on real-time measurement of relevant metrics. The WMC framework can be used to evaluate the effects of protocols associated with reallocating or delegating tasks between agents. One way of simulating this adaptability is through the modeling of decisions that reconfigure the function allocation or other high-level parameters of how the work is performed (Pritchett et al., 2014b). Finally, a potentially useful addition to the framework is the inclusion of stochasticity in the models of the actions and agents.
Randomness in some of the model parameters (for example, the action durations) can provide insight into how robust the observed metric values are to predictable variance in the work and the work environment, and into where the function allocation is fragile and improvements can be made.
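Such an analysis could take the form of a Monte Carlo simulation. The sketch below illustrates the idea under several assumptions; it is not a planned WMC feature. It uses hypothetical repair durations and the repair precedence relationships of Figure 5.5, draws each duration from a triangular distribution within ±20% of its nominal value, and reports the spread of the resulting mission durations. Agents and taskload limits are ignored.

```python
import random
import statistics
from graphlib import TopologicalSorter

# Hypothetical nominal durations (s) and the repair precedence relationships of Figure 5.5.
NOMINAL = {"4.1": 60, "4.2": 120, "4.3": 180, "4.4": 180, "4.5": 90, "4.6": 60}
PRECEDENCE = {"4.1": set(), "4.2": set(), "4.3": {"4.1"},
              "4.4": {"4.2", "4.3"}, "4.5": {"4.3"}, "4.6": {"4.4"}}
ORDER = list(TopologicalSorter(PRECEDENCE).static_order())


def mission_duration(durations):
    """Earliest completion time of all actions (longest path), ignoring agents and taskload."""
    finish = {}
    for action in ORDER:
        start = max((finish[p] for p in PRECEDENCE[action]), default=0)
        finish[action] = start + durations[action]
    return max(finish.values())


def monte_carlo(runs=10_000, spread=0.2, seed=1):
    """Sample every duration from a triangular distribution within +/- spread of nominal."""
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        sampled = {a: rng.triangular(d * (1 - spread), d * (1 + spread), d)
                   for a, d in NOMINAL.items()}
        samples.append(mission_duration(sampled))
    return samples


samples = monte_carlo()
cuts = statistics.quantiles(samples, n=20)  # cut points at 5%, 10%, ..., 95%
print(f"nominal: {mission_duration(NOMINAL)} s, mean: {statistics.mean(samples):.0f} s, "
      f"90% interval: {cuts[0]:.0f}-{cuts[-1]:.0f} s")
```

A wide interval, or one that sits well above the nominal value, would point to a function allocation whose performance is sensitive to variability in the work.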

Conclusions This chapter described a method for evaluating function allocation options for future manned spaceflight. We argue that the benefit of using computational simulation is its ability to identify emergent work dynamics that might go unnoticed by static analysis methods. The case study demonstrated the application of our approach to an on-orbit maintenance scenario. The results illustrated the impact of different taskload saturation levels on the emergent work patterns and on the performance of the function allocations. As a concluding remark, beyond the problem of function allocation, there are benefits to applying computational simulation in a broader sense. Computational simulation allows fast and broad exploration of the design trade-space of complex, multi-dimensional operations involving humans, a trade-space that would otherwise be impracticable or infeasible to explore fully through human-subject studies, given their time-consuming and costly nature. Insight from simulation, especially if obtained in earlier design stages, can help guide human-in-the-loop experiments. In an iterative loop, the results obtained from these experiments can, in turn, be used to improve the fidelity of the computational models.

Acknowledgments This work is sponsored by the NASA Human Research Program with Jessica Marquez serving as Technical Monitor under grant number NNJ15ZSA001N. The authors would also like to thank the other WMC developers.

References
Armstrong-Wright, A. T. (1969). Critical path method: Introduction and practice. Upper Saddle River, NJ: Prentice Hall Press.
Bhattacharyya, R. P., & Pritchett, A. R. (2017). Designing function allocations in air traffic concepts of operation using network optimization. Journal of Air Transportation, 25, 61–71.
Bye, A., Hollnagel, E., & Brendeford, T. S. (1999). Human–machine function allocation: A functional modelling approach. Reliability Engineering & System Safety, 64, 291–300.
Diggelen, J. V., Bradshaw, J. M., Grant, T., Johnson, M., & Neerincx, M. (2009). Policy-based design of human-machine collaboration in manned space missions. Proceedings of the third IEEE international conference on space mission challenges for information technology (pp. 376–383). Pasadena, CA: IEEE Computer Society Washington.
Feigh, K. M., & Pritchett, A. R. (2014). Requirements for effective function allocation: A critical review. Journal of Cognitive Engineering and Decision Making, 8, 23–32.
Fischer, U., & Mosier, K. (2014). The impact of communication delay and medium on team performance and communication in distributed teams. Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 58, pp. 115–119). Chicago, IL: SAGE Publications.
Fong, T., Zumbado, J. R., Currie, N., Mishkin, A., & Akin, D. L. (2013). Space telerobotics: Unique challenges to human–robot collaboration in space. Reviews of Human Factors and Ergonomics, 9, 6–56.
Howard, A. M. (2006). Role allocation in human–robot interaction schemes for mission scenario execution. Proceedings 2006 IEEE international conference on robotics and automation (pp. 3588–3594). Orlando, FL: IEEE.
Lester, D., & Thronson, H. (2011). Human space exploration and human spaceflight: Latency and the cognitive scale of the universe. Space Policy, 27, 89–93.
Miller, C. A., & Parasuraman, R. (2003). Beyond levels of automation: An architecture for more flexible human-automation collaboration. Proceedings of the human factors and ergonomics society annual meeting (Vol. 47, pp. 182–186). Denver, CO: SAGE Publications.
Pritchett, A. R., & Bhattacharyya, R. P. (2016). Modeling the monitoring inherent within aviation function allocations. Proceedings of the international conference on human–computer interaction in aerospace (Article 20). Paris, France: ACM New York.
Pritchett, A. R., Bhattacharyya, R. P., & IJtsma, M. (2016). Computational assessment of authority and responsibility in air traffic concepts of operation. Journal of Air Transportation, 24, 93–102.
Pritchett, A. R., Feigh, K. M., Kim, S. Y., & Kannan, S. K. (2014). Work models that compute to describe multiagent concepts of operation: Part 1. Journal of Aerospace Information Systems, 11, 610–622.
Pritchett, A. R., Kim, S. Y., & Feigh, K. M. (2014a). Measuring human-automation function allocation. Journal of Cognitive Engineering and Decision Making, 8, 52–77.
Pritchett, A. R., Kim, S. Y., & Feigh, K. M. (2014b). Modeling human-automation function allocation. Journal of Cognitive Engineering and Decision Making, 8, 33–51.
Rehnmark, F., Currie, N., Ambrose, R. O., & Culbert, C. (2013). Human–robot teaming in a multi-agent space assembly task. Journal of Chemical Information and Modeling, 53, 1689–1699.
Rodriguez, G., & Weisbin, C. R. (2003). A new method to evaluate human–robot system performance. Autonomous Robots, 14, 165–178.
Shah, J. A., Saleh, J. H., & Hoffman, J. A. (2007). Review and synthesis of considerations in architecting heterogeneous teams of humans and robots for optimal space exploration. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, 37, 779–793.
Watts, J. C., Woods, D. D., Corban, J. M., Patterson, E. S., Kerr, R. L., & LaDessa, C. H. (1996). Voice loops as cooperative aids in space shuttle mission control. Computer Supported Cooperative Work’96, 9, 48–56.
Woods, D. (1985). Cognitive technologies: The design of joint human-machine cognitive systems. AI Magazine, 6(4), 86–92.

Section III

Neuroergonomics

6 A Neuroergonomics Approach to Human Performance in Aviation

Frédéric Dehais and Daniel Callan

CONTENTS
Human Performance Issues in Aviation
Neuroergonomics Methodology
Perceptual and Motor Aspect of Flying
Attentional Aspect of Flying
Decision-Making Aspect of Flying
Emerging Technological Solutions
Conclusion
References

Human Performance Issues in Aviation The technical progress of brain imaging over the last decade has revolutionized our understanding of the brain structures and functions underlying perceptual, motor, and cognitive processes. This corpus of knowledge is of great importance for any discipline concerned with the evaluation of human performance. Since the early 2000s, neuroergonomics, the intersection of neuroscience, cognitive engineering, and human factors, has proposed to examine human–technology interaction and the underlying brain mechanisms in increasingly naturalistic settings representative of work and everyday life situations. This discipline was described by its founder as the scientific study of the brain mechanisms and psychological and physical functions of humans in relation to technology, work, and environments (Parasuraman, 2003). Aviation operations constitute an ideal domain in which to implement this approach. Because of the diversity of tasks involved in operating aircraft or unmanned vehicles, diverse topics can be investigated, including motor control, attention, learning, alertness, fatigue, workload, decision-making, situational awareness, and anxiety. Neuroergonomics presents a relevant framework to understand the neural substrate underpinning a pilot's performance as well as the neural mechanisms at the core of human error.

This is important because human error and poor human–system interactions are commonly cited as the main cause of accidents (Li, Baker, Grabowski, & Rebok, 2001). For instance, loss-of-control (LOC) events form the most prominent category of accidents, of which there have been more than 50 in the last 5 years. A study revealed that 18 LOC accidents were responsible for 1,493 deaths between 2002 and 2011 (Boeing, 2012); in these accidents, the crews generally failed to react appropriately (Ancel & Shih, 2012; Bureau Enquêtes et Analyses, 2012; Commercial Aviation Safety Team, 2008; Dehais, Peysakhovich, Scannella, Fongue, & Gateau, 2015; Spangler & Park, 2010). Beyond psychomotor and perceptual issues, impaired decision-making is also known to be a contributing factor in the second most prominent category of accidents, “controlled flight into terrain” (CFIT), in which an airworthy aircraft is unintentionally flown into the ground (or the sea), often during the approach phase. Strikingly, 51% of accidents occur during approach and landing, whereas this phase represents only 4% of the exposure time of a 1.5-hour flight (National Transportation Safety Board, 2007). A paradigmatic example is the 2010 accident near Smolensk that killed the President of Poland, in which the crew persisted in attempting a landing with no visibility despite several auditory “Pull-Up” alerts. According to the accident analysis, the crew may have feared a negative reaction from the President had they diverted to an alternate airfield (Committee for Investigation of National Aviation Accidents, 2011). These critical events and the inappropriate reactions of expert pilots present a challenge for human factors practitioners and aircraft manufacturers.

On the one hand, performance and human error have been investigated through subjective and observable measures. Although this approach has paved the way for great progress, especially when observations led to descriptive modeling, an important part of pilot–cockpit interaction remains unknown. On the other hand, cognitive neuroscience has opened “the black box” and shed light on the neural mechanisms underlying human behavior. Despite decades of exciting progress, this discipline has mostly been limited to laboratory studies and the use of simplified paradigms to investigate cognition. Thus, neuroergonomics constitutes a paradigm shift away from both the standard reductionist approach to neuroscience and the typical lack of objective measures in field studies conducted by human factors practitioners. Therefore, in this chapter, we examine neuroergonomics and its benefits for flight safety. In the next sections, we present the neuroergonomics-based methodology, then illustrate its application to the motor, attentional, and decision-making aspects of flying, and finally propose neuroergonomics-based solutions to enhance flight safety.

Neuroergonomics Methodology Neuroergonomics maintains that an understanding of neural processes underlying human behavior can best be understood by investigating the underlying interacting brain networks in the context of carrying out vari- ous real-world tasks under investigation, rather than under reduced isolated conditions that only occur in the laboratory. To that end, neuroergonomics promotes the use of various brain imaging techniques and psychophysiolog- ical sensors to investigate the neural mechanisms underlying phenomena that occur during complex real-life activities. A challenge of importance for neuroergonomics is to succeed in repro- ducing ecological conditions in well-controlled laboratory protocols and to determine solutions for application of portable devices to measure human per- formance in realistic settings. In laboratory settings, expensive high-resolution devices such as functional magnetic resonance imaging (fMRI) and magne- toencephalography (MEG) can be employed. Functional magnetic resonance imaging indirectly measures brain activity by primarily looking at blood oxygenation level dependent changes between various experimental condi- tions (Ogawa, Lee, Kay, & Tank, 1990). Excellent spatial resolution of brain activity, both cortical and subcortical, is provided by fMRI. However, one of the limitations of fMRI is that it lacks good temporal resolution (on the order of several seconds, limited by the slow rise of the hemodynamic response). MEG directly measures magnetic fields generated by simultaneous local field poten- tials in large groups of similarly oriented neurons (Baillet, 2017). The temporal resolution for MEG is good (on the order of 1 msec), and the transparency of magnetic fields with respect to various tissues (skin, bone, cerebral spinal fluid) provides advantages over electroencephalography (EEG) with respect to source localization. 
Using an individual-specific anatomical MRI to model the brain, a spatial resolution finer than 1 cm can be achieved (Sato, Yoshioka, Kajihara, Toyama, Goda, & Kawato, 2004). While fMRI and MEG both provide valuable insights into the neural mechanisms underlying specific cognitive processes, these techniques have several drawbacks that constrain the design of ecological protocols to examine the brain “at work”: a lack of portability (fMRI and MEG both require specialized facilities with shielded rooms and cannot be moved), high susceptibility to artifacts caused by even very small head movements (participants must keep their heads very still), and the considerable expense of the devices. Despite these limitations, these brain-imaging methods have been combined with specialized fiber-optic equipment (e.g., joystick, pedals, throttle lever, steering wheel) and audiovisual displays of realistic flight and driving simulations to investigate the neural activity underlying the perceptual, motor, and cognitive processes involved in operating a vehicle (Adamson, Taylor, Heraldez, Khorasani, Noda, Hernandez, & Yesavage, 2014; Callan, Gamez, Cassel, Terzibas, Callan, Kawato, & Sato, 2012; Callan, Terzibas, Cassel, Callan, Kawato, & Sato, 2013; Callan, Terzibas, Cassel, Sato, & Parasuraman, 2016b; Durantin, Dehais, Gonthier, Terzibas, & Callan, 2017).

Alternative functional neuroimaging techniques that could help overcome these limitations of fMRI and MEG include measures derived from EEG and functional near-infrared spectroscopy (fNIRS). Because their instruments are portable, they allow non-invasive examination of brain function in real-world settings. Less well known than EEG, fNIRS is a non-invasive optical brain monitoring technique that measures cerebral hemodynamics within cortical regions. It has low temporal resolution but good spatial localization for the outer region of the cortex, as verified with fMRI (Cui, Bray, Bryant, Glover, & Reiss, 2011). The use of these field-deployable techniques has recently gained momentum, offering interesting prospects for human factors research (Ayaz, Shewokis, Bunce, Izzetoglu, Willems, & Onaral, 2012; Gateau, Durantin, Lancelot, Scannella, & Dehais, 2015). One suitable strategy for understanding the neural mechanisms underpinning pilots’ performance is therefore to conduct a “progression” of experiments. The first step uses well-controlled protocols with high-resolution devices (e.g., fMRI and MEG), which are restricted to low-fidelity simulators. The second step moves to more ecological experiments in dynamic microworlds, using motion-based, high-fidelity flight simulators and portable but lower-resolution brain recording devices (e.g., fNIRS and EEG). The final step conducts experiments in real flight conditions using these same portable devices (see Figure 6.1).
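The multi-second temporal resolution of fMRI, and of fNIRS, which also tracks hemodynamics, follows from how slowly blood oxygenation responds to neural activity. The following Python sketch is offered only as an illustration. It samples a double-gamma hemodynamic response function, a common modeling convention. The shape parameters (6 and 16) and the undershoot ratio (1/6) are assumed default-style values, not values reported in this chapter. Under these assumptions, the response to a brief neural event peaks about 5 s after the event and is followed by a slower negative undershoot.

```python
import math

def gamma_pdf(t: float, shape: float) -> float:
    """Gamma probability density with unit scale; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def double_gamma_hrf(t: float, peak_shape: float = 6.0,
                     undershoot_shape: float = 16.0,
                     undershoot_ratio: float = 1 / 6) -> float:
    """Assumed HRF model: a positive gamma peak minus a smaller, later gamma undershoot."""
    return gamma_pdf(t, peak_shape) - undershoot_ratio * gamma_pdf(t, undershoot_shape)

# Sample the response to a brief neural event at t = 0 every 0.1 s for 30 s.
times = [i / 10 for i in range(301)]
response = [double_gamma_hrf(t) for t in times]
peak_index = max(range(len(response)), key=lambda i: response[i])
peak_time = times[peak_index]

print(f"response peaks about {peak_time:.1f} s after the event")
print(f"minimum value (undershoot): {min(response):.4f}")
```

Each neural event is smeared over several seconds. Events closer together than that are therefore hard to separate in BOLD or fNIRS signals, whereas EEG and MEG follow activity at the millisecond scale.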

FIGURE 6.1 Illustration of the neuroergonomics methodology, from highly controlled but less ecological situations to highly ecological but less controlled situations. Activity of the brain and of the autonomic nervous system is compared across the different situations to ensure the validity of the measurements. (Courtesy of F. Dehais and D. Callan.)

Perceptual and Motor Aspect of Flying

An understanding of the neural processes underlying the perceptual and motor aspects of flying, as they relate to performance and learning, will inform neuroergonomic applications such as enhanced training paradigms and facilitative technology. Experience shapes the way the brain processes information, and differences in brain processing and neuroanatomy between pilots and non-pilots have been investigated to explore the effects of experience. In an fMRI experiment on the execution and observation of aircraft landings (Callan et al., 2013), pilots showed greater activity than non-pilots in brain regions of the motor-simulation “Mirror Neuron System,” which is thought to be important for imitation-based learning. Interestingly, glider pilots show higher gray matter density than non-pilots in the ventral premotor cortex, which is thought to be part of the “Mirror Neuron System” (Ahamed, Kawanabe, Ishii, & Callan, 2014). Pilots also showed greater activity in the cerebellum when observing their own previous landing performance than when observing another person’s. This suggests a role for motor-simulation-based, error-feedback learning (Callan et al., 2013). Pilots thus appear to use both unsupervised, imitation-based learning in the cortex and error-feedback-based learning in the cerebellum (Callan et al., 2013). These results are consistent with the hypothesis that experience (piloting, in this case) shapes the way the brain processes perceptual-motor tasks. Activity from perceptual-motor brain regions has also been used to improve performance in a neuroadaptive automation application based on a brain-machine interface (Callan et al., 2016b). In this case, performance was defined as the speed of response in recovering from an unexpected perturbation in flight attitude. Brain activity was recorded with MEG during a flight simulation task.
The goal of the experiment was to use only the perceptual-motor activity that occurs naturally during task performance to identify the perception of a perturbation and the intention to move. A machine-learning classifier was trained to detect the perturbation in flight attitude from brain activity recorded during a simple task. The trained classifier was then applied to a complex task in which the pilots maneuvered through the Grand Canyon while experiencing an unexpected perturbation in flight attitude. This neuroadaptive automation application (automated initiation of control-stick deflection of the elevator) used a brain-machine interface that could selectively detect motor intention in response to the perturbation, as distinct from the continuous motor control used to maneuver the airplane. It reduced mean response time from 425.0 to 352.7 ms, a mean saving of about 72 ms. One of the most important aspects of this research was that it increased overall system performance (faster responses) using brain activity that occurs naturally while piloting, without imposing any additional workload on the pilot.
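The chapter does not describe the classifier used by Callan et al. (2016b). The following Python sketch therefore only illustrates the general idea of training a decoder on a simple task and applying it unchanged to a harder one. It rests on three assumptions:

- synthetic, hypothetical 8-dimensional feature vectors stand in for features extracted from perceptual-motor brain activity;
- the decoder is a two-class Fisher linear discriminant;
- the complex task produces a noisier feature distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 8
shift = np.linspace(0.5, 1.5, N_FEATURES)  # assumed class separation per feature

def simulate_epochs(n, noise):
    """Hypothetical feature vectors; label 1 = perturbation perceived / intention to move."""
    labels = rng.integers(0, 2, size=n)
    features = rng.normal(0.0, noise, size=(n, N_FEATURES))
    features[labels == 1] += shift
    return features, labels

def fit_lda(x, y):
    """Two-class Fisher linear discriminant with a pooled covariance estimate."""
    mu0, mu1 = x[y == 0].mean(axis=0), x[y == 1].mean(axis=0)
    residuals = np.vstack([x[y == 0] - mu0, x[y == 1] - mu1])
    pooled = np.cov(residuals, rowvar=False)
    w = np.linalg.solve(pooled + 1e-6 * np.eye(x.shape[1]), mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2.0  # decision boundary halfway between the class means
    return w, b

def predict(w, b, x):
    return (x @ w + b > 0).astype(int)

# Train on the "simple task", then apply the frozen classifier to the noisier "complex task".
x_simple, y_simple = simulate_epochs(400, noise=1.0)
x_complex, y_complex = simulate_epochs(400, noise=1.3)
w, b = fit_lda(x_simple, y_simple)
accuracy = (predict(w, b, x_complex) == y_complex).mean()
print(f"accuracy on the complex task: {accuracy:.2f}")
```

In a real neuroadaptive loop, a positive classification would trigger the automated control input. The simulated data here say nothing about the accuracy achieved in the actual study.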

Attentional Aspect of Flying

An important component of flying is monitoring the flight deck and the external environment. The National Transportation Safety Board and the International Civil Aviation Organization have recently raised concerns (Civil Aviation Authority, 2013) about crews’ proficiency in monitoring flight parameters. These institutions identified poor monitoring as a contributing factor in most major civil aviation accidents. Examples include the crashes of Colgan Air Flight 3407 (Spangler & Park, 2010), Asiana Airlines Flight 214 (National Transportation Safety Board, 2014b), Turkish Airlines Flight 1951 (Dutch Safety Board, 2010), and, more recently, UPS Airlines Flight 1354 (National Transportation Safety Board, 2014a). Whereas these visual attentional issues and causal factors are now well documented (Casner & Schooler, 2015; Dehais, Behrend, Peysakhovich, Causse, & Wickens, 2017; Parasuraman & Riley, 1997; Reynal, Colineau, Vernay, & Dehais, 2016), their auditory counterpart has received insufficient attention. Indeed, several accident analyses (Bliss, 2003; Mumaw, 2017) and studies in the aviation domain (Dehais, Causse, Régis, Menant, Labedan, Vachon, & Tremblay, 2012; Dehais, Causse, Vachon, Régis, Menant, & Tremblay, 2014; Dehais, Roy, Gateau, & Scannella, 2016; Dehais, Roy, Durantin, Gateau, & Callan, 2018) have revealed that pilots in the cockpit may fail to respond to auditory warnings. This phenomenon is complex and calls for a neuroergonomic approach. Unlike visual attention, which can be measured by recording eye movements (Duchowski, 2007), auditory attentional processing can be studied mainly through brain imaging techniques.
Following a neuroergonomic methodology (see Figure 6.1), we conducted a series of experiments using fMRI and EEG in simulated and real flight conditions. This approach led us to explore the “where” (i.e., brain areas), the “when” (i.e., temporal dynamics), and the “how” (i.e., underlying neural mechanisms) of alarm misperception.

We first carried out an fMRI study to investigate brain activity during episodes of alarm misperception, using a first-person-view “Red Bull Air Race” flight simulator (Durantin et al., 2017). During the flight scenario, the pilots had to fly through several gates using a joystick while reporting auditory alarms triggered at irregular intervals. To maintain a high level of engagement and to force pilots to scan both the instruments and the world outside the plane, a light in the flight deck indicated the orientation (either horizontal or vertical) in which pilots were to fly through the gates. Pilots missed about 35% of the alarms. More interestingly, the fMRI analyses revealed greater activation for auditory misses than for auditory hits in several brain structures associated with an attentional bottleneck (Tombu, Asplund, Dux, Godwin, Martin, & Marois, 2011). These regions were also particularly active when flying performance was low. This suggests that when primary task demand was excessive, the attentional bottleneck attenuated the processing of secondary tasks in favor of the visual piloting task. Further analyses supported this hypothesized mechanism: functional connectivity from some of these attentional bottleneck regions to auditory processing regions was reduced for missed alarms relative to hits (Durantin et al., 2017). This result suggests that top-down mechanisms can effectively switch off the auditory cortex when the flying task becomes too demanding.

Although fMRI is a valuable tool for identifying the brain areas responsible for alarm misperception, its temporal resolution is too low to determine when this phenomenon occurs. We therefore conducted a second experiment (Scannella, Roy, Laouar, & Dehais, 2016) to assess the dynamics of alarm misperception in the cockpit by recording EEG and analyzing event-related potentials (ERPs). Seven participants flew a motion flight simulator in a critical scenario in which smoke in the cabin required an emergency night landing in adverse weather. During the task, participants heard either a frequent low-pitch tone (“the standard,” probability = 0.80), which they were told to ignore, or a rare deviant high-pitch tone (“the alarm,” probability = 0.20), which they were asked to report. The pilots failed to respond to 56% of the auditory alarms. In addition, missed alarms, compared with hits, were associated with a drastic reduction in the amplitude of the auditory N100 (perceptual) and P300 (attentional) event-related components.
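The Python sketch below makes the ERP logic concrete in four steps:

1. Generate an oddball sequence with the probabilities described above (0.80 standards, 0.20 alarms).
2. Simulate single-trial alarm responses.
3. Average those responses separately for hits and misses.
4. Measure the mean amplitude in an N100 window (80–120 ms) and a P300 window (250–400 ms).

All of the following are illustrative choices, not the parameters or data of Scannella et al. (2016): the waveform shapes, the noise level, the sampling rate, the latency windows, the roughly 50% miss rate, and the assumption that missed alarms carry attenuated components.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 250                                    # assumed sampling rate (Hz)
t = np.arange(-0.1, 0.6, 1 / fs)            # epoch: -100 ms to +600 ms around tone onset

# Oddball sequence: 0 = frequent standard (p = 0.80), 1 = rare deviant "alarm" (p = 0.20).
sequence = rng.choice([0, 1], size=500, p=[0.8, 0.2])
n_alarms = int(sequence.sum())

def synthetic_alarm_epoch(gain):
    """Hypothetical single trial: N100 (~100 ms, negative) plus P300 (~300 ms, positive) plus noise."""
    n100 = -5.0 * np.exp(-((t - 0.10) ** 2) / (2 * 0.02 ** 2))
    p300 = 8.0 * np.exp(-((t - 0.30) ** 2) / (2 * 0.06 ** 2))
    return gain * (n100 + p300) + rng.normal(0.0, 4.0, t.size)

# Assume about half the alarms are missed and that misses carry attenuated components.
missed = rng.random(n_alarms) < 0.5
epochs = np.array([synthetic_alarm_epoch(0.4 if m else 1.0) for m in missed])

def mean_amplitude(erp, start, stop):
    """Mean ERP amplitude within a latency window given in seconds."""
    window = (t >= start) & (t <= stop)
    return erp[window].mean()

amplitudes = {}
for label, mask in (("hits", ~missed), ("misses", missed)):
    erp = epochs[mask].mean(axis=0)         # the ERP is the average across epochs
    # Report N100 as a positive magnitude by flipping its sign.
    amplitudes[label] = (-mean_amplitude(erp, 0.08, 0.12), mean_amplitude(erp, 0.25, 0.40))
    print(f"{label}: N100 {amplitudes[label][0]:.2f}, P300 {amplitudes[label][1]:.2f} (arbitrary units)")
```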
These results were consistent with previous findings (Giraudet, St-Louis, Scannella, & Causse, 2015; Scannella, Causse, Chauveau, Pastor, & Dehais, 2013). Together with the fMRI study cited above (Durantin et al., 2017), they suggested that alarm misperception arises as early as the initial stages of auditory processing.

Lastly, a third experiment was conducted in actual flight conditions to improve our understanding of the neural mechanisms underlying alarm misperception (Callan, Gateau, Durantin, Gonthier, & Dehais, 2018). Although the ERP analysis of the previous study was informative, ERPs do not provide insight into the oscillatory and phase properties thought to modulate perceptual cortices through phase resetting (Yamagishi, Callan, Anderson, & Kawato, 2008). Inter-trial coherence (ITC) quantifies how consistently the phase of the EEG signal at a given frequency is aligned across trials relative to stimulus onset, so it can capture these modulations. In this experiment, we used a similar oddball paradigm in which pilots were to ignore a frequent tone and to respond by button press when they heard a deviant chirp sound. Thirteen pilots flew a four-seat DR400 airplane while wearing a 64-channel Cognionics dry-wireless EEG system to record their brain activity. A flight instructor was present on all flights and initiated the various scenarios: a diverted flight plan, a simulated engine failure, off-field emergency landing procedures, and low-altitude circuit patterns. In flight, the pilots missed 38% of the auditory targets. Compared with hits, these misses were associated with reduced phase resetting in the alpha and theta frequency bands, as measured by ITC (Callan et al., 2018). These results suggest that, on missed trials, the auditory cortex fails to align its phase with the external auditory environment and therefore cannot adequately process the alarms.
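Inter-trial coherence can be sketched in a few lines. For each trial, take the phase of the signal at a frequency of interest, convert it to a unit-length complex number, and average across trials. The length of that mean vector is 1 when phases are perfectly aligned and near 0 when they are random.

The Python sketch below makes two simplifying assumptions: a single 500 ms post-stimulus window and a single-frequency Fourier coefficient per trial. Published analyses typically use time-frequency decompositions such as wavelets instead. The 6 Hz frequency, the sampling rate, and the synthetic trials are hypothetical.

```python
import numpy as np

def inter_trial_coherence(trials: np.ndarray, fs: float, freq: float) -> float:
    """ITC at one frequency: length of the mean unit phase vector across trials."""
    times = np.arange(trials.shape[1]) / fs
    # Single-frequency Fourier coefficient of every trial (rows = trials).
    coeffs = trials @ np.exp(-2j * np.pi * freq * times)
    unit_phasors = coeffs / np.abs(coeffs)  # keep phase, discard amplitude
    return float(np.abs(unit_phasors.mean()))

rng = np.random.default_rng(2)
fs, freq, n_trials = 250, 6.0, 60            # assumed sampling rate (Hz), theta frequency, trials
t = np.arange(int(0.5 * fs)) / fs             # 500 ms post-stimulus window

def make_trials(phase_jitter: float) -> np.ndarray:
    """Synthetic 6 Hz trials whose phase varies uniformly within +/- phase_jitter radians."""
    phases = rng.uniform(-phase_jitter, phase_jitter, n_trials)
    oscillation = np.sin(2 * np.pi * freq * t[None, :] + phases[:, None])
    return oscillation + rng.normal(0.0, 1.0, (n_trials, t.size))

itc_reset = inter_trial_coherence(make_trials(0.3), fs, freq)       # phase reset: aligned phases
itc_no_reset = inter_trial_coherence(make_trials(np.pi), fs, freq)  # no reset: random phases
print(f"ITC with phase reset {itc_reset:.2f}, without {itc_no_reset:.2f}")
```

Because amplitude is discarded, ITC can drop on missed trials even if oscillatory power is unchanged. This is why it complements the ERP amplitude measures of the previous study.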
This finding is consistent with the results of our first (fMRI) experiment, which indicated that activation of the attentional bottleneck led to a desynchronization of the auditory cortex that prevented accurate processing of the alarms. These three experiments illustrate the neuroergonomics approach, moving from basic laboratory experiments with high-resolution measurement tools to the measurement of cognition in realistic settings. First, this approach makes it possible to confirm that measurements collected with different devices and in different settings are consistent. Here, all three showed that early gating mechanisms in the auditory cortex impair the processing of auditory alarms. These findings provide new insight into auditory alarm misperception, which the human factors literature usually attributes not to attentional limitations but to decision biases (see Dehais et al., 2014). Second, this three-step methodology provided complementary findings on the temporal dynamics, on the brain areas, and on the mechanisms underpinning the interactions between the cortical regions responsible for auditory alarm misperception.

Decision-Making Aspect of Flying

Decision-making has always been a fundamental aspect of flying and navigation. Once take-off has been accomplished, the aviator is sometimes confronted with the explore/exploit dilemma: should I go on with my plan, or should I divert? Unfortunately, safety analyses and simulator experiments have found evidence of situations in which pilots are unable to adapt to environmental changes and press on into deteriorating conditions (Dehais, Tessier, & Chaudron, 2003). An area of ongoing research addresses pilots’ tendency to try to reach the final destination at all costs despite evidence that this goal is no longer relevant (Orasanu, Martin, & Davison, 2001). Rhoda and Pawlak (1999) found that, in 2,000 approaches under thunderstorm conditions, two aircrews out of three kept trying to land. This was especially likely if their flight had been delayed, if they were in sequence behind another airplane, or if it was a night flight.

A first attempt to understand this phenomenon draws on escalation of commitment (Staw, 1981). This theory holds that the stronger and longer-lasting the commitment to an important goal, the harder it is to drop that goal, even when it is no longer relevant. O’Hare and Smitheram (1995) observed that the probability of a pilot continuing a visual flight rules (VFR) flight into dangerous weather grows as the pilot gets closer to the final destination. A complementary explanation lies in the wide range of aversive consequences associated with the decision to abandon a goal. This framework has been applied to aviation in particular to understand why pilots persist with a flawed landing instead of going around or diverting.
O’Hare and Smitheram (1995) hypothesized that the frame in which human operators make their decisions shifts from gains to losses as goal achievement gets closer, resulting in increased risk taking. Indeed, prospect theory (Kahneman & Tversky, 1979) postulates that people become less risk averse (in other words, take more risks) when decisions are framed in terms of potential losses. For instance, a go-around has a cognitive cost because it can make it very difficult to reinsert the aircraft into the traffic pattern. It also increases uncertainty, as pilots rarely perform this maneuver during their operational careers (Bureau Enquêtes et Analyses, 2013). All these emotional pressures could alter rational reasoning by shifting decision-making constraints from safety rules to economic ones.

To investigate this issue, Causse et al. (2013) used a simplified but plausible landing task based on a standard cockpit instrument landing system. They estimated changes in brain activity related either to the type of incentive (neutral or financial) or to the level of uncertainty (low or high, manipulated through the degree of flight path deviation). A payoff matrix reproduced the negative consequences of deciding to go around, effectively enough to provoke risky behavior. Combined with the behavioral outcomes, the neuroimaging results revealed a shift from rational to erroneous decision-making in response to uncertainty when a financial incentive was present. A large network of prefrontal regions (responsible for rational decision-making) was engaged in response to increased uncertainty. By contrast, a different set of brain regions, not including the frontal regions, was engaged when a biased financial incentive was combined with uncertainty. Participants with poor decision-making performance, who adopted riskier behavior, showed lower activity in the right dorsolateral prefrontal cortex. This outcome demonstrates that reward and uncertainty can temporarily jeopardize rational decision-making supported by a specific cerebral network during complex and ecologically valid tasks. The approach thus identifies neurocognitive correlates of erroneous decision-making by pilots.

Emerging Technological Solutions

Another important facet of neuroergonomics is the design of technical solutions that improve human-system interaction in complex real-life situations. Neuroadaptive automation uses the operator’s mental states and an assessment of the vehicle’s situation to enhance overall system performance by means of a real-time adaptable interface (Figure 6.2). The goal is to facilitate human actions and decisions through artificial intelligence and adaptation algorithms that drive this interface.

The system has three parts:

- Situation Assessor: using “Big Data,” artificial intelligence relates the pilot’s dynamic actions to the state of the vehicle in the environment to assess the situation.
- Adaptive Algorithms: machine learning and artificial neural networks decode mental states in relation to the ongoing situation and predict operator actions.
- Real-Time Adaptable Interface: appropriate responses are triggered through automation, with override control if necessary.

The pilot’s situation awareness can be predicted by assessing the pilot’s perceptual, attentional, and memory states in relation to the Situation Assessor. The pilot’s feed-forward predictions can also be estimated from the pilot’s responses to direct feedback and to the real-time adaptable interface, which can act as an error signal. Pilot performance is facilitated by delivering only the most relevant information via the real-time adaptable interface, based on the pilot’s situation awareness and mental state given the task at hand.

FIGURE 6.2 Neuroadaptive automation based on integrating decoded operator neural states in relation to situation assessment. (Adapted from Figure 4.3 in Vidulich, M. A., Mental workload and situation awareness: Essential concepts for aviation psychology practice, in Tsang, P. S. & Vidulich, M. A. (Eds.), Principles and Practice of Aviation Psychology, Erlbaum, Mahwah, NJ, pp. 115–146, 2003.)
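The information flow in Figure 6.2 can be sketched as three small pieces. In the hypothetical Python example below:

- a Situation Assessor is assumed to have already scored each piece of cockpit information for relevance;
- a stand-in for the Adaptive Algorithms maps a neural index to a workload estimate between 0 and 1 (using a theta/alpha power ratio as that index is an assumption);
- the Real-Time Adaptable Interface raises its relevance threshold as workload rises, so that only the most relevant items are shown.

All names, scores, and the linear threshold policy are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class InfoItem:
    """One piece of cockpit information, scored by a hypothetical Situation Assessor."""
    label: str
    relevance: float  # 0 = irrelevant, 1 = safety-critical in the current situation

def decoded_workload(theta_alpha_ratio: float) -> float:
    """Stand-in for the Adaptive Algorithms: map a neural index to workload in [0, 1]."""
    return min(max((theta_alpha_ratio - 0.5) / 1.5, 0.0), 1.0)

def adapt_display(items: list[InfoItem], workload: float) -> list[str]:
    """Real-Time Adaptable Interface: raise the relevance bar as workload rises."""
    threshold = 0.2 + 0.6 * workload  # hypothetical linear policy
    kept = [item for item in items if item.relevance >= threshold]
    return [item.label for item in sorted(kept, key=lambda item: item.relevance, reverse=True)]

items = [InfoItem("TERRAIN PULL UP", 0.95), InfoItem("fuel imbalance", 0.6),
         InfoItem("ATIS update", 0.3), InfoItem("cabin temperature", 0.1)]
print(adapt_display(items, decoded_workload(0.6)))  # low workload: three items shown
print(adapt_display(items, decoded_workload(2.0)))  # high workload: only the critical alert
```

A real system would replace each stand-in with learned models and would need safeguards so that filtering never hides information the pilot actually needs.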
System performance can also be enhanced by speeding responses, for example by initiating control of the vehicle based on the operator’s motor intention (see Callan et al., 2016b, discussed above). Additionally, brain stimulation methods can be used to facilitate operator performance and training. Two major components of the proposed neuroadaptive automation system, brain-computer interface (BCI) technology and brain stimulation methods, are discussed in more detail below.

BCIs can be classified as active or passive, and there are three major areas of research:

(1) inferring the neural correlates of mental workload;
(2) targeting specific motor, perceptual, attentional, or decisional processes to predict performance;
(3) developing effective trigger algorithms.

An exciting field of research concerns the implementation of “active” and “passive” BCIs based on real-time processing of neurophysiological and physiological signals. An “active” BCI allows users to control devices with their brain activity without any physical action on the user interface. Despite spectacular early promise, such BCIs are still limited to controlling a few actuators, which seriously limits their use for controlling aircraft (Fricke, Paixão, Loureiro, Costa, & Holzapfel, 2015). Moreover, they require a lengthy procedure to train users to control their brain waves. The attention required to do so leaves few cognitive resources for monitoring or interacting with other relevant systems (Lotte, Larrue, & Mühl, 2013). In contrast, “passive” BCIs (neuroadaptive technologies) are not meant to control a device (e.g., a mouse) directly via brain activity but to support “implicit interaction” (Zander, Kothe, Jatzev, & Gaertner, 2010).
Research on “passive” BCIs is promising because its goal is to infer the human operator’s cognitive state, such as low vigilance or high mental workload, and then adapt the interaction to overcome cognitive bottlenecks (Roy, Bonnet, Charbonnier, & Campagne, 2013, 2016). The design of neuroadaptive user interfaces (see Figure 6.2) is a growing field of research in human-system interaction (Roy & Frey, 2016).

One area of research has been to infer the neural correlates of mental workload with portable brain imaging techniques such as EEG and fNIRS. An extensive review of workload in aviation (Borghini, Astolfi, Vecchiato, Mattia, & Babiloni, 2014) reported two patterns:

- an increase in mental workload is associated with an increase in EEG power in the theta band (4–8 Hz) and a decrease in alpha-band power (8–15 Hz);
- the transition from high mental workload to mental fatigue is characterized by increased EEG power in the theta band as well as in the delta (< 4 Hz) and alpha bands.

Changes in oxygenated hemoglobin concentration in the prefrontal cortex are also known to be a relevant predictor of variations in a pilot’s mental workload (Ayaz et al., 2012; Durantin, Gagnon, Tremblay, & Dehais, 2014; Durantin, Scannella, Gateau, Delorme, & Dehais, 2015). Building on this finding, a prefrontal fNIRS-based passive BCI was implemented in a motion flight simulator to detect two levels of mental workload. During the experiment, the pilots supervised the flight while reading back air traffic control instructions of two levels of difficulty (e.g., “Speed 150, Heading 150, Altitude 1500, Vertical Speed + 1500” versus “Speed 238, Heading 155, Altitude 2300, Vertical Speed + 1800”). A classifier was trained to perform online single-trial classification of working memory load with 80% accuracy (Gateau et al., 2015). Interestingly, when this protocol was adapted to real flight conditions in a light aircraft (Gateau, Ayaz, & Dehais, 2018), classification accuracy reached up to 78%. This demonstrates the feasibility of monitoring cognition with fNIRS in extreme and complex real-life situations.

In addition, connectivity metrics, that is, analyses of the co-activation of long-range neural networks, appear particularly suited to assessing the dynamics of cerebral activity. This approach was initially used in motion-based flight simulators (Astolfi et al., 2012). More recently, connectivity metrics were successfully used to discriminate different levels of engagement during automated and manual landing (Verdière, Roy, & Dehais, 2018).

Another area of research is to measure and target specific motor, perceptual, attentional, or decisional processes to predict performance. For instance, EEG has been used successfully to detect the onset of pilot-induced oscillations with 79% accuracy in actual flight conditions (Scholl et al., 2016).
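The theta and alpha changes reported by Borghini et al. (2014) are usually quantified as spectral band power. The Python sketch below estimates band power from a Hann-windowed periodogram and uses a theta/alpha ratio as a simple workload index. The ratio, the synthetic signals, and the sampling rate are assumptions for illustration. Practical pipelines typically use more robust estimators (for example, Welch averaging) on artifact-cleaned, multichannel data.

```python
import numpy as np

def band_power(signal: np.ndarray, fs: float, low: float, high: float) -> float:
    """Mean Hann-windowed periodogram power in the band [low, high) Hz."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(signal.size))) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
    band = (freqs >= low) & (freqs < high)
    return float(spectrum[band].mean())

def workload_index(signal: np.ndarray, fs: float) -> float:
    """Assumed index: theta (4-8 Hz) power over alpha (8-15 Hz) power."""
    return band_power(signal, fs, 4, 8) / band_power(signal, fs, 8, 15)

fs = 256
t = np.arange(0, 4, 1 / fs)                   # 4 s of synthetic single-channel EEG
rng = np.random.default_rng(3)
noise = rng.normal(0.0, 0.5, t.size)
# Low load: strong 10 Hz alpha, weak 6 Hz theta. High load: the reverse.
low_load = 1.0 * np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 6 * t) + noise
high_load = 0.4 * np.sin(2 * np.pi * 10 * t) + 1.0 * np.sin(2 * np.pi * 6 * t) + noise

print(f"low-load index {workload_index(low_load, fs):.2f}, "
      f"high-load index {workload_index(high_load, fs):.2f}")
```

A passive BCI would compute such features on successive windows and feed them to a trained classifier rather than applying a fixed threshold to a single ratio.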
Artifact cleaning (Artifact Subspace Reconstruction; Mullen et al., 2013) and artifact removal (independent component analysis) make it possible to train a classifier to detect the presence or absence of an audio stimulus with around 79.2% accuracy. This was achieved even in an open-cockpit biplane in flight, with considerable vibration, wind, acoustic noise, and physiological artifacts (Callan, Durantin, & Terzibas, 2015). Such technical advances pave the way for a passive BCI that detects auditory alarm misperception in the cockpit.

Identifying degraded cognitive states remains challenging, but another important step is to dynamically trigger appropriate countermeasures to improve flight performance. A first category of solutions consists of designing new alerting systems for more efficient interaction (Causse, Phan, Ségonzac, & Dehais, 2012). For instance, briefly switching off the displays has been shown to mitigate attentional tunneling effectively (Dehais, Causse, & Tremblay, 2011; Dehais, Tessier, Christophe, & Reuzeau, 2010). A second type of solution consists of reallocating tasks between the human and automated systems based on neurophysiological indices. This concept, known as “adaptive automation,” allocates tasks automatically between the human and automation to keep the pilot’s task load constant, acceptable, and stimulating. Hyperscanning of two pilots (that is, simultaneous neurophysiological measurement of both crew members) could extend this concept to the optimal distribution of workload between crew members (Astolfi et al., 2012).
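The allocation rule behind adaptive automation can be sketched as a loop over successive workload estimates. The hypothetical Python rule below hands a task to automation when decoded workload exceeds an upper threshold. It returns the task to the pilot only when workload drops below a lower threshold. This hysteresis is a design choice assumed here, not described in the chapter. It avoids rapid switching while keeping the pilot's task load within a band. The thresholds are illustrative.

```python
def allocate(workload_series, high=0.75, low=0.4):
    """Hypothetical adaptive-automation rule with hysteresis.

    A task moves to automation when decoded workload exceeds `high`
    and returns to the pilot only once workload falls below `low`,
    which prevents back-and-forth switching around a single threshold.
    """
    automated = False
    decisions = []
    for workload in workload_series:
        if not automated and workload > high:
            automated = True
        elif automated and workload < low:
            automated = False
        decisions.append("automation" if automated else "pilot")
    return decisions

print(allocate([0.3, 0.6, 0.8, 0.7, 0.5, 0.35, 0.6]))
# ['pilot', 'pilot', 'automation', 'automation', 'automation', 'pilot', 'pilot']
```

With a single threshold at, say, 0.75, the same series would return the task to the pilot as soon as workload dipped to 0.7. Oscillating around that value would then toggle control on every sample.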

Another form of neuro-based technology that can enhance performance and facilitate training on real-world tasks is brain stimulation. Transcranial direct current stimulation (tDCS) applies a low-intensity electric current through the scalp to the underlying brain tissue. Several studies have shown that tDCS can improve performance and learning on perceptual, motor, and cognitive tasks (Brunoni & Vanderhasselt, 2014; Coffman, Clark, & Parasuraman, 2014; Jacobson, Koslowsky, & Lavidor, 2012; Parasuraman & Galster, 2013). This improvement is thought to be mediated by changes in cortical excitability that lead to long-term potentiation or depression (Coffman et al., 2014; Nitsche & Paulus, 2000). Enhancement of human abilities by tDCS has been demonstrated in a neuroergonomic context (Parasuraman & McKinley, 2014).

Callan, Falcone, Wada, and Parasuraman (2016a) combined simultaneous fMRI and tDCS during an aviation-related visual search task. They used this design to investigate the neural processes underlying the task-related modulation of resting-state brain activity resulting from tDCS. The degree of functional connectivity from the site of stimulation in the precuneus to the substantia nigra predicted future tDCS-induced enhancement in visual performance. The substantia nigra is part of the dopaminergic system and is involved in value-dependent learning. This relationship between connectivity and performance improvement after training was found only in the active tDCS group, not in the sham tDCS group. These results show individual differences in the extent to which tDCS enhances plasticity in task-related neural networks and thereby improves performance. Knowledge of relative connectivity strength in these networks could therefore be used to optimize individuals’ training time and future performance.
The use of tDCS and of other brain stimulation methods can be enhanced by high-density electrode configurations. These methods include transcranial alternating current stimulation (tACS) and temporal interference noninvasive deep brain stimulation (Grossman et al., 2017). High-density configurations stimulate specific cortical and subcortical regions more focally, in multiple areas of the brain simultaneously. These methods should allow considerable advances in neuroergonomic applications that facilitate performance and training in real-world situations.

Conclusion

This chapter presented the first overview of neuroergonomic research in aviation and its potential benefits for flight safety. Our motivation was to demonstrate the potential of measuring the neural mechanisms underpinning the motor, perceptual/attentional, and decisional aspects of flying. This review revealed the subtle dynamics of brain activity in complex real-life operational situations. Moreover, neuroergonomics studies demonstrate that, with appropriate signal processing techniques, fNIRS and EEG can be used effectively in the noisy environment of a flight simulator. More importantly, they can be used even in the noisy environment of an airplane. These results bring us closer to neuroergonomics-based cockpit technology that promotes the performance, safety, efficiency, and well-being of pilots, crew, and passengers. Using this information, it may be possible to develop neuroadaptive cockpits (Figure 6.2) that reduce workload and facilitate the processing of critical information. We therefore strongly believe that this approach can benefit not only basic neuroscientists concerned with understanding brain function but also human factors researchers and practitioners.


7 Eye Movements Research in Aviation: Past, Present, and Future

Leandro L. Di Stasi and Carolina Diaz-Piedra

CONTENTS
Oculomotor Parameters and Eye Tracking Devices in Aviation Research
A Brief History of the Measurement of Oculomotor Parameters in Aviation
Eye Movements as an Aid for Aviation Training
Eye Movements as a Biomarker of the Pilot's (Cognitive) Psychophysical State
    Mental Workload
    Fatigue and Sleepiness
    Hypoxia
Current and Future Challenges of Eye Movements Research in Aviation
Acknowledgments
References

“(…) If we know where a pilot is looking, we do not necessarily know what he is thinking, but we know something of what he is thinking about.” Fitts, Jones, and Milton (1950)

Flying an aircraft is a highly demanding cognitive task in which performance relies heavily on visual search and the interpretation of visual information. For example, in manual flying, pilots must continuously control and monitor multiple variables, which are usually cross-coupled (Haslbeck & Zhang, 2017). Healthy vision is therefore considered crucial for pilots (since World War I, the assessment of pilots' vision and oculomotor functions has been mandatory). Moreover, since the pioneering studies (Figure 7.1) of Fitts and colleagues with military pilots (e.g., Milton, Jones, & Fitts, 1949), eye tracking technologies have been among the most reliable tools for improving the design of aircraft instruments and panels (e.g., Gainer & Obermayer, 1964) and for studying pilots' biobehavioral states (e.g., Di Stasi et al., 2016; Diaz-Piedra et al., 2016). Flying requires not only conceptual knowledge but also the skill to visually search for relevant information, so eye tracking technologies can also enhance pilot training (Diaz-Piedra, Catena, et al., 2016). In the present chapter, we describe the main oculomotor indices and the technologies commonly used in aviation. Next, we present a short overview of the application of eye tracking technologies in aviation. This overview runs from the first studies during World War I to the proliferation of studies after the 1950s, with special reference to the latest advances in the field. Finally, we briefly discuss current and future challenges and the directions in which eye movements research in aviation is likely to evolve.

FIGURE 7.1 On the left side, the arrangement of the experimental equipment inside the cockpit used by Fitts and colleagues (1950). Eye movements were studied through recordings of the reflected eyes (2) taken by a video camera (4) behind the pilot (1: blind flying hood, 3: watch). On the right side, three reference photographs taken at the beginning of the experimental flight. These photographs were used as measuring samples for scoring the recorded film. This made it possible to determine which instrument the pilot was fixating in each frame (in this example, from the top: air speed indicator, vertical speed indicator, and gyro horizon). (Adapted from Eye Fixations of Aircraft Pilots: Frequency, Duration, and Sequence of Fixations When Flying the USAF Instrument Low Approach Systems (ILAS), by J. L. Milton, R. E. Jones, and P. M. Fitts, 1949. Dayton, OH: Wright Air Development Center, Wright-Patterson AFB.)

Oculomotor Parameters and Eye Tracking Devices in Aviation Research

Vision is a dominant sensory system supporting human function, especially in psychomotor tasks. It is vital for identifying the target of an action, and it provides feedback that enables corrections (Di Stasi, Diaz-Piedra, Rieiro, Sanchez Carrión, Berrido, Olivares, & Catena, 2016). Moreover, the retinas are outgrowths of the brain and thus part of the central nervous system (Di Stasi, Catena, Cañas, Macknik, & Martinez-Conde, 2013; Hoar, 1982). For this reason, eye movements can be used to reveal perceptual and cognitive processes. Since the 1950s, eye tracking technologies have developed steadily within engineering psychology research (McCarley & Kramer, 2006). Today, eye tracking is a powerful neuroergonomic assessment instrument because of the high density and richness of the datasets it yields (Goldstein, 2010).

The most common oculomotor parameters studied in aviation research are based on gaze indices (saccades, fixations, and gaze entropy), blink behavior (e.g., blink rate), and pupil response (e.g., pupil dilation) (Figure 7.2). Although eye movements seem continuous, saccades (fast, ballistic eye movements) typically occur 3 or 4 times every second (Gilchrist, 2011). Each saccade is usually followed by a fixational pause. Such sequences of saccades and fixations bring regions of interest onto the fovea. During fixations, imperceptible micromovements such as drifts and microsaccades occur, preventing the retinal image from fading (Di Stasi, McCamy, et al., 2013; McCamy, Otero-Millan, Di Stasi, Macknik, & Martinez-Conde, 2014). The velocity profile of a saccade (Figure 7.7) starts with a relatively stable eye. The eye quickly accelerates to a peak velocity and then rapidly decelerates to reach stability again (Di Stasi, Marchitto, Antolí, & Cañas, 2013). The relationship between saccadic peak velocity and magnitude is known as the main sequence. It is commonly studied because the peak velocity of a saccade is intimately related to its magnitude (or amplitude) (Bahill, Clark, & Stark, 1975). This relationship is relatively fixed, so it can be used to assess the functioning of the oculomotor system.

For centuries, eye movements were measured by observing one's own eyes or the eyes of another person. Other attempts were based on feeling the movements of the closed eyes by touching the lids (Smith, 1738), or on recording the sounds of muscular movements with a kymograph (Javal, 1879). However, the current widespread application of eye movement metrics would not have been possible without the development of eye trackers. The first eye trackers appeared at the end of the nineteenth century and required direct contact with the cornea (Huey, 1898). At the beginning of the twentieth century, the work of the experimental psychologist Dodge was key to the development of such ingenious instruments (Dodge, 1903, 1904; Dodge & Cline, 1901). His continuous efforts to improve the measurement of eye movements resulted in a photographic eye tracker based on the corneal reflection. This photochronograph recorded horizontal eye movements non-invasively. It was the basis of many modern devices and allowed studies to proliferate in several fields, including aviation. In 1905, another group of psychologists, led by Judd, developed an advanced eye tracker. It could record the temporal aspects of eye movements in both the vertical and horizontal dimensions (Judd, McAllister, & Steel, 1905). Over the next 80 years, the development of eye tracking technologies was based on these foundational works (for a detailed review, see Duchowski, 2017; Hollomon, Kratchounova, Newton, Gildea, & Knecht, 2017).

FIGURE 7.2 Several oculomotor parameters used as research outcomes in aviation studies. Clockwise, the panels show the following. First, a pilot wearing an eye tracker mounted onto an eyeglasses frame. Second, pupil response (e.g., pupil dilation) and blink behavior (e.g., blink rate). Third, an example of horizontal gaze position over a 2.5-second epoch of recording: the gray line represents the raw horizontal eye-position signal, and the pulses indicate two saccades and two microsaccades. Fourth, the hypothetical visual scanning behavior (scan path) of a pilot flying the Eurocopter EC120 Colibri: the pilot's fixations are illustrated by circles whose diameter indicates their duration, and the black lines represent the pilot's saccades. To analyze eye movement data, researchers can define several areas of interest (AOIs). In this case, an AOI is defined around the gyro horizon. Temporally contiguous fixations that fall within a specific AOI define a dwell.
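The dwell definition in the Figure 7.2 caption translates directly into code. The sketch below assumes that fixations have already been detected and labeled with an AOI name. The `Fixation` and `Dwell` classes and their field names are hypothetical. The code merges temporally contiguous fixations on the same AOI into dwells. It also computes one common form of gaze entropy: the Shannon entropy, in bits, of the distribution of fixations over AOIs. Here a dwell's duration is the sum of its fixation durations. Some definitions instead measure from the first entry into the AOI to the exit from it.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Fixation:
    start_ms: float
    duration_ms: float
    aoi: str  # label of the AOI the fixation falls in (hypothetical labels)

@dataclass
class Dwell:
    aoi: str
    start_ms: float
    duration_ms: float  # summed fixation durations; saccade time is not included
    n_fixations: int

def fixations_to_dwells(fixations):
    """Merge temporally contiguous fixations that fall in the same AOI into dwells."""
    dwells = []
    for fix in sorted(fixations, key=lambda f: f.start_ms):
        if dwells and dwells[-1].aoi == fix.aoi:
            dwells[-1].duration_ms += fix.duration_ms  # extend the current dwell
            dwells[-1].n_fixations += 1
        else:
            dwells.append(Dwell(fix.aoi, fix.start_ms, fix.duration_ms, 1))
    return dwells

def stationary_gaze_entropy(fixations):
    """Shannon entropy (bits) of the proportion of fixations falling in each AOI."""
    counts = Counter(f.aoi for f in fixations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Higher entropy means fixations are spread more evenly across AOIs. Zero means every fixation landed in a single AOI.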

In aviation, eye trackers can show where the pilot is looking, but they can also reflect aspects of the pilot's cognitive state. In the first case, estimating gaze direction requires measuring both head and eye movements, so that the position of the eye relative to the surrounding objects is known. Depending on how sophisticated the eye tracker is, different gaze-based metrics can be derived, including the number of fixations, scanning patterns, or gaze entropy. In the second case, knowing "how" the pilot is looking requires measuring the motion of the eyes relative to the head. The metrics that can be recorded depend on how sophisticated the technology is and on whether the head is fixed or free to move. These metrics include saccadic parameters, microsaccades and drifts, blink behavior, or the pupil response.

Most eye trackers use a video camera to record images of the pilot's eyes and estimate the gaze position in the world. These systems can include infrared-emitting diodes that illuminate the eyes to obtain a more reliable image of the pupil and the corneal reflection. Video-based eye trackers can be head-mounted systems, in which the image of the eye relative to the head remains approximately constant (Tvaryanas, 2004). Alternatively, they can be cockpit-mounted systems, which require a separate head/face tracker to estimate the point of gaze (e.g., Sullivan, Yang, Day, & Kennedy, 2011). Eye trackers also vary in their accuracy (the average error in estimating gaze position). They also vary in their sampling rate (the frequency at which the system samples the gaze position). Rates range from ~30 Hz, which is enough to study fixations and gaze entropy, to ~500/1000 Hz, which is used to study saccadic velocity and microsaccades. Other methods of tracking eye movements are based on electrooculography (EOG). EOG-based eye tracking devices use the corneoretinal potential to estimate eye movements and blink behavior (e.g., Sem-Jacobsen, Nilseng, Patton, & Eriksen, 1959; Stern & Bynum, 1970; Wilson & Fisher, 1991). Figure 7.3 depicts three possible solutions for tracking the pilot's eye movements inside an aircraft/flight simulator.

FIGURE 7.3 Different eye tracking solutions for recording a pilot's eye movements in real time inside an aircraft/flight simulator. On the left side, the eye tracker is fixed under/on the instrument panel, while the scene camera is mounted on the cockpit ceiling (or behind the pilot's seat). In the middle, the eye tracker and the scene camera are integrated and mounted onto an eyeglasses frame worn by the pilot. On the right side, an EOG system is placed around the pilot's eyes; it includes a vertical and a horizontal channel, as well as a reference. For all three solutions, a data processing system (e.g., a laptop) can be placed behind the pilot's seat (not shown in the figure).
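A small calculation shows why a ~30 Hz tracker is enough for fixations but not for saccadic velocity. The sketch assumes an idealized 10-degree saccade lasting 40 ms with a minimum-jerk position profile. This smooth profile is a stand-in chosen for convenience, not a physiological model. The code samples the profile at two rates and estimates peak velocity by finite differences. The amplitude, duration, onset, and function names are illustrative assumptions.

```python
def min_jerk_position(t_s, amplitude_deg, duration_s):
    """Idealized saccade position (deg) at time t_s, using a minimum-jerk profile.

    This smooth profile is an illustrative stand-in, not a physiological saccade model.
    """
    tau = min(max(t_s / duration_s, 0.0), 1.0)  # normalized time, clamped to [0, 1]
    return amplitude_deg * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

def estimated_peak_velocity(rate_hz, amplitude_deg=10.0, duration_s=0.040,
                            onset_s=0.1, window_s=0.3):
    """Sample the idealized trace at rate_hz over window_s seconds.

    Returns the largest finite-difference velocity (deg/s) between consecutive samples.
    """
    dt = 1.0 / rate_hz
    n_samples = round(window_s * rate_hz) + 1
    positions = [min_jerk_position(i * dt - onset_s, amplitude_deg, duration_s)
                 for i in range(n_samples)]
    return max(abs(b - a) / dt for a, b in zip(positions, positions[1:]))

true_peak = 1.875 * 10.0 / 0.040  # analytical peak velocity of the minimum-jerk profile
for rate in (30, 1000):
    print(f"{rate:>5} Hz: estimated peak {estimated_peak_velocity(rate):6.1f} deg/s "
          f"(analytical {true_peak:.1f})")
```

At 1000 Hz the estimate lands close to the profile's analytical peak of 1.875·A/T ≈ 469 deg/s. At 30 Hz the whole saccade falls within only two or three sampling intervals, so the estimate cannot exceed A/Δt = 300 deg/s. The exact 30 Hz value also depends on where the samples fall relative to saccade onset.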

A Brief History of the Measurement of Oculomotor Parameters in Aviation

In the nineteenth century, the hypoxic environments experienced by balloonists created one of the first connections between vision and aviation. Paul Bert, one of the fathers of aviation medicine, conducted a series of seminal experiments on balloonists' vision. He recognized that supplying oxygen-enriched air effectively mitigated the impairments caused by altitude (Colin, 1978). After 1903, once the first aircraft started to fly, it became obvious that pilots faced problems of physiological tolerance beyond hypoxia, such as in-flight acceleration forces. During World War I, the Department of Ophthalmology of the US Air Service Medical Research Laboratory was one of the first to systematically test pilots' vision. It diagnosed eye and vision pathologies. It also thoroughly examined ocular difficulties related to altitude fitness, dark adaptation, and vestibular sensitivity. In addition, it studied the effects of tobacco, drugs, and alcohol on oculomotor functions, as well as the effects of the goggles used by pilots (Wilmer, 1920). Other pioneering works were the studies of Wilmer and Berens (1918) and of Griffith (1920). Wilmer and Berens (1918) tested the effect of altitude on pilots' ocular functions in low-pressure chambers. They found that hypoxia increased errors of refraction and produced confused stereoscopic vision. Griffith (1920) examined pilots' eye movements following body rotation and described the phenomenon of vestibular habituation. Tiffin and Bromer (1943) made the first recordings of pilots' eye movements during flight. They analyzed motion pictures of pilots' eye movements during the last 5–10 seconds before landing.
However, the technology was very immature at that time, and they failed to discover consistent differences in visual patterns between experienced and novice pilots. In the late 1940s and early 1950s, Paul Fitts, one of the fathers of engineering psychology research, conducted a series of seminal studies at Wright-Patterson Air Force Base (Ohio, US) (e.g., Fitts et al., 1950; Jones, Milton, & Fitts, 1949; Milton et al., 1949). Fitts and colleagues aimed to improve aviation safety by adjusting instrument panel layouts in military aircraft. To do so, they recorded the eye movements of US Air Force pilots who were reading different instrument panels, performing various maneuvers, and flying under different conditions in a variety of aircraft. They were able to determine how often and for how long pilots looked at each instrument, depending on flight conditions and pilot expertise. A camera was mounted behind the pilot's seat, and a mirror was placed in the cockpit so that the camera viewed the pilot's head and eyes in the mirror (Figure 7.1). Fitts' studies marked the beginning of the modern history of eye tracking in aviation (see Glaholt, 2014, and Ziv, 2016, for extended reviews). They also represent the earliest applications of eye tracking to what is now known as usability engineering (Jacob & Karn, 2003). Fitts' findings did not generalize to other aircraft, flight conditions, maneuvers, or instruments. Even so, his results supported the usefulness and importance of eye tracking technologies in aviation (Senders, 1966). His legacy has contributed enormously to the design of modern aircraft cockpits, significantly reducing pilot errors and fatal plane crashes (Miller, Kirlik, Kosorukoff, & Byrne, 2004). During the second half of the twentieth century, eye movement research in aviation flourished (Figure 7.4), driven by substantial improvements in eye tracking technologies and data analysis.

FIGURE 7.4 Schematic representation of the timeline of eye movement studies in aviation over approximately the last 100 years.
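Fitts and colleagues summarized pilots' scanning by how often, for how long, and in what order each instrument was fixated. Once fixations have been assigned to instruments, these measures reduce to simple counting. The Python sketch below illustrates this under that assumption. The function names and the instrument labels in the usage example are hypothetical, and the sketch does not reconstruct the original 1949–1950 analysis procedures.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A fixation is (instrument_label, duration_in_seconds), in chronological order.
Fixation = Tuple[str, float]


def instrument_fixation_stats(fixations: List[Fixation],
                              session_s: float) -> Dict[str, Dict[str, float]]:
    """Per-instrument fixation rate, mean fixation duration, and share of time."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for label, duration in fixations:
        durations[label].append(duration)
    return {
        label: {
            "fixations_per_min": 60.0 * len(ds) / session_s,
            "mean_duration_s": sum(ds) / len(ds),
            "share_of_time": sum(ds) / session_s,
        }
        for label, ds in durations.items()
    }


def transition_counts(fixations: List[Fixation]) -> Dict[Tuple[str, str], int]:
    """Count gaze transitions between different instruments (fixation sequence)."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for (src, _), (dst, _) in zip(fixations, fixations[1:]):
        if src != dst:
            counts[(src, dst)] += 1
    return dict(counts)


# Hypothetical labels: attitude indicator (ADI), altimeter (ALT), heading (HSI).
scan = [("ADI", 0.5), ("ALT", 0.25), ("ADI", 0.75), ("HSI", 0.5)]
stats = instrument_fixation_stats(scan, session_s=60.0)
links = transition_counts(scan)
```

Mean fixation durations depend on how raw gaze samples were grouped into fixations in the first place. Values are therefore only comparable across studies that used the same fixation criteria.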

Eye Movements as an Aid for Aviation Training

In the 1970s and 1980s, researchers at NASA and in the US Armed Forces developed the oculometer training tape technique (Albery, 1976; Barnes, 1970; Spady, Jones, Coates, & Kirby, 1982). This instructional aid records pilots' eye positions during flight. It provides the flight instructor with online information about the trainee pilot's scan behavior, or vice versa. This technique and newer developments (e.g., Carroll et al., 2013; Diaz-Piedra, Catena, et al., 2016; Wetzel, Anderson, & Barelka, 1998) could enhance aviation training in three ways:

(1) by allowing flight instructors to provide real-time feedback about the trainee pilot's scan behavior;
(2) by improving debriefing sessions through playback of the trainee pilot's scan behavior; and
(3) by supporting the editing of didactic videos based on the scan behavior of experienced pilots.

First, trainees' visual scan patterns can be guided to improve their performance on flight tasks (Wetzel et al., 1998). To provide the best feedback (in the right form and at the right time), flight instructors need to know how trainee pilots allocate their (visual) attention. Eye tracking technologies provide an objective measure of where the pilot is looking. Furthermore, modern wireless live-streaming systems allow flight instructors to monitor the trainee's eye movements online and to provide real-time feedback (Diaz-Piedra, Catena, et al., 2016). Second, flight instructors can play back trainee pilots' flight videos during debriefing sessions. These videos show trainees their visual scan paths (Harris, Glover, & Spady, 1986) and where they focused during the flight session, because the pilot's eye movements can be superimposed onto the recorded video.
Lastly, when performing complex visual tasks, experts possess sophisticated visual observation skills. These skills enable them to find the relevant features of a visual stimulus and to interpret what they observe (Jarodzka, Scheiter, Gerjets, & van Gog, 2010). Jones and colleagues had already noted that flight training based on individualized feedback would be more helpful (Jones, Coates, & Kirby, 1983). Even so, showing trainees the visual behavior of experienced and successful performers (for example, in standardized videos) might also improve instruction through cueing (van Gog & Jarodzka, 2013). Early evaluations of the oculometer training tape technique were positive (Albery, 1976; Jones et al., 1982; Spady et al., 1982; Wetzel et al., 1998). Nevertheless, the technique has failed to gain traction in aviation training programs. The likely reasons are the technical difficulty of recording eye movements and the intrusiveness and bulkiness of the required equipment (Di Stasi, McCamy, et al., 2016). In recent years, user-friendly, commercial, portable eye trackers (e.g., mounted on a lightweight eyeglasses frame; Figure 7.5) have overcome many of these barriers. These non-intrusive devices can record pilots' eye movements while simultaneously capturing what the pilots see (Diaz-Piedra, Catena, et al., 2016; Weibel, Fouse, Emmenegger, Kimmich, & Hutchins, 2012).

FIGURE 7.5 Sketch of the oculometer training tape technique. On the left side, the flight instructor uses a wireless system to continuously monitor the trainee pilot's eye movements during a flight simulation. Fixations are illustrated by circles. On the right side, a pilot wears an eyeglasses-frame eye tracker inside the Airbus Helicopter Tiger simulator.

Eye Movements as a Biomarker of the Pilot's (Cognitive) Psychophysical State

Mental Workload

One of the main challenges for aviation research is the study of the pilot's mental workload, because pilot performance depends on it (Wickens, 2002). Several eye movement metrics have been used to study pilot workload (see Glaholt, 2014, for a recent review). Here, we focus on gaze entropy and pupil dilation. Gaze entropy represents the spatial/temporal randomness of the pilot's visual scanning. In other words, it reflects how the pilot visually processes the surrounding information, rather than how much information the pilot acquires during flight (Harris et al., 1986). Gaze entropy could be a powerful tool, because wearable, non-intrusive eye trackers can measure it in both simulated and real flight. However, studies that have used gaze entropy as an index of pilots' mental workload have produced contradictory results (e.g., Di Nocera, Camilli, & Terenzi, 2007; Harris et al., 1986; Tole, Stephens, Vivaudou, Harris, & Ephrath, 1982; van de Merwe, van Dijk, & Zon, 2012). For example, Di Nocera and colleagues (2007) examined how fixations were dispersed within the same pilot across flight phases (Figure 7.6). They found that highly demanding phases (i.e., simulated takeoff and landing) were associated with greater dispersion of eye fixations (higher entropy). Less demanding phases (i.e., climb, descent, and cruise) were associated with lower dispersion. Two recent studies that included emergency operational procedures found similar results (van de Merwe et al., 2012; van Dijk, van de Merwe, & Zon, 2011): in both, gaze entropy increased after the pilots discovered a cockpit instrument failure. However, these findings seem counterintuitive in light of earlier research that used similar experimental manipulations but found the opposite trend: gaze entropy decreased as mental workload increased (Harris, Tole, Ephrath, & Stephens, 1982; Ruigrok & Hoekstra, 2007; Tole et al., 1982). Two kinds of methodological difference might account for these discrepancies. The first concerns experimental procedures (e.g., whether the pilot employed standard procedures to resolve a malfunction). The second concerns entropy estimation procedures (Shannon entropy estimation vs. the nearest neighbor index). Further research that takes potential confounding factors into account is needed to understand how mental workload affects the pilot's gaze entropy under different flight conditions.

FIGURE 7.6 Gaze entropy variations for a pilot over different simulated flight phases. Gaze entropy is higher during departure (DEP) and landing (LAN) phases than during climb (CLI), descent (DES), or cruise (CRU) phases. (Adapted from Di Nocera, F. et al., J. Cogn. Eng. Decis. Maki., 1, 271–285, 2007.)

Pupil response (e.g., dilation) is also considered a sensitive index of variations in the pilot's mental workload (Foroughi, Sibley, & Coyne, 2017). Increased task demands seem to induce larger pupil dilation (Krebs, Wingert, & Cunningham, 1977; Yu, Wang, Li, Braithwaite, & Greaves, 2016; but see Glaholt, 2014). Yet pupil response might be an unreliable index in aviation settings, because environmental factors such as luminance also affect it (Peysakhovich, Vachon, & Dehais, 2017).
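The two estimation procedures contrasted above can be illustrated with a short Python sketch. It assumes that fixations are available as (x, y) coordinates within a known display area. Several choices are illustrative assumptions rather than the exact procedures of the cited studies: the fixed grid standing in for areas of interest, the Clark-Evans form of the nearest neighbor index, and the function names.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]  # fixation position, assumed 0 <= x <= width, 0 <= y <= height


def shannon_gaze_entropy(fixations: List[Point], width: float, height: float,
                         n_cols: int = 4, n_rows: int = 4) -> float:
    """Shannon entropy (bits) of the fixation distribution over a regular grid.

    Each fixation is assigned to one grid cell (a stand-in for areas of
    interest). 0 means all fixations fall in one cell; log2(n_cols * n_rows)
    is the maximum, reached when fixations spread evenly over all cells.
    """
    if not fixations:
        raise ValueError("need at least one fixation")
    counts = Counter()
    for x, y in fixations:
        col = min(int(x / width * n_cols), n_cols - 1)
        row = min(int(y / height * n_rows), n_rows - 1)
        counts[(row, col)] += 1
    n = len(fixations)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def nearest_neighbor_index(fixations: List[Point], area: float) -> float:
    """Clark-Evans nearest neighbor index (NNI) of fixation dispersion.

    Ratio of the observed mean nearest-neighbor distance to the value expected
    for a random distribution of the same density, 0.5 * sqrt(area / n).
    NNI < 1 suggests clustered fixations; NNI > 1 suggests evenly spread ones.
    """
    n = len(fixations)
    if n < 2:
        raise ValueError("need at least two fixations")
    observed = sum(
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(fixations) if j != i)
        for i, (xi, yi) in enumerate(fixations)
    ) / n
    expected = 0.5 * math.sqrt(area / n)
    return observed / expected
```

The grid-based estimate depends on how the visual field is partitioned. The nearest neighbor index depends only on the distances between fixations. The two measures therefore need not move in the same direction for the same scan data, which is one plausible reason why studies that use them could disagree.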

Fatigue and Sleepiness

Extended duty hours and inadequate sleep are common among pilots (Caldwell, 2012). Because fatigue and sleepiness impair the pilot's capacities, they pose a significant risk to aviation safety (Borghini, Astolfi, Vecchiato, Mattia, & Babiloni, 2014; Portman-Tiller, 1998). Several oculomotor parameters have been assessed as feasible indices of reduced alertness (including both fatigue and sleepiness), especially saccadic velocity and blink rate. Studies of total and partial sleep deprivation have found clear decrements in pilots' saccadic velocity (e.g., Caldwell, Caldwell, Brown, & Smith, 2004; Chandler, Arnold, Phillips, Lojewski, & Horning, 2010). Studies of fatigue have also found that saccadic velocity decreases with time on flight. For example, Di Stasi and colleagues (Di Stasi, McCamy, et al., 2016) found lower saccadic velocity after long (~2 hours) simulated flights than after short (~1 hour) ones. In real flight, decrements in saccadic velocity have been reported after both instrument flight (Diaz-Piedra, Rieiro, et al., 2016) and visual flight (LeDuc, Greig, & Dumond, 2005) (Figure 7.7). Finally, findings on the effects of sleepiness and fatigue on blink behavior are less conclusive. Several studies have found that blink rate increases with fatigue (induced by time on flight) and with sleepiness (e.g., Stern, Boyer, & Schroeder, 1994; but see Previc et al., 2009). However, blink behavior might not be fully reliable in aviation settings, because environmental factors such as low humidity also affect it (Thomas, Gast, Grube, & Craig, 2015).
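Measures of saccadic velocity presuppose that saccades are first detected in the raw gaze signal. The Python sketch below shows one common approach, a velocity-threshold (I-VT) classifier. It assumes gaze positions that have already been converted to degrees of visual angle. The 30°/s default threshold and the function name are illustrative assumptions, not parameters taken from the cited studies.

```python
import math
from typing import List, Tuple


def detect_saccades(x: List[float], y: List[float], fs: float,
                    threshold: float = 30.0) -> List[Tuple[float, float]]:
    """Velocity-threshold (I-VT) saccade detection.

    x, y: gaze position in degrees of visual angle, sampled at fs Hz.
    threshold: speed (deg/s) above which an inter-sample interval is saccadic
    (30 deg/s is an illustrative assumption).
    Returns (amplitude_deg, peak_velocity_deg_per_s) for each saccade.
    """
    # speed[k] is the point-to-point speed between samples k and k + 1.
    speed = [math.hypot(x[k + 1] - x[k], y[k + 1] - y[k]) * fs
             for k in range(len(x) - 1)]
    saccades = []
    i = 0
    while i < len(speed):
        if speed[i] > threshold:
            start = i
            while i < len(speed) and speed[i] > threshold:
                i += 1
            # The saccade runs from position sample `start` to position sample `i`.
            amplitude = math.hypot(x[i] - x[start], y[i] - y[start])
            peak = max(speed[start:i])
            saccades.append((amplitude, peak))
        else:
            i += 1
    return saccades
```

Peak velocity grows with saccade amplitude. Comparisons across sessions are therefore more meaningful at matched amplitudes, or through the main sequence relationship, than as raw peak values.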

FIGURE 7.7 Main sequence relationship and eye velocity profile. On the left side, the effects of time on flight on the saccadic main sequence (peak velocity/magnitude relationship). The dashed line represents the pre-flight period, and the solid line represents the post-flight period. The curves are fits to the data from each period. On the right side, the velocity profile of a single saccade. At the start of a saccade, the eye accelerates quickly until it reaches peak velocity and then decelerates. For saccades smaller than 10°, the velocity waveform is typically symmetric, whereas larger saccades decelerate more slowly than they accelerate.
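The curves in the left panel of Figure 7.7 are fits of peak velocity against saccade amplitude. The caption does not state which function was used. The Python sketch below therefore assumes one commonly used exponential form, V = v_max * (1 - exp(-A / c)). Here v_max is the asymptotic peak velocity and c controls how quickly that asymptote is approached. The function name and the grid of c values are also assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def fit_main_sequence(amplitudes: Sequence[float], peak_velocities: Sequence[float],
                      c_grid: Optional[List[float]] = None) -> Tuple[float, float]:
    """Least-squares fit of V = v_max * (1 - exp(-A / c)) to saccade data.

    amplitudes in degrees, peak_velocities in deg/s. For a fixed c the model is
    linear in v_max, so v_max has a closed-form solution and only c needs a
    one-dimensional grid search. Returns (v_max, c).
    """
    if c_grid is None:
        c_grid = [0.5 + 0.1 * k for k in range(300)]  # 0.5 to 30.4 degrees
    best: Optional[Tuple[float, float, float]] = None  # (sse, v_max, c)
    for c in c_grid:
        f = [1.0 - math.exp(-a / c) for a in amplitudes]
        v_max = (sum(fi * vi for fi, vi in zip(f, peak_velocities))
                 / sum(fi * fi for fi in f))
        sse = sum((vi - v_max * fi) ** 2 for fi, vi in zip(f, peak_velocities))
        if best is None or sse < best[0]:
            best = (sse, v_max, c)
    return best[1], best[2]
```

One option is to fit the pre-flight and post-flight data separately and compare the resulting v_max values, or the predicted velocity at a fixed amplitude. This expresses the time-on-flight slowing described above independently of the particular mix of saccade amplitudes recorded in each period.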

Hypoxia

In-flight hypoxia, defined as the decreased availability of oxygen in the body's tissues due to altitude, is considered one of the most serious hazards during flight. Hypoxia can cause impaired vision, cognition, and motor control, as well as incapacitation and death (Petrassi, Hodkinson, Walters, & Gaydos, 2012). Early detection of the effects of hypoxia could therefore alert aircrew before they become unable to take corrective action, helping to prevent catastrophic events. Recent studies have sought early indices of hypoxia-related physiological impairment in aircrew, with special reference to oculomotor function and vision (Petrassi et al., 2012). However, studies that tested the sensitivity of saccadic velocity as a tool for detecting hypoxic events have produced contradictory results. Van der Post and colleagues (2002) and Temme and colleagues (2011) found decreased saccadic peak velocity, reflecting the slowing of brain processes due to reduced oxygen. In contrast, other studies found no effect of hypoxia on saccadic velocity (Cymerman et al., 2003; Kowalczuk et al., 2016; Stepanek et al., 2014). None of these studies accounted for the influence of fatigue due to time on task, which is a well-documented modulator of saccadic velocity (Di Stasi, Catena, et al., 2013). Moreover, both hypoxia and time on task induce fatigue. More recent work by Di Stasi and colleagues (2014) found that the effect of hypoxia on saccadic velocity disappeared after controlling for fatigue due to time on task. Interestingly, hypoxia still affected oculomotor behavior by decreasing gaze stability: it increased the velocity of intersaccadic drift. Drift is one of the main types of eye micromovements that pilots produce whenever they fix their gaze on an instrument (Di Stasi et al., 2014) (Figure 7.8).
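Intersaccadic drift velocity, the measure that Di Stasi and colleagues (2014) found sensitive to hypoxia, is computed from the eye-position signal between saccades. The Python sketch below assumes gaze positions in degrees sampled at fs Hz. Its simple speed threshold and exclusion margin are hypothetical stand-ins for the dedicated (micro)saccade detection used in such studies. Its output should therefore not be compared directly with published values.

```python
import math
from statistics import mean
from typing import List


def intersaccadic_drift_speed(x: List[float], y: List[float], fs: float,
                              threshold: float = 10.0, margin: int = 2) -> float:
    """Mean eye speed (deg/s) during intersaccadic intervals.

    Inter-sample speeds above `threshold` (an illustrative assumption) are
    treated as (micro)saccadic and excluded, together with `margin` neighboring
    speed samples on each side; the remaining speeds are averaged as drift.
    """
    speed = [math.hypot(x[k + 1] - x[k], y[k + 1] - y[k]) * fs
             for k in range(len(x) - 1)]
    excluded = [False] * len(speed)
    for i, s in enumerate(speed):
        if s > threshold:
            for j in range(max(0, i - margin), min(len(speed), i + margin + 1)):
                excluded[j] = True
    drift = [s for s, ex in zip(speed, excluded) if not ex]
    if not drift:
        raise ValueError("no intersaccadic samples left after exclusion")
    return mean(drift)
```

Excluding a few samples around each detected saccade keeps its onset and offset from inflating the drift estimate. The choice of margin is itself an assumption that would need tuning to the recording's sampling rate.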

FIGURE 7.8 On the left side, one of the hypobaric chambers for altitude training at the Aerospace Medicine Training Center (CIMA, Madrid, Spain). On the right side, an instructor (standing on the right of the chamber) is monitoring a group of and Air Force pilots during a hypoxia training session.

Current and Future Challenges of Eye Movements Research in Aviation

Since the 1950s, significant progress has been made in eye movement research, particularly in research methods and eye tracking technologies (both software and hardware). This progress has enabled a deeper understanding of the operator's cognitive state during real tasks (Grabowski, Rowen, & Rancy, 2018). Currently, one of the greatest challenges for eye tracking technologies is to move beyond research tools and into daily-life settings, where they can enhance operator performance. In this way, eye tracking technologies could improve aviation safety in all its domains, from air traffic management to spaceflight (Benitez, del Corte Valiente, & Lanzi, 2018). In this context, fully functional electrooculography (EOG)-based devices can plausibly be foreseen. They are easy to set up (e.g., no calibration is needed) and low cost (e.g., just one or two pairs of dry electrodes). In addition, integrating the operator's various psychophysiological indices, such as brain activity, will be key to enhancing flight safety. Furthermore, developing algorithms that accurately classify the pilot's cognitive state (alertness, fatigue, etc.) in real time from psychophysiological data remains an open issue (Harrivel et al., 2017). Finally, wearable eye trackers inside the aircraft could also be useful in aviation safety investigations. Advanced cockpit recorders could include pilots' oculomotor parameters and other psychophysiological indices of cognitive state (Wang, Li, & Lin, 2017). Another growing trend in aviation is commercial space travel. It is already starting to revolutionize the global aviation industry and demands fundamental changes in the pilot's tasks and training (Reddy, 2018).
The first human spaceflight, made by Yuri Gagarin in 1961, lasted almost two hours. Future space missions will last much longer, from months to years. Such missions will likely affect operators' cognitive state (Heard, Harriott, & Adams, 2018) and the way operators interact with highly automated systems (Bruder, Eißfeldt, Maschke, & Hasse, 2014). In particular, operators will undergo a multitude of adaptations to living in microgravity for extended periods (Donoviel, Zimmer, & Clayton, 2017). These adaptations will alter their eye movements (Uri, Linder, Moore, Pool, & Thornton, 1989) and can cause visual impairment upon return to Earth (Patel, Pass, Mason, Gibson, & Otto, 2018). Moreover, more than one hundred years of eye movement research findings might not apply directly to reduced-gravity environments (White et al., 2016). A new era of research, with new challenges and opportunities, is therefore coming (Figure 7.9). To conclude, eye tracking technologies have provided (and still provide) great opportunities for researchers. They help researchers better understand and improve flight training and the interaction between pilot and aircraft, and they make it possible to monitor the pilot's cognitive state. Even though pilots' eyes might not be a window to their souls, eye movements are unquestionably a clear window to their minds.

FIGURE 7.9 Cosmonaut Sergei Krikalev on the International Space Station with a head-mounted video-based eye tracking device (Source: Wikipedia).

Acknowledgments

Research by LLDS and CDP is funded by the CEMIX UGR-MADOC-SANTANDER (Project PIN 2018-15). Research by LLDS is supported by the Ramon y Cajal fellowship program (RYC-2015-17483). Research by CDP is supported by the UGR MediaLab research grant (Mlab-PP2016-03). We thank Dr. Jarrett (University of Utah) for proofreading the paper.

References

Albery, W. B. (1976). Recording pilot eye movement behavior: Approaches and possible applications (Report No. AFHRL-TR-75-81). Dayton, OH: Wright Air Development Center, Wright-Patterson AFB.
Bahill, A. T., Clark, M. R., & Stark, L. (1975). The main sequence, a tool for studying human eye movements. Mathematical Biosciences, 24, 191–204.
Barnes, J. A. (1970). Tactical utility helicopter information transfer study (Report No. 7–70). Aberdeen Proving Ground, MD: US Army Aberdeen Research & Development Center. Retrieved from: www.dtic.mil/dtic/tr/fulltext/u2/705594.pdf
Benitez, D. M., del Corte Valiente, A., & Lanzi, P. (2018). A novel global operational concept in cockpits under peak workload situations. Safety Science, 102, 38–50.
Borghini, G., Astolfi, L., Vecchiato, G., Mattia, D., & Babiloni, F. (2014). Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness. Neuroscience & Biobehavioral Reviews, 44, 58–75.
Bruder, C., Eißfeldt, H., Maschke, P., & Hasse, C. (2014). A model for future aviation: Operators monitoring appropriately. Aviation Psychology and Applied Human Factors, 4, 13–22.
Caldwell, J. A. (2012). Crew schedules, sleep deprivation, and aviation performance. Current Directions in Psychological Science, 21, 85–89.
Caldwell, J. A., Caldwell, J. L., Brown, D. L., & Smith, J. K. (2004). The effects of 37 hours of continuous wakefulness on the physiological arousal, cognitive performance, self-reported mood, and simulator flight performance of F-117A pilots. Military Psychology, 16, 163–181.
Carroll, M., Surpris, G., Strally, S., Archer, M., Hannigan, F., Hale, K., & Bennett, W. (2013). Enhancing HMD-based F-35 training through integration of eye tracking and electroencephalography technology. In D. D. Schmorrow & C. M. Fidopiastis (Eds.), Foundations of augmented cognition (pp. 21–30). Heidelberg, Germany: Springer-Verlag GmbH.
Chandler, J. F., Arnold, R. D., Phillips, J. B., Lojewski, R. A., & Horning, D. S. (2010). Preliminary validation of a readiness-to-fly assessment tool for use in naval aviation (Report No. NAMRL-10-22). Pensacola, FL: Naval Aerospace Medical Research Laboratory. Retrieved from: http://www.dtic.mil/docs/citations/ADA522106

Colin, J. (1978). Paul Bert (Report No. TM-75599). Washington, DC: NASA. Retrieved from: https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19790005516.pdf
Cymerman, A., Muza, S. R., Ditzler, D., Sharp, M., Friedlander, A., Hagobian, T., … & Fulco, C. (2003). Oculometer and pupillary reflexes during acute exposure to hypobaric hypoxia (Report No. T-03). Natick, MA: US Army Research Institute of Environmental Medicine. Retrieved from: http://www.dtic.mil/docs/citations/ADA411788
Di Nocera, F., Camilli, M., & Terenzi, M. (2007). A random glance at the flight deck: Pilots' scanning strategies and the real-time assessment of mental workload. Journal of Cognitive Engineering and Decision Making, 1, 271–285.
Di Stasi, L. L., Cabestrero, R., McCamy, M. B., Ríos, F., Catena, A., Quirós, P., … & Martinez-Conde, S. (2014). Intersaccadic drift velocity is sensitive to short-term hypobaric hypoxia. European Journal of Neuroscience, 39, 1384–1390.
Di Stasi, L. L., Catena, A., Canas, J. J., Macknik, S. L., & Martinez-Conde, S. (2013). Saccadic velocity as an arousal index in naturalistic tasks. Neuroscience & Biobehavioral Reviews, 37, 968–975.
Di Stasi, L. L., Diaz-Piedra, C., Rieiro, H., Sanchez Carrión, J. M., Berrido, M. M., Olivares, G., & Catena, A. (2016). Gaze entropy reflects surgical task load. Surgical Endoscopy, 30, 5034–5043.
Di Stasi, L. L., Marchitto, M., Antolí, A., & Cañas, J. J. (2013). Saccadic peak velocity as an alternative index of operator attention: A short review. Revue Européenne de Psychologie Appliquée/European Review of Applied Psychology, 63, 335–343.
Di Stasi, L. L., McCamy, M. B., Catena, A., Macknik, S. L., Canas, J. J., & Martinez-Conde, S. (2013). Microsaccade and drift dynamics reflect mental fatigue. European Journal of Neuroscience, 38, 2389–2398.
Di Stasi, L. L., McCamy, M. B., Martinez-Conde, S., Gayles, E., Hoare, C., Foster, M., … & Macknik, S. L. (2016). Effects of long and short simulated flights on the saccadic eye movement velocity of aviators. Physiology & Behavior, 153, 91–96.
Diaz-Piedra, C., Catena, A., Fuentes, L. J., Cherino, A., Sáenz de Santa María, I., Suarez, J., & Di Stasi, L. L. (2016). Innovative tools based on eye movements to improve training in the military context: The EYES Project. In J. Serna, P. Sanchez-Andrada, & I. Alvarez (Eds.), Proceedings from the IV national conference on defense and security research and innovation, DESEi+d 2016 (pp. 1313–1320). Santiago de la Ribera, Murcia, Spain: Centro Universitario de la Defensa de San Javier.
Diaz-Piedra, C., Rieiro, H., Suarez, J., Rios-Tejada, F., Catena, A., & Di Stasi, L. L. (2016). Fatigue in the military: Towards a fatigue detection test based on the saccadic velocity. Physiological Measurement, 37, 62–75.
Dodge, R. (1903). Five types of eye movement in the horizontal meridian plane of the field of regard. American Journal of Physiology, 8, 307–329.
Dodge, R. (1904). The participation of eye movements and the visual perception of motion. Psychological Review, 11, 1–14.
Dodge, R., & Cline, T. S. (1901). The angle velocity of eye movements. Psychological Review, 8, 145–157.
Donoviel, D. B., Zimmer, C. N., & Clayton, R. (2017). A novel space ocular syndrome is driving technology advances on and off the planet. In T. George, A. K. Dutta, & M. S. Islam (Eds.), Proceedings of micro- and nanotechnology sensors, systems, and applications IX (Volume 10194, pp. 1019427/1–1019427/18). Anaheim, CA: SPIE.

Duchowski, A. T. (2017). Eye tracking methodology: Theory and practice (3rd ed.). Cham, Switzerland: Springer International Publishing AG.
Fitts, P. M., Jones, R. E., & Milton, J. L. (1950). Eye movements of aircraft pilots during instrument-landing approaches. Aeronautical Engineering Review, 9, 24–29.
Foroughi, C. K., Sibley, C., & Coyne, J. T. (2017). Pupil size as a measure of within-task learning. Psychophysiology, 54, 1436–1443.
Gainer, C. A., & Obermayer, R. W. (1964). Pilot eye fixations while flying selected maneuvers using two instrument panels. Human Factors, 6, 485–501.
Gilchrist, I. D. (2011). Saccades. In S. P. Liversedge, I. D. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye movements (pp. 85–94). New York: Oxford University Press.
Glaholt, M. (2014). Eye tracking in the cockpit: A review of the relationships between eye movements and the aviator's cognitive state (Report No. R153). Toronto, Ontario, Canada: Defence Research and Development Canada. Retrieved from: http://www.dtic.mil/docs/citations/AD1000097
Goldstein, E. B. (Ed.). (2010). Encyclopedia of perception (Vol. 1). Thousand Oaks, CA: Sage Publications.
Grabowski, M., Rowen, A., & Rancy, J. P. (2018). Evaluation of wearable immersive augmented reality technology in safety-critical systems. Safety Science, 103, 23–32.
Griffith, C. R. (1920). The organic effects of repeated bodily rotation. Journal of Experimental Psychology, 3, 15–47.
Harris, R. L., Glover, B. J., & Spady, A. A. (1986). Analytical techniques of pilot scanning behavior and their application (Report No. NASA TP-2525). Hampton, VA: NASA Langley Research Center. Retrieved from: https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19860018448.pdf
Harris, R. L., Tole, J. R., Ephrath, A. R., & Stephens, A. T. (1982). How a new instrument affects pilots' mental workload. In Proceedings of the human factors society annual meeting (pp. 1010–1013). Santa Monica, CA: Human Factors and Ergonomics Society.
Harrivel, A. R., Stephens, C. L., Milletich, R. J., Heinich, C. M., Last, M. C., Napoli, N. J., … Pope, A. T. (2017). Prediction of cognitive states during flight simulation using multimodal psychophysiological sensing. In American Institute of Aeronautics and Astronautics Information Systems-AIAA Infotech@Aerospace (AIAA 2017-1135). Grapevine, TX: AIAA.
Haslbeck, A., & Zhang, B. (2017). I spy with my little eye: Analysis of airline pilots' gaze patterns in a manual instrument flight scenario. Applied Ergonomics, 63, 62–71.
Heard, J., Harriott, C. E., & Adams, J. A. (2018). A survey of workload assessment algorithms. IEEE Transactions on Human-Machine Systems, 48, 434–451.
Hoar, R. M. (1982). Embryology of the eye. Environmental Health Perspectives, 44, 31–34.
Hollomon, M. J., Kratchounova, D., Newton, D. C., Gildea, K., & Knecht, W. R. (2017). Current status of gaze control research and technology literature (Report No. DOT/FAA/AM-17/4). Oklahoma City, OK: Civil Aerospace Medical Institute.
Huey, E. B. (1898). Preliminary experiments in the physiology and psychology of reading. American Journal of Psychology, 9, 575–586.
Jacob, R. J., & Karn, K. S. (2003). Eye tracking in human-computer interaction and usability research: Ready to deliver the promises. In J. Hyona, R. Radach, & H. Deubel (Eds.), The mind's eye: Cognitive and applied aspects of eye movement research (pp. 573–605). Amsterdam, The Netherlands: Elsevier Science B.V.

Jarodzka, H., Scheiter, K., Gerjets, P., & van Gog, T. (2010). In the eyes of the beholder: How experts and novices interpret dynamic stimuli. Learning and Instruction, 20, 146–154.
Javal, L. É. (1879). Essai sur la physiologie de la lecture. Annales d'Oculistique, 82, 242–253.
Jones, D. H., Coates, G. D., & Kirby, R. H. (1982). The effectiveness of incorporating a real-time oculometer system in a commercial flight training program (Report No. NASA-CR-3667). Hampton, VA: NASA Langley Research Center. Retrieved from: https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19830012281.pdf
Jones, D. H., Coates, G. D., & Kirby, R. H. (1983). The effectiveness of an oculometer training tape on pilot and copilot trainees in a commercial flight training program (Report No. NASA-CR-3666). Hampton, VA: NASA Langley Research Center. Retrieved from: https://ntrs.nasa.gov/search.jsp?R=19830012282
Jones, R. E., Milton, J. L., & Fitts, P. M. (1949). Eye fixations of aircraft pilots: A review of prior eye-movement studies and a description of a technique for recording the frequency, duration, and sequence of eye fixations during instrument flight (Report No. 5837). Dayton, OH: Wright Air Development Center, Wright-Patterson AFB.
Judd, C. H., McAllister, C. N., & Steele, W. M. (1905). General introduction to a series of studies of eye movements by means of kinetoscopic photographs. Psychological Review Monographs, 7, 1–16.
Kowalczuk, K. P., Gazdzinski, S. P., Janewicz, M., Gąsik, M., Lewkowicz, R., & Wyleżoł, M. (2016). Hypoxia and Coriolis illusion in pilots during simulated flight. Aerospace Medicine and Human Performance, 87, 108–113.
Krebs, M. J., Wingert, J. W., & Cunningham, T. (1977). Exploration of an oculometer-based model of pilot workload (Report No. NASA-CR-145153). Washington, DC: NASA. Retrieved from: https://ntrs.nasa.gov/search.jsp?R=19770017823
LeDuc, P. A., Greig, J. L., & Dumond, S. L. (2005). Involuntary eye responses as measures of fatigue in US Army Apache aviators. Aviation, Space, and Environmental Medicine, 76, C86–C91.
McCamy, M. B., Otero-Millan, J., Di Stasi, L. L., Macknik, S. L., & Martinez-Conde, S. (2014). Highly informative natural scene regions increase microsaccade production during visual scanning. Journal of Neuroscience, 34, 2956–2966.
McCarley, J. S., & Kramer, A. F. (2006). Eye movements as a window on perception and cognition. In R. Parasuraman & M. Rizzo (Eds.), Neuroergonomics: The brain at work (pp. 95–112). New York: Oxford University Press.
Miller, S. M., Kirlik, A., Kosorukoff, A., & Byrne, M. A. (2004). Ecological validity as a mediator of visual attention allocation in human-machine systems (Report No. AHFD-04-17/NASA-04-6). Savoy, IL: University of Illinois at Urbana-Champaign. Retrieved from: http://aviation.illinois.edu/avimain/papers/research/report/
Milton, J. L., Jones, R. E., & Fitts, P. M. (1949). Eye fixations of aircraft pilots: Frequency, duration, and sequence of fixations when flying the USAF instrument low approach systems (ILAS) (Report No. 5839). Dayton, OH: Wright Air Development Center, Wright-Patterson AFB. Retrieved from: http://contrails.iit.edu/reports/1524
Patel, N., Pass, A., Mason, S., Gibson, C. R., & Otto, C. (2018). Optical coherence tomography analysis of the optic nerve head and surrounding structures in long-duration International Space Station astronauts. JAMA Ophthalmology, 136, 193–200.

Petrassi, F. A., Hodkinson, P. D., Walters, P. L., & Gaydos, S. J. (2012). Hypoxic hypoxia at moderate altitudes: Review of the state of the science. Aviation, Space, and Environmental Medicine, 83, 975–984.
Peysakhovich, V., Vachon, F., & Dehais, F. (2017). The impact of luminance on tonic and phasic pupillary responses to sustained cognitive load. International Journal of Psychophysiology, 112, 40–45.
Portman-Tiller, C. A. (1998). The Fitness Impairment Test (FIT): A first look (Report No. NAMRL-1401). Pensacola, FL: Naval Aerospace Medical Research Laboratory. Retrieved from: http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA350435
Previc, F. H., Lopez, N., Ercoline, W. R., Daluz, C. M., Workman, A. J., Evans, R. H., & Dillon, N. A. (2009). The effects of sleep deprivation on flight performance, instrument scanning, and physiological arousal in pilots. The International Journal of Aviation Psychology, 19, 326–346.
Reddy, M. V. (2018). A space odyssey. New Vistas, 3, 40–46.
Ruigrok, R. C., & Hoekstra, J. M. (2007). Human factors evaluations of free flight: Issues solved and issues remaining. Applied Ergonomics, 38, 437–455.
Sem-Jacobsen, C. W., Nilseng, O., Patten, C., & Eriksen, O. (1959). Electroencephalographic recording in simulated combat flight in a jet fighter plane; the pilot's level of consciousness. Electroencephalography and Clinical Neurophysiology, 11, 154–155.
Senders, J. W. (1966). A re-analysis of the pilot eye-movement data. IEEE Transactions on Human Factors in Electronics, 7, 103–106.
Smith, R. (1738). A compleat system of opticks in four books. Cambridge, UK: Printed for the author.
Spady, A. A., Jones, D. H., Coates, G. D., & Kirby, R. H. (1982). The effectiveness of using real-time eye scanning information for pilot training. In Proceedings of the Human Factors Society Annual Meeting (pp. 1014–1017). Santa Monica, CA: Human Factors and Ergonomics Society.
Stepanek, J., Pradhan, G. N., Cocco, D., Smith, B. E., Bartlett, J., Studer, M., … & Cevette, M. J. (2014). Acute hypoxic hypoxia and isocapnic hypoxia effects on oculometric features. Aviation, Space, and Environmental Medicine, 85, 700–707.
Stern, J. A., & Bynum, J. A. (1970). Analysis of visual search activity in skilled and novice helicopter pilots. Aerospace Medicine, 41, 300–305.
Stern, J. A., Boyer, D., & Schroeder, D. (1994). Blink rate: A possible measure of fatigue. Human Factors, 36, 285–297.
Sullivan, J., Yang, J. H., Day, M., & Kennedy, Q. (2011). Training simulation for helicopter navigation by characterizing visual scan patterns. Aviation, Space, and Environmental Medicine, 82, 871–878.
Temme, L., Reeves, D., Dennis, R., Bleiberg, J., Levinson, D., & Kelly, J. (2011). Normobaric hypoxia as a cognitive stress test for mild traumatic brain injury: Oculometrics, pulse oximetry, and the self-report of symptom severity (Report No. 2016–2015). Fort Rucker, AL: US Army Aeromedical Research Laboratory. Retrieved from: http://www.dtic.mil/docs/citations/AD1031346
Thomas, L. C., Gast, C., Grube, R., & Craig, K. (2015). Fatigue detection in commercial flight operations: Results using physiological measures. Procedia Manufacturing, 3, 2357–2364.

Tiffin, J., & Bromer, J. (1943). Analysis of eye fixations and patterns for eye movement in landing a Piper Cub J-3 airplane (Report No. 14). Washington, DC: CAA Division of Research.
Tole, J. R., Stephens, A. T., Vivaudou, M., Harris, R. L., & Ephrath, A. R. (1982). Entropy, instrument scan, and pilot workload. In Proceedings of the international conference on cybernetics and society (pp. 588–592). New York, NY: IEEE.
Tvaryanas, A. P. (2004). Visual scan patterns during simulated control of an uninhabited aerial vehicle (UAV). Aviation, Space, and Environmental Medicine, 75, 531–538.
Uri, J. J., Linder, B. J., Moore, T. P., Pool, S. L., & Thornton, W. E. (1989). Saccadic eye movements during space flight (Report No. TM-100475). Houston, TX: NASA Lyndon B. Johnson Space Center. Retrieved from: https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19890012092.pdf
van de Merwe, K., van Dijk, H., & Zon, R. (2012). Eye movements as an indicator of situation awareness in a flight simulator experiment. The International Journal of Aviation Psychology, 22, 78–95.
van der Post, J., Noordzij, L. A. W., de Kam, M. L., Blauw, G. J., Cohen, A. F., & van Gerven, J. M. A. (2002). Evaluation of tests of central nervous system performance after hypoxemia for a model for cognitive impairment. Journal of Psychopharmacology, 16, 337–343.
van Dijk, H., van de Merwe, K., & Zon, R. (2011). A coherent impression of the pilots' situation awareness: Studying relevant human factors tools. The International Journal of Aviation Psychology, 21, 343–356.
van Gog, T., & Jarodzka, H. (2013). Eye tracking as a tool to study and enhance cognitive and metacognitive processes in computer-based learning environments. In R. Azevedo & V. Aleven (Eds.), International handbook of metacognition and learning technologies (pp. 143–156). New York: Springer.
Wang, T., Li, W.-C., & Lin, J. J. H. (2017). Lessons learned from aviation occurrence: Integrated pilots' visual parameters into cockpit recorders for accident investigation and prevention. Retrieved from International Society of Air Safety Investigators 2017 Technical Papers: http://www.isasi.org/Library/technical-papers.aspx
Weibel, N., Fouse, A., Emmenegger, C., Kimmich, S., & Hutchins, E. (2012). Let's look at the cockpit: Exploring mobile eye-tracking for observational research on the flight deck. In Proceedings of the symposium on eye tracking research and applications (pp. 107–114). New York: Association for Computing Machinery.
Wetzel, P. A., Anderson, G. M., & Barelka, B. A. (1998). Instructor use of eye position based feedback for pilot training. In Proceedings of the human factors and ergonomics society annual meeting (pp. 1388–1392). Santa Monica, CA: Human Factors and Ergonomics Society.
White, O., Clément, G., Fortrat, J. O., Pavy-LeTraon, A., Thonnard, J. L., Blanc, S., … Paloski, W. H. (2016). Towards human exploration of space: The THESEUS review series on neurophysiology research priorities. NPJ Microgravity, 2, 16023.
Wickens, C. D. (2002). Situation awareness and workload in aviation. Current Directions in Psychological Science, 11, 128–133.
Wilmer, W. H. (1920). Aviation Medicine in the A.E.F. (Report No. 1004). Washington, DC: Government Printing Office. Retrieved from: https://archive.org/stream/aviationmedicin01wilmgoog/aviationmedicin01wilmgoog_djvu.txt
Wilmer, W. H., & Berens, C. (1918). Medical studies in aviation. V. The effect of altitude on ocular functions. Journal of the American Medical Association, 71, 1394–1398.

Wilson, G. F., & Fisher, F. (1991). The use of cardiac and eye blink measures to determine flight segment in F4 crews. Aviation, Space, and Environmental Medicine, 62, 959–962.
Yu, C. S., Wang, E. M. Y., Li, W. C., Braithwaite, G., & Greaves, M. (2016). Pilots’ visual scan patterns and attention distribution during the pursuit of a dynamic target. Aerospace Medicine and Human Performance, 87, 40–47.
Ziv, G. (2016). Gaze behavior and visual attention: A review of eye tracking studies in aviation. The International Journal of Aviation Psychology, 26, 75–104.

8 Human Performance Assessment: Evaluation of Wearable Sensors for Monitoring Brain Activity

Kurtulus Izzetoglu and Dale Richards

CONTENTS
Maintaining the Objective: Assessing Cognitive State
Assessment and Measurement
An Investigation of Optical Brain Imaging Sensor in Performance Assessment
Principles of fNIRS in Brain Activity Assessment
Review of fNIRS Application in Aviation
Sensor Operator
Air Traffic Control Operator
Discussion
References

The aviation domain is changing. Initiatives within the Next Generation Air Transportation System (NextGen) and Single European Sky ATM Research (SESAR) alone require attention to how the human element will integrate seamlessly with advanced technologies and complex procedures. Demand for unmanned systems is increasing, and these systems pose challenges for existing aerospace infrastructure. This gives us an evolving picture of the future of aerospace. It also poses an equal challenge: ensuring that the operators and users of this system can use it safely and efficiently. Measuring cognitive factors is critical in assessing human performance within an aviation context. Key areas that must be examined are whether the human is overloaded in terms of mental workload, has sufficient situation awareness, and maintains an appropriate degree of vigilance. We have seen that there are a variety of ways to measure these aspects of human performance, and each has both pros and cons. However, while some performance and subjective metrics have remained stalwart assessment tools, advances in neurophysiological wearable technologies offer an additional means of addressing questions about human performance.

This chapter discusses the importance of measurement techniques in assessing various aspects of cognition that are prevalent in the aerospace domain. Any discussion of aviation psychology will inevitably include cognitive constructs such as attention and cognitive workload and their relationship with human performance in safety-critical environments. In particular, we focus on the use of non-invasive wearable sensors to examine brain activity as a correlate of human performance. By using this technology, we attempt to bridge the gap between cognition and measurement. We also focus on what we believe are significant improvements in neurophysiological assessment techniques. It is also important at this stage to outline the key cognitive areas of interest when exploring the correlation between physiological state changes and psychological constructs. We describe a number of studies in which wearable systems, specifically functional near-infrared spectroscopy (fNIRS)-based systems, are used to assess the cognitive state of different operators. We then discuss the potential advantages and challenges of implementing such sensors in operational settings.

We have seen a marked increase in the availability of smart technology with varying levels of integrated sensing, designed to measure aspects of our physiology as we go about our daily activities. In particular, the neuroergonomics approach introduced by Parasuraman (Parasuraman & Wilson, 2008) paved the way for widespread use of wearable physiological monitoring sensors and real-time analytic techniques. These enable objective assessment of human operators’ neurophysiological state in field settings (Parasuraman & Rizzo, 2008).
These technologies are now not only everywhere, but we tend to expect them even in devices that are not acquired for that particular purpose. It is therefore fair to say that the means to assess our neurophysiological state is rapidly becoming ubiquitous. Moreover, this trend gives scientists an opportunity to apply such technologies in more contextually realistic and dynamic environments.

Traditionally, performance and subjective metrics have been used to evaluate the cognitive status and capacities of the crew in the cockpit and of those operating ground control stations. However, advances in wearable physiological technologies could provide additional metrics derived directly from brain-based measures. Such metrics could help validate performance and subjective assessments and ultimately bring us closer to maintaining safe and effective performance. These techniques may also aid the design and evaluation of new technologies being adopted to increase operational capacity, efficiency, and safety across the aerospace domain. For example, measuring an operator’s brain activity in real time can help evaluate decision making. It can also help compare the workload burden of next-generation systems with that of legacy systems in the air transportation domain.

Civilian pilots, air traffic controllers, and ground controllers are all increasingly required to use larger amounts of data and more complex systems, such as those being developed under a number of NextGen and SESAR initiatives. Hence, we are likely to observe an increase in the information-processing load and decision-making demands on aviation personnel. The human element within any future concept remains a critical point. It may be seen either as a point of failure or as a means by which these new technologies are optimized. It is therefore important to consider how we assess such technologies. We must also consider how the human interacts with them and ultimately arrives at decisions.

The last decade has seen significant advances in physiological monitoring techniques and, in particular, their integration into mobile devices. One aspect of this has been the increase in wearable human performance monitoring technologies that can be used to evaluate the cognitive state and capacities of the flight crew and ground operators. Non-invasive wearable technologies offer the potential to assess operator performance using brain-based measures. Currently, the most widely used brain activity measures are functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), electroencephalography (EEG), and fNIRS. EEG measures of workload and task difficulty have been reported in studies of several groups:
• air traffic controllers (Brookings, Wilson, & Swain, 1996)
• airline pilots (Sterman & Mann, 1995)
• drivers (Brookhuis & De Waard, 1993)
• participants performing cognitive tasks (Berka, Levendowski, Cvetinovic, Petrovic, Davis, Lumicao, Zivkovic, Popovic, & Olmstead, 2004)
Further, fMRI is widely used to study the functional organization of the human brain and has been shown to map changes in brain hemodynamics produced by human mental tasks (Logothetis & Wandell, 2004).
fMRI is of limited use in real-world operations, where participants perform real-world tasks, because of the restrictions it imposes on participants, who must lie still inside the scanner. In contrast, fNIRS has been introduced as a new neuroimaging modality for conducting functional brain studies (Boas, Franceschini, Dunn, & Strangman, 2002; Chance et al., 1998; Obrig & Villringer, 1997). This chapter focuses on fNIRS measures. We discuss the principles of fNIRS in relation to application and calibration. We then highlight their potential contribution to reliable and objective assessment of cognitive performance.

Maintaining the Objective: Assessing Cognitive State

The aerospace industry is regarded as one of the safest transport domains, with a constantly improving safety record (Harris, 2014). Consider the different roles and responsibilities we ask of the humans who operate across the national airspace system (NAS). We can then appreciate the diversity of tasks and systems they must use. When tasks become complex or laborious, or when demands increase dramatically, automation is commonly applied. We have seen a rise in the effective use of automation within aerospace applications. Even so, it is fair to say that the human will remain responsible for making critical decisions based on the information presented.

If we are asked to consider the key human factors elements associated with the aviation industry, aspects of cognitive processing would unsurprisingly feature heavily on that list. The discipline of human factors (HF) within the aviation domain has given us a good understanding of the cognitive processes involved in aviation operations. This work has predominantly focused on manned and unmanned aviation and on the critical management task performed by Air Traffic Control Operators (ATCOs). To better understand human performance in relation to human–machine interaction, we believe that fNIRS offers a way to assess a key component of human information processing (HIP): mental workload (MW). But we must first consider the way in which humans process information. To start at the beginning, we can describe in general terms the core aspects of HIP. These relate to how information travels from the environment to the human and how he or she then acts on that information.
This, in turn, can be further broken down into three key stages:
(1) encoding data from the environment
(2) processing the data into meaningful information we can use
(3) executing actions as a result of the first two steps

This sounds like a simple mechanistic approach. However, we must remember that all of these activities must take place rapidly across different dynamic memory systems: sensory memory, short-term memory (often referred to as working memory), and long-term memory (Atkinson & Shiffrin, 1968, 1971). These distinct memory systems help us understand how we attend to sensory stimuli before we register and encode aspects of the information (see Figure 8.1).

Of course, how we process information depends partly on the characteristics of the information and on the specific requirements of the task. This further determines how attentional resources are used within the context of the task demand and which stimuli are attended to (Baddeley & Hitch, 1974; Baddeley, 2003). Inevitably, this constrains how humans process and store information, more so when they are confronted with dynamic and complex tasks. Unsurprisingly, this constraint on HIP can lead to bottlenecks, in which pieces of information compete for the individual’s attention. The human brain adapts by selectively attending to certain information (very much dependent on the task context) while filtering out less relevant information (Moran & Desimone, 1985). We can all think back to an instance when we felt overwhelmed by a situation that affected our ability to act efficiently and in a timely manner. Whether or not that experience was within an aviation context, such an increase in MW is likely to raise the likelihood of human error and ultimately reduce system effectiveness (Moray, 1988).

FIGURE 8.1 Three-component memory model of information processing. (Adapted from Shiffrin, R.M. and Atkinson, R.C., Psychol. Rev., 76, 179–193, 1969.)

As with many cognitive constructs, there is no single agreed definition of MW. We can broadly agree, however, that it comprises a number of features:
• an input (or task load)
• the amount of effort required by the human to satisfy the task
• the actual performance of the human in doing the task (Jahns, 1973)

Clearly, the ability to assess an individual’s MW during critical tasks can provide important details about the nature of the task demand. Such details may then assist in the future design and integration of that system into an operational context. The key point here is that assessing MW requires a tangible value, which can be obtained with a range of techniques. Primarily, we can use observation and performance measures to determine whether the task has been completed successfully. Performance measures may be constructed from behavioral markers assigned to primary or secondary tasks. That is, quantifiable measures (such as holding altitude or maintaining safe separation) may be used to determine whether the individual is operating under higher or lower MW. A more direct account of the participant’s perceived effort can also be gathered using the many available subjective MW techniques.

Assessment and Measurement

We often hear of a new display, task, or technology that purports to improve human performance, with a battery of metrics used to explain the differences between the compared elements. This is what we refer to as traditional A/B testing, commonly used in HF evaluation work. However, applying HF metrics in the aviation domain requires additional validation and verification (V&V) of the system being assessed. The V&V process is a critical component of assessing system compliance with a set of requirements and standards defined by a stakeholder. These requirements are a defined set of statements that must be met through a number of acceptance criteria. Such criteria tend to concern system and human performance, safety, and regulatory standards. Castaneda (2013) suggests that "without concise, measurable and justifiable requirements the V&V process can become impossible to accomplish" (p. 2035). Castaneda goes on to state that the lack of HF within this process has made it very challenging to address system performance requirements. Castaneda also cites the NASA-TLX subjective workload scale as an example of a metric used out of context to define an acceptable level of MW. In such cases, the contextual influences that can affect the scale are not understood. Unfortunately, this practice is all too common when evaluators without HF expertise simply reach for the nearest metric without fully understanding the benefits and constraints of its application. Mental workload (and indeed situation awareness [SA]) are cognitive constructs that are somewhat elusive to direct measurement. However, all is not lost, as we may use a number of methods to assess these constructs.
These methods fall into three categories:
(1) subjective rating scales
(2) performance measures associated with primary and secondary tasks
(3) psychophysiological measures

It is generally recognized that multiple, complementary measures are needed to increase the validity of the assessment. All measures have specific strengths and drawbacks. For this reason, the context within which we conduct human performance assessments plays a pivotal role in how we attempt to measure cognitive processes. A number of factors should be considered when selecting metrics, and these have been discussed in various publications (see Tsang & Wilson, 1997; Wickens & Hollands, 2000):

• Sensitivity: Ability to discriminate between different variations in workload associated with a task
• Diagnosticity: Ability to differentiate between subsets of workload depending on the cognitive resource involved
• Intrusiveness: The degree to which the measure intrudes on the nature of the task
• Implementation: What is needed to integrate a technique prior to use, including training
• Acceptance: The degree to which users accept the metric in the context of application

An Investigation of Optical Brain Imaging Sensor in Performance Assessment

The safe and effective performance of aviation personnel depends on their ability to manage and maintain high levels of cognitive performance. A field-deployable optical brain imaging device can indicate a team member’s cognitive state and relative level of expertise for a given level of performance. It does so by monitoring cortical areas known to be associated with MW, training, and the development of expertise.

Near-infrared spectroscopy (NIRS) has been widely used in brain studies as a non-invasive tool to study changes in the concentration of oxygenated hemoglobin (oxy-Hb) and deoxygenated hemoglobin (deoxy-Hb). A functional brain activity assessment (fNIRS) system based on the NIRS technique has been deployed to monitor cognitive functions. It has been used particularly during attention and working memory tasks. It has also been used during complex tasks, such as piloting and air traffic control, performed by healthy individuals under operational conditions. fNIRS is a field-deployable, non-invasive optical brain monitoring technology. It provides a direct measure of cerebral hemodynamics from the forehead in response to sensory, motor, or cognitive activation. Using this technique, researchers have assessed several types of brain function, including motor and visual activation, auditory stimulation, and the performance of various cognitive tasks (Chance, Zhuang, UnAh, Alter, & Lipton, 1993; Gratton, Corballis, Cho, Fabiani, & Hood, 1995; Hoshi & Tamura, 1993; Izzetoglu, Bunce, Onaral, Pourrezaei, & Chance, 2004; Kato, Kamei, Takashima, & Ozaki, 1993; Villringer, Planck, Hock, Schleinkofer, & Dirnagl, 1993).

Principles of fNIRS in Brain Activity Assessment

Understanding brain energy metabolism and its associated neural activity is important for grasping the principles of fNIRS in assessing brain activity. The brain has small energy reserves, and the great majority of the energy used by brain cells goes to processes that sustain physiological functioning (Ames III, 2000). Ames III reviewed studies of brain energy metabolism as related to function. He reported that the oxygen (O2) consumption of the rabbit vagus nerve increased 3.4-fold when the nerve was stimulated at 10 Hz. He also reported that O2 consumption in rabbit sympathetic ganglia increased 40% with stimulation at 15 Hz. Furthermore, glucose utilization by various brain regions increased several-fold in response to physiological stimulation or to pharmacological agents that affect physiological activity (Ames III, 2000). These studies provide clear evidence that brain energy metabolism changes substantially when brain activity changes. Izzetoglu (2008) outlined the compounds involved in energy metabolism and the energy metabolites as follows:

• Brain cells consume energy when activated, and oxygen is required to metabolize glucose. The concentration of oxygen in the brain is about 0.1 μmol g⁻¹, of which 90% is in oxy-Hb in brain capillaries. This concentration can support normal oxygen consumption (about 3.5 μmol g⁻¹ min⁻¹) for only about 2 seconds (Ames III, 2000). For that reason, an increase in neural activity in the brain is followed by a rise in local cerebral blood flow (CBF) (Buxton, Uludağ, Dubowitz, & Liu, 2004).
• Oxygen is transported to neural tissue via oxygenated hemoglobin (oxy-Hb) in the blood.
• The oxygen exchange occurs in the capillary beds.
• As oxy-hemoglobin gives up oxygen, it is transformed into deoxygenated hemoglobin (deoxy-Hb).
• Local CBF increases much more than the cerebral metabolic rate of oxygen (CMRO2). Local blood therefore becomes more oxygenated, with less deoxy-Hb present (Buxton et al., 2004).
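The oxygen-reserve figure in the first bullet can be checked directly. The last bullet can be made concrete with the Fick principle, CMRO2 = CBF × CaO2 × OEF, where CaO2 is the arterial oxygen content and OEF is the oxygen extraction fraction. The Fick relation is not stated in this chapter. The 50% rise in CBF and 20% rise in CMRO2 used below are hypothetical round numbers chosen only for illustration, with CaO2 assumed unchanged. They are not measured values.

```latex
\begin{aligned}
t_{\text{reserve}} &= \frac{0.1\ \mu\text{mol g}^{-1}}{3.5\ \mu\text{mol g}^{-1}\,\text{min}^{-1}}
  \approx 0.029\ \text{min} \approx 1.7\ \text{s} \\[4pt]
\text{CMRO}_2 &= \text{CBF} \times C_a\text{O}_2 \times \text{OEF} \\[4pt]
\frac{\text{OEF}_{\text{active}}}{\text{OEF}_{\text{rest}}}
  &= \frac{\text{CMRO}_{2,\text{active}} / \text{CMRO}_{2,\text{rest}}}{\text{CBF}_{\text{active}} / \text{CBF}_{\text{rest}}}
  = \frac{1.2}{1.5} = 0.8
\end{aligned}
```

The first line gives roughly the 2 seconds quoted above. Under the assumed numbers in the last line, the active region extracts only 80% of the oxygen fraction it extracted at rest. A smaller fraction of the hemoglobin leaving the region is therefore deoxygenated.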

Building on this account of brain energy metabolism, methods and imaging modalities such as fNIRS and fMRI measure deoxy-Hb and/or oxy-Hb (Kwong, Belliveau, Chesler, Goldberg, Weisskoff, Poncelet, Kennedy, Hoppel, Cohen, & Turner, 1992; Ogawa et al., 1990). These measurements provide correlates of brain activity through neuronal oxygen consumption. Oxy-Hb and deoxy-Hb have characteristic optical properties in the visible and near-infrared light range. As a result, optical methods can measure the changes in their concentrations that accompany increases in brain activation. Most biological tissues are relatively transparent to near-infrared light between 700 and 900 nm. This is largely because water, a major component of most tissues, absorbs very little energy at these wavelengths. Within this window, the spectra of oxy-Hb and deoxy-Hb are distinct enough to allow separate spectroscopic measurement of both concentrations (Cope, 1991). This spectral band is often referred to as the "optical window" for the non-invasive assessment of brain activation (Jobsis, 1977).
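The sketch below illustrates how two wavelengths inside the optical window yield separate oxy-Hb and deoxy-Hb estimates. It uses the modified Beer–Lambert law, a common fNIRS formulation that this chapter does not spell out. At each wavelength λ, the change in optical density is

ΔOD(λ) = [ε_oxy(λ)·Δ[oxy-Hb] + ε_deoxy(λ)·Δ[deoxy-Hb]] · d · DPF(λ)

Here ε are extinction coefficients, d is the source–detector separation, and DPF is the differential pathlength factor. Two wavelengths give two equations in two unknowns. The names `Wavelength`, `hb_changes`, and `oxygenation` are hypothetical, and the coefficient values are made-up placeholders rather than published extinction coefficients. This is not the authors’ processing pipeline. "Oxygenation" is taken here as Δoxy-Hb minus Δdeoxy-Hb, which is one common convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wavelength:
    """Per-wavelength constants supplied by the caller (placeholders, not real spectra)."""
    eps_oxy: float    # extinction coefficient of oxy-Hb at this wavelength
    eps_deoxy: float  # extinction coefficient of deoxy-Hb at this wavelength
    dpf: float        # differential pathlength factor at this wavelength


def hb_changes(d_od_1, d_od_2, w1, w2, separation):
    """Solve the two-wavelength modified Beer-Lambert system.

    For each wavelength k:
        d_od_k = (eps_oxy_k * d_oxy + eps_deoxy_k * d_deoxy) * separation * dpf_k
    Returns (d_oxy, d_deoxy), the concentration changes of oxy-Hb and deoxy-Hb.
    """
    # Remove the effective optical pathlength so each equation is linear in the unknowns.
    b1 = d_od_1 / (separation * w1.dpf)
    b2 = d_od_2 / (separation * w2.dpf)
    det = w1.eps_oxy * w2.eps_deoxy - w1.eps_deoxy * w2.eps_oxy
    if abs(det) < 1e-12:
        # Identical (or proportional) spectra cannot separate the two chromophores.
        raise ValueError("wavelengths do not distinguish oxy-Hb from deoxy-Hb")
    # Cramer's rule for the 2x2 system.
    d_oxy = (b1 * w2.eps_deoxy - w1.eps_deoxy * b2) / det
    d_deoxy = (w1.eps_oxy * b2 - b1 * w2.eps_oxy) / det
    return d_oxy, d_deoxy


def oxygenation(d_oxy, d_deoxy):
    """One common convention: change in oxy-Hb minus change in deoxy-Hb."""
    return d_oxy - d_deoxy


if __name__ == "__main__":
    # Made-up coefficients: deoxy-Hb absorbs more at the shorter wavelength and
    # oxy-Hb at the longer one, mimicking only the qualitative shape of real spectra.
    short = Wavelength(eps_oxy=0.6, eps_deoxy=1.5, dpf=6.0)
    long_ = Wavelength(eps_oxy=1.2, eps_deoxy=0.8, dpf=6.0)
    d_oxy, d_deoxy = hb_changes(0.0, 7.92, short, long_, separation=3.0)
    print(d_oxy, d_deoxy, oxygenation(d_oxy, d_deoxy))
```

With these placeholder values, the example recovers approximately 0.5 and −0.2 for the two concentration changes. Cramer’s rule keeps the sketch dependency-free. With more than two wavelengths, a least-squares solve would be the usual generalization.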

Review of fNIRS Application in Aviation

In this section, we discuss the use of wearable brain sensors to assess human performance in two aviation contexts: that of an unmanned aerial vehicle (UAV) sensor operator and that of an ATCO.

Unmanned Aerial Vehicle Sensor Operator

The Predator UAV Ground Control Station (GCS) has what may be considered a pilot-centric display. The operator (or pilot operating the GCS) is presented with what is essentially a cockpit on the ground. The displays on the GCS replicate the information displays one would expect to find in a cockpit, including the use of a sidestick. The Predator team, however, has two crew members, the second of whom focuses on the sensor display. This individual’s role is to bring the sensor to bear on points of interest and to carry out visual searches using the sensor onboard the aircraft. Part of the flying and monitoring of the system would be automated (Chao, Cao, & Chen, 2010). This does not diminish the crew members’ responsibility to monitor the system as the flight progresses.

UAV operations tend to be sensor-centric and in many instances require a human placed in front of a sensor feed on the associated GCS. In this chapter, we do not focus on the different design paradigms that a GCS may align to. Instead, we highlight the nature of the human task. In this instance, it is the visual task of monitoring a sensor feed (normally near real-time full motion video [nrtFMV]). This is a significant task for the sensor operator (SO). The SO must accomplish the mission requirement and also manipulate the sensor to maximize the chances of obtaining the information needed to satisfy that requirement. Much information processing and decision making take place as the SO works the sensor. This includes making inputs to ensure the correct point is being captured, adjusting the sensor’s angle and azimuth, and timing the feed capture appropriately. The SO role is therefore far more complex than simply pointing a camera at a fixed point on the ground. We have adopted fNIRS (as shown in Figure 8.2) to assess the workload of UAV pilots (Izzetoglu, Ayaz, Hing, Shewokis, Bunce, Oh, & Onaral, 2015) and UAV sensor operators (Armstrong, Izzetoglu, & Richards, 2018).
The results of these studies indicated that fNIRS can be used to monitor MW changes during aerospace operations. They also showed that fNIRS could serve as an objective measure of expertise development, that is, of the transition from novice to expert during operator training (Ayaz, Shewokis, Bunce, Izzetoglu, Willems, & Onaral, 2012).

FIGURE 8.2 Functional near-infrared sensor used in the UAV sensor operator and Air Traffic Control Operator studies. A 16-channel sensor was used to monitor the prefrontal cortex.

In this UAV sensor operator (SO) study, we used Simlat’s C-STAR simulator system. It includes a Performance Analysis & Evaluation module (PANEL) that collects and processes simulation data and produces comprehensive reports of trainees’ performance on various mission tasks. The simulator can transfer views between the sensor operator and the pilot. It also provides a realistic landscape, targets, and accurate representations of UAV operator controls. The software allows two trainees and one instructor to operate the generic tactical unmanned vehicle (G-TAC UAV) simultaneously in designated roles. The instructor can also manually or automatically preset "emergency" situations that the pilot(s) might encounter, such as cloud cover, precipitation, and equipment failure. This system is well suited to training both the sensor operator and pilot roles for real-world UAV operations (Figure 8.3).

Participants were separated into low and high performers based on their performance data, so that fNIRS measurements could be compared across the trials. Participants who scanned more of the designated area within the camera’s field of view were placed in the high performer group. Participants who scanned less area and showed no improvement between the initial and final trials were classified as low performers. The simulator software recorded the camera’s field of view and zoom level for each participant. These data were used to calculate the percentage of the designated area that was successfully scanned. A successful scan was defined as a scan at a zoom level lower than 15 degrees as the UAV proceeded along the designated route. These areas and the designated route are shown in Figure 8.3. The fNIRS analysis focused on measures from the prefrontal cortex (PFC), a region associated with attention.
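The scan-coverage measure and the grouping described above can be sketched as follows. The chapter does not say how the simulator log is structured or exactly how coverage was computed, so this is a hypothetical reconstruction built on three assumptions:
• The designated area is discretized into grid cells.
• Each logged frame records the set of cells inside the camera footprint and the field-of-view angle in degrees.
• Only frames with a field of view below 15 degrees count toward a successful scan.

The names `Frame`, `scan_percentage`, and `classify`, and the cutoff-based grouping rule, are illustrative assumptions. They are not PANEL’s output format or the study’s exact procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One logged sensor frame (hypothetical log format)."""
    cells: frozenset  # grid cells of the area inside the camera footprint
    fov_deg: float    # field-of-view angle; smaller means more zoomed in


ZOOM_LIMIT_DEG = 15.0  # a frame counts only if its field of view is below 15 degrees


def scan_percentage(frames, designated_cells):
    """Percentage of designated cells seen at least once at adequate zoom."""
    designated = set(designated_cells)
    if not designated:
        raise ValueError("designated area is empty")
    covered = set()
    for frame in frames:
        if frame.fov_deg < ZOOM_LIMIT_DEG:
            # Ignore any footprint cells that fall outside the designated area.
            covered |= frame.cells & designated
    return 100.0 * len(covered) / len(designated)


def classify(initial_pct, final_pct, cutoff_pct):
    """Hypothetical grouping rule.

    'high' if final-trial coverage reaches the cutoff; 'low' if it does not and
    coverage did not improve from the initial trial; otherwise 'unclassified'.
    """
    if final_pct >= cutoff_pct:
        return "high"
    if final_pct <= initial_pct:
        return "low"
    return "unclassified"
```

A grid keeps "area scanned" a simple set union. Finer cells approximate the camera footprint more closely at the cost of larger logs.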
We calculated oxygenation changes for low and high performers. Figure 8.4 depicts oxygenation changes from the optode located over the middle frontal gyrus of the right hemisphere, an area known to be associated with attention (Izzetoglu, Bunce, Izzetoglu, Onaral, & Pourrezaei, 2007).

FIGURE 8.3 UAV Simulator: C-STAR system and task screen. The picture on the left shows the trainees’ screen, including the payload view, and the image on the right depicts the designated flight route where the search/scanning task was performed. Provided by Simlat, Inc. (Miamisburg, Ohio; www.simlat.com)

FIGURE 8.4 fNIRS results for high and low performers between initial trial and final trial.

In this preliminary study, localized oxygenation changes over the middle frontal gyrus of the right hemisphere were notable during this attention-demanding scanning task. Another key finding is that the oxygenation changes were larger for the high performers as their scanning performance improved over three trials (Armstrong et al., 2018). In contrast, the low performers’ oxygenation changes remained small, as would be expected from their scanning task performance. When interpreting this result, note that low oxygenation does not always mean a lack of cognitive effort. For example, we found that the high performers’ oxygenation levels decreased as they became more proficient at the scanning task over time. This trend for the high performers is consistent with previously reported fNIRS studies of unmanned and manned aircraft pilots acquiring new skills (Ayaz et al., 2012; Izzetoglu et al., 2015; Menda et al., 2011). In those studies, as participants became familiar with the task, oxygenation levels in the PFC decreased. This was not seen in the low performers in this preliminary study, for whom there were no significant differences between the initial and final trials. This may be because they did not become more proficient over time, as their final-trial scan percentage showed no improvement over the first trial. Given the small sample size, which limits the power of these analyses, the influence of these group differences on the neurophysiological measures remains for a future study to explore.

Air Traffic Control Operator

Harris (2005) provides a brief overview of the Air Traffic Controller (ATC) task and indicates the high cognitive demand placed on this safety-critical role. The role of the ATCO shows marked similarities with that of the UAV operator in terms of higher cognitive functions. It primarily involves actively managing traffic passing through different airspace types while ensuring conflict resolution and coordination. This is largely a visual scanning task affected by factors such as attention, vigilance, and fatigue. Throughout, the operator must manage their MW and maintain SA. Our work with fNIRS has allowed us to advance this technique toward deployment in the field, where operators can be assessed in their normal working conditions.

In a study with the Federal Aviation Administration (FAA), we explored how different aircraft counts affect air traffic control operators’ behavior and mental workload (Harrison, Izzetoglu, Ayaz, Willems, Hah, Ahlstrom, Woo, & Onaral, 2014). The airspace used in the experiment consisted of two active high-altitude sectors, Sectors 20 and 22 of the Kansas City Center (ZKC). Traffic scenarios were developed from samples extracted from the Aircraft Situation Display to Industry (ASDI) feed to ZKC. The traffic was filtered to include only aircraft that crossed a 300 × 300 nautical mile volume of airspace containing the experimental sectors. The traffic was then modified so that its volume increased steadily during each run, from a low monitor alert parameter (MAP) to a high MAP. The FAA has described the MAP as the number of aircraft that a sector or airport can accommodate without degraded efficiency during a specific period of time. That is, at 100% MAP volume, the airspace cannot accommodate another aircraft without a decrease in safety and efficiency.
In Sector 20, raising the traffic from 33% to 100% of the MAP value increases the aircraft count from about 6 aircraft at the beginning of each session to 19 aircraft at the end. For the oxygenation changes measured by fNIRS, contrasts of the main within-subject factor, aircraft count, indicate statistically significant differences between aircraft count blocks (see Figure 8.5). Credible differences were found for the contrasts between the adjacent blocks “13–15 vs. < 13,” “16–18 vs. 13–15,” and “19–21 vs. 16–18” (Harrison et al., 2014).
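The relation between MAP percentage and aircraft count, and the adjacent-block contrasts described above, can be checked with simple arithmetic. The following Python sketch is illustrative only. It assumes a 100% MAP value of about 19 aircraft for Sector 20, which is inferred from the end-of-session count rather than stated as such by Harrison et al. (2014). The names `SECTOR_20_MAP`, `aircraft_at_map`, `count_block`, and `adjacent_contrasts` are hypothetical.

```python
# Illustrative sketch only: the 19-aircraft MAP value for Sector 20 is an
# assumption inferred from the end-of-session count described in the text.
SECTOR_20_MAP = 19  # hypothetical: aircraft count at 100% MAP

# Aircraft-count blocks used for the fNIRS contrasts, lowest first.
BLOCKS = ["<13", "13-15", "16-18", "19-21"]


def aircraft_at_map(map_percent: float, map_value: int = SECTOR_20_MAP) -> float:
    """Return the aircraft count corresponding to a percentage of MAP."""
    return map_percent / 100 * map_value


def count_block(aircraft_count: int) -> str:
    """Assign an aircraft count to one of the analysis blocks."""
    if aircraft_count < 13:
        return "<13"
    if aircraft_count <= 15:
        return "13-15"
    if aircraft_count <= 18:
        return "16-18"
    if aircraft_count <= 21:
        return "19-21"
    # Hypothetical handling: the text describes no block above 21 aircraft.
    raise ValueError("counts above 21 fall outside the described blocks")


def adjacent_contrasts() -> list[tuple[str, str]]:
    """Pair each block with the block just below it (higher vs. lower)."""
    return list(zip(BLOCKS[1:], BLOCKS[:-1]))


if __name__ == "__main__":
    print(round(aircraft_at_map(33)))  # start of session: about 6 aircraft
    print(aircraft_at_map(100))        # end of session: 19 aircraft
    for high, low in adjacent_contrasts():
        print(f"{high} vs. {low}")
```

Run as a script, the sketch prints about 6 aircraft at 33% MAP, 19.0 at 100% MAP, and the three adjacent-block pairs listed in the text. In an actual analysis, each pair would index a contrast in mean oxygenation between two blocks. That statistical step is not shown here.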

FIGURE 8.5 Mean relative oxygenation changes with increasing aircraft count. (Adapted from Harrison, J. et al., IEEE Trans. Hum. Mach. Syst., 44, 429–440, 2014). Error bars = +/− 1 SE.

Discussion

Advances in neurophysiology and neuro-monitoring technologies have demonstrated that changes in physiology associated with different tasks can be assessed explicitly. These changes may relate to instances where the human is confronted with high cognitive loading, or to events that can be identified as leading to a change in situation awareness. Such measurements may also be used to develop adaptive, personalized training regimes and to provide indicative markers associated with expertise development. Therefore, before deciding which metric to use, we must consider the context within which the measurement is to be applied, what exactly we are attempting to measure, and so on. Once these requirements are established, we can begin to address the robustness of these neurophysiological biometrics in terms of validity and reliability. We have seen that HIP describes the cognitive processes by which humans make sense of surrounding information and how elements of this process can affect decision making. We use cognitive models to explain the nature of this process, have defined the practical aspects of memory that play a significant role in an aviation context, and draw upon constructs such as MW to understand human performance limitations and benefits. However, we have seen that these cognitive attributes are somewhat elusive to measure, and that attempts to measure them raise issues of validity, namely whether the measures do indeed assess what we think they are designed to assess (Castaneda, 2013). The HF toolset at our disposal is large, and a number of different techniques can be used, although several pertinent factors should be considered when selecting among them.
In this chapter, we have applied wearable optical brain imaging techniques to demonstrate their effectiveness in obtaining data that can be used in conjunction with existing metrics or as a means of validating other scales. Improvements in wearable technologies give the experimenter the ability to assess the human in a dynamic context, where data are related directly to the task being performed and can be collected in naturalistic settings. Devices that specifically target the neurophysiological mechanisms associated with brain activity are of particular interest, as these measurements could provide a real-time correlation between higher cognitive function and the task the human is conducting. In this chapter, we have seen fNIRS applied to both the UAV sensor operator (SO) and the air traffic control operator. We have discussed studies that demonstrate how fNIRS can be applied to the UAV SO role and that revealed that participants’ scanning behavior was directly related to changes in oxygenation levels in different regions of the PFC. This finding supports previous suggestions that this region of the brain supports executive functions such as visual attention (see Fan, McCandliss, Fossella, Flombaum, & Posner, 2005). Stimulation of this area has also been found to enhance cognitive performance in relation to attentional demand (Weiss & Lavidor, 2012). When we apply the same technology to assess ATCO behavior, we observe an increase in oxygenation within the inferior frontal gyrus region of the PFC.
This part of the brain has been associated with working memory (Becker, Androsch, Jahn, Alich, Striepens, Markett, Maier, & Hurlemann, 2013; Luo & Niki, 2000). Monitoring activation in this area may therefore allow us to assess how information is presented to an operator, or even how training can promote better use of cognitive strategies associated with working memory. It is worth noting that self-report metrics may provide only one side of the story, in that they are subjective assessments and sometimes do not reveal the full picture. Both subjective and objective metrics clearly have a role to play here, but we must exercise caution and not focus on one metric only. Indeed, some studies have revealed contrasting results when subjective and physiological metrics of MW are compared (e.g., Richards, Scott, Furness, Lamb, Jordan, & Moore, 2016). Similarly, subjective and performance measures sometimes dissociate, albeit in a lawful manner (Vidulich & Wickens, 1986). There have also been observations suggesting that subjective metrics, such as the NASA-TLX, can be limited by individual differences in introspection skills (Paulhus & Vazire, 2007). Chen, Lee, and Stevenson (1995) go so far as to suggest that this limitation may be observed at a cultural level, whereby individuals instructed to report perceived feelings of cognitive state find those feelings difficult to articulate. This chapter has provided insights into advances in wearable sensors and how they may be used to measure physiological state changes. These sensors represent an exciting opportunity to explore the psychology–physiology divide.
Brain imaging measures allow us to add to our growing human performance toolkit. When used with a battery of other metrics, including both performance and subjective measures, they provide us with a more in-depth understanding of human-system performance.

References

Ames, A., III. (2000). CNS energy metabolism as related to function. Brain Research Reviews, 34, 42–68.
Armstrong, J., Izzetoglu, K., & Richards, D. (2018). Using functional near infrared spectroscopy to assess cognitive performance of UAV sensor operators during route scanning. In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies – Volume 3 (pp. 286–293). Madeira, Portugal: SciTePress.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), Psychology of learning and motivation (pp. 89–195). New York, NY: Academic Press.
Atkinson, R. C., & Shiffrin, R. M. (1971). The control of short-term memory. Scientific American, 225(2), 82–90.
Ayaz, H., Shewokis, P. A., Bunce, S., Izzetoglu, K., Willems, B., & Onaral, B. (2012). Optical brain monitoring for operator training and mental workload assessment. NeuroImage, 59, 36–47.
Baddeley, A. (2003). Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4, 829–839.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), Psychology of learning and motivation: Advances in research and theory (Vol. 8, pp. 47–89). New York, NY: Academic Press.
Becker, B., Androsch, L., Jahn, R. T., Alich, T., Striepens, N., Markett, S., Maier, W., & Hurlemann, R. (2013). Inferior frontal gyrus preserves working memory and emotional learning under conditions of impaired noradrenergic signaling. Frontiers in Behavioral Neuroscience, 7, 197.
Berka, C., Levendowski, D. J., Cvetinovic, M. M., Petrovic, M. M., Davis, G., Lumicao, M. N., Zivkovic, V. T., Popovic, M. V., & Olmstead, R. (2004). Real-time analysis of EEG indexes of alertness, cognition, and memory acquired with a wireless EEG headset. International Journal of Human-Computer Interaction, 17, 151–170.
Boas, D. A., Franceschini, M. A., Dunn, A. K., & Strangman, G. (2002). Non-invasive imaging of cerebral activation with diffuse optical tomography. In In vivo optical imaging of brain function (pp. 193–221). Boca Raton, FL: CRC Press.
Brookhuis, K. A., & de Waard, D. (1993). The use of psychophysiology to assess driver status. Ergonomics, 36, 1099–1110.
Brookings, J. B., Wilson, G. F., & Swain, C. R. (1996). Psychophysiological responses to changes in workload during simulated air traffic control. Biological Psychology, 42, 361–377.
Buxton, R. B., Uludağ, K., Dubowitz, D. J., & Liu, T. T. (2004). Modeling the hemodynamic response to brain activation. NeuroImage, 23(Suppl. 1), S220–S233.
Castaneda, M. (2013). Human factors for verification and validation of remotely piloted aircraft. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (pp. 2032–2036). Santa Monica, CA: Human Factors and Ergonomics Society.
Chance, B., Anday, E., Nioka, S., Zhou, S., Hong, L., Worden, K., Li, C., et al. (1998). A novel method for fast imaging of brain function, non-invasively, with light. Optics Express, 2, 411–423.
Chance, B., Zhuang, Z., UnAh, C., Alter, C., & Lipton, L. (1993). Cognition-activated low-frequency modulation of light absorption in human brain. Proceedings of the National Academy of Sciences of the United States of America, 90(8), 3770–3774.
Chao, H., Cao, Y., & Chen, Y. (2010). Autopilots for small unmanned aerial vehicles: A survey. International Journal of Control, Automation and Systems, 8, 36–44.
Chen, C., Lee, S., & Stevenson, H. W. (1995). Response style and cross-cultural comparisons of rating scales among East Asian and North American students. Psychological Science, 6, 170–175.
Cope, M. (1991). The development of a near infrared spectroscopy system and its application for non-invasive monitoring of cerebral blood and tissue oxygenation in the newborn infants. London, UK: University of London. Retrieved from http://discovery.ucl.ac.uk/id/eprint/1317956
Fan, J., McCandliss, B. D., Fossella, J., Flombaum, J. I., & Posner, M. I. (2005). The activation of attentional networks. NeuroImage, 26, 471–479.
Gratton, G., Corballis, P. M., Cho, E., Fabiani, M., & Hood, D. C. (1995). Shades of gray matter: Noninvasive optical images of human brain responses during visual stimulation. Psychophysiology, 32, 505–509.
Harris, D. (2005). A socio-technical systems analysis of increasing operational efficiency: Why human factors solutions developed without reference to the wider context may not work. Measurement and Control, 38, 235–238.
Harris, D. (2014). Improving aircraft safety. The Psychologist, 27, 90–94.
Harrison, J., Izzetoglu, K., Ayaz, H., Willems, B., Hah, S., Ahlstrom, U., Woo, H., & Onaral, B. (2014). Cognitive workload and learning assessment during the implementation of a next-generation air traffic control technology using functional near-infrared spectroscopy. IEEE Transactions on Human-Machine Systems, 44, 429–440.
Hoshi, Y., & Tamura, M. (1993). Detection of dynamic changes in cerebral oxygenation coupled to neuronal function during mental work in man. Neuroscience Letters, 150(1), 5–8.
Izzetoglu, K. (2008). Neural correlates of cognitive workload and anesthetic depth: fNIR spectroscopy investigation in humans. Philadelphia, PA: Drexel University. Retrieved from https://idea.library.drexel.edu/islandora/object/idea%3A2896
Izzetoglu, K., Ayaz, H., Hing, J. T., Shewokis, P. A., Bunce, S. C., Oh, P., & Onaral, B. (2015). UAV operators’ workload assessment by optical brain imaging technology (fNIR). In K. P. Valavanis & G. J. Vachtsevanos (Eds.), Handbook of unmanned aerial vehicles (pp. 2475–2500). Dordrecht, the Netherlands: Springer.
Izzetoglu, K., Bunce, S., Onaral, B., Pourrezaei, K., & Chance, B. (2004). Functional optical brain imaging using near-infrared during cognitive tasks. International Journal of Human-Computer Interaction, 17, 211–227.
Izzetoglu, M., Bunce, S. C., Izzetoglu, K., Onaral, B., & Pourrezaei, K. (2007). Functional brain imaging using near-infrared technology. IEEE Engineering in Medicine and Biology Magazine, 26, 38–46.
Jahns, D. W. (1973). A concept of operator workload in manual vehicle operations. Ges. zur Förderung d. Astrophysikal. Forschung.
Jobsis, F. (1977). Noninvasive, infrared monitoring of cerebral and myocardial oxygen sufficiency and circulatory parameters. Science, 198(4323), 1264–1267.
Kato, T., Kamei, A., Takashima, S., & Ozaki, T. (1993). Human visual cortical function during photic stimulation monitoring by means of near-infrared spectroscopy. Journal of Cerebral Blood Flow and Metabolism, 13, 516–520.
Kwong, K. K., Belliveau, J. W., Chesler, D. A., Goldberg, I. E., Weisskoff, R. M., Poncelet, B. P., Kennedy, D. N., Hoppel, B. E., Cohen, M. S., & Turner, R. (1992). Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proceedings of the National Academy of Sciences of the United States of America, 89(12), 5675–5679.
Logothetis, N. K., & Wandell, B. A. (2004). Interpreting the BOLD signal. Annual Review of Physiology, 66, 735–769.
Luo, J., & Niki, K. (2000). The role of left inferior frontal gyrus in working memory: Phonological competition and inhibition. NeuroImage, 11(5), S400.
Menda, J., Hing, J. T., Ayaz, H., Shewokis, P. A., Izzetoglu, K., Onaral, B., & Oh, P. (2011). Optical brain imaging to enhance UAV operator training, evaluation, and interface development. Journal of Intelligent & Robotic Systems, 61, 423–443.
Moran, J., & Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex. Science, 229(4715), 782–784.
Moray, N. (1988). Mental workload since 1979. International Reviews of Ergonomics, 2, 123–150.
Obrig, H., & Villringer, A. (1997). Near-infrared spectroscopy in functional activation studies: Can NIRS demonstrate cortical activation? Advances in Experimental Medicine and Biology, 413, 113–127.
Ogawa, S., Lee, T. M., Kay, A. R., & Tank, D. W. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proceedings of the National Academy of Sciences of the United States of America, 87(24), 9868–9872.
Parasuraman, R., & Rizzo, M. (2008). Neuroergonomics: The brain at work. New York, NY: Oxford University Press.
Parasuraman, R., & Wilson, G. F. (2008). Putting the brain to work: Neuroergonomics past, present, and future. Human Factors, 50, 468–474.
Paulhus, D. L., & Vazire, S. (2007). The self-report method. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 224–239). New York, NY: Guilford Press.
Richards, D., Scott, S., Furness, J., Lamb, P., Jordan, D., & Moore, D. (2016). Functional symbology – Evaluation of task-specific head-up display information for use on a commercial flight deck. In AIAA Modeling and Simulation Technologies Conference. Reston, VA: American Institute of Aeronautics and Astronautics. https://doi.org/10.2514/6.2016-3374
Shiffrin, R. M., & Atkinson, R. C. (1969). Storage and retrieval processes in long-term memory. Psychological Review, 76, 179–193.
Sterman, M. B., & Mann, C. A. (1995). Concepts and applications of EEG analysis in aviation performance evaluation. Biological Psychology, 40, 115–130.
Tsang, P., & Wilson, G. (1997). Mental workload. In G. Salvendy (Ed.), Handbook of human factors (2nd ed., pp. 417–449). New York, NY: Wiley.
Vidulich, M. A., & Wickens, C. D. (1986). Causes of dissociation between subjective workload measures and performance: Caveats for the use of subjective assessments. Applied Ergonomics, 17, 291–296.
Villringer, A., Planck, J., Hock, C., Schleinkofer, L., & Dirnagl, U. (1993). Near infrared spectroscopy (NIRS): A new tool to study hemodynamic changes during activation of brain function in human adults. Neuroscience Letters, 154(1–2), 101–104.
Weiss, M., & Lavidor, M. (2012). When less is more: Evidence for a facilitative cathodal tDCS effect in attentional abilities. Journal of Cognitive Neuroscience, 24, 1826–1833.
Wickens, C. D., & Hollands, J. G. (2000). Engineering psychology and human performance (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Section IV

Applications

9 Cold Bay Alaska Engine Change

Michael Hagler

CONTENTS
Prologue
Houston, a.k.a. MCC, We Have a Problem: The Story of the Cold Bay Engine Change
“Anyone Want to Go to Alaska for an Engine Change? It’s Beautiful This Time of Year!”
Remote Operation
The Participants Are Assembled
Let the Party Begin!
It’s 0300, a Pleasant 23 Degrees Outside, and the C-130 Just Landed with the Engine. Time to Go to Work!
Heartbreak: Bad Part from Stock
1 Giant Airplane, 1 Tank of Glycol, and 2 De-Icing Trucks
Engine Change 2.0
Volcanoes and Runway Lights
Murphy’s Law Strikes Again
So Close and Yet So Far
Author’s Note

Prologue

The following story provides a window into an event. The Cold Bay, Alaska, engine change was accomplished by aircraft mechanics doing what they were trained for, albeit in extreme conditions. They were, in effect, the tip of the spear (or iceberg in this case) of an organization that operates on many levels to keep an incredibly diverse fleet of aircraft successfully operational and, in the context of a commercial airline, in revenue service. Of course, no department operates in a vacuum. When an airplane breaks, everyone feels the pain. It starts with the passenger service agent who has to reprocess a plane full of passengers, rebooking them or securing hotels for the stranded. Flight crews who are out of position have to be rerouted. Reservation agents have to notify passengers that their flights have been either cancelled or delayed. Ramp agents have to unload and reload tons of cargo and bags. The list goes on.

On the rare occasion an airplane makes an unscheduled landing in a remote location, the direct ambassadors of the airline are the flight crews, employees travelling on a pass (non-revs, as they are called in the industry), and any other people associated with the flight who may have an airline connection. It’s ultimately in their hands to help mitigate a less-than-sterling situation for passengers until the cavalry arrives. Based on interviews and notes from people in CDB (the airport at Cold Bay, Alaska), that is exactly what happened. Helping the airline crew were the staff of the Cold Bay airport, who acted with warmth, humanity, and the utmost professionalism toward the crew and passengers alike. The help they rendered to the aircraft maintenance team during the six days was priceless.

An event like this can be a galvanizing experience for an employee. For a line technician pushing the limits of endurance in adverse weather conditions with a group of fellow mechanics, it can become a bonding experience, similar in many ways to the bond soldiers forge in combat. The Seattle-based mechanics still talk about the last Cold Bay engine change they performed in 2013. These shared experiences can also cross divisions, as they did here in the relationship between the engine change crew and the pilots who helped and supported them.

The business world measures performance in cold numbers, metrics, and mind-numbing charts and graphs. Stories like this should remind us that, more than anything else, an organization is made up of flesh-and-blood humans, analog in design and operation, and capable of doing absolutely amazing things. People in the airline industry do it every day.

Houston, a.k.a. MCC, We Have a Problem: The Story of the Cold Bay Engine Change

When the number 2 engine on flight 86, a Boeing 767 flying from Japan to Portland, Oregon, suddenly lost oil pressure over the North Pacific Ocean and forced the flight crew to make an emergency landing in Cold Bay, Alaska, it set in motion a process that would test the mettle of an aircraft maintenance organization and of the line maintenance technicians who ultimately made aircraft 361 operational. When the pilots of flight 86 performed an inflight shutdown due to near-zero oil pressure on the number 2 engine, an Aircraft Communications Addressing and Reporting System (ACARS) message alerted the airline’s Maintenance Control Center (MCC) to the malfunction. Immediately, a process was initiated that would have to address a plethora of logistical challenges. Of immediate importance, however, and certainly germane to the pilots and passengers, was landing the airplane as soon as possible at a suitable alternate airport. Operating a 175-ton airliner on one engine over the remote latitudes of the North Pacific Ocean, even for two seasoned professionals, couldn’t have been a “comfort zone” flying experience (see Figure 9.1). True to their training and experience, Captain Greg and First Officer Jay diverted to the nearest alternate airfield and, less than two hours after their number 2 engine lost oil pressure, guided their aircraft safely on one engine into Cold Bay, Alaska, where ship 361 would remain for almost a week.

FIGURE 9.1 Shortly before touchdown in Cold Bay. (Courtesy of Kirk Davenport.)

Any time an aircraft breaks in a remote location, an airline maintenance control department becomes the logistical focal point. Maintenance Control coordinates parts for the broken aircraft, secures the rescue ship and crew with the sector manager, works with inflight services to provide the flight attendants (F/As) needed for the rescue flight, and dispatches technicians to the broken aircraft. The MCC is, in essence, an asset manager. The aircraft might belong to maintenance, but it takes everyone to bring her back home, and it’s MCC that has to put the puzzle together. Curt, the operations manager in MCC at the time 361 was declared Aircraft Out of Service (AOS), recalled a humorous conversation with inflight services while trying to find out whether the flight attendants from 361 would be able to extend their duty times to support the return rescue flight. The MCC manager explained to the inflight manager that if the flight attendants did not extend, they would have to stay in CDB because there wouldn’t be enough seats on the rescue plane. They would be left behind and forced to share an eight-room hotel with the three pilots and the six Seattle-based mechanics who were trying to resurrect 361. Some basic math suggested it was better to extend one’s duty time than to spend days “hot bunking” Navy style. Needless to say, the original flight attendants made the return trip! Sharing the fun with Curt that day in MCC were two other maintenance controllers. Curt later described the CDB double engine change as one of the most logistically challenging events since an Ascension Island engine change. Due to the remoteness of CDB, cellular communications were spotty at best, exacerbating an already strained communication chain. It would only get worse.

“Anyone Want to Go to Alaska for an Engine Change? It’s Beautiful This Time of Year!”

For aircraft line mechanics, going on a maintenance road trip is a fairly clear-cut process. It usually starts with a call from Maintenance Control to a station or station manager, asking the station to support a non-maintenance station with an AOS aircraft, to augment a staffed maintenance station with extra manning, or to provide expertise with an esoteric maintenance task that hasn’t been performed at that station before. In the airline maintenance world, it’s rare, if not unheard of, for a request for help to go unheeded. The aircraft maintenance family takes care of its own, and when the call goes out for help, you can bet someone will be there to help a sister station get an airplane operational! Such was the case when Maintenance Control called the Seattle station maintenance manager, whose phone at this point was nearing meltdown status, and asked if Seattle could dispatch a crew to Cold Bay, Alaska, to change an engine on aircraft 361. As soon as the call was made, the scramble was on to secure a crew who could dispatch as soon as possible. Time was of the essence because the road trip crew would be departing on the “rescue” plane heading up to Cold Bay to retrieve the passengers, and they would have only two hours to prepare. While preparing for this road trip, Seattle maintenance was also asked to support another station in the Lower 48 (California) with a couple of Aircraft Maintenance Technicians (AMTs) to assist with another airline’s contract airplane. Because of the urgency and magnitude of the Cold Bay AOS event, the two mechanics slated to support that project were ultimately re-tasked and ended up on the Cold Bay trip. There is little doubt the weather would have been slightly more hospitable in California!

Remote Operation

There are basically two types of road trips: domestic and remote. Domestic is defined as being close to logistical support, food, water, tooling, and reasonable access to aircraft parts, to mention a few. Remote is the more problematic. Engines, tooling, and anything else that will be needed for the task usually have to be flown in. In the absence of other scheduled carriers, even simple parts availability is problematic at best. Hardware, maintenance manuals, sealant, electrical parts, rags, tape, and even basic chemicals have to be thought out in advance and transported to the airplane site. Prepping for an event like this has many similarities to a military combat deployment. The devil is in the details. To add to the mayhem, more often than not, mechanics are likely to be near the end of their shift when the road trip request comes in. Figuring out which personal items one might need, in addition to the tooling and miscellaneous parts needed for a remote field trip, is all the more difficult if one has already been awake all night working and now has to reboot one’s brain to prepare for a week away from home and get one’s head wrapped around an engine change.

The Participants Are Assembled

Soon all of the crew members were assembled. AMTs Greg and Dave, Lead Inspector Kirk, Lead AMT Randy, and AMT Jason, along with two of the newer members of the Seattle maintenance team, Evan and Paul, were packed and, together with hundreds of pounds of tooling, ready to launch to Cold Bay, Alaska. It’s worth noting that Randy, Kirk, and Jason, veterans of the last engine change road trip to Cold Bay, were no strangers to this remote airfield. This time, however, the engine change would be accomplished in deep winter, not in the relatively balmy conditions of October, like the last one in 2013. Rescue ship 396 arrived in Cold Bay at 3 pm local time, carrying the Seattle-based maintenance crew and the pilots who would fly the stranded passengers back to Portland. They were greeted by the airport manager, who remembered the returning mechanics by name. The three pilots from ship 361, Greg, Jay, and Paul, would remain in Cold Bay until 361 was operational; however, Paul had to return early, leaving Greg and Jay there for the duration. Cold Bay is a remote location, but it is no stranger to commercial traffic dropping in occasionally. Strategically located at the top of the Aleutian chain of islands, CDB is an alternate airport for commercial flights that experience mechanical issues and need to get on terra firma as soon as possible. It is roughly 600 miles from Anchorage, and although it is served by PenAir, a small regional operator offering flights to Anchorage, large commercial jets seldom frequent its nearly 11,000-foot asphalt airstrip. This week, however, would bring an influx of large (including one very large) aircraft to the small village of 100 people. To a resident of the Lower 48, the name Cold Bay, Alaska, can evoke images of igloos, tin shacks, Kodiak bears, and a frozen, barren wasteland.
In reality, it is a hospitable, albeit meteorologically challenged, town, surrounded by incredible beauty, that rolls out the welcome mat any time a jetliner drops in for a visit. How many places can have a single airplane land at the local airport and nearly triple the town’s population! Placing the full resources of their town at the disposal of the stranded travelers, the staff of the airport and the local citizens made the passengers feel at home.

Let the Party Begin!

Once ship 396, the CDB rescue plane, was loaded with the passengers and the flight crew, it was ready to launch. Luck and timing were on the side of the rescue plane: an impending weather event came within 15 minutes of scuttling the take-off due to snow and ice conditions. With no de-icing capability at the airport, there was no room for bad weather. Flight 86 finally left Cold Bay with 206 Portland-bound passengers and 10 crew members. For the mechanics, the real work was about to begin. It was 9 pm, and the first order of business was to secure 361 for cold weather. This meant draining all potable water. Given the extreme cold and the amount of time the plane would likely be on the ground, preventing freeze damage was the top priority.

MCC was still trying to source a replacement engine, and without the major tooling and ladders, there was little to do but catch some much-needed sleep and mentally prepare for the engine change. Early the next morning, the crew was up and assembled in the 10-degree, bone-chilling dawn. They checked on the condition of the aircraft and operated the APU (auxiliary power unit) and the packs (aircraft pressurization and temperature control system) to warm up the interior. At the request of MCC, they pulled the chip detectors to assess the extent of any internal damage to the engine. While surveying the plane, the maintenance crew realized that when 361 became operational, it would need de-icing. Snow and ice were already starting to accumulate on the fuselage and wings, and depending on the weather over the next few days, the accumulation would certainly get worse. There will be more on this logistical challenge later. On January 16, an engine was finally sourced at Los Angeles International Airport (LAX), and the crew began the engine pylon disassembly process. They were careful not to start removing the engine cowls too early. Doing so would have introduced ice and snow into the crevices of the pylon and engine assembly, making the removal and installation of the hoisting equipment and engine connecting hardware exceedingly difficult. With the generous help of the local airport manager and staff, the crew built a windbreak and shelter out of two 20-foot-long Conex containers (the type used on container ships) and some large tarps. This would be their makeshift, open-air hangar for the next few days.

It’s 0300, a Pleasant 23 Degrees Outside, and the C-130 Just Landed with the Engine. Time to Go to Work!

In the frigid dark hours of the 17th, a Lynden Air Transport civilian version of a C-130 Hercules four-engine turboprop transport touched down in Cold Bay with the replacement engine (see Figure 9.2). The Pratt & Whitney 4060 filled practically the entire diameter of the C-130’s fuselage. In addition to the replacement engine, LAX maintenance also shoehorned in ladders, a heater, a generator, lights, and almost anything else they could ship to help with the engine change. Massaging the engine out of a C-130 was an art form in itself (see Figure 9.3). A cable winch inside the aircraft kept the 12,000 lb. engine and shipping cradle from making an uncontrolled excursion through the back of the airplane. A forklift with a chain was positioned behind the loading ramp to gently pull the engine assembly out of the rear of the plane, while a loadmaster paid out the cable and, with pry bars, “steered” the casters on the engine cradle to ensure the engine didn’t damage the transport. At one point, there was less than four inches of clearance between the top of the engine fan case and the C-130’s inner structure. Needless to say, finesse was the order of the day.

FIGURE 9.2 Engine removed from pylon, waiting for replacement. (Courtesy of Kirk Davenport.)

FIGURE 9.3 Flight crew wrestles Pratt & Whitney from C-130. (Courtesy of Kirk Davenport.)

After the engine change gear was carefully unloaded from the transport, the process of engine removal began. As the lead AMT on the project, Randy was responsible for maintaining an inventory of tooling and parts. He was also in charge of maintaining and processing the reams of paperwork that accompany an engine change. Because this was not exactly a typical office environment, special care had to be taken not to lose anything in the gusty winds and snow. By 0800, the nose cowl had been removed, hundreds of pounds of engine hoisting equipment had been mounted to the pylon, and all the hydraulic, pneumatic, fuel, and electrical connections had been disconnected. By 1100, the engine had been removed. Undoubtedly, one of the most formidable-looking and intimidating tasks performed on an aircraft is an engine change. For anyone who hasn’t been involved in an engine change on a commercial jet, the process could best be described as a six-piece orchestra performing a concert. The lead AMT is the conductor; his or her job is to direct the mechanics, set the tempo, and ultimately produce the fine music of an operational jet. It’s where brute force, precision, and attention to detail go hand in hand. A crew that has changed an engine multiple times can develop a synergy, at which point the process becomes Zen-like and almost automatic. Most engine changes occur in a hangar where the temperature is regulated and precipitation, frozen or otherwise, is a non-issue. A field engine change is another animal. In a process that involves hundreds of individual tasks, all documented and critical, an engine change in the Aleutian Islands in the middle of winter can become an epic, if not miserable, undertaking.
At 1530 on the 17th, roughly 12 hours after the engine arrived, the Seattle-based team had the new engine up and torqued. At 1600, the weather turned foul, with sustained winds of 26 knots and the wind chill dropping to 1 degree F. In spite of this, and with much-appreciated help from the stranded pilots, Greg and Jay, who assisted with equipment and cooking and provided much-needed moral support, the crew continued the final installation of the engine. As mentioned earlier, the pilots from 361 remained with the plane, the exception being Paul, who was forced to return home after three days. The mechanics had nothing but glowing comments about Greg and Jay; their help was indispensable. In this unlikely liaison, a professional bond was undoubtedly forged during the six days they spent together on a frozen airfield in the middle of nowhere resurrecting a 767. By 2100, 18 hours after the engine change commenced, Inspector Kirk prepared to “dry motor” the freshly installed engine. This is the first phase in operating a newly installed engine; it ensures there are no leaks in the engine or its peripheral components before the engine is actually run.

Heartbreak: Bad Part from Stock

There were no leaks noted during the dry motor, but when the engine was started, the oil quantity dropped to zero. The engine was re-serviced and run at idle for 4 minutes. The crew looked on in disbelief as the engine shot a 2-inch stream of oil from the engine breather tube. This indicated a possible internal issue with the engine, one that certainly could not be easily diagnosed or repaired on a snow- and ice-covered airstrip 4000 miles from the airline’s maintenance operation center. The crew, exhausted and cold-soaked from 18 hours of installing an engine in subfreezing weather and wind, called it quits for the day and returned, bitterly disappointed, to the hotel to sleep and regroup.

1 Giant Airplane, 1 Tank of Glycol, and 2 De-Icing Trucks

While the engine change was in progress, the hunt was on in the Lower 48 to figure out how to get 361 de-iced once it was operational again. Many ideas were proposed: use the portable unit that was in Cold Bay, try to get a de-icing truck from Anchorage, or even barge a truck from Seattle. Unfortunately, each had drawbacks. Anchorage couldn’t spare a truck, and a portable unit simply didn’t hold enough fluid to adequately de-ice a 767 that was already cold-soaked and would probably accumulate more ice and snow over the next few days. It could conceivably take a couple of thousand gallons of Type 1 fluid (heated de-ice fluid that removes the initial buildup of snow or ice) alone to clear the accumulation. Shipping a truck on a barge from Seattle would not only take two to three weeks, depending on the weather, but could also expose the vehicle to the Gulf of Alaska in the middle of winter, a damaging marine environment to say the least. Exposure to salt, cold, and ocean spray could compromise the truck. After perusing the e-mail trail, Seattle maintenance proposed sending two trucks, each full of Type 1 and Type 4 fluid, one ground service technician to act as caretaker, and two experienced de-ice crews. Experienced was the operative word: if the weather turned bad, there would be only one chance to de-ice 361. Wasting precious de-icing fluid would not be an option; every gallon had to count. The two trucks would also carry an additional 800 gallons of Type 4 anti-ice fluid, which would help extend the holdover time if it became necessary. Chartering a Russian Antonov 124, a heavy-lift, roll-on/roll-off tactical transport, and delivering the goods directly to CDB, although not necessarily cheap, was the only expedient way to solve the de-ice problem for 361.

FIGURE 9.4 The crew unloads the giant AN 124 containing the de-icing trucks and spare equipment. (Courtesy of Kirk Davenport.)

MCC concurred and contacted a charter company to provide an Antonov 124 to ship the two de-ice trucks and support crews (see Figure 9.4).

Engine Change 2.0

At 0400 on the 19th, the giant Antonov 124 touched down in Cold Bay, taxied up to the comparatively diminutive 767, and disgorged two de-ice trucks, two MSP-based de-ice crews, two MSP-based ground service equipment (GSE) mechanics, a spare tank of glycol, some miscellaneous tools, and one engine scavenge pump. The pump had been sent in an attempt to salvage the replacement engine on 361 that had been dumping its oil the night before. The pump was changed, to no avail; the problem persisted. Unfortunately, there was no option other than to change the engine again. For the second time in 48 hours, working in freezing conditions, the Seattle crew stripped the inoperative engine of its nose cowl and fan cowls and reattached the hundreds of pounds of bitterly cold steel hoisting equipment to the pylon structure in preparation for engine removal (see Figure 9.5). By 1600, the crew had removed the engine from the airplane and placed it in a small hangar next to the previously removed engine. Once again, 361 sat on the barren, frozen tarmac, its right wing engineless, waiting for another C-130 to deliver a replacement engine.

FIGURE 9.5 The crew digging their equipment out of the snow before continuing with the removal. (Courtesy of Kirk Davenport.)

Volcanoes and Runway Lights

In keeping with the early-morning tradition of engine and materiel arrival logistics, the C-130 with replacement engine number two arrived in the wee hours of the 20th. It would have been on the ground at 0100, but a glitch with the runway lights delayed its landing until 0130. True to the “epic and remote” nature of this engine change venue, the nearby erupting Bogoslof volcano almost delayed the engine delivery when its ash cloud threatened air traffic. Fortunately, this time luck was on the side of the engine change crew, and the ash cloud drifted away from Cold Bay. By 0330, the engine had been unloaded, and lead AMT Randy, with the help of the MSP-based de-ice crew, had positioned the new engine adjacent to 361. (Randy let the engine change crew continue with their desperately needed sleep and asked the de-ice crew to help him unload the engine.) At 0830, the engine was unwrapped, prepped, inspected (Inspector Kirk’s second in as many days), and ready to install. Amazingly, at 1200, engine number 2 was hoisted up to the pylon and the eight attachment bolts were torqued, leaving only the ancillary hook-ups. In less than five hours, all of the lines, wires, tubing, and engine cowls were reinstalled. The engine was ready to run! This time the results would be more satisfying. For anyone involved in an aircraft engine change, the final test after all of the preparation, sweat (unless you are in Cold Bay), and hard work is the leak check after the engine has been installed. This starts with the dry motor of the engine to ensure, among other things, that hydraulic lines are not leaking, and ends with the initial engine run. All fingers are crossed when the engine is motored, the fuel is switched on, and the engine lights off with a muffled pop and a long puff of smoke from the preservative burning off, then continues to accelerate until it reaches a stable rpm.
Such was the case when the Seattle mechanics started the number 2 engine on ship 361. The bitter disappointment after the failure of the previous engine gave way to a collective sigh of relief when the new engine operated flawlessly. Now it was time to pack up and head home.

Murphy’s Law Strikes Again

Murphy’s Law goes something like this: if it can go wrong, it will go wrong. Aircraft mechanics know this adage all too well. After the first replacement engine failed, conventional wisdom would have suggested that the crew’s bad luck had been used up. Unfortunately, that wasn’t the case. At 1030 on the 20th, ship 361 was loaded with equipment and with Jason, Evan, Randy, and Kirk, in preparation to finally complete its flight to Portland International Airport (PDX). Mechanics Greg and Dave volunteered to remain behind to perform mop-up detail and load the remaining engines and equipment on the C-130. Captain Greg and First Officer Jay were going through their preflight checklists when they noticed that the standby horizon was inoperative. In many circumstances, this could have been placarded, but because of daylight restrictions, it was a no-go item. The call went out, and CDB charter plane number four was dispatched to deliver a replacement for the recalcitrant instrument. Late morning on the 21st, the chartered jet landed in CDB with the standby horizon. Although as capable of grounding a jet as a bad engine, the diminutive instrument was infinitely easier and faster to replace. In short order, a very motivated maintenance crew had the new part installed, and at 1100, the 767 rolled down the icy asphalt airstrip, lifted off, and headed to Portland.

So Close and Yet So Far

Flight 86 had arrived on U.S. soil (Cold Bay) without clearing customs. The passengers were transferred to the rescue plane in CDB and went on to clear customs in Portland, but 361, still loaded with bags and freight, would have to stop in PDX to clear customs and unload the bags and cargo before it could proceed to Seattle to drop off the flight crew and mechanics. There was some confusion in Portland over the status of the mechanics, who had boarded a “sterile” airplane on U.S. soil and flown to U.S. soil. Once that was resolved, a very foul-smelling jet (the lavatories had not been serviced in a week) arrived in Seattle that evening. 361 taxied to the west side of the hangar and parked. The pilots and mechanics disembarked from the airplane that had been the focal point of their work lives for the past seven days, collected their tools, personal belongings, and equipment, and headed back to their homes and families. It was another successful aircraft maintenance road trip, and one they wouldn’t forget anytime soon.

Author’s Note

This story is based on an actual event. It is typical of situations that airlines and aircraft mechanics face in daily operations. This particular event is significant because of the location and the logistical challenges facing the mechanics and their maintenance organization. The names of all involved have been changed for the story.

10 Operational Issues in Aviation Psychology

Kathy Fox, Helena (Reidemar) Cunningham, Michael Hagler, Daniel Handlin, and Richard J. Ranaudo

CONTENTS
Operational Safety Issues in Aviation Psychology (p. 198)
Suggestions for the Way Ahead (p. 202)
Developing a Modern Airline Flight Operating System Based on Human-Centered Operating Principles (p. 203)
History Lesson—Traditional Standard Operating Procedures (p. 203)
Current Conditions (p. 203)
Problematic Procedures (p. 204)
Bad Apples or Something Else? (p. 205)
Finding a Healthy Balance (p. 205)
Welcome to the Real World (p. 206)
Recognizing Inferior Procedures (p. 206)
A Path to More Resilient Procedures (p. 207)
An Industry Initiative (p. 208)
Conclusion (p. 209)
Unaddressed Line Maintenance Human Factors Challenges in Part 121 Operations (p. 210)
A World of Differences: Depot Maintenance and Line Maintenance (p. 210)
Training, an Evolving Mission Requirement (p. 211)
Maintenance Resource Management: Coming to Grips with a New Operational Reality (p. 212)
A Tip of the Proverbial Iceberg (p. 213)
Artificial Intelligence in Aviation (p. 214)
Acknowledgments (p. 215)
References (p. 215)

This chapter is based on the Practitioner Plenary Panel held at the International Symposium on Aviation Psychology (ISAP) in May 2017. This panel was followed by the Researcher Plenary Panel, held on the next day of the symposium. The overarching goal of the two sessions was to foster a dialogue between operational personnel and researchers towards a safer and more efficient flight environment. The practitioner panelists were invited to articulate their operational challenges and to inform the aviation community of their vision of a more efficacious aviation system. The researcher panelists were invited to discuss ways to shape research questions and theories on human performance based on today’s operational challenges and to help translate scientific discoveries into practical solutions. As Stokes (1997) argued in his book Pasteur’s Quadrant, use-inspired research would be a good path towards both advancing basic science and accelerating the process of putting basic knowledge to practical use. Dr. Steve Casner from the National Aeronautics and Space Administration (NASA) moderated the practitioner panel and charged the panelists with two questions:

1. What are the most important unaddressed human factors issues witnessed in everyday operations?
2. What still needs to be fixed?

This chapter captures the panelists’ thoughts and discussions on these issues.

Operational Safety Issues in Aviation Psychology
Kathy Fox

The mandate and sole purpose of the Transportation Safety Board of Canada (TSB) is to advance transportation safety in the air, marine, rail, and pipeline modes of transportation. The TSB does this by conducting independent investigations; identifying safety deficiencies, causes, and contributing factors; making recommendations; and publishing reports. Put another way, when something goes wrong, the TSB investigates to find out not just what happened, but why. It then makes public what has been learned, so that those best placed to take action (regulators and industry) can do so. The first question that the panel was charged with is tough for an organization like the TSB to answer because of the nature of the work itself: TSB investigators usually arrive after things have gone wrong, meaning they don’t get to witness “everyday operations.” The TSB’s data set, therefore, shows a very limited slice of abnormal operations, and individual investigations are often in-depth looks at “really bad day operations.” However, TSB investigations do offer some good indications regarding the second question: What still needs to be fixed? To that end, the biggest issue that the human factors community (and aviation psychology) needs to address is how to improve the uptake of human factors methods and knowledge in the design and monitoring of the aviation system. To illustrate this, consider three examples from recent TSB investigations.

Example 1: TSB Aviation Investigation Report A13H0002

On September 9, 2013, a helicopter took off from the Canadian Coast Guard Ship (CCGS) Amundsen with one pilot, the vessel’s master, and a scientist on board for a combined low-level ice measurement and reconnaissance mission in M’Clure Strait, Northwest Territories. Seventeen minutes after the helicopter failed to arrive back at the ship as expected, its position was checked on the flight-following system, which was displaying it 3.2 nautical miles from the vessel. The CCGS Amundsen’s crew attempted to contact the pilot by radio, but without success. The vessel then proceeded toward the helicopter’s last displayed position, where the crew quickly spotted debris from a crash. Unfortunately, none of the three people on board survived. The TSB’s subsequent investigation found that the helicopter likely crashed because of the pilot’s spatial disorientation or distraction during the low-level flight. However, the investigation also looked at broader issues, and one of the things we found was this: although the vessel’s new flight-following system displayed a continuous digital readout of the helicopter’s position, the vessel’s crew had not been trained “to the required level of competence to set up the … system and later interpret the information displayed on the control display unit.” As a result, “this reduced the effectiveness of the flight-following system” (Finding as to Cause #5, page 86). Figure 10.1 shows the location of the flight-following system within the Amundsen’s wheelhouse. The system identifies the helicopter by flight number (CCG-364) and displays the helicopter’s longitude and the time that position was received. (Latitude would be shown on another “page” of the display.) In addition, the investigation found that there “was no aural warning to alert the vessel’s crew immediately that the helicopter was no longer transmitting position reports” (Finding as to Cause #6, page 86). This, in turn, delayed the initiation of search and rescue efforts.

FIGURE 10.1 A13H0002 flight following system with time and positional data about the helicopter, along with instructions for how to initiate the rendezvous function. (Courtesy of TSB.)

Finally, although not directly identified as a causal or contributing factor, the investigation found the following associated risk: “If systems are developed without the benefit of appropriate end-user input and the use of relevant human factors design standards, then there is an increased risk that display systems will not be suited for their end purpose and that users will not use them correctly” (Finding as to Risk #7, page 87).

Example 2: TSB Aviation Investigation Report A13A0075

On July 3, 2013, a Bombardier CL-415 amphibious aircraft touched down on Moosehead Lake, Newfoundland and Labrador, to scoop a load of water as part of efforts to fight a nearby forest fire. During the scooping run, the aircraft took on too much water because the switch controlling the scooping probes was in the wrong position and the flight crew did not notice. The heavier-than-intended water load lengthened the takeoff run, placing the aircraft closer to the shore than desired, and the pilot flying elected to turn the aircraft to the left to allow for a longer departure path. During the turn, the left float contacted the water while the hull became airborne. The resulting forces caused the aircraft to “water-loop,” and it came to rest upright but partially submerged. Although neither of the two crew members was injured, the aircraft was destroyed. Figure 10.2 shows the toggle switch that controls the scooping probes. The switch’s design is such that it can be easily moved from one position to another.

FIGURE 10.2 A13A0075 toggle switch. With the center pedestal cover removed, the switch is exposed and can be moved easily. (Courtesy of TSB.)

Sure enough, the TSB’s investigation revealed the following: “It is likely the switch was inadvertently moved from the AUTO selection (which controlled the scooping to a pre-determined limit) to the MANUAL selection when the center pedestal cover was removed, which is why the probes continued to scoop water beyond what the crew expected” (Finding as to Cause #1, page 25). The investigation also found that challenges can arise when automation is introduced without first being incorporated into Standard Operating Procedures (SOPs). Several of the investigation’s findings reflect this:

• The AUTO/MANUAL switch position check for the scooping probes was not included on the (aircraft’s) checklist.
• This omission from the checklist had been identified by flight crews during informal discussions, but at the time of the occurrence, the checklist had not been amended. In fact, since [the previous year], flight crew [had been] relying on their memory to ensure that the PROBES AUTO/MANUAL switch position was checked at the appropriate time.
• During the scooping run on the lake, the crew did not notice that the water quantity exceeded the predetermined limit until after the tanks had filled to capacity, resulting in an overweight condition, which compromised the aircraft’s performance.

Before drawing any conclusions, consider one last example where the melding of people and technology didn’t go as smoothly as might otherwise have been expected—in this case, an accident where a crew had used aspects of automation only infrequently or in a mode they weren’t practiced in using.

Example 3: TSB Investigation Report A14F0065

On May 10, 2014, an Air Canada Rouge aircraft departed Toronto Lester B. Pearson International Airport, Toronto, Ontario, under instrument flight rules for Montego Bay, Jamaica, with 131 passengers and six crew members on board. The flight crew was cleared for a non-precision approach to Runway 07 in visual meteorological conditions. The approach became unstable: there was excessive airspeed, as well as vertical speed deviations, an incomplete landing checklist, and unstabilized thrust. The aircraft touched down hard, exceeding the design criteria of the landing gear. Fortunately, there was no structural damage to the aircraft, nor was anyone injured. No accident is ever the result of a single action by a person or organization, and after completing its investigation, the TSB made 10 findings about causes and contributing factors. One of them, however, concerning the use of autothrust, clearly illustrates the point of this section. As stated, the approach was unstable as the aircraft neared the runway. Although the flight crew recognized that they were a bit high and too fast, a combination of factors, including distraction and confusion about what the automation was doing, led them to turn off the autothrust and manually reduce thrust in order to “slow down and get down.”

By turning off the autothrust, the flight crew took full authority over the thrust system, and yes, they successfully reduced thrust and lost enough speed. But as they continued the approach, they forgot that the autothrust was still off. Their speed continued to decrease, and they ended up coming in too slow: below the target speed. This resulted in the aforementioned hard landing, a fact that was reflected in the TSB’s subsequent finding: “Air Canada Rouge did not include autothrust-off approach scenarios in each recurrent simulator training module, and flight crews routinely fly with the automation on. As a result, the occurrence flight crew was not fully proficient in autothrust-off approaches, including management of the automation” (Finding as to Cause #11, page 40).

Suggestions for the Way Ahead

Returning to the earlier point that TSB investigators don’t often get to see much of “everyday operations,” it should be noted that the three examples used in this chapter are only the tip of the iceberg. Similar instances, in which automation is not well designed around the human user or is not effectively implemented, are easily found in operational contexts. Despite this, human factors, which is by no means a new field of study, has struggled to be systematically considered during system-development and system-integration activities. So, what to conclude from all this? And, perhaps more important, what to do about those conclusions? Here are a few thoughts:

First, when designing systems, seek end-user input and apply human factors design standards. Second, don’t introduce automation without also including it in SOPs. Third, crews must be familiar with the technology they are using. In other words: practice, practice, practice.

And finally, effective safety management processes are critical to identifying and mitigating human factors hazards. No matter how proactive they try to be, no individual or organization can expect to anticipate all of the possible issues that might arise during the development and introduction of a new system. This means that the flow of information and effective system monitoring (through processes such as a safety management system, or SMS) are critical, both to the success of that system and to continual improvement. A note of caution about that last point: although many organizations acknowledge the importance of information flow and effective system monitoring, making it actually happen can be challenging. That’s because these things only work if they are enshrined in a robust safety culture. In other words, this means a company culture where people feel “safe” to report issues and incidents, free of fear of reprisal, and where management (and, ideally, the entire organization) not only listens, but actively seeks to learn from the problem being reported and takes action to correct it.

Developing a Modern Airline Flight Operating System Based on Human-Centered Operating Principles
Helena Cunningham and Dan Handlin

History Lesson—Traditional Standard Operating Procedures

When Standard Operating Procedures (SOPs) were first introduced almost 50 years ago, the concept was enthusiastically embraced, and for good reason: the safety and efficiency value of standardization is immense. In subsequent decades, SOPs have been tweaked in response to incidents, accidents, new technology, and changing operational requirements. The current blend of old and new SOPs is well intended, but it’s not uncommon for airlines to have SOPs with inherent contradictions. And although all SOPs are FAA approved, the reality is that not all SOPs are created equal, nor are all of them consistent with human factors science.

Current Conditions

Today’s flight deck operating system is not nearly as smoothly running and standardized as many suppose. It is actually a disorderly mess of distractions and multitasking challenges embedded in an overloaded environment, pressurized by an industry desire to hurry up (see Figure 10.3). Pilots know this firsthand, but others may not realize the time pressure flight crews have to manage today. The airline industry is forever in a state of flux, with new entrants, new aircraft, bankruptcies, and mergers. Because of this constant change, the operating systems that airline managements put in place vary widely in style and remain works in progress, frequently changing with every new manager. Some are defined and disciplined, others flexible and open-ended. They can range from quite capable to deficient, but some generalizations can be made:

1. Each airline independently constructs and manages its own cockpit operating system; each has its own view of what is appropriate.
2. There is no industry-wide understanding or application of best practice.
3. If the procedures manager (assuming one exists) does not employ a proper procedures development system based on human factors science, repetitive systemic errors will occur.

FIGURE 10.3 Flight deck complexity. (Courtesy of TSB.)

A poorly built flight deck operating system, like a computer program, can contain bugs or glitches that lead to systemic failure. Today, some airlines have cockpit procedures with multitasking requirements and poor human factors interfaces that jeopardize operational safety. Many procedures are created in isolation or as a knee-jerk response to operational difficulties, rather than as part of a coordinated process. The lack of an overarching human factors philosophy, and of procedures that align well with human capabilities and limitations, has contributed to many types of systemic repetitive errors. Especially problematic is the prevalence of multitasking: flight deck duties are constantly interrupted at critical times, distractions are plentiful, and pilots are expected to do whatever it takes to get the job done. Despite decades of human factors research, there is no regulatory requirement to make full use of our current human factors knowledge.

Problematic Procedures

Some of the well-intentioned procedures and processes layered one on top of the other have created new latent threats. Studies reveal that they can work against scientifically established weaknesses in human performance. In addition to the aforementioned inability to multitask, science has shown that we are consistently poor at relying on memory and managing distractions. This is especially true when we are fatigued or performing complex tasks. Conversely, establishing habit patterns, performing tasks in a fixed sequence, and anchoring important tasks maximize human performance. Human-centered operating principles (HCOP) recognize these human limitations and expose SOPs that are unwittingly working against human nature.

Bad Apples or Something Else?

Reports on past accidents and incidents overwhelmingly focus on pilots and their failure to follow established procedures. Rarely are those procedures themselves questioned. There is little attempt to develop or evaluate the effectiveness of the SOPs, resulting in the same types of accidents recurring over and over again, as illustrated by the repeated occurrence of the flap extension accidents listed below:

• LH 540, Boeing 747, Nairobi, 1974—59 fatalities, 55 injuries
• Northwest 255, MD82, DTW, 1987—156 fatalities, 1 injury
• Delta 1141, Boeing 727, DFW, 1988—14 fatalities, 76 injuries
• LAPA 3142, Boeing 737, Buenos Aires, 1999—65 fatalities, 40+ injuries
• Mandala 091, Boeing 737, Medan, 2005—149 fatalities, 41 injuries
• Spanair 5022, MD82, Madrid, 2008—154 fatalities, 18 injuries
• LAMIA 2933, Avro RJ85, Medellin, Colombia, 2016—71 fatalities

The official reports for all these accidents cite pilots’ failure to follow established procedures as a cause. How can that be? Were all these pilots “bad apples”? Bad-apple thinking extends to incidents as well; investigators and managers have been quick to assign blame to pilots. Dr. Sidney Dekker and other human factors scientists have concluded that this view is outdated. The evidence strongly suggests that human error is frequently a symptom of trouble deeper inside the operation. Recent studies of these and other accidents find key deficiencies in the underlying procedures used by the crews. The mixture of inadequate SOPs, distractions, multitasking, and time constraints can line up the holes in the proverbial Swiss cheese and create a system built for failure, ready to ensnare even well-trained, well-rested, professional aviators.

Finding a Healthy Balance

Modern airline flight operations have two main missions: safety and production. An operation focused solely on safety would not move any aircraft; an operation focused only on production will harm people and break equipment. Healthy operations find a reasonable balance. The lesson here is that safety is not inherent in airline operations. Systemic human error is directly connected to pilots’ tools, tasks, and operating environment. A healthy system needs to recognize the ways in which the operation works against human traits; a series of human errors of a similar type points to procedural problems. Human strengths and frailties need to be part of the balance between safety and production in a healthy, efficient, and stable operating system. Yet the industry continues to employ processes that needlessly challenge the pilot’s ability to get it right and avoid errors. Fragile components make a weak system, and pilot performance suffers. Today, management must continually remind pilots to “stay safe,” because the operating system does not properly drive and incorporate safety resilience. The mixture of the pilot and the operating system is inappropriate if the components collectively and repeatedly produce adverse outcomes. Repetitive error is a classic example of a failing blend of the human, the airline flight operating system (AFOS), SOPs, aircraft and support technology, and the environment. Some seek to fix these problems by focusing on just one component, the pilot, while discounting the effect of other systemic failings. To improve the system, we must improve all of its components and their interactions.

Welcome to the Real World

Procedures and checklists are designed in a quiet room at Mach 0, with sufficient time to create a theoretically perfect plan. These procedures are often tested and trained in a controlled environment where events tend to be predictable and linear, any needed information is available, and the operation is always under the crew’s complete control. The real world is much more dynamic. Pilots routinely juggle many tasks concurrently amid frequent distractions and interruptions. Information and tools are also commonly overly complex, missing, or unavailable. There is little or no guidance on how to cope with these dynamic influences on our operations. To adapt, pilots create their own coping techniques and workarounds. Most of the time these ad hoc efforts have acceptable outcomes, but they unwittingly increase vulnerability to error and compromise standardization.

Recognizing Inferior Procedures

Operators commonly create procedures that work against the innate way our brains process information. Examples include:

• Inadequate use of cues, anchors, and checks for critical items
• Excessively long checklists
• Low-value checklist items
• Silent checklists
• Broken or paused checklists
• Redundant checklist items
• Single-item checklists
• Rote reading of checklists
• Repeated use of ambiguous or generic responses
• Data overload
• Poorly presented data (weight, dispatch, weather, etc.)
• Silent, unchecked, or unverified flap position changes
• Required interaction with the cabin or company at critical times
• Overtasking first officers when they are pilot flying
• Requiring checklists to be run while taxiing onto the active runway, rather than completing the entire checklist before approaching the runway
• Inadequate connections between flow patterns and checklists
• Not scheduling critical tasks early in the window of opportunity

A Path to More Resilient Procedures

Procedures can be built to maximize the strengths of the way we process information and minimize its weaknesses. For example, forgetting to set the flaps before takeoff remains a recurring problem in airline operations. Simply including flaps in the checklist and exhorting pilots to pay attention and “fly safe” has not prevented numerous fatal flap-related accidents. None of the flap extension accidents listed earlier involved procedures designed according to HCOP. The number of crews failing to set the flaps could be substantially reduced by using proper cues, flows, anchors, and checks. First, a strong, reliable memory cue is needed to remind the captain to call for flap extension (for example, receiving the wave-off and seeing the first officer’s [FO’s] hand on the flap handle, where the FO placed it at the end of the after-start flow). Next, the captain’s flap call must be anchored to the call for the Before Taxi Checklist, e.g., “Set flaps 5, Before Taxi Checklist” (the Before Taxi Checklist call is the anchor for the flaps call). Finally, the flap setting must be checked using the Before Taxi Checklist, which includes a flap verification item.

To build resilient procedures, we must have a solid foundation upon which to scaffold. SOPs are only one of the components of the AFOS. To have an efficient, effective, and resilient flight, we must provide the pilot with a blueprint for the entire operation. The components of the AFOS include SOPs, checklists, maneuvers, communication, aircraft management, crew resource management (CRM), and the human. All components (including the human) must be coordinated and their interactions fully described. All components must be correctly connected to avoid requiring excessive multitasking. A template for building crew actions serves as a visual script for the AFOS.
The seams that connect crew actions, command items for action, SOPs, and maneuvers should be woven together in a logical, disciplined way, using a systemic approach. The HCOP should guide the entire development process. If a procedure does not hold a specific position in the HCOP (i.e., the task floats with regard to when it is accomplished), it should be identified as a human factors concern for further review. The AFOS should also address other human factors issues, including distractions from outside agencies and others (such as air traffic control [ATC], the aircraft communications addressing and reporting system [ACARS], and other crew members). The HCOP should be applied to every fleet and made aircraft-specific as necessary. Each airline should maintain a “Standard HCOP Text,” managed by a designated procedures manager, to ensure a standardized and disciplined approach across the entire airline. Further, a template should be provided for designing each SOP. Each template should include at least the following:

1. “Transition Points” into and out of the SOP task
2. “Cues” for memory support and action
3. “Anchors” for memory support and action
4. Pilot flying and pilot monitoring actions
5. Flow pattern actions
6. Checklist requirements

All actions and tasks by the crew that require definition should be in the template. The HCOP should clearly explain how the flight is to progress and what actions/tasks the pilots must make in order to ensure safe flight completion. It is vital that the SOP designers clearly understand the need to build a defined and disciplined system and that SOPs are just one component of a larger AFOS.
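To make the SOP template concrete, the following is a minimal sketch, assuming a hypothetical representation: the class `SopTemplate`, its field names, and the helper `review_concerns` are illustrative and are not the authors’ tooling or any airline’s system. It records the six template elements for one SOP task and flags any element left empty, treating a task with no transition points as “floating” and therefore a human factors concern, in line with the review rule stated earlier in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SopTemplate:
    """Hypothetical record of the six minimum elements of an SOP template."""
    name: str
    transition_points: List[str] = field(default_factory=list)  # entry into / exit from the task
    cues: List[str] = field(default_factory=list)               # memory support and action cues
    anchors: List[str] = field(default_factory=list)            # memory support and action anchors
    crew_actions: List[str] = field(default_factory=list)       # pilot flying / pilot monitoring actions
    flow_pattern: List[str] = field(default_factory=list)       # flow pattern actions
    checklist_items: List[str] = field(default_factory=list)    # checklist requirements


def review_concerns(sop: SopTemplate) -> List[str]:
    """Return human factors concerns for a template (an illustrative rule set, not a standard)."""
    concerns = []
    if not sop.transition_points:
        # No fixed position in the operation: the task "floats" in time.
        concerns.append("floating task: no transition points defined")
    for label, items in [
        ("cues", sop.cues),
        ("anchors", sop.anchors),
        ("crew actions", sop.crew_actions),
        ("flow pattern", sop.flow_pattern),
        ("checklist requirements", sop.checklist_items),
    ]:
        if not items:
            concerns.append(f"missing {label}")
    return concerns


# Usage: the flap-setting example from this section, expressed in the template.
flaps = SopTemplate(
    name="Set takeoff flaps",
    transition_points=["after start", "before taxi"],
    cues=["wave-off received", "FO hand on flap handle"],
    anchors=["call for Before Taxi Checklist"],
    crew_actions=["Captain calls 'Set flaps 5, Before Taxi Checklist'", "FO sets flaps 5"],
    flow_pattern=["after-start flow ends with FO hand on flap handle"],
    checklist_items=["Before Taxi Checklist: flaps verified"],
)
print(review_concerns(flaps))  # → []
```

Keeping each element as an explicit field means an incomplete template is detectable mechanically, which is one way a procedures manager might audit a “Standard HCOP Text” for floating tasks.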

An Industry Initiative

To reach the next level of safety, many stakeholders have a role to play. Airlines can take a fresh look at their policies and procedures with human factors experts. Reviews should be performed by qualified but independent experts, since the owners of these programs often cannot see inherent problems. Manufacturers’ checklists and procedures are usually designed for moving one aircraft at a time rather than for running an airline operation, and many were created when human factors knowledge was meager compared with today. NASA (especially the Human Systems Integration Division at NASA Ames), the Federal Aviation Administration (FAA), and academia perform continuous research and have produced much new and fundamental information about how pilots best interact with their environment. They are a great resource for developing the next generation of SOPs. The Air Line Pilots Association (ALPA) and other pilot associations represent the very people these policies and procedures most directly affect. Their members know from experience the points of friction and frustration and where procedures break down in practice. Investigative agencies, such as the National Transportation Safety Board (NTSB), can demand that problematic procedures be examined with the same scrutiny applied to error-prone pilots. The Commercial Aviation Safety Team (CAST) has developed many safety enhancements, and the HCOP initiative could fit its proactive mandate. The Aviation Safety Information Analysis and Sharing (ASIAS) system contains a huge trove of data. Organizations such as MITRE may be employed to conduct special studies. The Joint Implementation Measurement Data Analysis Team (JIMDAT) may be a rich source of data and able to identify gaps and emerging risks in this area. The HCOP initiative challenges the industry to collaboratively:

1. UNDERSTAND and BUILD organizational understanding of human cognitive capabilities and limitations.
2. DESIGN a flight deck interactive operating system structure that allows the natural inclusion of resilient HCOPs and procedures.
3. DEPLOY modern, human-centered, error-resistant NextGen SOPs in the flight deck operating system structure.
4. Properly APPLY human capabilities and limitations in all crew-interactive aspects of the flight deck operation.
5. ANALYZE the APPLICATION of HCOPs in the flight deck.
6. ESTABLISH POLICY that harnesses data, guides the analysis, and allows continuous industry-wide improvement of HCOPs.

Conclusion

Today’s flight deck has become dramatically more complex. It is imperative to apply countermeasures that give our pilots an intuitive environment that flows naturally and works with, rather than against, human nature and cognitive abilities. Building Standard Operating Procedures (SOPs) on Human-Centered Operating Principles (HCOP) and applying them to our Airline Flight Operating System (AFOS) is an idea whose time has come. The old model of blaming the “bad apple” pilot is outmoded because it does not provide an answer that can be used to fix the problem. All airlines should provide the line pilot with an effective and efficient operating system incorporating HCOP. A modern, resilient AFOS has a well-designed script that allows thinking but does not rely on memory. It works with, rather than against, human tendencies. The key to developing a reliable, comprehensive, and modern flight deck operation is a collaborative effort among NASA Ames, the industry, and other stakeholders to build AFOSs that do not challenge us to “get it right” but instead provide an environment that naturally “makes things go right.” We must focus our procedures on the way humans work best by using HCOPs in a transformative approach that modernizes our core operating procedures.

Unaddressed Line Maintenance Human Factors Challenges in Part 121 Operations

Michael Hagler

The operationally byzantine world of commercial line aircraft maintenance brims with human factors challenges every day. A myriad of diverse and often conflicting tasks, troubleshooting across multiple aircraft systems, a 24/7 operating schedule, and interruptions in workflow, exacerbated by shift changes, long hours, and conflicting information, can and do lead to human error. The result can be an innocuous paperwork glitch or, worse, aircraft damage or personal injury. Among many issues, two major ones need to be addressed in line maintenance and will become of paramount importance to the industry in the coming years: training, and industry acceptance of and commitment to some form of Maintenance Resource Management (MRM). Unions, airline management, and employee groups have dabbled in MRM to varying degrees, but with limited awareness or buy-in from front-line personnel. A casual inquiry of people within the industry suggests very little awareness of the concept of MRM and little, if any, use of it in daily operations. Recruiting and training freshly minted maintainers to fill future vacancies is rapidly becoming a hot issue for all airlines. The sheer quantity of required hands-on training, the need to adapt that training to changing industry demographics, and the growing tendency to place entry-level maintainers directly in line maintenance are other front-burner challenges. This dynamic will place heavy emphasis on modified training methodology to help integrate the influx of new maintainers into the line environment.

A World of Differences: Depot Maintenance and Line Maintenance

Any conversation about human factors in aviation maintenance would be incomplete without an understanding of the differences between depot-level maintenance and line maintenance. Depot-level maintainers operate in a factory-type environment, usually a large building or hangar. The aircraft they work on are essentially reverse-manufactured, and each aircraft is generally in work for weeks at a time. Processes, tooling, ancillary equipment, stands, scaffolds, and work bays are fixed and organized for long-term maintenance and overhaul activities. In this “factory” environment, it is easier to maintain control over the maintenance process because of the repetitive nature of the tasks and the length of the aircraft’s stay. Line maintenance, in contrast, is performed on “live” aircraft. These aircraft are generally on the ground for less than 8 hours (for overnight layovers) and for as little as 45 minutes (for thru flights). Line maintenance is performed at the gates, often while passengers are deplaning or boarding. Line maintainers often have only minutes to assess the nature of a maintenance issue and its impact on the flight. Split-second decisions based on an aircraft system diagnosis must be made to mitigate the impact of a delayed flight on passengers. Unlike depot-level maintainers, line maintainers as a rule work outside, in conditions ranging from blistering heat to freezing cold. Wind, snow, rain, darkness, and remote locations far removed from immediate logistical support are among the many challenges line maintainers face. In addition, maintainers are often deployed on road trips to remote locations around the world where aircraft have broken down.
Long hours and logistical hurdles are but a few of the obstacles they face when deployed for days or weeks at a time, often in locations bereft of industrial amenities. It is within this operating envelope that the line aircraft maintainer faces ubiquitous human factors tripping hazards.

Training, an Evolving Mission Requirement

As today’s airline industry continues to evolve into a leaner operating entity, one of the major challenges facing airline maintenance organizations is replacing an aging cadre of line maintainers, who collectively possess thousands of years of knowledge, with their successors. Recruiting, vetting, and thousands of hours of classroom training are but a few of the hurdles facing airlines. Hands-on learning, meaning actual aircraft touch time, preferably in the presence of an experienced mentoring mechanic, remains the greatest challenge in integrating the next generation of maintainers into airline line maintenance. Historically, new or inexperienced AMTs (aircraft maintenance technicians) entering an operation made up a relatively small percentage of the existing pool of experienced personnel. Operations could easily “absorb” them with little to no impact. Today and well into the future, Part 121 operators will be integrating greater percentages of ab initio maintainers directly into their line operations. Of course, not all airlines or maintenance operations are created equal, and the stress from a disproportionate influx of neophyte AMTs can have an operational impact on some maintenance stations. This is starting to occur today. Hard-to-fill stations such as SFO, LAX, and BOS, along with operations in the New York region, are experiencing a constant churn of talent as the cost of living drives experienced people away and creates opportunities for those getting their start in the field. In effect, these become entry-level operations that never reach full operational readiness in terms of a mature knowledge base. It is this experience, or lack thereof, that has a profound effect on maintenance effectiveness. This dilution of experience can be further exacerbated by an operation’s or an airline’s fleet composition.
Depending on the airline, fleets range from homogeneous all-Boeing 737 fleets (e.g., Alaska and Southwest) to a dizzying amalgam of aircraft types and engines. Airlines such as American, United, and Delta operate fleets from three major manufacturers (McDonnell Douglas, Boeing, and Airbus) as well as from smaller manufacturers such as Embraer and Canadair. Aircraft systems, structures, and operating architectures spanning more than 50 years of design operate in today’s airline industry. One airline alone operates more than 17 different aircraft types and sub-fleets, and each of those fleets can be equipped with different engine types. A Boeing 767 can be equipped with Pratt & Whitney or General Electric engines, a 757 with Rolls-Royce or Pratt & Whitney. Three derivatives of the venerable 1960s-era McDonnell Douglas DC-9 (the MD-88, the Boeing 717 [the pre-merger MD-95], and the MD-90) are still in operation today with Delta Air Lines, and each uses an entirely different engine type. The list goes on, but switching mental models when diagnosing multiple aircraft within a diverse fleet can be a daunting proposition for a seasoned line maintainer, let alone a less seasoned one. In the US airline industry, pilots may be required to be proficient in only one fleet type, but maintainers have to be adroit at handling nearly every aircraft type the airline operates. “Broadband provider” might best describe the job duties of the typical US airline line maintainer. From wingtip to wingtip, nose to tail, fleet by fleet, they have to mitigate delays, make repairs, assess damage, and troubleshoot, all in a time-constrained environment. Technicians with years of accumulated experience do it every day. Filling their shoes with the next generation will, in the author’s opinion, be one of the biggest challenges the industry has faced in its history.

Maintenance Resource Management: Coming to Grips with a New Operational Reality

Maintenance Resource Management (MRM) is a relatively untouched concept in Part 121 line operations. Although attempts have been made over the years to adapt a pilot-centric version to maintenance, a widely used, stand-alone, maintenance-oriented version has proven elusive in the line maintenance arsenal. One of the numerous problems faced in line operations is the continuity of intra- and inter-team communication. With multiple shifts working in locations scattered throughout any given airport, there is an inherent risk of missed communication and interrupted workflow continuity. Exposure to formal human factors or MRM-related training is generally limited to a one- or two-day class that deals with the subject matter on a peripheral level. Follow-up training is relegated to a yearly computer-based training “refresher.” Beyond this cursory training, little to no MRM trickles down to operational-level line mechanics. Given the complexity of today’s line maintenance work environment, operational prudence would dictate a much higher level of MRM exposure and more frequent training and awareness. As mentioned earlier, the industry is facing an influx of “green talent,” many of whom will have little to no prior operational experience in this field. Immersing them in this environment without comprehensive knowledge of the human factors exposure they will face, and of the use of MRM in their daily work lives, is simply not an option. This challenge is germane to both management and maintainers.

A Tip of the Proverbial Iceberg

Line aircraft maintenance is an occupational crucible. Some liken it to working in a hospital’s Level 1 trauma center. The patients, in this case, are an endless barrage of potentially ailing aircraft, each carrying passengers and flight crew whose travel is directly affected by the split-second decisions of a line maintainer, who will either allow the flight to depart, delay it, or, worse, cancel it. The Part 121 maintainer, working with a level of professional autonomy unheard of in many industries and even among military counterparts, performs these tasks and makes these decisions daily, repeating the process many times a day on a multitude of different aircraft types and systems. Yet, given the gravity of its work scope, the field has remained remarkably opaque to the public and the research world. While volumes of research have been dedicated to every aspect of pilot operations, maintenance seems to operate in near anonymity. A brief glimpse at maintenance human factors material from Europe and Australia suggests interest in and commitment to the field by regulators and researchers alike. In the United States, research and regulatory awareness of these challenges, in the author’s opinion, lags far behind the visibility given to pilot work groups. The work scope of the majority of Part 121 maintainers, mainly as a result of airlines outsourcing a large portion of their in-house depot-level maintenance, changed almost overnight, placing entry-level maintainers directly into front-line operations on a growing scale. In the years to come, demographics will dramatically alter the makeup of the workforce as attrition strips away thousands of years of accumulated expertise and forces an unprecedented learning curve on a very complex and unforgiving industry.

This chapter briefly touched on two human factors issues the author felt were broad enough to encompass important challenges in the industry. Conversations with maintainers from several airlines reveal common concern about many other unaddressed human factors tripping hazards. Among them are workplace distractions; paperwork that is confusing, contradictory, and inaccurate; ambiguous duty time limitations; and ever-present time pressure. These alone could be the grist for a lengthy human factors treatise on the tripping hazards facing a typical US airline maintainer. Airlines are not the only industry facing a shortage of skilled technicians (and pilots) in the United States. Indeed, all industries are scrambling to find qualified replacements for the impending flood of retirements. Airlines, however, are very high-profile transportation entities whose existence is predicated on safety and reliability. Mitigating the long learning curve associated with creating the next generation of line maintainers demands an unorthodox approach to a process that has hitherto been handled in a “business as usual” manner. Some airlines are only now starting to realize this and manage it.

Artificial Intelligence in Aviation

Richard J. Ranaudo

The integration of artificial intelligence (AI) is perhaps the most pressing issue in modern flight operations. Situation awareness critical to mission accomplishment, which in the past was based on human observational skills, is being replaced by AI that tells the pilot what he or she should be thinking or doing. Information presented through virtual visual and auditory cues can be quite compelling and may lead pilots to accept an AI assessment without question. Years ago, a squadron mate of this author bailed out of an F-4 after running out of fuel because he believed the inertial navigation system (INS) instead of his own dead reckoning. Watching the newest generation, one sees how people can become totally immersed in a virtual world and think it is “reality.” In an interesting video interview, Elon Musk (SpaceX) indicated that AI can become a credible threat to our societal norms, replacing human function with a robotic response to everything. There is now talk of pilotless passenger-carrying airplanes, following similar efforts with automobiles. The technology is advancing so fast that there is concern that too many decisions and actions in future aviation may leave out the human altogether. Adaptability is perhaps the most important human trait, but growing dependency on virtual solutions may remove that trait from the cockpit. A good example of adaptability, by the way, was the incredible job Captain Chesley “Sully” Sullenberger did in saving the lives of all on board when he lost both engines to bird ingestion over one of the most densely populated areas in the United States.

Acknowledgments

Helena Cunningham and Dan Handlin would like to thank Captain Lindsay Fenwick, Captain Scott Hammond, and Captain Brian Moynihan for being incredibly supportive, helpful, and instrumental in bringing the important concept of Human-Centered Operating Principles to light in the airline industry. Without their inspired counsel, this concept would not have achieved the success it currently has.

References

Stokes, D. E. (1997). Pasteur’s quadrant: Basic science and technological innovation. Washington, DC: Brookings Institution Press.
Transportation Safety Board of Canada, Aviation Investigation Report A13H0002: Collision with water, Government of Canada, Department of Transport, MBB BO 105 S CDN-BS-4 (helicopter) C-GCFU, M’Clure Strait, Northwest Territories, 09 September 2013 (published 07 December 2015).
Transportation Safety Board of Canada, Aviation Investigation Report A13A0075: Loss of control and collision with water, Government of Newfoundland and Labrador, Air Services Division, CL-415 C-FIZU, Moosehead Lake (Newfoundland and Labrador), 03 July 2013 (published 28 August 2014).
Transportation Safety Board of Canada, Aviation Investigation Report A14F0065: Unstable approach and hard landing, Air Canada Rouge LP Airbus A319, C-FZUG, Sangster International Airport, Montego Bay, Jamaica, 10 May 2014 (published 09 January 2017).

11 Standardized Scenarios for Air Traffic Control Researchers

Jerry M. Crutchfield and Angel M. Millan

CONTENTS

Scenario Development
Selection of Airspace
Selection of Scenario Events
Composition of Air Traffic and Validation of Scenarios
Identifying Recommended Performance Measures
Resulting Products
Airspace Materials
Air Traffic Scenarios
Performance Measures
Scenario Usage
Author Notes
References

Researchers of Air Traffic Control (ATC) human factors often employ human-in-the-loop (HITL) simulations in attempts to answer either theoretical or applied questions about controller performance under various conditions of interest. This technique involves presenting a high-fidelity representation of the controller task to participants who will control simulated traffic under the specified conditions. While the participants control the simulated traffic, researchers record human performance data for analysis, possibly including biometric data, such as electrical activity of the brain and eye movements. After the participants complete the simulated air traffic scenarios, researchers will sometimes conduct interviews to learn how useful the participants thought new systems were or whether the participants believed that the condition had a negative impact on safety, workload levels, or other factors. Results of these simulations can inform developers of new systems, such as the Next Generation Air Transportation System (NextGen), so that the procedures guiding human-system interactions and the system’s design features promote safe and efficient system operation and maintenance.


Human-in-the-loop simulations of the ATC task enable researchers to control many of the conditions that cannot be controlled in the field and make possible the collection of performance data that in the real world might pose a risk to the safe, orderly, and expeditious flow of traffic. Evaluating proposed ATC procedures and technologies with pre-production prototypes in this way can also save system developers the costs of redesigning a more fully developed system when problems are found after operational implementation. However, using HITL simulations can be a complicated, time-consuming, and costly process in itself. Even when researchers have access to an ATC simulation platform with reasonably high fidelity, realistic air traffic scenarios must be scripted. If these scenarios are based on real-world traffic situations, suitable data about those situations must be selected and acquired. It is also necessary to take the time to validate the scenarios during and after the scripting process by having qualified ATC subject-matter experts (SMEs) run them. Unrealistic characteristics found in the scenarios can call a study’s results into question. In this chapter, the authors describe validated air traffic scenarios made available for ATC human factors researchers to use in HITL simulations. The chapter also describes how researchers at the Federal Aviation Administration’s (FAA’s) Civil Aerospace Medical Institute (CAMI) developed the scenarios by first identifying the airspace sectors in which the scenarios would occur, then choosing scenario events, and finally validating the scenarios for realism. A URL where the scenarios can be freely downloaded in Microsoft Excel format is provided below. The authors hope that access to these scenarios will save researchers who run HITL simulations as part of their ATC human factors studies the time and money it costs to develop and validate scenarios on their own.
In return, if researchers agree to use these scenarios and report the recommended performance measures, it will be possible to compare proposed new procedures and technologies across studies and to apply the insights gained from those comparisons to future system development. The FAA identifies and evaluates new ATC procedures and technologies that have the potential to enhance safety and efficiency in the US air transportation system. The FAA has a long history of using HITL simulations to accomplish these evaluations (Anderson & Vickers, 1953). Evaluations that examine human performance and other aspects of human factors are important because they support the identification of benefits a new procedure or technology can provide, along with potential problem areas that need to be addressed prior to deployment. Although the use of simulations has limitations and is not without its own challenges (Buckley, DeBaryshe, Hitchner, & Kohn, 1983), it remains the primary means of evaluating ATC procedures and technologies prior to using them with live air traffic. Some of the challenges associated with HITL simulations in the ATC domain relate to the comparison of results and the control of extraneous variables (Buckley et al., 1983). The typical method used to evaluate a new technology or procedure is to present an ATC SME participant (usually either an active or retired certified professional air traffic controller) with a simulation of an ATC workstation, such as the En Route Automation Modernization (ERAM) or Standard Terminal Automation Replacement System (STARS) workstations, and a simulated air traffic situation to control. The participants are asked to control simulated traffic using the new procedure or technology.
In many cases, the same participant also controls simulated traffic without the new procedure or technology so that performance under the two conditions can be compared. Researchers conducting these evaluations try to use as many participants as their limited resources allow in order to increase the reliability of the performance results. Extraneous variables can reduce reliability. These are factors that may influence a controller's performance but are not specifically related to the new procedure or technology, and they therefore make comparisons of participant performance across the two conditions less reliable.

Researchers often use two techniques to control extraneous variables associated with differences between scenarios. In the first, they use the same scenario more than once and make only superficial or nominal changes. In the ATC domain, these changes may include altering aircraft call signs so that the scenario is less recognizable. However, participants may not remember having previously controlled the specific aircraft in a scenario and still have learned something during the first run. That learning could improve their performance when they control aircraft with identical characteristics and flight plans during the second run. Therefore, researchers may choose to counterbalance the order in which the conditions are presented. Half of the participants control the scenario with the new procedure and/or technology first. The other half control it first in the baseline condition, without the new procedure and/or technology.
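As a minimal sketch of the counterbalancing described above, the Python function below assigns participants to the two presentation orders. The condition labels, the participant identifiers, and the shuffle-then-alternate assignment rule are illustrative assumptions, not the project team's procedure.

```python
import random
from typing import Dict, List, Optional

CONDITIONS = ("baseline", "new")  # hypothetical condition labels


def counterbalance_order(participant_ids: List[str],
                         seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Assign each participant one of the two presentation orders.

    Participants are shuffled (reproducibly if a seed is given) and then
    alternated between "baseline first" and "new first", so the two order
    groups differ in size by at most one participant.
    """
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    orders = (list(CONDITIONS), list(reversed(CONDITIONS)))
    # Copy each order so participants never share the same list object.
    return {pid: list(orders[i % 2]) for i, pid in enumerate(ids)}


schedule = counterbalance_order([f"P{n}" for n in range(1, 9)], seed=42)
for pid in sorted(schedule):
    print(pid, "->", " then ".join(schedule[pid]))
```

Shuffling before alternating keeps the two groups balanced in size and removes any link between recruitment order and presentation order.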
The second technique used to control extraneous variables associated with the scenario involves creating multiple different scenarios and trying to equate them on traffic complexity characteristics. These characteristics include the number and type of aircraft, the number of aircraft climbing or descending, and the number of aircraft with intersecting flight plans (Manning, Mills, Fox, Pfleiderer, & Mogilka, 2002). Equating them makes the scenarios more comparable in difficulty.

In addition, new ATC procedures and technologies should be evaluated in situations as similar as possible to those that controllers are likely to encounter in the field. Some events perturb the status quo but would not be considered rare in the context of ATC operations. Examples include convective weather, medical emergencies, equipment malfunctions, and air traffic compression. Such events are sometimes referred to as off-nominal events (Burian, 2008). In the interest of maintaining safety in air transportation, system developers need to know how a new procedure or technology will behave when off-nominal events occur. Therefore, researchers should use scenarios both with and without off-nominal events.
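The second technique above, equating scenarios on traffic complexity characteristics, can be sketched as follows. This sketch assumes each scenario is available as a list of aircraft records with a type, a vertical state, and the set of fixes on the flight plan. It treats two flight plans as intersecting when they share a fix, and it uses a 10% tolerance. Both are simplifying assumptions for illustration, not the complexity metrics of Manning et al. (2002).

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Aircraft:
    callsign: str
    aircraft_type: str            # e.g., "B738"
    vertical: str                 # "climb", "descend", or "level"
    route_fixes: FrozenSet[str]   # fixes on the flight plan (assumed field)


def complexity_profile(traffic: List[Aircraft]) -> Dict[str, int]:
    """Count the complexity characteristics listed in the text for one scenario."""
    intersecting = set()
    for a, b in combinations(traffic, 2):
        # Proxy assumption: two flight plans "intersect" if they share a fix.
        if a.route_fixes & b.route_fixes:
            intersecting.update((a.callsign, b.callsign))
    return {
        "aircraft": len(traffic),
        "aircraft_types": len({a.aircraft_type for a in traffic}),
        "climbing_or_descending": sum(a.vertical != "level" for a in traffic),
        "intersecting_flight_plans": len(intersecting),
    }


def mismatched_characteristics(p1: Dict[str, int], p2: Dict[str, int],
                               tolerance: float = 0.10) -> Dict[str, float]:
    """Return characteristics whose relative difference exceeds the tolerance."""
    mismatches = {}
    for key in p1:
        relative = abs(p1[key] - p2[key]) / max(p1[key], p2[key], 1)
        if relative > tolerance:
            mismatches[key] = relative
    return mismatches


scenario_a = [Aircraft("AAL1", "B738", "climb", frozenset({"FIXA", "FIXB"})),
              Aircraft("DAL2", "A320", "level", frozenset({"FIXB"}))]
scenario_b = [Aircraft("SWA3", "B737", "level", frozenset({"FIXC"})),
              Aircraft("JBU4", "A320", "descend", frozenset({"FIXD"}))]
print(mismatched_characteristics(complexity_profile(scenario_a),
                                 complexity_profile(scenario_b)))
```

Characteristics flagged by the comparison are candidates for minor scenario adjustments of the kind the project team made when equating scenarios within a sector.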

Evaluation of ATC procedures and technologies often involves more than comparing performance results from two conditions, a baseline condition and a condition with the new technology or procedure. Multiple proposed new technologies or procedures may need to be compared. Evaluators must try to predict what the overall changed system will look like when more than one change is introduced, and how multiple new technologies will interact with each other. They must also determine the costs and benefits of each proposed change and which change or combination of changes will provide the greatest potential benefit for its cost. Comparing across multiple proposed new technologies can help determine which changes to select for certain types of operations and the sequence in which to implement them. It can also reveal negative or beneficial synergies that are likely to arise when the changes are used together. Proposals to change the ATC system come from a variety of developers and researchers both within and outside the government. These include the National Aeronautics and Space Administration's Ames Research Center, the Mitre Corporation's Center for Advanced Aviation System Development, the Volpe National Transportation Systems Center, and numerous private government contractors. These organizations coordinate some aspects of their development and evaluation work with the FAA and, in some instances, with each other, and coordinate other aspects only occasionally. A variety of ATC simulation platforms exist, some commercially available and others developed in house at the FAA and at other organizations in the industry. Furthermore, there are multiple accepted ways to measure many human factors constructs, such as workload (e.g., Hart & Staveland, 1988; Stein, 1985) and situation awareness (SA; Durso et al., 1995; Endsley, 1988).
The different industry groups that develop and propose new ATC procedures and technologies are unlikely to share the same evaluation resources. They may not have access to the same ATC SME participants, use the same scenarios, airspace, or ATC simulation platforms, or collect and report the same performance measures. Different organizations therefore conduct HITL simulations with different airspace and air traffic situations and report different performance measures. Comparing their results presents ATC system evaluators and decision makers at the FAA with a further challenge, beyond those inherent in the HITL simulation method itself (as discussed above). Fortunately, these challenges to interpreting HITL results can be reduced.

The authors initiated the Standardized Scenarios Project to provide organizations, both within and outside the FAA, with a standard set of scenarios usable for HITL ATC simulations. Analyses were conducted to identify en route and Terminal Radar Approach Control (TRACON) airspace appropriate for testing new procedures and technologies, such as 4D trajectory planning and conflict resolution decision aids. This included proposed procedures and technologies associated with NextGen. ATC tower scenarios could be extrapolated from the TRACON scenarios, but scenarios specific to ATC towers were out of scope for the current project because of resource limitations. Interviews and working group meetings were held to identify suitable air traffic events to include in the scenarios. En route and TRACON scenarios were scripted using the selected airspace and events and were validated by multiple controllers familiar with the given airspace.
The recommended airspace, air traffic, weather events, and controller participant familiarization materials developed for this project are available for download in a format that should allow for entry into any en route or TRACON air traffic simulation platform. A set of recom- mended performance measures to be collected and documented in reports of evaluations that used these scenarios is also provided.

Scenario Development

This section describes the process the project team used to complete several tasks. The team identified two en route sectors and a TRACON airspace suitable for evaluating new procedures and technologies. It determined the traffic volume and traffic pattern to be used in the scenarios, as well as the number and type of scenarios to create. It also developed the off-nominal events included in the scenarios. The section then describes the method used to select a recommended list of performance measures to be assessed across all the scenarios.

Selection of Airspace

The first step in creating the scenarios was selecting the airspace in which they would take place. The project team considered generic (researcher-designed) airspace, which has certain advantages. Generic airspace allows researchers to build made-to-order challenges into sectors. Because no controller would have encountered the generic airspace outside of a lab, all controller participants would also have the same level of experience with the sectors, or rather the same unfamiliarity. However, the project team ultimately elected to use real-world sectors. There are a finite number of sectors in the National Airspace System (NAS), and testing proposed procedures and technologies for their ability to solve real-world air traffic issues in these sectors has the potential to provide ancillary benefits. Furthermore, human factors research such as that in ATC is presented to the user community and can readily be criticized on grounds of face validity. Using real-world sectors improves the face validity of the findings. The project team believed that, for this project, these advantages would outweigh the advantages of using a generic sector. However, new procedures and technologies, such as those related to NextGen, are often developed to solve issues related to traffic complexity and volume. The benefits of simulating real-world airspace could be achieved only if the selected airspace presented enough real-world challenges or opportunities for the procedure or technology to address. Toward that end, the team developed a method for identifying the busiest and most complex sectors in the NAS.

The primary criteria used to select sectors were traffic volume and complexity. Finding a suitable TRACON airspace was fairly straightforward.
The team used the Air Traffic Activity Data System (ATADS) to identify the airport with the greatest number of annual operations in 2014, Hartsfield–Jackson Atlanta International Airport (ATL). This information supported the decision to use the Atlanta TRACON (A80) airspace, specifically the terminal arrival sector for ATL shown in Figure 11.1.

FIGURE 11.1 Map of terminal arrival sector L for west arrivals to ATL in A80 airspace.

To identify complex en route sectors, the project team asked analysts at the Airborne Tactical Advantage Company (ATAC) to analyze sectors in the three busiest Air Route Traffic Control Centers (ARTCCs): Atlanta (ZTL), Chicago (ZAU), and New York (ZNY). The sector analysis examined air traffic characteristics across a two-year time span (2013–2015). The requested traffic characteristics were as follows:

- the average number of aircraft in the sector per hour
- the number of climbing or descending aircraft per hour
- the number of potential aircraft conflicts per hour
- the number of adjacent sectors with which that sector's controller would coordinate

Sectors with an average of fewer than 25 aircraft per hour were eliminated as candidates for simulation because they did not have enough activity. The remaining sectors were compared on the number of climbing and descending aircraft and the number of potential conflicts per hour. The project team decided to provide scenarios for both a low altitude and a high altitude en route sector. This would increase opportunities to evaluate a wide variety of procedures and technologies associated with en route airspace. Low and high altitude sectors have different characteristics, which may differentially affect the way new procedures and tools are used or their utility in resolving the problems they were created to resolve. Certain sectors in both ZAU and ZNY were comparable in complexity, depending on how the selected traffic characteristics were weighted. However, the team selected two sectors in ZNY that were adjacent to each other: a high altitude sector designated ZNY10 (see Figure 11.2) and a low altitude sector designated ZNY27 (see Figure 11.3).

FIGURE 11.2 Map of high altitude en route sector ZNY10.

FIGURE 11.3 Map of low altitude en route sector ZNY27.

The candidate sectors were presented to project sponsors familiar with proposed changes to the air transportation system associated with NextGen. The sponsors assessed the sectors for opportunities to apply procedures and technologies such as optimized profile descents and the integration of unmanned aerial vehicles into civilian airspace. They concurred with the selection of the two adjacent sectors at ZNY. Because the two sectors are adjacent, selecting them leaves open the possibility of running traffic through both sectors during simulations, with two participants controlling the sectors simultaneously and coordinating with each other. This presents an opportunity to collect data on coordination between sectors at a future date.
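The en route screening step can be illustrated with a short sketch. Only the 25-aircraft-per-hour cutoff comes from the text. The field names, the max-normalization, and the weights are hypothetical. They reflect the observation that the ranking of comparable sectors depends on how the traffic characteristics are weighted.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_AIRCRAFT_PER_HOUR = 25  # screening threshold described in the chapter


@dataclass
class SectorStats:
    name: str
    aircraft_per_hour: float
    climbing_descending_per_hour: float
    potential_conflicts_per_hour: float
    adjacent_sectors: int


def rank_sectors(sectors: List[SectorStats], weights: Dict[str, float]) -> List[str]:
    """Drop low-activity sectors, then rank the rest by a weighted complexity score.

    Each weighted characteristic is divided by its maximum among the remaining
    candidates so that characteristics measured on different scales are comparable.
    """
    candidates = [s for s in sectors if s.aircraft_per_hour >= MIN_AIRCRAFT_PER_HOUR]
    if not candidates:
        return []
    # Guard against division by zero when a characteristic is zero everywhere.
    maxima = {k: max(getattr(s, k) for s in candidates) or 1 for k in weights}

    def score(s: SectorStats) -> float:
        return sum(w * getattr(s, k) / maxima[k] for k, w in weights.items())

    return [s.name for s in sorted(candidates, key=score, reverse=True)]


# Hypothetical sector statistics, for illustration only.
stats = [SectorStats("SECTOR_A", 30, 10, 2, 5),
         SectorStats("SECTOR_B", 40, 20, 4, 6),
         SectorStats("SECTOR_C", 20, 25, 5, 7)]  # screened out: < 25 aircraft/hour
print(rank_sectors(stats, {"climbing_descending_per_hour": 1.0,
                           "potential_conflicts_per_hour": 1.0}))
```

Changing the weights and rerunning the ranking shows how sensitive the choice among similarly complex sectors is to that weighting.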

Selection of Scenario Events

Because it is important to know how a new procedure or technology will affect operations during off-nominal events, the project team decided to include off-nominal events in the standardized scenarios. Events are occurrences of interest scripted to take place during a scenario. The project team began by drawing on a list of off-nominal events from a previous project (Crutchfield & Pfleiderer, 2009). That project had identified scenarios for evaluating the trajectory-based operations proposed as part of NextGen. Its list was created from the input of controller, pilot, and weather SMEs across five knowledge elicitation sessions held in 2008. The list was then updated using a hazard analysis that specifically examined new procedures and technologies associated with NextGen (Sawyer, Berry, & Blanding, 2010).

Because the scenarios developed for this project are meant to be used during ATC HITL simulations, any event from the previous project that specified a scripted error on the part of controller participants was dropped from the list. Pilot errors and errors by controllers in adjacent sectors were retained. A new technology being evaluated may replace current systems or improve the reliability of current technology. For that reason, events that stemmed from specific equipment failures (such as communication system failures) were also dropped. The project team dropped two further kinds of events. The first were events judged to occur too rarely to be considered off-nominal (e.g., special handling of Air Force One). The second were events that would change operations so significantly that the situation might be a better test of emergency procedures than of a new technology or procedure for normal operations (e.g., ). Many events collected from the SMEs interviewed in 2008 were minor variations of other events on the list, so events that would require similar responses from a controller were combined under more general event titles. The resulting list of off-nominal events follows:

• Adjacent Sector Does Not Accept a Handoff
• Adjacent Sector Fails to Handoff
• Aircraft Icing
• Aircraft–Aircraft Conflict
• Aircraft–Airspace Conflict
• High Traffic Load
• Line of Severe Weather
• Medical Emergency
• Performance Limiting Aircraft Equipment Failure
• Pilot Deviation
• Pilot Requested Reroute
• Pop-up Storm
• Restricted Visibility
• Runway Configuration Change
• Traffic Management Unit (TMU) Route Change
• Traffic Compression
• Turbulence
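The screening rules that produced this list can be expressed as a small filter. In the project, the exclusions were judgment calls by the project team and SMEs. In this sketch, those judgments are assumed to be recorded as hypothetical boolean flags on each candidate event, and minor variants are merged under an assumed `general_title` field.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CandidateEvent:
    title: str
    scripted_controller_error: bool = False  # error scripted for the participant
    atc_equipment_failure: bool = False      # e.g., communication system failure
    too_rare: bool = False                   # too rare to count as off-nominal
    emergency_scale: bool = False            # better test of emergency procedures
    general_title: Optional[str] = None      # shared title for minor variants


def screen_events(events: List[CandidateEvent]) -> List[str]:
    """Apply the exclusion rules, then merge variants under general titles."""
    kept = set()
    for e in events:
        if (e.scripted_controller_error or e.atc_equipment_failure
                or e.too_rare or e.emergency_scale):
            continue
        kept.add(e.general_title or e.title)
    return sorted(kept)  # alphabetical, like the list above


candidates = [
    CandidateEvent("Cell builds near arrival fix", general_title="Pop-up Storm"),
    CandidateEvent("Cell develops on jet route", general_title="Pop-up Storm"),
    CandidateEvent("Participant misses a handoff", scripted_controller_error=True),
    CandidateEvent("Frequency outage", atc_equipment_failure=True),
    CandidateEvent("Turbulence"),
]
print(screen_events(candidates))
```

Aircraft equipment failures are not excluded by these flags. This mirrors the retention of "Performance Limiting Aircraft Equipment Failure" in the list, where only failures of ATC equipment were dropped.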

Determining how many and what types of anticipated non-routine, adverse, and off-nominal events are needed to adequately evaluate a new system presents a challenge to system evaluators. The resources needed to run HITL simulations are almost always limited, which argues for keeping that number to a minimum. Therefore, the team selected just two high-profile off-nominal events from the list to include in the standardized scenarios: pop-up storm and high traffic load. Severe weather occurs in the NAS frequently and has the potential to affect traffic flows across the NAS for many hours. Additionally, evaluators and decision makers want to know how new procedures and technologies will perform under the high air traffic loads predicted to occur years into the future. Other off-nominal events are fairly trivial to script into a scenario, whereas these two events are relatively complex and workload-intensive to create. Scripting scenarios with a pop-up storm and a high traffic load event would therefore allow new procedures and technologies to be evaluated against two very important off-nominal events. The project team also judged that it would provide more value to the user community than a number of minor variations on otherwise similar scenarios.

The project team used the two identified off-nominal events and its previous experience in running HITL evaluations to define the scenarios that could be developed with the time and resources available. All scenarios were designed to be 40 minutes long. Data suggest that, on average, controllers spend 55 minutes on position between breaks (Bailey, 2012). Forty minutes approaches that figure while reducing the time controller participants are needed and still allowing a useful amount of performance data to be collected.
Two versions of each evaluation scenario were developed for each sector so that one version can be used in a baseline condition and the other with the new technology or procedure. The first 20 minutes of the two moderate traffic load scenarios for each sector were scripted without any off-nominal events. This allows the benefits of new procedures or technologies to be observed under ideal traffic conditions. The evaluation team is expected to modify the second 20 minutes to include an off-nominal event directly related to the new procedure or technology being evaluated, such as an equipment failure. Including this type of off-nominal event makes it possible to investigate how the new procedure or technology affects controllers' behavior, workload, and SA during the event, compared with existing systems.

Two scenarios developed for each sector represent a moderate traffic load plus a severe weather system that affects operations in the sector. Two scenarios for each sector have a heavy traffic load. Two further moderate traffic load scenarios for each sector are provided as examples of familiarization scenarios. They can be used to familiarize participants who are naïve to a given sector with its operations and traffic flow, and to familiarize all participants with the new procedure or technology being evaluated. Evaluators are encouraged to create additional familiarization scenarios as time and resources allow.
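The resulting scenario set can be summarized as a small data structure, which makes the counts easy to check: two versions of each of four scenario types for each of three airspaces. The class and function names below are hypothetical. The 40-minute length and the 20-minute event-free opening of the system-failure scenarios come from the text.

```python
from dataclasses import dataclass
from typing import List

SCENARIO_MINUTES = 40
NOMINAL_MINUTES = 20  # event-free opening of the system-failure scenarios

AIRSPACES = ("TRACON A80", "En Route Low ZNY27", "En Route High ZNY10")
SCENARIO_TYPES = (
    "Familiarization",
    "Moderate Traffic and System Failure",
    "Moderate Traffic and Weather Cell",
    "Busy Traffic",
)


@dataclass(frozen=True)
class ScenarioSpec:
    airspace: str
    scenario_type: str
    version: int  # 1 or 2; for evaluation types, one version per condition


def scenario_set() -> List[ScenarioSpec]:
    """Two versions of each scenario type for each airspace."""
    return [ScenarioSpec(a, t, v)
            for a in AIRSPACES for t in SCENARIO_TYPES for v in (1, 2)]


def failure_scenario_phase(minute: float) -> str:
    """Phase of a system-failure scenario at a given minute after its start."""
    if not 0 <= minute < SCENARIO_MINUTES:
        raise ValueError("minute must fall within the 40-minute scenario")
    if minute < NOMINAL_MINUTES:
        return "nominal (scripted, no off-nominal events)"
    return "evaluator-defined off-nominal event window"


specs = scenario_set()
print(len(specs), "scenarios;", len(specs) // len(AIRSPACES), "per airspace")
print(failure_scenario_phase(25))
```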

Composition of Air Traffic and Validation of Scenarios

The initial versions of the scenarios were created by a retired controller employed by the ATAC Corporation, using an I-Sim simulator provided by Kongsberg Geospatial. This SME created 40-minute scenarios for the specified sectors from real-world air traffic data recorded in the summer of 2014, using Performance Data Analysis and Reporting System (PDARS) data. The SME had extensive experience creating scenarios with this simulator but no experience controlling traffic at either ZNY or A80. The SME therefore could not tell whether the translation process had produced entirely realistic air traffic scenarios. Once the draft scenarios were created, the SME ran them over WebEx for other retired controllers to review. During this review, a retired controller from ZNY familiar with the two sectors reviewed the scenarios developed for ZNY10 and ZNY27. A retired controller from A80 reviewed the TRACON scenarios developed for arrival Sector L. While watching the scenarios, the retired controllers identified flights that needed to change to maintain air traffic realism and fidelity, and the developer made those changes in real time.

After the ATAC Corporation delivered the scenarios, the project team began validating them. A second set of retired controllers, familiar with the respective airspace and sectors, ran the scenarios again. These runs were conducted on a high-fidelity simulation of an ERAM workstation, again provided by Kongsberg Geospatial. Live pseudo-pilots performed the flight deck side of the pilot–controller communications. This second set of retired controllers provided comments on how to improve the scenarios, and the scenarios were changed accordingly.
At this time, other minor changes were made to each scenario to further equate it in complexity with the other scenarios created for that sector, where intended. The project team then validated the scenarios further with a third set of two retired controllers. One was experienced in working traffic at ZNY10 and ZNY27, and the other in working traffic at A80. They ran the scenarios with pseudo-pilots and commented on the scenario characteristics, and the final changes to the scenarios were made based on these comments.

The second and third sets of retired controllers also helped the project team create familiarization presentations for controllers naïve to the ZNY sectors and A80 airspace. The presentations covered Letters of Agreement (LOAs), traffic flows, sector boundaries, and other information necessary to control simulated traffic in those airspaces. The familiarization material was then presented to naïve retired controllers, who subsequently controlled two scenarios from each airspace. These naïve controllers were interviewed afterward. They identified information in the presentations that needed further clarification and recommended additional information that controller participants would need to control traffic in these sectors successfully. The familiarization material was then improved accordingly.

Identifying Recommended Performance Measures

Collecting and reporting a common set of performance measures will greatly facilitate comparing results from HITL simulations that use these scenarios. Therefore, the project team recommended a set of performance measures for system evaluators to collect. The team assessed candidate ATC performance measures from three primary sources:

- a literature review conducted by Hadley, Guttman, and Stringer (1999)
- a performance measurement database created and populated for the FAA by Esa Rantanen (2006)
- the output files generated by the Kongsberg Geospatial en route and TRACON simulators, which contain measures reflecting controller actions

Brain activity and eye-movement data can be useful for understanding human interaction with the ATC system under study. Because of the significant costs involved in collecting them, however, the project team chose not to include these measures in the recommended list. The team also excluded measures intended to compare the knowledge and abilities of controllers rather than the impact of new procedures and technologies on controller performance, although such measures can be highly useful in other types of human factors studies. Human factors constructs such as workload and situation awareness can be measured by a variety of techniques. The project team drew on its own knowledge and experience to select the techniques most frequently used in current research on new ATC systems. The resulting list was reviewed by ATC human factors SMEs, who added further measures based on their own knowledge. The list is available at the URL provided below.

Resulting Products

All the products of the Standardized Scenarios Project may be downloaded at https://www.faa.gov/data_research/research/med_humanfacs/humanfactors/atc_scenarios. These include the airspace definitions, air traffic flight plans, and recommended performance measures. In this section, the authors provide an overview of the available products.

Airspace Materials

The project team identified airspace at Atlanta TRACON A80 and at New York ARTCC sectors ZNY10 and ZNY27 to represent in the standardized scenarios. The Microsoft Excel attachment “TRACON A80 Airspace Definition.xlsx” contains the A80 boundaries, altitude definitions, waypoints and fixes, routes, airports, and winds. The sector boundaries within A80 are in the attachment “Sectors within A80.xlsx.” The corresponding definitions for ZNY10 and ZNY27 are in the attachments “En Route High ZNY10 Airspace Definition.xlsx” and “En Route Low ZNY27 Airspace Definition.xlsx.”

The project team also created briefing materials to familiarize controller participants with the airspaces represented. The attachments “Sector A80 Briefing.pdf,” “Sector ZNY10 Briefing.pdf,” and “Sector ZNY27 Briefing.pdf” are PDF versions of those briefings.

Air Traffic Scenarios

The project team developed and validated eight scenarios for each of the three airspaces. The scenarios are listed in Table 11.1 with a short description and the name of the associated attachment.

Performance Measures

The authors make available a list of performance measures that the project team suggests researchers collect while controller participants run the standardized scenarios. The file containing this list is named “Suggested Performance Measures for Standardized Air Traffic Scenarios.” Some researchers may wish to use the en route scenarios with the Situation Present Assessment Method (SPAM) to assess participant situation awareness. For them, SA queries suitable for presentation at roughly 5-minute intervals during these scenarios are provided in a file named “En Route Situation Awareness Queries.” These queries were created by a retired en route ATC SME. They follow the convention used in other HITL studies conducted at CAMI but have not yet been used during a data collection run. At the very least, they can serve as examples of the types of queries researchers can develop on their own.
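For SA queries presented at roughly 5-minute intervals, a query schedule might be generated as below. The rule of starting one interval after scenario start and the optional random jitter are assumptions for illustration. The chapter specifies only the approximate interval, and the query content itself comes from the provided file.

```python
import random
from typing import List, Optional


def spam_query_onsets(duration_min: float = 40, interval_min: float = 5,
                      jitter_s: float = 0.0, seed: Optional[int] = None) -> List[float]:
    """Query onset times, in seconds from scenario start, at roughly fixed intervals.

    The first query comes one interval after the start, and no query is scheduled
    at or after the end of the scenario. An optional uniform jitter of up to
    +/- jitter_s seconds makes the query times less predictable.
    """
    rng = random.Random(seed)
    step = interval_min * 60
    end = duration_min * 60
    onsets = []
    t = step
    while t < end:
        onsets.append(t + rng.uniform(-jitter_s, jitter_s))
        t += step
    return onsets


print(spam_query_onsets())                         # fixed 5-minute spacing
print(spam_query_onsets(jitter_s=30, seed=7))      # roughly 5-minute spacing
```

For a 40-minute scenario, this yields seven queries, with the last one well before the end of the run.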

Scenario Usage The purpose of this chapter is to provide a standard set of validated air traffic scenarios for ATC human factors researchers to use in HITL simula- tions. The project team developed and validated eight scenarios suitable 230 Improving Aviation Performance ) Continued ( Attachment 1.xlsx 2.xlsx 1.xlsx 2.xlsx Scenarios A80 Weather Cells for TRACON 1 and 2.xlsx Scenarios A80 Weather Cells for TRACON 1 and 2.xlsx Scenario 1.xlsx Scenario 2.xlsx Scenario 1.xlsx TRACON A80 Familiarization Scenario TRACON A80 Familiarization Scenario TRACON Scenario A80 Moderate Traffic TRACON Scenario A80 Moderate Traffic TRACON Scenario 1.xlsx and A80 Weather TRACON Scenario 2.xlsx and A80 Weather TRACON Scenario 1.xlsx A80 Busy Traffic TRACON Scenario 2.xlsx A80 Busy Traffic TRACON En Route Low ZNY27 Familiarization En Route Low ZNY27 Familiarization En Route Low ZNY27 Moderate Traffic Description with sector or new procedure/technology with sector or new procedure/technology to the system being evaluated in a system related failure to the system being evaluated in a system related failure airspace airspace with sector or new technology/procedure with sector or new technology/procedure to the system in a system related followed by failure being evaluated Scenario to be used to familiarize controller participants Scenario to be used familiarize controller participants Scenario to be used familiarize controller with no events for 20 minutes followed byLight traffic with no events for 20 minutes followed by Light traffic the with a weather cell traveling through Light traffic the with a weather cell traveling through Light traffic load at near capacity for this configuration Traffic load at near capacity for this configuration Traffic participants Scenario to be used familiarize controller participants Scenario to be used familiarize controller load for this sector in 2014 20 minutes traffic Average Type of AirspaceType Arrivals Arrivals Arrivals Arrivals Arrivals Arrivals 
TABLE 11.1 (Continued) Air Traffic Scenarios

Type of Airspace | Type of Scenario | Description | Attachment
En Route Low ZNY27 | Familiarization | Scenario to be used to familiarize controller participants with sector or new procedure/technology | En Route Low ZNY27 Familiarization Scenario 1.xlsx; En Route Low ZNY27 Familiarization Scenario 2.xlsx
En Route Low ZNY27 | Moderate Traffic | Average traffic load for this sector in 2014 | En Route Low ZNY27 Moderate Traffic Scenario 1.xlsx; En Route Low ZNY27 Moderate Traffic Scenario 2.xlsx
En Route Low ZNY27 | Moderate Traffic and Weather Cell and System Failure | Average traffic load for this sector in 2014 with a weather cell traveling through requiring reroutes, followed by failure in a system related to the system being evaluated | En Route Low ZNY27 Weather Scenario 1.xlsx; En Route Low ZNY27 Weather Scenario 2.xlsx; Weather Cells for En Route Low ZNY27 Scenarios 1 and 2.xlsx
En Route Low ZNY27 | Busy Traffic | Traffic load 15% above average for this sector in 2014 | En Route Low ZNY27 Busy Traffic Scenario 1.xlsx; En Route Low ZNY27 Busy Traffic Scenario 2.xlsx
En Route High ZNY10 | Familiarization | Scenario to be used to familiarize controller participants with sector or new procedure/technology | En Route High ZNY10 Familiarization Scenario 1.xlsx; En Route High ZNY10 Familiarization Scenario 2.xlsx
En Route High ZNY10 | Moderate Traffic | Average traffic load for this sector in 2014 | En Route High ZNY10 Moderate Traffic Scenario 1.xlsx; En Route High ZNY10 Moderate Traffic Scenario 2.xlsx
En Route High ZNY10 | Moderate Traffic and Weather Cell and System Failure | Average traffic load for this sector in 2014 with a weather cell traveling through requiring reroutes, followed by failure in a system related to the system being evaluated | En Route High ZNY10 Weather Scenario 1.xlsx; En Route High ZNY10 Weather Scenario 2.xlsx; Weather Cells for En Route High ZNY10 Scenarios 1 and 2.xlsx
En Route High ZNY10 | Busy Traffic | Traffic load 15% above average for this sector in 2014 | En Route High ZNY10 Busy Traffic Scenario 1.xlsx; En Route High ZNY10 Busy Traffic Scenario 2.xlsx

for use in evaluating TRACON procedures/technologies, eight scenarios suitable for use in evaluating en route procedures/technologies in low-altitude airspace, and eight scenarios suitable for use in evaluating en route procedures/technologies in high-altitude airspace. The scenarios are scripted to re-create real-world airspace that analyses have shown to be associated with highly complex traffic situations. The TRACON airspace is based on Atlanta TRACON A80 Sector L, and the en route sectors are based on New York Center's sectors ZNY10 and ZNY27. The scenarios include representations of severe weather and high traffic load so that the performance of new procedures/technologies can be demonstrated when they are challenged by such real-world events. All the scenarios were vetted by retired controllers who had experience working either A80 Sector L or ZNY10 and ZNY27. The scenarios and other materials provided by this project were designed so that evaluators can use controller participants who either work the sectors represented or are naïve to the sectors but can learn about them through familiarization materials and training scenarios. When running the HITL scenarios, evaluators are expected to use a repeated measures design in which every participant runs every scenario in turn. Evaluators are expected to use one scenario of each type provided (moderate traffic, weather, busy traffic) in a baseline condition using current procedures and technologies, and the second scenario of each type in an experimental condition that may, for example, include the use of new procedures/technologies. It is further expected that the order of the scenarios used (baseline vs.
new procedure/technology test) will be counterbalanced across participants to control for differences in difficulty level that may inadvertently exist between the scenarios.

Although one of the primary goals of the project team was to provide scenarios that would allow evaluating the effectiveness of new procedures/technologies under conditions that might stress controllers, another goal was to make the scenarios independent of any new procedure or technology being evaluated. In some cases, the change to be evaluated may affect air traffic flows into the airspace or the structure of airspace routes, areas, and restrictions within the airspace being represented. In these cases, it is suggested that evaluators use the air traffic provided in these scenarios when running the baseline condition. When running the condition that uses the new procedure or technology, evaluators are justified in changing the sequencing or spacing of aircraft entering the airspace, or the routes and/or airspace objects within it. It is recommended that changes to the simulated airspace be made only if they are similar to changes that would be made to any airspace using the new procedure or technology, rather than changes that would be unique to these sectors.

The project team intends the scenarios to be used by a variety of organizations, both within and outside the FAA, when evaluating new procedures and technologies such as those associated with NextGen. The authors hope that providing these scenarios will reduce the time and costs associated with HITL simulations, because suitable airspace has already been identified, air traffic and air traffic events have already been scripted, and the scenarios have already been validated with qualified ATC SMEs. Having these ready-made scenarios available enables research that might not otherwise be possible.
Furthermore, the authors hope that these scenarios will be widely adopted by industry and, in so doing, will help decision makers at the FAA and elsewhere compare performance results across a variety of new procedures and technologies. In this way, air transportation development can benefit from the additional insight gained.
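The counterbalanced repeated measures design described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's procedure: the function name `counterbalanced_schedules`, the four-group rotation, and the fixed order of scenario types within each block are hypothetical. Only the three scenario types and the two scenarios per type come from the text and Table 11.1. Familiarization scenarios, which would precede these runs, are left out.

```python
from itertools import product

# Scenario types per airspace, from Table 11.1 (two scenarios of each).
# "Weather" is shorthand for moderate traffic with a weather cell and system failure.
SCENARIO_TYPES = ("Moderate Traffic", "Weather", "Busy Traffic")


def counterbalanced_schedules(n_participants):
    """Build a run order for each participant in a repeated measures design.

    Hypothetical scheme: two factors are fully crossed and rotated across
    participants so that each combination is used equally often:
      * which scenario number (1 or 2) of each type is run as the baseline
      * whether the baseline or the experimental block is run first
    Each run is a tuple (scenario_type, scenario_number, condition).
    """
    groups = list(product((1, 2), ("baseline", "experimental")))  # 4 cells
    schedules = []
    for p in range(n_participants):
        baseline_num, first = groups[p % len(groups)]
        experimental_num = 3 - baseline_num  # the other scenario of each type
        blocks = {
            "baseline": [(t, baseline_num, "baseline") for t in SCENARIO_TYPES],
            "experimental": [(t, experimental_num, "experimental")
                             for t in SCENARIO_TYPES],
        }
        second = "experimental" if first == "baseline" else "baseline"
        schedules.append(blocks[first] + blocks[second])
    return schedules


if __name__ == "__main__":
    for p, runs in enumerate(counterbalanced_schedules(4)):
        print(f"Participant {p + 1}: {runs}")
```

With four participants, each combination of baseline scenario number and condition order is used exactly once; for example, the first participant runs scenario 1 of each type as the baseline block before the experimental block. Recruiting participants in multiples of four keeps the design balanced, and a Latin square could additionally rotate the order of scenario types within each block.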

Author Notes

This research was supported and funded by the FAA NextGen Organization's Human Factors Division, ANG-C1. Correspondence regarding the scenarios should be directed to Jerry M. Crutchfield, Federal Aviation Administration, Civil Aerospace Medical Institute, 6500 S. MacArthur, Building 13, Oklahoma City, OK 73169 (phone: 405-954-4537; email: jerry.[email protected]).

