
A Multimodal Ouija Board for Carrier Operations

by Birkan Uzun S.B., C.S. M.I.T., 2015

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Computer Science and Engineering at the Massachusetts Institute of Technology

June 2016

Copyright 2016 Birkan Uzun. All rights reserved.

The author hereby grants to M.I.T. permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author ……………………………………………………………………………………………... Department of Electrical Engineering and Computer Science April 6, 2016

Certified by ………………………………………………………………………………………... Randall Davis, Professor Thesis Supervisor

Accepted by ……………………………………………………………………………………….. Dr. Christopher J. Terman Chairman, Masters of Engineering Thesis Committee


A Multimodal Ouija Board for Deck Operations

by Birkan Uzun

Submitted to the Department of Electrical Engineering and Computer Science on April 6, 2016, in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Computer Science and Engineering

Abstract

In this thesis, we present improvements to DeckAssistant, a system that provides a traditional Ouija board interface by displaying a digital rendering of an aircraft carrier deck that assists deck handlers in planning deck operations. DeckAssistant has a large digital tabletop display that shows the status of the deck and has an understanding of certain deck actions for scenario planning. To preserve the conventional way of interacting with the old-school Ouija board, where deck handlers move aircraft by hand, the system takes advantage of multiple modes of interaction. Deck handlers plan strategies by pointing at aircraft, gesturing and talking to the system. The system responds with its own speech and gestures, and it updates the display to show the consequences of the actions taken by the handlers. The system can also be used to simulate certain scenarios during the planning process. The multimodal interaction described here creates a communication of sorts between deck handlers and the system. Our contributions include improvements in hand-tracking, speech synthesis and speech recognition.


Acknowledgements

Foremost, I would like to thank my advisor, Professor Randall Davis, for his support of my work and for his patience, motivation and knowledge. His door was always open whenever I had a question about my research. He consistently allowed this research to be my own work, but steered me in the right direction with his meaningful insights whenever he thought I needed it.

I would also like to thank Jake Barnwell for helping with the development environment setup and documentation.

Finally, I must express my gratitude to my parents and friends who supported me throughout my years of study. This accomplishment would never have been possible without them.


Contents

1. Introduction
   1.1. Overview
   1.2. Background and Motivation
      1.2.1. Ouija Board History and Use
      1.2.2. Naval Push for Digital Information on Decks
      1.2.3. A Multimodal Ouija Board
   1.3. System Demonstration
   1.4. Thesis Outline
2. DeckAssistant Functionality
   2.1. Actions in DeckAssistant
   2.2. Deck Environment
      2.2.1. Deck and Space Understanding
      2.2.2. Aircraft and Destination Selection
      2.2.3. Path Calculation and Rerouting
   2.3. Multimodal Interaction
      2.3.1. Input
      2.3.2. Output
3. System Implementation
   3.1. Hardware
   3.2. Software
      3.2.1. Libraries
      3.2.2. Architecture
4. Hand Tracking
   4.1. The Leap Motion Sensor
      4.1.1. Pointing Detection
      4.1.2. Gesture Detection
5. Speech Synthesis and Recognition
   5.1. Speech Synthesis
   5.2. Speech Recognition
      5.2.1. Recording Sound
      5.2.2. Choosing a Speech Recognition Library
      5.2.3. Parsing Speech Commands
      5.2.4. Speech Recognition Stack in Action
6. Related Work
   6.1. Navy ADMACS
   6.2. Deck Heuristic Action Planner
7. Conclusion
   7.1. Future Work
8. References
9. Appendix
   9.1. Code and Documentation

List of Figures

Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images.
Figure 2: The ADMACS Ouija board. Source: Google Images.
Figure 3: DeckAssistant’s tabletop display with the digital rendering of the deck [1].
Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1].
Figure 5: The initial arrangement of the deck [1].
Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1].
Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to destination is blocked [1].
Figure 8: DeckAssistant displays an alternate location for the F-18 that is blocking the path [1].
Figure 9: The logic for moving aircraft [1].
Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images.
Figure 11: (a) Orange dot represents where the user is pointing. (b) Aircraft being hovered over is highlighted green [1].
Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1].
Figure 13: Aircraft circled in red, meaning there is not enough room in the region [1].
Figure 14: Alternate region to move the C-2 is highlighted in blue [1].
Figure 15: The hardware used in DeckAssistant.
Figure 16: DeckAssistant software architecture overview.
Figure 17: The Leap Motion Sensor mounted on the edge of the tabletop display.
Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer Portal.
Figure 19: Demonstration of multiple aircraft selection with the pinch gesture.
Figure 20: A summary of how the speech recognition stack works.

List of Tables

Table 1: Set of commands that are recognized by DeckAssistant.

List of Algorithms

Algorithm 1: Summary of the pointing detection process in pseudocode.

1. Introduction

1.1. Overview

In this thesis, we present improvements to DeckAssistant, a digital aircraft carrier Ouija Board interface that aids deck handlers with planning deck operations. DeckAssistant supports multiple modes of interaction, aiming to improve the user experience over the traditional Ouija Boards. Using hand-tracking, gesture recognition and speech recognition, it allows deck handlers to plan deck operations by pointing at aircraft, gesturing and talking to the system. It responds with its own speech using speech synthesis and updates the display, which is a digital rendering of the aircraft carrier deck, to show results when deck handlers take action. The multimodal interaction described here creates a communication of sorts between deck handlers and the system. DeckAssistant has an understanding of deck objects and operations, and can be used to simulate certain scenarios during the planning process.

The initial work on DeckAssistant was done by Kojo Acquah, and we build upon his implementation [1]. Our work makes the following contributions to the fields of Human-Computer Interaction and Intelligent User Interfaces:

● It discusses how using the Leap Motion Sensor is an improvement over the Microsoft Kinect in terms of hand-tracking, pointing and gesture recognition.

● It presents a speech synthesis API which generates speech that has high pronunciation quality and clarity. It investigates several speech recognition APIs, argues which one is the most applicable, and introduces a way of enabling voice-activated speech recognition.

● Thanks to the refinements in hand-tracking and speech, it provides a natural, multimodal way of interaction with the first large-scale Ouija Board alternative that has been built to help with planning deck operations.

1.2. Background and Motivation

1.2.1. Ouija Board History and Use

The flight deck of an aircraft carrier is a complex scene, riddled with incoming aircraft, personnel moving around to take care of a variety of tasks, and the ever-present risk of hazards and calamity. Flight Deck Control (FDC) is where the deck scene is coordinated, and during flight operations it is one of the busiest places on the ship. The deck handlers in FDC send instructions to the aircraft directors on the flight deck, who manage all aircraft movement, placement and maintenance for the deck regions they are responsible for.

FDC is filled with computer screens and video displays of all that is occurring outside on deck, but it is also home to one of the most crucial pieces of equipment in the Navy, the Ouija board (Figure 1). The Ouija board is a waist-high replica of the flight deck at 1/16 scale that has all the markings of the flight deck, as well as its full complement of aircraft — all in cutout models, and all tagged with items like thumbtacks and bolts to designate their status. The board offers an immediate glimpse of the deck status and gives the deck handlers in charge the ability to manipulate the model deck objects and make planning decisions, should the need arise. The board has been in use since World War II and has provided a platform of collaboration for deck handlers in terms of strategy planning for various scenarios on deck.

It is widely understood that the first round of damage to a ship will likely take out the electronics; so to ensure the ship remains functional in battle, everything possible has a mechanical backup. Even though the traditional board has the advantage of being immune to electronic failures, there is potential for digital Ouija board technology to enhance the deck-operation-planning functionality and experience.

Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images.

1.2.2. Naval Push for Digital Information on Decks

Even though the Ouija board has been used to track aircraft movement on aircraft carriers for over seventy years, the Navy is working on a computerized replacement due to limitations of the current model. As one of the simplest systems aboard Navy ships, the Ouija boards can only be updated manually, i.e., when the deck handlers move models of aircraft and other assets around the model deck to match the movements of their real-life counterparts. The board does not offer any task automation, information processing or validation to help with strategy planning for various deck scenarios.


Figure 2: The ADMACS Ouija board. Source: Google Images.

The new Ouija board replacement (Figure 2) is part of the Aviation Data Management and Control System (ADMACS) [2], a set of electronic upgrades for carriers designed to make use of the latest technologies. This system requires the deck handler to track flight deck activity via computer, working with a monitor that will be fed data directly from the flight deck. In addition, the deck handler can move aircraft around on the simulated deck view using mouse and keyboard.

1.2.3. A Multimodal Ouija Board

The ADMACS Ouija board fixes the problem of updating the deck status in real time without any manual work. It also allows the deck handlers to move aircraft on the simulated deck view using mouse and keyboard, as noted. However, most deck handlers are apparently skeptical of replacing the existing system, and they think that things that are not broken should not be fixed [6]. Considering these facts, imagine a new Ouija board with a large digital tabletop display that could show the status of the deck and had an understanding of certain deck actions for scenario planning. To preserve the conventional way of interacting with the old-school Ouija board, where deck handlers move aircraft by hand, the system would take advantage of multiple modes of interaction. Utilizing hand-tracking and speech recognition techniques, the system could let deck handlers point at objects on deck and speak their commands. In return, the system could respond with its own synthesized speech and update the graphics to illustrate the consequences of the commands given by the deck handlers. This would create a two-way communication between the system and the deck handlers.

1.3. System Demonstration

To demonstrate how the multimodal Ouija Board discussed in Section 1.2.3 works in practice and preview DeckAssistant in action, we take a look at an example scenario from [1] where a deck handler is trying to prepare an aircraft for launch on a catapult. The deck handler needs to move the aircraft-to-be-launched to the catapult while moving other aircraft that are blocking the way to other locations on deck.

The system has a large tabletop display showing a digital, realistic rendering of an aircraft carrier deck with a complete set of aircraft (Figure 3).

Figure 3: DeckAssistant’s tabletop display with the digital rendering of the deck [1].

The deck handler stands in front of the table and issues commands using both hand gestures and speech (Figure 4). DeckAssistant uses either the Leap Motion Sensor (mounted on the edge of the display) or the Microsoft Kinect (mounted above the display) for hand-tracking. The deck handler wears a wireless Bluetooth headset that supports a two-way conversation with the system through speech.

Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1].

Figure 5 shows the initial aircraft arrangement of the deck. There are eleven F-18s (grey strike fighter jets) and two C-2s (white cargo aircraft) placed on the deck. There are four catapults at the front of the deck, and two of them are open. The deck handler will now try to launch one of the C-2s on one of the open catapults, and that requires moving a C-2 from the elevator, which is at the rear of the deck, to an open catapult, which is at the front of the deck.

After viewing the initial arrangement of the deck, the deck handler points at the aircraft to be moved, the lower C-2, and speaks the following command: “Move this C-2 to launch on Catapult 2”. The display shows where the deck handler is pointing with an orange dot, and the selected aircraft is highlighted in green (Figure 6).


Figure 5: The initial arrangement of the deck [1].

Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1].

Now, DeckAssistant analyzes the deck to figure out whether the command given by the deck handler can be accomplished without any extra action. In this case, there is an F-18 blocking the path the C-2 needs to take to go to the catapult (Figure 7).

Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to destination is blocked [1].

DeckAssistant knows that the F-18 has to be moved out of the way. It uses graphics and synthesized speech to let the deck handler know that additional actions need to be taken and to ask for the handler’s permission in the form of a yes-no question (Figure 8).

Figure 8: DeckAssistant displays an alternate location for the F-18 that is blocking the path [1].

The aircraft are moved in the simulation if the deck handler agrees to the actions proposed by the system. If not, the system reverts to the state before the command. If the deck handler does not like the action proposed by the system, they can cancel the command and move aircraft around based on their own strategies. The goal of DeckAssistant here is to take care of small details while the deck handler focuses on the more important deck operations without wasting time.

1.4. Thesis Outline

In the next section, we talk about what types of actions are available in DeckAssistant and how they are taken, what the system knows about the deck environment, and how the multimodal interaction works. Section 3 discusses the hardware and software used and introduces the software architecture behind DeckAssistant. Sections 4 and 5 look at implementation details, discussing hand-tracking, speech synthesis and speech recognition. Section 6 talks about related work. Section 7 discusses future work and concludes.

2. DeckAssistant Functionality

This section gives an overview of actions available in DeckAssistant, discusses what DeckAssistant knows about the deck environment and the objects, and explains how the multimodal interaction happens.

2.1. Actions in DeckAssistant

The initial version of DeckAssistant focuses only on simple deck actions for aircraft movement and placement. These actions allow deck handlers to perform tasks such as moving an aircraft from one location to another or preparing an aircraft for launch on a catapult. These deck actions comprise the logic to perform a command given by the deck handler (Figure 9). As the example in Section 1.3 suggests, these actions are built to be flexible and interactive.

This means that the deck handler is always consulted for their input during an action; they can make alterations with additional commands, or they can suggest alternate actions if needed. The system takes care of the details, saving the deck handler’s time and allowing them to concentrate on more important tasks.

There are four actions available within DeckAssistant, as noted in [1]:

● Moving aircraft from start to destination.
● Finding an alternate location for aircraft to move if the intended destination is full.
● Clearing a path for aircraft to move from start to end location.
● Moving aircraft to launch on catapults.


Figure 9: The logic for moving aircraft [1].

2.2. Deck Environment

DeckAssistant has an understanding of the deck environment, which includes various types of aircraft, regions on deck and paths between regions (See Chapter 4 of [1] for the implementation details of the deck environment and objects).

2.2.1. Deck and Space Understanding

DeckAssistant’s user interface represents a scale model of a real deck, just like a traditional Ouija Board. The system displays the status of aircraft on this user interface and uses the same naming scheme that the deck handlers use for particular regions of the deck (Figure 10). The deck handlers can thus refer to those regions by their names when using the system. Each of these regions contains a set of parking spots in which the aircraft can reside. These parking spots help the system determine the arrangement of parked aircraft and figure out the occupancy in a region. This means that the system knows if a region has enough room to move aircraft to or if the path from one region to another is clear.
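To make this bookkeeping concrete, the following is a minimal Java sketch of how regions and parking spots could be represented; the class and field names are illustrative assumptions, not the actual data model described in Chapter 4 of [1].

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of region/parking-spot bookkeeping; not DeckAssistant's actual model. */
class DeckRegion {
    final String name;                       // e.g. "Fantail", matching deck-handler terminology
    private final List<ParkingSpot> spots = new ArrayList<ParkingSpot>();

    DeckRegion(String name, int spotCount) {
        this.name = name;
        for (int i = 0; i < spotCount; i++) {
            spots.add(new ParkingSpot());
        }
    }

    /** Number of spots not currently holding an aircraft. */
    int openSpots() {
        int open = 0;
        for (ParkingSpot spot : spots) {
            if (spot.occupant == null) open++;
        }
        return open;
    }

    /** True if this region can accept the given number of additional aircraft. */
    boolean hasRoomFor(int aircraftCount) {
        return openSpots() >= aircraftCount;
    }

    static class ParkingSpot {
        String occupant;                     // tail number of the parked aircraft, or null if empty
    }
}
```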

Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images.

2.2.2. Aircraft and Destination Selection

Each aircraft on deck is a unique object that has a tail number (displayed on each aircraft), type, position, status and other information that is useful for the system’s simulation. Currently, we support two different types of aircraft within DeckAssistant: F-18s and C-2s.

Selection of aircraft can be done in two ways. The deck handler can either point at the aircraft (single or multiple) as shown in the example in Section 1.3, or they can refer to the aircraft by their tail numbers, for instance, “Aircraft Number-8”.

Destination selection is similar. Since destinations are regions on the deck, they can be referred to by their names or they can be pointed at.

2.2.3. Path Calculation and Rerouting

During path planning, the system draws straight lines between regions and uses the wingspan length as the width of the path to make sure that there are no aircraft blocking the way and that the aircraft to move can fit into its path. If a path is clear but the destination does not have enough open parking spots, the system suggests alternate destinations and routes, checking the nearest neighboring regions for open spots.
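As an illustration of this check, the Java sketch below tests whether any parked aircraft lies inside a straight corridor whose width equals the moving aircraft’s wingspan; treating blockers as points at a distance from the segment is our simplification, not necessarily DeckAssistant’s exact computation.

```java
/** Geometric sketch of the straight-line corridor check; a simplified illustration only. */
class PathCheck {

    /** Distance from point (px, py) to the segment (x1, y1)-(x2, y2). */
    static double distanceToSegment(double px, double py,
                                    double x1, double y1, double x2, double y2) {
        double dx = x2 - x1, dy = y2 - y1;
        double lengthSquared = dx * dx + dy * dy;
        if (lengthSquared == 0) return Math.hypot(px - x1, py - y1);
        // Project the point onto the segment and clamp to its endpoints.
        double t = ((px - x1) * dx + (py - y1) * dy) / lengthSquared;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(px - (x1 + t * dx), py - (y1 + t * dy));
    }

    /**
     * True if no parked aircraft lies inside a corridor of width equal to the moving
     * aircraft's wingspan, centered on the straight line between start and destination.
     */
    static boolean pathIsClear(double[][] parkedPositions, double wingspan,
                               double startX, double startY, double endX, double endY) {
        for (double[] p : parkedPositions) {
            if (distanceToSegment(p[0], p[1], startX, startY, endX, endY) < wingspan / 2) {
                return false;                // a parked aircraft blocks the corridor
            }
        }
        return true;
    }
}
```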

2.3. Multimodal Interaction

The goal of the multimodal interaction created by DeckAssistant’s user interface is to create a communication between the deck handler and the system. The input in this interaction is a combination of hand gestures and speech performed by the deck handler. The output is the system’s response with synthesized speech and graphical updates.

2.3.1. Input

DeckAssistant uses either the Leap Motion Sensor or the Microsoft Kinect for tracking hands. Hand-tracking allows the system to recognize certain gestures using the position of the hands and fingertips. Currently, the system can only interpret pointing gestures where the deck handler points at aircraft or regions on the deck.

Commands are spoken into the microphone of the wireless Bluetooth headset that the deck handler wears, allowing the deck handler to issue a command using speech alone. In this case, the deck handler has to provide the tail number of the aircraft to be moved as well as the destination name. An example could be: “Move Aircraft Number-8 to the Fantail”. Alternatively, the deck handler can combine speech with one or more pointing gestures. In this case, for example, the deck handler can point at an aircraft to be moved and say “Move this aircraft”, and then point at the destination and say “over there”.

2.3.2. Output

The system is very responsive to any input. As soon as the deck handler makes a pointing gesture, an orange dot appears on the screen, indicating where the deck handler is pointing (Figure 11 (a)). If the deck handler is pointing at an aircraft, the system highlights that aircraft in green, indicating a potential for selection (Figure 11 (b)). Eventually, if the deck handler takes an action to move aircraft on deck, the selected aircraft are highlighted in orange. As mentioned earlier, the deck handler can select multiple aircraft (Figure 12).

Figure 11: (a) Orange dot represents where the user is pointing. (b) Aircraft being hovered over is highlighted green [1].

Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1].

The system’s responses to the deck handler’s input depend on the type of action and the aircraft arrangement on deck. If a certain action can be processed without additional actions, the system completes it and confirms it by saying “Okay, done”. If the action cannot be completed for any reason, the system explains why using its synthesized speech and graphical updates, and asks for the deck handler’s permission to take an alternate action. If the deck handler approves, the system updates the arrangement on deck. If the deck handler declines the suggested alternate action, the system reverts to its state before the deck handler issued the command.

Section 1.3 gave us an example of this scenario, where the system warned the user about the aircraft that was blocking the path to a catapult and recommended an alternate spot for the aircraft blocking the way. Once the deck handler approved, the system could move the aircraft to launch on the catapult.

Let’s take a look at another scenario. Figure 13 shows an example of a situation where a C-2 cannot be moved to the fantail since there are no open parking spots there. The system circles all the blocking aircraft in red, and suggests an alternate region on deck to move the C-2. In that case, the new region is highlighted in blue and a clear path to it is drawn (Figure 14). If the deck handler accepts this suggested region, the system moves the C-2 there. If not, it reverts to its original state and waits for new commands.

Figure 13: Aircraft circled in red, meaning there is not enough room in the region [1].


Figure 14: Alternate region to move the C-2 is highlighted in blue [1].

3. System Implementation

In this section, we introduce DeckAssistant’s hardware setup, the software libraries used and the software architecture design.

3.1. Hardware

Figure 15: The hardware used in DeckAssistant.

As can be seen in Figure 15, DeckAssistant’s hardware setup consists of:

● Four downward-facing Dell 5100MP projectors mounted over the tabletop. These projectors create a 42 by 32 inch seamless display with a 2800 x 2100 pixel resolution.

● A white surface digitizer. The display is projected onto this surface.

● A Leap Motion Sensor or a Microsoft Kinect (V1) for tracking hands over the table surface. The system can use either sensor.

● A Logitech C920 Webcam for viewing the entire surface. This webcam is used to calibrate the seamless display using the ScalableDesktop Classic software.

● A wireless Bluetooth headset for supporting a two-way conversation with the system.

This setup is powered by a Windows 7 desktop computer with an AMD Radeon HD 6870 graphics card. It should be noted that the need for the surface digitizer, projectors and webcam would be eliminated if the system were configured to use a flat panel for the display.

3.2. Software

All of DeckAssistant’s code is written in Java 7 in the form of a stand-alone application. This application handles all the system functionality: graphics, speech recognition, speech synthesis, and gesture recognition.

3.2.1. Libraries

Four libraries are used to provide the desired functionality:

● Processing: for graphics; it is a fundamental part of our application framework.

● AT&T Java Codekit: for speech recognition.

● Microsoft Translator Java API: for speech synthesis.

● Leap Motion Java SDK: provides the interface to the Leap Motion Controller sensor for hand-tracking.

3.2.2. Architecture

DeckAssistant’s software architecture is structured around three stacks that handle the multimodal input and output. These three stacks run in parallel and are responsible for speech synthesis, speech recognition and hand-tracking. The Speech Synthesis Stack constructs sentences in response to a deck handler’s command and generates an audio file for that sentence that is played through the system’s speakers. The Speech Recognition Stack constantly listens for commands, does speech-to-text conversion and parses the text to figure out the command that was issued. The Hand-Tracking Stack interfaces either with the Leap Motion Sensor or the Microsoft Kinect, processes the data received and calculates the position of the user’s pointing finger over the display as well as detecting additional gestures. These three stacks each provide an API (Application Program Interface) so that the other components within DeckAssistant can communicate with them for a multimodal interaction.

Another crucial part of the architecture is the Action Manager component. The Action Manager’s job is to manipulate the deck by communicating with the three multimodal interaction stacks. Once a deck handler’s command is interpreted, it is passed into the Action Manager, which updates the deck state and objects based on the command and responds by leveraging the Speech Synthesis Stack and graphics.

Finally, all of these stacks and components run on a Processing loop that executes every 30 milliseconds. Each execution of this loop makes sure the multimodal input and output are processed. Figure 16 summarizes the software architecture. The DeckAssistant Software Guide (see Appendix for URL) details the implementation of each component within the system.
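The sketch below illustrates this loop structure, assuming a Processing 2 style PApplet; the stack classes shown are empty stand-ins with placeholder names, not DeckAssistant’s actual components.

```java
import processing.core.PApplet;

/** Structural sketch of the main loop; the inner classes are placeholder stand-ins. */
public class DeckAssistantLoop extends PApplet {

    // Placeholder stand-ins for the three stacks and the Action Manager.
    static class HandTrackingStack { void poll() { /* read a Leap/Kinect frame */ } }
    static class SpeechRecognitionStack { String pollCommand() { return null; } }
    static class SpeechSynthesisStack { void playPending() { /* play queued WAV files */ } }
    static class ActionManager {
        void handleCommand(String command) { /* update deck state, respond */ }
        void render(PApplet canvas) { /* draw deck, aircraft, highlights */ }
    }

    HandTrackingStack handTracking = new HandTrackingStack();
    SpeechRecognitionStack speechRecognition = new SpeechRecognitionStack();
    SpeechSynthesisStack speechSynthesis = new SpeechSynthesisStack();
    ActionManager actionManager = new ActionManager();

    public void setup() {
        size(1400, 1050);        // scaled-down window for this sketch
        frameRate(33);           // roughly one pass every 30 milliseconds
    }

    public void draw() {
        background(0);
        handTracking.poll();                         // update pointing/gesture state
        String command = speechRecognition.pollCommand();
        if (command != null) {
            actionManager.handleCommand(command);    // interpret and act on speech
        }
        actionManager.render(this);                  // redraw the deck
        speechSynthesis.playPending();               // flush queued speech output
    }

    public static void main(String[] args) {
        PApplet.main(new String[] { "DeckAssistantLoop" });
    }
}
```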


Figure 16: DeckAssistant software architecture overview.

4. Hand Tracking

In Chapter 5 of his thesis [1], Kojo Acquah discusses methods for tracking hands and recognizing pointing gestures using a Microsoft Kinect (V1). These initial hand-tracking methods of DeckAssistant can only recognize outstretched fingers on hands that are held mostly perpendicular to the focal plane of the camera. They do not work well with other hand poses, leaving no way to recognize other gestures. The authors of [8] provide a detailed analysis of the accuracy and resolution of the Kinect sensor’s depth data. Their experimental results show that the random error in depth measurement increases with increasing distance to the sensor, ranging from a few millimeters to approximately 4 centimeters at the maximum range of the sensor. The quality of the data is also found to be affected by the low resolution of the depth measurements, which depend on the frame rate (30fps [7]). The authors thus suggest that the obtained accuracy, in general, is sufficient for detecting arm and body gestures, but is not sufficient for precise finger tracking and hand gestures. Experimenting with DeckAssistant’s initial version, we noted laggy, low-accuracy hand-tracking performance from the Kinect sensor. In addition, the Kinect always has to be calibrated before DeckAssistant can be used, which is a time-consuming process. Finally, the current setup has a usability problem: when deck handlers stand in front of the tabletop and point at the aircraft on the display, their hands block the projectors’ lights, causing shadows in the display.

The authors of [9] present a study of the accuracy and robustness of the Leap Motion Sensor. They use an industrial robot with a reference pen, allowing suitable position accuracy for the experiment. Their results show high precision (an overall average accuracy of 0.7mm) in fingertip position detection. Even though they do not achieve the accuracy of 0.01mm, as stated by the manufacturer [3], they claim that the Leap Motion Sensor performs better than the Microsoft Kinect in the same experiment.

This section describes our use of the Leap Motion Sensor to track hands and recognize gestures with a high degree of subjective robustness.

4.1. The Leap Motion Sensor

The Leap Motion Sensor is a 3” long USB device that tracks hand and finger motions. It works by projecting infrared light upward from the device and detecting reflections using monochromatic infrared cameras. Its field of view extends from 25mm to 600mm above the device with a 150° spread and a high frame rate (>200fps) [3]. In addition, the Application Programming Interface (API) of the Leap Motion Sensor provides more information about the hands than that of the Microsoft Kinect (V1).

Figure 17: The Leap Motion Sensor mounted on the edge of the tabletop display.

The Leap Motion Sensor is mounted on the edge of the tabletop display, as shown above in Figure 17. In this position, hands no longer block the projectors’ lights, thereby eliminating the shadows in the display. The sensor also removes the need for calibration before use, enabling DeckAssistant to run without any extra work. Finally, thanks to its accuracy in finger-tracking, the sensor creates the opportunity for more hand gestures to express detail in deck actions (see Section 4.1.2).

4.1.1. Pointing Detection

The Leap Motion API provides us with motion tracking data as a series of frames. Each frame contains measured positions and other information about detected entities. Since we are interested in detecting pointing, we look at the fingers. The Pointable class in the API reports the physical characteristics of detected extended fingers, such as tip position, direction, etc. From these extended fingers, we choose the pointing finger as the one that is farthest toward the front in the standard Leap Motion frame of reference. Once we have the pointing finger, we retrieve its tip position by calling the Pointable class’s stabilizedTipPosition() method. This method applies smoothing and stabilization to the tip position, removing the flickering caused by sudden hand movements and yielding a more accurate pointing detection that improves the interaction with our 2D visual content. The stabilized tip position lags behind the original tip position by a variable amount (not specified by the manufacturer) [3] depending on the speed of movement.

Finally, we map the tip position from the Leap Motion coordinate system to our system’s 2D display. For this, we use the API class InteractionBox. This class represents a cuboid-shaped region contained in the Leap Motion’s field of view (Figure 18). The InteractionBox provides normalized coordinates for detected entities within itself. Calling the normalizePoint() method of this class returns the normalized 3D coordinates for the tip position within the range [0...1]. Multiplying the X and Y components of these normalized coordinates by our system’s screen dimensions, we complete the mapping process and obtain the 2D coordinates in our display. Algorithm 1 summarizes the pointing detection process.

Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer Portal.

Algorithm 1: Summary of the pointing detection process in pseudocode.

As discussed in Section 2.3.2, the mapped tip position is displayed on the screen as an orange dot.
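As a concrete illustration, here is a minimal Java sketch of the steps just described, using the Leap Motion Java SDK; the screen dimensions are taken from Section 3.1 and the class wrapping is ours, so treat it as a sketch rather than DeckAssistant’s actual code.

```java
import com.leapmotion.leap.*;

/** Minimal sketch of the pointing-detection steps described above. */
public class PointingDetector {
    private static final int SCREEN_WIDTH = 2800;    // display resolution from Section 3.1
    private static final int SCREEN_HEIGHT = 2100;

    /** Returns the 2D display coordinates of the pointing finger, or null if no finger is extended. */
    public static int[] pointingPosition(Controller controller) {
        Frame frame = controller.frame();
        FingerList extended = frame.fingers().extended();
        if (extended.isEmpty()) {
            return null;                              // no extended finger: nothing to map
        }
        // The pointing finger is the extended finger farthest toward the front.
        Finger pointing = extended.frontmost();
        // The stabilized tip position smooths out flicker from small hand movements.
        Vector tip = pointing.stabilizedTipPosition();
        // Normalize into [0..1] coordinates inside the InteractionBox.
        Vector normalized = frame.interactionBox().normalizePoint(tip);
        // Map X/Y onto the display (a vertical flip may be needed depending on
        // how the display's origin is oriented relative to the sensor).
        int x = (int) (normalized.getX() * SCREEN_WIDTH);
        int y = (int) (normalized.getY() * SCREEN_HEIGHT);
        return new int[] { x, y };
    }
}
```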

4.1.2. Gesture Detection

We implemented a new gesture for multiple aircraft selection, using a combination of pointing and pinching. The deck handler can point with their index finger while pinching with their thumb and middle finger to select multiple aircraft. We detect this gesture using the Leap Motion API’s pinchStrength() method. If the deck handler is pinching, the value returned by this method is close to 1, and close to 0 otherwise. However, since this value can be affected by movements of the deck handler’s hand due to the device’s sensitivity, we apply a moving average to make sure that the majority of the values we receive from the method indicate pinching. In addition, we recognize this gesture only if the user is pinching with the thumb and the middle finger. We do this by iterating through the list of fingers in a frame and checking the distance between their tip positions and the thumb’s tip position. The middle finger’s tip position in this case is supposed to have the smallest distance to the thumb’s tip position. The reason for this check is that we do not want to recognize other hand poses as a pinch gesture. For example, if the deck handler is pointing with their index finger and the other fingers are not extended, the system might think that the user is pinching. However, that is not the case, and the check we run, along with the moving-averaged pinch strength value, prevents the recognition of such cases. Figure 19 shows an example of multiple aircraft selection using the pinch gesture.

Figure 19: Demonstration of multiple aircraft selection with the pinch gesture.
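A simplified Java sketch of the pinch check just described is shown below; the moving-average window size and strength threshold are assumed values, not DeckAssistant’s tuned parameters.

```java
import com.leapmotion.leap.*;
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sketch of the pinch check; thresholds and window size are assumptions. */
public class PinchDetector {
    private static final int WINDOW = 10;             // moving-average window (assumed)
    private static final float PINCH_THRESHOLD = 0.8f;
    private final Deque<Float> recentStrengths = new ArrayDeque<Float>();

    /** Returns true if the hand is pinching with the thumb and middle finger. */
    public boolean isPinchSelect(Hand hand) {
        // Smooth pinchStrength() over the last few frames to ignore jitter.
        recentStrengths.addLast(hand.pinchStrength());
        if (recentStrengths.size() > WINDOW) {
            recentStrengths.removeFirst();
        }
        float sum = 0;
        for (float s : recentStrengths) sum += s;
        float average = sum / recentStrengths.size();
        if (average < PINCH_THRESHOLD) {
            return false;
        }
        // Require the middle finger to be the finger closest to the thumb tip,
        // so an index-only pointing pose is not mistaken for a pinch.
        Vector thumbTip = hand.fingers().fingerType(Finger.Type.TYPE_THUMB).get(0).tipPosition();
        Finger closest = null;
        float closestDistance = Float.MAX_VALUE;
        for (Finger finger : hand.fingers()) {
            if (finger.type() == Finger.Type.TYPE_THUMB) continue;
            float d = finger.tipPosition().distanceTo(thumbTip);
            if (d < closestDistance) {
                closestDistance = d;
                closest = finger;
            }
        }
        return closest != null && closest.type() == Finger.Type.TYPE_MIDDLE;
    }
}
```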

5. Speech Synthesis and Recognition

This section details the improvements in speech synthesis and speech recognition for DeckAssistant.

5.1. Speech Synthesis

The initial version of DeckAssistant, as discussed in [1, Section 6.1], used the FreeTTS package for speech synthesis. Even though FreeTTS provides an easy-to-use API and is compatible with many operating systems, it lacks pronunciation quality and clarity in speech. To solve this problem, we implemented a speech synthesizer interface that acts as a front to any speech synthesis library that we plug in. One library that works successfully with our system is the Microsoft Translator API, a cloud-based, automatic machine translation service that supports multiple languages. Since our application uses the English language, we do not use any of the translation features of the service. Instead, we use it to generate a speech file from the text we feed in.

As explained in Section 3.2.2, speech is synthesized in response to a deck handler’s commands. Any module in the software can call the Speech Synthesis Engine of the Speech Synthesis Stack to generate speech. Once called, the Speech Synthesis Engine feeds the text to be spoken into the Microsoft Translator API through the interface we created. The interface then makes a request to the Microsoft Translator service, which returns a WAV file that we play through our system’s speakers. In the case of multiple speech synthesis requests, the system queues these requests and handles them in order. Using the Microsoft Translator API enables us to provide high-quality speech synthesis with clear voices. It should be noted that future developers of DeckAssistant can incorporate any speech synthesis library into the system with ease.
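The sketch below shows one way such a pluggable synthesizer front could look in Java; the interface and class names are hypothetical, chosen only to illustrate the idea of swapping libraries behind a single interface and queueing requests.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical synthesizer abstraction; not DeckAssistant's actual classes. */
interface SpeechSynthesizer {
    /** Converts text into a playable audio file (e.g., a WAV returned by a cloud service). */
    File synthesize(String text);
}

/** Queues synthesis requests and handles them in order, as described above. */
class SpeechSynthesisEngine {
    private final SpeechSynthesizer synthesizer;
    private final Queue<String> pending = new ArrayDeque<String>();

    SpeechSynthesisEngine(SpeechSynthesizer synthesizer) {
        this.synthesizer = synthesizer;
    }

    /** Called by any module that wants the system to speak. */
    synchronized void say(String text) {
        pending.add(text);
    }

    /** Called from the main loop: synthesize and play the next queued sentence. */
    synchronized void processNext() {
        String next = pending.poll();
        if (next != null) {
            File wav = synthesizer.synthesize(next);
            play(wav);
        }
    }

    private void play(File wav) {
        // Playback through the system's speakers is omitted in this sketch.
    }
}
```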

5.2. Speech Recognition

The CMU Sphinx 4 library was used for recognizing speech in the initial version of DeckAssistant [1, Section 6.2]. Even though Sphinx provides an easy API to convert speech into text with acoustic models and a grammar (rules for specific phrase construction) of our choice, the speech recognition performance is poor in terms of recognition speed and accuracy. In the experiments we ran during development, we ended up repeating ourselves several times until the recognizer picked up what we were saying. In response, we introduced a speech recognizer interface that provides us with the flexibility to use any speech recognition library. Other modules in DeckAssistant can call this interface and use the recognized speech as needed.

5.2.1. Recording Sound

The user can talk to DeckAssistant at any time, without the need for extra actions such as push-to-talk or gestures. For this reason, the system should constantly be recording using the microphone, understanding when the user is done issuing a command, and generating a WAV file of the spoken command. Sphinx’s Live Speech Recognizer took care of this by default. However, since the speech recognizer library we decided to use (discussed in the next section) did not provide any live speech recognition, we had to implement our own sound recorder that generates WAV files with the spoken commands. For this task, we use SoX (Sound Exchange), a cross-platform command line utility that can record audio files and process them. The SoX command constantly runs in the background to record any sound. It stops recording once no sound is detected after the user has started speaking. It then trims out certain noise bursts and writes the recorded speech to a WAV file, which is sent back to DeckAssistant. Once the speech recognizer is done with the speech-to-text operation, this background process is run again to record new commands. For more details about SoX, please refer to the SoX Documentation [4].
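The following Java sketch shows how such a background recorder could invoke SoX; the silence-detection thresholds passed to the silence effect are assumptions for illustration, not the exact values DeckAssistant uses.

```java
import java.io.File;
import java.io.IOException;

/** Illustrative sketch of a background recorder built on SoX's rec tool. */
public class SoxCommandRecorder {

    /**
     * Blocks until the user speaks and then falls silent, and returns the recorded
     * WAV file. The silence effect starts recording when the level rises above 3%
     * and stops after roughly two seconds below that level (assumed thresholds).
     */
    public static File recordCommand(File outputWav) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(
                "rec", outputWav.getAbsolutePath(),
                "silence", "1", "0.1", "3%", "1", "2.0", "3%");
        builder.redirectErrorStream(true);
        Process process = builder.start();
        process.waitFor();                 // rec exits once trailing silence is detected
        return outputWav;
    }

    public static void main(String[] args) throws Exception {
        File wav = recordCommand(new File("command.wav"));
        System.out.println("Recorded " + wav.getAbsolutePath());
        // The WAV file would then be handed to the speech recognizer.
    }
}
```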

5.2.2. Choosing a Speech Recognition Library

To pick the most suitable speech recognition library for our needs, we experimented with four popular APIs:

● Google Speech: It did not provide an official API. We had to send an HTTP request to their service with a recorded WAV file to get the speech-to-text response, and were limited to 50 requests per day. Even though the responses for random sentences that we used for testing were accurate, it did not work very well for our own grammar, since the library does not provide any grammar configuration. A simple example could be the sentence “Move this C-2”. The recognizer thought that we were saying “Move this see too”. Since we had a lot of similar issues with other commands, we decided not to use this library.

● IBM Watson Speech API: A brand new, easy-to-use API. It transcribed the incoming audio and sent it back to our system with minimal delay, and speech recognition seemed to improve as it heard more. However, like Google Speech, it did not provide any grammar configuration, which caused inaccuracy in recognizing certain commands in our system. Therefore, we did not use this library.

● Alexa Voice Service: Amazon recently made this service available. Even though the speech recognition works well for the purposes it was designed for, it unfortunately cannot be used as a pure speech-to-text service. Instead of returning the text spoken, the service returns an audio file with a response, which is not useful for us. After hacking with the service, we managed to extract the text that was transcribed from the audio file we sent in. However, it turns out that the Alexa Voice Service can only be used when the user says the words “Alexa, tell DeckAssistant to…” before issuing a command. That is not very usable for our purposes, so we chose not to work with this service.

● AT&T Speech: This system allowed us to configure a vocabulary and a grammar that made the speech recognition of our specific commands very accurate. Like the IBM Watson Speech API, the transcription of the audio file we sent in was returned with minimal delay. Therefore, we ended up using this library for our speech recognizer. The one downside of this library was that we had to pay a fee (AT&T Developer Premium Access costs $99) to receive Premium Access for the Speech API.

As explained in Section 5.2.1, recognition is performed after each spoken command followed by a brief period of silence. Once the AT&T Speech library recognizes a phrase in our grammar, we pass the transcribed text into our parser.

5.2.3. Parsing Speech Commands

The parser extracts metadata that represents the type of the command being issued as well as any other relevant information. Each transcribed text that is sent to the parser is called a base command. Out of all the base commands, only the Decision Command (Table 1) represents a meaningful action by itself. The parser interprets the rest of the commands in two stages, which allows for gestural input alongside speech. We call these combined commands. Let’s look at an example where we have the command “Move this aircraft, over there”. When issuing this command, the deck handler points at the aircraft to be moved and says “Move this aircraft...”, followed by “...over there” while pointing at the destination. In the meantime, the parser sends the metadata extracted from the text to the Action Manager, which holds the information until two base commands can be combined into a single command for an action to be taken. In addition, the Action Manager provides visual and auditory feedback to the deck handler during the process. A full breakdown of speech commands is found in [1] and listed here:

Base Commands

● Move Command. Function: selects aircraft to be moved. Example: “Move this C-2…”

● Location Command. Function: selects destination of move. Example: “…to the fantail.”

● Launch Command. Function: selects catapult(s) to launch aircraft on. Example: “…to launch on Catapult 2.”

● Decision Command. Function: responds to a question from DeckAssistant. Examples: “Yes”, “No”, “Okay”.

Combined Commands

● Move to Location Command. Function: moves aircraft to a specified destination. Combination: Move Command + Location Command.

● Move Aircraft to Launch Command. Function: moves aircraft to launch on one or more catapults. Combination: Move Command + Launch Command.

Table 1: Set of commands that are recognized by DeckAssistant.
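To make the parsing stage concrete, here is a small Java sketch that maps a transcribed phrase onto the base commands in Table 1 using regular expressions; the patterns and class names are illustrative assumptions covering only the examples above, not DeckAssistant’s full grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch of base-command parsing; not the actual DeckAssistant parser. */
public class SpeechCommandParser {

    /** Metadata extracted from one transcribed phrase. */
    public static class BaseCommand {
        public final String type;       // "MOVE", "LOCATION", "LAUNCH" or "DECISION"
        public final String argument;   // selected aircraft, region name, catapult number, ...
        public BaseCommand(String type, String argument) {
            this.type = type;
            this.argument = argument;
        }
    }

    private static final Pattern DECISION = Pattern.compile("^(yes|no|okay)$", Pattern.CASE_INSENSITIVE);
    private static final Pattern MOVE = Pattern.compile("^move (this [\\w-]+|aircraft number-\\d+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern LAUNCH = Pattern.compile("launch on catapult (\\d+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern LOCATION = Pattern.compile("(?:to the|over) (.+)$", Pattern.CASE_INSENSITIVE);

    /** Maps a transcribed phrase onto a base command, or returns null if nothing matches. */
    public static BaseCommand parse(String transcript) {
        String text = transcript.trim();
        Matcher m;
        if ((m = DECISION.matcher(text)).find()) return new BaseCommand("DECISION", m.group(1));
        if ((m = MOVE.matcher(text)).find())     return new BaseCommand("MOVE", m.group(1));
        if ((m = LAUNCH.matcher(text)).find())   return new BaseCommand("LAUNCH", m.group(1));
        if ((m = LOCATION.matcher(text)).find()) return new BaseCommand("LOCATION", m.group(1));
        return null;                             // phrase not in the grammar
    }

    public static void main(String[] args) {
        BaseCommand first = parse("Move this C-2");
        BaseCommand second = parse("to launch on Catapult 2");
        System.out.println(first.type + " " + first.argument);    // prints: MOVE this C-2
        System.out.println(second.type + " " + second.argument);  // prints: LAUNCH 2
    }
}
```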

5.2.4. Speech Recognition Stack in Action

In Figure 20, we outline how the Speech Recognition Stack works with the Action Manager to create deck actions. As already discussed in Section 5.2.1, the SoX process that we run is constantly recording and waiting for commands. Figure 20 uses a command that moves an aircraft to a deck region as an example. When the deck handler issues the first command, the SoX process sends the speech recognizer a WAV file to transcribe. The transcribed text is then sent to the speech parser, which extracts the metadata. Once the speech recognizer is done transcribing, it restarts the recording of sound through the SoX command to listen for future commands. Step 1 in Figure 20 shows that the metadata extracted represents a Move Command for an aircraft that is being pointed at. The Action Manager receives this information at Step 2, understands that it is a base command, and waits for another command so that the two can be combined into a single command that represents a deck action. In the meantime, the Action Manager consults the Selection Engine at Step 3 to get the information for the aircraft that is being pointed at. This allows the Action Manager to highlight the aircraft that is selected. Meanwhile, the deck handler speaks the rest of the command, which is sent to the parser. Step 4 shows the metadata that is assigned to the base command spoken. In this case, we have a Location Command and the name of the deck region that is the destination. In Step 5, the Action Manager constructs the final command with the second base command, and it fetches the destination information through the Deck Object. Finally, a Deck Action is created (Step 7) with the information gathered from the Speech Recognition Stack and the other modules.

Implementation of Deck Actions is described in [1, Section 7].
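The following Java fragment sketches the holding-and-combining behavior described in these steps; the class and method names are invented for illustration, and strings are returned in place of real deck actions.

```java
/** Simplified stand-in for the Action Manager's combining behavior; names are placeholders. */
class ActionManagerSketch {

    /** Metadata handed over by the speech parser for one spoken phrase. */
    static class BaseCommand {
        final String type;       // "MOVE", "LOCATION", "LAUNCH" or "DECISION"
        final String argument;   // selected aircraft, region name, catapult number, ...
        BaseCommand(String type, String argument) {
            this.type = type;
            this.argument = argument;
        }
    }

    private BaseCommand pending;     // first half of a combined command, if any

    /** Called once per parsed phrase; returns a description of what happened. */
    String receive(BaseCommand command) {
        if ("DECISION".equals(command.type)) {
            return "decision: " + command.argument;              // meaningful on its own
        }
        if (pending == null) {
            pending = command;                                   // hold it and wait for the rest
            return "waiting for the rest of the command";
        }
        String action = combine(pending, command);
        pending = null;
        return action;
    }

    private String combine(BaseCommand first, BaseCommand second) {
        if ("MOVE".equals(first.type) && "LOCATION".equals(second.type)) {
            return "deck action: move " + first.argument + " to " + second.argument;
        }
        if ("MOVE".equals(first.type) && "LAUNCH".equals(second.type)) {
            return "deck action: move " + first.argument + " to launch on catapult " + second.argument;
        }
        return "unrecognized combination";
    }
}
```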


Figure 20: A summary of how the speech recognition stack works.

6. Related Work

This section presents previous work that inspired the DeckAssistant project.

6.1. Navy ADMACS

As mentioned in Section 1.2.2, the Navy is moving towards a more technologically developed and connected system called ADMACS, a real-time data management system connecting the carrier’s air department, ship divisions and the sailors who manage aircraft launch and recovery operations.

6.2. Deck Heuristic Action Planner

Ryan et al. have developed ‘a decision support system for flight deck operations that utilizes a conventional integer linear program-based planning algorithm’ [5]. In this system, a human operator inputs the end goals as well as constraints, and the algorithm returns a proposed schedule of operations for the operator’s approval. Even though their experiments showed that human heuristics perform better than the plans produced by the algorithm, human decisions are usually conservative and the system can offer alternate plans. This is an early attempt to aid planning on aircraft carriers.

7. Conclusion

In this thesis, we introduced improvements to DeckAssistant, a system that provides a traditional Ouija board interface by displaying a digital rendering of an aircraft carrier deck that assists deck handlers in planning deck operations. DeckAssistant has a large digital tabletop display that shows the status of the deck and has an understanding of certain deck actions for scenario planning. To preserve the conventional way of interacting with the old-school Ouija board, where deck handlers move aircraft by hand, the system takes advantage of multiple modes of interaction. Deck handlers plan strategies by pointing at aircraft, gesturing and talking to the system. The system responds with its own speech and updates the display to show the consequences of the actions taken by the handlers. The system can also be used to simulate certain scenarios during the planning process. The multimodal interaction described here creates a communication of sorts between deck handlers and the system.

Our work includes three improvements to the initial version of DeckAssistant built by Kojo Acquah [1]. The first is the introduction of the Leap Motion Sensor for pointing detection and gesture recognition. We presented our subjective opinions on why the Leap Motion device performs better than the Microsoft Kinect, and we explained how we achieve pointing detection and gesture recognition using the device. The second improvement is better speech synthesis, from our introduction of a new speech synthesis library that provides high-quality pronunciation and clarity in speech. The third improvement is better speech recognition. We discussed the use cases of several speech recognition libraries and determined which one is best for our purposes. We explained how to integrate this new library into the current system with our own methods of recording voice.

7.1. Future Work

While the current version of DeckAssistant focuses only on aircraft movement based on deck handler actions, future versions may be able to implement algorithms where the system can simulate the optimal ordering of operations for an end goal, while accounting for deck and aircraft status such as maintenance needs.

Currently, DeckAssistant’s display, created by the four downward-facing projectors mounted over the tabletop (discussed in Section 3.1), has a high pixel resolution. However, it is not as seamless as it should be. The ScalableDesktop software is used to accomplish an automatic edge-blending of the four displays; however, the regions where the projectors overlap are still visible. Moreover, the ScalableDesktop software has to be run for calibration every time a user tries to start DeckViewer, and the brightness of the display is low. Instead of the projectors and the tabletop surface, a high-resolution, touchscreen LED TV might be mounted flat on a table. This would provide a seamless display free of projector overlaps and remove the need for time-consuming calibration. In addition, with the touchscreen feature, we could introduce drawing gestures where the deck handler can draw out the aircraft movement as well as take notes on the screen.

8. References

[1] Kojo Acquah. Towards a Multimodal Ouija Board for Aircraft Carrier Deck Operations. June 2015.

[2] US Navy Air Systems Command. Navy Training System Plan for Aviation Data Management and Control System. March 2002.

[3] The Leap Motion Sensor. Leap Motion for Mac and PC. November 2015.

[4] SoX Documentation. http://sox.sourceforge.net/Docs/Documentation. February 2013.

[5] Ryan et al. Comparing the Performance of Expert User Heuristics and an Integer Linear Program in Aircraft Carrier Deck Operations. 2013.

[6] Ziezulewicz, Geoff. "Old-school 'Ouija Board' Being Phased out on Navy Carriers." Stars and Stripes. Stars and Stripes, 10 Aug. 2011. Web. 03 Mar. 2016.

[7] Microsoft. Kinect for Windows Sensor Components and Specifications. Web. 07 Mar. 2016.

[8] Khoshelham, K.; Elberink, S.O. Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors 2012, 12, 1437–1454.

[9] Weichert, F.; Bachmann, D.; Rudak, B.; Fisseler, D. Analysis of the accuracy and robustness of the Leap Motion Controller. Sensors 2013, 13, 6380–6393.

9. Appendix

9.1. Code and Documentation

The source code of DeckAssistant, documentation on how to get up and running with the system, and the DeckAssistant Software Guide are available on GitHub: https://github.mit.edu/MUG-CSAIL/DeckViewer.
