Multimodal Interaction with Internet of Things and Augmented Reality:
Foundations, Systems and Challenges

Author: Joo Chan Kim
Supervisors: Teemu H. Laine, Christer Åhlund

Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
Division of Computer Science

ISSN 1402-1536
ISBN 978-91-7790-562-2 (pdf)
Luleå 2020
www.ltu.se

Abstract

The development of technology has enabled diverse modalities that humans and machines can use to interact with computer systems. In particular, the Internet of Things (IoT) and Augmented Reality (AR) are explored in this report because these two innovations offer new modalities that could be used to build multimodal interaction systems. Researchers have utilized multiple modalities in interaction systems to provide better usability. However, employing multiple modalities introduces challenges that need to be considered in the development of multimodal interaction systems in order to achieve high usability. To identify the remaining challenges in the research area of multimodal interaction systems with IoT and AR, we analyzed a body of literature on multimodal interaction systems from the perspectives of system architecture, input and output modalities, data processing methodology and use cases. The identified challenges concern (i) multidisciplinary knowledge; (ii) reusability, scalability and security of multimodal interaction system architectures; (iii) usability of multimodal interaction interfaces; (iv) adaptivity of multimodal interface design; (v) limitations of current technology; and (vi) the advent of new modalities. We expect that the findings of this report and future research can be used to nurture the multimodal interaction system research area, which is still in its infancy.

Table of Contents

1 Human-computer Interaction
2 Foundations
  2.1 Multimodal Interaction
  2.2 Internet of Things
  2.3 Augmented Reality
3 Multimodal Interaction - Modality
  3.1 Input (Human → Computer)
    3.1.1 Visual signal
    3.1.2 Sound
    3.1.3 Biosignals
    3.1.4 Inertia & Location
    3.1.5 Tangible objects
  3.2 Output (Computer → Human)
    3.2.1 Visual representation
    3.2.2 Sound
    3.2.3 Haptics
    3.2.4 Others
4 Multimodal Interaction - System Modeling
  4.1 Integration (Fusion)
    4.1.1 Data level integration
    4.1.2 Feature level integration
    4.1.3 Decision level integration
  4.2 Presentation (Fission)
5 Multimodal Interaction using Internet of Things & Augmented Reality
  5.1 Internet of Things
    5.1.1 Visual signal
    5.1.2 Sound
    5.1.3 Biosignal
    5.1.4 Inertia & Location
  5.2 Augmented Reality
  5.3 IoT with AR
    5.3.1 AR for user interaction
    5.3.2 AR for interactive data representation
6 Discussion
7 Conclusion

List of Figures

1 Multimodal interaction framework
2 Internal framework of interaction system
3 Visual input signal types
4 Sound categories
5 Biosignals and corresponding body positions
6 The architecture of three integration types
7 Interaction types in ARIoT

List of Tables

1 Interaction type classification
2 Challenges and Research questions
Abbreviations

AR    Augmented Reality
BCI   Brain-Computer Interface
ECG   Electrocardiogram
EDA   Electrodermal Activity
EEG   Electroencephalography
EMG   Electromyography
EOG   Electrooculography
FPS   First-person Shooter
HCI   Human-Computer Interaction
HMD   Head-Mounted Display
IMU   Inertial Measurement Unit
IoT   Internet of Things
ISO   International Organization for Standardization
ITU   International Telecommunication Union
MI    Multimodal Interaction
MR    Mixed Reality
PPG   Photoplethysmogram
RFID  Radio-frequency Identification
SCR   Skin Conductance Response

1 Human-computer Interaction

Human-Computer Interaction (HCI) is a research field that mainly focuses on design methods for a human to interact with a computer. The discipline started to grow in the 1980s [1], and the term HCI was popularized by Stuart K. Card [2]. Since then, beyond the conventional interaction setup (i.e., mouse and keyboard), researchers have designed new interaction systems based on multimodal interaction by combining more than one interaction method, such as speech and hand gestures [3]. Nowadays, technological development enables ubiquitous computing in everyday life, and this development makes it possible to utilize many new interaction technologies, such as head-mounted displays [4], gesture recognition sensors [5], brain-computer interfaces [6], augmented reality [7], and smart objects [8]. Given these developments, an increase in both the complexity and the potential of multimodal interaction is inevitable. However, improving the usability of multimodal interaction for the user remains a challenge in HCI research [9], [10]. A related challenge is that the designer of a multimodal interaction system must have multidisciplinary knowledge in diverse fields to understand the user, the system and the interaction in order to achieve high usability [9], [11].

The term 'usability' is used when a study aims to evaluate an interaction system from the user's perspective. According to the ISO 9241-11:2018 standard [12], usability consists of effectiveness, efficiency, and user satisfaction, and these aspects are typically covered by usability measurement instruments.
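As a minimal illustration of how these three facets can be operationalized, the Python sketch below aggregates them from per-task logs. The field names and aggregation choices (completion rate, mean time on task, mean rating) are our assumptions for illustration; they are not prescribed by ISO 9241-11:2018.

```python
from dataclasses import dataclass

@dataclass
class TaskLog:
    """One participant's record for a single task (illustrative fields)."""
    completed: bool      # effectiveness: did the user achieve the task goal?
    duration_s: float    # efficiency: time on task, in seconds
    rating: float        # satisfaction: e.g., a post-task score on a 1-5 scale

def usability_summary(logs: list[TaskLog]) -> dict[str, float]:
    """Aggregate the three ISO 9241-11 usability facets over a set of task logs."""
    done = [log for log in logs if log.completed]
    return {
        "effectiveness": len(done) / len(logs),        # completion rate
        "efficiency_s": (sum(log.duration_s for log in done) / len(done)
                         if done else float("nan")),   # mean time on task
        "satisfaction": sum(log.rating for log in logs) / len(logs),
    }

# Example with made-up pilot data: two completed tasks, one failure.
logs = [TaskLog(True, 12.4, 4.0), TaskLog(True, 9.8, 5.0), TaskLog(False, 30.0, 2.0)]
print(usability_summary(logs))
```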
In this report, we give an overview of multimodal interaction and focus on two of its aspects: the system and the interaction. In particular, this report provides an overview of state-of-the-art research on two innovations, the Internet of Things (IoT) [8] and Augmented Reality (AR) [13], due to the new modalities offered by these innovations, which could be used to build multimodal interaction systems. In this report, multimodal interaction refers to multiple inputs and/or outputs from the system's perspective. We review state-of-the-art research on multimodal interaction systems with AR and/or IoT technologies published from 2014 to 2019. Through this comprehensive review, this report gives general knowledge of multimodal interaction to parties interested in the subject and thereby helps to identify challenges for future research and development.

2 Foundations

In this section, we explain the definitions of the terms Multimodal Interaction (MI), Internet of Things (IoT) and Augmented Reality (AR), and elaborate upon them in relation to previous research. The goals of this section are to: (i) show the ambiguity of the terms due to multiple existing definitions, and (ii) formulate the definitions to be used in this study.

2.1 Multimodal Interaction

One of the key terms in multimodal interaction is modality. The sender (as well as the receiver) of the data can be either a human user or a machine. Consequently, the delivery method can incorporate both analogue (e.g., human body parts) and digital (e.g., a digital image) components. We distinguish state from intent because the response created from the received data depends on whether that data represents the state or the intent of the sender. For example, a system can understand the user's intent when the user uses a finger as a tool of modality to point at (i.e., gesture) or select an object (i.e., touch), whereas another modality, the heartbeat, can be used to interpret the user's state. In this report, modality is defined as follows:

Definition 2.1. Modality is a delivery method to transfer data which can be used for interpretation of the sender's state [9] or intent [11].

Consequently, modalities in a multimodal interaction system are often connected to voluntary actions (e.g., touch, spoken words), but they can also represent signals of an involuntary or autonomous nature (e.g., heartbeat, sweating, external monitoring by a third party). The modalities available to an interaction system vary depending on the devices or technology available to the system and on the system's purpose. Not only a mouse and a keyboard, but also computer vision, sound, biosignals, and human behaviour can be utilized as modalities. In this sense, when an interaction system employs more than one modality for input or output, it is referred to as a multimodal interaction system.

Figure 1 depicts our framework of multimodal interaction, which represents the relationship between the agent and the system through modalities. In this report, we use 'agent' to refer to both the human and the machine. While Figure 1 illustrates the fundamental case where one agent interacts with a system, it can also be extended to cover multiagent interaction systems by adding more agents. In Section 3, the various modalities that have been used for transmitting input or output data between an agent and the interaction system are described in detail.

Figure 1: Multimodal interaction framework

The term 'multimodal interaction' has been used and defined in other studies [6], [9]. However, we formulate our own definition based on our framework:

Definition 2.2. Multimodal interaction is the process between the agent and the interaction system that allows the agent to send data by combining and organizing more than one modality (see "INPUT" in Figure 1), while output data can also be provided by the interaction system through multiple modalities (see "OUTPUT" in Figure 1).

The primary purpose of multimodal interaction is to provide better usability and user experience through multiple modalities rather than a single modality. As an example, Kefi et al. [14] compared user satisfaction between two input modality sets for controlling a three-dimensional (3D) user interface: (i) mouse only, and (ii) mouse with voice. The results of this study showed that the use of two input modalities (mouse with voice) could provide better user satisfaction than the single (mouse) input modality.
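To ground Definition 2.2 from the system's side, the minimal Python sketch below shows one way an interaction system could combine events from two input modalities, mouse and voice, as in the setup of Kefi et al. [14]. The class names, the time-window heuristic and the combination rule are our own illustrative assumptions, not a description of any reviewed system.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModalityEvent:
    """A unit of input data delivered through one modality."""
    modality: str    # e.g., "mouse" or "voice"
    payload: dict    # modality-specific data (coordinates, recognized words, ...)
    timestamp: float = field(default_factory=time.time)

class MultimodalInteractionSystem:
    """Sketch of an interaction system that fuses two input modalities.

    Events arriving within `window_s` seconds of each other are treated as
    one multimodal act (e.g., pointing at an object while saying "select").
    """

    def __init__(self, window_s: float = 1.0):
        self.window_s = window_s
        self.pending: list[ModalityEvent] = []

    def receive(self, event: ModalityEvent) -> Optional[dict]:
        # Discard pending events that fall outside the fusion time window.
        self.pending = [e for e in self.pending
                        if event.timestamp - e.timestamp <= self.window_s]
        self.pending.append(event)
        # Combine once both modalities are present within the window.
        if {"mouse", "voice"} <= {e.modality for e in self.pending}:
            command = {e.modality: e.payload for e in self.pending}
            self.pending.clear()
            return command
        return None  # keep waiting for a complementary modality

# Usage: the agent points with the mouse, then issues a voice command.
system = MultimodalInteractionSystem()
system.receive(ModalityEvent("mouse", {"x": 120, "y": 80}))
print(system.receive(ModalityEvent("voice", {"utterance": "select this"})))
# -> {'mouse': {'x': 120, 'y': 80}, 'voice': {'utterance': 'select this'}}
```

Gating on a short time window is one simple way to decide that two unimodal events belong to the same interaction act; Section 4.1 describes the data, feature and decision levels at which such integration (fusion) can take place.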