Université Paris 8 - Vincennes - Saint-Denis

Laboratoire Paragraphe (EA 349) École doctorale Cognition, Langage, Interaction

Mention: Informatique

Thèse présentée et soutenue publiquement par

Saleh SALOUS

Fusion de données multi-Kinect visant à améliorer l’interaction gestuelle au sein d’une installation de réalité virtuelle

Thèse dirigée par Khaldoun ZREIK, encadrée par Safwan CHENDEB et Laure LEROY

Le 23 Novembre, 2015

Jury :

Pr. Ioannis Kanellos, Telecom Bretagne, Brest Rapporteur

Pr. Fouad Badran, Cnam-Paris Rapporteur

Pr. Benoit Geller, ENSTA Examinateur

Dr. Safwan Chendeb, Université Paris 8 Examinateur

Dr. Taha Riden, ENSTA Examinateur

Dr. Laure Leroy, Université Paris 8 Examinateur


Abstract

Virtual Reality is one of the most modern technologies allowing a user to interact with an artificial environment created by hardware and software, with visual and aural feedback powerful enough to create the impression of a realistic environment. As a consequence, this form of computer interaction can be used in various contexts such as entertainment, medicine or vehicle driving training. Furthermore, numerous types of VR installations exist, depending on physical and financial constraints as well as on the intended final user experience provided by the system. The subject of this thesis is user interaction in a specific type of VR installation called a CAVE. Our CAVE, named “Le SAS”, currently relies on AR technology to detect users, and a joystick is used to provide directional inputs.

Our objective is to present, describe and analyze an alternative user-tracking method relying on a 4-Kinect set-up tasked with tracking the user's movements inside this CAVE. Proper user tracking is one of the main challenges posed by such an installation, as well as one of the core elements that define a functional VR system; it is therefore important to implement an effective tracking system.

In order to create true interaction with the environment provided by the CAVE, the sensors can detect various types of input. In the case of a multi-Kinect system, interaction with the CAVE is based on user gestures, whose recognition is performed by the Kinects on a skeleton created by fusing the joint data from the various sensors.

This thesis will focus on four main points, as described below.


The first part will provide a context analysis of our immersive CAVE “Le SAS” and define the features as well as the constraints of this specific environment in which the multi-Kinect system is installed.

In the second part, the topic of tracking algorithms will be discussed. Indeed, the immersive CAVE's large scale implies a tracking system composed of several sensors. The use of a network of cameras to track a user inside the CAVE is synonymous with the use of an algorithm that determines in real time which sensors provide the most accurate tracking data and will therefore properly recognize the user's inputs and movements.

Subsequently, we will propose a gesture detection algorithm. Once the user's gestures are properly tracked, such an algorithm is necessary in order to provide interaction. While the Kinects can capture the user's movements, the detection of specific gestures by this system comes into play, as the CAVE needs to be configured so as to recognize specific gestures as potential inputs. The presented algorithm will focus on three specific gestures: raising the right hand, raising the left hand and short hopping.

Lastly, we will provide experimental results comparing the effectiveness of a multi-Kinect set-up with the effectiveness of a single sensor and present data showing a noticeable increase in accuracy with the 4-Kinect system.


Résumé

Les technologies liées à la réalité virtuelle sont les outils les plus avancés dans le domaine de l'interaction numérique, permettant à un utilisateur de communiquer avec une simulation créée à partir d'un matériel et d'une solution logicielle dédiés. Le degré d'immersion proposé par ces technologies et leur feedback audio et vidéo peut donner l'impression que ces environnements virtuels sont réels.

Par conséquent, de multiples secteurs tels que le divertissement vidéo-ludique ou la médecine peuvent incorporer ces technologies. De plus, les installations de réalité virtuelle existantes sont nombreuses et leurs caractéristiques peuvent varier en fonction des contraintes physiques et financières des projets, ainsi qu'en fonction de l'expérience utilisateur souhaitée.

Un de ces types d'installations de réalité virtuelle, le CAVE, est au cœur de cette thèse. Notre CAVE, nommé « Le SAS », utilise à l'heure actuelle une combinaison de la technologie AR pour détecter les utilisateurs et d'un joystick pour récupérer des inputs directionnels. Notre objectif à travers cette thèse est de présenter, décrire et analyser une méthode alternative de détection de mouvements au sein du SAS, reposant sur l'utilisation d'un système de 4 Kinects connectées ensemble. Cette analyse est pertinente et justifiée étant donné l'importance d'un système de détection d'utilisateur dans une installation de réalité virtuelle. Afin de proposer un niveau satisfaisant d'interaction avec l'environnement virtuel, les capteurs installés sur le CAVE peuvent détecter différents types d'inputs. Dans le cadre d'un système multi-Kinect, l'interaction repose sur la détection de gestes effectués par l'utilisateur. Ces gestes sont extraits d'un squelette virtuel formé à partir des données recueillies par les Kinects.

Cette thèse va aborder quatre points-clés décrits ci-dessous :


Premièrement, nous étudierons le contexte lié à notre CAVE et définirons ses caractéristiques ainsi que les contraintes que cet environnement particulier de réalité virtuelle impose à notre dispositif multi-Kinect. En second lieu, nous aborderons le sujet des algorithmes de suivi d'utilisateur au sein d'un CAVE. En effet, les dimensions du SAS amènent à utiliser plusieurs capteurs pour suivre l'utilisateur. Par conséquent, il devient nécessaire d'utiliser un algorithme capable de déterminer en temps réel quelles Kinects produisent les données les plus précises et les plus fiables afin de correctement détecter les mouvements de l'utilisateur.

Par la suite, nous proposerons un algorithme de détection de gestes. Cette étape est la suite logique de la détection d'utilisateur et consiste à interpréter les mouvements enregistrés. Bien que les Kinects soient capables d'enregistrer les mouvements et gestes de l'utilisateur, le CAVE doit être configuré afin de reconnaître certains gestes spécifiques, créant ainsi la possibilité d'interagir avec un environnement virtuel. Notre analyse se concentrera sur trois gestes spécifiques : lever la main droite, lever la main gauche, et effectuer un petit saut. Finalement, nous fournirons des résultats d'expérience ayant pour objectif de comparer l'efficacité d'un système multi-Kinect par rapport à l'utilisation d'un seul capteur. Nous présenterons des données indiquant une amélioration de la précision de la détection de gestes avec plusieurs Kinects.


List of figures

Figure 1.1: Graphic representation of "SAS" (Ridene et al., 2013) ...... 32
Figure 1.2: Kinect consists of Infra-red (IR) projector, IR camera and RGB camera (Smisek, 2011) ...... 33
Figure 1.3: Diagram showing the Kinect FOV by Mr. Riley Porter (8) ...... 35
Figure 1.4: Kinect detects movements of disabled persons (Chang et al., 2011) ...... 37
Figure 1.5: Joints of the human body (Alexiadis et al., 2011) ...... 38
Figure 1.6: User input and control of system (Du et al., 2011) ...... 39
Figure 1.7: A) original image, B) model scan from Kinect, C) model obtained from Kinect fusion technique (Lezadi et al., 2011) ...... 40
Figure 1.8: A) and C) show facial animation expressions by Kinect, B) and D) show combined facial animations from database (Weise et al., 2011) ...... 41
Figure 1.9: A) Gloves with accelerometers B) detailed accelerometers (Zafrulla et al., 2011) ...... 42
Figure 1.10: A) seated B) standing (Zafrulla et al., 2011) ...... 43
Figure 1.11: Hand gesture recognition process (Z. Renet et al., 2011) ...... 45
Figure 1.12: 14 gesture commands and four arithmetic operations (Z. Renet et al., 2011) ...... 45
Figure 1.13: A) addition operation 3+9=12, B) multiplication operation 5*8=40 (Z. Renet et al., 2011) ...... 46
Figure 1.14: Three gestures for the Rock-paper-scissors game (Z. Renet et al., 2011) ...... 46
Figure 1.15: Two examples of the Rock-paper-scissors game (Z. Renet et al., 2011) ...... 47

Figure 1.16: a) Offline step: from multiple 3D face instances the 3DMM is fit to obtain a person-specific 3D model. b)-d) Online steps: b) the person model is registered at each instant to multimodal data to retrieve the head pose; c) head stabilization is computed from the inverse head pose parameters and 3D mesh, creating a frontal-pose face image, and the final gaze vector is corrected according to the estimated head pose; d) obtained gaze vectors (in red our estimation and in green the ground truth) (Alberto et al., 2012) ...... 49
Figure 1.17: Skeleton joints of the body (5) ...... 51
Figure 1.18: Process of the proposed approach ...... 52
Figure 1.19: a) Raised right hand algorithm b) Raised left hand algorithm ...... 52
Figure 1.20: Short hopping algorithm ...... 53

Figure 2.1: Kinect selection algorithm (Salous et al., 2015) ...... 61
Figure 2.2: Percentages for each area ...... 62
Figure 3.1: Plan views (a) and 3D overview (b) of our OmniKinect setup. In (a), vibrating Kinects are marked green and non-vibrating Kinects red (Kainz et al., 2012) ...... 64
Figure 3.2: The StbTracker calibration target (1300 × 400 × 500 mm) used to obtain initial extrinsic camera parameters (a) and the initial calibration view (b) showing the coordinate frame center for one example configuration using nine Kinects. (c) shows a point cloud rendering of the depth calibration target before (left) and after (right) correction with the camera viewing rays (white lines) (Kainz et al., 2012) ...... 65
Figure 3.3: Left: example of the capabilities of the Kinect SDK. Two individuals are tracked, and their skeletons are superimposed on the image. The upper-right box shows segmentation in the depth domain. Right: system architecture (Satta et al., 2012) ...... 66
Figure 3.4: Kinect camera depth map with interference from a parallel Kinect (Sumar, 2011) ...... 67
Figure 3.5: Toy train tracking scenario with four Kinects (Faion et al., 2012) ...... 68
Figure 3.6: Illustration of raw shift variances, measured over 64 frames for a different number of active Kinects. Green pixels with varying brightness indicate valid values: black for 0 and green for 1. Red indicates at least one invalid value (Faion et al., 2012) ...... 68
Figure 3.7: Three performers using PESI (Correia, 2013) ...... 69
Figure 3.8: A diagram with the different PESI elements (Correia, 2013) ...... 70
Figure 3.9: Capturing setup consisting of three Kinects placed in a small half circle with an angular spacing of 45° between each other (left diagram) (Ruhl et al., 2011) ...... 71
Figure 3.10: The solution by (Berger et al., 2011), which solves the problem that RGB and depth sensors cannot be calibrated simultaneously; binary surface patterns are used, e.g. a checkerboard consisting of white diffuse and mirroring patches (left). In the depth (middle) and IR image (right, thresholded for better visibility) the pattern becomes clearly distinguishable (Ruhl et al., 2011) ...... 71
Figure 3.11: Three Kinects scanning a human body without interference between them (Tong et al., 2012) ...... 74
Figure 3.12: Kinect with disk revolving motor platform (Scholz et al., 2011) ...... 75
Figure 3.13: a) Kinect with motor and rubber bands, acrylic frame and accelerometer (Butler et al., 2012), b) Kinect with a motor that causes motion (Maimone & Fuchs, 2012) ...... 76
Figure 3.14: Two Kinects facing each other (Gatto et al., 2012) ...... 78
Figure 3.15: 3 Kinects with an angle between the Kinects (Caon et al., 2011) ...... 78
Figure 3.16: Multi-Kinect setup config 1, which connects all Kinects to the same PC (Kainz et al., 2012) ...... 80


Figure 3.17: Multi-Kinect setup config 2. A provision where every Kinect is linked to a mini-PC, and all are connected to a central server responsible for collecting sensor data (Satta et al., 2013) ...... 81
Figure 3.18: Multi-Kinect config 3, which uses several Pocket PCs to transmit Kinect data to a server, which is also connected to a Kinect (Williamson & Jr, 2012) ...... 81
Figure 3.19: Representation of the "SAS" with the location of the four Kinects (Salous et al., 2014) ...... 88
Figure 3.20: The top-left Kinect's location and FOV. Top and side views of the "SAS" ...... 90
Figure 3.21: The top-right Kinect's location and FOV. Top and side views of the "SAS" ...... 90
Figure 3.22: The bottom-left Kinect's location and FOV. Top and side views of the "SAS" ...... 91
Figure 3.23: The bottom-right Kinect's location and FOV. Top and side views of the "SAS" ...... 91
Figure 3.24: Overlapping area between the upper-left and lower-left Kinects ...... 92
Figure 3.25: Overlapping area between both top Kinects ...... 93
Figure 3.26: Overlapping area between the upper-left and lower-right Kinects ...... 93
Figure 3.27: Overlapping area between the lower-left and upper-right Kinects ...... 94
Figure 3.28: Overlapping area between both bottom Kinects ...... 94
Figure 3.29: Overlapping area between the lower-right and upper-right Kinects ...... 95
Figure 3.30: Overlapping areas according to their level of reliability. There are three types of zones: red, orange and green areas ...... 96
Figure 3.31: Example of a successful attempt at tracking a user in the "SAS" ...... 96
Figure 3.32: The four areas inside the "SAS" that will be further analyzed and segmented into specific fusion rule sets ...... 97
Figure 3.33: Fusion rules in the top-right corner ...... 98
Figure 3.34: Fusion rules in the top-left corner ...... 99
Figure 3.35: Fusion rules in the bottom-left corner ...... 99
Figure 3.36: Fusion rules for the bottom-right corner ...... 100
Figure 3.37: Example of screenshot ...... 101
Figure 3.38: Camera Calibration Toolbox ...... 101
Figure 3.39: a) Clicking on the extreme corners b) corners' locations c) extracted corners ...... 103
Figure 3.40: Simultaneous extrinsic calibration of two Kinects, checkerboard captured by two Kinects at the same time ...... 104
Figure 4.1: Posture detection algorithm ...... 109
Figure 4.2: Participants in the middle of the experiment performing the three gestures a) Raising the left hand, b) Raising the right hand, c) Short hopping ...... 113
Figure 5.1: Level of accuracy of the gesture recognition module for each of the three tested gestures and both Kinect configurations ...... 115
Figure 5.2: False detection results, single vs. 4-Kinect setup ...... 115

List of tables

Table 1: Multi-Kinect acquisition – comparison between three material configurations ...... 82
Table 2: Review of existing solutions created by researchers ...... 83
Table 3: Proposed architectures with their needs in terms of software, wiring and hardware ...... 86
Table 4: Hardware used in our project ...... 111
Table 5: Hardware used in our project ...... 112
Table 6: Software used in our project ...... 112


Dedications

I dedicate this dissertation to my mother and father. From an early age they instilled in me a desire to learn and made sacrifices so I would have access to a high quality education. Without their support and guidance I wouldn’t be where I am today. This work is also dedicated to my wife Manar. She has always believed in me and has offered reassurance throughout the process. I could not have accomplished as much as I have without her support and understanding.


Acknowledgements

I would like to first thank both my academic advisors, Dr. Khaldoun ZREIK and Dr. Safwan CHENDEB, who have been guiding my studies and teaching me how to do great research, from the big picture (like finding important and hard problems to tackle) down to every detail (such as presentation skills, writing skills and proving techniques). I am grateful to Dr. Laure LEROY, who has been extremely supportive in all of my research.

I am grateful to Dr. Taha RIDENE, who shared his meticulous research and insights that supported and expanded my own work.

I must acknowledge as well the many friends, colleagues, students and teachers who supported my research and writing efforts over the years. They have consistently helped me keep perspective on what is important in life and shown me how to deal with reality.

Last, but not least, I would like to thank my wife Manar for her understanding and love during the past few years. Her support and encouragement were in the end what made this dissertation possible. My parents receive my deepest gratitude and love for their dedication and the many years of support during my undergraduate studies that provided the foundation for this work.


Table of contents

1 Chapter one: Introduction ...... 21

1.1 Virtual reality ...... 21

1.2 Virtual Reality Importance and Applications...... 22

1.3 Gestural interaction ...... 25

1.3.1 Classification of Gestures ...... 27

1.3.1.1 Irrelevant/Manipulative Gestures ...... 27

1.3.1.2 Side Effect of Expressive Behavior ...... 28

1.3.1.3 Symbolic Gesture ...... 28

1.3.1.4 Interactional Gesture ...... 28

1.3.1.5 Referential/Pointing Gesture ...... 28

1.3.2 Functionality of gestures ...... 29

1.3.2.1 Deictic Gestures ...... 29

1.3.2.2 Iconic gestures ...... 29

1.3.2.3 Metaphoric gestures ...... 29

1.3.2.4 Beat gestures ...... 29

1.3.3 Designing Gesture Interfaces ...... 30

1.3.4 Gestural Interface problems ...... 30

1.3.4.1 Customized 3D Gesture Recognition ...... 30

1.3.4.2 Latency ...... 31


1.3.4.3 Using Context ...... 31

1.3.4.4 Ecological Validity ...... 31

1.4 Virtual reality cave « SAS » ...... 32

1.5 Kinect sensor ...... 32

1.5.1 Kinect structure ...... 33

1.5.1.1 Depth sensor ...... 33

1.5.1.2 RGB camera ...... 34

1.5.1.3 Multi- array microphone ...... 35

1.5.2 Kinect applications...... 36

1.5.3 Weakness of Kinect ...... 49

1.5.3.1 Visual Field...... 49

1.5.3.2 Persistence of the skeleton ...... 49

1.6 Proposed approach ...... 50

1.7 Summary ...... 53

2 Chapter two: Tracking ...... 55

2.1 Overview of non-optical tracking systems for Virtual Reality ...... 55

2.1.1 Electromagnetic tracking systems...... 55

2.1.2 Acoustic tracking systems...... 56

2.1.3 Mechanical tracking systems ...... 56

2.2 Overview of optical tracking systems with visible markers ...... 56


2.2.1 ARToolkit ...... 58

2.2.2 ARTracking...... 59

2.2.3 OptiTrack ...... 59

2.3 Proposed tracking algorithm ...... 60

2.3.1 Simulation ...... 61

2.4 Summary ...... 62

3 Chapter three: Multi kinects Module ...... 64

3.1 Multi kinects applications ...... 64

3.1.1 OmniKinect : Real-Time Dense Volumetric Data Acquisition and Applications .. 64

3.1.2 Real-time appearance-based person re-identification over multiple KinectTM cameras ...... 65

3.1.3 Feasibility of Fast Image Processing Using Multiple Kinect Cameras on a Portable Platform ...... 66

3.1.4 Intelligent sensor-scheduling for multi-Kinect-tracking ...... 67

3.1.5 PESI (Participative and Enacting Sonic Interaction): Extending Mobile Music Instruments with Social Interaction ...... 69

3.1.6 Scanning 3D full human bodies using Kinects ...... 70

3.1.7 The capturing of turbulent gas flows using multiple Kinects ...... 70

3.2 Multi kinects advantages ...... 72

3.2.1 Big coverage space ...... 72


3.2.2 Low cost ...... 72

3.2.3 Ease of Installation ...... 72

3.3 Multi kinects issues ...... 72

3.3.1 USB band width ...... 73

3.3.2 Interferences between kinects ...... 73

3.3.2.1 Avoiding interference ...... 74

3.3.2.2 Time multiplexing ...... 75

3.3.2.3 Vibrations ...... 75

3.3.2.4 Hole Filling ...... 76

3.3.3 Kinect Calibration ...... 76

3.3.4 Kinects setup ...... 77

3.3.5 Data synchronization ...... 78

3.4 Installation multi kinects inside the SAS ...... 79

3.4.1 Material configuration ...... 79

3.4.1.1 Multiple Kinects on a single PC (config 1) ...... 79

3.4.1.2 Each Kinect on a PC + One Server (config 2) ...... 80

3.4.1.3 Each Kinect on a PC with one also acting as a server (config 3) ...... 81


3.4.1.4 Comparison and review of existing solutions ...... 82

3.4.1.5 Proposed architecture ...... 85

3.4.2 Geometric dispatching ...... 88

3.4.2.1 Top-Left Kinect ...... 90

3.4.2.2 Top-Right Kinect ...... 90

3.4.2.3 Bottom-Left Kinect...... 91

3.4.2.4 Bottom-Right Kinect ...... 91

3.4.2.5 Overlapping between Kinects' FOV ...... 92

3.4.2.5.1 Overlapping between the top-left and bottom-left Kinects ...... 92

3.4.2.5.2 Overlapping between both top Kinects ...... 93

3.4.2.5.3 Overlapping between top-left and bottom-right Kinects ...... 93

3.4.2.5.4 Overlapping between bottom-left and top-right Kinects ...... 94

3.4.2.5.5 Overlapping between both bottom Kinects ...... 94

3.4.2.5.6 Overlapping between bottom-right and top-right Kinects ...... 95

3.4.2.6 Segmentation of the "SAS" depending on the multiple Kinects' FOV ...... 95

3.4.3 Fusion rules ...... 97

3.4.3.1 Fusion rules for the upper-right corner of the “SAS” ...... 98

3.4.3.2 Fusion rules for the upper-left corner of the “SAS” ...... 99

3.4.3.3 Fusion rules for the lower-left corner of the “SAS” ...... 99

3.4.3.4 Fusion rules for the lower-right corner of the “SAS” ...... 100


3.5 Kinect Calibration ...... 100

3.5.1 Calibration process...... 100

3.5.2 Intrinsic calibration ...... 102

3.5.3 Extrinsic calibration ...... 103

3.6 Summary ...... 104

4 Chapter four: Validation of system ...... 106

4.1 Protocol of test ...... 106

4.1.1 Posture data collection and posture detection ...... 106

4.1.2 Posture detection algorithm ...... 106

4.2 Hypothesis ...... 109

4.3 Constraints of the CAVE...... 110

4.4 False detection ...... 110

4.5 Material and method...... 111

4.5.1 Hardware ...... 111

4.5.2 Wiring ...... 112

4.5.3 Software ...... 112

4.6 Postures Detection inside the SAS ...... 112

4.7 Participants ...... 113

5 Results ...... 114

5.1 Postures detection inside the SAS ...... 114


5.1.1 False detections ...... 115

6 Chapter six: Conclusion and future work ...... 117

6.1 Objectives and several contributions...... 117

6.1.1 Adapting the multi-kinect system to the CAVE...... 117

6.1.2 Addressing the Kinects' FOV's overlapping in the "SAS" ...... 117

6.1.3 The tracking system ...... 118

6.1.4 The gesture recognition algorithm for the “SAS” ...... 118

6.1.5 Data fusion with multi-Kinect data ...... 118

6.1.6 Data transmission using the UDP network protocol ...... 119

6.2 Summary ...... 119

6.3 Future Work ...... 119

7 References ...... 123


1 Chapter one: Introduction

1.1 Virtual reality

Virtual Reality is one of the most interesting domains in computer science, as it involves multiple fields at the same time. The various technologies interacting together in VR include immersive computer-generated three-dimensional environments and real-time interaction cycles using those environments. In essence, virtual reality is a computer-generated world into which real-world interactions are digitally transcribed. This operation needs high-performance computer graphics to achieve an acceptable level of realism, and it also needs a real-time response system to succeed in providing an immersive experience for the end user (Olanda et al., 2006).

Our research objective is to provide a low-cost multi-Kinect setup for the CAVE "Le SAS". The process described in this thesis can be divided into several steps: the Kinect set-up, the handling of the user-tracking process and the interaction with the SAS. We use a four-Kinect module to track the user and capture their gestures. This module comes into play because the CAVE needs to recognize the user's gestures in order to interpret them as inputs.

The methodology we chose to follow to implement the Kinects is as follows. The Kinect is an inexpensive sensor with motion recognition abilities and a satisfactory range. As we considered the potential sensors that could be used with the SAS, the Kinect was selected due to its cost and its potential in a scalable system. The SAS uses an interaction server that is scalable thanks to the implementation of modules attributed to each type of sensor (Ridene et al., 2011). As a result, the choice of the Kinect complied with the SAS' features.


Our methodology behind the user-tracking process is related to the choice of sensor discussed in the previous paragraph. Because the Kinect sensor displays some limitations in terms of range and IR interference, our approach was influenced by these constraints. In order to compensate for the inability of a single Kinect to cover the entire SAS, our strategy involves determining tracking accuracy levels and selecting data from the most accurate Kinects.
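A minimal sketch of this per-frame selection idea, assuming each Kinect reports a skeleton with per-joint tracking states similar to the three states exposed by the Kinect SDK; the scoring function and the 0.5 threshold are illustrative placeholders, not the fusion rules defined later in this thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Illustrative tracking states, mirroring the three states exposed by the Kinect SDK.
NOT_TRACKED, INFERRED, TRACKED = 0, 1, 2

@dataclass
class Skeleton:
    kinect_id: int
    # joint name -> (x, y, z, state); coordinates in that Kinect's own frame.
    joints: Dict[str, tuple]

def skeleton_score(skel: Skeleton) -> float:
    """Fraction of joints that are fully tracked; a crude per-frame accuracy proxy."""
    states = [j[3] for j in skel.joints.values()]
    return sum(1 for s in states if s == TRACKED) / max(len(states), 1)

def select_best_kinect(skeletons: List[Skeleton], min_score: float = 0.5) -> Optional[Skeleton]:
    """Pick, for the current frame, the sensor whose skeleton looks most reliable."""
    candidates = [s for s in skeletons if skeleton_score(s) >= min_score]
    if not candidates:
        return None  # no Kinect sees the user well enough this frame
    return max(candidates, key=skeleton_score)
```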

The last challenge tackled in this thesis is navigation in the virtual environment through interaction between the user and the SAS. Our methodology for this topic is as follows. We discussed the various types of existing gestures, their advantages and drawbacks, and chose specific gestures complying with the SAS's constraints. Testing the gestures was subsequently necessary to gauge the feasibility and effectiveness of the chosen gestures, as well as the ability of the SAS's multi-Kinect system to detect them.

1.2 Virtual Reality Importance and Applications

Virtual Reality can be used to represent any type of world, be it a recreation of a real place or a fictional location. Moreover, Virtual Reality benefits from an advanced and powerful ability to interact with, represent, reflect, navigate and visualize an environment. As the worlds created by VR are computer simulations, the risks for the end user compared to a similar real-world environment, as well as the restrictions imposed on the virtual world, are minimal. For these reasons virtual reality is used in many applications like medicine, entertainment, scientific and general education, business, games, geoscience, art, engineering, agriculture, government defense, architecture, geographical information systems and media. (Manseur, 2005) (Krogh, 2000) (King et al., 2005) (Kadavasal & Dhara, 2007) (Lin & Loftin, 1998) (Gaitatzes & Roussou, 2002) (Hall et al., 2002) (Tanriverdi & Jacob, 2001) (Basori, 2008).

-Virtual Reality in Education

Virtual Reality plays an important role in the field of education, as it can be used in many ways such as virtual classrooms, virtual labs and virtual flight training. These kinds of applications increase their users' capabilities, experience, understanding and skills in a way that affects the entire education process. As building laboratories is usually costly and time-consuming, providing an efficient training environment can become an expensive process. Moreover, several discipline-oriented labs, such as medical and flight labs in particular, require expensive equipment to function properly. Virtual reality is a cost-cutting technology that allows the user to train, interact and learn as many times as required with fewer cost constraints than training in a physical environment (Bonfigli et al., 2002).

-Virtual Reality in Engineering

Engineering is composed of many disciplines such as chemical, civil, mechanical and electrical engineering. Each of them can adopt Virtual Reality to assist, increase and speed up the creation of models and the interactions with them. For chemical engineering, it is possible to represent a complex ion model that shows the relationship between the atoms of the ion. In this application, VR comes into play as it becomes possible to interact with this model by rotating it to obtain various views of the chemical (Manseur, 2005). Mechanical engineering uses Virtual Reality extensively in order to visualize complete systems with all the cylinders and parts present in them. A simple mouse click on a cylinder can return its volume, angle, dimensions as well as other information (Manseur, 2005). For agricultural engineering, Virtual Reality can be used to study and analyze parameters and factors that affect some products' yields and quality using outdoor mobile computers (King et al., 2005).

-Virtual Reality in Science Education


Various examples exist of VR being used to create representations of scientific objects such as the solar system. A VR recreation of the solar system offers the ability to illustrate its colors, natural satellites, rings, orientation, planet locations, relative orbits and relative sizes. When using Virtual Reality, all these elements can be represented easily (Olanda et al., 2006). Another example is the use of Virtual Reality in biology. Indeed, the complexity of the human body's reactions can be illustrated and visualized by Virtual Reality. This facilitates the process of simulating reactions to understand the specific mechanisms that are triggered (Manseur, 2005). One of the most interesting projects related to the solar system is a virtual representation of the planet Mars that intends to educate people on the topography and orography of Mars (Olanda et al., 2006). Our own planet has also been modeled in a virtual 3D representation that can, among other uses, allow children to gain a better understanding of the concept of a spherical planet than the 2D flat representations available in books. Geoscientists can rely on VR as well, as they need to model sub-surfaces. In order to accomplish such tasks, data visualization techniques are required to perform modeling, and Virtual Reality has significant potential in the modeling, interpretation and analysis of the subsurface in virtual mode (Allison & Hodges, 2000).

-Virtual Reality in Business

Virtual Reality provides several functionalities for business enterprises. Virtual business meetings can be a cost-effective alternative to physical meetings. VR can also be used to build virtual stores and create models for distance learning (Kadavasal & Dhara, 2007).

-Virtual Reality in Entertainment

VR's main application in entertainment resides in games. The continuous development of gaming technology and of the gaming market offers an incentive for video game companies to push new forms of gaming. As a result, game developers extensively adapt features like Human Haptic Emotion, Virtual Reality technology and game elements such as Competing Tasks and Specific Targets to build creative and interesting games like Dance Dance Revolution, Flight Simulator, Alien Arena 2009, Guitar Hero, car racing games, Counter Strike, Ace Online, Idol Karaoke, Allegiance and Rock Band. The evolution of VR technology also fostered the development of dedicated VR gaming devices such as the Rift. This type of VR headset offers an immersive experience by simulating a 3D environment and directly projecting it to the player.

Another place where Virtual Reality can be used is in specific theatres and buildings, such as the Virtual Theatre, which enhances visits by allowing visitors to interact with immersive historical scenarios using special devices like stereographic glasses and Dolby surround effects (Bonfigli et al., 2002).

-Virtual Reality in Medical Research

One of the main aims of Virtual Reality is to represent the real world in a virtual form; therefore it is widely used for the reconstruction of organs, mental rehabilitation, craniofacial surgery and neurosurgery simulation. For example, a doctor can show their students a complete simulation of a specific operation1.

1.3 Gestural interaction

In the field of Virtual Reality, one of the most important aspects is the interaction between the human user and the computers processing the virtual environment. In order to create a fluid and seamless user experience, it is necessary to properly track the user, recognize their inputs and successfully apply them to the VR scene. Indeed, the number of potential interactions is high, and this step is essential to provide proper interaction between the users and the VR environment.


In our project we focus on gestural interaction in a virtual reality environment. For this, we discuss computer technology as a tool serving an activity. The adaptation of this tool requires consideration of human activity, the cognitive functioning of the user, their objectives, needs and requirements (Bobillier-Chaumont et al., 2005). We need to characterize and describe some human aspects related to the user's environment, knowledge and ability to adapt.

1 http://www.se.rit.edu/~jrv/research/ar/introduction.html#Section1


1.3.1 Classification of Gestures

Gestures can be classified according to their function. A classification based on inferring intent and assisting in the understanding of human activity should closely relate gestures to limited categories of intent in situated human activity. The categories of the broad classification presented here thus correspond to, and allow the attribution of, limited kinds of intent to humans. This classification was developed as an aid for helping robots to obtain limited recognition of situated human gestural motion, so as to be able to respond properly if required, while these robots are working in an environment of ambient human activity (such as a home or office). Applications of this classification will require the mapping of the physical appearance of gestural motion in interactional contexts to the five gestural categories (Garzotto, 2012).

1.3.1.1 Irrelevant/Manipulative Gestures

These types of gestures include irrelevant gestures, body/manipulator motion, side effects of motor behavior, and actions on objects. Generally characterized, manipulation by a human is here understood as doing something to control the non-animate environment or the human's relationship to it (such as position). Gestural motions in this class are manipulative actions (in this sense) and their side effects on body movement. These 'gestures' are neither communicative nor socially interactive, but instances and effects of human motion. They may be salient, but are not movements that are mainly employed to communicate or engage a partner in interaction. Cases include, e.g., motion of the arms and hands when walking; tapping of the fingers; playing with a paper clip; brushing hair away from the face with the hand; scratching; grasping a cup in order to drink its contents (Garzotto, 2012).

1.3.1.2 Side Effect of Expressive Behavior

Motions of the hands, arms and face during communication with others work as part of the overall communicative behavior, but without any specific interactive, communicative, symbolic, or referential role. Example: a person talking excitedly, raising and moving their hands in correlation with changes in voice prosody, rhythm, or emphasis of speech (Garzotto, 2012).

1.3.1.3 Symbolic Gesture

These types of gestures are often used in immersive virtual reality environments where the user cannot resort to traditional input devices. Each culture has a limited, circumscribed set of gestural motions that have specific, prescribed interpretations, which lend themselves to symbolic gesture interfaces. American Sign Language gestures also fall into this category (Garzotto, 2012).

1.3.1.4 Interactional Gesture

Interactional gestures are usually used to regulate interaction with a partner, i.e. to initiate, maintain, invite, synchronize, organize or terminate a particular interactive, cooperative behavior; for example, nodding the head to indicate that one is listening. In this category, gestures are considered mediators of cooperative action (Garzotto, 2012).

1.3.1.5 Referential/Pointing Gesture

These are used to refer to or to indicate objects of interest: physically present objects, persons, directions or locations in the environment, or locations in space used as proxies to represent absent referents in discourse (Garzotto, 2012).


1.3.2 Functionality of gestures

1.3.2.1 Deictic Gestures

These types of gestures are mostly seen in human-computer interaction (HCI) and are the gestures of pointing, or of directing the listener's attention to specific events or objects in the environment.

Deictic gestures refer to actual physical entities and locations, or to spaces that have previously been marked as relating to some idea or concept (Garzotto, 2012).

1.3.2.2 Iconic gestures

These gestures are used to convey information about the size, shape or orientation of an object. The form of the gesture reflects some features of the action or event being described. For example, someone might say "the plane flew like this", while moving their hand through the air like the flight path of the aircraft (Garzotto, 2012).

1.3.2.3 Metaphoric gestures

In this type of gesture, the concept represented has no physical form; instead, the form of the gesture comes from a common metaphor. For example, a speaker might say, "It happened over and over again," while repeatedly tracing a circle (Garzotto, 2012).

1.3.2.4 Beat gestures

This type of gesture is related to the conversational process. Beat gestures are small baton-like movements that do not change in form with the content of the accompanying speech. Gesture types are defined in terms of the role they play in the discourse, rather than in terms of a specific hand trajectory or class of trajectories. Indeed, researchers have found that there is no canonical set of hand trajectories that define each gesture class. For example, Cassell states, "Deictics do not have to be pointing index fingers." For non-deictic gestures, it is even harder to characterize a "typical" set of hand shapes or trajectories; there are perhaps an infinite variety of possible iconic and metaphoric gestures (Garzotto, 2012).

1.3.3 Designing Gesture Interfaces

Several points must be kept in mind when designing gesture interfaces: naturalness, adaptability, coordination and fatigue. Naturalness relates to a set of characteristics that make gestures useful for controlling the application, such as hand signs, the absence of an intermediary device, and task control that maps well to hand actions and pre-acquired sensorimotor skills. Adaptability means being able to switch between modes rapidly and smoothly. Coordination asks how many degrees of freedom the tasks require. Fatigue relates to the effort needed to perform the gesture: gestural commands must therefore be concise and quick to issue in order to minimize effort. In particular, the design of gestural commands must avoid gestures that require high precision over a long period of time (Billinghurst M., 2011).

1.3.4 Gestural Interface problems

1.3.4.1 Customized 3D Gesture Recognition

Several issues should be addressed to support customized 3D gestural interaction. First, how do users specify what gestures they want to perform for a given task? Second, once these gestures are specified, if machine learning is used, how do we get enough data to train the classification algorithms without burdening the user? Ideally, the user should only need to specify a gesture once. Third, how do we deal with user-defined gestures that are very similar to each other? This problem occurs frequently in all kinds of gesture recognition, but the difference in this case is that the users are specifying the 3D gesture and we want them to use whatever gesture they come up with (Jego et al., 2013).


1.3.4.2 Latency

This problem relates mainly to the accuracy and speed of the 3D gesture recognition process. Recognition needs to be both fast and accurate to make 3D gestural user interfaces usable and compelling. In fact, the recognition component needs to be somewhat faster than real time, because responses based on 3D gestures need to occur at the moment a user finishes a gesture. Thus, the gesture needs to be recognized a little before the user finishes it. This speed requirement makes latency an important problem that needs to be addressed to ensure fluid and natural user experiences. In addition, as sensors get better at acquiring a user's position, orientation, and motion in space, the amount of data that must be processed will increase, making the latency issue a continuing problem.
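One common way to meet this requirement is to classify a sliding window of the most recent frames and emit a label as soon as the classifier's confidence crosses a threshold, instead of waiting for the gesture to be fully segmented. The sketch below only illustrates that pattern; the classify function, window length and threshold are placeholders, not the recognizer used in this work.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

def make_early_recognizer(classify: Callable[[list], Tuple[str, float]],
                          window: int = 15, threshold: float = 0.9):
    """Return a per-frame function that emits a label as soon as confidence is high enough.

    `classify` takes the buffered frames and returns (label, confidence); it stands in
    for whatever recognizer the system actually uses.
    """
    frames: Deque = deque(maxlen=window)

    def on_frame(frame) -> Optional[str]:
        frames.append(frame)
        if len(frames) < window:
            return None  # not enough context yet
        label, confidence = classify(list(frames))
        return label if confidence >= threshold else None

    return on_frame
```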

1.3.4.3 Using Context

This problem relates to the contextual information available for recognizing 3D gestures in a 3D gestural interface. Context can help to reduce the number of possible 3D gestures that could be recognized at any one time, and it can assist in improving recognition accuracy. Context can also be directly integrated into 3D gesture recognizers.

1.3.4.4 Ecological Validity

This problem concerns how accurate the 3D gestural interface is when used in its intended setting. Most studies that explore a recognizer's accuracy are constrained experiments intended to evaluate the recognizer by having users perform each available gesture a set number of times.


1.4 Virtual reality cave « SAS »

The SAS is an immersive room, or CAVE, which has two screens, one located in front of the user (named "Face") and the other at their feet (named "Sol"). Each screen is 3 meters wide and 4 meters long. Floodlights diffuse a stereoscopic image on these screens and stereo speakers produce sound on each side of the stage, as in Figure 1.1. Equipped with AR Track cameras, the SAS is also able to track the position of the user via small sensors in order to enable them to interact with the virtual set through displacement and rotation.

Figure 1.1: Graphic representation of "SAS" (Ridene et al., 2013)

1.5 Kinect sensor

The Kinect is one of the most modern interactive devices; it uses sensors for capturing audio and video, and such sensors play an important role in this form of interaction. The Kinect itself is a tool for which various types of games are developed, and its abilities, such as voice recognition, make it eligible for use in other fields. "Whole hand input to virtual environments would be the most natural method to interact with natural world" (Kessler et al., 1995). A powerful feature of the Kinect is that there is no need for a controller or any other accessory; it is the only device the user needs.


1.5.1 Kinect structure

A Kinect sensor can be described as a horizontal bar connected to a small base containing a motorized pivot. The Kinect consists of three main parts: an RGB camera, a depth sensor and a multi-array microphone, as in Figure 1.2. These parts work together to provide full-body 3D motion capture, voice recognition, and facial recognition capabilities. The pivot motor requires more power than a USB port can supply, so the Kinect uses a dedicated connector combining USB communication with an additional power adapter.

Figure 1.2: Kinect consists of Infra-red (IR) projector, IR camera and RGB camera (Smisek, 2011)

1.5.1.1 Depth sensor

The depth sensor plays an important role in a Kinect device; it is responsible for capturing video data in 3D under any lighting conditions. It consists of an infrared laser projector combined with a monochrome CMOS sensor. The range of the depth sensor is adjustable and is automatically calibrated by the Kinect software, taking into account the players' physical environment, the gameplay and the presence of furniture. The resolution for video streaming is VGA (640×480 pixels) with 11-bit depth (Popescu & Lungu, 2014).
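For illustration only, a commonly cited community approximation (popularized by the OpenKinect project, not taken from this thesis) for converting the 11-bit raw disparity value into a distance in meters; the constants are empirical and vary slightly from one device to another.

```python
def raw_disparity_to_meters(raw: int) -> float:
    """Approximate depth in meters from an 11-bit Kinect disparity value (0-2047).

    Empirical linear-inverse fit popularized by the OpenKinect community;
    values near 2047 mean 'no measurement'.
    """
    if raw >= 2047:
        return float("inf")  # invalid / out of range
    return 1.0 / (raw * -0.0030711016 + 3.3309495161)

# Example: a raw value of 800 maps to roughly 1.1 m with this approximation.
print(round(raw_disparity_to_meters(800), 2))
```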


1.5.1.2 RGB camera

The RGB camera uses 8-bit VGA resolution (640×480 pixels) with a Bayer color filter. The practical range limit is 1.2-3.5 m (3.9-11 ft). The area required to play with the Kinect is roughly 6 m². The sensor has a regular field of view of 57° horizontally and 43° vertically, while the motorized pivot is capable of tilting the sensor up to 27° up or down (Merlier et al., 2011). Figure 1.3 shows the Kinect depth sensor's field of view (FOV), with a horizontal FOV of 57°, a vertical FOV of 43°, a tilt angle of -27° to 27° and a range from 1.2 to 3.5 meters.
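As a quick sanity check of these numbers, the width and height of the area seen at a given distance follow directly from the horizontal and vertical FOV angles. The small computation below is ours, not the thesis's; it also hints at why a single sensor struggles to cover a 3 m by 4 m CAVE.

```python
import math

def coverage(distance_m: float, h_fov_deg: float = 57.0, v_fov_deg: float = 43.0):
    """Width and height (in meters) of the field of view at a given distance."""
    width = 2.0 * distance_m * math.tan(math.radians(h_fov_deg) / 2.0)
    height = 2.0 * distance_m * math.tan(math.radians(v_fov_deg) / 2.0)
    return width, height

# At the far end of the practical range (3.5 m) a single Kinect sees roughly
# 3.8 m x 2.8 m; at 1.2 m only about 1.3 m x 0.9 m.
print(coverage(3.5))
print(coverage(1.2))
```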


Figure 1.3: Diagram showing the Kinect FOV by Mr. Riley Porter (8)

1.5.1.3 Multi- array microphone

The Kinect has audio features; it contains four microphone capsules and operates with each channel processing 16-bit audio at a sampling rate of 16 kHz (Cruz et al., 2012).

1.5.2 Kinect applications

-Smart Rehabilitation tool

The Kinect is used for many applications and applied in many situations. One of them uses the Kinect motion sensor as a smart rehabilitation tool. Many people suffer from disabilities such as muscle atrophy and cerebral palsy that impair their physical abilities. As such, successful recovery through rehab activities for these people requires an important amount of motivation and support. One existing solution is virtual reality, in which special wearable sensors track the movements of disabled persons; however, these sensors are not comfortable for them. The Kinect can support these activities without requiring the recovering patient to wear sensors. Furthermore, the Kinect sensor confers various functionalities that can improve the quality and efficiency of the recovery process. It can record movements and help the patient in performing the right gestures by signaling incorrect ones, singling out correct ones and providing medical staff with raw data outlining the patient's progress and motivation. As a consequence, the automation of this part of the rehab process also reduces the therapists' workload, thus increasing the number of people who can be simultaneously helped by a given medical team. Such a process has a low level of complexity and is composed of a Kinect motion sensor, a database, video instructions and a voice reminder. These components work together to form an intelligent rehabilitation system. Figure 1.4 shows an example of an exercise consisting of a set of steps. During this exercise, the Kinect does not move to the next step until the current one is accurately and correctly completed. The Kinect is also accurate: for example, to detect shoulders, a person must raise both hands 10 cm above their shoulders. A strength of such exercises is that the therapist can adjust the exercise to suit each person's needs. The rehab exercises for disabled people are varied and can include tasks reminiscent of daily life activities such as brushing one's hair and teeth. Kinect training can be used in the case of cerebral palsy and can help patients perform regular exercise, as people who have this problem should undergo a continual rehabilitation program to prevent the muscles from atrophying (Chang et al., 2011).
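To make the kind of rule mentioned above concrete (both hands at least 10 cm above the shoulders), here is a minimal per-frame check on skeleton joints; the joint names and the convention of y pointing up in meters are assumptions for illustration, not the cited system's actual code.

```python
def hands_above_shoulders(joints: dict, margin_m: float = 0.10) -> bool:
    """True if both hands are at least `margin_m` above their respective shoulders.

    `joints` maps a joint name to an (x, y, z) position in meters, y pointing up.
    """
    return (joints["hand_left"][1] > joints["shoulder_left"][1] + margin_m and
            joints["hand_right"][1] > joints["shoulder_right"][1] + margin_m)

# Example frame: both hands roughly 15 cm above the shoulders -> True
frame = {
    "hand_left": (-0.3, 1.55, 2.0), "shoulder_left": (-0.2, 1.40, 2.0),
    "hand_right": (0.3, 1.56, 2.0), "shoulder_right": (0.2, 1.40, 2.0),
}
print(hands_above_shoulders(frame))
```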

Figure 1.4: Kinect detects movements of disabled persons (Chang et al., 2011).

-Evaluating Dance Performance

Another application that uses the Kinect is the automatic evaluation of a dance performance and its comparison with a gold-standard performance. The aim of this application is to teach dance to many students at the same time, with no restrictions on the number of students or on the available space. The teaching process is very simple. There are two main actors: the teacher and the student. The teacher records a performance with detailed step information and shares it with the students. The performance is editable by the teacher, who can modify, update, add or delete steps and change accuracy levels. Students try to mimic the steps given by the teacher and send their recordings to the teacher. Firstly, the Kinect calibrates the students, acquiring information about them such as height and body characteristics. Secondly, the Kinect tracks the student dancer's skeleton movements. Finally, the teacher evaluates the students' dance movements and sends them feedback to refine their moves. For this application, the Kinect uses the OpenNI Software Development Kit (SDK). This SDK tracks the positions of 17 joints of the student's body (head, neck, torso, left and right collar, L/R shoulder, L/R elbow, L/R wrist, L/R hip, L/R knee and L/R foot), as seen in Figure 1.5 (Alexiadis et al., 2011).
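To make the idea of scoring a student against a gold-standard recording concrete, here is a deliberately naive per-frame comparison of joint positions; the published system uses a more elaborate, temporally aligned scoring, so this is only an illustrative sketch.

```python
import math
from typing import Dict, Tuple

Joint = Tuple[float, float, float]

def frame_error(student: Dict[str, Joint], teacher: Dict[str, Joint]) -> float:
    """Mean Euclidean distance (meters) over the joints both skeletons share."""
    common = student.keys() & teacher.keys()
    dists = [math.dist(student[j], teacher[j]) for j in common]
    return sum(dists) / len(dists) if dists else float("nan")

def performance_score(student_frames, teacher_frames) -> float:
    """Average per-frame error over a recorded sequence (lower is better)."""
    errors = [frame_error(s, t) for s, t in zip(student_frames, teacher_frames)]
    return sum(errors) / len(errors) if errors else float("nan")
```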

Figure 1.5: Joints of the human body (Alexiadis et al., 2011)

-Modeling of an indoor environment

This application extracts knowledge of an indoor space: items such as walls, floors, windows and the objects inside provide data about a rich environment and the people living inside it. Existing solutions like vision-based 3D modeling have had limited success in their applications and suffer from technical problems such as lighting conditions and surfaces with little texture detail. Accurate 3D indoor modeling is possible but costly, because it needs a demanding laser scanner that runs on special platforms such as robots. The Kinect can be used for this application; it can scan a large environment up to 50 meters long. This does not mean that it loses fine details; it preserves fine details with up to one centimeter of accuracy. This capability of the Kinect leads to other promising applications like measuring dimensions, interactive visualizations and 3D localization. It utilizes a depth and RGB camera for robust 3D modeling, completeness and dense coverage, as seen in Figure 1.6. Robust means the system can detect failures and alert the user to rewind or resume scanning. Completeness means that the user can check the model at any time for coverage and quality. The system can provide suggestions for an incomplete map. Dense coverage relates to the level of detail of the 3D space. Moreover, this system offers other services such as online feedback, tolerates human errors and helps to obtain complete scene coverage (Du et al., 2011).

Figure 1.6: User input and control of system (Du et al., 2011)

-Real-time 3D reconstruction and interaction using a moving depth camera

This application uses a Kinect fusion technique. The Kinect is considered low-cost and popular. Only one feature of the Kinect, the depth sensor, is used to track the 3D pose of the user, allowing the reconstruction of 3D models of the physical scene in real time. The Kinect fusion of Lezadi adopts new GPU-based pipelines; it uses the core system for low-cost handheld scanning, geometry-aware augmented reality and physics-based interactions. These extensions allow object segmentation, enabling real-time multi-touch interactions anywhere, on any planar or non-planar reconstructed physical surface, and user interaction directly in front of the Kinect sensor without camera tracking or reconstruction. The Kinect fusion technique uses a Kinect camera moving rapidly within the room to reconstruct geometrically high-quality 3D models of the scene. The final step is to refine the new physical models by filling holes. Figure 1.7 shows differences between physical models; Figure 1.7.b shows a model from a Kinect sensor; Figure 1.7.c uses the Kinect fusion technique with hole filling (Lezadi et al., 2011).

Figure 1.7: A) original image, B) model scan from Kinect, C) model obtained from Kinect fusion technique (Lezadi et al., 2011).

-Facial animation using Kinect

This application aims to detect user facial expressions in real time; this can be achieved using a commercial Kinect 3D sensor. It is done in two steps: firstly, the user performs a set of calibration expressions; secondly, these expressions are combined with existing database animations. The combining step is carried out using certain software packages like a generic facial blendshape rig. More than one application can make use of facial animation, such as virtual mirrors, social interaction and digital gameplay. Figure 1.8 shows examples of facial animation: Figure 1.8.a and Figure 1.8.c show user facial expressions recognized by the Kinect, while Figure 1.8.b and Figure 1.8.d show the corresponding facial expressions in the database (Weise et al., 2011).
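The combination step can be pictured as standard blendshape arithmetic: the tracked expression provides weights that blend pre-built expression meshes on top of a neutral face. The sketch below shows that generic mechanism only; it is not Weise et al.'s actual pipeline.

```python
from typing import List, Sequence

Vertex = Sequence[float]

def blend(neutral: List[Vertex], targets: List[List[Vertex]], weights: List[float]) -> List[Vertex]:
    """Linear blendshape model: neutral mesh plus weighted offsets of each expression target."""
    blended = []
    for vi, base in enumerate(neutral):
        offset = [0.0, 0.0, 0.0]
        for target, w in zip(targets, weights):
            for axis in range(3):
                offset[axis] += w * (target[vi][axis] - base[axis])
        blended.append(tuple(base[axis] + offset[axis] for axis in range(3)))
    return blended
```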


Figure 1.8: A) and C) show facial animation expressions by Kinect, B) and D) show combined facial animations from database (Weise et al., 2011).

-American Sign Language Recognition

This application is targeted at deaf children; its aim is to develop interaction with deaf children and improve their memory. These children use a special language called American Sign Language (ASL), and educational game applications were developed to help them learn this language. The CopyCat system is the current solution for deaf children; this system consists of two main modules: sign language recognition and sign language phrase verification. This system is relatively complex when compared to a Kinect. It uses special colored gloves with embedded accelerometers to track and trace user movements, as seen in Figure 1.9. These accelerometers provide information on movements such as acceleration, direction, and rotation of the hand. The game scenario itself is simple: the player should assign some sentences to the in-game hero; these sentences have a specific order, i.e. subject, preposition and an object with one or two optional adjectives. Once the player has made the motions to express the sentence, he or she should decide whether this phrase is correct or not. The player is rewarded with points if the sentence was correctly transcribed in ASL, but loses points otherwise. The Kinect sensor is a new solution for deaf children and features significant advantages over the CopyCat system: it provides interactivity, user comfort, system robustness, system sustainability, lower cost, and ease of deployment. The Kinect does not need gloves or any special devices, sensors or accelerometers. The system requires a data collection step, which can be achieved with the gloves, the Kinect and the application; this means that the player should register all possible phrases needed to play the game in different situations, seated or standing, as shown in Figure 1.10: Figure 1.10.a shows a seated situation and Figure 1.10.b shows a standing situation. The game interface should provide some navigation buttons, like a Start Signing button to begin signing and a Stop Signing button to stop signing (Zafrulla et al., 2011).

Figure 1.9: A) Gloves with accelerometers B) detailed accelerometers (Zafrulla et al., 2011).


Figure 1.10: A) seated B) standing (Zafrulla et al., 2011).

-Teaching Natural Interaction

Interactive systems are a very interesting topic in information technology. Interactive systems have one or more interfaces; their main characteristic is that the user sends commands and waits for the results. Multiple models have tried to enhance interaction between the user and the machine, such as punch cards, the RAND tablet, the stylus, the Elograph (a touch device), Videoplace and Flexible Machine Interfaces. The development of graphical user interfaces has provided new intuitive and interactive ways to interact with the system instead of cumbersome and outdated methods. One of the new approaches is to use the Kinect as an excellent interactive device added to toolsets to increase interactivity between user and machine in classrooms. Kinect libraries feature functionalities such as recognition of pushing, steady, swiping and waving motions and other gestures, skeleton detection, user segmentation, multiple user detection, pose detection, tracking of individual skeleton joints, access to depth and video data, multiple point tracking, as well as various calibration and smoothing functions. The Kinect can affect the classroom mode by dealing with new topics such as human factors, aspects of application, human-centered evaluation, developing effective interfaces, accessibility, emerging technologies and human-centered computing. The framework is simple and easy; it consists of two main components, hardware and software. For hardware it needs a Kinect sensor; for software there are free libraries that can process Kinect data, such as OpenKinect, OpenNI and the Microsoft Kinect SDK (Villaroman et al., 2011).

-Robust Hand Gesture Recognition with Kinect Sensor

Hand gestures are an important issue in virtual reality, sign language recognition and computer games; they are also a way for people and machines to communicate. Existing solutions suffer from problems such as the quality of the input image from optical cameras, lighting conditions and background clutter. In both hand detection and hand gesture recognition, the challenging problems are how to detect the hand robustly and how to recognize the hand gesture efficiently and accurately. Ren et al. built a system that uses the Kinect sensor to achieve hand gesture recognition in a robust and efficient way; the application uses both the depth map and the color image obtained by the Kinect to detect the shape of the hand. Hand gesture recognition is still a challenging problem with the Kinect, whose 640x480 resolution is sufficient to track a large object but not small objects such as a hand. To solve this problem, they use the Finger-Earth Mover's Distance as a shape distance metric; it measures dissimilarities between hand shapes and appears to be efficient and robust against hand variations and distortions. Figure 1.11 shows the hand gesture recognition process with its steps and details. Applications of hand gesture recognition include arithmetic computation and the rock-paper-scissors game. Figure 1.12 shows 14 gestures that represent the digits 0 to 9 and four operations: addition, subtraction, multiplication and division.


Figure 1.11: Hand gesture recognition process (Z. Ren et al., 2011)

Figure 1.12: 14 gesture commands and four arithmetic operations (Z. Ren et al., 2011)

Figure 1.13 shows two complete arithmetic operations with their corresponding gestures: Figure 1.13.a shows the addition 3+9=12 and Figure 1.13.b the multiplication 5*8=40.

Another traditional game is rock-paper-scissors. This game has simple rules: rock breaks scissors, scissors cut paper and paper wraps rock. It needs the three main gestures shown in Figure 1.14. These gestures are considered weapons: the computer randomly selects a weapon, and then, according to the game rules, the system decides the winner between the human and the computer. Figure 1.15 shows two examples of the game (Z. Ren et al., 2011).


Figure 1.13: A) addition operation 3+9=12, B) multiplication operation 5*8=40 (Z. Ren et al., 2011)

Figure 1.14: Three gestures for the rock-paper-scissors game (Z. Ren et al., 2011)


Figure 1.15: Two examples of the rock-paper-scissors game (Z. Ren et al., 2011)

-Augmented Reality Projection

Augmented reality is a way to combine a generated virtual scene with the real world. Many techniques are used for this purpose, but they have problems such as graphics rendering, because this step is always done from a screen perspective; users therefore have to keep looking perpendicularly at the display and imagine a fixed view from it. The current solution combines object surface detection through the Kinect with perspective-corrected rendering and projection. In broad lines, the Kinect is initially used to get depth maps of the object, which allows flat surfaces to be detected from the gradient value and direction. Next, the virtual object is modified to match the real object's dimensions. Then, the detected shape and orientation information is applied to each surface and mapped onto an x-axis projection. Finally, the user gets a realistic view of a virtual scene merged with reality (Tang, Lam, Stavness, & Fels, 2011).

-Gaze Estimation based on multimodal Kinect data


A set of characteristics determines human behavior, such as body gestures, gaze, facial expressions and emotions. Developing algorithms that extract these characteristics accurately is a big challenge. Alberto et al. (2012) focus on gaze, as it can play an important role in many cases because it contains a wealth of information, such as where and what a person is looking at, which becomes significant in face-to-face conversations.

Gaze is also important for helping psychologists study larger corpora and for creating assistive tools for disabled people lacking certain means of communication and group interaction. Many gaze estimation strategies have been developed over the last 30 years; these strategies suffer from problems such as high cost, fixed head pose and restricted head motion, which reduce their availability to the general public. The Kinect can solve these problems without such restrictions thanks to its multimodal input, which includes a depth sensor.

The depth sensor is accurate at tracking a 3D mesh model and robust at estimating a person's head pose. The eye region rectification step needs training data collected via the Kinect to determine the direction of the user's eyes. The process goes through a set of steps: first, estimating the head pose from the depth data; second, mapping the head image back into a frontal pose and cropping the resulting image around each eye; finally, transforming the gaze direction back to the world coordinate system. Figure 1.16 shows the gaze estimation steps: Figure 1.16.a shows the offline step that records a person-specific 3D face model, Figure 1.16.b shows the person model registered at each instant to the multimodal data to retrieve the head pose, Figure 1.16.c shows the cropped eye image, and Figure 1.16.d shows the final result that determines the direction of the person's gaze (Alberto et al., 2012).


Figure 1.16: a) Offline step: from multiple 3D face instances the 3DMM is fitted to obtain a person-specific 3D model. b)-d) Online steps. b) The person model is registered at each instant to the multimodal data to retrieve the head pose. c) Head stabilization computed from the inverse head pose parameters and the 3D mesh, creating a frontal-pose face image; the final gaze vector is corrected according to the estimated head pose. d) Obtained gaze vectors (in red the estimation and in green the ground truth). (Alberto et al., 2012)

1.5.3 Weaknesses of the Kinect

We have shown that the Kinect is widely used in virtual reality applications. However, this sensor presents some weaknesses that we want to address.

1.5.3.1 Visual Field

As shown in Figure 1.3, the field of view is 57° horizontally. This is not enough to cover a large interaction area, which is why all the applications we have presented have a limited interaction area. We would like to use the Kinect in a 3x4 meter area; we will therefore see how to use several Kinects and how to fuse their data to cover the whole area.
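As a rough illustration (this calculation is not taken from the thesis, only an order-of-magnitude check), the width of the floor strip covered by one Kinect at a distance $d$ from the sensor is

$$w(d) = 2\,d\,\tan\!\left(\tfrac{57^\circ}{2}\right) \approx 1.09\,d,$$

so at 2 m from the sensor the covered width is only about 2.2 m, less than the 3 m side of the intended area, and the strip narrows to zero near the sensor; a single viewpoint therefore always leaves part of a 3x4 m floor uncovered.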

1.5.3.2 Persistence of the skeleton

Sometimes, the Kinect and its SDK can lose track of some joints. This happens when a part of the body is hidden by another or when the body is not entirely within the field of view. In the presented applications that use skeleton data, the experiments place the subject in front of a wall or in an open space, and the subjects always face the sensor. We would like to allow the subject to move freely in our area, and not only in front of the Kinect with all limbs visible.

1.6 Proposed approach

Our purpose is to improve the capture of gestural interaction by fusing the skeletons from multiple Kinects; we use 4 Kinects for the CAVE. Section 3.4.1 presents the proposed material configuration architecture, section 3.4.2 the geometric dispatching of the Kinects in the CAVE, and section 3.4.3 the data fusion rules between the Kinects in the CAVE. Figure 3.19 shows the CAVE with the 4 Kinects on its corners. Because each Kinect detects a user skeleton, we obtain 4 skeletons from the 4 Kinects. We keep only some joints from each skeleton: head, left knee, right knee, left elbow, right elbow, left hand and right hand, as in Figure 1.17. We use the head joint for tracking, the left elbow and left hand for the raised left hand, the right elbow and right hand for the raised right hand, and both knees for short hopping. The tracking and gesture recognition system features both local and server processing. The local processing is done on the Kinect side; its objective is to take the user's skeleton joints and test the joint status of the required joints in order to send data to the server. The server receives the semi-processed data from the 4 Kinects and selects which joint data should be processed. Figure 1.18 shows the overall process of the proposed solution.
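The following minimal sketch illustrates the local processing step described above; the joint names, the status convention and the report format are illustrative assumptions, not the actual Kinect-side code.

```python
# Kinect-side (local) processing: keep only the joints used by the system,
# check their tracking status and forward the fully tracked ones to the server.
TRACKED_JOINTS = ("head", "left_elbow", "left_hand",
                  "right_elbow", "right_hand", "left_knee", "right_knee")

def local_processing(kinect_id, skeleton):
    """skeleton: dict joint_name -> (status, (x, y, z)) for one frame of one Kinect."""
    reports = []
    for joint in TRACKED_JOINTS:
        status, position = skeleton.get(joint, (0, None))
        if status == 2:                     # only optimally tracked joints are sent
            reports.append({"kinect": kinect_id, "joint": joint,
                            "status": status, "position": position})
    return reports                          # semi-processed data sent to the server
```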


Figure 1.17: Skeleton joints of the body1

1 http://www.triballabs.net/2011/06/kinectapis/


Figure 1.18: Process of the proposed approach

In Figure 1.18, we describe the local and server processing sides. Local processing is tasked with processing the skeleton joint data of each Kinect, and server processing handles the semi-processed data obtained from the 4 Kinects. Local processing only deals with joints whose status = 2 and sends the head status with its data directly to the server. The hand-raising gesture recognition process is two-fold: first, the system checks whether the status of the elbow and hand joints = 2; second, we perform the calculations described in Figure 1.19.a and Figure 1.19.b.

Figure 1.19: a) Raised right hand algorithm, b) Raised left hand algorithm
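A minimal sketch of a raised-right-hand test consistent with this description is given below; the requirement that the hand be above the elbow and the head, and the coordinate convention (y pointing up), are assumptions rather than the exact conditions of Figure 1.19.

```python
def right_hand_raised(joints):
    """joints: dict joint_name -> (status, (x, y, z)), with y the vertical axis."""
    hand_status, hand = joints["right_hand"]
    elbow_status, elbow = joints["right_elbow"]
    head_status, head = joints["head"]
    # Step 1: the required joints must be fully tracked (status = 2).
    if hand_status != 2 or elbow_status != 2 or head_status != 2:
        return False
    # Step 2: geometric test, assumed here as "hand above elbow and above head".
    return hand[1] > elbow[1] and hand[1] > head[1]
```

The test for the left hand is symmetric, using the left hand and left elbow joints.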

For short hopping, we apply the same processing as with the raised hands: we select the knee joints with a status = 2. Figure 1.20 shows the algorithm for the short hopping gesture.

Figure 1.20: Short hopping algorithm
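Below is a comparable sketch for the short-hop test; the use of a standing-pose baseline and the 0.10 m margin are illustrative assumptions, not values taken from Figure 1.20.

```python
HOP_MARGIN = 0.10  # metres above the standing knee height (assumed value)

def short_hop(joints, baseline_left_y, baseline_right_y):
    """joints: dict joint_name -> (status, (x, y, z)); the baselines are the
    knee heights recorded while the user stands still."""
    left_status, left = joints["left_knee"]
    right_status, right = joints["right_knee"]
    # As for the raised hands, only fully tracked knee joints (status = 2) are used.
    if left_status != 2 or right_status != 2:
        return False
    # Both knees must rise above their standing height by the margin.
    return (left[1] > baseline_left_y + HOP_MARGIN and
            right[1] > baseline_right_y + HOP_MARGIN)
```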

1.7 Summary

The topics discussed in this chapter are virtual reality, the Kinect sensor, the virtual reality CAVE, gestural interaction and our proposed approach. In sections 1.1 and 1.2 we covered the definition of virtual reality as well as its importance, applications and domains. We then described the Kinect sensor in detail in terms of structure, provided data about its cameras and microphones and the data the Kinect can produce, and listed a set of single-Kinect applications. We also presented the virtual reality CAVE "SAS", focusing on its properties, dimensions and tracking sensors, and discussed the classification of gestural interactions, the functionality of gestures, the design process of gesture interfaces and the problems of gestural interfaces. Finally, in section 1.6, we discussed our proposed approach in terms of the number of Kinects, the gestures, the skeleton joints, the gesture recognition algorithms, and the local and server processing sides.


2 Chapter two: Tracking

2.1 Overview of non-optical tracking systems for Virtual Reality

2.1.1 Electromagnetic tracking systems

One of the most important tracking systems is the Electromagnetic Tracking System (EMTS). Medical applications have increasingly used EMTS over the last few years. In general, an EMTS consists of three main components: a field generator (FG), a sensor unit and a central control unit. The FG uses several coils to generate a position-varying magnetic field that is used to establish the coordinate space. The sensor unit attached to the object contains small coils in which a current is induced by the magnetic field. By measuring the behavior of each coil, the position and orientation of the object can be determined. Using this method, the positions of the sensors are detected as they move within the coordinate space. The central control unit serves to control the field generator and capture data from the sensor unit. An EMTS can provide three positions (X, Y and Z) and two or three orientation angles, and is therefore referred to as 5-DOF or 6-DOF. One important advantage of EMTS is that electromagnetic fields do not depend on line of sight for operation. They have therefore been used extensively in motion capture and virtual reality applications where complicated surroundings limit the use of optical trackers because of the line-of-sight requirement. Since an EMTS depends on the measurement of magnetic fields produced by the FG or the transponder itself, the tracking units may be disturbed by the presence of any electronic device that produces EM interference. Another limitation of EMTS applications is the tradeoff between system accuracy and working volume (Wen & Spielman, 2010).


2.1.2 Acoustic tracking systems

Acoustic tracking systems produce and sense ultrasonic sound waves to identify the orientation and position of a target. They calculate the time taken by the ultrasonic sound to travel to a sensor. The sensors usually have to stay at a stable position within the environment, and the user must wear ultrasonic emitters. The system computes the orientation and position of the target from the time taken by the sound to reach the sensors. However, acoustic tracking systems have various flaws and downsides. Sound travels quite slowly, so the update rate of a target's position is naturally slow. The efficiency of the system can also be affected by the environment, as the speed of sound through air changes with the humidity, temperature or barometric pressure of the environment1.
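As a simple illustration of the principle (not a figure from the cited source), the distance between an emitter and a sensor follows directly from the time of flight:

$$d = v_{\text{sound}} \cdot t \approx 343\ \text{m/s} \times 5.8\ \text{ms} \approx 2\ \text{m},$$

where the speed of sound of about 343 m/s holds for dry air at 20 °C; with three or more sensors, the position of the emitter can then be recovered by triangulation.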

2.1.3 Mechanical tracking systems

This type of tracking system depends on a physical link between a fixed reference point and the target. One example of a mechanical tracking system in the VR field is the BOOM display. A BOOM display is an HMD attached to the end of a mechanical arm with two points of articulation. The orientation and position of the system are detected through the arm. The update rate is quite high with mechanical tracking systems, but the drawback is that they limit the user's range of motion1.

2.2 Overview of optical tracking systems with visible markers

These devices use light to calculate a target's orientation and position. The signal emitter typically consists of a group of infrared LEDs, and the sensors are cameras that can detect the emitted infrared light. The LEDs flash in a pattern known as sequential pulses. The pulsed signals are recorded by the cameras and the information is then sent to the processing unit of the system, which extrapolates the data to estimate the position and orientation of the target. The update rate of optical systems is quite fast, which reduces latency issues. The drawbacks of the system are that the line of sight between an LED and a camera can be obscured, which interferes with the tracking process, and that infrared radiation or ambient light can also render the system useless.

1 http://www.vrs.org.uk/virtual-reality-gear/tracking.html, 22/1/2015.

2.2.1 ARToolkit

ARToolkit is a software library for building Augmented Reality (AR) applications, i.e. applications that overlay virtual imagery on the real world. For example, one application makes a three-dimensional virtual character appear standing on a real card; the user sees it in the headset display they are wearing. When the user moves the card, the virtual character moves with it and appears attached to the real object. One of the key difficulties in developing Augmented Reality applications is the problem of tracking the user's viewpoint: in order to know from what viewpoint to draw the virtual imagery, the application needs to know where the user is looking in the real world. ARToolkit uses computer vision algorithms to solve this problem. The ARToolkit video tracking libraries calculate the real camera position and orientation relative to physical markers in real time, which enables the easy development of a wide range of Augmented Reality applications. The features of ARToolkit include single camera position/orientation tracking, tracking code that uses simple black squares, the ability to use any square marker pattern, easy camera calibration code, speed sufficient for real-time AR applications, SGI IRIX, Linux, MacOS and Windows distributions, and distribution with complete source code1.

2.2.2 ARTracking

Tracking technology is necessary in Virtual Reality (VR) applications. VR allows the user to perform tasks using real-world movements and actions. The user sees a stereo image and is able to judge distances and proportions, and can also use his or her hands to manipulate virtual objects for a realistic interaction with the virtual content. VR is one of the major applications of ART tracking systems. ART delivers a flexible tracking system with the following characteristics: high precision to calculate the exact viewpoint and the location of interaction devices, low noise leading to a stable virtual image and stable interaction devices, and low latency for instant movements in VR and reduced simulator sickness2.

2.2.3 OptiTrack

OptiTrack is considered one of the main motion capture and 3D tracking hardware and software providers. It provides 3D precision, simple real-time workflows, on-site manufacturing enabling the industry's lowest pricing, and free and open developer access. Many domains, such as game and film production, university education and research, engineering, life science, and sports performance and injury prevention, use OptiTrack hardware and software. Furthermore, OptiTrack's product line includes motion capture software and high-speed tracking cameras, contract engineering services, the GEARS golf training and club fitting solution, and the affiliated consumer tracking products TrackIR and SmartNav. Entertainment customers include Activision, Electronic Arts, 343 Industries, US Army Game, Cloud Imperium, Square Enix, Ready at Dawn, TV Globo, NetherRealm, Ubisoft, Rockstar Games, Crytek, Remedy, Game on Audio, The Moving Picture Company, Animatrik Film Design, and other top studios and developers around the world. Engineering customers include Boeing, KMel Robotics, NASA, Oculus VR, Lockheed Martin, John Deere, Mechdyne, Under Armour, Stanford University, Duke University, Laser Shot, Mitsubishi, and Dassault Systems3.

1 http://www.hitl.washington.edu/artoolkit
2 http://www.ar-tracking.com/home
3 https://www.naturalpoint.com/optitrack/about/press

2.3 Proposed tracking algorithm

Figure 2.1 shows the Kinect selection algorithm. This algorithm extracts the head joint data (position and orientation) from the Kinects that are closest to the user. To achieve this, the depth coordinates of the joint returned by the Kinects are compared to each other. The selection algorithm collects the head data from the 4 Kinects and associates a status value with the head joint of every skeleton in every frame. This status value is 0 if the Kinect does not recognize the joint, 1 if it detects the joint but does not provide joint data (position and orientation), and 2 if the joint tracking status is optimal. Based on these status values, the algorithm focuses on the joint data from the Kinects with the highest status value and sends it to the SAS's interaction server. When several Kinects track the user's joint with an optimal status on the same frame, one of these Kinects is randomly chosen. The algorithm also provides a failsafe mechanism for frames where no Kinect returns an acceptable status value: in these situations, the data sent to the server is that of the previous frame where at least one Kinect's status was optimal. This means the user is standing in a dead area which is not covered by any Kinect.
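A minimal sketch of this per-frame selection is shown below; the data structures and the use of Python are illustrative, not the actual server implementation.

```python
import random

last_optimal = None  # head data of the last frame where at least one status was 2

def select_head_data(frame_reports):
    """frame_reports: list of (status, head_data) pairs, one per Kinect, for one frame.
    Status follows the convention above: 0 = not recognized, 1 = detected without
    data, 2 = optimal tracking."""
    global last_optimal
    optimal = [data for status, data in frame_reports if status == 2]
    if not optimal:
        # Failsafe: the user stands in a dead area, so re-send the data of the
        # previous frame where at least one Kinect's status was optimal.
        return last_optimal
    # Several Kinects may be optimal on the same frame: pick one at random.
    last_optimal = random.choice(optimal)
    return last_optimal
```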

Figure 2.1: Kinect selection algorithm (Salous et al., 2015)

2.3.1 Simulation

A user moved around the SAS; his skeleton was tracked by the sensors and his joint data was collected by our infrastructure. We chose to focus on head joint data for tracking purposes, and analyzed the data sent by the Kinects for this specific joint in every single frame.

Figure 2.2 shows the results for the frames collected during the simulation. In 83.89% of the analyzed frames, at least one of the four Kinects returned a head joint status value of 2. In 52.56% of the analyzed frames, all Kinects returned a head joint status of 2.

Figure 2.2: Percentages for each area

2.4 Summary

This chapter discusses non-optical tracking systems, optical tracking systems and our proposed tracking algorithm. In section 2.1 we present non-optical tracking systems such as electromagnetic, acoustic and mechanical tracking systems. Section 2.2 covers optical tracking systems such as ARToolkit, ART tracking and OptiTrack. In section 2.3 we describe in detail our proposed tracking algorithm as applied in the CAVE, and section 2.3.1 provides simulation results for this algorithm.



3 Chapter three: Multi-Kinect Module

3.1 Multi-Kinect applications

3.1.1 OmniKinect: Real-Time Dense Volumetric Data Acquisition and Applications

Kainz et al. [2012] use the data provided by multiple Kinects to recreate a 3D representation of an object or a person for various purposes such as augmented reality or full-body scanning. This implementation produces high-quality volumetric reconstructions from multiple Kinects while overcoming systematic errors in the depth measurements. Moreover, the application provides image enhancements based on rendering from the depth measurements, and a comparison of its results with KinectFusion.

OmniKinect additionally provides practical insight into the achievable spatial and radial range and into the bandwidth requirements for depth data acquisition, and the authors present a number of practical applications of their system. Figure 3.1.a shows a schematic illustration of the system setup and Figure 3.1.b shows the current implementation of this setup. Figure 3.2 shows the StbTracker calibration target: each of the target's 3 sides (400 × 500 mm) has 4 markers. The back-projection error for this target is in the sub-pixel range and can therefore be neglected.

Figure 3.1: Plan views (a) and 3D overview (b) of the OmniKinect setup. In (a), vibrating Kinects are marked in green and non-vibrating Kinects in red. (Kainz et al., 2012)


Figure 3.2: The StbTracker calibration target (1300 × 400 × 500 mm) used to obtain initial extrinsic camera parameters (a) and the initial calibration view (b) showing the coordinate frame center for one example configuration using nine Kinects. (c) A point cloud rendering of the depth calibration target before (left) and after (right) correction, with the camera viewing rays (white lines). (Kainz et al., 2012)

3.1.2 Real-time appearance-based person re-identification over multiple Kinect™ cameras

Satta et al. [2012] use multiple Kinects as an identification tool to recognize persons within the Kinects' range. This application focuses on two issues related to the deployment of re-identification systems in real-world application scenarios. First, the computational complexity must be low enough to satisfy real-time requirements; second, the accuracy of pedestrian detection, tracking and segmentation (the first stage of the pipeline) is critical: in particular, accurate segmentation is needed to avoid including background elements. The application uses the Kinect sensor with its free libraries, which enable enhanced tracking, segmentation and pose estimation using depth data. Figure 3.3 shows the system architecture with an example for two persons.


Figure 3.3: Left: example of the capabilities of the Kinect SDK. Two individuals are tracked and their skeletons are superimposed on the image. The upper-right box shows the segmentation in the depth domain. Right: system architecture (Satta et al., 2012).

The re-identification system works as follows: it first tracks all the individuals seen by a network of Kinect cameras. It then adds an appearance descriptor (template) of each acquired track to a template database. Finally, it re-identifies each new individual online by matching its descriptor with all current templates.

3.1.3 Feasibility of Fast Image Processing Using Multiple Kinect Cameras on a Portable Platform

Sumar [2011] collects data from multiple Kinects to create a near real-time image processing application. The application was constrained to the detection of coloured spherical markers in each frame, with a real-world accuracy of ±5 cm at a distance of 5 meters from the camera; the near real-time constraint requires a frame rate of at least 10 fps. The markers also needed to be detected to an accuracy of 6 pixels from the true center in each captured frame. However, this application suffers from an interference problem between Kinects, due to the structured light used for calculating depth values: if multiple Kinects are used, they will most likely interfere with each other, as in Figure 3.4.

Figure 3.4: Kinect camera depth map with interference from a parallel Kinect (Sumar, 2011)

3.1.4 Intelligent sensor-scheduling for multi-Kinect-tracking

Faion et al. [2012] developed a multi-Kinect application for scanning objects or persons; to perform this, they created an algorithm that selects the Kinects closest to the target and deactivates the other streams to reduce interference between the Kinects. This project describes a method to intelligently schedule a network of multiple RGBD sensors in a Bayesian object tracking scenario, with a special focus on Microsoft Kinect devices. Such setups have issues such as the large amount of raw data generated by the sensors and the interference caused by overlapping fields of view. The proposed algorithm addresses these issues by selecting and exclusively activating the sensor that yields the best measurement, as defined by a novel stochastic model that also considers hardware constraints and intrinsic parameters. In addition, as existing solutions for toggling the sensors were found to be insufficient, the development of a hardware module especially designed for quick toggling and synchronization with the depth stream is also discussed. The algorithm is then evaluated within the scope of a multi-Kinect object tracking scenario and compared to other scheduling strategies. Figure 3.5 shows an overview of an example tracking scenario, and Figure 3.6 shows the interference with one, two, three and four active Kinects respectively.

Figure 3.5: Toy train tracking scenario with four Kinects (Faion et al., 2012)

Figure 3.6: Illustration of raw shift variances, measured over 64 frames for different numbers of active Kinects. Green pixels with varying brightness indicate valid values: black for 0 and green for 1. Red indicates at least one invalid value. (Faion et al., 2012)


3.1.5 PESI (Participative and Enacting Sonic Interaction): Extending Mobile Music Instruments with Social Interaction

Correia [2013] uses multiple Kinects to track users as part of a musical application that produces music according to the users' movements and the movements of modified smartphones. The main goal of the project is to expand participation in collective music practices using physical and social interaction: the mobile phone becomes a tangible and expressive musical instrument, together with an extended system. The system aims to understand bodily and social interaction among users, specifically their location and their coordination within a group. It is designed for small-scale collaboration with three performers, since this allows for more complex social interaction than just two, as in Figure 3.7. PESI consists of networked elements: the mobile phones (three iPhones) running specially developed software (providing sensory input and audio-haptic output) and a motion capture "extended system" (two Kinect sensor bars and two computers), as in Figure 3.8.

Figure 3.7: Three performers using PESI (Correia, 2013).


Figure 3.8: A diagram with the different PESI elements (Correia, 2013).


3.1.6 Scanning 3D full human bodies using Kinects

Tong et al. [2012] built a system to scan full 3D human bodies using multiple Kinects. In this system, each Kinect sees a specific part of the human body; three Kinects can cover the entire body with minimal overlap between Kinects. The project presents a new scanning system for capturing full 3D human body models using multiple Kinects. It also deals with the interference problem between Kinects: it uses two Kinects to capture the upper and lower parts of a human body respectively, without overlapping regions, as in Figure 3.11, and a third Kinect to capture the middle part of the body from the opposite direction. The authors propose a practical approach for registering the various body parts seen from different views under non-rigid deformation. First, a rough mesh template is constructed and used to deform successive frames pairwise. Second, a global alignment is performed to distribute errors in the deformation space, which solves the loop closure problem efficiently. Misalignment caused by complex occlusion can also be handled reasonably by the global alignment algorithm.

3.1.7 The capturing of turbulent gas flows using multiple Kinects

Ruhl et al. [2011] use the RGB and IR Kinect cameras to capture gas flows around objects in the flow, using three Kinects inside a lab room. Three Kinects are sufficient to qualitatively reconstruct non-stationary, time-varying gas flows in the presence of occluders.

Figure 3.9 shows the Kinect setup and the calibration of active and passive Kinects. While simultaneous calibration still presents a challenge, this issue can be solved by employing materials with different BRDFs, as in Figure 3.10.

Figure 3.9: The capturing setup consists of three Kinects placed in a small half circle with an angular spacing of 45° between each other (left diagram). (Ruhl et al., 2011)

In Figure 3.9, the Kinects ki capture opposite planes wi using both the passive RGB sensor and the active IR emitter. The refractive medium m, e.g. the propane gas plume of refractive index 1.3407, distorts the emitted patterns, providing cues in the depth images. On the right is a setup with three Kinects and projection walls, the gas nozzle and an occluder. Note that the only constraint imposed on the planes is that they need to be flat and diffuse; no binary pattern printout is needed.

Figure 3.10: The solution by Berger et al. (2011) solves the problem that the RGB and depth sensors cannot be calibrated simultaneously: binary surface patterns are used, e.g. a checkerboard consisting of white diffuse and mirroring patches (left). In the depth image (middle) and the IR image (right, thresholded for better visibility) the pattern becomes clearly distinguishable. (Ruhl et al., 2011)

3.2 Multi-Kinect advantages

3.2.1 Large coverage space

Coverage space is one of the important features of a multi-Kinect architecture, as multiple Kinects can cover a relatively large space. We use four Kinects to cover the space of our CAVE "SAS". Berger et al. (2011) use multiple Kinects to cover a 3 m × 3 m × 2.5 m space, and Ruhl et al. (2011) use 3 Kinects to cover a 5 m × 5 m × 2.5 m space.

3.2.2 Low cost

Cost should be taken into consideration, as it is an important aspect of a solution and is influenced by numerous factors including the goal of the application. A higher level of accuracy can increase the cost of a solution, thus reducing its availability and ease of deployment. Designing a solution with acceptable accuracy and an average cost is an efficient choice for most applications. The Kinect is considered a low-cost sensor and is very useful in many cases and applications (Gomez, Mohammed, Bologna, & Pun, 2011). A single Kinect costs about $150 on average, which is considered cheap (Zafrulla et al., 2011).

3.2.3 Ease of Installation

Working with multiple Kinects is easy and direct, without any need for complex installations: each Kinect is connected to a PC by a USB cable and a power supply (10).

3.3 Multi-Kinect issues

Having multiple Kinects in the same architecture raises several problems, such as interference between Kinects, calibration, USB hub bandwidth, Kinect setup/distribution/placement, and data synchronization between Kinects.

3.3.1 USB bandwidth

A Kinect consumes more than 50% of a USB hub's bandwidth. In config 1, this problem is solved by adding additional USB host controller slots to the PC; in terms of cost, this solution is relatively inexpensive. Config 2 and config 3 avoid the USB bandwidth problem by connecting each Kinect to a separate PC or other micro-computer, thus using the full USB bandwidth of each machine1 (11).

3.3.2 Interference between Kinects

One of the challenging problems in a multi-Kinect architecture is the interference between Kinects. The mechanism for constructing a depth image with a Kinect consists of two steps: first, the Kinect emits its own infrared laser through a special lens, projecting a pattern onto the surface the Kinect faces; second, the infrared camera detects the projected laser pattern, which is converted into a depth image. Issues occur when multiple laser patterns are emitted by multiple Kinects, as each Kinect cannot distinguish the dots of its own pattern from the ones projected by the other Kinects. This causes depth image degradation when several Kinects are running simultaneously (Scholz, Berger, & Ruhl, 2011).

1 http://msdn.microsoft.com/en-s/library/jj131032.aspx


Various solutions to this problem exist, such as avoiding the interference, time multiplexing, vibrations and hole-filling filters (Roy Or, 2013).

3.3.2.1 Avoiding interference

This solution involves a specific Kinect placement that eliminates overlap between the IR rays, as in Figure 3.11, where three Kinects scan a human body without interference between them (Tong et al., 2012). This solution works for two or three Kinects, but for more than three it is hard to place the Kinects without overlap between them. Another limitation is that it is not suitable for all applications (Tong, Zhou, Liu, Pan, & Yan, 2012).

Figure 3.11: Three Kinects scanning a human body without interference between them (Tong et al., 2012)

3.3.2.2 Time multiplexing

This approach can be done by several methods:

-Toggling the laser diode: this method alternately toggles each Kinect's laser diode for each frame (Scholz et al., 2011). This can be done through the API by turning the IR emitter on and off, but this functionality is only available on a specific Kinect model called Kinect for Windows1.

-Revolving disk: a set-up consisting of a tripod with the Kinect standing at a defined position on the platform, with an additional stepper motor holding a disk; the assembly is suspended with rubber rings to minimize vibrations, as in Figure 3.12 (Scholz et al., 2011).

Figure 3.12: Kinect with the revolving disk motor platform (Scholz et al., 2011)

3.3.2.3 Vibrations

Maimone & Fuchs (2012) and Butler et al. (2012) adopt the vibration solution and vibrate the Kinect camera unit using a custom rear offset-weight vibration motor. Figure 3.13 shows the motor with the supplementary rubber bands, acrylic frame and accelerometer.

Figure 3.13: a) Kinect with motor, rubber bands, acrylic frame and accelerometer (Butler et al., 2012), b) Kinect with the motor that causes the motion (Maimone & Fuchs, 2012)

3.3.2.4 Hole Filling

Maimone, Bidwell, Peng, & Fuchs (2012) developed a framework that performs multiple tasks such as smoothing, data merging, surface generation, color correction and hole filling. Interference between Kinects causes black holes in the depth images, corresponding to missing data. Hole filling fills in these missing points rather than correcting grossly erroneous depth values.

3.3.3 Kinect Calibration

Calibration is a very important step in computer vision; it allows metric information to be extracted from 2D images (Zhengyou Zhang & Way, 1999). Calibration produces intrinsic and extrinsic parameters: the intrinsic parameters are the focal length, the principal point and the skew coefficient, and the goal of intrinsic calibration is to determine the exact characteristics of the Kinect. Extrinsic calibration defines the position of the camera center and the camera's heading in world coordinates through two matrices R and T (Zhengdong Zhang, 2006).

For a single-Kinect system, as the Kinect has its own field of view, there is no need for extrinsic calibration. For multiple Kinects, it is important to conduct both intrinsic and extrinsic calibration for more precision. Extrinsic calibration is essential when working with multiple Kinects in order to unify all the sensors' fields of view into a common one. The following step is data fusion (Cruz, 2012).

3.3.4 Kinect setup

Various factors come into play when considering the physical positioning of the Kinects within the system. For instance, Kinects feature a tilt motor which allows ±27 degrees up/down, and their FOV has a 43° vertical angle and a 57° horizontal angle (Cruz, 2012). A proper multi-Kinect set-up in our CAVE "Le SAS" involves a study of the best angle for each Kinect. What should be considered in order to decide on an optimal Kinect configuration is the ability to detect and track a user at any position inside the CAVE. This question exists in all types of sensor configurations and could be easily solved by adding more Kinects inside the CAVE. However, this would come at the expense of the image quality, which would be degraded by multi-Kinect interference2.

The setup/distribution of the Kinects depends on their number. Figure 3.14 shows two Kinects facing each other; there can be more than one such pair, with a fixed distance between them. Another distribution type places the Kinects at an angle to each other, as in Figure 3.15, which shows three Kinects as well as the overlap between them.

1 http://blogs.msdn.com/b/kinectforwindows/archive
2 nuiimagecamera, microsoft.com


Figure 3.14: Two Kinects facing each other (Gatto et al., 2012)

Figure 3.15: Three Kinects with an angle between them (Caon et al., 2011)

3.3.5 Data synchronization

There is no need for data synchronization in configuration 1, because all the Kinects are connected to the same PC; as a result, the data from each Kinect is received simultaneously, without time differences. In configurations 2 and 3, there is no guarantee that all data packets from the Kinects arrive with the same timestamp, so some additional network setup is needed to properly connect the Kinects to the server and handle potential differences in data timestamps. Among the protocols that can be used to transfer data in a client/server system, UDP and TCP were considered. TCP rearranges data packets into the specified order, whereas UDP has no inherent ordering, as all packets are independent of each other. In terms of transfer speed, TCP is slower than UDP1.
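The following sketch shows one plausible way for an embedded PC to push semi-processed joint data to the server over UDP with a timestamp, so that the server can detect out-of-order or late packets; the address, port and message layout are assumptions, not the actual protocol of the SAS.

```python
import json
import socket
import time

SERVER = ("192.168.0.10", 9000)   # assumed address of the centralized server
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_joint_report(kinect_id, joint, status, position):
    message = {
        "kinect": kinect_id,
        "joint": joint,
        "status": status,            # 0, 1 or 2, as in the tracking chapter
        "position": position,        # (x, y, z) in the Kinect's own frame
        "timestamp": time.time(),    # lets the server spot timing differences
    }
    sock.sendto(json.dumps(message).encode("utf-8"), SERVER)

# Example: one report for the head joint of Kinect 1.
send_joint_report(kinect_id=1, joint="head", status=2, position=(0.1, 1.6, 2.3))
```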

3.4 Installation of multiple Kinects inside the SAS

3.4.1 Material configuration

We detail in this part our choice of a system combining several Kinects inside the CAVE. We seek to connect multiple Kinects together in order to cover the whole SAS and retrieve data on the users' position. That is why we decided to list the possibilities available to us before choosing one and explaining why the chosen architecture seems most relevant. The three figures below show the three most popular configurations, which we call config 1, config 2 and config 3, as in Figure 3.16, Figure 3.17 and Figure 3.18.

3.4.1.1 Multiple Kinects on a single PC (config 1)

With a powerful computer and as many USB host controllers as Kinects in the set-up, it is possible to use an architecture with a single PC, without any network connection. Each Kinect is connected to its own USB host controller and transmits data to the same computer, which processes the supplied information. Figure 3.16 shows this configuration.

1 http://www.diffen.com/difference/TCP_vs_UDP14-


Figure 3.16: Multi-Kinect setup config 1, which connects all Kinects to the same PC (Kainz et al., 2012)

3.4.1.2 Each Kinect on a PC + One Server (config 2)

It is possible to connect each Kinect to a computer dedicated to this Kinect, and send all the data to a centralized server that processes the information streams. This solution allows the use of less powerful computers and achieves greater scalability than single-PC solutions, as the number of computers is equivalent to the number of Kinects. Plugging micro-computers into the Kinects is a cost-effective choice as well. Figure 3.17 illustrates this configuration.

Figure 3.17: Multi-Kinect setup config 2, where every Kinect is linked to a mini-PC and all are connected to a central server responsible for collecting the sensor data (Satta et al., 2013)

3.4.1.3 Each Kinect on a PC with one also acting as a server (config 3)

This architecture model is similar to the previous one, but one of the computers is powerful enough to do the data collection and processing while having a Kinect connected to it. It saves the cost of one computer. Figure 3.18 illustrates this configuration.

Figure 3.18: Multi-Kinect setup config 3, which uses several pocket PCs to transmit the Kinect data to a server that is also connected to a Kinect (Williamson & Jr, 2012).


3.4.1.4 Comparison and review of existing solutions

Table 1 shows the differences between config 1, config 2 and config 3 in terms of cost, convenience, hardware requirements, advantages and drawbacks. Table 2 reviews multi-Kinect applications; for each application we list a set of factors such as cost, configuration type, hardware requirements and the number of Kinects used.

Table 1: Multi-Kinect acquisition – comparison between the three material configurations

Multi-Kinect / Single PC (config 1)
- Cost: the cost of a single high-specs computer and the Kinects
- Convenience: convenient, as there is no network to manage
- Hardware requirements: a single powerful computer and USB 2.0 hub controllers
- Advantages: everything is centralized on a single computer; no need to create a network
- Drawbacks: the computer has to do a lot of processing; a single computer may not be enough for a very high number of Kinects, even with additional USB controllers

Multi-Kinect / Multi PC / Server (config 2)
- Cost: depends on the number of Kinects and the computer specs
- Convenience: setting up the network and the Kinects can be a bit inconvenient
- Hardware requirements: a server, network cables, as well as laptops or microcomputers for each Kinect
- Advantages: the solution is scalable and adapts itself to the number of Kinects used; aside from the server, most machines are not costly
- Drawbacks: the network has to be reliable and fast to avoid de-syncs

Multi-Kinect / Multi PC / PC + Server (config 3)
- Cost: depends on the number of Kinects and the computer specs
- Convenience: setting up the network and the Kinects can be a bit inconvenient
- Hardware requirements: a server, network cables, as well as laptops or microcomputers for each Kinect minus one
- Advantages: the solution is scalable and adapts itself to the number of Kinects used; aside from the server, most machines are not costly
- Drawbacks: the network has to be reliable and fast to avoid de-syncs; the Kinect connected to the PC-server might have a speed advantage over the other Kinects when sending data

Table 2: Review of existing solutions created by researchers

OmniKinect: Real-Time Dense Volumetric Data Acquisition and Applications (Kainz et al., 2012)
- Configuration (see previous table): config 1
- Number of Kinects used: 7 (with both streams per Kinect at a time) or 12 (with only the RGB or depth stream)
- Hardware: 1 PC with an ASUS Sabertooth X58 mainboard, an Intel Core i7 980X processor, 16 GB RAM and an NVIDIA Quadro 6000 graphics card; 3 built-in USB host controllers and 4 additional USB controllers; OpenNI and Microsoft Kinect SDK
- Cost: 5,000 US$

Real-time appearance-based person re-identification over multiple Kinect cameras (Satta et al., 2013)
- Configuration: config 2
- Number of Kinects used: 2 in the test, but the architecture allows for more
- Hardware: 1 centralized server, 1 cheap mini-PC per Kinect

Multi-Kinect Tracking for Dismounted Soldier Training (Williamson & Jr, 2012)
- Configuration: config 3
- Number of Kinects used: 4
- Hardware: 3 Dell Inspiron N7110 laptops, 1 Dell Precision T3500, Microsoft Kinect for Windows SDK

Fast Image Processing Using Multiple Kinect Cameras on a Portable Platform (Sumar, 2011)
- Configuration: config 1
- Number of Kinects used: 2
- Hardware: 1 laptop computer running Windows 7 on a 1.5 GHz Core 2 Duo processor
- Cost: 500 NZ$

Intelligent Sensor-Scheduling for Multi-Kinect-Tracking (Faion, Friedberger, Zea, & Hanebeck, 2012)
- Configuration: config 2
- Number of Kinects used: 4
- Hardware: 2 PCs with 2 Kinects connected to each, 1 server connected via LAN, Ubuntu OS and libfreenect drivers

Multiple Kinect Studies (Schröder et al., 2011)
- Configuration: config 2
- Number of Kinects used: 4
- Hardware: 4 PCs, one per Kinect; one of them also controls the IR-lens shuttering disks

PESI: Extending Mobile Music Instruments with Social Interaction (Correia, 2013)
- Configuration: config 2
- Number of Kinects used: 2
- Hardware: 2 PCs, one per Kinect; 1 server that receives OSC data; OpenNI library, Processing and openFrameworks

3.4.1.5 Proposed architecture

As we can see in the previous section, the second configuration type is the most suited to our study. That is to say, we will use a centralized server and multiple mini-PCs, each one connected to a single Kinect. Additionally, the number of Kinects to dispatch within the SAS is set to 4. We therefore explain in this section the requirements for this configuration. Table 3 shows the proposed architecture with its needs in terms of software, hardware and wiring.


Table 3: Proposed architecture with its needs in terms of software, wiring and hardware

Software
- Embedded PCs: Linux OS, libfreenect drivers, OpenNI / Windows 7 OS, Kinect SDK (the proposed configuration includes Ubuntu)
- Centralized server: Windows 7 OS, Kinect SDK
- Network switch: synchronization is needed between the data issued by the 4 embedded PCs and acquired by the central server – a kind of timestamping process

Wiring
- Ethernet cables (variable length in meters): the cables have to connect the 4 Kinects to a switch and the server to this switch
- USB extenders (variable): depending on the location of the Kinects and their allocated mini-PCs, USB extenders may be needed

Hardware
- Kinect Xbox sensor (4 items): requires Windows 7, Windows 8, Windows Embedded Standard 7, or Windows Embedded POSReady 7; a 32-bit (x86) or 64-bit (x64) processor; a dual-core 2.66 GHz or faster processor; a dedicated USB 2.0 bus; 2 GB RAM; Ethernet
- Embedded PC (4 items): Mini PC fanless eCW49501 or Mini PC eCW512L
- Centralized server (1 item): Windows 7, Windows 8, Windows Embedded Standard 7, or Windows Embedded POSReady 7; a 32-bit (x86) or 64-bit (x64) processor; a dual-core 2.66 GHz or faster processor; a dedicated USB 2.0 bus; 2 GB RAM; Ethernet
- Ethernet switch (1 item): needs at least one port for the server and one for each Kinect


3.4.2 Geometric dispatching

Four Kinects will be dispatched around the "SAS" to capture a person's skeleton joints inside the CAVE. The objective of the dispatching is to cover the "SAS" as much as possible in order to track the users while they move inside it. Figure 3.19 shows the dispatching.

Figure 3.19: Representation of the "SAS" with the location of the four Kinects (Salous et al., 2014)

The reasoning behind this specific number of Kinects is as follows. Several constraints inherent to the SAS, such as its size and shape, condition the effectiveness of a single Kinect. As explained in our analysis of the Kinect sensor, the FOV and range of a Kinect are limited, and the SAS' dimensions are too large for a single Kinect to be effective. The same reasoning applies to set-ups with two or three Kinects, as the combined coverage of fewer than 4 Kinects is unsatisfactory and does not allow the user to be completely tracked. As a result, the gestural interaction between the user and the virtual world is hampered.

On the other hand, providing the SAS with more than 4 Kinects would generate various issues. These new problems are related to overlapping FOVs, Kinect calibration and cost, as well as network efficiency. As expressed in the analysis of the drawbacks of multi-Kinect installations, an ensemble of Kinects pointed at the same target results in a decrease in tracking accuracy and a surge of noise in the depth images returned by the sensors. While 4 Kinects are well suited to the SAS' rectangular shape, adding more sensors would disrupt the balance between covering the SAS and overlapping one another's FOV. Moreover, the objective of this thesis is to present an efficient and low-cost tracking system for a VR installation; the additional cost of extra Kinects would be a deviation from the system's initial objectives. Having 5 or 6 Kinects instead of 4 would also imply further calibration procedures before using the SAS, as well as supplementary calculations by the server while using the VR system. While the former increases the complexity of the set-up operations, the latter can directly impact the multi-Kinect system's performance by increasing computation time. Those are the reasons why our experiment focuses on a 4-Kinect system.


3.4.2.1 Top-Left Kinect

Figure 3.20: The top-left Kinect's location and FOV. Top and side views of the "SAS"

3.4.2.2 Top-Right Kinect

Figure 3.21: The top-right Kinect's location and FOV. Top and side views of the "SAS"


3.4.2.3 Bottom-Left Kinect

Figure 3.22: The bottom-left Kinect's location and FOV. Top and side views of the "SAS"

3.4.2.4 Bottom-Right Kinect

Figure 3.23: The bottom-right Kinect's location and FOV. Top and side views of the "SAS"


3.4.2.5 Overlapping between Kinects’ FOV

The way the multiple Kinects have been dispatched implies overlapping areas: parts of the "SAS" will be covered by several Kinects at the same time. Combining the Kinects' FOVs allows us to determine how the server will manage the tracking of the users (some Kinects can be temporarily disabled if there is no one in their FOV, for instance) and how the fusion algorithm will proceed. For reasons of efficiency, processing speed and reduction of IR interference, it may be necessary not to let all Kinects function simultaneously at all times. This is why we have to define the overlapping areas between the Kinects. Our objective in analyzing the overlapping areas of the "SAS" is to provide data to determine the coverage of the Kinects, and therefore the effectiveness of the dispatching we proposed. What follows are multiple diagrams of the overlapping fields for every combination of 2 Kinects. When merged, these diagrams define the areas that are covered by multiple Kinects.

3.4.2.5.1 Overlapping between the top-left and bottom-left Kinects

Figure 3.24: Overlapping area between upper-left and lower-left Kinect


3.4.2.5.2 Overlapping between both top Kinects

Figure 3.25: Overlapping area between both top Kinects

3.4.2.5.3 Overlapping between top-left and bottom-right Kinects

Figure 3.26: Overlapping area between upper-left and lower-right Kinects


3.4.2.5.4 Overlapping between bottom-left and top-right Kinects

Figure 3.27: Overlapping area between lower-left and upper-right Kinects

3.4.2.5.5 Overlapping between both bottom Kinects

Figure 3.28: Overlapping area between both bottom Kinects


3.4.2.5.6 Overlapping between bottom-right and top-right Kinects

Figure 3.29: Overlapping area between lower-right and upper-right Kinects

3.4.2.6 Segmentation of the “SAS” depending on the multiple Kinects’ FOV

Once we have the overlapping areas and the various dead zones that our Kinect configuration creates, we can segment the "SAS" into multiple areas with variable degrees of tracking reliability.


Figure 3.30: Overlapping areas according to their level of reliability. There are three types of zones: red, orange and green areas.

-Tracking example

Figure 3.31: Example of a successful attempt at tracking a user in the “SAS”


3.4.3 Fusion rules

In order to determine a fusion rule set, we decided to separate the "Sol" (floor) screen into 4 parts, as shown in Figure 3.32. The areas and the Kinects are given ID numbers to facilitate the creation of the fusion rules later on in this report.

Figure 3.32: The four areas inside the "SAS" that will be further analyzed and segmented into specific fusion rule sets.

In each of these four areas, we will propose an outline of the various sub-areas, with a specific combination of Kinects for each sub-area. The objective is to create a basis for the code that will be implemented in the fusion algorithm and that chooses which Kinects to activate or deactivate depending on the user's location.

In the diagrams below, each corner of the “SAS” will be segmented into three sub-areas with three different levels of reliability and tracking effectiveness:


 High: the part of the corner which is close to the middle of the "SAS", where more than two Kinects overlap
 Medium: the middle part of the corner, with two Kinects overlapping
 Low: an area where only a single Kinect will efficiently detect and track a person

Depending on the overlapping areas, there are different rules regarding the activated Kinects. For each sub-area, we will propose one or more Kinects to track the users. Obviously, the dead zones cannot be covered by any Kinect.
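A minimal sketch of such a rule table is given below; the corner and reliability labels follow the segmentation above, but the actual Kinect ID assignments of Figures 3.33 to 3.36 are replaced by placeholder values.

```python
# Which Kinect IDs to keep active for a given (corner, reliability) sub-area.
FUSION_RULES = {
    ("top_right", "high"):   {1, 2, 3},  # near the centre: more than two Kinects overlap
    ("top_right", "medium"): {1, 2},     # two overlapping Kinects
    ("top_right", "low"):    {2},        # a single Kinect covers this sub-area
    # ... one entry per sub-area of each of the four corners
}

def active_kinects(corner, reliability):
    # Dead zones are covered by no Kinect at all, hence the empty default.
    return FUSION_RULES.get((corner, reliability), set())
```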

3.4.3.1 Fusion rules for the upper-right corner of the “SAS”

Figure 3.33: Fusion rules in the top-right corner


3.4.3.2 Fusion rules for the upper-left corner of the “SAS”

Figure 3.34: Fusion rules in the top-left corner

3.4.3.3 Fusion rules for the lower-left corner of the "SAS"

Figure 3.35: Fusion rules in the bottom-left corner


3.4.3.4 Fusion rules for the lower-right corner of the “SAS”

Figure 3.36: Fusion rules for the bottom-right corner

3.5 Kinect Calibration

Camera calibration is a documented element of computer vision science and has been subject to various analyses. The process of calibrating a camera returns a set of values that determine the camera's characteristics. There are two types of parameters: intrinsic and extrinsic. The intrinsic parameters are various properties of the camera's lens; knowing them enables more accurate measurements. Extrinsic parameters, on the other hand, are related to the position of objects in a world reference frame. This second type of parameter is important in our VR facility, as we need to calibrate multiple cameras in relation to one another, and this is possible thanks to extrinsic calibration.

3.5.1 Calibration process

The calibration process acquires data from pictures of a checkerboard taken by the camera and computes the parameters. 20 to 30 snapshots of the checkerboard are taken as references for the calibration, as in Figure 3.37. The checkerboard's position and angle can vary between the pictures, but the checkerboard must be visible in its entirety in every snapshot.

To increase the accuracy of the measurements, the checkerboard should be close to the camera and there should be as little lighting reflection as possible on it. Furthermore, the pictures must not be blurry.

Figure 3.37: Example of screenshot

For our experiment, we used ColorBasics-D2D, an application available with the official Kinect SDK that displays the Kinect RGB camera's video feed and can record .bmp screenshots. We slightly tweaked the default parameters to output a resolution of 1280x960 pixels. For the calibration process we use the Camera Calibration Toolbox available in Matlab, as in Figure 3.38.

Figure 3.38: Camera Calibration Toolbox


3.5.2 Intrinsic calibration

The next step involves processing the pictures and extracting the checkerboard‟s corners. For each picture, we need to click on the corners of the checkerboard as seen in figure 3.39. a. The next step involves processing the pictures and extracting the checkerboard‟s corners as seen in figure 3.39.b. After confirming the crosses‟ locations, the software extracts the corners from the picture as seen in figure 3.39.c. The coordinates of the corners will be used for the calculation of the intrinsic parameters.

Once this process has been repeated for each picture, the Calibration function takes into account all the extracted grid corners to compute the camera parameters and distortion factors.

The calibration's output data is as follows:

- Focal length.
- X and Y coordinates of the principal point.
- Skew factor. In our case, the skew factor is always 0 as the pixels are rectangular.
- 5 distortion parameters.
- Pixel error.

In the following matrix, called the camera's intrinsic matrix, the intrinsic parameters $\alpha_x$ and $\alpha_y$ represent the focal length multiplied by the scale factor for the x and y axes, respectively. Moreover, $u_0$ and $v_0$ are the coordinates of the principal point and $s$ is the skew factor:

$$ K = \begin{bmatrix} \alpha_x & s & u_0 \\ 0 & \alpha_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} $$
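For the reader who prefers a programmatic illustration, intrinsic parameters of this form can also be estimated from checkerboard snapshots with OpenCV. The sketch below is only an assumed equivalent of the Matlab toolbox procedure described above; the checkerboard dimensions, square size and file names are hypothetical.

# Illustrative only: the thesis performs this step with the Matlab Camera
# Calibration Toolbox; this sketch shows an equivalent computation in OpenCV.
import glob
import cv2
import numpy as np

pattern = (9, 6)     # inner corners of the checkerboard (assumed)
square = 0.025       # square size in metres (assumed)

# 3D coordinates of the corners in the checkerboard's own frame (Z = 0).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("kinect_rgb_*.bmp"):   # 20 to 30 snapshots, 1280x960
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the intrinsic matrix K (focal lengths, principal point, zero skew)
# and the 5 distortion coefficients, plus per-view extrinsics.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, (1280, 960), None, None)
print("reprojection error:", rms)
print("intrinsic matrix K:\n", K)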


Figure 3.39: a) Clicking on the extreme corners, b) corners' locations, c) extracted corners

3.5.3 Extrinsic calibration

The intrinsic camera parameters represent characteristics of the camera's lens, while the extrinsic parameters determine the location of a point in the reference picture with respect to the camera's reference frame. This process only requires a single screenshot per camera; however, both types of calibration require clicking on the four corners of the grid to extract points. After the corners are extracted, the toolbox provides the extrinsic parameters: a translation vector, a rotation vector and a rotation matrix. The translation vector T and the rotation matrix R are part of the following matrix, which is used in computer vision to transform the coordinates of a specific point from one reference frame to another.

$$ M = \begin{bmatrix} R & T \\ 0_{1 \times 3} & 1 \end{bmatrix} $$


In our situation, this matrix is needed to calibrate two Kinects simultaneously. We calibrate each Kinect with respect to the absolute checkerboard position inside the SAS, as in figure 3.40.

Figure 3.40: Simultaneous extrinsic calibration of two Kinects; the checkerboard is captured by both Kinects at the same time
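To illustrate how this matrix is used, the following sketch builds the 4x4 homogeneous transform from R and T and maps a point from one Kinect's reference frame to another through the shared checkerboard frame; the numerical values are placeholders, not measured extrinsics from the SAS.

# Sketch (not the thesis implementation): using the extrinsic calibration
# output, a point expressed in the checkerboard/world frame can be mapped
# into each Kinect's frame, and therefore from one Kinect to another.
import numpy as np

def homogeneous_transform(R, T):
    """Build the 4x4 matrix [[R, T], [0, 1]] from a 3x3 rotation and a translation."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.asarray(T).ravel()
    return M

# Extrinsics of two Kinects with respect to the same checkerboard (placeholder values).
M_world_to_k1 = homogeneous_transform(np.eye(3), [0.0, 0.0, 2.0])
M_world_to_k2 = homogeneous_transform(np.eye(3), [1.5, 0.0, 2.0])

# A joint measured by Kinect 1, re-expressed in Kinect 2's frame.
p_k1 = np.array([0.2, 0.5, 1.8, 1.0])            # homogeneous coordinates
p_world = np.linalg.inv(M_world_to_k1) @ p_k1     # Kinect 1 -> world
p_k2 = M_world_to_k2 @ p_world                    # world -> Kinect 2
print(p_k2[:3])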

3.6 Summary

This chapter focused on multi-Kinect applications, the advantages of multi-Kinect systems, their issues, and the installation of multiple Kinects inside the SAS. In section 3.1 we discussed multi-Kinect applications such as real-time dense volumetric data acquisition, scanning 3D full human bodies with Kinects and capturing turbulent gas flows with multiple Kinects. In section 3.2 we presented the advantages of a multi-Kinect set-up, such as large coverage, low cost and easy installation. In section 3.3 we described multi-Kinect issues such as USB bandwidth, interference between Kinects, Kinect calibration, data synchronization and sensor set-up. In section 3.4 we described the resulting design choices: material configuration, geometric dispatching and fusion rules.


4 Chapter four: Validation of the system

4.1 Test protocol

The testing protocol is as follows. A user randomly chosen from the 23 participants is asked to move as he or she wishes inside the SAS for 15 minutes without performing any of the three gestures: raising either hand or short hopping. We then analyze the data provided by our gesture recognition system and count the number of false detections.

The aforementioned experiment was conducted twice: once with a single Kinect, and once with the four-Kinect set-up.

4.1.1 Posture data collection and posture detection

The environment in which the data is collected is a CAVE named “Le SAS” with two screens, each 3 meters high and 4 meters wide. Data about the user's location and joints is measured by four Kinects in a multi-sensor system that was notably discussed in a previous paper (Salous et al., 2015).

The objective of this system is to cover the maximum amount of space in the “SAS” and track the user's gestures regardless of his or her position inside the CAVE.

As the user is free to move around the “SAS”, the multi-sensor system tracks the user's coordinates for each frame, and an algorithm determines in real-time which of the Kinects provide the most accurate joint data.

4.1.2 Posture detection algorithm

The actual algorithm used by our module to detect gestures can be observed in Figure 4.1. Each Kinect sends joint data to the gesture recognition module in real-time. As not all Kinects may be very close to the user, the module applies a filtering mechanism discussed in a previous paper, where only the most relevant Kinects for the desired joints are taken into account.

In our case, the specific joints required for the detection of the three gestures we wish to test (cf. Section 4.6) are both elbows, both hands and both knees.
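As an illustration of this filtering step, the sketch below scores each Kinect's frame by how many of the joints of interest it currently tracks and keeps only the best-ranked sensors; the scoring rule and data layout are assumptions, not the exact criterion of our module.

# Minimal sketch of the per-frame filtering step: keep only the Kinects whose
# data for the joints of interest (elbows, hands, knees) is judged reliable.
# The scoring rule (tracked joints penalised by distance) is illustrative.
JOINTS_OF_INTEREST = ["elbow_left", "elbow_right", "hand_left",
                      "hand_right", "knee_left", "knee_right"]

def kinect_score(frame):
    """Score one Kinect's frame: fraction of tracked joints, penalised by distance."""
    tracked = sum(1 for j in JOINTS_OF_INTEREST
                  if frame["joints"].get(j, {}).get("tracked"))
    return tracked / len(JOINTS_OF_INTEREST) - 0.05 * frame["distance_to_user"]

def select_kinects(frames_by_kinect, keep=2):
    """Return the ids of the `keep` most relevant Kinects for this frame."""
    ranked = sorted(frames_by_kinect,
                    key=lambda k: kinect_score(frames_by_kinect[k]),
                    reverse=True)
    return ranked[:keep]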

Once the user joint data is properly recorded by the CAVE's multi-sensor system, the potential gestures reflected by those coordinates have to be detected as such before being translated into inputs to interact with the VR environment. The second part of the algorithm is tasked with analyzing the filtered joint data and detecting gestures.

In order to recognize a raised hand, the algorithm compares the X and Y coordinates of the hand and elbow joints. If the hand‟s two coordinates are higher than the elbow‟s, the module returns a gesture notification.

The method used to detect the short hopping movement is to compare the differences in the X and Y coordinates of both knees between a previous frame and the current frame. If both knees‟ joint data provide a difference lower than a chosen threshold, the system signals a short hop.

It is also possible to develop other specific posture detection calculations related to new gestures we may intend to implement in our CAVE.
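The two checks can be summarized by the following sketch. The raised-hand test follows the description above directly; the short-hop test is one plausible reading of the rule (both knees must move together by more than a minimal upward displacement), and the thresholds are illustrative values rather than those used in the SAS.

# Hedged sketch of the two detection rules described above.
HOP_SYNC_THRESHOLD = 0.03   # max difference between the knees' displacements (m), assumed
HOP_MIN_RISE = 0.05         # minimal upward displacement to call it a hop (m), assumed

def raised_hand(hand, elbow):
    """True when both the X and Y coordinates of the hand exceed the elbow's."""
    return hand[0] > elbow[0] and hand[1] > elbow[1]

def short_hop(prev_knees, cur_knees):
    """True when both knees rise together between the previous and current frame.

    Only the vertical (Y) component is used in this sketch for brevity.
    """
    dy_left = cur_knees["left"][1] - prev_knees["left"][1]
    dy_right = cur_knees["right"][1] - prev_knees["right"][1]
    moved_together = abs(dy_left - dy_right) < HOP_SYNC_THRESHOLD
    rose = dy_left > HOP_MIN_RISE and dy_right > HOP_MIN_RISE
    return moved_together and rose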


Figure 4.1: Posture detection algorithm

4.2 Hypothesis

Our starting hypothesis is that four Kinects working in tandem are more accurate and cover more potential user locations than a single sensor, and are therefore better suited to capturing gestures in a CAVE.

In order to obtain proper feedback on the effectiveness of our methodology and to provide conclusive data on our hypothesis, we set up experiments whose objective was to test whether the four-Kinect system properly tracked the users' gestures and whether it was more reliable than a single Kinect.

4.3 Constraints of the CAVE

While gesture recognition on its own requires complying with a few constraints, the environment in which we wish to deploy the gestural recognition abilities of our sensors provides new challenges.

Indeed, the gestures are used to interact with the VR environment; therefore, the user should perform movements that do not interfere with his or her experience within the CAVE. For instance, asking the user to step backwards in order to make his or her virtual self move backwards is potentially dangerous, as the user may fall off the ground screen if he or she is not cautious. A similar issue can happen when moving forward, as the user may collide with the front screen.

Those constraints have to be taken into account when designing what gestures will be required for our VR applications. However, it is still possible to test gestural recognition with gestures that may not be implemented in future works.

4.4 False detection

As our approach aims at providing an accurate motion sensing system, another variable has to be taken into account before interpreting the experiment results. Indeed, the phenomenon known as “False Detection” can occur and add a layer of inaccuracy to the data captured from the user's movements.


False detection happens whenever one of the sensors registers a motion that was not actually performed by the user, thus reducing overall accuracy. In our case, we measured the false detection issue for our three tested gestures within the SAS.

Disparities between reality and the image of the skeleton joints captured by the Kinects can be responsible for false detections. Indeed, as the motion detection uses the skeleton images, an inaccurate recording of the depth values can generate an incorrect skeletal image with motion deviating from what happened within the SAS.

4.5 Material and method

As seen in subsection 3.4.1.2, the second configuration type is the most suited to our project; that is to say, we use a centralized server and multiple mini PCs, each connected to a single Kinect. We therefore explain in this section the requirements for this configuration.

4.5.1 Hardware

Table 4: hardware used in our project

Device | Number | Required configuration | Price per unit
Microsoft Kinect (Xbox) | 4 | Windows 7, Windows 8, Windows Embedded Standard 7, or Windows Embedded POSReady 7; 32-bit (x86) or 64-bit (x64) processor; dual-core 2.66 GHz or faster processor; dedicated USB 2.0 bus; 2 GB RAM | 70 euros
Mini PC | 1 (config 2) | Ethernet | 200 euros
Server | 1 | Ethernet | 600 euros
Ethernet switch | 1 | Needs at least one port for the server and one for each Kinect | 10 euros

4.5.2 Wiring

Table 5: wiring used in our project

Material | Meters | Requirements | Price per meter
Ethernet cables | variable | The cables have to be able to connect the 4 Kinects to a switch and the server to this switch | < 1 €
USB extenders | variable | Depending on the location of the Kinects and their allocated mini PCs, USB extenders may be needed | 2 euros

4.5.3 Software

Table 6: software used in our project

Material | Software required
Mini PC | Windows 7 OS, Kinect SDK
Server | Windows 7 OS, Kinect SDK
Network: switch | We need synchronization between the data issued by the 4 embedded PCs and acquired by the central server, i.e. a kind of timestamping process
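The timestamping idea mentioned in the table can be illustrated as follows: each mini PC tags its skeleton frame with a capture time before sending it over UDP, so that the server can align the four streams. The address, port and message layout below are assumptions made for the sketch, not the actual protocol of the SAS.

# Sketch of a timestamped UDP frame sent from a mini PC to the central server.
import json
import socket
import time

SERVER = ("192.168.1.10", 5005)   # central server address and port (assumed)

def send_frame(kinect_id, joints, sock):
    """Send one timestamped skeleton frame from a mini PC to the server."""
    message = {
        "kinect": kinect_id,
        "timestamp": time.time(),   # used by the server to synchronise the streams
        "joints": joints,
    }
    sock.sendto(json.dumps(message).encode("utf-8"), SERVER)

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_frame(1, {"hand_right": [0.3, 1.2, 2.0]}, sock)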

4.6 Posture detection inside the SAS

Three gestures were tested in this experiment: raising the right hand, raising the left hand, and doing a short hop, as seen in Figure 4.2. We asked each of the participants to perform each of these gestures four times at random locations inside the “SAS” while a single Kinect was tracking them. This procedure was then repeated with the four-Kinect system.

Figure 4.2: Participants in the middle of the experiment performing the three gestures a) Raising the left hand, b) Raising the right hand, c) Short hopping

4.7 Participants

The experiment was conducted as follows: 23 participants were invited to the “SAS” and were not informed beforehand about the specific objective of the experiment. This was done to avoid creating bias in the users‟ behavior during the tests. All of the participants are adults. 12 of them are male and 11 are female.


5 Chapter five: Results

5.1 Posture detection inside the SAS

Figure 5.1 shows the results for the three gestures (raising the right hand, raising the left hand and short hopping). As each gesture was repeated four times by each of the 23 participants, the data pool from which the experiment results were extracted contains 92 recordings for each combination of gesture and Kinect set-up. The results of our experiment are as follows. While the right hand raising gesture was successfully recognized by the four-Kinect system 93% of the time, only 77% of the gestures were detected by the single Kinect. The success rate for the left hand raising gesture is similar to the right hand one, with the four-Kinect system accurately capturing 91% of the gestures and the single Kinect only 75% of them. Short hopping proved to be a slightly more difficult gesture to detect according to the results. Indeed, the four-Kinect system's success rate is 87% and the single Kinect's is 68%.


Figure 5.1: Level of accuracy of the gesture recognition module for each of the three tested gestures and both Kinect configurations

5.1.1 False detections

As explained in section 4.4, our experiment also measured the risk of a false detection occurring while using the SAS.

The results are as follows. The single Kinect set-up returned 17 false detections during the 15-minute-long test. The Kinect detected a short hop 9 times, recognized a raised right hand 5 times and returned a raised left hand 3 times, as in Figure 5.2.

Figure 5.2: False detection results, single vs. 4-Kinect set-up

However, data fusion proved to significantly decrease the number of false detections in a multi-Kinect set-up. Once the 4 Kinects' joint data was fused into a single skeleton, only 6 false detections were returned by the system: 3 short hops, 2 raised right hands and 1 raised left hand were detected, thus providing a lower number of false detections than the single Kinect.


6 Chapter six: Conclusion and future work

In this chapter we conclude this dissertation by summarizing our contributions and discussing directions for future work.

Our objective in this thesis was to provide an effective and low-cost method of interaction between a user and a specific type of VR system called a CAVE. The current system used by the “SAS” requires AR tracking as well as a joystick, and the multi-Kinect system we aimed at implementing provides a more accessible alternative.

Throughout the process leading to that objective, several contributions have been made:

6.1 Objectives and contributions

6.1.1 Adapting the multi-Kinect system to the CAVE

The first challenge was related to the implementation of the sensors within the “SAS”. Being a specific type of VR environment, a CAVE features several constraints and elements that had to be analyzed before deciding on the sensor placement: size, user behavior and field of view (FOV). This list of constraints marks a significant difference compared to projects performed in a smaller and less demanding environment.

6.1.2 Addressing the overlap of the Kinects' FOVs in the “SAS”

While the CAVE created additional difficulties in deciding Kinect placement, the sensors' own abilities required another study. Indeed, the Kinects' IR rays can interfere with one another. As a consequence, we had to analyze the FOVs and adjust the Kinects' locations so as to decrease the amount of interference between the sensors.


6.1.3 The tracking system

As the SAS was equipped with four Kinects during our experiments, the issue of accurate data collection became relevant. Indeed, the user is given the freedom to move within the SAS and as a result, the Kinects providing the most complete and accurate user joint data change in real-time depending on the user‟s actions and movements.

A Kinect selection algorithm had to be implemented in the SAS' tracking system to determine the closest and most accurate Kinects in real-time. For each frame, accuracy values were given to each Kinect depending on the quality of the provided joint data. This allowed the system to filter out unnecessary data output by the sensors furthest from the user.

6.1.4 The gesture recognition algorithm for the “SAS”

Gestural recognition in the SAS is a two-part process. A survey of the potential gestures was conducted before the detection algorithm implementation in order to determine gestures that comply with the constraints of the VR environment. Using the information obtained by the previous SAS analysis, three test gestures were decided for this experiment.

The second step was to develop the gestural recognition itself. Despite the Kinect‟s ability to detect motions, a specific and custom algorithm was necessary to capture the gestures the CAVE was set out to test.

6.1.5 Data fusion with multi-Kinect data

Compared to smaller-scale virtual reality installations, our system required several Kinects in order to track the user across the entire SAS. In this context, performing calculations and analyses on the skeleton data is not possible without proper data fusion. To perform this task, a specific algorithm was developed and applied to the Kinect skeleton data.
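For illustration, one common way of fusing the per-Kinect estimates of a joint into a single point is a confidence-weighted average, as sketched below; this is a generic example and not necessarily the exact fusion rule implemented in the SAS.

# Illustrative sketch only: fusing one joint from several Kinects with a
# confidence-weighted average. The weights are placeholders.
import numpy as np

def fuse_joint(estimates):
    """Fuse [(position, weight), ...] estimates of one joint into a single point.

    `position` is a 3-vector in the common SAS frame (after extrinsic
    calibration); `weight` reflects how much that Kinect is trusted this frame.
    """
    positions = np.array([p for p, _ in estimates], dtype=float)
    weights = np.array([w for _, w in estimates], dtype=float)
    return (weights[:, None] * positions).sum(axis=0) / weights.sum()

# Example: the right hand seen by three active Kinects.
fused = fuse_joint([([0.30, 1.20, 2.00], 0.9),
                    ([0.32, 1.18, 2.05], 0.6),
                    ([0.28, 1.25, 1.95], 0.4)])
print(fused)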

6.1.6 Data transmission using the UDP network protocol

One of the multi-sensor system‟s most prominent features is its use of a network of sensors instead of a single camera. This characteristic implies efficient and lossless data transmission between the various units that are part of the system. In order to achieve this objective and minimize data loss, the UDP network protocol was used to send data from the sensors to the computing system.

6.2 Summary

In summary, we provided a multi-Kinect system for a CAVE that fuses joint data to re-create a skeletal representation of a user and detect his or her gestures in order to translate them into inputs for the VR program. The development of this system required algorithms to detect the most efficient Kinects in real-time and to recognize the gestures.

6.3 Future Work

While the results provided by our analysis and our experiment in the SAS are conclusive, further improvements can be implemented and the scope of the multi-Kinect system can be expanded.

The first area of improvement is the system‟s tracking accuracy. The current method used to track a user in the SAS involves data fusion using joint data from the 4 Kinects. Refining and improving the fusion algorithm can provide more efficient joint mapping by the multi-Kinect system.

Other selection methods can be implemented to increase the accuracy, such as collecting the head joint status from all Kinects and using an averaged value to track the user. A second alternative to the current algorithm is resorting to the depth images returned by the Kinect cameras and using them to extract the head joint's position. If such methods are experimented with, additional tests will be required to determine the exact increase or decrease in accuracy these techniques provide. In complement to the accuracy increase for the head joint, the other skeleton joints' accuracy can be increased thanks to the method outlined earlier. Analyzing the Kinects' depth images in real time to measure the skeleton joints' positions may provide a significant increase in accuracy, though the efficiency and computing time of this technique need to be measured and tested. The importance of those tests can be explained by the larger number of skeletal joints that will be affected by the additional computations. Indeed, a slight difference in efficiency can cause a drastic delay when applied to all skeletal joints.

Enhancing the multi-Kinect system's accuracy can also occur through a different data fusion process that is directly based on the collected depth data.

The topic of Kinect calibration was discussed in this thesis and is also subject to various improvements. While the current method provides a respectable level of accuracy, the checkerboard set-up is time-consuming. Consequently, increasing the efficiency of the procedure lies in finding a faster alternative. Examples include a movable Kinect, or the use of a single calibration image for the extrinsic calibration.

Our analysis explained the reasons why a 4-Kinect set-up was considered and subsequently installed in the SAS. Nonetheless, the optimal number of sensors in a multi-Kinect set-up may be dissimilar in a CAVE with different dimensions. A solution to this issue is a mathematical model that can estimate the number of Kinects suited to a VR installation by computing several variables such as the CAVE's size.


Gestural recognition in itself can also be expanded upon in future work. The potential updates to the gesture recognition system are two-fold, as both the number and the complexity of gestures are subject to incremental improvements:

In the experiment described in our analysis, three gestures were recorded and analyzed. Those gestures were chosen in relation to the interactions currently planned within the SAS' VR experiences, and as such were sufficient within the current scope of the SAS' use. Nevertheless, as the SAS can be used for various types of VR programs, different types of gestures may be deemed necessary to convey intuitive interaction with new types of VR worlds. While the gestures described in our thesis cover general purposes, new gestures can also be added in order to accommodate more specific needs. As a result, the multi-Kinect module can be improved by adding more recognizable gestures.

Furthermore, the system can be improved by recognizing more complex gestures. The complexity of a gesture can be defined by the number of joints involved as well as the degree of accuracy required to perform the gesture. A complex gesture can also involve a set of several simpler gestures performed in a specific order. Depending on the actions available to the user in the context of a VR simulation, a complex gesture can be more suited to a specific type of interaction than a generic gesture. However, the added complexity is also synonymous with an increased need to properly calibrate the gestural recognition. Indeed, while the error margin for a simple gesture is high, the higher number of involved joints and the specificity of these new gestures may substantially decrease the allowed error margin. As a more complex gesture might be more difficult for the multi-sensor system to recognize, the user experience may lose the intuitive and natural type of interactivity initially provided by the SAS. This has to be taken into account in future work.

Moreover, this process of implementing new motions inside the system can be enacted through machine learning. The main advantage of this solution is the opportunity to gradually fine-tune the SAS' motion recognition ability over the course of several sessions.

The last layer of complexity that can be added to the tracking system is the possibility to detect and track several users. Several factors and constraints need to be taken into account when developing a multi-user tracking module, such as the interferences between users and the impact on tracking accuracy and motion recognition. Furthermore, preventing users from hurting one another while performing the required gestures is another problem that further research should aim to address.


7 References

1. Alberto, K., Mora, F., Odobez, J., & Lausanne, D. (2012). Gaze Estimation from Multimodal Kinect Data.
2. Alexiadis, D., Kelly, P., Connor, N. E. O., & Moussa, M. Ben. (2011). Evaluating a Dancer's Performance using Kinect-based Skeleton Tracking. Signal Processing, 659–662.
3. Allison, D., & Hodges, L. F. (2000). Virtual Reality for Education? System, 160–165.
4. Basori, A. H. (2008). The Feasibility of Human Haptic Emotion as a Feature to Enhance Interactivity and Immersiveness on . International Journal, 1(212), 5–6.
5. Berger, K., Ruhl, K., Schroeder, Y., Bruemmer, C., Scholz, A., & Magnor, M. (2011). Markerless Motion Capture using multiple Color-Depth Sensors.
6. Bonfigli, M. E., Guidazzoli, A., Imboden, S., & Mauri, M. A. (2002). 3D Modelling, Virtual Reality and the Creation of Flexible Didactical Tools. Virtual Reality, 58113.
7. Butler, A., Izadi, S., Hilliges, O., Molyneaux, D., Hodges, S., & Kim, D. (2012). Shake 'n' Sense: Reducing Interference for Overlapping Structured Light Depth Cameras.
8. Caon, M., & Yue, Y. (2011). Context-Aware 3D Gesture Interaction Based on Multiple Kinects, (c), 7–12.
9. Chang, Y.-J., Chen, S.-F., & Huang, J.-D. (2011). A Kinect-based system for physical rehabilitation: a pilot study for young adults with motor disabilities. Research in Developmental Disabilities, 32(6), 2566–70. doi:10.1016/j.ridd.2011.07.002
10. Correia, N. N. (n.d.). PESI: Extending Mobile Music Instruments with Social Interaction.
11. Cruz, L., Lucio, D., & Velhoz, L. (n.d.). Kinect and RGBD Images: Challenges and Applications.
12. Acevedo Cruz, D. L. (2012). Calibration of a multi-kinect system.
13. Du, H., Henry, P., Ren, X., Cheng, M., Goldman, D. B., Seitz, S. M., & Fox, D. (2011). Interactive 3D modeling of indoor environments with a consumer depth camera. Proceedings of the 13th International Conference on - UbiComp '11, 75. doi:10.1145/2030112.2030123
14. Faion, F., Friedberger, S., Zea, A., & Hanebeck, U. D. (2012). Intelligent sensor-scheduling for multi-kinect-tracking. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 3993–3999. doi:10.1109/IROS.2012.6386007
15. Gaitatzes, A., & Roussou, M. (2002). Reviving the past: Cultural Heritage meets Virtual Reality. Heritage, 103–110.
16. Gatto, I., & Pittarello, F. (2012). Prototyping a gestural interface for selecting and buying goods in a public environment. Proceedings of the International Working Conference on Advanced Visual Interfaces - AVI '12, 784. doi:10.1145/2254556.2254713
17. Gomez, J. D., Mohammed, S., Bologna, G., & Pun, T. (2011). Toward 3D Scene Understanding via Audio-description: Kinect-iPad Fusion for the Visually Impaired. Computer, 293–294.
18. Hall, T., Ciolfi, L., Bannon, L., Fraser, M., Benford, S., Bowers, J., … Flintham, M. (2002). The Visitor as Virtual Archaeologist: Explorations in Technology to Enhance Educational and Social Interaction in the Museum. Methods, 91–97.
19. Interaction, G. B., & Input, H. (2011). Gesture Based. Gesture Based Interaction, 1–35. doi:10.1109/5326.923274
20. Jego, J. F., Paljic, A., & Fuchs, P. (2013). User-defined gestural interaction: A study on gesture memorization. IEEE Symposium on 3D User Interfaces 2013, 3DUI 2013 - Proceedings, 7–10. doi:10.1109/3DUI.2013.6550189
21. Kadavasal, M. S., & Dhara, K. K. (2007). Mixed Reality for Enhancing Business Communications using Virtual Worlds. Virtual Reality, 1(212), 233–234.
22. Kainz, B., Hauswiesner, S., Reitmayr, G., Steinberger, M., Grasset, R., Gruber, L., … Schmalstieg, D. (2012). OmniKinect: Real-Time Dense Volumetric Data Acquisition and Applications, 25–32.
23. Kessler, G. D., Hodges, L. F., & Walker, N. (1995). Evaluation of the CyberGlove as a whole-hand input device. ACM Transactions on Computer-Human Interaction, 2(4), 263–283. doi:10.1145/212430.212431
24. King, G. R., Piekarski, W., & Thomas, B. H. (2005). ARVino - Outdoor Augmented Reality Visualisation of Viticulture GIS Data. School of Computer and Information Science, University of South Australia. Symposium A Quarterly Journal In Modern Foreign Literatures.
25. Krogh, P. G. (2000). “Interactive Rooms - augmented reality in an architectural perspective”. Architecture, 135–137.
26. Lin, C.-R., & Loftin, R. B. (1998). Application of virtual reality in the interpretation of geoscience data. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998 - VRST '98, 187–194. doi:10.1145/293701.293736
27. Maimone, A., Bidwell, J., Peng, K., & Fuchs, H. (2012). Enhanced personal autostereoscopic telepresence system using commodity depth cameras. Computers & Graphics, 36(7), 791–807. doi:10.1016/j.cag.2012.04.011
28. Maimone, A., & Fuchs, H. (2012). Reducing interference between multiple structured light depth sensors using motion. 2012 IEEE Virtual Reality (VR), (May), 51–54. doi:10.1109/VR.2012.6180879
29. Manseur, R. (2005). Virtual Reality in Science and Engineering Education. October, 8–13.
30. Merlier, B., Lyon, U. L., Cedex, L., & Weiss, F. (2008). A 3D Gestural Controller System Based on EyeCon Motion Capture Software: A Review of 10 Years of Experience, 2003.
31. Garzotto, P. D. I. (2012). Master in Computing System Engineering. Gesture Based Interaction: A survey, 1–48.
32. Olanda, R., Valencia, U. De, Pérez, M., Morillo, P., Fernández, M., & Casas, S. (2006). Entertainment Virtual Reality System for Simulation of Spaceflights Over the Surface of the Planet Mars. System, 123–132.
33. Popescu, C. R., & Lungu, A. (2014). Real-Time 3D Reconstruction Using a Kinect Sensor, 2(2), 95–99. doi:10.13189/csit.2014.020206
34. Ren, Z., Meng, J., & Yuan, J. (2011). Robust Hand Gesture Recognition with Kinect Sensor. Communications, 1–2.
35. Ridene, T., Leroy, L., & Chendeb, S. (2013). Virtual Reality Server of Interaction eXtensible, 2–6.
36. Ruhl, K. B. K., Schr, M. A. Y., Kokem, A. S. J., Magnor, S. G. M., & Braunschweig, T. U. (2011). The capturing of turbulent gas flows using multiple Kinects, 1108–1113.
37. Salous, S., Newton, J., Leroy, L., & Chendeb, S. (2015). Dynamic Sensor Selection Based On Joint Data Quality in the Context of a multi-Kinect Module inside the CAVE “Le SAS”. International Conference on Virtual Reality, Los Angeles, USA.
38. Salous, S., Newton, J., Leroy, L., & Chendeb, S. (2015). Gestural Recognition by a Four-Kinect Module in a CAVE “Le SAS”. 12th Romanian Human-Computer Interaction Conference, Bucharest, September 24-25.
39. Salous, S., Ridene, T., Newton, J., & Chendeb, S. (2014). Study of geometric dispatching of a four-Kinect tracking module inside a CAVE. International Conference on Disability, Virtual Reality and Associated Technologies, Gothenburg, Sweden, September 2-4.
40. Satta, R., Pala, F., Fumera, G., & Roli, F. (n.d.). over multiple Kinect TM cameras, 1–4.
41. Scholz, A., Berger, K., & Ruhl, K. (2011). Multiple Kinect Studies. Technical Report 2011-09-15.
42. Seminar in “Advanced Topics in Computer Vision” (048921 – Winter 2013). Presented by: Roy Or-El. (2013).
43. Smisek, J. (2011). 3D with Kinect, 1154–1160. doi:10.1109/ICCVW.2011.6130380
44. Sumar, L. (2011). Feasability of Fast Image Processing Using Multiple Kinect Cameras on a Portable Platform.
45. Tang, Y., Lam, B., Stavness, I., & Fels, S. (2011). Kinect-based augmented reality projection with perspective correction. ACM SIGGRAPH 2011 Posters on - SIGGRAPH '11, 1. doi:10.1145/2037715.2037804
46. Tanriverdi, V., & Jacob, R. J. K. (2001). VRID: A Design Model and Methodology for Developing Virtual Reality Interfaces. Electrical Engineering, 175–182.
47. Tong, J., Zhou, J., Liu, L., Pan, Z., & Yan, H. (2012). Scanning 3D full human bodies using Kinects. IEEE Transactions on Visualization and Computer Graphics, 18(4), 643–50. doi:10.1109/TVCG.2012.56
48. Villaroman, N., Rowe, D., Ph, D., & Swan, B. (2011). Teaching Natural User Interaction Using OpenNI and the Microsoft Kinect Sensor. Human Factors, 227–231.
49. Weise, T., Bouaziz, S., Li, H., & Pauly, M. (2011). Kinect-based facial animation. SIGGRAPH Asia 2011 Emerging Technologies on - SA '11, 1–1. doi:10.1145/2073370.2073371
50. Wen, J., & Spielman, B. (2010). Electromagnetic Tracking for Medical Imaging. Washington University in, (January).
51. Williamson, B. M., & Jr, J. J. L. (2012). Multi-Kinect Tracking for Dismounted Soldier Training, (12378), 1–9.
52. Zafrulla, Z., Brashear, H., & Hamilton, H. (n.d.). American Sign Language Recognition with the Kinect. Work, 279–286.
53. Zhang, Z. (2006). Camera Calibration with Lens Distortion from Low-rank Textures.
54. Zhang, Z., & Way, O. M. (1999). Flexible Camera Calibration By Viewing a Plane From Unknown Orientations, 00(c), 0–7.
