Vision based Natural Assistive

Technologies with gesture recognition

using Kinect

Jae Pyo Son School of Computer Science and Engineering University of New South Wales

A thesis in fulfilment of the requirements for the degree of Master of Science (Computer Science Engineering)

2

3

Abstract

Assistive technologies for disabled persons are widely studied and the demand for them is rapidly growing. Assistive technologies enhance independence by enabling disabled persons to perform tasks that they are otherwise unable to accomplish, or have great difficulty accomplishing, by providing the necessary technology needed in the form of assistive or rehabilitative devices. Despite their great performance, assistive technologies that require disabled persons to wear or mount devices on themselves place undue restrictions on users.

This thesis presents two different attempts of a solution to the problem, proposing two different novel approaches to achieve vision-based Human-Computer Interaction (HCI) for assisting persons with different disabilities to live more independently in a more natural way, in other words, without the need for additional equipment or wearable devices. The first approach aims to build a system that assists visually impaired persons to adjust to a new indoor environment.

Assistants record important objects that are in fixed positions in the new environment using pointing gestures, and visually impaired persons may then use pointing gestures as a virtual cane to search for the objects. The second approach aims to build a system that can assist disabled persons who have difficulties in moving their arms or legs and cannot drive vehicles normally. In the proposed system, the user only needs one hand to perform steering, switching between gears, acceleration, brake and neutral (no acceleration, no brake).

The technical contributions include a new method for pointing gesture extracted from forearm and a new algorithm for measuring the steering angle from the movement of a single hand. The proposed approaches were experimented with and both quantitative and qualitative analyses are presented. The experimental results show that the general performance of the implemented systems is satisfactory, though the effectiveness of the systems in a real-life situation for disabled persons is currently unknown. However, it is believed that these approaches can potentially be used in real life in future, and will encourage development of more natural assistive technologies.

4

CONTENTS

1. Introduction ...... 10 1.1. Research Overview ...... 10 1.2. Scope ...... 11 1.3. Contribution ...... 12 1.4. Organization ...... 13

2. Gesture Recognition for Assistive Technologies ...... 14 2.1. Assistive Technologies ...... 14 2.1.1. Pointing Gesture ...... 15 2.1.2. Hand Gestures ...... 15 2.1.3. Kinect ...... 16 2.2. Gestures in HCI ...... 18 2.2.1. Pointing Gesture ...... 19 2.2.2. Hand Segmentation and Tracking ...... 27 2.2.3. Pattern and Hand Trajectory Recognition ...... 29 2.3. Discussion ...... 33 2.4. Tools, Engines and Frameworks ...... 35 2.4.1. OpenNI ...... 35 2.4.2. OpenCV ...... 36 2.4.3. Unity3D ...... 36 2.5. Summary ...... 36

3. Object-Based Navigation ...... 38 3.1. Pointing Gesture Recognition ...... 38 3.1.1. Body Joints Detection ...... 39 3.1.2. Pointing Direction Estimation ...... 41 3.1.3. Pointing Gesture Detection ...... 44 3.2. System Description ...... 45 3.2.1. Offline Training ...... 46 3.2.2. Object Navigation ...... 47 3.3. Experimental Results ...... 50 3.3.1. Pointing Gesture Accuracy Comparison for Different Body Joints ...... 50 3.3.2. System Performance...... 53 3.4. Limitations ...... 57 3.5. Remarks ...... 57

4. Single-handed Driving System ...... 59 4.1. Hand Segmentation ...... 60 5

4.2. Fingertip Detection ...... 62 4.3. System Description ...... 63 4.3.1. Nonholonomic Steering ...... 65 4.3.2. Differential Steering ...... 68 4.4. Experimental Results ...... 69 4.5. Simulation ...... 71 4.5.1. Testing of Nonholonomic Steering ...... 71 4.5.2. Differential Steering ...... 74 4.6. Limitations ...... 81 4.7. Remarks ...... 82

5. Conclusion ...... 83 5.1. Thesis Overview ...... 83 5.2. Contributions ...... 84 5.3. Limitations and Future Work ...... 85 5.4. Concluding Remarks ...... 86

A. Publications Arising from Thesis ...... 87

B. Acronyms and Abbreviations ...... 88

Bibliography ...... 89

6

List of Figures

1.1: Research Overview ...... 11

2.1: Examples of Pointing Gesture ...... 15

2.2: Examples of Hand Gesture ...... 16

2.3: Mircosoft Kinect ...... 17

2.4: Kinect-based HCI Applications ...... 18

3.1: Three Steps for Pointing Gesture Recognition ...... 39

3.2: Skeleton Tracking Example ...... 40

3.3: Calibration Pose ...... 41

3.4: 3D Space Created by User’s Arm and a Point Cloud ...... 42

3.5: Estimated Pointing Zone Created by θ ...... 44

3.6: Tracking User and Recording a Pointed Object to the Database ...... 45

3.7: System Overview ...... 46

3.8: Scenario for Offline Training ...... 47

3.9: Angle Difference Between User’s Pointing Direction and Objects ...... 48

3.10: Scenario for Object Navigation ...... 50

3.11: Expected Difference in Pointing Direction Estimated from Different Body Joints

...... 51

3.12: Experimental Results for Comparison of Pointing Gestures Estimated from

Different Body Joints ...... 52

3.13: Small Room ...... 54

3.14: Large Room ...... 54

3.15: Blind Testing ...... 56

4.1: System Overview ...... 60

4.2: Hand in 3D Disparity Map ...... 61

4.3: Segmented Hand Before Filtering & After Filtering ...... 61

4.4: Centre of the Segmented Hand Being Tracked Successfully ...... 62

7

4.5: Fingertips Detected Successfully ...... 63

4.6: Distinction between Acceleration (green), Idle (blue) and Brake (red) using Depth

Thresholding ...... 64

4.7: Switching Gears Using Different Number of Fingertips ...... 65

4.8: How Each Features of the Car is Achieved with a Single Hand ...... 65

4.9: Four Different Steps for Nonholonomic Steering Algorithm ...... 66

4.10: Steering Angle Calculation ...... 67

4.11: How Differential Steering Works ...... 68

4.12: Performance Measured for Each Feature ...... 70

4.13: Differential Driving Simulators Tested...... 72

4.14: Visualization for Nonholonomic Steering ...... 73

4.15: Simulation for Differential Steering ...... 74

4.16: Lap Time Comparison between Keyboard Input and Hand Gesture ...... 77

4.17: Collisions Comparison between Keyboard Input and Hand Gesture ...... 80

8

List of Tables

2.1: Pointing Gesture Literature Review Summary ...... 25

2.2: Hand Segmentation and Tracking Literature Review Summary ...... 29

2.3: Pattern and Hand Recognition Literature Review Summary ...... 33

3.1: Scaled Average Error and Standard Deviation for each Body Joints at Different

Angles ...... 51

3.2: Experimental Results ...... 56

4.1: Comparison Between Two Steering Types ...... 63

4.2: Experimental Results ...... 69

4.3: Experimental Results for Lap Time Comparison between Keyboard Input and Hand

Gesture ...... 76

4.4: Experimental Results for Number of Collisions between Keyboard Input and Hand

Gesture ...... 78

9

Chapter 1

Introduction

Hardware-based assistive technologies have great performance but the downside is that disabled persons may resent the intrusiveness of a device that has to be worn on their bodies. It is important to help disabled persons to live their daily lives independently but also equally important to think about their comfort, instead of just focusing on the functionality. The new approach to solve this problem is to develop a natural assistive technology that is simple and does not require any devices to be mounted. This research aims to achieve natural Human-Computer Interaction (HCI) systems that help individuals with disabilities, using a device called Kinect [1], that has both RGB and depth sensor and hand/arm gestures.

In this chapter, a brief overview of the research is presented and the research scope, contribution and organization are discussed.

1.1. Research Overview

The goal of this research is to achieve natural HCI-based assistive techonologies that do not require users to wear devices, which will lead to higher quality of living for disabled persons in the future, as they can choose between more options instead of being required to use what is only available for them, regardless of how they feel about using it.

For example, visually impaired persons need to hold the cane to detect what is near them. Regardless of whether it is electronic or not, having to hold something even in indoor environment can be uncomfortable. Also, disabled or injured persons who cannot use one of their arms or both legs would not be able to drive vehicles as it is very dangerous or nearly impossible.

10

This research proposes two systems as a solution to the problem discussed previously: an object navigation system for visually impaired persons and a single- handed driving system for physically disabled persons. Both parts share the same ultimate goal, which is to implement natural HCI-based assistive technology to assist disabled persons to live with enhanced independence. Each of the proposed systems is designed for people with different types of disabilities, and they use different body gestures as shown in Figure 1.1. The Object navigation system mainly uses the pointing gesture, while the single-handed driving system mainly uses hand gestures. What is common is that both are based on gesture recognition using the Kinect.

Figure 1.1: Research Overview 1.2. Scope

The focus of this thesis is natural HCI-based assistive technology. This thesis includes the following:

i) Image Segmentation is the process of parti tioning an image into multiple

segments, leaving only the desired image so that the image becomes easier to

analyze. In this thesis, the hand is the main target to segment and analyze. Hand

11

segmentation can be divided into colour segmentation and depth segmentation.

Both methods are reviewed and the one that is better suited for this research is

chosen.

ii) Object Localization is a process that marks an object in a map or records the

coordinates of the object. In this thesis, object localization is integrated with the

pointing gesture in order to effectively record the coordinates of each object.

iii) 3D Pointing Gesture is popularly used in vision-based HCI applications. 3D

pointing gesture direction is an extension of the 3D vector from head to hand,

shoulder to hand or elbow to hand. In this thesis, all these body joints are

considered and experimented with to find the most accurate method to estimate

the pointing direction.

iv) Hand Gesture Recognition is the process of analyzing the shape and movement

of a hand and extracting information from it. In this thesis, shapes of the hand

and the fingertips and the movement of the hand are the main topics of interest.

1.3. Contribution

The presented research in this thesis makes numerous contributions to the field of assistive technology and computer vision. In the field of assistive technology, this research utilises a real-time 3D pointing gesture for assisting visually impaired persons and real-time hand tracking with only one hand for driving, to assist persons with arm or leg disabilities. Both novel approaches presented in this thesis are applied to the field of assistive technology for the first time.

In the field of computer vision, this research makes contributions with a robust algorithm for pointing direction estimation with hand and elbow that gives more degrees of freedom than hand-shoulder or hand-head pointing gesture, and algorithms for tracking and measuring the rotation angle of a single hand which can be widely applied to hand gesture-based HCI applications.

12

1.4. Organization

The remaining chapters of this thesis are organized as follows:

The background and literature survey of related works are presented in Chapter 2. The required background for this research is presented first and the related works such as the pointing gesture, hand tracking and pattern and hand trajectory recognition are presented and discussed.

An Object-based Navigation system that integrates 3D pointing gesture and object localization is presented in Chapter 3. It assists visually impaired persons to adjust themselves to their any environment. The methodology, system description and experimental results are presented and future work and limitations are discussed.

A Single-handed driving system that analyzes shapes and movement of the hand is in

Chapter 4. It will assist persons with arm or leg disabilities to drive vehicles. The methodology, system description and experimental results are presented and future work and limitations are discussed.

The conclusion of this thesis is in Chapter 5, wrapping up with an overview of the thesis, contributions made, limitations and future works.

13

Chapter 2

Gesture Recognition for Assistive

Technologies

As the proposed systems are both vision-based, human body gestures are essential tools for the systems. This research is particularly interested in gestures performed by the human arm and hands, and this leads to a study of the pointing gesture and hand gestures. This chapter presents the background of this research and a survey of past work in the related areas. Background on assistive technologies is provided in Section

2.1, related work on gestures in HCI applications is reviewed in Section 2.2, a discussion is presented in Section 2.3, other tools and frameworks are introduced in

Section 2.4, then lastly a summary of this chapter is presented in Section 2.5.

2.1. Assistive Technologies

There have been many approaches to building assistive technologies that provide the visually impaired with autonomous navigation in natural environments, from a walking cane to laser cane [2], auditory guidance [3], Electronic Travel Aids [4] and ultrasonic sensing devices [5], and research continues in this field.

There are also some car-driving assistive technologies for disabled persons such as

Tongue Drive System [6] or Adaptive Driving [7] that allow disabled persons to drive their car, allowing a driver with a disability and a non-disabled driver to drive equally well.

There has been some work on assistive technology performed using the new

Mircosoft device called Kinect [1], that will be introduced in Section 2.1.3, mainly games for the deaf/mute [8] or older persons with wheelchairs [9]. For visually impaired

14 persons, an indoor navigation system was implemented [10] that warns the user of the obstacle in front and the direction to turn. However, this system requires the user to wear a helmet with a Kinect mounted on top and a backpack that contains a laptop and a battery pack inside. In this thesis, the focus is on implementing a natural user interface based on gesture recognition that does not require users to wear anything. Background on pointing gesture, hand gestures and Kinect are now discussed.

2.1.1. Pointing Gesture

Pointing gesture is a non-verbal language that is normally performed by persons using their hands as shown in Figure 2.1. Pointing gesture is used as a means of communication between persons and mainly used for indicating direction or objects.

Pointing gesture is popularly used in HCI-based applications as it is a natural body language that can be easily performed by the user and is not affected by noise.

Figure 2.1: Examples of Pointing Gesture [11]

2.1.2. Hand Gestures

Hand gestures are non-verbal languages that involve all the different kind of gestures performed by the hand as illustrated in Figure 2.2. For example, hand gestures include handshaking, clapping, waving and opening/closing of the hand and they have different

15 meanings [12]. Hand gestures are very popular in the field of computer vision, HCI and

Human-Robot Interaction (HRI) as they contain useful information such as the shape of the hand and movement of the hand.

3 Figure 2.2: Examples of Hand Gesture 2.1.3. Kinect

Kinect [1] is a Microsoft product that was originally built for entertainment purposes (to work with the Xbox360 console) but due to its ability to obtain depth information and capture user motion, it has been widely used by programmers for both academic and entertainment purposes. There are many applications of Kinect already released, mostly by individuals or companies. One of the advantages of using the Kinect is that it is inexpensive compared to a stereo or ToF (Time-of-Flight) camera while it can perform as well or even better, except for the depth range which is limited to approximately 0.7-

6m. The Kinect and its in-built devices are illustrated in Figure 2.3.

Kinect can be purchased separately from the xbox360 console at any games or electronic stores. The price is approximately $100 AUD.

16

4

5 Figure 2.3: Microsoft Kinect. [2]

There have been many HCI applications such browsing images with hands using the

Kinect [13, 14] and mounting the Kinect on the head to detect obstacles in front of visually impaired users [11] as shown in Figure 2.4. The Kinect can be plugged into a

PC and has an in-built processor that can track the skeletonised human body.

17

6 Figure 2.4: Kinect-based HCI Applications [13,14][11]

2.2. Gestures in HCI

There have been many different uses of human body gestures in the field of HCI. In order to find out the most appropriate method for each step of this research, past works on related areas such as pointing gesture, hand segmentation and tracking and pattern and hand trajectory recognition are studied and discussed. A review of related work on pointing gesture-based applications will be presented in Section 2.2.1, hand-gesture based applications that use hand segmentation and tracking will be presented in Section

2.2.2, then pattern and hand trajectory recognition will be presented in Section 2.2.3.

18

2.2.1. Pointing Gesture

In the field of HCI and HRI, pointing gesture is commonly used to indicate direction of an object. Many different approaches to implement the pointing gesture in HCI applications have been attempted.

Jojic et al. [15] presented detection and estimation of pointing gesture using dense disparity map since it is less sensitive to lighting changes and the colour of the clothes worn by the user as compared to colour images. They used a commercially available real-time stereo system called Triclops built by PointGrey [16]. The system uses three cameras to minimize correlation problems using both horizontal and vertical disparities.

Background subtraction and foreground segmentation were performed to extract the human silhouette from the depth image. Head and fingertip were found using Gaussian blobs defining extremal points. Then the pointing gesture was estimated using the line- of-sight created by coordinates of the head and fingertip. The experimental results showed that the system works better when the user is pointing at large objects, where the success rate was 90%. It was concluded that the system is robust but not sensitive to sudden large motions since it uses a single frame to allow fast estimation of the pointing gesture. It was found that line-of-sight pointing gesture is less accurate than the full-arm pointing gesture, however fully stretching the arm for a long time can be tiring and is not physically desirable.

Kehl et al. [17] created a portal constructed as a three-sided stereo back-projection system with the screens arranged in a rectangular layout. Multiple cameras were placed around the portal and using the images from these cameras, they detected the head and hand by extracting the silhouettes of the segmented foreground and searching for the extremal points. Kehl et al. estimated pointing direction using line of sight created between the locations of the eye and fingertip. The position of the eye was estimated by subtracting a constant from the top of the head of the user, which was 15cm in this case.

This might limit the accuracy of the system as it was assumed that the distance between

19 the top of the user head and eye is 15cm, which might not be always correct. The experiment was done inside a dark room and the pointed area was lightened in order to obtain visual feedback. However, Kehl et al. could not measure the exact accuracy of the system since the user was pointing at a virtual object in a virtual scene inside a dark room. Therefore, qualitative measurement was use. Kehl et al. concluded that their system is fast and robust with very good accuracy and the system does not require any initialization for tracking and the user can point at any direction but the ground.

Nickel et al. [18] used a stereo camera to obtain colour and disparity information.

Fast 3D blob tracker was implemented and used for skin colour distribution to detect head and hand locations. Three types of pointing direction estimation were experimented with: head-hand line, forearm orientation and head orientation. While the head-hand line and forearm orientation were extracted from stereo images, the head orientation was measured with a head-mounted magnetic sensor. Hidden Markov model

(HMM) based classifier was trained to detect the pointing gestures. The experimental results showed that the accuracy of head-hand line, forearm orientation and head orientation were 90%, 73% and 75% respectively. From the result, it was concluded that the head-hand line is most accurate among the three methods, followed by head orientation.

Yamamoto et al. [19] proposed a Ubiquitous Stereo Vision (USV) system which uses multiple fixed cameras to provide three-dimensional information for individual arm-pointing gesture recognition. Using a crossing hierarchical method, three- dimensional information from the multiple stereo cameras is transformed to two- dimensional image sequence. The system segments the two-dimensional image sequences and gains personal information vectors which are the body information vector and arm information vector. Yamamoto et al. distinguished between intentional arm-pointing gesture and unconscious arm movements by defining intentional arm- pointing gesture as a single straight movement of an arm. Then the intentional arm- pointing gesture was divided into horizontal and vertical directions. Horizontal 20 intentional direction of arm pointing was defined as a line between the tip of the arm and the centre of gravity of the person. Vertical intentional direction of arm pointing was calculated using the height of the arm, height of the eye and height of the person.

The pointing gesture was triggered by pointing at the same direction for a fixed time and then the gesture was recognized. The experimental results showed that the accuracy of the pointing gesture while standing was 97.4% and the accuracy of pointing gesture while sitting was 94%, which are both reasonably high. It was concluded that the USV system is not restricted to the position and direction of the user and distinguishes between intentional pointing gesture and unconscious arm movements. However, the drawback is that the user has to wait until the system decides whether the gesture is intentional or not since the trigger is time dependent. Lowering the threshold time would result in weaker boundary between intentional and unintentional gestures, so there is a trade-off between time and performance.

Sato et al. [20] constructed a system in which humans can use pointing gestures to instruct a robot to follow a specified route and point at an object. The user wore a cap and gloves, then an open source program called iSpace was used to track the cap and gloves using RGB information from cameras. Japanese body ratio was used to calculate the length of shoulder and fuzzy associative memory (FAM) was used to recognize pointing gestures. A virtual room was created using Java3D for visual feedback. Several experiments were performed but most of the results just showed that the robot could point at an object, or it could be parked at a specified position guided by the user. In conclusion, a multipurpose system was constructed which can tell the robot to follow a specified route, point at an object and park at a desired location. However, wearing a cap and gloves for body parts detection can be very inconvenient, and using Japanese body ratios to calculate shoulder length can limit the compatibility of the system.

Guan et al. [21] developed a foreground extraction algorithm based on wavelet multi-scale transformation across background subtraction to segment the human body from a stereo image. Fingertip and head were tracked using a stereo matching strategy 21 based on geometric constraints between fingertip, arm axis and epipolar line. This method has advantages over other methods for detecting body parts because it does not depend on skin colour of the user, while some of the other methods use skin colour to detect body parts. Pointing direction was estimated using the 3D positions of finger tip and eye. Guan et al. estimated the position of the eye by subtracting 8cm vertically from the position of the head top and as previously explained, this can result in lower accuracy as more people use the system. The experimental results showed that the success rate was 95.6% on average. It was noticed that the precision decreased as the distance from user to target increases. The reason might be that the user’s eyesight object resolution decreases, and the user becomes unsure of whether they are pointing at the correct target.

Carbini et al. [22] proposed to approximate the eye-to-fingertip line by face-to-hand direction for pointing purpose and interaction with large screens. The best ellipsoidal models of the face and hands could be found with Expectation Maximization (EM) algorithm using skin colour pixels seen by two cameras. Since face detection is accurate but slow, face detection was only performed once when the user entered the field of view instead of detecting it continuously. Due to low resolution, the hand was detected with an interesting approach. Using disparity information, the 3D position of the face could be found and an assumption was made that when the user is pointing at something, there is a certain distance between the face and the hand. Carbini et al. created a search zone which is 30cm distant from the face, and a moving skin colour zone was considered as the hand. This method seems to be not sufficiently robust enough as hand detection entirely depends on skin colour and has low compatibility since a constant was used to find a human body part. The experimental results showed that the moving pointing gesture has lower accuracy than a still pointing gesture. Also it was found that an unintended pointing gesture was taken as intentional pointing gesture sometimes, since no distinguishing method between intentional and unintentional pointing gesture was applied.

22

Chien et al. [23] proposed a system with three calibrated cameras to track a user’s pointing arm and estimate the pointing line and the pointed targets in 3D space. Direct

Linear Transform (DLT) and particle filtering were used to track the two 3D points of the pointing arm. A bounding box was used to enclose the arm then background subtraction and skin colour detection were used to segment the arm region. Two end points of the forearm were used (fingertip and end of sleeve) for pointing direction estimation. Experiments were performed under two conditions. The first experiment was performed with no precision refinement and the second one with iterative precision refinement algorithms using epipolar geometry to find the more accurate pointing line.

The results showed that the average success rate for pointing gesture with/without refinement were 84.75%/ 94.5% respectively which proved that refinement contributed to better accuracy. The limitations are that the system gives the best results when the user is 30cm from the targets and the system is influenced by the lighting condition, the target position and arm angle which may be critical to robustness. Also, the user must wear a short sleeved t-shirt in order to have sufficient arm region segmented.

Kim et al. [24] studied how to extract an unknown object pointed at by a person and make a robot learn that object. The aim is to provide a natural interface for a robot to register new objects without additional equipment or pre-built face database. Firstly, face detection is achieved by combining stereoscopic range information and CBCH

(Cascade of Boosted classifiers with Haar-like features). Then an arm-pointing vector is obtained by using centre-of-shoulder and centre-of-hand as starting point and end point respectively; this vector is combined with centre-of-face. Then the 3D position in the camera coordinate system is transformed to pointing coordinate system using 3D matrix transformation. ROI (Region of Interest) is found using this 3D pointing vector and the object inside ROI is extracted by image processing techniques. The limitations are that the result is unsatisfactory due to failure to achieve accurate object extraction.

Extracting an unknown object is considered a difficult problem and this author states that further study on this topic is required.

23

Park et al. [12] developed a real-time 3D pointing gesture recognition algorithm for mobile robots, based on a cascade Hidden Markov model (HMM) and a particle filter.

The aim of this paper was to achieve robust real-time 3D pointing gesture recognition algorithm with high gesture recognition and target selection rate. Firstly, automatic face and hand detection is implemented using Viola and Jone’s face detector [] with heuristic rules for face detection and depth image foreground segmentation for hand detection.

Hand was detected using skin colour of the detected face and the position of the shoulder was estimated using the overall region of human body. Secondly, 3D particle filters using Gaussian density are implemented for robust tracking. Thirdly, hand position mapping for pointing direction estimation is done by calculating the line of projection using shoulder and hand positions and calculating the intersection point using head-hand line. Lastly, gesture spotting by the second stage HMM is developed which divides gestures into three phases which are non-gesture, move-to and point-to phases.

Park et al. tried both face-to-hand and shoulder-to-hand lines to estimate the pointing direction and the experimental results showed that the face-to-hand line is more accurate for horizontal pointing gestures, and a shoulder-to-hand line is more accurate for vertical pointing gestures. The gesture recognition and target selection rate from the experiment are better than 89% and 99% respectively during human-robot interaction.

This work appears to be current state-of-art for the pointing gesture algorithm, since they achieved a high robustness using cascade HMM and particle filter.

Li et al. [25] tracked a face using Lienhart’s algorithm based on Haar-like features and boosted classifiers using colour information from Web camera and depth information from TOF (Time of Flight) range camera. It was stated that TOF range camera has advantages over stereo camera because compared to stereo camera, TOF camera is illumination and colour invariant, texture invariant and provides higher depth measurement accuracy. The hand is detected using particle filter in colour images with foreground only. In order to reduce some constraints during filtering, short sleeves were worn. However there are still drawbacks such as narrow field of view, low resolution

24 and noisy distance data. In order to fix the noise problem, median and Gaussian filter were used to remove pixels whose signal-noise ratios are low, and adjust the exposure time and process the depth data. Li et al. tried head-finger line and forearm orientation using PCA (Principal Component Analysis) method and RANSAC (Random Sample

Consensus) framework to estimate pointing directions. PCA was used to transform data to a new coordinate system and RANSAC framework was used to handle the outliers that PCA cannot handle. To detect the pointing gesture, Li et al. adopt a simple method which detects the pointing gesture by finding angle difference by pointing vector and user body which was assumed vertically downwards. This kind of approach is not robust enough but Li et al. stated that they already had a hand shape recognition module and they would combine their work with a speech module as well to fix the pointing gesture detection issue in future. They stated that they focused on accuracy of the pointing direction estimation. Li et al. had experiments on pointing direction estimation using different body parts (forearm orientation, head-hand line and head-finger line) for comparison. After experiments they concluded that head-finger line has greater accuracy than head-hand line and forearm orientation.

A summary of related work in the field of pointing gesture recognition is shown in

Table 2.1.

25

Table 1

Table 2.1: Pointing Gesture Literature Review Summary

Author Year Device Body parts Pointing Pointing (et al.) detection direction gesture algorithm estimation detection Jojic 2000 Multiple Dense disparity Head- none [15] Stereo map Fingertip Vision line Kehl 2004 Multiple Background Eye- none [17] Stereo subtraction, Fingertip Vision Foreground segmentation Nickel 2004 Multiple 3D blob-tracker, Head-hand, HMM [18] Stereo skin colour Forearm Vision, detection orientation, Magnetic Head sensor orientation Yamam 2004 Multiple USV Eye-tip of Timing oto[19] Stereo the arm trigger Vision Sato 2007 Multiple iSpace Head-hand Nodding [20] Camera (cap-glove) motion Guan 2007 Multiple Background Head- none [21] Stereo subtraction, Fingertip Vision Foreground segmentation Carbini 2007 Multiple EM Algorithm Eye- none [22] Stereo Fingertip Vision Chien 2007 Multiple DLT, Particle Eye-End of none [23] calibrated Filtering sleeve cameras Kim 2008 Single CBCH, skin- Face-Hand none [24] Stereo colour detection Vision Park 2008 Single Viola & Jone’s Face-Hand, HMM [12] Stereo Face detector, Shoulder- Vision 3D particle filter Hand Zhi 2010 Web Lienhart’s Forearm, Angle [25] camera, algorithm for face Head- difference TOF detection, Hand, between range Particle filter Head- pointing camera Finger and body vector

26

2.2.2. Hand Segmentation and Tracking

Hand tracking is a very common and essential tool for vision-based applications, as the hand is one of the easiest and smallest body parts that human can control. In order to track the hand, it is very important to segment the hand successfully. Past work on hand segmentation and tracking were reviewed in order to find the most appropriate method to segment the hand and track it.

Chai et al. [26] presented a robust hand gesture analysis method using 3D depth data.

They focused on accurate hand segmentation by removing the negative effect of the forearm part. The coarse hand region was detected with the depth data captured by 3D camera. The geometric circle feature was extracted to represent the palm region to determine the part in the coarse region that belongs to a hand. After the palm was located in the coarse hand region, the forearm cutting was implemented by determining whether the cutting direction was horizontal or vertical and using the spatial relationship between centre of the hand and centre of the palm.

Bao et al. [27] implemented a real-time hand tracking module using a new robust algorithm called Tower method to obtain hand region and used skin colour for hand segmentation. The skin segmentation was based on YCbCr colour space to use the system in unrestricted environment and morphological operations were used to smooth the image and remove the noise while extracting the hand with Tower tracking method.

Tower tracking is a method that determines the features of the object and approximates the value of the distance between “towers”. Then it generates the coarse towers with the distances and scans the signal in all coarse towers. With each signal, a boundary spreading algorithm is executed and refines the set of points found in the algorithm. The results showed that the proposed algorithm was reasonably robust.

Wang et al. [28] developed an entertainment robot which plays the rock, scissors and paper game. Image pre-processing was performed to capture the hand image of a

27 person by skin colour-extraction to distinguish between hand and the background, and dilation and erosion operations to obtain a clean gray image of a hand leaving the background black. Hand gesture recognition was performed by removing the hand image without fingers from the hand image with fingers, which leaves the fingers only; then the number of fingers are counted to determine whether the gesture is rock, scissors or paper. The study indicates that image processing can be applied in human-robot interaction for simple entertainment purposes. The strength of this research is that simple image-processing technique was used, but the limitations are that the scope and the techniques used are relatively low, and this explains why the result has relatively low success rate. The approach is good but newer techniques would make it better.

Raheja et al. [29] presented a new approach for controlling a robotic hand or an individual robot by showing hand gestures in front of a camera. The system captures a frame containing some gestures and extracts the hand gesture area from the captured frame. The hand gesture area was obtained by cropping the hand region after global thresholding. The result showed 90% accuracy in proper light arrangement and the accuracy was affected with poor lighting arrangement. Using a colour transform such as

YCrCb or HSV for segmentation would have solved that problem.

Yu et. al [30] presented a feature extraction method for hand gestures based on multi-layer perception. The hand was detected by skin segmentation using YCbCr colour space. The hand silhouette and features were accurately extracted by binarizing the hand image and enhancing the contrast. Median and smoothing filters were integrated in order to remove the noise. The results showed 97.4% recognition rate which proves that the hand detection was robust enough.

A summary of related work in the field of hand segmentation and tracking is shown in

Table 2.2.

28

Table 2 Table 2.2: Hand Segmentation and Tracking Literature Review Summary

Author Year Segmentation Method Device (et al.) Chai [26] 2009 Geometric circle feature 3D camera forearm cutting Bao [27] 2009 YCbCr colour segmentation RGB Tower tracking camera Wang [28] 2010 Skin colour extraction RGB camera Raheja [29] 2010 RGB colour segmentation RGB camera Yu [30] 2010 YCbCr colour segmentation RGB camera

2.2.3. Pattern and Hand Trajectory Recognition

Pointing gestures can only indicate a direction or an object. Unless pointing gestures are used as a replacement for the mouse cursor, it is impossible to switch between different states such as turning something on and off. Past work on pattern and hand trajectory recognition were reviewed in order to find the best methods to switch between states and issue different commands to the system.

Yang et al [31] proposed an algorithm for extracting and classifying 2D motion in an image sequence based on motion trajectories. Multiscale segmentation was performed to generate homogeneous regions in each frame. Then regions between consecutive frames were matched to get two-view correspondences. Affine transformations were computed from each pair of corresponding regions to define pixel matches. Motion patterns were learned from the extracted motion trajectories using a time-delay neural network. The proposed method was applied to recognize 40 hand gestures of American Sign Language. The results showed that the motion patterns of hand gestures could be recognized with reasonable accuracy using motion trajectories.

Oka et al. [32] presented an augmented desk interface which involves fingertip trajectory recognition. They obtained fingertip trajectories by computing correspondences of detected fingertips between successive image frames. Kalman filter

29 was used to predict fingertip locations in one image frame based on their locations detected in the previous frame. This process was applied separately for each fingertip.

HMM was used to recognize fingertip trajectory. The input to HMM consists of two components for recognizing multiple fingertip trajectories. The average success rate was

98.2% for thumb only and 98.3% for drawing with forefinger only which are pretty high.

Although the main objective of this research was different from the research proposed in this thesis, it involved good symbol recognition for fingertip trajectories, which can be very similar to hand trajectory recognition.

Keskin et al. [33] have developed a human-computer interaction interface (HCI) based on real-time hand tracking and 3D dynamic gesture recognition using Hidden

Markov Models (HMM). The system captures and recognizes hand gestures of the user wearing a coloured glove. The coordinates of the hand are obtained via stereo image.

For feature extraction, Kalman Filter was used to filter the trajectory of the hand motion.

3D coordinates of the marker in successive images were transformed into sequences of quantized velocity vectors in order to eliminate coordinate system dependence. An

HMM then interprets these sequences which are directional codewords characterizing the trajectory of the motion. The results showed 98.75% average accuracy which is reasonably high. Eliminating the dependency on a coloured glove would make the system more direct and natural.

Bhuyan et al. [34] proposed extraction of certain features from the gesture trajectory so as to identify the form of the trajectory. It was stated that these features can be efficiently used for trajectory guided recognition of hand gestures. For motion vector estimation, the tracking algorithm was based on the Hausdorff object tracker. This algorithm matches a two-dimensional binary model of the object against subsequent frames using the Hausdorff distance measure. Bhuyen et al. segmented the frames in the gesture sequence so as to form video object planes (VOP) where the hand was considered as a video object. Piecewise approximation of spatial positions of the centroids in successive VOPs was applied in order to obtain a smooth trajectory. Then 30 static and dynamic features of trajectories were extracted and normalized for gesture matching. The experimental results showed that the accuracy of the proposed system was 99%.

Elmezain et al. [35] proposed an automatic system that recognizes both isolated and continuous gestures for Arabic numbers in real time based on Hidden Markov Model

(HMM). Skin segmentation of hands and face was performed using stereo colour image sequences, which calculates the depth value in addition to skin colour information. Out of three basic features, namely location, orientation and velocity, orientation was used as the main feature in the system. The quantized orientation was obtained, then used as input to the HMM as a discrete vector. The isolated and continuous gesture paths were recognized by their discrete vector and HMM Forward algorithm corresponding to maximal gesture models over the Viterbi best path. The experimental results showed that the average recognition rates for isolated and continuous gestures were 98.94% and

95.7% respectively.

Wenjun et al. [36] presented a novel approach based on motion trajectories of hands and hand shapes of the key frames. Since only hand trajectories are the topic of interest, hand shape recognition will be omitted from this review. Hand segmentation was performed using skin colour detection. The feature of hand motion trajectory was considered as a group of discrete values indicating the direction of motion. For 2D coordinates, they were divided into 8 average directions of movement. Wenjun et al. stated that the difference between Hidden Markov Model (HMM) and Dynamic Time

Warping (DTW) is small, yet DTW is easier to compute and requires no additional computing time compared to HMM. Therefore, DTW was used for motion trajectory recognition. The recognition rate was pretty low when the recognition was only based on motion trajectory but the rate was greatly improved when the motion trajectory was combined with key frames.

31

Balakrishna et al. [37] presented a human-robot interaction system using Wiimote- based gestures. The system is trained by the user using a Wii Remote in order to recognize future gestures. This is different from hand trajectory recognition but the basic methodology for recognition is similar, though the method to extract features is different. In this paper, WEKA Naïve Bayes Classifier was implemented in C#. The evaluation showed that the algorithm achieved an accuracy of 99% with 10 training samples per gesture and 95% with one training sample per gesture.

Wang et al. [38] developed a real-time gesture recognition system for their service- robot based on a joint approach. The aim of their work was to achieve static and dynamic hand gesture recognition. Firstly, Wang et al. used a cascade classifier to locate potential hand regions, trained by the AdaBoost procedure. Secondly, 2D Gabor transformation was used to analyze the local texture of the hand region obtained by the cascade classifier. Then Support Vector Machine (SVM) was used to classify hand shape and Kalman filter to track the position and velocity of the region centre given the image. Lastly for gesture detection, Hidden Markov Model was used to recognize the label observations with the likelihood given by Viterbi algorithm. The strength of this work is that the system architecture is shown very clearly and different methods are used to achieve robust gesture recognition including classifier, filter and HMM. The limitation is that despite a high success rate for static hand gesture recognition, the success rate for dynamic hand gesture recognition is very low.

Gaus et al. [39] introduced a method to identify hand gesture trajectories in a constrained environment. The method consists of three steps, namely are collection of input images, skin segmentation and feature extraction. YCbCr colour transformation was performed so that the skin colour segmentation performance is not affected by lighting conditions. The hand gesture trajectory was recognized by using two methods: template matching and division by shape. The results showed upto 80% of accuracy for gesture trajectory recognition, which is relatively low. This might be due to problems with the classification method. 32

A summary of related work in the field of pattern and hand trajectory recognition is

shown in Table 2.3. Table 2.3.

Table 3 Table 2.3: Pattern and Hand Trajectory Recognition Literature Review Summary

Author Year Detection Trajectory Recognition (et. al) Method Tracking Yang [31] 2002 Multiscale Affine Time-delay segmentation transformation neural network Oka [32] 2002 Colour Kalman filter Hidden Markov Segmentation Model (HMM) Keskin [33] 2003 Colour Kalman filter HMM segmenation (Coloured glove) Bhuyan 2006 Frame Hausdorff Normalization of [34] segmentation object tracker static and dynamic features Elmezain 2008 Skin colour + Quantized HMM [35] depth orientation + segmentation Viterbi best path Wenjun 2010 Skin colour Key frame Dynamic Time [36] segmentation extraction Warping (DTW) Balakrishna 2010 n/a n/a WEKA Naive [37] (Wii Remote) Bayes Classifier Wang [38] 2010 Cascade Support Vector HMM classifier, Machine 2D Gabor (SVM), transformation Kalman filter Gaus [39] 2011 YCbCr colour Feature Template segmentation extraction matching, Division by shape

2.3. Discussion

Several problems and limitations of the pointing gesture can be found in past research.

First of all, colour segmentation to detect the hand or face is highly dependent on the lighting conditions, and some approaches require users to wear gloves or long sleeved shirts. For pointing direction estimation, the line-of-sight method has limits on its accuracy. Depth segmentation is a better choice than colour segmentation, however the former assumes that the body parts that need to be segmented are always in the foreground, which greatly reduces the range for the pointing gesture. The techniques

33 used for locating the eye are mostly based on detecting the head or face, obtaining their centre coordinates or subtracting a constant vertically from the top of the head or face.

These approaches are not robust since every person has a different-sized head and face.

Also, the eye-fingertip line cannot estimate the pointing direction accurately in situations where the person is pointing forward with an arm pointing parallel to the ground; since the fingertip is below the user’s eye, the system will estimate the direction vertically below the actual direction that the user is pointing at. To make the line-of- sight method as accurate as possible, the user must hold the arm at eye level, but that is not physically comfortable. Also, line-of-sight and even shoulder-hand line require the user to fully stretch the arm when pointing, and this can be tiring if the user uses the system for a long time. The reason why line-of-sight method is still used for pointing direction estimation may be that it is very difficult to track other body parts with reasonable accuracy, such as the shoulder and elbow where skin colour detection cannot be used. This might be why some past research works state that forearm orientation has lower accuracy than line-of-sight estimation. Using multiple fixed cameras might achieve robustness and high accuracy, however in terms of usability in the real world, multiple cameras in a fixed environment are expensive and technically challenging as they must be configured for the environment. Using HMM to distinguish between intentional and unintentional pointing gestures may be computationally expensive. On the other hand, as long as there is a clear input that tells the system that the user is pointing, the processing speed does not have to be sacrificed.

For hand tracking, it was found that most works on hand gesture recognition and hand tracking use either colour or depth segmentation for hand detection. Colour segmentation does not require a depth camera, which is more expensive than a colour camera, and this can be an advantage over depth segmentation. However, colour segmentation is highly affected by background colour and lighting conditions which can be a disadvantage compared to depth segmentation. From reviewed literature, it was found that YCbCr colour space is popular for hand segmentation, since it is less

34 dependent on lighting conditions compared to RGB colour space. Therefore, if the testing environment is fixed and the background colour is always distinctive from the hand colour, YCbCr colour segmentation would be suitable for hand segmentation; and if the testing environment is not fixed, depth segmentation would be more suitable for hand segmentation.

For pattern and hand trajectory recognition, it was found that colour segmentation was mostly used for detecting the hand region, and the Hidden Markov Model (HMM) was mostly used for hand trajectory recognition. For hand trajectory tracking, there is a variety of methods used but it appears that the Kalman filter is the most popular.

For assistive technologies, the systems presented are all brilliant and they all deserve respect in their intent to help disabled persons. However, as assisting disabled persons is a difficult problem, most systems presented require some specific conditions to be fulfilled, such as requiring the user to wear or even plant an object on their bodies. For example, the Tongue Driving System requires a device to be planted on the user’s tongue so that it would not fall off. Whether disabled or not, people tend to seek comfort when they use something, especially if it is something that they use in their daily lives. It would be nice if there is a system that can assist disabled persons without a need for them to wear additional devices.

2.4. Tools, Engines and Frameworks

Some tools, engines and frameworks have been used in this thesis. OpenNI and

OpenCV are used as they are essential frameworks for vision-based applications using

Kinect and the testing environment was built with Unity3D engine.

2.4.1. OpenNI

OpenNI or Open Natural Interaction is an industry-led, open source framework focused on certifying the compatibility and improving interoperability of Natural Interaction (NI) devices such as Kinect, applications and middleware [40]. OpenNI was chosen for

35 tracking the body parts of a person as it contains modifiable skeleton tracking method that can track users’ body joints. Even though Kinect with OpenNI can detect humans and return their body joints, it does not perform pointing direction estimation and pointing gesture recognition, which are major goals of this work.

2.4.2. OpenCV

OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision, developed by Intel and now supported by Willow Garage [41]. It is free for use under the open source BSD licence.

The library is cross-platform. It focuses mainly on real-time image processing. OpenCV was chosen for system implementation as it requires computer vision methods such as image processing, filtering and many others.

2.4.3. Unity3D

Unity3D is a game engine and game development environment [42]. It supports multiple platforms such as , Mac OS X and web (via Unity Web

Player plugin). Unity3D was chosen for creating a testing environment for single- handed driving system.

2.5. Summary

In this chapter, background of this research was presented and related literature reviewed and discussed. In this chapter, essential background of assistive technologies, gestures in HCI and Kinect were presented, followed by in-depth literature reviews of past work on pointing gesture, hand segmentation and tracking, patter and hand trajectory recognition along with tools, engines and frameworks used in this research.

Advantages and disadvantages of each method were discussed and they provided a good guidance on using appropriate methods to solve different problems.

36

In the next chapter, an Object Navigation System for visually impaired persons is introduced, with a new approach to integrate object localization and 3D pointing gesture, based on a new algorithm for pointing direction estimation using hand and elbow.

37

Chapter 3

Object-Based Navigation*

This chapter presents a new low-cost natural user interface that can help visually impaired persons to readjust their living and working skills to a new environment by using the pointing gesture, which has not been used in assistive technology before. The challenge is to successfully estimate the pointing gesture using the forearm for users’ comfort, while maintaining reasonable accuracy; and to apply this as a navigation tool for the visually impaired and integrate it with a pre-trained system that contains a database of 3D coordinates of objects, thereby enhancing the effectiveness and robustness of the pointing gesture. This chapter first presents the methods used for pointing gesture recognition, and describes how the system works and how the pointing gesture is used in the system. Lastly, the experimental results are presented and limitations discussed.

3.1. Pointing Gesture Recognition

Unlike past works on pointing gesture recognition which were based on sighted users, pointing gesture is not only used for object selection but also for navigation in this work, as the targeted end-users are visually impaired persons. Since the system is able to track

3D locations of the user’s body joints and the pointed spot in real time, pointing gesture can be used as a virtual cane. Therefore, pointing gesture in this chapter returns additional information which is the distance between the user’s hand and the pointed spot. The use of this information will be described in Section 3.2.2.

* Parts of this chapter have been presented at ARATA (Australian rehabilitation Assistive Technology Association) National Conference 2012 & BRC 2013 (4th ISSNIP Biosignals & Biorobotics Conference) and published. (Appendix A, (1) and (2)) 38

Pointing gesture recognition consists of three steps, which are body joints detection, pointing direction estimation and pointing gesture detection, as shown in Figure 3.1.

This section describes how each step is achieved and the methods used.

Figure 3.1: Three Steps for Pointing Gesture Recognition (Body Joints Detection → Pointing Direction Estimation → Pointing Gesture Detection)

3.1.1. Body Joints Detection

First of all, a full scan of the scene is done to generate a 3D disparity map in real time, from which the 3D coordinates of the point clouds in the frame are read. Each pixel of the generated 3D disparity map contains x, y, z coordinates that can be used for calculating vectors and distances.

With the OpenNI framework and the generated 3D disparity map, Kinect can track human body joints in real time and obtain the 3D coordinates of each joint using a skeleton-tracking module, which draws and tracks the skeleton of the user's body as shown in Figure 3.2 below.


Figure 3.2: Skeleton Tracking Example [45]

The old version of OpenNI required users to perform a calibration pose, as shown in Figure 3.3 below, which was necessary for skeleton tracking. However, the new version of OpenNI does not require the calibration pose, but it does require the user's entire body to be within the Kinect camera's field of view. The ideal working distance for tracking is 1 m - 3.5 m, though the practical range is longer, as mentioned in Section 2.1.3, and the recent version of Kinect includes Near mode, which lowers the minimum distance limit to 40 cm.

For the best performance, the user should not make very fast motions, as they may cause tracking failure. The user also has to stay in the "sight" of the Kinect, which therefore needs to be mounted where it can clearly see the user's whole body anywhere in the room [46]. The coordinates obtained can be used to calculate the pointing vector from the pointing gesture. The body joints required by this system are the head, both hands and the right elbow. The elbow and right hand are used for pointing direction estimation, as the person does not have to fully stretch the arm to point, whereas pointing gestures estimated from the head-hand or shoulder-hand line require them to do so. The head and left hand are used as the trigger input for pointing gesture detection, which is explained in detail in Section 3.2.1.

Figure 3.3: Calibration Pose [46]

3.1.2. Pointing Direction Estimation

The basic idea of pointing direction estimation is to extract a pointing vector that connects the elbow and hand and to return the first point cloud that lies on that vector.

A pointing gesture usually marks a single point, but in this work the system marks all the point clouds which are not exactly on the pointing vector but are still close enough to it.

There are two reasons for marking multiple point clouds. The first is that Kinect fails to collect depth information on some transparent or reflective surfaces, because it projects an infrared (IR) laser and calculates depth from the IR pattern, which is not visible on such surfaces. The second reason is that the pointing gesture in this chapter is mainly used by visually impaired persons, who use it to search for objects they cannot see. It should therefore work like a torch, which creates a circle of light on the surface, making it easier to find objects that the person cannot see directly. To decide whether each point cloud in the 3D disparity map is close enough to the pointing vector, the angle difference θ between the pointing vector and the vector connecting the elbow and the point cloud is calculated.

Figure 3.4: 3D Space Created by User's Arm and a Point Cloud

The boxes in Figure 3.4 above represent the 3D spaces formed between the elbow, hand and point cloud. These points are the centre points of each body joint captured by Kinect. The Euclidean distance equations below show how θ is calculated, where the subscripts h, e and o denote the hand, the elbow and the point cloud respectively. The lengths of the hypotenuses of the ground planes of each box are:

α = √((X_h − X_e)² + (Z_h − Z_e)²)        (3.1)

β = √((X_o − X_h)² + (Z_o − Z_h)²)        (3.2)

γ = √((X_o − X_e)² + (Z_o − Z_e)²)        (3.3)

The distances between the hand and elbow, between the point cloud and the hand, and between the point cloud and the elbow are then:

a = √(α² + (Y_h − Y_e)²)        (3.4)

b = √(β² + (Y_o − Y_h)²)        (3.5)

c = √(γ² + (Y_o − Y_e)²)        (3.6)

The angle difference θ is then obtained from the cosine rule:

cos θ = (a² + b² − c²) / (2ab)        (3.7)

If θ is smaller than a predefined threshold, the point cloud is marked. All the marked point clouds form a circle of dots, which will be referred to as the estimated pointing zone.

The threshold on θ controls the size of the estimated pointing zone: only points close enough to the line connecting the elbow and hand are marked (see Figure 3.5), and a larger threshold gives a larger estimated pointing zone with lower accuracy, as shown in Figure 3.5. After trial and error, the threshold for θ was set at 3°, which was found to provide a good trade-off between efficiency and accuracy. The circle of red dots in Figure 3.6 is the estimated pointing zone; some dots are missing because of the reflective surface of the whiteboard.
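As a concrete illustration of the marking test, the sketch below computes θ as the angle between the pointing vector (elbow to hand) and the vector from the elbow to each point-cloud entry, following the description at the start of this subsection, and marks the entries below the threshold. The function and variable names, and the restriction to points lying beyond the hand, are illustrative assumptions rather than the thesis implementation.

```python
import numpy as np

def estimated_pointing_zone(elbow, hand, points, threshold_deg=3.0):
    """Return a boolean mask over `points` marking the estimated pointing zone.

    `elbow` and `hand` are 3D joint coordinates; `points` is an (N, 3) array of
    point-cloud coordinates from the disparity map. An entry is marked when the
    angle between the pointing vector (elbow -> hand) and the vector from the
    elbow to the point is below the threshold (3 degrees in this chapter).
    """
    elbow = np.asarray(elbow, dtype=float)
    hand = np.asarray(hand, dtype=float)
    points = np.asarray(points, dtype=float)

    pointing = hand - elbow                      # pointing vector
    to_points = points - elbow                   # elbow -> each point-cloud entry

    cos_theta = (to_points @ pointing) / (
        np.linalg.norm(to_points, axis=1) * np.linalg.norm(pointing) + 1e-9)
    theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

    # Added assumption: only mark points that lie beyond the hand, not on the arm
    beyond_hand = np.linalg.norm(to_points, axis=1) > np.linalg.norm(pointing)
    return (theta < threshold_deg) & beyond_hand
```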

Figure 3.5: Estimated Pointing Zone Created by θ.

3.1.3. Pointing Gesture Detection

The system uses both intentional and unintentional pointing gestures. No pointing gesture detection is needed for unintentional gestures, since they are treated as random movements of the arm. In this work, every movement of the visually impaired user is considered an unintentional pointing gesture, since the user is searching for a target rather than pointing at one. However, when only an intentional pointing gesture performed by a sighted person should be captured, a trigger is required to distinguish between intentional and unintentional gestures. A simple and computationally cheap method is used to detect an intentional pointing gesture: because the system tracks the user's head, elbow and both hands in real time, when the real-world y-coordinate of the user's non-pointing hand is higher than the real-world y-coordinate of the user's head, as in Figure 3.6, the trigger alerts the system that the user is performing an intentional pointing gesture. The use of this trigger for intentional gestures is explained in Section 3.2.1.
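A minimal sketch of this trigger logic is given below; requiring the condition to hold for a few consecutive frames is an added robustness assumption, and the class and parameter names are illustrative.

```python
class PointingTrigger:
    """Detects the 'raise the non-pointing hand above the head' trigger."""

    def __init__(self, hold_frames=5):
        # hold_frames is an assumed debounce, not stated in the thesis
        self.hold_frames = hold_frames
        self.count = 0

    def update(self, head_y, non_pointing_hand_y):
        """Feed real-world y-coordinates once per frame; returns True when the
        intentional pointing gesture should be signalled to the system."""
        if non_pointing_hand_y > head_y:
            self.count += 1
        else:
            self.count = 0
        return self.count >= self.hold_frames
```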

Figure 3.6: Tracking User and Recording a Pointed Object to the Database.

3.2. System Description

The system consists of two parts, namely offline training and object navigation, as shown in Figure 3.7. These two parts are linked to a database which is updated by a sighted person and used by the visually impaired user. The sighted person who performs offline training to register objects in the database will be referred to as “assistant” and the blind or visually impaired person who uses object-based navigation will be referred to as “user” from now on. This section describes how each system part uses the pointing gesture for different purposes, their functionality and how they are integrated to assist visually impaired persons to navigate.

Figure 3.7: System Overview

3.2.1. Offline Training

Offline training must be performed before object-based navigation can start. It must be carried out by a sighted user, since this process uses the pointing gesture for object selection: the coordinates of the objects are recorded in a database by pointing at each of them in sequence.

First of all, the assistant should have a list of the objects that are to be recorded. Once the program starts running, Kinect detects the assistant, obtains the 3D coordinates of her head, elbow and hands, and starts tracking them. While the assistant points at an object to be recorded, she simply raises her other hand above her head to signal to the system to record the location of the object, as shown in Figure 3.6. The 3D coordinates of the marked point clouds are stored in an array and their centre point is obtained. The coordinates of the centre point are then converted to real-world coordinates and recorded in the database, and the system tells the assistant via the speech synthesizer that the object has been successfully recorded. When the location of a pointed object is too close to that of an object already saved in the database, the system skips recording it to prevent duplication. Once all objects are recorded, the assistant types the names of the objects manually into the database in the order of recording. An example scenario¹ for offline training is shown in Figure 3.8.

¹ Demo video available at https://www.dropbox.com/s/ih51k5cw0agbxrv/offline%20training.wmv

Figure 3.8: Scenario for Offline Training
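The recording step can be sketched as follows, assuming the marked point clouds of the current estimated pointing zone are available as an array of real-world coordinates. The duplicate-distance threshold is illustrative, as the thesis does not state the exact value used.

```python
import numpy as np

DUPLICATE_RADIUS = 0.30   # metres; illustrative value, not specified in the thesis

def record_object(database, zone_points):
    """Store the centre of the estimated pointing zone as a new object.

    `database` is a list of 3D real-world object coordinates recorded so far;
    `zone_points` is the array of marked point clouds for the current pointing
    gesture. Returns True when a new object was added, or False when recording
    was skipped because the location duplicates an existing entry.
    """
    centre = np.asarray(zone_points, dtype=float).mean(axis=0)
    for existing in database:
        if np.linalg.norm(centre - np.asarray(existing, dtype=float)) < DUPLICATE_RADIUS:
            return False              # too close to an already recorded object
    database.append(centre)
    return True
```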

3.2.2. Object Navigation

Once all the objects have been successfully registered by the assistant using offline training, the user can start using object-based navigation. First of all, the user signals to the system by raising his/her left hand above his/her head while performing a pointing gesture. The system returns the direction to turn in and the orientation, based on the user's pointing direction and the objects in the vicinity. For example, in a situation as in Figure 3.9, the system will tell the user "Object 1 is α degrees anti-clockwise from your pointing direction. Object 2 is β degrees clockwise from your pointing direction". The turning angle and the direction are calculated using a method similar to finding θ in Figure 3.4; the only difference is that the turning angles are calculated using only the x and z coordinates, from a top view.

Figure 3.9: Angle Difference between User's Pointing Direction and Objects.
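One possible way to compute such a turning angle from the top view is sketched below; the coordinate layout (x at index 0, z at index 2), the use of the hand as the vertex and the sign convention are assumptions for illustration.

```python
import numpy as np

def turning_angle_deg(elbow, hand, obj):
    """Signed angle (degrees) between the pointing direction and an object as
    seen from above, using only the x and z coordinates. Positive is taken as
    clockwise and negative as anti-clockwise (an assumed convention)."""
    e, h, o = (np.asarray(p, dtype=float) for p in (elbow, hand, obj))
    pointing = np.array([h[0] - e[0], h[2] - e[2]])   # x-z components of elbow -> hand
    to_obj = np.array([o[0] - h[0], o[2] - h[2]])     # x-z components of hand -> object
    ang = np.degrees(np.arctan2(to_obj[1], to_obj[0]) -
                     np.arctan2(pointing[1], pointing[0]))
    return (ang + 180.0) % 360.0 - 180.0              # wrap into (-180, 180]
```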

Then the user can start searching for an object by moving his/her forearm around. When any point cloud in the estimated pointing zone created by the user's pointing gesture is close to one of the objects in the database, the system tells the user what the object is and how far it is from his/her hand. If there are multiple objects in the estimated pointing zone, the system provides information on each of them. If the pointed object is the one the user is looking for, the user simply needs to move in that direction, and the system tells the user that he/she has arrived at the object when the user's hand is close enough to it.
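A sketch of this matching step is given below; the radius used to decide that a registered object lies inside the estimated pointing zone is illustrative, and in the real system the returned names and distances would be passed to the speech synthesizer.

```python
import numpy as np

ANNOUNCE_RADIUS = 0.25   # metres; illustrative threshold, not given in the thesis

def objects_in_zone(zone_points, hand, database, names):
    """Yield (name, distance_from_hand) for every registered object that lies
    close to at least one marked point cloud in the estimated pointing zone."""
    zone = np.asarray(zone_points, dtype=float)
    hand = np.asarray(hand, dtype=float)
    for name, position in zip(names, database):
        pos = np.asarray(position, dtype=float)
        if np.min(np.linalg.norm(zone - pos, axis=1)) < ANNOUNCE_RADIUS:
            yield name, float(np.linalg.norm(pos - hand))
```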

Obstacle detection is also implemented for safety, to prevent potentially dangerous situations such as the user accidentally touching hazardous obstacles like sharp-edged, hot or fragile objects. The system measures the distance between the hand and whatever the forearm is pointing at. When this distance falls below a threshold, currently set at 50 cm, the system recognizes that something is very close to the user's hand and checks whether it is one of the objects recorded in the database. If not, the system treats it as a potentially hazardous obstacle and alerts the user with a continuous beep. An example scenario² for object navigation is shown in Figure 3.10.

² Demo video available at https://www.dropbox.com/s/lxlqj1xba1gr6yu/object%20navigation.wmv

Figure 3.10: Scenario for Object Navigation
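The obstacle check itself can be sketched as follows, using the 50 cm threshold stated above; the radius used to recognize an already registered object is an illustrative assumption.

```python
import numpy as np

OBSTACLE_DISTANCE = 0.5     # metres, as stated in the text
KNOWN_OBJECT_RADIUS = 0.25  # metres; illustrative, not given in the thesis

def obstacle_alert(hand, pointed_point, database):
    """Return True when something unregistered is within 0.5 m of the hand,
    i.e. when the continuous warning beep should be played."""
    hand = np.asarray(hand, dtype=float)
    pointed = np.asarray(pointed_point, dtype=float)
    if np.linalg.norm(pointed - hand) >= OBSTACLE_DISTANCE:
        return False
    for obj in database:
        if np.linalg.norm(pointed - np.asarray(obj, dtype=float)) < KNOWN_OBJECT_RADIUS:
            return False            # a registered object, not an obstacle
    return True
```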

3.3. Experimental Results

Two types of experiments were performed. The first one compared pointing gesture accuracy using different body joints for pointing direction estimation. The second one tested general system performance.

3.3.1. Pointing Gesture Accuracy Comparison for Different Body Joints

An experimental module was implemented to test the accuracy of pointing gestures using different body joints for pointing direction estimation. Three different joint pairs were tested, namely the head-hand, shoulder-hand and elbow-hand lines. It was expected that estimating from these three joint pairs would result in different pointing directions, as in Figure 3.11.

Figure 3.11: Expected Differences in Pointing Directions Estimated from Different Body Joints

The experimental module marks, in a different colour, the point cloud that lies on the pointing vector from each joint pair. While the user performed a pointing gesture looking at a target on the wall, another person marked the corresponding points on the wall while watching the computer screen, to ensure that the point from each of the three joint pairs was marked correctly. The distances between the actual target and each of the three marked points were then calculated. Since the error scales with the distance of the target from the user's hand, each error was divided by the distance between the target and the user to compensate for this. Both the position of the target and of the user were varied in order to test the accuracy of the pointing gestures at angles ranging from -150 to -30 degrees and from 30 to 150 degrees.

Angles close to 180 or -180 degrees were not measured, because the user would then be pointing either directly at the Kinect or directly away from it, in which case the Kinect cannot see the user's arm. The number of iterations was 50 for each angle, and the scaled average error and standard deviation of the pointing gestures estimated from each joint pair were calculated (Table 3.1) and plotted in Figure 3.12.

Table 3.1: Scaled Average Error (m/m) and Standard Deviation (m/m) for Each Joint Pair at Different Angles

             Head-Hand            Shoulder-Hand        Elbow-Hand
Angle (°)    Error     Std Dev    Error     Std Dev    Error     Std Dev
-150         0.442     0.0219     0.342     0.0171     0.364     0.0169
-135         0.434     0.0155     0.334     0.0166     0.331     0.0165
-120         0.426     0.0138     0.326     0.0161     0.319     0.0162
-105         0.422     0.0152     0.322     0.0158     0.305     0.0163
-90          0.418     0.0145     0.318     0.0159     0.286     0.0159
-75          0.421     0.0139     0.321     0.0155     0.310     0.0157
-60          0.428     0.0147     0.328     0.0162     0.322     0.0136
-45          0.433     0.0142     0.333     0.0165     0.337     0.0162
-30          0.447     0.0157     0.347     0.0169     0.358     0.0168
30           0.452     0.0155     0.352     0.0168     0.349     0.0169
45           0.446     0.0148     0.346     0.0162     0.334     0.0162
60           0.422     0.0150     0.322     0.0164     0.311     0.0159
75           0.423     0.0138     0.323     0.0159     0.306     0.0156
90           0.429     0.0140     0.329     0.0158     0.278     0.0159
105          0.439     0.0143     0.339     0.0160     0.310     0.0160
120          0.437     0.0138     0.337     0.0161     0.324     0.0164
135          0.449     0.0150     0.349     0.0166     0.341     0.0166
150          0.456     0.0157     0.356     0.0171     0.361     0.0168

[Chart: scaled average error (m/m) versus pointing angle (°) for each joint pair (HH, SH, EH)]

Figure 3.12: Experimental Results for Comparison of Pointing Gestures Estimated from Different Body Joints (HH = Head-Hand, SH = Shoulder-Hand, EH = Elbow-Hand)

The experimental results showed that the elbow-hand line gave the best performance, slightly better than the shoulder-hand line, while the head-hand line performed relatively poorly compared to the other two. The horizontal difference between the points marked by each joint pair was small, but the vertical difference was large, especially for the head-hand line, whose marked points fell far lower than those of the other two.

The results matched the expectation in Figure 3.11, except that the shoulder-hand and elbow-hand lines were much closer to each other than expected. Although the head-hand line was the most inaccurate, its average error was very consistent regardless of the pointing angle. Its standard deviation was also the smallest, while there was no significant difference in standard deviation between the shoulder-hand and elbow-hand lines. Compared to the head-hand line, the shoulder-hand and elbow-hand lines were relatively dependent on the pointing angle. This could be because the head position is least affected by body orientation, as it remains still even when the pointing angle changes, and the head is relatively easy for Kinect to detect compared to the shoulder and elbow at any camera angle.

3.3.2. System Performance

The system performance experiments were performed in two differently sized rooms as shown in Figures 3.13 and 3.14. The small room was approximately 5m x 4m, and the large room approximately 7m x 5m.

Figure 3.13: Small Room

Figure 3.14: Large Room

To test for accuracy, the success rate of pointing gesture recognition was measured by counting the number of successful object detections by the system while the user performed pointing gestures at objects without looking at the screen. To test for effectiveness, the total time taken to search for all the objects was measured and the average time taken per object computed. The pointing gesture detection rate was measured by counting the number of successful detections when the user raised his/her left arm above his/her head while performing the pointing gesture. The distance error was found by measuring the actual distance between the object and the user's hand and the distance computed by the system, and calculating the difference between them. The experiments were carried out blindfolded, as shown in Figure 3.15, to test the system's effectiveness in a real-life situation for visually impaired persons.

The success rates for pointing gesture recognition and pointing gesture detection and the distance error were measured without blindfolding, to compare the expected and actual outcomes. Five objects were placed in the small room experiment and eight in the large room experiment. More objects could not be tested, as only objects of a certain size were used, and too many of them would overlap one another, leading to Kinect misrecognizing the objects. The number of trials was 50 for each experiment, and the experimental results are shown in Table 3.2.


Figure 3.15: Blindfold Testing

Table 3.2: Experimental Results

                     Without Blindfolding                                  With Blindfolding
Room Size            Pointing Gesture       Pointing Gesture   Distance    Time taken    Obstacle
                     Recognition Rate (%)   Detection Rate     Error       per object    Detection Rate
Small (5 m x 4 m)    92%                    94%                0.14 m      56.4 s        94%
Large (7 m x 5 m)    96%                    100%               0.18 m      68.2 s        98%

It was found that the system generally works better in the larger room; this could be because it is difficult for Kinect to track the person's whole body in a small room due to the limited angle of view. It was also found that when the person was too close to the Kinect, so that the person's lower body was barely visible, the coordinates of the person's head, hand and elbow fluctuated due to unstable tracking, which had negative effects on the pointing gesture recognition, obstacle detection and pointing gesture detection results. The large room experiment took longer per object; this could be because the testing environment is larger and the user had to travel a greater distance to find each object, as the objects were widely spaced in the room. The large room experiments also had a larger distance error, which could be because the tested objects were further from the user's hand than those in the small room, and longer distances between the user's hand and the object resulted in larger distance errors.

The results show that the system is robust and accurate, provided the user's pointing direction is not parallel to the direction that the Kinect is facing. There is a short time lag of 0.5-1 second due to computation time, but during tests this was not found to be significant enough to affect system effectiveness; more exhaustive tests should throw more light on this aspect. Because the main purpose of this system is to help visually impaired persons adapt to new environments, it is believed that once they have localized the object positions, they can find the objects in a shorter time and eventually without the need for this system, which is a desirable goal.

3.4. Limitations

The system's limitations are that the room must be small enough for Kinect to see all objects from a fixed position, due to Kinect's limited depth range, and that the objects must be in fixed locations. Objects such as a fridge, door or closet are therefore handled better than small objects such as a cup, pen or remote control that may be moved easily. There is also a blind spot when the user's pointing direction is parallel to the direction the Kinect is facing; in this case, Kinect loses track of the user's arm, and multiple cameras may be a viable way to avoid this situation.

3.5. Remarks

In this chapter, a novel approach to natural assistive technology for visually impaired persons using pointing gestures was presented. Once trained by a sighted assistant, the system can tell visually impaired persons where the important objects are and how far they are from the user's hand. It is helpful to visually impaired persons who need to adjust to a new home or to places such as a hospital room or guest room. There is currently no known work on assisting visually impaired persons with pointing gestures without any wearable devices.

One possible direction for future work is to replace the Kinect with a depth camera that has a longer depth range, so that the system can be used in larger rooms with more objects. The system might then be unable to detect the elbow with reasonable accuracy, since only Kinect provides that; in that case the shoulder-hand line, which was found to be not much less accurate than the elbow-hand line, may be used instead.

The system presented in this chapter was built to assist visually impaired persons. In the next chapter, another system, which helps disabled persons to drive a vehicle, is presented.

Chapter 4

Single-handed Driving System**

With advances in technology, people are constantly looking for more efficient and smarter ways to improve their daily lives. For example, old cell phones have "evolved" from folding phones into smartphones that replace "pressing" buttons with "touching" the screen and offer many more functions. The international airports in Australia and New Zealand now use SmartGate [47], which automatically processes the ePassport [48] information of Australian and New Zealand citizens and checks the traveller's face with a face-recognition system, comparing the picture in the passport with the picture taken by the SmartGate, instead of the normal immigration checks. These examples show that Human-Computer Interaction (HCI) may be applied to our daily lives very broadly, and there may be more applications in the future. The main objective of this chapter is to create another successful example of HCI with hand gestures that may be applied to driving wheeled vehicles, replacing the steering wheel or a remote controller. In the field of User Interface (UI) research, recent interest has shifted from the Graphical User Interface (GUI) to the Natural User Interface (NUI), which is more direct and intuitive. There have been some vision-based systems [49, 50] that drive robots or play racing games with Kinect using hand gestures. The main idea is to replace a remote controller or steering wheel with human hands or arms. Most applications use open source software such as OpenNI or the Kinect SDK [51] to enable Kinect to automatically track the body joints of a human, then use the location of the hands or arms to control or drive robots or vehicles.

However, all those past works on vision-based driving systems require both hands to drive the robots or vehicles. Also, they cannot issue any commands other than steering or triggering features of the robot such as fork lifting, grabbing objects or taking pictures. This chapter introduces a system that requires the use of only one hand to drive and control a vehicle or robot with Kinect. The challenge is to extract as much information as possible from a single hand in a robust way to achieve maximal functionality, so that users may drive or control vehicles safely with minimal difficulty. The single-handed driving system uses the functions shown in Figure 4.1 for hand gesture recognition and tracking. This chapter presents how each method is used in the system and how the system is designed to support vehicles with different types of steering.

** Parts of this chapter have been accepted for presentation at the HCII 2013 conference (Appendix A, (3)).

[Diagram: hand segmentation, filtering, hand contour, convexity defects, fingertip detection, centre of the hand and angle of rotation]

Figure 4.1: System Overview

4.1. Hand Segmentation

First of all, the hand has to be detected by hand segmentation, which separates the hand from the background. Depth segmentation was chosen for detecting the hand. The reason is that when the system is installed on an actual vehicle such as a car or wheelchair in the future, the lighting of the environment will vary, leading to poor results with colour segmentation. In addition, Kinect can be used to generate a 3D disparity map that returns the depth value of each pixel in the frame, as previously discussed in Section 3.1.1. After generating the 3D disparity map, the hand can be seen more clearly, as shown in Figure 4.2, because it is much closer to the Kinect than the background.

Figure 4.2: Hand in 3D Disparity Map

Under the assumption that the user's hand is always closest to the camera, the hand was segmented by finding the smallest depth value after scanning the whole frame, then thresholding the frame at that value plus a constant (the average hand thickness). After thresholding, some noise remained. As noise can badly affect the hand contour and the convexity defects that are later used for fingertip detection, the frame was filtered by first eroding, then dilating and lastly blurring the image, as shown in Figure 4.3. The difference is not dramatic, but the filtering removes the random dots appearing near the hand.

Figure 4.3: Segmented Hand Before Filtering (Left) & After Filtering (Right)

The centre of the segmented hand was found by enclosing the hand in a minimum-sized circle and taking the centre point of that circle. This centre point is displayed as shown in Figure 4.4 and tracked in real time so that the steering angle can be measured later on.

Figure 4.4: Centre of the Segmented Hand Being Tracked Successfully
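The segmentation pipeline just described (nearest-depth thresholding, erosion, dilation and blurring, followed by the minimum enclosing circle) can be sketched with OpenCV as below, assuming a 16-bit depth frame in millimetres; the hand-thickness constant, the kernel size and the use of a median blur for the final smoothing step are illustrative choices rather than the thesis's exact parameters.

```python
import cv2
import numpy as np

def segment_hand(depth_frame, hand_thickness_mm=100):
    """Segment the closest object (assumed to be the hand) from a depth frame.

    `depth_frame` is a uint16 depth image in millimetres. Returns the binary
    mask and the centre of the minimum enclosing circle of the largest contour.
    """
    valid = depth_frame[depth_frame > 0]
    if valid.size == 0:
        return None, None
    nearest = int(valid.min())

    # Keep pixels between the nearest depth and nearest + hand thickness
    mask = ((depth_frame >= nearest) &
            (depth_frame <= nearest + hand_thickness_mm)).astype(np.uint8) * 255

    # Noise filtering: erode, dilate, then blur, as described in the text
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.erode(mask, kernel)
    mask = cv2.dilate(mask, kernel)
    mask = cv2.medianBlur(mask, 5)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return mask, None
    hand = max(contours, key=cv2.contourArea)
    (cx, cy), _radius = cv2.minEnclosingCircle(hand)   # centre of the hand
    return mask, (int(cx), int(cy))
```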

4.2. Fingertip Detection

After implementing hand region detection, it was concluded that hand trajectory recognition would take too long, since a trajectory must first be drawn and then recognized. Hand trajectory recognition was originally intended for issuing commands and shifting gears, but as it would consume a significant amount of processing time while the hand-driven system runs in real time, it was replaced with fingertip detection, which is computationally cheaper, so that the number of extended fingers can issue commands or shift gears in a faster and more distinctive way.

Fingertips were detected using OpenCV functions. Once the hand is segmented, a hand contour is drawn and its convex hull is found from the points lying on the contour, which are stored in a matrix. The inner angle at each hull corner is then calculated, within the upper region of the hand only, since the fingers are always in the upper region under the assumption that the hand is upright. Hull corners with an inner angle lower than a predefined threshold are marked as fingertips and drawn in the frame, as shown in Figure 4.5 below.

Figure 4.5: Fingertips Detected Successfully
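The sketch below follows a common OpenCV recipe for this step using cv2.convexHull and cv2.convexityDefects; note that it measures the inner angle at the defect (valley) point between adjacent hull corners rather than at the hull corners themselves, and the 60° threshold and upper-region test are illustrative assumptions rather than the thesis's exact parameters.

```python
import cv2
import numpy as np

def count_fingertips(mask, max_inner_angle_deg=60.0):
    """Rough fingertip counter for a binary hand mask."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0
    hand = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return 0

    x, y, w, h = cv2.boundingRect(hand)
    fingertips = 0
    for start_idx, end_idx, far_idx, _depth in defects[:, 0]:
        s = hand[start_idx][0]          # candidate fingertip (hull corner)
        e = hand[end_idx][0]
        f = hand[far_idx][0]            # valley between two fingers
        # Inner angle at the valley point between the two hull corners
        a, b = s - f, e - f
        cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
        if angle < max_inner_angle_deg and s[1] < y + h / 2:   # upper region only
            fingertips += 1
    return fingertips
```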

4.3. System Description

The system supports both steering types, namely nonholonomic steering and differential steering. Nonholonomic steering is for vehicles that have four wheels and change direction by rotating the front wheels, like a car. Differential steering is for two-wheeled vehicles such as a wheelchair or robot; these vehicles change direction by applying a different velocity to each wheel, so that one wheel moves faster than the other, which makes the vehicle turn.

Table 4.1: Comparison between the two steering types

                          Nonholonomic Steering            Differential Steering
Vehicles                  Cars                             Robot and electronic wheelchair
Acceleration/Brake        Depth information of the hand    Number of fingertips
Gear Shift                Number of fingertips             n/a
Change modes/functions    n/a                              Number of fingertips

As shown in Table 4.1, the depth information of the hand and the number of fingertips are used to switch between acceleration, brake and gears. Because vehicles with nonholonomic and differential steering have different characteristics, these two cues are used differently for each steering type, as shown in Figures 4.6 and 4.7 below.

Figure 4.6: Distinction Between Acceleration (green), Idle (blue) and Brake (red) Using Depth Thresholding

Figure 4.7: Switching Gears Using Different Number of Fingertips
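Switching between acceleration, neutral and brake by depth then reduces to comparing the hand's distance from the Kinect against two boundaries, as in the sketch below; the zone limits are illustrative, since the thesis does not report the exact depth values used.

```python
# Depth boundaries in millimetres are illustrative assumptions.
ACCEL_MAX_MM = 600      # hand closer than this -> acceleration
BRAKE_MIN_MM = 900      # hand further than this -> brake

def drive_state(hand_depth_mm):
    """Map the hand's depth (distance from the Kinect) to a driving state."""
    if hand_depth_mm < ACCEL_MAX_MM:
        return "acceleration"
    if hand_depth_mm > BRAKE_MIN_MM:
        return "brake"
    return "neutral"
```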

4.3.1. Nonholonomic Steering

As previously mentioned, nonholonomic steering is for vehicles such as a car, and the system can perform steering, acceleration, braking and gear shifts using the information from a single hand, as shown in Figure 4.8.

Figure 4.8: How each feature of the car is achieved with a single hand

The steering angle for nonholonomic steering is measured in a way similar to real car steering. In the real world, the driver steers a car by first grabbing the steering wheel, then rotating it with the hands holding it. In this work, since only one hand is used for steering, the steering angle is measured by tracking the movement of the hand while it holds a "virtual" steering wheel and measuring the angle of rotation, with the centre of the frame as the centre of rotation. To distinguish between the steering state (hand holding the steering wheel) and the non-steering state (hand not holding it), the open/closed status of the hand is used: the two are distinguished by the number of fingertips detected, with five fingertips representing an open hand and zero fingertips a closed hand.

Figure 4.9: Four different steps for nonholonomic steering algorithm

There are four different steps for nonholonomic steering, as shown in Figure 4.9. In the first step, the user moves the hand while it is open, so the steering mode is not activated. In the second step, the user has just closed the hand to begin steering; the steering mode is activated and the coordinates of the initially closed hand, C0, are recorded. The steering angle is still zero, since the hand has just closed and has not moved, but the system is now ready to measure the steering angle. In the third step, the user moves the hand while it is closed. During this state, the steering angle is obtained by measuring the angle difference between the initial position of the closed hand and its current position, with the centre of the frame as the centre of rotation. The steering angle measuring algorithm divides the frame into quadrants, and the steering angle within the same quadrant is measured using simple trigonometry, as shown in Figure 4.10 below.


Figure 4.10: Steering angle calculation

α = tan⁻¹((X_0 − C_x) / (Y_0 − C_y))        (4.1)

β = tan⁻¹((X_1 − C_x) / (Y_1 − C_y))        (4.2)

θ = β − α        (4.3)

where (X_0, Y_0) is the initial position of the closed hand, (X_1, Y_1) is its current position and (C_x, C_y) is the centre of the frame.

The previous position of the centre of the hand was stored in a buffer; when it moves to another quadrant, the system checks the quadrant that the hand came from, then the angle of the rotation from the previous quadrant is added or subtracted depending on the direction of rotation. Because the system counts the number of quadrants that the hand has crossed (incrementing for clockwise direction, decrementing for anti-clockwise direction) in real time, there is no limit on the number of cycles for steering.
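The accumulation of the steering angle across frames can be sketched as follows. Instead of the explicit quadrant bookkeeping described above, this sketch unwraps the frame-to-frame change of the hand's angle about the frame centre, which likewise removes any limit on the number of rotations; whether the accumulated angle should be kept or reset when the hand is re-opened is not specified in the text, and this sketch keeps it.

```python
import math

class SteeringTracker:
    """Accumulates the steering angle of a closed hand rotating about the
    frame centre, with no limit on the number of full rotations."""

    def __init__(self, frame_centre):
        self.cx, self.cy = frame_centre
        self.prev_angle = None      # hand angle in the previous frame
        self.steering_deg = 0.0     # accumulated steering angle

    def _hand_angle(self, hand_centre):
        x, y = hand_centre
        # Matches the form of Equations (4.1) and (4.2)
        return math.degrees(math.atan2(x - self.cx, y - self.cy))

    def update(self, hand_centre, hand_closed):
        if not hand_closed:                      # wheel released: drop the reference
            self.prev_angle = None
            return self.steering_deg
        angle = self._hand_angle(hand_centre)
        if self.prev_angle is None:              # hand has just closed (C0)
            self.prev_angle = angle
            return self.steering_deg
        delta = angle - self.prev_angle
        delta = (delta + 180.0) % 360.0 - 180.0  # wrap frame-to-frame change
        self.steering_deg += delta
        self.prev_angle = angle
        return self.steering_deg
```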


4.3.2. Differential Steering

Differential steering is simpler than nonholonomic steering, because two-wheeled vehicles move in a simpler way. As previously mentioned, differential steering is achieved by applying a different velocity to each wheel, and this difference in velocity controls the rate of rotation. Since the desired output is the rate of rotation, the system generates it by measuring the angle of rotation (θ) of the hand in the same way as for nonholonomic steering, except that the initial position of the closed hand is replaced by the y-axis at the centre of the screen, as shown in Figure 4.11.

Figure 4.11: How differential steering works

The main difference is that the differential steering mode uses the number of fingertips to switch between forward, backward and neutral (idle) modes, while the nonholonomic steering mode uses the depth information of the hand to switch between acceleration, brake and neutral (no acceleration, no brake). The reason is that robots and electronic wheelchairs do not have gears, so enough fingertip counts remain free to assign to commands such as move forward, move backward and stop, without the need to use the depth value. As robots and electronic wheelchairs do not support gradual acceleration or deceleration like a car, it is more appropriate to use the number of fingertips, which is more distinctive than the depth value, the latter being better suited to acceleration and deceleration.
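One possible mapping from the single-hand cues to differential wheel velocities is sketched below; the fingertip counts assigned to forward, backward and neutral, and the scaling of the rotation angle into a velocity difference, are illustrative assumptions rather than the thesis's exact mapping.

```python
import math

# Assumed fingertip-count to command mapping for illustration only.
FINGER_COMMANDS = {3: "forward", 2: "backward", 0: "neutral"}

def wheel_velocities(hand_centre, frame_centre, base_speed, fingertips):
    """Return (left_wheel, right_wheel) velocities for differential steering.

    The rotation angle of the hand about the frame centre, measured from the
    vertical axis, sets the velocity difference between the two wheels.
    """
    command = FINGER_COMMANDS.get(fingertips, "neutral")
    if command == "neutral":
        return 0.0, 0.0
    sign = 1.0 if command == "forward" else -1.0
    dx = hand_centre[0] - frame_centre[0]
    dy = hand_centre[1] - frame_centre[1]
    theta = math.degrees(math.atan2(dx, dy))        # angle from the vertical axis
    turn = max(-1.0, min(1.0, theta / 90.0))        # normalised turn rate (assumed scale)
    left = sign * base_speed * (1.0 + turn)
    right = sign * base_speed * (1.0 - turn)
    return left, right
```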


4.4. Experimental Results

For both steering modes, the accuracy of the steering and the success rates for switching between gears, acceleration, neutral and brake were measured to test general system performance. While the user performed steering with the system, steering accuracy was measured by calculating the difference between the steering angle expected by the user and the steering angle computed by the system. The number of iterations was 50, with different steering angles and starting points and with the hand initially closed. Success rates for the gear shift using fingertips and for switching between acceleration, neutral and brake using depth values were also measured over 50 trials. The experiments were performed under two different conditions: one with visual feedback, which allows users to see how their hands are being tracked and the current system status, and the other without visual feedback. The purpose of testing the system under these two conditions was to find out how much users depend on visual feedback while using the system and what the consequence of not providing it is. The results are shown in Table 4.2.

Table 4.2: Experimental Results

                        With Visual Feedback    Without Visual Feedback
Steering Angle          92.8%                   89.2%
Parking (4 Fingers)     98%                     94%
Drive (3 Fingers)       96%                     94%
Reverse (2 Fingers)     98%                     92%
Acceleration            98%                     80%
Neutral                 98%                     82%
Brake                   96%                     86%

[Chart: accuracy (%) measured for each feature, with and without visual feedback]

Figure 4.12: Performance Measured for Each Feature

The experimental results show that the system generally works better when users have visual feedback, which is also clear in Figure 4.12. However, the difference is insignificant for the steering angle and fingertip detection, because users can still see their hands and fingers without watching the screen. For switching between acceleration, neutral and brake the difference is relatively large; the reason could be that the screen displays how far the hand is from the Kinect by using a different colour for each depth range. For inexperienced users it is difficult to estimate how far their hands are from the Kinect, but this can be resolved by using the system several times, after which they come to know the depth ranges for acceleration, neutral and brake. In the real world, car driving is more complicated, since it also includes features such as the indicators, window wipers and lights, which are essential for driving. However, the experimental results show that this system can drive vehicles that require only the implemented features or fewer, such as a robot or electronic wheelchair.


4.5. Simulation

The system was tested with driving simulation software to evaluate how effective it would be and what unexpected problems could occur in real-life situations. The driving simulators used for testing were TORCS [52], Vdrift [53] and Stunt Rally [54], together with a custom-made simulation tool built with Unity3D. Both the nonholonomic and differential steering systems were tested using different simulation tools.

4.5.1. Testing of Nonholonomic Steering

As the physics engines of the driving simulators shown in Figure 4.13 are designed for simple keyboard input only, they were inappropriate for testing nonholonomic steering, because steering angles of 10 and 360 degrees would both be treated as the same arrow-key input. Therefore, steering was separated from the other features and tested differently, by creating a simple simulator showing a steering wheel image that rotates with the steering angle extracted from the user's hand gestures, in order to visualize the performance of the nonholonomic steering system, as shown in Figure 4.14. The other features, such as acceleration, brake, neutral and gears, were visualized by implementing a program that converts the user's hand gestures into keyboard inputs and sends these inputs to the driving simulator: the different depth zones of the hand and the different numbers of fingertips were converted to different keyboard inputs to control the car. As the cars in the driving simulators still need a replacement for the left and right arrow keys that change the direction of the car, nonholonomic steering was modified by vertically dividing the frame into three zones: when the hand is in the left zone the system sends the left arrow key input to the simulator, when the hand is in the right zone it sends the right arrow key input, and when the hand is in the middle zone it sends neither.

Figure 4.13: Different Driving Simulators Tested [52-54]

For steering, it was found that the steering wheel in Figure 4.14 rotated in response to hand movements without significant delay. For the other features, tested with a driving simulator, it was difficult to use the system to control the cars for the first few attempts, as the default speed of the car in the simulator was very fast; after that, it became easier to control the vehicles by hand, with a greatly reduced number of collisions and lap time, though it was still easier to control them with the keyboard. Users might therefore need training before using the system in actual situations.

Figure 4.14: Visualization for Nonholonomic Steering³

³ Demo video available at https://www.dropbox.com/s/wi7hdspo36l0kvf/nonholonomic.wmv

4.5.2. Differential Steering

Driving simulators were inappropriate for simulating differential steering, as they simulate cars with four wheels, while differential steering is for two-wheeled vehicles. Using Unity3D, a simple simulator was implemented to visualize the differential steering system's performance, as shown in Figure 4.15. As Unity3D was not entirely compatible with the software tools used in this system, the program that converts the user's hand gestures to keyboard inputs for the driving simulators was modified to provide different ratios of velocities between the wheels according to the steering performed by the user. The remaining features, such as forward/backward and neutral, were visualized with the same program used in the previous section.

Figure 4.15: Simulation for Differential Steering⁴

⁴ Demo video available at https://www.dropbox.com/s/5yykm802qa4cyn9/differential.wmv

While the full steering behaviour of vehicles such as cars cannot be reproduced with keyboard inputs, vehicles with differential steering, such as a robot, can easily be controlled with keyboard inputs. Therefore, all the functions for differential steering could be simulated by converting the hand gestures into keyboard inputs. The simulation tested various features of differential steering, such as rotation on the spot, acceleration, braking, cornering and parking. The simulation result was satisfactory, as the system in differential steering mode completed all the tasks successfully with 100% accuracy.

Also, the speed of the vehicle in the custom-made simulator was slow, so controlling it was relatively easier than controlling the cars in the driving simulators. It was found that using the number of fingertips for switching between acceleration and brake worked better than using different depth zones of the hand. The reason is that the number of fingertips is discrete, with clear boundaries between counts, while the depth of the hand is continuous and the boundaries between zones are very thin; for example, a depth difference of one unit can switch from acceleration to neutral mode.

As it was difficult to quantify the performance of the differential steering system, another experiment was performed to compare the performance of simulations driven by keyboard inputs with that of the single-handed driving system. As the system ultimately drives the simulator through keyboard inputs converted from hand gestures, it cannot perform better than direct keyboard input; the main interest was in finding out how close it comes to direct keyboard input in terms of performance. The experiment was performed by five persons with ten consecutive iterations per person, to generate learning curves from the lap time and the number of collisions during the simulation. A smaller lap time indicates that an input method is easier to use, and a smaller number of collisions indicates more precise steering control. The experimental results are shown in Tables 4.3 and 4.4 and Figures 4.16 and 4.17.

Table 4.3: Lap Time Comparison between Keyboard Input and Hand Gesture, Measured in Seconds

             Person 1        Person 2        Person 3        Person 4        Person 5
Iteration    Key     Hand    Key     Hand    Key     Hand    Key     Hand    Key     Hand
1            71.2    98.1    65.1    88.7    56.2    95.6    48.7    86.3    69.5    95.8
2            60.4    74.8    60.4    66.4    53.8    80.8    45.6    70.3    65.4    85.4
3            53.1    65.9    58.4    67.8    51.6    72.8    43.1    59.7    62.1    84.7
4            50.4    62.4    55.2    63.3    47.9    68.1    44.3    56.7    60.3    73.5
5            49.6    63.5    54.1    62.1    50.1    65.9    42.7    55.2    61.2    76.8
6            50.1    65.1    48.7    59.5    46.7    71.5    43.6    58.9    59.7    75.1
7            51.2    59.2    48.2    55.3    44.5    72.3    41.2    58.7    60.8    72.9
8            48.3    60.1    45.6    56.8    44.8    61.3    42.3    54.5    57.3    73.8
9            49.8    61.7    43.4    57.9    43.1    59.6    41.8    56.1    59.5    72.1
10           47.1    58.6    44.7    55.7    44.6    58.2    42.5    55.4    58.1    69.8

[Charts: lap time (s) versus iteration for each of the five persons, keyboard input vs hand gesture, and the average lap time comparison]

Figure 4.16: Lap Time Comparison between Keyboard Input and Hand Gesture

Table 4.4: Number of Collisions Comparison between Keyboard Input and Hand Gesture

             Person 1        Person 2        Person 3        Person 4        Person 5
Iteration    Key     Hand    Key     Hand    Key     Hand    Key     Hand    Key     Hand
1            8       27      7       25      7       26      4       21      9       31
2            7       20      7       21      6       22      3       14      7       25
3            8       17      6       22      7       20      3       15      8       22
4            6       18      8       17      5       18      4       12      7       26
5            6       16      5       15      3       15      3       13      6       24
6            5       16      6       16      3       16      2       15      7       23
7            5       15      5       13      4       17      3       17      5       25
8            6       12      4       14      3       18      2       13      4       21
9            5       13      4       11      2       15      1       15      4       23
10           5       14      4       12      3       13      2       11      4       22

[Charts: number of collisions versus iteration for each of the five persons, keyboard input vs hand gesture, and the average number of collisions comparison]

Figure 4.17: Collisions Comparison between Keyboard Input and Hand Gesture

For the lap time comparison, it was found that the simulation using keyboard inputs resulted in consistently shorter lap times than the simulation using hand gestures, as shown in Figure 4.16. The shape of the graph differs for each person, but all of them show a learning curve, and the initial slope of the learning curve for hand gestures is steeper than that for keyboard inputs. The reason could be that the drop in lap time between the first and second iterations with hand gestures was significantly larger for all five persons than with keyboard inputs. As the trends are similar and the lap time gap between keyboard inputs and hand gestures decreases over the iterations, this suggests that the system can perform almost as well as keyboard input once users get used to it.

When comparing the number of collisions, the general trend of the graphs is similar to that of the lap time comparison. However, several fluctuations over the iterations appear in the collision comparison graphs with hand gestures, while the lap time graphs with hand gestures are generally stable after 5 or 6 iterations.

Although the system can perform differential steering with hand gestures, it was difficult to achieve precise control while maintaining a certain speed. However, the speeds of robots and electronic wheelchairs are lower than those in the simulation, so the gaps would be smaller if the simulation were performed at slower vehicle speeds.

4.6. Limitations

One limitation of this system is that it does not cover all the features of a car, such as the indicators, window wipers, lights and many others. Also, even though the system performance was tested and found to be reasonably satisfactory with visual feedback, the system could not be tested in real vehicles, so its performance in real life remains unknown. The most critical issue is that the hand can move much faster than an actual steering wheel, because the hand is not physically linked to the car, so there is no resistance or physical feedback. If the user moves the hand too fast, the system will eventually fail to track it, which could lead to dangerous situations in real life.

4.7. Remarks

In this chapter, a novel approach to natural assistive technology for persons with arm or leg disabilities, using only one hand, has been presented. The system uses a new algorithm for measuring the steering angle from the movement of one hand, and uses different depth values and numbers of fingertips to switch between gears and other functions. The system is helpful to those who have difficulty driving due to disabilities or injuries affecting their arms or legs.

Since only one hand is used, the number of fingertips is limited to five, so additional vehicle functions cannot be added once the available counts are exhausted (for geared vehicles, even fewer counts remain free). One possible future extension is to extract more information from the hand using hand trajectory recognition, which could be used to control additional features. Issuing voice commands using speech recognition would also help in managing more functions.


Chapter 5

Conclusion

This thesis focused on the challenges of assistive technologies by presenting two novel approaches to achieving HCI-based natural assistive technologies for the disabled. Applying the pointing gesture to assist visually impaired persons, and using single-hand gestures to assist persons with arm or leg disabilities to drive, have not to our knowledge been tried before; this research presented both.

In this chapter, the contributions of this thesis are presented, limitations and future work are discussed, and concluding remarks are given.

5.1. Thesis Overview

This thesis has presented natural HCI-based assistive technologies that can help disabled persons in different ways. It presented an object-based navigation system that integrates the 3D pointing gesture, estimated from the elbow-hand line, with object localization to assist visually impaired persons in adjusting to new environments. It also presented a single-handed driving system that uses a single hand to drive vehicles, combining robust depth segmentation, hand tracking with a new algorithm for steering angle measurement from a single hand, and fingertip detection.

The object-based navigation system is mainly based on the pointing gesture, which is used for both marking and searching for objects. The experimental results show that the accuracy of the pointing gesture estimated from the elbow-hand line is better than that of gestures estimated from other joint pairs such as the head-hand and shoulder-hand lines. The results also show that the performance of the system is satisfactory and that it could be applied to real-life situations with some improvements.

The single-handed driving system is mainly based on hand gestures, which are used for steering and for switching between functions and gears. The experimental results show that general system performance is satisfactory, but the system needs further improvement before it can be applied to real vehicles, as it cannot yet perform all the functions of a real vehicle. Improvements in usability, safety and capacity would enable the system to be useful in real vehicles in the future.

5.2. Contributions

The contributions include novel approaches to achieve the following:

i) an object-based navigation system that assists visually impaired users to adjust to new environments in a natural way using Kinect, with a novel approach to integrate pointing gestures with object localization and apply them to assistive technology. A novel algorithm to estimate the pointing gesture direction from the elbow and use it as a searching tool is presented, making a new technical contribution. This system also makes a contribution in the field of HCI applications and assistive technologies, as it can help visually impaired persons to learn where important objects are in a new indoor environment without wearing any additional equipment.

ii) a single-handed driving system that enables persons with difficulties in using their arms or legs to drive or control vehicles in a natural way using Kinect, presenting a novel approach to use a single hand to perform steering and switching between gears and functions. A new algorithm for measuring the steering angle in real time from one hand, without cycle limits, is presented, making a novel technical contribution. This system also makes a contribution in the field of HCI applications and assistive technologies, as it can enable persons with difficulties in moving their arms or legs to drive or control vehicles with one hand.

iii) novel evaluation methods for both systems. For the object-based navigation system, novel methods for comparing pointing gesture accuracy between three different body joints and for measuring system performance and usability are presented. For the single-handed driving system, novel methods for measuring steering angle accuracy, system performance and usability with simulation software are presented.

5.3. Limitations and Future Work

The size and the number of objects tested by the object-based navigation system in the experiments are limited. The main reason is that Kinect's depth range is limited, so all the objects must be within Kinect's depth range without overlapping one another. The other reason is that the system only works for objects at fixed locations, so small objects that can be easily moved are excluded. A new device with an improved depth range should fix this problem. For pointing direction estimation, the accuracy of the elbow and hand detection needs to be improved, as the current pointing direction estimation algorithm is highly dependent on the positions of the elbow and hand.

Currently, the single-handed driving system has only been tested in software. Integrating the system with hardware such as a robot or electronic car would help demonstrate its performance more effectively. Improving the capacity and usability of the system is crucial, as it needs to perform more of the functions of real vehicles, not just steering and gear changes. Using speech recognition or hand trajectory recognition would enable the system to be used in real vehicles, as they would greatly increase the maximum number of commands or functions that can be issued.

In future, both systems may be used for different purposes with some modification. For example, the object navigation system could be used for marking objects on a 2D map and for teaching a robot to recognize what is being pointed at. The single-handed driving system could benefit not only disabled persons but also able-bodied persons in the future, as there would be many advantages to driving with one hand while maintaining safety: people could then legally use cellphones, drink water or eat snacks with their other hand while driving, helping busy persons save time by performing multiple actions while driving.

Both systems require user experience studies, based on which system usability could be improved.

5.4. Concluding Remarks

Most assistive technologies are focused on hardware rather than software. This thesis has developed methods to achieve HCI-based assistive technology that uses Kinect and the user's own body instead of worn or mounted hardware. Hardware-based and software-based assistive technologies each have their own advantages. By encouraging more software-based assistive technology to balance hardware-based assistive technology, new assistive technology that combines the two in an optimal way can become a reality, maximizing the advantages while minimizing the disadvantages. Instead of leaving disabled persons to use only what is currently available to them, creating a wide range of assistive technologies of different scales and types would improve the quality of disabled persons' lives. Hopefully, future research in this field, including the systems presented in this thesis, will lead to more natural HCI-based assistive technologies in practical use.

Appendix A

Publications Arising from Thesis

1. Paper "Assisting Blind and Visually Impaired with Kinect", accepted and presented at ARATA (Australian Rehabilitation & Assistive Technology Association) National Conference 2012, ARATA, http://www.arata.org.au/arataconf12/papers/

2. Full Paper "Real-Time 3D Pointing Gesture with Kinect for Object-based Navigation by the Visually Impaired", Proc. 4th ISSNIP Biosignals & Biorobotics Conference (BRC 2013), IEEE.

3. Full Paper "Single-Handed Driving System with Kinect", accepted by the 15th International Conference on Human-Computer Interaction (HCII 2013), Las Vegas, USA, July 21-26 (to be published).

Appendix B

Acronyms and Abbreviations

2D/3D Two/Three-Dimensional

RGB Red Green Blue

HCI Human-Computer Interaction

HRI Human-Robot Interaction

BSD Berkeley Software Distribution

HMM Hidden Markov Model

DTW Dynamic Time Warping

SVM Support Vector Machine

USV Ubiquitous Stereo Vision

FAM Fuzzy Associative Memory

EM Expectation Maximization

ToF Time-of-Flight

DLT Direct Linear Transform

CBCH Cascade of Boosted classifier working with Haar-like feature

PCA Principal Component Analysis

RANSAC Random Sample Consensus

VOP Video Object Plane


Bibliography

[1] D. Terdiman, "Microsoft looks to Kinect as game-changer", CNET. Internet: http://reviews.cnet.com/8301-21539_7-20007681-10391702.html, Jun. 14, 2010 [May 16, 2012].

[2] J. M. Benjamin, N. A. Ali, and A. F. Schepis, "A Laser Cane for the Blind", Proceedings of the San Diego Biomedical Symposium, vol. 12, pp. 53-57, 1973.

[3] S. Shoval, J. Borenstein and Y. Koren, "Auditory guidance with the Navbelt - a computerized travel aid for the blind", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 28(3), pp. 459-467, 1998.

[4] J. Borenstein and I. Ulrich, "The GuideCane - A Computerized Travel Aid for the Active Guidance of Blind Pedestrians", IEEE International Conference on Robotics and Automation, vol. 2, pp. 1283-1288, 1997.

[5] Bay Advanced Technologies Ltd (BAT), K-Sonar Ultrasonic sensing device for the blind & visually impaired, Auckland. Internet: http://www.batforblind.co.nz, [May 15, 2012].

[6] Tongue Drive Assistive Technology. Internet: http://users.ece.gatech.edu/mghovan/index_files/TongueDrive.htm

[7] Vehicle Hand Controls and Adaptive Products for Driving. Internet: http://www.disabled-world.com/assistivedevices/automotive/

[8] F. Soltani, F. Eskandari and S. Golestan, "Developing a Gesture-Based Game for Deaf/Mute People Using Microsoft Kinect", Sixth International Conference on Complex, Intelligent and Software Intensive Systems (CISIS), pp. 491-495, 2012.

[9] I-Tsun Chiang, Jong-Chang Tsai and Shang-Ti Chen, "Using Xbox 360 Kinect Games on Enhancing Visual Performance Skills on Institutionalized Older Adults with Wheelchairs", 2012 IEEE Fourth International Conference on Digital Game and Intelligent Toy Enhanced Learning (DIGITEL), pp. 263-267, 2012.

[10] "Human-Computer Interaction | Mensch-Computer Interaktion | Usability Engineering | Konstanz". Internet: http://hci.uni-konstanz.de/blog/2011/03/15/navi/?lang=en, [Oct. 4, 2012].

[11] C. Park, M. Roh and S. Lee, "Real-time 3D pointing gesture recognition in mobile space", 8th IEEE International Conference on Automatic Face & Gesture Recognition, pp. 1-6, 2008.

[12] "Your hands are speaking for you | Psychology Today". Internet: http://www.psychologytoday.com/blog/brain-wise/201209/your-hand-gestures-are-speaking-you, [Dec. 11, 2013].

[13] "Kinect unleashed for commercial Windows apps". Internet: http://www.windowsfordevices.com/c/a/News/Kinect-unleashed-for-commercial-Windows-apps/, [Feb. 21, 2013].

[14] "Kinect tinker brings motion controls to Adobe Flash". Internet: http://news.cnet.com/8301-10805_3-20028237-75.html, [Feb. 21, 2013].

[15] N. Jojic, B. Brumitt, B. Meyers, S. Harris and T. Huang, "Detection and estimation of pointing gestures in dense disparity maps", Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 468-475, 2000.

[16] Point Grey Research. Internet: http://ww2.ptgrey.com/

[17] R. Kehl and L. Van Gool, "Real-time pointing gesture recognition for an immersive environment", Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 577-582, 2004.

[18] K. Nickel, E. Scemann and R. Stiefelhagen, "3D-tracking of head and hands for pointing gesture recognition in a human-robot interaction scenario", Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 565-570, 2004.

[19] Y. Yamamoto, I. Yoda and K. Sakaue, "Arm-pointing gesture interface using surrounded stereo cameras system", Proceedings of the 17th International Conference on Pattern Recognition, pp. 965-970, 2004.

[20] Y. Guan, "Stereo vision based video real-time 3D pointing gesture recognition", IET Conference on Wireless, Mobile and Sensor Network, pp. 355-358, 2007.

[21] S. Carbini, J. Viallet and O. Bernier, "Pointing gesture visual recognition for

large display”, In Proc. of International Workshop on Visual Observation of

Deictic Gestures, pp. 27-32, 2004.

[22] H. Kim, S. Kim and S. Park, “Pointing gesture-based unknown object extraction

for learning objects with robot,” International Conference on Control,

Automation and Systems, pp. 2156-2161, 2008.

[23] C. Park, M. Roh, and S. Lee, “Real-time 3D pointing gesture recognition in

mobile space”, 8th IEEE International Conference on Automatic Face &

Gesture Recognition, pp. 1-6, 2008.

[24] E. Sato, T. Yamaguchi and F. Harashima, “Natural Interface Using Pointing

Behavior for Human–Robot Gestural Interaction”, IEEE Transactions on

Industrial Electronics, vol. 54(2), pp. 1105-1112, 2007.

[25] L. Zhi, and R. Jarvis, “Visual interpretation of natural pointing gestures in 3D

space for human-robot interaction”, 11th International Conference on Control

Automation Robotics & Vision, pp. 2513-2518, 2010

[26] Chai, X., Fang, Y., Wang, K.: Robust hand gesture analysis and application in

gallery browsing, IEEE International Conference on Multimedia and Expo,

pp.938-941 (2009)

[27] Bao, P., Binh, N., Khoa, T.: A New Approach to Hand Tracking and Gesture

Recognition by a New Feature Type and HMM. Sixth International Conference

on Fuzzy Systems and Knowledge Discovery, FSKD ’09, vol.4, pp.3-6 (2009)

91

[28] Wang, K., Wang, L., Li, R., Zhao, L.: Real-Time Hand Gesture Recognition for

Service Robot. International Conference on Intelligent Computation Technology

and Automation (ICICTA), vol.2, pp.946-979 (2010)

[29] Raheja, J.L., Shyam, R., Kumar, U., Prasad, P.B.: Real-Time Robotic Hand

Control Using Hand Gestures. Second International Conference on Machine

Learning and Computing (ICMLC), pp. 12-16 (2010)

[30] Yu, C., Wang, X., Huang, H., Shen, J., Wu, K.: Vision-Based Hand Gesture

Recognition Using Combinational Features. Sixth International Conference on

Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP),

pp.543-546 (2010)

[31] Ming-Hsuan Yang; Ahuja, N.; Tabb, M.; , "Extraction of 2D motion trajectories

and its application to hand gesture recognition," Pattern Analysis and Machine

Intelligence, IEEE Transactions on , vol.24, no.8, pp. 1061- 1074, Aug 2002,

doi: 10.1109/TPAMI.2002.1023803

[32] Oka, K.; Sato, Y.; Koike, H.; , "Real-time fingertip tracking and gesture

recognition," Computer Graphics and Applications, IEEE , vol.22, no.6, pp. 64-

71, Nov/Dec 2002, doi: 10.1109/MCG.2002.1046630

[33] C. Keskin, A.N. Erkan, L. Akarun, “Real time hand tracking and 3D gesture

recognition for interactive interfaces using HMM”, Proceedings

ICANN/ICONIP, Istanbul, 2003.

[34] Bhuyan, M.K.; Ghosh, D.; Bora, P.K.; , "Feature Extraction from 2D Gesture

Trajectory in Dynamic Hand Gesture Recognition," Cybernetics and Intelligent

Systems, 2006 IEEE Conference on , vol., no., pp.1-6, 7-9 June 2006, doi:

10.1109/ICCIS.2006.252353

[35] Elmezain, M.; Al-Hamadi, A.; Appenrodt, J.; Michaelis, B.; , "A Hidden

Markov Model-based continuous gesture recognition system for hand motion

trajectory," Pattern Recognition, 2008. ICPR 2008. 19th International

Conference on , vol., no., pp.1-4, 8-11 Dec. 2008, doi:

10.1109/ICPR.2008.4761080 92

[36] Tan Wenjun; Wu Chengdong; Zhao Shuying; Jiang Li; , "Dynamic hand gesture

recognition using motion trajectories and key frames," Advanced Computer

Control (ICACC), 2010 2nd International Conference on , vol.3, no., pp.163-

167, 27-29 March 2010, doi: 10.1109/ICACC.2010.5486760

[37] Balakrishna, D.; Sailaja, P.; Rao, R.V.V.P.; Indurkhya, B.; , "A novel human

robot interaction using the Wiimote," Robotics and Biomimetics (ROBIO), 2010

IEEE International Conference on , vol., no., pp.645-650, 14-18 Dec. 2010, doi:

10.1109/ROBIO.2010.5723402

[38] Wang Ke; Wang Li; Li Ruifeng; Zhao Lijun; , "Real-Time Hand Gesture

Recognition for Service Robot," Intelligent Computation Technology and

Automation (ICICTA), 2010 International Conference on , vol.2, no., pp.976-

979, 11-12 May 2010 doi: 10.1109/ICICTA.2010.413

[39] bte Abdul Gaus, Y.F.; Tze, F.W.H.; Kin, K.T.T.; , "Feature extraction from 2D

gesture trajectory in Malaysian Sign Language recognition," Mechatronics

(ICOM), 2011 4th International Conference On , vol., no., pp.1-6, 17-19 May

2011, doi: 10.1109/ICOM.2011.5937179

[40] OpenNI, http://www.openni.org/

[41] OpenCV, http://opencv.willowgarage.com/wiki/

[42] Unity3D, http://unity3d.com/

[43] D. Pascolini and SPM. Mariotti, “Global estimates of visual impairment: 2010”,

British Journal Ophthalmology Online, 2011.

[44] “San Antonio Lighthouse for the blind”. Internet: http://www.salighthouse.org,

[Aug. 21, 2012].

[45] “Skeleton Tracking OpenNI”, Internet:

http://gmv.cast.uark.edu/uncategorized/working-with-data-from-the-

kinect/attachment/openni_testapp_screen/, [Jan. 28, 2013].

[46] “PrimeSense”, Internet: www.primesense.com/ , [Dec. 6, 2012].

93

[47] “SmartGate”. Internet: http://www.customs.gov.au/smartgate/default.asp, [Oct.

14, 2012].

[48] “The Australian ePassport”. Internet: http://www.dfat.gov.au/dept/passports/,

[Oct.14, 2012].

[49] “Kinect Experiments: Hometown GP Kinect F1 Racer”. Internet:

http://www.youtube.com/watch?v=W4ZNHqVdM9o, [Oct. 25, 2012].

[50] “FRC Team 2809, K-Botics driving robot with Kinect”. Internet:

http://www.youtube.com/watch?v=TsPQXiii3X4, [Oct. 25, 2012]

[51] “Kinect for Windows | Voice, Movement & Gesture Recognition Technology”.

Internet: http://www.microsoft.com/en-us/kinectforwindows/, [June. 15, 2012].

[52] Torcs, http://torcs.sourceforge.net/

[53] VDrift, http://vdrift.net/

[54] Stunt Rally, https://code.google.com/p/vdrift-ogre/

94