
Analysis and Categorization of 2D Multi-Touch Recognition Techniques

THESIS

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By

Aditi Singhal

Graduate Program in Computer Science and Engineering

The Ohio State University

2013

Master's Examination Committee:

Dr. Rajiv Ramnath, Adviser

Dr. Jay Ramanathan, Co-Adviser

Copyright by

Aditi Singhal

2013

Abstract

Various techniques have been implemented for a wide variety of gesture recognition applications. However, discerning the best technique to adopt for a specific gesture application is still a challenge. The wide variety and complexity of gestures, and the need for complex recognition techniques, make implementation difficult even for basic gesture applications. In this thesis, different techniques for two-dimensional (2D) touch gestures are reviewed, compared, and categorized based on the user requirements of the gesture applications. This work introduces two main paradigms for gesture applications: a) gestures for direct manipulation and navigation, and b) gesture-based languages. These two paradigms have specific and separate roles in 2D touch gesture systems. To provide a clear distinction between the two paradigms, three different algorithms are implemented for basic to complex 2D gestures, using simple heuristics as well as complex techniques such as linear regression and Hidden Markov Models. These algorithms are then analyzed based on their performance and their fit with application requirements.


Dedication

I lovingly dedicate this thesis to my husband Shashnk Agrawal, who supported me each step of the way, my son Aryan Agrawal who in his 3-year-old wisdom knew that mommy should not be disturbed while studying, and my late father Dr. Devendra Kumar Singhal whose passion for education encouraged me to increase my knowledge and was a driving force in my decision to pursue a graduate degree.


Acknowledgments

I would like to express my deepest gratitude to my advisor, Dr. Rajiv Ramnath, for his excellent guidance, patience, and provision of wonderful atmosphere to excel. His continuous support and ease of interaction when asked for any work or non-work related assistance were greatly appreciated.

I would also like to thank my co-adviser, Dr. Jay Ramanathan, for believing in me. Her guidance helped me in conducting my research and in writing my thesis.

In addition, a special thank you to Tom Lynch for all his help and support, and also the encouragement and insightful comments, which helped me shape my thesis.


Vita

2001-2005 ........................................ B.S., Information Technology, Galgotia College of Engineering and Technology, India

April 2006 to May 2007 ...................... Tata Consultancy Services, India

Sep 2010 to Present ........................... Graduate student at The Ohio State University

Fields of Study

Major Field: Computer Science and Engineering


Table of Contents

Abstract……………………………………………………………………………….…...ii

Dedication………………………………………………………………………………...iii

Acknowledgments………………………………………………………………………..iv

Vita…………………………………………………………………………...…….……..v

List of Tables…………………………………………………………………………...... ix

List of Figures……………………………………………………………………………..x

Chapter 1: Introduction...... 1

1.1 Gesture Recognition Overview ...... 1

1.2 Motivation...... 3

1.3 Thesis contribution and scope...... 4

Chapter 2: Related Work ...... 6

Chapter 3: Architectural Framework and Requirements ...... 12

3.1 Architecture Overview ...... 12

3.2 Windows 7 Touch Framework...... 13

3.2.1 Touch Events ...... 13


3.2.2 Manipulation Events...... 14

3.2.3 Touch Frame-Reported Event ...... 15

Chapter 4: Direct Manipulation ...... 17

4.1 Algorithm Overview ...... 17

4.2 Experimental Procedure and Software Design ...... 18

4.3 Results and Findings ...... 22

Chapter 5: Normalization and Gesture Recognition...... 24

5.1 Algorithm Overview ...... 24

5.2 Algorithm concepts ...... 24

5.2.1 Input Homogenization and Normalization ...... 24

5.2.2 Gesture Recognition ...... 27

5.3 Experimental Procedure and Software Design ...... 31

5.3.1 Normalization and Heuristic...... 31

5.3.2 Normalization and Pattern Matching...... 33

5.4 Results and Findings ...... 36

Chapter 6: Generalized Gesture using The Hidden Markov Model ...... 40

6.1 Introduction ...... 40

6.2 Hidden Markov Model overview ...... 40

6.2.1 Specification and parameter of an HMM ...... 41


6.2.2 Learning Problem of Hidden Markov Model...... 42

6.3 The Hidden Markov Model in Gesture Recognition...... 44

6.4 Experimental Procedure and Software Design ...... 47

6.5 Results and Findings ...... 50

Chapter 7: Comparison, Conclusion and Future Work...... 55

7.1 Algorithm selection and distinction based on two paradigms of gesture recognition ...... 55

7.2 Findings...... 57

7.3 Conclusions...... 58

7.4 Future Work ...... 58

References...... 60


List of Tables

Table 1: Line circle intersection...... 26

Table 2: Heuristic two finger gesture results ...... 36

Table 3: Pattern matching 2 finger gesture results ...... 37

Table 4: Results for digit database...... 53

Table 5: Results for customized database...... 53

Table 6: Results for DragZoomRotate database ...... 54


List of Figures

Figure 1: Gesture Recognition Systems and Techniques...... 2

Figure 2: Tabletop using SmartSkin Technology [10]...... 3

Figure 3: Touch sensor technology in tabletops [10]...... 7

Figure 4: Camera mounted hat and plastic markers to project gesture information [16]. .. 8

Figure 5: Image feature extraction and gesture recognition [18]...... 8

Figure 6: Various hand gloves in hand tracking systems [9]...... 9

Figure 7: Gesture recognition using classification [20]...... 10

Figure 8: Use of the Wii Remote with other applications for gesture recognition [21, 22] ..... 11

Figure 9: Ordered flow of the touch events...... 14

Figure 10: Ordered flow of the manipulation events...... 15

Figure 11: Package diagram for “Direct Manipulation”...... 18

Figure 12: Drag gesture as finger moves ...... 20

Figure 13: Zoom gesture as finger moves...... 21

Figure 14: Rotate gesture with angle view...... 21

Figure 15: Results of Direct Manipulation algorithm...... 22

Figure 16: Line circle interception algorithm...... 25

Figure 17: Gesture homogenization and normalization...... 26


Figure 18: Linear least square...... 28

Figure 19: Normalization and Heuristic algorithm-package diagram...... 31

Figure 20: Normalization and Pattern Matching –Package Diagram...... 35

Figure 21: Results for Normalization and Pattern Matching Algorithm...... 37

Figure 22: Gesture errors ...... 38

Figure 23: Hidden Markov Model State Diagram...... 43

Figure 24: HMM states and touch gestures...... 44

Figure 25: Hidden Markov Model Learning Classifier ...... 46

Figure 26: HMM Forward Topology...... 47

Figure 27: Generalized Gesture using HMM Package Diagram ...... 48

Figure 28: Rotate gesture recognition after HMM training...... 51

Figure 29: Three implemented databases for this algorithm ...... 52


Chapter 1: Introduction

1.1 Gesture Recognition Overview

The most expressive components of human interactions are hand gestures. Hand and body gestures provide a natural way to communicate information.

Gesture recognition systems apply the concept of gestures in interactive and engaging applications. We are now seeing increasing use of hand and body gestures in everyday devices. Multi-touch gesture support is growing in mobile devices, commercial systems, and personal computers in general. Thus, a wide range of gesture recognition applications has been developed in the field of artificial intelligence, resulting in various gesture recognition devices on the market today. Examples include multi-touch tabletops, personal computers running Windows 8, and multi-touch screens in the Wii, Leapfrog, small MP3 players, smart phones, etc. Gesture recognition applications use 2D as well as multidimensional gesture recognition, camera-based feature detection, and motion tracking and analysis (Figure 1).

In this thesis, 2D finger-touch gesture applications are discussed, analyzed, and categorized.


Figure 1: Gesture Recognition Systems and Techniques1.

Performance and user friendliness are the key factors that are shifting the focus from mouse-based interaction to touch-based interaction. Research indicates that, if implemented correctly, a multi-touch interface provides better performance [2, 6, 7] and user friendliness than a single-mouse interface. For instance, Kin, Agrawala and DeRose [1] found that the fastest multi-touch interaction is about twice as fast as mouse-based selection, independent of the number of targets. The reduction in selection time is primarily due to the direct-touch nature of multi-touch (about 83% of the reduction) [1].

1 Image from URL: https://www.google.com/search?q=gesture+recognition&hl=en&tbm=isch&tbo=u&source=univ&sa=X&ei=yWNcUczcDM-04APc-IH4Bg&ved=0CF4QsAQ&biw=1054&bih=518

In 2D multi-touch gesture recognition systems, the touch screen captures the movement of multiple fingers on an interactive surface. New touch computing technologies not only provide a direct touch manipulation experience to the user, but also allow multiple users to operate on the same touch screen simultaneously (such as on large tabletops; Figure 2) without interfering with each other's work [5, 10].

Figure 2: Tabletop using SmartSkin Technology [10].

1.2 Motivation

Although multi-touch devices are becoming very popular, implementing touch gestures and multi-touch systems is still considered a specialized task due to the inherent difficulty of implementing gestures [3, 4]. Multi-touch systems should be designed to provide a more interactive experience for the user. The fastest, most accurate gesture-recognition algorithm will not be appreciated if the user finds the application difficult to understand. Moreover, a touch interface is very different from a mouse interface. With a mouse interface, the mouse can be moved around the screen and its position can be tracked before any action is taken. However, with a touch interface the touch itself may cause the action.


Other difficulties in implementing multi-touch gestures are touch inaccuracy due to finger size [4]; the various touch state transitions (for example, the finger first touching the screen, the finger moving, and finally the finger stopping); the difference between gestures performed with multiple fingers of one hand or with different hands [3]; and the number of fingers being used. The size of the touch screen also affects the implementation of multi-touch systems. For instance, when small touch devices are used, a user largely uses one or two fingers. However, when interacting with large touch surfaces like tabletops, a user is given a larger space to work on and can use multiple fingers more naturally.

Any gesture recognition algorithm depends heavily upon the type of touch device used by the gesture application. Furthermore, the wide variety of gesture applications, the different development platforms, and the various ways in which a gesture system can be implemented are confusing to a developer approaching gesture application development.

The premise of this thesis is to provide this guidance, i.e. guidelines for selecting the paradigms as well as the algorithms for 2D touch based gesture recognition.

1.3 Thesis contribution and scope

Gesture-based development is a wide-open research space. Various studies in the past have provided detailed information about each type of gesture application. However, a developer has to thoroughly research each one to understand its application domain and algorithm selection process. The major contribution of this thesis is to shrink this research space by providing guidance for algorithm selection and for easier application development.


In order to analyze and compare various 2D touch techniques, this thesis introduces two main paradigms for gesture applications:

o Gestures for manipulation and navigation: This paradigm is for gesture applications that provide direct manipulation (such as zoom, drag, rotate) and navigation (such as flick, one- or two-finger tap, etc.) of natural elements such as photographs.

o Gesture-based language: This paradigm provides for a finer interpretation of gestures through a gesture language consisting of combinations of complete gestures.

Thinking in terms of these two paradigms provides a means of distinguishing between applications for the purpose of selecting the proper gesture algorithm. To implement these two paradigms, three different algorithms have been developed. These algorithms use techniques like mathematical image vector transformation, linear regression for differentiating zoom and rotate gestures, a line-circle intersection algorithm for sampling the raw touch point input, and a Hidden Markov Model for recognizing gesture languages.

The scope of this thesis is to understand and recognize touch gestures on touch screen computers. However, these algorithms could also be used for touch applications on smart phones, tablets, and tabletops. This work can also be extended to 3D gesture applications.


Chapter 2: Related Work

In the earlier years of gesture development, some basic user interaction support and methodologies were developed (e.g., Artkit [13] and Agate [14]). These described key concepts of gesture development, but the toolkits are limited in scope and outdated in today's world.

One of the older technologies in gesture recognition, with various new concepts and modifications, is the development of interactive touch surfaces such as tabletops, which help applications recognize gestures. For example, Jun Rekimoto [10] describes an interactive surface system based on the SmartSkin sensor, which uses a capacitive sensing technique. He notes that when the density of the sensor mesh is increased, the shape of the hand can be accurately determined and the different positions of the fingers detected. The newer tabletop techniques (Figure 3) focus on improving and modifying the hardware and software processes, and thus implement gesture recognition systems that are beyond the scope of normal mouse-based interactions [10, 11].


Figure 3: Touch sensor technology in tabletops [10].

In another example, Mike Wu, Chia Shen and others have created a design methodology for recognizing gestures. They use the concept of registration, relaxation, and reusing gestures for multi-point direct touch surfaces and use the DiamondSpin Java Toolkit and the DiamondTouch table for gesture development [15].

Other gesture recognition techniques, such as wearable hand gloves, vision-based 2D and 3D gesture systems, virtual reality environments, and mouse- and pen-based gestural input, have been widely discussed in recent years.

Vision-based gesture techniques [7, 16, 17] use cameras, projectors, and other hardware devices to understand gestures. For example, the gesture interface created by Pranav Mistry, Pattie Maes and Liyan Chang [16] uses a tiny projector and a camera mounted on a hat or coupled in a pendant-like wearable device (Figure 4). The main purpose of this technique is to project information onto any kind of physical object (like walls and tables), and then use natural hand gestures (such as arm movements or direct object interaction) to interact with the projected information. The idea is to process the video stream captured by the camera and use plastic color markers to track the finger movement.

Figure 4: Camera mounted hat and plastic markers to project gesture information [16].

Other approaches in camera-based gesture recognition make heavy use of image-processing techniques like feature extraction, edge detection, and touch interpretation to provide more sophisticated gesture recognition [17, 18]. Camera-based gesture recognition techniques [18] are more suitable for application domains like video conferencing and for interfaces such as the one depicted in the movie Minority Report (Figure 5).

Figure 5: Image feature extraction and gesture recognition [18].


Another type of gesture recognition requires the use of wearable gloves. The early development of hand-tracking systems using gloves began with the concept of measuring finger movement through various kinds of sensor technologies [19] placed on the gloves (Figure 6). For example, J. LaViola [9] describes in his survey of hand-tracking systems the details of the different types of gloves and technologies used for gesture recognition. The technologies used in hand tracking are position tracking, optical tracking, marker systems, magnetic tracking, or acoustic tracking.

Figure 6: Various hand gloves in hand tracking systems [9].

However, the requirements that these gloves a) always be worn by the user and b) be attached to the computer hindered gesture development, causing developers to focus more on vision-based gesture development environments.

But the recent development of colored gloves [20] has taken hand-tracking systems to new dimensions. The color design of the glove consists of large color patches, where the variety of color patterns is used as the key to hand tracking (Figure 7). A database of hand poses was created, and a nearest-neighbor approach was then applied to recognize the pose.


Figure 7: Gesture recognition using classification [20].

One widely accepted, low-cost gesture recognition application currently on the market is the Wii Remote™, a sensor-based, multimodal, and touchless approach with very strong and distinctive hardware support [21]. The Wii Remote™ can also be used to create a custom gesture recognition system, since open source software libraries for connecting to a Wii Remote™, parsing the input report data, and configuring the controller are available for nearly every major development platform on Microsoft Windows, Mac OS, and Linux (Figure 8). Bellucci and colleagues describe in their paper the basic interaction list for touchless interaction devices and libraries [22].


Figure 8: Use of the Wii Remote with other applications for gesture recognition [21, 22].

In recent years, Microsoft has provided substantial Application Programming Interface (API) support for tablets and touch gesture recognition. The Windows 7 developer platform support for Windows Touch and the Windows 7 Multitouch .NET Interop Sample Library are part of the recent support provided by Microsoft. This support mostly deals with the basic touch gestures (like drag, zoom, rotate, one-finger tap, two-finger tap, swipe, etc.) that are supported by Microsoft touch hardware.

Microsoft's other support, for 3D gesture recognition, is the Kinect Software Development Kit (SDK) used in video gaming applications, which uses feature vectors derived in real time from the skeleton model provided by the Kinect SDK.


Chapter 3: Architectural Framework and Requirements

3.1 Architecture Overview

o Hardware requirements: Touch sensor computer

o Software requirements: Visual Studio 2010, Windows 7, Windows 7 Touch libraries.

In this thesis, C# is used for the algorithm implementations. Visual Studio offers two main frameworks for C# development:

o Windows Form framework

o Windows Presentation Foundation (WPF) framework

The “Windows Forms” framework is a legacy framework that has been used for many years. The WPF framework is newer and provides good support for new technologies.

Both frameworks support touch development. However, the touch support provided by the Windows Forms framework is limited and complicated, and it uses various flags and messages to handle the touch application.

WPF provides good touch support through various types of touch events (such as manipulation and touch events) that can be easily implemented by developers. Therefore, in this thesis, the WPF framework is chosen for the development of the applications.


A brief introduction to the Windows 7 Touch framework is provided below to better understand the pros and cons of each technique.

3.2 Windows 7 Touch Framework

Windows 7 Touch support ranges from raw touch point events to higher-level event support for object manipulation. All the routed touch events are raised automatically when the user interacts with the touch surface and moves a finger on the touch device. The order in which these events are raised is predefined. The various touch events present in Windows 7 are discussed below.

3.2.1 Touch Events

These events provide raw touch point information and touch finger IDs when the user touches the input screen (Figure 9). These touch events (which deal with specialized touch interpretations, and handling and customizing complex gestures based on different application requirements) are suited for the “Gesture-based language paradigm”. For example, these events must be used to implement handwriting recognition.
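As an illustration, a minimal WPF sketch of capturing these raw touch points is shown below. The element name drawCanvas and the bookkeeping are assumptions made for this sketch, while the TouchDown/TouchMove/TouchUp events, TouchDevice.Id, and GetTouchPoint are part of the WPF touch API.

    using System.Collections.Generic;
    using System.Windows;
    using System.Windows.Input;

    public partial class GestureWindow : Window
    {
        // One raw point list per finger, keyed by the touch device ID.
        private readonly Dictionary<int, List<Point>> strokes = new Dictionary<int, List<Point>>();

        private void Canvas_TouchDown(object sender, TouchEventArgs e)
        {
            int id = e.TouchDevice.Id;   // unique ID per finger
            strokes[id] = new List<Point> { e.GetTouchPoint(drawCanvas).Position };
            drawCanvas.CaptureTouch(e.TouchDevice);
        }

        private void Canvas_TouchMove(object sender, TouchEventArgs e)
        {
            List<Point> stroke;
            if (strokes.TryGetValue(e.TouchDevice.Id, out stroke))
                stroke.Add(e.GetTouchPoint(drawCanvas).Position);   // raw touch point
        }

        private void Canvas_TouchUp(object sender, TouchEventArgs e)
        {
            strokes.Remove(e.TouchDevice.Id);   // the stroke for this finger is complete
            drawCanvas.ReleaseTouchCapture(e.TouchDevice);
        }
    }

Here drawCanvas is a hypothetical Canvas declared in the window's XAML with these handlers attached; the collected strokes can then be passed to a recognizer or used for handwriting-style gestures.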


Figure 9: Ordered flow of the touch events.2

3.2.2 Manipulation Events

Manipulation events are higher-level events and handle object transformation very well. These events (which provide support for some basic gestures like drag, zoom, flick, and click, supported by the Windows framework) are suited for the “Gestures for manipulation and navigation” paradigm. Only three types of manipulation can be performed by these events: translation, expansion, and rotation.

As shown in Figure 10, a complete manipulation begins with the “ManipulationStarting” event (followed soon thereafter by “ManipulationStarted”) and ends with “ManipulationCompleted”. In between, there may be many “ManipulationDelta” events. Manipulation events use the matrix transformation property to manipulate objects in a 2D plane. Using a matrix transform, a matrix object is created and set on the manipulated image element. Once the object is created, calling the object's scale, translate, and rotate functions can modify the image3.

2 Image from URL: http://msdn.microsoft.com/en-us/library/ms754010.aspx
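A minimal sketch of this pattern is shown below. The element setup (an Image with IsManipulationEnabled set to true and a MatrixTransform as its RenderTransform) and the handler body are assumptions for illustration; the ManipulationDelta event and its DeltaManipulation members are part of the WPF API.

    using System.Windows;
    using System.Windows.Input;
    using System.Windows.Media;

    public partial class ManipulationWindow : Window
    {
        // Handles ManipulationDelta for an element whose RenderTransform is a MatrixTransform
        // and whose IsManipulationEnabled property is set to true in XAML.
        private void Image_ManipulationDelta(object sender, ManipulationDeltaEventArgs e)
        {
            var element = (FrameworkElement)e.Source;
            var transform = (MatrixTransform)element.RenderTransform;
            Matrix matrix = transform.Matrix;

            Point center = e.ManipulationOrigin;   // assumes container and element share coordinates
            matrix.RotateAt(e.DeltaManipulation.Rotation, center.X, center.Y);           // rotation
            matrix.ScaleAt(e.DeltaManipulation.Scale.X,
                           e.DeltaManipulation.Scale.Y, center.X, center.Y);             // expansion
            matrix.Translate(e.DeltaManipulation.Translation.X,
                             e.DeltaManipulation.Translation.Y);                         // translation

            transform.Matrix = matrix;
            e.Handled = true;
        }
    }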

Figure 10: Ordered flow of the manipulation events.

3.2.3 Touch Frame-Reported Event

A touch “Frame-Reported” event is different from the touch and manipulation events, because the frame-reported event does not follow the same event model used by other WPF input events. Touch events and manipulation events use bubbling strategies that potentially route through the object tree of a user interface (UI) and are exposed at an element-specific level.

The frame-reported event uses a series of multi-touch messages, or touch points, to handle different finger touches. The touch point details can be accessed through the “TouchFrameEventArgs” event data. The frame-reported event is handled at the application level, and therefore one cannot use the sender parameter of the event handler to determine which element is touched.

3 http://msdn.microsoft.com/en-us/magazine/ff898416.aspx
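A minimal sketch of the frame-reported approach follows; the handler body is an assumption made for illustration, while Touch.FrameReported and TouchFrameEventArgs.GetTouchPoints are part of the WPF API.

    using System.Windows;
    using System.Windows.Input;

    public partial class FrameReportedWindow : Window
    {
        public FrameReportedWindow()
        {
            InitializeComponent();
            // Frame-reported events are subscribed at the application level,
            // not on an individual element.
            Touch.FrameReported += OnTouchFrameReported;
        }

        private void OnTouchFrameReported(object sender, TouchFrameEventArgs e)
        {
            // One TouchPoint per finger currently on the screen in this frame.
            foreach (TouchPoint tp in e.GetTouchPoints(this))
            {
                int fingerId = tp.TouchDevice.Id;
                Point position = tp.Position;   // relative to this window
                // ... dispatch to gesture handling here ...
            }
        }
    }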

The other events, which are supported by Windows 7 Touch interface, are the stylus events used for drawing on the touch surface.


Chapter 4: Direct Manipulation Algorithm

4.1 Algorithm Overview

As discussed in the previous chapter, manipulation events are high-level touch events that enable users to perform gestures easily, and they provided the inspiration for the direct manipulation algorithm. The concept of the direct manipulation algorithm is to provide gesture manipulation by handling the raw touch point events instead of using the manipulation events. The direct manipulation algorithm could therefore be used in other domains where high-level functionality similar to manipulation events is not present.

Using this algorithm, the user should be able to perform the gesture transformation on the raw touch points and also achieve performance similar to manipulation events.

The structure of this algorithm is to capture the raw touch points of the image in a 2D plane, calculate the distance and rotation vectors, and change the image properties (like the translation, scaling, and rotation properties) as the finger moves on the touch surface. When the image properties are changed simultaneously with the finger movement, the gesture transformation takes place in a fast and effective manner.

In this work, the algorithm is implemented for the drag, rotate, zoom-in, and zoom-out gestures, and the image is transformed according to the user's touch input.


4.2 Experimental Procedure and Software Design

This algorithm is divided into three main modules (Figure 11): MyImage, ImageHandler, and ImageTransformer.

Figure 11: Package diagram for “Direct Manipulation”.

MyImage: The purpose of this module is to handle the image on which the user wants to perform a gesture transformation. The image is handled as a user control where various image properties like Angle, Path, ScaleX, ScaleY, X, and Y are declared and defined. The markup (XAML) file of this user control is used for binding these properties and attaching them to the different image transformation properties.

ImageHandler: The “Direct Manipulation” algorithm is not limited to manipulating a single image but deals with the different images placed on the screen. So, the purpose of this module is to handle the various images with the help of operations like “Find Picture”, “Bring Picture to Front”, “Load Pictures”, etc. When one or more pictures are placed on the screen, the Z-Index property needs to be established to manipulate and handle the pictures; the purpose of this property is to bring the picture that requires manipulation to the front. The touch events are also implemented in this module.

ImageTransformer: This module provides the implementation logic for image transformation. When a user touches a picture on the screen, control is transferred to the “ImageHandler” module to process the three touch events (touch down, touch move, and touch up). These touch events pass control to the “ImageTransformer” module, where image properties like angle and scale are calculated and the image is transformed.

Image transformation logic for various gestures:

o Drag Gesture

The drag gesture uses one finger, and the object is moved in the same direction as the finger moves on the touch surface. In Figure 12, the hollow ball represents the previous state and the touch point represents the current state. So, if an object's current position is (4, 4) and its previous position is (1, 1), the translation for the drag gesture would be

(4, 4) – (1, 1) = (3, 3)


Figure 12: Drag gesture as finger moves

o Zoom Gesture

The zoom gesture uses two fingers. In Figure 13, the hollow balls represent the previous state points (the vector between these points gives the distance between the two fingers when they first touch the screen). Similarly, the distance between the orange touch points gives the distance after the points have moved and the zoom has occurred.

The mathematical formula for the zoom transformation is as follows:

InitialDistanceVector = InitialFirstFingerPosition – InitialSecondFingerPosition
NewDistanceVector = NewFirstFingerPosition – NewSecondFingerPosition

Scale calculation:

Scale = Scale * length(NewDistanceVector) / length(InitialDistanceVector)
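A small C# sketch of this scale update is given below; the method and variable names are illustrative assumptions rather than code from the thesis implementation.

    using System.Windows;

    static class ZoomMath
    {
        // Updates the accumulated scale factor from the initial and current finger positions.
        public static double UpdateScale(double currentScale,
                                         Point initialFirst, Point initialSecond,
                                         Point newFirst, Point newSecond)
        {
            Vector initialDistance = initialFirst - initialSecond;
            Vector newDistance = newFirst - newSecond;
            return currentScale * (newDistance.Length / initialDistance.Length);
        }
    }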


Figure 13: Zoom gesture as finger moves.

o Rotate Gesture

Rotate is also a two-finger gesture. The idea is to calculate the change in angle in order to perform the rotate gesture.

Figure 14: Rotate gesture with angle view.

The required rotation angle is shown in Figure 14: it is the change in angle between the previous and current position vectors. Here, the atan2 function is used to calculate the angle change. atan2(y, x) gives the angle in radians between the positive x-axis of a plane and the point given by the coordinates (x, y). The angle is positive for counter-clockwise angles (upper half-plane, y > 0) and negative for clockwise angles (lower half-plane, y < 0). Therefore, the change in angle is


angleChange = atan2(CurrentPosition.y, CurrentPosition.x) – atan2(PreviousPosition.y, PreviousPosition.x)
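A corresponding C# sketch of the angle update follows; as with the zoom sketch, the names are illustrative assumptions, and the result is converted to degrees because WPF rotation transforms expect degrees.

    using System;
    using System.Windows;

    static class RotateMath
    {
        // Signed change in angle between the previous and current position vectors.
        public static double AngleChange(Vector previousPosition, Vector currentPosition)
        {
            double change = Math.Atan2(currentPosition.Y, currentPosition.X)
                          - Math.Atan2(previousPosition.Y, previousPosition.X);
            return change * 180.0 / Math.PI;   // radians to degrees
        }
    }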

4.3 Results and Findings

The “Direct Manipulation” algorithm is very simple and easy to implement, and it does not require the complex knowledge of gesture recognition systems, such as template creation methodologies, pattern matching algorithms, and artificial intelligence approaches, to implement gestures. This algorithm requires only simple mathematical vector calculations for the image transformation.

Figure 15: Results of Direct Manipulation algorithm.

Another important result is the algorithm's suitability for the “Gestures for manipulation and navigation” paradigm, which focuses only on touch manipulation and navigation, making it less applicable to gesture recognition applications. Although this algorithm distinguishes drag from zoom/rotate gestures, its main purpose is to perform the object transformation. For instance, when the image properties are calculated and changed in the algorithm, its angle and scale properties are changed simultaneously, and thus the image is scaled and rotated simultaneously. Therefore, this algorithm cannot distinguish between the rotate and zoom gestures (Figure 15).

The shortcoming of this algorithm is that it is only suitable for basic, simple gesture manipulation and navigation (e.g., click, drag, one-finger tap, two-finger tap, flick, rotate, and zoom). It does not deal with customized and symbolic gestures (e.g., different types of letters, digits, and other symbols), and therefore has a limited scope.

Even with this disadvantage, direct manipulation could be used in a wide range of application development areas. In today's world, all smart phones and small touch screen surfaces require touch support and basic 2D gesture functionality. This algorithm is best suited for such development.


Chapter 5: Normalization and Gesture Recognition

5.1 Algorithm Overview

This algorithm's structure is more complex than that of the previously discussed algorithm. The idea behind the “Normalization and Gesture Recognition” algorithm is to perform input point sampling and then apply gesture recognition techniques to different types of gestures. Here, two different methodologies are implemented for gesture recognition, which lead to two variations of this algorithm: a) Gesture Normalization and Heuristics, and b) Gesture Normalization and Pattern Matching. Both variations follow the same approach for input point sampling; the sampling and recognition techniques are discussed below in detail.

5.2 Algorithm concepts

5.2.1 Input Homogenization and Normalization

A gesture is a collection of raw touch points on the input canvas. However, the nature of direct finger movement and canvas contact contributes to an inconsistent input point space (in both the total number of raw touch points per gesture and the point density), and thus could lead to either inaccurate or inefficient gesture recognition. Sampling normalizes and homogenizes the input point space, which is later used by the recognizer to perform gesture recognition on the sampled pattern.

Here, input homogenization and normalization is performed using a line-circle intersection algorithm, discussed below

Line Circle Intersection Algorithm

A line determined by two points (x1, y1) and (x2, y2) may a) not intersect a circle of radius r centered at (0, 0) (Figure 16, left), b) intersect it at a single point (Figure 16, middle), or c) intersect it at two points (Figure 16, right).

Figure 16: Line circle intersection algorithm4.

To obtain the common points of a circle and a line, the circle and line equations are combined and solved, which yields a quadratic equation with discriminant Δ = b² – 4ac.

4 Image from URL: http://mathworld.wolfram.com/Circle-LineIntersection.html

Δ < 0    No intersection
Δ = 0    Single point intersection
Δ > 0    Two point intersection

Table 1: Line circle intersection

As shown in Table 1, the discriminant value easily identifies a two-point intersection, which is what this algorithm uses for sampling the raw input data.

Sampling Approach:

Figure 17: Gesture homogenization and normalization.


In this sampling approach, the goal is to get a path defined by equally spaced points, scaled to fit a 1x1 square. This is obtained by iterating through the initial path and finding intersections with a circle of radius r (Figure 17). This radius is simply determined by dividing the length of the initial path by the number of points desired in the resulting path. Because the number of points in the resulting path is predefined for the 1x1 square, this technique produces a consistent input space.
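The sketch below illustrates one way such a resampler could look. It is an interpretation of the description above rather than the thesis implementation: the helper names are assumptions, and the scaling of the result to a 1x1 square is omitted. Each new point is found by intersecting the current path segment with a circle of radius r centered at the last resampled point, using the discriminant from Table 1.

    using System;
    using System.Collections.Generic;
    using System.Windows;

    static class GestureResampler
    {
        // Resamples a raw touch path into points that are each a distance r from the
        // previous resampled point, where r = (path length) / (desired point count).
        public static List<Point> Resample(IList<Point> path, int desiredCount)
        {
            double r = PathLength(path) / desiredCount;
            var result = new List<Point> { path[0] };
            Point center = path[0];

            for (int i = 1; i < path.Count; i++)
            {
                Point p1 = path[i - 1], p2 = path[i];
                double t;
                // Repeatedly intersect the current segment with a circle of radius r
                // centered at the last resampled point.
                while (TryIntersect(center, r, p1, p2, out t))
                {
                    center = new Point(p1.X + t * (p2.X - p1.X), p1.Y + t * (p2.Y - p1.Y));
                    result.Add(center);
                    p1 = center;   // continue along the remainder of this segment
                }
            }
            return result;
        }

        // Solves |p1 + t(p2 - p1) - center| = r for t in [0, 1]; returns the forward root.
        private static bool TryIntersect(Point center, double r, Point p1, Point p2, out double t)
        {
            Vector d = p2 - p1, f = p1 - center;
            double a = d * d, b = 2 * (f * d), c = f * f - r * r;
            double disc = b * b - 4 * a * c;   // the discriminant of Table 1
            t = 0;
            if (a == 0 || disc < 0) return false;
            t = (-b + Math.Sqrt(disc)) / (2 * a);
            return t >= 0 && t <= 1;
        }

        private static double PathLength(IList<Point> path)
        {
            double length = 0;
            for (int i = 1; i < path.Count; i++) length += (path[i] - path[i - 1]).Length;
            return length;
        }
    }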

This sampling technique is very effective in terms of its ease of implementation and its ability to change the total number of points based on the application requirement. However, its shortcoming is that, because points are selected at equal distances, it cannot account for improper point density in the original input.

5.2.2 Gesture Recognition

As the previous “Direct Manipulation” algorithm is used for implementing basic gestures like zoom, drag, and rotate, I started to look for the simplest way to perform gesture recognition for basic gestures.

One observation is that some gestures are highly different from each other (e.g., zoom and rotate), and recognizing those gestures can be achieved with a simple algorithm such as curve fitting using regression. One can easily distinguish between one- and two-finger gestures by using the Windows 7 Touch functionality: for each finger, the Windows 7 touch events carry a predefined integer touch device ID, which distinguishes one-finger gestures (e.g., drag) from two-finger gestures (e.g., zoom/rotate). To distinguish between zoom and rotate, the linear least squares curve fitting method is used.


Linear Least Square Curve Fitting

Figure 18: Linear least square5

The linear least squares fitting technique is the simplest and most commonly applied form of linear regression which provides a solution for finding the best fitting straight line through a set of points. The sum of the squares of the offsets is used instead of the offset absolute values because this allows the residuals to be treated as a continuous differentiable quantity.

To find the best fit, the sum of the squares of the vertical distances between the line and the points needs to be minimized (Figure 18).

As a mathematical equation, the error function E is

E(m, c) = Σ_i (y_i − (m·x_i + c))²

As shown in the equation, the function E(m, c) needs to be minimized, where the x and y values are the raw touch points. To get the m and c values, the partial derivatives of E with respect to m and c are taken and set to 0 to obtain the minimum. Once the m and c values are found, they are substituted back into the error equation to get the minimum error. The mathematical representation is as follows.

The derivative with respect to m:

∂E/∂m = −2 Σ_i x_i (y_i − m·x_i − c) = 0

The derivative with respect to c:

∂E/∂c = −2 Σ_i (y_i − m·x_i − c) = 0

Solving these two equations gives

m = (n Σ_i x_i·y_i − Σ_i x_i · Σ_i y_i) / (n Σ_i x_i² − (Σ_i x_i)²)

and

c = (Σ_i y_i − m Σ_i x_i) / n

where n is the number of sampled points.

5 Image from URL: http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29
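A short C# sketch of this fit is shown below. It is written for this discussion (the class and method names are assumptions, and a vertical point sequence would need special handling); it returns the summed squared error that the recognizer later compares against its thresholds.

    using System.Collections.Generic;
    using System.Windows;

    static class LeastSquares
    {
        // Fits y = m*x + c to the sampled points and returns the summed squared error E(m, c).
        public static double FitError(IList<Point> points, out double m, out double c)
        {
            int n = points.Count;
            double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
            foreach (Point p in points)
            {
                sumX += p.X; sumY += p.Y;
                sumXY += p.X * p.Y; sumXX += p.X * p.X;
            }

            // Closed-form solution of the two normal equations above.
            // (A perfectly vertical point sequence makes the denominator zero.)
            m = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
            c = (sumY - m * sumX) / n;

            double error = 0;
            foreach (Point p in points)
            {
                double residual = p.Y - (m * p.X + c);
                error += residual * residual;
            }
            return error;
        }
    }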


Linear least square method in gesture recognition

One can intuitively understand that the zoom gesture consists of a point sequence which is linear and can be fitted as a straight line. This reasoning is used when distinguishing between zoom and rotate gestures.

When the linear least squares curve fit technique was applied to the zoom gesture, the curve fitting error obtained was in the range of 0.01 to 0.7. One would also predict that the curve fit error for the rotate gesture should be significantly higher. This prediction was supported by my findings, where the error for the rotate gesture was in the range of 0.8 to 3.8.

Using this approach, a recognition technique was developed. When zoom and rotate gestures are drawn, the two fingers produce two input point sequences. The recognition technique curve-fits each point sequence and sums the minimum errors of both sequences.

Also, keep in mind that this algorithm uses the curve-fitting technique to perform gesture recognition, and thus only a linear regression can be applied here. If a non-linear regression or higher-degree polynomial curve-fitting technique were used, one could not distinguish between the zoom and rotate gestures, due to the inherent tendency of a higher-degree polynomial to closely fit curved point sequences as well.


5.3 Experimental Procedure and Software Design

5.3.1 Normalization and Heuristic

Figure 19: Normalization and Heuristic algorithm-package diagram.

This algorithm deals with the sampling and heuristic approach for gesture recognition. There are five main modules in this algorithm: GestureSampler, GestureCreator, GestureImplementer, GestureDrawer, and GestureRecognizer.

GestureSampler: This module is used for converting the raw touch point path to a simplified path (fitting a 1x1 square). As the gesture can be performed anywhere on the screen, the gesture input space has coordinates over a large range. So, this module scales the gesture path to a 1x1 square (keeping all values between 0 and 1).


GestureCreator: This module keeps a record of a particular gesture. It is simply done by keeping a list of all the points that the gesture travelled.

GestureImplementer: This module keeps a record of all touch gestures. If a user is creating multiple touch gestures by using multiple fingers (if supported by the hardware device), then this module keeps a record of all touch gestures by storing them in a dictionary with the touch ID as a key. This module also checks if a particular touch ID is already tracked or not.

GestureDrawer: This module provides a visual representation of a gesture as it is drawn on the canvas. Whenever a finger touches the screen, the finger movements are captured and drawn on the touch surface. This functionality is very important for gesture recognition systems because it provides visual feedback to the user about the screen touch. Touch insensitivity and incorrect finger contact sometimes hinder the capture of the raw points touched by the user, resulting in a gesture not being fully drawn on the surface. In such scenarios, feedback about the finger movement is very important.

GestureRecognizer: This module uses the linear least squares fit algorithm discussed in the previous section for gesture recognition. Because a heuristic approach is followed in this algorithm, gestures are recognized based on thresholds observed experimentally. The process for gesture recognition is as follows (a code sketch of this decision procedure appears after the list):

Distinguish between one- and two-finger gestures.

o If it is a one-finger gesture, it is a drag gesture.

o If it is a two-finger gesture, it could be either a rotate or a zoom gesture.

. Apply the linear least squares curve fit algorithm to find the error for the gesture performed by the user.

• If the error is less than 0.6, it is a zoom gesture.

• If the error is greater than 0.6, it is a rotate gesture.
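The sketch below captures this heuristic in C#. The 0.6 threshold and the one-versus-two-finger split come from the description above; the type names, and the summing of the per-finger errors via the FitError helper sketched earlier, are illustrative assumptions.

    using System.Collections.Generic;
    using System.Windows;

    enum Gesture { Drag, Zoom, Rotate }

    static class HeuristicRecognizer
    {
        const double ZoomRotateThreshold = 0.6;

        // sequences: one sampled point list per finger (one finger = drag, two = zoom/rotate).
        public static Gesture Recognize(IList<IList<Point>> sequences)
        {
            if (sequences.Count == 1)
                return Gesture.Drag;

            double totalError = 0;
            foreach (var sequence in sequences)
            {
                double m, c;
                totalError += LeastSquares.FitError(sequence, out m, out c);
            }
            return totalError < ZoomRotateThreshold ? Gesture.Zoom : Gesture.Rotate;
        }
    }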

5.3.2 Normalization and Pattern Matching

Instead of using a heuristic approach, this algorithm variation uses a pattern matching technique to perform the gesture recognition, and thus it requires a data collection module for storing the gestures performed by various users. I have 250 (drag, zoom, and rotate) gestures performed by 6 different users stored in the gesture data file, which consists of each gesture's name and its traversed touch point sequence. Each gesture is also curve fitted using the linear least squares technique, and the curve fit error and gesture type are added to the gesture error file.

The gesture error file format is as follows:

Gesture Name Curve Fit Error

Rotate 1.82334555

Zoom 0.455555555

Pattern Matching and Gesture Recognition technique:

o Curve fit the gesture that needs to be recognized and compute its error.

o Calculate the maximum zoom error, the minimum rotate error, and the average zoom and rotate errors from the gesture error file.

o If the gesture error is less than the maximum zoom error, the gesture is a zoom gesture.

o If the error is greater than the minimum rotate error, the gesture is a rotate gesture.

o If the error falls between the maximum zoom and minimum rotate errors:

. Compare it against the average zoom error and average rotate error.

• If the gesture error is less than or equal to the average zoom error, it is a zoom gesture; otherwise it is a rotate gesture.

The maximum zoom and minimum rotate errors are used here because, when zoom and rotate gestures are curve fitted, the zoom error is consistently lower than the rotate error.
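A C# sketch of this decision procedure is given below; the statistics are assumed to have been computed from the gesture error file, the Gesture enum is reused from the earlier sketch, and only the average zoom error is needed for the tie-break described above.

    static class PatternMatchingRecognizer
    {
        // error: curve fit error of the gesture to recognize.
        // The remaining parameters are statistics derived from the gesture error file.
        public static Gesture Classify(double error,
                                       double maxZoomError, double minRotateError,
                                       double avgZoomError)
        {
            if (error < maxZoomError) return Gesture.Zoom;
            if (error > minRotateError) return Gesture.Rotate;

            // Error lies between the maximum zoom and minimum rotate errors:
            // fall back to the average zoom error.
            return error <= avgZoomError ? Gesture.Zoom : Gesture.Rotate;
        }
    }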


Experimental Procedure and Software Design

Figure 20: Normalization and Pattern Matching –Package Diagram.

For this algorithm, the GestureDrawer, GestureCreator, and GestureSampler modules are the same. However, the GestureImplementer and GestureRecognizer modules (Figure 20) do have changes.

GestureImplementer: Apart from the previous functionality, this module is also responsible for data collection and pattern matching. During data collection, it asks users to enter gestures and stores the sequence data in the gesture data file. It also creates the gesture error file and calculates the minimum, maximum, and average errors for the different gestures.

GestureRecognizer: This module is responsible for both implementing the linear least square regression method and for classifying gestures.

5.4 Results and Findings

Normalization and Heuristic: In this approach, as no data collection is required, six different users performed 200 gestures with the following results (Table 2, Figure 21):

o The one-finger gesture (drag) was recognized 95% of the time; the 5% inaccuracy is due to touch insensitivity in capturing the finger movement.

o Two-finger gestures were classified with 87% accuracy.

Percentage    Reason for inaccurate recognition
5%            Touch surface did not capture the finger movement
5%            Zoom to rotate (due to the recognition algorithm)
3%            Rotate to zoom (due to the recognition algorithm)

Table 2: Heuristic two finger gesture results

Normalization and Pattern Matching: For this variation, 250 drag, zoom and rotate gestures performed by 6 different users are stored in the gesture data and gesture error files with the following results (Table 3, Figure 21):

o The one-finger gesture (drag) was recognized 95% of the time; the 5% inaccuracy is due to touch insensitivity in capturing the finger movement.

o Two-finger gestures were classified with 90% accuracy.


Percentage    Reason for inaccurate recognition
5%            Touch surface did not capture the finger movement
3%            Zoom to rotate (due to the recognition algorithm)
2%            Rotate to zoom (due to the recognition algorithm)

Table 3: Pattern matching 2 finger gesture results

Algorithm output and findings:

Figure 21: Results for Normalization and Pattern Matching Algorithm.

Initially, when a user started to enter gestures, the finger movements were not captured properly, but as the user continued to use the touch surface, touch errors were reduced dramatically. Therefore, it took some practice for a new user to work efficiently on the touch surface. The other 5% of gesture errors are encountered due to the constraints of the line-circle intersection algorithm.


Figure 22: Gesture errors

For example, in Figure 22 the rotate gesture (a) was inaccurately identified as zoom because it produced a lower curve fit error. Similarly, in Figure 22 the zoom gesture (b) was inaccurately identified as rotate because it produced a higher curve fit error.

The gesture normalization and heuristic-based approach is easy to implement and does not require the extra overhead of data collection and template matching. Even though this algorithm is easy to implement, its results are still very effective for basic gesture recognition like drag, zoom, and rotate.

The line-circle intersection algorithm works really well for normalizing or sampling the gesture data. Other complex sampling techniques like systematic sampling or stratified sampling do not work well here because of the greater complexity and random sample point selection criteria.

The linear least squares curve fitting technique gives good gesture recognition, but it is only applicable where gestures can be easily separated into single-line and curved gestures. Therefore, this algorithm cannot distinguish between two curved gestures such as a circle and an ellipse, and the scope of this recognizer is very limited. There have been some similar gesture algorithms developed in the past few years with different recognition schemes. One algorithm, discussed by Jacob O. Wobbrock, Andrew D. Wilson and Yang Li [12], also uses a sampling and recognition technique for gesture implementation. In their algorithm, sampling is done twice: the first sampling rate is determined by the sensing hardware and software, and then they resample the path. Our sampling technique is simpler and requires only one cycle of sampling. However, their recognition technique is quite effective and more versatile, and can be used to recognize some customized gestures. They use the Euclidean distance to perform the pattern matching: the gesture is recognized by finding the closest point path sequence using Euclidean distance.


Chapter 6: Generalized Gesture using The Hidden Markov Model

6.1 Introduction

This generalized algorithm uses the Hidden Markov Model to implement various gestures. The most important aspects of this algorithm are its gesture database and implementation of the Hidden Markov Model. In the algorithm, any kind of gesture database is created with the class label definition and gesture point input sequence. Once a database is created, Hidden Markov Model can train the system with the correct class label and classify the gestures.

The gestures implemented and recognized for this algorithm are the drag, zoom, and rotate gestures, along with customized gestures like the star, left curly bracket, right curly bracket, question mark, etc. This algorithm can also be used for handwriting recognition, such as digit and letter recognition. To explain the algorithm, a brief introduction to the Hidden Markov Model follows.

6.2 Hidden Markov Model overview

The Hidden Markov Model (HMM) is a powerful statistical tool for modeling generative sequences that can be characterized by an underlying process generating an observable sequence.


The Hidden Markov Model is a probabilistic model which uses the Markov assumption. Under the Markov assumption, the values in any state are only influenced by the values of the state that directly preceded it. This is an important simplifying assumption, which reduces the complexity of planning sequences of actions. It allows influence diagrams to remove states once they have left a specified time window, because it is assumed that the removed state no longer has any effect on the current state. Therefore, Markov Models are especially suited to model behaviors defined over time. The model assumes these behaviors contain a fixed number of inner states.

6.2.1 Specification and parameters of an HMM

n: The number of states

Q: {q1, q2, ..., qn} – the set of states

M: The number of symbols (observables)

O: {o1, o2, ..., oM} – the set of symbols

A: The state transition probability matrix, A = {a1,1, a1,2, ..., an,n}. Each ai,j represents the probability of transitioning from state qi to state qj:

aij = P(qt+1 = j | qt = i)

B: The observation probability distribution, i.e., the probability of observation ot being emitted by state qj:

bj(k) = P(ot = k | qt = j), 1 ≤ k ≤ M

π: The initial state distribution

The full HMM is thus specified as λ = (A, B, π, n).


The Hidden Markov Model deals with three types of problems: evaluation, decoding, and learning. This algorithm uses the learning problem to solve gesture recognition.

6.2.2 Learning Problem of Hidden Markov Model

“Given a training set of observations, determine the optimum model.” One can think of this as finding λ such that P(O | λ) is maximal.

Unfortunately, there is no known way to analytically find a global maximum, but it is possible to find a local maximum. Therefore, given an initial model λ, one can always find a model λ̄ such that P(O | λ̄) ≥ P(O | λ). So, the value of P(O | λ) needs to be calculated.

To find the local maximum:

Consider a sequence of observations y = (y1, y2, ..., yT) and a corresponding sequence of states x = (x1, x2, ..., xT) (Figure 23).


Figure 23: Hidden Markov Model State Diagram.

The probability of any sequence of observations occurring when following a given sequence of states can be stated as

P(y | x, λ) = Π_t P(y_t | x_t)

Because the states are hidden from us, one has to consider all possible combinations of x in order to obtain the probability of y. In other words, one has to marginalize the joint probability over x by summing over all possible variations of x:

P(y | λ) = Σ_x P(y | x, λ) P(x | λ)

This is the probability of an observation sequence: the sum of the probabilities over all possible state sequences in the HMM. Obviously, considering all possible combinations would be quite intractable, even for small sequences or a small number of states.

To avoid this problem the forward-backward (or Baum-Welch) algorithm is used.

The forward algorithm computes a matrix of state probabilities, which can be used to assess the probability of being in each of the states at any given time in the sequence.


Similarly, the backward algorithm computes the backward probability (the probability of the partial observation ot+1, ..., oT given that the model is in state qi at time t), working in the opposite direction.

Therefore, to find a local maximum of P(O | λ), the iterative expectation-maximization algorithm is used. Hence, starting from an initial parameter instantiation, the forward-backward algorithm iteratively re-estimates the parameters and improves the probability that the given observation sequence is generated by the new parameters.

6.3 The Hidden Markov Model in Gesture Recognition

The Hidden Markov Model uses hidden states. To understand this, consider the zoom gesture. In the zoom gesture, two fingers first touch the screen and then start moving apart (Figure 24). So, for the zoom gesture, the observations are the raw touch points. For the hidden states, one could intuitively think of the zoom gesture as containing three different states: a) the fingers first touch the screen, b) the fingers move, and c) the fingers stop. These stages occur in this particular order to create a full gesture. To train the HMM, a trial-and-error method is used to select the total number of states.

Figure 24: HMM states and touch gestures.


For this algorithm, the Accord.NET Framework6 is used for the HMM implementation. It provides tools and libraries for scientific computing applications, such as statistical data processing, machine learning, and pattern recognition. The framework offers a large number of probability distributions, hypothesis tests, kernel functions, and support for the most popular performance measurement techniques.

Some important classes which are used in Hidden Markov Model for learning and classifying gestures are given below:

HiddenMarkovClassifierLearning : This class acts as a teacher for classifiers based on Hidden Markov Models (Figure 25). This learning technique uses a set of Hidden Markov Models to classify sequences of real (double-precision floating point) numbers or arrays of those numbers. Each model will try to learn and recognize each of the different output classes.

6 https://code.google.com/p/accord/

Figure 25: Hidden Markov Model Learning Classifier7

Thus, in the Accord.NET Framework an HMM is trained individually for each class. When all models are trained, the probability of the unknown sequence is computed under each model, and the output is the class whose model yields the highest probability.

BaumWelchLearning : The Baum-Welch algorithm is a kind of expectation-maximization algorithm. For continuous models, it estimates the matrix of state transition probabilities “A” and the vector of initial state probabilities “pi”. For the state emission densities, it weights each observation and lets the estimation algorithms for each of the densities fit the distributions to the observations.

7 Image from URL: http://code.google.com/p/accord/source/browse/trunk/Sources/Accord.Docs/?r=359#Accord.Docs%2FAccord.Documentation%2FDiagrams%2FClasses

Accord.Statistics.Models.Markov.Topology:

Forward Topology: Forward topologies are commonly used to initialize models in which training sequences can be organized in samples, such as in the recognition of gesture input point sequence (Figure 26).

Figure 26: HMM Forward Topology

6.4 Experimental Procedure and Software Design

This experimental procedure was created by extending “Sequence Classifiers in C# - Part I: Hidden Markov Models” by Cesar de Souza8.

There are five main modules in this experiment: Sequence, GestureDrawer, Database, GestureImplementer, and MainWindow (Figure 27).

8 http://www.codeproject.com/Articles/541428/Sequence-Classifiers-in-Csharp-Part-I-Hidden-Marko?msg=4512017#xx4512017xx

This design uses the .XML file to store the gesture information. A gesture can be represented as a “sample” where each sample stores the class label (which is the class category provided to HMM and can be represented as a gesture name such as zoom, rotate, etc.), input point sequence (source path) and the output value.

Figure 27: Generalized Gesture using HMM Package Diagram

Sequence: The Sequence module represents each “sample” stored in the XML file. The Sequence module gets the source path and preprocesses a sequence.

GestureDrawer: This module helps in drawing the gesture when the user touches the screen. Here, a single gesture, from the touch-down to the touch-up event, is represented by a “GestureDrawer” object, where each “GestureDrawer” object consists of a list of touch points and the color used by the object. To allow drawing with multiple fingers simultaneously, the GestureDrawer object also stores the finger touch ID so that each finger can produce its own drawing without interfering with another finger's drawing.

Database: The database module is a data collection module whose sole purpose is to operate on the corpus used in this application. This module deals with functionality like opening a gesture database, adding new gestures to the database, or creating a new gesture database. For this algorithm, the gesture database uses an XML file as the data source, where information about each gesture (its class label, source path/raw touch point sequence, and output value) is stored and manipulated.

GestureImplementer: This is the module where the touch events are implemented and the raw touch points are manipulated to create a sequence for each gesture. If two-finger gestures like zoom and rotate are performed, two sequences are created, one per finger, and concatenated together to make a single sequence. This module also calls the GestureDrawer module to draw the gestures.

MainWindow: It is the purpose of the MainWindow module to make the system learn and classify gestures using the HMM implementation. To create the Hidden Markov Model, the Accord.NET “HiddenMarkovClassifier” is used. The fundamental details of the HMM classifier are explained in the previous section.

    hmm = new HiddenMarkovClassifier<MultivariateNormalDistribution>(
        numberOfClasses, new Forward(states),
        new MultivariateNormalDistribution(2), classes);


Because the states are hidden in an HMM, a trial-and-error method is used to determine the total number of states; for this algorithm, the total number of states is chosen as 5. Once the model is created, the classifier is trained and the learning is performed. When the learning is done, any new gesture drawn by the user can be recognized by the algorithm.

    var teacher = new HiddenMarkovClassifierLearning<MultivariateNormalDistribution>(
        hmm, i => new BaumWelchLearning<MultivariateNormalDistribution>(hmm.Models[i])
        {
            Tolerance = tolerance,
            Iterations = iterations,
            FittingOptions = new NormalOptions() { Regularization = 1e-5 }
        });
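For context, the sketch below shows how such a teacher is typically run and how a new gesture would then be classified with this version of Accord.NET; the variable names (inputs, outputs, sequence) are assumptions about the surrounding code.

    // inputs:  double[][][] - one observation sequence of (x, y) points per training sample
    // outputs: int[]        - the class label of each training sample
    double error = teacher.Run(inputs, outputs);

    // After training, an unknown gesture sequence is classified by picking the
    // class whose model assigns it the highest likelihood.
    int predictedClass = hmm.Compute(sequence);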

6.5 Results and Findings

The Hidden Markov Model algorithm is highly generalized and can recognize gestures ranging from simple to customized (Figure 28).


Figure 28: Rotate gesture recognition after HMM training

Three types of gesture databases were created to test this algorithm (Figure 29). The first was a digit database (using only one-finger touch), consisting of the digits 0 to 9. The second dealt with customized symbols (star, left curly bracket, right curly bracket, question mark, zoom, and delta), drawn using one or two fingers. The last contained the drag, zoom, and rotate gestures.


Figure 29: Three implemented databases for this algorithm

Digit database: For this database, 6 different users drew a total of 450 digits from 0 to 9. The algorithm produced 80% correct results, as shown in Table 4.


Percentage    Actual gesture    Recognized as
90%           1                 Empty database entry problem
2%            2                 Three
1%            3                 Nine
4%            4                 Eight and nine
2%            5                 Six and nine
3%            6                 Eight and zero
0%            7                 -
3%            8                 Zero
1%            9                 Four
4%            0                 Six

Table 4: Results for digit database

Customized gesture database: For this database, 6 different users drew 300 gestures. The algorithm's results are displayed in Table 5.

Percentage   Actual gesture        Recognized as
0%           Left curly bracket    -
0%           Right curly bracket   -
0%           Star                  -
0%           Question Mark         -
3%           Zoom                  Rotate
2%           Rotate                Zoom
2%           Delta                 Rotate

Table 5: Results for customized database

DragZoomRotate database: For this database, 6 different users drew 200 gestures. The algorithm's results are captured in Table 6.


Percentage   Actual gesture   Recognized as
0%           Drag             -
3%           Zoom             Rotate
3%           Rotate           Zoom

Table 6: Results for DragZoomRotate database

Of the three databases, the digit database gave the most inaccurate results. This may be explained by the fact that each user draws digits very differently. In addition, inconsistent drawing creates similarities among the various digits (e.g., between 4 and 9, 6 and 8, or 0 and 6).

Sometimes, due to touch screen insensitivity or incorrect finger contact, gesture drawings were not captured properly. However, as the users continued to use the touch surface, these touch errors were reduced dramatically. Therefore, some practice is required for a new user to work efficiently on the touch surface.

There are also some downsides to this algorithm implementation. The learning algorithm takes some time to train the system on a large database, so performance becomes a concern: training takes around one minute for 10 class labels and 450 gestures. In addition, implementing the Hidden Markov Model involves a steep learning curve and requires a good understanding of the artificial intelligence domain.


Chapter 7: Comparison, Conclusion and Future Work

7.1 Algorithm selection and distinction based on two paradigms of gesture recognition:

7.1.1 Gestures for manipulation and navigation:

The “Direct Manipulation” algorithm is best suited for this paradigm, as it focuses on gestures that require manipulation and navigation properties. The algorithm provides basic gesture support and covers highly commercialized application domains such as touch development for mobile and other small touch devices.

7.1.2 Gesture-based language:

The “Normalization and Gesture Recognition” algorithm can fit into both paradigms (“Gestures for manipulation and navigation” and “Gesture-based language”) depending on the implementation of its recognition module. However, when implemented with a finer recognition module, its applicability leans more towards the gesture-based language paradigm. The “Generalized gestures using HMM” algorithm falls under the gesture-based language paradigm because it focuses more on gesture recognition and provides a finer interpretation of gestures, which can be used for gesture-language development.


Comparison of the two algorithms:

o Normalization and gesture recognition:

. This algorithm divides gesture recognition into very fine modules such as input-space sampling, corpus collection for the gesture database, and gesture recognition. These concepts can be applied to higher-dimension gesture domains and to real-time body and motion tracking systems.

. This algorithm does not require outside library support for implementing customized gestures and uses a very simple approach for sampling and gesture recognition.

. However, one shortcoming of the algorithm is its limited scope and its inability to model time continuity.

o Generalized gestures using HMM:

. This is a highly flexible algorithm. It can cover a huge application domain, from basic gestures to many customized gestures, and can be used for any 2D finger gesture development such as handwriting or digit recognition.

. Various online libraries are available for implementing HMMs and can easily be used for gesture recognition.

. This algorithm takes time continuity into account, which is very important for distinguishing certain gestures (such as left and right, or up and down). Time continuity also plays an important role in 3D gesture development, and thus this algorithm is best suited for future work as a 3D gesture development extension.

7.2 Findings

The finger gesture recognition algorithm is a thread-based program in which each finger is assigned a touch device ID and is therefore assigned to a thread automatically. When the user interacts with the touch device, events are fired in the order touchdown, touchmove, touchup, where the touchmove event fires continuously as long as the finger moves on the screen. Because each finger's thread is controlled automatically, manual thread synchronization and customization sometimes become a challenge for the developer and can lead to undesirable or incorrect results.
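A minimal sketch of how these per-finger events could be routed by touch device ID is shown below. It reuses the hypothetical GestureDrawer class sketched earlier and assumes the handlers are attached to the window's TouchDown, TouchMove and TouchUp events, so it illustrates the idea rather than reproducing the thesis implementation:

    using System.Collections.Generic;
    using System.Windows;
    using System.Windows.Input;
    using System.Windows.Media;

    public partial class MainWindow : Window
    {
        // One drawer per active finger, keyed by the touch device ID.
        private readonly Dictionary<int, GestureDrawer> drawers =
            new Dictionary<int, GestureDrawer>();

        // Touchdown: start a new stroke for this finger.
        private void OnTouchDown(object sender, TouchEventArgs e)
        {
            int id = e.TouchDevice.Id;
            drawers[id] = new GestureDrawer(id, Colors.Blue);
            drawers[id].AddPoint(id, e.GetTouchPoint(this).Position);
        }

        // Touchmove: keep appending points while the finger moves.
        private void OnTouchMove(object sender, TouchEventArgs e)
        {
            int id = e.TouchDevice.Id;
            if (drawers.ContainsKey(id))
                drawers[id].AddPoint(id, e.GetTouchPoint(this).Position);
        }

        // Touchup: the finger's stroke is complete and its drawer is released.
        private void OnTouchUp(object sender, TouchEventArgs e)
        {
            drawers.Remove(e.TouchDevice.Id);
        }
    }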

The point is not to disregard the use of multiple fingers, but to highlight the complexity involved in manually customizing touch events when multiple fingers are used to perform gestures. Since the main idea behind touch-surface gestures is to give the user a satisfying touch experience, an unwanted or additional finger touch can become a hazard for the user when performing a gesture on a touch surface. Therefore, the developer's main goal should be to create the gesture system in the most simplified way, with the minimum finger-touch requirement (i.e., a one-finger drag is easier to perform than a four-finger drag). This not only provides more user friendliness but also leads to easier development of the touch application.


7.3 Conclusions

Gesture applications are most often treated as a specialized task requiring various complex techniques, and knowledge-driven methodologies have been implemented to recognize or perform these gestures. However, no clear distinction is usually made between applications that require gesture recognition and those that require gesture manipulation. The two gesture paradigms introduced in this thesis not only help in algorithm selection for 2D gesture applications but also reduce the complexity of the gesture research domain.

The thesis also shifts the focus towards looking for easier methods to perform gestures. There are certainly scenarios where complex techniques are required for gesture development, but not every gesture recognition application requires a complex technique such as a Hidden Markov Model; simple gesture applications can be implemented with approaches as simple as linear regression, Euclidean distance, or basic vector calculations.
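As a toy illustration of that last point (not code from any of the implemented algorithms), a nearest-template recognizer based on Euclidean distance could look like the sketch below, assuming both the candidate gesture and the stored templates have already been resampled to the same number of points and normalized for scale and position:

    using System;
    using System.Collections.Generic;
    using System.Windows;

    // Hypothetical nearest-template recognizer based on Euclidean distance.
    public static class SimpleRecognizer
    {
        public static string Recognize(Point[] gesture,
                                       Dictionary<string, Point[]> templates)
        {
            string best = null;
            double bestDistance = double.MaxValue;

            foreach (var template in templates)
            {
                // Average point-to-point Euclidean distance to this template.
                double distance = 0;
                for (int i = 0; i < gesture.Length; i++)
                {
                    double dx = gesture[i].X - template.Value[i].X;
                    double dy = gesture[i].Y - template.Value[i].Y;
                    distance += Math.Sqrt(dx * dx + dy * dy);
                }
                distance /= gesture.Length;

                if (distance < bestDistance)
                {
                    bestDistance = distance;
                    best = template.Key;
                }
            }
            return best;   // class label of the closest template
        }
    }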

7.4 Future Work

The three algorithms provide great opportunities for improving gesture applications.

o The first improvement would be to extend the algorithms to support more gestures (e.g., flick, one-finger tap, two-finger tap, cut-copy-paste, etc.).

o Another improvement is to implement a better recognition module for the “Normalized and Pattern Matching” algorithm to make it better suited for the gesture-based language paradigm.

o As this thesis focuses closely on touch-surface-based recognition, another improvement could be extensive research on tabletop applications and their hardware/software technologies.

o In the future, converting from the 2D domain to the 3D domain would be strongly advised, and subsequently the “Normalization and Gesture Recognition” or “Generalized Gesture using HMM” algorithms could be implemented in 3D space. The pattern-matching algorithm can be extended by creating small image chunks as templates and applying the pattern-matching approach to perform gesture recognition.
