Concordia University
Department of Computer Science & Software Engineering
COMP 471 Computer Graphics: Real-time Video
Fall 2006

SEMA4 Real-time Semaphore Translator

Due Date: December 11, 2006

Team Members:

Erum Asghar 5256771
Jhumur Banik 5006740
L’Emir-Nader Chehab 4824180
Abolfazl Rabie 4848853

Concept

A practical application that maps semaphore signals to readable characters.

Abstract

Communicating information through hand gestures has been done for thousands of years, whether for simple greetings or complex dialog. Humans express time-varying patterns, such as different hand positions, in order to convey a message to a recipient. However, this type of communication has always been restricted to human-human interaction. Only recently have we begun to integrate this type of interaction into machines. The fields of computer vision and pattern recognition are just getting started, but already offer exciting possibilities. This paper describes the Sema4 Translation System, an attempt to implement a system that recognizes simple arm positions and translates them into letters.

Acknowledgments

First of all, the Sema4 team would like to thank Professor Sha Xin Wei for giving us the opportunity to work in a graphics environment and introducing us to the very interesting field of real-time video. We all have enjoyed learning about Max/Jitter and developing our human-interactive system Sema4.

Professor Sha Xin Wei http://www.lcc.gatech.edu/~xinwei/

Our team is also thankful to Paul Kolesnik, a research student at McGill University, who provided us with the HMM (Hidden Markov Model) patch and helpful documentation about this tool. With his kind help, we learned the basic concepts of the Markov model, a tool for pattern and speech recognition.

Paul Kolesnik http://www.music.mcgill.ca/~pkoles/

Our Website Address

Our website can be found at http://www.sema4.tk. It contains information about the project, the Sema4 patch, screenshots and a video demonstration of the patch in action.

Team Members and Roles

This is our team in alphabetical order:

Erum Asghar: Jitter Implementation, Mathematical Analysis and Documentation
Jhumur Banik: Jitter Implementation, Installation, Algorithms and Documentation
Nader Chehab: Jitter Implementation, Web Design, Algorithms and Documentation
Abolfazl Rabie: Jitter Implementation, Mathematical Analysis and Documentation

Table of Contents

1. Project Overview

2. Installation

3. Scope of Application
   3.1 Military Communication
   3.2 Traffic Control

4. Technical Aspects
   4.1 Thresholding
   4.2 Blob Bounding
   4.3 Largest Blob Isolation
   4.4 Face Detection

5. Region Segmentation: Mathematical Analysis
   5.1 Four Symbols
   5.2 Nine Symbols
   5.3 Sixteen Symbols
   5.4 Twenty-five Symbols
   5.5 More than Twenty-five Symbols

6. Hidden Markov Models
   6.1 Overview
   6.2 Formal Definition
   6.3 Recognition and Learning with HMM
   6.4 Advantages of HMM
   6.5 Disadvantages of HMM

7. Conclusion

8. References

List of Figures

Figure 1: The Semaphore Signalling System
Figure 2: The Sema4 patch in action in the lab environment
Figure 3: Military communication
Figure 4: Semaphore signals in Aviation
Figure 5: Sema4 Flow Chart
Figure 6: Four Symbols
Figure 7: Nine Symbols
Figure 8: Thirteen Symbols
Figure 9: Example of Hidden Markov Models

1. Project Overview

Sema4 is an innovative translation system designed by a team of students in the Real-time Video course (COMP 471) at Concordia University. The goal of the project is to automate the interpretation of semaphore code captured by a camera, using basic image processing techniques. These techniques include color detection, color filtering, object tracking and face detection. They are described in more detail in the Technical Aspects section.

Figure 1: The Semaphore Signalling System

2. Installation

The camera is positioned approximately 2.5 meters away from the actor, who wears colored gloves and performs semaphore signals with his or her arms. The corresponding letter is then displayed on a computer screen or projected on a wall. The actor can then proceed to signal the next character. If a projector is used, the actor and the camera should not interfere with the projected image, in order to avoid a recursive (video feedback) effect.

Figure 2: The Sema4 patch in action in the lab environment

3. Scope of Application

3.1 Military Communication

Semaphore is mostly used throughout the world by naval military forces to communicate between ships, whether in dock or at sea. It requires personnel who are able to perform and understand the code precisely. Since humans are prone to making mistakes, a misinterpretation could have serious consequences. The goal of the Sema4 project is to increase reliability by using a machine for translation instead of a human interpreter.

Figure 3: Military communication

3.2 Traffic Control

Semaphore is currently used at airports to maneuver aircraft. In the future, a camera could be embedded in an aircraft to detect signalmen on the ground instead of relying on the pilot’s vision. This could increase reliability.

Figure 4: Semaphore signals in Aviation

Military signalmen use hand and body gestures to direct flight operations aboard aircraft carriers

4. Technical Aspects

The primary tool used to implement the project is the Max/Jitter environment, which provides real-time video processing. In this section we explore each of the techniques used.

Figure 5: Sema4 Flow Chart

4.1 Thresholding (Color Isolation)

The actor wears two gloves of different colors. The goal of color detection is to isolate each glove from the rest of the image. By using the chromakey object from the cv.jit library, we managed to isolate the gloves from the background.

Algorithm for Thresholding (pseudo-code):

    Convert RGB to HSV Color Space (Hue, Saturation, Value)
    FOR each pixel of the HSV input frame
        IF Hue is between MinHue and MaxHue AND Saturation is between MinSaturation and MaxSaturation
            Make the pixel white
        ELSE
            Make the pixel black

The values of MinHue, MaxHue, MinSaturation and MaxSaturation depend on the color of each glove.
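The thresholding pseudo-code above can be sketched in Python as follows. This is only an illustration and not the Jitter patch itself: the function name `threshold_hsv`, the tiny test frame and the hue and saturation bounds chosen for a blue glove are all assumptions made for the example. Hue and saturation use the 0.0 to 1.0 scale of Python's standard `colorsys` module.

```python
import colorsys


def threshold_hsv(frame, min_hue, max_hue, min_sat, max_sat):
    """Return a binary mask (1 = white, 0 = black) for an RGB frame.

    frame is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
    The hue and saturation bounds are in the 0.0..1.0 range used by colorsys.
    """
    mask = []
    for row in frame:
        mask_row = []
        for r, g, b in row:
            # Convert the pixel from RGB to HSV; the value channel is ignored.
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            inside = min_hue <= h <= max_hue and min_sat <= s <= max_sat
            mask_row.append(1 if inside else 0)
        mask.append(mask_row)
    return mask


# Hypothetical bounds for a blue glove (pure blue has hue 2/3 on this scale).
frame = [
    [(0, 0, 255), (255, 0, 0)],
    [(200, 200, 200), (10, 20, 240)],
]
blue_mask = threshold_hsv(frame, 0.55, 0.75, 0.5, 1.0)
```

Note that a simple interval test like this one does not handle red gloves well, because the hue of red wraps around the ends of the hue scale; such a color would need two intervals.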

4.2 Blob Bounding

The cv.jit.blobs.bounds object is used to find the location of each blob received from the chromakey object. The coordinates of a blob are calculated by finding the center of its bounding box.
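As an illustration of the bounding-box idea, the sketch below computes the bounding box of the white pixels in a binary mask and takes its center as the blob position. It assumes the mask contains a single blob and does not reproduce the actual behavior or output format of cv.jit.blobs.bounds; the function name `bounding_box_center` is hypothetical.

```python
def bounding_box_center(mask):
    """Return ((left, top, right, bottom), (cx, cy)) for the white pixels.

    mask is a list of rows of 0/1 values. Returns None if the mask is empty.
    """
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) for value in row if value]
    if not xs:
        return None
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    # The blob position is taken to be the center of the bounding box.
    center = ((left + right) / 2.0, (top + bottom) / 2.0)
    return (left, top, right, bottom), center


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box, center = bounding_box_center(mask)
```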

4.3 Largest Blob Isolation

At this point we need to eliminate all blobs which are not the gloves. In addition, the image might contain scattered pixels (noise) of the same colors as the gloves. We assume that the gloves are the largest blobs of that specific color. For example, if the gloves are blue, we assume that they are the largest blue blobs in the image. The disadvantage of this method is that the algorithm won’t work if there are blue objects that are larger than the gloves. For selecting the largest blob, we use the cv.jit.label object.

Algorithm for Largest Blob Isolation (pseudo-code):

    AREA = 0
    LARGEST = none
    FOR each blob in the input image
        IF the blob’s area is larger than AREA THEN
            AREA = blob’s area
            LARGEST = blob
    Keep LARGEST and discard all other blobs
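A minimal Python sketch of largest blob isolation is given below. It labels blobs with a breadth-first flood fill and assumes 4-connectivity, which is a choice made for this example; we do not claim that cv.jit.label works this way internally. The function name `largest_blob` is hypothetical.

```python
from collections import deque


def largest_blob(mask):
    """Return a mask that keeps only the largest 4-connected white blob."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = []  # pixels of the largest blob found so far

    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one blob starting from this unvisited white pixel.
            blob = []
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                px, py = queue.popleft()
                blob.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Keep this blob if its area exceeds the largest area so far.
            if len(blob) > len(best):
                best = blob

    result = [[0] * width for _ in range(height)]
    for px, py in best:
        result[py][px] = 1
    return result


noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
clean = largest_blob(noisy)
```

In this example the two isolated noise pixels are removed and only the 2 x 2 blob survives.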

4.4 Face Detection

We decided to calculate the hands’ coordinates relative to the face. This ensures that the subject’s body movements do not affect the relative position of the gloves. Face detection was done using the cv.jit.face object, from which we determined the face’s coordinates. Because of a lack of technical details regarding the face detection object found in the cv.jit library, we will describe a commonly used face detection technique which might be similar to the algorithm used in our project.

The first step in most face detection algorithms is color-based segmentation. The goal is to remove the maximum number of pixels that do not belong to a face in order to narrow the search window. The HSV color space is most widely used for this purpose.

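The likelihood that a pixel (with hue H) belongs to the class of skin pixels is given by

$$p(H \mid \text{skin}) = \sum_{i=1}^{N_c} \omega_i \, \mathcal{N}(H; \mu_i, \sigma_i^2)$$

where the $N_c$-component Gaussian mixture model is specified by the parameter set

$$\Theta = \{\omega_i, \mu_i, \sigma_i^2\}_{i=1}^{N_c}$$

of mixture weights $\omega_i$, means $\mu_i$, and variances $\sigma_i^2$ learned from a set of training data.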
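The mixture-model likelihood described above can be evaluated with a few lines of Python. The sketch below assumes one-dimensional Gaussians over the hue value; the weights, means and variances are made-up numbers for illustration only and are not learned from any training data.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian at x."""
    norm = math.sqrt(2.0 * math.pi * variance)
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / norm


def skin_likelihood(hue, weights, means, variances):
    """Weighted sum of Gaussian densities: the mixture likelihood of a hue."""
    return sum(
        w * gaussian_pdf(hue, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# Hypothetical two-component model (hue on a 0.0..1.0 scale).
weights = [0.7, 0.3]
means = [0.05, 0.10]
variances = [0.0004, 0.0025]

likely = skin_likelihood(0.05, weights, means, variances)
unlikely = skin_likelihood(0.60, weights, means, variances)
```

A pixel would then be kept as a skin candidate when its likelihood exceeds some threshold, which has to be tuned for the camera and lighting.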

If the size of any skin-colored region is below the minimum expected size of a head, the region is considered noise and removed from consideration. A closing operation is then performed, and the next smallest objects are removed. A significant portion of the remaining non-face objects are thin and long, such as arms. The aspect ratio of these objects can be used to selectively remove them. The resulting regions most likely represent the faces. Unfortunately, when there is more than one face to recognize, the faces may be connected. To separate them, erosion is used to break up those connections.
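The size and aspect-ratio tests described above can be sketched as a simple filter over candidate regions. The region format (bounding box plus pixel area), the function name `keep_face_candidates` and the threshold values are assumptions made for this example; the closing and erosion steps are not shown.

```python
def keep_face_candidates(regions, min_area, max_aspect_ratio):
    """Drop regions that are too small or too elongated to be a face.

    Each region is a tuple (left, top, right, bottom, area) in pixels.
    """
    kept = []
    for left, top, right, bottom, area in regions:
        width = right - left + 1
        height = bottom - top + 1
        # Ratio of the long side to the short side, always >= 1.
        aspect = max(width, height) / min(width, height)
        if area >= min_area and aspect <= max_aspect_ratio:
            kept.append((left, top, right, bottom, area))
    return kept


regions = [
    (10, 10, 49, 59, 1600),  # roughly head-shaped: 40 x 50 pixels
    (60, 20, 69, 119, 800),  # thin and long, like an arm: 10 x 100 pixels
    (5, 5, 7, 7, 9),         # tiny speck of noise
]
faces = keep_face_candidates(regions, min_area=400, max_aspect_ratio=2.0)
```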

5. Region Segmentation: Mathematical Analysis

Depending on the number of symbols that need to be recognized, different approaches are recommended. The screen is divided into a certain number of regions. Depending on which region the hands are in, a different symbol is displayed.

Legend:
lhx = left hand x coordinate
lhy = left hand y coordinate
rhx = right hand x coordinate
rhy = right hand y coordinate
fx = face x coordinate
fy = face y coordinate

5.1 Four Symbols

If we intend to recognize four symbols (e.g., A, B, C, D), we divide the screen into 4 different regions as shown in the picture.

Figure 6: Four Symbols

5.2 Nine Symbols

We use the slopes of the diagonal lines, which are 1 and -1, and calculate the slope of each arm using the following formula (shown here for the left hand): (fy-lhy)/(fx-lhx)

Figure 7: Nine Symbols

By comparing the slopes and the coordinates, we determine the region.

In case of a division by zero, we add a small number (e.g., 0.0001) to the denominator.
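The slope comparison can be sketched as follows. This is one possible reading of the method, not the exact logic of the patch: it assumes image coordinates in which y grows downward, and the sector names ("above", "below", "left", "right") as well as the names `arm_slope`, `hand_sector` and `EPSILON` are hypothetical. How sectors are combined into the nine symbols is not shown.

```python
EPSILON = 0.0001  # small number used in place of a zero denominator


def arm_slope(fx, fy, hx, hy):
    """Slope of the line from the hand (hx, hy) to the face (fx, fy)."""
    dx = fx - hx
    if dx == 0:
        dx = EPSILON  # avoid a division by zero for a vertical arm
    return (fy - hy) / dx


def hand_sector(fx, fy, hx, hy):
    """Classify a hand into one of four sectors around the face.

    The sectors are bounded by the two diagonals of slope 1 and -1
    passing through the face position.
    """
    slope = arm_slope(fx, fy, hx, hy)
    if abs(slope) > 1:
        # Steeper than the diagonals: the hand is above or below the face.
        return "above" if hy < fy else "below"
    # Flatter than the diagonals: the hand is to one side of the face.
    return "left" if hx < fx else "right"


# Face at (100, 100); four hypothetical hand positions.
sectors = [
    hand_sector(100, 100, 100, 40),
    hand_sector(100, 100, 160, 110),
    hand_sector(100, 100, 30, 95),
    hand_sector(100, 100, 110, 180),
]
```

Because the slope is computed relative to the face, the classification does not change when the actor moves sideways in front of the camera.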

5.3 Sixteen Symbols

We use the same approach as in Section 5.2, but with four lines instead of two. This involves more comparisons.

Figure 8: Thirteen Symbols

5.4 Twenty-five Symbols

We can use an approach similar to the one for sixteen symbols. This method is recommended for the English alphabet, even though it has 26 letters: the 26th symbol can be signalled by breaking the rule that the right hand should stay only on the right side.

5.5 More than Twenty-five Symbols

For more than 25 symbols, we recommend using Hidden Markov Models (HMMs), because it is impractical to divide the screen into more regions. The advantage of using an HMM is the ability to capture an arbitrary number of symbols without the need for additional programming. However, an HMM requires a great deal of training for each symbol. The following section is a brief explanation of Hidden Markov Models.

6. Hidden Markov Models

6.1 Overview

Hidden Markov Models allow us to estimate the probability of unobserved events. They are composed of a finite set of states, each of which is associated with a probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. In regular Markov models, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but variables influenced by the state are visible. In a particular state, an outcome or observation can be generated according to the associated probability distribution. It is only the outcome, not the state, that is visible to the external observer, and therefore the states are “hidden” from the user; hence the name “Hidden Markov Model”. HMMs are very common in computational linguistics. In speech recognition, for example, the observed data is the acoustic signal and the words are the hidden parameters.

6.2 Formal Definition

$N$, the number of hidden states: $S = \{S_1, S_2, \ldots, S_N\}$

$M$, the number of possible observations: $V = \{v_1, v_2, \ldots, v_M\}$

$A = \{a_{ij}\}$, the transition probabilities: $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, $1 \le i, j \le N$

$B = \{b_j(k)\}$, the observation probabilities: $b_j(k) = P(O_t = v_k \mid q_t = S_j)$, $1 \le j \le N$, $1 \le k \le M$

$O_t$ is the value of the observation at timestep $t$, and $q_t$ is the hidden state at timestep $t$.

$\pi = \{\pi_i\}$, the prior distribution over hidden states: $\pi_i = P(q_1 = S_i)$, $1 \le i \le N$

6.3 Recognition and Learning with HMM

The discrete form of an HMM is represented by three parameters: λ = (A, B, π).

- The matrix A = {aij} specifies the probability that the internal state will change from i to j.

- The matrix B = {bj(k)} represents the probability that the system will generate the observable output symbol k on transition to state j.

- The vector π indicates the probability that any given state is the initial state of the hidden Markov process.

There are three problems associated with HMM:

Evaluation: Determining the probability of an observation sequence given λ.

Decoding: Finding the state sequence that best explains the observations.

Learning: Adjusting the model parameters to maximize the probability of an observation given λ.

The evaluation problem is not a main concern in our project, since we are only dealing with a vector of 3 points (face, left hand and right hand), which results in $3^2 = 9$ possibilities. However, we are more concerned with the learning problem, because each region contains many possible hand and face positions that should be recognized.
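For reference, the evaluation problem is usually solved with the forward algorithm. The sketch below is a plain-Python illustration for a discrete HMM λ = (A, B, π), with `B[j][k]` holding the probability of emitting symbol k in state j. It is not the HMM patch used in the project. The recognition step, which scores one model per symbol and picks the most probable one, is a common usage pattern that we assume here, and the two toy models are invented for the example.

```python
def forward_probability(observations, A, B, pi):
    """Probability of an observation sequence given a discrete HMM (A, B, pi)."""
    if not observations:
        return 1.0
    n_states = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(O_{t+1})
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
            for j in range(n_states)
        ]
    # Termination: sum over all states at the final timestep.
    return sum(alpha)


def recognize(observations, models):
    """Return the name of the model that best explains the observations."""
    return max(
        models,
        key=lambda name: forward_probability(observations, *models[name]),
    )


# Two hypothetical 2-state models over observation symbols 0 and 1.
transitions = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "A": (transitions, [[0.9, 0.1], [0.2, 0.8]], [1.0, 0.0]),
    "B": (transitions, [[0.1, 0.9], [0.8, 0.2]], [1.0, 0.0]),
}
best = recognize([0, 0, 0], models)
```

For long observation sequences the products become very small, so practical implementations rescale the alpha values at each step or work with logarithms.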

Figure 9: Example of Hidden Markov Models

6.4 Advantages of HMM

1. Efficient algorithms for inference and parameter estimation are available.
2. The often troublesome issue of segmentation is seamlessly integrated into the inference algorithm.

6.5 Disadvantages of HMM

- The structure of the HMM is not always obvious. Some domain knowledge is required to construct the states and transitions that an HMM can take.

- They do not have fast learning curves. This is usually because there are a large number of parameters that must be set.

- They do not automatically cope with multiple variations within the same class.

7. Conclusion

As we have seen, different approaches can be taken to implement a semaphore recognition system. We have chosen to use the region segmentation technique, which gave very accurate results for 9 symbols under normal lighting conditions. In the future, it is possible to extend the system to recognize the entire alphabet. Another possible extension is a buffer that saves previous signals, which would allow an actor to write messages on the screen or, alternatively, to give commands to the operating system, such as launching a web browser. Through this project, we have learned various important video-processing and computer vision techniques such as thresholding, blob tracking and face detection, as well as relevant mathematical concepts like Hidden Markov Models.

8. References

- Hidden Markov Models, an Introduction, Guillaume Infantes, 2006

- A simple digital semaphore decoder using DSP processor: http://ieeexplore.ieee.org/iel5/7129/19251/00892267.pdf

- Implementation of the Discrete Hidden Markov Model in Max / MSP Environment: http://www.music.mcgill.ca/musictech/idmil/papers/FLAIRS2005_Kolesnik.pdf

- Wikipedia article on HMM: http://en.wikipedia.org/wiki/Hidden_Markov_model

- Hidden Markov Models http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html

- The Semaphore Signalling System: http://www.anbg.gov.au/flags/semaphore.html

- Wikipedia article on Semaphores: http://en.wikipedia.org/wiki/Semaphore#Flag_semaphore_system

- A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf

- Semaphore Java applet: http://www.wxs.ca/applets/semaphore/

- English using Semaphore Flags: http://www.travelphrases.info/languages/EnglishUsingSemaphoreFlags.htm

- Differential video coding of face and gesture events in presentation videos, Robin Tan and James W. Davis, 2002

- Face detection and gender recognition, Michael Bax, Chunlei Liu, and Ping Li, 2003
