A Gestural Interface in a Computer-Based Conducting System
A Thesis Submitted to the Faculty of Graduate Studies and Research
In Partial Fulfillment of the Requirements
For the Degree of Master of Science in Computer Science
University of Regina

By Lijuan Peng
Regina, Saskatchewan
October, 2008

© Copyright 2008: L. Peng

Library and Archives Canada
Published Heritage Branch
395 Wellington Street, Ottawa ON K1A 0N4, Canada
ISBN: 978-0-494-55050-2

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis. While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

UNIVERSITY OF REGINA
FACULTY OF GRADUATE STUDIES AND RESEARCH
SUPERVISORY AND EXAMINING COMMITTEE

Lijuan Peng, candidate for the degree of Master of Science in Computer Science, has presented a thesis titled, A Gestural Interface in a Computer-Based Conducting System, in an oral examination held on September 22, 2008. The following committee members have found the thesis acceptable in form and content, and that the candidate demonstrated satisfactory knowledge of the subject material.

External Examiner: Dr. Thomas J. Conroy, Faculty of Engineering
Supervisor: Dr. David Gerhard, Department of Computer Science
Committee Member: Dr. Daryl Hepting, Department of Computer Science
Committee Member: Professor Brent Ghiglione, Department of Music
Chair of Defense: Dr. Pauline Minevich, Department of Music

Abstract

Over the past few years, a number of computer-based conducting systems have been designed and implemented. However, only a few of them have been developed to help a user learn and practice musical conducting gestures. Few systems provide both visual representation for a conducting gesture and aural feedback. This thesis is intended to address research related to this area. It focuses on a gestural interface designed and developed for a computer-based conducting system. This gestural interface utilizes an infrared technique to track the motions of the right arm and an acceleration sensor for the gestures of the left arm.
The infrared sensor enables the system to be used in a natural environment and has little influence on the conducting. The gesture recognition is based on the inherent characteristics of conducting gestures including positions and amplitudes. It is an accurate and relatively simple process. The conducting is interpreted using a few visual items that show a conducting gesture very clearly and straightforwardly reveal its quality. In addition, this gestural interface supports both tempo following and dynamics following and provides straightforward visual representations for them. The aural representation included in the interface is to inform users of the occurrence of beats or errors.

Acknowledgements

I would like to take this opportunity to express my sincere gratitude to my supervisor, Dr. David Gerhard. His guidance, encouragement, and financial aid ensured the completion of my thesis. His invaluable suggestions and comments placed me on the right path and were essential to the process of completing my graduate studies. I also want to thank the members of my thesis committee, Professor Brent Ghiglione and Dr. Daryl Hepting, for their time and comments. In addition, I would like to thank the University of Regina, the Department of Computer Science, and the Faculty of Graduate Studies and Research for giving me the opportunity to study here and for providing me with financial support. Finally, I would like to thank my family for their love and for always being there for me.
Contents

Abstract
Acknowledgements
List of Tables
List of Figures
1 INTRODUCTION
  1.1 Motivation and contribution
  1.2 Thesis overview
2 BACKGROUND AND RELATED RESEARCH
  2.1 Human Computer Interface
  2.2 Gestural interface in musical systems
  2.3 Visual representation of musical parameters
  2.4 Conducting
  2.5 Overview of computer-based conducting systems
    2.5.1 Summary table
    2.5.2 Summary
3 DESIGN OF A GESTURAL INTERFACE
  3.1 Gestures
    3.1.1 Conducting gestures
    3.1.2 Beat patterns
    3.1.3 Dynamics
  3.2 Gesture tracking
  3.3 Gesture analysis
    3.3.1 Segmentation
    3.3.2 Feature extraction
  3.4 Gesture recognition/following
    3.4.1 Recognition
    3.4.2 Following
  3.5 Response
    3.5.1 Visual representation
    3.5.2 Aural representation
4 IMPLEMENTATION AND EVALUATION
  4.1 Development and runtime environment
    4.1.1 Wii Remote
    4.1.2 A baton-like infrared stick
    4.1.3 WiTilt v2.5
    4.1.4 Software
  4.2 Main window
  4.3 Gesture tracking
  4.4 Gesture analysis
    4.4.1 Coordinates
    4.4.2 Beats
  4.5 Gesture recognition/following
    4.5.1 Separate beat pattern recognition and accuracy
    4.5.2 Mixed beat pattern recognition and accuracy
    4.5.3 Tempo tracking and accuracy
    4.5.4 Dynamics tracking
  4.6 Aural Representation
5 DISCUSSION
  5.1 Video camera
    5.1.1 Segmentation
    5.1.2 Feature extraction
    5.1.3 Comparison between video camera and Wii Remote
  5.2 WiTilt v2.5
    5.2.1 Gesture tracking
    5.2.2 Feature extraction
    5.2.3 Gesture recognition
    5.2.4 Comparison between WiTilt v2.5 and Wii Remote
6 CONCLUSION AND FUTURE RESEARCH
  6.1 Conclusion
  6.2 Future research
Glossary

List of Tables

2.1 Computer-based conducting systems
3.1 Gesture recognition rules for three beat patterns
3.2 The mapping between beats and MIDI notes in the system
4.1 The results of recognition only based on the downbeat detection
4.2 Comparison between the calculated average tempo and the real average tempo
5.1 Comparison between video camera and Wii Remote

List of Figures

2.1 The relationship between the interface and the computer system
2.2 4-beat patterns (drawn based on the pictures in [34])
2.3 4-beat patterns (drawn based on the pictures in [27] and [21])
3.1 Five aspects to design a gestural interface
3.2 Three beat patterns
3.3 An example of visual representations
4.1 The setup
4.2 Wii Remote (from its website)
4.3 The interface of the DarwiinremoteOSC
4.4 A baton-like infrared stick
4.5 The WiTilt 2.5 used in our system
4.6 The coordinates of the WiTilt v2.5
4.7 The interfaces of W20
4.8 An example of a Max/MSP patch
4.9 A snapshot of the main window
4.10 A Max/MSP patch to receive sample data via UDP
4.11 A correct gesture for a 2-beat pattern
4.12 A correct gesture for a 3-beat pattern
4.13 A correct gesture for a 4-beat pattern
4.14 An error downbeat for a 2-beat pattern
4.15 An incorrect gesture for a 2-beat pattern
4.16 An example of 12-beat patterns
4.17 A screenshot of tempo practice
4.18 Examples of the changes of dynamics
5.1 The segmentation of a moving hand
5.2 The blobs in a moving hand at a certain time
5.3 A trajectory before and after smoothing
5.4 The effect of linear interpolation
5.5 The effect of the smoothing
5.6 Three HMMs
5.7 A visible states sequence generated using different methods

Chapter 1
INTRODUCTION

Gestures are widely used to aid face-to-face communication between people. Such gestures include hand movements, body language, and eye contact. They deliver information to others without relying on speech. Conducting is leading a musical performance with conducting gestures, such as hand gestures and eye contact. These gestures convey the understanding and intent of a conductor to members in an orchestra.