Free-Space Gesture Mappings for Music and Sound

by

Gabrielle Odowichuk
BEng, University of Victoria, 2009

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Applied Science in the Department of Electrical and Computer Engineering

© Gabrielle Odowichuk, 2012
University of Victoria

All rights reserved. This thesis may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author.

Supervisory Committee

Dr. P. Driessen, Co-Supervisor (Department of Electrical and Computer Engineering)
Dr. G. Tzanetakis, Co-Supervisor (Department of Computer Science)
Dr. Wyatt Page, Member (Department of Electrical and Computer Engineering)

Abstract

This thesis describes a set of software applications for real-time, gesturally controlled interactions with music and sound. The applications of each system are varied but related, addressing unsolved problems in the field of audio and music technology. The three systems presented in this work capture 3D human motion with spatial sensors and map position data from the sensors onto sonic parameters. Two different spatial sensors are used interchangeably to perform motion capture: the radiodrum and the Xbox Kinect. The first two systems are aimed at creating immersive, virtually augmented environments. The first application uses human gesture to move sounds spatially in a 3D surround-sound environment by physically modelling the movement of sound in a space. The second application is a gesturally controlled, self-organized music browser in which songs are clustered based on auditory similarity. The third application is specifically aimed at extending musical performance through the development of a digitally augmented vibraphone. Each of these applications is presented with related work, theoretical and technical details for implementation, and a discussion of future work.

Table of Contents

Supervisory Committee
Abstract
Table of Contents
List of Figures
Acknowledgements
1 Introduction
1.1 Problem Formulation
1.2 Thesis Structure
2 Background and Motivation
2.1 Contextualizing a Gesture
2.2 Data Mapping
2.3 Free-space Gesture Controllers
2.4 A Case Study
3 Capturing Motion
3.1 Spatial Sensor Comparison
3.2 Latency
3.3 Range
3.4 Software Tools
3.5 Future Work with Motion Capture
4 Motion-controlled Spatialization
4.1 Related Work
4.2 Sound Localization
4.3 Creating a Spatial Model
4.4 Implementation
4.5 Summary and Future Work
5 Gesturally-controlled Music Browsing
5.1 Related Work
5.2 Organizing Music in a 3D Space
5.3 Navigating through the Collection
5.4 Implementation
5.5 Summary and Future Work
6 Hyper-Vibraphone
6.1 Related Work
6.2 Gestural Range (Magic Eyes)
6.3 Adaptive Control (Fantom Faders)
6.4 Summary and Future Work
7 Conclusions
7.1 Recommendations for Future Work
Bibliography

List of Figures

2.1 Interactions between Sound and Motion
2.2 Data Mapping from a Gesture to Sound
2.3 Mickey Mouse, controlling a cartoon world with his movements in Fantasia
2.4 Leon Theremin playing the Theremin
2.5 Radiodrum design diagram
2.6 Still shots from MISTIC concert
3.1 Sensor Fusion Experiment Hardware Diagram
3.2 Sensor Fusion Experiment Software Diagram
3.3 Demonstration of Latency for the Radiodrum and Kinect
3.4 Captured Motion of Four Drum Strikes
3.5 Radiodrum Viewable Area
3.6 Kinect Viewable Area
3.7 Horizontal Range of both controllers
4.1 Room within a room model
4.2 Implementation Flow Chart
4.3 Delay Line Implementation
4.4 Image Source Model
4.5 OpenGL Screenshot
5.1 A 3D self-organizing map before (a) and after (b) training with an 8-color dataset
5.2 3D SOM with two genres and user-controlled cursor
5.3 Implementation Diagram
6.1 Music Control Design
6.2 Audio Signal Chain
6.3 Virtual Vibraphone Faders
6.4 Computer Vision Diagram
6.5 Virtual recreation of the vibraphone

Acknowledgements

I'd like to begin by thanking my co-supervisors, Dr. George Tzanetakis and Dr. Peter Driessen, for their support, patience, and many teachings throughout my undergraduate and graduate studies at UVic. Peter's enthusiasm for my potential and my future has given me motivation and confidence, especially combined with the respect I have for his incredible knowledge and experience. Whenever I asked George if he was finally getting sick of me, he would assure me that could never happen. I'm still not sure how that's possible after all this time, but what a relief, and I will always strive to one day be as totally awesome in every way as George.

My first encounter with this field of research and much of my early enthusiasm came from sitting in the classroom of Dr. Andy Schloss. His dry sense of humour and passion for the material are what got me into this world. Thanks also to Kirk McNally, for helping me set up the speaker cube and teaching me some crucial skills with audio equipment, and to Dr. Wyatt Page for his help with my thesis and for showing me what an amazing academic presentation looks like.

Early on in my master's program, Steven Ness welcomed me into our research lab and has helped me understand how to be an effective researcher. Many other friends and colleagues have helped me along the way: Tiago Tiavares, Sonmez Zehtabi, Alex Lerch, and Scott Miller were all of particular importance to me. A large chapter of this thesis is about a collaboration with Shawn Trail, who is a dear friend and the inspiration for what is, in my mind, the research with the most possible impact down the road. The use of this type of gestural control, when completely integrated into music practice, has expressive possibilities that are still very much untapped. Thanks also to David Parfit for collaborating with me in the Trimpin concert, which gave me more context and empirical proof that this type of control is rich with expressive possibilities.

Paul Reimer is a close friend and my indispensable coding consultant. If I found myself spending more than a few hours beating my head against a wall over a technical issue, I needed only ask Paul for help and my problem would soon be solved. Marlene Stewart has been another source of much support. It's so rare to have people in your life you can rely on so completely like Paul and Marlene. Thanks mom 'n dad for being the proud supportive parents that raised the kind of daughter who goes and gets a master's degree in engineering.
And finally, thank you to NSERC and SSHRC for supplying the funding for this research.

Chapter 1

Introduction

The ability of sound and human gestures to affect one another is a fascinating and useful notion, often associated with artistic expression. For example, a pianist will make gestures and motions that affect the sounds produced by the piano, and also some that do not. Both types of gesture are important to the full experience of the performance. A dancer, though not directly changing the music, is also creating an expressive representation of the music, or of the ideas and emotions evoked by the music. In this case, sound affects motion. The connection between the auditory and visual senses is a large part of what makes audio-visual performances interesting to watch and listen to.

Advances in personal computing and the adoption of new technologies allow the creation of novel mappings between visual and auditory information. A large motivator for this research is the growing capability of personal computers. The mapping of free-space human gesture to sound used to be a strictly off-line operation: a collection of computers and sensors captured the motion, and the corresponding auditory output was then computed and played back afterwards. Modern computers are able to sense motion and gesture and react almost instantaneously.

The ability to capture free-space motion, perform complex calculations, and produce corresponding audio in real time is a fundamental requirement for the implementation of these systems. This type of control also requires careful thought about how it can be used in different contexts. The secondary feedback provided by audio playback is an important part of what makes gesture-controlled sound and music useful, because accessing or manipulating aural information while listening to auditory feedback of that information is intuitive and natural.

Though there are many types of gestures used in human-computer interaction (HCI), this work focuses on three-dimensional motion capture of large, relatively slow, continuous human motions. The purpose of this work is not to classify these motions and recognize gestures to trigger events. Instead, the focus is on mapping continuous human motion onto sonic parameters in three new ways that are both intuitive and useful in the music and audio industry.

1.1 Problem Formulation

The possible applications of these gesturally controlled audio systems span several different facets of HCI and address a variety of music- and audio-industry-related problems, such as:

• Intuitive control in 2D and 3D control scenarios

Intuitive and ergonomic control is an important consideration in the field of HCI.