
MAESTRO: USING TECHNOLOGY TO IMPROVE KINESTHETIC SKILL LEARNING OF MUSIC CONDUCTORS

Andrea E. Brown, Yonatan Sasson
School of Music, Georgia Institute of Technology

ABSTRACT

The use of technology in music conductor training is a growing area of interest. The expressive, subtle, and meaning-rich gestures used in conducting serve as fruitful ground for innovative research in areas such as artificial vision, gesture following, and musical mapping. While it is known that the kinesthetic skills of conducting are acquired through hours of intensive training, practice with real-time audio and visual feedback is severely limited by the availability, focus, and good will of live musicians. The current project, titled Maestro, builds upon previous work and provides a new approach for training beginning conductors: a system allowing the conductor to practice basic to advanced baton skills accompanied by a virtual ensemble that responds to the conductor's baton gestures affecting tempo, duration, articulation, and dynamics. By incorporating gesture anticipation and tracking, machine learning for gesture analysis, and physical modeling for high-quality audio, Maestro provides immediate feedback that is directly related to subtle variations of performed conducting gestures.
1. INTRODUCTION

Performing music, whether playing an instrument, singing, or conducting, requires a combination of aural, cognitive, and kinesthetic skills that must be refined through specific practice [1], [2]. Such skills could include learning the fingering patterns of major and minor scales on a particular instrument or the weight on the bow of a stringed instrument. Kinesthetic skills are also the foundation of beginning music conducting skills [3]. Beginning conducting students must learn a plethora of movements that include instruction on torso, head, and arm positions and a variety of expressive movements intended to bring about a response from performers. The acquisition of such skills is a challenging task, historically achieved through individual or group instruction followed by individual practice. Indeed, several technological innovations address this effort by putting an emphasis on the development of kinesthetic skills related to performing music or by providing sophisticated feedback (either in real time or not) to act as a virtual music teacher.

Such tools present different solutions for the practical issues as well as the psychological aspects of acquiring musical skills. Practicing in front of a teacher, peers, and eventually an audience may cause responses ranging from indifference to anxiety [4], [5]. Creating individualized instructional tools and allowing more comfortable practicing environments can be invaluable to the many populations affected by such difficulties. We contend that use of the system in such traditional learning environments would enhance the learning experience and encourage kinesthetic awareness and overall musical skill development.

The project seeks to advance previous conducting technology and pedagogy through two core advances: a) the delivery of rich real-time audio and visual feedback through the Maestro system to enable the refinement of the kinesthetic skills of conducting gestures affecting variations of speed, articulation, dynamics, and duration, and b) the ability to practice conducting gestures without the need for live musicians or peers. The Maestro system introduces technical innovation-based research in three main areas: a) gesture anticipation and tracking; b) machine learning for gesture detection and classification; and c) utilization of physical modeling for high-quality, subtle musical feedback. This work is designed to foster more opportunities for meaningful learning experiences through the beginning conductor's discovery of the subtleties of gestures and their effect on musical performance.
2. RELATED WORKS

In recent years, there have been several attempts to simulate the conductor's baton. Developments in mobile technology and the wide availability of sensors and accelerometers have encouraged researchers to explore the hitherto relatively uncharted realm of conducting. The Radio Baton [6] was one of the first systems developed in this field. It offered an interactive conducting experience by controlling the tempo of a MIDI sequence as feedback to the gesture. Other systems in later years incorporated sensors for more precise input analysis, such as measuring the pressure on the baton [7], tracking the conductor's muscle tension [8], and using a built-in camera on the baton [9]. Improvements over the years included the transition from MIDI to audio-based musical feedback [10] and to more sophisticated and realistic forms of sound generation [11].

Similar projects targeted simulation of the conducting experience as a way to experience controlling an orchestra, rather than researching the subtleties of conducting gestures and their musical effect. In 2004, Borchers offered children the opportunity to conduct the Vienna Philharmonic Orchestra: the 'conductor' would stand in front of a video screen and control the tempo of an actual performance [12]. Two other systems with a similar focus, iSymphony [13] and Pinocchio [14], were developed a few years later.

Along with programs designed to familiarize and introduce the conducting experience to non-musicians, some conducting systems have been developed with educational and research goals in mind. A system designed to analyze and classify the hand gestures of conductors was implemented on the basis of Hidden Markov Models (HMM) and developed for MAX/MSP [15]. Similar ideas and goals can be seen in Conga, a gesture analysis system using graph theory [11], in gesture analysis without a real-time component [16], in video analysis of conductors' gestures [17], and in a baton simulation using a Wii remote [18].

Some of these projects introduce complex algorithms and systems for conductors' gesture analysis, and within the constraints and limitations they impose on the gestures, they report high accuracy. Most of these related works focused on the activity of conducting at its highest level – developing a digital system that allows an individual to conduct a short excerpt of a musical composition or an entire work. These systems focus primarily on the use of movement to control the speed of prerecorded pieces of music, aiming at the education and entertainment of the general public rather than the learning of the kinesthetic skills needed to produce effective conducting gestures that indicate a combination of tempo, duration, articulation, and dynamics.

3. INNOVATIONS

In all the projects described above, there is a missing component that the Maestro system improves upon: previous systems have been developed based on the assumption that the conductor's gestures convey mostly (or only) temporal information, when in practice a conducting gesture must convey additional aspects of sound generation, such as articulation, volume, and duration.

In order to detect and provide feedback for these various aspects of gestures, the Maestro system introduces technical innovation based on research in three main areas: a) gesture anticipation and tracking; b) machine learning for gesture analysis; and c) utilization of physical modeling for high-quality audio feedback.

3.1. Gesture Anticipation and Tracking

Since any delay between a performed gesture and its audio feedback is undesirable within a music-conducting system, gesture anticipation with a precision of a few milliseconds is an essential requirement. The Maestro system uses a high-speed sensing device that provides a data-sampling rate close to 100 Hz in 3D space. Such high-resolution data, combined with pre-trained gestures, allows gestures to be anticipated with an accuracy of a few milliseconds.

3.2. Machine Learning for Gesture Analysis

Once a gesture is detected, Maestro's machine learning algorithm performs two kinds of analysis: a) real-time classification of a performed gesture by comparing it with a set of pre-trained gestures, and b) real-time identification of higher-resolution characteristics of the classified gesture. Both comparisons are performed with the ultimate goal of mapping any subtle change in a gesture to subtle parameters that will influence the audio feedback.

3.3. Physical Modeling

Physical modeling is a set of audio signal processing and synthesis algorithms and models that have been developed based on intensive research into the behavior of acoustic instruments. These models allow the synthesis of realistic-sounding audio at relatively low computational and technological resource cost [19], [20].

The high-resolution sensing and tracking devices, along with the proposed machine learning-based gesture classification, will allow for an intuitive utilization of physical modeling synthesis, where we will map one gesture to multiple parameters of a physical modeling-based musical response. Previous conducting projects have used either MIDI [6], [10], [18] or sampled sounds [11], [12], [13] for audio feedback. Physical modeling is another major step towards a realistic conducting environment.
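The paper does not say which physical models Maestro uses, so the following is only a minimal sketch of the general technique: the classic Karplus-Strong plucked-string algorithm, a simple digital waveguide that sounds plausibly acoustic yet costs only a short delay line and an averaging filter per sample. The function name and the `damping`/`pluck_energy` parameters are illustrative, not taken from the Maestro implementation.

```python
import numpy as np

def karplus_strong(freq_hz=220.0, duration_s=1.0, sample_rate=44100,
                   damping=0.996, pluck_energy=1.0):
    """Minimal Karplus-Strong plucked string: a delay line filled with noise
    (the 'pluck') that is repeatedly averaged and damped as it recirculates."""
    n_samples = int(duration_s * sample_rate)
    delay = max(2, int(sample_rate / freq_hz))               # delay-line length sets pitch
    line = pluck_energy * np.random.uniform(-1, 1, delay)    # initial excitation
    out = np.empty(n_samples)
    for i in range(n_samples):
        out[i] = line[i % delay]
        # two-point average acts as a low-pass filter; damping controls decay
        line[i % delay] = damping * 0.5 * (line[i % delay] + line[(i + 1) % delay])
    return out

# e.g. a brighter, faster-decaying note, as might suit a sharp, accented gesture:
tone = karplus_strong(freq_hz=330.0, damping=0.990, pluck_energy=0.9)
```

Parameters of this kind (excitation energy, damping, delay length) are natural targets for the one-gesture-to-many-parameters mapping described above.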
4. SYSTEM DESIGN

The system consists of four interconnected modules, as illustrated in Figure 1: a conductor's baton; a tracking and sensing system; computer software that analyzes the gestures; and an interface for audio and video feedback. The baton serves as the physical interface for the user. Spatial coordinates of performed gestures are sent through an IR transceiver to the desktop application, where they are recorded and analyzed. Once the analysis algorithm recognizes the completion of a gesture, the system generates audio and visual output that correlates to the performed gesture.

Figure 1. Schematic representation of the design.

4.1. Conductor's Baton

The baton is a real conductor's baton, fashioned with an infrared LED (Light Emitting Diode) at its tip to allow movement tracking in 3D space. Infrared sensors were chosen since they track only the movement of infrared light sources, thus avoiding confusion with other objects in the space [21]. The baton is wireless to help simulate a real conducting environment.

In addition to the infrared sensor on the baton, the conductor's head and torso movements are sensed by a higher-level detection stage (with a lower sampling rate) that allows the detection of skeletal movement in 3D space. This analysis, combined with the baton movement, allows the rendering of the visual feedback.

4.2. Anticipation and Tracking

Once data are fed into the system (raw coordinates of baton movement), 2D representations of the baton movements are reconstructed by the software and analyzed in two parallel stages. A gesture detection algorithm distinguishes between random movement, system noise, and intentional gestures; the system then searches for specific characteristics of the conducting gestures (e.g. the beginning and end of a gesture). A second algorithm anticipates the end of a gesture (i.e. the attack, when the baton movement stops), which allows time-accurate audio feedback without discernable time delay. This algorithm gathers information on a gesture before the parallel detection algorithm determines that the current movement is indeed a gesture.
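The two parallel stages are described only at a high level, so the sketch below is an assumption-laden illustration rather than the authors' algorithm: a speed-threshold test stands in for the gesture detection stage, and a linear extrapolation of the current deceleration stands in for the attack anticipation stage. The thresholds, the 10 ms sample spacing (from the roughly 100 Hz rate in Section 3.1), and all function names are hypothetical.

```python
import numpy as np

# Illustrative thresholds only; the paper does not publish Maestro's actual values.
NOISE_SPEED = 0.02      # below this (m/s), treat samples as sensor noise
GESTURE_SPEED = 0.25    # sustained speed above this suggests an intentional gesture
DT = 0.01               # ~100 Hz sampling, as described in Section 3.1

def classify_motion(speeds):
    """First stage: label a short window of baton speeds as noise,
    random movement, or a candidate intentional gesture."""
    mean_speed = float(np.mean(speeds))
    if mean_speed < NOISE_SPEED:
        return "noise"
    return "gesture" if mean_speed > GESTURE_SPEED else "random"

def predict_attack_time(speeds):
    """Second stage (anticipation): while the baton decelerates toward the
    attack (speed -> 0), extrapolate the current deceleration to estimate
    how many seconds remain before the movement stops."""
    v = float(speeds[-1])
    a = (speeds[-1] - speeds[-2]) / DT        # most recent acceleration
    if a >= 0:                                # not decelerating yet
        return None
    return -v / a                             # time until extrapolated speed hits zero

window = np.array([0.8, 0.65, 0.5, 0.36, 0.22])   # synthetic decelerating stroke
if classify_motion(window) == "gesture":
    print("attack expected in ~%.0f ms" % (1000 * predict_attack_time(window)))
```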
4.3. Gesture Classification

Classification of gestures relies on two orthogonal algorithms, providing two layers of detection accuracy. First, statistics of the current gesture's characteristics (e.g. vertical gesture length, acceleration, and attack characteristics) are gathered by the anticipation algorithm and compared with the gathered statistics of the trained gestures. The second layer is a Hidden Markov Model (HMM) algorithm, commonly used for gesture classification and following, and specifically for conducting gesture classification [22], [15]. The HMM algorithm compares the gesture as a whole once the statistical analysis is complete and there is a positive match between the performed gesture and a trained one.

The two algorithms complement each other to achieve two goals: anticipating the next gesture so as to provide audio feedback with no discernable time delay, and preventing false positives in cases where random baton movements might be mistaken for real gestures.
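As a simplified picture of such a two-layer scheme (not the trained models used in Maestro), the sketch below ranks hypothetical trained gestures by the z-score distance of the anticipation-stage statistics (layer one) and scores an observation sequence with a small discrete-HMM forward pass (layer two). The gesture names, feature choices, and every number are invented for illustration.

```python
import numpy as np

# Hypothetical per-gesture statistics (mean, std) for three features:
# vertical length, peak acceleration, attack sharpness. Values are made up.
TRAINED_STATS = {
    "legato_down":   (np.array([0.40, 1.2, 0.3]), np.array([0.05, 0.3, 0.1])),
    "staccato_down": (np.array([0.25, 3.0, 0.9]), np.array([0.04, 0.5, 0.2])),
}

def statistical_layer(features):
    """Layer 1: rank trained gestures by z-score distance between the
    performed gesture's statistics and each gesture's trained statistics."""
    scores = {}
    for name, (mean, std) in TRAINED_STATS.items():
        scores[name] = float(np.linalg.norm((features - mean) / std))
    return min(scores, key=scores.get), scores

def hmm_log_likelihood(obs, start, trans, emit):
    """Layer 2: log-likelihood of a discrete observation sequence under a
    small HMM (scaled forward algorithm); a stand-in for the trained HMMs."""
    alpha = start * emit[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        log_p += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_p

features = np.array([0.27, 2.8, 0.8])        # statistics of the performed gesture
best, _ = statistical_layer(features)        # -> "staccato_down"

start = np.array([0.6, 0.4])                 # toy 2-state, 2-symbol HMM
trans = np.array([[0.7, 0.3], [0.2, 0.8]])
emit  = np.array([[0.9, 0.1], [0.2, 0.8]])
print(best, hmm_log_likelihood([0, 0, 1, 1], start, trans, emit))
```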


4.4. Audio and Visual Feedback

Once classification is successful, the musical content is constructed and the recognized gesture is translated to audio and visual feedback.

4.4.1 Audio Feedback

Parameters gathered from the detection algorithm, along with the classified characteristics of the gesture, are mapped to produce a tailored sound, correlating in dynamics, duration, and articulation to the performed gesture. By mapping the rich space of subtle gesture analysis to the rich space of physical modeling sound generation, we are able to provide a sophisticated and intuitive response that would imitate the response of a real orchestra.

The high-resolution sensing and tracking devices, along with the proposed machine learning-based gesture classification, allow for an intuitive utilization of physical modeling synthesis, with which we map one gesture to multiple parameters of a physical modeling-based musical response.
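The mapping itself is not specified in the paper; the sketch below shows one plausible shape for it, with a single classified gesture driving several synthesis controls at once. All ranges, weights, and parameter names are assumptions, not values from the Maestro implementation.

```python
def map_gesture_to_sound(gesture_class, dynamics, articulation, duration_s):
    """Illustrative one-gesture-to-many-parameters mapping.

    dynamics:     0.0 (pianissimo) .. 1.0 (fortissimo), e.g. from gesture size/speed
    articulation: 0.0 (legato)     .. 1.0 (staccato),   e.g. from attack sharpness
    """
    return {
        "amplitude":     0.1 + 0.9 * dynamics,                     # louder for bigger gestures
        "attack_time_s": 0.005 + 0.15 * (1.0 - articulation),      # sharper attack when staccato
        "decay_s":       duration_s * (1.0 - 0.7 * articulation),  # shorter ring for staccato
        "brightness":    0.3 + 0.5 * dynamics,                     # e.g. a filter/damping control
        "gesture_class": gesture_class,
    }

# A short, sharp, loud downbeat:
params = map_gesture_to_sound("staccato_down", dynamics=0.85,
                              articulation=0.9, duration_s=0.4)
```

In a physical-modeling back end, outputs like these would drive, for example, the excitation energy and damping of a string or bore model.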
4.4.2 Visual Feedback

The visual feedback is provided to the user in multiple ways. First, the user is able to see a replication of the path of the baton via the infrared LED at the baton's tip. This path is viewed as a 2D plot that traces the gesture as a whole, so that the entire gesture can be viewed from start to finish. Additionally, the interface enables the user to view a mirror image of their torso, arms, and head while performing a gesture in real time. Both of these visualizations provide rich, valuable feedback to the user in combination with the audio response.
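To make the visual feedback concrete, the sketch below plots a synthetic baton path as a 2D trace; the coordinates are a stand-in for the IR tracking data, and the shape, sampling, and mirroring choice are assumptions rather than output of the actual system.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for one recorded gesture: x/y positions of the baton tip
# at ~100 Hz; real input would come from the IR tracking stage.
t = np.linspace(0, 1, 100)
x = 0.3 * np.sin(2 * np.pi * t)            # side-to-side sweep
y = 0.4 * np.abs(np.cos(2 * np.pi * t))    # bounce typical of a beat pattern

fig, ax = plt.subplots()
ax.plot(x, y, lw=2)
ax.scatter(x[::10], y[::10], s=15)          # dots every 100 ms hint at baton speed
ax.set_xlabel("horizontal position")
ax.set_ylabel("vertical position")
ax.set_title("2D trace of one gesture, start to finish")
ax.invert_xaxis()                           # mirror the view, as the conductor sees it
plt.show()
```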
5. CONCLUSIONS

The main achievement of this work is the development of a complete conducting system that allows a conductor to perform gestures and receive multi-dimensional, real-time feedback that matches the musical intent conveyed by the conductor. In particular, several goals were achieved. The anticipation algorithm allows the system to provide audio feedback with a time delay of 5 ms from the end of the gesture (attack). The system was pre-trained with 12 different gestures that vary by attack style and dynamic intention, while tempo information was extracted from the gesture in real time. During tests with the authors conducting, the detection rate (judging whether a certain baton movement is a gesture) was 92%, while the classification rate (the match between the conductor's intent and the perceived audio feedback) was 81%. The system detects discrete gestures, plays back audio feedback comprised of one instrument, and displays the visual feedback in real time as a mirror image of the gesture.

6. FUTURE WORK

The second iteration of the system will include an expanded set of trained gestures and melodic excerpts in order to provide a richer learning environment, extending the current discrete gestures to successive gestures and multiple meter patterns. Additionally, future work on audio feedback will move beyond a single instrument sound to give the user the option of hearing full orchestra, band, or vocal sounds in response to their gestures. A second iteration will also include a sophisticated yet intuitive user interface that allows the user to change sound preferences, move between practice modules, visually and audibly record their sessions, and change camera viewpoints.


The desired end result of this work is to provide a new, meaningful tool for music conducting pedagogy that enhances conductors' development of subtle gestures affecting a full range of musical expression. The Maestro system is being developed iteratively and incrementally with input from conductors of various competency levels. An accompanying curriculum is also being developed and will be deployed within the context of a music conducting class of undergraduate music majors. The system will be disseminated in an undergraduate introductory conducting course and evaluated by the participating students and the course instructors. Following the analysis of the evaluations, further modifications to the Maestro system and the collaborative curriculum will be made before another iteration of the study the following year.

Future potential uses of the project include widespread accessibility to conductor training programs and the appropriation of the project for use by individuals at all levels of musical skill and age. System components and techniques developed as part of the project could also be used in medical research, such as on the communicative and movement abilities of disabled persons, in sign language technologies for people with visual disabilities, in novel gaming interfaces, and in music creation software.

7. ACKNOWLEDGEMENTS

The authors would like to thank Marcelo Cicconet for his help with the initial setup of the hardware interface and his continuous help with the implementation of the HMM algorithm.

8. REFERENCES

[1] Costa-Giomi, 2005. Does music instruction improve fine motor abilities? Annals of the New York Academy of Sciences, 1060, 262–264.

[2] Dickey, 1992. A review of research on modeling in music teaching and learning, 133 (Summer), 27–40.

[3] Haithcock, M., K. Geraldi, and B. Doyle. 2011. Conducting Textbook. (Self-published.)

[4] Kenny, D. 2011. The Psychology of Music Performance Anxiety. Oxford University Press.

[5] Wilson, G. D. 1997. Performance anxiety. In Hargreaves, D. J., and North, A. C. (eds.), The Social Psychology of Music. Oxford University Press, Oxford, pp. 229–245.

[6] Mathews, M. V. 1991. The radio baton and the conductor program, or: Pitch—the most important and least expressive part of music. Computer Music Journal, 15(4), 37–46.

[7] Marrin, T. 1997. Possibilities for the digital baton as a general-purpose gestural interface. CHI, pages 311–312. ACM.

[8] Nakra, T. M. 2000. Inside the Conductor's Jacket: Analysis, Interpretation and Musical Synthesis of Expressive Gesture. PhD thesis, Massachusetts Institute of Technology.

[9] Murphy, D., T. H. Andersen, and K. Jensen. 2003. Conducting audio files via computer vision. Proceedings of the Gesture Workshop, Genova.

[10] Ilmonen, T. 2000. The virtual orchestra performance. CHI. ACM.

[11] Grull, I. 2005. Conga: A conducting gesture analysis framework. Masters Thesis, University of Ulm.

[12] Borchers, J., E. Lee, and W. Samminger. 2004. Personal orchestra: A real-time audio/video system for interactive conducting. Multimedia Systems, 9, 458–465.

[13] Lee, E., T. Karrer, and H. Kiel. 2006. iSymphony: An Adaptive Interactive Orchestral Conducting System for Digital Audio and Video Streams. In Proceedings of CHI, Montreal, Canada, 259–262.

[14] Bruegge, B., C. Teschner, P. Lachenmaier, E. Fenzl, D. Schmidt, and S. Bierbaum. 2007. Pinocchio: Conducting a virtual symphony orchestra. In Proc. ACE 2007, ACM Press, 294–295.

[15] Kolesnik, P., and M. Wanderley. 2004. Recognition, analysis and performance with expressive conducting gestures. In Proceedings of the 2004 International Computer Music Conference (ICMC 2004), Miami, FL.

[16] Je, H., J. Kim, and D. Kim. 2007. Hand gesture recognition to understand musical conducting action. In Proc. of IEEE International Conference on Robot & Human Interactive Communication.

[17] Nakra, T. M., A. Salgian, and M. Pfirrmann. 2009. Musical analysis of conducting gestures using methods from computer vision. International Computer Music Conference, Montreal.

[18] Peng, L., and D. Gerhard. 2009. A Wii-based gestural interface for computer-based conducting systems. In Proceedings of the 2009 Conference on New Interfaces for Musical Expression, Pittsburgh, PA, USA.

[19] Scavone, G. P. 1997. An acoustic analysis of single-reed woodwind instruments with an emphasis on design and performance issues and digital waveguide modeling techniques. Ph.D. thesis, Music Dept., Stanford University.

[20] Smith, J. O. 2004. Virtual acoustic musical instruments: Review and update. Journal of New Music Research, 33, 283–304.

[21] Guy, E. G., F. Malvar-Ruiz, and F. Stoltzfus. 1999. Virtual conducting practice environment. In Proceedings of the International Computer Music Conference. ICMA, pages 371–374.

[22] Usa, S., and Y. Mochida. 1998. A conducting recognition system on the model of musicians' process. Journal of the Acoustical Society of Japan, 19(4), 275–287.
