Kinect-based Music Application for Children with Severe Physical Disabilities

I Made Satrya Rudana

Abstract

Based on initial interviews with music teachers at Årsta Special School, Uppsala, it was found that each child in a music playing session has different preferences regarding the type and sound of a musical instrument. However, most of the children have combined cognitive and physical impairments, preventing them from playing the instrument that they might like. Starting from this idea, we developed a music application using virtual instruments, so that various types of instruments and sounds can be used during a single music playing session. As the input device, we used the Kinect sensor developed by Microsoft, a camera-based sensor that detects human gestures. Our application uses this capability to allow users to control and play sounds by simply moving their arms in the air.

Our study has shown promising results for this application, such as the positive response from the participants and the ability to easily change the sound of an instrument to match a participant's preferences. However, there are still some things to consider before releasing it as a consumer product, for instance better calibration and accuracy.

Handledare: Lars Oestreicher
Ämnesgranskare: Justin Pearson
Examinator: Anders Jansson
IT 17011

Contents

1 Introduction 4

1.1 Background...... 4

1.2 Research Questions ...... 6

1.3 Research Goals ...... 6

1.4 Related Works ...... 6

2 Background Theory 8

2.1 MIDI...... 8

2.1.1 MIDI Messages ...... 9

2.1.2 Virtual Instrument ...... 10

2.2 Kinect...... 11

3 Method 13

3.1 Pilot Study ...... 13

3.2 Target group ...... 13

3.3 Instrument Playing Session ...... 13

3.4 Data Gathering ...... 14

3.5 Software and Hardware Requirements ...... 14

3.5.1 Digital Audio Workstation ...... 14

3.5.2 Processing 3 ...... 15

3.5.3 Microsoft Windows 10 ...... 16

4 Implementation 17

4.1 Kinect-based Music Application ...... 17

4.1.1 Hardware Requirements ...... 17

4.1.2 Skeletal Tracking ...... 19

4.1.3 Joint Angle Calculation ...... 19

4.1.4 Joint Angle to MIDI Message ...... 20

4.1.5 User Interface ...... 20

5 Result 23

5.1 Tracking...... 23

5.2 Participant Response ...... 25

6 Discussion and Future Research 27

6.1 Calibration and Accuracy ...... 27

6.2 Portability, Ease of Use, and Price ...... 27

6.3 Support ...... 28

6.4 Connected System ...... 28

6.5 Evaluation Method ...... 28

7 Conclusion 29

References 30

Appendices 32

List of Figures

1 Roland Jupiter-8, an 8-voice polyphonic analogue synthesizer manufactured by Roland in 1981 without MIDI capabilities (Roland, 2014) ...... 8

2 Roland MIDI Processing Unit MPU-401 (Jim, 2014) ...... 9

3 Kinect V2 ...... 12

4 Fruity Loops ...... 15

5 Ableton Live ...... 15

6 Processing ...... 16

7 Kinect-based Music Application Process Diagram ...... 18

8 Upper Body Joints and Pelvis Joints ...... 19

9 User Interface of the Application ...... 22

10 Success Tracking ...... 23

11 False Detection of Arm ...... 25

1 Introduction

This master's thesis was conducted at the IT Department of Uppsala University, Sweden, as part of a larger research project led by Lars Oestreicher called Muminprojektet. Muminprojektet, or the Mumin Project (Oestreicher, 2016), combines currently available technology and music to create musical instruments for people with physical disabilities. The part of the project explored in this thesis focuses on active music participation, investigating the possibility of using the Kinect sensor as a musical instrument in a Kinect-based music application.

1.1 Background

Based on an interview with one of the teachers at Årsta Special School, it was found that each child has a different preference regarding instrument type during music therapy sessions at school.

This observation proved to be a very important point in the music playing sessions, especially for children with physical disabilities, because research has shown that children's limited motor skills in certain leisure activities can lead to both disappointment and frustration (Kanagasabai, Mulligan, Mirfin-Veitch, & Hale, 2014). Several instruments are currently available in the school for the Mumin project. There are traditional instruments, such as piano, snare drums, xylophones, and a one-stringed bass, as well as more alternative instruments, such as drum pads, an electric harp, a Theremini, and other electronic instruments. The instruments are always adapted to the children's abilities. For example, some of the children have very limited ability to move their arms and can at most move their fingers; they would therefore prefer an instrument that is easy to play and suits their reduced motor skills. Based on our initial interviews at Årsta Special School, children who could only control their fingers preferred an instrument such as the Theremin1. Other children, who had more flexible and stronger arms, preferred to play an instrument with tactile feedback, for example a piano or a drum pad.

We have tried to avoid the term therapy in the development of this application, because we did not evaluate the application clinically and it has thus not been proven to be a therapy tool. Still, we have tried to develop an application that can be used in a music playing session at Årsta Special School. From the ongoing research work, we suspect that this application has great potential to be used in physical therapy if certain

1The Theremin is an instrument that can be played without touching it. Small hand gestures around the antenna are used to control the pitch and volume (World, 2005).

medical guidelines are taken into consideration during its development.

The development of hardware has meant that technical issues in music production, such as latency or response time, memory capacity limitations, hard disk speed limitations, and processing power, are slowly being reduced to a minimum, even allowing a virtual instrument to be played live by a professional musician on stage. The advantage of a virtual instrument compared to a conventional instrument is that a virtual instrument can produce the sound of any musical instrument while being playable in various ways. This is essential when we design a musical instrument for children with various physical disabilities. Children who enjoy the sound of an acoustic guitar, for instance, might not be able to play a regular acoustic guitar because their arms are too weak to hold it. With the help of technology, a music therapist could provide an instrument with buttons connected to a virtual instrument, so that the children can play the sound of a guitar simply by pushing different buttons.

Research in computer vision, virtual reality, and augmented reality has opened opportunities for creating applications with potential benefits for people with disabilities (González-Ortega, Díaz-Pernas, Martínez-Zarzuela, & Antón-Rodríguez, 2014). This includes the use of the Kinect sensor and compatible applications to motivate people with physical disabilities to perform activities as physical therapy. For instance, a Kinect-based therapy application made by Reflexion Health, called the Vera system, has received United States Food and Drug Administration premarket approval, which means that the application will soon be on the market (Microsoft, 2015). Further research has suggested, as a future improvement, adding entertaining and amusing elements to this type of application that uses the Kinect sensor as an input device (Chang, Chen, & Huang, 2011). Starting from that, this research tried to replicate the work done in previous studies (González-Ortega et al., 2014; Microsoft, 2015; Chang et al., 2011) with a different type of application. This research explores the capabilities of the Kinect as an input device connected to a virtual instrument program. Users are then encouraged to move their bodies to trigger sounds of different volume and pitch.

However, there are still some issues concerning the therapy sessions themselves. It has been shown that therapy sessions at a therapy center are very time consuming (Christy, Chapman, & Murphy, 2012). Transportation is needed to take the patient from home to the therapy center and back. Home therapy sessions with a family member have therefore been considered as an alternative. Funding or financial issues can also be a challenge, because most therapy sessions require a trained therapist and tools to support the therapy (Cada & O'Shea, 2008), and it costs a large amount of money to pay for every session. Parents of children with disabilities have also pointed out that the availability of trained therapists is still an issue (Cada & O'Shea, 2008). By combining the capabilities of the Kinect

sensor and a computer, this research aims to build a working system that takes into account factors important for the target user group, such as portability, ease of use, and price.

Based on our background study, we tried to develop an application that can be used in a music playing session while also considering its portability, ease of use, and price. We found that the Kinect sensor used with a virtual instrument could be an alternative tool in music playing sessions. The Kinect sensor can track 25 joints of a human figure. This tracking information is sent to an application that we developed using the Processing framework version 3.0 and the Microsoft Kinect SDK V2. Our application then processes the data and sends it to a Digital Audio Workstation (DAW) to trigger sounds based on the user's preference.

1.2 Research Questions

For this thesis, I started from the following set of research questions:

1. What capabilities of the Kinect sensor can potentially be used as an input device in a music playing session for people with disabilities?

2. For what types and degrees of disability is this Kinect-based music application suitable as a musical instrument?

3. How does the target group respond when introduced to this type of application?

1.3 Research Goals

The main goal of this thesis is to find out how the Kinect sensor performs as an input device in a music playing session using a virtual musical instrument. The results of this thesis will serve as a proof of concept for further research in designing alternative musical instruments for people with special needs.

1.4 Related Works

One study mentioned the case of a female trumpet player and music teacher who had just begun suffering from disseminated sclerosis, a disease that attacks the nervous system and causes muscle weakness (Bate, 1976). Her condition made her unable to hold the weight of a trumpet for long periods without resting. Bill Thompson from Phil Parker Brass Studio

then came up with the idea of modifying the trumpet so that she could continue to pursue her career as a trumpet player. He managed to modify the trumpet valve so that the instrument's weight rests on her thigh. This new setup focused on adjusting the existing instrument to fit her needs, resulting in a satisfying solution for her condition. However, this approach seems difficult to apply to a group of users with different needs and conditions: one new design that works for one user might not work for others. One solution is to group people with similar physical conditions and try to find one or more musical instruments that are best suited to their condition.

There are also other challenges in accessing therapy services for people with disabilities, including financial and funding issues, the availability of trained therapists, transportation issues, the availability of convenient times, and the location of the therapy (Cada & O'Shea, 2008). In other words, the design of a new musical instrument should consider its price, portability, and ease of use when it is going to be used without a trained therapist around. Thus far we have added several keyboards and other interactive devices to the general setup, all of which fulfill the given requirements on the instruments.

Further research has provided qualitative data suggesting that the use of electronic technology has been successful for people with physical disabilities (Magee, 2006). The rapid development of computer components has made electronic technology cheaper and complex systems more accessible to the public. Given this, electronic technology in the form of a musical instrument has an opportunity to address the challenges identified in the two studies above. A cheap, simple device or application could be developed whose users can calibrate the settings to their physical condition, which is small enough to be carried, and which requires minimal knowledge to operate, even without the help of experts.

2 Background Theory

In this section, we describe the components used in this study. The MIDI standard was the main protocol used for communication between the Kinect-based application and the virtual instruments.

2.1 MIDI

The Musical Instrument Digital Interface (MIDI) protocol is used as a common language for communication between instruments (Guérin, 2005). Before MIDI was introduced in 1983, there were very limited ways for musicians to make their instruments communicate with each other, especially instruments produced by different manufacturers.

For instance, if musicians wanted to play two different synthesizers made by two different manufacturers, they had to play them one at a time or with their two hands simultaneously. They could not play the notes on one synthesizer and send them simultaneously to another synthesizer, because there was no way to sync the notes between the synthesizers.

During a National Association of Music Merchants (NAMM) meeting in 1982, Dave Smith from Sequential Circuits in the USA and Ikutaro Kakehashi from Roland Corp. in Japan met and discussed a method to enable synthesizers (figure 1) to talk to each other. They agreed to call this protocol UMI, or Universal Music Instrument.

Figure 1: Roland Jupiter-8, an 8-voice polyphonic analogue synthesizer manufactured by Roland in 1981 without MIDI capabilities (Roland, 2014).

Not long after that, in 1983, other music instrument manufacturers, such as Oberheim and Yamaha, joined to improve UMI and finally established the new protocol, MIDI, as a standard. MIDI was first introduced as a communication standard between pure synthesizers or sequencers. However, Roland Corp. saw an opportunity to enable personal computers (PCs) to be used as a digital alternative to analogue sequencers. Therefore, in 1984, Roland Corp. released the MPU-401, one of the earliest MIDI interfaces for connecting musical instruments and PCs.

Figure 2: Roland MIDI Processing Unit MPU-401 (Jim, 2014)

MIDI has since been developed further and adopted by many companies for their instruments. During its development, the interface used for the MIDI protocol has evolved from the old serial connection to the Universal Serial Bus (USB) that is available in almost every PC or laptop these days. By letting synthesizers, sequencers, and PCs communicate through the MIDI protocol, these devices can send and receive information over these connections in binary format. This information is called MIDI messages.

2.1.1 MIDI Messages

The general idea of MIDI messages is to send binary data through the MIDI interface, so that every connected device or instrument can communicate with the others. The basic MIDI message consists of one status byte, which can be followed by one or more data bytes. MIDI interaction units are by default very abstract and computationally very low level. This is both an advantage, being very general, and a disadvantage, since it requires a good deal of low-level communication handling.

The status byte identifies the kind of information being sent and the channel it should be sent to. The kinds of information that can be identified are: Note On, Note Off,

Polyphonic Key Pressure, Control Change, Program Change, Aftertouch, and Pitch Bend Change. To complete the message, a number between 1 and 16 is assigned for channel identification. For example, a Note Off message sent to channel 4 will be sent in binary as 10000011. Similarly, a Note On message sent to channel 4 will be sent as 10010011.

The first four bits of the first example (1000) identify the Note Off message, while in the second example (1001) they identify the Note On message. The last four bits of both examples (0011) identify channel number 4.

While the status byte identifies the message, the actual value of the message is sent as data bytes. For example, the previous Note On message requires 2 data bytes to be sent along with the status byte. The first data byte is the note number and the second data byte is the velocity value.

In the end, a complete single Note On MIDI message would be (10010011) (00111100) (00111100). This message means Note On to channel 4, with note number 60 and velocity 60.
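As an illustrative sketch (written in Python rather than the Processing code used in the thesis), the byte layout described above can be packed and unpacked like this:

```python
# Illustrative sketch, not the thesis code: packing and unpacking the
# three bytes of the Note On message described above.

def note_on_bytes(channel: int, note: int, velocity: int) -> list[int]:
    """Build a Note On message; channel is 1-16, note and velocity 0-127."""
    status = 0x90 | (channel - 1)        # 1001 nibble + zero-based channel
    return [status, note & 0x7F, velocity & 0x7F]

def describe(message: list[int]) -> tuple[str, int, int, int]:
    """Split a 3-byte channel message back into its parts."""
    status, note, velocity = message
    kind = "Note On" if status >> 4 == 0b1001 else "Note Off"
    channel = (status & 0x0F) + 1        # back to 1-based channel numbers
    return kind, channel, note, velocity

msg = note_on_bytes(channel=4, note=60, velocity=60)
print([format(b, "08b") for b in msg])   # ['10010011', '00111100', '00111100']
```

Running the sketch reproduces the three bytes worked out above: status 10010011 (Note On, channel 4) followed by two data bytes of 00111100 (decimal 60).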

2.1.2 Virtual Instrument

As the name suggests, a virtual instrument is an application or program that simulates various instruments on a PC. It produces sounds similar to what a synthesizer or other instrument could produce, and it can also communicate through the MIDI protocol. Virtual instruments can be found in Digital Audio Workstation (DAW) applications or as stand-alone applications.

The advantages of using a virtual instrument are (Guérin, 2005):

1. Diversity. It is capable of producing various types of sound without the hardware limitations found in an analogue synthesizer.

2. Economy. Some virtual instruments are free, but most professional production-level virtual instruments cost money. However, because of the diversity of instruments that a single virtual instrument application offers, it is still much cheaper and easier to use such an application than to acquire separate hardware instruments for each different sound.

3. Flexibility. Because there are no hardware limitations, virtual instruments give their users full flexibility to be creative in producing sounds. Users can use

multiple virtual synthesizers without the hassle they might encounter when using analogue synthesizers.

However, compared to a real instrument, there are several disadvantages to using virtual instruments (Guérin, 2005):

1. Processing power, memory, and hard disk speed limitations. A virtual instrument depends on the capabilities of its host PC to perform well: the faster the PC can process instructions, the smoother the program runs. A real instrument, on the other hand, performs consistently, without any limitation from processing power. The same applies to memory and hard disk speed: a larger RAM capacity means more sounds can be produced at the same time, and a faster hard disk means a faster response from the program.

2. Sound quality. The sound produced by some real instruments is so rich and complex that a virtual instrument cannot fully imitate it.

Nevertheless, the latest computers have started to overcome these disadvantages. The rapid development of computer hardware has reduced component prices while at the same time increasing performance. As a result, more computers are capable of running virtual instruments, and more complex sounds can be produced.

2.2 Kinect

Kinect is a device developed by Microsoft, originally sold as an accessory for the Xbox gaming console. The Kinect sensor combines an RGB camera and a depth sensor to provide 3D motion capture of the human body and to recognize human gestures (Chang et al., 2011). The latest Kinect sensor, called Kinect version 2 (V2), was released in July 2014 at a retail price of 200 USD (see Figure 3).

There are three main components that make the Kinect V2 work: an RGB camera, a depth sensor, and a microphone array.

The RGB camera performs as a regular camera, capturing the RGB value of every pixel from the camera sensor. The depth sensor, on the other hand, projects infrared laser light and captures depth information. The microphone array is used to differentiate between simultaneous users by detecting the direction of the sounds they make. The Kinect sensor requires the Kinect Software Development Kit (SDK) V2 or the OpenNI library to run on PC or Macintosh computers. The Kinect V2 captures a 1920x1080-pixel color stream and a 512x424-pixel depth and infrared stream. All streams are captured at a frame rate of 30 frames per second.

Figure 3: Kinect V2

The Kinect sensor's capturing process begins with the infrared projector transmitting invisible infrared laser light onto the target surface in a dot pattern. Each dot of infrared light hits a different part of the target and is bounced back to the depth sensor. The Kinect sensor then calculates the time each speckle takes to travel from the projector back to the depth sensor. Using this information, the Kinect sensor can map the different distances of the objects placed in front of it.
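The time-of-flight principle described above can be sketched as a simple calculation (this is an illustration of the physics, not Kinect SDK code): the measured round-trip time of each infrared speckle is halved and multiplied by the speed of light.

```python
# Illustrative sketch of the time-of-flight principle, not Kinect SDK code.
# The light travels to the target and back, so the distance is half the
# round-trip time multiplied by the speed of light.

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Distance to the reflecting surface, in metres."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A surface about two metres away reflects the light in roughly 13 nanoseconds.
print(round(distance_from_round_trip(13.34e-9), 2))  # 2.0
```

The nanosecond-scale timings involved are why this measurement is done in dedicated sensor hardware rather than on the host PC.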

Because the RGB camera and the depth sensor of the Kinect are not placed in exactly the same position, there is a slight offset between the streams produced by the RGB camera and the depth sensor. Calibration is needed to make sure the streams can support each other and produce useful combined color-depth information about the target. The CoordinateMapper provided by the Kinect SDK V2 can be used to deal with this challenge.

After obtaining all the stream and depth information about the target, the Kinect sensor, along with its SDK, processes the information and displays it on the screen. Using the SDK, the Kinect sensor has a "built-in" capability to detect a human's body joints, body skeleton, facial expressions, and hand gestures.

3 Method

In this section, we describe the requirements and the methods used during the study.

3.1 Pilot Study

The pilot study was set up to obtain results that can support further research in a similar area of implementation. The study used a Kinect-based music application prototype built at the beginning of the study and evaluated after every pilot session. Feedback was gathered from every session and used to improve and develop the prototype for the next pilot session. Each pilot session lasted about 20 minutes per participant, and 3 to 5 sessions were held per day.

3.2 Target group

The participants in this research were children from Årsta Special School in Uppsala, Sweden. They have different types of disabilities, including autism, cerebral palsy, and ADHD. The participants' ages varied from 6 to 15 years. A total of 7 children participated in this research.

All participants used a wheelchair and had a personal assistant to take them around the school. From Monday to Friday, they attend school from 8 am to 5 pm and take part in different types of activities. The purpose of the activities is to let them learn, exercise, have fun, and rehabilitate.

These activities influence not only their physical condition but also their mental condition. The activities, which include playing musical instruments, watching movies, and drawing, are scheduled every day, individually or in groups. Each week, every child has at least one individual instrument playing session, guided by a teacher, an assistant, and a researcher from Uppsala University, Lars Oestreicher.

3.3 Instrument Playing Session

Playing musical instruments is one of the regular activities at Årsta Special School. Several types of musical instruments can be used in an instrument playing session.

Students are introduced to all types of instruments, but eventually each student develops a preference for a particular musical instrument. Several types of musical instruments are available in the school, including percussion, string instruments, and electronic instruments. During the last few months, research led by Lars Oestreicher has been conducted to find out how electronic musical instruments can be used in these sessions.

3.4 Data Gathering

During this study, qualitative data were gathered through unstructured interviews and observation. Interviews with teachers and personal assistants were held after every session. During the pilot sessions, several screenshots from the computer used in the session were collected to get an overview of each participant's session. Later, the children's teachers and assistants took part in a semi-structured interview to identify issues with the Kinect-based music application and to discuss the screenshots taken during the sessions.

3.5 Software and Hardware Requirements

In this section, we describe all the software and hardware required to build the application for the study. There are several alternative software and hardware options that can be used to build a similar application; however, the final result may vary.

3.5.1 Digital Audio Workstation

A Digital Audio Workstation (DAW) is software that can be used to play virtual instruments on a computer. It enables users to play virtual instruments by receiving MIDI signals from a MIDI controller input device, such as a keyboard or the Kinect camera. The input device sends MIDI signals over the MIDI protocol to the PC, triggering a specific sound from a virtual instrument in the DAW.

Fruity Loops (figure 4) and Ableton Live (figure 5) are both examples of DAW software that offer various choices of virtual instruments, including synthesizers, string instruments, and percussion, ranging from free to paid instruments.

Figure 4: Fruity Loops

Figure 5: Ableton Live

3.5.2 Processing 3

Processing is a programming language used by students, researchers, artists, and designers for learning and prototyping in the context of the visual arts. Processing is open source software developed by Ben Fry and Casey Reas, built on the Java language (Processing, 2015). Processing version 3 was used in this study to develop a prototype application that acts as an input device for playing a virtual instrument. The application sends MIDI signals to the DAW to trigger a virtual instrument.

Figure 6: Processing

3.5.3 Microsoft Windows 10

Microsoft Windows 10 was used in this study as the operating system on the host PC. This is important to mention because the Microsoft Kinect SDK was used in this study; to date, the Kinect SDK released by Microsoft works only with Microsoft Windows. Another option would be to use Mac OS X or Linux, but with a different SDK and different Kinect processing libraries.

4 Implementation

In this section, we explain how the application was developed. We conducted a PACT analysis to get a basic idea for designing the user interface of the application. The results of this analysis are included in the description of the implementation.

4.1 Kinect-based Music Application

A Kinect-based music application was developed for use in this study. Using the Processing framework, an application prototype was built with the software development kit provided by Microsoft.

Several libraries were used with the Processing framework:

1. The MidiBus: a MIDI library for Processing that enables the application to send and receive MIDI messages (Smith, 2016).

2. ControlP5: a GUI library used in Processing (Schlegel, 2015).

Figure 7 shows how the application works and the role of each component in the whole system. First, the Kinect sensor captures a video stream and depth information about the object in front of it. This raw data has to be processed further, using the SDK or other open source libraries, to determine whether the object is a human figure or not.

The raw data is then sent to the host computer, where the Processing framework, together with the Kinect SDK, processes it into three different types of data: skeletal tracking data, MIDI signals, and a video stream. Once the raw data has been processed by the SDK, the skeletal tracking information is generated and ready to be used. It contains the x, y, and z coordinates of the human joints and a representation of the bones in the video stream.
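The per-frame flow can be sketched as follows. This is an illustrative Python outline, not the thesis's Processing code: the joint dictionary and the `send_midi` callback stand in for the Kinect SDK and the MidiBus library, and the vertical-position rule is a simplified stand-in for the joint-angle calculation described in section 4.1.3.

```python
# Illustrative sketch of the per-frame tracking-to-MIDI pipeline; the joint
# data and the MIDI sender are stand-ins for the Kinect SDK and MidiBus.

def frame_pipeline(joints: dict, send_midi) -> None:
    """One iteration of the loop: tracked joints in, one MIDI message out."""
    # 1. Skeletal tracking data: (x, y, z) coordinates per named joint.
    shoulder = joints["shoulder_left"]
    wrist = joints["wrist_left"]

    # 2. Derive a control value: how far the wrist is raised above the
    #    shoulder, clamped to 0..1 (a simplified stand-in for the
    #    joint-angle calculation).
    raised = max(0.0, min(1.0, wrist[1] - shoulder[1]))

    # 3. MIDI signal: map the value to a 7-bit velocity and send Note On.
    velocity = int(raised * 126) + 1          # range 1..127
    send_midi([0x90, 60, velocity])           # Note On, channel 1, note 60

sent = []
frame_pipeline({"shoulder_left": (0.0, 1.0, 2.0), "wrist_left": (0.1, 2.0, 2.0)},
               sent.append)
print(sent)  # [[144, 60, 127]]
```

In the real application this function would run once per Kinect frame, i.e. about 30 times per second.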

4.1.1 Hardware Requirements

According to Microsoft, a PC or laptop must meet several minimum requirements in order to maintain a good frame rate and performance when using the Kinect V2:

1. 64-bit processor

2. Physical dual-core processor, 3.1 GHz or faster

17 Figure 7: Kinect-based Music Application Process Diagram

3. 4 GB of RAM

4. USB 3.0 port

5. Graphics card that supports DirectX 11

During this research, a laptop with the following specifications was used:

1. 64-bit Intel i5 2.4 GHz processor

2. 4 GB of RAM

3. USB 3.0 port

4. Intel Iris 5100 graphics card

5. Microsoft Windows 10

4.1.2 Skeletal Tracking

With the SDK provided by Microsoft, the Kinect V2 can detect up to 25 body joints. The body joint information consists of x, y, and z coordinates. By connecting specific joints with straight lines, an approximate representation of the bone structure can be formed. Even though information is available for all 25 joints, only some of them are used in this application (Figure 8), because all of our participants sat in wheelchairs, making the joints in the lower part of the body unusable. The joints used in this application were: arms, shoulders, head, neck, upper spine, and lower spine. Skeletal tracking of the user works in real time at 30 frames (detections) per second. In our testing sessions this frame rate worked very well in a well-controlled environment, providing accurate skeletal tracking.

Figure 8: Upper Body Joints and Pelvis Joints
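The joint-selection step described above can be sketched as a simple filter (the joint names here are illustrative placeholders, not the Kinect SDK's actual identifiers):

```python
# Illustrative sketch: keep only upper-body joints from a full skeleton,
# since the participants sat in wheelchairs and lower-body joints were
# unusable. Joint names are placeholders, not the Kinect SDK's identifiers.

UPPER_BODY = {
    "head", "neck", "spine_shoulder", "spine_mid",
    "shoulder_left", "elbow_left", "wrist_left",
    "shoulder_right", "elbow_right", "wrist_right",
}

def upper_body_joints(skeleton: dict) -> dict:
    """Filter a skeleton (joint name -> (x, y, z)) down to upper-body joints."""
    return {name: xyz for name, xyz in skeleton.items() if name in UPPER_BODY}

full = {"head": (0.0, 1.6, 2.0), "wrist_left": (0.3, 1.0, 2.0),
        "ankle_left": (0.2, 0.1, 2.0)}
print(sorted(upper_body_joints(full)))  # ['head', 'wrist_left']
```

Discarding the lower-body joints early also avoids feeding unreliable tracking data into the angle calculations that follow.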

4.1.3 Joint Angle Calculation

To play the musical instrument, the user of this program has to move their arms and shoulders. By moving an arm or shoulder up and down, angles are formed, which the application transforms into MIDI messages. For example, when a user holds an arm straight down, the shoulder angle is 0 degrees; when the arm points straight up, the shoulder angle is 180 degrees. Positions in between therefore produce angles between 0 and 180 degrees. This integer angle value is used to control the volume of the sound and which note to play. Four joint angles are used in this application: the angle of the left arm, the angle of the right arm, the angle under the left shoulder, and the angle under the right shoulder.
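One common way to compute such an angle from three joint coordinates is the dot product of the two bone vectors meeting at the joint. The sketch below illustrates that approach; it is not necessarily the thesis's exact Processing code, and the hip joint is assumed here as the downward reference direction.

```python
# Illustrative sketch: the angle at a joint from three (x, y, z) coordinates,
# using the dot product of the two bone vectors that meet at the joint.
# The hip is assumed as the downward reference, so an arm hanging straight
# down gives ~0 degrees and an arm pointing straight up gives ~180 degrees.

import math

def joint_angle(hip, shoulder, wrist) -> float:
    """Angle in degrees at `shoulder`, between shoulder->hip and shoulder->wrist."""
    a = tuple(h - s for h, s in zip(hip, shoulder))
    b = tuple(w - s for w, s in zip(wrist, shoulder))
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.hypot(*a) * math.hypot(*b)
    cosine = max(-1.0, min(1.0, dot / norms))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

# Arm held straight out to the side: 90 degrees from the arm-down position.
print(round(joint_angle(hip=(0, 0, 0), shoulder=(0, 1, 0), wrist=(1, 1, 0))))  # 90
```

Because the coordinates are three-dimensional, the same formula works regardless of whether the arm moves in front of the body or to the side.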

4.1.4 Joint Angle to MIDI Message

After obtaining the integer values of the 4 joint angles, these values must be converted so that a sound can be triggered by the movement of the arms and shoulders. Each angle is an integer value from 0 to 180, which is converted into a MIDI message and sent to the DAW. Two MIDI messages are used in this application: NoteOn and NoteOff. A NoteOn message carries the information about which note should be played and at what velocity, while a NoteOff message is sent when the active note should end. Using normalization, the angle value of 0 to 180 is mapped to 1 to 127, the 7-bit value range that a MIDI data byte can carry.

However, in real deployments of the application, a user might not produce the full 0 to 180 degree range. Some users have physical limitations that prevent them from moving their arms freely; they may have difficulty pointing an arm straight up and be unable to reach a shoulder angle of 180 degrees. As a solution, calibration can be used to adapt to each individual user's condition: the limits of 0 and 180 can be raised or lowered, and the resulting range normalized back to the full 0 to 180 degree range.
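The normalization and calibration described above can be sketched as follows (an illustrative Python version; the thesis application itself was written in Processing):

```python
# Illustrative sketch: map a measured angle to a MIDI data-byte value
# (1..127), with per-user calibration limits so that a restricted range
# of motion still covers the full musical range.

def angle_to_midi(angle: float, lo: float = 0.0, hi: float = 180.0) -> int:
    """Normalize `angle` within the calibrated [lo, hi] range to 1..127."""
    clamped = max(lo, min(hi, angle))
    fraction = (clamped - lo) / (hi - lo)
    return 1 + round(fraction * 126)

print(angle_to_midi(90))                 # default limits -> 64
print(angle_to_midi(45, lo=20, hi=70))   # calibrated smaller range -> 64
```

A user who can only raise an arm between, say, 20 and 70 degrees thus still reaches the full 1 to 127 MIDI range once the limits are calibrated to their range of motion.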

4.1.5 User Interface

A PACT (People, Activities, Contexts, Technologies) analysis was used to gain a better understanding of the requirements for designing the user interface (Benyon, Turner, & Turner, 2005). In the PACT analysis, we identified who the user groups were, what activities were conducted while using our application, in what context the application was used, and what types of technology were used in the application. Based on that information, we wrote scenarios that gave us a view of how the application should work, and thereby an idea of which factors to focus on when designing a better user interface for the next design iteration of the application. The factors are described below.

1. People: Usage differences: The participants who joined this study were children from 6 to 15 years old. However, because of the system complexity, it was better to have at least one other person to set up the hardware and be ready to operate the application during the session. This person could be a teacher, an assistant, or a parent. There are therefore two roles in the use of this application: the user and the operator.

2. Activity: Two activities were involved while using this application.

Play instrument: the main purpose of this application was to motivate children with disabilities to play a musical instrument by moving their hands in front of the Kinect sensor. Set up and operate the application: teachers, personal assistants, or parents had the role of setting up the components needed by the application.

3. Context: Physical environment: the application would be used indoors, in a well-lit room. A clear background was also needed in order to obtain good tracking performance. Social context: due to a limitation of the application, it was not possible to have more than one user at a time. Thus, there would be only one user and a minimum of one operator during the study, and the interaction that occurred was only between them.

4. Technology: Input: a Kinect sensor was used as the input device. The Kinect camera must be placed on a table or tripod so that the user's position is covered and the Kinect is aimed straight at the user. Output: the application runs on a computer whose screen shows the video stream of the user playing as well as the interface of the application. The video stream can also be projected directly onto a wall using a digital projector.

Based on our PACT factors, we did not find many differences in the context and technology parts. Our scenarios were therefore, first, that the participant plays the musical instrument, and second, that the operator operates the application. With these two different scenarios, we concluded that we had to focus on the people and activity factors when designing the interface. Below is our analysis of how the interface should be designed based on these two factors.

1. Two groups of users were using this application. The first group were the children who were playing with the application; they would only interact with the camera and screen, without changing any controls or parameters. The second group were the teachers and assistants; they would operate the application and choose the instruments to be played by the participants.

2. Neither of these user groups was expected to be expert in using this application, so a very simple and intuitive control scheme with little customization was preferred. We also had to separate the two activities so that they would not distract from each other. We therefore placed the control interface only on the sides of the screen, so the operator could easily operate the application while the participants focus only on their reflection in the middle of the screen.

Figure 9: User Interface of the Application

5 Result

In this section we present the results of our pilot sessions based on our evaluation. We evaluated the application through interviews held with the teachers and the personal assistants after each pilot session. Because of the participants' physical conditions, it was not possible to interview them directly. However, the teachers and personal assistants, who spend most of the school day with the children, have very valuable experience in reading what kind of expressions or feelings the children show. We therefore asked them several questions about the children's feelings during the pilot sessions, and about what they think of the application's future potential.

5.1 Tracking

Figure 10: Successful Tracking

As figure 10 shows, the participant is tracked accurately. However, the sensor several times failed to detect a child, primarily because of the sitting position of the participant. We found that the sitting position made a real difference to the application's ability to distinguish a human figure. Not least, the features of the wheelchair caused disturbances, leading to the detection of "false limbs".

Kinect V2 can detect 25 different joints, and it reports a successful human detection only if all 25 joints are detected at the same time. The problem here was that all of the participants were sitting in wheelchairs, and the presence of wheelchair parts in the tracking area led to tracking failures. For instance, the lower part of the wheelchair would prevent the Kinect sensor from detecting the feet: it could detect the upper body, but could not detect the lower part of the body accurately. The Kinect sensor therefore in some cases failed to detect a human figure at all, whereas it very quickly discovered someone who was standing up.
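The difference between this all-joints condition and a seated-style check can be illustrated with a simplified joint model. The joint count and the assumption that the first 13 joints are upper-body joints are illustrative only; the real check happens inside the Kinect SDK:

```java
// Simplified sketch of the tracking condition described above, using a
// hypothetical boolean-per-joint model. Requiring all 25 joints fails
// for a seated user whose legs are occluded by a wheelchair; checking
// only upper-body joints (as Kinect V1's seated mode effectively did)
// still succeeds.
public class TrackingCheck {
    // Number of joints assumed (for illustration) to be upper-body.
    static final int UPPER_BODY_JOINTS = 13;

    // Kinect V2 style: every joint must be tracked.
    public static boolean allJointsTracked(boolean[] tracked) {
        for (boolean t : tracked) {
            if (!t) return false;
        }
        return true;
    }

    // Seated-mode style: only the upper-body joints must be tracked.
    public static boolean upperBodyTracked(boolean[] tracked) {
        for (int i = 0; i < UPPER_BODY_JOINTS; i++) {
            if (!tracked[i]) return false;
        }
        return true;
    }
}
```

With this model, a wheelchair user whose leg joints are occluded fails the all-joints check but passes the upper-body check, which mirrors why Kinect V1's seated mode handled our participants better.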

One promising solution that we found during the sessions was to let the participant sit on a big gymnastics ball, with the help of the personal assistant. The simple shape of the ball did not disturb the human figure, so tracking worked better than when the participant was sitting in a wheelchair. However, this position had other drawbacks, such as a difficult sitting position during the filming session, not least for the personal assistant and the teachers.

Compared to Kinect V2, Kinect V1 can only detect 20 different joints. However, we found that Kinect V1 has a specific "seated mode" with a simpler tracking mechanism for detecting a human figure in a sitting position: it tracks only several upper-body joints and leaves the lower-body joints untracked. This feature was removed in Kinect V2, making it more difficult to detect a person sitting on a chair or in a wheelchair.

This answers our first research question about the capabilities of the Kinect sensor that could be used in a music application. As an input device, the Kinect sensor worked very well in the music application we developed: in most cases it could detect a human figure, track 25 joints, and, by calculating joint angles, control the musical instrument. However, there are important conditions to consider if the user is in a wheelchair, and no general solution to this problem was found.

Another problem that we encountered during the pilot sessions was false tracking. Several times the Kinect sensor falsely detected wheelchair parts, such as the arm rests instead of the user's arm, and placed the joint tracking on the wheelchair arm (figure 11). This problem could be avoided by removing the handles of the wheelchair; however, this also made the child less stable in the chair, which meant there was a constant need for monitoring.

The last problem that we found during the pilot sessions was the user's clothing. One of the participants wore a sweater jacket during one of the sessions, and the participant was very hard to detect until we removed the jacket. The jacket changed the appearance of the participant's arms, making them hard for the Kinect sensor to detect.

Figure 11: False Detection of Arm

This answers our second research question about the relation between the type or degree of disability and the performance of the application. Based on our results, the application's performance did not depend on the type or degree of a specific disability; the problems in tracking a human figure occurred only because of the wheelchair and the clothes the participants used. Our suggestion regarding this problem is to use a simple form of support rather than a complicated wheelchair: a chair without handles, or a standing support made of transparent fiber, would work. A standing support is also a beneficial position for many of the children, in that it relieves them of the pressure of sitting. Normally the standing position becomes boring for the children, but in this case they were very motivated.

5.2 Participant Response

After every pilot session, an unstructured interview with the personal assistant and teacher was held to find out how the participants had responded. The participants in this study were children with mental or physical disabilities, which made it difficult (or impossible) for them to express their experiences in a way that could be compared with children without these disabilities.

Based on the interviews, the teachers and personal assistants agreed that this application has the potential to be used as a therapy tool in the future. During the pilot sessions we found that some of the participants had difficulties at first in understanding that their arm movements could generate sound. Several approaches were used to motivate the participants to move their arms and hands: we communicated with them verbally, we physically helped them move their arms and hands, and we held objects for them to grab at a distance, so that they had to reach out. Once they discovered that their hand movements could generate sound, they started moving their hands in the air and showed excitement.

Another key point regarding the participants' response was the ability to change the sound of the virtual instrument. The operator could easily change the instrument sound by clicking the instrument part in the DAW. Each of our participants had a different preference of sound, and a wrongly chosen sound led to disappointment and a less excited response.

This answers our third research question. Our participants showed no response at the beginning of the study, because they did not yet understand how the application worked. After a few minutes of trying, they finally understood and slowly began moving their hands to control the sound.

6 Discussion and Future Research

In this section we discuss several areas of improvement that can be considered when developing similar applications in the future.

6.1 Calibration and Accuracy

Even though we saw that the Kinect-based music application has potential as an alternative musical instrument, the discovered accuracy problems must be reduced to a minimum before it can be released as a consumer product. The main capability of the Kinect sensor is the tracking of a human figure, and if this function does not work accurately, the application itself will not work well.

Calibration is one approach to reducing this problem. One study suggests that multiple Kinect camera units can be used collaboratively to improve accuracy and speed up detection (Li, Pathirana, & Caelli, 2014). Other studies suggest adding an external sensor alongside the Kinect (Guevara, Vietri, Prabakar, & Jong-Hoon Kim, 2013; Bo, Hayashibe, & Poignet, 2011); for example, an exoskeleton or an external wearable sensor such as a gyroscope could work. Further research on calibration methods to increase accuracy is therefore needed.

6.2 Portability, Ease of Use, and Price

The main advantage of the Kinect sensor over the machines used in physiotherapy is that those machines are usually designed for therapy of a particular part of the body, whereas a single Kinect can be used for virtually any part of the body (Roy, Soni, & Dubey, 2013). During our research, the Kinect was always carried in a single medium-sized bag along with the laptop running the application. This makes the Kinect portable, much easier to use, and cheaper to maintain. A number of studies have also shown Kinect-based applications being used successfully in home-environment rehabilitation for stroke patients and for traumatic brain injury patients (Pastor, Hayes, & Bamberg, 2012; Venugopalan, Cheng, Stokes, & Wang, 2013). Further research should also be done in this area, for example comparing the Kinect with other depth cameras in terms of portability, ease of use, and price.

To ensure the portability of a Kinect-based music application, a guideline on how to set up the Kinect in a home environment should be formulated. During our study, we wrote a simple instruction manual (Appendix A) on how to set up the application. However, a further usability evaluation of this guideline is needed to make sure it is usable in most conditions.

6.3 Support

Our pilot study showed false tracking of arms when the participant sat in a wheelchair: sometimes the wheelchair's handle was tracked as a human arm. A further study focusing on the use of supports that are invisible to the sensor must be done, to enable users with limited mobility to use a similar application without interruptions caused by their support tools.

6.4 Connected System

Research has shown successful implementations of Kinect-based applications in home environments, even without a therapist's supervision (Pastor et al., 2012; Venugopalan et al., 2013; Exell et al., 2013). However, home-environment therapy still needs to be monitored by the patient's therapist. A system connected to the Internet could be built so that therapists can monitor their patients' condition during home therapy.

6.5 Evaluation Method

Because of the participants' physical conditions, we were unable to gather user satisfaction measures. A wider group of participants could, for instance, be recruited in future research to obtain a more accurate evaluation of a similar application.

7 Conclusion

The Kinect-based music application that we built showed promising results as a tool for people with physical disabilities in music-playing sessions. Our background study also found that the Kinect is a portable, cheap, and easy-to-use device, and it appears well suited for use at home without a therapist.

However, accuracy is still the biggest problem with this application. Using the SDK provided by Microsoft, there were several occasions when the Kinect falsely detected the user's arm, or even failed to detect the human figure at all. An external calibration device, or the use of multiple Kinects, was proposed as a possible solution to this problem.

References

Bate, P. (1976). Instruments for the disabled. The Galpin Society Journal, 29, 127–128. Retrieved from http://www.jstor.org/stable/841876

Benyon, D., Turner, P., & Turner, S. (2005). Designing interactive systems: People, activities, contexts, technologies. Pearson Education.

Bo, A., Hayashibe, M., & Poignet, P. (2011). Joint angle estimation in rehabilitation with inertial sensors and its integration with Kinect. In EMBC'11: 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 3479–3483).

Cada, E. A., & O'Shea, R. K. (2008). Identifying barriers to occupational and physical therapy services for children with cerebral palsy. Journal of Pediatric Rehabilitation Medicine, 1, 127–135.

Chang, Y.-J., Chen, S.-F., & Huang, J.-D. (2011, November). A Kinect-based system for physical rehabilitation: A pilot study for young adults with motor disabilities. Research in Developmental Disabilities, 32(6), 2566–2570.

Christy, J. B., Chapman, C. G., & Murphy, P. (2012). The effect of intense physical therapy for children with cerebral palsy. Journal of Pediatric Rehabilitation Medicine: An Interdisciplinary Approach, 5.

Exell, T., Freeman, C., Meadmore, K., Kutlu, M., Rogers, E., Hughes, A.-M., … Burridge, J. (2013). Goal orientated stroke rehabilitation utilising electrical stimulation, iterative learning and Microsoft Kinect. In Rehabilitation Robotics (ICORR), 2013 IEEE International Conference on (pp. 1–6).

González-Ortega, D., Díaz-Pernas, F., Martínez-Zarzuela, M., & Antón-Rodríguez, M. (2014, February). A Kinect-based system for cognitive rehabilitation exercises monitoring. Computer Methods and Programs in Biomedicine, 113(2), 620–631. Retrieved from http://linkinghub.elsevier.com/retrieve/pii/S0169260713003568 doi: 10.1016/j.cmpb.2013.10.014

Guérin, R. (2005). MIDI power!: The comprehensive guide. Course Technology Press, Boston.

Guevara, D. C., Vietri, G., Prabakar, M., & Jong-Hoon Kim. (2013, May). Robotic exoskeleton system controlled by Kinect and haptic sensors for physical therapy. In (pp. 71–72). IEEE. Retrieved from http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6525681 doi: 10.1109/SBEC.2013.44

Jim, S. (2014, Aug). Roland MPU-401. Retrieved from http://www.shepherdjim.com/roland-mpu-401-3

Kanagasabai, P. S., Mulligan, H., Mirfin-Veitch, B., & Hale, L. A. (2014, December). Association between motor functioning and leisure participation of children with physical disability: an integrative review. Developmental Medicine & Child Neurology, 56(12), 1147–1162.

Li, S., Pathirana, P. N., & Caelli, T. (2014). Multi-Kinect skeleton fusion for physical rehabilitation monitoring. In 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 5060–5063).

Magee, W. L. (2006). Electronic technologies in clinical music therapy: A survey of practice and attitudes. Technology and Disability, 18, 139–146.

Microsoft. (2015, Nov). Kinect-powered rehab system gets FDA clearance. Retrieved from https://blogs.msdn.microsoft.com/kinectforwindows/2015/11/11/kinect-powered-rehab-system-gets-fda-clearance/

Oestreicher, L. (2016). Mumin projektet. Retrieved from https://muminprojektet.wordpress.com/

Pastor, I., Hayes, H. A., & Bamberg, S. J. (2012). A feasibility study of an upper limb rehabilitation system using Kinect and computer games. In 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 1286–1289).

Processing. (2015). A short introduction to the Processing software and projects from the community. Retrieved from https://processing.org/overview/

Roland. (2014, Feb). Roland synth chronicle: 1973–2014. Retrieved from http://www.rolandus.com/blog/2014/02/19/roland-synth-chronicle-1973-through-2013/

Roy, A. K., Soni, Y., & Dubey, S. (2013). Enhancing effectiveness of motor rehabilitation using Kinect motion sensing technology. In Global Humanitarian Technology Conference: South Asia Satellite (GHTC-SAS), 2013 IEEE (pp. 298–304).

Schlegel, A. (2015, 11). A short introduction to the Processing software and projects from the community. Retrieved from http://www.sojamo.de/libraries/controlP5/

Smith, S. (2016). The MidiBus. Retrieved from http://www.smallbutdigital.com/themidibus.php

Venugopalan, J., Cheng, C., Stokes, T. H., & Wang, M. D. (2013). Kinect-based rehabilitation system for patients with traumatic brain injury. In 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 4625–4628).

World, T. (2005, Dec). What's a theremin? Retrieved from http://www.thereminworld.com/Article/14232/whats-a-theremin

Appendices

Appendix A: Instruction Manual
