
Cinemacraft: Immersive Live Machinima as an Empathetic Musical Storytelling Platform

Siddharth Narayanan
Electrical and Computer Engineering, Virginia Tech
[email protected]

Ivica Ico Bukvic
Institute for Creativity, Arts & Technology, Virginia Tech
[email protected]

ABSTRACT

In the following paper we present Cinemacraft, a technology-mediated immersive machinima platform for collaborative performance and musical human-computer interaction. To achieve this, Cinemacraft innovates upon a reverse-engineered version of Minecraft, offering a unique collection of live machinima production tools and a newly introduced Kinect HD module that allows for embodied interaction, including posture, arm movement, facial expressions, and lip syncing based on captured voice input. The result is a malleable and accessible sensory fusion platform capable of delivering compelling live immersive and empathetic musical storytelling that, through the use of low fidelity avatars, also successfully sidesteps the uncanny valley.

1. INTRODUCTION

The recent revolution in off-the-shelf immersive technologies and phenomenology has changed the way users interact with games, media, and the arts. Creative projects have actively adopted the Microsoft Kinect [1], along with an array of affordable alternative all-in-one consumer-level motion capture devices, to explore novel interactions and perceptions. Music-centric research, in particular, has focused on generating sound through the user's movement [2], a dancer being the ideal user candidate due to the high level of body awareness, physical ability, and movement vocabulary. Experiments have also integrated human interaction into performance, either as stylized body movements [3] or through the use of virtual interfaces [4]. Some have gone as far as to create new environments for musical composition and performance [5]. Video games have also served as a rich foundation for artistic expression using immersive devices, through machinima [6] and digital puppetry [7].

These new modes of interaction are also a significant boon to the level of immersion in video game technology. Studies have shown that embodied interaction is correlated with players' increased engagement [8] and a stronger affective experience [9]. Most modern video games, however, continue to be played using low fidelity interaction devices: interactions between video game characters and the user are often accomplished through keyboard presses or joystick manipulation for various body motions, ranging from simple actions such as walking or jumping to more elaborate tasks like opening doors or pulling levers. Such an approach can often result in non-natural and potentially limiting interactions. Such interactions also lead to profoundly different bodily experiences for the user and may detract from the sense of immersion [10]. Modern day game avatars are also loaded with body, posture, and animation details in an unending quest for realism and compelling human representations that are often stylized and limited due to the aforesaid limited forms of interaction. These challenges lead to the uncanny valley problem, which has been explored through the human likeness of digitally created faces [11] and the effects of varying degrees of realism in visualizing players' hands [12]. Some titles have taken an alternative approach towards sidestepping the uncanny valley by utilizing simplified cartoon-like graphics. For instance, in Minecraft, a sandbox-like gaming environment that has attained an unprecedented level of popularity, the simple avatar movements and highly stylized cartoon- and 8bit-like graphics [13] make it "work" because of a strong mix of the game's aesthetic sensibility, open-ended design, mechanics, development history, and the creative activities of its players [14]. A growing number of research projects explore Minecraft as a platform for enhanced immersion [15] and with full body and facial immersion [16]. However, with the exception of HTC Vive's VR implementation of Minecraft [17], they have remained by and large confined to complex, difficult to reproduce setups. In addition, such projects often require motion-capture data to be aligned with other information, resulting in a complicated endeavour when utilizing a combination of devices [18]. This limitation also impedes the level of immersion the platform is capable of delivering. Multi-sensory immersive virtual environments have been shown to be capable of inducing emotional responses [19], and immersion has also been shown to depend significantly on aspects of virtual presence like emotion [20]. Such emotional immersion [21] also promotes the building of a bond between the user and in-game avatars in the virtual world. In particular, it helps in building empathy for the characters, an important aspect of storytelling [22] and also a significant challenge for the video game industry, where delivering compelling characters continues to be one of the primary challenges [23].

While previous related work has focused on promoting empathy primarily through musical experiences [24], the visual modality is also seen to contribute significantly to the perception of emotion in a musical performance.

Copyright: © 2016 Siddharth Narayanan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

384 2017 ICMC/EMW

In particular, the ability to experience expressive body language has been shown to be important in perceiving the emotional intensity of a performance [25]. Although typically unintended and similar to the expressive gestures that accompany speech [26], these ancillary gestures serve to communicate information which clarifies the performer's emotional intentions. It is through these ancillary gestures that people are able to differentiate between musicians playing in a restrained and an expressive manner, even in the absence of audio [27]. Expressive body movements have also been shown to be significant for musicians in conveying emotional intentionality as a visual component during a musical performance, enhancing the observer's identification with the performer [28]. This has also been explored through the integration of motion-capture and video-analysis techniques into music-centric research [29]. However, the use of consumer-level motion capture devices for immersive, embodied, and affective computing, as well as for compelling storytelling within the context of performative art, has yet to be fully explored.

1.1 Motivation

It appears the potential for immersive environments to serve as a compelling and easily accessible platform for musical expression, empathy, and storytelling remains largely underutilized. This is in good part due to the aforesaid uncanny valley challenge, as well as the commonly complex, site-specific, and/or cost-prohibitive nature of the current solutions. Inspired by the successes of projects like Minecraft, we see this as an opportunity to introduce an accessible and affordable alternative that seeks to sidestep the uncanny valley by employing a cartoon-like environment without sacrificing emotional and expressive depth. As evidenced by the cartoon and gaming industry, such a platform would not have to sacrifice empathy or storytelling potential. In addition, as part of this paper we see the newfound white space as an opportunity to broaden the definition of interfaces for musical expression, to incorporate enabling technologies that can project a compelling musical experience in an immersive technology-mediated environment, including telematics. In this paper we focus primarily on machinima-like implementations that leverage existing gaming environments as a potentially compelling storytelling platform.

2. CINEMACRAFT

Cinemacraft is a novel technology-mediated immersive machinima platform for collaborative performance and musical human-computer interaction. It innovates on a custom, reverse-engineered version of Minecraft to offer a collection of live theatrical and cinematic production tools, and leverages the Microsoft Kinect HD for embodied interaction, including posture, arm movement, facial expressions, and, through sensory fusion, lip syncing based on captured voice input. It is designed as an out-of-box turnkey solution that sidesteps the uncanny valley by utilizing the extremely popular sandbox-like gaming environment for simple yet compelling storytelling, along with multiple live camera views, scene changes, subtitles, lip sync, production-centric stage cues, and a virtual audience. The platform is aimed at extending the frontiers of collaborative content creation as well as broadening audience impact to enhance creativity and emotional experiences.

2.1 The OPERAcraft Lineage

The OPERAcraft platform [30], a precursor to Cinemacraft, was envisioned as an environment to aid creativity, thinking skills, and better self-expression, with a particular focus on K-12 education opportunities.

Figure 1. Inspiring K-12 students to create stories through a live production of OPERAcraft.

It was built as an arts+technology education platform where students could write a story and libretto, build a virtual set, costumes, or virtual character skins, and ultimately control the characters within the virtual setting in a live performance accompanied by live singers and musicians. Many of these affordances are inherent to the Minecraft platform: users can easily sculpt the landscape, interact with it, and change their own appearance. Others were added as part of the reverse engineering effort, resulting in a mod that is deeply integrated into Minecraft's core. These include character lip syncing based on the singer's input, processed through Pd-L2Ork [31] and forwarded to a FUDI-based parser via a UDP socket embedded inside the reverse-engineered version of Minecraft; audience subtitles and stage cues only visible to the actors; the ability to change between discrete arm positions and interpolate between them to provide rudimentary body language; and near-instantaneous scene changes through coordinated character teleportations and scene cross-fades. In an ongoing pursuit of building a compelling real-time machinima production platform, the second generation of OPERAcraft, introduced in the fall of 2015 as part of the second opera production, offers additional affordances, including multiple camera views, cameras that are only visible to the actors, and invisible bystanders, as well as stability improvements and optimizations that allowed the mod to scale beyond the original limit of five actors.

2.2 New Platform

Cinemacraft builds on the live performance capture aspects of the existing system through the integration of the Microsoft Kinect HD [1] device to provide a more immersive and expressive embodied experience in a virtual world, including both kinematic data and facial expressions.
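The FUDI transport used between Pd-L2Ork and the in-game parser is simple enough to sketch: a FUDI message is a sequence of space-separated atoms terminated by a semicolon, sent here over a UDP socket. The sketch below is illustrative only; the message vocabulary ("lipsync", "camera"), host, and port are assumptions, as the paper does not document the mod's actual message names.

```python
import socket

def fudi_encode(*atoms):
    """Encode atoms as one FUDI message: space-separated, ';'-terminated."""
    return (" ".join(str(a) for a in atoms) + ";\n").encode("ascii")

def send_fudi(sock, addr, *atoms):
    """Send one FUDI message over UDP to the parser embedded in the game."""
    sock.sendto(fudi_encode(*atoms), addr)

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Hypothetical cue messages; the real vocabulary is not given in the paper.
    send_fudi(sock, ("127.0.0.1", 9001), "lipsync", "actor1", 0.8)
    send_fudi(sock, ("127.0.0.1", 9001), "camera", 2)
```

In the actual system such messages originate from distributed Pd-L2Ork clients rather than Python; the sketch only illustrates the wire format parsed by the retrofitted Minecraft core.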

HEARING THE SELF 385

It in many ways supplants the previously keyboard-controlled arm expressions and instead provides full body immersion to the extent allowed by the simple skeletal structure of the Minecraft avatars, which lack hands, elbows, and knees, and further enhances expressivity by tracking facial expressions. Based on user tests and audience feedback, the avatar remains compelling despite the minimal character design, due to the reported feeling of sentience offered by the user's real-life body motion and facial expressions. As a result, the avatar can show a dynamic range of emotional reactions and responses. In cinematic terms, the avatar no longer appears to be merely acting. Rather, it is the actor who is responding to their projection in, and the situational awareness of, the virtual environment. Spontaneous reactions like squinting against a sudden bright light help to humanize characters and make them more compelling than current game characters that seem shallow and with whom we have a hard time forming compelling, coherent relationships [23].

Figure 2. Cinemacraft - Avatar in action.

Sensory fusion and multi-sensory stimulation have been shown to help in illusorily experiencing a virtual body and virtual body parts [32]. Their use within the context of embodied interaction can be particularly useful in situations where one sensory input does not provide adequate resolution and could benefit from secondary inputs that improve its accuracy and fidelity. In Cinemacraft, such a situation can be observed when trying to monitor mouth movement. Kinect HD, particularly when a user is located farther away from the sensor so as to allow for full-body capture, is not capable of accurately capturing the fast and varied shapes generated by the mouth. In this case, sensory fusion through the use of voice analysis can be an appropriate way of complementing visual capture. Similarly, in situations where a user has their mouth open and is making no sound, visual capture needs to take precedence as the only reliable source of information. The current Cinemacraft version implements OPERAcraft-based avatar lip syncing based on simple transient and melisma detection that is cross-pollinated with the Kinect HD-based capture, with the two swapping precedence based on the context.

2.3 Modes of Interaction

Cinemacraft offers different modes of embodied interaction captured by Kinect HD, namely regular, mirrored, and upper torso only. For instance, if the user prefers to both control the avatar and act out gestures and facial expressions, they can gesticulate to the other players to complement their speech and chat messages and thereby increase the effectiveness of the conversations, while still being able to navigate the expansive landscape outside the range afforded by the area monitored by Kinect HD using more conventional controls (e.g. keyboard). The same can also be applied in gaming scenarios, as well as hybrid situations where a separate user controls the avatar while an operatic singer, for instance, provides only upper body language. Similarly, the mirroring mode has been added to explore illusory experience interactions with the avatar, most notably through the Mirrorworlds research project focusing on the study of the integration of physical and virtual mirrored presence [33]. These modes provide an opportunity to draw parity between the different approaches to machinima and also open new exciting possibilities for sensory fusion, with the introduction of HMD displays alongside gesture-based and haptic controllers. Cinemacraft also inherits a battery of OPERAcraft's cinematic tools, empowering users to explore the methods of live machinima production, including live theatrical play as well as produced cinema. The virtual audience feature, which enables audience members to freely roam the scene, or the ensuing world in which the storytelling takes place, offers new research opportunities in the study of the perception of storytelling, drama, and empathy as a function of vantage point.

3. IMPLEMENTATION

Cinemacraft mirrors the user's motions and expressions in real time within Minecraft using the Kinect HD sensor and a custom C# client that leverages the Kinect HD API. Our platform expands on the previously modified Minecraft 1.5.2 codebase from the OPERAcraft project to make acting more expressive and closer to the levels of post-produced machinima. The C# Kinect application and the Cinemacraft mod for Minecraft are packaged as an independent executable file and a drop-in mod, respectively. A major consideration for our setup was accessibility. For this reason, although earlier prototypes relied on two first generation Kinects, due to their observed inability to simultaneously do body and face tracking, the current implementation relies on just one second generation Kinect HD device responsible for both kinematic and facial tracking.

In order for the Kinect HD client to interface with the Minecraft mod, core Minecraft was reverse engineered and retrofitted with a FUDI-compliant protocol [34] capable of parsing remote messages. A simpler version is already found in OPERAcraft, where it was used to coordinate various aspects of the production, including switching camera angles, lip syncing as detected by the singers' microphones, subtitles, and stage cues. As a result, these could be handled remotely through multiple distributed Pd-L2Ork clients [31].

One of the main objectives of the system is to provide a compelling emotional delivery of storytelling within the context of the arts.
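The context-based swapping of precedence between the audio and depth-camera inputs can be illustrated with a minimal sketch: audio drives the mouth while the performer is making sound, and the tracked mouth shape takes over during silence (e.g. a silent gaping mouth). This is not the authors' implementation; the voice-activity threshold and the energy-to-openness mapping are assumptions made for illustration.

```python
def fuse_mouth_state(audio_rms, tracked_mouth_open, voice_threshold=0.05):
    """Pick which sensor drives the avatar's mouth.

    Follows the precedence rule described in the text: audio wins while
    the performer is making sound, the depth camera wins during silence.
    The threshold and scaling constants are illustrative assumptions.
    """
    if audio_rms >= voice_threshold:
        # Map signal energy to mouth openness; the real system uses
        # transient and melisma detection rather than raw RMS.
        openness = min(1.0, audio_rms / 0.3)
        return ("audio", openness)
    return ("kinect", tracked_mouth_open)
```

A per-frame loop would call this with the current audio level and the Kinect's mouth estimate, forwarding the winning value to the avatar.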

Therefore, it was important to test how realistically the avatar could reflect the user's kinematic and facial expressions. As a result, the system was developed using an iterative design approach. Its testing focused primarily on assessing the perceived visual fidelity of mirroring the user's interaction.

Figure 3. Cinemacraft - Kinect, client, server architecture.

3.1 Sensory Fusion

The emphasis on ease of use and the reliance on only a single Kinect HD device requires our implementation to essentially stretch the limits of the current Kinect HD API. Despite its improved resolution over the first generation, Kinect HD is still best suited for face tracking in close proximity, which limits its ability to track the body. In turn, our implementation offers accurate simultaneous full body and facial tracking. Further still, we have identified problems with Kinect's machine-learned library of postures and facial expressions that have resulted in a prevalent number of false positives pertaining to eye winks, eyebrow movement, and eyeglass detection. For this reason, in the current iteration eye winks and eyebrow movement have been temporarily disabled. We aim to address these challenges through an additional layer of sensory fusion. One possible approach is to utilize a low-latency algorithm on the area of the image that Kinect HD identifies as containing the tracked human face. Another could be further enhancing face detection with the infrared video feed inherent to Kinect HD, looking for the high reflectivity of eye pupils.

We have envisioned a platform with parallel pipelines of audio inputs, the Kinect API, and computer vision optimization and learning for improving facial expressions, with all three working together to further refine the platform's capabilities through sensory fusion. To address Kinect HD's limited ability to detect mouth shapes, we merged the depth camera data with the captured audio input. Here, the sensory fusion allowed us to use voice detection to combine the performer's audio with the facial tracking data and thereby improve the detection of minor gestures and expressions which may not otherwise be captured due to the technical limitations of the two distinct approaches to monitoring the user's input. For instance, doing so enabled us to animate mouth motion through captured audio that exceeds the resolution of 30 frames per second, as well as audio-centric outliers, such as the cartoon-like quivering of lips in a sung operatic melisma. Conversely, we also relied on the depth camera feed in situations when no sound was being made, e.g. to portray a gaping mouth. The ensuing implementation utilizes a simple logic by which the two sensory inputs are given precedence.

4. REAL WORLD TESTING & USER FEEDBACK

To date, Cinemacraft has been showcased as part of two high profile exhibitions, with two additional opportunities pending. The team has used such opportunities to iteratively improve upon and refine the design, as informed by the outcomes of the demonstrations and real world user feedback. In particular, the prototype was showcased at Virginia Tech's official exhibit at South by Southwest 2016 [35] and as part of the ICAT Day showcase at the Moss Arts Center at Virginia Tech [36]. More recently, Cinemacraft has also been chosen to be displayed at the Science Museum of Southwest Virginia [37]. As a result of the strong response to and interest in the tool, it has also been selected to be integrated in the Virginia Tech Visitor Center. Both exhibits are scheduled to open in the winter of 2017.

User feedback from real world assessment also helped us iteratively improve the platform's performance. A number of problems were eliminated during the early assessment, a prominent one being jittery avatar movements due to the discrete and rapidly fluctuating values in the real-time motion capture data from the Kinect HD. This was solved by discriminating against implied joint positions and data points suggesting sudden jumps. Further issues included the structure of arm and leg rotations. The most notable user feedback from the tests concerned their ability to understand the avatar's inference of gestures that rely strongly on elbows, despite the lack of such a joint in the avatar's skeletal structure. Given the lack of elbows, hands, and knees, the body anatomy of a Minecraft character is significantly different from the human form, due to which the position and nature of the rotation of joints had to be modified. This is one particular research area we are keen on exploring further. Users also expressed the desire for a wider range of actions, which would necessarily require a fundamental re-thinking of how human actions can be translated within this context. Another significant piece of feedback was the importance of facial expressions and that the real-time tracked facial movements massively improved the emotive qualities of the avatar. Tracking and representation of eye winks and eyebrow movements were disabled for certain periods of the assessment to reduce the number of false positives, and this seemed to sharply reduce the users' level of immersion. There was also strong positive feedback for the ease of stepping in and out of the virtual world. The application has been built to dynamically lock onto the closest user in the scene and is robust in handling their movements when they leave and walk back into the designated space.
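The jitter mitigation described above (rejecting inferred joint positions and data points suggesting sudden jumps) can be sketched as a small per-joint filter. The jump threshold and the hold-last-value policy are illustrative assumptions, not the paper's exact parameters.

```python
class JointFilter:
    """Reject unreliable Kinect joint samples.

    Mirrors the approach described in the text: discard joints the SDK
    marks as inferred rather than tracked, and discard samples implying
    a physically implausible jump since the last accepted position.
    """

    def __init__(self, max_jump=0.25):  # metres per frame (assumed)
        self.max_jump = max_jump
        self.last = {}  # joint name -> last accepted (x, y, z)

    def update(self, name, pos, tracked):
        prev = self.last.get(name)
        jump = (sum((a - b) ** 2 for a, b in zip(pos, prev)) ** 0.5
                if prev is not None else 0.0)
        if not tracked or (prev is not None and jump > self.max_jump):
            return prev  # reject the sample, hold the last good position
        self.last[name] = pos
        return pos
```

Feeding each frame's joint data through such a filter before mapping it onto the avatar suppresses both inferred-joint noise and single-frame outliers, at the cost of briefly freezing a joint during dropouts.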

their feedback on the mod and the interface. One of the most significant observations coming out of this interaction was the recently proposed switch from Minecraft to an open source alternative, Minetest [38]. Minetest offers several potential advantages over Minecraft. The Minecraft code is not readily modifiable and requires a community-driven effort to decompile the Java runtime into a human-readable API; as such, its forward compatibility is at best cumbersome. The legal implications of modding Minecraft are also not entirely clear, suggesting that Microsoft by default owns all modded code, and as a result the distribution of the ensuing deeply integrated mod is difficult if not impossible. Minetest, on the other hand, is a strikingly similar yet highly moddable voxel game engine that covers a majority of Minecraft's features. In-game client-server interactions are also better handled in Minetest, with modded servers sending textures and other required resources to the clients, unlike Minecraft, which may require resources to be downloaded separately. As a result, we are currently in the process of shifting the mod onto the Minetest platform.

Cinemacraft's shift to Minetest has been made to support versatile additions. Of particular note is the support for multiplayer interactions, which would involve a multitude of body motions, gestures, and expressions. Such multi-user interactions have been shown to be very engaging and to enrich immersion by giving interactions with virtual environments an interpersonal dimension [41]. Open questions also remain around the use of Kinect HD. The current reliance on Kinect HD hardware restricts the user to a relatively small physical area, and mapping movements around a large virtual world is challenging. In addition, there also seem to be some inherent gaps in Kinect detection: one prominent instance was the detection of user blinks failing once the user tilted their head toward one side beyond a certain angle. Further research is required into the exact technical limitations of the sensor and how much optimization can be achieved through software manipulation. Most importantly, the uncertainty surrounding Kinect HD's future has prompted us to explore alternative open source solutions.

HEARING THE SELF 387
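Whichever engine ends up hosting the mod, the FUDI-compliant control channel [34] described earlier reduces to a very small codec: each message is a run of whitespace-separated atoms terminated by a semicolon. A minimal sketch follows; the `camera` and `lips` message names are hypothetical examples, not Cinemacraft's actual vocabulary.

```python
def encode_fudi(*atoms) -> bytes:
    """Encode one FUDI message: whitespace-separated atoms ending in ';'."""
    return (" ".join(str(a) for a in atoms) + ";\n").encode("ascii")

def decode_fudi(stream: bytes):
    """Split a received byte stream into FUDI messages.

    Returns a list of messages, each a list of atom strings; empty
    chunks (e.g. the trailing newline after a semicolon) are skipped.
    """
    messages = []
    for chunk in stream.split(b";"):
        atoms = chunk.split()
        if atoms:
            messages.append([a.decode("ascii") for a in atoms])
    return messages
```

Over the wire, FUDI is typically carried on a plain TCP socket (as with Pd's netsend/netreceive objects), so the same codec serves any client that can open a socket, independent of the game engine.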

5. CONCLUSION

We have designed a new platform for immersive, performance-centric interaction for the purpose of musical expression and storytelling. The new platform is inspired by the success of Minecraft and builds on its approach that successfully sidesteps the uncanny valley. Our results so far are promising. As evidenced by audience and user feedback, including from the machinima community, as well as by interest in the platform, we were able to create a high level of immersion by combining multiple interaction techniques into a single system despite relying on a cartoon-like, low-fidelity environment. By extending sophisticated technologies like immersive VR and gesture tracking with easy markerless motion capture, our performers could control their avatars with relative ease and accuracy without extended training sessions.

6. BROADER IMPACT

Cinemacraft offers opportunities to extend virtual presence, and consequently outreach, by allowing audiences to engage with the production directly in-game. The team envisions the ensuing implementation being appropriate in a broad range of live and post-production scenarios beyond its original intent, from machinima movie-making to theatre. Studies have shown that learning movie and theatre production skills helps instill a sense of ownership, confidence, and self-belief in students [39]. Minecraft has already been used effectively as an education tool through the successful MinecraftEdu [40]. Our shift towards the Minetest platform and an increased reliance on sensory fusion have the potential to improve such an educational experience as a live learning tool.

7. FUTURE WORK

The core functionality of Cinemacraft has been developed with future opportunities for exciting improvements in mind, most notably the proposed shift to the Minetest platform and the open questions surrounding Kinect HD discussed above.

8. REFERENCES

[1] Microsoft. (2017) Kinect HD. [Online]. Available: http://www.xbox.com/en-US/xbox-one/accessories/kinect

[2] T. Berg, D. Chattopadhyay, M. Schedel, and T. Vallier, "Interactive music: Human motion initiated music generation using skeletal tracking by Kinect."

[3] J. Ratcliffe, "Hand motion-controlled audio mixing interface," in Proc. of New Interfaces for Musical Expression (NIME) 2014, pp. 136–139, 2014.

[4] T. Ahmaniemi, "Gesture Controlled Virtual Instrument with Dynamic Vibrotactile Feedback," in NIME, 2010, pp. 485–488.

[5] C. Honigman, A. Walton, and A. Kapur, "The third room: A 3d virtual music paradigm," in Proceedings of the International Conference on New Interfaces for Musical Expression, 2013, pp. 29–34.

[6] R. Kastelein. (2013) The Rise of Machinima, the Artform. [Online]. Available: http://insights.wired.com/profiles/blogs/the-rise-of-machinima-the-artform

[7] E. Polyak, "Virtual Impersonation Using Interactive Glove Puppets," in SIGGRAPH Asia 2012 Posters, ser. SA '12. New York, NY, USA: ACM, 2012, pp. 31:1–31:1. [Online]. Available: http://doi.acm.org/10.1145/2407156.2407191

[8] N. Bianchi-Berthouze, "Understanding the role of body movement in player engagement," Human–Computer Interaction, vol. 28, no. 1, pp. 40–75, 2013.

[9] N. Bianchi-Berthouze, W. W. Kim, and D. Patel, "Does body movement engage you more in digital game play? And Why?" in International Conference on Affective Computing and Intelligent Interaction. Springer, 2007, pp. 102–113.

[10] J. Parker, "Buttons, simplicity, and natural interfaces."

388 2017 ICMC/EMW

[11] S. Lay, N. Brace, G. Pike, and F. Pollick, "Circling Around the Uncanny Valley: Design Principles for Research Into the Relation Between Human Likeness and Eeriness," i-Perception, vol. 7, no. 6, p. 2041669516681309, 2016. [Online]. Available: http://dx.doi.org/10.1177/2041669516681309

[12] F. Argelaguet, L. Hoyet, M. Trico, and A. Lécuyer, "The role of interaction in virtual embodiment: Effects of the virtual hand representation," in Virtual Reality (VR), 2016 IEEE. IEEE, 2016, pp. 3–10.

[13] N. Garrelts, Understanding Minecraft: Essays on Play, Community and Possibilities. McFarland, 2014.

[14] S. C. Duncan, "Minecraft, beyond construction and survival," Well Played: a journal on video games, value and meaning, vol. 1, no. 1, pp. 1–22, 2011.

[15] S. Choney. (2016) Microsoft Stores offering free Minecraft VR demos. [Online]. Available: https://blogs.microsoft.com/firehose/2016/09/29/microsoft-stores-offering-free-minecraft-vr; http://dl.acm.org/citation.cfm?id=2910632

[16] N. Viniconis. (2011) MineCraft + Kinect: Building Worlds! [Online]. Available: http://www.orderofevents.com/MineCraft/KinectInfo.htm

[17] Vivecraft. (2016) Vivecraft. [Online]. Available: http://www.vivecraft.org/

[18] B. Carey and B. Ulas, "VR 'SPACE OPERA': Mimetic Spectralism in an Immersive Starlight Audification System," arXiv preprint arXiv:1611.03081, 2016.

[19] J. Bailey, J. N. Bailenson, A. S. Won, J. Flora, and K. C. Armel, "Presence and memory: immersive virtual reality effects on cued recall." Citeseer.

[20] M. Lombard and T. Ditton, "At the heart of it all: The concept of presence," Journal of Computer-Mediated Communication, vol. 3, no. 2, pp. 0–0, 1997.

[21] J.-N. Thon, "Immersion revisited: on the value of a contested concept," Extending Experiences–Structure, analysis and design of computer game player experience, pp. 29–43, 2008.

[22] M. Simon and S. Mayr, Immersion in a Virtual World: Interactive Drama and Affective Sciences. Anchor Academic Publishing, 2014.

[23] C. Buecheler. (2010) Character: The Next Great Gaming Frontier? [Online]. Available: http://www.chicagotribune.com/sns-gamereview-feature-characters-story.html

[24] F. Visi, R. Schramm, and E. R. Miranda, "Use of Body Motion to Enhance Traditional Musical Instruments," in NIME, 2014, pp. 601–604.

[25] H. Blyth, "Cross-Modal Perception of Emotion from Dynamic Point-Light Displays of Musical Performance," 2015.

[26] M. M. Wanderley, "Quantitative analysis of non-obvious performer gestures," in International Gesture Workshop. Springer, 2001, pp. 241–253.

[27] J. W. Davidson, "Visual perception of performance manner in the movements of solo musicians," Psychology of Music, vol. 21, no. 2, pp. 103–113, 1993.

[28] B. W. Vines, C. L. Krumhansl, M. M. Wanderley, and D. J. Levitin, "Cross-modal interactions in the perception of musical performance," Cognition, vol. 101, no. 1, pp. 80–113, 2006.

[29] V. Bergeron and D. M. Lopes, "Hearing and seeing musical expression," Philosophy and Phenomenological Research, vol. 78, no. 1, pp. 1–16, 2009.

[30] I. I. Bukvic, C. Cahoon, A. Wyatt, T. Cowden, and K. Dredger, "OPERAcraft: Blurring the Lines between Real and Virtual," in ICMC, 2014.

[31] I. Bukvic, "A Behind-the-Scenes Peek at World's First Linux-Based Laptop Orchestra–The Design of L2Ork Infrastructure and Lessons Learned."

[32] A. Lécuyer, "Playing with Senses in VR: Alternate Perceptions Combining Vision and Touch," IEEE Computer Graphics and Applications, vol. 37, no. 1, pp. 20–26, Jan 2017.

[33] N. F. Polys, B. Knapp, M. Bock, C. Lidwin, D. Webster, N. Waggoner, and I. Bukvic, "Fusality: an open framework for cross-platform mirror world installations," in Proceedings of the 20th International Conference on 3D Web Technology. ACM, 2015, pp. 171–179.

[34] FUDI. (2017) FUDI. [Online]. Available: https://en.wikipedia.org/wiki/FUDI

[35] Virginia Tech. (2016) South by Southwest 2016. [Online]. Available: http://www.vt.edu/sxsw.html

[36] Institute for Creativity, Arts and Technology. (2016) ICAT Day 2016. [Online]. Available: https://www.icat.vt.edu/content/icat-day-2016

[37] Virginia Tech. (2017) Science Museum of Western Virginia. [Online]. Available: http://www.icat.vt.edu/smwv

[38] Minetest. (2016) Meet Minetest. [Online]. Available: http://www.minetest.net/

[39] A. Swainston, N. Jeanneret et al., "Wot Opera: A joyful, creative and immersive experience," in Music: Educating for life. ASME XXth National Conference Proceedings. Australian Society for Music Education, 2015, p. 99.

[40] Microsoft. (2016) Minecraft Education Edition. [Online]. Available: https://education.minecraft.net/

[41] M. Loviska, O. Krause, H. A. Engelbrecht, J. B. Nel, G. Schiele, A. Burger, S. Schmeißer, C. Cichiwskyj, L. Calvet, C. Griwodz et al., "Immersed gaming in Minecraft," in Proceedings of the 7th International Conference on Multimedia Systems. ACM, 2016, p. 32.
