The Phone with the Flow: Combining Touch + Optical Flow in Mobile Instruments Cagan Arslan, Florent Berthaut, Jean Martinet, Ioan Marius Bilasco, Laurent Grisoni
Total Page:16
File Type:pdf, Size:1020Kb
The Phone with the Flow: Combining Touch + Optical Flow in Mobile Instruments Cagan Arslan, Florent Berthaut, Jean Martinet, Ioan Marius Bilasco, Laurent Grisoni To cite this version: Cagan Arslan, Florent Berthaut, Jean Martinet, Ioan Marius Bilasco, Laurent Grisoni. The Phone with the Flow: Combining Touch + Optical Flow in Mobile Instruments. NIME 2018 - 18th Interna- tional Conference on New Interfaces for Musical Expression, Jun 2018, Blacksburg, VA, United States. hal-01809210 HAL Id: hal-01809210 https://hal.archives-ouvertes.fr/hal-01809210 Submitted on 6 Jun 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The Phone with the Flow: Combining Touch + Optical Flow in Mobile Instruments Cagan Arslan, Florent Berthaut, Jean Martinet, Ioan Marius Bilasco, Laurent Grisoni CRIStAL, CNRS, University of Lille, France {cagan.arslan,florent.berthaut,jean.martinet,marius.bilasco,laurent.grisoni}@univ-lille.fr ABSTRACT to gather position and motion information that are mapped Mobile devices have been a promising platform for musical to MIDI controls [15] [14]. performance thanks to the various sensors readily available However, to our knowledge, built-in cameras of mobile on board. In particular, mobile cameras can provide rich devices have not been used for sonification of moving el- input as they can capture a wide variety of user gestures ements in the camera image. Controlling sound through or environment dynamics. However, this raw camera input movement has always been of interest and can be traced only provides continuous parameters and requires expensive back to Kurenniemi's DIMI-O [12] and Rokeby's Very Ner- computation. In this paper, we propose combining camera vous System [1]. More recent examples such as [13][7][8] [4] based motion/gesture input with the touch input, in order use difference between images and optical flow to represent to filter movement information both temporally and spa- the moving parts of the image, but the methods have been tially, thus increasing expressiveness while reducing compu- either too simple to extract rich information or too heavy tation time. We present a design space which demonstrates to be run on mobile devices. The details of optical flow the diversity of interactions that our technique enables. We methods will be further discussed in section 2.2. also report the results of a user study in which we observe In this paper, we propose combining visual movement de- how musicians appropriate the interaction space with an tection with the touchscreen of mobile devices in order to example instrument. reduce the computation time and to open expressive op- portunities. We analyze these possibilities using a design space. We also study how users appropriate the instrument Author Keywords for two different movement sources: the user's gestures and Mobile instruments, Optical flow, touchscreen the environment. CCS Concepts 2. THE PHONE WITH THE FLOW In this section, we first present the main approach behind •Applied computing ! Sound and music computing; our system. It is currently implemented as an Android App •Computing methodologies ! Motion capture; (www.caganarslan.info/pwf.html). Optical flow features, extracted as described in the next section, are then mapped 1. INTRODUCTION to sound synthesis parameters. The synthesis can be done Many sensing capabilities of mobile devices have been ex- either directly on the mobile device, with restrictions on the plored for musical interaction [3]. Most of the mobile de- complexity of the synthesis due to limited computing capa- vices today includes various sensors such as a microphone, bilities of the mobile device, or the features can be sent to motion sensors, a touch screen, and multiple cameras. Hence, external musical software via OpenSoundControl messages. numerous musical interaction devices were designed around We also describe a design space of the interaction and smartphones and tablets [5] [2] [17], in particular using mappings opportunities afforded by our system. the discrete and continuous input capabilities of the touch- 2.1 From the Scene to the Touchscreen screen. Among these sensors however, the use of built-in camera of mobile devices has been little explored. The sim- Our system relies on real-time spatial and temporal filtering plest approach is to use the camera as a tone-hole. Ananya of rich motion information provided by the built-in cameras et al. [2] use the average gray-scale value of the camera of mobile devices. Fig. 1 depicts an example scenario of use input image to detect if the lens is covered. Covering or of The Phone with the Flow. uncovering of the lens modifies the pitch of the sound simi- Movement-rich Environment: The physical scene cap- larly to a tone-hole. Keefe and Essl [9] used low-level visual tured by the built-in camera offers a large range of move- features of the input image such as the edginess to create ments, with various periodicity, directions and sources. The mobile music performances with visual contributions. Cam- sources can be artificial (displays, mechanisms), natural era image has also been used to track visual reference points (weather, animals, plants), or originate from people (user's body, other musicians, spectators, by-standers). Choosing the Interaction Space Mobile cameras have a limited field of view but their portability enables explo- ration of the surroundings by a point-and-shoot approach. Unlike fixed installations, the mobile camera's field of view Licensed under a Creative Commons Attribution can be changed without effort by simply changing its posi- 4.0 International License (CC BY 4.0). Copyright tion and orientation. When the camera aims at a part of remains with the author(s). the movement-rich environment, objects in the field of view NIME’18, June 3-6, 2018, Blacksburg, Virginia, USA. are captured, providing the user with a visual feedback of (a) A person, a computer screen, a (b) The user chooses a movement (c) The user selects a region on the fan, objects passing by the window are source screen movement sources. Figure 1: From scene to the the touchscreen. their movement sources. The user is then free to interact a pyramidal Lucas-Kanade method[18] to track the sparsely in the combined interaction volume [11] of the camera and identified points. He also discusses the potential mappings the touchscreen. of the flow field to sound parameters. However he concludes Filtering the Movements Once the movement sources that the flow fields' temporal resolution is poor and the fea- are displayed on the touchscreen, the user focuses on a re- ture detection is not robust. CaMuS2[14] estimates the gion of interest by touching it. The touch selects a region optical flow from a camera phone. A 176x144 pixels image of the image to be analyzed further for detailed movement is divided into 22x18 blocks and cross-correlations between information. The user can change the position of the region successive pairs of block images are calculated to obtain by dragging their finger on the screen, alter its size, com- the optical flow. Because only 4 points are sampled in each bine multiple regions and switch between them. The touch block, the algorithm performs quickly. The simplicity of the input thus enables filtering of the movements. method allows obtaining the global translation and rotation at 15fps, but the system is unable to provide rich motion 2.2 Optical Flow information from the moving objects in the image. Optical flow is the distribution of apparent velocities of In recent years, there have been advances both in smart- moving brightness patterns in an image [6]. The movement phone hardware (improved cameras, CPU and GPU) and indicates a motion resulting from the relation between cap- optical flow algorithms, which open up new possibilities for tured objects and the viewer. The motion at a point can movement detection in real-time. be represented with ∆x and ∆y, the horizontal and the In 2016, Kroeger et al. [10] introduced DIS-Flow, an op- vertical displacement respectively. Therefore displacement tical flow estimation method that is one order of magnitude vector can be expressed as v(∆x; ∆y), but also in polar faster than others and produces roughly similar flow quality. coordinates v(r; Θ) to represent the amplitude and the di- Their objective is to trade-off a less accurate flow estima- rection of the movement for a pixel. Fig. 2 shows a color tion for large decreases in run-time for time critical tasks. coded representation of the optical flow. This method provides a dense flow output, in which ev- ery pixel is associated to a displacement vector v(∆x; ∆y). This advancement enables us to envision the creation of an innovative design space that allows richer interaction and sonification methods. Optical flow computation is noisy and presents important motion discontinuities as it measures the displacement in the 2D image plane of the real 3D world. Filtering out noise and focusing on the coherent motion information is the key to successful optical flow interaction. Our approach to extracting features from raw flow data is as follows. First of all, for every region of interest, the pixels that have a Figure 2: Color representation of the optical flow. (Top) smaller displacement value than a threshold are discarded Color space: hue indicates direction, saturation indicates to eliminate the noise (2). We identified key information to amplitude.