Audio Synthesis and Effects Processing with Web Browser In
Aalto University
School of Science
Degree Programme in Computer Science and Engineering

Jarkko Tuomas Juslin

Audio synthesis and effects processing with web browser in open web

Master's Thesis
Espoo, January 14, 2016

Supervisors: Professor Petri Vuorimaa, Aalto University
Advisor: Jari Kleimola D.Sc. (Tech.), Aalto University

ABSTRACT OF MASTER'S THESIS

Aalto University
School of Science
Degree Programme in Computer Science and Engineering

Author: Jarkko Tuomas Juslin
Title: Audio synthesis and effects processing with web browser in open web
Date: January 14, 2016
Pages: vii + 47
Major: Department of Media Technology
Code: T-110
Supervisors: Professor Petri Vuorimaa
Advisor: Jari Kleimola D.Sc. (Tech.)

This thesis examines the viability of audio synthesis and effects processing in the open web with and within a web browser. Two use cases are presented: local audio synthesis in a web browser utilizing PNaCl and Emscripten technologies, and audio synthesis as a service utilizing Web Sockets and WebRTC technologies. The devised test procedures aim to determine the end-to-end audio latency and polyphony performance of the implemented systems. The second use case is also examined from a general perspective on distributed audio applications, based on experience gained while doing this work.

After a short introduction, the thesis presents a more thorough background to the subject and describes the related concepts and the technologies used. The use cases, test procedures and settings are introduced next, along with the measured results. The results are then analysed and discussed, and finally the conclusions are presented.

The lowest latency was achieved with PNaCl, measuring 39 milliseconds for real-time audio synthesis in a web browser, on par with the baseline measurements of native desktop performance. The lowest latency for Emscripten was measured to be 48 milliseconds.
In the second use case, WebRTC achieved a latency of 833 milliseconds, and the Web Sockets implementation was found to be inapplicable for real-time streaming, as it was unable to produce clean audio playback.

Keywords: audio synthesis, web audio api, emscripten, pnacl, webrtc, web sockets, html5
Language: English

DIPLOMITYÖN TIIVISTELMÄ (ABSTRACT OF MASTER'S THESIS, FINNISH EDITION)

Aalto-yliopisto (Aalto University)
School of Science
Degree Programme in Computer Science and Engineering

Author: Jarkko Tuomas Juslin
Title: Äänisynteesi sekä tehosteprosessointi verkkoselaimella avoimessa internetissä (Audio synthesis and effects processing with web browser in open web)
Date: January 14, 2016
Pages: vii + 47
Major: Department of Media Technology
Code: T-110
Supervisors: Professor Petri Vuorimaa
Advisor: Jari Kleimola D.Sc. (Tech.)

This thesis examines the efficiency of audio synthesis and effects processing with a web browser on the open internet. Two use cases are presented: local audio synthesis in a web browser using the PNaCl and Emscripten technologies, and audio synthesis offered as a cloud service using the Web Sockets and WebRTC technologies. The devised test methods aim to determine the end-to-end latency that each setup introduces in audio synthesis, as well as its polyphony performance. The second use case is also examined from a more general perspective, focusing on distributed audio applications and based on the experience gained from this work.

After a short introduction, the thesis presents the essential concepts and the technologies used in more detail. Next, the use cases and test methods are described and the results are presented. The results are then analysed and discussed, and finally the conclusions of the work are presented.

The lowest end-to-end latency achieved in this work was 39 milliseconds using PNaCl, on par with the native control measurement. Using Emscripten, the browser reached latencies of 48 milliseconds.
In the second use case, WebRTC reached latencies of 833 milliseconds, and the Web Sockets implementation was found incapable of real-time media streaming.

Keywords: audio synthesis, web audio api, emscripten, pnacl, webrtc, web sockets, html5
Language: English

Acknowledgements

First, a thank you to my supervisor, Professor Petri Vuorimaa. Petri agreed to supervise my work even though at the time I had only a faint idea of the subject.

A big thanks to my advisor, Doctor of Technology Jari Kleimola. Jari proposed an alternative subject for the thesis, which I accepted, and I am really glad I did. Jari also provided me with much guidance to get me through a lot of digital audio theory and practice.

Finally, a huge thanks to my beloved life partner Tiina Moilanen, and to my whole family, mother, father, my siblings, and the rest, for lifelong support and love. You know who you are.

End awhile.

Espoo, January 14, 2016
Jarkko Tuomas Juslin

Abbreviations and Acronyms

ALSA    Advanced Linux Sound Architecture
API     Application Programming Interface
DAW     Digital Audio Workstation
DSP     Digital Signal Processing
GUI     Graphical User Interface
PCM     Pulse Code Modulation
PNaCl   Portable Native Client
VST     Virtual Studio Technology
WASAPI  Windows Audio Session API
WIMP    Windows, Icons, Mouse, Pointers

Contents

Abbreviations and Acronyms
1 Introduction
2 Background
  2.1 History of audio synthesis
    2.1.1 Analog synthesis
    2.1.2 Hybrid synthesis
    2.1.3 Digital synthesis
    2.1.4 Mixed and desktop synthesis
    2.1.5 Web browser based synthesis
  2.2 Concepts
    2.2.1 Audio Stack
    2.2.2 Digital Audio
    2.2.3 Digital Audio Workstation
    2.2.4 Latency and polyphony
      2.2.4.1 Local audio system
      2.2.4.2 Distributed audio system
  2.3 Browser based audio synthesis
    2.3.1 Web Audio API
    2.3.2 Media Capture and Streams
    2.3.3 Web MIDI API
    2.3.4 Portable Native Client
    2.3.5 Emscripten and asm.js
    2.3.6 NW.js
  2.4 Transferring audio, data and interface
    2.4.1 WebRTC and WebSocket
    2.4.2 Virtual cable routing
    2.4.3 VNC
3 Research
  3.1 Baseline
  3.2 Use case 1: browser as a VST Host
    3.2.1 Use case 1.1: PNaCl
    3.2.2 Use case 1.2: Emscripten
    3.2.3 Test settings
    3.2.4 Results
  3.3 Use case 2: remote desktop VST Host
    3.3.1 Use case 2.1: Web Sockets
    3.3.2 Use case 2.2: WebRTC
    3.3.3 Test settings
    3.3.4 Results
4 Analysis
  4.1 Use case 1: browser as a VST Host
    4.1.1 Use case 1.1: PNaCl
    4.1.2 Use case 1.2: Emscripten
    4.1.3 Discussion
  4.2 Use case 2: remote VST Host
    4.2.1 Use case 2.1: Web Sockets
    4.2.2 Use case 2.2: WebRTC
    4.2.3 Discussion
5 Conclusions

Chapter 1
Introduction

There are generally two fundamental problems with audio processing: P-1, the available processing power must be sufficient, and P-2, the available tools must be simple and efficient. One way to measure the success of overcoming these problems is with two factors: F-1, the end-to-end audio signal latency of the system, and F-2, the achieved polyphony. This thesis sets out to determine the viability of a modern web browser for audio synthesis and effects processing, with the main focus on measuring the end-to-end latency F-1, complemented by a simple polyphony performance test F-2.

The web is a strong candidate to be the future platform for application development, and a huge step towards this has been taken with the emergence of HTML5 and related technologies. Among these is a subset, more or less dealing with audio, that makes it possible to run complex audio applications with and within a web browser. However, the viability of a web browser for the purpose can be brought into question.
Processing power of a personal computer has long been sufficient to perform real-time audio synthesis in studio quality with desktop applications, but can a web browser utilize the underlying resources to achieve the same? This is the primary research question of this thesis, Q-1: How viable is a web browser as an audio platform?

To find out, two use cases are set up: UC-1, web browser as a VST Host, and UC-2, remote desktop VST Host. In UC-1, audio synthesis and effects processing are performed locally inside a web browser, whereas in UC-2 the work is carried out by a desktop VST Host application on a remote server and accessed as a service by the browser. The testing focuses on the processing power P-1 available in a web browser by measuring latency F-1 and polyphony F-2.

UC-2 draws out another aspect to consider. Since the processing power is no longer an issue, the focus can be shifted more towards the applicability of distributed audio applications. The main issue will no doubt be the added delay introduced by the network, but the usability of a streamed user interface and the management of control flow also matter. This brings out the secondary research question of this thesis, Q-2: How applicable are distributed audio applications in general?

Q-2 will be examined on a more observational level, based on experience gained by conducting this research. No measurements other than latency will be taken, nor will there be heuristic evaluation of any sort. The reporting is based solely on empirical experience gained from implementing and testing the system.

The structure of the thesis is as follows. Next, in chapter 2, a more thorough introduction is provided with relevant technologies, concepts and related work.