Masaryk University Faculty of Informatics

Linear Audio Editor

Bachelor’s Thesis

Josef Hornych

Brno, Fall 2015

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Josef Hornych

Advisor: RNDr. Miloš Liška, Ph.D.

Acknowledgement

First and foremost, my sincere thanks go to RNDr. Miloš Liška, Ph.D. for supervising me and helping me with my thesis. I would also like to thank the people who helped me test my application, namely Pavel Šindelka and Filip Holec. Without their input, several major errors in my application would have gone unnoticed. And lastly, my gratitude goes to the hosting team at Impact Hub Brno and my colleagues at Mathesio s. r. o. for their words of encouragement and support.

Abstract

The goal of this thesis is to create an audio editor. The user interface of the implemented editor is focused on use on tablet computers running Windows 8.1. Firstly, the term ’audio editing’ is defined along with the possible use cases for such editors on portable devices. The main part of the text contains the description of the implemented audio editor along with the supporting libraries that had to be created. The output of this thesis is a tablet-oriented audio editor, which supports manipulating multiple tracks and stereo audio clips as well as three basic audio filters: normalization, fade in and fade out. It contains a software mixer through which a project created in the editor can be mixed down and exported to an output file. Along with the editor itself, this thesis helped spawn several code libraries useful for third-party developers, available under the MIT license.

Keywords

audio editor, tablet, Windows 8.1, zvukový editor, Windows Store

Contents

1 Introduction
2 Audio Editing
3 Tools Used
    3.1 NAudio
    3.2 TypeScript
    3.3 RequireJS
    3.4 Knockout
4 Implementation
    4.1 EditoneLib
        4.1.1 Hierarchy
        4.1.2 Mixer
        4.1.3 Serialization
        4.1.4 Audio Format
    4.2 EditoneLibRT
        4.2.1 Delegation
        4.2.2 Optimization
        4.2.3 Input and Output
        4.2.4 Bridge to APIs
    4.3 EditoneApp
        4.3.1 Delegation
        4.3.2 LESS
        4.3.3 Editor
        4.3.4 Tiled Clip Rendering
        4.3.5 Open and Save Project Dialogs
5 User Interface
    5.1 Adherence to Design Guidelines
        5.1.1 Left Margins
        5.1.2 Flyouts
        5.1.3 Panning
    5.2 Examples of Tablet-Oriented Approach
        5.2.1 Clip Drag and Drop
        5.2.2 The Knob Control
        5.2.3 The AppBar
        5.2.4 Responsivity
6 Platform’s Limitations
    6.1 Debugging C# and JavaScript Simultaneously
    6.2 Custom Zooming and Native Panning
    6.3 Plug-ins
    6.4 Folder Access
    6.5 Discussion
7 Conclusion
    7.1 Monetization Potential
    7.2 Final Words
Bibliography
Index
A Appendix: Archive Contents

1 Introduction

Whether one is a musician recording their own song, an aspiring writer seeking to record their own audiobook or a professional audio engineer, a good audio editor is a necessity. There are many existing paid and free solutions for all of the well-established platforms, but some platforms lack such diversity. When I searched the Windows Store[1] for adequate solutions for cutting and mixing multiple audio tracks, I was met with no results. This led me to the idea of creating an application which would fulfill the needs of an audio editing enthusiast on the go. Having previous experience developing programs for audio manipulation, I found this to be an interesting challenge and a great choice of topic for my bachelor’s thesis.

The goal of my thesis is to create a linear audio editing application with a user interface well suited for Windows tablet computers. Its design should follow the best practices for tablet-oriented applications as described in Microsoft’s design guidelines[2].

In the first part of my thesis, I describe what lies under the umbrella term of audio editing. I explain where my application fits in and what use cases it satisfies.

The following two chapters are a discussion of how I implemented my solution. A list of the tools, libraries and frameworks I used is followed by the description of the components and the models I created for my application. Afterwards, I write about the user interface and I demonstrate how my application is optimized for use on tablets. I show a few examples of the GUI decisions I made and explain what makes them suited for touchscreen interfaces.

The following chapter of my thesis contains the discussion of the problems and drawbacks I ran into. Some of these are intentional restrictions of the Windows Runtime platform, others are unexpected hurdles I had to jump or otherwise cope with. In each case I investigate the problem and arrive at a solution or a possible workaround.

In the final chapter I summarize the work I have done for this thesis. I take a brief look at how my application could be monetized and I conclude my thesis with thoughts on how I could further improve on my work in the future.

2 Audio Editing

Editing audio on computers is ubiquitous and has been the standard way of treating sound ever since computers became fast enough to handle it[3]. Just as they helped to start a revolution in the video industry, computers completely changed what was possible with sound. The advent of freeware and open source programs such as Audacity meant that anybody could afford to make professional-sounding edits to their recordings. The ability to edit sound on computers is nowadays taken for granted.

Audio editing, in its broadest sense, is any manipulation of sound, be it cropping, shifting, changing the volume, applying a compressor effect or equalization. On portable devices, some of these operations can be too demanding on the CPU and the battery. Fortunately, this does not concern the casual user, because the functionality they use is mostly undemanding and easy to process.

Take, for example, a journalist traveling by public transport back from an interview with a recording. They may want to cut some parts out, change the volume and mark up several spots, but they have neither the need to use an equalizer nor the time to tinker with it. These sorts of operations are simple and can be performed in real time even on a portable device such as a tablet or a smartphone.

The basic audio editing functions an editor should have include:

∙ Cropping, splitting and positioning (changing the offset of) audio clips

∙ Adjusting the volume of individual tracks

∙ Changing the panning of the clips

∙ Fade in and fade out effects

∙ Superimposing (mixing) multiple tracks at the same offset

∙ Mixing down multiple tracks to a single-file output

There are already, of course, applications that offer just that. For the major platforms, like Android and iOS, reasonable options exist, some of which are even free[4][5]. Many of them, however, lack the ability to edit multiple tracks, or only support saving to a downmixed file as opposed to also offering to save the project as a work in progress for later editing.

When Microsoft released Windows 8 and introduced Windows Store applications, I saw an opportunity to create the first audio editing application for the new platform. I set out to create an app which would feel native and be easy to use on a Windows tablet. I decided to name it Editone, a portmanteau of edit and tone. For Editone, I set these goals:

1. Create a platform-independent core library that could be used by third parties. The idea behind this library is to enable developers to easily organize audio clips on a timeline and to serialize and deserialize this data. It should provide an API for the basic audio editing functions I outlined earlier.

2. Write a simple audio mixer that supports mixing multiple audio tracks at once. This would allow the users to play their projects back immediately and export them to a down-mixed audio file.

3. Implement a set of basic audio filters that a user would need the most. The thesis’ topic mandates a fade-in and a fade-out effect and one other effect of my choice.

4. Create a user interface for use on tablets and other devices with touchscreens that would feel fluent, as uncluttered as possible, easy to use and, at the same time, familiar to a user experienced with other audio editing software.

3 Tools Used

Editone is written using Visual Studio 2013[6]. Its front end is written in HTML5, while its back end is implemented in C#. In the following sections I describe the libraries and tools I used to help me develop Editone.

3.1 NAudio

The NAudio library[7] is an open source project which aims to provide sound manipulation functionality for the .NET platform. It’s written in C# and it supplies a collection of wrappers to Windows audio APIs such as Media Foundation (MF) and Windows Audio Session API (WASAPI). Besides providing wrappers to native APIs, it also contains classes for the functionality that the Windows platform does not offer.

NAudio is focused on handling audio streams. It includes many classes which inherit from System.IO.Stream, some of which are sources (microphone, file decoders, ...) and others that act as filters (boost, crop, mono to stereo). The streams themselves are immutable (cannot be written to) and the desired alterations and filtering are done on the fly when they are read from. They then get passed to an output class, like a file encoder or a speaker output. This concept is similar to how physical real-time audio processing chains work with sound inputs, effects and sound outputs.

Superficially, NAudio’s approach may seem to lack the expressive power of a more bare-metal approach, wherein sound is represented by a large array of samples and all the operations are applied to the array directly. That would be a misunderstanding of what a filter class offers. When read from, a filter stream can seek in its source stream and read from an arbitrary offset. It has the same expressive power as the bare-metal approach, because it can access and read from the whole source stream every time it’s read from.

I chose NAudio because it’s a well-maintained library. It offers many file decoders, resamplers, filters and output classes that Editone needs and it has few rivals on the C# scene.

Figure 3.1: Audio processing chain (diagram labels: input, filter, ..., mp3, filter, output)
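The pull-based chain described in this section can be illustrated with a minimal TypeScript sketch. It is not NAudio code: NAudio is a C# library with its own classes, and the interface SampleSource and the classes ArraySource, GainFilter and CropFilter below are hypothetical names invented for this example. The sketch only shows the idea that filters are read-only and do their work at the moment they are read, seeking in their source as needed.

```typescript
// Hypothetical interface; NAudio's real classes are C# streams with a different API.
interface SampleSource {
  readonly length: number;                       // total number of samples
  read(offset: number, count: number): number[]; // read-only random access
}

// A source backed by an in-memory array (stands in for a file decoder).
class ArraySource implements SampleSource {
  constructor(private readonly samples: number[]) {}
  get length(): number { return this.samples.length; }
  read(offset: number, count: number): number[] {
    return this.samples.slice(offset, offset + count);
  }
}

// A filter that scales every sample while it is being read.
class GainFilter implements SampleSource {
  constructor(private readonly source: SampleSource, private readonly gain: number) {}
  get length(): number { return this.source.length; }
  read(offset: number, count: number): number[] {
    return this.source.read(offset, count).map(s => s * this.gain);
  }
}

// A filter that exposes only a window of its source; it seeks in the source on every read.
class CropFilter implements SampleSource {
  constructor(private readonly source: SampleSource,
              private readonly start: number,
              private readonly size: number) {}
  get length(): number { return this.size; }
  read(offset: number, count: number): number[] {
    const clamped = Math.max(0, Math.min(count, this.size - offset));
    return this.source.read(this.start + offset, clamped);
  }
}

// input -> crop -> gain; nothing is computed until the end of the chain is read.
const chain = new GainFilter(new CropFilter(new ArraySource([0, 0.5, 1, -1, 0.25]), 1, 3), 0.5);
const chainOutput = chain.read(0, chain.length);
```

Because every filter can request any range from its source, a chain of this kind is no less expressive than operating on one large sample array.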

3.2 TypeScript

The Windows Store platform for applications only officially supports applications written in a few programming and scripting languages: C++, C# or JavaScript[8]. While, originally, I intended to write the entire application in C#, which uses XAML as the language for defining the user interface, I quickly reconsidered. It proved to be too challenging for me to design custom components in XAML. It is too verbose and it can be tedious to work with. What’s more, XAML’s dialect for Windows Store applications also differed too much from the dialect present in the Windows Presentation Foundation framework I had prior experience with. After my initial attempts to cope, I decided to rewrite the application using a combination of C#, HTML5 technologies and TypeScript.

TypeScript is a language which compiles into JavaScript. It introduces classes, interfaces and lambda functions, while still maintaining backwards compatibility with JavaScript code[9]. The goal of TypeScript is to ensure that no errors that could be prevented using a statically typed language occur at runtime. It features built-in support for common JavaScript module formats and loaders, such as AMD, CommonJS and RequireJS, which is one of the reasons I chose it as the scripting language for the user interface.

3.3 RequireJS

Asynchronous module loading is a JavaScript code pattern that enables the developer to use and create modular code. RequireJS expects the developer to write their code as modules, each of which resides in a separate file. These modules are loaded asynchronously, when necessary, either by the browser or by another JavaScript runtime environment[10]. This allows the developer to create meaningful file and directory structures and to better organize their source code.

RequireJS is a module loader that is directly supported by TypeScript, which made it easy to incorporate into my project’s development stack. I decided to leverage it in order to split my code into logical units and to keep my source code from becoming confusing and hard to navigate.

3.4 Knockout

Having prior positive experience developing with Knockout[11], a JavaScript MVVM (Model-View-View Model) framework, I felt that this would be a good opportunity to use it again in Editone. Knockout manages the graphical user interface through data bindings. It uses observables to bind view-model data to the view, either unidirectionally or bidirectionally. This way, Knockout takes care of updating the user interface with no intervention necessary on my part.

Another useful feature of Knockout is its dynamically loaded components. Knockout supports user interface components which can be re-used throughout the application. Each one consists of a view model and a template, which are loaded into Knockout and displayed on demand. Knockout’s component pattern corresponds with how JavaScript modules are structured in my application. In fact, Knockout uses the module loader of my choice, RequireJS, to load the components’ view models and templates. The use of custom-written components allowed the front end of my application to have an even clearer source code structure.
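The observable idea can be illustrated with a small self-contained sketch. It is written for illustration only and is not Knockout’s implementation or API: the class Observable and the trackName view-model field are hypothetical, and a plain array stands in for the rendered view.

```typescript
// Minimal observable, written for illustration only; this is not Knockout's code.
type Listener<T> = (value: T) => void;

class Observable<T> {
  private listeners: Listener<T>[] = [];
  constructor(private value: T) {}
  get(): T { return this.value; }
  set(next: T): void {
    if (next === this.value) return; // no notification when nothing changed
    this.value = next;
    this.listeners.forEach(listener => listener(next));
  }
  subscribe(listener: Listener<T>): void { this.listeners.push(listener); }
}

// Hypothetical view-model field for a track header.
const trackName = new Observable<string>("Track 1");
const rendered: string[] = []; // stands in for the DOM
trackName.subscribe(name => rendered.push(`track header: ${name}`));

trackName.set("Vocals"); // the "view" is updated without any manual refresh
trackName.set("Vocals"); // unchanged value, so nothing is rendered again
```

A binding framework wires such subscriptions up automatically from declarations in the template, which is what removes the need for manual user interface updates.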

4 Implementation

I implemented Editone as a solution consisting of three projects. In this chapter I describe each project as a layer of the application. These layers form what is essentially a sandwich architecture: EditoneLib – the base, EditoneLibRT – the filling, EditoneApp – the presentation layer.

4.1 EditoneLib

First, the core layer: EditoneLib. It’s the model around which the rest of the application is built. Completely independent of the remainder of the app, it is written as a Portable Class Library (PCL). Consequently, EditoneLib is restricted to using only the most basic and platform-independent features of the .NET library. This, among other things, means that it doesn’t handle any input or output. EditoneLib is kept as abstract as possible: the idea behind it is to handle the streams passed into it as black boxes. EditoneLib’s main purpose is to enable the audio to be structured into a clip-track-project hierarchy.

4.1.1 Hierarchy

EditoneLib provides a hierarchical system for organizing clips into tracks and tracks into projects. I wrote EditoneLib with versatility and extensibility in mind; I did not limit EditoneLib only to support mono and stereo audio tracks. The library is written to potentially support any kind of track and clip. Examples of future non-audio tracks that EditoneLib can support are lyrics tracks, piano roll tracks and rhythm tracks. While not currently present, they could be added to the library with no changes to its current API.

A project is, plainly put, a collection of tracks. Besides being a track container, it also holds a reference to the project settings such as the project’s volume and each individual output channel’s volume.

Tracks are the individual timelines in the project. The ITrack interface, which all tracks in EditoneLib implement, is defined as a collection of non-overlapping clips indexed by their offsets. Tracks in EditoneLib also provide basic means of customization: ITrack exposes properties Color and Name, which allow the user to visually distinguish one track from another more easily.

Figure 4.1: Hierarchical model

In EditoneLib, clips are the leaves of the hierarchy and the focal point of the whole library. They are a container for generic data relevant to the project. Depending on their type, they can contain audio streams or anything else.

Note that clips don’t have a reference to their containing tracks, and tracks likewise don’t hold a reference to their project. This is done intentionally so that third-party developers need not use the whole hierarchy when it is not necessary. For example, when one wants to use EditoneLib simply to split a stereo clip into two pieces, it is enough to instantiate a StereoClip with the source stream and call StereoClip::Split. Each layer in EditoneLib’s hierarchy is independent of the layer above it. When one only needs to use the Clip class, there is no need to create a Project or a Track.
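The clip-track-project hierarchy can be sketched in a few lines of TypeScript. This is an illustration under assumptions, not EditoneLib’s API: EditoneLib is a C# library, and the classes Clip, Track and Project below, together with their members, are hypothetical simplifications that store samples in plain arrays. The sketch shows the two properties stressed in the text: clips on a track never overlap, and a clip can be used (here, split) without any track or project.

```typescript
// All names are hypothetical; EditoneLib itself is written in C#.
class Clip {
  constructor(readonly samples: number[]) {}
  get length(): number { return this.samples.length; }

  // Splits the clip at a sample position into two independent clips.
  split(position: number): [Clip, Clip] {
    if (position <= 0 || position >= this.length) {
      throw new RangeError("split position must lie inside the clip");
    }
    return [new Clip(this.samples.slice(0, position)), new Clip(this.samples.slice(position))];
  }
}

class Track {
  // Clips together with their offsets (in samples) on the track's timeline.
  readonly clips: { offset: number; clip: Clip }[] = [];
  constructor(public name: string, public color: string) {}

  // True when [offset, offset + length) collides with no existing clip.
  isFree(offset: number, length: number): boolean {
    return !this.clips.some(c => offset < c.offset + c.clip.length && c.offset < offset + length);
  }

  add(offset: number, clip: Clip): void {
    if (!this.isFree(offset, clip.length)) throw new Error("clips must not overlap");
    this.clips.push({ offset, clip });
  }
}

class Project {
  readonly tracks: Track[] = [];
  volume = 1.0;
}

// A clip can be split on its own; the hierarchy above it is optional.
const [head, tail] = new Clip([1, 2, 3, 4, 5]).split(2);

const interviewTrack = new Track("Interview", "blue");
interviewTrack.add(0, head);
interviewTrack.add(10, tail);

const demoProject = new Project();
demoProject.tracks.push(interviewTrack);
```

Note that neither Clip nor Track stores a reference to its parent, mirroring the design decision described above.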

4.1.2 Mixer

Besides organizing clips, EditoneLib provides a basic software audio mixer. Its implementation is located in the ProjectStream class. It works by adding together the floating point samples from audio clips’ streams, which are multiplied by a coefficient based on how the tracks are mapped to the output[12].

Table 4.1: EditoneLib’s internal audio format

Encoding             Linear pulse-code modulation (PCM)
Sampling frequency   48000 Hz
Sample data type     IEEE single-precision float (32-bit)
Channel count        Varies

Individual tracks and their channels can be mapped to any one or more output channels. The ChannelMapping class maintains a record of how the tracks are mapped. The mapping depends on the track’s channel count, volume and panning. By default, mono audio tracks are mapped to all the output channels and stereo tracks’ left and right channels are mapped to the corresponding channels of the mixer’s output.
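The summation performed by the mixer can be sketched as follows. This is a simplified illustration, not the ProjectStream or ChannelMapping code: the MixInput type and the mix function are hypothetical, the gain values are chosen arbitrarily, and all inputs are assumed to start at offset zero.

```typescript
// Hypothetical types; the real ChannelMapping class lives in the C# library.
interface MixInput {
  channels: Float32Array[]; // one array of samples per source channel
  gains: number[][];        // gains[source][output], derived from volume and panning
}

// Sums every source channel into every output channel, weighted by its gain.
function mix(inputs: MixInput[], outputChannels: number, frames: number): Float32Array[] {
  const out: Float32Array[] = [];
  for (let o = 0; o < outputChannels; o++) out.push(new Float32Array(frames));

  inputs.forEach(input => {
    input.channels.forEach((samples, s) => {
      const n = Math.min(frames, samples.length);
      for (let o = 0; o < outputChannels; o++) {
        const gain = input.gains[s][o];
        if (gain === 0) continue; // this source channel is not mapped to output o
        for (let i = 0; i < n; i++) out[o][i] += samples[i] * gain;
      }
    });
  });
  return out;
}

// A mono track mapped to both outputs, and a stereo track mapped left to left, right to right.
const monoInput: MixInput = { channels: [new Float32Array([0.5, 0.5])], gains: [[0.5, 0.5]] };
const stereoInput: MixInput = {
  channels: [new Float32Array([0.25, 0]), new Float32Array([0, 0.25])],
  gains: [[1, 0], [0, 1]],
};
const mixed = mix([monoInput, stereoInput], 2, 2);
```

The sketch does not clip or limit the sum; whether and where the real mixer does so is not stated in the text.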

4.1.3 Serialization

Another piece of functionality I wrote in EditoneLib is serialization. Because EditoneLib is a Portable Class Library, it has no concept of files and folders. This means that it has to serialize itself into a platform-agnostic container, which then gets passed elsewhere to be persisted in some kind of permanent storage. The classes in the Serialization namespace are to be serialized by DataContractSerializer in a class implementing the IEditoneSerializer interface.

4.1.4 Audio Format

All data passed into and out from EditoneLib share a common audio format. EditoneLib does not handle any resampling or format conversions and leaves this task to the libraries that build upon it. All audio passed into EditoneLib is expected to be represented as streams of 32-bit IEEE floating point values sampled at a rate of 48 kilohertz. This format is defined as constants in the Settings class.
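The arithmetic implied by this fixed format can be sketched as follows. The constant and function names are hypothetical and are not taken from the Settings class; the 16-bit conversion is only an example of the kind of work a library building on EditoneLib would have to do before passing audio in.

```typescript
// Hypothetical constants mirroring Table 4.1; the names are not taken from the Settings class.
const SAMPLE_RATE_HZ = 48000;
const BYTES_PER_SAMPLE = 4; // 32-bit IEEE float

// Converts a position in milliseconds to a frame index at the fixed sampling rate.
function msToFrames(ms: number): number {
  return Math.round((ms * SAMPLE_RATE_HZ) / 1000);
}

// Size in bytes of an uncompressed clip in the internal format.
function clipSizeBytes(durationMs: number, channels: number): number {
  return msToFrames(durationMs) * channels * BYTES_PER_SAMPLE;
}

// Converts signed 16-bit PCM (a common decoder output) to floats in [-1, 1).
function int16ToFloat(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 32768;
  return out;
}

const oneMinuteStereoBytes = clipSizeBytes(60000, 2);
const convertedSamples = int16ToFloat(new Int16Array([16384, -32768]));
```

One minute of stereo audio in this format occupies roughly 23 MB, which is worth keeping in mind on storage-constrained tablets.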

4.2 EditoneLibRT

EditoneLibRT is the first platform-dependent layer of the solution. It is a Windows Runtime Component that serves several purposes, which I outline in this section.

4.2.1 Delegation

When a Windows Store application written in HTML5 needs to reference and interoperate with code written in a different language, it is done using Windows Runtime Components. EditoneLibRT is an intermediary layer that acts as a gateway to EditoneLib’s functionality for EditoneApp.

When writing a Windows Runtime Component, it is not possible to simply copy the declarations of EditoneLib’s classes and use them in EditoneLibRT. Classes publicly exposed by Windows Runtime Components must be declared as sealed[13], which means that the API exposed by EditoneLibRT can’t express EditoneLib’s class hierarchy. Instead of doing a naive copy of EditoneLib, I decided to write classes that would reflect the basic idea of Editone, but express it in a format more comprehensible for JavaScript.

Project

The main goal of EditoneLibRT.Project, besides delegating the methods and properties of its counterpart in EditoneLib, is to provide easy access to operations that are directly related to a project’s usage. This is the class where you can find methods for playback, track import, project export, saving and loading. The class provides additional properties useful for displaying information about a project’s state: PlaybackState and PlaybackMsec are used in the UI for displaying the playback cursor and the current playback time.

A project’s playback is executed in a thread separate from the rest of the application. It uses the Windows Audio Session API (WASAPI) to output the sound to the default sound device. For the source of the sound data, EditoneLib’s ProjectStream is used, which mixes all the tracks in the project down to a single audio stream.

Track import is handled in a new thread for each clip. The user is first prompted to select a file to import. NAudio’s wrapper for the Media Foundation decoder then gets invoked and, upon successful decoding, a new track is inserted into the project. In order to inform the user that their action was successful, a ghost representation of the clip is inserted into the new track. It serves as a placeholder for where the real clip will be inserted once it’s done converting and resampling to EditoneLib’s audio format. After the processing is done, the ghost clip is removed and replaced with its concrete counterpart.

Track

The Track class in EditoneLibRT is more than a wrapper around the core track class. Besides holding a collection of concrete clips projected from the delegated EditoneLib.Track, it holds a separate collection of virtual ghost clips. They need to be held in two separate collections in order to simplify maintaining the delegation of the core clips EditoneLib.Track provides. To the UI layer, Track presents these two collections as one merged collection.

An instance of EditoneLibRT.Track can represent a mono audio track, a stereo audio track or any other kind of track present in EditoneLib. It contains a few fields by which the UI layer can distinguish the type of track it’s handling. These fields are:

Type String holding the full name of the type of the delegated track.

IsAudioTrack Boolean value that is true if the delegated track inherits from EditoneLib.BaseAudioTrack<>

NumChannels Represents the source channel count on the delegated track.

Clip

Besides reflecting and delegating methods and properties to clips in EditoneLib, EditoneLibRT.Clip has a few tricks of its own up its sleeve. The Clip class can be used as a way of representing ghost clips. When the user chooses to import a track from an audio file, the clip’s graphical outline is shown immediately, with an animation that signifies loading. This way the user knows that a clip is being loaded and they have a reference as to where the clip is located and how long it is. This is an extension of EditoneLib’s model. The core library’s hierarchy has no concept of virtual clips that are only just loading in.

EditoneLibRT’s Clip also contains methods for applying the built-in filters (fade-in, fade-out and normalization). These methods instantiate the corresponding filter streams and replace the clip’s source stream.

Other utility methods Clip provides are the Move and CanMoveTo methods. The latter is used to detect whether a track has enough space at a given offset to hold the referenced clip. If not, it uses a heuristic algorithm to try to suggest another offset where the clip would fit in. The Move method simply removes the clip from its current track and inserts it into the specified track at the given offset.
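What the three built-in filters compute can be sketched as follows. This is an illustration under assumptions: the real filters are C# filter streams that process audio as it is read, whereas the functions below work on whole arrays, and the linear fade curve and peak-based normalization are assumptions, since the text does not state which curves Editone uses. The function names are hypothetical.

```typescript
// Illustrative only; the real filters are C# stream classes and may use different curves.

// Linear fade-in over the first `fadeFrames` samples.
function fadeIn(samples: Float32Array, fadeFrames: number): Float32Array {
  const out = new Float32Array(samples); // copy, the input stays untouched
  const n = Math.min(fadeFrames, out.length);
  for (let i = 0; i < n; i++) out[i] *= i / n;
  return out;
}

// Linear fade-out over the last `fadeFrames` samples.
function fadeOut(samples: Float32Array, fadeFrames: number): Float32Array {
  const out = new Float32Array(samples);
  const n = Math.min(fadeFrames, out.length);
  for (let i = 0; i < n; i++) out[out.length - 1 - i] *= i / n;
  return out;
}

// Peak normalization: scales the clip so that its loudest sample reaches `target`.
function normalize(samples: Float32Array, target: number = 1.0): Float32Array {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) peak = Math.max(peak, Math.abs(samples[i]));
  const out = new Float32Array(samples);
  if (peak === 0) return out; // silence stays silence
  const gain = target / peak;
  for (let i = 0; i < out.length; i++) out[i] *= gain;
  return out;
}

const fadedIn = fadeIn(new Float32Array([1, 1, 1, 1]), 4);
const fadedOut = fadeOut(new Float32Array([1, 1, 1, 1]), 4);
const normalized = normalize(new Float32Array([0.25, -0.5]));
```

Each function returns a new array and leaves its input unchanged, which loosely mirrors the way a filter stream wraps its source instead of modifying it.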

4.2.2 Optimization

Besides delivering EditoneLib’s functionality to its public interface, EditoneLibRT provides several callbacks for the front-end layer that, were they implemented in JavaScript instead, would be too demanding on the application’s performance and would degrade the user experience. This code is tightly bound to the front end’s functionality. An example of this is the ClipVisualizer class.

The user interface layer uses the ClipVisualizer class for rendering visual representations of audio clips. It renders a 2D bitmap image, in which each pixel column visualizes the range of sample values from its corresponding time span. The resulting bitmap then gets converted into a PNG image, which is passed to the user interface in base64 encoding. Had this operation been written in JavaScript, it would have been too performance-demanding and would have resulted in a poor user experience. Instead, EditoneApp submits a request for the image, which is processed in an EditoneLibRT background thread.

Another example of offloading a resource-intensive task from JavaScript to C# is the method EditoneLibRT.Clip.CanMoveTo(track, offset). This method is used to check whether moving a clip to the specified offset in a given track would make it overlap with another clip. Such an operation would result in unexpected behavior during mixdown. If an overlap is detected, the method attempts to offer an alternate offset, the one closest to the original offset, where the clip can be placed without overlapping. This can easily become a non-trivial operation if the target track hosts many clips with many spaces in between that the clip being moved can’t fit in.
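The search for an alternate offset can be sketched as follows. The text does not spell out the heuristic used by CanMoveTo, so the function below is a hypothetical stand-in, not the method’s actual algorithm: it scans the gaps between the occupied spans of a track and returns the free offset closest to the requested one. It assumes that the occupied spans do not overlap each other and that offsets are non-negative.

```typescript
// Hypothetical stand-in for CanMoveTo; the thesis does not spell out its heuristic.
interface Span { offset: number; length: number; }

// Returns the offset closest to `wanted` where a clip of `length` fits
// without overlapping any span in `occupied`.
function nearestFreeOffset(occupied: Span[], wanted: number, length: number): number {
  const sorted = occupied.slice().sort((a, b) => a.offset - b.offset);
  let best = Number.POSITIVE_INFINITY;
  let bestDistance = Number.POSITIVE_INFINITY;

  // Considers the gap whose valid start offsets are lo..hi (inclusive).
  const consider = (lo: number, hi: number): void => {
    if (hi < lo) return; // the gap is too narrow for the clip
    const candidate = Math.min(Math.max(wanted, lo), hi);
    const distance = Math.abs(candidate - wanted);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  };

  let gapStart = 0;
  for (let i = 0; i < sorted.length; i++) {
    consider(gapStart, sorted[i].offset - length);
    gapStart = sorted[i].offset + sorted[i].length;
  }
  consider(gapStart, Number.POSITIVE_INFINITY); // open-ended space after the last clip
  return best;
}

// Two clips occupy [0, 10) and [15, 25); a clip of length 5 fits exactly between them.
const occupiedSpans: Span[] = [{ offset: 15, length: 10 }, { offset: 0, length: 10 }];
const suggestionNear12 = nearestFreeOffset(occupiedSpans, 12, 5);
const suggestionNear20 = nearestFreeOffset(occupiedSpans, 20, 5);
```

If the returned offset equals the requested one, the move can proceed as is; otherwise the result serves as the suggested alternative.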


4.2.3 Input and Output

A very important part of EditoneLibRT is its input and output capabilities. This layer is the only part of the solution that has access to both EditoneLib’s serialization model and the system’s storage and audio APIs. EditoneLibRT is thus the only place where audio playback, track imports and project saving and loading can happen.

While EditoneLib contains a serialization model, the core library has no access to system-specific storage APIs. Because EditoneLib’s serialization can’t perform all the necessary steps by itself, EditoneLibRT contains an implementation of EditoneLib’s IEditoneSerializer, which provides the methods necessary for EditoneLib’s serialization chain to work. The SaveStreamAsync method saves the raw data from a project’s clips in the folder where the project is saved. The file names are randomly generated GUID[14] strings which are returned to the serialization model to be saved in the project’s XML file. During a project’s loading, the stream data is retrieved from these files back into EditoneLib’s clips using the LoadStreamAsync method.
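The division of labor between the serialization model and the storage layer can be sketched as follows. Everything in this sketch is a hypothetical simplification: the StreamStore interface is only inspired by IEditoneSerializer and its real signatures may differ, the real methods are asynchronous, the real file names are random GUIDs (a counter is used here to keep the example deterministic), and the real project manifest is XML, not JSON.

```typescript
// Hypothetical interface inspired by IEditoneSerializer; the real signatures may differ
// and the real methods are asynchronous.
interface StreamStore {
  saveStream(data: Float32Array): string; // returns the generated file name
  loadStream(name: string): Float32Array;
}

// Stands in for the platform-specific storage that only the outer layer can reach.
class InMemoryStore implements StreamStore {
  private files: { [name: string]: Float32Array } = {};
  private counter = 0;

  saveStream(data: Float32Array): string {
    // The thesis uses random GUIDs; a counter keeps this sketch deterministic.
    const name = `stream-${this.counter++}.raw`;
    this.files[name] = new Float32Array(data);
    return name;
  }

  loadStream(name: string): Float32Array {
    const data = this.files[name];
    if (!data) throw new Error(`missing stream file: ${name}`);
    return data;
  }
}

interface ClipRecord { offset: number; streamFile: string; }
interface ClipData { offset: number; samples: Float32Array; }

// The platform-agnostic model only records file names; the store persists the raw data.
function saveProject(clips: ClipData[], store: StreamStore): string {
  const records: ClipRecord[] = clips.map(c => ({
    offset: c.offset,
    streamFile: store.saveStream(c.samples),
  }));
  return JSON.stringify(records); // the thesis stores this manifest as XML
}

function loadProject(manifest: string, store: StreamStore): ClipData[] {
  const records: ClipRecord[] = JSON.parse(manifest);
  return records.map(r => ({ offset: r.offset, samples: store.loadStream(r.streamFile) }));
}

const demoStore = new InMemoryStore();
const demoManifest = saveProject(
  [{ offset: 0, samples: new Float32Array([0.5, -0.5]) },
   { offset: 96000, samples: new Float32Array([0.25]) }],
  demoStore,
);
const restoredClips = loadProject(demoManifest, demoStore);
```

Keeping the store behind an interface is what lets the core model stay free of any file or folder concept.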

4.2.4 Bridge to APIs

In Windows Store applications, JavaScript code is restricted from accessing some of the system APIs available to other languages[8]. For example, JavaScript can’t access the Windows Audio Session API (WASAPI), which is required for EditoneLib to output sound to playback devices. EditoneLibRT therefore handles the sound output by passing the core library mixer’s output to WASAPI (or rather NAudio’s WASAPI wrapper). It also informs EditoneApp of the stream’s position and state by triggering events whenever either changes.

Another API that is unavailable directly to either EditoneApp or EditoneLib is Media Foundation. Specifically, Media Foundation’s source reader[15] is required to decode the supported audio file formats[16] into the format supported by EditoneLib (see Table 4.1). The decoding of the supported file formats therefore has to be done in EditoneLibRT. Similarly, encoding audio for export has to be done in this layer as well. The C# code for Media Foundation’s functionality is provided by wrapper classes in the NAudio library.

4.3 EditoneApp

The front-end layer of the application is called EditoneApp. I wrote the user interface in HTML5, as mentioned in Chapter 3. EditoneApp is designed as a single page application[17]. All of the page’s controls load on start and are either hidden (e.g. open and save project dialogs, track rename dialog), or dynamically added when necessary (track settings, clips, control knobs, ...). This is managed using the Knockout framework.

4.3.1 Delegation

However, there’s much more to EditoneApp than just HTML templates. In order to reflect EditoneLibRT’s functionality and pass its classes’ properties into Knockout as observables, it is necessary to create wrapper classes for EditoneLibRT. These classes are then used in the rest of the layer as the most low-level model available. It would be impossible to utilize Knockout’s bindings without these wrapper classes.

4.3.2 LESS

Another important part of EditoneApp is its Cascading Style Sheets (CSS). I wrote my style sheets using the language LESS[18], which gets compiled into CSS. When compiled, all of the style sheets in the project get imported into a single file. The file is then compiled and the resulting file is linked in the program’s default HTML file.

LESS proves particularly useful in the theme-color-setting classes, a visually interesting feature of EditoneApp’s style sheets. Each track can have its own theme color. When a track’s color is changed, its root element’s theme class changes. All colors in the track controls (TrackSettings and TrackVisual) are declared using CSS classes, which are defined for each possible theme color they inherit. This means that the theme color gets propagated throughout all of the track’s DOM with only one variable change in the track’s model.

I also use LESS to simplify the media queries used in my app to make it responsive. I wrote two mixins, in which detached rulesets, defined immediately before the mixin’s invocation, are called to generate media queries for different screen resolutions.

4.3.3 Editor

EditoneApp provides a user interface for manipulating tracks and clips. To apply these manipulations, the application must first know which track or clip the user wishes to manipulate. In the Editor class, I therefore implemented a focus mechanism. When the user taps a clip, it gains clip focus. Similarly, when the user taps inside a track, the track gains track focus. Both kinds of focus are indicated by a thick border visible on the relevant elements.

Some operations, like playback, import or clip splitting, need to know at what offset they should operate. The user specifies this by moving the cursor. The cursor can be placed by tapping a place in the timeline. EditoneApp provides two visual cursors. The first is the offset cursor, which is positioned by the user. The other is the playback cursor, which is green and indicates the current position of playback.

4.3.4 Tiled Clip Rendering

Clips need to be redrawn whenever the zoom level reaches a set threshold in order to maintain good resolution. In order to speed up drawing the visual representation of a clip’s waveform, I implemented tiled rendering. Each time a redraw is issued in a clip, the clip component divides the data to be rendered into sections of a desirable width. Each of these sections gets rendered in its own thread using the method GetBase64PngArrayAsync provided by EditoneLibRT.ClipVisualizer. As a precaution against clogging up the thread pool, if another redraw gets issued while previous tiles are still rendering, the previous rendering tasks get canceled.
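The tiling, the per-column sample ranges and the cancellation of stale redraws can be sketched as follows. This is a hypothetical, synchronous simplification: in Editone the tiles are rendered to PNG images by C# background threads, whereas the functions below only compute the minimum and maximum per pixel column, and a generation counter stands in for task cancellation. All names are invented for this example.

```typescript
// Hypothetical sketch; EditoneLibRT renders PNG tiles in C# background threads instead.
interface Tile { startColumn: number; columns: number; }

// Splits `totalColumns` pixel columns into tiles of at most `tileWidth` columns.
function splitIntoTiles(totalColumns: number, tileWidth: number): Tile[] {
  const tiles: Tile[] = [];
  for (let start = 0; start < totalColumns; start += tileWidth) {
    tiles.push({ startColumn: start, columns: Math.min(tileWidth, totalColumns - start) });
  }
  return tiles;
}

// For each pixel column of a tile, finds the minimum and maximum sample in its time span.
function renderTile(samples: Float32Array, tile: Tile, samplesPerColumn: number): [number, number][] {
  const ranges: [number, number][] = [];
  for (let c = 0; c < tile.columns; c++) {
    const from = (tile.startColumn + c) * samplesPerColumn;
    const to = Math.min(from + samplesPerColumn, samples.length);
    if (from >= to) {
      ranges.push([0, 0]); // column lies past the end of the clip
      continue;
    }
    let min = samples[from];
    let max = samples[from];
    for (let i = from + 1; i < to; i++) {
      min = Math.min(min, samples[i]);
      max = Math.max(max, samples[i]);
    }
    ranges.push([min, max]);
  }
  return ranges;
}

// Every redraw request invalidates all older ones, so stale tiles can be skipped.
class RedrawTracker {
  private generation = 0;
  requestRedraw(): number { return ++this.generation; }
  isCurrent(generation: number): boolean { return generation === this.generation; }
}

const demoTiles = splitIntoTiles(10, 4);
const demoRanges = renderTile(new Float32Array([0.5, -0.5, 0.25, 0.75]), { startColumn: 0, columns: 2 }, 2);

const tracker = new RedrawTracker();
const firstRedraw = tracker.requestRedraw();
const secondRedraw = tracker.requestRedraw(); // supersedes the first request
```

A worker would check isCurrent before starting each tile and drop the work when the check fails, which approximates the cancellation described above.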

4.3.5 Open and Save Project Dialogs

Editone relies on being able to access multiple files alongside the project file, which is possible only if the user picks a folder where the project should reside when they are prompted (see Section 6.4). The JavaScript control library for Windows Store applications (WinJS), as of version 3.0, the last version targeted at Windows 8.1 Store applications, doesn’t have a control for modal dialogs with arbitrary content. Therefore, I had to create a user interface which would allow the user to pick a folder and then pick a project file or create a new one.

I decided to design the dialogs as pages that replace the view of the project’s timeline. This makes sense especially in the case of the project load dialog, because when a user decides to load another project, they aren’t interested in seeing the timeline of the old one. In the case of the project save dialog, this approach also prevents the user from making any changes to the project while it’s being serialized.

Figure 4.2: Save dialog, illustrating the implemented solution

5 User Interface

Editone is designed to be used on a Windows tablet computer. I tried to focus on making its user interface touch friendly and ergonomic.

5.1 Adherence to Design Guidelines

When developing a Windows Store application for Windows 8.1, the developer is provided with a comprehensive guide on how to design such an application. This guide covers everything from the appropriate margins, responsivity and fonts to splash screen design and localization[2]. When designing Editone, I attempted to adhere to these design guidelines. In this section I describe which suggestions and principles I implemented and where my attempts failed.

5.1.1 Left Margins

The guidelines define recommended margins on the left side of the application. They are supposed to provide white space to emphasize the focus on the main content of the application. The minimum left margin specified in the guidelines is 20 pixels. Moreover, some applications bundled with the OS (e.g. the Weather application) use wider margins on the left side of the screen on larger viewports.

Figure 5.1: User interface, showing three clips in two stereo tracks


Figure 5.2: The rename track flyout

For Editone, I decided to follow the guidelines and to take inspiration from other applications in the Windows Store. The user interface always presents a minimum of 20 pixels of white space on the left side of the screen. When the viewport is large enough, the white space gets wider, so that the user can grip their device without obstructing any important content with their palm.

5.1.2 Flyouts

There are several places in Editone where flyouts are used: context menus, error messages and context dialogs. The guidelines mention several dos and don’ts, which I followed:

∙ Error flyouts feature no ’Close’ button or any other button, because that would imply that there’s an action the user can take. Instead, they rely on the user tapping outside of them to be dismissed.

∙ The flyouts are kept as small as possible, with no padding added to the contents.

∙ They are shown only in response to a user action, usually a tap on a button in the tool bar or the AppBar.


5.1.3 Panning

Due to a platform limitation (see 6.2), I had to write my own implementation of content panning in Editone. I tried my best to mimic the native scrolling behavior. Editone uses two-axis railed scrolling to allow the user to view their project.

5.2 Examples of Tablet-Oriented Approach

A tablet application has to be well optimized for touch input. A great app should look good on all possible screen sizes and orientations. In this section I describe what compromises I made and what optimizations I used to make Editone as ergonomic as possible.

5.2.1 Clip Drag and Drop

The most natural way of moving things around on a touchscreen is by dragging them from one place to another. Because a clip’s position is not defined by its X and Y coordinates on screen, but rather by its offset and the track it’s in, I could not use any pre-existing solution.

The operation is started by holding a clip for a while. Similar to holding a live tile on the OS’s Start screen, the clip pops up, informing the user that they can move it. A shadow of the clip follows the user’s finger, and every time the position changes, the target position is checked for possible collisions with other clips. If a collision is detected, either the position closest to the finger where no collision occurs is suggested instead, or the clip’s shadow turns red and the move operation fails if the clip is dropped.

While dragging, the user can hover their finger over any of the tracks visible in the viewport to move the clip to that track. If a clip is dropped into a track that does not support its type, the move fails.
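The collision check can be sketched as a search over clip edges. The following TypeScript fragment is only an illustration under stated assumptions: clips are modelled as half-open intervals on the timeline, the names Clip, collides and suggestDropOffset are hypothetical, and the maxSnap threshold that decides between suggesting a nearby position and rejecting the drop is an assumption, since the text does not say how Editone chooses between the two outcomes.

```typescript
// Hypothetical model: a clip occupies the half-open interval [offset, offset + length).
interface Clip {
  offset: number;
  length: number;
}

// True when a clip placed at `offset` would overlap any clip already in the track.
function collides(offset: number, length: number, others: Clip[]): boolean {
  return others.some((c) => offset < c.offset + c.length && c.offset < offset + length);
}

// Returns the offset at which the dragged clip may be dropped, or null when the
// drop should be rejected (the "red shadow" case). maxSnap is an assumed threshold.
function suggestDropOffset(
  desired: number,
  length: number,
  others: Clip[],
  maxSnap: number
): number | null {
  if (!collides(desired, length, others)) {
    return desired;
  }
  // The nearest free position always touches the edge of an existing clip.
  const candidates: number[] = [];
  for (const c of others) {
    candidates.push(c.offset + c.length); // directly after c
    candidates.push(c.offset - length); // directly before c
  }
  let best: number | null = null;
  for (const candidate of candidates) {
    if (candidate < 0 || collides(candidate, length, others)) {
      continue;
    }
    if (best === null || Math.abs(candidate - desired) < Math.abs(best - desired)) {
      best = candidate;
    }
  }
  return best !== null && Math.abs(best - desired) <= maxSnap ? best : null;
}
```

The same function would be called with the clip list of whichever track the finger currently hovers over; the track type check is a separate step not shown here.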

5.2.2 The Knob Control

The restricted area of a tablet’s screen calls for space-efficient controls wherever possible. One example of such a control is the Knob. Knobs are a replacement for the native sliders, which consume a lot of space in either the horizontal or the vertical direction. Sliders are great for use in a flyout, on a page with a lot of white space or in situations when there are just one or two sliders in the whole application, but in Editone’s timeline they would just waste valuable space.

When designing the Knobs, I had to make sure they could be turned easily even if they were placed on the very edge of the screen. I decided on a simple solution: the value of a knob is controlled by moving the finger along the vertical axis. The knob changes its value based on the change in the vertical coordinate divided by a constant. If a knob is placed near the top or bottom edge, the user can drag their finger from the knob to the center of the screen and adjust the knob’s value there.
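As a rough illustration of this rule, the TypeScript sketch below computes the knob value from the vertical distance travelled since the drag began. The class and member names are hypothetical, and two details are assumptions not stated in the text: dragging upwards increases the value, and the result is clamped to the knob’s range.

```typescript
// Hypothetical sketch of a vertically dragged knob.
class Knob {
  private dragStartY = 0;
  private dragStartValue = 0;

  constructor(
    public value: number,
    private readonly min: number,
    private readonly max: number,
    private readonly pixelsPerUnit: number // the constant the vertical change is divided by
  ) {}

  // Remember where the drag started; the finger may then leave the knob itself.
  beginDrag(y: number): void {
    this.dragStartY = y;
    this.dragStartValue = this.value;
  }

  // Screen Y grows downwards, so moving the finger up yields a positive change (assumed direction).
  drag(y: number): void {
    const change = (this.dragStartY - y) / this.pixelsPerUnit;
    this.value = Math.min(this.max, Math.max(this.min, this.dragStartValue + change));
  }
}
```

Because only the vertical distance from the starting point matters, the finger can travel anywhere on the screen once the drag has started, which is what makes knobs near the screen edge usable.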

5.2.3 The AppBar

A great user experience starts with good immersion, and a well-designed AppBar is supposed to enhance it. Its purpose is to hide non-essential commands and GUI, so that the user can concentrate on the application’s content[19]. Editone’s AppBar is almost invisible most of the time. The only thing hinting at its presence is a thick black line at the bottom of the screen. However insignificant it seems, it is essential. It contains important but infrequently used commands, like saving or loading a project, deleting a clip or adding, removing and renaming a track.

Without an AppBar, the user interface could become cluttered very fast, especially on smaller screen sizes. This would lead to more accidental ’mistaps’ – tapping mistakenly on undesired buttons. Using an AppBar keeps the application looking appealing and easy to use, and it saves a lot of area for the main content of the application.

5.2.4 Responsivity

Even though the user will typically want to have the widest possible view of the timeline, Editone does not rule out the possibility of using the app in portrait mode. If the app’s window gets even thinner, it adjusts to provide the best possible experience: its header splits from one row into two rows and the track headers shrink to free as much space for the timeline as possible.


Some non-essential elements of the user interface, such as track names or button labels, may be omitted in thinner viewports. This usually does not harm the overall user experience, and it saves as much area as possible for the most important element of the application. Despite all of these optimizations, it should be clear that it is still best to use the app in landscape mode and in the widest viewport possible.

6 Platform’s Limitations

In this chapter, I describe the limitations of the Windows Store Application platform and its tools that I ran into during the application’s development. Many of these could be solved by a workaround, but the possible workarounds might be inconvenient for the user or the developer, or not as efficient as the intended original solution.

6.1 Debugging C# and JavaScript Simultaneously

A big obstacle in the development of Editone was the fact that a single instance of Visual Studio 2013 can’t debug JavaScript and C# code in Windows Store apps at the same time [20]. For me, this meant that to debug an error, I often had to restart the application a few times to switch the programming language I was debugging, just to get to the root cause of the bug. There is a way to debug both scripts and managed code at the same time. It involves launching another instance of Visual Studio and attaching it to the application already running from the first Visual Studio instance[21]. However, this is tedious, because the second debugger has to be reattached each time the application is run. Another inconvenience is that to debug both kinds of code, the developer has to switch between two windows, which can break their train of thought.

6.2 Custom Zooming and Native Panning

Content panning and zooming are among the common user interactions that are hardware accelerated in Windows Store applications[22]. If a developer wishes to handle panning and zooming gestures differently, they can do so using the CSS property touch-action set to none, the MSGesture object and the MSGesture* events. Setting the touch-action property disables the default panning and zooming and enables the events, making it possible to provide custom panning and zooming behavior. This, however, does not translate well into a situation where a developer wants to have smooth, hardware accelerated panning, but custom zooming behavior[23]. In Editone’s case, the victim of this is the timeline. Unable to find any solution that would allow me to have native content panning with custom zooming, or any existing solution for emulating the native panning behavior using the MSGesture* events, I had to resort to writing my own implementation.

I wrote code for custom handling of panning and zooming, which I plan to release as a stand-alone library. Its aim is to provide an experience that is as native and as smooth as possible. The library consumes MSGesture* events and fires its own events instead. Apart from gestures for panning and zooming, it supports dragging and dropping held elements. An indispensable part of the library is the custom scroll component, which adds a scrollable viewport with scroll bars. It supports scrolling using touch gestures, the mouse wheel and the scroll bars. It improves on the native experience in that it allows scrolling in the alternate direction on single-axis mouse wheels when Alt is held on the keyboard.
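To make the idea more concrete, the following TypeScript sketch shows how gesture and wheel input might be translated into a new viewport state. It is not the library’s code. The GestureChange fields mirror the translationX, translationY and scale properties of MSGestureEvent; everything else (the Viewport shape, the function names and the zoom limits) is a hypothetical simplification, and zooming around the pinch point is deliberately left out.

```typescript
// Hypothetical viewport state of the timeline.
interface Viewport {
  scrollX: number;
  scrollY: number;
  zoom: number;
}

// Subset of the values carried by an MSGestureChange event.
interface GestureChange {
  translationX: number; // pan distance in pixels since the last event
  translationY: number;
  scale: number; // pinch factor since the last event
}

// Assumed zoom limits, chosen only for the example.
const MIN_ZOOM = 0.1;
const MAX_ZOOM = 10;

// Panning moves the content with the finger, so the scroll offset moves the opposite way.
function applyGesture(view: Viewport, change: GestureChange): Viewport {
  return {
    scrollX: Math.max(0, view.scrollX - change.translationX),
    scrollY: Math.max(0, view.scrollY - change.translationY),
    zoom: Math.min(MAX_ZOOM, Math.max(MIN_ZOOM, view.zoom * change.scale)),
  };
}

// A single-axis wheel scrolls vertically, or horizontally while Alt is held.
function applyWheel(view: Viewport, delta: number, altKey: boolean): Viewport {
  return altKey
    ? { ...view, scrollX: Math.max(0, view.scrollX + delta) }
    : { ...view, scrollY: Math.max(0, view.scrollY + delta) };
}
```

Keeping these functions free of DOM access means the event handlers only have to read values from the event and store the returned state.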

6.3 Plug-ins

Windows Store applications don’t support dynamically loaded binary or managed components[24]. The application can only provide the functionality it was deployed with. This means that I, unfortunately, could not implement any mechanism for dynamically loading filter plug-ins.

There is a workaround: while the application cannot run any external C++ or C# code, it is possible to run arbitrary JavaScript code at runtime. An online store of audio filters written in JavaScript could be created and used to apply generic transformations. The Chakra JavaScript engine used in Windows 10’s Universal Windows Platform supports asm.js[25], a performant subset of JavaScript, which would make it possible to process Editone’s audio in JavaScript in real time. I will explore this further in the future.
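One way such a JavaScript plug-in mechanism could look is sketched below in TypeScript. This is purely an illustration of the workaround, not something Editone implements: the plug-in contract (a function body that receives samples and sampleRate and returns the processed samples), the AudioFilter type and the loadFilter function are all hypothetical.

```typescript
// Hypothetical plug-in contract: the filter source is the body of a function that
// receives `samples` (one channel of audio) and `sampleRate`, and returns the result.
type AudioFilter = (samples: Float32Array, sampleRate: number) => Float32Array;

// Compile filter source code obtained at runtime into a callable, validated filter.
function loadFilter(source: string): AudioFilter {
  const body = new Function("samples", "sampleRate", source);
  return (samples, sampleRate) => {
    const result: unknown = body(samples, sampleRate);
    // Reject filters that break the contract instead of passing bad data to the mixer.
    if (!(result instanceof Float32Array) || result.length !== samples.length) {
      throw new Error("Filter must return a Float32Array of the same length");
    }
    return result;
  };
}
```

A filter that halves the volume could then be loaded with loadFilter("return samples.map(function (s) { return s * 0.5; });"). Validating the result keeps a misbehaving plug-in from corrupting a clip, although a real store would also need to decide how much it trusts third-party code.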

6.4 Folder Access

As far as arbitrary files and folders are concerned, the Windows Runtime API only grants the application access to those which the user has selected through the FileSavePicker, FileOpenPicker and FolderPicker controls[26]. While I have no strong objections to this model, and I completely understand its security reasoning, I ran into some inconveniences when designing the procedures for saving and loading the project files.

Editone saves its projects as a set of files: one main project file in an XML format and a list of binary files containing raw clip data. This poses a problem, because to save multiple files, the application first has to ask the user for permission to access a folder. I had to create a dialog page for saving and opening projects. This dialog consists of a folder picker and a prompt with a list of project files in the folder.

What proved to be confusing to the users was the Windows Store platform’s folder picker. It looks exactly like a file picker, so users are easily confused about how to select a folder and try to select files instead. Nevertheless, my solution is the least cumbersome way of letting the application create or read multiple files in a folder.

6.5 Discussion

While everything mentioned in this chapter hindered the progress of Editone’s development, with the exception of the originally intended plug-in capabilities, I eventually managed to overcome these hurdles and came up with solutions for the problems I encountered. I plan to release some parts of my solution as stand-alone libraries for use by third parties.

7 Conclusion

In this thesis I described Editone, a sound editing application I implemented for tablet computers running Windows 8.1. I created the application as a solution consisting of three layers:

1. EditoneLib - the core model of the application, is a platform-independent library that organizes clips into a hierarchy. It is written with extensibility and versatility in mind. It provides serialization features and a software audio mixer, as well as the three most commonly used audio filters.

2. EditoneLibRT - the intermediate layer, provides the functionality of EditoneLib to the UI layer. It supplies EditoneLib with the platform-specific components it needs, facilitating support for serialization into files and sound input and output.

3. EditoneApp - the front-end layer, is used to present the underlying concepts to the user. It lets the user manipulate data the lower layers provide in an intuitive way.

A lot of effort went into developing Editone’s audio mixer, the core library’s design, serialization, maintaining data bindings across all the layers, meeting performance requirements and making the user interface work as intended. During the time I spent implementing Editone, I had to solve several problems and overcome the obstacles I outlined in the previous chapter, among others. Some parts of Editone solve problems that other developers were facing but could not find a solution for. I believe that the work I did for my thesis will have a bigger impact than the thesis itself.

Three years after the introduction of the Windows Store platform, several audio editing applications have been released. While Editone offers more features than most of its competition[27][28], its capabilities have been matched by two applications: Recording Studio[29] and Sound Editor R2[30].

7.1 Monetization Potential

As I mentioned in the introduction (1), I have been very eager and passionate about creating Editone ever since Windows Store apps were introduced. I believed that I had found a hole in the market, which, however small, could be capitalized on. I’m convinced that Editone has a big potential as far as monetization is concerned.

The simplest way of monetizing the application would be to publish it to the Store as a paid app. It’s the most straightforward strategy available, but I doubt it’s the best one. Another possible strategy would be to offer the application for free. The users could use the basic functionality and some built-in filters with no limitations. The crux of this strategy lies in plug-ins and the ever more popular in-app purchases. As mentioned in 6.3, with the advent of Chakra and asm.js in Windows 10’s UWP, plug-ins could be written by third parties in JavaScript. This opens the opportunity to introduce an in-app store to which users could submit their filters and sell them to other users.

The possibilities I outlined here surely deserve much more thought before they can become real strategies. But I envision that Editone could have a profitable future. I look forward to competing with similar applications on the Windows Store platform.

7.2 Final Words

The goal of my thesis was to create a linear multi-track audio editor. The editor runs on Windows 8.1, with the possibility of being run on a tablet computer. The application enables the user to apply effects to audio clips, split them or move them about on the timeline. The result of my implementation is an audio editor for a broad audience. It fulfills all requirements mentioned in the thesis’ description. I expanded on the original description of my thesis’ topic in many ways that make Editone even more useful and friendly to the user. Moreover, thanks to its multi-track model and software mixer, it outcompetes other applications available on the Windows Store platform. I submitted the entirety of my work in an online repository[31] under the MIT license[32].


While the application was being developed, Microsoft released another major version of Windows. As a future goal, Editone would greatly benefit from being ported to Windows 10 and from adjusting its user interface to better fit the style of Universal Windows Platform applications[33]. Editone is still fairly inadequate when it comes to the functionality it offers. Only a basic set of tools for sound editing is supported. Therefore, it would be necessary to expand on the filters and tools it provides for it to truly become competitive with applications available on other platforms.

Bibliography

[1] – výsledky hledání. url: https://www.microsoft.com/cs-cz/store/search/apps?q=audio%20edit (visited on 14/12/2015).
[2] Windows 8 Design and coding guidelines. url: http://go.microsoft.com/fwlink/p/?linkid=258743 (visited on 20/12/2015).
[3] R. Derry. PC Audio Editing: Broadcast, Desktop, and CD Audio Production. Focal Press. Focal, 2003. isbn: 978-0-24-051697-4.
[4] Top 5 Best Audio Editing Apps For Android Devices | Android Fan Club. url: http://www.androidfanclub.net/2015/04/5-best-audio-editing-apps-for-android-devices.html (visited on 14/12/2015).
[5] 10 iPhone Apps for Recording & Editing Audio | Blog | Kenney Myers. url: http://www.kenneymyers.com/blog/10-iphone-apps-for-recording-editing-audio/ (visited on 14/12/2015).
[6] Microsoft. Welcome to Visual Studio 2013. url: https://msdn.microsoft.com/en-us/library/dd831853(v=vs.120).aspx (visited on 10/14/2015).
[7] NAudio - Home. url: https://naudio.codeplex.com/ (visited on 10/05/2014).
[8] Kraig Brockschmidt. My Take on HTML/JS vs. C#/XAML vs. C++/DirectX (choosing a language for a Windows Store app). Jan. 17, 2013. url: http://www.kraigbrockschmidt.com/2013/01/17/html-javascript-xaml-directx-language-windows-store-app/.
[9] Richard Jones. “ECOOP 2014 – Object-Oriented Programming: 28th European Conference, Uppsala, Sweden, July 28–August 1, 2014, Proceedings”. In: Springer Berlin Heidelberg, 2014. Chap. Understanding TypeScript. isbn: 978-3-66-244202-9.
[10] Why AMD? url: http://requirejs.org/docs/whyamd.html (visited on 14/12/2015).
[11] Knockout : Home. url: http://knockoutjs.com/index.html (visited on 14/12/2015).
[12] Roey Izhaki. “Mixing Audio Concepts, Practices and Tools”. In: Taylor & Francis Ltd, 2007. Chap. Software mixers: The internal architecture. isbn: 978-0-24-052068-1.


[13] Kraig Brockschmidt. “Programming Windows Store Apps with HTML, CSS, and JavaScript”. In: Microsoft Press, 2014. Chap. 18. isbn: 978-0-73-569570-2.
[14] GUID structure (Windows). url: https://msdn.microsoft.com/en-us/library/aa373931(VS.85).aspx (visited on 19/11/2015).
[15] Playing in-memory audio streams on Windows 8. url: http://blogorama.nerdworks.in/playinginmemoryaudiostreamsonw/#readingmusicmetadatausingthemicrosoftmediafoundation (visited on 19/11/2015).
[16] Supported Media Formats in Media Foundation (Windows). url: https://msdn.microsoft.com/cs-cz/library/windows/desktop/dd757927(v=vs.85).aspx (visited on 01/01/2016).
[17] Introduction. url: http://singlepageappbook.com/index.html (visited on 21/12/2015).
[18] Getting started | Less.js. url: http://lesscss.org/ (visited on 30/12/2015).
[19] Embracing UI on demand with the app bar - Windows 8 app developer blog - Site Home - MSDN Blogs. url: http://blogs.msdn.com/b/windowsappdev/archive/2012/09/06/embracing-ui-on-demand-with-the-app-bar.aspx (visited on 01/12/2015).
[20] Kraig Brockschmidt. “Programming Windows Store Apps with HTML, CSS, and JavaScript”. In: Microsoft Press, 2014. Chap. 16. isbn: 978-0-73-569570-2.
[21] Simultaneously debugging script and managed/native code in Windows 8 | kraig brockschmidt. url: http://www.kraigbrockschmidt.com/2012/11/27/simultaneous-debugging-script-managed-native/ (visited on 16/12/2015).
[22] Make your site touch-ready (). url: https://msdn.microsoft.com/en-us/library/jj583807(v=vs.85).aspx (visited on 16/12/2015).
[23] Metro style apps with HTML5/JavaScript - MSPointer (MSGesture) in scrollable elements. url: http://answers.flyppdevportal.com/categories/metro/html5jscript.aspx?ID=07b11063-0e9a-402c-82e7-e0bed787017b (visited on 15/09/2015).
[24] Plugins? url: https://social.msdn.microsoft.com/Forums/en-US/61444580-8120-44cf-a019-d6d64801f037/plugins?forum=winappswithcsharp (visited on 16/12/2015).


[25] Supercharging JavaScript performance with asm.js | Microsoft Edge Dev Blog. url: https://blogs.windows.com/msedgedev/2015/11/10/supercharging-javascript-performance-with-asm-js/ (visited on 16/12/2015).
[26] Jeffrey Richter and Maarten van de Bospoort. “Windows Runtime via C#”. In: Pearson Education, 2013. Chap. Accessing user files via explicit user consent. isbn: 978-0-73-567923-8.
[27] Lexis Audio Editor – Windows Apps on Microsoft Store. url: https://www.microsoft.com/en-us/store/apps/lexis-audio-editor/9wzdncrdsx0c (visited on 04/01/2016).
[28] Sound Editor – Windows Apps on Microsoft Store. url: https://www.microsoft.com/en-us/store/apps/sound-editor/9wzdncrfhmws (visited on 04/01/2016).
[29] Recording Studio – Windows Apps on Microsoft Store. url: https://www.microsoft.com/en-us/store/apps/recording-studio/9wzdncrfhv4d (visited on 04/01/2016).
[30] Sound Editor R2 – Windows Apps on Microsoft Store. url: https://www.microsoft.com/en-us/store/apps/sound-editor-r2/9nblgggz5z4d (visited on 04/01/2016).
[31] Editone - Home. url: https://editone.codeplex.com/ (visited on 26/12/2015).
[32] Editone - License. url: https://editone.codeplex.com/license (visited on 29/12/2015).
[33] Guide to Universal Windows Platform (UWP) apps - Windows app development. url: https://msdn.microsoft.com/en-us/library/windows/apps/dn894631.aspx#user_experience (visited on 26/12/2015).

A Appendix: Archive Contents

The archive attached to this thesis contains the following files and directories:

∙ Source – directory containing the entirety of my application’s source code. Also available in Editone’s Codeplex repository [31].

∙ ABSTRACT – file containing the abstract written in Czech and in English

∙ BUILDING – guide on how to build and run Editone

∙ KNOWN-BUGS – list of bugs present in the archived version

∙ LICENSE – copy of the MIT license, under which Editone is released

∙ User Manual.pdf – user manual for performing basic tasks in Editone.
