Master’s Thesis

Optimizing team collaboration web applications through local storage and synchronization

Dennis Andersen & Andreas Nilsson
Department of Computer Science
Faculty of Engineering LTH
Lund University, 2013

ISSN 1650-2884 LU-CS-EX: 2013-34

Optimizing team collaboration web applications through local storage and synchronization

Dennis Andersen Andreas Nilsson [email protected] [email protected]

September 9, 2013

Master's thesis work carried out at RefinedWiki AB.

Supervisor: Björn Johnsson, [email protected]
Examiner: Per Andersson, [email protected]

Abstract

Team collaboration software is a great way for people to share their work. But often the need arises for accessing the work while on the go using limited network connections. The ongoing development in the area of HTML5 and web applications has opened up many new possibilities, especially for storing data locally. In this thesis we extensively test and compare these new solutions to find the best alternative. Along with testing and comparing techniques for replication and synchronization we hope to combine the two, enhancing the user experience by optimizing the team collaboration software.

Keywords: Database, replication, HTML5, local storage, optimization

Acknowledgements

We would like to thank the people over at RefinedWiki for providing us with a great working environment, along with feedback and support. We would also like to thank our supervisor Björn Johnsson for proofreading this report and giving us feedback.

Contents

1 Introduction
  1.1 Goals
  1.2 Outline
  1.3 Division of labor

2 Background
  2.1 Team collaboration
    2.1.1 Web clients
    2.1.2 Atlassian Confluence
  2.2 HTML5 overview
    2.2.1 The HTML5 specification
    2.2.2 HTML5 performance features
  2.3 Client-side storage
    2.3.1 Before HTML5
    2.3.2 HTML5 Storage solutions
    2.3.3 Web Storage
    2.3.4 Web SQL Database
    2.3.5 Indexed Database API
    2.3.6 FileSystem API
  2.4 Database replication
    2.4.1 Database Replication Terms
    2.4.2 Master-Slave and Group
    2.4.3 Lazy and Eager replication
    2.4.4 Partial and Full replication
    2.4.5 Conflicts
    2.4.6 Combining characteristics
    2.4.7 Two-tier replication
    2.4.8 Transaction-Level Result-Set Propagation
    2.4.9 Three-tier replication

3 Defining the model
  3.1 Client-side storage
    3.1.1 Our requirements
    3.1.2 Method
    3.1.3 Implementation and results
  3.2 Database Replication
    3.2.1 Our requirements
    3.2.2 Method
    3.2.3 Results

4 Implementation
  4.1 Overview of functionality
  4.2 Confluence Architecture
  4.3 Implementation Design
    4.3.1 Server
    4.3.2 REST-API
    4.3.3 Client

5 Result
  5.1 Use cases
    5.1.1 Use case 1
    5.1.2 Use case 2
    5.1.3 Use case 3
    5.1.4 Use case 4
  5.2 Result of use cases

6 Conclusion & Future work
  6.1 Future Work

Appendix A Test resources
  A.1 Reading values of increasing size
  A.2 Writing values with increasing size
  A.3 Fetching entries from storage
  A.4 Loading a page with storage of increasing size

Chapter 1 Introduction

Team collaboration software is an important tool for companies and organizations to manage large projects. Market reports show that the team collaboration and web conference software market is growing each year [1]. Often these types of software implement a web interface which allows project members to collaborate from anywhere in the world and at any time through their computers or mobile phones [2, 5]. However, since the software requires an internet connection to work, there is an immediate loss in performance due to response times and the fact that all the relevant data is stored on a remote server. With the new HTML5 standard being developed along with new Application Programming Interface specifications for web applications, there exist new ways to store persistent data inside the browser.

The use of team collaboration software in large companies can be essential, something that is verified by the many thousands of companies using solutions such as Atlassian Confluence [4]. The problem with implementing a local storage solution in team collaboration software, as opposed to other software types, is that the data is shared among many users, meaning the data can quickly become outdated. Compare this to software that deals with microblogging, bulletin boards or mail clients, where each user is responsible for their own data; there it is far easier to synchronize, edit locally and upload changes, since the changes mostly affect the user making them. In a team collaboration software environment, the data you are editing might be simultaneously edited by several other users.

1.1 Goals

The purpose of this thesis is to combine state-of-the-art browser technologies for storing data locally with an efficient database replication algorithm, modified for our needs. In doing so we hope to construct a powerful web application capable of efficiently using a web-based team collaboration system with a minimal load on the server. Our goals are to achieve the

following:

• Response times should be reduced to half of their original values.

• To handle several users using and editing the same content by investigating and implementing the handling of conflicts.

• Users should be able to synchronize and store large amounts of data (at least 200 entries) from the software.

To achieve this, extensive testing should be conducted in order to derive the best solution. When evaluating the client-side storage the following aspects are of interest:

Performance: The overall performance should be good. Reading, writing and searching the local data should be done efficiently.

Size capability: Supporting the storage of large amounts of data is important, as a large team collaboration instance can contain thousands of entries.

Browser support: The solution should be able to run on as many browsers and versions as possible in order to cover as many users as possible.

In regards to synchronizing and maintaining the database the following aspects are of importance:

Partial database: The solution should be able to synchronize specified parts of the database as opposed to the entire database.

Conflicts: When synchronizing the solution should be able to detect and handle conflicts.

Synchronization: Since performance is important, the number of synchronization occasions should be as few as possible while still meeting the two criteria above.

Extra data: The additional amount of data needed locally in order to maintain the database and its synchronization should be kept as low as possible, prioritizing the collaboration data.

1.2 Outline

The thesis is structured according to the following outline:

Chapter 2 Serves as an introduction to the relevant theory and technology explored and tested. The chapter is meant to offer the necessary information needed to understand the problems and solutions presented in this thesis.

Chapter 3 This chapter studies the relevant solutions in greater detail by testing them according to our requirements and specifications. They are measured against each other and compared. This is done in order to derive the best route to take when implementing a prototype solution.


Chapter 4 Presents the implementation and structure of the prototype that was implemented. An overview of the functionality and design is presented and the different components are described.

Chapter 5 In this section several use cases are presented that are used to test the implemented prototype. The results are presented and compared to the original instance.

Chapter 6 Here the results previously presented are evaluated. The implementation is analyzed according to the goals set up and how well it performed. Furthermore the implementation is evaluated as to what flaws or deficiencies are still present and how it could be improved upon in the future.

1.3 Division of labor

Both authors have worked alongside each other at RefinedWiki and as such both have been involved in all aspects of the thesis. Coding sessions have at times been done together, in addition to meetings and discussions. As both authors have specialized in the same areas, the division of labor was done arbitrarily as follows:

• Dennis Andersen

– Client-side storage comparison & test
– Client implementation

• Andreas Nilsson

– Database replication & synchronization
– Server implementation

• Collaboration

– REST-API
– Use cases & results
– Report


Chapter 2 Background

In order to further understand the need for and potential of an optimized team collaboration web application, this section offers background information on all relevant components needed to achieve the goals specified.

2.1 Team collaboration

Team collaboration software is often used in large projects where a lot of people are involved in the development. These types of software can be described as platforms where project members can share different types of resources with each other and communicate with a lot of colleagues at the same time. The resource types that are shared in these types of software vary and often depend on the system architecture, but some common types include pages [2, 5], blogs [2] and documents [3]. These resources have different properties that define them. In addition to a type, there is usually an owner or creator that is the user who is responsible for a resource. After being created the resource may have different permissions attached to it that allow it to be viewable or modifiable by different users.

A problem that might occur when sharing a lot of resources between many users is that two or more users might work with the same resource at the same time. This can result in a conflict that can be handled in different ways. In the case of TWiki for example, the software tries to merge the content automatically and if that fails the user must resolve the conflict manually [5]. Sharepoint, on the other hand, locks out other users from editing the same content while another user is editing it, thus avoiding conflicts [6].

2.1.1 Web clients

It is not uncommon for these types of software to implement their client application as a web client that runs in a browser. A problem with this approach is that it is often dependent

on running against a server. Viewing, editing and sharing resources must all go through a server. Communicating with a server is often unavoidable, since sharing over the internet requires a connection. However, when reading and editing data it might not always be necessary to communicate with a server. Microsoft Sharepoint for example has implemented a desktop version [7] that allows for offline use. In this project we will analyze whether similar desktop-type features could be implemented in a web client for the purpose of optimizing it.

2.1.2 Atlassian Confluence

In this project we will focus our implementation and run our tests on the specific software Confluence by Atlassian [2]. Confluence is an Enterprise Wiki developed in Java. By running the Confluence software on a server, users can access the server through a web interface to collaborate on projects. It deals with multiple different resources such as pages, blog posts, attachments (files) and comments. Everything in a Confluence instance is organized using different spaces that collect different resources. As such, every resource in Confluence is tied to a space for organizing purposes.

Pages

In this project we will mainly work with the resource type called pages. A page is where information is stored and it has several properties that define it. The most important ones that will be used are defined in table 2.1.

Table 2.1: Confluence page properties

Property      Description
id            A unique identifier for the page.
title         A title for the page.
content       This property contains the XHTML content of the page.
position      The position of the page in the page tree.
parentId      The parent page id.
spaceKey      The identification key of the space it belongs to.
creatorDate   The date it was created.
isRoot        Tells us if the page is at the top of the tree.
creator       The user that created the page.

The id property is a globally unique integer that identifies the page. The content and title properties contain the main information of a page that is rendered in the web client. These are the two properties most often updated by users when they are editing pages. See figure 2.1 for a screenshot of a page in Confluence. The parentId property is used together with the position and isRoot properties to determine the page's position in the current space's page tree. A page tree contains all the pages that belong to a space, ordered as a tree. The top level pages have no parent, but otherwise each page has a parent

page that is set in the parentId property, and the page's position among all of the parent's children is found in the position property.
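As a concrete illustration, building a page tree from a flat list of pages can be sketched in a few lines of JavaScript. This is our own sketch, not Atlassian code; the property names follow table 2.1, while the children array and the buildPageTree helper are assumptions of the sketch:

```javascript
// Build a nested page tree from a flat list of Confluence-style pages.
// Illustration only: id, parentId, position and isRoot follow table 2.1.
function buildPageTree(pages) {
  const byId = new Map();
  for (const page of pages) {
    byId.set(page.id, { ...page, children: [] });
  }
  const roots = [];
  for (const node of byId.values()) {
    if (node.isRoot) {
      roots.push(node);
    } else {
      // attach the page under the parent named by parentId
      byId.get(node.parentId).children.push(node);
    }
  }
  // order siblings by their position property
  const byPosition = (a, b) => a.position - b.position;
  for (const node of byId.values()) node.children.sort(byPosition);
  roots.sort(byPosition);
  return roots;
}

const tree = buildPageTree([
  { id: 1, title: 'Home',  isRoot: true,  position: 0 },
  { id: 2, title: 'Docs',  isRoot: false, parentId: 1, position: 1 },
  { id: 3, title: 'About', isRoot: false, parentId: 1, position: 0 },
]);
console.log(tree[0].children.map(p => p.title)); // About before Docs
```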

Figure 2.1: A screenshot of a rendered page in Confluence. To the left there is a page tree.

Atlassian SDK

Atlassian has released its own Software Development Kit called the Atlassian SDK [8] to help third party developers develop plugins for Confluence. With the help of the SDK, third party applications can use tools developed for Confluence that deal with resources, for example pages or spaces. In this project we have also used the SDK when implementing our prototype.

2.2 HTML5 overview

The web is constantly evolving and web applications have become more and more prominent as tools allowing users to work directly on the web. The modern web features a full range of web applications that could serve as replacements for desktop applications, including team collaboration software. This development has led to the need for a new, updated web specification.


2.2.1 The HTML5 specification

To further enable the development of web applications and complex web solutions, a small group of people started the Web Hypertext Application Technology Working Group (WHATWG) in 2004 [12]. In order to enable new applications and address the current HTML specification's shortcomings the group created the HTML5 specification and started working on new features focusing on web applications [9, pp.1]. They were joined by the World Wide Web Consortium (W3C) in 2006 and the two organizations worked on the specification, resulting in the first draft being published in 2008. Since the specification was focusing heavily on web application features from the start, HTML5 solved a lot of practical problems, leading to several early implementations from the browser vendors even though the specification wasn't finalized. It was only in December 2012 that the specification reached the form of a W3C Candidate Recommendation [13]. To quote the W3C Technical Report Development Process [14], A Candidate Recommendation is a document that W3C believes has been widely reviewed and satisfies the Working Group's technical requirements. W3C publishes a Candidate Recommendation to gather implementation experience. W3C specifies four maturity levels for their work in progress: Working Draft, Candidate Recommendation, Proposed Recommendation and W3C Recommendation. This means that the HTML5 specification is currently in its second stage of becoming a finalized specification. Having a constantly changing early draft of the specification to implement towards has led to several different implementations and new suggested specifications from the browser vendors, meaning no single solution has become the norm or standard. As we will see in section 2.3, this is also the case regarding HTML5 storage solutions.

2.2.2 HTML5 performance features

It's clear that the new HTML5 specification has had performance as a priority. The specification introduces several new and improved technologies that increase the performance of browsing the web and using web applications. Some of the general methods used to improve performance, according to the project site HTML5 Rocks [15], have been to:

• Store locally - Data stored locally is quicker to access than accessing data from the server.

• Minimize connections - Connections require computation time for setup and teardown procedures.

• Process in the background - If a process or calculation doesn't result in a visual change to the user's action it can be calculated in the background.

• Decrease bandwidth - Less bandwidth means a faster response.

One of the most important areas where HTML5 gains a lot of performance is through the local storage technologies described in detail in section 2.3. Other technologies include Web Workers, Web Sockets and link prefetching.


2.3 Client-side storage

Client-side storage or simply local storage, as the name implies, describes the technology used for saving data locally as opposed to saving it on the server side. Normally there are two ways of saving this data locally: store it in memory or save it to disk. The latter allows for persistent data storage spanning several sessions. The reason behind using client-side storage, as pointed out by Mahemoff [19], is twofold:

1. The information and thus the web application will be available offline.

2. The performance of the web application is increased.

This makes client-side storage particularly interesting as a tool for optimizing web applications, making it a key feature in this project.

In the context of HTML5 and this project, client-side storage is the general term used for several different but related storage specifications. These specifications were developed either in conjunction with the HTML5 specification or at a later stage as closely related technologies. There are four major specifications regarding client-side storage: Web Storage, Web SQL Database, Indexed Database and the FileSystem API. In section 2.3.2 we take a closer look at the similarities and differences of these storage solutions, and in sections 2.3.3 through 2.3.6 we describe each of them in detail. The need for persistent data storage has of course existed longer than HTML5 and because of this, several earlier storage solutions exist. In the next section, section 2.3.1, we describe how some of these work, why they are insufficient for modern web applications, and what we need in this project.

2.3.1 Before HTML5

Cookies

Since early on, cookies have been the main solution for storing small chunks of data on the client side. A cookie is a small file saved on the client's computer and later attached to each HTTP request [10, pp.49]. Even though there is still widespread use of cookies, they are not without limitations. A cookie is limited to storing at most 4 KB of data. As stated, the cookie is sent with every HTTP request, adding to the data being sent and consuming bandwidth. Furthermore this means that the cookie and its content are visible if the connection isn't encrypted, posing a potential security risk.

Google Gears

In 2007, Google launched the Google Gears project [17]. Gears was an open source browser plugin aimed at enabling more powerful web applications. The plugin enabled the browser to work offline and store more data locally [10, pp.2]. The local storage was handled with an API to an embedded SQL database based on SQLite [18]. By 2010, however, Google had shifted its efforts towards the new HTML5 standard and Google Gears was discontinued [16].


2.3.2 HTML5 Storage solutions

As we previously stated, HTML5 set out to fill the need for additional features regarding, among other things, client-side storage. The main specifications (Web Storage, Web SQL Database, Indexed Database and the FileSystem API) fall into three different categories: Web/Browser Storage, Database Storage and File Storage. Although they are all almost exclusively available offline thanks to being stored locally, there is another term, Offline Storage, which is often used when describing a different API, namely Offline Web Applications. The Offline Web Application API is part of the HTML5 specification [29]. This API uses a cache manifest specifying resources for the browser to save locally and thus make them available offline. The browser will automatically keep them updated when browsing online and read them locally when offline. Since the API focuses on saving HTML resources for offline use and not on saving or handling data locally, we will not consider it as a solution for client-side storage.

The HTML5 storage APIs all use the local storage of the client in their own distinct way in order to achieve persistent data storage. In this section we will describe each of these technologies more closely in order to determine their strengths and weaknesses and how they differ from each other. Although different, the APIs share some common features and similarities that we'll mention before describing what separates them. One obvious feature they all share is the ability to store and retrieve information on and from the client. Along with this feature there are some additional similarities [19], described below.

Same origin policy

All four APIs work in a sandboxed fashion, practicing the same origin policy [20]. The policy is arguably, as stated in Google's Browser Security Handbook [21, part 2], the most important security concept within modern browsers. The principle is to allow communication between several pages on the same domain while restricting the sharing of resources between domains. The storage APIs are tied to a single origin, meaning a unique combination of domain, protocol and port. This means that if http://www.example.com saves some data, the same data can be read from a different page on the same domain, e.g. http://www.example.com/page2.html. The very same data will not, however, be available from http://abc.example.com.
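The origin rule can be made concrete with a short sketch. This is our own illustration, not a browser API: sameOrigin is a hypothetical helper that compares the protocol, host and port components of two URLs, exactly the triple that defines an origin:

```javascript
// Two URLs share an origin iff protocol, host and port all match.
// Minimal sketch using the WHATWG URL parser (built into Node.js).
function sameOrigin(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  return ua.protocol === ub.protocol &&
         ua.hostname === ub.hostname &&
         ua.port === ub.port;
}

console.log(sameOrigin('http://www.example.com',
                       'http://www.example.com/page2.html')); // true
console.log(sameOrigin('http://www.example.com',
                       'http://abc.example.com'));            // false
console.log(sameOrigin('http://www.example.com',
                       'https://www.example.com'));           // false
```

The same check could also be written as a single comparison of the URL objects' origin properties; the component-by-component form is used here to make the definition of an origin explicit.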

Quotas

If every storage solution from every domain was allowed unlimited storage, the memory or disk space available on the client would quickly run out. Therefore, each of the specifications enforces a quota limit. This limit varies depending on the browser implementation and the API, but typically the browser will either prompt the user for more space when the limit has been reached or implement a limit that cannot be exceeded or changed. Since the solutions are bound by the same origin policy, the expected behavior of the browsers is to enforce a single limit for the storage of every origin. In the case where the API has a fixed limit, this is even the recommended behavior according to the specifications [22,


Ch.5], [23, Ch.6], stating "A mostly arbitrary limit of five megabytes per origin is recommended. Implementation feedback is welcome and will be used to update this suggestion in the future." Combining the rules of the same origin policy with the quota limitation, each subdomain (i.e. each origin) gets its own limit, resulting in a much larger effective limit for the top domain. The browsers implement the quota limitations in this manner, even though the specifications specifically state that storage distributed over several subdomains should be guarded against. To quote from [22, Ch.5], [23, Ch.6], User agents should guard against sites storing data under the origins [of] other affiliated sites, e.g. storing up to the limit in a1.example.com, a2.example.com, a3.example.com, etc, circumventing the main example.com storage limit.

Transactions

The two database storage APIs, Web SQL Database and Indexed Database, share the feature of using transactions. In short, transactions are used for two things:

1. Encapsulate database queries and work to allow failure recovery if there was a problem along the way.

2. Provide isolation of instances accessing the database concurrently, allowing queries to be executed in a concurrent manner.

Transactions will be described in more detail in section 2.4.1.

Synchronous and Asynchronous modes

All storage solutions support both synchronous and asynchronous modes with the exception of Web Storage, which only supports synchronous operations. Synchronous mode is sufficient for small operations that only take a few milliseconds, and it is a much simpler programming model. On the downside, synchronous mode is blocking, meaning that the operations are executed in succession, where the next operation is started after the previous one finished. In practice this means that if a large operation is executed, everything else (e.g. the user interface, rendering and user input) is blocked until that operation is completed. In section 2.2.2 we mentioned an HTML5-based solution to this called Web Workers. They allow for running synchronous operations in a separate thread, making the main thread available to continue. The other storage solutions are asynchronous by default and the only way to circumvent this (i.e. to support synchronous mode) is by running synchronous operations in a Web Worker, which in a sense makes the operation asynchronous. Contrary to synchronous mode, asynchronous mode will execute the next line of code directly following the start of a storage operation. When the operation has completed, a callback function is called where the result is handled. This programming model might not be as straightforward to follow as a synchronous one, but it allows the user to use the application and the user interface during storage operations. Furthermore it allows operations to run in parallel, making the application run more smoothly.
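The difference between the two programming models can be sketched in a few lines of JavaScript. This is our own illustration: readSync, readAsync and the in-memory Map stand in for a storage backend, since the actual storage APIs only exist in browsers:

```javascript
// Stand-in storage backend for the sketch (not a real storage API).
const store = new Map();

// Synchronous model: the call blocks until the value is available,
// and the next line only runs after it returns.
function readSync(key) {
  return store.get(key);
}

// Asynchronous model: the call returns immediately and the result
// arrives later in a callback, leaving the main thread free.
function readAsync(key, onSuccess) {
  setTimeout(() => onSuccess(store.get(key)), 0);
}

store.set('title', 'My page');

const value = readSync('title');
console.log('sync result:', value);

readAsync('title', (v) => {
  console.log('async result (arrives later):', v);
});
console.log('this line runs before the async callback');
```

Running the sketch, the final console.log fires before the async callback, which is exactly the blocking-versus-callback distinction described above.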

Even though the APIs share several features, they are at some points quite different. In the following sections (2.3.3 - 2.3.6) we'll go through each of them and describe them in more detail.


2.3.3 Web Storage

The Web Storage API is usually the specification referred to when talking about client-side storage or local storage. The reason is that the Web Storage API was the first storage solution specified as a part of the HTML5 specification [25, Ch.5.11]. Sometimes this storage solution is also referred to as DOM Storage [24], for reasons we'll describe shortly. Originally a part of the HTML5 specification, the API was extracted and specified in its own specification in 2009 [26]. In the specification, Ian Hickson introduces two mechanisms that the storage API is meant to handle and that its predecessor, HTTP session cookies, fails to handle [22, Ch.1] (see Cookies in section 2.3.1):

1. The first is designed for scenarios where the user is carrying out a single transaction, but could be carrying out multiple transactions in different windows at the same time.

2. The second storage mechanism is designed for storage that spans multiple windows, and lasts beyond the current session. In particular, Web applications may wish to store megabytes of user data, such as entire user-authored documents or a user's mailbox, on the client side for performance reasons.

Web Storage is designed to store data in easily accessible JavaScript objects, located in the Document Object Model (DOM), which is why it is sometimes called DOM Storage. The data is saved as key-value pairs where both the key and value must be strings, in the same way as cookies save data. This means that objects and other types of data have to be converted and parsed to and from strings (e.g. using JSON.stringify and JSON.parse) before being stored using Web Storage. The API specifies a very simple interface, Storage, defining methods and properties as can be seen in table 2.2 [22, Ch.4.1]. The Web Storage specification provides two attributes implementing the Storage interface: sessionStorage and localStorage. The sessionStorage implementation only persists through a session. This means that when the tab or window is closed, the values stored are cleared. The localStorage implementation, however, is persistent and saves the values to the hard drive. With this exception, the two implementations work in the same way.
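The string-only constraint is worth illustrating. Since localStorage only exists in browsers, the sketch below (our own illustration) uses a minimal Map-backed stand-in implementing the Storage interface of table 2.2, and shows the JSON round-trip needed to store objects as strings:

```javascript
// Minimal in-memory stand-in for the Storage interface (table 2.2).
// Keys and values are strings, as the Web Storage specification requires.
const localStorage = {
  _data: new Map(),
  get length()        { return this._data.size; },
  key(index)          { return [...this._data.keys()][index] ?? null; },
  getItem(key)        { return this._data.get(key) ?? null; },
  setItem(key, value) { this._data.set(String(key), String(value)); },
  removeItem(key)     { this._data.delete(key); },
  clear()             { this._data.clear(); },
};

// Objects must be serialized to strings before storage...
const page = { id: 42, title: 'Meeting notes' };
localStorage.setItem('page:42', JSON.stringify(page));

// ...and parsed back into objects on retrieval.
const restored = JSON.parse(localStorage.getItem('page:42'));
console.log(restored.title); // Meeting notes
```

In a browser the stand-in object would simply be replaced by the real window.localStorage; the JSON round-trip is the same.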

When storing data using Web Storage, the specification recommends a limit of 5 megabytes per origin [22]. However, this behavior is inconsistent between browsers, sometimes silently allowing the user more than 5 megabytes of space, and in other cases prompting the user for more storage space. When a webpage containing data stored with Web Storage is loaded, the persistent data stored in localStorage has to be loaded from disk. If this process is slow or if a lot of data is stored locally, this could be noticeable, since Web Storage only supports synchronous operations, as previously discussed in section 2.3.2.

2.3.4 Web SQL Database

The Web SQL Database API extends the functionality of client-side storage in order to store data in a structured way. The Web SQL Database specification states that, This


Table 2.2: Storage interface specification

Name                 Type      Description
length               property  Returns the number of key-value pairs currently present in the storage object.
key(index)           method    Returns the key with the given index.
getItem(key)         method    Retrieves the value with the specified key, or null if no key exists.
setItem(key, value)  method    Creates or updates a given key with the given value.
removeItem(key)      method    Removes the key and its associated value.
clear()              method    Clears the storage object of every key/value pair.

specification defines an API for storing data in databases that can be queried using a variant of SQL [23]. This means an API that provides a structured database with all the functionality and complexity of an SQL database. The API supports key-value pairs much like the Web Storage API, with the added ability to index fields from these values in database tables, making searching faster and allowing for a structured way of storing data. The data stored is limited to basic SQL database types such as strings and numbers. The database allows for asynchronous operations through a JavaScript interface using the SQL dialect SQLite [9, pp.286]. To interact with the database the API provides three core methods, summarized in table 2.3 [11, pp.284].

Table 2.3: Web SQL Database base methods

Method         Description
openDatabase   Opens an existing database or creates a new one based on the supplied parameters.
transaction    Takes a single function as its main argument, encapsulating queries in a transaction capable of being rolled back.
executeSql     Runs SQL commands on the local database.

In order to query the database a transaction request is performed. Within this transaction, one or more executeSql commands are specified, creating a single transaction to be executed on the database. The code below illustrates how to open a database and execute a query on it:


// Open database 'myDatabase' with version 1.0 and size 2 MB
var db = openDatabase('myDatabase', '1.0', 'My description',
                      2 * 1024 * 1024);
db.transaction(function(tx) {
  tx.executeSql('INSERT INTO MYTABLE (id, log) VALUES (1, "foobar")');
});

As with Web Storage, the specification recommends an arbitrary size limitation of 5 megabytes [23, Ch.6]. The fact that the specification only has one implementation, using SQLite, is one of the underlying reasons for the specification no longer being maintained. The current specification document states that, This document was on the W3C Recommendation track but specification work has stopped. The specification reached an impasse: all interested implementors have used the same SQL backend (Sqlite), but we need multiple independent implementations to proceed along a standardisation path [23]. Because of the lacking number of implementations, browser vendors such as Mozilla and Microsoft will not support or implement the API in their browsers [28].

2.3.5 Indexed Database API

The Indexed Database API (often abbreviated IndexedDB) was originally proposed by Oracle under the name WebSimpleDB API [30]. The original specification was later renamed and gained contributors in conjunction with rejecting the Web SQL Database API in favor of Indexed Database [27, 28]. Like its alternative, the Web SQL Database, the API aims at improving on and adding functionality over Web Storage. There is a need for storing a large number of objects locally, and the specification states: ''[Web Storage] is useful for storing pairs of keys and their corresponding values. However, it does not provide in-order retrieval of keys, efficient searching over values, or storage of duplicate values for a key'' [27]. The Indexed Database API provides this functionality, making it possible for the user to e.g. fetch several key-value pairs in a specified range, indexed on some attribute. IndexedDB uses something called an object store, which could be compared to the tables of a Web SQL Database. These stores provide an interface to store JavaScript objects directly, without the need to parse them. The objects in turn must contain a key to be retrieved on, and may also contain additional keys, called indexes [10]. This allows for fast sorting and searching of a store by a certain attribute, by creating the relevant indexes for the type of object stored. Interaction with the database is done in the context of transactions. This allows for several instances of the database to be opened at the same time without having to worry about concurrency problems. It also gives the ability to roll back the transaction in case of error. In order to store an object the user opens the database, from which a transaction can be created. Inside the transaction the object store can be accessed, from which database operations like put and get can be executed. The basic structure of the IndexedDB API is shown in figure 2.2.
As figure 2.2 illustrates, every interaction with the database is done in the form of an IDBRequest. This request is asynchronous by default and takes two callback methods as parameters, onsuccess and onerror, which will be called depending on whether the request was successful or an error occurred. For example, if the get method is called on the IDBObjectStore, the result will be an IDBRequest whose onsuccess method will be called with the value corresponding to the key provided in the get method.

Figure 2.2: Diagram illustrating the structure of the Indexed Database API and how to access object stores.

At this time the specification is in the final stages as a working draft [27] (see section 2.2.1). As it is a fairly young specification there aren't any guidelines regarding storage quota limitations, leaving this open to the separate browser implementations.
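As a hedged illustration of this request pattern, storing an object might look like the sketch below. The database and store names ('notesDB', 'notes') and the note shape are our own illustrative choices, not from the thesis, and the code requires a browser environment providing indexedDB:

```javascript
// Sketch: storing an object with IndexedDB using the onsuccess/onerror
// callback pattern described above. Names are illustrative assumptions.
function saveNote(note, done) {
  var req = indexedDB.open('notesDB', 1);
  req.onupgradeneeded = function (e) {
    // Runs on first use (or version bump): create the store, keyed on 'id'.
    e.target.result.createObjectStore('notes', { keyPath: 'id' });
  };
  req.onsuccess = function (e) {
    var db = e.target.result;
    var tx = db.transaction('notes', 'readwrite');
    tx.objectStore('notes').put(note);           // returns an IDBRequest
    tx.oncomplete = function () { done(null); }; // transaction committed
    tx.onerror = function (err) { done(err); };  // transaction rolled back
  };
  req.onerror = function (err) { done(err); };
}
```

In a browser one would call e.g. `saveNote({ id: 1, text: 'hello' }, callback)`; the object is stored directly, without serializing it first.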

2.3.6 FileSystem API

The FileSystem API is a file storage solution that builds on the more basic File API [31]. As stated in the specification [32], the API defines ways for a browser to expose a sandboxed version of the user's local file system to web applications. While the API is a form of client-side storage, it differs from the other solutions as it aims at solving issues not handled well by databases. These issues involve cases such as storing large amounts of binary data, or making the data available outside the application by storing files locally. So in the context of team collaboration and its resources, such as pages and comments, the FileSystem API is arguably not the right tool to use. Although it could provide a solution for handling resources as files, especially attachments, it's outside the scope of this thesis and will not be investigated further.


2.4 Database replication

To use a local database it has to be populated with data. The term database replication describes the process whereby copies of data from a database are present on many nodes at the same time [34, pp.1]. A database replication scheme can be implemented in many ways, and in general we look at the following characteristics:

• How many primary copies of the database are there?

• Who can update the database and when?

• Do we want to replicate a partial or a full version of the database?

Sections 2.4.2-2.4.4 contain a description of each characteristic.

To understand how database replication works we need to define some terms, which can be found in section 2.4.1 below.

2.4.1 Database Replication Terms

Primary database: The primary database is a version of the database that is considered to be the ''real database''. Depending on the system chosen, there could be many primary copies.

Node: When we talk about nodes we refer to anyone or anything in the system that communicates with the database. For example, clients browsing a web page (served from a server) could be called nodes. In the context of this report, every node has a replica of the primary database.

Database replica: A database replica refers to a copy of the primary database that a node reads from.

Consistency: If a database replication scheme is consistent, every node has the same copy of the database at all times. Consistency is closely tied to synchronous updates.

Transactions: All operations performed on a database (such as write operations) are called transactions. Database transactions have certain properties, the ''ACID'' properties, that describe the transaction in most database systems. The properties are known as atomicity, consistency, isolation and durability [36].

Atomicity: A transaction is atomic if either all of its operations occur, or none of them do.

Consistency: A transaction should not violate constraints set up in the database. A constraint could for example be that a value may not be null.

Isolation: The isolation property means that no transactions should appear to be executed at the same time.

Durability: The durability property means that the effects of a transaction are permanent.


Update a node: Updating a node refers to updating a node's database replica to reflect the changes made by other nodes. This does not necessarily mean that the replica will be exactly the same as the primary database after an update, since we can update a node with only partial changes.

2.4.2 Master-Slave and Group

The relationship between nodes determines who can update the nodes and their replicas. Two techniques for this are Master-slave and Group [37]. In the master-slave scheme there is only one node, the master, that can update the other nodes in the database. Therefore, if a node wants to make changes to the database, it has to be done through the master node. The master node must determine whether the requested transactions should be performed or not. In a group environment, every node can perform transactions. This means that every node can (and should) update every other node.

2.4.3 Lazy and Eager replication

Lazy and eager replication (also known as asynchronous and synchronous replication) refers to when each node is updated [34, pp.5]. In a lazy scheme a replica is first fully updated, and afterwards the other replicas may be updated. This means that a node doesn't have to wait for the other nodes to update their data, provided its own replica has been fully updated. In an eager replication scheme, on the other hand, all the data should be consistent. A node must thus wait for all nodes to update their replicas of the database before continuing with operations.

2.4.4 Partial and Full replication

When replicating a database one might not need all entries, and replicating only a subset is certainly desirable in systems with large databases. In the case of partial replication there are two major questions to consider: which data to replicate, and which nodes should have a partial replica and which should not. Two known replication methods are called Pure partial replication and Hybrid partial replication [35]. In a pure partial replication, none of the nodes contains the full database. In a hybrid partial replication, however, at least one node contains a full replica of the database and at least one contains a partial replica. This can be compared to non-partial replication (known as Full database replication), where every node has a complete replica of the database.

2.4.5 Conflicts

When two or more nodes edit the same content and both commit their changes to the server, a conflict occurs. The system can handle conflicts in many different ways: it could try to merge the content, send it back to the user, or perhaps simply discard it. Also, the case where two or more transactions appear to be executed at the same time should not be able to occur, since it would violate the ACID properties described in section 2.4.1.


2.4.6 Combining characteristics

When implementing a database replication scheme using the properties described above, there are four basic combinations, as shown in table 2.4, achieved by combining the lazy and eager parameters with the master-slave and group parameters. These parameters were first identified by Gray et al. [37, pp.174] and later used in many other papers to describe database replication schemes [38, 39].

Table 2.4: Database replication scheme combinations

              Lazy (Asynchronous)                    Eager (Synchronous)

Master-slave  One primary database                   One primary database
              Nodes updated when possible            Nodes updated at the same time
              (All copies might not be consistent)   (All copies are consistent)

Group         N primary databases                    N primary databases
              Nodes updated when possible            Nodes updated at the same time
              (All copies might not be consistent)   (All copies are consistent)

General Model

Wiesmann et al. constructed a general model to describe and compare database replication schemes [38]. This is done by defining the following general phases:

1. Request - This phase is where a node requests to perform an operation on other replicas.

2. Server Coordination - In this phase the synchronization of the replicas is carried out.

3. Execution - The execution phase is where the operations are carried out on the replicas.

4. Agreement Coordination - This phase is where all replicas agree on what the transaction will produce.

5. Response - In the final phase, the requesting node receives a response describing what was agreed upon.

It is not necessary for a scheme to implement all the phases. For example, in a lazy database replication scheme the Server Coordination phase would be skipped, since the updates are done asynchronously.

2.4.7 Two-tier replication

In [37], Gray et al. argue that even though lazy master replication schemes have a slightly better deadlock rate than lazy-group and eager systems, they are still not suitable for mobile applications, since that approach assumes that the nodes are always online, which may not always be the case. Therefore Gray introduces a scheme to cope with this, called Two-tier replication. In this scheme, there are two types of nodes: Base nodes, which are online nodes where primary copies of the database reside, and Mobile nodes, which are nodes that are often offline. The mobile nodes' database replicas have two types of data items, Master versions and Tentative versions. The master versions are items that correspond to the latest known values received from a base node, and the tentative versions are data items that have been modified locally. Because of the two database versions there are also two kinds of transactions, called Base transactions and Tentative transactions. Base transactions are transactions performed on the base nodes, i.e. the primary database(s), whereas tentative transactions are performed on the local data, and those changes are saved as tentative versions in the mobile nodes. An illustration of the transaction process can be found in figure 2.3.

A node thus performs its tentative transactions on the local database replica and saves the results of these in the tentative versions. When the node goes online, it re-transmits the tentative transactions as base transactions. However, since the replica the tentative transactions were executed on is not necessarily a reflection of the current primary database, there must be an acceptance criterion that has to be fulfilled for a transaction to be executed; otherwise it will be discarded.

Figure 2.3: The figure describes the transaction process for a mobile node in the two-tier replication scheme. The tentative transactions (T1, T2 and T3) are executed locally first and then executed on the base node as base transactions (B1, B2 and B3).
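The acceptance step can be sketched in JavaScript. This is our own minimal model, not Gray's definition: the criterion used here (the value the mobile node last saw must still match the current base value) is just one plausible choice, and the transaction shape is assumed:

```javascript
// Sketch: replaying tentative transactions against the base (primary) data.
// A transaction is accepted and becomes a base transaction only if the
// acceptance criterion holds; otherwise it is discarded.
function replayTentative(base, tentativeTxns) {
  var accepted = [], rejected = [];
  tentativeTxns.forEach(function (t) {
    if (base[t.key] === t.expectedBaseValue) { // acceptance criterion
      base[t.key] = t.newValue;                // applied as a base transaction
      accepted.push(t);
    } else {
      rejected.push(t);                        // discarded (user would be told)
    }
  });
  return { accepted: accepted, rejected: rejected };
}

// Example: one tentative update still matches the base, one conflicts.
var base = { page1: 'v1', page2: 'v5' };
var result = replayTentative(base, [
  { key: 'page1', expectedBaseValue: 'v1', newValue: 'v2' },
  { key: 'page2', expectedBaseValue: 'v3', newValue: 'v4' }
]);
// result.accepted holds the first transaction; result.rejected the second.
```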

2.4.8 Transaction-Level Result-Set Propagation

Another algorithm, called Transaction-Level Result-Set Propagation, was proposed by Ding et al. in [40]. It is in some sense a continuation of the two-tier technique described in [37], since it also works in two layers, namely the mobile nodes and the base nodes. The algorithm focuses on conflict detection and resolution by assigning objects to different sets, to determine which data in a node is committable and which is not. For every transaction in a node the following sets are defined:


• ReadSet, WriteSet & ResultSet. These sets contain the objects that are accessed in a transaction, with objects read in the ReadSet and objects written to in the WriteSet. The ResultSet contains the new values for the written objects in the WriteSet.

• AccessSet. This set contains all the objects accessed in a transaction and all chronologically previous transactions, i.e. it has all accessed data items up until that point.

• ReadableSet. This set contains all the objects accessed up until the current transaction that have not been updated since the last time the node synchronized with the base node.

• CommitSet & CancelSet. With the help of the other sets, the two sets CommitSet and CancelSet are filled with transactions that are valid and invalid respectively. The process is described in more detail later in this section.

The AccessSet is populated while a node is offline. During this phase, all transactions are saved locally, and when the mobile node connects to a base node, the AccessSet is computed and sent together with the list of transactions to the base node, which will split the transactions into the two sets, CommitSet and CancelSet. The procedure can be described by the following steps:

Initialization

In the initialization process we start off with empty CommitSets and CancelSets. In this phase, the ReadableSet for the first transaction is computed by comparing the objects in the AccessSet with the ones on the server.

Conflict Detection

In this step, the algorithm looks at every transaction in order and decides whether or not the transaction ''passes''. This is decided by looking at the WriteSet to see if the current transaction is the only one accessing it, and by checking that the ReadSet is a subset of the ReadableSet. If that is the case the transaction has passed, i.e. no conflict has been detected, and the transaction is added to the CommitSet. The WriteSet is also added to the ReadableSet. If the transaction does not pass, however, meaning we have found a conflict, the objects in its WriteSet will be removed from the ReadableSet, provided they are present there, since they are not considered readable anymore. In the code below we assume that Set.add never adds duplicate objects from a transaction.

#for(T in OrderedTransactions)
  #if(exclusive(T) && isSubSet(ReadSet(T), ReadableSet))
    CommitSet.add(T)
    ReadableSet.add(WriteSet(T))
  #else
    CancelSet.add(T)
    ReadableSet.remove(WriteSet(T))
  #end
#end
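The loop above can be rendered as runnable JavaScript. This is a hedged sketch: the transaction shape (readSet/writeSet arrays of object ids) and the reading of exclusive(T) as ''no other transaction in the batch writes the same objects'' are our interpretation, not Ding et al.'s exact definitions:

```javascript
// Sketch of the conflict-detection loop: split ordered transactions into a
// CommitSet and a CancelSet, maintaining the ReadableSet as we go.
function detectConflicts(orderedTxns, readableSet) {
  var readable = new Set(readableSet);
  var commitSet = [], cancelSet = [];
  orderedTxns.forEach(function (t, i) {
    // exclusive(T): no other transaction writes any object t writes
    var exclusive = orderedTxns.every(function (other, j) {
      return j === i || !other.writeSet.some(function (o) {
        return t.writeSet.indexOf(o) !== -1;
      });
    });
    // isSubSet(ReadSet(T), ReadableSet)
    var readOk = t.readSet.every(function (o) { return readable.has(o); });
    if (exclusive && readOk) {
      commitSet.push(t);
      t.writeSet.forEach(function (o) { readable.add(o); });
    } else {
      cancelSet.push(t);
      t.writeSet.forEach(function (o) { readable.delete(o); });
    }
  });
  return { commitSet: commitSet, cancelSet: cancelSet };
}

// T1 reads only readable objects and commits; T2 reads 'z', which is not
// readable, and is cancelled.
var r = detectConflicts(
  [{ id: 'T1', readSet: ['a'], writeSet: ['b'] },
   { id: 'T2', readSet: ['z'], writeSet: ['c'] }],
  ['a', 'b']
);
```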


Incorporation

When the conflict detection phase is done, the transactions in the CommitSet are executed and the transactions in the CancelSet are sent back to the mobile node.

For a complete description of the algorithm the reader should refer to [40]. Apart from the above algorithm, there is also a synchronization process, which is described below.

Synchronization

The client sends a timestamp and its id to a server with a primary database replica. The server collects all entries updated since the last synchronization, marks them as read-only and transmits them to the mobile node. After transmitting, the read-only locks are removed from all entries.
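The server-side selection step can be sketched as follows; the entry shape ({ id, updatedAt, data }) and the numeric timestamps are assumptions for illustration only:

```javascript
// Sketch: the server returns every entry updated since the client's
// last synchronization timestamp.
function entriesUpdatedSince(entries, lastSync) {
  return entries.filter(function (e) { return e.updatedAt > lastSync; });
}

var entries = [
  { id: 1, updatedAt: 100, data: 'unchanged since last sync' },
  { id: 2, updatedAt: 250, data: 'changed after last sync' }
];
var recent = entriesUpdatedSince(entries, 200); // only entry 2 is sent
```

In the full scheme these entries would also be locked read-only while being transmitted, which this sketch omits.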

2.4.9 Three-tier replication

Baldoni et al. [41] propose an alternative to two-tier replication by introducing a mid-tier component between the clients and the servers. The mid-tier is in charge of ordering requests and communicating with the servers. This is done by implementing the mid-tier protocol with an Active Replication Handler and a Sequencer. The Active Replication Handler is the component that receives the incoming requests, orders them (with the help of the sequencer) and sends them to the servers. The sequencer is in charge of assigning a sequence number to each request. The Active Replication Handler also deals with the communication in the opposite direction, i.e. sending the responses from the servers to the clients. As such, Baldoni's solution gives the responsibility for consistency to the mid-tier component, since the communication between the clients and the server replicas is no longer direct.


Chapter 3

Defining the model

To achieve the best result when developing a prototype it's important to use the right technologies and solutions. In this chapter we'll revisit our goals and derive tests and methods for comparing the solutions from them. We'll present the findings and what the final implementation will consist of.

3.1 Client-side storage

Selecting which HTML5 storage solution to use in order to optimize a web application isn't an obvious choice. The different solutions described in sections 2.3.2 through 2.3.6 could all be used to store the necessary information locally. In this section we will revisit the requirements we've established for this project and go through our methods of testing and comparing the solutions. This will be done with our criteria as guidelines, to see how the different solutions perform. Since the theoretical specifications behind the solutions aren't enough to determine which storage API to use, we'll take the test results into account when choosing a technology.

3.1.1 Our requirements

Among the goals defined in section 1.1, we've specified that our solution should improve response times for reading data and that it should be possible to store a large amount of data on the client-side. When evaluating and comparing the client-side storage solutions, the following criteria are the most important ones:

1. Performance - The overall performance has the highest priority. Fetching (i.e. searching and finding entries) and reading data should be done efficiently without hindering performance. Similarly, writing new data or updating existing data should be executed as fast as possible. In addition to this, the storage solution should be loaded such that when a new page loads, the storage solution won't affect load times too much or freeze up the application.

2. Size capacity - The size capacity of the storage solution is important. We want the user to have access to large amounts of data for a long period of time without the need to request data from the server. Furthermore, the storage solution needs additional space available for user-created content and any potential extra information, such as settings or database replication information.

3. Browser support - The number of browsers and browser versions that support the storage solution is our least prioritized criterion, but it should be able to decide between solutions that perform equally well in regards to the other criteria. Here, the most popular browsers will have priority.

In addition to these criteria, since the problems to address involve integrating the optimization (i.e. local storage and synchronization) with an existing team collaboration web application, the chosen solution must be sufficiently general and flexible. With this in mind we don't only investigate the solutions according to the criteria, but also look at the customization options available and at what types of data occurring in a collaborative environment can be stored.

3.1.2 Method

Even though all of the solutions covered in section 2.3 could be used to save data on the client-side and thus potentially optimize the web application, we want to find the one best suited for our needs. Since we need to be able to apply the solution to a specific web application while meeting our requirements, some thorough testing will be needed in order to determine which solution to use. For reasons explained in section 2.3.6, we will conduct these tests on three different candidates: Web Storage, Web SQL Database and Indexed Database. Since it's desirable for our solution to persist over browser sessions, the Web Storage attribute sessionStorage will be omitted from the tests; only the localStorage attribute will be tested for Web Storage. Note that even though the specification for Web SQL Database is deprecated (see section 2.3.4), support is still wide and the API is still used by several browsers, since no definitive alternative is available. As such we still consider it a valid candidate.

Test strategies

In order to maintain comparability between the different solutions, the tests should, to the greatest possible extent, be designed the same way. This means that the same test data should be used and that the test code should be as similar as possible. The tests should be kept simple and only isolate and measure code and operations concerning the storage API currently tested. Each test will be executed three separate times, and the final result will be derived as the average of the three results. In order to test and cover each of the requirements described in section 3.1.1, the following storage aspects should be tested:


• Reading data of different sizes from the storage.

• Writing data of different sizes to the storage.

• Fetching an entry from a storage of varying size.

• Loading a page containing a storage of varying size.

• Storing a large amount of data.

• Percentage of browsers supported.

Some of these might seem similar, but we'll see the differences between them and how they are implemented in section 3.1.3.

When measuring sizes and quotas we need to distinguish between the amount of data stored and the number of characters stored. In many cases a character corresponds to 1 byte of data, meaning that a million characters equal 1 megabyte of data. This is not the case with JavaScript, however, as JavaScript uses UTF-16 encoding, which means that each character takes up 2 bytes of memory. When testing performance, we'll measure sizes as the amount of data stored in memory. When testing storage limits, however, we'll use the number of characters stored as the measurement.
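The 2-bytes-per-character figure can be sanity-checked with a short snippet. Node's Buffer is used here purely for illustration, since a browser offers no direct byte count of a string; the character count is our own example value:

```javascript
// JavaScript strings are sequences of UTF-16 code units, so each simple
// character occupies 2 bytes.
var chars = 'a'.repeat(1000000);                 // one million characters
var bytes = Buffer.byteLength(chars, 'utf16le'); // 2 bytes per code unit
console.log(bytes);                              // 2000000, i.e. about 2 MB
```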

Tools and resources

According to StatCounter [42], the most popular browser at the moment is Google Chrome, with around 36.6% of the market (see figure 3.1). Furthermore, Google Chrome is the only browser supporting all three solutions being tested. With this in mind, all tests are conducted using Chrome 25.0.1364.99 on a Mac running OS X 10.7.4.

Figure 3.1: Statistics generated on [42] showing the most popular browsers

A great tool that we'll make use of when testing is jsPerf [43], an easy way to create test cases and compare the performance of different JavaScript code snippets using benchmarks.


The tool allows you to define some preparation code and any potential startup or teardown function. Furthermore, the defined test cases can be marked to be executed as asynchronous tests as well, enabling us to measure asynchronous operations properly. After writing and running the test cases the result of the benchmark, presented as the number of operations per second, can be obtained. An illustration of a simple test can be seen in figure 3.2, and can be found directly at http://jsperf.com/localstorage-reading/6.

Figure 3.2: A simple test comparing different ways of reading Web Storage values performed using jsPerf

The results presented by jsPerf are measured in Ops/sec (operations per second), meaning that higher results are better. The tests will run until the error margin is small enough or until a maximum execution time limit is reached. In order to measure page loading times in our tests, we make use of another HTML5 technology called Navigation Timing [44]. We use this technology in the form of a script called Loadtime Breakdown [45], giving a simple overview of the page load time breakdown. In order to measure the loading time of a page we calculate the difference between performance.timing.loadEventEnd and performance.timing.responseEnd. The script is illustrated in figure 3.3.
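The calculation itself is a simple difference of two Navigation Timing values. In this sketch the timing object is injected so the function can be exercised anywhere; in a real page it would be window.performance.timing, and the sample values below are illustrative, not real measurements:

```javascript
// Load time as used in the thesis: time from the end of the server
// response until the load event has finished.
function pageLoadTime(timing) {
  return timing.loadEventEnd - timing.responseEnd;
}

// Illustrative values (milliseconds since epoch):
var t = { responseEnd: 1000, loadEventEnd: 1350 };
console.log(pageLoadTime(t)); // 350 ms spent after the response finished
```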

3.1.3 Implementation and results

Reading values of increasing size

In order to read some values from the storage, we must first populate it with data. We define five values of increasing size in the following manner:

var smallValue = "my small value";

var largeValue = "1234567890";
for (var i = 1; i < 500; i++) { // 10 * 2 * 500 bytes = 10kB
  largeValue += "1234567890";
}
var large2Value = largeValue + largeValue;   // 20kB
var large3Value = large2Value + large2Value; // 40kB
var large4Value = large3Value + large3Value
                + large3Value + large3Value; // 160kB

Figure 3.3: Page load time breakdown as presented by the script Loadtime Breakdown [45].
These values are stored in the storage being tested, and we then define separate test cases for reading each of the values. As we saw in the small sample test for jsPerf in section 3.1.2, using the getItem() method for localStorage yields the best performance, making it our primary way of reading data. With WebSQL a simple SELECT query is used within a transaction. In the case of indexedDB a get() method is executed on the object store containing the values. The tests can be found in full, and can be executed through the links found in Appendix A.1. The results are presented in table 3.1.

Table 3.1: Test result | Reading values of increasing size

Read size         Web Storage   WebSQL    indexedDB
                  Ops/sec       Ops/sec   Ops/sec
Small (28 bytes)  1,743,622     439       511
10kB              151,040       434       483
20kB              79,700        434       481
40kB              40,669        419       476
160kB             10,050        399       422

Not surprisingly, we see that performance diminishes as sizes grow. Although the performance of Web Storage diminishes at a much faster rate than that of the other two, it's overall much faster than its competitors, among which indexedDB has a slight upper hand.


Writing values with increasing size

The test for writing values only requires the targeted storage solution to be predefined and available when the test is executed. Because of this, the preparation code could be designed to define all storage solutions, giving us the possibility to compare all solutions at once. Similar to the reading tests, we define some values of varying sizes: in this case a small value, followed by 10kB and finally 100kB. The test is available through Appendix A.2, and table 3.2 presents the resulting benchmarks.

Table 3.2: Test result | Writing values with increasing size.

Write size        Web Storage   WebSQL    indexedDB
                  Ops/sec       Ops/sec   Ops/sec
Small (28 bytes)  53,380        419       461
10kB              49,228        397       376
100kB             7,602         263       265

Once again Web Storage comes out on top, with much better writing performance. As with the reading performance, there isn't much separating WebSQL from indexedDB.

Fetching entries from storage

Fetching an entry might seem to be the same as reading a value, but the aim is to test another aspect of the storage solution: we want to measure the performance when searching the storage, i.e. the key lookup time, rather than the reading. Unlike the reading test cases, this time we fill the storage with a varying number of entries. Given an arbitrary succession of the keys, we'll measure the time it takes to fetch a key from different places in the storage, and to fetch several different keys in succession. Three test cases are defined: fetching the first key, fetching the key in the middle of the key succession, and fetching 10 keys in succession. The different tests can be found in Appendix A.3. The number of keys defined in each storage ranges from 10 to 100,000, each containing a small arbitrary value. In figure 3.4 below, the results of the three test cases are presented for each storage solution.

Once again the performance of Web Storage outshines the others. We see that there is no apparent performance loss depending on where in the storage a key is fetched from. Fetching 10 keys in succession renders a result slightly below one tenth that of fetching one key, which would imply a very slight overhead for the function call itself. In the case of WebSQL, fetching 10 keys is actually a much faster operation if counted as an average per key. This is the result of all 10 queries being executed in the same transaction, as opposed to 10 separate transactions containing one query each. However, the WebSQL performance deteriorates rapidly as the number of entries in our table increases. Looking at indexedDB we see that it doesn't suffer from the same deterioration, but rather maintains a consistent performance level independent of the number of entries. In addition, this level of performance outgrows that of WebSQL even further as the number of entries increases.
Figure 3.4: Test results when fetching entries for the different storages.

Taking a closer look at the indexedDB key lookup results in figure 3.4, we see a fourth test included, named 10 keys (no cursor). This test was included as a comparison to the regular way of fetching several subsequent keys with indexedDB, namely by using a cursor [27, Ch.3.1.10]. The cursor acts as an iterator over a range of keys, resulting in a fast and efficient way to fetch several keys in succession. As we see in the results, using a cursor is much more efficient than fetching each key with a separate get() method.
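As a hedged sketch of this pattern (store name and callback style are our own assumptions, and the code requires a browser environment providing indexedDB and IDBKeyRange), iterating a key range with a cursor might look like:

```javascript
// Sketch: fetching all entries between two keys with one cursor, instead
// of issuing a separate get() per key.
function fetchRange(db, storeName, lower, upper, done) {
  var results = [];
  var tx = db.transaction(storeName, 'readonly');
  var range = IDBKeyRange.bound(lower, upper);
  tx.objectStore(storeName).openCursor(range).onsuccess = function (e) {
    var cursor = e.target.result;
    if (cursor) {
      results.push(cursor.value); // collect the current entry
      cursor.continue();          // advance to the next key in the range
    } else {
      done(results);              // cursor exhausted: range fully read
    }
  };
}
```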

Loading a page with storage of increasing size

In order to test the impact of the different storage APIs on page load times, we'll write our own HTML pages containing the necessary scripts. We do this by constructing three different pages, one for each storage solution, each with the ability to store data in the storage, allowing us to test the load times in relation to the amount of characters saved. The three pages, localstorageLoadPage.html, WebSQLLoadPage.html and indexedDB.html, implement a simple way to save data to the storage and to clear all data. More information on these pages can be found in Appendix A.4. The pages will be hosted locally, and in order to get a correct measurement of the load time, each test case will be executed in the following manner:

1. Open the page corresponding to the targeted storage API.

2. Save the amount of characters specified by the test case.

3. Quit the browser.

4. Open a fresh instance of the browser and navigate to the page.

5. When done loading, note the load time using Loadtime Breakdown (see section 3.1.2) or the result presented by the page.


This way we ensure that the storage is freshly loaded for each test, and not loaded from cache. Even though the size is fixed for each test case, the storage could be laid out very differently. For comparability, each test will therefore be conducted in two ways: storing the targeted amount of characters in one entry, and storing the characters across many entries. In the second case this will be done with entries each containing 100 characters. The resulting values are presented in figure 3.5.

Figure 3.5: Page load times in relation to the number of characters stored in the different APIs.

The first thing to note is that there is no data representing several entries for WebSQL and indexedDB. This is because the load times weren't affected by the number of entries for these storages, as the loading is asynchronous. In the case of localStorage, however, we see that the load time increases sharply when the same number of characters is saved over several entries. Furthermore, localStorage isn't represented when storing 3 million characters. Recall that Web Storage in Chrome has a quota limit of 5MB (see section 2.3.3), and that 3 million characters in JavaScript correspond to 6MB of data (see section 3.1.2), resulting in a QuotaExceededError being thrown for this test case. Another thing to note is that the loading of localStorage is done synchronously, meaning that the entire page is blocked while the storage is being loaded, while the other two APIs load asynchronously.
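The quota arithmetic above can be checked directly. JavaScript strings are UTF-16 encoded, so each character occupies 2 bytes; the 5MB figure is the Chrome Web Storage limit cited in section 2.3.3:

```javascript
// JavaScript strings are UTF-16, so every character takes 2 bytes.
const BYTES_PER_CHAR = 2;
const WEB_STORAGE_QUOTA_BYTES = 5 * 1000 * 1000; // Chrome's 5MB per-origin Web Storage limit

function storageBytes(charCount) {
  return charCount * BYTES_PER_CHAR;
}

function exceedsWebStorageQuota(charCount) {
  return storageBytes(charCount) > WEB_STORAGE_QUOTA_BYTES;
}

// 3 million characters correspond to 6MB, which exceeds the 5MB quota; in the
// browser, attempting to store them makes setItem() throw a QuotaExceededError.
```

This also explains why the 2.5 million character test case (exactly 5MB) is the largest one that succeeds for Web Storage.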

Storing a large amount of data

The maximum amount of data that can be stored differs depending on the browser and storage solution. Because of this we will compare the storage limitations of the most commonly used browsers: Chrome, Internet Explorer and Firefox. Since the test environment uses the Chrome browser on a Mac running OS X, there is no way to test the storage limitations of Internet Explorer. Instead these limitations are taken from the Microsoft Developer Documentation [49] regarding Web Storage and from the MSDN Blogs [50] regarding indexedDB. Recall that WebSQL isn't supported by Internet Explorer or Firefox and is therefore not tested on those browsers.


In addition to the Microsoft sources, two external sites are consulted to test the limitations of Web Storage in different browsers [47], [48]. In Firefox these storage limitations are configurable by the user by typing about:config in the address bar [51]; however, the default values will be used and form the basis of our test regarding Firefox. The Chrome browser handles these limitations through the Quota Management API [33] when using indexedDB or WebSQL. This means that there is no well-defined limit on the storage amount. Instead it depends on the system in question and the amount of disk space currently available. A web application using the Quota Management API is allowed at most 20% of a shared pool of disk space, and this shared pool can in turn be at most 50% of the available disk space. At the time of this test the system has 172GB of free disk space, meaning a shared pool of 86GB. A storage solution can thus utilize at most 20% of 86GB, which is equal to 17.2GB. This implies a highly variable amount of available storage, although often a potentially much larger amount than Web Storage offers. In figure 3.6 the Chrome indexedDB folders are shown after saving 5000 entries, each containing 10,000 characters, using indexedDB.html (see Appendix A.4).
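The Chrome figures above follow from two ratios, sketched here. The 50% and 20% fractions are those described for the Quota Management API [33]; exact behavior may differ between Chrome versions:

```javascript
// Chrome's temporary-storage accounting (sketch): the shared pool is at most
// 50% of free disk space, and a single origin may use at most 20% of that pool.
function sharedPoolGB(freeDiskGB) {
  return freeDiskGB / 2;               // 50% of free space
}

function perOriginQuotaGB(freeDiskGB) {
  return sharedPoolGB(freeDiskGB) / 5; // 20% of the shared pool
}
```

With the 172GB of free disk space measured on the test machine, this yields an 86GB pool and a 17.2GB per-origin quota, matching the figures above.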

Figure 3.6: A list of stored Indexed Databases for domains in Chrome.

A summary of the storage limitations for the targeted storage solutions using the top browsers can be found in Table 3.3 below.

Table 3.3: Storage limitations per storage instance and browser.

Browser           | Web Storage      | indexedDB                                                                | WebSQL
Chrome            | 5MB (2.5M chars) | 20% of shared pool, where the pool is at most 50% of free disk space (test instance: 17.2GB) | Same as indexedDB
Firefox           | 10MB (5M chars)  | 50MB, then the user is prompted for more space                           | Not supported
Internet Explorer | 10MB (5M chars)  | 250MB                                                                    | Not supported

Overall, indexedDB and WebSQL provide more available storage than Web Storage (except in the extreme case where the user has less than 50MB of disk space available).


Browser support

In order to determine the supported browsers we'll consult the webpage caniuse.com [46]. As stated in the Tools and resources paragraph of section 3.1.2, the webpage uses StatsCounter [42] to derive the global support percentage of different browser techniques, along with tables presenting the support status of different browsers for a certain specification or technique. When evaluating the overall support of the different techniques, the more popular browsers (Chrome, Internet Explorer and Firefox) are given higher priority. The result is summarized in Table 3.4:

Table 3.4: Browser support.

Storage     | Global support | Top browsers supported
Web Storage | 92,7%          | Chrome, IE, Firefox
indexedDB   | 53,0%          | Chrome, IE10, Firefox
WebSQL      | 50,5%          | Chrome

Both Web Storage and Indexed Database are supported by the top three browsers, even though Indexed Database is only supported in the latest version of Internet Explorer. The fact that WebSQL has been deprecated is further confirmed here, in that neither Internet Explorer nor Firefox has added support for it. Web Storage has the best overall support, being supported by all major browsers for both desktop and mobile. WebSQL and indexedDB have divided support: WebSQL has wide support on mobile browsers, while indexedDB is more focused on desktop browsers.

Overall, the Web Storage solution performed very well with regard to reading and writing. However, it is very limited in its storage capacity and lacks ways of organizing or searching data. As data sizes grow, Web Storage performance dwindles, something that becomes more apparent since the solution is synchronous and therefore blocks the UI. In order to allow for a seamless integration with team collaboration software we will look at IndexedDB and WebSQL for their support for structuring and searching data. WebSQL provides this to some extent, but would require an entire SQL database structure implemented on the client. Even though this asynchronous solution performs well, the performance becomes insufficient on large instances, as can be seen in the test Fetching entries from storage (see Figure 3.4). The fact that it is no longer supported further motivates us in not choosing WebSQL. IndexedDB supports large amounts of organized data and can handle it quickly and efficiently. Page load times, along with reading and writing data, show no noticeable loss in performance as the instance and the amount of data grow. In addition, it has a large storage capacity, and its indexing possibilities make it a flexible solution where we can search and find collaboration resources quickly and efficiently. With that said, our final choice of client-side storage solution is indexedDB.


3.2 Database Replication

3.2.1 Our requirements

Choosing how to manage the database replication is a key factor in optimizing a web application through local databases. In this section we will once again look at the requirements specified by the goals, and we will compare different solutions based on these requirements. A solution could be an algorithm, an existing framework or something we have to develop on our own. We will therefore look back at some of the definitions, algorithms and techniques described in section 2.4. We will also keep in mind whether the choice of client-side storage technique limits us in any way.

1. Our solution should support partial replication. Partly because we want to deal with a lot of data, but mainly because there could be entities in the database that have permission restrictions.

2. The solution should be able to detect and handle conflicts. There will be times when two users have edited the same content.

3. The number of synchronization occasions should be as few as possible. This means that we prioritize having few occasions when we update the replica, even if this means that the conflicts become more complex.

4. Consistency should not affect performance too much. We are confident that in a team collaboration environment some complexity in possible conflicts is tolerable, since conflicts already occur often. Users today might edit a single piece of content for many hours, which will often result in conflicts that the user must deal with. Therefore, we want to minimize delays for clients that are a direct result of maintaining consistency.

5. Extra data needed to be saved in the database should be as small as possible. Since the size of the client-side replica of the database is limited, we do not want to use too much of that space to store extra information.

6. Failure rate, i.e. the number of failed update attempts per client, should be as low as possible.

7. Finally, we will allow for a solution where the user manually chooses when to synchronize, if this increases our performance.

We have ordered the requirements from most important to least important. As such, the first two requirements are the most important; without them our system would behave erroneously. Requirements 3-6 are based on optimization, and cannot be considered unless the first two are achieved. The last requirement, which focuses on usability, is not a typical requirement, since it broadens our solution spectrum instead of narrowing it, but it should only be used if it gives us a clear performance gain.


3.2.2 Method

When choosing our desired replication method, we will compare the theory from section 2.4 with the criteria described in section 3.2.1. To determine the appropriate method for our project we need to define the limits of our client-server relationship. In a web application team collaboration environment such as Confluence, the clients are not always connected to the server. A client may be offline for an hour, a day, or may never come online again. This makes it very difficult to, for example, implement a pure partial solution where no node has the full database, as described in Section 2.4.4. Furthermore, we won't test the three-tier algorithm described in Section 2.4.9, since it requires a new architecture with a middle component to run, which is too complex to integrate into existing software within this project. We still mention it, though, as it might be relevant in projects where the team collaboration software is implemented from scratch. We also assume that there is only one primary database.

Lazy vs Eager

Since the nodes in our project are not always connected, an eager replication method has certain disadvantages. For example, how can we update nodes that are not connected at the time? We will therefore define that nodes are only part of the system when they are connected, and thus a node will be updated, if necessary, every time it connects. That way, the consistency that is a requirement in an eager replication system is maintained among the nodes that are connected. To compare how lazy and eager approaches differ we have implemented a few very simple simulation examples. The simulations are implemented in Java on a local server, since only failures and updates are measured.

In our first simulation we test the failure rates of a simple lazy master algorithm with up to 200 clients and varying disconnecting probability values. These values determine the probability that the system will update a node's replica before another transaction is processed. E.g. if the disconnecting probability is 0.2, 20% of the nodes will have their replicas updated before the next transaction is processed. If a transaction tries to modify data that does not correspond with the server's, the server aborts the transaction and we count it as a failure. Our first simulations use the data in Table 3.5.

Table 3.5: Simulation parameters and values

Parameter                 | Value    | Description
Database size             | 200      | How many entries there are in the database.
Transaction size          | 3        | Number of operations per transaction.
Number of transactions    | 50       | Number of transactions transmitted to the server per client.
Disconnecting probability | 10 & 90% | Delay probabilities in each test.

Running the simulations with two different disconnecting probabilities, hereby known as delay probabilities, we get the results found in Figures 3.7 and 3.8. An eager replication algorithm would be the equivalent of using a disconnecting probability value of 100%. That would of course give us zero failures per client, since every node would be consistent. However, to optimize our application we will need to look at more than just the failures. As specified in the requirements in section 3.2.1, we want to minimize the number of synchronization attempts. Thus, when comparing against the eager replication scheme we will instead look at the number of times an update of a client's database is made. Using the exact same parameters we generate the graph found in Figure 3.9. If we use a higher delay probability we get another result, found in Figure 3.10. This can be compared with the eager replication scheme, where there is one update per client for every transaction. Apart from updating the affected node's database it has to update N − 1 more databases, giving a total of N database updates per write. Using the above parameters, this would render 50 updates per client.

Figure 3.7: Failure rate from a lazy master simulation with a delay probability of 10%.

This update phase corresponds to the server coordination phase described by Wiesmann et al. [38] which, in the case of eager replication, blocks the entire synchronization process. When describing the lazy master scheme using the same general model, that phase is non-existent, since updating is done asynchronously.
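The simulation logic can be sketched roughly as follows. The thesis implementation is in Java; this JavaScript version is an illustrative reconstruction of its structure, not the original code:

```javascript
// Rough sketch of the lazy master failure-rate simulation. Each client holds a
// replica version. After a committed transaction the server updates every
// client's replica with probability `delayProb`; a transaction from a client
// with a stale replica is aborted and counted as a failure.
function simulateLazyMaster({ numClients, numTransactions, delayProb, rng }) {
  let serverVersion = 0;
  const clientVersion = new Array(numClients).fill(0);
  let failures = 0;

  for (let t = 0; t < numTransactions * numClients; t++) {
    const client = Math.floor(rng() * numClients);
    if (clientVersion[client] !== serverVersion) {
      failures++;                            // stale replica: the server aborts
      clientVersion[client] = serverVersion; // the client refreshes after the abort
    } else {
      serverVersion++;                       // the transaction commits
      clientVersion[client] = serverVersion;
      for (let c = 0; c < numClients; c++) { // lazy propagation to the others
        if (rng() < delayProb) clientVersion[c] = serverVersion;
      }
    }
  }
  return failures / numClients; // failures per client
}

// Small deterministic PRNG (mulberry32) so simulation runs are repeatable.
function makeRng(seed) {
  let s = seed >>> 0;
  return function () {
    s = (s + 0x6D2B79F5) >>> 0;
    let t = Math.imul(s ^ (s >>> 15), 1 | s);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```

Setting delayProb to 1 makes every replica consistent after every commit, which corresponds to eager replication and produces zero failures, as argued above.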

Partial solution

Since it is not very likely that each user in a team collaboration environment will need every piece of data in the entire database, we will now simulate the lazy master scheme and the eager master scheme using partial replication. Partial replication should reduce the number of updates, since only nodes that hold the replicated data are affected. We have therefore also simulated this process; it should be noted that in these simulations the number of clients is fixed to 200 and the size of the replicated database at each client varies. Each client's data portion is randomly chosen. The results are presented in Figure 3.11 and


Figure 3.8: Failure rate from a lazy master simulation with a delay probability of 90%.

Figure 3.9: Number of updates from a lazy master simulation with a delay probability of 10%.

3.12. We can see an almost linear increase in updates when increasing the partial database for each client. Since a comparison with the eager replication scheme might be interesting, we have also simulated that and present the result in Figure 3.13. We can see a similar increase using the eager replication scheme. This is of course expected, and we can also see that the eager replication scheme produces slightly more updates for each partial replication size compared to the lazy replication scheme.


Figure 3.10: Number of updates from a lazy master simulation with a delay probability of 90%.

Figure 3.11: Number of updates from a lazy master simulation with a delay probability of 90% using partial replication.

Synchronization frequency

From an optimization point of view a lazy master scheme might be the obvious choice, but as argued by Gray et al. [37] such a scheme is not very useful for mobile nodes, since it requires them to be connected to a master node all the time. We will therefore compare it to some of the asynchronous solutions described in Section 2.4, in particular Two-tier replication and Transaction-Level Result-Set Propagation (TLRSP). Using slight modifications of these two algorithms we can achieve manual synchronization support (as described by the last requirement), which will greatly decrease the number of synchronization attempts. Using a non-manual approach on the two algorithms would instead give us the same synchronization frequency as a standard lazy master approach. The biggest difference (in terms of synchronization attempts) would be that Two-tier and TLRSP can synchronize when a node comes back online after being temporarily offline. This cannot be achieved with a standard lazy scheme, since it cannot commit transactions locally.

Figure 3.12: Number of updates from a lazy master simulation with a delay probability of 10% using partial replication.

Figure 3.13: Number of updates from an eager master simulation using partial replication.


However, we argue that our simulations are still relevant for the two algorithms, since a lazy master scheme simulation with a low delay parameter can be seen as a manual-commit-based implementation of either of the two algorithms when looking at the failure rate or the number of updates per client. This is true because, with a small delay probability parameter, a lazy master scheme will update a very small number of nodes after each commit, as would be the case with the algorithms, since their transactions are committed locally first. I.e., a lazy master scheme with a low delay parameter 'stacks up' the number of clients that have not updated between commits, thus simulating the same behavior as the Two-tier replication and TLRSP algorithms, where commits are stacked up locally. But to determine which of the two would suit our team collaboration application best, we will have to look at other criteria.

Conflicts

As stated by our requirements, our solution must be able to detect and handle conflicts in some manner. The eager and lazy replication schemes are not affected by this requirement, since they only read locally and transmit transactions directly to the server. The Two-tier replication scheme uses an implementation-specific acceptance criterion [37, p. 180] to determine if a transaction is accepted. The TLRSP scheme instead assigns each data item a version to keep track of the changes made to a certain piece of data. By saving a copy of that version for each client when a data item is stored locally, the server and clients can keep track of changes made to that data item. This way, conflicts can be detected.

Extra Storage

We will now look more closely at the extra storage needed locally for each of the algorithms to function. The eager and lazy replication schemes do not write locally; they only need to have copies of the data items that will be read. The Two-tier algorithm and the TLRSP algorithm, however, need to log their transactions locally so they can be retransmitted to the server later. Both algorithms need to store timestamps and identifiers for each transaction to distinguish them from each other. In [40] Ding et al. describe the log structure needed for the TLRSP algorithm. As we saw in Section 2.4.8, the AccessSet and ReadableSet can be calculated using the ReadSet and WriteSet. Thus we only need to save the ReadSet, WriteSet and ResultSet.

Two-tier replication, on the other hand, requires another structure for saving the transactions. As described by Gray et al., a tentative transaction produces both a tentative version of the data and a base transaction that can be committed on the server [37, p. 180]. Since the Two-tier algorithm works with an acceptance criterion model instead of a transaction queue, each generated transaction must contain all information necessary to commit it on the server. This can be compared to the TLRSP algorithm, where the real transactions that are carried out on the server are generated on that same server using the different sets. Thus TLRSP uses slightly less local storage than the Two-tier algorithm.

3.2.3 Results

We will now present the results of our comparison in Table 3.6. It contains the evaluation of each technique according to each of the requirements listed below.


1. Partial replication support.

2. Conflict detection and handling.

3. Synchronization rate (number of updates per client).

4. Consistency delays.

5. Extra storage size for transactions.

6. Failure rate (number of failed update attempts per client).

7. Manual synchronization support.

Table 3.6: Comparison of techniques

Requirement | Lazy Master    | Eager Master  | Two-Tier          | TLRSP
Req. 1      | Full support   | Full support  | Full support      | Full support
Req. 2      | No need        | No need       | Supported         | Supported
Req. 3      | Relatively low | High          | Relatively low    | Relatively low
Req. 4      | No delay       | Delay         | No delay          | No delay
Req. 5      | Only DB        | Only DB       | DB + transactions | DB + RSet, WSet & ResSet
Req. 6      | Relatively low | No failures   | Relatively low    | Relatively low
Req. 7      | Not supported  | Not supported | Supported         | Supported

As the final choice for our implementation we choose the Transaction-Level Result-Set Propagation scheme, since it fulfills the requirements best. It clearly beats the eager replication scheme in terms of number of updates and consistency delay. Both the Two-tier replication scheme and TLRSP beat the lazy master replication scheme when it comes to number of updates, since they can implement the feature described in the last requirement, which effectively lowers the number of updates through manual synchronization. The reason we choose TLRSP over Two-tier is the extra storage requirement, where TLRSP comes out ahead. Therefore, TLRSP will be the basis for our synchronization algorithm.

Chapter 4 Implementation

In this chapter we will present our implementation of the prototype and an overview of the system architecture. We will present the overall design of the system, but we will also look into the details of some key features. The key features of the prototype, local browser storage and synchronization, have been implemented using IndexedDB and the Transaction-Level Result-Set Propagation replication scheme, as determined in Sections 3.1 and 3.2.

4.1 Overview of functionality

The prototype application has support for saving Confluence pages locally in an IndexedDB database. When a user navigates to a page, the application tries to read it from the local database; if it's not found, it requests the page from the server and stores the response locally. The next time the application wants to read that page, it will be read locally. There is also prefetching functionality: when the client requests a page from the server, it also requests all child pages of that page that are not already stored locally. In addition to this, the prototype makes use of information present in Confluence to also prefetch popular and favorite content for that user. The user can manually refresh unmodified pages that are stored locally. Refreshing manually isn't required, though, as an automatic check is implemented where the client asks the server at a fixed interval whether any relevant changes have been made. To prevent the local database from filling up the client's local disk space, the user has the option to clear the local database. In addition, there is a feature where the user is prompted on startup to remove pages that haven't been viewed or edited since a specified time.
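The read path described above (local first, then server, then cache the response) can be sketched like this. The store and fetch functions are hypothetical stand-ins for the prototype's IndexedDB wrapper and REST call, not its actual API:

```javascript
// Cache-first page lookup (sketch). `localStore` stands in for the IndexedDB
// database and `fetchFromServer` for the REST getPage call.
async function readPage(pageId, localStore, fetchFromServer) {
  const cached = localStore.get(pageId);
  if (cached !== undefined) {
    return { page: cached, source: "local" }; // hit: no network round trip
  }
  const page = await fetchFromServer(pageId); // miss: ask the server
  localStore.set(pageId, page);               // cache for the next read
  return { page, source: "server" };
}
```

In the prototype the server call would also trigger prefetching of child pages, so a single miss can populate the cache with several related pages.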

To edit a page that is stored locally, the user clicks an edit button, changes some of the page content in an editor and finally clicks a save button, which prompts the application to update the entry in the local storage. To push those changes up to the server, the user has

to manually commit them by clicking on a link. The server will answer with a response telling the client whether or not the changes could be made. If there was a conflict, the new version of the page will be sent in the response, which the client saves locally. Using the application's "Resolve Screen", where both the local and the server version are shown in an editor, the user can resolve the conflicts by simply editing, saving and committing again. There is also functionality for creating new pages and moving pages (to new positions or to other parent pages) locally. This is done in a similar manner to editing: the new page or new positions are saved locally, and those changes are not reflected on the server until the user commits them.

The prototype only supports storing Confluence pages. There is no support for other content types such as comments or blog posts.

4.2 Confluence Architecture

The Confluence Developer Documentation states that there are two ways to develop with Confluence: by using the remote API or by developing a plugin [58]. Since we are integrating with and adding functionality to Confluence, our system is implemented as a plugin to Confluence that runs on a single server.

4.3 Implementation Design

There are three sides to the implementation: the client side, the server side and the communication between the two. We have chosen to implement a stateless REST API [52] as our main structure for maintaining the communication, using the standard request methods GET, POST, PUT and DELETE. The prefetch functionality is added to make use of the asynchronous nature of the client. With it, the user can have several relevant resources synchronized and available locally while only having to send one initial request. Since the prototype is developed as a plugin to Confluence, which consists of a Java-based back end, we have developed our back end in Java as well. This includes the database classes as well as the REST API. The front-end code that runs in the browser is written in JavaScript. The three main parts of the system are presented in sections 4.3.1 - 4.3.3.

4.3.1 Server

It is on the server side that our chosen replication algorithm, TLRSP, is implemented. It is implemented as a method in a class called RestLocalManager, which uses a DatabaseManager class to communicate with the database. As described in the specification of the algorithm [40, p. 392], the procedure to determine whether changes made to a data item are permitted is to compare the client's version with the current primary database version. We have chosen to keep track of all the client versions on the server through two database entity classes, DBVersion and ClientVersion. The first entity represents a data item's latest version, which is the version that corresponds to the data item in the primary database, and the second entity represents a client's version of that data item. For example, if User A

and User B both receive an item from the server that has version 1, and User B modifies the item and commits it, then the DBVersion entity for that item will have version 2 and the ClientVersion entities for User A and B will be 1 and 2 respectively. Thus, if User A tries to modify the same item and commit, the versions differ (clientVersion_A ≠ DBVersion_item, i.e. 1 ≠ 2) and a conflict has been detected.
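The version comparison can be expressed directly; the User A / User B example from the text is replayed below (the code is a simplified sketch, not the RestLocalManager implementation):

```javascript
// Conflict detection as in TLRSP: a client's commit conflicts when its stored
// version of a data item differs from the primary database version.
function hasConflict(clientVersion, dbVersion) {
  return clientVersion !== dbVersion;
}

// Both users download the item at version 1.
let dbVersion = 1;
const userA = 1;

// User B commits: DBVersion becomes 2, and User B's ClientVersion follows.
dbVersion = 2;
const userBAfterCommit = 2;
```

User B's follow-up commits are accepted (versions match), while User A's attempt to modify the same item is detected as a conflict.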

Upload algorithm

The upload procedure when a client uploads its changes is written using the previously discussed TLRSP algorithm as a basis. We have chosen to represent each attribute of a page as a data item. To identify each item in a Transaction object we add a string to the ReadSet and/or WriteSet, created from the page's id and the targeted attribute, joined by a colon. By using the TLRSP algorithm for conflict detection and our system for attribute identification, we can easily handle different modification scenarios by filling the ReadSets and WriteSets with different values. Some of these scenarios are presented in Table 4.1, using the example page structure from Figure 4.1. We also refer the reader to Figures 4.2 and 4.3 for the last two scenarios in the table, which deal with moving pages.

Table 4.1: Example of different upload scenarios with examples from Figures 4.1, 4.2 & 4.3.

Modification type                     | ReadSet                                                                                           | WriteSet
Edit content of Page 1                | page1:content                                                                                     | page1:content
Edit title and content of Page 1      | page1:title, page1:content                                                                        | page1:title, page1:content
Create new page (Page 3) at top level | --                                                                                                | page3:title, page3:content, page3:position, ...etc.
Move page (Figure 4.2)                | page11:position, page12:position, page13:position, page14:position, page12:parentId               | page12:position, page13:position, page14:position, page15:position
Move page (Figure 4.3)                | page11:position, page12:position, page13:position, page14:position, page15:position, page12:parentId | page12:position, page13:position, page14:position, page15:position, page12:parentId, page22:position

We create our transactions for the different scenarios by simply putting all the entities that our modification depends on in the ReadSet, and all the entities that have to be updated in the database in the WriteSet. For example, when moving a page we need to make


Page1
    Page11
    Page12
    Page13
    Page14
    Page15
Page2
    Page22

Figure 4.1: Example of page structure where Page 1 and Page 2 have page children.

Before the move:
Page1
    Page11
    Page12
    Page13
    Page14
    Page15
Page2
    Page22

After the move:
Page1
    Page11
    Page13
    Page14
    Page12
    Page15
Page2
    Page22

Figure 4.2: Example of moving a page within the same parent.

Before the move:
Page1
    Page11
    Page12
    Page13
    Page14
    Page15
Page2
    Page22

After the move:
Page1
    Page11
    Page13
    Page14
    Page15
Page2
    Page22
    Page12

Figure 4.3: Example of moving a page to another parent.

sure that all the siblings' positions are unchanged, so we put them in the ReadSet. But we only need to update the positions of the pages that are between the original position and the new position, so only those pages are added to the WriteSet. This way of representing the data, as opposed to only using page ids in the sets, makes the system less vulnerable to conflicts, since only changes to the exact same attribute can spark a conflict.
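Building the `pageId:attribute` identifiers for an edit can be sketched with a small helper. The transaction shape follows the log structure presented in Section 4.3.2, while the helper names are our own illustration:

```javascript
// Identify a data item as "<pageId>:<attribute>", as in Table 4.1.
const item = (pageId, attr) => pageId + ":" + attr;

// Transaction for editing attributes of a page: everything the modification
// depends on goes in the ReadSet, everything to be updated in the WriteSet;
// for a plain edit the two sets coincide.
function editTransaction(pageId, changes /* e.g. { title: "...", content: "..." } */) {
  const keys = Object.keys(changes).map((attr) => item(pageId, attr));
  const resultSet = {};
  for (const attr of Object.keys(changes)) {
    resultSet[item(pageId, attr)] = changes[attr]; // new value per data item
  }
  return { readSet: keys.slice(), writeSet: keys.slice(), resultSet };
}
```

For a move transaction the two sets would differ, as in the last rows of Table 4.1: the ReadSet lists all sibling positions to verify, while the WriteSet lists only the positions that actually change.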

Transactions are uploaded as a queue to the server, and then the AccessSet and the ReadableSet are computed as described by Ding et al. [40]. The AccessSet contains all data items that will be accessed in the database, and the ReadableSet contains each item in the AccessSet that has not been changed on the server since the client downloaded that particular data

item. As a modification to the algorithm, we have added a new set called NotFoundSet, in which we store all the data items in the queue that are not found in the database. With this new set we can determine whether the transaction is trying to create a new page. It is also useful for detecting conflicts where a page has been deleted.

When the upload function has calculated which transactions are valid, it stores the valid transactions in the CommitSet and the invalid ones in the CancelSet. It then executes the changes in the CommitSet and sends the CancelSet back to the client in a special response, which is further explored in Section 4.3.2.
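The server-side validation step can be sketched as follows. The set names follow Ding et al. [40] and the NotFoundSet extension above, while the data structures (plain maps of item versions) are simplified assumptions:

```javascript
// Sketch of the upload validation: for each transaction in the queue, compare
// the client's recorded version of every accessed item against the primary
// database. A transaction whose accessed items are all unchanged (i.e. in the
// ReadableSet) is committed; otherwise it is cancelled. Items missing from the
// database go into the NotFoundSet (new page, or deleted on the server).
function validateQueue(queue, dbVersions, clientVersions) {
  const commitSet = [];
  const cancelSet = [];
  const notFoundSet = new Set();

  for (const tx of queue) {
    const accessSet = new Set([...tx.readSet, ...tx.writeSet]);
    let readable = true;
    for (const it of accessSet) {
      if (!(it in dbVersions)) {
        notFoundSet.add(it);
        continue;
      }
      if (dbVersions[it] !== clientVersions[it]) readable = false; // changed since download
    }
    (readable ? commitSet : cancelSet).push(tx.transactionId);
  }
  return { commitSet, cancelSet, notFoundSet };
}
```

A transaction touching only unknown items ends up in the CommitSet with its items in the NotFoundSet, which is how a page creation is recognized.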

4.3.2 REST API

The REST API is used as the bridge of communication between the clients and the server. By implementing a set of REST resources [53] we can make authenticated calls to the server, requesting and uploading pages. A list of all resources implemented in the prototype is presented in Table 4.2.

Table 4.2: List of implemented REST resources

Name                  Type    Parameters        Response
getPage               GET     Page id           Page response in JSON format.
getPages              GET     List of page ids  List of page responses in JSON format.
getPreFetch           GET     Page id           List of page ids.
upload                POST    TransactionQueue  Upload response in JSON format.
getOutdatedPageIds    GET     None              List of page ids.
deleteClientVersions  DELETE  None              Empty.

One representation of interest is the Transaction representation in JSON format that is sent to the server, in the form of a list, when uploading takes place. It is presented below:

transaction = {
    transactionId: integer,
    readSet: [string],
    writeSet: [string],
    resultSet: map
}
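As a concrete instance of this schema, a hypothetical transaction recording an edit of the content of page 42 could look like this (the values are invented for illustration):

```javascript
const editTransaction = {
  transactionId: 7,
  readSet: ['42.content'],    // the attribute read before editing
  writeSet: ['42.content'],   // the attribute being changed
  resultSet: { '42.content': '<p>Updated page body</p>' }  // the new value
};
```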

As we can see, it follows the same log structure as laid out by Ding et al. [40]. The resultSet is simply a hash map mapping data items to new values. There are also two main response types represented in JSON format, namely Page Response and Upload Response. The page response looks like this:

page = {
    id: integer,
    title: string,
    content: string,
    position: integer,
    parentId: integer,
    children: [string],
    spaceKey: string,
    isRoot: bool
}

and the upload response looks like the following:

uploadResponse = {
    clientId: string,
    cancelSet: [integer],
    serversLatestPages: [page],
    mergedPages: [page],
    newIds: map,
    miscErrorList: [string]
}

The page response is a complete copy of the server's page entity and contains everything that is needed to present the page in the browser. The upload response, however, is parsed on the client side, where different actions take place. The cancelSet contains the ids of all transactions that were not accepted. Since the user will want to resolve the transactions that did not pass, the latest versions of the pages present in the WriteSets of those transactions are found in the set called serversLatestPages. If a transaction did not pass due to a conflict, it is still possible for Confluence to merge the two pages, which the system will attempt to do. If such a merge was successful, the set mergedPages contains all the pages that were merged, so the client can update its local database with those pages. Finally, the set newIds is used to deal with the case when a new page has been created. Since page ids are generated on the server when a new page is created, the client has to receive the generated ids for those pages; these ids are stored in the newIds set, mapped to the old client-generated ids.
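The client-side handling of these fields can be sketched as a pure function over the parsed response. The function and `store` method names are our own assumptions; the persistence layer is abstracted away so only the response logic is shown:

```javascript
function handleUploadResponse(resp, store) {
  // Transactions in the cancelSet did not pass; keep the server's latest
  // versions of the affected pages so the user can resolve them locally.
  for (const page of resp.serversLatestPages) store.saveConflict(page.id, page);
  // Pages Confluence managed to merge automatically replace the local copies.
  for (const page of resp.mergedPages) store.savePage(page);
  // New pages get their server-generated ids in place of client-generated ones.
  for (const [oldId, newId] of Object.entries(resp.newIds)) store.rewriteId(oldId, newId);
  // Report whether every uploaded transaction was accepted.
  return resp.cancelSet.length === 0;
}
```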

4.3.3 Client

The client side of the implementation is responsible for maintaining a Single-page application, i.e. a web application contained entirely on one page, representing the visible part of the prototype presented to the user. From here the user can access the key features and functionality of the prototype. As previously stated in Section 4.3, the client-side implementation is done entirely in JavaScript. In order to facilitate the use of the REST-API, as well as provide some structure for the client-side components, the implementation uses Backbone [54].


Backbone is a lightweight JavaScript library that, in a way, follows the Model-View-Controller (MVC) design pattern. We say in a way because, as the people behind Backbone mention, different definitions tend to disagree about how a controller is defined [55]. Backbone provides models with key-value bindings, collections of models, and views with event handling, and ties it all together with REST. The view can be seen as part of the controller, handling events from the UI and rendering an HTML template that acts as the true view. As such, the view is often tied to a model, something that deviates from the traditional MVC pattern.

The client-side structure contains a number of packages. The bundled Backbone library is included in the js.backbone package, and the templates used for rendering HTML are contained in the templates.pages.soy package. The package named js.editor contains the files necessary for a simple WYSIWYG (What You See Is What You Get) editor. For this purpose we have chosen a lightweight, easy-to-use editor named NicEdit [56]. The reason for this is that the editor bundled with Confluence is quite advanced, uses several external resources, and would hinder our goal of reducing the number of server requests. Finally, the package js.page is the main client-side package, containing the Backbone components for page resources. Together with the file local.indexedDB.js they represent the bulk of the client-side implementation and are described in more detail below.

Backbone Components

The MVC pattern is maintained by three components named controller.js, view.js and models.js. The controller.js file is mostly used to initialize components: it listens for the local database being opened, binds click events, and creates views and navigation models. The file containing all views, view.js, is also responsible for the event handling of those views, by creating and calling models, which in turn are populated from the database or the server and then used to render the view.

The models are found in the file models.js. There are models for each component represented in the local database, namely Pages, Transactions and Conflicts. These are not accessed directly, however; instead their Collection counterparts are used: PageCollection, ConflictCollection and TransactionCollection. These act as managers, retrieving models from the local database or, if not present, fetching the model from the server and creating a local representation. The majority of the communication with both the server and the local database is done from the models.

local.indexedDB.js

The file local.indexedDB.js contains the implementation for the local database. It defines a global namespace representing the local indexed database, implements some basic database operations as private functions, and exposes public functions for specific operations. In order to access and store the different components efficiently we have implemented a number of object stores and indexes, as described in Section 2.3.5. They can be seen in Figure 4.4 and are described as follows:

Conflicts: This object store holds any conflicts that were detected and sent back from the server in the REST response uploadResponse. The id of the conflicting resource is stored with the server's version of the resource as its value. This way the conflicting page can be accessed from the page store; if the id is present in the conflict store there is a conflict, and the server's version can be retrieved when resolving it locally.

Pages: The Pages object store is responsible for storing page resources as described in Section 2.1.2. Pages are stored with their page id as key and our representation of a page resource as value. A preview of a page resource can be seen in Figure 4.4. Besides the page id, the object store has two additional indexes, namely spaceKey and modified. The spaceKey index is used to fetch pages depending on which space they belong to, and the modified index to fetch pages depending on when they were last viewed or modified.

Transactions: The final object store, Transactions, is used to store the transaction representations described in Sections 4.3.1 and 4.3.2. These are not to be confused with the internal transactions used by IndexedDB. The transaction JSON representation presented in Section 4.3.2 also has an additional attribute and index, used internally, called contentId, representing the id of the main resource affected by the transaction.

Figure 4.4: An overview of the local indexed database presented in Chrome DevTools [57] with object stores and corresponding indexes.
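As a sketch of how these object stores and indexes could be created with the IndexedDB API, the schema setup might look as follows. The database name and version number are our assumptions; the store and index names follow the descriptions above:

```javascript
// Runs inside onupgradeneeded, where schema changes are allowed.
function createSchema(db) {
  // Conflicts uses the conflicting page id as an out-of-line key.
  db.createObjectStore('Conflicts');
  const pages = db.createObjectStore('Pages', { keyPath: 'id' });
  pages.createIndex('spaceKey', 'spaceKey', { unique: false });
  pages.createIndex('modified', 'modified', { unique: false });
  const txs = db.createObjectStore('Transactions', { keyPath: 'transactionId' });
  txs.createIndex('contentId', 'contentId', { unique: false });
}

function openLocalDatabase(onReady) {
  const request = indexedDB.open('confluencePrototype', 1);
  request.onupgradeneeded = event => createSchema(event.target.result);
  request.onsuccess = event => onReady(event.target.result);
}
```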

Due to the fact that browser implementations of IndexedDB still differ in some areas, we implemented a namespace to act as a wrapper. This also allows some logic to be implemented in the wrapper's public functions. For example, when a function called createPage is called, it will internally create a new addPage-type transaction that is stored along with the page.
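The wrapper logic just described could be sketched as follows; the `store` argument abstracts the underlying IndexedDB calls, and all names here are our own assumptions rather than the prototype's actual code:

```javascript
let nextTransactionId = 1;

// Creating a page both persists it locally and queues an addPage-type
// transaction so the change can later be uploaded to the server.
function createPage(store, page) {
  store.putPage(page);
  store.putTransaction({
    transactionId: nextTransactionId++,
    type: 'addPage',
    contentId: page.id,          // the main resource affected
    readSet: [],                 // a brand-new page reads nothing
    writeSet: [page.id],
    resultSet: { [page.id]: page }
  });
}
```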

Chapter 5

Result

In order to compare the implementation against the original instance, and to test the desired functionality, several use cases are defined. These, along with the results from conducting them, are presented in this chapter.

5.1 Use cases

To test whether our initial assumptions about the prototype, described in Section 1.1, are true, we have defined a few use cases. In the first two cases we measure the response times of requests made by the client that directly relate to the content. This includes e.g. fetching the HTML document, getting the page content, or uploading content when saving. We have not included any requests that do not relate to the page data, such as CSS files, navigation menus or other similar features that are beyond the scope of this project. Additionally, we have measured the time it takes for requests made to the local database to complete and included them in the total response times presented. We have also measured the size of the responses generated by these requests to get an idea of the traffic on the network.

5.1.1 Use case 1

Brief Description

In this use case we want to show that response times are lower when reading data using our prototype compared to using Confluence without the prototype. It describes the case when a user navigates to several different pages for the first time after starting the application.

Applications tested

1. Confluence with our prototype

2. Confluence without our prototype


Preconditions

The user is logged in and has no pages stored locally.

Measurements

We measure the response times of all requests made, the total number of requests, and the total size of responses.

Basic Flow of Events

1. The user starts the application with a chosen space.

2. The user navigates to another top-level page using the page tree.

3. The user navigates sequentially to five of the child pages within the current top-level page.

4. The user returns to the space home page.

Post-condition

Each page visited, as well as its children, has been saved locally. In addition, each top-level page has been saved.

5.1.2 Use case 2

Brief Description

In this use case we want to show that response times are lower when writing data using our prototype compared to using Confluence without the prototype. It describes when a user edits multiple pages and then commits these changes to the server.

Applications tested

1. Confluence with our prototype

2. Confluence without our prototype

Preconditions

The user is logged in and is using the application. When running with the prototype, the edited pages are stored locally beforehand.

Measurements

We measure the response times of all requests made, the total number of requests, and the total size of responses.

Basic Flow of Events

1. The user navigates to a page she wants to edit.

2. The user presses the edit button located on the page.

3. In the editor presented the user edits the page content.


4. The user clicks the save button.

5. The user repeats steps 1-4 for another page.

6. If the user is using the prototype, she clicks the commit all-button.

Post-condition

The changes made by the user are reflected locally and on the server.

5.1.3 Use case 3

Brief Description

In this use case we want to show that the prototype can store a sufficient amount of resources. It describes when a user navigates for an extended period of time, reading and storing a large number of pages. This use case simulates a long session using the application, as well as changing the space environment while still using the application.

Applications tested

1. Confluence with our prototype

Preconditions

The user is logged in and has no pages stored locally.

Measurements

We measure the number of pages stored locally and their total size.

Basic Flow of Events

1. The user starts the application.

2. The user navigates to all pages found in the space one by one.

3. The user expands the space list and navigates to a new space.

4. Steps 2-3 are repeated until at least 200 pages have been visited and stored locally.

5. The user logs out and closes the browser.

6. The user opens the browser and logs in.

7. The user starts the application.

Post-condition

The pages navigated to are still present locally and available for viewing.

5.1.4 Use case 4

Brief Description

In this use case we want to show that the prototype can detect and handle conflicts. It describes when two users try to update the same content and one user receives a merge conflict.


Applications tested

1. Confluence with our prototype

Preconditions

Two users, User1 and User2, are logged in and neither of them has pages stored locally.

Measurements

We study the response and local storage of User2 to verify that the conflict was handled correctly.

Basic Flow of Events

1. User1 starts the application with a chosen space.

2. User2 starts the application with the same space.

3. User1 edits the content of a top level page and saves.

4. User2 edits the content of the same page and saves.

5. User1 clicks the commit all-button.

6. User2 clicks the commit all-button.

7. User2 receives a message informing that there was a conflict.

Post-condition

The changes made by User1 are reflected locally for User1 and on the server. The changes made by User2 are only reflected locally. The local database of User2 should also contain a record of the conflict.

5.2 Result of use cases

When running the first two use cases we used three different connection types: a standard wired connection, a WiFi connection and a 3G connection. The results are presented in Tables 5.1-5.3. Use case 3 resulted in a local database containing 200 pages with a total size of 1.78 MB. Finally, in use case 4 the server's response to User2 contained information about the conflict, and we could also see the conflict stored in the local database, thus confirming the application's ability to handle conflicts.


Table 5.1: Result of use case 1 and 2 using a wired connection.

Measurement               Number of Requests  Response Times  Size
Use case 1 (Prototype)    11                  1343 ms         70.5 kB
Use case 1 (Confluence)   24                  2285 ms         502 kB
Use case 2 (Prototype)    1                   96 ms           0.28 kB
Use case 2 (Confluence)   14                  1194 ms         412 kB

Table 5.2: Result of use case 1 and 2 using a WiFi connection.

Measurement               Number of Requests  Response Times  Size
Use case 1 (Prototype)    11                  2425 ms         70.5 kB
Use case 1 (Confluence)   24                  2868 ms         502 kB
Use case 2 (Prototype)    1                   169 ms          0.28 kB
Use case 2 (Confluence)   14                  1595 ms         412 kB

Table 5.3: Result of use case 1 and 2 using a 3G connection.

Measurement               Number of Requests  Response Times  Size
Use case 1 (Prototype)    11                  5264 ms         70.5 kB
Use case 1 (Confluence)   24                  18828 ms        502 kB
Use case 2 (Prototype)    1                   966 ms          0.28 kB
Use case 2 (Confluence)   14                  9610 ms         412 kB


Chapter 6

Conclusion & Future work

When starting the project we set up a few goals that we thought could be achieved in a team collaboration web application by storing data locally in the browser. These goals, which can be viewed in Section 1.1, were based on optimization. In this project we have tried to find the optimization technologies best suited to such an application when it comes to browser storage and database replication algorithms. We have studied the three major local storage technologies, Web Storage, WebSQL and IndexedDB, and four major database replication schemes: Lazy Master Replication, Eager Master Replication, Two-Tier Replication and Transaction-Level Result-Set Propagation. By setting up different criteria based on optimization for the two technologies, we have chosen one of each and implemented a prototype based on them. The prototype is developed as a plugin to the team collaboration software Confluence. Finally, we have measured response times and network traffic in Confluence with and without the plugin installed, and compared the results against the goals we initially set up.

With the help of our use cases we feel that we have shown that a certain level of optimization can be achieved by storing data locally in the browser, particularly in the case of team collaboration software. We feel that the prototype succeeds in providing this optimization while still maintaining the core functionality and characteristics of the original software. Looking at the results of the use cases, we can see that the gap in response times between standard Confluence and our prototype is smaller in the initialization phase, i.e. when a user uses the application for the first time. This is expected, since all the data must first be downloaded before the prototype can read it locally. The gap thus widens the longer the prototype is used. In use case 2, when pages are already stored locally, the difference in response times is much greater in favor of our prototype.

Looking at the results from the first two use cases, we can see that it is when we test with a 3G connection that we start to see the real performance gain. On slow networks, the high number of requests makes the application perform slowly. On a wired connection, and to some extent on a WiFi connection, there is still a significant difference in response times, but the times are fairly small in absolute terms, and one could therefore question the need for an optimization in the case of fast networks.

However, there is of course the problem of conflicts, which needs careful consideration. As shown in our database replication tests, the number of failures in asynchronous solutions is higher when the time between synchronizations is longer. The prototype could not work without some form of support for conflict detection and conflict handling, which is why we defined use case 4, which tests the conflict detection process. Of course, if a very large number of users modified the same data frequently, the prototype would be very frustrating to use. But we argue that the same case would also be troublesome using Confluence without our prototype, since there would also be conflicts there.

Storage capabilities and storage-efficient replication algorithms became less of an issue since we chose IndexedDB for the local storage, as opposed to, for example, Web Storage, which can only store 5 to 10 MB depending on the browser. However, the chosen replication algorithm, TLRSP, proved to have other properties that were useful for the implementation. The log structure made it possible to implement different strategies for handling different modification scenarios, which proved highly useful. The implemented scenarios can be viewed in Table 4.1.

Another practical problem with saving data in the browser is that the user could delete the data by hand, without the knowledge of either the server or the application. If the user has uncommitted transactions saved, they would all be lost, since they only exist locally. It would have to be communicated clearly that the user should not delete their local data. Furthermore, the server only identifies the user by their login information and does not know which browser the user is using. If the user were to switch browser or computer, the server would assume that the user has certain data stored locally that is in reality stored in some other browser and/or computer. A solution to that problem, for example sending information about the browser and computer to the server, would have to be implemented for the prototype to work better.

6.1 Future Work

Even though the prototype implemented in this thesis has been shown to fulfill the goals specified, there is always room for improvement. The local database could be extended to support additional resources, such as those described in Section 2.1. Support for images and files could be added by combining the database with the FileSystem API. In addition, the IndexedDB wrapper implemented could be combined with other storage solutions, such as Web Storage or WebSQL, in order to extend the browser support.

The next logical step would be to extend and convert the prototype into a complete offline web application, making it completely available offline. This would require the introduction of the Offline Web Application API, as well as some rework of the database replication algorithm and of the client.

Bibliography

[1] "Global team collaboration software, & audio, video, web conferencing solutions market (2009-2015)," ASDReports [Online] Available: http://www.asdreports.com/shopexd.asp?id=15283 [Downloaded: 15 May, 2013]

[2] "Create content in a fast, rich editor," Atlassian [Online] Available: http://www.atlassian.com/software/confluence/overview/the-highlight-reel [Downloaded: 15 May, 2013]

[3] "SharePoint 2013 Overview," Microsoft [Online] Available: http://office.microsoft.com/en-us/sharepoint/sharepoint-2013-overview-collaboration-software-features-FX103789323.aspx [Downloaded: 15 May, 2013]

[4] "Atlassian customers - enterprise and Starter | Atlassian," Atlassian [Online] Available: http://www.atlassian.com/company/customers/customer-list/?tab=confluence [Downloaded: 29 May, 2013]

[5] "TWiki Main Page," TWiki [Online] Available: http://twiki.org/ [Downloaded: 15 May, 2013]

[6] "Resolving edit conflicts in a SharePoint workspace," Microsoft SharePoint Development Blog [Online] Available: http://blogs.msdn.com/b/sharepoint_workspace_development_team/archive/2010/06/21/resolving-edit-conflicts-in-a-sharepoint-workspace.aspx [Downloaded: 15 May, 2013]

[7] "SharePoint Workspace 2010 overview," Microsoft TechNet [Online] Available: http://technet.microsoft.com/en-us/library/ee649102(v=office.14).aspx [Downloaded: 15 May, 2013]

[8] "Getting Started," Atlassian Developers [Online] Available: https://developer.atlassian.com/display/DOCS/Getting+Started [Downloaded: 15 May, 2013]


[9] P. Lubbers, B. Albers and F. Salim, Pro HTML5 Programming, 2nd ed., New York: APress, 2011

[10] Z. Kessin, Programming HTML5 Applications, 1st ed., Sebastopol: O'Reilly, 2011

[11] C. Hudson, T. Leadbetter, HTML5 Developer's Cookbook, Boston: Addison Wesley, 2011

[12] "HTML standard," WHATWG [Online] Available: http://www.whatwg.org/specs/web-apps/current-work/multipage/introduction.html#history-1 [Downloaded: 18 February, 2013]

[13] "HTML5 - Smile, it's a Snapshot!," W3C Blog [Online] Available: http://www.w3.org/QA/2012/12/html5_smile_its_a_snapshot.html [Downloaded: 18 February, 2013]

[14] "World Wide Web Consortium Process Document," W3C [Online] Available: http://www.w3.org/2005/10/Process-20051014/ [Downloaded: 18 February, 2013]

[15] "Performance - HTML5 Rocks," HTML5 Rocks [Online] Available: http://www.html5rocks.com/en/features/performance [Downloaded: 20 May, 2013]

[16] "Gears API Blog," Google Gears [Online] Available: http://gearsblog.blogspot.se/2011/03/stopping-gears.html [Downloaded: 20 May, 2013]

[17] "Gears - Google project hosting," Google Gears [Online] Available: https://code.google.com/p/gears/ [Downloaded: 20 May, 2013]

[18] M. Pilgrim, "Local Storage - Dive into HTML5," Dive into HTML5 [Online] Available: http://diveintohtml5.info/storage.html [Downloaded: 20 May, 2013]

[19] M. Mahemoff, "Client-Side Storage," HTML5 Rocks [Online] Available: http://www.html5rocks.com/en/tutorials/offline/storage/ [Downloaded: 19 February, 2013]

[20] "Same Origin Policy - Web Security," W3C Wiki [Online] Available: http://www.w3.org/Security/wiki/Same_Origin_Policy [Downloaded: 19 February, 2013]

[21] M. Zalewski, Browser Security Handbook, Google, 2008. [E-book] Available: http://code.google.com/p/browsersec/wiki/Main [Downloaded: 19 February, 2013]

[22] I. Hickson, "Web Storage," W3C Specification [Online] Available: http://www.w3.org/TR/webstorage/ [Downloaded: 20 February, 2013]

[23] I. Hickson, "Web SQL Database," W3C Specification [Online] Available: http://www.w3.org/TR/webdatabase/ [Downloaded: 20 February, 2013]


[24] "DOM Storage - Document Object Model (DOM) | MDN," Mozilla Developer Network [Online] Available: https://developer.mozilla.org/en-US/docs/DOM/Storage [Downloaded: 22 February, 2013]

[25] I. Hickson, D. Hyatt, "HTML5 W3C Working Draft 12 February 2009," W3C Specification [Online] Available: http://www.w3.org/TR/2009/WD-html5-20090212/ [Downloaded: 22 February, 2013]

[26] I. Hickson, "Web Storage W3C Working Draft 23 April 2009," W3C Specification [Online] Available: http://www.w3.org/TR/2009/WD-webstorage-20090423/ [Downloaded: 22 February, 2013]

[27] N. Mehta, J. Sicking, E. Graff, A. Popescu and J. Orlow, "Indexed Database API," W3C Specification [Online] Available: http://www.w3.org/TR/IndexedDB/ [Downloaded: 5 March, 2013]

[28] A. Ranganathan, "Beyond HTML5: Database APIs and the Road to IndexedDB," Mozilla Hacks [Online] Available: http://hacks.mozilla.org/2010/06/beyond-html5-database-apis-and-the-road-to-indexeddb/ [Downloaded: 26 May, 2013]

[29] "Offline Web applications - HTML Standard," WHATWG [Online] Available: http://www.whatwg.org/specs/web-apps/current-work/multipage/offline.html [Downloaded: 26 May, 2013]

[30] N. Mehta, "WebSimpleDB API," W3C Specification [Online] Available: http://www.w3.org/TR/2009/WD-WebSimpleDB-20090929/ [Downloaded: 27 May, 2013]

[31] A. Ranganathan, J. Sicking, "File API," W3C Specification [Online] Available: http://dev.w3.org/2006/webapi/FileAPI/ [Downloaded: 28 May, 2013]

[32] E. Uhrhane, "File API: Directories and Systems," W3C Specification [Online] Available: http://www.w3.org/TR/file-system-api/ [Downloaded: 28 May, 2013]

[33] K. Yasuda, "Quota Management API," W3C Specification [Online] Available: http://www.w3.org/TR/quota-api/ [Downloaded: 29 May, 2013]

[34] B. Kemme, R. Jimenez-Peris and M. Patiño-Martínez, Database Replication, Guernsey: Morgan & Claypool, 2010

[35] Y. D. Serrano, M. Patiño-Martínez, R. Jimenez-Peris and B. Kemme, Boosting Database Replication Scalability through Partial Replication and 1-Copy-Snapshot-Isolation, in Proceedings of the 13th Pacific Rim International Symposium on Dependable Computing (PRDC '07), p. 290-297, 2007

[36] H. Garcia-Molina, J. D. Ullman and J. Widom, Database Systems, 2nd ed., Upper Saddle River, N.J: Pearson Prentice Hall, 2009


[37] J. Gray, P. Helland, P. O'Neil and D. Shasha, The Dangers of Replication and a Solution, in Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, p. 173-182, 1996 [Online] Available: Microsoft Research, http://research.microsoft.com [Downloaded: 18 February, 2013]

[38] M. Wiesmann, F. Pedone, A. Schiper, B. Kemme and G. Alonso, Understanding replication in databases and distributed systems, in Proceedings of the 20th International Conference on Distributed Computing Systems (ICDCS 2000), p. 464-474, 2000 [Online] Available: CiteSeerX, http://citeseerx.ist.psu.edu [Downloaded: 19 February, 2013]

[39] E. Pacitti, P. Minet and E. Simon, Fast Algorithm for Maintaining Replica Consistency in Lazy Master Replicated Databases, in Proceedings of the 25th International Conference on Very Large Data Bases (VLDB '99), p. 126-137 [Online] Available: HAL Inria, http://hal.inria.fr [Downloaded: 19 February, 2013]

[40] Z. Ding, X. Meng and S. Wang, A Transactional Asynchronous Replication Scheme for Mobile Database Systems, Journal of Computer Science and Technology, Volume 17, Issue 4, July 2002, p. 389-396 [Online] Available: Journal of Computer Science and Technology, http://jcst.ict.ac.cn [Downloaded: 21 February, 2013]

[41] R. Baldoni, C. Marchetti and S. Tucci-Piergiovanni, Asynchronous active replication in three-tier distributed systems, in Proceedings of the 2002 Pacific Rim International Symposium on Dependable Computing, p. 19-26 [Online] Available: Università di Roma, http://www.dis.uniroma1.it [Downloaded: 20 February, 2013]

[42] "StatCounter Global Stats," StatCounter [Online] Available: http://gs.statcounter.com/ [Downloaded: 1 March, 2013]

[43] "jsPerf: JavaScript performance playground," jsPerf [Online] Available: http://jsperf.com/ [Downloaded: 1 March, 2013]

[44] Z. Wang, "Navigation Timing," W3C Specification [Online] Available: http://www.w3.org/TR/navigation-timing/ [Downloaded: 5 March, 2013]

[45] "Navigation Timing API," GitHub Documentation [Online] Available: http://kaaes.github.com/timing/ [Downloaded: 5 March, 2013]

[46] "Can I use... Support tables for HTML5, CSS3, etc," caniuse [Online] Available: http://caniuse.com/#search=stor [Downloaded: 18 March, 2013]

[47] "Test of localStorage limits/quota," arty.name [Online] Available: http://arty.name/localstorage.html [Downloaded: 18 March, 2013]

[48] "Web Storage Support Test," dev-test.nemikor [Online] Available: http://dev-test.nemikor.com/web-storage/support-test/ [Downloaded: 18 March, 2013]


[49] "Introduction to Web Storage (Internet Explorer)," Dev Center - Windows Store apps [Online] Available: http://msdn.microsoft.com/en-us/library/windows/apps/cc197062.aspx [Downloaded: 19 March, 2013]

[50] R. Yang, "Using HTML5/Javascript in Windows Store apps: Data access and storage mechanism (III)," MSDN Blogs [Online] Available: http://blogs.msdn.com/b/win8devsupport/archive/2013/01/10/using-html5--in-windows-store-apps-data-access-and-storage-mechanism-iii.aspx [Downloaded: 19 March, 2013]

[51] "About:config entries," MozillaZine Knowledge Base [Online] Available: http://kb.mozillazine.org/Firefox_:_FAQs_:_About:config_Entries [Downloaded: 19 March, 2013]

[52] R.T. Fielding, "Chapter 5 - Representational State Transfer (REST)," Architectural Styles and the Design of Network-based Software Architectures [Online] Available: http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm [Downloaded: 14 May, 2013]

[53] "Resources," Read the Docs [Online] Available: https://restful-api-design.readthedocs.org/en/latest/resources.html [Downloaded: 15 May, 2013]

[54] "Backbone.js," Backbone [Online] Available: http://backbonejs.org/ [Downloaded: 17 May, 2013]

[55] "Backbone.js F.A.Q.," Backbone [Online] Available: http://backbonejs.org/#FAQ-mvc [Downloaded: 17 May, 2013]

[56] "NicEdit - WYSIWYG Content Editor, Inline Rich Text Application," NicEdit [Online] Available: http://nicedit.com/index.php [Downloaded: 17 May, 2013]

[57] "Chrome DevTools - Google Developers," Google Developers [Online] Available: https://developers.google.com/chrome-developer-tools/ [Downloaded: 17 May, 2013]

[58] "Confluence Developer Documentation," Atlassian Developers [Online] Available: https://developer.atlassian.com/display/CONFDEV/Confluence+Developer+Documentation [Downloaded: 20 May, 2013]


Appendices


Appendix A

Test resources

Here follows a list of resources used when conducting the client-side storage tests.

A.1 Reading values of increasing size

• Web Storage: http://jsperf.com/localstorage-reading/7

• Web SQL Database: http://jsperf.com/client-side-reading

• Indexed Database: http://jsperf.com/client-side-reading-indexeddb

A.2 Writing values with increasing size

Test containing all three APIs: http://jsperf.com/client-side-writing

A.3 Fetching entries from storage

• Web Storage: http://jsperf.com/testing-key-lookup-in-localstorage

• Web SQL Database: http://jsperf.com/reading-writing-in-large-web-sql-database/3 http://jsperf.com/reading-writing-in-large-web-sql-database/4

• Indexed Database: http://jsperf.com/client-side-reading-indexeddb/ 2


A.4 Loading a page with storage of increasing size

Three separate files were implemented to test the storage abilities of localStorage, WebSQL and IndexedDB respectively. An instance of each can be found below in Figures A.1 to A.3. Each page contains a link to another page on the same domain, in order to measure load times both initially and after the storage has been loaded.

Figure A.1: A running instance of localstorageLoadPage.html. It shows the time it took to load the page, and contains options for adding a specified number of new entries to the storage. For large entries it can also save data from a specified file.

Figure A.2: An instance of the file WebSQLLoadPage.html. It will measure the time the page took to load and contains options for clearing and storing a specified amount of entries.


Figure A.3: The file indexedDB.html contains a textarea and the option to add that content 1 or 1000 times. The status area will display load times and other information.
