Masaryk University Faculty of Informatics

Declarative event handling system for monitoring web browser activity

Bachelor’s Thesis

Boris Petrenko

Brno, Fall 2017

This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author is located in the printed version of the document.

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Boris Petrenko

Advisor: prof. RNDr. Jiří Barnat Ph.D.


Acknowledgement

I would like to thank Professor Barnat for guiding me through the process of creating this thesis and for his valuable constructive criticism. I would also like to thank my friends who gave me feedback on early versions of the work.

Abstract

The goal of this work is to design, develop and evaluate an open source system for monitoring the web browser activity of users. The system monitors how users interact with a website and asynchronously sends the activity data to an endpoint where it is collected. User actions are captured via native browser event handlers, processed client-side using RxJS and then sent to the server to be stored or processed further.

Keywords: web, event, activity, tracking, monitoring, rxjs


Contents

1 Introduction

2 Description of web applications
  2.1 Languages
  2.2 Event listeners

3 Existing technologies
  3.1 Google Analytics
  3.2 In-house implementation

4 Implementation
  4.1 Technologies used
    4.1.1 JavaScript
    4.1.2 JSON
    4.1.3 ES6
    4.1.4 Node
    4.1.5 RxJS
    4.1.6 Tape
    4.1.7 Flow
    4.1.8 Babel
    4.1.9 Gulp
    4.1.10 ESLint
    4.1.11 Prettier
  4.2 Requirements
  4.3 Library design
    4.3.1 High-level overview
    4.3.2 External API
    4.3.3 Internal API
    4.3.4 Build process
    4.3.5 Release

5 Usage and evaluation
  5.1 Setup
    5.1.1 Installation
    5.1.2 Mounting
    5.1.3 Dynamic data
  5.2 Example use case
  5.3 Performance implications
    5.3.1 Client
    5.3.2 Server

6 Conclusion

Bibliography

1 Introduction

Monitoring user activity has always been a lucrative domain for any website owner or maintainer looking to improve their profit margins, customer satisfaction and rankings. Information about what the user does on the website can assist in shaping the future of the business, aid in important high-level decisions or help website maintainers find subtle errors.

Tracking user activity simply refers to gathering data about what the user does on a website: what buttons they click, how they navigate the website and what content they tend to look at. This information is then processed and turned into statistics which can be studied further and serve as a basis for future product decisions.

For a business owner, two of the most important factors in maintaining a successful enterprise with a website are making users stay on the site and providing them with the service they expect. Decreasing the percentage of users who leave the website after viewing just a single page, called the bounce rate, while increasing the ratio of successful transactions to the number of visitors, known as the conversion rate, is a top priority. Any new feature, change or technical issue noticeably impacts both of these metrics.

Bounce rate is determined simply by comparing the number of users who did a page transition to the number of users who did not. However, the important part is determining the reason why a person left the website prematurely. It can be confusing page navigation, a broken transition button or simply bad design. Finding the cause can be as simple as finding a bug in a button, or as complex as having a machine learn user patterns and determine the confusing parts of the website. What both cases have in common is the need for a good set of user data that can be studied over time.

Conversion rate often goes hand in hand with the inverse of bounce rate. As more users tend to stay on the website, the number of users who purchase a service naturally increases as well. In the case of conversion we care about the whole flow, not only about keeping users on a single page, so the spectrum of user data is larger than when determining the cause of an increased bounce rate. Little changes and subtle bugs can also have a much higher impact on conversion rate, because, when it comes to purchasing services, users are much more likely to have trust issues than when simply navigating the website.

Both of these metrics are relevant to businesses whose websites focus on selling a product. But how is gathering user data important for free services, blogs or non-profit organisations, where the conversion rate is practically non-existent due to the nature of the website and the bounce rate is irrelevant, because it does not matter whether a person comes to a blog to read one post at a time or reads several in one session? One interesting use case is simply compiling statistics of user behaviour: creating heat maps of mouse navigation, tracking clicks on different parts of the website, or analysing which content different kinds of people tend to look at. All of this data can, but does not have to be, monetised through advertising companies, or it can simply be interesting to publish.

Now that we have looked into what user tracking is and why it is interesting, let us consider what kinds of data we can actually get. Information gathered from tracking user events is very diverse, ranging from simple clicks to complex heat maps and graphs of mouse navigation patterns. Often the most important pieces of information concern clicks on and navigation around a specific piece of content. It is also vital to associate each user with the kind of content they showed interest in. This enables us to study what kind of content different gender and age groups tend to appreciate. We can then evaluate this information against our target audience, giving us a foundation for the potential services we can offer. For example, suppose we are selling plane tickets. If we know that most of our clients are between 20 and 30 years old, and that large cities are the most popular destinations among this age group, it makes sense to focus on marketing large cities to attract potential customers.

Mouse navigation patterns serve as a useful metric of user experience quality. If we find out, for example, that many users navigate confusedly between a set of website elements and try to click on them, we can adjust the design so that the elements do not look interactive at first sight. On the other hand, if users ignore an important part of the website, it is a good sign that we should make that section more visible.


The implemented program is called creepx. Its source code is located on GitHub¹, a place for open source projects using Git, a popular version control system. The library is installable via NPM² and can be used in any JavaScript project.

In the first part of the thesis we will look at an existing implementation of the most popular user activity monitoring tool, its history and use cases. Then we will compare it to a custom solution used at Kiwi.com and discuss the advantages and disadvantages of both. We will cover the motivation behind introducing a new custom implementation, what problems it solves and the possibilities it unlocks.

In the second part we will look in more depth at our custom implementation: the technologies that were used, why they were used and how they compare to similar solutions. We will cover the technical implications for the consumers of the service on both the client and the server side. We will look at the different design decisions, the problems that appeared and how they were solved.

Lastly, the third part of the thesis will cover the usage of the service, installation, possible use cases and examples of data that can be gathered.

1. https://github.com
2. https://www.npmjs.com


2 Description of web applications

This chapter contains a brief description of how web applications work and how these technologies relate to our library.

2.1 Languages

Websites use HTML (HyperText Markup Language) to display content and give the website a structure. A nice metaphor is that HTML is the skeleton of the website. HTML gives us the layout, textual content and images. HTML5, the newest HTML version, contains semantic elements that mark different parts of a website, such as the navigation bar, footer, headers and sidebars. HTML5 also contains some advanced elements, such as a canvas for displaying graphics and video, or a so-called iframe that allows us to display a website within another website, which is useful for widgets.

CSS (Cascading Style Sheets) is a language used to give websites their visual appearance. It is used for sizing, colouring, positioning and other general styling of elements. CSS targets HTML elements via their ID, class or the tag itself. CSS3 introduced new kinds of selectors, such as selectors for child elements and nearest elements, or special elements called pseudo-elements used for interactive visual effects.

JavaScript is the scripting language of today's browsers. The language is weakly and dynamically typed, interpreted and notoriously known for its sometimes absurd features. Although it was used as an object-oriented language in the past, via prototypal inheritance, in recent years it has increasingly been used as a functional language. JavaScript in the browser implements APIs maintained by the W3C (World Wide Web Consortium) that make it possible to manipulate the elements of the HTML document through the Document Object Model (DOM). This allows us to dynamically change the HTML content, layout and even styling.
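A minimal sketch of such a DOM manipulation (the element ID is hypothetical, not taken from the library):

// Select an element by its ID and change its text and styling
const heading = document.querySelector("#title"); // hypothetical element
if (heading) {
  heading.textContent = "Hello from JavaScript";
  heading.style.color = "rebeccapurple";
}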

2.2 Event listeners

User events are handled with so-called event listeners. We can attach event listeners to certain DOM elements, such as an on-click listener on a button. This is done imperatively via JavaScript:

const btn = document.querySelector("#mybutton");

btn.addEventListener("click", ev => {
  // a function that operates on the event object
});

The function that handles the user event gives us access to an event object. The object contains various information about the element that was clicked, a description of the event, the current state of the browser and some metadata.

The technique that fetches data from the server via JavaScript's XHR or fetch API (application programming interface) is called AJAX (asynchronous JavaScript and XML). When the data arrive, the DOM is then mutated to display the data or the state of the request.

The second way of updating the website is keeping the application state on the server, sending a request with the description of the change, then waiting for a full page reload. This technique, although more secure, is slowly declining in popularity due to a much worse client experience caused by constant page reloads.

In this thesis, we will build a user tracking implementation that focuses on applications that use AJAX to update the website.
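A minimal sketch of the AJAX approach described above (the endpoint and element names are hypothetical):

// Fetch data asynchronously and mutate the DOM when it arrives,
// without a full page reload.
fetch("/api/todos") // hypothetical endpoint returning JSON
  .then(response => response.json())
  .then(todos => {
    const list = document.querySelector("#todo-list"); // hypothetical element
    if (!list) return;
    todos.forEach(todo => {
      const item = document.createElement("li");
      item.textContent = todo.text;
      list.appendChild(item);
    });
  })
  .catch(error => console.error("Request failed:", error));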

3 Existing technologies

This chapter showcases an existing public service and an in-house solution, and covers the pros and cons of both. There will be a brief comparison of these solutions and our library, as well as an overview of how a custom implementation could aid our specific needs.

3.1 Google Analytics

The biggest user tracking implementation currently on the market is Google Analytics (GA). It is a freemium analytics service offered by Google that tracks and reports website traffic. Google Analytics implements page tags, which are snippets of JavaScript code placed on every page of the website. They collect data from the user and then send them to their server as part of a request for a web beacon.

At present web analytics data are typically collected from server logs or using web beacons. Web-beacons are small image requests placed in a web page to cause communication between the user's device and a server. The server may be controlled by the analytics provider, by the vendor whose website contains the web-beacons, or by another party. Web-beacons are also known as clear GIFs, web bugs, image requests, or pixel tags. [1]

Google Analytics collects data about page views, site visits, bounce rate, average time on site, pages per visit and the percentage of new visits. Its main goal is checking whether a user accessed certain content. Tracking custom events is possible; however, the API is imperative, which is impractical to deal with in modern web applications that use declarative technologies. Another issue is that the API is suited for higher-level events where a user interacts with certain content. Implementing many granular events for finer statistics would be impractical, as the amount of events is limited.

The goal of building a custom event tracking implementation is not to replace Google Analytics, but to let us utilise the good parts of Google Analytics while leaving other, more specific requirements to an in-house implementation.
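For context, a custom event is reported to Google Analytics with an imperative call roughly like the following (assuming the standard analytics.js page tag has already been loaded; the category, action and label values are illustrative only):

// Each tracked interaction needs its own call like this,
// placed inside the appropriate event handler.
ga("send", "event", "checkout", "click", "buy-button");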

3.2 In-house implementation

In Kiwi.com, we have a service called cuckoo. The motivation for creating this service is different from the use case behind Google Analytics and many other analytics services. The main use case of cuckoo is tracking user-specific data and how events unfolded over time. The data is then used to assist customers in case of any problems or complaints. It also helps protect the company from false accusations of fraud.

Cuckoo is a JavaScript module that exports a class with several methods for different levels of tracking. These range from log to critical. Log is used for informational events, such as when a user clicks a button. Critical is used for error events which the application cannot recover from. The use case of cuckoo is very straightforward: the developer imports the module, picks a unique event name, determines the right importance level of the event being tracked and places the call within the appropriate function.

The problem with this very imperative approach is that we have cuckoo calls all over the codebase. This poses a big problem for:

• Unit tests

• Integration tests

• Server-side markup rendering

• Refactoring

Testing can be dealt with using mocking, dependency injection or environment variables. Mocking requires the least amount of effort, while dependency injection is the neatest solution. However, mocking and dependency injection only work in unit tests, not in integration tests, since our integration tests of web applications are run within the browser. For dealing with integration tests, we need to set a runtime environment variable to disable any calls to the log server endpoint. This can be done hand in hand with one of the aforementioned methods to also allow better unit testability, as we can check whether the log function was called properly in a unit test.


Why do we need a custom implementation if we managed to deal with the problems of an imperative custom logging service? There are two main reasons: too much boilerplate code and the imperative nature of the service. Developers always have to re-create the function calls with only slightly different metadata when adding extra user event tracking, which results in a lot of unnecessary code. Regarding imperative code, our whole web application technology stack is based around declarative technologies. Imperative log function calls really get in the way during development, and introducing an additional function call can often involve quite a bit of refactoring.
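A purely hypothetical sketch of the kind of imperative call described above (the module name, class and method are illustrative; cuckoo's real API may differ):

import Cuckoo from "cuckoo"; // hypothetical module name

const logger = new Cuckoo();

const buyButton = document.querySelector("#buy"); // hypothetical element
if (buyButton) {
  buyButton.addEventListener("click", () => {
    // The tracking call is interleaved with the application logic and has
    // to be repeated, with slightly different metadata, at every call site.
    logger.log("buy_button_clicked", { page: "checkout" });
  });
}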


4 Implementation

This chapter focuses on the technologies used, different design decisions, the internal and external API of our library, setup and testing. There is a brief description of all languages, third-party libraries, tools and techniques, together with numerous code examples.

4.1 Technologies used

4.1.1 JavaScript

The language of choice for implementing our solution was JavaScript. Since it was the one and only possible web language, and still is in many older browsers, it was an obvious choice. The other possibility was to utilise the new WebAssembly standard.

Engineers from all major browser vendors have risen to the challenge and collaboratively designed a new low-level byte code for the web called WebAssembly. It offers compact representation, fast and simple validation and compilation, low to no-overhead safe execution, and easy interoperation with the web platform, including direct access to JavaScript and Web APIs. [2]

Compilers currently exist for the C and C++ languages; however, the community is eager to write compilers for many other languages popular in the web development world, such as Haskell and Go. Since the WebAssembly standard is very new and lacks a robust implementation in some browsers, we will stick to plain old JavaScript.

JavaScript, as mentioned earlier, is a language that enables us to either do object-oriented programming via prototypes, or functional programming, since functions are first-class objects and the language supports partial application via binding. We will utilise the functional aspect of JavaScript for creating pipelines that transform incoming events.
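A small illustration of partial application via binding (not taken from the library's code):

// Function.prototype.bind fixes the first argument of add to 1,
// producing a new one-argument function.
const add = (a, b) => a + b;
const increment = add.bind(null, 1);

console.log(increment(41)); // 42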

4.1.2 JSON

JavaScript Object Notation (JSON) is a text format for the serialization of structured data. It is derived from the object literals of JavaScript, as defined in the ECMAScript Language Standard, Third Edition.[3]


JSON is widely used in modern web applications, replacing XML as the most common format for data structures. Its main benefit is that it is very lightweight compared to XML, as well as being easily read by humans. JSON is a subset of JavaScript's object syntax. In order for a document to be valid JSON, the following rules must be followed:

• No comments allowed

• No trailing commas allowed

• No functions allowed

• Object or array literal must be the top-level entity

• Every object property must be wrapped in double quotes

• Strings must use double quotes
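A short illustration of these rules, using JavaScript's built-in JSON.parse (the property names are arbitrary):

// Valid JSON: double-quoted property names and strings, no trailing comma
JSON.parse('{"name": "creepx", "stars": 1}');

// Invalid JSON: single quotes and a trailing comma violate the rules above,
// so this call would throw a SyntaxError
// JSON.parse("{'name': 'creepx',}");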

4.1.3 ES6

ECMAScript 6 (ES6) is a new JavaScript standard introduced in 2015. It is one of the biggest JavaScript updates since the language's initial release. Some of the ES6 features used in this project are:

• Modules

• Arrow functions

• Array and object destructuring

Modules are something that has existed in the JavaScript world for a long time in the form of Node's module system. Node's system only works on the server though, not in the client's browser. ES6 introduced a different module system that works both in Node and in the browser, regardless of the browser vendor. The module system allows importing and exporting functionality between files, much like the module systems of other languages. The module system is static, requiring any imports to be located at the top of the file and the imported modules to exist at the application's start time.


import * as React from "react"; // imports a 3rd party module

// exports our class
export default class MyComponent extends React.PureComponent {
  // ...
}

Arrow functions, in addition to being nice syntactic sugar on top of the existing function definition, have a feature that is useful when dealing with callbacks. Arrow functions, unlike classic functions, preserve their enclosing context's this. Until arrow functions, every new function defined its own this value (a new object in the case of a constructor, undefined in strict mode function calls, the base object if the function is called as an "object method", etc.). This proved to be less than ideal with an object-oriented style of programming.¹

1. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions

Before arrow functions, in order to use this in a callback, we had to store a reference to it in a variable:

function Person() {
  var that = this;
  that.age = 0;

  setInterval(function growUp() {
    // The callback refers to the `that` variable,
    // whose value is the expected object.
    that.age++;
  }, 1000);
}

With arrow functions, the this of the Person function is preserved, so we do not need to save the reference:

function Person() {
  this.age = 0;

  setInterval(() => {
    this.age++; // |this| properly refers to the person object
  }, 1000);
}

var p = new Person();

Array and object destructuring allows us to break objects and arrays up into their contents, making them very easy to access:

// Before
// ---
function compareQuartals(elements) {
  const fst = elements[0];
  const snd = elements[1];
  const trd = elements[2];

  return calcQuartal(fst, snd) !== calcQuartal(snd, trd);
}

const fst = obj.fst;
const snd = obj.snd;

// After
// ---
// Breaks an array up into variables based on their
// position in the array
function compareQuartals([fst, snd, trd]) {
  return calcQuartal(fst, snd) !== calcQuartal(snd, trd);
}

// Breaks an object up into variables based on the
// names of its properties
const { fst, snd } = obj;

4.1.4 Node

Node (or Node.js, as it is commonly referred to) is a JavaScript runtime based on Google's V8 JavaScript engine.


V8 is Google's open source high-performance JavaScript engine, written in C++ and used in Google Chrome, the open source browser from Google, and in Node.js, among others. It implements ECMAScript as specified in ECMA-262, and runs on Windows 7 or later, macOS 10.5+, and Linux systems that use IA-32, ARM, or MIPS processors. V8 can run standalone, or can be embedded into any C++ application.²

2. https://developers.google.com/v8/

Node comes installed with NPM (Node package manager). NPM is responsible for downloading dependencies for Node applications, called node modules, as well as keeping track of which dependencies are required for the project to be installed.

Node is sometimes used for development of back-end applications such as REST API servers, microservices or GraphQL servers. We will utilise the second major use case for Node, which is pre-processing JavaScript code for optimisation, compatibility and static analysis. This is a common practice when publishing NPM modules.

4.1.5 RxJS

Rx is a language-agnostic API called Reactive Extensions that solves asynchronous problems using streams, utilising functional reactive programming. RxJS is a JavaScript implementation of this API.

Functional reactive programming is an extension of functional programming, adhering to immutability and functional concepts such as functors and monads. Reactive programming deals with asynchronous data flows, for example using streams. Combining the two, RxJS deals with asynchronous data flows using the aforementioned functional concepts:

const btn = document.querySelector("#btn");

// Create a stream of click events
const stream$ = Rx.Observable.fromEvent(btn, "click");

stream$
  .map(ev => ev.target.dataset.state) // Take the 'data-state' attribute
  .filter(Boolean) // Only pass non-empty values
  .subscribe(state => {
    // Values are pushed here
    console.log("Button clicked with state:", state);
  });

RxJS uses a push-based method of providing asynchronous values. This means that it is the library's responsibility to call a callback whenever a new value is produced in the stream. A pull-based method means the developer has to manually check at intervals whether something got updated.

Our implementation uses a technique called marble tests for testing RxJS streams. "Marble tests" are tests that use a specialised Scheduler called the TestScheduler. They enable us to test asynchronous operations in a synchronous and dependable manner. The "marble notation" is something that's been adapted from many teachings and documents.³

3. https://github.com/ReactiveX/rxjs/blob/master/doc/writing-marble-tests.md

import Rx from "rxjs/Rx";
import async from "rxjs/scheduler/async";
import deepEqual from "deepequal";

// The function we are testing. Simply delays the stream values.
// Allow dependency injection via optional arguments:
// - scheduler
// - delay
const delay = (stream$, scheduler = async, delay = 100) =>
  stream$.delay(delay, scheduler);

// Create a test scheduler
const ts = new Rx.TestScheduler(deepEqual);

// Create marble diagrams of the input/output event streams
const istream = "--a---|";
const ostream = "----a-|";

// Create the observable to test
const stream$ = ts.createHotObservable(istream);

// Mount an assertion that the output matches the diagram
ts.expectObservable(delay(stream$, ts, 20)).toBe(ostream);

// Run the test
ts.flush();

4.1.6 Tape

Tape is a lightweight test runner producing TAP (Test Anything Protocol)⁴.

4. https://testanything.org/

TAP is a format produced by tests and consumed by tools that can:

• Format TAP in a human readable way

• Produce summaries of test runs

• Run other programs based on the test output
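As a minimal sketch of a Tape test (the multiply function is a stand-in, not part of this project):

import test from "tape";

const multiply = (a, b) => a * b;

test("multiply", t => {
  // t.equal asserts that the two values are strictly equal
  t.equal(multiply(2, 3), 6, "multiplies two numbers");
  t.end(); // signal that the test is finished
});

Running such a file with Node prints TAP output that the tools listed above can consume.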

4.1.7 Flow

Flow is a static type checker for JavaScript. It adds special type annotations to the standard JavaScript syntax that enable static type analysis of the source code. There are three ways one can write type annotations:

• Inline types - used mainly in actual applications

• Comment types - used in projects that want types, but don’t want any additional build complexity

• Flow files - used mainly for annotating JavaScript libraries from NPM

The benefit of having type annotations directly in the source code is that it enables us to check the source file itself, which makes it possible to check any private variables or functions of that file. Example of type annotations directly in source code:


// File: index.js

// Plain JavaScript
export const multiply = (a, b) => a * b;

// Inline types
export const multiply = (a: number, b: number): number => a * b;

// Comment types
export const multiply = (
  a /*: number */,
  b /*: number */
) /*: number */ => a * b;

Since our project is an NPM module, we will use flow files. They declare the exported variables or functions of a file with the same name located next to it. Flow files are good for typing NPM modules, because they nicely show the external API of the library and do not require any additional build step for removing the types. Example of a flow file:

// File: index.js.flow
// co-located with index.js
declare export var multiply: (a: number, b: number) => number;

4.1.8 Babel

Babel is a set of tools that compile modern or non-standard JavaScript, turning it into standard, runnable code. Since all code has to go through Babel, it enables developers to perform various optimisations, such as removing unused code or evaluating and inlining static expressions. Our main use case of Babel is transforming ES6 code to ES3, making it runnable in older web browsers.

4.1.9 Gulp

Gulp is a build tool that pipes files through a pipeline that modifies them, then outputs them into a specified folder. Our library uses Gulp in conjunction with Babel to transpile newer JavaScript versions to older ones.
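A minimal sketch of such a pipeline (assuming the gulp-babel plugin; the folder names are illustrative and may differ from the actual gulpfile):

// gulpfile.js
const gulp = require("gulp");
const babel = require("gulp-babel");

gulp.task("build", () =>
  gulp
    .src("src/**/*.js")     // take the modern source files
    .pipe(babel())          // transpile them according to the Babel config
    .pipe(gulp.dest("lib")) // write the result into the output folder
);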

4.1.10 ESLint

ESLint is a static analysis tool that checks both semantic quality and code style quality. It is highly customisable and enables developers to implement their own plugins for any custom rules they want to adhere to. Our implementation follows the most popular code style as of 2017, the Airbnb JavaScript code style, its base variant to be precise. The original also adds rules for React, a view library written in JavaScript. Since we are not using React, the base variant is sufficient. The standard is very strict: all semantic and stylistic rules are set to the error level, meaning the ESLint check fails if even a single violation is found. Having a consistent, generally accepted and strict code style is mainly beneficial for larger groups of developers and open source projects, because it ensures consistency and high code quality at all times, while also preventing many potential bugs.

4.1.11 Prettier

Prettier is a code formatting tool. It decomposes code into an AST (abstract syntax tree) and composes it back together in a consistent manner. Prettier is integrated into ESLint via a plugin and a configuration. This ensures that no ESLint rules conflict with the way Prettier formats the code, and it allows ESLint to apply Prettier's formatting, so running ESLint's fix function is enough.

4.2 Requirements

Based on our previous description of the motivation for building a custom user event tracking technology, we can summarise the main goals of the implementation as:

• Declarative API

• Minimal impact on existing code


• Customisability of event payload

Since we are building an NPM module with the code hosted on GitHub, we want our library to be as generic as possible. This means we will not include any backend or application-specific code.

4.3 Library design

4.3.1 High-level overview

One fact that can really help us make the code declarative is that DOM events propagate. This means that when a button is clicked, unless the event is explicitly stopped from propagating using its native function, the event bubbles up the DOM tree. This allows us to listen for events in one place only, as long as we also provide a function for listening for events on nested elements.

We can utilise the fact that the topmost document object behaves like any other DOM element. Thus, mounting event listeners on the document object works the same way as mounting event listeners on ordinary DOM elements. Combining the fact that events propagate with the fact that the document object supports event listeners like any other DOM element allows us to mount event listeners, whether tied to a specific DOM element or not, on the document object alone.

DOM events have a function called stopPropagation. This prevents the event from bubbling up the DOM tree, so mounting event listeners only on the document object is insufficient, because certain events could get stopped from propagating. Our library will thus provide an API for mounting listeners on arbitrary DOM elements to cover this edge case.
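A minimal sketch of the delegation idea (plain DOM APIs, not creepx itself):

// A single listener on the document object observes clicks that bubble
// up from any nested element; ev.target is the element actually clicked.
document.addEventListener("click", ev => {
  console.log("clicked:", ev.target.tagName);
});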


A diagram of the flow of user action to the API server.

4.3.2 External API

The external API refers to the library interface that consumers of the library interact with. We want the external API to be easy to use and understand, as well as flexible enough to be usable in a majority of applications.

Part of an external API is the name of the library. The name creepx was chosen for two reasons:

• The application tracks user events, which can be considered creepy

• Putting the letter x at the end makes it a compound word that is easy to remember


The name can thus be remembered easily, as it has a semantic meaning as well as a slight cosmetic touch that makes it unique (not a simple verb). From now on, we will refer to our library as creepx.

Creepx exports several functions for tracking different events, as well as one so-called default function that simply combines all the other functions into one. All the functions, including the default function, share a common signature. This is the Flow notation of the type signature:

type Event = /* union of all event types */

type Subscription = {|
  unsubscribe: () => any,
|};

type Callback = (payload: Event) => any;

// The type of all exported functions
type Creeper = (el: Element, cb: Callback) => Subscription;

Every function takes a DOM element and a callback function that receives the event object. Every event object has a type property that can be one of the following:

• click - user clicked something

• copy - user copied something into their clipboard

• creepmove - user moved their mouse in a different direction than before

• cut - user cut out a text

• doubleclick - user clicked twice

• keydown - user pressed a keyboard key

• multiclick - user clicked more than twice

• paste - user pasted some text


• rightclick - user right clicked

• select - user selected a text

• shakemove - user shook his mouse

• wheel - user scrolled the scrolling wheel

Some events contain a meta property with some information about the event, such as the current location of the cursor. Some events support the data property, which is taken from a data attribute of a DOM element. The data attribute should be named creepx. This attribute name was chosen because it is very easy to remember, since it is the same as the library name.
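As a sketch (the element and payload are illustrative), the attribute can be set either directly in the markup or from JavaScript via the dataset API:

// Equivalent to writing data-creepx='{"action":"buy"}' in the HTML markup
const buyButton = document.querySelector("#buy"); // hypothetical element
if (buyButton) {
  buyButton.dataset.creepx = JSON.stringify({ action: "buy" });
}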

These attributes can contain custom data, in JSON format, that the consumer of the library wants to send to their server.

The simplest use case of creepx is importing the default function and using it to mount the event handlers on the document object. This captures every single event, except the ones that were stopped from propagating:

import creep from "creepx";

creep(document, payload => {
  // call your API server
});

Creepx contains certain optimisations so that the callback is not called too often, such as buffering keydown events; however, it is still recommended to track only what is valuable to the consumer. Another use case is mounting listeners individually for the events the consumer is interested in:

import { creepClicks, creepKeydown } from "creepx";

function callback(payload) {
  // call your API server
}

creepClicks(document, callback);
creepKeydown(document, callback);

The last use case is listening for events on a specific DOM element, rather than the top-level document object. This can be useful for tracking events which are prevented from propagating:

import { creepClicks } from "creepx";
import track from "../services/track";

const btn = document.querySelector("#mybutton");

if (btn) {
  creepClicks(btn, track);
}

Every public creep function also returns an object with an unsubscribe function that removes any listeners created by that function:

import creep from "creepx";

const sub = creep(document, payload => {
  // call your API server
});

// ... later in the code
sub.unsubscribe();

Creepx also exports Flow types for all of the external API so that consumers can type check whether they are using the library correctly. Two types are exported:

• Event - union of all the event types


• Subscription - the return value of creep functions

/* @flow */
import { creepClicks } from "creepx";
import type { Event, Subscription } from "creepx";

function callback(payload: Event) {
  // call your API server
}

const sub: Subscription = creepClicks(document, callback);

4.3.3 Internal API

The folder structure reflects the two main source parts of the package:

• Events - the events folder

• Utilities - the utils folder

Utilities are simply pure functions that are used by the event pipelines. Events refers to all the events that can be produced. Each file in the events folder contains a pipeline that takes a stream as input and outputs a stream of the mapped event objects.

// click.js
import { async } from "rxjs/scheduler/async";

import extractData from "../utils/extractData";

const click = (stream$, scheduler = async, delay = 350) =>
  // Takes a stream of events
  stream$
    // Processes them
    .bufferWhen(() => stream$.delay(delay, scheduler))
    .filter(list => list.length === 1)
    // Maps them to event objects
    .map(list => ({
      event: "click",
      meta: {
        x: list[0].clientX,
        y: list[0].clientY,
      },
      data: extractData(list[0].target),
    }));

export default click;

Certain events have two additional optional arguments, the scheduler and the delay. A scheduler is a tool that determines when certain events should happen. Its default value is an instance of an asynchronous scheduler that works with actual time. The delay variable is then used in conjunction with the scheduler variable to determine the number of milliseconds the scheduler is working with. The reason the two variables are overridable is to allow synchronous marble testing. Every event has its corresponding marble test.

// click.spec.js
import Rx from "rxjs/Rx";
import test from "tape";

import click from "../click";

// A mock event
const event = {
  clientX: 13,
  clientY: 37,
  target: {
    dataset: {
      creepx: JSON.stringify({ lol: "kek" }),
    },
  },
};

test("click", t => {
  // Synchronous test scheduler
  const ts = new Rx.TestScheduler((a, e) => t.deepEqual(a, e));

  // Test that a single click produces a click event
  // input/output marble diagrams
  const iclick = "--e---|";
  const oclick = "----v-|";

  // A mock observable stream of events
  const click$ = ts.createHotObservable(iclick, { e: event });

  // Assert our processed input matches the expected output
  ts.expectObservable(click(click$, ts, 20)).toBe(oclick, {
    v: {
      event: "click",
      meta: { x: 13, y: 37 },
      data: { lol: "kek" },
    },
  });

  // Test that a double click does not produce a click event
  const idblclick = "-e-e--|";
  const odblclick = "------|";
  const dblclick$ = ts.createHotObservable(idblclick, { e: event });

  ts.expectObservable(click(dblclick$, ts, 20)).toBe(odblclick);

  // Run the test
  ts.flush();
  t.end();
});

With event pipelines set up and tested like this, we simply import them into the index file where our creep functions are located. These functions take the DOM element, create an event listener, pipe the stream of events into the pipeline and then subscribe to the pipeline's output with the supplied callback function. If more streams are piped into more pipelines, the resulting streams are merged.


// index.js
// Only 1 included function for brevity
import Rx from "rxjs/Rx";

// Import pipelines
import click from "./events/click";
import doubleclick from "./events/doubleclick";
import multiclick from "./events/multiclick";
import rightclick from "./events/rightclick";

export function creepClicks(target, callback) {
  // Create event streams
  const click$ = Rx.Observable.fromEvent(target, "click");
  const rightclick$ = Rx.Observable.fromEvent(target, "contextmenu");

  // Pipe and merge the streams
  return Rx.Observable.merge(
    click(click$),
    doubleclick(click$),
    multiclick(click$),
    rightclick(rightclick$),
  ).subscribe(callback);
}

4.3.4 Build process

Since the source code is written in a modern version of JavaScript and we also want to support users who prefer writing code compatible with older versions, we need to compile it to an appropriate standard. The compilation is done via Babel and Gulp. There are two output folders, each with a different build setup: a lib and an es folder. Flow types are copied to both of the folders. The lib folder is compiled to ES3 and can be used by older versions of browsers and Node. The es folder is meant for usage by modern JavaScript bundlers, for anyone who wants to benefit from the modern module system.
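As a sketch of how the two builds can be exposed to consumers (the exact field values in creepx may differ), the package manifest points Node and older bundlers to the main field, while bundlers that understand ES modules pick up the module field:

{
  "name": "creepx",
  "main": "lib/index.js",
  "module": "es/index.js",
  "files": ["lib", "es"]
}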


4.3.5 Release

Every time before releasing a new version, the following steps are executed:

• Build folders are deleted

• Tests are run

• Types are checked

• Lint is run

• Build is run

Then the package is published to NPM. Only the output folders, the NPM manifest (the package.json file) and the README file are published; other files are not important for usage.


5 Usage and evaluation

In this chapter we will look at how a typical user of the library performs its installation and mounting, what they need to do to get the most value out of creepx and what kind of data they receive. We will also compare the results of the observation to the requirements, evaluate whether they were met and cover some implications developers need to keep in mind when using creepx.

Both setup and evaluation will be done on an application that is an extension of a popular project called TodoMVC, where users can add, edit, remove or filter their tasks. The application is written in React, which enables us to easily and dynamically change data attributes on HTML elements, and thus allows for easy dynamic data tracking.

An overview of our demo application.

5.1 Setup

5.1.1 Installation

Creepx is distributed as an NPM module. In order to be consumed by a client application, it must be further compiled through a web bundling tool like Webpack. Installation is done using:

• Yarn - yarn add creepx

• NPM - npm i creepx

5.1.2 Mounting

To set creepx up, we need:

• Importing the desired creep functions

• A track function

• A root element to listen for events on

First, we import the required creep functions:

import { creepClicks, creepKeydown } from "creepx";

Our tracking function will only log data to the console so it's easy to see what the output of creepx events is:

function track(payload) {
  console.log("Creepx ::", payload);
}

We will then mount our listener functions onto the root element of our application:

const app = document.getElementById("react");

if (app) {
  // Render our application
  hydrate(, app);

  // Track whenever the user submits something via the 'Enter' key
  creepKeydown(app, payload => {
    if (payload.meta.key === "Enter") {
      track(payload);
    }
  });

  // Track all page clicks
  creepClicks(app, track);
}

Our application will now listen for click and keydown events for its full lifecycle.

5.1.3 Dynamic data

Once we have the initial setup in place, we want to add our custom tracking payload that is relevant to the application. For this, we can use the creepx data HTML attribute. Since our demo application uses React, adding HTML attributes is trivial thanks to React's JSX syntax, an extension of JavaScript that allows HTML-like markup to be written directly in the code.
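As a minimal sketch (the component name and attribute value are illustrative, not the exact markup of the demo), a data-creepx attribute can be set directly from JSX:

import * as React from "react";

// A hypothetical edit input; the attribute value follows the DataInput
// shape shown later in this chapter ({ target, value }).
const EditTodoInput = ({ todo, onChange }) => (
  <input
    className="edit"
    value={todo.text}
    onChange={onChange}
    data-creepx={JSON.stringify({ target: "edit-todo", value: todo.text })}
  />
);

export default EditTodoInput;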

Together with our creep function setup, if a user clicks the input element or presses the Enter key while in the editing element, our track function gets called.

5.2 Example use case

An example data set that is desirable to collect in a task-managing application consists of:

• The name of the user

• Any actions the user performs on his tasks

Thus, in addition to the basic setup, we need to add creepx data attributes to the required elements. To keep things consistent, our creepx data attributes will have a common signature:

// Used on input fields
type DataInput = {
  target: string,
  value: string,
};

// Used on buttons
type DataButton = {
  action: string,
  item?: {
    id: string,
    text: string,
    complete: boolean,
  },
};

We put the creepx data attributes on:

• User input

• New todo input

• Edit todo input

• Complete task

• Complete all tasks


• Delete task

• Delete completed tasks

• Change task filter

When a user submits a field using the Enter key or presses a button that performs an action, our tracking callback gets called. Let us look at an example set of data gathered during a common use of the application.

A user with the username Boris entered the application.

He created 4 new tasks.


The user toggled a task as complete.

He has deleted a task.

The user changed the view filter to only display active tasks.

The user has cleared completed tasks.


Adding a simple tracking call to the start of the application will also allow us to determine the bounce rate. If a user enters the application but leaves before entering their username, we know they did not access the rest of the application. For this to be possible, we need to keep track of which events belong to which session. We can accomplish this by modifying the track function to also include a session ID.

const session = uuid.v4(); // 'uuid' is assumed to be imported from the uuid NPM package

function track(payload) {
  // Add the 'session' parameter
  console.log("Creepx ::", Object.assign({}, payload, { session }));
}

// ... rest of the code

// Use our existing 'track' function
// at the start of the session
track({ event: "start" });

The existing setup, together with keeping track of the session ID, gives us information about both the bounce rate and user actions. If our application were monetised, we would also know the conversion rate from monitoring the bounce rate and a payment action. The bounce rate helps us track the ease of use of our application, while user action tracking is essential for resolving user issues, keeping track of trends and compiling statistics.

5.3 Performance implications

5.3.1 Client

The impact on the performance of the web application itself is minimal and only consists of the event listeners. The stream pipelines themselves are well optimised and do not contain any complicated computations with higher than linear complexity. Even then, events that happen often are debounced or have a threshold.

That said, if a user sets up all the event listeners and their tracking function contains more complex logic, it could have a noticeable performance impact on slower devices. It is thus recommended to keep the tracking function lightweight, as well as to track only events that are meaningful to the consumer.

5.3.2 Server

Users of creepx also need to keep in mind the load their log server is able to withstand. Tracking every possible event while having hundreds of thousands of daily page visits adds up to a large number of server requests, creating a high load. This can be a problem not only for the web servers themselves, but also for the database responsible for writing the data.

6 Conclusion

Tracking user events is a highly discussed topic in today's web development world. Many platforms exist, each with a different approach; however, more often than not, a complementary in-house solution is also required due to the specific needs of each application. Our goal was to provide developers with a tool that aids in setting up an in-house tracking system. The implementation should be:

• Declarative

• Customisable

• Easy to adopt

Creepx utilises parts of the browser API that allow a simple setup while keeping the data developers want to track customisable. The only imperative API is the actual setup of creepx; adding custom data is done in a declarative manner using HTML5's data attributes.

The library is thus easy to adopt in an existing codebase and customisable via the data attributes to cover the majority of possible use cases of tracking user actions.


Bibliography

1. WONG, Catherine; ERROR, Brett Michael. Web-beacon plug-ins and their certification. Google Patents, 2013. US Patent 8,352,917.

2. HAAS, Andreas; ROSSBERG, Andreas; SCHUFF, Derek L.; TITZER, Ben L.; HOLMAN, Michael; GOHMAN, Dan; WAGNER, Luke; ZAKAI, Alon; BASTIEN, JF. Bringing the Web up to Speed with WebAssembly [Draft].

3. CROCKFORD, Douglas. The application/json media type for JavaScript Object Notation (JSON). 2006.
