MASARYK UNIVERSITY FACULTY OF INFORMATICS

Document Maps: Visualization Tool for Semantic Document Representations

BACHELOR'S THESIS

Michal Petr

Brno, Spring 2021

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Michal Petr

Advisor: RNDr. Vít Novotný


Acknowledgements

I want to thank my supervisor Vít Novotný and advisor Jan Byska for their professional supervision, skilful advice and help with the design and implementation of the application, and for their general enthusiasm, willingness, and patience with the subsequent work.

Abstract

Visualising textual data can be beneficial for fields such as machine learning. Unlike computers, which can work with data that make no sense to the human brain, we sometimes need to understand the data we are working with, and thus need a better way to analyse large data sets. The main objective of this thesis is to study the means of interactively visualising sets of textual data, such as documents in a corpus, and to implement an interactive web application that allows us to visualise this data. The application also helps us analyse the documents' mutual similarities by placing them in a force-directed simulation and by giving us insight into which words in each document contribute to their similarity.

Keywords

D3.js, HTML, JavaScript, visual analytics, visualization, web application, web development

Contents

1 Introduction

2 Background
2.1 Data visualisation
2.2 Visualising similarities of textual data
2.2.1 Spatial arrangement visualisations
2.2.2 Force-directed algorithm
2.2.3 Scatter plot
2.3 Calculating distances
2.3.1 Euclidean distance
2.3.2 Manhattan distance
2.3.3 Cosine similarity
2.3.4 Soft cosine similarity
2.4 Modern web development
2.4.1 Front-end JavaScript frameworks
2.4.2 Component-based web development
2.4.3 Hyper-Text Markup Language
2.4.4 Cascading Style Sheets
2.4.5 JavaScript
2.4.6 TypeScript
2.4.7 Reactive Extensions for JavaScript
2.4.8 Node.js
2.4.9 Angular
2.4.10 Scalable Vector Graphics
2.4.11 Data-Driven Documents
2.5 Existing tools for visualizing similarities
2.5.1 Sketch Engine
2.5.2 VisCoDeR
2.5.3 Data-for-research browser

3 Design
3.1 Required features
3.1.1 Visualizing similarities in a corpus
3.1.2 Word contribution to similarities
3.1.3 Highlighting of word matches
3.2 Concepts
3.2.1 Initial screen
3.2.2 Map design
3.2.3 Comparison screen

4 Implementation
4.1 Application components
4.1.1 App component
4.1.2 Init component
4.1.3 Home component
4.1.4 Graph component
4.1.5 User interface component
4.1.6 Sidenav component
4.2 Services
4.2.1 Query service
4.2.2 Loading service
4.2.3 JSON validate service
4.3 Pipes
4.3.1 Escape HTML pipe
4.3.2 Pair split pipe
4.4 Corpus loaded guard
4.5 Graph data web worker
4.6 Utility libraries
4.6.1 Query utility library
4.6.2 Graph utility library
4.6.3 Various utility library
4.7 Documentation

5 Conclusion
5.1 Future work

A The live demo and the source code
A.1 Building the project
A.1.1 Installing the project
A.1.2 Starting the development server
A.1.3 Building for deployment
A.1.4 Generating documentation

B How-to guide
B.1 Importing the corpus
B.2 Navigating the map
B.3 Selecting nodes
B.4 Viewing the document content
B.5 Comparing documents
B.5.1 Selecting words
B.6 Changing settings

Bibliography

List of Figures

2.1 Causes of mortality by Florence Nightingale
2.2 A simulation of a force layout diagram
2.3 A similarity scatter plot
2.4 A depiction of measures
2.5 A diagram of component based development
2.6 A comparison of raster and vector graphics
2.7 The thesaurus concepts, designed by Lucia Kocinová
2.8 The results of VisCoDeR
3.1 Concept drawings of the initial screen
3.2 A drawing of the loading screen
3.3 The concepts for the map design
3.4 The range of colours used to depict deviations
3.5 The expanded settings menu
3.6 The first conceptualizations of the comparison window
3.7 The initial word match selection concept
3.8 The final comparison screen concept
4.1 The hierarchical structure of the implemented components
4.2 The comparison of naive and sRGB colour mixing
5.1 A screenshot of the developed application
B.1 The on-screen camera controls of the application
B.2 The on-screen UI element displaying the deviation error
B.3 The comparison button used to access the comparison screen
B.4 The word selection UI, showing checked and unchecked words

1 Introduction

Judging whether two documents are similar can be a daunting task for the human brain. Many documents cover multiple topics, which makes the judgement harder. They could also use the same terminology but talk about something else entirely, or there could be thousands of documents, which could take a human years to go through.

We would therefore like to use a computer to perform this comparison and give us a summary of why the documents are similar. To do this, we need an application that gives us this insight into a large set of documents organised into a corpus.

This thesis aims to develop a user-friendly open source front-end for such an application, where the corpus data will eventually be pulled from a Representational State Transfer Application Programming Interface, or REST API for short. This front-end will aid us in exploring such corpus data and help us visualise and summarise what contributes to the similarities of any two documents.

The thesis is divided into four chapters. The first chapter provides an overview of the current state of visualising textual data. First, we explore existing tools used to visualise textual data, and then we go over the means of developing a modern web application. The second chapter describes what is expected from an implementation of the application and shows the creative process of iterating over the application's design. In the third chapter, we go over the implementation details and the component structure of the implemented code. We conclude in the fourth chapter by summarizing the contributions of this thesis and suggesting directions for future work. Additional information can be found in the appendices, such as the link to the source code and a simple usage guide for the tool.

2 Background

This chapter will go over some of the techniques used to visualise textual data and similarities between two texts and the technologies used to create modern interactive web applications. We will also briefly examine some existing tools used for such visualizations of similarities.

2.1 Data visualisation

Data visualisations and analyses are needed now more than ever as we shift to a more digitised world. They are no longer just an activity for scientists and statisticians; many ordinary people need them in one way or another. [1] With the acceleration of productivity, we require quick means of consuming as much information as possible. To reach this lofty goal, we need a better way to visualise and analyse said information:

Fortunately, we humans are intensely visual creatures. Few of us can detect patterns among rows of numbers, but even young children can interpret bar charts, extracting meaning from those numbers' visual representations. [2]

Although today's need for visualising data may be unprecedented in its urgency, visualizations have been around for quite some time. Archaeologists have discovered many artefacts from ancient history that seem to visualise numerical data. [3] Innovative data visualisation also saved lives during the Crimean War in the 19th century. The statistical prodigy Florence Nightingale created a new type of diagram, portrayed in Figure 2.1, showing that the primary cause of death among the army was actually disease, which helped to stop its spread.

2.2 Visualising similarities of textual data

When we talk about the similarity of two documents, we can have numerous metrics in mind. However, the metric we most want to analyse in a text document is the specific topic it conveys. Visualising such a metric would allow us to group countless similar documents at a glance.

Figure 2.1: The "Diagram of the causes of mortality" by Florence Nightingale that helped prevent the spread of disease during the Crimean War [4]

2.2.1 Spatial arrangement visualisations

The arrangement of multiple elements is visualised most easily and clearly on what is called a map. According to the Oxford Dictionary, a map is a diagram or collection of data showing the spatial arrangement or distribution of something over an area. [5]

The easiest and clearest way to visualize the similarity between many documents within a corpus at once would be a two-dimensional spatial arrangement, where the individual components are placed in space and interact with each other based on their similarity, clumping documents together if they are similar.

2.2.2 Force-directed algorithm

A force-directed algorithm, also called a spring-embedder algorithm, is an algorithm primarily used in simulations for drawing individual nodes and their relations to one another, as portrayed in Figure 2.2. [6] It binds the nodes together with forces that pull them together or push them apart like a spring. Such an algorithm is ideal for interactivity within a force-directed graph simulation, as we can change these forces on the go. This suits our purpose, as we require this interactivity within our application.

Figure 2.2: A simulation of a force layout forming a tree-like structure [7]
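The spring analogy can be made concrete with a short TypeScript sketch. It is a minimal illustration of one relaxation step of a spring-embedder, not the simulation used by the application; the names `SimNode`, `SimLink`, `relax`, and the `stiffness` parameter are assumptions introduced only for this example.

```typescript
// Illustrative sketch only: one relaxation step of a spring-embedder.
// SimNode, SimLink, relax and stiffness are hypothetical names.
interface SimNode { x: number; y: number; }
interface SimLink { source: number; target: number; length: number; }

function relax(nodes: SimNode[], links: SimLink[], stiffness = 0.1): void {
  for (const link of links) {
    const a = nodes[link.source];
    const b = nodes[link.target];
    const dx = b.x - a.x;
    const dy = b.y - a.y;
    const dist = Math.sqrt(dx * dx + dy * dy) || 1e-9; // avoid division by zero
    // Positive when the spring is stretched (pull), negative when compressed (push).
    const shift = (stiffness * (dist - link.length)) / dist / 2;
    a.x += dx * shift;
    a.y += dy * shift;
    b.x -= dx * shift;
    b.y -= dy * shift;
  }
}

// Two nodes start 10 units apart but their spring has a rest length of 4.
const simNodes: SimNode[] = [{ x: 0, y: 0 }, { x: 10, y: 0 }];
const simLinks: SimLink[] = [{ source: 0, target: 1, length: 4 }];
for (let i = 0; i < 200; i++) relax(simNodes, simLinks);
const finalDistance = Math.sqrt(
  (simNodes[1].x - simNodes[0].x) ** 2 + (simNodes[1].y - simNodes[0].y) ** 2
);
console.log(finalDistance.toFixed(3)); // settles close to the rest length
```

Because the forces are re-evaluated on every step, the rest lengths or the stiffness can be changed while the simulation is running, which is the interactivity referred to above.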

2.2.3 Scatter plot

An alternative to force-directed graphs is pre-computing the coordinates of each node and plotting them on a scatter plot, where every node is a point, as illustrated in Figure 2.3.

Pre-computing the layout in this way usually sacrifices interactivity for accuracy. Since we are creating an interactive application, this method is unsuitable for our purposes.

2.3 Calculating distances

Since we have chosen to use a force-directed graph, we need to calculate the forces applied to each node. If we attempt to visualise the data based on their similarities in a vector space model (VSM) [9], we need to calculate these forces based on a metric of a high-dimensional metric space.

Figure 2.3: A scatter plot, depicting the similarities of songs calculated by the Principal Component Analysis algorithm [8]

2.3.1 Euclidean distance

The simplest way to measure the similarity between two vectors is to calculate the Euclidean distance between two nodes. The Euclidean distance is defined in Formula 2.1, where a and b are k-dimensional vectors, and is depicted in Figure 2.4. We can then apply the force between two nodes based on this distance measure. [6]

$$d_{\mathrm{Euclidean}}(a, b) = \sqrt{\sum_{i=1}^{k} (a_i - b_i)^2} \tag{2.1}$$

However, calculating the similarity from the Euclidean distance suffers from the curse of dimensionality, where the distance between the endpoints of two vectors near the origin could be the same as the distance between the endpoints of two vectors far from the origin. [10]

2.3.2 Manhattan distance

An alternative, yet related, distance measure is the so-called Manhattan or city block distance, where we simply sum the absolute differences of the two endpoints in each dimension, as shown in Formula 2.2, where a and b are once again k-dimensional vectors, depicted in Figure 2.4.

Figure 2.4: A depiction of each measure described in two dimensions. Euclidean distance (left), Manhattan distance (center) and Cosine similarity (right)

$$d_{\mathrm{Manhattan}}(a, b) = \sum_{i=1}^{k} |a_i - b_i| \tag{2.2}$$

This method yields similar results to the Euclidean distance. However, since the distances are not squared, the magnitude of the differences is preserved.
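The Euclidean and Manhattan distances can be written directly from their definitions. The following TypeScript sketch is only illustrative; the function names `euclideanDistance` and `manhattanDistance` are assumptions made for this example.

```typescript
// Illustrative helpers; the function names are hypothetical.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimension");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff; // squared difference in dimension i
  }
  return Math.sqrt(sum);
}

function manhattanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimension");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += Math.abs(a[i] - b[i]); // absolute difference in dimension i
  }
  return sum;
}

const pointA = [0, 0];
const pointB = [3, 4];
console.log(euclideanDistance(pointA, pointB)); // 5
console.log(manhattanDistance(pointA, pointB)); // 7
```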

2.3.3 Cosine similarity

A common way to measure the similarity between two high-dimensional vectors is to calculate the cosine of the angle between the two vectors, see Formula 2.3, where a and b are again k-dimensional vectors, $\|v\| = d_{\mathrm{Euclidean}}(v, 0)$ is the length of a vector v, and $\theta$ is the angle between the two vectors, again depicted in Figure 2.4.

$$\cos(\theta) = \frac{a \cdot b}{\|a\| \, \|b\|} = \frac{\sum_{i=1}^{k} a_i b_i}{\sqrt{\sum_{i=1}^{k} a_i^2} \, \sqrt{\sum_{i=1}^{k} b_i^2}} \tag{2.3}$$

This method yields a value in the range [-1, 1], where 1 indicates that the documents are practically the same and -1 indicates the opposite. With the cosine similarity, we essentially take into account only the orientation of the document vector, not its magnitude.

The main advantage of the cosine similarity over the other two aforementioned methods is that even if two points are far apart, the angle between them can be small, which helps combat the curse of dimensionality. [11]

2.3.3.1 Cosine distance

To interpret the cosine similarity as a quasi-distance measure, we subtract the result from 1, as demonstrated in Formula 2.4. This transforms the cosine similarity to the range [0, 2], where two elements are the same when their quasi-distance is zero.

$$d_{\mathrm{cosine}}(\theta) = 1 - \cos(\theta) \tag{2.4}$$

However, the cosine distance is not a proper distance metric, as it violates the triangle inequality. [12] This violation is not necessarily a problem for the purpose of this thesis, but it is a limitation in some fields of study, like the indexation of metric trees, where we need to use angular distance.
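A small numerical example makes the violation of the triangle inequality visible. The TypeScript sketch below is illustrative only; the function names and the three example vectors are assumptions chosen for this demonstration. Note that the functions return NaN for a zero vector, which has no direction.

```typescript
// Illustrative helpers; names and example vectors are hypothetical.
function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
}

function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

// Three vectors at 0, 45 and 90 degrees.
const east = [1, 0];
const diagonal = [1, 1];
const north = [0, 1];

const direct = cosineDistance(east, north); // 1 - cos(90°) = 1
const detour = cosineDistance(east, diagonal) + cosineDistance(diagonal, north); // 2 * (1 - cos(45°))
console.log(direct, detour);
console.log(direct > detour); // the direct "distance" is longer than the detour
```

For a proper metric, the direct distance could never exceed the sum of the two legs of the detour; here it does, since 1 is greater than roughly 0.586.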

2.3.4 Soft cosine similarity

We can extend the definition of the cosine similarity by comparing the two vectors based on the similarity of their individual features. A simple example is presented by Sidorov et al.:

Suppose that we have two texts: (1) play, game, (2) player, gamer. It defines a 4-dimensional VSM with the following features: play, player, game, gamer. We have two vectors a and b: a = [1,0,1,0] and b = [0,1,0,1]. The traditional cosine similarity of these two vectors is 0. But if we take into consideration the similarity of words, it turned out that these vectors are quite similar. [13]

This method allows for a better and more refined notion of document similarity for our purpose, because we consider the meaning of words as well.

We can calculate the soft cosine similarity measure by using Formula 2.5, where a and b are k-dimensional vectors, and $S = (s_{ij})$ is a matrix of feature similarities.

$$\mathrm{softcosine}(a, b) = \frac{\sum_{i=1}^{k} \sum_{j=1}^{k} s_{ij} a_i b_j}{\sqrt{\sum_{i=1}^{k} \sum_{j=1}^{k} s_{ij} a_i a_j} \, \sqrt{\sum_{i=1}^{k} \sum_{j=1}^{k} s_{ij} b_i b_j}} \tag{2.5}$$

2.3.4.1 Soft cosine distance

The soft cosine similarity is just the cosine similarity in a vector space transformed by the change-of-basis matrix E, where $S = E E^{T}$. [14] It follows that we can again transform the soft cosine similarity into the soft cosine quasi-distance just by subtracting the result from 1, as demonstrated in Formula 2.6.

$$d_{\mathrm{softcosine}}(a, b) = 1 - \mathrm{softcosine}(a, b) \tag{2.6}$$
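The example by Sidorov et al. can be reproduced with a short TypeScript sketch of the soft cosine similarity and the soft cosine distance. The function names are assumptions made for this illustration, and the similarity value 0.5 between "play" and "player" and between "game" and "gamer" is an invented number used only to show the effect; it is not taken from the cited paper.

```typescript
// Illustrative sketch; function names and the 0.5 similarities are hypothetical.
function bilinearForm(x: number[], y: number[], s: number[][]): number {
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    for (let j = 0; j < y.length; j++) {
      sum += s[i][j] * x[i] * y[j];
    }
  }
  return sum;
}

function softCosine(a: number[], b: number[], s: number[][]): number {
  return bilinearForm(a, b, s) /
    (Math.sqrt(bilinearForm(a, a, s)) * Math.sqrt(bilinearForm(b, b, s)));
}

function softCosineDistance(a: number[], b: number[], s: number[][]): number {
  return 1 - softCosine(a, b, s);
}

// Features in order: play, player, game, gamer.
const textA = [1, 0, 1, 0]; // "play, game"
const textB = [0, 1, 0, 1]; // "player, gamer"
const wordSimilarity = [
  [1, 0.5, 0, 0],
  [0.5, 1, 0, 0],
  [0, 0, 1, 0.5],
  [0, 0, 0.5, 1],
];
const identity = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 0, 1],
];

console.log(softCosine(textA, textB, identity));       // plain cosine similarity: 0
console.log(softCosine(textA, textB, wordSimilarity)); // approximately 0.5
```

With the identity matrix in place of S, the measure reduces to the ordinary cosine similarity, which is why the two texts then appear completely dissimilar.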

2.4 Modern web development

Developing dynamic web applications quickly and efficiently today differs substantially from the methods that were available a few years ago. [15]

Web development evolves at a steady pace and changes frequently. [16] Those who do not keep up with the flood of new technologies, even for a few months, may find themselves lost in the influx of new information.

2.4.1 Front-end JavaScript frameworks

As web development was, and still is, constantly evolving, many new technologies and libraries arose to ease the development of JavaScript applications. Although many libraries have been created since the inception of jQuery in 2006 [17], none of them offered a genuinely robust and self-contained development environment.

A front-end JavaScript framework is a large collection of functions and tools that eases development. Such frameworks usually offer cross-compatibility across devices, operating systems, and browsers, more accessible asynchronous communication with a back-end server, form controls, and many more features. Another significant advantage of a front-end framework is that everything is rendered on the client side, which lifts unnecessary load from the server.

According to Benitte and Greif [18], the most popular front-end framework nowadays is considered to be React. However, React is merely a library [19], although one with a powerful ecosystem. The survey would then place Angular as the most popular framework, with Vue.js a close second.

2.4.2 Component-based web development

When developing an application, not necessarily one aimed at the web, we usually attempt to separate our implementation into individual components, each tasked with its own concern, as depicted in Figure 2.5. This methodology is known as the separation of concerns and is one of the building blocks of software engineering. [20]

Component-based web development emphasises this building block and utilises the reusability of such components. This reduces the possibility of errors and allows for easier development and future refactoring. [21]

2.4.3 Hyper-Text Markup Language

The Hyper-Text Markup Language, abbreviated to HTML, is a web standard that was developed and published for many years by the World Wide Web Consortium, also known as the W3C. [23]

HTML instructs the browser how individual components should be hierarchically structured on a web page in what is called a Document Object Model (DOM) tree. [24]

2.4.4 Cascading Style Sheets

Cascading Style Sheets, or CSS for short, is another web standard developed by the W3C. CSS instructs the browser how the given components, written in HTML, should be visualised and presented to the user. [23]

Figure 2.5: A diagram of the hierarchical structure of component-based development [22]

2.4.4.1 Syntactically Awesome Style Sheets

Since the code written in CSS can get very repetitive at times [15], a handful of tools have been developed to reduce the size of the code. One such tool is Syntactically Awesome Style Sheets, commonly abbreviated to SASS. Developed by Hampton Catlin, Natalie Weizenbaum, and Chris Eppstein, SASS converts the shorter code into full-length CSS code upon compilation, maintaining backwards compatibility with browsers and reducing the likelihood of human errors.

2.4.5 JavaScript

JavaScript is part of the triad of technologies that all Web developers must learn: HTML to specify the content of web pages, CSS to specify the presentation of web pages and JavaScript to specify the behaviour of web pages. [25]

JavaScript follows the ECMAScript standard, designed by Ecma International, which gets updated on an annual basis. [26] It is the beating heart of all interactive applications and tells the browser how elements on a web page should behave and how they should interact with each other and the end-user.

2.4.6 TypeScript

JavaScript, as it currently stands, lacks any kind of static typing. This means that anyone can pass any type of object into any function, which can cause issues in bigger programs where we require data integrity. TypeScript is a superset of JavaScript that solves this issue [27], introducing not only static typing but many other valuable features, like access modifiers, interfaces and other object-oriented features. TypeScript is actively developed and maintained by Microsoft. [28]
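As a brief illustration of what static typing adds, consider the following TypeScript sketch. The `DocumentVector` interface and the `vectorLength` function are hypothetical names introduced only for this example.

```typescript
// Hypothetical types for illustration only.
interface DocumentVector {
  id: string;
  weights: number[];
}

function vectorLength(doc: DocumentVector): number {
  return Math.sqrt(doc.weights.reduce((sum, w) => sum + w * w, 0));
}

const sample: DocumentVector = { id: "doc-1", weights: [3, 4] };
const sampleLength = vectorLength(sample);
console.log(sampleLength); // 5

// The following call would be rejected at compile time,
// because a string is not a DocumentVector:
// vectorLength("doc-1");
```

In plain JavaScript, the commented-out call would only fail at run time, when `weights` turns out to be undefined.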

2.4.7 Reactive Extensions for JavaScript

Reactive Extensions for JavaScript, shortened to RxJS, is a JavaScript library for composing asynchronous and event-based programs using observable sequences, which offers a powerful and extensive approach to handling events. [29]

2.4.8 Node.js

Node.js is an asynchronous event-driven JavaScript runtime. [30] It allows for easy management of dependencies of a project and is also used to run JavaScript code outside of the browser. Node.js was originally created by Ryan Dahl and is now developed by the OpenJS Foundation.

Figure 2.6: A comparison of a circle scaled up in vector (left) and raster (right) graphics

2.4.9 Angular

Angular is a JavaScript framework, which we introduced in Section 2.4.1, developed and maintained by Google, utilising component-based user interface development. [31]

Angular has been in development since 2010 but went through a major rewrite between the years 2014 and 2016, breaking backwards compatibility. This caused a considerable backlash, with many users abandoning the framework altogether, but it has since regained some popularity. [18]

2.4.9.1 Angular Material

Angular Material is a component library for Angular, created by Google. Its main goal is to unify the styling of components and add accessibility while maintaining performance and cross-compatibility. [32]

2.4.10 Scalable Vector Graphics

Scalable Vector Graphics, also abbreviated to SVG, is a markup language used for drawing two-dimensional vector images, supporting animations and interactivity. [33] As shown in Figure 2.6, vector graphics are infinitely scalable without losing any quality, unlike raster graphics.

2.4.11 Data-Driven Documents

Data-Driven Documents, commonly abbreviated to D3, is a JavaScript library used for manipulating documents based on some input data. It uses a simple DOM binding to the data provided, allowing us to manipulate the input data and visualise it in countless ways. [34]

D3 is not a monolithic framework that seeks to provide every conceivable feature. Instead, D3 solves the crux of the problem: efficient manipulation of documents based on data. [34]

D3-force is an extension library for D3, which allows for simple visualisations and simulations of force-directed graphs, which we have discussed in Section 2.2.2. [35]

2.5 Existing tools for visualizing similarities

There are a few tools for exploring textual data available on the internet, which we shall now briefly explore.

2.5.1 Sketch Engine

Sketch Engine is a web service for querying and analysing corpora and texts, used primarily for lexicographical purposes. [36] It allows for a quick analysis of an extensive set of textual data, finding word matches, synonyms, and even translations.

Kocinová [37] created some excellent visualisations using the data from Sketch Engine, one of which, portrayed in Figure 2.7, visualises the similarities between words. This visualisation shows, in principle, the same thing we want to visualise, although we work with documents instead of words.

2.5.2 VisCoDeR

[VisCoDeR is] a tool that leverages comparative visualization to support learning and analyzing different dimensionality reduction (DR) methods. [8]

VisCoDeR defines two modes: the discover mode and the explore mode. The discover mode takes a data set of high-dimensional vectors and neatly displays how different advanced algorithms reduce these dimensions so that we can clearly read and interpret them, as visible in Figure 2.8. The explore mode helps with analysing parameterised dimensionality reduction results. However, this tool is too generalised for our purpose, as it deals with simple yet large data sets and not complex textual data sets.

Figure 2.7: The thesaurus concepts, designed by Lucia Kocinová, visualising the similarities of words [37]

2.5.3 Data-for-research browser

The Data-for-research browser, more commonly known as DFR-browser, is an interactive tool developed by Andrew Goldstone et al. It uses a large set of published articles spanning the entire last century and the current century. Its main intention is to group these documents based on their topic, generated from the most common words in the documents. [38] This tool could inspire us in the future when implementing a metric for grouping the elements.


Figure 2.8: The results of VisCoDeR, comparing various dimensionality reduction algorithms [8]

3 Design

In this chapter, we will go over the required features of our application, later implemented in Chapter 4, and explore the proposed design concepts of the user interface.

3.1 Required features

For the purpose of our application, we required the program to be written as an open source web application, for easier accessibility on the internet and for cross-compatibility across browsers and operating systems.

In this thesis, we use the following terms. A document, or text, is one of the individual compared objects and contains the actual words by which we compare the documents. A query is an entered search query provided by the back-end. A corpus is a collection of these documents, which also contains the individual word similarities.

As we are only developing the front-end of the application, the back-end that would return this corpus is currently represented by a simple JSON file, but the application is built in such a way that it can easily be extended to pull the data from a REST API on a server.

3.1.1 Visualizing similarities in a corpus

Our application allows us to interactively explore all of the documents in a corpus by running a force-directed simulation based on their mutual similarities.

The force and the distance are determined by the soft cosine similarity and the soft cosine distance, respectively. The simulation then finds the most stable state of the nodes, representing the documents, and gives the user the option to move around the map and select and view the document's content.

3.1.2 Word contribution to similarities

Upon selecting exactly two documents on the visualised map, the application gives the user the option to compare the two documents. This action opens another screen, where the user can see all matched words between the texts, either soft matches or exact matches, and see their percentage contribution to the total value of the soft cosine similarity of the two selected documents.
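One plausible way to obtain such percentages is sketched below in TypeScript. It assumes that the contribution of a word pair is its share of the numerator of the soft cosine similarity (Formula 2.5); since the denominator is the same for all pairs, it cancels out. The names `MatchContribution` and `matchContributions`, as well as the example weights and word similarities, are hypothetical and not taken from the implementation.

```typescript
// Illustrative sketch; all names and example values are hypothetical.
interface MatchContribution {
  wordA: string;
  wordB: string;
  percent: number;
  exact: boolean; // true when both documents contain the very same word
}

function matchContributions(
  features: string[], a: number[], b: number[], s: number[][]
): MatchContribution[] {
  const terms: { i: number; j: number; value: number }[] = [];
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = 0; j < b.length; j++) {
      const value = s[i][j] * a[i] * b[j]; // one term of the numerator
      if (value > 0) {
        terms.push({ i, j, value });
        total += value;
      }
    }
  }
  return terms
    .map((t) => ({
      wordA: features[t.i],
      wordB: features[t.j],
      percent: (100 * t.value) / total,
      exact: t.i === t.j,
    }))
    .sort((x, y) => y.percent - x.percent); // largest contribution first
}

const features = ["play", "player", "game", "gamer"];
const docA = [1, 0, 1, 0]; // "play game"
const docB = [0, 1, 1, 0]; // "player game"
const similarity = [
  [1, 0.5, 0, 0],
  [0.5, 1, 0, 0],
  [0, 0, 1, 0.5],
  [0, 0, 0.5, 1],
];
const contributions = matchContributions(features, docA, docB, similarity);
console.log(contributions);
```

In this example, the exact match "game" and "game" accounts for about two thirds of the similarity, and the soft match "play" and "player" for the remaining third.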

3.1.3 Highlighting of word matches

While in the comparison screen, the user is given the option to select certain matched words. The application then highlights the match in the actual content of the two documents, informing the user of the context of the selected matches.
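The highlighting step could be sketched as follows. This TypeScript example is only an assumption about how such highlighting might work, not the implementation described in Chapter 4: it escapes the document text, splits it into word and non-word runs, and wraps the selected words in `<mark>` elements. The function names are hypothetical, and the `\w` pattern only recognises unaccented Latin letters, digits and underscores.

```typescript
// Illustrative sketch; function names are hypothetical.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function highlightMatches(text: string, selectedWords: string[]): string {
  return text
    .split(/(\w+)/) // the capture group keeps the words in the result
    .map((part) =>
      selectedWords.indexOf(part.toLowerCase()) !== -1
        ? `<mark>${escapeHtml(part)}</mark>`
        : escapeHtml(part)
    )
    .join("");
}

const highlighted = highlightMatches("Play the <game>!", ["play", "game"]);
console.log(highlighted);
// <mark>Play</mark> the &lt;<mark>game</mark>&gt;!
```

Escaping each part before inserting the markup ensures that the document content itself can never be interpreted as HTML.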

3.2 Concepts

Since we have decided to write our application in Angular, the design and the interface of Angular Material were in our minds when designing the individual components. Angular Material, discussed in Section 2.4.9.1, comes with some neatly designed components and pre-defined themes that contain web-safe and accessible colours, so we decided to use those.

We needed to produce some components from scratch, since they were either too complex or required a feature that is not yet supported by Angular Material. We describe the components we have created in the following subsections.

3.2.1 Initial screen

The initial screen welcomes the user to the tool, so it is generally a bad idea to drive off the audience with an overwhelming amount of information, buttons and other widgets.

The concept, shown in Figure 3.1, shows the simple and straightforward nature of the component. As we need to introduce the application, it contains the application's name and its general description, and gives the user multiple options on how to proceed with the tool. To draw the user's attention to the buttons, they have been coloured, making them stand out. Each of these buttons then describes, with icons and text, what it does.

Figure 3.1: Concept drawings of the initial screen (left) and the text box (right)

3.2.1.1 Inserting as text

Since it can be useful for testing or swiftly adjusting data, a concept was composed, shown in Figure 3.1, giving the user the option to enter the data as text instead of a file. This simple window would prompt the user to paste in data and give the option to go back with a button or send the data to the app with a highlighted button.

3.2.1.2 Loading screen

When the program starts to process the input data, it can take some time to do so, especially with larger data sets. Since leaving the user without any feedback on the progress is generally bad practice, the concept shown in Figure 3.2 would give the user some insight into the progress of the calculation of the simulation.

Figure 3.2: A drawing of the loading screen

Figure 3.3: The concepts for the map design, showing the first iteration (left) and the latter iteration (right)

3.2.2 Map design

The map design went through some minor rework since the first iteration, but the concept stayed the same: show the documents as nodes on a map with their respective labels. The concept, depicted in Figure 3.3, shows the first iteration of the map display. Due to their smaller size, the nodes turned out to be hard to select. This meant that the nodes had to be enlarged in later iterations, sacrificing some precision for a better user experience when nodes were tightly packed. This problem was later resolved by implementing a distance scale option, discussed in Section 4.1.5.1.

3.2.2.1 Visualising node deviations

Since we are reducing the number of dimensions in the corpus to just two dimensions, the distances are inevitably going to be either larger or smaller than in the original vector space. This issue usually only happens with larger distances between nodes.

To inform the user of this deviation from the actual length when the user selects a single node, we colour in the nodes based on the difference between the Euclidean distance on the map and the soft cosine distance, as depicted in Figure 3.4.

Figure 3.4: The range of colours used to depict the deviations, blue meaning that the node should be closer and red meaning that the node should be further away

Considering the colours could have arbitrary meaning to the user, we need to inform the user of each colour's meaning. We can do this by displaying a legend and an indicator that appears when the deviations are shown.
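The mapping from a deviation to a colour might be sketched as follows in TypeScript. This is a simplified illustration under several assumptions: the on-screen distance is already scaled to the same unit as the soft cosine distance, the colour ramp is linear and centred on white, and the names `deviationColour` and `maxDeviation` are hypothetical. The colour mixing used in the actual application may differ.

```typescript
// Illustrative sketch; names, the linear ramp and the white midpoint are assumptions.
function deviationColour(
  mapDistance: number, softCosineDistance: number, maxDeviation = 1
): string {
  // Positive: drawn too far away (should be closer); negative: drawn too close.
  const deviation = mapDistance - softCosineDistance;
  const t = Math.max(-1, Math.min(1, deviation / maxDeviation));
  const fade = Math.round(255 * (1 - Math.abs(t)));
  return t >= 0
    ? `rgb(${fade}, ${fade}, 255)`  // towards blue: the node should be closer
    : `rgb(255, ${fade}, ${fade})`; // towards red: the node should be further away
}

const colours = [
  deviationColour(0.5, 0.5), // no deviation
  deviationColour(1.5, 0.5), // drawn too far away
  deviationColour(0.2, 1.2), // drawn too close
];
console.log(colours);
```

A node whose drawn distance matches the soft cosine distance stays white, so only nodes with a noticeable deviation attract the user's attention.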

3.2.2.2 Camera controls

The user can navigate the map by dragging it around with the mouse and scrolling the mouse wheel, panning and zooming the map in or out, respectively. However, for better accessibility, some of the controls can be added as a user interface element. In both map concepts, shown in Figure 3.3, we can see this user interface on the side of the map itself.

3.2.2.3 Settings menu

Giving the user the option to customize the visualization to their needs is a desired feature. Doing otherwise would compromise the interactivity of our application. The concept portrayed in Figure 3.5 on the next page shows the possible design of a settings menu, which opens upon clicking the cogwheel button.

3.2.3 Comparison screen

The comparison screen is the main feature of the application and has to convey its most important message: which word matches between the two documents contribute to the documents being similar.


Figure 3.5: The expanded settings menu


Figure 3.6: The first conceptualizations of the comparison window, side-by-side (left) and carousel-like (right)

3.2.3.1 Early approaches

Initially, these word matches were to be displayed side by side, as shown on the left of Figure 3.6, which had the side effect of taking each snippet out of the context of its sentence. Another experimental concept, shown on the right of Figure 3.6, was briefly considered: putting the matches below each other and showing one at a time, with the ability to switch through them in a carousel-like fashion. This concept would allow us to show more of the surroundings of the match, but it still would not solve the issue of portraying the scale and frequency of the matched words.


Figure 3.7: The word match selection concept, without the match separation

3.2.3.2 Word match selection

Instead of showing all the word matches at once, an option can be given to the user to tick only the words they are interested in. This would reduce the size of the list and remove some matches they have no interest in, like the uninformative matches of conjunctions and prepositions. A better approach than the carousel would be to display the entire document on screen, as shown in Figure 3.7, highlighting the matches directly in the text. This approach would give the user the most context for each match while also informing them of the frequency of each match. Initially, each ticked match was supposed to have its own colour, but this idea was quickly scrapped, since having more than just three colours would be confusing. Instead, the decision was made only to differentiate the exact matches, when the word is exactly the same, from the soft matches, when the words are only similar, but not the same. This led to the eventual splitting of the match column in two, one for exact matches and the other for these soft matches, as depicted in Figure 3.8 on the following page.


Figure 3.8: The final comparison screen concept

4 Implementation

The application was implemented as a web application, as discussed in Section 3.1. For this purpose, a default Angular 11 project was created. All variables and functions are documented in the source code, and the more complex functions and actions are accompanied by additional comments, as described in Section 4.7. The following sections are organised by Angular functionality. First, we explore the individual user interface components, following the concepts established in Section 3.2. Then, we go over the services and other smaller modules. Finally, we inspect the utility libraries.

4.1 Application components

The components of the application are mainly responsible for displaying the user interface. They usually contain the logic that is necessary to present the data in the HTML template. The hierarchy of the implemented components can be viewed in Figure 4.1.

4.1.1 App component

The main wrapper component for all components is the App component. It contains little logic, since most data is passed between the other components through services. The component does, however, contain the loading screen, which can be shown at any time based on the data received from the Loading service, described in Section 4.2.2. The loading screen has to live here because, when routing¹ between components, no actual child components are loaded.

1. Routing is the act of loading specific application components based on the active Uniform Resource Locator (URL).

[Figure 4.1 content, by level: App; Home, Init; Graph, User Interface, Sidenav; Settings, Document, Comparison; Document Content, Comparison Entry]

Figure 4.1: The hierarchical structure of the implemented components

4.1.2 Init component

The Init component follows the design of the concept discussed in Section 3.2.1, containing an introductory text and a few buttons that take the user into the actual program.

The HTML and SCSS define several wrappers and a number of form elements with simple styling. The form elements are either wrapped in Angular Material components or use its directives, that is, functions that modify the DOM elements they are applied to. The TypeScript logic mainly contains validation callbacks to the JSON validate service, discussed in Section 4.2.3.

4.1.3 Home component

The Home component is the parent of most of the actual program's components. It stores most of the data that is passed between the children, acting as the single source of truth².

2. The single source of truth is a variable that is used by many child components but only located in one place.


The TypeScript code initially requests the web worker, discussed in Section 4.5, to start processing the imported data and to calculate the soft cosine similarities between the documents, so that we can interpret the data for the graph.

The TypeScript code also defines multiple event handlers, since each event raised by a child needs to be processed in this component, where most of the data sources originate. Further helper functions are also defined here, such as functions that manipulate the selection array.

The HTML defines the Angular Material wrapper for the side navigation and the other child components, passing the stored data through to them.

4.1.4 Graph component

For the drawing and simulation logic of the D3 force-directed simulation, explored in Section 2.2.2, we define the Graph component. Upon creation, it creates SVG elements from the data passed in, which then populate the graph wrapper marked in the HTML template.

To move the graph around, we use the zoom and pan behaviour provided by the D3 library. We also define custom behaviour for some of the elements, such as raising an event to the parent component when a node is clicked or hovered over, or moving the graph by pressing keyboard keys.

This component is also responsible for calculating and drawing the deviation error discussed in Section 3.2.2.1.

4.1.5 User interface component

The User interface component is responsible for displaying the interactive buttons for camera controls, the settings menu, and the deviation error legend.

All form elements use Angular Material components, which provide us with events that we immediately pass to the parent Home component.


4.1.5.1 Settings component

The Settings component displays the available options below each other. When the user changes a setting, the template notifies the TypeScript code, which updates the settings object; the change then propagates all the way back to the Home component.

4.1.6 Sidenav component

The Sidenav component is responsible for transferring the data from the Comparison child component to the Document child component, which are described in the following subsections. In addition to the child components, the HTML template also defines two buttons, where one closes the sidebar, and the other expands the sidebar to show the comparison screen.

4.1.6.1 Comparison component

We use the Comparison component to generate and display the word matches from the selected documents. This then allows us to populate the individual columns present in this component with the checkbox inputs, allowing the word match selection.

The actual word match entry is split into its own small subcomponent for the convenience of the for loop present in the HTML template.

4.1.6.2 Document component

The Document component's sole goal is to format and display the text of the given document, highlighting the words selected by the user in the two word selection columns. The component does this by using a regular expression that matches each of the words and then promptly replaces them with a styled HTML tag.
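The highlighting step can be sketched in plain TypeScript as follows. This is only a minimal sketch: the function names `escapeRegExp` and `highlightWords` and the CSS class name are hypothetical. The sketch assumes that the document text is already HTML-escaped and that tokens are separated by whitespace, because corpus tokens can contain symbols that the usual word-boundary pattern would not handle.

```typescript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wraps every whitespace-delimited occurrence of the selected words
// in a styled span. The input text is assumed to be HTML-escaped already.
function highlightWords(text: string, words: string[], cssClass: string): string {
  if (words.length === 0) {
    return text;
  }
  const pattern = words.map(escapeRegExp).join("|");
  // Group 1 keeps the preceding whitespace, group 2 is the matched word;
  // the lookahead requires whitespace or the end of the text after it.
  const regex = new RegExp(`(^|\\s)(${pattern})(?=\\s|$)`, "g");
  return text.replace(regex, `$1<span class="${cssClass}">$2</span>`);
}
```

Escaping the words before building the pattern matters here, since a token containing characters such as `+` or `(` would otherwise be interpreted as regular expression syntax.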

4.2 Services

Services in Angular typically lift the burden of direct data manipulation from the components. They are also frequently used to pass data to other unrelated components, where bubbling data through the component tree would be impractical. This is usually done by creating a subject: a value holder that can be observed and that emits to its subscribers every time its value changes.

The application uses three elementary services, which are mostly just responsible for passing data between components.
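To make the subject pattern more concrete, here is a minimal, dependency-free sketch. The application itself relies on Reactive Extensions for JavaScript; the `SimpleSubject` class below is a hand-written stand-in that only imitates the idea and does not reproduce that library's API. The `ProgressService` class is likewise a hypothetical example.

```typescript
type Listener<T> = (value: T) => void;

// Minimal stand-in for a subject that remembers its latest value.
class SimpleSubject<T> {
  private listeners: Listener<T>[] = [];
  private current: T;

  constructor(initial: T) {
    this.current = initial;
  }

  // Registers a listener, immediately hands it the current value,
  // and returns a function that removes the listener again.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.push(listener);
    listener(this.current);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Stores a new value and notifies every subscriber.
  next(value: T): void {
    this.current = value;
    for (const listener of this.listeners) {
      listener(value);
    }
  }
}

// Hypothetical service that shares a loading percentage between
// components that are not directly related in the component tree.
class ProgressService {
  readonly percentage = new SimpleSubject<number>(0);
}
```

A component deep in the tree could call `percentage.next(50)`, and any other component subscribed to the same service instance would receive the new value without the data travelling through their common ancestors.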

4.2.1 Query service

The most frequently used service of the application is the Query service, which stores the parsed corpus data as a subject and allows the components to query it for information. It provides wrapper functions around the Query utility library, detailed in Section 4.6.1, passing in the globally loaded corpus automatically.

On creation, the service also instantiates a new web worker, described in Section 4.5, and contains a function that returns the observable used to initialise the graph data.

4.2.2 Loading service

The app utilises the Loading service to pass the current progress of a long-lasting function from within components all the way to the App component containing the loading user interface detailed in Section 4.1.1. This service defines three subjects: the loading stage, the loading percentage, and a variable that determines whether we are currently loading anything at all.

4.2.3 JSON validate service

Since we want to recognise if the user passed in valid corpus data, we need to validate the parsed data. This is the main objective of the JSON validate service, which uses a JavaScript Validator: a public library for validating objects.


4.3 Pipes

In Angular, a pipe offers a concise syntax for transforming data in a template: it takes some input data and returns it in a transformed form.

4.3.1 Escape HTML pipe

The Escape HTML pipe finds and replaces all the HTML-unsafe symbols³ in a string with their safe, escaped variants.
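The core of such a transformation can be sketched as a plain function. The names `HTML_ESCAPES` and `escapeHtml` are assumptions for this example; in Angular, the function would typically be wrapped in a class implementing the `PipeTransform` interface, which is omitted here to keep the sketch free of dependencies.

```typescript
// Replacement table for the five symbols with syntactic meaning in HTML.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replaces every unsafe symbol with its escaped entity in a single pass,
// so an ampersand produced by one replacement is never escaped twice.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (symbol) => HTML_ESCAPES[symbol] ?? symbol);
}
```

Doing all replacements in one pass avoids the classic pitfall of chained replacements, where `<` first becomes `&lt;` and its ampersand is then escaped again.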

4.3.2 Pair split pipe

The Pair split pipe splits string pairs into their individual components. String pairs are used primarily to pair up the soft matches and use them as keys in a dictionary, since a JavaScript or TypeScript dictionary cannot use a pair of values as a key, and JavaScript does not have tuples yet.
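One way to encode a pair of words as a single dictionary key, and to split it again, is sketched below. The separator character and the names `PAIR_SEPARATOR`, `joinPair` and `splitPair` are assumptions for this illustration; the application's actual key format is not described in this text.

```typescript
// Hypothetical separator; it must not occur inside any word of the corpus.
const PAIR_SEPARATOR = "\u0000";

// Joins two words into one string that can serve as a dictionary key.
function joinPair(first: string, second: string): string {
  return first + PAIR_SEPARATOR + second;
}

// Splits a key produced by joinPair back into its two words.
function splitPair(key: string): [string, string] {
  const index = key.indexOf(PAIR_SEPARATOR);
  if (index < 0) {
    throw new Error("Not a pair key");
  }
  return [key.slice(0, index), key.slice(index + PAIR_SEPARATOR.length)];
}
```

With such keys, a plain object like `Record<string, number>` can store one value per soft match, and the template can recover both words from the key when rendering the match.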

4.4 Corpus loaded guard

A guard prevents the user from navigating to a route that they should not be able to access. The only guard used in the application is the Corpus loaded guard, which checks whether the corpus is loaded in the Query service and denies access to the route if it is not.
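The decision logic of such a guard is small enough to sketch without Angular. Both classes below are hypothetical stand-ins: `CorpusStore` plays the role of the Query service, and `CorpusLoadedGuardSketch` mirrors only the shape of a guard whose `canActivate` method the router would consult before activating a route.

```typescript
// Hypothetical stand-in for the service that holds the loaded corpus.
class CorpusStore {
  private corpus: object | null = null;

  load(corpus: object): void {
    this.corpus = corpus;
  }

  isLoaded(): boolean {
    return this.corpus !== null;
  }
}

// Plain-TypeScript sketch of a route guard: navigation is allowed
// only once a corpus has been loaded into the store.
class CorpusLoadedGuardSketch {
  private store: CorpusStore;

  constructor(store: CorpusStore) {
    this.store = store;
  }

  canActivate(): boolean {
    return this.store.isLoaded();
  }
}
```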

4.5 Graph data web worker

JavaScript generally runs in a single thread. Therefore, if we were to perform a demanding computation, the browser could freeze up, making the web page completely unresponsive. The W3C defined a specification addressing this issue, allowing JavaScript to use threads running in the background. [39]

3. The &, <, >, " and ' symbols are generally considered unsafe, as they have syntactical meaning in HTML.

Figure 4.2: The comparison of the darker, more displeasing naive average mixing (top) and the saturation- and lightness-preserving sRGB model mixing (bottom)

The Graph data web worker is responsible for calculating the soft cosine measures between all pairs of documents at the beginning of the simulation while informing the Loading service of the progress of the computation.

Implementing the web worker was a challenge, since the worker cannot import any modules: most browsers currently do not support importing in workers. [40] This prompted a handful of rewrites in the code so that the worker does not require any imports.
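The kind of computation handed to the worker can be sketched as an import-free function. This is a simplified illustration, not the application's code: it assumes dense document vectors and a dense term similarity matrix `s`, and the function names and the `onProgress` callback are hypothetical. Inside a real worker, the callback would typically forward the progress to the main thread with `postMessage`.

```typescript
// Computes a^T S b for dense vectors and a dense term similarity matrix S.
function bilinear(a: number[], s: number[][], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] === 0) continue; // skip terms absent from the first document
    for (let j = 0; j < b.length; j++) {
      sum += a[i] * s[i][j] * b[j];
    }
  }
  return sum;
}

// Soft cosine measure: a^T S b / sqrt(a^T S a * b^T S b).
function softCosine(a: number[], s: number[][], b: number[]): number {
  const norm = Math.sqrt(bilinear(a, s, a) * bilinear(b, s, b));
  return norm === 0 ? 0 : bilinear(a, s, b) / norm;
}

// Fills a symmetric similarity matrix for all document pairs and
// reports the finished fraction (0..1) after every pair.
function pairwiseSoftCosine(
  docs: number[][],
  s: number[][],
  onProgress: (fraction: number) => void
): number[][] {
  const n = docs.length;
  const result = docs.map(() => new Array<number>(n).fill(0));
  const total = (n * (n + 1)) / 2;
  let done = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i; j < n; j++) {
      const value = softCosine(docs[i], s, docs[j]);
      result[i][j] = value;
      result[j][i] = value;
      done++;
      onProgress(done / total);
    }
  }
  return result;
}
```

Because the measure is symmetric, only the upper triangle is computed, which roughly halves the work for large corpora.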

4.6 Utility libraries

The utility libraries export several useful functions that are not necessarily bound to a class or a component. They help reduce the repetitiveness of the code, which makes it less error-prone and makes future refactoring easier.

4.6.1 Query utility library

The Query utility library defines a handful of functions for querying a corpus. It also defines most of the interfaces used throughout the application, namely the actual corpus interface.

4.6.2 Graph utility library

To define some general functions related to the drawing of components and used throughout the application, we use the Graph utility library. This library defines a helper Color class with an sRGB model mixing method, which yields better-looking results than the naive averaging of the individual colour components; the two are compared visually in Figure 4.2.
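The difference between the two mixing approaches can be illustrated with the following sketch. It shows one common way to avoid the darkening of naive averaging, namely averaging in linear light using the standard sRGB transfer function; whether the Color class uses exactly this method is an assumption, and all names below are hypothetical.

```typescript
// An sRGB colour with channels in the range 0..255.
type Rgb = [number, number, number];

// Naive mixing: the arithmetic mean of each encoded channel.
function naiveMix(a: Rgb, b: Rgb): Rgb {
  return [
    Math.round((a[0] + b[0]) / 2),
    Math.round((a[1] + b[1]) / 2),
    Math.round((a[2] + b[2]) / 2),
  ];
}

// Decodes one sRGB channel (0..255) to linear light (0..1).
function toLinear(channel: number): number {
  const c = channel / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Encodes linear light (0..1) back to an sRGB channel (0..255).
function fromLinear(linear: number): number {
  const c = linear <= 0.0031308 ? linear * 12.92 : 1.055 * Math.pow(linear, 1 / 2.4) - 0.055;
  return Math.round(c * 255);
}

// Averages two encoded channels in linear light.
function mixChannel(x: number, y: number): number {
  return fromLinear((toLinear(x) + toLinear(y)) / 2);
}

// Mixes two colours channel by channel in linear light.
function linearMix(a: Rgb, b: Rgb): Rgb {
  return [mixChannel(a[0], b[0]), mixChannel(a[1], b[1]), mixChannel(a[2], b[2])];
}
```

Mixing pure red and pure green this way gives a noticeably brighter yellow than the naive average, which matches the kind of difference described for Figure 4.2.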

4.6.3 Various utility library

Being the smallest utility library, the Various utility library defines some additional functions to reduce code repetitiveness.

4.7 Documentation

The source code is fully documented and commented, explaining the code in greater detail along with the reasoning behind each function. The documentation complies with the TSDoc standard proposed by Microsoft Corporation [41], where every function, field and class is briefly described.

In addition to the in-code documentation, we have also provided an automatically generated, explorable documentation web page. The link to this documentation web page can be found in Appendix A.

5 Conclusion

Textual data analytics in this computerised age is a widely utilised field of study, especially in the world of natural language processing. Creating open source interactive visualisation tools with modern web development technologies, such as the application designed and implemented in this thesis, contributes to the ever-expanding field of machine learning and helps create better tools and algorithms.

The primary objective of this thesis was to design and implement a tool that would allow us to explore and visualise the similarity of an extensive collection of documents and texts and to allow the comparison of two specific documents within this collection. Both of these objectives were achieved in Chapters 3 and 4, and the finished application can be seen in Figure 5.1. In addition to this thesis, a how-to guide for the application was created; it is described in Appendix B. We can utilise the features of this application to aid the analysis and exploration of data in natural language processing research and to create visualisation figures for publications.

The application's design was discussed and iterated on with the MIR MU team to meet their requirements and was subsequently implemented as a web application. Upon completion, a demo of the application was uploaded to the internet, and the source code was uploaded to GitHub; both are linked in Appendix A.

5.1 Future work

Since the tool was built in Angular, which scales well, and was released as open source under the MIT licence, it can be easily expanded with new features.

As it currently stands, the application is designed specifically for use in a laboratory; therefore, it can seem very inaccessible to a layperson. We could improve on this by implementing hover-on hints or even explanation buttons in the user interface. This addition would tremendously improve accessibility and allow even a non-expert to operate the application.

For the purpose of this thesis, we have only implemented cosine similarity as an allowed metric, but we might want to use a different similarity metric altogether. Therefore, the ability to switch between different similarity metrics would be a welcome addition.

Another useful feature would be to store the documents temporarily for the current session, which would prevent the repeated computation of the same data on every reload of the web page.

Another possible feature is the algorithmic categorisation of the documents, which would group similar documents and colour them based on their category.

More ideas come to mind, such as a search bar for selecting a document by name, or adding new documents on the go. The number of possible extensions shows that the application is still in its infancy and that much work remains ahead of it.


Figure 5.1: The finished application with the comparison window opened and highlighted words

A The live demo and the source code

A public live demo of our tool is available online on the Aisa server at https://www.fi.muni.cz/~xpetr2/document-maps/

The source code of our tool is available on the following locations:

• On GitHub: https://github.com/xpetr2/document-maps.

• Digitally attached to this thesis in the thesis archive.

The interactive documentation is also available on the Aisa server: https://www.fi.muni.cz/~xpetr2/document-maps/documentation/.

A.1 Building the project

Upon cloning or forking the project, you can execute several commands to aid you with the manipulation of the project.

A.1.1 Installing the project

To install all the needed dependencies for the application, you need to have Node.js installed. With Node.js installed, you can execute the following command in the root folder of the project:

    npm install

This installs all the dependencies required for the project to run and creates an environment for you to work with.

A.1.2 Starting the development server

In order to test and develop the application without the need to build the project every time a change is made, you can use the development server, which automatically and quickly rebuilds the project for you. This will improve your workflow substantially. You can start the development server by running the following command in the root folder of the project:

    npm run start

You can then navigate to the http://localhost:4200/ URL in your browser of choice, where the development server is running the application.

A.1.3 Building for deployment

To build and deploy the final project, you can execute the following command:

    npm run build-prod

This will build and compile the project into the final HTML, CSS and JavaScript files and place them into the dist folder in the project's root folder. You can then copy the contents of the dist folder onto an HTTP server.

A.1.3.1 Non-root path deployment

Beware: if you are planning to run the deployed program in a non-root path of the server URL, you are required to change the final index.html file. Open the compiled index.html file, located in the dist folder, with a text editor and find the following line:

    <base href="/">

You then have to replace the contents of the href parameter to match the deployed application's path. For example, if the application was deployed on a web page with the URL http://example.com/foo/bar/document-maps/, then the base tag would look like the following:

    <base href="/foo/bar/document-maps/">

A.1.4 Generating documentation

The project comes with the compodoc tool, which can quickly create an interactive web page compiled from the TSDoc documentation contained in the code.

To generate this documentation, you can run the npm run compodoc command, which creates the documentation web page and places it into the documentation folder in the project's root folder.


B How-to guide

This additional chapter will briefly go over how to use the application.

B.1 Importing the corpus

To proceed into the main application, you need to use a valid corpus. What counts as a valid corpus is explained in the documentation¹ on the page Interfaces/Corpus. To import a corpus, you can click the Insert corpus as file button and select your corpus JSON file, or you can paste the corpus as text by clicking the Insert corpus as JSON text button. If you just wish to explore the tool, an example corpus is provided upon clicking the Use the example corpus button.

B.2 Navigating the map

To navigate the map, you can simply click and drag the map around and scroll up to zoom in or scroll down to zoom out. If you wish to move the map around without a mouse, you can do so using the arrow keys, and you can use the '+' and '-' keys on your keyboard to zoom in and out, respectively. You can also use the on-screen user interface, depicted in Figure B.1, to centre or zoom the camera.

B.3 Selecting nodes

To select a document node, you can simply click on it. By default, this will colour all the other nodes depending on their deviation from the calculated soft cosine distance. You can refer to the deviation legend that appears. This legend displays an indicator whenever you hover over a node, as portrayed in Figure B.2 on the next page.

1. The documentation is available online: https://www.fi.muni.cz/~xpetr2/document-maps/documentation


If you desire to highlight multiple nodes, you can do so by holding the 'CTRL' key on your keyboard.

[Figure B.1 content: on-screen buttons labelled Open settings, Center camera, Zoom camera in, and Zoom camera out]

Figure B.1: The on-screen camera controls of the application

[Figure B.2 content: deviation legend with the labels Too far, Perfect, and Too close]

Figure B.2: The on-screen UI element displaying the deviation error

B.4 Viewing the document content

When selecting a document, you can view its contents on the right side of the screen. If you wish to view multiple documents at once, you can do so by selecting multiple nodes and then exploring them on the right side of the screen.


Figure B.3: The comparison button used to access the comparison screen

B.5 Comparing documents

To compare documents, you need to select precisely two documents and then click the Compare button in the top right, pictured in Figure B.3. This then opens the comparison screen, where you can compare the documents in further detail.

B.5.1 Selecting words

If you click on a checkbox, shown in Figure B.4, next to one of the matched words, you will select that word for highlighting in the documents displayed above.

If you select a word from the exact matches column, the highlights will be coloured in yellow. If you select a match from the soft matches column, the first word will be highlighted in red in the left document, and the second word will be highlighted in red in the right document. You can further single out highlights by hovering over the word in the word selection columns.


Figure B.4: The word selection UI, showing checked and unchecked words

B.6 Changing settings

You can change the settings of the tool by pressing the cogwheel button in the top left and tweaking the options that appear.

Bibliography

1. KIRK, Andy. Data Visualization: Representing Information on Modern Web. Packt Publishing, 2016. ISBN 9781787129764.
2. MURRAY, Scott. Interactive data visualization for the web: an introduction to designing with D3 / Scott Murray. 1st ed. Sebastopol, CA, United States of America: O'Reilly Media, 2013. ISBN 978-1-449-33973-9.
3. DRAGICEVIC, Pierre and JANSEN, Yvonne. List of Physical Visualizations [online]. 2021 [visited on 2021-05-17]. Available from: http://dataphys.org/list/.
4. NIGHTINGALE, Florence. Diagram of the Causes of Mortality [online]. 2020 [visited on 2021-05-17]. Available from: https://commons.wikimedia.org/wiki/File:Nightingale-mortality.jpg.
5. LEXICO. Definition of MAP by Oxford Dictionary [online]. 2021 [visited on 2021-05-11]. Available from: https://www.lexico.com/definition/map.
6. KOBOUROV, Stephen. Spring Embedders and Force Directed Graph Drawing Algorithms. 2012.
7. BOSTOCK, Mike. Force-Directed Tree [online]. 2021 [visited on 2021-05-12]. Available from: https://observablehq.com/@d3/force-directed-tree.
8. CUTURA, R., HOLZER, S., AUPETIT, M. and SEDLMAIR, M. VisCoDeR: A tool for visually comparing dimensionality reduction algorithms. In: ESANN 2018 - Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. 2018, pp. 105-110.
9. SALTON, Gerard and BUCKLEY, Chris. Term-weighting approaches in automatic text retrieval. Information Processing and Management. 1988, vol. 24, pp. 513-523.


10. GAINA, Anatol. What are the advantages of Euclidian distance and cosine distance [online]. Quora [visited on 2021-05-11]. Available from: https://www.quora.com/What-are-the-advantages-of-Euclidian-distance-and-cosine-distance-respectively.
11. SHARADA, R. Cosine Similarity [online]. 2020 [visited on 2021-05-11]. Available from: https://www.geeksforgeeks.org/cosine-similarity/.
12. NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY. Cosine Distance, Cosine Similarity, Angular Cosine Distance, Angular Cosine Similarity [online]. NIST, 2021 [visited on 2021-05-20]. Available from: https://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/cosdist.htm.
13. SIDOROV, Grigori, GELBUKH, Alexander, GÓMEZ-ADORNO, Helena and PINTO, David. Soft Similarity and Soft Cosine Measure: Similarity of Features in Vector Space Model. Computación y Sistemas. 2014, vol. 18, no. 3, pp. 491-504. ISSN 1405-5546. Available from DOI: 10.13053/cys-18-3-2043.
14. NOVOTNÝ, Vít. Implementation notes for the soft cosine measure. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management. Torino, Italy: Association for Computing Machinery, 2018, pp. 1639-1642.
15. WATTS, Luke. Mastering Sass. Birmingham, UK: Packt Publishing, 2016. ISBN 9781785883361.
16. AGGARWAL, Sanchit. Modern Web-Development using ReactJS. International Journal of Recent Research Aspects. 2018, vol. 5, no. 1, pp. 133-137. ISSN 2349-7688.
17. RESIG, John. JQuery 1.0 [online]. 2006 [visited on 2021-05-12]. Available from: https://blog.jquery.com/2006/08/26/-10/.
18. BENITTE, Raphael and GREIF, Sacha. State of JS 2020: Front-end Frameworks [online]. Japan, 2020 [visited on 2021-05-10]. Available from: https://2020.stateofjs.com/en-US/technologies/front-end-frameworks/.


19. FACEBOOK. React: A JavaScript library for building user interfaces [online]. 2021 [visited on 2021-05-12]. Available from: https://reactjs.org/.
20. MRÁZ, Marcel. Component-based UI Web Development. Brno, 2019. Available also from: https://is.muni.cz/th/zpb3k/. Bachelor thesis. Masaryk University, Faculty of Informatics.
21. MOHAPATRA, Pratap K.J. Software Engineering: A Lifecycle Approach. New Delhi: New Age International, 2010. ISBN 978-81-224-2846-9.
22. VAN GINNEKEN, Leon. Component based development in UI5 [online]. 2020 [visited on 2021-05-17]. Available from: https://blogs.sap.com/2020/06/21/component-based-development-in-ui5/.
23. WORLD WIDE WEB CONSORTIUM. HTML & CSS [online]. 2016 [visited on 2021-05-10]. Available from: https://www.w3.org/standards/webdesign/htmlcss.
24. WEB HYPERTEXT APPLICATION TECHNOLOGY WORKING GROUP. HTML Standard [online]. 2021 [visited on 2021-05-10]. Available from: https://html.spec.whatwg.org/.
25. FLANAGAN, David. Javascript: The Definitive Guide. 6th ed. Newton, United States of America: O'Reilly Media, 2011. ISBN 978-0596805524.
26. ECMA INTERNATIONAL. ECMAScript 2020 Language Specification [online]. 2020 [visited on 2021-05-10]. Available from: https://262.ecma-international.org/11.0/.
27. JANSEN, Remo H, VANE, Vilic and WOLFF, Ivo Gabe de. TypeScript: Modern JavaScript Development. 1st ed. Birmingham, UK: Packt Publishing, 2016. ISBN 9781787289086.
28. MICROSOFT CORPORATION. TypeScript [online]. Redmond, Boston, SF & Dublin, 2021 [visited on 2021-05-10]. Available from: https://www.typescriptlang.org/.
29. TRONCONE, Brian. Reactive Extensions for JavaScript [online]. 2020 [visited on 2021-05-10]. Available from: https://www.learnrxjs.io/.


30. OPENJS FOUNDATION. About Node.JS [online]. 2021 [visited on 2021-05-10]. Available from: https://nodejs.org/en/about/.
31. GOOGLE LLC. Angular [online]. 2021 [visited on 2021-05-10]. Available from: https://angular.io/.
32. GOOGLE LLC. Angular Material UI component library [online]. 2021 [visited on 2021-05-10]. Available from: https://material.angular.io/.
33. WORLD WIDE WEB CONSORTIUM. Scalable Vector Graphics (SVG) 2 [online]. 2018 [visited on 2021-05-15]. Available from: https://www.w3.org/TR/SVG2/.
34. BOSTOCK, Mike. D3.js - Data-Driven Documents [online]. 2020 [visited on 2021-05-10]. Available from: https://d3js.org/.
35. BOSTOCK, Mike et al. D3/d3-force: Force-directed graph layout using velocity Verlet integration [online]. GitHub, 2021 [visited on 2021-05-10]. Available from: https://github.com/d3/d3-force.
36. LEXICAL COMPUTING. Sketch Engine [online]. 2021 [visited on 2021-05-12]. Available from: https://www.sketchengine.eu/.
37. KOCINOVÁ, Lucia. Interactive visualization methods for Sketch Engine [online]. 2015 [visited on 2021-05-20]. Available from: https://is.muni.cz/th/odmai/. Master's thesis. Masaryk University, Faculty of Informatics. Supervised by Barbora KOZLÍKOVÁ.
38. GOLDSTONE, Andrew. DFR-browser [online]. 2016 [visited on 2021-05-20]. Available from: http://agoldst.github.io/dfr-browser/.
39. WORLD WIDE WEB CONSORTIUM. Web Workers [online]. 2021 [visited on 2021-05-15]. Available from: https://www.w3.org/TR/workers/.
40. MOZILLA. Browser support - import - JavaScript [online]. 2021 [visited on 2021-05-15]. Available from: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility.
41. MICROSOFT CORPORATION. TSDoc [online]. 2021 [visited on 2021-05-23]. Available from: https://tsdoc.org/.
