Classification Storage

A practical solution to file classification for information security

A practical solution to file classification for information security

Joël Sloof

Faculty of Health, Science and Technology
Master thesis in Computer Science
Second Cycle, 30 hp (ECTS)
Supervisor: Leonardo A. Martucci, University of Karlstad, SWE
Examiner: Karl-Johan Grinnemo, University of Karlstad, SWE
Karlstad, June 6th, 2021

Abstract

In the information age we currently live in, data has become the most valuable resource in the world. These data resources are high-value targets for cyber criminals and digital warfare. Mitigating these threats requires information security, laws and legislation. It can be challenging for organisations to maintain enough control over their data to comply with laws and legislation that require data classification. Data classification is often required to determine appropriate security measures for storing sensitive data. The goal of this thesis is to create a system that makes it easy for organisations to handle file classifications and that raises information security awareness among users. In this thesis, the Classification Storage system is designed, implemented and evaluated. The Classification Storage system is a Client–Server solution in which the client and the server together create a virtual filesystem. The virtual filesystem is presented as one network drive, while data is stored separately based on the classifications that users set. The Classification Storage system is evaluated through a usability study. The study shows that users find the Classification Storage system intuitive and easy to use, and that using the system makes them more aware of information security.

Keywords

Data Classification, Information Classification, User-Driven Classification, Information Security Awareness

Summary (Swedish abstract)

In today's information age, data has become the most valuable asset in the world. Data assets have become high-priority targets for cyber criminals and digital warfare. To reduce these threats, information security, laws and legislation are needed. It can be challenging for organisations to have control over their data in order to comply with laws that require data classification for storing sensitive data. The goal of the thesis is to create a system that makes it easier for organisations to handle file classification and that increases information security awareness among users. The Classification Storage system has been designed, implemented and evaluated in the thesis. The Classification Storage system is a Client–Server solution whose client and server together create a virtual filesystem. The virtual filesystem is presented as a network drive, while data is stored separately depending on the classification that users set. The Classification Storage system is evaluated through a usability study. The study shows that users find the Classification Storage system intuitive and easy to use, and that users become more information security aware by using the system.

Keywords

Data Classification, Information Classification, User-Driven Classification, Information Security Awareness

Acknowledgements

I would like to thank and acknowledge all the people at Veriscan for introducing me to the field of information security. Their experience and guidance motivated me to pursue a career in information security and inspired me to create the Classification Storage system.

My family has always supported me in my ventures and that includes this one. I would like to thank my parents and siblings for supporting me and my crazy ideas in life and in this project.

Special thanks go to my fiancée Matilda for her love, support and for always believing in me.

Contents

1 Introduction
1.1 Thesis Goals and Results
1.2 Methodology
1.3 Ethics and Sustainability
1.4 Thesis Outline

2 Background and Related Work
2.1 WebDAV
2.2 Middleware
2.3 Bit Field
2.4 Filesystem in Userspace
2.5 Windows Shell Extension
2.6 Sticky Policies
2.7 Microsoft 365
2.8 Summary

3 System Architecture
3.1 Classification Storage
3.1.1 File Storage
3.1.2 Server / Virtual Storage
3.1.3 Clients
3.2 Server Architecture
3.2.1 Sabre/DAV
3.2.2 Authentication Module
3.2.3 Virtual Filesystem
3.2.4 Server API
3.3 Client Architecture
3.3.1 Client (FUSE)
3.3.2 Windows FUSE Module
3.3.3 Classification Service
3.3.4 Classification Module
3.4 Summary

4 System Implementation
4.1 Server
4.1.1 Sabre/DAV
4.1.2 Virtual Filesystem
4.1.3 Classification Plugin
4.2 Client
4.2.1 CSFS
4.2.2 CSShellExtension
4.2.3 CSDialogBox
4.3 Summary

5 System Evaluation
5.1 Usability Study
5.1.1 Part One: Interview
5.1.2 Part Two: Experiment
5.1.3 Part Three: Post Experiment Interview
5.2 Recruitment and Participants
5.3 Ethical Considerations
5.4 Data Analysis
5.5 Results
5.6 Limitations
5.7 Summary

6 Discussion
6.1 Classification Storage
6.2 Evaluation Results
6.3 Limitations
6.4 Summary

7 Conclusions

References

Chapter 1

Introduction

Throughout history, information has always been an important asset, and with the invention of information technology systems (IT systems) and the digital evolution, these assets have evolved into the most valuable resource in the world [37]. The rapid development and adoption of IT and communication systems have changed the world through a digital revolution, leading to the current information age [7].

Organisations all over the world are constantly collecting and processing more and more data to analyse and use to their benefit [43]. With data being as valuable as it is and organisations and systems being interconnected across the world, data has become a high value target for cyber criminals and digital warfare [13, 20]. These threats require mitigation in the form of information security, legislation and laws. Legislation and laws are not only focused on the protection of data but also on regulating unlawfully acquired data, preventing organisations from breaching privacy laws and protecting users.

For organisations to be compliant with legislation and laws, adequate controls need to be implemented to ensure that sensitive data is secure at all times. Two examples of such controls are the ISO/IEC 27001 Annex A controls A.8.2.1 Classification of Information and A.8.2.2 Labeling of Information [1]. A.8.2.1 concerns classifying and securing information according to its significance. A.8.2.2 concerns labeling information according to the organisation's classification scheme or policy. These types of controls can be challenging to implement on unstructured data such as Word documents and other files. File classification and labeling of this type of unstructured data enhance information security and governance [36].

Information security awareness is an important part of information security, and information security awareness training has been shown to enhance security for organisations [9]. Data classification tools can be used for file classification to enhance information security and comply with legislation and laws. Data classification tools and systems not only improve information security and the identifiability of data, but also raise awareness among users of their responsibilities in protecting the data [36].


This thesis explores whether a system can be created in which users set classifications and labels directly when working with files. The system should also store files separately depending on their classification level.

1.1 Thesis Goals and Results

The goal of this thesis is to design, implement and evaluate a system that allows users to perform file classifications. The system should store files separately based on their classification, so that appropriate security measures can be applied for each classification level. To users, this distributed storage system should be presented as one network drive for ease of use. By making users set classifications and/or labels on files before saving them to the network storage, the goal is both to achieve user-driven classification and to raise awareness of information security. From a user perspective, the system should be intuitive and easy to use, and it should make it easy to follow company information security classification policies. A system that makes users more information security aware and makes it easy to follow company policies should result in accurate classifications and a change in user behaviour that benefits information security. The system designed and implemented in this thesis consists of a server and a client that together create a virtual filesystem. This virtual filesystem requires users to set a classification on a file when the file is saved. Every file operation is sent to the server together with the file's classification as metadata. The server analyses the metadata and stores the classified file in a location matching its classification. This allows a higher level of security to be applied to files whose classifications require it. The system is called Classification Storage and achieves the goals set for it. The Classification Storage system is evaluated through a usability study. The study shows that users find the Classification Storage system intuitive and easy to use, and that users become more information security aware by using the system.

1.2 Methodology

This thesis explores whether a system can be created in which users set classifications and labels directly when working with files. Can such a system improve user awareness of information security and make it easier for users to follow company policies? This question is answered by creating and testing the system. The thesis work is carried out in three steps: design, implementation and evaluation.


First, a system that achieves the technical thesis goals is designed. Second, the system architecture is used to develop the Classification Storage system. The final implementation of the Classification Storage system differs slightly from the system architecture: some changes were found to be better solutions than those the architecture describes, and some modules were not implemented because of time constraints. Last, the Classification Storage system is evaluated through a usability study consisting of experiments and interviews with participants. Because the Classification Storage system depends on users to set file classifications and tags, it must be intuitive and user friendly; it is therefore evaluated using Nielsen's heuristics [27], which are commonly used to evaluate user interface design. Furthermore, one of the thesis goals is to evaluate whether the Classification Storage system makes users more information security aware. These requirements and goals make a usability study the most suitable method for evaluating the system. Participants gain experience using the system through experiments, followed by interviews that capture the participants' findings. Through this evaluation we can conclude whether the created system improves user information security awareness and whether participants find that the system would make it easier to follow company policies.

1.3 Ethics and Sustainability

In this thesis work, a system is created that is used to classify files and data following organisational policies and rules. File and data classification raises ethical dilemmas, since classification policies can reflect social values [5]. The rules and policies applied to data classification are created by humans based on their judgement and social values [35]. The ethical discussion surrounding data classification is therefore closely connected to ethical values in society. The Classification Storage system impacts society by supporting organisations in storing sensitive data securely and by making users more information security aware. Such an easy-to-use system could make it more compelling for organisations to start following information security standards. The Classification Storage system can be used as a tool to follow laws and legislation; however, this depends heavily on the way an organisation integrates the system. The Classification Storage system is designed to be customisable to support organisational policies. If an organisation wanted to store and classify data unethically, the system could probably also be implemented and utilised to support that.


Information classification is a key part of information security for protecting sensitive data. Protecting data is important not only from an organisational perspective but also from a consumer perspective. A growing number of regulations, such as the GDPR, impose requirements on data handling and storage that require organisations to classify their data. The motivation for creating the Classification Storage system is to make it easier for organisations to comply with controls imposed by laws and legislation. By utilising the Classification Storage system, organisations can raise their level of information security and help prevent incidents.

1.4 Thesis Outline

This thesis focuses on the Classification Storage system that is designed, implemented and evaluated. First, the background is explained and discussed in Chapter 2. The system architecture is described in Chapter 3. The Classification Storage system is developed from this architecture, and its implementation is described in Chapter 4. The Classification Storage system is evaluated through a usability study that is described in Chapter 5. The final system, the evaluation results and the limitations are discussed in Chapter 6. Lastly, the conclusions of the thesis work and future work are covered in Chapter 7.

Chapter 2

Background and Related Work

This chapter explains concepts, solutions and related work that are important to the system created in this thesis work. The system is a Client–Server solution implemented using different software solutions, techniques and protocols, some of which require further clarification before they are discussed in the System Architecture and System Implementation chapters (Chapters 3 and 4). This chapter is organised as follows. Sections 2.1–2.5 introduce the communication protocol, techniques and framework used in the implementation of our solution. The communication protocol WebDAV, which is used for all communication between the client and the server, is explained in Section 2.1. Sections 2.2 and 2.3 explain two techniques used in the implementation of the server: middleware and bit fields. Section 2.4 describes a framework that is used to create a filesystem in the client, and Section 2.5 explains what Windows shell extensions are and how the client integrates with Windows. Sections 2.6 and 2.7 present related work: two solutions that have similarities with, and partially provide the same functionality as, the system proposed in this thesis. Sticky policies are described in Section 2.6 and Microsoft 365 is described in Section 2.7. Lastly, Section 2.8 summarises the chapter.


2.1 WebDAV

Web Distributed Authoring and Versioning (WebDAV) is a protocol that extends the Hypertext Transfer Protocol (HTTP). The WebDAV protocol allows clients to manage file properties, create files and create collections (folders), and it implements a locking system to prevent collisions between clients [16]. The protocol is defined by the Internet Engineering Task Force (IETF), a non-profit organisation that develops internet standards [39]. WebDAV has a wide range of extensions that further enhance the capabilities of the protocol, such as access control, sharing, contact information sharing, extended collection management and many more [14, 15, 17, 38]. In this thesis, Sabre/DAV, a PHP implementation of WebDAV, is used for Client–Server communication [12]. The Sabre/DAV code is extended with a virtual filesystem layer, and the combined code runs on a web server that functions as a file sharing server for a local network.
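WebDAV stores arbitrary metadata as properties on a resource, and a client sets them with a PROPPATCH request whose XML body is defined in RFC 4918. As a minimal sketch, assuming classification metadata were carried as such a custom property (the namespace `urn:example:classification`, the `cs:classification` element and the function `build_proppatch` are hypothetical and not taken from the Classification Storage implementation), a client could build the request body as follows:

```c
#include <stdio.h>
#include <string.h>

/* Builds a WebDAV PROPPATCH body (RFC 4918) that sets one custom property.
 * The namespace and property name are hypothetical placeholders.
 * `level` is inserted as-is, so it must not contain XML special characters;
 * a real client would escape it.
 * Returns the number of characters written, or -1 if buf is too small. */
int build_proppatch(char *buf, size_t size, const char *level)
{
    int n = snprintf(buf, size,
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        "<d:propertyupdate xmlns:d=\"DAV:\" xmlns:cs=\"urn:example:classification\">\n"
        "  <d:set>\n"
        "    <d:prop><cs:classification>%s</cs:classification></d:prop>\n"
        "  </d:set>\n"
        "</d:propertyupdate>\n",
        level);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The resulting body would be sent as `PROPPATCH /path/to/file HTTP/1.1` with `Content-Type: application/xml`. Using a standard property update keeps the metadata inside the protocol itself, so unmodified WebDAV servers would simply store it as an ordinary property.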

2.2 Middleware

For this thesis, middleware is used on the server to handle classification and tag metadata, as described in Section 4.1.3. Middleware is software that acts as a middle layer between applications and is often used to manage data in distributed applications [31]. Middleware can, for example, make applications compatible by altering input and output data structures. The term is also used for database connection software, and middleware is often used with web APIs, as seen in Figure 2.2.1.

Figure 2.2.1: Middleware: The request and response both pass through the middleware before reaching the API and Client. This way, data can be restructured or certain behaviours can be achieved without the need to change the client or application API.
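The pattern in Figure 2.2.1 can be sketched as a function that receives the request, transforms or validates it, and then decides whether to call the next handler. The sketch below is a hypothetical illustration in C, not the PHP middleware of the actual server: the `request`/`response` structures, the three level names and the status codes are all assumptions chosen for the example.

```c
#include <stddef.h>
#include <string.h>

/* Minimal request/response model; all names and semantics are hypothetical. */
struct request {
    const char *path;
    const char *classification_header; /* raw value sent by the client, may be NULL */
    int classification;                /* parsed level, filled in by the middleware */
};

struct response {
    int status;                        /* HTTP-style status code */
};

typedef void (*handler_fn)(struct request *, struct response *);

/* The API handler only ever sees an already-validated classification level. */
static void api_handler(struct request *req, struct response *res)
{
    (void)req;          /* a real handler would store the file at req->classification */
    res->status = 201;  /* Created */
}

/* Middleware: parses the raw classification value, then either forwards the
 * request to the next handler or rejects it without calling the API. */
static void classification_middleware(struct request *req, struct response *res,
                                      handler_fn next)
{
    static const char *const levels[] = { "public", "internal", "confidential" };
    req->classification = -1;
    for (size_t i = 0; req->classification_header != NULL &&
                       i < sizeof levels / sizeof levels[0]; i++) {
        if (strcmp(req->classification_header, levels[i]) == 0) {
            req->classification = (int)i;
            break;
        }
    }
    if (req->classification < 0) {
        res->status = 400;  /* Bad Request: missing or unknown classification */
        return;
    }
    next(req, res);
}
```

Neither the client nor `api_handler` has to change when the parsing rules change; only the middleware layer does, which is the property the figure describes.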


2.3 Bit Field

Bit fields are data structures that represent a group of Boolean values, also known as binary bit flags. In this thesis, binary bit flags are used to store and communicate tags, as described in Section 4.1.3. A 1 represents true and a 0 represents false. These Boolean values are grouped together into a single integer, as seen in Table 2.3.1.

Integer   Bit value   Boolean values (most significant bit first)
0         0           false
1         1           true
2         10          true, false
3         11          true, true
4         100         true, false, false
8         1000        true, false, false, false

Table 2.3.1: Integer values represented as bit values and Boolean values.

This means that many Boolean values can be stored in one integer value. The value of a specific Boolean can be calculated with a simple bitwise operation as seen in Figure 2.3.1.

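```c
#include <stdio.h>
#define TAG_ONE   1   /* binary 0000001: bit 0 */
#define TAG_SEVEN 64  /* binary 1000000: bit 6 */

int main(void) {
    unsigned int flags = TAG_ONE | TAG_SEVEN;  /* example value: tags one and seven set */
    if ((flags & TAG_ONE) != 0)   puts("tag one is set");
    if ((flags & TAG_SEVEN) != 0) puts("tag seven is set");
    return 0;
}
```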

Figure 2.3.1: Example code showing how to see if bit flags are set or not.
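Checking a flag is only half of the picture; tags also have to be set and removed. The helpers below are a hypothetical sketch (the names `set_tag`, `clear_tag`, `toggle_tag` and `has_tag` are not from the thesis implementation) that number tags from 1, as Figure 2.3.1 does, so tag n maps to bit n-1. Setting tags one and seven gives 1 + 64 = 65, a single integer that can be stored or sent as metadata.

```c
#include <stdbool.h>

/* Tag n (1-based) maps to bit n-1. Valid for n in 1..32, assuming a
 * 32-bit unsigned int; other values would be undefined behaviour. */
static unsigned int tag_bit(unsigned int n) { return 1u << (n - 1); }

/* OR turns the bit on, AND with the inverted mask turns it off,
 * XOR flips it, and AND with the mask tests it. */
static unsigned int set_tag(unsigned int flags, unsigned int n)    { return flags | tag_bit(n); }
static unsigned int clear_tag(unsigned int flags, unsigned int n)  { return flags & ~tag_bit(n); }
static unsigned int toggle_tag(unsigned int flags, unsigned int n) { return flags ^ tag_bit(n); }
static bool has_tag(unsigned int flags, unsigned int n)            { return (flags & tag_bit(n)) != 0; }
```

For example, starting from 0, setting tags one and seven yields 65; clearing tag one leaves 64, and toggling tag two then yields 66.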

2.4 Filesystem in Userspace

The target audience for the Classification Storage client is assumed to be Windows users, and therefore a Windows implementation was chosen for our work. The Classification Storage client is developed with WinFsp, an implementation of the Filesystem in Userspace (FUSE) framework for Windows [42]. FUSE is a framework, consisting of a kernel component and a userspace library, that allows virtual filesystems to be developed in userspace [40]. In most operating systems, userspace is the part of memory where applications are executed, while kernelspace is a more restricted and privileged part of memory where the kernel and most hardware drivers are executed. Filesystem code in an operating system is executed in the privileged kernelspace, which makes customising filesystems a challenging task. FUSE allows non-privileged users to create virtual filesystems without modifying any kernel code. FUSE has been ported to many different operating systems [2, 10, 11, 42], so a virtual filesystem that follows the FUSE principles can easily be ported to other platforms that support FUSE.
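The core FUSE idea is that the filesystem is a table of callbacks: the kernel component forwards each file operation to the userspace program, which answers it. The sketch below illustrates that idea with a simplified stand-in; `struct vfs_ops`, `mem_getattr`, `mem_read` and the in-memory file are hypothetical and deliberately much smaller than the real libfuse `struct fuse_operations` or the WinFsp FUSE layer. It does follow the libfuse convention of returning a negative errno value on error and the number of bytes read on success.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Simplified stand-in for a FUSE operations table (hypothetical names;
 * real FUSE APIs have many more callbacks and different signatures). */
struct vfs_ops {
    int (*getattr)(const char *path, size_t *size);
    int (*read)(const char *path, char *buf, size_t size, off_t offset);
};

/* A single in-memory file; a real client would fetch data from the server. */
static const char hello_data[] = "classified by user\n";

static int mem_getattr(const char *path, size_t *size)
{
    if (strcmp(path, "/hello.txt") != 0)
        return -ENOENT;
    *size = sizeof hello_data - 1;   /* exclude the terminating '\0' */
    return 0;
}

static int mem_read(const char *path, char *buf, size_t size, off_t offset)
{
    size_t len = sizeof hello_data - 1;
    if (strcmp(path, "/hello.txt") != 0)
        return -ENOENT;
    if (offset < 0 || (size_t)offset >= len)
        return 0;                        /* reading at or past the end: no data */
    if (size > len - (size_t)offset)
        size = len - (size_t)offset;     /* clamp to the remaining bytes */
    memcpy(buf, hello_data + offset, size);
    return (int)size;                    /* bytes read */
}

/* The caller (the "kernel side") only knows the table, not the implementation. */
static const struct vfs_ops mem_ops = { mem_getattr, mem_read };
```

Because the caller only sees the table, the same callbacks could be backed by a remote server instead of memory, which is what allows a FUSE client to present network storage as a local drive.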


2.5 Windows Shell Extension

Windows shell extensions are used in the CSShellExtension component of the client, as described in Section 4.2.2. The handlers used in CSShellExtension are icon overlay handlers and a context menu handler.

The Windows shell is the graphical user interface (GUI) of Windows. The shell is closely related to Windows File Explorer and provides features such as browsing files and directories, the Start menu and desktop, file type associations and many other Windows features [41].

Windows shell extensions extend the Windows shell and are registered through registry entries and .ini files. They can be seen as plugins to Windows File Explorer. Shell extensions are often implemented as shell extension handlers, which are called before shell actions are performed, giving them the chance to change the behaviour of the shell [25].

2.6 Sticky Policies

The sticky policy paradigm is an attempt to allow restrictions and permissions to be directly attached to data. Sticky policies were first proposed in 2002, and have since mainly been used in fields such as cloud computing and internet of things networks [18, 26].

The goals of sticky policies are to regulate permissions throughout the life cycle of the data and to make it possible to distribute the data across multiple organisations following predetermined principles. This gives users more control over their data. The principles attached to the data would contain strict rules on how the data may be used and by whom. These rules would always need to be followed by the systems that have access to the data, and this is one of the challenges with sticky policies: the paradigm only works if all systems that have access to the data follow the privacy rules; otherwise the integrity of the entire system fails. This may explain why sticky policies have mainly been implemented in fields such as cloud computing and internet of things, which are relatively closed ecosystems with the control needed to enforce compliance with the policies and rules set on data [26].
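The enforcement problem described above can be made concrete with a small sketch. Below, a hypothetical policy (an allowed party and an allowed purpose; these fields and the function `sticky_access` are illustrative assumptions, not any published sticky-policy format) travels in the same structure as the data. Compliant systems call the check before using the payload, but nothing in the data itself stops a non-compliant system from reading `payload` directly, which is exactly why the paradigm depends on every system in the ecosystem cooperating.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sticky policy: who may use the data, and for what purpose. */
struct sticky_policy {
    const char *allowed_party;
    const char *allowed_purpose;
};

/* The policy is attached to, and shipped together with, the data. */
struct sticky_data {
    struct sticky_policy policy;
    const char *payload;
};

/* Returns the payload only if the request satisfies the attached policy,
 * otherwise NULL. The check is only effective if every system calls it. */
static const char *sticky_access(const struct sticky_data *d,
                                 const char *party, const char *purpose)
{
    bool ok = strcmp(d->policy.allowed_party, party) == 0 &&
              strcmp(d->policy.allowed_purpose, purpose) == 0;
    return ok ? d->payload : NULL;
}
```

Real sticky-policy proposals typically combine such checks with encryption, so that the payload is only released after the policy has been evaluated by a trusted party.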

Sticky policies would seem a natural fit for implementing file classifications. Users could attach the classifications, and additional encryption could be applied to the files that require it.

Sticky policies have many features that overlap with the goals of this thesis. However, the main difference in approach is that sticky policies require the policies to be followed by any system that has access to the data. This thesis aims to create a virtual filesystem that allows classifications and tags to be set directly on a file. This is controlled by a Client–Server setup that enforces no further restrictions on other systems using the data.


2.7 Microsoft 365

Microsoft 365, formerly known as Office 365, is an office software suite that provides a wide range of products and services. These products and services are mainly cloud-based productivity solutions for businesses, such as Word, Excel, PowerPoint and Outlook [23]. Microsoft 365 offers many features for information protection and governance, ranging from manual labeling of files to automated pattern matching for classification. These features are possible through the use of cloud technologies and the relatively closed ecosystem that files live in, where they only interact with software in the same suite or through the Microsoft Information Protection SDK [22, 24]. The sensitivity labels feature of Microsoft 365 allows labels to be set on files that are used in the Office suite. These labels can enforce restrictions, encryption and permissions on a per-file basis, achieving user-driven classification. The main difference between Microsoft 365 and the solution explored in this thesis is that Microsoft 365 relies on cloud techniques, whereas the system in this thesis aims to achieve the same in a distributed local storage solution. Furthermore, Microsoft 365 is limited to the Office suite and the file types associated with it.

2.8 Summary

This thesis aims to design, develop and evaluate a system that allows users to perform file classifications in a distributed storage system. This chapter explained the software solutions, techniques and protocols that are used in the implementation of the developed Client–Server solution. Furthermore, two related solutions with similar functionality were discussed and compared, and their shortcomings were identified. The Client–Server solution developed in this thesis uses the WebDAV protocol for its communication. Middleware is used in the server implementation to handle classification and tag metadata, and the server uses bit fields for storing and communicating tag metadata. The client is developed using a FUSE framework to create a filesystem. Windows shell extensions integrate additional functionality into this filesystem in Windows, allowing users to set classifications on files. Setting classifications or labels on files can also be achieved with sticky policies or Microsoft 365. These solutions partially provide the same functionality as the thesis goals, but they also have shortcomings that the system designed in this thesis aims to overcome. The sticky policy paradigm requires all connected systems to follow file policies and restrictions. Files in such a system can therefore only be used for purposes explicitly allowed by the file policies, creating a closed environment. In contrast, the Client–Server solution developed in this thesis aims to be an open environment that allows other systems to interact with the data without further restrictions. Microsoft 365 offers user-driven file classification through labeling of files in its cloud-based productivity solutions. These labels can be used to enforce file classification restrictions, permissions and encryption. These features all rely on cloud techniques, whereas this thesis aims to achieve them without depending on cloud techniques. The following chapter describes the system architecture that was found to achieve the goals of this thesis and overcome the shortcomings of the other solutions mentioned above.

Chapter 3

System Architecture

This chapter describes and explains the architecture that achieves the goals of this thesis. This work started during a previous course, Research Project in Computer Science (DVAE07). Most of the architecture was designed during that course, which left more time to implement the system during this thesis work. The goal of this system is to allow users to perform file classification in a distributed storage system. For simplicity and ease of use, the storage should be presented to the user as a single network drive. The system also aims to raise awareness of information security among users, which requires an intuitive, easy-to-use system that successfully involves users more closely in file classification. First, the entire system and its components are described in Section 3.1, along with the design decisions that were made. The server and the client are then described in more detail in Sections 3.2 and 3.3, respectively. Finally, the chapter is summarised in Section 3.4.

3.1 Classification Storage

The architecture proposed to achieve the goals of this thesis consists of a Client–Server solution, as seen in Figure 3.1.1. This section explains the role of the three main components of the system, how they communicate and the design choices that were made. The components are then described in more detail in Sections 3.1.1, 3.1.2 and 3.1.3. The server connects to storage solutions that represent the different desired classifications. These storage solutions are combined into one virtual drive that is shared over a local network. Clients connect to the shared network storage and ask the user for classifications and tags before writing to the virtual storage. The classifications and tags are sent to the server as metadata added to the messages of the chosen protocol.


Figure 3.1.1: Classification Storage Architecture. An overview of the Client–Server layout and connections. The Server creates a virtual storage from the multiple different storage devices and servers. This virtual storage is used by the clients to create a virtual file system for the users.

After analysis, WebDAV (described in Section 2.1) was chosen as the communication protocol. WebDAV includes file locking and supports HTTPS for added security. Sabre/DAV is WebDAV server software that can serve as a basis to which custom modules can be added. These modules can implement the modifications needed to communicate classifications and tags between the clients and the server. Different data transfer protocols that could be used in this design were analysed and compared. During the process of choosing a data transfer protocol, the following protocols and approaches were considered but discarded:

• SMB protocol,
• Secure Copy Protocol (SCP),
• SSH File Transfer Protocol (SFTP),
• rsync.

The SMB protocol was considered because of its implementation in Windows and because the assumed users of this product are Windows users. Some time was spent investigating a potential implementation using Samba with a custom virtual filesystem module [30]. This solution was ruled out because of the difficulty of implementation and licensing.


The following network file copy protocols were considered: Secure Copy Protocol (SCP), SSH File Transfer Protocol (SFTP) and rsync [29, 32, 33]. These protocols share the same shortcomings in relation to this project: the system must handle multiple clients at the same time, and these protocols do not provide file locking to prevent collisions.

3.1.1 File Storage

The file storage servers or drives are separated by classification. The classifications are determined by the organisation's requirements and are not part of this research. Each classification is stored either on a file storage server that shares a network drive with the server, or on a local drive. This way, the system guarantees that files with different classifications are stored separately, which is desirable because different classifications can require different levels of security.

3.1.2 Server / Virtual Storage

Files of different classification levels are stored on different file storage solutions that are connected to the server. These storage solutions are mounted and used to store and retrieve files of the corresponding classifications. The server creates a virtual filesystem combining all the connected storage solutions. This virtual filesystem contains all the files of the different classifications, with additional metadata for the classifications, tags and restrictions of each file. Clients that connect to the virtual storage are required to send classification metadata with their requests, to ensure that the file being processed is stored at the proper level of security for its classification. All requests also require user authentication data, to ensure that only permitted files or classification levels are accessible. The connection protocol is WebDAV with additional metadata in the client requests and server responses.
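The mapping from one virtual path to several physical storage locations can be sketched as a lookup from classification level to storage root. This is a hypothetical illustration only: the three levels, the `/mnt/...` mount points and the function `resolve_physical_path` are assumptions, and the actual server is written in PHP on top of Sabre/DAV. A request without a valid classification is refused, mirroring the requirement that clients always send classification metadata.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical classification levels and mount points; in the described
 * system the storage locations come from the virtual filesystem config. */
enum level { LEVEL_PUBLIC, LEVEL_INTERNAL, LEVEL_CONFIDENTIAL, LEVEL_COUNT };

static const char *const storage_root[LEVEL_COUNT] = {
    "/mnt/public",        /* e.g. a local drive */
    "/mnt/internal",      /* e.g. a network share */
    "/mnt/confidential",  /* e.g. a separately secured file server */
};

/* Maps a virtual path plus its classification to a physical path.
 * Returns 0 on success, -1 for an invalid level, a suspicious path,
 * or an output buffer that is too small. */
static int resolve_physical_path(char *out, size_t size,
                                 int level, const char *virtual_path)
{
    if (level < 0 || level >= LEVEL_COUNT)
        return -1;  /* no valid classification: the request is refused */
    if (virtual_path[0] != '/' || strstr(virtual_path, "..") != NULL)
        return -1;  /* conservatively reject anything that could escape the root */
    int n = snprintf(out, size, "%s%s", storage_root[level], virtual_path);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, the virtual path `/hr/salaries.xlsx` classified as confidential resolves to `/mnt/confidential/hr/salaries.xlsx`, while the user still sees only the single virtual drive.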

3.1.3 Clients

Clients connect to the virtual storage with custom client software using an extended WebDAV protocol. This client uses custom metadata to pass classification data to the server. Access to the virtual storage is based on the authorisation level of the user, where only certain roles can acquire read and write rights for a given classification level. The client is designed for Windows because the target audience for this system is assumed to be Windows users.
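The rule that only certain roles can acquire read and write rights for a given classification level can be expressed as a small matrix of bit flags, reusing the technique from Section 2.3. The roles, levels and assignments below are hypothetical examples of what an organisation might configure, not the configuration used in the thesis.

```c
#include <stdbool.h>

/* Hypothetical rights, roles and levels; an organisation would configure these. */
enum { RIGHT_READ = 1u << 0, RIGHT_WRITE = 1u << 1 };
enum role  { ROLE_EMPLOYEE, ROLE_MANAGER, ROLE_COUNT };
enum level { LEVEL_PUBLIC, LEVEL_INTERNAL, LEVEL_CONFIDENTIAL, LEVEL_COUNT };

/* rights[role][level] is a bit field of RIGHT_* flags. */
static const unsigned int rights[ROLE_COUNT][LEVEL_COUNT] = {
    [ROLE_EMPLOYEE] = { RIGHT_READ | RIGHT_WRITE, RIGHT_READ, 0 },
    [ROLE_MANAGER]  = { RIGHT_READ | RIGHT_WRITE, RIGHT_READ | RIGHT_WRITE,
                        RIGHT_READ | RIGHT_WRITE },
};

/* True if the role holds every right in `wanted` for the given level. */
static bool allowed(enum role r, enum level l, unsigned int wanted)
{
    return (rights[r][l] & wanted) == wanted;
}
```

With this table, an employee can read but not write internal files and has no access at all to confidential ones, while a manager has full rights on every level. Keeping the rights as bit flags means a whole row can be sent or stored as a single integer per level.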


3.2 Server Architecture

The server architecture consists of four main modules that run on a web server, as seen in Figure 3.2.1. This section first describes how the modules are connected and the design choices that were made. The modules are then described in more detail in Sections 3.2.1, 3.2.2, 3.2.3 and 3.2.4.

Figure 3.2.1: Server Architecture. All server modules and how they interact within the server. An Apache web server is the basis for the server software. On the web server, the Sabre/DAV WebDAV software runs with three additional modules that together handle all logic and communication with the clients.

Ubuntu 20.04 was selected as the server OS because of its high customizability and the author's pre-existing Linux server knowledge [6]. Apache is used as the web server. The web server implements the HTTP protocol and can be combined with many different modules that expand its functionality. For our work, the mod_php module is used to support PHP, the server-side programming language required by Sabre/DAV.

The server connects to the local and network storage solutions using tools available in the operating system. The different storage locations represent the different classifications and are set in a config file for the virtual filesystem. The Sabre/DAV module is installed on top of the Apache web server and an authentication module is added. A virtual filesystem module and server API are created using the Sabre/DAV framework to implement the WebDAV protocol with extended functionality to allow file classification and tag metadata. The server API is used to handle all client communications.
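The thesis does not show the format of this config file. As a minimal sketch, assuming an INI-style file with one section per classification level (the level names and numeric values follow Figure 4.1.4, while the `value`/`path` keys and the mount points are hypothetical), the mapping from classification to storage location could be loaded like this in Python:

```python
import configparser

# Hypothetical config layout: one section per classification level.
EXAMPLE_CONFIG = """
[Public]
value = 1000
path = /srv/cs/public

[Confidential]
value = 4000
path = /mnt/confidential-nfs
"""


def load_storage_map(text: str) -> dict:
    """Return {classification value: (level name, storage path)}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    storage = {}
    for level in parser.sections():
        value = parser.getint(level, "value")
        if value in storage:
            # Two levels sharing a value would break the one-level-per-location rule.
            raise ValueError(f"duplicate classification value {value}")
        storage[value] = (level, parser.get(level, "path"))
    return storage
```

Rejecting duplicate values when the file is loaded keeps the one-to-one mapping between classification levels and storage locations that the architecture relies on.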


3.2.1 Sabre/DAV

The Sabre/DAV module is a PHP implementation of the WebDAV server protocol. This module functions as the framework for all the Client–Server communication. The following three modules are extensions to this WebDAV server implementation.

3.2.2 Authentication Module

The authentication module authenticates clients and displays the correct data according to the classification and authentication level that the client is connected with. The server has configurable features, e.g., file accessibility levels and requests for temporary elevated accessibility. Higher classification level files that are not permitted to be read by certain clients are displayed as empty (0 byte) files as seen in Figure 3.2.2. These empty files have an option to gain temporary access after a request is granted by the data owner as seen in Figure 3.2.3.

Figure 3.2.2: Get File Access Flow. First, the request is checked for access through authentication. If the user sending the request is authorized, the requested file is returned. For unauthorized users, the tag file is checked for a temporary access tag granted to that user. Users with temporary access receive the requested file, and users without access receive an empty file.
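The access flow in Figure 3.2.2 can be summarised as a single decision function. The Python sketch below is hypothetical throughout: it assumes that a user's authorisation is a numeric clearance compared with the file's classification value, and that the tag file stores temporary grants as a set of user names. The Authentication Module itself was not implemented (see Chapter 4).

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    data: bytes
    classification: int                              # e.g. 1000 = Public ... 5000 = Secret
    temp_users: set = field(default_factory=set)     # hypothetical: temporary grants from the tag file


def get_file(user: str, clearance: int, f: StoredFile) -> bytes:
    """Follow the Get File Access Flow and return the bytes the client receives."""
    if clearance >= f.classification:
        return f.data        # authorised through the user's normal access level
    if user in f.temp_users:
        return f.data        # the data owner granted temporary access
    return b""               # no access: the client sees an empty (0 byte) file
```

For a Confidential file (4000), a user with clearance 2000 receives `b""` until the data owner adds that user to `temp_users`, after which the same call returns the file contents.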

3.2.3 Virtual Filesystem

The virtual filesystem module controls where a file is stored and what classification applies to a file that is read. All the storage locations that are connected are merged into one virtual file storage, where all files have classification and tag metadata. All the file operations that a client can perform on the virtual file storage are handled in the virtual filesystem.


Figure 3.2.3: Temporary Access Request Flow. A user who needs access to a file above their authorization level sends a temporary access request to the server. The request is forwarded to the Data Owner by email. The Data Owner can add a temporary user token to the requested file's tags to allow temporary access.

• Read operations make sure that the file is read from the correct path on the correct storage location and returned with the corresponding classification and tag metadata.

• Write operations make sure that files are written to the correct storage location matching the file classification.

• Classification changes are translated to moving files from one storage location to another.

3.2.4 Server API

The server API module uses the WebDAV protocol and extends its functionality to communicate the classifications and tag metadata to the clients. The clients connect to the server API with file operation requests. After getting verified by the authentication module, the file operation requests are handled by the virtual filesystem module that uses the server API to reply to the client requests.

3.3 Client Architecture

To achieve the desired functionality, the client uses a FUSE implementation for Windows, along with three other modules, to create a virtual filesystem as seen in Figure 3.3.1. Clients connect to the server API using the Client (FUSE) module, and user authentication data is used to achieve a secure connection. The Client (FUSE) module uses a Windows FUSE Module, as described in Section 2.4, to create and present the virtual filesystem in a form that Windows can use and interact with.


The Classification Service Module runs as a Windows service that enables classification functionality in Windows File Explorer. These functionalities allow users to set classifications and tags by right-clicking a file and calling functions from the Classification Module. The following sections explain the different modules in further detail.

Figure 3.3.1: Client Architecture. All client modules for a Windows client. The client uses a Windows FUSE module to create a virtual filesystem for Windows with data from the server. A classification module and service create a Windows integration that allows classifications and tags to be set on files in the virtual filesystem.

3.3.1 Client (FUSE)

The Client (FUSE) module handles connections to the server using the WebDAV protocol with an extension that allows classification and tag metadata to be appended. This communication is secured through authentication and secure communication protocols. The module is able to handle all communication that is expected of a network storage system. If the required classification and tag data are not already attached to a request when this module receives it, they are retrieved from the Classification Module.

An example of how the Client (FUSE) module handles file requests:

1. A file operation is requested and no classification is set.

2. The request is paused.

3. The Classification Module is called to set a classification.

4. The classification metadata is appended to the request.

5. The request is sent to the server API.
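The five steps above amount to a guard in front of every outgoing request. A minimal Python sketch, in which `ask_classification` is a hypothetical stand-in for the Classification Module and the header name `csclassification` is taken from the implementation in Chapter 4:

```python
from typing import Callable, Dict, Optional


def prepare_request(headers: Dict[str, str],
                    ask_classification: Callable[[], Optional[int]]) -> Optional[Dict[str, str]]:
    """Return headers ready to send to the server API, or None if the operation must be aborted."""
    if "csclassification" in headers:
        return headers                       # classification already attached: send as is
    level = ask_classification()             # the request is paused until the user answers
    if level is None:
        return None                          # no classification chosen: do not send
    # Append the classification metadata without mutating the caller's dict.
    return {**headers, "csclassification": str(level)}
```

Calling `prepare_request({}, lambda: 3000)` yields `{"csclassification": "3000"}`, while a callback that returns `None` (the user cancelled) stops the request from reaching the server.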


3.3.2 Windows FUSE Module

The Windows FUSE Module is used to create a virtual filesystem in userspace. The module hooks into the Windows OS, allowing the regular Windows File Explorer to present the virtual filesystem. Through this implementation, other programs can interact with the files in the virtual filesystem as if they were local files.

3.3.3 Classification Service

The Classification Service is an active Windows service that enables classification features on files in the virtual filesystem presented by the Windows FUSE Module. These features set icon overlays and add functions to the right-click menu for the files, as seen in the mockup in Figure 3.3.2. The icon overlays make it easy to distinguish different classifications visually. The right-click menu is extended with classification and tag functionality that corresponds to functions in the Classification Module.

Figure 3.3.2: Mockup of the client implementation, showing file classifications and an easy right-click classification change feature.


3.3.4 Classification Module

The Classification Module handles classification and tag functionality for all file operations that could change classifications or tags. This includes both requests from the operating system through the Client (FUSE) module and manual requests from the Classification Service. Requests from the operating system are verified to confirm that classifications are set and that the current user has the privilege to set them. Valid requests continue to the server, while invalid requests are blocked. For blocked requests, the user is prompted with the classification options that apply to the file and must set a classification before the request is allowed to continue. When users need to add or change classifications and tags, this module presents the classification options available on the server and excludes the options that the policies of the selected tags do not permit. For example, files with the Company Contracts tag cannot be saved with a Public classification, because that tag requires at least a Confidential classification. These policies are configured on the server, and the client requests these settings when it connects. All changes are added to requests as classification and tag metadata and sent to the server API through the Client (FUSE) module.
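The tag policy check described above reduces to computing the lowest classification that the selected tags still allow. A Python sketch, assuming each tag maps to a single minimum classification value: the level values follow Figure 4.1.4, the Company Contracts rule is the example from the text, and the Personal Data tag and the table format are hypothetical.

```python
LEVELS = {1000: "Public", 2000: "Internal", 3000: "Restricted",
          4000: "Confidential", 5000: "Secret"}

# Policy table as it might be received from the server on connect (hypothetical format).
TAG_MINIMUM = {"Company Contracts": 4000, "Personal Data": 2000}


def allowed_levels(tags):
    """Classification values the user may still choose for a file carrying these tags."""
    # The strictest tag wins; untagged files (or unknown tags) allow every level.
    minimum = max((TAG_MINIMUM.get(t, 0) for t in tags), default=0)
    return [v for v in sorted(LEVELS) if v >= minimum]
```

For a file tagged Company Contracts, `allowed_levels(["Company Contracts"])` returns `[4000, 5000]`, so only Confidential and Secret are offered.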

3.4 Summary

In this chapter, the overall architecture of the Classification Storage system was outlined. The described Client–Server architecture aims to achieve the thesis and system goals. The server hosts a virtual filesystem based on a virtual storage that combines storage locations representing different classification levels. Clients connect to the server using an extended WebDAV protocol, allowing classification and tag metadata to be communicated. The client uses FUSE to create a virtual filesystem in userspace, allowing seamless Windows integration. All the main modules in both the client and server are described in detail, and the design choices are explained. The implementation of this architecture is described and discussed in the next chapter.

Chapter 4

System Implementation

In this chapter, the Classification Storage system is described as it has been implemented for this thesis. The system largely follows the architecture described in Chapter 3. For the Authentication Module and the client GUI, changes were made, either because they proved to be better solutions or because the scope had to be scaled down due to time constraints. The Authentication Module on the server is not implemented because it was too time-consuming for this thesis. For the client, changing file classifications is improved by using a more user-friendly GUI instead of the right-click menu described in Section 3.3.3.

This chapter starts by describing the server implementation in Section 4.1, through its three main components: Sabre/DAV, Virtual Filesystem and Classification Plugin. Following the server description, the client implementation is described in Section 4.2 in three parts: CSFS, CSShellExtension and CSDialogBox. Lastly, the chapter is summarized in Section 4.3.

4.1 Server

The Classification Storage server runs in a Linux environment on a web server. The server is created as a Virtual Machine (VM) in VirtualBox [28]. Ubuntu 20.04 was chosen as the server OS and Apache as the web server, as explained in Section 3.2. The following versions were used in the server environment.

20 CHAPTER 4. SYSTEM IMPLEMENTATION

Server environment
  Virtualization software: VirtualBox 6.1.16
  Operating system: Ubuntu 20.04.1 LTS (Linux 5.4.0-67-generic)
  Web server: Apache/2.4.41 (Ubuntu) with PHP 7.4.3

Figure 4.1.1: Server Implementation. The figure illustrates the implemented server modules and how they interact within the server. An Apache web server is the basis for the server software. On the web server, the Sabre/DAV WebDAV software runs with two plugins that together handle all logic and communication with the clients and the different storage locations.

The server was designed around four main modules: Sabre/DAV, Virtual Filesystem, Server API and the Authentication Module. Because of time constraints, the Authentication Module was not implemented and is left as future work. The Server API module was originally intended to handle all the connections. This would have created a place where the classification and tag metadata could be extracted and used to manage the way files are stored. During implementation, this was found not to be the best solution, because the Sabre/DAV framework proved more robust and better suited to the task. Instead, this logic is implemented as the Classification Plugin, which extends the Sabre/DAV framework, as seen in Figure 4.1.1.


The Sabre/DAV framework handles much of the server-side functionality, as it manages the communication with clients through the WebDAV protocol and connects the Virtual Filesystem and the Classification Plugin. The implemented structure is presented in Figure 4.1.1. Sabre/DAV, Virtual Filesystem and Classification Plugin are further described in the following sections.

4.1.1 Sabre/DAV

Sabre/DAV is a framework for a WebDAV server that runs in a PHP web server environment. Sabre/DAV has many different features and plugins that can be utilized. One of the main features that makes Sabre/DAV a good fit for this project is a plugin that provides file locking. File locking is necessary when multiple clients want to access the same file: the plugin handles the conflicts that can arise, preventing potential data loss. The framework also allows middleware to be added, making it possible to intercept and manipulate server requests and responses. The Sabre/DAV framework includes a fully functional WebDAV server following the IETF-defined WebDAV protocol, as described in Section 2.1. By default, the server is initialised with a root directory that maps to a directory on the server, as seen in Figure 4.1.2. That root directory is shared by the WebDAV server software, allowing any WebDAV client to connect and edit the files in the directory.

    use Sabre\DAV;
    // Share the local 'public' directory as the WebDAV root collection.
    $rootDirectory = new DAV\FS\Directory('public');
    $server = new DAV\Server($rootDirectory);

Figure 4.1.2: PHP code of the Sabre/DAV WebDAV server initialisation with a single directory as root directory.

The Classification Storage server requires multiple file storage sources to be combined into one virtual filesystem. This cannot be achieved with the single root directory approach shown in Figure 4.1.2. It is solved by creating custom Directory, File and Root classes that combine into a virtual filesystem, as further explained in Section 4.1.2.

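    // Store WebDAV locks in files under data/locks and enable LOCK/UNLOCK handling.
    $lockBackend = new DAV\Locks\Backend\File('data/locks');
    $lockPlugin = new DAV\Locks\Plugin($lockBackend);
    $server->addPlugin($lockPlugin);

    // Custom middleware that reads and writes classification and tag metadata.
    $csClassificationPlugin = new CSClassificationPlugin();
    $server->addPlugin($csClassificationPlugin);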

Figure 4.1.3: PHP code of the lockPlugin and the csClassificationPlugin that are added to the Sabre/DAV WebDAV server.


Plugins are added to the server after the server is initialised. The important plugins are the lockPlugin and the csClassificationPlugin, as seen in Figure 4.1.3. The lockPlugin prevents file operation collisions between multiple clients on the WebDAV server. The csClassificationPlugin is custom middleware that handles classification data and is further explained in Section 4.1.3.
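Sabre/DAV's lock plugin implements the WebDAV LOCK and UNLOCK methods of RFC 4918, and its internals are not described in this thesis. To illustrate the idea only, the following toy Python lock table (not Sabre/DAV's code; class and method names are hypothetical) hands out an exclusive lock token per path and refuses writes from clients that do not hold it, which is what keeps two clients from overwriting each other's changes:

```python
import uuid
from typing import Dict, Optional


class LockTable:
    """Toy exclusive write-lock table keyed by path."""

    def __init__(self) -> None:
        self._locks: Dict[str, str] = {}           # path -> lock token

    def lock(self, path: str) -> str:
        if path in self._locks:
            # A WebDAV server would answer 423 (Locked) here.
            raise PermissionError(f"{path} is already locked")
        token = f"opaquelocktoken:{uuid.uuid4()}"
        self._locks[path] = token
        return token

    def can_write(self, path: str, token: Optional[str]) -> bool:
        held = self._locks.get(path)
        return held is None or held == token        # unlocked, or caller holds the lock

    def unlock(self, path: str, token: str) -> None:
        if self._locks.get(path) != token:
            raise PermissionError("lock token does not match")
        del self._locks[path]
```

A real WebDAV lock also carries a timeout, a depth and shared/exclusive scope; the sketch keeps only the exclusive-write behaviour that matters for avoiding data loss.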

4.1.2 Virtual Filesystem

The virtual filesystem is built from three main classes: a directory class, a file class and a root class. As explained in Section 4.1.1, the server is initialised with a root directory. This root directory maps a location on the server to instances of the directory and file classes, which together form the filesystem that is shared with clients. The root class allows multiple locations to be merged into one virtual filesystem. The root paths of all the desired classification levels, each with an enumerated classification value, are defined as CSRoots, as seen in Figure 4.1.4. These roots ensure that data is always stored according to its classification and are used in CSDirectory. The CSDirectory is used to initialise the server with a virtual filesystem instead of a single regular filesystem location.

    $Public = new ClassificationStorage\CSRoot('public/A', 1000);
    $Internal = new ClassificationStorage\CSRoot('public/B', 2000);
    $Restricted = new ClassificationStorage\CSRoot('public/C', 3000);
    $Confidential = new ClassificationStorage\CSRoot('public/D', 4000);
    $Secret = new ClassificationStorage\CSRoot('public/E', 5000);

    $csRoots = new ClassificationStorage\CSRoots([
        $Public,
        $Internal,
        $Restricted,
        $Confidential,
        $Secret
    ]);

    $rootDirectory = new ClassificationStorage\CSDirectory('', $csRoots);

    $server = new DAV\Server($rootDirectory);

Figure 4.1.4: PHP code of the initialisation of CSRoots that contains the root directory paths and classification values for the different classification levels.
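The two lookups that CSRoots must support, from a classification value to its storage root and from a stored file's path back to its classification, can be sketched in Python as follows. The root paths and values are copied from Figure 4.1.4; the function names are hypothetical and do not reflect the PHP class's actual methods.

```python
from pathlib import PurePosixPath

# Root path and classification value per level, as in Figure 4.1.4.
ROOTS = [("public/A", 1000), ("public/B", 2000), ("public/C", 3000),
         ("public/D", 4000), ("public/E", 5000)]


def root_for(value: int) -> str:
    """Return the storage root that holds files of the given classification."""
    for path, v in ROOTS:
        if v == value:
            return path
    raise KeyError(f"unknown classification value {value}")


def physical_path(virtual: str, value: int) -> str:
    """Map a client-visible path to the location where the file is actually stored."""
    return str(PurePosixPath(root_for(value)) / virtual.lstrip("/"))


def classification_of(stored: str) -> int:
    """Derive a stored file's classification from the root it lives under."""
    p = PurePosixPath(stored)
    for path, v in ROOTS:
        if p.is_relative_to(path):                 # compares whole path components
            return v
    raise KeyError(f"{stored} is not under any classification root")
```

For example, `physical_path("/docs/plan.txt", 3000)` gives `"public/C/docs/plan.txt"`. In this sketch the classification is derived from the location rather than stored separately, so a file cannot sit in one root while claiming another classification.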

CSDirectory holds most of the important functions that a filesystem needs to function properly. This is where requests from an OS, or in this case a WebDAV client, eventually arrive to manipulate files and data. The CSDirectory class features functions such as getChildren, getChild, createFile, createDirectory, setName and delete. All these functions are closely tied to the CSRoots class, as they need to perform file operations on whichever storage location matches the file's classification.


Figure 4.1.5: The left figure shows the file locations divided over the different classifications, represented by A, B, C, D and E. The right figure shows the virtual files as presented to the clients.

An example of the virtualisation is the getChildren function, which does not return the children of a single location but appends the children from all the different root locations into one list. All functions take the multiple root paths into consideration when executing, always following the set classifications, which creates a virtual filesystem as seen in Figure 4.1.5. CSFile handles all file-specific operations, such as writing data, reading data and changing a filename. CSFile instances are created in the createFile function of the CSDirectory class. These objects represent the actual files stored on the different storage locations. Every file represented this way has a classification value based on its root path. New files created through the client carry classification data that determines where the file is created. The classification data is parsed from client requests in the Classification Plugin and passed to the CSDirectory and CSFile classes.
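The merging behaviour of getChildren can be illustrated with a short Python analogue (not the PHP implementation). It assumes each classification root is a local directory, lists the same relative directory under every root, and labels each entry with the classification of the root it was found in. Directories that exist under several roots are listed once, with the lowest classification that contains them; that choice is an assumption of this sketch.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def get_children(roots: Dict[int, str], rel_dir: str) -> List[Tuple[str, int]]:
    """Merge one virtual directory from all classification roots into (name, classification) pairs."""
    merged: Dict[str, int] = {}
    for value, root in sorted(roots.items()):       # lowest classification first
        directory = os.path.join(root, rel_dir)
        if not os.path.isdir(directory):
            continue                                 # nothing stored at this level
        for name in os.listdir(directory):
            merged.setdefault(name, value)           # first (lowest) root wins for shared directories
    return sorted(merged.items())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        roots = {1000: os.path.join(base, "A"), 3000: os.path.join(base, "C")}
        for root in roots.values():
            os.makedirs(os.path.join(root, "docs"))
        open(os.path.join(roots[1000], "docs", "menu.txt"), "w").close()
        open(os.path.join(roots[3000], "docs", "salaries.xlsx"), "w").close()
        print(get_children(roots, "docs"))   # [('menu.txt', 1000), ('salaries.xlsx', 3000)]
```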

4.1.3 Classification Plugin

The Classification Plugin is middleware that holds functions that execute on different types of requests and responses. The main functionality of this middleware is to handle all classification and tag data to and from clients. Not all operations require classification or tag data to be sent in the request. The middleware only activates on operations that require classifications.


The following operations require classification or tag data:

• CreateFile: creates a file and requires a classification to set the correct file path.

• PropFind: requests that expect information about a certain file or directory to be returned, including classification and tag metadata.

• Move: when moving a file to a new location, tags need to be copied with the original file.

• ChangeClassification: when a client changes classification for a file, the file needs to be moved to a different root path.

When a CreateFile request is received, the middleware will extract classification and tag data from the request header as seen in Figure 4.1.6. The data is passed to the CSDirectory where a CSFile is created. Along with the creation of the file a tags file is created that holds the tag data. Tags are stored as a bit field structure in a separate file next to the original file. These tag files have the same name as the original file, but with a .cstags appended to it.

    $csclassification = $httpRequest->getHeader("csclassification");

    $cstags = $httpRequest->getHeader("cstags");

Figure 4.1.6: PHP code where classification and tag metadata is extracted from the httpRequest headers.
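The text states that tags are stored as a bit field in a side-car `.cstags` file, but does not give the bit assignment. A Python sketch, assuming a fixed, hypothetical tag order in which bit i is set when the i-th tag applies:

```python
from typing import Set

# Hypothetical tag order: bit i is set when TAGS[i] applies to the file.
# New tags should only be appended, so existing .cstags values keep their meaning.
TAGS = ["Personal Data", "Company Contracts", "Financial", "HR"]


def encode_tags(selected: Set[str]) -> int:
    """Pack a set of tag names into an integer bit field."""
    mask = 0
    for i, tag in enumerate(TAGS):
        if tag in selected:
            mask |= 1 << i
    return mask


def decode_tags(mask: int) -> Set[str]:
    """Unpack an integer bit field back into tag names."""
    return {tag for i, tag in enumerate(TAGS) if mask & (1 << i)}


def tags_path(file_path: str) -> str:
    """Side-car file next to the original: report.docx -> report.docx.cstags."""
    return file_path + ".cstags"
```

With this ordering, a file tagged Company Contracts and HR would store the value 10 (binary 1010) in `report.docx.cstags`.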

PropFind requests are among the most common requests in this system and are used to inform clients about the properties of the requested file or directory. The middleware checks which file the request is for and determines its classification and tag values. These values are added as properties to the PROPFIND response and sent to the client, as seen in Figure 4.1.7.

    $propfind->set('{DAV:}csclassification', $csvalue);

Figure 4.1.7: PHP code where the csclassification property is set for a PROPFIND response.

The Move operation and ChangeClassification are closely connected as changing a classification for a file requires the file to be moved to the root path of the new classification. Because of this, a special header is sent when a classification is changed. If the cschangeclassification header is set, the middleware moves the file to the new location corresponding to the new classification that is provided. After the file is moved the tags file is moved to the same location, whether the cschangeclassification header is set or not.
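Changing a classification is therefore a move between roots that takes the tag file along. A Python sketch under the assumption that both roots are locally mounted directories; the function name and signature are hypothetical:

```python
import os
import shutil
import tempfile
from typing import Dict


def change_classification(rel_path: str, roots: Dict[int, str], old: int, new: int) -> str:
    """Move a file and its .cstags side-car from the old level's root to the new level's root."""
    src = os.path.join(roots[old], rel_path)
    dst = os.path.join(roots[new], rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)   # recreate the virtual directory at the new level
    shutil.move(src, dst)
    if os.path.exists(src + ".cstags"):                # tags always follow the file
        shutil.move(src + ".cstags", dst + ".cstags")
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        roots = {1000: os.path.join(base, "A"), 4000: os.path.join(base, "D")}
        os.makedirs(os.path.join(roots[1000], "legal"))
        for suffix in ("", ".cstags"):
            with open(os.path.join(roots[1000], "legal", "contract.pdf" + suffix), "w") as fh:
                fh.write("x")
        moved = change_classification(os.path.join("legal", "contract.pdf"), roots, 1000, 4000)
        print(os.path.exists(moved), os.path.exists(moved + ".cstags"))   # True True
```

Because the roots may be different mounts (for example a network share and a local disk), `shutil.move` falls back to copy-and-delete when a plain rename is not possible.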


4.2 Client

The Classification Storage client is implemented in a Windows environment as explained in Section 3.1.3. The Windows environment where the client is installed and tested is running as a VM just as the server is. The client environment has the following specifications.

Client environment
  Virtualization software: VirtualBox 6.1.16
  Operating system: Windows 10 Enterprise Evaluation, version 20H2 (OS build 19042.867)

The client consists of three main parts: Classification Storage FileSystem (CSFS), CSShellExtension and CSDialogBox. Because this is a Windows client, the client was implemented in the C# programming language.

• CSFS is a filesystem that communicates all file operations with the server through an extended WebDAV protocol. CSFS is the implemented counterpart of the Client (FUSE) module and the Classification Module described in Sections 3.3.1 and 3.3.4.

• CSShellExtension is a collection of Windows Shell Extensions that handle overlay icons and add classification options when right-clicking a file in CSFS. CSShellExtension is the implemented counterpart of the Classification Service module described in Section 3.3.3.

• CSDialogBox is a popup window that lets users set tags and a classification through a Graphical User Interface (GUI). It is used by both CSFS and CSShellExtension to let users classify files when file operations require it.


4.2.1 CSFS

CSFS is the main program in the client. CSFS creates a virtual network storage and mounts it to a volume in Windows, as seen in Figure 4.2.1. All file operations on the virtual network storage are forwarded to the server through an extended WebDAV protocol implementation.

Figure 4.2.1: CSFS mounted as network drive on volume W: in Windows.

CSFS uses WinFSP to create a custom filesystem [42]. WinFSP functions as the Windows FUSE Module described in Section 3.3.2. CSFS uses WinFSP to integrate the custom filesystem with Windows, allowing data to be manipulated through the standard Windows file APIs. WinFSP provides a framework for creating a virtual filesystem that can be mounted as network storage. The framework also provides a base class called FileSystemBase, which provides hooks into the Windows API and which CSFS inherits from. By doing so, the CSFS class is able to handle all filesystem operations that applications send through Windows. The following custom functions are defined in CSFS, together creating the virtual filesystem.

• CanDelete
• GetVolumeInfo
• Rename
• Cleanup
• GetSecurityByName
• SetBasicInfo
• Close
• Open
• SetFileSize
• Create
• Overwrite
• Write
• Flush
• Read
• GetFileInfo
• ReadDirectoryEntry


One of the unique features of CSFS is the requirement for all files to have a classification set at all times. This means that no file can be created in this system without the user setting a classification. This is handled in the Create function by pausing the current create process and launching CSDialogBox. The user's input determines the classification of the file and is returned to the paused process. If a classification is set, the paused process continues creating the file; if no classification is set, the process is aborted. Most of the functions depend on two main components: CSFileNode and CSWebDAV. CSFileNode is a file node component that can represent either a file or a directory in the filesystem. CSWebDAV is a custom WebDAV client that is used to connect to the server and to communicate all file operations and data.

CSFileNode

The CSFileNode class is used to pass data between different functions and is one of the core building blocks of the virtual filesystem. When Windows requests information about a certain file or directory, the CSFileNode is the object in the filesystem that contains this information. The class holds information such as nodeName, nodeUri, isCollection, fileInfo and CSClassification. This information is retrieved from the server, for example, when a directory lists all the files it contains. The different file operations use this information to send the correct requests to the server.
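Node information of this kind arrives in PROPFIND responses. The Python sketch below shows how a client could read a WebDAV multistatus body into node objects, including the custom `csclassification` property that the server sets in the DAV: namespace (Figure 4.1.7). The `FileNode` fields only loosely mirror CSFileNode, and the sample response is hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

DAV = "{DAV:}"   # ElementTree's Clark notation for the DAV: namespace


@dataclass
class FileNode:
    node_uri: str
    is_collection: bool = False
    classification: Optional[int] = None


def parse_multistatus(xml_text: str) -> List[FileNode]:
    """Turn a PROPFIND multistatus body into nodes, reading the custom csclassification property."""
    nodes = []
    for resp in ET.fromstring(xml_text).findall(f"{DAV}response"):
        node = FileNode(resp.findtext(f"{DAV}href", ""))
        for propstat in resp.findall(f"{DAV}propstat"):
            if " 200 " not in propstat.findtext(f"{DAV}status", ""):
                continue                      # skip properties the server did not return
            prop = propstat.find(f"{DAV}prop")
            if prop is None:
                continue
            if prop.find(f"{DAV}resourcetype/{DAV}collection") is not None:
                node.is_collection = True
            value = prop.findtext(f"{DAV}csclassification")
            if value:
                node.classification = int(value)
        nodes.append(node)
    return nodes


SAMPLE = """<d:multistatus xmlns:d="DAV:">
  <d:response>
    <d:href>/docs/</d:href>
    <d:propstat>
      <d:prop><d:resourcetype><d:collection/></d:resourcetype></d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/docs/plan.txt</d:href>
    <d:propstat>
      <d:prop><d:resourcetype/><d:csclassification>3000</d:csclassification></d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop><d:getcontentlanguage/></d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>"""
```

Parsing `SAMPLE` yields a collection node for `/docs/` without a classification and a file node for `/docs/plan.txt` with classification 3000.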

CSWebDAV

To communicate with the server, an extended WebDAV protocol is used, and CSWebDAV is the class that implements it. The base WebDAV client is WebDav.Client, a widely used WebDAV client implemented following the official WebDAV standard RFC 4918 [16, 34]. The following functions of the WebDAV client are used in CSFS:

• Delete: Deletes a file or directory.
• GetRawFile: Retrieves file data.
• Mkcol: Creates a directory.
• Move: Moves a file or directory.
• Propfind: Retrieves file or directory properties.
• PutFile: Writes data to a file.

CSWebDAV extends the base WebDAV client with features that allow classification and tag data to be communicated between the client and the server. The headers of the different functions were extended to allow classification and tag data to be sent and parsed, as seen in Figure 4.2.2.


    PutFileParameters putParam = new PutFileParameters
    {
        // Custom metadata headers read by the server's Classification Plugin.
        Headers = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>(
                "csclassification",
                node.classification
            ),
            new KeyValuePair<string, string>(
                "cstags",
                true.ToString()
            ),
        }
    };

Figure 4.2.2: C Sharp code of classification and tag data being added as headers for a PutFile request.
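For comparison, the same extension can be sketched with Python's standard `urllib.request`: the classification and tags travel as ordinary HTTP headers on the PUT request. The function name and URL are hypothetical, and the `cstags` value is assumed here to be the tag bit field rather than the placeholder `true` used in Figure 4.2.2.

```python
import urllib.request


def build_put(url: str, body: bytes, classification: int, tags: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying classification and tag metadata."""
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("csclassification", str(classification))
    req.add_header("cstags", str(tags))
    return req
```

The request would be sent with `urllib.request.urlopen(req)`. HTTP header names are case-insensitive, so the capitalised form that `urllib` stores (`Csclassification`) refers to the same header.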

4.2.2 CSShellExtension

CSShellExtension is a Windows Shell Extension as described in Section 2.5. It is created with a shell extension library called SharpShell [8]. The shell extension handles icon overlays and a right-click context menu. These features are only active on files that are stored on CSFS.

Icon Overlay

The icon overlay is implemented for every classification level that the system is designed for. The system was designed for five levels, each with an associated overlay icon, as seen in Figure 4.2.3. To implement and integrate these overlay icons with Windows, three functions need to be defined: CanShowOverlay, GetOverlayIcon and GetPriority. CanShowOverlay is responsible for determining whether an overlay icon should be displayed. The extension communicates with CSFS to determine the classification level of the file being processed and returns true for the icon that matches that level. GetOverlayIcon returns the determined icon from the extension resources, which are loaded into the Windows registry when the extension is installed. Lastly, GetPriority is required to settle any conflicts if multiple icons return true from CanShowOverlay. The five icons have been given priorities 11 for I, 12 for II, 13 for III, 14 for IV and 15 for V. This way, the higher classification is selected in case of a conflict.

Figure 4.2.3: I = Public. II = Internal. III = Restricted. IV = Confidential. V = Secret. Overlay icons 1-5 as presented in CSFS.


Figure 4.2.4: Context Menu with Change Classification entry, as presented in CSFS.

Context Menu

The context menu adds one entry to the right-click menu of files on CSFS. This entry is “Change Classification”, and selecting it opens the CSDialogBox for the file that was right-clicked, as seen in Figure 4.2.4.

This is a slight change in implementation from the original design. The original design has two entries, one for classification and one for tags. The thought was to change classifications and tags through this menu only. This was changed during implementation to enhance usability and cooperation between tags and classifications, and is discussed in more detail in Section 4.2.3.

The Change Classification entry is connected to a function that performs four tasks. First, the function connects to the Classification Storage server using CSWebDAV, as described in Section 4.2.1. Second, CSDialogBox is executed as a new process, and the current process is paused to wait for the return value of CSDialogBox, as seen in Figure 4.2.5. Third, the return value of CSDialogBox is sent to the server with a cschangeclassification request, which moves the file from the storage location of its previous classification to the location that matches the new classification. Lastly, the folder of the changed file is refreshed to display the new classification level.


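    using System.Diagnostics;

    // Pass the file path (quoted, as it may contain spaces) and current classification.
    string arguments = $"\"{filePath}\" {classification}";

    Process process = new Process();
    process.StartInfo.FileName = CSDIALOGBOXPATH;
    process.StartInfo.Arguments = arguments;
    process.Start();

    // Block until the user closes CSDialogBox, then read its result (-1 = cancelled).
    process.WaitForExit();
    int exitcode = process.ExitCode;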

Figure 4.2.5: C Sharp code of Change Classification function to execute CSDialogBox as new process and wait for return value.

4.2.3 CSDialogBox

The CSDialogBox is a small application that is used as a popup dialog box where users can set tags and a classification. The CSDialogBox has a minimal GUI with only three pieces of information: file name, tags and classification, as seen in Figure 4.2.6. This design was originally intended only for setting classifications and tags in CSFS when a new file is created. In the original architecture, changing classifications and tags was to be implemented as an interactive context menu when right-clicking a file in CSFS, as seen in Figure 3.3.2. The right-click menu was changed to also use the CSDialogBox, to create consistency in setting classifications and because we found the CSDialogBox to be more intuitive and easier to use. A user is prompted with this application when a new file is created on CSFS or when the user wants to change a classification through the right-click context menu. A filename and the current classification (if available) are passed as parameters when launching the application, and these are displayed to the user. Users have three paths through the application:

1. Selecting the desired classification directly and clicking Save. This returns the new classification value for the file.

2. Selecting tags that match the content of the file. This excludes certain classification levels based on the requirements that the selected tags are associated with. (An example can be seen in Figure 4.2.6, where the Public classification level is excluded because one of the selected tags requires Internal or higher storage.) The minimal requirements are determined using tags, and the user can decide whether the minimal classification is sufficient or select a higher classification. The classification and tags are returned when the user clicks Save.

3. Clicking Cancel, in which case the application returns -1.

31 CHAPTER 4. SYSTEM IMPLEMENTATION

Figure 4.2.6: CSDialogBox GUI. Multiple tags are selected, and their requirements cause the Public classification level to be disabled. The Internal classification level is selected.

Both CSFS and the context menu handle the return values in the same way. First, the value is compared against -1. If it matches, the classification was not successful and the current thread is aborted. For CSFS, this means that the file is not created. For the context menu, the file keeps its previous classification and tags, and no changes are made. When the application returns a classification and tags, this metadata is sent to the server. The server uses it to store the file in the correct location and stores the selected tags alongside it.
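The return-value handling shared by CSFS and the context menu can be sketched as follows. This is a hypothetical Python illustration. `handle_dialog_result` and the `send_metadata` callback are invented names, and the callback only stands in for the real request to the server.

```python
from typing import Callable, Set, Tuple, Union

CANCELLED = -1  # returned by CSDialogBox when the user clicks Cancel

# Either CANCELLED or a (classification level, selected tags) pair.
DialogResult = Union[int, Tuple[int, Set[str]]]


def handle_dialog_result(
    result: DialogResult,
    send_metadata: Callable[[int, Set[str]], None],
) -> bool:
    """Return True if the file operation may go ahead, False if it is aborted."""
    if result == CANCELLED:
        # CSFS: the file is not created.
        # Context menu: the file keeps its previous classification and tags.
        return False
    level, tags = result
    # Hypothetical stand-in for sending the metadata to the server, which
    # stores the file in the matching location with the tags alongside it.
    send_metadata(level, tags)
    return True
```

Because both callers go through the same check, a cancelled dialog never reaches the server. This mirrors the behaviour described above for both file creation and reclassification.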

The CSDialogBox is implemented dynamically, so most aspects, such as classifications and tags, can easily be changed through the config file that is loaded when the application starts. The config file holds the tags and the requirement associated with each tag, along with the number of classification levels and their names.
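As an illustration of such a config file, the sketch below assumes a JSON layout with a list of level names and a tag-to-level mapping. The thesis does not state the file format, so the schema, `EXAMPLE_CONFIG` and `load_config` are all hypothetical. The sketch shows how tags and their requirements could be turned into level indices, with a check that every tag refers to a configured level.

```python
from __future__ import annotations

import json

# Hypothetical config layout; the actual file format of CSDialogBox is not
# described in this chapter.
EXAMPLE_CONFIG = """
{
  "levels": ["Public", "Internal", "Confidential"],
  "tags": {
    "Marketing": "Public",
    "Accounting": "Internal",
    "Company Secret": "Confidential"
  }
}
"""


def load_config(text: str) -> tuple[list[str], dict[str, int]]:
    """Return the level names (lowest first) and each tag's minimal level index."""
    data = json.loads(text)
    levels = list(data["levels"])
    index = {name: i for i, name in enumerate(levels)}
    requirements: dict[str, int] = {}
    for tag, level_name in data["tags"].items():
        if level_name not in index:
            raise ValueError(f"tag {tag!r} requires unknown level {level_name!r}")
        requirements[tag] = index[level_name]
    return levels, requirements
```

Keeping the number and names of levels in the same file as the tags means a new classification scheme only requires editing the config, not rebuilding the dialog.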

4.3 Summary

This chapter described the implementation of the Classification Storage system. The implementation consists of a custom Client–Server configuration. The server consists of a PHP webserver that runs Sabre/DAV with a virtual filesystem and a classification plugin. Together, these components share a virtual filesystem with clients through an extended WebDAV implementation. The client implementation consists of three parts: CSFS, CSShellExtension and CSDialogBox. CSFS is a virtual filesystem that connects to and communicates with the server and presents the virtual filesystem to the user as a network drive in Windows. The CSShellExtension adds overlay icons and an option to change the classification of any file in CSFS. Every file that is saved to CSFS must have a classification associated with it, and this classification is set through the CSDialogBox GUI. The next chapter explains how the system is evaluated in this thesis.

Chapter 5

System Evaluation

This chapter explains the method and process used to evaluate the system. The system is evaluated through a usability study. This method was chosen because of the goals of the system: to raise information security awareness and to create an intuitive and user-friendly system. The usability study aims to evaluate the system on the following points:

• User awareness of information security

• Potential changes in user behaviour toward information security

• User interface design

The usability study was performed through experiments and interviews with 10 participants. The participants followed a set of instructions for using the system. After they completed the tasks in the instructions, a post-experiment interview was held with each participant. The experiment and the post-experiment interview were first tested with the help of four experienced researchers in the area of security and usability. The researchers took part in a trial run of the experiment and interviews, after which improvements were made based on their feedback. As a result, the actual experiments and interviews ran smoothly.

The following sections explain the usability study and results. First, the structure and process of the usability study are explained in detail in Section 5.1. Following the study details, the recruitment and participants are discussed in Section 5.2. In Section 5.3 ethical considerations concerning the participants are discussed, followed by a data analysis in Section 5.4. The results of the usability study are elaborated on in Section 5.5 and the limitations of the study are described in Section 5.6. A short summary is presented in Section 5.7.

34 CHAPTER 5. SYSTEM EVALUATION

5.1 Usability Study

The usability study is designed around an experiment in which participants perform a few tasks using the Classification Storage system. The study consists of three parts: 1) a short structured interview to determine demographics, 2) the experiment itself, and 3) a structured post-experiment interview. These parts are further described in the following subsections.

The current national and Karlstad University Covid-19 restrictions prohibit meeting participants in person to perform experiments. Therefore, interviews were held using Zoom and the experiment was performed over a Remote Desktop Connection. The audio of the interviews is recorded and documented. The Remote Desktop Connection connects to a virtual desktop that has the system installed, along with instructions for the tasks that participants are asked to complete. This setup worked well despite the drawbacks and limitations of performing everything remotely. Further limitations are discussed in Section 5.6.

5.1.1 Part One: Interview

The first part consists of a short interview of six or seven questions, depending on the participant's answers. The interview is performed over Zoom, and the audio of the questions and answers is recorded and documented. All participants are asked the questions in the same way and order, which makes this a structured interview [3]. As the interviewer asks each question, the question and its answer alternatives are displayed to the participant for clarification. All questions and answer alternatives can be found in Appendix B.

The first question asks whether the participant agrees to the informed consent that was provided before the interview. The informed consent is described further in Section 5.3. The second and third questions are about age and gender, to determine the demographics of the study. Questions four to seven determine how involved a participant is with information security. Participants who use remote storage solutions for work are asked whether their workplace has policies and rules covering this. The last question is open-ended and asks for examples of how participants store files differently depending on file content. A follow-up question about these workplace policies, rules and file storage behaviour is asked in the post-experiment interview.

5.1.2 Part Two: Experiment

In the experiment, each participant connects to a virtual Windows machine using Remote Desktop Connection [21]. Participants are greeted with a Windows desktop whose background image displays instructions for five tasks, as seen in Figure 5.1.1. The goal of these tasks is to let participants experience the Classification Storage system in a controlled environment.

Figure 5.1.1: Desktop of the Windows virtual machine, with the tasks shown as the background image.

The Classification Storage system is installed on the virtual Windows machine and connected to a virtual server running the server code of the Classification Storage system. The server hosts a file structure that is displayed to users on the CSFS network drive (W:), as seen in Figure 5.1.2. This file structure is reset to its original state between participants. This ensures that the server is in the same state at the start of the experiment for every participant. In this experiment, the Classification Storage system is configured with three classification levels and eight tags. The three classification levels are Public, Internal and Confidential, as seen in Figure 5.1.3. The following tags are used, grouped by the classification level they require:

• Public: Marketing, Website

• Internal: Customer Records, Employee Records, Accounting, Contract

• Confidential: Company Strategy, Company Secret

Figure 5.1.2: File structure on the CSFS network drive W: as seen by participants.

The tasks that participants are instructed to perform fall into two groups: copying files and changing file classifications. The first four tasks require the participant to copy a file from the desktop to the CSFS network storage. The fifth and last task is to change the classification of a file twice. The participants are instructed to complete the following tasks, as seen in Figure 5.1.1:

1. Copy “Invoice 1003.txt” to CSFS (W:\Invoices\).

2. Copy “Client Contract CompanyC.txt” to CSFS (W:\Clients\).

3. Copy “Product Sheet v2.1.txt” to CSFS (W:\Product Sheets\).

4. Copy “Invoice Bribes.txt” to CSFS (W:\Invoices\).

5. Change the classification of W:\ChangeClassification.txt by right-clicking the file and selecting Change Classification.
   • First change it from Confidential to Public.
   • Then change it from Public to Internal.

The first three tasks let the participant become familiar with the interface. When participants copy the files, the CSDialogBox pops up, and they must set a classification before the file is successfully copied. The three files copied in the first three tasks are easy to classify because their filenames are closely associated with available tags. For example, “Invoice 1003.txt” would be stored with the classification level Internal, because the Accounting tag requires Internal as the minimal classification level.
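With the tag configuration used in the experiment, the minimal classification of a file is the highest level required by any of its selected tags. The Python sketch below uses the levels and tags listed above. The function `minimal_classification` is illustrative. The two tag selections are the Accounting tag the text gives for “Invoice 1003.txt” and the Accounting plus Company Secret selection the study intends for “Invoice Bribes.txt”.

```python
# Experiment configuration from Section 5.1.2: levels lowest first, and the
# level each tag requires.
LEVELS = ["Public", "Internal", "Confidential"]
TAG_LEVEL = {
    "Marketing": "Public",
    "Website": "Public",
    "Customer Records": "Internal",
    "Employee Records": "Internal",
    "Accounting": "Internal",
    "Contract": "Internal",
    "Company Strategy": "Confidential",
    "Company Secret": "Confidential",
}


def minimal_classification(tags: set) -> str:
    """Return the lowest level that satisfies every selected tag."""
    if not tags:
        return LEVELS[0]  # no tags selected: every level is allowed
    return max((TAG_LEVEL[t] for t in tags), key=LEVELS.index)


# Tag selections for the two invoice files discussed in the text.
print(minimal_classification({"Accounting"}))                    # Internal
print(minimal_classification({"Accounting", "Company Secret"}))  # Confidential
```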

Figure 5.1.3: File Classification dialog box with three security levels (Public, Internal, Confidential).

To encourage participants to reason about their classification decisions, the file in the fourth task is called “Invoice Bribes.txt”. The idea is that participants select the tag Company Secret in addition to Accounting and are thereby forced to classify the file as Confidential. This shows participants how easily relatively similar files can be stored side by side with different classification levels. The last task is to change the classification of a file twice: first to a lower classification level, and then to a higher level again. This task lets the participant experience two important features. The first is the implemented user error prevention, where users are warned when the classification of a file is changed to a lower level. The second is how easily a file can be reclassified and thereby moved to the back-end server matching the new classification level. When the last task is completed, participants are asked to disconnect the Remote Desktop Connection, and the post-experiment interview begins.
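The downgrade warning exercised in the last task can be modelled as a comparison of level positions. In this hypothetical Python sketch, `change_classification` and the `confirm` callback (standing in for the warning dialog) are assumptions. Only lowering a level asks for confirmation, while raising it goes through directly.

```python
from typing import Callable

LEVELS = ["Public", "Internal", "Confidential"]  # lowest first


def change_classification(current: str, new: str, confirm: Callable[[str], bool]) -> str:
    """Return the classification the file ends up with.

    confirm is a hypothetical stand-in for the warning dialog; it is only
    called when the new level is lower than the current one.
    """
    if LEVELS.index(new) < LEVELS.index(current):
        if not confirm(f"Lower the classification from {current} to {new}?"):
            return current  # the user backed out; nothing changes
    return new
```

In task 5, the first step (Confidential to Public) triggers the warning, while the second step (Public to Internal) does not.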

5.1.3 Part Three: Post Experiment Interview

The post-experiment interview consists of 12 or 13 questions, as one question may be skipped depending on the answers in part one. The goal of these questions is to evaluate the user experience and usability of the Classification Storage system. This interview is performed in the same way as the interview before the experiment, and its audio is recorded. The first 10 or 11 questions are displayed and asked in the same way and order for all participants. This structured interview uses the Likert scale for most questions, which makes the answers easy to quantify and analyse [19]. Only the last two questions are open-ended, as they are broader in scope. All questions can be found in Appendix C.

First, participants are asked how the Classification Storage system could influence their day-to-day tasks and their approach to handling data of different classification levels. These questions come right after the experiment because we want participants to think about the tasks they just performed from an information security perspective. The third question is only asked to participants who answered Yes to question six or seven of the previous interview. It determines whether the Classification Storage system would make it easier for them to follow these policies or rules. Questions four to eleven are mapped to Nielsen's heuristics [27], which are commonly used to evaluate user interface design against usability principles. Not all of the heuristics apply to the Classification Storage system evaluated in this study. Eight of the ten heuristics were therefore turned into questions that participants answered using the Likert scale alternatives. This allows us to evaluate how participants experienced the front-end of the classification system. The last two questions are open-ended, to encourage participants to reason about the Classification Storage system as a whole and elaborate on their thoughts. The main question is whether participants find that the Classification Storage system makes them more information security aware. Participants are asked to explain their answer, to find out which aspects of the Classification Storage system most influence user information security awareness. The last question asks whether the participant wants to add or comment on anything else, which concludes the interview.

5.2 Recruitment and Participants

Recruitment of participants for this study was significantly limited by the current Covid-19 restrictions. This led to the recruitment of friends, colleagues and other people from our professional network as participants. This is known as convenience sampling, and the method has significant drawbacks [4]. The main drawback is potential bias, which can prevent the results of the study from being generalised to a larger population. Ten participants took part voluntarily in the study to evaluate the Classification Storage system. The results of the usability study are based on the interviews and experiments of these ten participants.

5.3 Ethical Considerations

All participants took part in this study on a voluntary basis. Before participating, they are informed about the purpose and conditions of the study through an informed consent, which can be found in Appendix A. The study does not collect sensitive personal data, and all data that is collected is kept confidential. Participants accept the informed consent by confirming their agreement in response to the first recorded question of the interview.

The usability study collected demographic data about age and gender, and the audio of the interviews is recorded and stored. The audio and research data are stored by KAU in compliance with the EU General Data Protection Regulation (GDPR). All data is encrypted and archived for 10 years by KAU in accordance with its rules and regulations. Names are replaced with pseudonyms, and the list mapping names to pseudonyms is stored separately and securely from the data. No personally identifying information about participants is revealed in this study.

5.4 Data Analysis

The data in this study is collected from ten participants. Table 5.4.1 shows the age and gender demographics of the study. All participants fall within only two of the available age groups: 70% of participants are 25-34 years old and the remaining 30% are 55 or older. Only one of the ten participants is female, and the other 90% are male. These demographics are not representative of any larger population, so the results cannot be generalised beyond the sample group. The main causes are the small sample size and the participant recruitment process. These limitations are further discussed in Section 5.6.

Table 5.4.1: Demographic data

Demographic            Value                  Participants   %
Age                    18-24 years old        0              0
                       25-34 years old        7              70
                       35-44 years old        0              0
                       45-54 years old        0              0
                       55+ years old          3              30
                       Prefer not to answer   0              0
Gender                 Female                 1              10
                       Male                   9              90
                       Other                  0              0
                       Prefer not to answer   0              0
Remote storage usage   Private                0              0
                       Work                   1              10
                       Both                   9              90

The post-experiment interview starts with three statements that evaluate the impact of the Classification Storage system on participants' actions, as seen in Table 5.4.2. All participants agreed or strongly agreed that the markers that visualise file classification levels could change their approach to handling data. 90% of the participants find that the use of different security levels becomes clearer when using the Classification Storage system. 70% of participants agree that the Classification Storage system would make it easy to store files based on content and/or company policy.

Table 5.4.2: Post Experiment Interview, statements that participants answered with one of the alternatives in the Likert scale.

[Chart: number of participants (axis 0 to 8) giving each response (Strongly Disagree, Disagree, Neither Agree Nor Disagree, Agree, Strongly Agree) to each statement:
• The markers visualizing a file's classification could change my approach to how I handle files.
• The system makes the use of different security levels clear.
• This system would make it easy to store files correctly based on content and/or company policy.]

The statements in Table 5.4.3 are derived from Nielsen's heuristics, in an attempt to create a sound evaluation method that covers all parts of the user interface. These eight statements cover most aspects of the usability of the Classification Storage system. The statement participants agree with least is that it is clear how to cancel or change a classification: 50% of the participants strongly agree, but 10% disagree and 20% neither agree nor disagree. The statement that receives the most positive response is that the system prevents errors by warning users: 60% of participants strongly agree and the remaining 40% agree. Two statements have identical results, with half the participants agreeing and the other half strongly agreeing. These are the statements that the system is easy to understand and not too technical in its wording, and that the terms and actions in the system are consistent. For each of the remaining statements, 90% of participants agree or strongly agree and 10% neither agree nor disagree.

The last two questions of the post-experiment interview are open-ended. The first asks whether participants find that the system makes them more information security aware, and 80% of participants do. The remaining 20% consider themselves already very information security aware and find that using the Classification Storage system would not change that. These participants also commented that users with no background in security could benefit greatly from the system and become more information security aware. 80% of participants name the visual markers that represent the classification levels of files as one of the reasons the system makes users more information security aware. 30% of participants name the requirement to set a classification on every file as one such reason.

Table 5.4.3: Post Experiment Interview, statements mapped to Nielsen's heuristics that participants answered with one of the alternatives in the Likert scale.

[Chart: number of participants (axis 0 to 8) giving each response (Strongly Disagree, Disagree, Neither Agree Nor Disagree, Agree, Strongly Agree) to each statement:
• The system presents current status of files clearly.
• The system is easy to understand and is not too technical in its wording.
• It is clear how to cancel or change classifications when setting them.
• There is a consistency of terms, situations and actions in the system.
• The system tries to prevent errors by warning users of unexpected choices.
• All information required to set a classification is present in the classification window.
• Setting a classification on a file is both intuitive and fast.
• The system only presents relevant information.]

The last question asks whether participants have anything else to comment on regarding the Classification Storage system. Answers to this question vary widely, and participants discuss different improvements and features. A few participants mention that they find the system quite intuitive. Others describe changes they would like to see, mostly related to supporting different company structures and needs. These comments are further elaborated on and discussed in Section 5.5.

5.5 Results

This usability study has three main goals in evaluating the Classification Storage system: user awareness of information security, potential changes in user behaviour, and user interface design. First, we look at user awareness of information security. The results show that participants find that using the Classification Storage system raises information security awareness. 80% of participants become more information security aware themselves, while the remaining 20% consider themselves very information security aware already. All of those participants state that the Classification Storage system would raise information security awareness among less proficient users who are not experts in the field. As one participant states, when talking about other users and the confidentiality of files: “By using this you definitely, force them to think and treat files better”. Most participants explain that one of the main reasons for becoming more information security aware is the visual markers that represent file classifications. Participants describe the benefits of the visual markers in many different ways. When asked whether the system makes them more information security aware, one participant states: “It does, by simply giving a visual indication of the classification levels of files”. Participants are also asked whether they agree with the statement “The system makes the use of different security levels clear”, and 90% agree or strongly agree. Next, we evaluate whether the Classification Storage system could change user behaviour towards information security. Participants are asked whether their approach to handling a file could change based on the visual markers showing classification levels. All participants agree, with 60% strongly agreeing.
Participants also consider the forced file classification very beneficial to users' information security. In the context of storing types of information that users are unfamiliar with, one participant states: “without this kind of system that force me to make a choice, would normally make me select a storage without properly taking care of how sensitive the information is”.

Lastly, we look at the evaluation results for the user interface design. Eight heuristic-based statements are presented to participants to evaluate the Classification Storage user interface. As seen in Table 5.4.3, participants are very positive about the user interface. The statement with the least agreement concerns the clarity of canceling or changing classifications, which is something the system could improve on. The statement with the most agreement concerns error prevention by warning users of unexpected choices. Participants find the system quite intuitive but not visually polished. As one participant states: “everything is quite intuitive in design, it could just look nicer”. This is supported by the responses to the statement “Setting a classification on a file is both intuitive and fast”, with which 90% of participants agree or strongly agree. These results show that the participants who evaluated the Classification Storage system are positive about all three points of evaluation in this usability study. However, the study has limitations that should be considered when analysing these results, and these are discussed in the next section.

5.6 Limitations

The main limitations of the usability study are its scale and the recruitment process. As mentioned before, the main problem with our recruitment is the sampling of participants. Convenience sampling was used because the current Covid-19 restrictions make it difficult to recruit participants, as described in Section 5.2. The main drawback of convenience sampling is a high risk of biased results from such a select group. Results from studies that use convenience sampling cannot be generalised to a larger population. This is clearly visible in the demographics of this study, as seen in Table 5.4.1. The study is performed with ten participants, which is a small sample. Like convenience sampling, a small sample can lead to biased results that cannot be generalised to a larger population. Performing the study with a larger and more diverse sample would increase confidence in the results, but this was not possible within the scope of this thesis. In the study, participants take part in an experiment in which they follow a set of predetermined tasks. These tasks are tailored to the study and give a curated view of the system. To mitigate this limitation, the tasks are designed to cover as many of the system's features as possible. A proper evaluation of the system would give participants a longer period during which they could use the system more freely. Unfortunately, such an evaluation could not be carried out within this thesis work and is left as a topic for future work.

5.7 Summary

This chapter described the evaluation of the Classification Storage system. The evaluation was performed through a usability study with ten participants. The usability study evaluates the Classification Storage system on three points: user awareness of information security, potential changes in user behaviour towards information security, and user interface design. The study consists of a short interview, followed by the experiment and, lastly, a post-experiment interview. The first interview established the demographics of the study. The experiment was executed over a remote desktop connection, through which participants performed a few tasks using the Classification Storage system. After completing the tasks, the participants took part in a post-experiment interview in which they were asked whether they agreed with statements about the Classification Storage system. The results of the usability study show that users of the Classification Storage system become more information security aware. The main reasons for this are the visual markers that represent the classification of a file and the way the system forces users to set classifications on all files. All participants agree that the Classification Storage system could change their behaviour and approach to handling files. The user interface design was evaluated using Nielsen's heuristics. The participants are very positive about the user interface in terms of functionality, but as one participant states: “everything is quite intuitive in design, it could just look nicer”. Overall, the results of the usability study are positive for the points on which the Classification Storage system is evaluated. These results are discussed further in the following chapter, Discussion.

Chapter 6

Discussion

This chapter discusses the Classification Storage system, the results of the evaluation and the limitations of the thesis work. The Classification Storage system is discussed and compared with the goals of this thesis in Section 6.1. The results of the system evaluation are elaborated on, and parts of the system that could not be evaluated are discussed, in Section 6.2. The limitations of the Classification Storage system and the thesis work are discussed in Section 6.3. Lastly, a short summary of this chapter is presented in Section 6.4.

6.1 Classification Storage

The Classification Storage system is created with three goals in mind. The first goal is to allow users to perform file classification in a distributed storage system. The second is to achieve user-driven classification. The third is to raise awareness of information security among users. To achieve these goals, a system architecture that fulfils the requirements is created. The Classification Storage system presented in this thesis implements that architecture. The Classification Storage system is a Client–Server solution in which the server distributes files over different storage devices based on their classification levels. The client presents a custom filesystem that is connected to the server through an extended WebDAV protocol. The filesystem forces users to set a classification whenever a file is stored. Through this system, the first and second goals are achieved. The third goal, raising information security awareness among users, is more challenging to demonstrate. A usability study was therefore performed, as presented in the System Evaluation chapter. The evaluation results are discussed in Section 6.2. The Classification Storage system is intended for organisations that need to gain control over their files with regard to information security. It could help these organisations implement classification policies on unstructured data such as Word documents and other files.

46 CHAPTER 6. DISCUSSION

If the system is installed correctly, all files stored in it are automatically placed in locations that follow the classification requirements. Different classification levels have different security and confidentiality requirements, which can be implemented on the storage locations or solutions associated with each level. This could also have the added benefit of allowing less sensitive data to be stored on cheaper, less secure solutions. The Classification Storage system is presented to users as one network drive, so the barrier to entry is low: it can be used like any other network drive. All users who store files on this system are forced to select a classification, which ensures that all files in the Classification Storage system are classified. This also means that all users of the system actively work with information security. As the usability study results suggest, it gives users an incentive to think more about the data they are working with. The network drive presented by the Classification Storage system also informs users of the classification levels. Each classification level is represented by a small icon displayed on top of the regular file icon. Classification levels are therefore always visible. This could make users more careful with sensitive data and help prevent incidents where data is accidentally leaked or stored insecurely.
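To make the mapping from the single network drive to separate storage locations concrete, the sketch below assumes that each classification level has its own storage root. It also assumes that the server keeps the file's relative path unchanged. The roots, `physical_path` and `reclassify` are hypothetical, and the actual server is a PHP Sabre/DAV plugin rather than Python. The sketch only shows that the same virtual path resolves to a different physical location per level, so reclassifying a file amounts to a move between stores.

```python
from pathlib import PurePosixPath

# Hypothetical storage roots; in a deployment each level would point to a
# separate storage device or service with its own security measures.
STORAGE_ROOTS = {
    "Public": PurePosixPath("/srv/cs/public"),
    "Internal": PurePosixPath("/srv/cs/internal"),
    "Confidential": PurePosixPath("/srv/cs/confidential"),
}


def physical_path(virtual_path: str, level: str) -> PurePosixPath:
    """Map a path on the single virtual drive to the store for its level.

    This sketch does not sanitise '..' segments in virtual_path.
    """
    return STORAGE_ROOTS[level] / virtual_path.lstrip("/")


def reclassify(virtual_path: str, old: str, new: str):
    """Return (source, destination) for moving a file after reclassification."""
    return physical_path(virtual_path, old), physical_path(virtual_path, new)
```

The user keeps seeing the same path on the network drive throughout. Only the server-side location changes, which is why storage with different security properties can be hidden behind a single drive.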

6.2 Evaluation Results

An evaluation of the Classification Storage system was performed through a usability study. The usability study was designed to evaluate the third thesis goal, raising awareness of information security among users. The usability study also had two further goals: to determine whether the system could change user behaviour towards information security, and to evaluate the user interface design. The results of the usability study show that users become more information security aware by using the Classification Storage system. Users who already consider themselves information security aware, however, do not experience a raised awareness. The usability study also indicates that the Classification Storage system could change user behaviour towards information security. All participants state that their approach to handling a file could change based on the visual markers showing classification levels. The user interface design is evaluated on eight points with the help of Nielsen's heuristics. The results show that the user interface design is intuitive, and participants are very positive about its functionality. The Classification Storage system was evaluated from a usability perspective, with participants performing an experiment using the system. The part of the system evaluated in this way is mainly the front-end that users interact with. The main functionality of the Classification Storage system is executed in the background and is not something regular users have control over.


The evaluation that was performed on the Classification Storage system is therefore not representative of the entire system. With more time, a more elaborate evaluation could be performed in which background processes and server functionality are examined with respect to speed, security and optimisation.

6.3 Limitations

The limitations in this thesis can be divided into three categories: evaluation limitations, architectural limitations and implementation limitations. The limitations of the evaluation concern the participants and how the usability study was performed; they were discussed in detail in the System Evaluation chapter, in Section 5.6. Architectural limitations of the Classification Storage system mainly stem from design choices that were made early in the project. One of the main limitations is the filesystem approach. The Classification Storage system is built around the custom filesystem CSFS, as described in Section 4.2.1. The reason for designing and implementing the system around a filesystem is to ensure that all file operations take file classifications into consideration. In practice, this design choice has negative effects, because different applications store files in different ways. This behaviour was first discovered during the implementation and testing of Classification Storage, when a Word file was saved to the network storage created by the system. Word did not write the data directly to a file with the expected name. Instead, it created a temporary file with a temporary name and, after successfully saving the data to it, renamed the temporary file to the original name. This caused Classification Storage to trigger a user classification request for the temporary file instead of the intended file. The same unexpected behaviour can occur with any application that does not perform filesystem operations in the conventional way. Solving this problem will require extensive development and testing, because a filesystem can only react to the file operations that applications call. This problem would have to be solved before the Classification Storage system could be used by any organisation.
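
The Word behaviour described above can be reproduced with a small Python simulation. The trace of filesystem operations is hypothetical (the temporary file name is illustrative, not the name Word actually uses), and `prompts_on_final_name` is only one conceivable direction, not a solution that was implemented or tested in this thesis. The naive policy asks for a classification whenever a file is created, which reproduces the prompt for the temporary file; the alternative follows renames and asks only about the names that remain once the save has finished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FsOp:
    """One filesystem operation as received by the filesystem layer."""

    kind: str  # "create", "write" or "rename"
    path: str
    new_path: str = ""  # target name, only used by "rename"


# Hypothetical trace of a "safe save": write a temporary file, then rename it
# to the real name. The temporary name is illustrative only.
SAFE_SAVE_TRACE = [
    FsOp("create", "/share/~tmp1234.tmp"),
    FsOp("write", "/share/~tmp1234.tmp"),
    FsOp("rename", "/share/~tmp1234.tmp", "/share/report.docx"),
]


def prompts_on_create(trace):
    """Naive policy: request a classification for every created file."""
    return [op.path for op in trace if op.kind == "create"]


def prompts_on_final_name(trace):
    """Hypothetical policy: request classifications only for files created
    in the trace, under the name they have at the end, following renames."""
    live = []
    for op in trace:
        if op.kind == "create" and op.path not in live:
            live.append(op.path)
        elif op.kind == "rename" and op.path in live:
            live.remove(op.path)
            # A rename onto an existing name replaces that file.
            if op.new_path not in live:
                live.append(op.new_path)
    return live


print(prompts_on_create(SAFE_SAVE_TRACE))      # ['/share/~tmp1234.tmp']
print(prompts_on_final_name(SAFE_SAVE_TRACE))  # ['/share/report.docx']
```

The alternative only works because it sees the whole trace at once. A filesystem such as CSFS receives operations one at a time and cannot know in advance whether a newly created file will later be renamed, which is part of why the problem requires extensive development and testing.
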
The implementation of the Classification Storage system is limited by modules that are not implemented and by software limitations. The main implementation limitation is the authentication module, which was not implemented because of time constraints. As the system is designed to handle sensitive information, most organisations would require extensive authentication functionality for both access rights and accountability. The authentication module is a vital part of the Classification Storage system and would need to be implemented before the system could be used securely. Without this module, the system can only be utilised in a limited way.


The current implementation also has limitations regarding stability and scalability. During development, the Classification Storage system was tested with a few simultaneous clients, but the limits of the software are not yet known. The unknown stability of the system under heavy load must be addressed before the Classification Storage system is deployed in any organisation. The implementation also has limitations stemming from choices made early in the development process. One such choice is the operating system that the client is developed for: because the client is only available for Windows, the use of the Classification Storage system is limited to organisations that only use Windows clients.

6.4 Summary

This chapter discusses the Classification Storage system in detail, along with the evaluation results and limitations. The implementation of the Classification Storage system achieves the three main goals of this thesis. Firstly, the system allows users to perform file classifications in a distributed storage system. Secondly, user-driven file classification is achieved. Lastly, using the Classification Storage system raises the information security awareness of users. The limitations of this thesis are discussed in three categories: architectural limitations, implementation limitations and evaluation limitations. The next chapter contains the conclusions and future work of this thesis.

Chapter 7

Conclusions

The goal of this thesis is to create and evaluate a system that makes it easier for organisations to comply with laws, regulations and legislation regarding information security. The envisioned system makes it easy for users to set classifications on unstructured files, which should then be stored automatically in locations that follow the security requirements of the different classification levels. An extensive system that achieves the thesis goals, called Classification Storage, was created. The Classification Storage system is a customisable solution for classifying unstructured files that supports the information security policies of organisations. File classifications are performed by the users who handle the documents and know their content. The classified files are stored at designated locations according to their classification levels, and organisations can set the system up to apply different security standards to different classification levels. A properly set up Classification Storage system makes it easier to comply with the controls imposed by laws and legislation. The Classification Storage system was evaluated through a usability study, which concludes that users not only find the system intuitive to work with, but also find that it raises their awareness of information security. Organisations that implement the Classification Storage system could therefore create a more information security aware environment among users while classifying unstructured files to comply with laws and legislation. This could help prevent circumstances where data is accidentally leaked or stored insecurely, and reduce information security related incidents. The Classification Storage system achieves the goals set for this thesis, but, as discussed under future work, more can still be done to improve the system and create an even better solution.


Future work

The Classification Storage system is a new system and approach to file classification in information security, and it opens up many further research paths. To start with, the entire system should be developed and the problems with the current implementation should be solved: the Authentication Module is not yet implemented, and a solution needs to be found for handling the unexpected behaviour of applications interacting with the CSFS. A fully developed Classification Storage system would be interesting to evaluate through a more extensive usability study, performed with a larger, more diverse sample group to properly evaluate the system. It would also be interesting to evaluate the Classification Storage system over a longer period of time, preferably with the system deployed in an organisation where users interact with it on a daily basis. This way, the impact of the Classification Storage system on users and the organisation could be measured. An extensive study performed over a longer period of time could also be used to evaluate the classification accuracy of the Classification Storage system. The usability study in this thesis concludes that users become more information security aware, but does this result in better classification accuracy and fewer information security incidents? Further improvements could be made to the layout of the user interface. As some participants mentioned, the interface could be made more visually appealing. The tags displayed in the DialogBox could be grouped by classification association. This could lead to an even more intuitive and easy-to-use system, which would be interesting to compare to the current results. The Classification Storage system could also be developed further to block certain files from being sent by email or other means.
This could make secret and confidential files in an organisation more difficult to share unintentionally with unauthorised parties. The current implementation of the Classification Storage system has a very specialised use in the information security space, but the system could also be adapted to support integration in other industries. Many more fields of application could benefit from being able to choose where a document is stored, based on organisational structure or policy, through an easy-to-use interface that any user can operate. The Classification Storage system has been shown to be an effective and unique solution for enhancing information security. With further development and investment, this system could become the basis for a start-up company.

51 Bibliography

[1] 27001:2013, ISO/IEC. “Information technology — Security techniques — Information security management systems — Requirements”. In: ISO/IEC (2013). [2] Android. Scoped Storage. Dec. 2020. URL: https://source.android.com/ devices/storage/scoped. [3] Blackman, Melinda C and Funder, David C. “Effective interview practices for accurately assessing counterproductive traits”. In: International Journal of Selection and Assessment 10.1­2 (2002), pp. 109–116. [4] Bornstein, Marc H, Jager, Justin, and Putnick, Diane L. “Sampling in developmental science: Situations, shortcomings, solutions, and standards”. In: Developmental Review 33.4 (2013), pp. 357–370. [5] Bowker, Geoffrey C and Star, Susan Leigh. Sorting things out: Classification and its consequences. MIT press, 2000. [6] Canonical. Ubuntu 20.04.1 LTS (Focal Fossa). June 2020. URL: https : / / releases.ubuntu.com/20.04/. [7] Castells, Manuel. The information age. Vol. 98. Oxford Blackwell Publishers, 1996. [8] dwmkerr. SharpShell. Nov. 2020. URL: https : / / . com / dwmkerr / sharpshell. [9] Eminağaoğlu, Mete, Uçar, Erdem, and Eren, Şaban. “The positive outcomes of information security awareness training in companies – A case study”. In: Information Security Technical Report 14.4 (2009). Human Factors in Information Security, pp. 223–229. ISSN: 1363­4127. [10] Fleischer, Benjamin. What is macFUSE? Dec. 2020. URL: https://osxfuse. github.io/. [11] FreeBSD. FUSEFS. Dec. 2020. URL: https://wiki.freebsd.org/FUSEFS. [12] Fruux GmbH. sabre/dav. 2020. URL: https://sabre.io/dav/. [13] Grispos, George. Criminals: Cybercriminals. 2019. [14] IETF Network Working Group. Calendaring Extensions to WebDAV (CalDAV). Mar. 2007. URL: https://tools.ietf.org/html/rfc4791.

52 BIBLIOGRAPHY

[15] IETF Network Working Group. Extended MKCOL for Web Distributed Authoring and Versioning (WebDAV). Sept. 2009. URL: https://tools.ietf. org/html/rfc5689. [16] IETF Network Working Group. HTTP Extensions for Web Distributed Authoring and Versioning (WebDAV). June 2007. URL: https://tools.ietf. org/html/rfc4918. [17] IETF Network Working Group. Web Distributed Authoring and Versioning (WebDAV) Access Control Protocol. May 2004. URL: https://tools.ietf. org/html/rfc3744. [18] Karjoth, Günter, Schunter, Matthias, and Waidner, Michael. “Privacy­enabled services for enterprises”. In: Proceedings. 13th International Workshop on Database and Expert Systems Applications. IEEE. 2002, pp. 483–487. [19] Likert, Rensis. “A technique for the measurement of attitudes.” In: Archives of psychology (1932). [20] Lucas, George R. Ethics and cyber warfare: the quest for responsible security in the age of digital warfare. Oxford University Press, 2017. [21] Microsoft. How to use Remote Desktop. Apr. 2021. URL: https : / / support . microsoft.com/en- us/windows/how- to- use- remote- desktop- 5fe128d5- 8fb1-7a23-3b8a-41e636865e8c. [22] Microsoft. Learn about sensitivity labels. Nov. 2020. URL: https : / / docs . microsoft.com/en-us/microsoft-365/compliance/sensitivity-labels. [23] Microsoft. Microsoft 365. 2020. URL: https : / / www . microsoft . com / en / microsoft-365. [24] Microsoft. Microsoft Information Protection SDK documentation. 2020. URL: https://docs.microsoft.com/en-us/information-protection/develop. [25] Microsoft. Working with Shell Extensions. Apr. 2018. URL: https : / / docs . microsoft.com/en-us/windows/win32/shell/shell-exts. [26] Miorandi, Daniele, Rizzardi, Alessandra, Sicari, Sabrina, and Coen­Porisini, Alberto. “Sticky policies: a survey”. In: IEEE Transactions on Knowledge and Data Engineering (2019). [27] Nielsen, Jakob. “Usability inspection methods”. In: Conference companion on Human factors in computing systems. 1994, pp. 413–414. [28] Oracle. 
Welcome to VirtualBox.org! Mar. 2021. URL: https://www.virtualbox. org/. [29] rsync. Welcome to the rsync web pages. Aug. 2020. URL: https : / / rsync . samba.org/. [30] SAMBA. Writing a Samba VFS Module. July 2018. URL: https://wiki.samba. org/index.php/Writing_a_Samba_VFS_Module.

53 BIBLIOGRAPHY

[31] Schantz, Richard E and Schmidt, Douglas C. “Middleware”. In: Encyclopedia of Software Engineering (2002). [32] scp. scp — OpenSSH secure file copy. Jan. 2021. URL: https://man.openbsd. org/scp.1. [33] sftp. sftp — OpenSSH secure file transfer. Feb. 2021. URL: https : / / man . openbsd.org/sftp.1. [34] skazantsev. WebDav.Client 2.7.0. Sept. 2020. URL: https://www.nuget.org/ packages/WebDav.Client/. [35] Sloane, Mona. “Inequality Is the Name of the Game: Thoughts on the Emerging Field of Technology, Ethics and Social Justice”. In: Weizenbaum Conference. DEU. 2019, p. 9. [36] Tankard, Colin. “Data classification – the foundation of information security”. In: Network Security 2015.5 (2015), pp. 8–11. ISSN: 1353­4858. [37] The Economist. “The world’s most valuable resource is no longer oil, but data”. In: The Economist: New York, NY, USA (2017). [38] The Internet Engineering Task Force. CardDAV: vCard Extensions to Web Distributed Authoring and Versioning (WebDAV). Aug. 2011. URL: https:// tools.ietf.org/html/rfc6352. [39] The Internet Engineering Task Force. The Internet Engineering Task Force (IETF) is the premier Internet standards body, developing open standards through open processes. Dec. 2020. URL: https://ietf.org/about/. [40] The kernel development community. FUSE. Dec. 2020. URL: https : / / www . kernel.org/doc/html/latest/filesystems/fuse.html. [41] Wikipedia. File Explorer. May 2021. URL: https://en.wikipedia.org/wiki/ File_Explorer. [42] WinFsp. Windows File System Proxy. Dec. 2020. URL: http://www.secfs. net/winfsp/. [43] Yaqoob, Ibrar, Hashem, Ibrahim Abaker Targio, Gani, Abdullah, Mokhtar, Salimah, Ahmed, Ejaz, Anuar, Nor Badrul, and Vasilakos, Athanasios V. “Big data: From beginning to future”. In: International Journal of Information Management 36.6, Part B (2016), pp. 1231–1247. ISSN: 0268­4012.

Appendix Contents

A Informed Consent

B Interview Questions

C Post Experiment Interview Questions

Appendix A

Informed Consent

Consent Form for Participation in the Experiment “Evaluation of Classification Storage system”

Thank you for your interest in participating in the study conducted by Joël Sloof for a master thesis at Karlstad University (KAU). The purpose of this study is to evaluate a system called Classification Storage, which was created for this master thesis. We hope to determine the quality of the software and whether using Classification Storage raises awareness of information security among users.

What will I be asked to do?

You will be asked questions in an interview and will be asked to perform a few tasks using the Classification Storage system through a remote desktop connection. The tasks will be followed by a post-experiment interview with questions related to the experiment and your experiences with the Classification Storage system. The experiment takes approximately 20 minutes.

What data will be collected and for what purposes? Who will process your data?

KAU, as the data controller, will request demographic data about your gender and age. We will also audio record the interview questions. In addition, a list matching your name with a pseudonym will be created for the purpose of pseudonymising all interview recordings. All data will be used for research purposes only. The analysed data will be used in the master thesis and in academic publications.

How will your data be processed?

All your data will be kept confidential, stored safely on an encrypted partition of a computer hard drive, and deleted after the archiving period of 10 years (required by KAU for all original research data for preventing/detecting research fraud). The list matching your name to a pseudonym will be kept separate from all other collected data in a secure place. Data processing and handling will be done by KAU in compliance with the EU General Data Protection Regulation (GDPR). At no time will your name or any other information that may directly identify you be used when reporting the results. No personally identifying information about you should be revealed in the context of this study.

Voluntary participation & Your rights

Participation in this study is completely voluntary. You are free to leave or end the experiment at any point without explanation. If you withdraw, we will delete your data and therefore destroy any information that you have provided. You can also exercise your data subject rights to access, rectification, deletion or blocking of your data according to the GDPR free of charge. Data deletion is, however, only possible until the results of the experiment are published in anonymised form. The study is designed to evaluate the Classification Storage system, not your knowledge. There are no right or wrong answers to the questions being asked.

Contact

If you have questions or concerns, or if you want to exercise your rights, please contact:

Data controller: Karlstad University, Universitetsgatan 2, 65188 Karlstad

Contact person: Joël Sloof: [email protected], telephone: 073-8409177

Conny Classon (Data Protection Officer at KAU): [email protected], telephone: 054-70010 00.

Appendix B

Interview Questions

1. Do you agree to provide data for the purposes and under the conditions described in the informed consent, which you have read?

□ Yes □ No

2. How old are you?

□ 18-24 years old □ 25-34 years old □ 35-44 years old □ 45-54 years old □ 55+ years old □ Prefer not to answer

3. What is your gender?

□ Female □ Male □ Other □ Prefer not to answer

4. Do you store files in locations other than your local hard drive?

□ Yes □ No □ I don’t know

If the answer to question four is No or I don’t know, then the participant is not of interest to this study and the interview is ended.

5. Do you use remote storage for private use, work use or both?

□ Private □ Work □ Both


If the answer to question five is Private, question six is skipped and the interview continues with question seven.

6. Does your company have policies and rules in place for storing files?

□ Yes □ No

7. Do you store files differently depending on their content? For example: you store less sensitive data in the cloud, but more sensitive data on a local NAS.

□ Yes → Could you give an example? □ No

Appendix C

Post Experiment Interview Questions

1. The markers visualizing a file's classification could change my approach to how I handle files.

For example: files containing sensitive data would be handled with more care

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree

2. The system makes the use of different security levels clear.

(Public, Internal, Confidential)

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree

Question 3 is only asked if the participant answered Yes to Question 6 and/or Question 7 in Interview Questions Part 1 (Appendix B).

3. This system would make it easy to store files correctly based on content and/or company policy.

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree


4. The system presents the current status of files clearly.

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree

5. The system is easy to understand and is not too technical in its wording.

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree

6. It is clear how to cancel or change classifications when setting them.

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree

7. There is a consistency of terms, situations and actions in the system.

For example: the system does not use different names for the same actions, which could create confusion.

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree

8. The system tries to prevent errors by warning users of unexpected choices.

For example: the confirmation box that appears when a file is classified lower than it previously was.

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree


9. All information required to set a classification is present in the classification window.

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree

10. Setting a classification on a file is both intuitive and fast.

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree

11. The system only presents relevant information.

□ Strongly agree □ Agree □ Neither agree nor disagree □ Disagree □ Strongly disagree

12. Does the system make you more information security aware? Please explain.

13. Is there anything else you would like to add or comment on regarding the system?
