Linköping University | Department of Computer and Information Science Master’s thesis, 30 ECTS | Datateknik 2021 | LIU-IDA/LITH-EX-A--2021/013--SE

Generating Datasets Through the Introduction of an Attack Agent in a SCADA Testbed – A methodology of creating datasets for intrusion detection research in a SCADA system using IEC-60870-5-104

How a SCADA test environment using the IEC-60870-5-104 protocol under attack can create data for use in network-based intrusion detection systems

August Fundin

Supervisor : Chih-Yuan Lin Examiner : Simin Nadjm-Tehrani

Linköpings universitet SE–581 83 Linköping +46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© August Fundin

Abstract

In December 2015 a power outage was caused by a hacking attack in Ukraine. This further highlighted the ongoing increase of attacks on critical infrastructure and the vulnerabilities of the aging industrial control systems governing it. Supervisory Control and Data Acquisition (SCADA) is an example of such a system. Studying the intrusion of adversaries and anomalies in SCADA systems is no easy feat. Administrators of SCADA systems rarely share data as they risk getting their weaknesses detected. Hence, datasets containing this data need to be acquired through other means. In this study, a SCADA testbed simulating a real-world counterpart was used to create datasets for intrusion detection. As the testbed had no previously documented attacks, this study also investigated how the testbed reacted to generated attacks. This study focused on attacks on the communication protocol IEC-60870-5-104. The chosen approach to obtain datasets was to construct a so-called attack-bot, generating attacks during scenarios where network traffic was recorded. After a scenario, a user has access to labeled network traffic, ready to be used when training intrusion detection systems. This kind of data is traditionally challenging to create. There are few publicly available high-quality testbeds, and generating data without a testbed comes with a whole set of difficulties. The results illustrate how this study’s approach can generate high-quality data with relatively little effort.

Acknowledgments

I would like to thank Chih-Yuan Lin and Simin Nadjm-Tehrani, my supervisor and my examiner, for their guidance as well as the valuable feedback and discussions throughout my work. I would like to follow that up with a hearty thanks to Erik Westring, Peter Andersson and Tommy Gustafsson at FOI for aiding me with RICS-el. And finally, thanks to all of you who gave me much-needed encouragement when I needed it!

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
Abbreviations

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research Questions
  1.4 Delimitations
  1.5 Thesis Outline

2 Background
  2.1 SCADA
  2.2 IEC-60870-5-104
  2.3 SCADA Vulnerabilities
  2.4 SCADA Exploits
  2.5 RICS-el

3 Related Work
  3.1 Dataset Generation
  3.2 Attack Types and Attack Evaluation

4 Methodology of Dataset Generation
  4.1 Attack-Bot Implementation
  4.2 Experiment Setup
  4.3 Dataset Generation Workflow

5 Attack Generation in RICS-el
  5.1 Attack Model
  5.2 Attack Scenario Implementation

6 Method of Evaluation
  6.1 Dataset Requirements
  6.2 Evaluation of Datasets Requirements
  6.3 Attack Impact Evaluation

7 Results and Evaluations
  7.1 Impact of Attacks
  7.2 Created Datasets
  7.3 Review of Requirements

8 Discussion
  8.1 Results
  8.2 Method
  8.3 Sources
  8.4 The Work in a Wider Context

9 Conclusion
  9.1 Dataset Creation in RICS-el
  9.2 Attack Generation in RICS-el
  9.3 Future Work

A Appendix Attack-Bot Configurations
  A.1 List of Flags
  A.2 Configfile Options

B Appendix Attack-Code Template

Bibliography

List of Figures

2.1 An overview of SCADA
2.2 APDU with fixed and variable length
2.3 APCI control field formats
2.4 Information contained in an ASDU
2.5 Overview of RICS-el
2.6 Interactions between bots and SCADA in RICS-el
4.1 Network configuration in the experiment setup
4.2 The attack-bot in RICS-el’s dataflow
4.3 The dataset generation workflow
4.4 Running scheduled attack scenarios
4.5 Flowchart of iterative dataset evaluation

List of Tables

2.1 Common ASDU functions in RICS-el
5.1 Overview of implemented attack-scenarios
5.2 IP addresses of IEC-104 devices
6.1 Attack success criteria
7.1 Result of the scanning attack
7.2 Results of the DoS attacks
7.3 Results of the sequence attack
7.4 Results of the MitM attacks
7.5 Result of the replay attack
7.6 Results of the injection attacks
7.7 Recorded datasets
7.8 Operator actions in each scenario
A.1 List of flags

Abbreviations

APCI Application Protocol Control Information.

APDU Application Protocol Data Unit.

ARP Address Resolution Protocol.

ASDU Application Service Data Unit.

CoT Cause of Transmission.

CSV Comma Separated Values.

DMZ Demilitarized Zone.

DoS Denial-of-Service.

FOI Swedish Defence Research Agency.

HMI Human-Machine-Interface.

ICMP Internet Control Message Protocol.

ID Identity.

IDS Intrusion Detection System.

IE Information Element.

IEC International Electrotechnical Commission.

IEC-104 IEC-60870-5-104.

IO Information Object.

IOA Information Object Address.

IP Internet Protocol.

IT Information Technology.

ITF Invalid Time Flag.

LAN Local Area Network.

MitM Man-in-the-Middle.

NIDS Network Intrusion Detection System.

NSTB National SCADA Test Bed.

NTP Network Time Protocol.

ORG Originator Address.

OT Operational Technology.

PCAP Packet Capture.

PLC Programmable Logic Controllers.

RICS Resilient Information and Control Systems.

RTT Round-Trip Time.

RTU Remote Terminal Units.

S3 SUTD Security Showdown.

SCADA Supervisory Control and Data Acquisition.

SQ Structure Qualifier.

SSH Secure Shell.

STARTDT Start Data Transfer.

STOPDT Stop Data Transfer.

SUTD Singapore University of Technology and Design.

TCP Transmission Control Protocol.

TCP/IP Internet protocol suite.

TESTFR Test Frame.

TTL Time To Live.

VM Virtual Machine.

VPN Virtual Private Network.

WAN Wide Area Network.

1 Introduction

SCADA is a control system that encompasses both the devices interfacing with physical machinery and the computers of geographically distributed critical infrastructure, such as power grids. Organizations managing power grids need SCADA systems to control and monitor safe and reliable operations [10].

SCADA systems and their protocols were previously used in isolated networks with proprietary solutions. However, this has changed over the last decades. Components are now standardized instead of specialized, to improve maintainability. Instead of proprietary software, more publicly known software is used to ease the integration of systems. Connections between SCADA networks and the organization’s corporate networks have been added. These changes have made SCADA systems easier to operate, but they have also made them more vulnerable. The connections to the corporate network allow intruders to penetrate the system in new ways, and the devices and protocols that become exposed often have known vulnerabilities [10].

Cyberattacks targeting SCADA systems are happening in today’s society. The power grid cyberattack in Ukraine in December 2015 is believed to be the first example of a power outage deliberately caused by a hacking attack [25]. Since then, there has been an increase in reports of attacks on SCADA systems with malicious intent [41, 29]. Security researchers need to find ways of detecting anomalies and intrusions in SCADA systems. However, research on SCADA systems used in production risks disruptions, since such a system needs to be in constant operation [19]. Another issue is that intrusions in SCADA systems, such as the cyberattack in Ukraine, are unique events. This complicates the reproduction of data necessary for studies on intrusion detection. Therefore, datasets with recorded or generated traffic with the characteristics of the examined SCADA system need to be used instead to enable the development, evaluation and comparison of different defense mechanisms.

An important method of defense is to introduce a Network Intrusion Detection System (NIDS) in SCADA networks [51]. A NIDS is an automated process that monitors events within the traffic of a system, analyzing data in search of signs of adverse events [51]. The development of a NIDS requires realistic datasets to configure and tune it to detect intrusions effectively. The datasets need to contain normal traffic and labeled attack traffic so that researchers can distinguish attacks from normal operation [49]. Introducing anomalies or attacks to a SCADA system in production to acquire such data is out of the question. One method to generate datasets for NIDS development is to record data in testbeds that imitate the behavior of SCADA systems in production.

Unfortunately, reliable and publicly available datasets have traditionally been few. Those available have been criticized for being out of date, lacking proper labeling and containing defects not present in real-world applications [28, 30].

This study presents a method to generate datasets in a virtual testbed [5] that emulates a power grid with about 20 substations. The testbed runs similarly to the real-world application, allowing the creation of reliable and adaptable datasets to study. In this study, a program is built to enable attack generation in the virtual SCADA testbed. The creation of the attack-generating program contributes to the research field surrounding the security of SCADA systems in two main ways. Firstly, it offers a methodology to create relevant datasets generated in the virtualized testbed. Secondly, it investigates the testbed’s reactions to different attacks and the support and limitations of attack generation within it.

1.1 Motivation

Cyberattacks against power grid control systems constitute a threat to the availability and reliability of power. Since it is infeasible to run tests on such systems in production, there is a need for testbeds that correlate closely with the real system. A testbed needs to allow repeatable tests and simulations for the evaluation of different methods. The testbed used in this study is called RICS-el [5] and was created by the Swedish research center Resilient Information and Control Systems (RICS)¹. In RICS-el, long data streams can be collected during simulations, generating datasets for research. Introducing simulated agents involved in the power grid, such as operators and attackers, would make it easier to generate relevant datasets.

It is relevant to study dataset generation in a virtualized testbed rather than a physical testbed. If a virtual testbed can be deemed as believable and effective as a physical testbed, there is a lot to gain. One advantage of a virtual testbed is that it is more cost-effective and easier to maintain. Another advantage is that virtual testbeds allow for easier system recovery. The state of the virtual system can be saved and loaded rapidly, making experiments easier to replicate.

In early 2020, when this project started, there was no reliable way to generate datasets containing attacks in RICS-el. Through the creation of a program that generates deterministic attacks in RICS-el, researchers would be able to collect data in repeatable experiments in a reliable way, streamlining the process of finding new and better methods to detect intrusions in SCADA systems.

Also, attacks and their impacts on RICS-el’s SCADA system have not been previously documented. Since RICS-el is a virtual environment, it might not react in the same way to certain input or situations as a real power grid would. If RICS-el behaves too differently from its real-world counterparts, then the generated datasets are not as useful. Therefore, the impact of generated attacks needs to be documented.

¹ https://www.rics.se/

1.2 Aim

The purpose of this study is to provide a systematic method to generate datasets for intrusion detection research in SCADA systems. This is to be achieved by creating a bot, an attack-bot, generating attacks within the virtualized testbed RICS-el. Ideally, the attack-bot in conjunction with the virtual nature of RICS-el would create a flexible environment for researchers and operators in training alike.

RICS-el’s implementation differs from a real power grid in several ways, such as the amount of bandwidth, how data is processed in the SCADA system and the fact that its devices are virtual rather than authentic. To generate datasets containing relevant attacks, there is a need to evaluate which attacks RICS-el responds to believably. This study aims to document attack impacts to increase the understanding of this topic.

1.3 Research Questions

Given the demand for a reliable approach to dataset generation, the following research questions are answered in this study.

1. How can datasets, including realistic network traffic and labeled attacks, for intrusion detection research in SCADA systems be realized in RICS-el?

2. Do attacks generated in this study have an appropriate effect on RICS-el? What impact do the attacks have on RICS-el?

1.4 Delimitations

In this study, an attack-bot is defined as a program that, after startup, launches attacks without the need for any user input. The attack generation is carried out in the RICS-el testbed and all attacks are restricted to the IEC-60870-5-104 (IEC-104) protocol. Because of the time constraints of this study, the attack-bot need not generate more than five different types of attacks. The focus is on attack impact assessment and network intrusions. Hence, process-aware Intrusion Detection Systems (IDSs) and datasets containing process data are not part of this study.

1.5 Thesis Outline

In chapter 2, the relevant theory and terminology surrounding SCADA and IEC-104 are covered together with their respective vulnerabilities. The testbed RICS-el is also presented more thoroughly. Chapter 3 covers how others have chosen to generate datasets and describes successful attacks on SCADA systems. The approach taken to generate the datasets is described in chapter 4. Chapter 5 defines an attack model and the attacks. Then, chapter 6 covers how the datasets and attacks were evaluated. The evaluation of attacks and datasets is given in chapter 7, which also lists the resulting datasets. Chapter 8 discusses the results, reviews the method and reflects on the work in a wider context. Finally, chapter 9 describes to what extent the aim has been achieved and presents the answers to the research questions.

2 Background

This chapter covers the theoretical background and terminologies necessary to understand this study. Section 2.1 describes what SCADA is and how communication works within SCADA systems. A common communication protocol in SCADA is IEC-104 and the relevant aspects of that protocol are explained in section 2.2. Specific vulnerabilities of SCADA systems and IEC-104 are described in section 2.3 and section 2.4 lists examples of different attacks on SCADA systems. The SCADA testbed using IEC-104, which is being attacked in this study, is described in section 2.5.

2.1 SCADA

This section presents a typical SCADA architecture and describes the components in the architecture. The section also covers the basics of SCADA communication with a focus on protocols and traffic behavior.

SCADA is a control system for monitoring and controlling geographically distributed physical processes in real time. It is widely used in power grid infrastructure. A lot of resources have been devoted to ensuring safety and that the processes run as expected. Keeping processes safe and functioning is the priority of SCADA. SCADA systems interact with physical processes and assets such as controllers and sensors. Readings are sent from the outer nodes of the SCADA network to a central control system, from where the physical process can be regulated by an operator or by automatic processes [51, 10].

In SCADA systems, much emphasis is put on availability, since access to commands and data is necessary for proper operation. There is also a very low tolerance for delays, as slow reactions could lead to the system entering an unsafe state [10].


2.1.1 Architecture and Components

There is no conventional architecture of SCADA systems [51], but there is an architectural pattern that conceptually consists of three layers: the field stations, the control network and the corporate network [10], see Figure 2.1.

The lower layer consists of the field stations, each containing one or more field devices and controllers. Examples include Remote Terminal Units (RTUs), Programmable Logic Controllers (PLCs), sensors and actuators. The RTUs of a SCADA system typically hold process setpoints. These setpoints define bounds or goals which the system uses in its control of actuators. The lower layer receives control commands from the control network and sends monitoring data in return.

The control network has a Human-Machine-Interface (HMI) where operators can get an overview of and manually control the current state of the system. The supervisory control of the system is located at the control server/data acquisition. The historian is a database that logs the process information for the entire system.

The upper layer, the corporate network, is concerned with production scheduling, passing on information about forthcoming load forecasts or foreseen changes in operational capacity. Operators on this level need access to data in the control network and remote control of the system [51].

Figure 2.1: An overview of SCADA

2.1.2 Communication

Traffic features in SCADA differ from those in traditional Information Technology (IT). For instance, most packets sent in a SCADA network are generated by a machine repeating the same actions over and over [31]. One example is a control network continually polling the field devices, which constantly send data in return [51].


SCADA and traditional IT not only have different users and usage, they also have different priorities. Confidentiality and integrity, meaning that unauthorized users or processes should not be able to access or alter information, are the main concerns in traditional IT [51]. However, SCADA systems need to ensure safe and functioning processes and therefore need to prioritize availability of information [7]. SCADA systems interact with physical assets, having a direct impact on physical systems. If an operation or response is delayed, the system could be brought to an unsafe state [56]. As such, SCADA systems have a low tolerance for jitter and delay. Therefore, the communication protocols were not designed with security as a top priority, but rather with good performance and functionality [43].

Communication channels between the field devices and the control network are, for example, wire, radio or satellite [51]. The choice often depends on the amount of infrastructure available to the field device, which could operate at a remote location. All SCADA devices are expected to run for a very long time, up to 20 years [10]. During that time, there is an expectation that every device is responsive to incoming connections.

2.2 IEC-60870-5-104

This section covers the IEC-104 communication protocol, specifically how the packets are structured and how communication works where IEC-104 is implemented. IEC-104 is an international standard used in SCADA systems, especially within the field of electricity distribution and power system automation in European countries, where its main functionality is telemetry gathering [4].

IEC-104 closely resembles IEC-101, which was the first standard developed by the International Electrotechnical Commission (IEC) in the set of standards 60870 part 5 [13]. In IEC-104 the application layer introduced in IEC-101 is preserved. However, the standard was extended to be used over the Internet protocol suite (TCP/IP), changing the transport, network, link and physical layer services.

2.2.1 How Communication Works

IEC-104 supports bidirectional communication but differentiates between control and monitor direction [13]. Transmissions from the control network to the field devices are sent in the control direction. Transmissions going in the other direction, from a field station to the control network, are sent in the monitor direction.

The IEC-104 protocol supports three different modes of operation: control/request, periodic and spontaneous [44, 32]. In control/request mode, the control network polls a field station for data transmissions. In periodic mode, data is sent at a predefined interval. A spontaneous transmission occurs whenever there is an update of the system’s state, for example when a switch is toggled or a measured value differs enough from a previously sent value.

The communication between the control network and each field device is initiated like any other connection using the Transmission Control Protocol (TCP). Involved nodes identify themselves using the Internet Protocol (IP) addressing system. IEC-104 has the designated port 2404. When a connection has been established, user data transfer is not automatically enabled. It has to be initiated by a controlling station sending a Start Data Transfer (STARTDT). Open connections may be periodically checked with a Test Frame (TESTFR), typically after a period of time where no data has been transmitted [44].

IEC-104 and TCP use acknowledgement numbers and sequence numbers in a similar fashion. Each byte in a TCP stream has a sequence number. In a sent TCP packet, the sequence number is the byte number of the first byte of data. The acknowledgement number is the sequence number of the next byte the sender expects to receive. The sequence numbers are used to check the validity of a packet and to know if buffered bytes have to be re-sent [47]. The initial sequence number is chosen at random to mitigate connection hijacking [47].

In IEC-104, the sender also holds sent packets in a buffer until they have been acknowledged, that is, until its own send sequence number is returned as a receive sequence number. All packets in the buffer with equal or lesser sequence numbers are then released. To avoid overflow, an acknowledgement is typically sent in response to a longer data transmission in one direction [44].
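The spontaneous mode described in this section can be illustrated with a small sketch. It assumes a simple absolute threshold (often called a deadband) for deciding that a measured value differs enough from the previously sent value. The class name, the method name and the threshold of 0.5 are hypothetical and are not taken from IEC-104 or RICS-el.

```python
class SpontaneousReporter:
    """Decides whether a new measurement warrants a spontaneous transmission.

    Hypothetical helper: a value is reported when it differs from the
    last *sent* value by at least `deadband` (an assumed, simple rule).
    """

    def __init__(self, deadband: float) -> None:
        self.deadband = deadband
        self.last_sent = None  # nothing has been transmitted yet

    def update(self, value: float) -> bool:
        """Return True if `value` should be transmitted spontaneously."""
        if self.last_sent is None or abs(value - self.last_sent) >= self.deadband:
            self.last_sent = value  # remember what the control network now knows
            return True
        return False


reporter = SpontaneousReporter(deadband=0.5)
readings = [10.0, 10.2, 10.4, 10.6, 10.7, 11.2]
decisions = [reporter.update(r) for r in readings]
```

Comparing against the last sent value, rather than the last measured value, prevents a slow drift from going unreported.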

2.2.2 The

The communicated packets are called Application Protocol Data Units (APDUs) and exist in three different formats: the S-, U- and I-format [44]:

• The Information transfer format, I-format, has an Application Service Data Unit (ASDU) used for sending detailed information such as measurements and commands.

• The S-format is used for numbered Supervisory functions. A typical use for this frame is to acknowledge received APDUs.

• The U-format, U as in Unnumbered control functions, is used for the activation and confirmation mechanisms STARTDT, Stop Data Transfer (STOPDT) and TESTFR.

An APDU always contains a header, the Application Protocol Control Information (APCI). The APCI is constrained to six bytes of data: a start byte, a byte specifying the length and four control fields. The control fields detail the format type and sequence numbers. As S- and U-format APDUs consist only of an APCI, both are of a fixed length. The I-format is different as it contains an ASDU of variable length, holding sensitive information about the system. An overview of the APDUs is shown in Figure 2.2 (based on Matoušek [44]).

Figure 2.2: APDU with fixed and variable length
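Because a U-format APDU consists only of the six-byte APCI, it can be written out in full. The following is a minimal sketch, assuming the U-format bit layout of the standard, in which the first control field carries the function bits and the remaining three control fields are zero. The dictionary and function names are hypothetical.

```python
START_BYTE = 0x68

# Value of the first control field for each U-format function (assumed
# from the standard's bit layout). The two lowest bits are always 1.
U_FUNCTIONS = {
    "STARTDT_ACT": 0x07,
    "STARTDT_CON": 0x0B,
    "STOPDT_ACT": 0x13,
    "STOPDT_CON": 0x23,
    "TESTFR_ACT": 0x43,
    "TESTFR_CON": 0x83,
}


def build_u_frame(function: str) -> bytes:
    """Build a fixed-length U-format APDU for the named function."""
    control_fields = bytes([U_FUNCTIONS[function], 0x00, 0x00, 0x00])
    # The length byte counts everything after itself: the four control fields.
    return bytes([START_BYTE, len(control_fields)]) + control_fields


startdt = build_u_frame("STARTDT_ACT")
testfr = build_u_frame("TESTFR_ACT")
```

A controlling station would send the STARTDT activation frame after the TCP connection on port 2404 has been established, and the controlled station would answer with the confirmation frame.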

The start byte of the APDU is 0x68, and the length field gives the length of the APDU excluding the start byte and the length byte. The four control fields that follow in the APCI are shown in Figure 2.3. The control fields of an I-format specify the message direction. The sequence numbers are initially set to zero and then incremented by one for each APDU and direction.

Figure 2.3: APCI control field formats
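The control field formats can be made concrete with a short parsing sketch. It assumes the layout of the standard: the lowest bit of the first control field is 0 for the I-format, the two lowest bits are 01 for the S-format and 11 for the U-format, and the sequence numbers are 15-bit values stored shifted left by one bit. The function name and the returned dictionary keys are hypothetical.

```python
def parse_apci(apdu: bytes) -> dict:
    """Classify an APDU and extract its sequence numbers, if any."""
    if len(apdu) < 6 or apdu[0] != 0x68:
        raise ValueError("not an IEC-104 APDU: bad start byte or too short")
    if apdu[1] != len(apdu) - 2:
        raise ValueError("length byte does not match the APDU size")

    cf1, cf2, cf3, cf4 = apdu[2:6]
    send_seq = None
    recv_seq = None
    if cf1 & 0x01 == 0:
        fmt = "I"
        send_seq = (cf1 >> 1) | (cf2 << 7)
        recv_seq = (cf3 >> 1) | (cf4 << 7)
    elif cf1 & 0x03 == 0x01:
        fmt = "S"  # carries only a receive sequence number
        recv_seq = (cf3 >> 1) | (cf4 << 7)
    else:
        fmt = "U"  # unnumbered: no sequence numbers
    return {"format": fmt, "send_seq": send_seq, "recv_seq": recv_seq}


# An I-format header followed by ten placeholder ASDU bytes.
i_frame = bytes([0x68, 0x0E, 0x04, 0x00, 0x02, 0x00]) + bytes(10)
s_frame = bytes.fromhex("680401000600")
u_frame = bytes.fromhex("680407000000")
```

Here `parse_apci(i_frame)` reports send sequence number 2 and receive sequence number 1, while the S-format frame reports receive sequence number 3.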

An ASDU can have 256 different type identifications, or functions, of which 67 are defined in the standard. The functions of relevance to this study are listed in Table 2.1.

Table 2.1: Common ASDU functions in RICS-el

Function     Description
M_SP_NA_1    Single point information
M_ME_NA_1    Normalized measured value
M_SP_TB_1    Single point information with time tag
M_DP_TB_1    Double point information with time tag
M_ME_TF_1    Measured short floating point value with time tag
C_DC_NA_1    Double command
C_DC_TA_1    Double command with time tag
C_SE_NC_1    Set-point command, short floating point value
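When working with raw captures, the functions in Table 2.1 appear as numeric type identifications. The sketch below assumes the numbers assigned in the IEC 60870-5 standard, which are not listed in this thesis, and relies on the naming convention that an M_ prefix marks the monitor direction and a C_ prefix the control direction. The helper name is hypothetical.

```python
# Numeric type identification -> function name (numbers assumed from
# the IEC 60870-5 standard, names as in Table 2.1).
TYPE_IDS = {
    1: "M_SP_NA_1",
    9: "M_ME_NA_1",
    30: "M_SP_TB_1",
    31: "M_DP_TB_1",
    36: "M_ME_TF_1",
    46: "C_DC_NA_1",
    50: "C_SE_NC_1",
    59: "C_DC_TA_1",
}


def direction(type_id: int) -> str:
    """Return 'monitor' or 'control' for a type identification in the table."""
    name = TYPE_IDS[type_id]
    return "monitor" if name.startswith("M_") else "control"


commands = sorted(t for t in TYPE_IDS if direction(t) == "control")
```

A lookup of this kind is a convenient first step when filtering recorded traffic for commands sent in the control direction.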


The structure of the fields found in an ASDU can be seen in Figure 2.4 (based on Matoušek [44]).

Figure 2.4: Information contained in an ASDU

• The type of the ASDU is specified in the first byte, Type identification, which will dictate what function the Information Object (IO) has.

• The Structure Qualifier (SQ) is used for specifying if the ASDU has a sequence of one or more IOs, or if it has a sequence of Information Elements (IEs) of one type. An ASDU can hold up to 127 objects or elements, specified in Number of objects.

• T is a test bit set to 1 during test conditions, signaling it should not change the system state.

• The P/N bit indicates whether a command is confirmed or not when sent in mirrored direction.

• The Cause of Transmission (CoT) hints at why the ASDU is sent.

• The Originator Address (ORG) is used to explicitly declare the controlling station.

• The ASDU address field is the address of the station to which all objects in the ASDU are associated.

Following the data unit identifier, there are one or more objects. Each field of such an object contains:


• The Information Object Address (IOA), referring to different addresses on the RTU that is being controlled.

• The IEs contain the transmitted information.

• There are 32 defined type identifications in the standard that have a field reserved for a time tag. IEC-104 has a built-in security mechanism for noting that the devices were not synchronized, the Invalid Time Flag (ITF), in a packet’s time tag [9]. This bit is set in the acquisition function if any inconsistencies are recognized.
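The fields listed above can be read from the raw bytes of an ASDU with a few bit operations. The following is a minimal sketch, assuming the field sizes used by IEC-104: a one-byte originator address, a two-byte ASDU address and a three-byte IOA, with multi-byte fields in little-endian order. The function name, the dictionary keys and the example bytes are hypothetical.

```python
def parse_asdu_header(asdu: bytes) -> dict:
    """Decode the data unit identifier and the first IOA of an ASDU."""
    if len(asdu) < 9:
        raise ValueError("ASDU too short for identifier and first IOA")
    return {
        "type_id": asdu[0],
        "sq": bool(asdu[1] & 0x80),           # Structure Qualifier
        "num_objects": asdu[1] & 0x7F,        # up to 127 objects or elements
        "test": bool(asdu[2] & 0x80),         # T bit
        "negative": bool(asdu[2] & 0x40),     # P/N bit
        "cot": asdu[2] & 0x3F,                # Cause of Transmission
        "org": asdu[3],                       # Originator Address
        "asdu_address": int.from_bytes(asdu[4:6], "little"),
        "first_ioa": int.from_bytes(asdu[6:9], "little"),
    }


# Made-up example: type 46 with one object, cause 6, station address 1,
# IOA 100, followed by a single information element byte.
example = bytes.fromhex("2e010600010064000002")
header = parse_asdu_header(example)
```

The information elements that follow the IOA depend on the type identification and are left undecoded in this sketch.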

2.3 SCADA Vulnerabilities

This section outlines vulnerabilities in a typical SCADA system architecture and vulnerabilities in the communication protocols.

2.3.1 Vulnerabilities in Architecture and Components

As devices in SCADA are expected to run for a long time, a lot of legacy software and hardware is in operation within SCADA systems. Old software and hardware often have vulnerabilities that are not present in newer versions, or lack the patches applied to avert known threats [43]. Leaving a machine running for a long time also accumulates fragmentation, leaving it vulnerable to buffer overflow [56].

Since SCADA systems were built with availability as a top priority, authentication and encryption were downplayed, as both practices hinder availability. When remote access was introduced to ease access to SCADA systems, the lack of authentication and encryption became an apparent issue, especially when proprietary solutions were exchanged for commercial off-the-shelf solutions. While new solutions made integration and maintenance easier, the domain knowledge needed to cause harm drastically decreased [56].

Disconnecting SCADA networks from the Internet, or restricting access by implementing a Virtual Private Network (VPN), is not enough to mitigate risks and secure the system [10]. For instance, operators cannot protect field devices from malicious commands sent from the control room. This type of insider attack constitutes the majority of attacks on SCADA systems. Of course, an operator might also make the wrong call or issue the wrong command accidentally. Moreover, anyone with access to a physical connection to the SCADA system could simply bypass the VPN or HMI.

2.3.2 Vulnerabilities in the Communication

As mentioned in section 2.3.1, there is no authentication or encryption built into SCADA systems. This is true for the communication protocol IEC-104 too. There are few security bits, no digital signatures and no checksum. Instead, IEC-104 depends on lower layers for data integrity [45].

However, the lower layer TCP/IP used by IEC-104 has plenty of publicly known vulnerabilities [50, 21]. Examples include:

• SYN flooding. A SYN packet is used to initiate communication in TCP. If the receiver of a SYN packet responds with an acknowledgement, the original sender confirms the connection, enabling an exchange of data. This is called a TCP handshake. A SYN flood is a continuous stream of SYN packets sent to a device. Without precautions, the device will respond to each SYN it receives, but the TCP handshake is never completed in the attack. This could leave the targeted device unresponsive to legitimate traffic as it is waiting for handshake confirmations [6].


• Spoofing of the Address Resolution Protocol (ARP spoofing). ARP spoofing is a technique that allows an adversary to intercept data frames in a Local Area Network (LAN) [39].

Vulnerabilities in SCADA traffic exist in both the control and the monitor direction. An adversary could, for instance, block the data transfer in the monitor direction using SYN flooding. This can have a big impact, as SCADA operations depend completely on the data received from the RTUs [45]. The vulnerabilities can be exploited in the control direction too. The adversary may use ARP spoofing to intercept data and modify its content. This is harmful since any command issued from the control network is treated as legitimate by the receiver.

2.4 SCADA Exploits

This section lists different types of attacks and examples of attacks on SCADA, from research or historical events.

2.4.1 Scanning

To find suitable targets within the system, an adversary depends on scanning the network. Stuxnet, a worm discovered in 2010, targeted nuclear facilities in Iran [20]. The worm scanned for specific machines to spread to in order to cause as much damage as possible.

SCADA networks contain devices that are infrequently updated, are proprietary or lack an interface to commonly used methods of scanning. As such, using scanning tools designed for traditional IT networks might cause unexpected issues [12].

2.4.2 Replay

A replay attack is when authentic data is delayed or repeated by an adversary. As SCADA systems do not prioritize integrity, there are few security mechanisms preventing unauthorized transmissions such as a replayed packet.

The adversaries behind Stuxnet used a replay technique to hide the manipulation of centrifuges. Data sent to a PLC from peripheral sensors was recorded for 21 seconds. Then, further updates from the sensors were dropped and the recorded data was sent in their place [20].
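The record-then-substitute idea can be sketched on a plain list of sensor readings. This is an illustrative simulation only, assuming readings arrive one per time step; the function name and parameters are hypothetical and the sketch is not a reconstruction of Stuxnet's code.

```python
from typing import Iterable, Iterator, List


def replay_filter(readings: Iterable[float], record_n: int,
                  attack_start: int) -> Iterator[float]:
    """Yield what the receiver sees.

    Before attack_start the genuine reading is forwarded and the last
    record_n readings are remembered. From attack_start onwards the genuine
    reading is dropped and the remembered readings are looped instead.
    """
    if record_n < 1:
        raise ValueError("record_n must be at least 1")
    recorded: List[float] = []
    for index, value in enumerate(readings):
        if index < attack_start or not recorded:
            recorded = (recorded + [value])[-record_n:]
            yield value
        else:
            yield recorded[(index - attack_start) % len(recorded)]


genuine = [1.0, 2.0, 3.0, 4.0, 50.0, 60.0, 70.0]
seen_by_receiver = list(replay_filter(genuine, record_n=2, attack_start=4))
```

The receiver keeps seeing values from the recorded window while the genuine process values drift away, which is why such an attack can mask a manipulation from an operator.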

2.4.3 Man-in-the-Middle

Man-in-the-Middle (MitM) attacks are attacks in which an adversary captures, and possibly modifies, data sent to or from an unknowing target. Decisions taken at the HMI are based on the data it receives from the RTU, and the RTU assumes that traffic from the HMI is legitimate. This makes the system very susceptible to attacks that focus on misinformation. An adversary can obtain direct or indirect control of the system by modifying transmitted information.

Maynard et al. [39] identified three general phases of any MitM attack, applicable to IEC-104 too:

1. Detection: During this phase, the adversary identifies targets enabling the coming phases.

2. Capture: Data, packets and payloads, are collected from the system.

3. Attack: The adversary uses the collected data to harm the system.

In their work, ARP spoofing was used to enable MitM attacks in systems using IEC-104. One tested attack altered the CoT of IEC-104 traffic. ARP spoofing has been shown to be identifiable by a NIDS [6, 53].
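To make the CoT alteration concrete, the following minimal sketch parses the ASDU header of an IEC-104 I-format frame and returns a copy with a different cause of transmission. It assumes the common IEC-104 layout (start octet 0x68, length octet, four control octets, then type identification, variable structure qualifier, a two-octet CoT field and a two-octet common address). The function names are illustrative and are not taken from the library of Maynard et al.

```python
import struct
from dataclasses import dataclass

START_BYTE = 0x68


@dataclass
class AsduHeader:
    type_id: int
    num_objects: int
    cause: int          # 6-bit cause of transmission
    negative: bool      # P/N bit
    test: bool          # T bit
    originator: int
    common_address: int


def parse_i_frame(apdu: bytes) -> AsduHeader:
    """Parse the ASDU header of an I-format APDU."""
    if len(apdu) < 12 or apdu[0] != START_BYTE:
        raise ValueError("not an IEC-104 APDU carrying an ASDU")
    if apdu[1] != len(apdu) - 2:
        raise ValueError("length octet does not match frame size")
    if apdu[2] & 0x01:
        raise ValueError("not an I-format frame")
    type_id, vsq, cot, originator, address = struct.unpack_from("<BBBBH", apdu, 6)
    return AsduHeader(type_id, vsq & 0x7F, cot & 0x3F,
                      bool(cot & 0x40), bool(cot & 0x80), originator, address)


def with_cause(apdu: bytes, new_cause: int) -> bytes:
    """Return a copy of the frame where only the 6-bit cause is replaced."""
    if not 0 <= new_cause <= 0x3F:
        raise ValueError("cause must fit in 6 bits")
    parse_i_frame(apdu)  # validates the frame before touching it
    frame = bytearray(apdu)
    frame[8] = (frame[8] & 0xC0) | new_cause
    return bytes(frame)


# Hand-built example: single-point information (type 1), one object,
# cause 3 (spontaneous), common address 1, object address 100, value 1.
original = bytes.fromhex("680e0000000001010300010064000001")
modified = with_cause(original, 20)  # 20 = interrogated by station interrogation
```

Because the APDU carries no checksum or signature of its own, the modified frame has the same length and remains well-formed; a single octet differs. Any integrity check would have to come from the layers below.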

2.4.4 Denial-of-Service

A Denial-of-Service (DoS) attack makes an asset unavailable to its intended user. If packets from the RTU are missing or delayed, the information on which the control of the system is based becomes unreliable. If packets from the HMI are delayed or do not reach the RTU, human control of the power grid is lost. A historical example is the malware Industroyer [17], which left a part of Kyiv without electricity for an hour. The malware was, among other things, capable of preventing connections to devices using a serial connection.

There are many ways to achieve DoS in SCADA systems, for example saturating the resources of the target, re-routing packets to an unintended destination or maliciously altering servers [56].

2.4.5 Sequence Attacks

Sequence attacks misplace packets in a communication stream in order to disrupt the SCADA process. Chromik [10] built and evaluated an IDS tailored for power distribution systems using IEC-104. One of the attacks used in the evaluation aimed to interfere with interlocks used in power distribution. An interlock requires a sequence of steps to be taken in the correct order to complete an action. Executing the sequence in the wrong order risks creating dangerous situations. This can be exploited by an adversary who deliberately changes the order of the sequence or bypasses a step.

One approach taken by Chromik to disrupt a sequence was to randomly drop packets. It was found that removing 0.1% of the transmitted packets increased the number of transitional anomalies observed by the evaluated IDS.
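How dropped packets turn into sequence violations can be sketched as follows. The step names are hypothetical placeholders for an interlock sequence, and the checker is a simplified illustration, not the IDS evaluated by Chromik.

```python
import random
from typing import List, Optional, Sequence


def first_violation(expected: Sequence[str],
                    observed: Sequence[str]) -> Optional[int]:
    """Return the index in `observed` where the sequence first deviates.

    None means `observed` is the complete expected sequence in order.
    If `observed` ends early, the index one past its end is returned.
    """
    position = 0
    for index, step in enumerate(observed):
        if position >= len(expected) or step != expected[position]:
            return index
        position += 1
    return None if position == len(expected) else len(observed)


def drop_randomly(packets: Sequence[str], rate: float,
                  rng: random.Random) -> List[str]:
    """Keep each packet with probability 1 - rate."""
    return [packet for packet in packets if rng.random() >= rate]


# Hypothetical interlock sequence, for illustration only.
INTERLOCK = ("open_breaker", "open_disconnector", "close_earthing_switch")

complete = first_violation(INTERLOCK, list(INTERLOCK))
skipped_step = first_violation(INTERLOCK, [INTERLOCK[0], INTERLOCK[2]])
```

With a small drop rate most sequences pass unchanged, but every dropped step surfaces as a deviation at the position where the next step arrives too early.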

2.4.6 Desynchronization

A time-synchronization attack targets synchronization mechanisms to bring the nodes of a system out of sync [48]. SCADA systems, having a low tolerance for delay, could be very susceptible to time-synchronization attacks. Baiocco and Wolthusen [8, 9] exploited the Network Time Protocol (NTP) to destabilize their SCADA testbed. During experimentation, they managed to deny the operator any control of the system and achieved causality reordering of events logged in the historian.

2.4.7 Packet Injection

By supplying fabricated data to a target in newly crafted packets, an adversary does not need to depend as much on monitored traffic in the network as with MitM or replay attacks. Gao et al. [55, 42] define an injection attack as an attack that introduces false data into a control system. They show that injection attacks have the potential to modify process setpoints, interrupt process control or communication, or modify device configurations. As each packet in both the control and the monitor direction is processed as authentic, SCADA is very susceptible to these kinds of attacks.

One instance of a packet injection attack was the Maroochy Shire Sewage Spill [43] in Australia in 2000. A disgruntled employee installed company software on their computer and took control of the waste management system, ultimately releasing millions of liters of sewage into surrounding waters.

2.5 RICS-el

RICS-el [5] is a virtual, experimental testbed, created with the goal of enabling repeatable experiments for security research within SCADA systems. This section describes RICS-el's design and compares RICS-el to a real-world SCADA system.

RICS-el was realised in a collaboration between RICS and the Swedish Defence Research Agency (FOI)¹. The ambition was to enable the creation of scenarios closely resembling a real-world utility. This included the generation of attacks, the implementation of defensive mechanisms and the generation of open datasets in long streams to be used in comparative research. The real-world utility emulated in RICS-el is the power grid.

2.5.1 Design

RICS-el consists of a network of Virtual Machines (VMs), connecting an office IT segment with a SCADA system. See Figure 2.5 (based on Almgren et al. [5]) for an overview.

1. The RTU Emulator represents the RTUs of a SCADA system. The power grid simulator contains lower-level field devices.

2. The control network is represented by the so-called Operation Technology (OT) LAN. The OT-LAN consists of an HMI, Active Directory, control server and a backup control server. The control network communicates with the RTU Emulator using IEC-104 in spontaneous mode.

3. The Demilitarized Zone (DMZ), called the OT DMZ, is used to separate the enterprise network from the control network.

4. The enterprise network has access to a simulated Internet and is separated from the OT side with firewalls.

Figure 2.5: Overview of RICS-el

¹ https://www.foi.se/

Everything in RICS-el is virtualized. The power grid is emulated, built using an operator training simulator module from a well-known SCADA company. Three emulated RTUs connect the SCADA front end to the power grid through a Wide Area Network (WAN). The WAN consists of 15 nodes forming a meshed network. The RTUs contain no control loops but serve as interpreters between the power system and the SCADA system. The sensors in RICS-el monitor electrical quantities and some actuator states. The actuators present are breakers and generators. The backlink carries traffic data from all RTUs within the power system that has not been converted to IEC-104.

Every host runs on a VM, which allows for high flexibility in configuration. Each machine in the RICS-el network has its network address(es), operating system and administrator password documented on an intranet, accessible after being granted an account by a system administrator.

To streamline the process of creating scenarios in RICS-el, a so-called ScenarioBot has been created. It allows a user to execute pre-programmed events, such as starting a recording of network traffic that is stored in Packet Capture (pcap) files. There is also a second bot called the OT-Bot, a more specialized version of the ScenarioBot. The OT-Bot issues scheduled commands from the HMI to an RTU. An overview of the relations between the bots and the components of RICS-el is shown in Figure 2.6.

Figure 2.6: Interactions between bots and SCADA in RICS-el

A user of RICS-el is able to schedule a scenario to run in the ScenarioBot. The user can load pre-configured actions into the OT-Bot and schedule when the actions are to be executed. In turn, the OT-Bot executes its actions in the HMI, controlling the power grid through commands sent to the RTU. The commands available in the OT-Bot are:

1. Open or close a breaker.

2. Change a generator setpoint.

At the start of a scenario, the ScenarioBot synchronizes two internal databases, unifying initial values and behavior of the power grid. When ready, the ScenarioBot starts to record network traffic in the firewall between the control network and the simulated power grid. When the scenario ends, the ScenarioBot will fetch the recording as a pcap file accessible to the user. The network is configured so that the ScenarioBot has access to the LANs of all other nodes, to ease the addition of future actions.
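A recording fetched as a pcap file can be inspected with a few lines of code. The sketch below is a minimal reader for the classic pcap file layout (24-byte file header, 16-byte per-packet record header); it assumes that format and does not handle pcapng. The function name is illustrative, and the demonstration builds a tiny capture in memory rather than using data from RICS-el.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# Magic bytes as they appear on disk -> (struct byte order, timestamp divisor)
_MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanoseconds
}


def read_pcap(stream: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp in seconds, captured bytes) for each packet record."""
    header = stream.read(24)
    if len(header) < 24 or header[:4] not in _MAGIC:
        raise ValueError("not a classic pcap file")
    order, divisor = _MAGIC[header[:4]]
    while True:
        record = stream.read(16)
        if len(record) < 16:
            return  # end of file (or a truncated trailing record)
        seconds, fraction, captured_len, _original_len = struct.unpack(
            order + "IIII", record)
        data = stream.read(captured_len)
        if len(data) < captured_len:
            return
        yield seconds + fraction / divisor, data


# In-memory demonstration capture with a single 3-byte packet.
demo = io.BytesIO(
    struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    + struct.pack("<IIII", 10, 500000, 3, 3) + b"abc"
)
packets = list(read_pcap(demo))
```

The timestamps returned here are what a labeling step would later compare against the logged start and stop times of an attack.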

2.5.2 Real World Comparison

There are three major differences between RICS-el and a real power grid:

1. In a real-world counterpart there is no backlink. Instead, data is measured by sensors within the power grid, sent through field devices to the control network. There are packets traveling that route in RICS-el too, but the WAN does not make the RTUs and SCADA front end truly geographically distant. The meshed network is in place to make it appear to be so.

2. The power grid in RICS-el does not rely on actual RTUs but generates RTU traffic itself. The emulated RTUs in RICS-el only convert this traffic to the IEC-104 standard. Furthermore, the emulated RTUs in RICS-el contain no registers, logic, control loops or programming.

3. The third and the second layer of the SCADA system in RICS-el are realized by duplicating a training simulator module.

The training simulator module offers less control than a real-world counterpart. However, it does behave as the real counterpart with generated warnings, alarms et cetera. For example, at the HMI the entire overview is available to the user but not everything can be interacted with. Generators can have their setpoint altered, and breakers connecting generators and stations to a power line can be opened or closed. The historian was not available.

The simulator module is dependent on a database to know about the state of the system. There is one database in connection to the power grid and one in the OT-LAN. Both need to have their data unified at system startup, a task assigned to the ScenarioBot.

There is no functioning synchronization mechanism, such as NTP, between the virtual hosts in RICS-el. Instead, each virtual host uses the internal clock of the host that the VM is running on.

3 Related Work

This chapter contains important related work on the topic of dataset generation and experimental attacks within SCADA systems. Section 3.1 covers methodologies of dataset generation and an overview of existing testbeds related to RICS-el. Previous attacks carried out in SCADA system testbeds are introduced in section 3.2.

3.1 Dataset Generation

Without access to SCADA systems in production, one common approach to creating datasets is to build a testbed mirroring a real-world SCADA system. Datasets are then generated when the system is operational and relevant data is recorded. Previous work with virtual testbeds is covered in section 3.1.1 and work in physical testbeds in section 3.1.2. Another method is to create the traffic synthetically, generating traffic in accordance with a set of rules or replicating traffic from previous captures. The approach using synthetic traffic is touched upon in section 3.1.3.

3.1.1 Using a Virtual Testbed

One of the most well-known datasets in anomaly detection research is the DARPA 99 dataset [34]. Seven weeks of traffic were recorded, including both normal operation and attacks. The DARPA 99 dataset does, however, fail one of the more challenging requirements of dataset generation: that background traffic and injected attacks must carry the same characteristics. If the packets differ too much from each other, it becomes easy to discover anomalies. For example, when Mahoney trained an anomaly detector on the DARPA 99 dataset, he found that the Time To Live (TTL) field in the IP header had to be set to zero to not make the attacks too easy to detect [36].

TASSCS by Mallouhi et al. [37] is a testbed that simulates an electrical grid, HMI, PLC, IDS and a network including a DMZ and an office segment. TASSCS is constructed primarily to perform anomaly-based IDS research. That said, TASSCS lacks remote access [11], which is needed to be a competitive candidate for dataset generation.

In 2018, Maynard et al. [40] took a minimalistic approach to building a SCADA testbed. They show that researchers have access to simple and cheap alternatives for experimental research and dataset generation. The testbed can be deployed on the researcher's own computer. It uses VMs to simulate each machine, and the SCADA network can be set up on virtual hosts in any way the user sees fit. The testbed is an open framework that currently supports IEC-104. An important aspect of the testbed is that it allows physical devices or networks to be connected. One dataset has been documented using this testbed, in a setup with five RTUs, one HMI and a historian. The small size of the testbed is also one of its flaws, as it has no support for a DMZ or remote devices.

In contrast to RICS-el, the mentioned testbeds lack software for scenario creation. This hinders both repeatability and the ability to log each action taken in the environment. The only prior datasets generated in RICS-el were recorded during normal operation. Lin and Nadjm-Tehrani [32, 33] used such datasets to analyze patterns in IEC-104 traffic.

3.1.2 Using a Physical Testbed

The DARPA 99 dataset [34] was not generated in a SCADA environment. In fact, when the development of RICS-el took off, SCADA testbeds and datasets were hard to gain access to [5]. Since DARPA, however, a number of testbeds have been built to deal with the scarcity of SCADA testbeds.

One example of a SCADA testbed is the National SCADA Test Bed (NSTB) [18]. NSTB is a United States national lab that utilizes a power grid test range with miles of electrical transmission lines and several substations. There are no published datasets for public use generated at NSTB. Another physical testbed is the European project CRUTIAL [16]. CRUTIAL includes a teleoperation and a microgrid testbed. Data gathered in CRUTIAL are statistics about the effect of attacks rather than the network data needed during NIDS development.

More recent testbeds that have been actively involved in dataset generation are SWaT, WADI and EPIC, created at Singapore University of Technology and Design (SUTD) [26]. The testbeds at SUTD are physically implemented to enhance realism. In general, what is gained in fidelity and reliability by using a physical testbed is lost in repeatability, scalability and cost [46]. For instance, RICS-el showcases its scalability by having a virtual office segment accessible.

SWaT [38] was built in 2015 and models a water treatment facility, enabling experimental research on SCADA systems. In 2016, WADI [54] was added as an extension to SWaT, forming a complete water treatment, storage and distribution network. One year later the testbed EPIC [1, 2] was introduced, supplying both SWaT and WADI with power. Datasets have been collected in SWaT during normal operation and attack scenarios [26]. The attacks have been conducted for tests and research purposes but also during events called showdowns [23, 6, 35]. During a showdown, teams compete to come up with effective and deceptive attacks, generating datasets with a diverse set of attacks.

All datasets from the three testbeds at SUTD contain network traffic. Some have sensor readings and data from the historian too. Whether a dataset is labeled or not varies; if attacks have been launched in a controlled setting, the dataset is generally labeled.

S. Adepu et al. [24] illustrated how to generate labeled datasets with attacks in SWaT. They noted that the labeling was quite straightforward. When action was taken to start or stop an attack, relevant information was stored. They logged the following:

• Start time of the attack.

• Stop time of the attack.

• Attack points, the target(s) of the attack.

• Start state, the target(s) initial state.

• Attack description.

• Attack value, the substituted value of a sensor.

• Attackers’ intent of the attack.

What is lacking from this description is whether any commands were issued from the control network and, if so, whether and how those commands were part of the labeling too. The bots in RICS-el make sure that events, such as issued commands, are repeatable between experiments and can be logged.
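The fields logged by Adepu et al. map naturally onto a small record type. The following is a hedged sketch of how such a label could be stored next to a dataset; the class, field names and example values are assumptions made for illustration and do not reproduce the format used in SWaT or RICS-el.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class AttackLabel:
    start_time: float            # seconds since the start of the recording
    stop_time: float
    attack_points: List[str]     # target(s) of the attack
    start_state: Dict[str, str]  # initial state of each target
    description: str
    attack_value: str            # substituted value, if any
    intent: str

    def covers(self, timestamp: float) -> bool:
        """True if the timestamp falls inside the attack interval."""
        return self.start_time <= timestamp <= self.stop_time

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example values.
label = AttackLabel(
    start_time=120.0,
    stop_time=180.0,
    attack_points=["sensor_1"],
    start_state={"sensor_1": "normal"},
    description="substitute a fixed sensor value",
    attack_value="0.0",
    intent="hide a change in the process",
)
restored = AttackLabel(**json.loads(label.to_json()))
```

A field listing the commands issued from the control network during the interval could be added in the same way, which would address the gap noted above.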

3.1.3 Using Synthetic Data

Synthetic data generation could remove the need for a testbed, greatly reducing the cost of dataset generation. Despite that, the generation of SCADA datasets containing synthetic data or synthetically injected attacks is, to the best of our knowledge, not a particularly explored topic. One example of the creation of synthetic data, not used to create datasets, is a SCADA traffic generator by Al-Dalky et al. [15]. The generator produced malicious traffic based on the rules of a NIDS. Consequently, it did not create complete attacks but rather a stream of individual packets not conforming to the rules of a certain NIDS. This generator failed to create realistic scenarios with attacks one can expect in real applications.

The synthetic generation of datasets in traditional IT is a more active research area. One example is the tool ID2T by Garcia C. et al. [14]. Garcia C. et al. defined requirements on both generated datasets and the tools that generate them. How these requirements are implemented in this study is described in section 6.1. ID2T takes input in the form of recorded background traffic and injects synthetic attacks into that traffic. ID2T also generates labels for what attacks are used and when. As ID2T is limited to the supplied background data, it can only approximate the effects of attacks that would change the state of the system, an issue that is less prominent when using a testbed. ID2T also has a problem with labeling the datasets if attacks are already present in the provided background traffic, whereas when using a testbed one can control whether attacks are present or not.

An issue with synthetic attacks is that the payload of a packet in legitimate traffic is incredibly hard to estimate [30]. The state of the system could heavily influence the information found in payloads. By carrying out attacks within a SCADA system, the dependency on provided background data and the need to estimate the effect of an attack on payloads are removed. Furthermore, the state of the system can be observed, making the correctness of generated attacks much easier to verify in comparison to synthetically created attacks.

Another difficulty with synthetic attacks is injecting traffic with realistic characteristics and timing [30]. Timing of attacks is a non-issue when attacks are generated in a testbed.

In conclusion, RICS-el has the potential to generate datasets without the difficulties and restrictions of synthetic generation.

3.1.4 This Study’s Approach to Dataset Generation

Being virtual, RICS-el is both cost-efficient and flexible in comparison to a physical counterpart. The fact that RICS-el is virtual and supports the generation of scenarios using bots also greatly improves the repeatability of events and experiments. This feature is important in order to generate consistent datasets while using different configurations in scenarios. Other benefits of generating datasets in RICS-el rather than a physical testbed include:

• VMs can be easily added or removed to suit the needs for a specific scenario by a system administrator.

• The system can be reset much quicker.

• There are no physical devices that could be harmed during attacks.

RICS-el implements a SCADA system communicating with the IEC-104 protocol, which only one other quoted source used (Maynard et al. [40]). Their testbed is built to be easily deployed, making it less expansive than RICS-el. For instance, RICS-el contains an office segment while the testbed of Maynard et al. does not. The biggest concern, however, is that their testbed is not accessible remotely and has no devices located at a distance from the control network. The testbed also lacks the means of creating scenarios. Hence, RICS-el is seen as better suited than the testbed by Maynard et al. when it comes to generating accessible datasets in repeatable scenarios.

The datasets generated in RICS-el will be labeled similarly to the straightforward approach of S. Adepu et al. [24]. When an action is taken by the ScenarioBot, such as starting or stopping an attack, the action is logged to be used when labeling datasets.
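One way to turn logged start and stop actions into packet labels is sketched below. The log format, a list of (timestamp, action, attack name) tuples, and the label "normal" are assumptions made for this illustration; the actual log format of the ScenarioBot is not specified here.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str, str]      # (timestamp, "start" | "stop", attack name)
Interval = Tuple[float, float, str]  # (start, stop, attack name)


def intervals_from_log(events: Iterable[Event]) -> List[Interval]:
    """Pair each logged start with the next stop of the same attack.

    A start without a matching stop is ignored in this sketch.
    """
    open_attacks: Dict[str, float] = {}
    intervals: List[Interval] = []
    for timestamp, action, attack in sorted(events):
        if action == "start":
            open_attacks.setdefault(attack, timestamp)
        elif action == "stop" and attack in open_attacks:
            intervals.append((open_attacks.pop(attack), timestamp, attack))
    return intervals


def label_for(timestamp: float, intervals: Iterable[Interval]) -> str:
    """Return the attack active at the timestamp, or "normal"."""
    for start, stop, attack in intervals:
        if start <= timestamp <= stop:
            return attack
    return "normal"


log = [(10.0, "start", "replay"), (20.0, "stop", "replay"),
       (30.0, "start", "dos")]
intervals = intervals_from_log(log)
labels = [label_for(t, intervals) for t in (5.0, 15.0, 25.0, 35.0)]
```

Labeling by time interval marks every packet captured during an attack, including unaffected background traffic; whether that granularity is sufficient depends on how the dataset is to be used.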

3.2 Attack Types and Attack Evaluation

This section offers an introduction to previously carried out attacks in SCADA systems and their results. The attacks in this study are generated in a testbed where no previous attacks have been documented. Attacks covered in this section have influenced the choice of attacks performed in this study and functioned as a basis for the evaluation of attacks.

3.2.1 Scanning

Maynard et al. [40] identified nodes using the IEC-104 protocol in a network using Nmap and a script developed by Timorin [52]. The script detected IEC-104 devices by sending TESTFR packets in the network, to which an IEC-104 node was expected to reply. On confirmation, a STARTDT packet was sent and the reply was checked for the common address. If none was given, the script broadcast an interrogation command. As known IEC-104 devices were found by the scan, it was evaluated as successful. That approach, however, failed to address how to find IEC-104 devices when the adversary does not know what network identity (network ID) the RTUs use.

This study uses the same approach to identify IEC-104 devices but complements the scanning with a few commonly used networking tools.
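The TESTFR and STARTDT packets mentioned above are IEC-104 U-format frames, which are six octets long. The sketch below builds and classifies such frames offline, assuming the standard control-field values; it sends nothing on the network and is not the script by Timorin.

```python
from typing import Optional

# First control octet of each U-format function.
U_FUNCTIONS = {
    0x07: "STARTDT act",
    0x0B: "STARTDT con",
    0x13: "STOPDT act",
    0x23: "STOPDT con",
    0x43: "TESTFR act",
    0x83: "TESTFR con",
}


def build_u_frame(control: int) -> bytes:
    """Return the 6-octet APDU for a U-format function."""
    if control not in U_FUNCTIONS:
        raise ValueError("unknown U-format control value")
    # Start octet 0x68, APDU length 4, then the four control octets.
    return bytes([0x68, 0x04, control, 0x00, 0x00, 0x00])


def classify_u_frame(apdu: bytes) -> Optional[str]:
    """Name the U-format function of a frame, or None if it is not one."""
    if len(apdu) != 6 or apdu[0] != 0x68 or apdu[1] != 0x04:
        return None
    return U_FUNCTIONS.get(apdu[2])


test_request = build_u_frame(0x43)                        # TESTFR act
reply_kind = classify_u_frame(bytes.fromhex("680483000000"))
```

A node that answers a TESTFR activation with a TESTFR confirmation is very likely speaking IEC-104, which is what makes this exchange useful for identification.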

3.2.2 Man-in-the-Middle

In a MitM attack that Maynard et al. [39] conducted in a system using IEC-104, certain packets had their CoT changed, together with the state of an RTU's Input/Output port. The tool Wireshark was used to analyze network traffic and verify the impact of the attack. Maynard et al. could confirm that the sent packet differed from the received one and that the modified packet was accepted as legitimate by the receiver.

Maynard et al. built a library to support further MitM attacks on the IEC-104 protocol. It provides the means of doing ARP spoofing with Ettercap, TCP network functions, code skeletons and building blocks for adding new attacks. This study uses this library to modify IEC-104 payloads in more ways than just altering the CoT.

3.2.3 Replay

In the study mentioned in section 3.2.2, Maynard et al. also conducted a replay attack. They captured all monitored packets in transmission between an RTU and HMI and filtered out non-IEC-104 traffic using Wireshark before replaying the traffic. The replayed packets were dropped by the receiver's TCP/IP stack, as the TCP sequence numbers of the captured packets were no longer valid. In other words, the replay attack was not effective. The replayed packets in this study use sequence numbers that the receiver accepts.

Using RICS-el, Lin and Nadjm-Tehrani [33] created a 20-second traffic sequence which was replayed with the original inter-arrival times. The resulting attack was shown to be detectable by their NIDS, especially under two conditions: low event rates in the original traffic, and longer periods of the attack being active. However, in contrast to this study, their replay attack was not done live in RICS-el but on previously generated pcap files.
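Why the receiver's TCP stack drops naively replayed segments can be shown with the receive-window test. The sketch below is a simplification that only checks the first sequence number of a segment against the window (the full acceptability test in the TCP specification also considers the segment length); function names and the example numbers are illustrative.

```python
from typing import List, Sequence

SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap around modulo 2^32


def in_receive_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2^32."""
    return (seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


def replay_outcome(captured_seqs: Sequence[int], rcv_nxt: int,
                   rcv_wnd: int) -> List[bool]:
    """For each replayed segment, report whether the receiver would accept it."""
    return [in_receive_window(seq, rcv_nxt, rcv_wnd) for seq in captured_seqs]


# Segments captured earlier in the connection ...
captured = [1000, 1100, 1200]
# ... replayed after the receiver has advanced well past them.
stale = replay_outcome(captured, rcv_nxt=5000, rcv_wnd=4096)
# The same payloads re-sequenced to continue from the receiver's position.
resequenced = replay_outcome([5000, 5100, 5200], rcv_nxt=5000, rcv_wnd=4096)
```

This is the reason a replay within an established connection has to rewrite sequence numbers to values the receiver currently expects.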

3.2.4 Denial-of-Service

The testbed used in the SUTD Security Showdown (S3) 2016 [6] used a protocol dependent on TCP/IP. This dependency was exploited by a team using SYN flooding. The SYN flood made the field device unreachable to anyone, as it was busy waiting for handshake acknowledgements. This made the HMI unable to acquire any further information from the field device, which was seen as verification of a successful attack.

Another attack during the S3 2016 used ARP spoofing to capture packets intended for the HMI. The packets were redirected to another destination and dropped. All the while, millions of packets were sent in batches to the field device. The attack was considered a success because the operator lost connection to the testbed's RTUs [3].

Kalluri et al. [27] conducted various DoS attacks in a SCADA network. To measure the impact of attacks, they observed the time it took for the HMI to receive a response from an RTU.

In this study, ARP spoofing in combination with dropping packets, and sending large quantities of packets to SCADA devices, are used as DoS techniques. Ping, which reports the Round-Trip Time (RTT) of Internet Control Message Protocol (ICMP) echo request packets, is used to evaluate attack impact.
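Evaluating impact through ping amounts to comparing RTT and loss before and during an attack. The sketch below extracts RTT values from text in the style printed by the Linux ping utility (reply lines containing "time=... ms") and summarizes them; the sample text and helper names are made up for illustration, and other ping implementations may format their output differently.

```python
import re
import statistics
from typing import Dict, List, Optional

# Matches the per-reply field, e.g. "time=0.512 ms" or "time<1 ms".
RTT_PATTERN = re.compile(r"time[=<]([0-9.]+) ?ms")


def rtts_from_ping(output: str) -> List[float]:
    """Return the RTT in milliseconds of every reply line found."""
    return [float(match.group(1)) for match in RTT_PATTERN.finditer(output)]


def summarize(rtts: List[float], sent: int) -> Dict[str, Optional[float]]:
    """Loss ratio plus mean and maximum RTT (None when nothing was received)."""
    return {
        "loss": 1 - len(rtts) / sent if sent else None,
        "mean_ms": statistics.mean(rtts) if rtts else None,
        "max_ms": max(rtts) if rtts else None,
    }


# Made-up sample: four requests sent, two replies received.
sample = """\
64 bytes from 192.0.2.20: icmp_seq=1 ttl=64 time=0.5 ms
64 bytes from 192.0.2.20: icmp_seq=4 ttl=64 time=1.5 ms
4 packets transmitted, 2 received, 50% packet loss, time 3004ms
"""
summary = summarize(rtts_from_ping(sample), sent=4)
```

Running the same summary on a baseline capture and on a capture taken during an attack gives two comparable sets of numbers for loss and delay.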

3.2.5 Desynchronization

Baiocco and Wolthusen [8] conducted two attacks on the NTP protocol to achieve desynchronization in their IEC-104 testbed. In the first attack, the NTP servers had their clocks altered, which propagated invalid time settings through the network. They noted that devices labeled or re-labeled packets with inaccurate time tags, or started to wait for packet source synchronization. If the time gap became too wide, there was a system re-synchronization that left logs in the historian ambiguous, making it hard to determine what request led to what response. Baiocco and Wolthusen concluded that this may sabotage control loops and make it more problematic to conduct auditing or intrusion detection.

In the second attack, Baiocco and Wolthusen [9] fed the RTUs fabricated NTP packets, desynchronizing them from the rest of the system. When a targeted RTU was out of sync by a certain threshold, about 30 seconds, no commands sent by the operator were executed, even though all commands were accepted and processed by the RTU. The ITF was not set in the RTU, as it was in sync with the NTP server from its perspective. However, the HMI noticed inconsistencies in the time tags, which caused it to check, to no avail, the status between the RTU and the NTP servers. The attack was seen as successful as it denied the operator control of the targeted RTU.

Affecting the SCADA process through NTP is not possible in RICS-el. So, while it is an interesting attack vector, it is not feasible to reproduce. Rather, an attack is created where the time tags of IEC-104 packets are modified before reaching their receiver, simulating system desynchronization.
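IEC-104 time tags are commonly carried in the seven-octet CP56Time2a format. The sketch below decodes such a tag, shifts it by an offset and encodes it again, as a way to see what a modified time tag looks like on the wire. It assumes two-digit years belong to the 2000s and, as a simplification, drops the invalid and summer-time flag bits when re-encoding. The helper names are illustrative and are not taken from the attack implementation of this study.

```python
from datetime import datetime, timedelta


def decode_cp56(raw: bytes) -> datetime:
    """Decode a 7-octet CP56Time2a time tag (flag bits are ignored)."""
    if len(raw) != 7:
        raise ValueError("CP56Time2a is exactly 7 octets")
    millis = raw[0] | (raw[1] << 8)   # milliseconds within the minute
    return datetime(
        2000 + (raw[6] & 0x7F),        # assumption: years 2000-2099
        raw[5] & 0x0F,                 # month
        raw[4] & 0x1F,                 # day of month
        raw[3] & 0x1F,                 # hour
        raw[2] & 0x3F,                 # minute
        millis // 1000,
        (millis % 1000) * 1000,
    )


def encode_cp56(moment: datetime) -> bytes:
    """Encode a datetime as CP56Time2a with all flag bits cleared."""
    millis = moment.second * 1000 + moment.microsecond // 1000
    return bytes([
        millis & 0xFF,
        millis >> 8,
        moment.minute,
        moment.hour,
        moment.day | (moment.isoweekday() << 5),  # day of week in bits 5-7
        moment.month,
        moment.year % 100,
    ])


def shift_time_tag(raw: bytes, offset_seconds: float) -> bytes:
    """Return the time tag moved by offset_seconds."""
    return encode_cp56(decode_cp56(raw) + timedelta(seconds=offset_seconds))


original_tag = encode_cp56(datetime(2021, 3, 15, 12, 30, 45, 500000))
shifted_tag = shift_time_tag(original_tag, 30)
```

A receiver comparing such a tag with its own clock would see the sender as roughly 30 seconds ahead, which matches the threshold reported by Baiocco and Wolthusen.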

3.2.6 Packet Injection

Gao et al. [55] offer two examples of how an adversary could perform an injection attack: either by injecting commands into the network, or by overwriting an RTU's programming or register settings. The latter is not possible in RICS-el, as the RTUs present are emulations. The work of Gao et al. revolved around the communication protocol Modbus. Modbus is a protocol without digital signatures or authentication, which makes it vulnerable to injection attacks, much like IEC-104. They describe a scenario in which the adversary uses ARP spoofing to introduce completely new, fabricated packets into the traffic.

An investigation by Morris and Gao [42] covers injection attacks of various degrees of complexity. In one of the attacks, Morris and Gao changed the setpoints for how much a water tank should contain, making the system regulate the amount of water to an unwanted level. With that observation, the attack was evaluated as a success. This attack is launched in this study, with a generator as the target, through the injection of fabricated packets into the established connection between the HMI and an RTU.

3.2.7 Conclusion of Attacks

The desynchronization attack cannot be implemented in RICS-el as described. Instead, a MitM attack will be used to make devices appear out of sync. Other attacks are executed as described, with small adjustments to fit the conditions given by RICS-el and the IEC-104 protocol.

4 Methodology of Dataset Generation

This chapter covers how datasets were generated in RICS-el, using the ScenarioBot and a new attack tool called the attack-bot. The chapter begins with section 4.1 presenting the attack-bot implementation and location in RICS-el. The attack-bot was created to generate attacks in repeatable experiments. The following section 4.2 explains the setup for dataset generation, covering relevant data flow and component interaction. This information is needed to fully comprehend the workflow for dataset generation, presented in section 4.3.

4.1 Attack-Bot Implementation

This section covers implementation details regarding the attack-bot, set up on a node in RICS-el called the MitM Machine. The location of the attack-bot was chosen so that MitM attacks were possible. The MitM Machine ran Ubuntu 16.04 and had the MitM library by Maynard et al. installed. The MitM Machine was on the same LAN as the HMI, the OT-LAN, see Figure 4.1.

Figure 4.1: Network configuration in the experiment setup

To provide a usable interface, the following parameters were configurable on the attack-bot:

• What attack the attack-bot should generate. If no attack was specified, a random attack was selected.

• IP addresses of attack targets. If not specified, the default addresses of one RTU and the HMI in RICS-el were used.

• Duration of the attack in seconds. If not specified the attack-bot defaulted to generate attacks until terminated.

The complete command-line interface of the attack-bot is presented in Appendix A. This is the interface used by the ScenarioBot to initiate the generation of any attack implemented in the attack-bot. To make the attack-bot portable and easy to deploy on another node or system it was created with Bash and Python3. The attacks were written in C, using the attack-code template in Appendix B.
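The three parameters and their defaults could be exposed along the following lines. This is a purely hypothetical sketch: the option names, attack names and addresses below are invented for illustration (the addresses are reserved documentation addresses), and the actual interface is the one given in Appendix A. The sketch only parses arguments and launches nothing.

```python
import argparse
import random
from typing import List, Optional

# Hypothetical attack names and placeholder addresses, not those of RICS-el.
ATTACKS = ("replay", "mitm", "dos", "injection")
DEFAULT_TARGETS = ["192.0.2.10", "192.0.2.20"]  # stand-ins for one RTU and the HMI


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="attack-bot-sketch")
    parser.add_argument("--attack", choices=ATTACKS, default=None,
                        help="attack to generate (random if omitted)")
    parser.add_argument("--targets", nargs=2, metavar=("RTU_IP", "HMI_IP"),
                        default=list(DEFAULT_TARGETS),
                        help="IP addresses of the attack targets")
    parser.add_argument("--duration", type=float, default=None,
                        help="seconds to run (until terminated if omitted)")
    return parser


def resolve(argv: Optional[List[str]] = None,
            rng: Optional[random.Random] = None) -> argparse.Namespace:
    """Parse the arguments and fill in the random-attack default."""
    args = build_parser().parse_args(argv)
    if args.attack is None:
        args.attack = (rng or random).choice(ATTACKS)
    return args


explicit = resolve(["--attack", "replay", "--duration", "60"])
defaults = resolve([], rng=random.Random(0))
```

Keeping the random choice behind an injectable random number generator makes a "random attack" scenario repeatable, which fits the goal of repeatable experiments.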

4.2 Experiment Setup

This section gives an overview of the interactions between nodes involved in the dataset generation. The experiments were conducted at the lower two layers of RICS-el's SCADA system, the so-called OT side, consisting of the OT-LAN, RTU and power grid. Nodes of interest were the power grid, ScenarioBot, HMI, RTU and a firewall between the HMI and RTU. Both the HMI and the power grid machine offered an overview in which breakers, generators and voltage levels of the power grid could be seen. Both machines were running Windows 10 and had Wireshark installed.


An addition made to RICS-el’s OT side was the attack-bot, tasked with generating attacks. The setup of the virtual environment is shown in Figure 4.2.

Figure 4.2: The attack-bot in RICS-el’s dataflow

Orange arrows represent normal IEC-104 traffic flow in monitor and control direction. However, the attack-bot was able to re-route traffic to and from the HMI through ARP spoofing. This made traffic follow the red arrows between the HMI and firewall instead. Yellow arrows show which components the bots in RICS-el could control through a set of actions using Secure Shell (SSH). These arrows are one-directional as no component generates a response. The blue arrows depict communication that did not use IEC-104 and in which no attacks or bots were involved.

During experimentation, the ScenarioBot was used to create scenarios. In a scenario, an event could be that the attack-bot started generating attacks or that the OT-Bot issued a control command. Each scenario was tested at least once in RICS-el, running for 30 minutes with attacks active for large portions of the scenario.


4.3 Dataset Generation Workflow

This section covers the workflow of the dataset generation, an iterative process of six steps. Each step is detailed in its respective section. The workflow of dataset creation is shown in Figure 4.3.

Figure 4.3: The dataset generation workflow

All scenarios were arranged in a queue for processing. During script preparation, the scripts of the ScenarioBot and attack-bot were updated to suit the first scenario in the queue. The update included the following actions:

• Attacks were added to or adjusted in the attack-bot.

• New actions were added to the ScenarioBot where needed.

When an attack was ready to be tested, the ScenarioBot and OT-bot were used for system configuration and scheduling of events as follows:

• The ScenarioBot readied the system for a scenario.

• Events were scheduled in the ScenarioBot.

• IEC-104 commands were scheduled in the OT-Bot.

The ScenarioBot was used to launch the scheduled events, creating a scenario. One such event could be to start the attack-bot, which would then start generating one or more attacks.

When a scenario finished, data was collected, forming a dataset. The dataset was then processed before being evaluated. After the evaluation step, a new iteration of the workflow was executed with the next scenario in the queue.

4.3.1 Script Preparation

In the first iteration of each attack scenario, an attack script, executable by the attack-bot, was created to suit the attack scenario. To add a new attack to the attack-bot, a script detailing the function call to launch the attack had to be added to the attack-bot’s source code. When the attack-bot had a new attack implemented, an action was added to the ScenarioBot to launch the attack. This was done by specifying what argument the ScenarioBot needed to launch the attack-bot with to start the new attack.

If the data evaluation step of the dataset generation workflow showed that the attack needed to be altered, that was done during this step too. Examples of alterations included updated attack timings and altered payload modification values.

A configuration file stated which IP addresses the attack-bot should target. If targets changed between attack scenarios, this file needed to be updated.

4.3.2 System Configuration

The system configuration was an action scheduled at the start of each scenario. The action was a reset command, issued by the ScenarioBot, to stabilize the system. The reset command, system state reset, reverted the system to normal operation after an attack scenario, readying the system for additional experiments. The system state reset included the following steps:

• Shutting down the attack-bot’s attack generation.

• Setting process values in the internal databases to pre-defined values. This was to restore the power grid to a given, operational state.

• Unifying and synchronizing the internal databases.

• Enabling incoming connection requests on involved actors (RTU, HMI).

• Resetting communication between RTU and HMI. This was caused indirectly when the power grid was restored.

The reset temporarily stopped the IEC-104 traffic. When the IEC-104 traffic resumed, the system was stable and ready for use. According to observations made during resets, it took about one minute for RICS-el to become stable again.

4.3.3 Running an Attack-Scenario

Actions during a scenario were initiated remotely from the ScenarioBot. The ScenarioBot launched actions by establishing SSH connections to the different nodes where actions needed to be executed. When connected to a node, scripts were started using a command-line interface, triggering the actions. The ScenarioBot used the following actions in each scenario:

1. System state reset: RICS-el became stable and ready for a new scenario.

2. Start recording: This action started a recording of all network traffic monitored in the firewall.

3. Launch the OT-bot: Through this action, the ScenarioBot told the OT-Bot to load and execute a set of commands, such as opening or closing breakers or changing the generator setpoints.

4. Start an attack: Starting an attack made the attack-bot launch an attack from its location in the OT-LAN. The attack to launch could be specified before the scenario started.

5. Stop attacking: This action halted all attacks that the attack-bot was generating.


6. Stop recording: This action canceled any active network recording and downloaded the created pcap file to the ScenarioBot’s location. No network traffic was recorded between attacks, as no IEC-104 traffic was generated during a reset.

This sequence of actions was used back to back to run several attack scenarios in one go, see Figure 4.4. Between each cycle, each action was configured to suit the next attack scenario.

Figure 4.4: Running scheduled attack scenarios

The recorded datasets were available after the completion of a scenario when the recording was stopped. The user orchestrating the scenarios could use Wireshark at the HMI to supervise the network traffic in real-time.
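The fixed order of the six actions, and the way they were chained for several attack scenarios, can be sketched in Python as follows. The action names mirror the list above, while the `execute` callable is a hypothetical stand-in for the SSH connection and remote script start, which are not reproduced here.

```python
from datetime import datetime

# Action names follow the six steps listed in this section; the commands a
# real ScenarioBot would send over SSH are not reproduced here.
SCENARIO_ACTIONS = [
    "system_state_reset",
    "start_recording",
    "launch_ot_bot",
    "start_attack",
    "stop_attack",
    "stop_recording",
]


def run_scenario(attack, execute, clock=datetime.now):
    """Run the six actions in order and return (timestamp, action) log rows.

    `execute` is a caller-supplied callable (hypothetical) that performs the
    action on the remote node, for example through an SSH session.
    """
    log = []
    for action in SCENARIO_ACTIONS:
        execute(action, attack)
        log.append((clock().strftime("%Y-%m-%d %H:%M:%S"), action))
    return log


def run_queue(attacks, execute, clock=datetime.now):
    """Run several attack scenarios back to back."""
    log = []
    for attack in attacks:
        log.extend(run_scenario(attack, execute, clock))
    return log
```

Passing the remote execution in as a parameter keeps the ordering logic separate from the transport, so the sequence can be exercised without a testbed.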

4.3.4 Data Collection

All network traffic was recorded at the firewall between the RTU and HMI using tcpdump initiated from the ScenarioBot. Tcpdump records packets in their raw format with all headers and layers intact together with a timestamp. When the action to stop recording was issued, the network traffic was fetched and stored in a pcap file on the ScenarioBot’s disk.

The ScenarioBot had its every action logged in a Comma Separated Values (CSV) file. The logging included a timestamp and the action taken. This meant that integrating the attack-bot with the ScenarioBot, enabling scheduling of attacks, created labels stating at what times actions were taken. Hence, each dataset acquired labels per flow, detailing for instance at what times RICS-el was under attack. An example of a log entry could look like:

2020-06-22 14:38:51,Command sudo python3 /root/Attack-Bot/attack-bot.py -a replay -s sent to 171.173.192.66

Together, the pcap file with the network traffic and the CSV file with the labels formed a dataset. However, more data was collected before the data processing step:

• The IEC-104 commands issued by the OT-Bot were noted together with the command timings.

• The attack-bot produced logs, which were collected, containing the following information:

– Active attacks and which phase an attack was in.

– Captured and forwarded packets, detailing the information found in the packets’ ASDUs.


The ScenarioBot did not access this information as the attack-logging was mostly thought of as a debugging tool.

Network traffic was recorded at the firewall and the attack-bot was active in the OT-LAN. Consequently, datasets with attacks that alter traffic going from the RTUs to the HMI do not contain packets with altered payloads.
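How the timestamped action log can act as labels for the recorded traffic is sketched below. The two-column layout follows the log example in this section, but the action strings `start_attack <name>` and `stop_attack` are simplified, hypothetical stand-ins for the real ScenarioBot entries.

```python
import csv
import io
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def attack_intervals(csv_text):
    """Return (start, end, attack) tuples from a two-column action log.

    Assumes rows of the form "<timestamp>,start_attack <name>" and
    "<timestamp>,stop_attack"; these action strings are hypothetical.
    """
    intervals, current = [], None
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        when = datetime.strptime(row[0], TIME_FORMAT)
        action = row[1]
        if action.startswith("start_attack "):
            current = (when, action.split(" ", 1)[1])
        elif action == "stop_attack" and current is not None:
            intervals.append((current[0], when, current[1]))
            current = None
    return intervals


def label_packet(packet_time, intervals):
    """Return the attack active at packet_time, or "normal"."""
    for start, end, attack in intervals:
        if start <= packet_time <= end:
            return attack
    return "normal"
```

With this approach every packet timestamp in the pcap file can be mapped to either an attack name or normal operation, which is the per-interval labeling the text describes.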

4.3.5 Data Processing

The data processing included the following steps:

1. The total number of network packets in the pcap file was counted using Wireshark.

2. The number of IEC-104 packets in the pcap file seen as legitimate by Wireshark was counted.

3. Attack-bot logs were analyzed to check if there were any run-time errors.

4. The attack-bot logs were also parsed for attack-specific data, such as the number of packets recorded in a replay attack.

5. The overviews in the HMI and power grid were compared to check for attack impacts or distortions.

6. Network traffic characteristics were examined in the collected pcap file.

7. A check was done to see whether dataset labels were present and accurate, especially attack intervals and attack type.

4.3.6 Iterative Dataset Evaluation

Each generated dataset was evaluated based on the outcome of the data processing. This was done to determine the appropriate actions in the next cycle of the workflow, depicted in Figure 4.3. The flowchart in Figure 4.5 shows how the evaluation was performed.

Evaluation of the effect of an attack on RICS-el was based on the intent of the attack. For example, if the intent was to replay IEC-104 packets, the answer was "YES" if IEC-104 packets were replayed and accepted. If the intent was to flood the network with traffic but there was no substantial increase in network load, the answer was "NO".

To evaluate whether an issue had been resolved, another iteration of the workflow was required. The flowchart allows an unlimited number of iterations from start to the decision box "Was the issue able to be resolved?". However, the process stops with a "NO" answer after two iterations if it ends up in that decision box twice in a row.


Figure 4.5: Flowchart of iterative dataset evaluation

5 Attack Generation in RICS-el

An investigation of attack vectors through a description of the attacking agent and the system under attack is found in section 5.1. To offer a diverse set of attacks, twelve attacks of six different types were defined and carried out in RICS-el. The attacks are described in section 5.2.

5.1 Attack Model

The purpose of the attack model was twofold: to identify targets in RICS-el and to analyze how to attack them to threaten the safe and functioning process of the SCADA system. This section describes RICS-el and the adversary who was attacking RICS-el.

5.1.1 The Adversary

In this study, the adversary, or attacker, was a disgruntled former employee with extensive IEC-104 knowledge. The outer defenses of RICS-el were circumvented and a node was connected to the OT-LAN, see Figure 4.1. The adversary intended to disrupt the SCADA process. The adversary had a finite set of resources, with the node connected to the OT-LAN as the only available platform. From this station, the adversary had access to the communication channels of the OT-LAN and the means to read and write data on them.

5.1.2 The System, RICS-el

Cyber-physical attacks aim to harm physical devices, such as sensors. RICS-el, being a virtual environment, lacked such assets. RTUs had no logic to target, the historian was unavailable and there was no synchronization protocol in use. Therefore, a delimitation was set to focus the attacks on the IEC-104 protocol, only occurring on the OT side of RICS-el. More specifically, only between two RTUs and the HMI. The environment in which the adversary acted is seen in Figure 4.2, chapter 4.

During a scenario, IEC-104 traffic in both control and monitor directions was transmitted through a firewall between HMI and RTU. This traffic is represented with the orange arrows in Figure 4.2. In all attacks, except those launched in scenarios 2 and 11, the traffic between the HMI and firewall was re-directed through the MitM Machine. The attack traffic flow is represented using red arrows in Figure 4.2; the traffic between the RTU and firewall was unaltered. In scenarios 2 and 11 normal IEC-104 data flow was not tampered with. Instead, the attack-bot communicated directly with the RTU.

Out of the two emulated RTUs, one RTU was selected as a target for all attacks. That RTU was connected to three generators, nine breakers and several sensors which monitored the power grid. With the HMI chosen as a second target, the following goals were identified as achievable and interesting to an adversary:

• Deny communication between targets.

• Make the HMI and/or RTU receive false information.

• Take control over the actions the RTU should execute.

The attack types were set to reconnaissance, DoS, MitM, replay and packet injection attacks. These covered the above goals and created a diverse set of attacks. The interactions available to the adversary to create the attacks were:

• Using any command-line interface actions that the MitM Machine had pre-configured.

• Capture and modify, store and/or drop packets sent to or from the HMI.

• Inject packets to the network.

5.2 Attack Scenario Implementation

This section presents the implementations of attacks in the experiments. Each attack, with the exception of scenario 1, was integrated in the attack-bot. The attack-bot was used to generate each attack in a scenario, an attack scenario, resulting in a dataset. See Table 5.1 for an overview of all attack scenarios. In the attack model, the adversary knew the IP addresses of the HMI, RTUs and the firewall, found through the execution of scenario 1 in section 5.2.1.

Table 5.1: Overview of implemented attack-scenarios

Scenario # | Attack description | Attack type
1 | The adversary searches for targets and uses ARP spoofing at the identified targets | Scanning
2 | The adversary drops all IEC-104 packets between two targets | DoS
3 | The adversary targets an RTU with SYN flooding | DoS
4 | The adversary uses network traffic flooding, targeting the HMI and RTU | DoS
5 | The adversary drops IEC-104 packets at random | Sequence
6 | The adversary alters the sequence numbers of IEC-104 packets | MitM
7 | The adversary alters sensor values in IEC-104 packets in the monitor direction | MitM
8 | The adversary alters time tags of IEC-104 packets | MitM
9 | The adversary alters IOAs of IEC-104 packets | MitM
10 | The adversary replays IEC-104 packets in control direction | Replay
11 | The adversary injects a setpoint command into the communication stream | Injection
12 | The adversary masquerades as a second HMI | Injection


5.2.1 Scenario 1: Adversary Gets Established (Scanning into MitM)

This initial scenario was used to identify targets in RICS-el. Specifically, the adversary wanted to identify an RTU, the HMI and a network node between the two to be able to install themselves as a MitM. To achieve this, both active scanning and passive monitoring techniques were used. This attack was not implemented in the attack-bot; the adversary conducted the steps manually.

First, the adversary connected to the OT-LAN and identified other hosts on the network. Then, the adversary connected to the other hosts and monitored the traffic for signs of IEC-104 traffic. When found, the adversary had to figure out which host the IEC-104 traffic was monitored on before establishing the MitM Machine.

To be able to capture IEC-104 packets, the adversary needed to find hosts that were involved in IEC-104 transmissions. The following approach was used to find hosts that communicate using IEC-104 on the network:

1. The adversary identified the MitM Machine’s own IP address, then used the first three bytes of the IP address to identify the network ID. Command: ifconfig

2. With the address and network ID noted, the adversary searched for other hosts in the LAN. Command: arp -a or nmap -sn .*

3. When other hosts were found, the adversary connected to one of them. Command: ssh

4. When connected to another host, the adversary captured network traffic and identified IEC-104 traffic by searching for port 2404. Command: tcpdump | grep 2404*

5. If no IEC-104 traffic was observed by the adversary, step 3 was tried again with a different IP address collected in step 2.

After capturing a packet from or destined to port 2404, the IP addresses of the involved sender and receiver were extracted. As these would be the RTU and HMI, both were now identified. The adversary then compared the IP address of the HMI with that of the node currently connected to. If the IP addresses were not equal, the adversary concluded that the host currently connected to was in between the HMI and RTU. In the OT-LAN, this would mean that the adversary knew the address of the firewall too. Otherwise, the adversary ran traceroute and looked for the path a packet took to the RTU, knowing that the address that shared network ID with themselves was the firewall.

With the HMI and the firewall between the WAN and OT-LAN identified, the adversary enabled the interception of TCP transmissions through ARP spoofing: ettercap -T -M arp:remote - P // //. This established the adversary as a MitM, allowing the adversary to monitor all packets transmitted between the HMI and RTU. The adversary identified IEC-104 packets by checking for the IEC-104 start byte in the packet payload.

Finding additional nodes using IEC-104 was done by broadcasting TESTFRs, using the approach described in section 3.2.1. In the example below there is one RTU found using the steps above and two IEC-104 devices with unknown IP addresses, see Table 5.2.

Table 5.2: IP addresses of IEC-104 devices

Device | IP address (made up)
RTU | 12.3.245.56
Unknown 1 | 12.4.245.56
Unknown 2 | 12.10.245.56


By performing these steps the adversary revealed the two previously unknown IP addresses:

1. The adversary scanned for IEC-104 devices. Command: nmap 12.*.245.56 -p 2404

2. Identified targets from step 1 were included in a port scan launched by the adversary. Command: nmap 12.3-10.245.56 -A

3. Finally, the adversary detected IEC-104 devices in a probing scan, actively looking for devices responding to IEC-104 communication. Command: nmap -Pn -n -d --script iec-identify.nse --script-args='iecidentify.timeout=500' -p 2404 12.3-10.245.56
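The start-byte check that the adversary used to recognise IEC-104 packets in a TCP payload can be sketched as below. The sketch relies on the APCI layout of IEC 60870-5-104 (start byte 0x68, a length octet and four control octets); the function name is illustrative and not taken from the attack-bot.

```python
START_BYTE = 0x68  # first octet of every IEC 60870-5-104 APDU


def classify_apci(payload):
    """Return "I", "S" or "U" for an IEC-104 APDU, or None if not IEC-104."""
    # An APDU has at least the start byte, a length octet and 4 control octets.
    if len(payload) < 6 or payload[0] != START_BYTE:
        return None
    # The length octet counts the octets after it: 4 (APCI only) up to 253.
    if payload[1] < 4 or payload[1] > 253:
        return None
    control = payload[2]
    if (control & 0x01) == 0:
        return "I"  # information transfer, carries an ASDU
    if (control & 0x03) == 0x01:
        return "S"  # supervisory, acknowledges received I-frames
    return "U"      # unnumbered control (STARTDT, STOPDT, TESTFR)
```

For instance, the TESTFR activation frame `68 04 43 00 00 00` is classified as U-format, while a payload that does not begin with 0x68 is rejected.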

5.2.2 Scenario 2: IEC-104 Packets Dropped

In this scenario, the adversary wanted to deny the operator all services of the RTU. This was achieved by dropping all IEC-104 packets routed through the firewall.

5.2.3 Scenario 3: Network Flooding (SYN Flood)

In another attempt to deny access to the RTU, a SYN flood was launched targeting all ports ranging from zero to 4096 on the RTU. The ports targeted were selected to cover a wide range while still having a chance at interfering with the IEC-104 traffic sent on port 2404. The adversary in this scenario did not use ARP spoofing but instead sent SYN packets to the RTU from the MitM Machine.

5.2.4 Scenario 4: Network Flooding (IEC-104 Flood)

The goal of this attack was to prevent legitimate traffic between the RTU and HMI from being transmitted promptly. For a period of 20 seconds, each identified IEC-104 packet was stored together with its TCP header. When the recording stopped, the packets were replayed to the source as quickly as the MitM Machine would allow. Packets were stored over a 20 second period to ensure that packets intended for both the HMI and the RTU were captured, making both of them targets. That said, the HMI was expected to be flooded to a much higher extent, since for every packet sent to the RTU in RICS-el one could expect around eight to be sent to the HMI.

5.2.5 Scenario 5: Unreliable Connection (Sequence Attack)

The goal of the attack was to introduce a risk that crucial data could be lost at a devastating moment, for example by dropping a packet containing necessary data, such as one in the middle of a sequence. After a packet was captured through the use of ARP spoofing, the adversary identified if it was an IEC-104 transmission. If it was, the packet was not forwarded from the MitM Machine to the intended recipient of the packet.

5.2.6 Scenario 6: Altered Payload (Sequence Numbers)

S-format packets sent to the RTU had their sequence number in the APCI lowered. The goal was to make packets sent from the RTU appear unacknowledged by the HMI. This way, the adversary hoped to prevent the RTU from clearing its buffer and cause it to resend old packets.

The offset of the sequence numbers started at one and was incremented by one for every 16th APCI captured. Using this slow approach to offsetting sequence numbers, the adversary hoped that the attack would be less apparent to operators and control loops. If the sequence number was too small to subtract the offset from, then the sequence number was halved instead to avoid underflow.
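The offset rule described for this scenario can be written out as a small Python sketch. It assumes that the offset grows immediately after every 16th captured APCI has been processed; the class and method names are illustrative and the attack-bot's C implementation may differ in such details.

```python
class SequenceOffsetter:
    """Lower sequence numbers with a slowly growing offset."""

    def __init__(self, step_every=16):
        self.step_every = step_every  # offset grows after this many APCIs
        self.offset = 1               # the offset started at one
        self.captured = 0             # number of APCIs captured so far

    def lower(self, seq):
        """Return the altered sequence number for one captured APCI."""
        self.captured += 1
        if seq >= self.offset:
            altered = seq - self.offset
        else:
            altered = seq // 2  # too small to subtract from: halve instead
        if self.captured % self.step_every == 0:
            self.offset += 1
        return altered
```

With a constant input of 100, the first sixteen calls return 99 and the seventeenth returns 98, which shows how slowly the acknowledged sequence number drifts away from the true one.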

5.2.7 Scenario 7: Altered Payload (Sensor Values)

The goal was to target information sent from the RTU to HMI and modify it so that the HMI was fed with misinformation. If the adversary captured an I-format packet then the function of the packet was checked, looking at the type identification in the ASDU. Packets of the type M_ME_NA_1, M_SP_NA_1 and M_DP_TB_1, which were prevalent in RICS-el, had their IE altered; the normalized value was set to decimal zero before letting the transmission through to the HMI.
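As an illustration of the payload modification, the following Python sketch zeroes the normalized value in an M_ME_NA_1 ASDU. It covers only that one type, and it assumes the usual IEC-104 field sizes (two-octet cause of transmission, two-octet common address, three-octet IOA) and an ASDU with SQ = 0; the function name is hypothetical.

```python
M_ME_NA_1 = 9  # type identification: measured value, normalized value


def zero_normalized_values(asdu):
    """Return a copy of an M_ME_NA_1 ASDU with every normalized value set to 0.

    Assumes SQ = 0, so each information object carries its own IOA.
    ASDUs of other types, or with SQ = 1, are returned unchanged.
    """
    if len(asdu) < 6 or asdu[0] != M_ME_NA_1 or asdu[1] & 0x80:
        return bytes(asdu)
    count = asdu[1] & 0x7F  # number of information objects
    out = bytearray(asdu)
    pos = 6  # type id (1) + VSQ (1) + COT (2) + common address (2)
    for _ in range(count):
        if pos + 6 > len(out):
            break  # truncated ASDU: leave the remainder untouched
        out[pos + 3:pos + 5] = b"\x00\x00"  # NVA follows the 3-octet IOA
        pos += 6  # IOA (3) + NVA (2) + QDS (1)
    return bytes(out)
```

The quality descriptor is left as it was, so the altered measurement still appears valid to the receiver.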

5.2.8 Scenario 8: Altered Payload (Time Tag)

In this scenario, the adversary aimed to desynchronize the RTU. As NTP was not used in RICS-el, the adversary captured IEC-104 packets and modified the time tags. The adversary chose to select packets of the types M_DP_TB_1, C_SE_TC_1, C_DC_TA_1 and C_CS_NA_1.

The time tag of the selected packets was modified by a different amount in each of two separate tests. In the first test, the time tags were altered so that they appeared to be two hours early. In the second test, the time tags were set to be ten seconds older than they originally were. The modifications were done to give the illusion that the RTU’s internal clock was either desynchronized or that the packet had arrived late.
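A time tag shift of this kind can be sketched in Python on the seven-octet CP56Time2a format used by time-tagged IEC-104 types. The sketch makes several simplifying assumptions: years are taken as 2000 plus the two-digit year, the invalid and summer-time flags are cleared on re-encoding, and both tests are modelled as subtracting a time delta.

```python
from datetime import datetime, timedelta


def decode_cp56(tag):
    """Decode a 7-octet CP56Time2a time tag into a datetime (flags ignored)."""
    ms = tag[0] | (tag[1] << 8)  # milliseconds within the minute, 0..59999
    return datetime(2000 + (tag[6] & 0x7F), tag[5] & 0x0F, tag[4] & 0x1F,
                    tag[3] & 0x1F, tag[2] & 0x3F, ms // 1000,
                    (ms % 1000) * 1000)


def encode_cp56(moment):
    """Encode a datetime as CP56Time2a with the IV and SU flags cleared."""
    ms = moment.second * 1000 + moment.microsecond // 1000
    day = moment.day | (moment.isoweekday() << 5)  # day of month + weekday
    return bytes([ms & 0xFF, ms >> 8, moment.minute, moment.hour, day,
                  moment.month, moment.year % 100])


def shift_time_tag(tag, delta):
    """Return the time tag moved by `delta` (negative values make it older)."""
    return encode_cp56(decode_cp56(tag) + delta)
```

Under these assumptions, `shift_time_tag(tag, timedelta(hours=-2))` corresponds to the first test and `shift_time_tag(tag, timedelta(seconds=-10))` to the second.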

5.2.9 Scenario 9: Altered Payload (Breaker IOAs)

The adversary monitored each packet between the HMI and RTU, waiting for a command used to open or close a breaker. When such a command was issued, the full sequence of transmissions was:

• The HMI sent a packet of type C_DC_NA_1, asking for the breaker to be selected.

• The RTU responded with an acknowledgement with packet type C_DC_TA_1.

• The HMI generated another C_DC_NA_1, asking the breaker to execute the command.

• The RTU responded with another C_DC_TA_1.

• The RTU sent a sensor reading in packet type M_DP_TB_1, containing an update of the breaker’s state.

The adversary wanted to redirect the command to another breaker and hide that activity from the operator by modifying the object addresses, the IOAs. The adversary stored the IOAs of discovered breakers and their corresponding sensors in pairs. When the adversary had at least two pairs of IOAs, captured packets of mentioned types had their IOAs swapped with the IOAs of another pair.
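The pairing and swapping of IOAs can be sketched as a lookup table in Python. The sketch assumes that stored pairs are swapped two at a time, so that the mapping is its own inverse and can be applied in both the control and monitor direction; the IOA values in the usage example are made up.

```python
def build_swap_table(pairs):
    """Build an IOA swap table from (command_ioa, sensor_ioa) pairs.

    Pairs are swapped two at a time; with fewer than two pairs the table is
    empty, matching the condition that at least two pairs must be known.
    """
    table = {}
    for i in range(0, len(pairs) - 1, 2):
        (cmd_a, sensor_a), (cmd_b, sensor_b) = pairs[i], pairs[i + 1]
        table.update({cmd_a: cmd_b, cmd_b: cmd_a,
                      sensor_a: sensor_b, sensor_b: sensor_a})
    return table


def swap_ioa(ioa, table):
    """Return the replacement IOA, or the original if it is not in the table."""
    return table.get(ioa, ioa)
```

With the pairs `(100, 200)` and `(101, 201)`, a command to IOA 100 is redirected to IOA 101, and the state update from sensor 201 is presented to the HMI as coming from sensor 200.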

5.2.10 Scenario 10: Replay

The chosen design for the replay scenario was to get old IEC-104 packets from the RTU sent to the HMI to be replayed over and over. The goal of the adversary was to have the HMI accept old data and disguise any subsequent activity in the power grid.

The adversary recorded and stored each ASDU from the RTU to the HMI over a 20 second period. The stored packets had the same ASDU address field value and were of the types M_ME_NA_1, M_SP_NA_1, M_DP_TB_1, M_SP_TB_1 and M_ME_TF_1. When the recording was completed, all incoming IEC-104 packets with the mentioned attributes had their ASDU replaced with a recorded ASDU.
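The two phases of the replay attack, recording and substitution, can be sketched as a small state machine in Python. The sketch assumes that recorded ASDUs are replayed in their original order and cycled when exhausted, which the text does not specify; the class is illustrative only.

```python
class ReplayAttack:
    """Record ASDUs for a fixed period, then substitute them for new ones."""

    def __init__(self, record_seconds=20.0):
        self.record_seconds = record_seconds
        self.started = None   # time of the first matching packet
        self.recorded = []    # ASDUs stored during the recording phase
        self.next_index = 0   # position of the next ASDU to replay

    def process(self, asdu, now):
        """Return the ASDU to forward for a matching packet seen at `now`."""
        if self.started is None:
            self.started = now
        if now - self.started < self.record_seconds:
            self.recorded.append(asdu)
            return asdu  # recording phase: forward unchanged
        if not self.recorded:
            return asdu
        replayed = self.recorded[self.next_index % len(self.recorded)]
        self.next_index += 1
        return replayed
```

The caller is expected to pass in only ASDUs that match the recorded address and type attributes, with `now` given in seconds.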

5.2.11 Scenario 11: Inject Fabricated Packets to Stream RTU-HMI

The adversary’s objective in this scenario was to send their own commands to the RTU. The applied technique was to masquerade as the HMI, injecting commands into the established TCP connection between the RTU and HMI.

After setting up as a MitM, the adversary waited for the HMI to send a setpoint command of type C_SE_NC_1. The contained IOA revealed the IOA of a generator in the power grid to the adversary. The adversary created a replica of the command’s APDU, but with the value found in the IE set to 0x00000303, essentially interpreted as setpoint value zero.

With the replica ready, a TCP header was created by extracting source, destination, flags, sequence and acknowledgement numbers from the most recent transmission between the HMI and RTU. The packet length was modified and the fabricated C_SE_NC_1 was added as payload to the TCP header and subsequently sent to the RTU.

Since the length of this transmission was changed, a gap in TCP sequence and acknowledgement numbers between RTU and HMI was introduced. This gap increased while the communication was hijacked by the adversary, seeing the transmissions through. Until the RTU confirmed the new setpoint and the adversary acknowledged the RTU with forged TCP packets, no packets were allowed to reach the HMI.

Packets were continuously captured and updated after the injection so that the sequence and acknowledgement numbers of both TCP and IEC-104 were updated in both directions. This way, the injection would not be seen in the network traffic from the perspective of the control network.
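The bookkeeping needed to hide the sequence number gap can be sketched in Python for the TCP layer. The sketch tracks two byte counts: bytes injected towards the RTU and RTU bytes withheld from the HMI. The class and method names are hypothetical, and the corresponding update of the IEC-104 sequence numbers is not shown.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


class InjectionTracker:
    """Track TCP sequence gaps caused by injected and withheld bytes."""

    def __init__(self):
        self.injected = 0  # bytes sent to the RTU on behalf of the HMI
        self.withheld = 0  # RTU bytes that were never forwarded to the HMI

    def inject(self, payload_length):
        """Register an injected payload of the given length in bytes."""
        self.injected += payload_length

    def withhold(self, payload_length):
        """Register an RTU payload that was kept from the HMI."""
        self.withheld += payload_length

    def hmi_to_rtu(self, seq, ack):
        """Rewrite a genuine HMI segment so the RTU accepts it."""
        return (seq + self.injected) % MOD, (ack + self.withheld) % MOD

    def rtu_to_hmi(self, seq, ack):
        """Rewrite an RTU segment so the HMI sees no gap."""
        return (seq - self.withheld) % MOD, (ack - self.injected) % MOD
```

After injecting a 16-byte command and withholding a 16-byte confirmation, an HMI segment with sequence number 1000 and acknowledgement number 5000 is forwarded to the RTU as 1016 and 5016.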

5.2.12 Scenario 12: Inject Fabricated Packets to RTU

The goal of the adversary in this scenario was to gain the same privileges and access to the RTU as the HMI had.

The adversary identified an RTU through broadcasting interrogation commands and listened for responding nodes to extract port and IP address. A TCP handshake was performed between the RTU and the adversary’s machine. As soon as that was accepted, a STARTDT-frame was sent to the RTU to initiate IEC-104 communication. After the RTU had acknowledged the STARTDT, packet types C_DC_TA_1 and C_SE_NC_1 were sent to various IOAs. This was repeated until given an affirmative response, meaning that the adversary found a correct value for an IOA. Then the adversary repeated this process on other IOAs, trying to scan for every actuator in the power grid station.

6 Method of Evaluation

This chapter covers how the results were to be evaluated. Section 6.1 specifies the requirements set on the datasets. How these requirements were evaluated is described in section 6.2. If the datasets were shown to meet these requirements it would demonstrate that the attack-bot was a suitable tool for generating datasets in RICS-el. The evaluation of attack impacts is listed in section 6.3. The motivation for evaluating attacks was that without some proof of damage of a generated attack, the attack-bot was not generating anything useful.

6.1 Dataset Requirements

This section covers the requirements set on the datasets. To be usable for researchers the generated datasets must fulfill certain requirements. This study used the five functional and three non-functional requirements defined by Garcia et al. [14].

6.1.1 Functional Dataset Requirements

The functional requirements of this study were as follows:

1. Payload availability: Since plenty of NIDSs depend on payload inspection, it is required that datasets do not have encrypted or removed payloads.

2. Labeled attacks: Each packet needs to be associated with a label.

3. Ground truth: The labeling needs to be correct and the background traffic cannot contain any unlabeled attacks.

4. Growing: When new attacks and new network patterns emerge, the datasets will become obsolete if they are not updated or cannot adapt.

5. Attack diversity: The datasets need a diverse set of network attacks.

6.1.2 Non-Functional Dataset Requirements

The non-functional requirements used in this study were:

1. Public availability or replicability: The datasets need to be publicly accessible or easily replicated. Maintenance or service of a dataset must also be in place for the foreseeable future.

2. Interoperability: The datasets must be given in a widely accepted format.

3. Quality, authenticity: The datasets must contain data as close to real-world data as possible.

6.2 Evaluation of Datasets Requirements

This section covers how the datasets were evaluated against their set requirements.

6.2.1 Functional Dataset Requirements

This section covers the prerequisites to fulfill the functional requirements set on the datasets:

1. Payload availability: As a testbed, RICS-el guarantees that no sensitive data is leaked, hence no encryption or removal of data is necessary. This requirement was fulfilled if no encrypted IEC-104 payloads were found in any recorded dataset and if no payload was excluded from a dataset.

2. Labeled attacks: Allowing the ScenarioBot to control the attack-bot produced labels as attacks started or ended. This requirement was fulfilled if an observer could determine if an arbitrary packet of a dataset was transmitted during an ongoing attack or not.

3. Ground truth: With the used approach to labeling, an error was only to occur with human interference. If the scenarios were labeled according to running attacks, then this requirement was fulfilled.

4. Growing: This requirement was fulfilled if RICS-el supported the addition of new attacks.

5. Attack diversity: At least five types of attacks were to be recorded per this study’s goals and delimitations. As attacks may be unsuccessful, this requirement was fulfilled if at least three different scenarios were available to a user.

6.2.2 Non-Functional Dataset Requirements

This section covers the prerequisites to fulfill the non-functional requirements set on the datasets:

1. Public availability or replicability: This requirement was fulfilled if the datasets were accessible to anyone who asked for them. Alternatively, RICS-el and the attack-bot could be accessible and maintained after this work had finished, enabling replication of the datasets.

2. Interoperability: This requirement was deemed fulfilled if the network traffic was available in pcap files and the labels and additional information were accessible in CSV files.

3. Quality, authenticity: The background traffic of RICS-el was assumed to be authentic. As the attack traffic was reactive to the background traffic that should be no different. An analysis was done to check this: A baseline pcap file with no present attack and a pcap file with attack(s) were compared. The analysis compared the TTL field that was problematic in DARPA 99 and the total number of packets sent.


6.3 Attack Impact Evaluation

Table 6.1 defines what criteria an attack had to fulfill to be considered successful.

Table 6.1: Attack success criteria

Scenario # | Criteria of success
1 | If a machine running IEC-104 was found.
2 | If the connection between HMI and RTU was lost from the operator’s perspective.
3 | If the connection between HMI and RTU was lost or if latency increased by 100%.
4 | If the connection between HMI and RTU was lost or if latency increased by 100%.
5 | If a command was ignored or executed in an unintended way.
6 | If the RTU stopped all of its transmissions to the HMI, started to repeat itself or lost data.
7 | If the observed values in the HMI differed from the actual values in the power grid.
8 | If the commands from the HMI were ignored by the RTU.
9 | If the operator saw a breaker change in the HMI, but another breaker than the intended got changed in the power grid.
10 | If the HMI stopped receiving new values from the RTU and instead accepted the replayed packets.
11 | If the setpoint of a generator was altered without the involvement of an operator.
12 | If the setpoint of a generator was altered without the involvement of an operator.

To measure latency, 100 ICMP packets were sent from the HMI to the RTU twice: once when the system was under attack and once when it was not. An average RTT was then computed for each run and the two averages were compared.
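The latency criterion for scenarios 3 and 4 amounts to comparing two average RTTs. A minimal Python sketch, with illustrative function names and the 100% threshold from Table 6.1 as the default, is:

```python
from statistics import mean


def latency_increase(baseline_rtts, attack_rtts):
    """Return the relative increase of the average RTT, e.g. 1.0 for +100 %."""
    baseline = mean(baseline_rtts)
    return (mean(attack_rtts) - baseline) / baseline


def latency_criterion_met(baseline_rtts, attack_rtts, threshold=1.0):
    """Return True if the average RTT increased by at least the threshold."""
    return latency_increase(baseline_rtts, attack_rtts) >= threshold
```

For example, a baseline average of 2 ms and an average of 4 ms under attack give an increase of 1.0, which meets the criterion.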

7 Results and Evaluations

Section 7.1 covers the outcome of the scenarios and their corresponding attacks. This chapter also includes a presentation of the generated datasets in section 7.2. The evaluation of the requirements set on the datasets is outlined in section 7.3.

7.1 Impact of Attacks

Each scenario described in section 5.2 resulted in an attack. These attacks are described in this section together with an evaluation of the impact of the attacks as listed in section 6.3.

7.1.1 Scenario 1: The Scanning Attack

All encountered hosts were password protected. Hence, the scan failed at the SSH connection described in step 3 of the first bullet of scenario 1. The scan should have included a step for cracking any passwords present. To proceed with the scan, all passwords were collected from the administrators. Having done so, the adversary identified the RTU, the HMI and the firewall in between. The commands issued by the adversary are listed below, with made-up IP addresses:

1. ifconfig: The adversary noted their own IP address 171.173.192.66 and network ID 171.173.192.

2. nmap -sn 171.173.192.*: The network scan identified IP addresses 171.173.192.1 and 171.173.192.211.

3. ssh 171.173.192.1, followed by : The adversary connected to 171.173.192.1.

4. tcpdump | grep 2404*: Through passive monitoring, the adversary discovered that a message was sent from IP address 12.3.245.56 port 2404 to 171.173.192.211 port 52360. Since 171.173.192.211 shared the network ID with the MitM Machine and was the recipient of a message from port 2404, it was concluded to be the HMI. The node with IP address 171.173.192.1 was the firewall and 12.3.245.56 the RTU.


To confirm that the adversary found an IEC-104 device, the steps in the second bullet of scenario 1 were conducted. A device with IP address 12.3.245.56 did respond with IEC-104 traffic to the sent IEC-104 message. The resulting evaluation is seen in Table 7.1.

Table 7.1: Result of the scanning attack

Scenario # | Criteria of success | Success?
1 | If a machine running IEC-104 was found. | Yes

The criteria of success were fulfilled as a machine running IEC-104 was found. However, the attack needed to be supplemented with a step to circumvent password-protected SSH connections.

7.1.2 The DoS Scenarios

The attack-bot generated three different DoS attacks, implementing scenarios 2-4, with different outcomes. Either seemingly nothing happened, the IEC-104 communication became unreliable, or the link between HMI and RTU was shut down completely, denying traffic even after the attack stopped. Whether the DoS attacks were seen as successful or not is shown in Table 7.2.

Table 7.2: Results of the DoS attacks

Scenario # | Criteria of success | Success?
2 | If the connection between HMI and RTU was lost from the operator’s perspective. | Yes
3 | If the connection between HMI and RTU was lost or if latency increased by 100%. | No
4 | If the connection between HMI and RTU was lost or if latency increased by 100%. | Yes

7.1.2.1 Scenario 2: Packets Dropped

For as long as this attack was running, no IEC-104 transmissions were completed between the RTU and HMI. From the operator’s view, nothing was operational and the power supply was interpreted as disrupted. When the HMI stopped receiving IEC-104 traffic from the RTU, it tried to re-establish the connection. As TCP traffic was not dropped, the RTU was responsive to the TCP handshake. This led to the HMI continuously sending STARTDT frames, closing the connection when met by silence, and repeating the cycle with a new TCP handshake. When the adversary stopped the attack, the RTU and HMI resumed IEC-104 communication after approximately 30 seconds.

7.1.2.3 Scenario 3: SYN Flood

The launched SYN flood did not deny the operator access to the RTU. The RTU exposed no open ports that the adversary could flood: the ports that were open and relevant to the IEC-104 communication, such as port 2404, only accepted traffic from whitelisted IP addresses, and the only whitelisted IP address was that of the HMI. Furthermore, the firewall between the HMI and RTU was configured to drop unsolicited packets that were not part of any established connection, except for the open port 22. The traffic generated by the attack did not affect the latency of packets sent between RTU and HMI.


7.1.2.3 Scenario 4: IEC-104 Flood

The adversary flooded the RTU and HMI with IEC-104 packets, denying the operators remote control service. An average of 11783 packets per second was sent, each packet to its original recipient. The packets were not processed, as both TCP and IEC-104 sequence numbers were incorrect. From the operator’s view, there was seemingly no connection to the RTU most of the time, but some measurements still got through to the HMI. In all tests where the attack was active for more than 20 minutes, the HMI stopped trying to maintain a communication line service with the RTU. If the adversary stopped the attack before 20 minutes had passed, the IEC-104 communication resumed as normal.

During normal operation, with no attacks present, the expected latency of an ICMP message between HMI and RTU was about 7 milliseconds. During the flooding attack, the RTT of transmitted ICMP messages rose to nearly 100 milliseconds, which is nearly 1429% of the baseline, or roughly 14 times the normal latency. Also, 75% of the messages were dropped entirely.

7.1.3 Scenario 5: The Sequence Attack

The resulting attack was not observed to cause any damage. The packets that were dropped were of different types: M_ME_NA_1, S-format packets and C_DC_TA_1. Because the intended recipient did not receive the dropped packets, it did not acknowledge their TCP sequence numbers. As the sender did not receive an acknowledgement, the dropped packets were re-sent. Eventually, the data always made it through to the intended recipient in less than a second. Table 7.3 lists the resulting evaluation.

Table 7.3: Results of the sequence attack

Scenario # | Criteria of success | Success?
5 | If a command was ignored or executed in an unintended way. | No

7.1.4 The MitM Scenarios

The attack-bot was able to start four different MitM attacks using the design of scenarios 6-9. Table 7.4 states whether each MitM attack was seen as a success or not.

Table 7.4: Results of the MitM attacks

Scenario # | Criteria of success | Success?
6 | If the RTU stopped all of its transmissions to the HMI, started to repeat itself or lost data. | Yes
7 | If the observed values in the HMI differed from the actual values in the power grid. | Yes
8 | If the commands from the HMI were ignored by the RTU. | No
9 | If the operator saw a breaker change in the HMI but another breaker than the intended got changed in the power grid. | Yes

7.1.4.1 Scenario 6: Altered APCI Sequence Numbers

When the APCI sequence number was decreased by one, nothing seemed to happen. At offsets larger than three, the traffic came to a halt and the RTU reacted by closing the connection. Ten seconds later the HMI re-initiated the connection and traffic resumed. About 15 seconds later the RTU closed the connection again, and the process repeated for as long as the attack was ongoing. The operator was denied access to the RTU during each reset; neither commands nor measurements were transmitted. The process of closing and restarting continued for some iterations, but after approximately ten minutes the communication line service was terminated. The attack had an adverse impact, but not the effect that the adversary had anticipated.
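To make the manipulation concrete, the sketch below shifts the send sequence number N(S) of an I-format APDU. It is an illustration only and not the code used by the attack-bot: it assumes that the altered field is N(S) (the text does not state whether N(S) or N(R) was changed), and the function names are hypothetical. In IEC-104, an APDU starts with the byte 0x68 followed by a length byte and four control field octets; for I-format frames the first two octets carry the 15-bit N(S), shifted left by one bit.

```python
SEQ_MODULO = 1 << 15  # IEC-104 sequence numbers are 15-bit counters


def read_send_seq(apdu: bytes) -> int:
    """Extract N(S) from an I-format APDU (start byte, length, 4 control octets, ASDU)."""
    if apdu[0] != 0x68:
        raise ValueError("missing IEC-104 start byte 0x68")
    if apdu[2] & 0x01:
        raise ValueError("not an I-format APDU")  # S- and U-format set the lowest bit
    return (apdu[2] | (apdu[3] << 8)) >> 1


def shift_send_seq(apdu: bytes, offset: int) -> bytes:
    """Return a copy of the APDU with N(S) shifted by offset, wrapping at 2**15."""
    new_seq = (read_send_seq(apdu) + offset) % SEQ_MODULO
    field = new_seq << 1  # lowest bit stays 0 to keep the frame I-format
    out = bytearray(apdu)
    out[2] = field & 0xFF
    out[3] = (field >> 8) & 0xFF
    return bytes(out)
```

Calling `shift_send_seq(apdu, -1)` corresponds to the offset of one that had no visible effect, while larger negative offsets correspond to the cases in which the RTU closed the connection.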

7.1.4.2 Scenario 7: Altered Sensor Values

Measurements sent to the HMI had their IE set to zero. The voltage in the power grid was interpreted as lowered at the HMI. Instead of the expected 420 Volt, 250 Volt was measured. The attack also denied the operator knowledge about what state the breakers were in. Changing the value of a breaker resulted in an indeterminate state, since the M_DP_TB_1 packet, which delivers the state of a breaker, had its data wiped.

7.1.4.3 Scenario 8: Altered Time Tag

This attack seemingly had no effect. Measurements with modified time tags were still accepted and processed by the HMI. The ITF was never set and the control system never asked the RTU to be synchronized. The RTU executed all received commands, in contrast to Baiocco and Wolthusen’s [9] observations, where the RTU did not execute seemingly delayed commands. It was observed that C_CS_NA_1 and M_DP_TB_1 were the only types of packets found to send accurate time tags in RICS-el. The implementation of the protocol was such that all other time tags were set to the 31st of December 1979. Control loops in RICS-el need to consider the time tags, and the erroneous timestamps need to be corrected. Until then, modifying the IEC-104 time tags will not result in a realistic reaction on RICS-el’s part.

7.1.4.4 Scenario 9: Altered Breaker IOAs

When the operator wanted to toggle breaker A, breaker B was toggled instead and vice versa. The response from the RTU was switched too, to hide the attack from the operator. This attack prevented the operator from working properly and made the system behave in an unwanted manner. It could have brought the system to an unsafe state or even a state in which the power station was completely shut off from the power distribution.

7.1.5 Scenario 10: The Replay Attack

The HMI received the same values over and over in a 20-second loop with a varied inter-arrival time. The time between replayed packets fluctuated, following RICS-el’s natural pattern, without the adversary having to mimic any timings. If the adversary chose to unplug the power grid from the power distribution, resulting in a blackout, the operator’s view would not show any sign of errors. Table 7.5 states the outcome of the replay attack in terms of success or no success.

Table 7.5: Result of the replay attack

Scenario # | Criteria of success | Success?
10 | If the HMI stopped receiving new values from the RTU and instead accepted the replayed packets. | Yes


7.1.6 The Packet Injection Scenarios

The resulting evaluation of the packet injection attacks is seen in Table 7.6.

Table 7.6: Results of the injection attacks

Scenario # | Criteria of success | Success?
11 | If the setpoint of a generator was altered without the involvement of an operator. | Yes
12 | If the setpoint of a generator was altered without the involvement of an operator. | No

7.1.6.1 Scenario 11: Injected Packets to Stream

The setpoint of the target generator was set to zero. All subsequent packets on the stream had their TCP sequence numbers altered so that the injected command and its effect on the generator were not noticed in the HMI. Packets sent to the HMI from the RTU had their sequence numbers lowered by a value representing the number of bytes that had been involved in the injection and the following TCP acknowledgements. Packets sent to the RTU from the HMI had their sequence numbers increased by the same value. The continuous modification of packets did not last more than a minute after the injection. The subsequent sequence numbers became incorrect and the connection between HMI and RTU halted.
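The bookkeeping described in this scenario can be illustrated with a short sketch. It follows the description above literally (lowering sequence numbers towards the HMI and raising them towards the RTU by the same byte count) and is not the attack-bot's actual code; the function name and parameters are hypothetical. TCP sequence numbers are 32-bit counters, so the adjustment must wrap modulo 2^32.

```python
TCP_SEQ_MODULO = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def adjust_sequence_number(seq: int, injected_bytes: int, towards_hmi: bool) -> int:
    """Compensate a TCP sequence number for bytes the adversary injected.

    Packets travelling RTU -> HMI are lowered by the injected byte count,
    packets travelling HMI -> RTU are raised by the same amount.
    """
    delta = -injected_bytes if towards_hmi else injected_bytes
    return (seq + delta) % TCP_SEQ_MODULO
```

If the byte count drifts, for example because of a retransmission that the adversary did not account for, every later packet carries a wrong sequence number, which is one plausible reason for the stalled connection reported above.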

7.1.6.2 Scenario 12: Inject Packets Directly to RTU

Initiating TCP and IEC-104 communication with the RTU was not possible. The RTU only allowed packets from the whitelisted IP address of the HMI and it only accepted IEC-104 traffic on port 2404. To enable the attack, the attack-bot would have to pretend to be the HMI. That attack is represented by scenario 11.

7.2 Created Datasets

This section covers the datasets generated during the defined attack scenarios. Unsuccessful attacks are not covered (scenarios 1, 3, 5, 8 and 12). The datasets are listed in Table 7.7 together with a scenario with no present attacks. Time is specified in mm:ss, denoting time passed since RICS-el had stabilized. The ScenarioBot stored network traffic and timestamps for started and stopped attacks. However, the data detailing exact events during a scenario were spread across the three machines running the three involved bots:

1. Only the OT-Bot knew exactly what commands were issued as this data was stored in its working memory.

2. Only the attack-bot knew exactly what packets it had tampered with. The APCI control fields of the packet and what the attack-bot did with the packet were stored in local text files.

3. The ScenarioBot stored the traffic in pcap files, obtaining this data from the firewall located between the RTU and the HMI. The ScenarioBot also stored, in CSV files, its actions with timestamps, a snapshot of electrical process data and the times at which the OT-Bot issued a command.


All scenarios were 30 minutes long. All attacks started three minutes after RICS-el had stabilized and ended after being active for 20 minutes. One reason for keeping attacks active for the majority of a scenario was to check how stable each attack was. Every attack lasted the entire duration except in scenario 11, in which the attack lasted for three minutes. Another reason was to stress test RICS-el. Given the duration of 20 minutes, it was noticed that scenarios 4 and 6 resulted in terminated communication between the HMI and RTU, even after the attack stopped. The OT-Bot was active for about 1.5 minutes, which was sufficient to test labeling and an attack’s impact on operator actions. The OT-Bot started five minutes after RICS-el had stabilized.

Table 7.7: Recorded datasets

Scenario # | Number of Packets (IEC-104) | Comments
Normal op. | 7144 (1571) | No attacks present
2, DoS | 4366 (603) | Attack briefly stopped 22:17-23:05
4, DoS | 16488794 (1121) | 1121 legitimate IEC-104 packets
6, MitM | 2955 (520) | Communication line service crashed
7, MitM | 6850 (1519) | -
9, MitM | 6991 (1540) | -
10, Replay | 6967 (1504) | 14 packets recorded
11, Injection | 6718 (1446) | Cover-up successful for ten seconds

The actions of the OT-Bot in each scenario are listed in Table 7.8.

Table 7.8: Operator actions in each scenario

Command Type | Target | Value | At time point (mm:ss)
C_DC_NA_1 | Breaker 4 | Close | 05:10
C_SE_NC_1 | Generator 1 | Setpoint 299 | 05:15
C_DC_NA_1 | Breaker 4 | Open | 05:20
C_DC_NA_1 | Breaker 5 | Open | 05:30
C_DC_NA_1 | Breaker 5 | Close | 05:40
C_SE_NC_1 | Generator 2 | Setpoint 299 | 05:50
C_DC_NA_1 | Breaker 6 | Open | 06:00
C_SE_NC_1 | Generator 2 | Setpoint 333 | 06:20
C_DC_NA_1 | Breaker 6 | Close | 06:30
C_DC_NA_1 | Breaker 4 | Close | 06:45

The datasets with scenarios 7 and 10 contain no trace of the generated attacks since the effect of the attack is not illustrated in traffic leaving the OT-LAN. The impact of attacks was instead observed at the HMI.

7.3 Review of Requirements

This section covers the resulting evaluation of the requirements set on the datasets.

7.3.1 Functional Requirements of the Datasets

The resulting evaluation of the functional requirements:

1. Payload availability (Fulfilled): No payloads were removed and no payloads were encrypted.


2. Labeled attacks (Fulfilled): A user could associate each packet in the pcap file to a specific label. Timestamps specifying when an attack was present or not were kept in CSV files.

3. Ground truth (Fulfilled): If the ScenarioBot was used to schedule the entire scenario, then the ground truth was known. The exceptions were when the attack-bot malfunctioned or when the attack-bot was used locally.

4. Growing (Fulfilled): The system supported the addition of new attacks. In this study, several attacks have been added. It was not possible to alter network patterns other than with operator actions and attacks.

5. Attack diversity (Fulfilled): The attack-bot supports four different types of attacks with a documented impact. The resulting datasets contain packets from three different attack types, discounting the replay attack.
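As an illustration of how a user of the datasets might apply the labels, the sketch below assigns a label to a packet based on attack start and stop times read from a CSV file. The column names `attack`, `start` and `stop`, the use of seconds as the time unit and the function names are assumptions made for this example; they do not describe the actual layout of the ScenarioBot's CSV files.

```python
import csv


def load_attack_windows(csv_file):
    """Read attack windows as (label, start_seconds, stop_seconds) tuples.

    Assumed (hypothetical) CSV header: attack,start,stop
    """
    return [
        (row["attack"], float(row["start"]), float(row["stop"]))
        for row in csv.DictReader(csv_file)
    ]


def label_packet(timestamp, windows, default="normal"):
    """Return the label of the attack window containing the packet timestamp."""
    for label, start, stop in windows:
        if start <= timestamp <= stop:
            return label
    return default
```

Because the labels are derived from timestamps alone, a packet is labeled according to when it was captured and not according to whether the attack-bot actually touched it.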

7.3.2 Non-Functional Requirements of the Datasets

The resulting evaluation of the non-functional requirements:

1. Public availability or replicability (Fulfilled): Datasets have been uploaded [22] to ensure public availability for anyone who asks for them, fulfilling the requirement. However, RICS-el as documented in this study has been updated since 2020, when this study was conducted. As a consequence, the work is not replicable.

2. Interoperability (Fulfilled): The network traffic and relevant labels were stored at the node on which the ScenarioBot was running, in pcap and CSV files respectively.

3. Quality, authenticity (Not fulfilled): The TTL value of packets that the adversary modified or replayed was not consistent with the TTL of the baseline traffic.

8 Discussion

This chapter examines the study in four different aspects, each having its respective section. In section 8.1 the results of this study are discussed. How the methodology affected the results and what could have been done differently is discussed in section 8.2. Criticism of the sources used in this study is found in section 8.3. Section 8.4 considers the possible implications and benefits of this study in a wider context.

8.1 Results

The functional and non-functional requirements set on the datasets are discussed in their respective sections 8.1.1 and 8.1.2. The evaluation is followed by section 8.1.3, which discusses how realistic RICS-el’s response to the attacks is. It should be stated that no attack generated in this study should be considered stealthy: most attacks utilize ARP spoofing to enable packet capture, which is identifiable by existing NIDSs.

8.1.1 Functional Requirements of the Datasets

Each functional requirement set on the datasets is fulfilled. However, this section discusses how they can be improved even further.

In scenario 2 an active attack temporarily stopped capturing packets, see Table 7.7 in chapter 7. The attack had not behaved this way in earlier runs, so the pause was noticed too late to correct the dataset. A likely cause of the paused attack is that two instances of the attack ran simultaneously, blocking each other. This could have happened if the user forgot to properly reset the system between scenarios or accidentally started a new attack locally on the MitM Machine while monitoring logs.

That the attack in scenario 2 temporarily stopped is not shown in the automatic labeling. This error showcases an issue concerning the ground truth of the datasets in this study: the labels are incorrect when the attack-bot malfunctions.


One way of solving this could be to let the attack-bot notify the ScenarioBot when a malfunction happens. Logs from the attack-bot should be sent to the ScenarioBot too, in case the attack-bot cannot determine the error itself. The collected logs, together with a description of the attack, would help a user of the datasets to identify whether an error has occurred.

Another issue with ground truth is that timestamps specifying when an attack is launched or terminated could be slightly erroneous. When the ScenarioBot logs that an attack is activated, it simultaneously connects to the MitM Machine and launches the attack-bot. Connecting and launching an attack took milliseconds, but it could result in benign packets being labeled as part of an attack for a few milliseconds. The same is true, in the opposite direction, when the attack-bot is stopped.

The results also show that RICS-el can generate a more diverse set of datasets:

• Firstly, the number of packets present in the recording of scenario 4 shows that the attack-bot and RICS-el can create much longer and more populated datasets.

• Secondly, there could have been a more varied set of attacker profiles. For instance, being allowed to use more resources than the single MitM Machine could have enabled something like distributed DoS attacks.

• Thirdly, the attack model of the system could have allowed for emulated physical attacks on the power grid. An attacker could have, for instance, been allowed to manipulate the power grid directly using an interface to the simulated power grid.

8.1.2 Non-Functional Requirements of the Datasets

Modified, injected or replayed packets do not have the same TTL as packets occurring naturally in the traffic. This is an issue for researchers who want to use the datasets to identify intrusions, as a NIDS might learn that looking at the TTL is enough to detect an intrusion, which is not necessarily the case in real traffic. To make attack traffic more similar to normal traffic, the attack-bot needs to adjust the TTL of malicious packets. The attack-bot has access to the TTL field, so one way to achieve a more believable TTL value is to copy the TTL of a monitored or captured packet into the malicious packet.
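A minimal sketch of the suggested fix is given below. It operates on raw IPv4 packets as byte strings, copies the TTL of a captured reference packet into a crafted packet and recomputes the IPv4 header checksum, which covers the TTL field. The function names are hypothetical and the sketch does not use the library by Maynard et al. [39]; it only illustrates the principle.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum (ones' complement sum of 16-bit words) of an IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def copy_ttl(reference_packet: bytes, crafted_packet: bytes) -> bytes:
    """Return crafted_packet with the TTL of reference_packet and a valid checksum.

    Both arguments are raw IPv4 packets starting at the IP header.
    """
    ttl = reference_packet[8]  # TTL is the 9th byte of the IPv4 header
    header_length = (crafted_packet[0] & 0x0F) * 4  # IHL is given in 32-bit words
    header = bytearray(crafted_packet[:header_length])
    header[8] = ttl
    header[10:12] = b"\x00\x00"  # checksum field must be zero while summing
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return bytes(header) + crafted_packet[header_length:]
```

The reference packet should be one that travelled the same path as the traffic being imitated, since the TTL observed at a capture point depends on the number of hops already passed.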

8.1.3 Realism in RICS-el

In RICS-el, the level of realism varies. For instance, RICS-el’s emulated RTUs are not realistic. They convert traffic between the two different protocols used on the OT side, and there are no control loops or logic present. Scenario 6 does not affect the stored buffer of the RTU, as there is no buffer to affect. However, the training simulator module in the power grid reacted to the attack, becoming unable to generate new IEC-104 packets. This led to the closing of the established TCP connection between HMI and RTU.

Another less believable feature is the presence of erroneous time tags in the packets, together with the observation from scenario 8 that the tags are seemingly not processed. On top of that, the historian is not accessible, so how any attack influenced the system’s traceability of events cannot be documented.

The packet injection attack in scenario 12 was not successful, in part because the HMI is the only node allowed to connect to the RTU. The SYN flood attack in scenario 3 failed for the same reason. In RICS-el, there are embedded firewalls on the HMI and RTU, effectively filtering the adversary’s packets in these attacks. Of course, it is believable that some level of security is in place. But seeing how successful SYN flooding was in the S3 event [6], it is debatable how realistic it is for an RTU to have an embedded and configured firewall.


Sequence attacks through dropping packets are not applicable in RICS-el. Scenario 5 shows that if a packet is dropped in the middle of a sequence, the mechanisms of the TCP protocol make sure that the dropped packet is resent. Furthermore, each step in each sequence in RICS-el waits for its preceding step to be acknowledged. Re-ordering the packets might result in different behavior, for instance a breaker being asked to switch state before being selected.

Desynchronization attacks that use NTP or MitM attacks to alter time tags are not plausible in RICS-el either. Baiocco and Wolthusen [8, 9] documented that an IEC-104 command with a time tag indicating a delay of 10 - 30 seconds was not executed by the RTU. Unlike Baiocco and Wolthusen, this study does not use NTP to achieve desynchronization. Rather, time tags are altered through MitM before the IEC-104 commands reach the RTU. To the best of our understanding, the different attack vectors should have resulted in a similar outcome.

8.2 Method

This section aims to cover the consequences of the chosen approach to build an attack-bot in RICS-el as a means of generating datasets for intrusion detection. It describes what could have been done differently if the study was conducted a second time.

8.2.1 The Implementation of the Attack-Bot

The attack-bot offers a diverse set of attacks within the delimitations of this study. Five types of attacks are supported: DoS, sequence, MitM, packet injection and replay attacks. A user can configure what target(s) to attack, so even though the attack-bot cannot scan for targets itself, a user can point it in the right direction. However, the lack of scanning attacks presents a slight issue with the attack-bot’s usability. If the attack-bot could do reconnaissance, the need to supply specific targets would lessen. To enable the scanning attack mentioned in section 5.2.1, a way of cracking the password protection is required.

The attack-bot has no agency or understanding of an ongoing scenario. A user cannot introduce triggers that the attack-bot would react to. Adding such triggers would make the attack-bot more interesting when launching random attacks for training or attack simulations. The method lacked instructions on how the attack-bot would re-evaluate its situation by itself, as a real-world adversary would.

All attacks use the library created by Maynard et al. [39]. Choosing this approach, rather than using different libraries and tools for different situations, made deployment more efficient.

To extend the attack-bot with new attacks, the user needs to modify its source code. This is unnecessarily difficult and should not be required of a user, as it limits extensibility. The lack of extensibility hinders maintainability, since software must keep up with new techniques and technology. It is not very appealing to maintain an attack-bot that only supports old attacks. This problem could have been avoided by changing the interface between the attack-bot, written in Python, and the attacks it uses, written in C. Adding a path to a script in a user interface, or editing the configuration file, should have been enough to add a new attack.
Still, the library created by Maynard et al. [39] is worth using, because the functionality it offers covers everything needed to develop the attacks covered in this study.
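The configuration-driven extension discussed in this section could look roughly like the sketch below. It is not how the attack-bot is implemented; the JSON layout, the key names and both function names are hypothetical. The idea is that each attack is an executable registered by name, so that adding an attack only requires a new entry in the configuration.

```python
import json


def load_attack_registry(config_text: str) -> dict:
    """Map attack names to executable paths from a (hypothetical) JSON configuration."""
    config = json.loads(config_text)
    return {name: entry["path"] for name, entry in config["attacks"].items()}


def build_command(registry: dict, attack: str, targets: list) -> list:
    """Build the argument list for launching a registered attack against its targets."""
    if attack not in registry:
        raise KeyError(f"unknown attack: {attack}")
    return [registry[attack], *targets]
```

The returned list could be handed to `subprocess.run`, which keeps the Python side independent of the language in which a given attack is written.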

8.2.2 The Setup of the Experiments

The experiment setup, with its static relations between nodes, is simple. However, it does not give the workflow enough flexibility. The attack-bot always launches its attacks from the OT-LAN and the ScenarioBot always records traffic transmitted through the firewall. This creates a problem in scenario 10, where the packets replayed to the HMI are not routed through the firewall. The ScenarioBot records the packets at the firewall, where all the new values from the RTU are present, while the attack-bot’s replayed packets are only registered by the HMI. Because of this, the dataset from scenario 10 does not include any replayed packets. However, generating a dataset with replayed packets is easy, because the attack-bot is built to be portable and the ScenarioBot could be enabled to record traffic at any node. So if one wants to capture the traffic of such attacks, the attack generation and/or packet recording needs to vary depending on the scenario at hand.

Choosing a scenario length of 30 minutes made human surveillance possible throughout each scenario. Only hardware limits how long a user could leave a scenario running in headless operation.

8.2.3 The Workflow for Dataset Generation

A hypothesis in this study was that using a virtual environment would make system resets between scenarios much easier. The resulting reset procedure used in the system configuration step is easy to use. Although it is quite effective compared to a physical equivalent, there are potential improvements:

1. The first improvement would be to restart the communication line service automatically. In the case of scenarios 4 and 6, an operator had to restart the communication line service with the RTU to reactivate the SCADA process.

2. The second improvement is not directly related to the system reset functionality of the ScenarioBot. When the ScenarioBot stopped attack generation, it restarted the MitM Machine and a warning blocked the MitM Machine from completing its boot-up sequence. So before a user could start an attack again, user intervention at the MitM Machine was required. The attack-bot needs to be given a new VM without a faulty boot sequence to improve the utility of the dataset generation.

Running attack scenarios proved to be an easy exercise too. However, it is problematic that the ScenarioBot does not receive feedback from the system during a scenario, especially if something deviates from the schedule, such as the attack-bot failing to initiate an attack. Such deviations are discovered during data collection, when the logs from the attack-bot and OT-Bot are gathered manually. This indicates that data transfers from the attack-bot and OT-Bot should be automated too. Other than that, the data collection step is straightforward and effective.

The data processing step should involve a deeper analysis of the network traffic. If that had been done, the abnormal TTL value could have been found earlier. A parser should be used to aid in this endeavor.

In the iterative dataset evaluation, it was at times hard to know when to stop the process, especially when the issue seemed to be with the scripts of the attack-bot or ScenarioBot. The flowchart should include an activity stating that a scenario should be placed back into the queue of scenarios. That way, other incomplete scenarios would be prioritized earlier, potentially improving the final set of attacks.

8.2.4 Design of Attacks

The attacks in this study are of six different types. Re-using the attack-code template from Appendix B made it possible to introduce a lot of variation between attacks within each type, using small changes to the code.


There are different approaches to exploring attack vectors in a system. The attack design in this study involves an expert in critical infrastructure security, with access to one machine and a quite large amount of time. This restricts attack space but gives time to refine the attacks being developed. More important still, the attacks are implemented in an attack-bot, being made repeatable. The path taken in S3 [3, 6] involves many teams composed of people from different backgrounds. S3’s method leads to a large diversity of attacks, while also covering a lot of trial and error in a short amount of time. Entry points, targets and the profile of the adversary vary because of the nature of the event. The evaluation of attacks in this study is technical in large parts, measuring latency or logging and monitoring packet modifications. In S3 [3, 6] a scoring system is applied. The scoring is based on goals, relating to how the system should behave and what could be done to damage the functioning process. The goals set up in this study are subjective and qualitative, assuming that an attack would work in a certain way. Having a set of quantitative metrics at hand, applying those to each attack, might give a more nuanced version of what the attacks achieve.

8.2.5 Requirements as Evaluation

The criteria for evaluating datasets showed that the datasets have promising results. The biggest concern is the uncertainty of RICS-el’s future. Datasets become unable to adapt if the platform on which they are generated is not updated.

The datasets contain packets generated by the attack-bot from three different attacks. In the replay attack, no replayed packets are present in the dataset because of how the attack-bot and ScenarioBot were set up in the scenario. Hence, no malicious activity was observed in the pcap file during data processing. There was no time to change the method and run the scenario again to fix this issue. To compensate for this, a pcap recording created at the HMI is available.

Evaluating attack diversity based on the number of attack types does not ensure a diversity of attack techniques. For example, the method does not encourage different attack vectors; most attacks in this study relied on ARP spoofing. Including other techniques, such as remote exploitation, would make the datasets even more diverse.

8.3 Sources

This section discusses potential risks with the sources used. This study relies on a couple of main sources detailing how IEC-104 works and how MitM attacks on IEC-104 have previously been done. The main source for MitM attacks is the work of Maynard et al. [39] and the main source for IEC-104 is the work of Matoušek [44]. By mainly adopting their understanding of the subject, this study risks inheriting their views instead of encompassing the full picture.

On a similar note, some authors appear as a source more than once. That an author is involved in several papers might indicate that the author has a lot of knowledge within this field. But as with relying on one main source for information, the bias of a specific author could overshadow nuances that might have been available through involving other authors.

8.4 The Work in a Wider Context

The aim of building an attack-bot was twofold: to make it easier to create datasets for studying intrusions in SCADA systems, and to document the process so that future studies using this methodology can improve on it. Operators in training could also benefit from including the attack-bot in exercises, practicing the operation of a system under attack.


Operators and anomaly researchers are not the only groups that could benefit from this study. Those who build their own testbed could use this study to identify how RICS-el handled attacks and supported the addition of the attack-bot, and then adjust their method to make their testbed as well equipped as RICS-el, or better. The attack-bot could also be used to evaluate certain aspects of existing testbeds, just as this study shows that RICS-el has its strengths and weaknesses.

Groups with intent to harm could use this study too. By generating attacks over and over in a testbed, problems with the attacks could be caught and the attacks fine-tuned before being used in production. An attack-bot or an attack documented in this study could also be applied directly in production. After all, the attacks generated in this study mimic real attacks, conducted in an environment meant to be similar to a real-world counterpart.

However, the benefit of understanding how to develop effective IDSs outweighs the risk entailed by disclosing potentially harmful information. Effective defense mechanisms in the power grid are not going to be adequately implemented if the vulnerabilities of new solutions, or of present-day power grids, are not understood.

9 Conclusion

Intrusion detection research in SCADA networks requires high-quality datasets. This study intends to provide a data generation framework by issuing repeatable attacks in the virtual SCADA testbed RICS-el. This study also observes and documents how the issued attacks impacted RICS-el. The verdict on the dataset generation methodology is presented in section 9.1. A summary of the resulting impact of the launched attacks on RICS-el is covered in section 9.2. Lastly, some recommendations on how to move forward are detailed in section 9.3.

9.1 Dataset Creation in RICS-el

This section answers research question one: How can datasets, including realistic network traffic and labeled attacks, for intrusion detection research in SCADA systems be realized in RICS-el? The answer is through the creation of an attack-bot that generates different attacks in RICS-el using scheduled and repeatable attack scenarios. The scenarios resulted in eight labeled datasets with IEC-104 traffic, demonstrating the usefulness of the attack-bot. The attacks target an RTU and HMI in RICS-el, and datasets are created by recording network traffic in transmission between the targets. A set of requirements is used to ensure the generation of relevant datasets. The results show that:

• Datasets generated in this study have their packets and payloads unmodified. By using a virtual testbed the need for anonymization is removed.

• All datasets are labeled. A scheduler, the "ScenarioBot", is used to control events in scenarios. The ScenarioBot labels the datasets with timestamps, specifying when and what scheduled action is executed. Datasets are labeled with active attacks and timings since the ScenarioBot schedules attack-bot actions too.

• Ground truth is known: only the attack-bot can generate attacks, and its actions are always logged.


• RICS-el supports the addition of new attacks, making the datasets able to grow and adapt to new attacks. The attack-bot complements RICS-el by making the process of adding new, repeatable attacks easier.

• There are eight created datasets offering a diverse set of attacks. The attack-bot can modify specific packets in transmission between two nodes and inject new packets.

• The datasets are stored separately to be available regardless of the state of RICS-el.

• Interoperability is ensured: network traffic is stored in pcap files and labels in CSV files.

• The datasets contain the IEC-104 background traffic generated by RICS-el and attack traffic generated by the attack-bot. The quality of the generated datasets is lacking, as it is easy to identify which packets the attack-bot has modified or created, because the TTL value of such packets differs from the system’s norm.
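The timestamp-based labeling described in the list above can be illustrated with a minimal sketch. The CSV column names (start, end, label), the example rows and the function names are assumptions made for illustration only; they are not the label format produced by the ScenarioBot.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical label file: one row per scheduled action, with start and end
# times given in seconds since the beginning of the recording.
LABEL_CSV = """start,end,label
0,60,normal
60,120,mitm
120,180,normal
"""


def load_labels(text: str) -> List[Tuple[float, float, str]]:
    """Parse the label rows into (start, end, label) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["start"]), float(r["end"]), r["label"]) for r in reader]


def label_for(timestamp: float, labels: List[Tuple[float, float, str]]) -> str:
    """Return the label whose half-open interval [start, end) covers the timestamp."""
    for start, end, label in labels:
        if start <= timestamp < end:
            return label
    return "unlabeled"


labels = load_labels(LABEL_CSV)
# Packet timestamps would normally come from the pcap file; these are made up.
packet_times = [12.5, 60.0, 119.9, 500.0]
packet_labels = [label_for(t, labels) for t in packet_times]
```

Keeping the labels in a separate file, joined to the packets by timestamp, leaves the recorded traffic itself unmodified.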

In conclusion, the results show that the generated datasets have potential. However, in their current form they are not useful for intrusion detection research, because the attack traffic lacks authenticity. For future use of the attack-bot, the attacks need to be modified so that TTL values remain consistent with background traffic. One way to do this is to copy the values of monitored packets into manipulated packets. To improve the already generated datasets, a script should be written that changes every TTL value to 127 for packets sent from the HMI to the RTU during an attack. It is apparent from the results that the methodology to create datasets in this study needs additional improvements:

• To ease the process of generating datasets, the ScenarioBot has a function to reset RICS-el between scenarios. Due to a corrupt installation of the operating system on the computer where the attack-bot resides, the attack-bot reset does not work. Further services need to be reset too, such as the communication line service between the HMI and RTU.

• The attack-bot needs additional tools to adapt to changing background traffic. RICS-el also needs to improve on what network properties a user can configure on their own.

• The attack-bot needs to offer an interface where a user can add new attacks without modifying source code.

• The entry point of the attack-bot or the node where the ScenarioBot records network traffic needs to be configurable. It should be possible to properly record all attack traffic, evaluating different attack vectors and NIDS placements.
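The TTL post-processing proposed earlier in this section (setting the TTL to 127 for packets sent from the HMI) could look roughly like the sketch below. It is not the script from this study: it operates on raw IPv4 header bytes only, reading and writing the pcap file and restricting the change to attack intervals are left out, and the function names and the sample header are assumptions chosen for illustration. The key point it shows is that the IPv4 header checksum has to be recomputed after the TTL is changed.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement checksum over an IPv4 header (0 for a valid header)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def set_ttl(packet: bytes, ttl: int) -> bytes:
    """Return a copy of an IPv4 packet with a new TTL and a recomputed checksum."""
    if not 0 <= ttl <= 255:
        raise ValueError("TTL must fit in one byte")
    ihl = (packet[0] & 0x0F) * 4            # header length in bytes
    header = bytearray(packet[:ihl])
    header[8] = ttl                         # TTL is byte 8 of the IPv4 header
    header[10:12] = b"\x00\x00"             # zero the checksum field first
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return bytes(header) + packet[ihl:]


def normalise_ttl(packet: bytes, source_ip: str, ttl: int = 127) -> bytes:
    """Rewrite the TTL only for packets whose source address is source_ip."""
    if packet[12:16] == socket.inet_aton(source_ip):
        return set_ttl(packet, ttl)
    return packet


# Generic 20-byte IPv4 header (source 192.168.0.1), used only to exercise the code.
SAMPLE = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
patched = normalise_ttl(SAMPLE, "192.168.0.1")
```

Copying the TTL from monitored packets instead of hard-coding 127 would follow the same pattern, with the observed value passed as the ttl argument.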

9.2 Attack Generation in RICS-el

The second research question asked: Do the attacks generated in this study have an appropriate effect on RICS-el? What impact do the attacks have on RICS-el? For the generated datasets to be of any relevance in the research field, the impact of the attacks needs to be evaluated. Furthermore, RICS-el itself needed an evaluation, as no documentation of previous attacks existed before this study was conducted. Most attacks have a realistic impact on RICS-el. The results show that:

• DoS attacks that flood the network of the system work, as the network capacity is not unrealistically high.

• The HMI trusts the data sent from an RTU or from an adversary masquerading as an RTU.


• RTUs react to commands sent from the HMI or from an adversary masquerading as the HMI.

• Vulnerabilities such as lack of encryption, authentication and more are present.

However, there are some inappropriate features and behaviors in RICS-el that limit the set of possible attack vectors and result in unrealistic attack impacts:

• The emulated RTUs have embedded firewalls with a very strict ruleset.

• There are no control loops, message buffers or internal logic in the emulated RTUs. This means that the RTU cannot be used as a target in RICS-el and attacks cannot exploit properties present in physical RTUs.

• There is no function processing IEC-104 time tags. Attacks that alter time tags do not have an impact on the behavior of RICS-el.

• There is no functioning historian. Not only does that limit targets but it also limits ways of investigating and exploring attack impacts.

• Some protocols are not enabled. Most notably, a protocol for clock synchronization is not present. There are no packets in the network related to clock synchronization, so introducing an attack to desynchronize the system would have no effect.

9.3 Future Work

This section covers several possibilities for moving forward with this research. The attack-bot could improve with regard to functionality, extensibility and accessibility. Further development of the attack-bot and RICS-el could enable RICS-el to become a source of relevant SCADA datasets for security researchers.

To improve accessibility, RICS-el and its projects should push towards becoming open source. The system state reset needs improvement, so that the attack-bot allows attacks to be terminated remotely without the need to restart any hardware. There is also a need to change how the IEC-104 protocol is used in RICS-el. Packets should be assigned correct time tags, and control loops should be added or expanded to account for this information.

Extensibility could be improved through the addition of new tools to the attack-bot, such as password cracking or reconnaissance attacks. Another way of improving extensibility is to develop the attack-bot’s understanding of its surroundings, making it appear more human. RICS-el’s extensibility can be improved by enabling the historian and NTP; the servers are already in place but are not operational.

Another direction is to involve more of RICS-el or to focus on different parts of it. In this study, attack vectors involved neither the enterprise layer nor the power grid of the SCADA system. This would be interesting to explore because it expands the scope of possible attacks and errors while placing new demands on the attack-bot and dataset generation.

Another path forward is to install the attack-bot in another environment. This would be an interesting comparative study, further broadening the understanding of what is important when creating datasets using this method. If that environment was not running IEC-104, the study could also illustrate how the attack-bot would adapt to a new communication protocol. If one wanted to focus more on the IEC-104 aspects of RICS-el, there are two aspects to consider: the use of more than one threat actor, and the introduction of physical circuits and hardware in RICS-el.

A final path forward is a deeper analysis of the generated datasets. What other aspects not covered in this study made the datasets good or bad? What would be needed from the attack-bot or testbed to make up for the deficits, and what helped to create the good parts? The attacks, or the datasets, should also be tested against a NIDS. In that way, the attacks could be improved, becoming stealthier while still having an impact. This, in turn, would deliver better attacks and therefore better datasets for NIDS development.

A Appendix Attack-Bot Configurations

A.1 List of Flags

Table A.1 describes what flags the attack-bot recognized.

| Abbreviated flag | Verbose flag | Description |
| --- | --- | --- |
| -a | --Attack | Used to specify attack or attack type. |
| -h | --Help | Used to get a short explanation of the attack-bot and how to use it. |
| -v | --Verbose | Not used. Enable additional attack logging. |
| -s | --Silent | Used to disable user-input. Default when used remotely. |
| -d | --Delay | Option disabled. Used to delay execution of an attack. |
| -t | --Time | Used to specify for how long an attack should be generated. Excluding this flag while using the -s flag would result in an attack running until the process is killed. |
| -r | --Runtime | Identical to -t. |
| -p | …haviour> | Used to launch the attack-bot with pre-programmed behaviours. Choices: Novice, random, stealthy and opportunist. |
| -c <filepath> | --Config <filepath> | Used to launch the attack-bot with a specific configuration file. |

Table A.1: List of flags

For example, running the command python3 attack-bot.py -a mitm -t 300 -s would run a random MitM-attack targeting default IP addresses for five minutes, without asking the user for additional input.
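As an illustration of how such a command line could be parsed, the following is a minimal sketch using Python's standard argparse module. It is not the attack-bot's actual implementation: the destination names, the integer types, the lowercase behaviour choices and the omission of a verbose name for -p are assumptions, and argparse supplies its own -h/--help rather than the --Help spelling listed in Table A.1.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags in Table A.1 (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="attack-bot.py")  # adds -h/--help itself
    parser.add_argument("-a", "--Attack", dest="attack",
                        help="attack or attack type")
    parser.add_argument("-v", "--Verbose", dest="verbose", action="store_true",
                        help="enable additional attack logging")
    parser.add_argument("-s", "--Silent", dest="silent", action="store_true",
                        help="disable user input")
    parser.add_argument("-d", "--Delay", dest="delay", type=int, default=0,
                        help="delay before the attack starts (assumed seconds)")
    # -r/--Runtime is described as identical to -t/--Time, so both share one
    # destination. The example in the text maps 300 to five minutes (seconds).
    parser.add_argument("-t", "--Time", "-r", "--Runtime", dest="time",
                        type=int, default=None,
                        help="attack duration in seconds")
    # Only the short form is defined, since the verbose name is not
    # recoverable from the table.
    parser.add_argument("-p", dest="behaviour", type=str.lower,
                        choices=["novice", "random", "stealthy", "opportunist"],
                        help="pre-programmed behaviour")
    parser.add_argument("-c", "--Config", dest="config", metavar="<filepath>",
                        help="configuration file")
    return parser


# The example command from the text: python3 attack-bot.py -a mitm -t 300 -s
args = build_parser().parse_args(["-a", "mitm", "-t", "300", "-s"])
```

With this sketch, omitting -t leaves the duration unset, which matches the described behaviour of an attack running until the process is killed when -s is used.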


A.2 Configfile Options

The contents of the file attack-bot-settings.cfg:

# To adjust the settings of the bot, change the values of a field.
# Comments could be added to the file using an octothorpe (#)

# IP addresses of targets. If only one target is used, leave the other field blank
IP1: 171.173.192.211
IP2: 171.173.192.1
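A file in this format can be read with a few lines of Python. The sketch below is an assumption about how such parsing might be done, not the attack-bot's own code; the function name and the choice to represent a blank field as None are hypothetical.

```python
from typing import Dict, List, Optional

# Example contents in the format shown above, with the second target left blank.
SETTINGS = """\
# IP addresses of targets. If only one target is used, leave the other field blank
IP1: 171.173.192.211
IP2:
"""


def parse_settings(text: str) -> Dict[str, Optional[str]]:
    """Parse 'KEY: value' lines, ignoring '#' comments and empty lines."""
    settings: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip() or None  # blank field -> None
    return settings


settings = parse_settings(SETTINGS)
# Keep only the targets that were actually filled in.
targets: List[str] = [ip for ip in (settings.get("IP1"), settings.get("IP2")) if ip]
```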

B Appendix Attack-Code Template

The attack-code template had the following four functions:

1. An initializing function, setting up the attack. For example, setting up a hook to the TCP stack to monitor and/or modify packets.

2. A finalizing function freeing resources after the attack. For example, deleting hooks, threads and allocated memory.

3. A function writing packet information to a log.

4. A function parsing the TCP packets. This function is called when interrupted by the hook to the TCP stack. A description of this function using pseudo-code:

If the captured packet is of relevance to the ongoing attack then
    Prevent the packet from being forwarded.
    /* Do something... Such as: Packet manipulation, store payload
       values or generate a response to the packet source. */
    If the original receiver should receive a packet then
        Send the appropriate packet to the original receiver.
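The four functions of the template can be sketched as a small Python class. This is an illustrative skeleton only: the class, method and field names are hypothetical, packets are plain in-memory objects rather than traffic captured from a TCP stack, and forwarding packets that are not relevant to the attack is an assumption, since the pseudo-code does not state it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Packet:
    """Hypothetical stand-in for a captured TCP packet."""
    src: str
    dst: str
    payload: bytes


class AttackTemplate:
    def __init__(self, forward: Callable[[Packet], None]) -> None:
        self.forward = forward            # delivers a packet to its receiver
        self.log: List[str] = []
        self.active = False

    def setup(self) -> None:
        """1. Initialise the attack (a real hook would be installed here)."""
        self.active = True

    def teardown(self) -> None:
        """2. Free resources after the attack."""
        self.active = False

    def log_packet(self, packet: Packet, action: str) -> None:
        """3. Write packet information to a log."""
        self.log.append(f"{action} {packet.src}->{packet.dst} {packet.payload.hex()}")

    def is_relevant(self, packet: Packet) -> bool:
        return self.active

    def manipulate(self, packet: Packet) -> Optional[Packet]:
        """Return the packet the receiver should get, or None to send nothing."""
        return packet

    def on_packet(self, packet: Packet) -> None:
        """4. Parse a captured packet; called by the hook."""
        if not self.is_relevant(packet):
            self.forward(packet)          # assumption: other traffic passes through
            return
        self.log_packet(packet, "intercepted")   # the original is held back
        replacement = self.manipulate(packet)
        if replacement is not None:
            self.forward(replacement)


class ZeroPayload(AttackTemplate):
    """Harmless example manipulation: replace the payload with zero bytes."""
    def manipulate(self, packet: Packet) -> Optional[Packet]:
        return Packet(packet.src, packet.dst, bytes(len(packet.payload)))


delivered: List[Packet] = []
attack = ZeroPayload(delivered.append)
attack.on_packet(Packet("hmi", "rtu", b"\x01\x02"))   # before setup: unchanged
attack.setup()
attack.on_packet(Packet("hmi", "rtu", b"\x01\x02"))   # during attack: modified
attack.teardown()
```

Separating the relevance check from the manipulation step keeps each new attack limited to overriding one or two methods.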

Bibliography

[1] Sridhar Adepu, Nandha Kumar Kandasamy, Jianying Zhou, and Aditya Mathur. “Attacks on Smart Grid: Power Supply Interruption and Malicious Power Generation”. In: International Journal of Information Security volume 19. 2020, pp. 189–211. DOI: 10.1007/s10207-019-00452-z.

[2] Sridhar Adepu, Nandha Kumar Kandhasamy, and Aditya Mathur. “EPIC: An Electric Power Testbed for Research and Training in Cyber Physical Systems Security”. In: Computer Security. November. Cham: Springer International Publishing, 2019, pp. 37–52. ISBN: 978-3-030-12786-2. DOI: 10.1007/978-3-030-12786-2_3.

[3] Sridhar Adepu and Aditya Mathur. “Assessing the Effectiveness of Attack Detection at a Hackfest on Industrial Control Systems”. In: IEEE Transactions on Sustainable Computing (2018). DOI: 10.1109/TSUSC.2018.2878597.

[4] Alexander Tlyapov Alexander Timorin. SCADA Security Deep Inside. 2013. URL: http://www.scada.sl/2013/11/scada-security-deep-inside.html (visited on 12/04/2019).

[5] Magnus Almgren, Peter Andersson, Gunnar Björkman, Mathias Ekstedt, Jonas Hallberg, Simin Nadjm-Tehrani, and Erik Westring. “RICS-el: Building a National Testbed for Research and Training on SCADA Security (Short Paper)”. In: Critical Information Infrastructures Security (CRITIS). Cham: Springer International Publishing, 2019, pp. 219–225. ISBN: 978-3-030-05849-4. DOI: 10.1007/978-3-030-05849-4_17.

[6] Daniele Antonioli, Hamid Reza Ghaeini, Sridhar Adepu, Martín Ochoa, and Nils Ole Tippenhauer. “Gamifying Education and Research on ICS Security: Design, Implementation and Results of S3”. In: (2017). arXiv: 1702.03067. URL: http://arxiv.org/abs/1702.03067.

[7] Adnan Anwar and Abdun Naser Mahmood. “Cyber Security of Smart Grid Infrastructure”. In: The State of the Art in Intrusion Prevention and Detection. Vol. abs/1401.3936. CRC Press, Taylor & Francis Group, 2014, pp. 449–472.

[8] Alessio Baiocco and Stephen D. Wolthusen. “Causality Re-Ordering Attacks on the IEC 60870-5-104 Protocol”. In: 2018 IEEE Power Energy Society General Meeting (PESGM). IEEE, 2018, pp. 1–5. ISBN: 9781538677032. DOI: 10.1109/PESGM.2018.8586010.


[9] Alessio Baiocco and Stephen D. Wolthusen. “Indirect Synchronisation Vulnerabilities in the IEC 60870-5-104 Standard”. In: Proceedings - 2018 IEEE PES Innovative Smart Grid Technologies Conference Europe, ISGT-Europe 2018. IEEE, 2018, pp. 1–6. ISBN: 9781538645055. DOI: 10.1109/ISGTEurope.2018.8571604.

[10] Justyna Chromik. “Process-aware SCADA traffic monitoring: a local approach”. PhD thesis. University of Twente, 2019. ISBN: 9789036548014. DOI: 10.3990/1.9789036548014.

[11] M. H. Cintuglu, O. A. Mohammed, K. Akkaya, and A. S. Uluagac. “A Survey on Smart Grid Cyber-Physical System Testbeds”. In: IEEE Communications Surveys Tutorials. Vol. 19. 1. IEEE, 2017, pp. 446–464.

[12] Kyle Coffey, Richard Smith, Leandros Maglaras, and Helge Janicke. “Vulnerability Analysis of Network Scanning on SCADA Systems”. In: Security and Communication Networks. Vol. 2018. USA: John Wiley & Sons, Inc., 2018. DOI: 10.1155/2018/3794603.

[13] International Electrotechnical Commission. IEC 60870-5-104. 2016. URL: ://webstore.iec.ch/publication/25035.

[14] Carlos Garcia Cordero, Emmanouil Vasilomanolakis, Aidmar Wainakh, Max Mühlhäuser, and Simin Nadjm-Tehrani. “On Generating Network Traffic Datasets with Synthetic Attacks for Intrusion Detection”. In: ACM Transactions on Privacy and Security (2021). DOI: 10.1145/3424155.

[15] R. Al-Dalky, O. Abduljaleel, K. Salah, H. Otrok, and M. Al-Qutayri. “A Modbus traffic generator for evaluating the security of SCADA systems”. In: 2014 9th International Symposium on Communication Systems, Networks Digital Sign (CSNDSP). 2014, pp. 809–814. DOI: 10.1109/CSNDSP.2014.6923938.

[16] G. Dondossola, G. Garrone, J. Szanto, G. Deconinck, T. Loix, and H. Beitollahi. “ICT resilience of power control systems: experimental results from the CRUTIAL testbeds”. In: 2009 IEEE/IFIP International Conference on Dependable Systems Networks. IEEE, 2009, pp. 554–559. DOI: 10.1109/DSN.2009.5270292.

[17] DRAGOS. CRASHOVERRIDE, Analysis of the Threat to Electric Grid Operations. Tech. rep. 2017.

[18] OFFICE OF ELECTRICITY. National SCADA Test Bed Fact Sheet. URL: https://www.energy.gov/sites/prod/files/oeprod/DocumentsandMedia/NSTB_Fact_Sheet_FINAL_09-16-09.pdf (visited on 09/20/2020).

[19] Mohamed Elhoseny and Aboul Ella Hassanien. “Using Wireless Sensor to Acquire Live Data on a SCADA System, Towards Monitoring File Integrity”. In: Dynamic Wireless Sensor Networks: New Directions for Smart Technologies. Cham: Springer International Publishing, 2019, pp. 171–191. ISBN: 978-3-319-92807-4. DOI: 10.1007/978-3-319-92807-4_8.

[20] Nicolas Falliere, Liam O Murchu, and Eric Chien. W32.Stuxnet Dossier. Tech. rep. Symantec Security Response, 2011.

[21] Finjan. TCP/IP Vulnerabilities. 2016. URL: https://blog.finjan.com/tcpip-vulnerabilities/ (visited on 12/09/2019).

[22] August Fundin. 2021 attack RICS. https://gitlab.liu.se/ida-rtslab/public-code/2021_attack_rics. 2021.

[23] Francisco Furtado, Lauren Goh, Sita Rajagopal, Elaine Cheong, and Ericson Thiang. S3-17: SUTD Security Showdown (Event Report). Tech. rep. Centre of Research in Cyber Security, Singapore University of Technology and Design, 2017.


[24] Jonathan Goh, Sridhar Adepu, Khurum Junejo, and Aditya Mathur. “A Dataset to Support Research in the Design of Secure Water Treatment Systems”. In: International Conference on Critical Information Infrastructures Security (CRITIS). Springer International Publishing, 2017, pp. 88–99. DOI: 10.1007/978-3-319-71368-7_8.

[25] Alex Hern. Ukrainian blackout caused by hackers that attacked media company, researchers say. https://www.theguardian.com/technology/2016/jan/07/ukrainian-blackout-hackers-attacked-media-company. Jan. 7, 2016. (Visited on 06/17/2020).

[26] iTrust and Singapore University of Technology and Design. Dataset Characteristics. https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/. (Visited on 06/29/2020).

[27] Rajesh Kalluri, Lagineni Mahendra, R. K. Senthil Kumar, and G. L. Ganga Prasad. “Simulation and impact analysis of denial-of-service attacks on power SCADA”. In: 2016 National Power Systems Conference, NPSC 2016. 1. IEEE, 2017, pp. 1–5. ISBN: 9781467399685. DOI: 10.1109/NPSC.2016.7858908.

[28] Robert Koch, Mario Golling, and Gabi Dreo Rodosek. “Towards Comparability of Intrusion Detection Systems: New Data Sets”. In: TERENA Networking Conference. Vol. 7. 2014.

[29] Kaspersky Lab. Threat landscape for industrial automation systems. H2 2018. https://ics-cert.kaspersky.com/reports/2019/03/27/threat-landscape-for-industrial-automation-systems-h2-2018/. Mar. 27, 2019. (Visited on 06/17/2020).

[30] Antoine Lemay and José M Fernandez. “Providing SCADA network data sets for intrusion detection research”. In: CSET @ USENIX Security Symposium. USENIX Association, 2016.

[31] Antoine Lemay and Jose M Knight Scott Fernandez. “Defending the SCADA network controlling the electrical grid from advanced persistent threats”. PhD thesis. Université de Montréal, 2013. ISBN: 9781321210170.

[32] Chih Yuan Lin and Simin Nadjm-Tehrani. “Understanding IEC-60870-5-104 traffic patterns in SCADA networks”. In: CPSS 2018 - Proceedings of the 4th ACM Workshop on Cyber-Physical System Security, Co-located with ASIA CCS 2018. CPSS ’18. Association for Computing Machinery, 2018, pp. 51–60. ISBN: 9781450357555. DOI: 10.1145/3198458.3198460.

[33] Chih-Yuan Lin and Simin Nadjm-Tehrani. “Timing Patterns and Correlations in Spontaneous SCADA Traffic for Anomaly Detection”. In: 22nd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2019). Chaoyang District, Beijing: USENIX Association, 2019, pp. 73–88. ISBN: 978-1-939133-07-6.

[34] Richard Lippmann, Joshua W. Haines, David J. Fried, Jonathan Korba, and Kumar Das. “The 1999 DARPA Off-Line Intrusion Detection Evaluation”. In: Computer Networks. Vol. 34. 4. USA: Elsevier North-Holland, Inc., Oct. 2000, pp. 579–595. DOI: 10.1016/S1389-1286(00)00139-0.

[35] Beebi Siti Salimah Binte Liyakkathali, Francisco Furtado, Kandasamy Nandha Kumar, and Ivan Lee. The Third International CISS - Critical Infrastructure Security Showdown 2019: Technical Report. Tech. rep. Centre of Research in Cyber Security, Singapore University of Technology and Design, 2019.

[36] Matthew V. Mahoney. “Network Traffic Anomaly Detection Based on Packet Bytes”. In: Proceedings of the 2003 ACM Symposium on Applied Computing. SAC ’03. Melbourne, Florida: Association for Computing Machinery, 2003, pp. 346–350. ISBN: 1581136242. DOI: 10.1145/952532.952601.


[37] M. Mallouhi, Y. Al-Nashif, D. Cox, T. Chadaga, and S. Hariri. “A testbed for analyzing security of SCADA control systems (TASSCS)”. In: ISGT 2011. IEEE, 2011, pp. 1–7. DOI: 10.1109/ISGT.2011.5759169.

[38] Aditya P. Mathur and Nils Ole Tippenhauer. “SWaT: A water treatment testbed for research and training on ICS security”. In: 2016 International Workshop on Cyber-physical Systems for Smart Water Networks, CySWater 2016. Figure 1. IEEE, 2016, pp. 31–36. ISBN: 9781509011612. DOI: 10.1109/CySWater.2016.7469060.

[39] Peter Maynard, Kieran McLaughlin, and Berthold Haberler. “Towards Understanding Man-In-The-Middle Attacks on IEC 60870-5-104 SCADA Networks”. In: ICS-CSR 2014. St Pölten, Austria: BCS, 2014, pp. 30–42. ISBN: 9781780172866. DOI: 10.14236/ewic/ics-csr2014.5.

[40] Peter Maynard, Kieran McLaughlin, and Sakir Sezer. “An Open Framework for Deploying Experimental SCADA Testbed Networks”. In: 5th International Symposium for ICS & SCADA Cyber Security Research 2018. Science Open, 2018, pp. 89–98. DOI: 10.14236/ewic/ics2018.11.

[41] Dave McMillen. Attacks Targeting Industrial Control Systems (ICS) Up 110 Percent. https://securityintelligence.com/attacks-targeting-industrial-control-systems-ics-up-110-percent/. Dec. 27, 2016. (Visited on 06/17/2020).

[42] Thomas Morris and Wei Gao. “Industrial Control System Cyber Attacks”. In: ICS-CSR. 2013. DOI: 10.14236/ewic/ICSCSR2013.3.

[43] A. Nicholson, S. Webber, S. Dyer, T. Patel, and H. Janicke. “SCADA security in the light of cyber-warfare”. In: Computers and Security. Vol. 31. 4. Elsevier Ltd, 2012, pp. 418–436. DOI: 10.1016/j.cose.2012.02.009.

[44] Matoušek Petr. “Description and analysis of IEC 104 Protocol Petr Matoušek”. In: (2017). URL: https://www.fit.vut.cz/research/publication-file/11570/TR-IEC104.pdf.

[45] Durga Samanth Pidikiti, Rajesh Kalluri, R. K. Senthil Kumar, and B. S. Bindhumadhava. “SCADA communication protocols: vulnerabilities, attacks and possible mitigations”. In: CSI Transactions on ICT. Vol. 1. 2. Springer International Publishing, 2013, pp. 135–141. DOI: 10.1007/s40012-013-0013-5.

[46] Qais Qassim, Norziana Jamil, Izham Zainal Abidin, Mohd Rusli, Salman Yussof, Roslan Ismail, Fairuz Abdullah, Norhamadi afar, Hafizah Hasan, and Maslina Daud. “A Survey of SCADA Testbed Implementation Approaches”. In: Indian Journal of Science and Technology. Vol. 10. 2017, pp. 1–8. DOI: 10.17485/ijst/2017/v10i26/116775.

[47] Zhiyun Qian, Zhuoqing Mao, and Yinglian Xie. “Collaborative TCP Sequence Number Inference Attack: How to Crack Sequence Number Under A Second”. In: CCS ’12 (2012), pp. 593–604. DOI: 10.1145/2382196.2382258.

[48] Tanya Roosta, Wei-Chieh Liao, Wei-Chung Teng, and Shankar Sastry. “Testbed Implementation of a Secure Flooding Time Synchronization Protocol”. In: 2008 IEEE Wireless Communications and Networking Conference. IEEE, 2008, pp. 3157–3162. DOI: 10.1109/WCNC.2008.551.

[49] Peter Schneider and Alexander Giehl. “Realistic Data Generation for Anomaly Detection in Industrial Settings Using Simulations”. In: Computer Security. Ed. by Sokratis K. Katsikas, Frédéric Cuppens, Nora Cuppens, Costas Lambrinoudakis, Annie Antón, Stefanos Gritzalis, John Mylopoulos, and Christos Kalloniatis. Cham: Springer International Publishing, 2019, pp. 119–134. ISBN: 978-3-030-12786-2. DOI: 10.1007/978-3-030-12786-2_8.


[50] “Security problems in the TCP/IP protocol suite”. In: SIGCOMM Computer Communication Review. Vol. 19. 2. Association for Computing Machinery, 1989, pp. 32–48. DOI: 10.1145/378444.378449.

[51] Keith Stouffer, Victoria Pillitteri, Suzanne Lightman, Marshall Abrams, and Adam Hahn. Guide to Industrial Control Systems (ICS) Security. National Institute of Standards and Technology Special Publication 800-82 (Revision 2). NIST, 2015. DOI: 10.6028/NIST.SP.800-82r2.

[52] Aleksandr Timorin. iec-60870-5-104.py: IEC-60870-5-104 (IEC 104) protocol discovery tool. https://github.com/atimorin/PoC2013/blob/master/iec-60870-5-104/iec-60870-5-104.py. (Visited on 06/29/2020).

[53] Robert Udd, Mikael Asplund, Simin Nadjm-Tehrani, Mehrdad Kazemtabrizi, and Mathias Ekstedt. “Exploiting Bro for Intrusion Detection in a SCADA System”. In: Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. CPSS ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 44–51. ISBN: 9781450342889. DOI: 10.1145/2899015.2899028.

[54] “WADI: A water distribution testbed for research in the design of secure cyber physical systems”. In: Proceedings - 2017 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks, CySWATER 2017. CySWATER ’17. Association for Computing Machinery, 2017, pp. 25–28. ISBN: 9781450349758. DOI: 10.1145/3055366.3055375.

[55] Wei Gao, T. Morris, B. Reaves, and D. Richey. “On SCADA control system command and response injection and intrusion detection”. In: 2010 eCrime Researchers Summit. 2010, pp. 1–9. DOI: 10.1109/ecrime.2010.5706699.

[56] Bonnie Zhu, Anthony Joseph, and Shankar Sastry. “A taxonomy of cyber attacks on SCADA systems”. In: 2011 IEEE International Conferences on Internet of Things and Cyber, Physical and Social Computing, iThings/CPSCom 2011. IEEE, 2011, pp. 380–388. ISBN: 9780769545806. DOI: 10.1109/iThings/CPSCom.2011.34.
