An Ethereum-Based, Integrity-First Communication Protocol for IoT Devices
by Elizabeth Reilly

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, June 2019

© Massachusetts Institute of Technology 2019. All rights reserved.

Author...... Department of Electrical Engineering and Computer Science May 24, 2019

Certified by...... Michael Siegel Principal Research Scientist Thesis Supervisor

Accepted by ...... Katrina LaCurts Chairwoman, Department Committee on Graduate Theses

An Ethereum-Based, Integrity-First Communication Protocol for IoT Devices by Elizabeth Reilly

Submitted to the Department of Electrical Engineering and Computer Science on May 24, 2019, in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering

Abstract

The use of IoT devices in smart cities, advanced energy delivery systems, manufacturing plants and transportation systems is rapidly increasing. These systems are often responsible for communicating critical data about infrastructure and system state. Despite the significance of IoT devices, many of them lack communication protocols that treat data integrity as a priority. Without data integrity, these systems can become reliant on compromised data and ultimately fail. Attackers can use these vulnerabilities to wage cyber-physical attacks. The light client is an integrity-first communication protocol for IoT devices based on the Ethereum blockchain. This light client ensures that data is not compromised and is lightweight, with a total memory footprint of 1.2 MB. Therefore, this light client is distributed, secure, and light enough to fit on many IoT devices and ensure that integrity is maintained where it is needed most [24].

Thesis Supervisor: Michael Siegel Title: Principal Research Scientist

Acknowledgments

First and foremost I would like to thank my supervisor, Michael Siegel, for continually supporting and encouraging my research. I would also like to thank my colleagues Gregory Falco and Matthew Maloney. Their technical guidance was instrumental in helping me complete this research. I would further like to thank the Cyber Resilient Energy Delivery Consortium (CREDC)1 for all of the support and resources they have given me and for welcoming me into their research group. Finally, I would like to thank my friends and family, and in particular my mother and father, for helping me set up and run all of my tests on their home Wi-Fi network. They have supported me endlessly in both my undergraduate and graduate degrees, and this thesis would not have been possible without them.

1 This material is based in part on work supported by the Department of Energy under Award Number DE-OE0000780. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Contents

1 Introduction

2 Related Works
  2.1 Legacy Communication Protocols
    2.1.1 Modbus
    2.1.2 DNP3
  2.2 Modern IoT Communication Protocols
    2.2.1 DTLS and CoAP
    2.2.2 MQTT
  2.3 IoT Blockchains
    2.3.1 Tangle
    2.3.2 IoT Chain
    2.3.3 IoTex
    2.3.4 NeuroMesh

3 Light Client Implementation
  3.1 Using Ethereum as a Base
  3.2 Avoiding Storing Chain Data
    3.2.1 Nonce
    3.2.2 Gas Price and Gas Limit
  3.3 Reducing Code Size
    3.3.1 Code Removed
    3.3.2 Compiler Flags
    3.3.3 UPX and Code Compression
  3.4 Architecture and Communication

4 Testing Environment
  4.1 Ropsten Network Constraints
  4.2 Testing Dashboard
  4.3 Devices
    4.3.1 Mac OS and Unix
    4.3.2 ThinkPad and Linux
    4.3.3 64 bit ARM
  4.4 Parameters

5 Preliminary Testing
  5.1 Transaction Approval and Removal Time
  5.2 Peer Count Over Time
  5.3 Transaction Size
  5.4 Stress Testing
    5.4.1 Maximum Queue Size
  5.5 Device Comparisons

6 Discussion
  6.1 Comparison to other Blockchains
  6.2 Evaluation as a Communication Protocol
  6.3 Applications
    6.3.1 Smart Cities
    6.3.2 Energy Delivery Systems
    6.3.3 Device Updates
  6.4 Limitations

7 Conclusion
  7.1 Summary of Contributions
  7.2 Future Work

A Tables

B Figures

List of Figures

3-1 Distributed node architecture

4-1 The GL iNet router, a sample 64 bit ARM device, on the left. The GL iNet plugged into a Verizon ethernet port on the right.

5-1 A graph of transaction approval times over a 100 minute period.
5-2 A graph of the time it takes for transactions to be removed from the local queue when using a peer limit.
5-3 A graph of the peer count over time.
5-4 A graph of how the transaction times change with data size.
5-5 A graph of how the transaction cost changes with the data size.
5-6 The number of transactions sent within various time ranges for each device.

B-1 Sample output from the testing dashboard when testing for transaction time. Output is human-readable. Testing dashboard also includes a regex file to strip this output file into a dataset.
B-2 Sample output from the testing dashboard when testing for peer count. Output is human-readable. Testing dashboard also includes a regex file to strip this output file into a dataset.

List of Tables

3.1 The impact of each code reduction method.
3.2 A comparison of the original Ethereum source code to the light client implementation.

4.1 Statistics about the average performance of the Ethereum network [8].
4.2 Size of each cross-compiled binary.

5.1 The average time to send a transaction per device. Taken across 4096 transactions.

6.1 Comparison of different Blockchain algorithms [5][6][13][29].
6.2 Comparison of different IoT communication protocols [12].

A.1 Sizes of the cross-compiled light client for operating systems and architectures that were not explicitly tested in this thesis.

Chapter 1

Introduction

Although the use and prevalence of IoT devices is on the rise in homes and in cities, many of these devices lack proper security [14]. There are hundreds of videos dedicated to demonstrating how to hack into IoT devices in minutes [7][24]. There are currently few communication protocols, and the variation among manufacturers makes standardizing communication difficult. Furthermore, communication becomes even more difficult on devices constrained in memory and processing power, such as smart meters and CCTV security cameras [13]. This research focuses specifically on improving the communication integrity of critical infrastructure IoT.

Integrity-first communication, that is, communication that prioritizes the correctness of the data exchanged, is important for many IoT devices. IoT devices such as smart meters and electronic appliances can be found in the home [14]. They can also play a large role in transportation systems such as subways and traffic control [16]. Furthermore, IoT devices can be found in energy delivery systems or smart cities, where a lack of data integrity can have cyber-physical consequences [24]. The integrity of these devices is critical.

It can be hard to determine a single universal communication protocol for IoT devices because these devices often have different operating systems and configurations [11]. This is even more difficult for IoT devices with limited memory and computational resources, as they often lack the space to host a communication protocol at all [18]. Many integrity-first IoT communication protocols have been suggested, but they are often not scalable to large networks of devices [22]. Given the scale and distributed nature of IoT devices, a suitable integrity-first communication protocol is needed [24]. Blockchain is therefore a good candidate.

Blockchain is a distributed ledger in which different nodes in the system can send transactions and also verify the transactions of other participants. The global blockchain consists of the overall series of approved transactions, in the order in which they were approved. This global blockchain is determined by consensus among the nodes. Essentially, the majority of nodes in the system agree about which transactions should be approved, and these transactions then go into the blockchain, so that the majority of nodes always hold the same blockchain. This blockchain is therefore immutable and secure. Blockchain provides a scalable, distributed record that enforces consensus across all participating nodes [23][24]. Once a transaction has been approved by the majority of nodes, it cannot be altered or deleted unless an attacker controls 51% or more of the nodes. An attacker who does gain control of 51% of the nodes could control the direction in which the chain grows, essentially choosing which transactions to verify. This is known as a '51% attack'. However, the risk of such an attack in a large network is very low, as the attacker would need to gain control over a massive number of nodes. Therefore, large blockchains are effective at guaranteeing the integrity of IoT communications [23][24].

In general, hosting a blockchain node on an IoT device has been difficult because each node must store the entire chain of transactions. Many IoT devices have memory and computational limits which keep them from being able to store the entire chain. While current solutions have bypassed this issue by using lighter clients or hardware solutions, even these models have limitations [1][2][13][24][25].
The Ethereum blockchain light client presented in this paper overcomes these limits. The light client provides integrity-first communication for IoT devices at a small size of only 1.2 MB.

Chapter 2

Related Works

Currently there is no industry standard for how IoT devices should communicate with integrity. The current communication protocols for IoT devices often lack proper security considerations or are not scalable to large networks. Blockchain has only recently been explored as a potential solution to the problem of integrity-first communication for IoT devices. However, modern IoT blockchain implementations are often too heavy, too centralized, or depend on hardware solutions.

2.1 Legacy Communication Protocols

In the past, communication protocols were mainly designed for isolated systems. These isolated systems were modeled on power plants or other private networks. Therefore, communication integrity was often not a top priority. Similarly, the manufacturers of IoT devices did not design these devices to be critical and therefore often did not consider security. As a result, legacy communication protocols often lack scalability and integrity of communication [19].

2.1.1 Modbus

Modbus was one of the original proposed communication mechanisms for IoT devices. Modbus was originally published by Modicon, now known as Schneider Electric, and was intended for isolated systems. It is used for transmitting information over serial lines between electronic devices. Modbus follows the master-slave model, where the device requesting information is the master and the device sending it is the slave. In general, there is usually one master in a system and many more slaves. This creates a source of centralization, which is problematic for the distributed nature of IoT networks. Furthermore, because Modbus was originally designed for isolated systems, the integrity of messages was not taken into consideration [19][15].

2.1.2 DNP3

DNP3 is another early communication protocol for IoT devices, built upon TCP/IP. It has a master-slave structure similar to Modbus, and likewise provides neither message integrity nor an authentication mechanism between master and slave. Therefore, it is also not a good candidate for the distributed integrity-first communication protocol needed by IoT devices [15].

2.2 Modern IoT Communication Protocols

IoT devices with limited resources are often referred to as constrained nodes. These constrained nodes pose unique challenges to communication and security, as they often do not have the memory space or computational ability to host a thorough protocol. Despite these challenges, DTLS (Datagram Transport Layer Security), CoAP (Constrained Application Protocol), and MQTT (Message Queuing Telemetry Transport) attempt to create secure communication protocols for constrained nodes. However, these protocols all suffer from a similar lack of scalability.

2.2.1 DTLS and CoAP

Currently, DTLS is the default security protocol used for sending messages between constrained nodes. DTLS is built upon the UDP protocol, which means it has to deal with packet reordering, datagram loss, and data larger than the size of a datagram network packet [18]. Furthermore, the DTLS protocol is highly vulnerable to denial of service attacks caused by establishing a high number of invalid, half-open, secure sessions. The key storage system on the server side also does not scale well with the number of clients [28]. Therefore, DTLS lacks scalability.

Similarly, the IETF (Internet Engineering Task Force) has attempted to create a standard for communication protocols for constrained nodes. One such protocol is CoAP, which is also built upon UDP and includes a subset of HTTP (HyperText Transfer Protocol) functions that are optimized for resource-scarce environments [26]. However, similar to DTLS, this protocol does not perform well under high load or congestion [26]. It also suffers from the same UDP-related issues as DTLS [24]. Therefore, neither of these communication protocols is able to achieve a distributed, scalable form of integrity-first communication.

2.2.2 MQTT

MQTT is another communication protocol designed specifically for constrained nodes. It is built on TCP (Transmission Control Protocol) and IP (Internet Protocol), meaning it does not rely on UDP but can still suffer from the "TCP meltdown problem" [27]. MQTT has three different Quality of Service levels, which represent reliability standards. The first is level 0, in which a message is delivered at most once and no acknowledgement of reception is required. The second is level 1, in which every message is delivered at least once and acknowledgement of reception is required. The third is level 2, in which a four-way handshake mechanism is used to deliver a message exactly once. Despite these different levels of reliability, MQTT lacks scalability because nodes must connect to a centralized broker in order to perform handshake procedures [22][24].
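The three Quality of Service levels can be summarized in a short Python sketch. The control packet names (PUBLISH, PUBACK, PUBREC, PUBREL, PUBCOMP) come from the MQTT specification rather than from this thesis, and the `QoS` enum and `packets_for` helper are hypothetical names introduced only for illustration.

```python
from enum import IntEnum
from typing import List


class QoS(IntEnum):
    """MQTT Quality of Service levels (hypothetical helper enum)."""
    AT_MOST_ONCE = 0   # no acknowledgement required
    AT_LEAST_ONCE = 1  # acknowledged, duplicates possible
    EXACTLY_ONCE = 2   # four-way handshake


# Control packets exchanged to deliver a single message at each level.
HANDSHAKE = {
    QoS.AT_MOST_ONCE: ["PUBLISH"],
    QoS.AT_LEAST_ONCE: ["PUBLISH", "PUBACK"],
    QoS.EXACTLY_ONCE: ["PUBLISH", "PUBREC", "PUBREL", "PUBCOMP"],
}


def packets_for(level: int) -> List[str]:
    """Return the packet sequence used to deliver one message at `level`."""
    return HANDSHAKE[QoS(level)]


if __name__ == "__main__":
    for level in QoS:
        print(level.value, level.name, " -> ".join(packets_for(level)))
```

Every one of these exchanges passes through the broker, which is the centralization concern raised above.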

2.3 IoT Blockchains

Blockchain has been proven to be highly scalable but has not been largely applied to IoT devices nor explored as a source of integrity-first communication. The main reason for this is that blockchain is memory-heavy and also requires computational resources. Most IoT devices cannot meet these high standards. However, a few IoT specific blockchains have been designed to try and overcome these obstacles.

2.3.1 Tangle

The most well known variation of blockchain specifically designed for IoT is the Tangle, which is accompanied by the IOTA coin [24][25]. The Tangle stores the chain in a DAG (Directed Acyclic Graph) structure to reduce space. It also has a light variation specifically for highly constrained nodes. In order for nodes to get their transactions approved, they must first approve two other transactions in the system. This is how the blockchain algorithm makes progress. Because of this design choice, transactions are fee-less and there are no miners in the system. Without miners, the system has less incentive to grow and does not attract users that do not need their own transactions to be approved. This can open up the system to a 51% attack if the network is not large enough. Furthermore, a centralized coordinator is needed in order to decide which nodes can approve which transactions, or else some transactions would never get approved. This is problematic because it creates a level of centralization and a single point of failure within the network [3]. The Tangle is therefore not secure or distributed enough to provide integrity-first communication for IoT devices.

2.3.2 IoT Chain

IoT Chain uses a DAG structure similar to that of the Tangle and also uses Simplified Payment Verification (SPV) to facilitate operations on smaller devices [1]. SPV allows devices to conduct payment verification using block headers. This means that devices do not need to store entire chain data, and can instead store just the chain headers. IoT Chain also uses Practical Byzantine Fault Tolerance (PBFT) to ensure fast consensus and fast transaction times. PBFT is a state machine replication algorithm that is based on the consistency of message passing. The combination of the DAG structure and PBFT allows transaction times for IoT Chain to be in milliseconds, a feature that can be useful for many IoT devices. Despite these benefits, the primary limitation of IoT Chain is that it requires a specific operating system and linking module. This makes it a hardware solution, which is more difficult to integrate with older devices. Furthermore, some constrained devices do not have enough memory to store block headers. Also, the relatively small size of IoT Chain makes it vulnerable to a 51% attack.

2.3.3 IoTex

IoTex similarly uses PBFT (Practical Byzantine Fault Tolerance) and SPV to ensure fast transaction times and limited memory consumption [2]. The key concept behind IoTex is the idea of blockchains within blockchains. Essentially, the designers conclude that different blockchains should be used for different kinds of IoT devices. This is because different devices have different features and by separating these features into different blockchains, no one device has to store large chains. For example, one chain might record smart contracts and another one might store transactions. There is one root blockchain that manages all the blockchains. This creates a layer of centralization that can be problematic in large, distributed networks of IoT devices. Furthermore, like the Tangle and IoT Chain, the small size of IoTex makes it vulnerable to a 51% attack [24].

2.3.4 NeuroMesh

One blockchain that successfully provides secure communication for IoT devices, and which is most similar to this design, is NeuroMesh [13][24]. NeuroMesh functions as a friendly botnet to fight against other botnets and communicates security commands to IoT devices using the Bitcoin blockchain as the communication protocol. While this technology does provide integrity-first communication because of the size of the Bitcoin network (minimizing the 51% attack risk), it has several operational constraints. First, NeuroMesh is only able to send 80 bytes of data at a time because of the limits of Bitcoin transactions. This might be fine for sending security commands to IoT devices, but is insufficient to handle substantial data transfers. Second, the Bitcoin network is slow and expensive compared with Ethereum [29]. Finally, NeuroMesh uses SPV, which must store block headers and requires additional memory space [13][23]. This limits how small it can become. The light client design overcomes many of the limitations of NeuroMesh.

Chapter 3

Light Client Implementation

There were many factors to consider in the implementation of the light client. One of the most important considerations was size. The goal was to create the smallest blockchain implementation possible while still maintaining integrity and functionality. To accommodate this goal, it was decided that chain data should not be stored at all as it requires too much space. This led to many challenges in how to maintain functionality without storing chain data, such as maintenance of the nonce. Despite these obstacles, the fully functioning light implementation of Ethereum is only 1.2 MB.

3.1 Using Ethereum as a Base

The first design consideration to be made was what blockchain algorithm to use. Ethereum is a good candidate for the light blockchain design because it is widely used, scalable, and has fast transaction times and small transaction fees. The size of its network makes a 51% attack unlikely, and its transactions are faster than in Bitcoin. Furthermore, data transfer is not limited to 80 bytes as in Bitcoin. The cost of a transaction is instead proportional to the amount of data sent. Like Bitcoin, Ethereum uses proof of work as a decentralized consensus algorithm to add blocks to the blockchain. The coins exchanged in the Ethereum network are Ethers rather than Bitcoins [29]. Therefore, Ethereum is completely decentralized, secure, and large but has slightly advantageous transaction times and data storage limits. This makes Ethereum the best blockchain variation for IoT devices.

3.2 Avoiding Storing Chain Data

A considerable challenge was avoiding the need to store the Ethereum chain data (including headers) on constrained nodes so that they still could participate in the Ethereum network. Nodes rely on chain data to send transactions because transactions must include the correct nonce, or previous transaction count, and appropriate gas, or cost to send the transaction, in order to be approved [29]. Therefore, without chain data, the nonce and gas estimates must be stored locally and correctly in order for the light client to continue to function.

3.2.1 Nonce

A node's nonce is a count of its outgoing transactions [29]. Every time a node wants to make a new transaction it must include its current nonce, otherwise the transaction will be rejected. Keeping the nonce consistent is crucial for the light client. Two different options were considered for tracking the nonce.

In the first, the system sends a call to a full node hosted by the site Infura to query for its nonce [10]. The downside of this option is that Infura is a third party that could be shut down at any time. Furthermore, the HTTP call adds about 0.6 MB of space, which is about a 50% increase in storage size, too much for certain constrained nodes.

In the second, a transaction is assumed to have been sent, and the nonce is incremented, after it has been broadcast to 25 peers. The number 25 was chosen because this is the default maximum peer count a node can have at any given time [29]. Nodes are therefore designed to operate while connected to at most 25 peers, which suggests that the developers of Ethereum considered this number sufficient. While this could lead to a transaction being removed from the local queue before it is actually sent, in practice this is unlikely. Users of the light client may prefer one method of accounting for the nonce over another.
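The second option can be sketched as a small counter. This is a minimal illustration in Python, not the light client's actual code; the class name `LocalNonceTracker`, its method, and the peer identifiers are hypothetical, and only the threshold of 25 peers is taken from the text above.

```python
class LocalNonceTracker:
    """Tracks an account nonce locally, without any chain data.

    Assumption: the pending transaction counts as sent once it has been
    broadcast to `required_peers` distinct peers (25 in the text above).
    """

    def __init__(self, start_nonce: int = 0, required_peers: int = 25):
        self.nonce = start_nonce
        self.required_peers = required_peers
        self._reached = set()  # distinct peers the pending transaction reached

    def record_broadcast(self, peer_id: str) -> bool:
        """Record one broadcast; return True when the nonce is incremented."""
        self._reached.add(peer_id)
        if len(self._reached) >= self.required_peers:
            self.nonce += 1        # the next transaction uses the new nonce
            self._reached.clear()  # start counting for the next transaction
            return True
        return False


if __name__ == "__main__":
    tracker = LocalNonceTracker(start_nonce=7)
    for i in range(25):
        tracker.record_broadcast("peer-%d" % i)
    print(tracker.nonce)
```

Counting distinct peers rather than raw broadcasts is a design choice of this sketch, so that a repeated send to the same peer does not advance the nonce early.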

3.2.2 Gas Price and Gas Limit

In addition to the nonce, nodes need to have correct gas limit and gas price values in order to ensure that their transactions are mined and stored in the chain [24][29]. Gas pays for the computation of the transaction regardless of whether it is accepted or not. The transaction fee is at most gas limit * gas price. These values have been hard coded at a gas price of 30 Gwei (equal to 0.00000003 ether) and a gas limit of 210,000 units of gas because these values are likely to get transactions added to the chain. Therefore, light nodes do not need to query for this data. With the gas and nonce set correctly, the light node can exchange transactions without storing chain data.

Overall Code Reduction

| Method | Size Before (MB) | Size After (MB) | Percent Reduced |
|---|---|---|---|
| Code Removal | 29.6 | 5.4 | 81.76% |
| Compiler Flags | 5.4 | 5 | 8% |
| UPX | 5 | 1.2 | 76% |

Table 3.1: The impact of each code reduction method.
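As a worked example of the fee arithmetic in Section 3.2.2, the following Python sketch multiplies the hard-coded gas limit by the hard-coded gas price. It treats the product as the upper bound on the fee and assumes the standard conversion of 10^9 Gwei per ether; the function names are hypothetical and are not part of the light client.

```python
from fractions import Fraction

GWEI_PER_ETHER = 10**9  # standard Ethereum unit conversion

# Hard-coded values quoted in Section 3.2.2.
GAS_PRICE_GWEI = 30
GAS_LIMIT = 210_000


def max_fee_gwei(gas_limit: int = GAS_LIMIT,
                 gas_price_gwei: int = GAS_PRICE_GWEI) -> int:
    """Upper bound on the transaction fee: gas limit * gas price."""
    return gas_limit * gas_price_gwei


def gwei_to_ether(gwei: int) -> Fraction:
    """Exact conversion from Gwei to ether, avoiding float rounding."""
    return Fraction(gwei, GWEI_PER_ETHER)


if __name__ == "__main__":
    fee = max_fee_gwei()
    print(fee, "Gwei =", float(gwei_to_ether(fee)), "ether")
```

With these values the bound works out to 6,300,000 Gwei, or 0.0063 ether, per transaction.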

3.3 Reducing Code Size

Eliminating the need to store chain data greatly reduces the amount of storage needed on the node. However, the current code for Ethereum is still about 30 MB without chain data. Therefore, as much code as possible has been stripped from the node in order to reduce its size [24].

Original Code vs. Current Code

| | Original | Current | Percent Reduced |
|---|---|---|---|
| Size | 29.6 MB | 1.2 MB | 95.95% |
| Lines of Code | 864,045 | 596,549 | 30.95% |
| Number of Files | 186 | 26 | 86.02% |

Table 3.2: A comparison of the original Ethereum source code to the light client implementation.

3.3.1 Code Removed

Since the light nodes are not acting as miners in the system, many aspects of the code base can be removed for the light client, such as any code related to mining. Furthermore, without chain data, many databases and consensus algorithms could also be removed. As seen in Table 3.2, the light client has been reduced from 29.6 MB to 1.2 MB while preserving the ability to send transactions and query other nodes. In total, 267,496 lines of code and 160 files were removed. Although a size of about 1 MB is small enough for many smart city devices (such as CCTV cameras), it is possible that this size can be reduced even more in the future.

3.3.2 Compiler Flags

Some compiler flags were also used to reduce code size. When built with "-s" and "-w" (flags passed to the Go linker through -ldflags), the size of the Go binary can be reduced by about 8%, as seen in Table 3.1. This is because these flags strip out symbol and debugging information that is often not necessary.

3.3.3 UPX and Code Compression

UPX was also used to reduce the code size. UPX is able to compress Go binaries to about 24% of their original size with the "--brute" option, as seen in Table 3.1. After compression, the binary was still able to run and function normally. Therefore, there is no need to manually decompress the binary once it is on a device. This greatly reduced the code size.
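The percentages in Tables 3.1 and 3.2 can be rechecked from the sizes they report. The short Python sketch below is only an arithmetic check; the `percent_reduced` helper and the `STAGES` list are hypothetical names, and the sizes are the rounded values printed in Table 3.1.

```python
def percent_reduced(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# (method, size before in MB, size after in MB), as printed in Table 3.1.
STAGES = [
    ("Code removal", 29.6, 5.4),
    ("Compiler flags", 5.4, 5.0),
    ("UPX", 5.0, 1.2),
]

if __name__ == "__main__":
    for name, before, after in STAGES:
        print("%-15s %.2f%%" % (name, percent_reduced(before, after)))
    # Overall reduction from the original 29.6 MB to the final 1.2 MB.
    print("%-15s %.2f%%" % ("Overall", percent_reduced(29.6, 1.2)))
```

From the rounded sizes, the compiler-flag step computes to roughly 7.4%, so the 8% in Table 3.1 presumably reflects sizes measured with more precision than the table shows. The other rows, and the overall 95.95% in Table 3.2, match.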

3.4 Architecture and Communication

As seen in Figure 3-1, the light client is able to interact with full nodes and other light nodes in a distributed manner. The light nodes do not store any chain data but can both send and receive data, as indicated by the two-way transaction broadcast arrows.

Figure 3-1: Distributed node architecture

The light nodes send data by adding that data to a transaction and then sending the transaction to another node. The cost of that transaction is proportional to the amount of data sent. When the light node wishes to send a transaction, it pings all Ethereum peers that it is connected to, just like any full node would. Only a full node can process the light node's data, which is indicated by the red "x" shown between light node communications. To process the light node's data means to actually mine the block containing that transaction and add that transaction to the blockchain. Because light nodes do not store the chain, they cannot actually add the transaction to the chain. When a full node successfully mines the light node's block (indicated by a checkmark), other full nodes will no longer be able to mine the same block (indicated by an "x"). After the light node's block is successfully mined, it is posted to the Ethereum network as an immutable value on the ledger. The fee that the light client must pay is usually on the order of a few cents [17][24].

Light clients are able to receive data from the full nodes by pulling data from the blockchain. Although these light nodes will not store the blockchain themselves, they can query other nodes for information about the chain. Therefore, these nodes can query to see whether any recent transactions have been sent to them and then download these specific, small transactions. In this way, the light nodes and full nodes are able to send and receive messages effectively, and the light node does not need to store the entire chain data [24].

If the light client has a large amount of information it wants to send all at once, it can queue transactions and aggregate the transfer. The gas limit of an Ethereum transaction is about 3,000,000 gas, which allows up to 780 kB of data per transaction [4]. Therefore, at any given time, transactions are queued up to this amount.
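The queue-and-aggregate behavior described above can be illustrated with a simple greedy batching routine. This is a sketch under stated assumptions, not the light client's implementation: `batch_payloads` and its `max_bytes` parameter are hypothetical, and the per-transaction limit is left as an argument so that the figure quoted above, or any other, can be supplied.

```python
from typing import Iterable, List


def batch_payloads(payloads: Iterable[bytes],
                   max_bytes: int) -> List[List[bytes]]:
    """Greedily group queued payloads so that no batch exceeds max_bytes.

    Each inner list stands for the data carried by one transaction.
    """
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    size = 0
    for payload in payloads:
        if len(payload) > max_bytes:
            raise ValueError("payload exceeds the per-transaction limit")
        if size + len(payload) > max_bytes:
            batches.append(current)  # close the full batch
            current, size = [], 0
        current.append(payload)
        size += len(payload)
    if current:
        batches.append(current)      # flush the final partial batch
    return batches


if __name__ == "__main__":
    queue = [b"temp=21.5", b"door=closed", b"meter=00417"]
    for i, batch in enumerate(batch_payloads(queue, max_bytes=20)):
        print(i, batch)
```

Keeping each payload intact inside a batch, instead of concatenating raw bytes, preserves message boundaries for the receiver; that choice belongs to this sketch and is not described in the thesis.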

Chapter 4

Testing Environment

Average Ethereum Statistics

| Statistic | Value |
|---|---|
| Transaction Time | 13.3 s |
| Transaction Fee | 0.0012 ETH ($0.191 USD) |
| Transaction Value | 3.51 ETH ($560.85 USD) |
| Transactions per hour | 25,879 |

Table 4.1: Statistics about the average performance of the Ethereum network [8].

4.1 Ropsten Network Constraints

Testing at a large scale on the Ethereum network is expensive, given that a transaction costs around 10 cents on average [8]. Therefore, the light client was tested using the Ropsten test network, which most closely emulates the real Ethereum network. The Ropsten test network runs the same protocol as Ethereum. The coins on this network are called rETHs and they have no real monetary value. They are obtained through various free faucets. It is common for developers to test their apps on the Ropsten network before deploying to the real Ethereum network [20]. In order to gauge the effectiveness of the light client, the existing constraints of the Ethereum network are displayed in Table 4.1. The Ropsten network statistics are assumed to be similar.

In both Ethereum and Ropsten, the Wi-Fi router must be set up to allow port forwarding on port 30303. Most Wi-Fi routers have this feature turned off by default. Therefore, the router settings had to be changed for all devices. A private Wi-Fi network was used instead of the public MIT network so that these port forwarding rules could be enabled.

4.2 Testing Dashboard

A testing dashboard was created that can be run from different devices in order to test the effectiveness of the light client in broadcasting transactions. This test code is configurable and can be adjusted to send transactions in specific bulk sizes or at specific time intervals. It can monitor the time it takes for these transactions to be confirmed by the network and also for the transactions to be removed from the local queue.

This test code is typically run after the node has accumulated at least 20 peers. This is because transaction times are most accurate when the node is in its steady state, meaning that the node has nearly the maximum number of peers, which defaults to 25 [29]. Therefore, once the node has reached at least 20 peers, it begins sending transactions at a specified rate. In some steady state tests, transactions are sent slowly to monitor the effectiveness and variance of the network over time. In other tests, large numbers of transactions are sent all at once to measure the robustness of the system.

In order to gauge an accurate transaction confirmation time, the test code queries Infura. When Infura indicates that the nonce of an address has changed, it means that the transaction was approved [10]. Therefore, the difference between the time a transaction was sent and the time at which the nonce changes is equal to the transaction time. This gives an approximate transaction time, with a slight lag from the call to Infura.

A similar method is used to test the time it takes for a transaction to leave the transaction queue. When a transaction leaves the queue, the difference between the time stamp when it was removed and the time stamp when it was first broadcast is determined. This represents the total time it took for the transaction to leave the queue. The difference between this number and the time it took for the transaction to be approved also represents the time the transaction stayed in the queue after already being confirmed. Finally, the testing dashboard can monitor other aspects of the system, such as how the number of peers in the network changes over time.
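The nonce-polling measurement described in this section can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: `get_nonce` stands in for whatever call returns the account's current nonce from a remote full node (Infura in the thesis), and the function name, polling interval, and timeout are all hypothetical.

```python
import time
from typing import Callable


def time_until_nonce_changes(get_nonce: Callable[[], int],
                             sent_nonce: int,
                             poll_interval: float = 1.0,
                             timeout: float = 600.0,
                             clock: Callable[[], float] = time.monotonic,
                             sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll until the reported nonce differs from `sent_nonce`.

    Returns the elapsed seconds, which approximates the confirmation
    time plus the lag introduced by polling.
    """
    start = clock()
    while clock() - start < timeout:
        if get_nonce() != sent_nonce:
            return clock() - start
        sleep(poll_interval)
    raise TimeoutError("nonce did not change before the timeout")


if __name__ == "__main__":
    # Simulated remote node: the nonce changes on the third query.
    answers = iter([5, 5, 6])
    elapsed = time_until_nonce_changes(lambda: next(answers), sent_nonce=5,
                                       poll_interval=0.01)
    print("confirmed after about %.2f s" % elapsed)
```

The `clock` and `sleep` parameters are injected only so the routine can be exercised without a network or real waiting; the polling interval bounds the measurement error that the text refers to as a slight lag.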

4.3 Devices

Cross Compiled Code Sizes

| | Unix | Linux | ARM 64 |
|---|---|---|---|
| Size (MB) | 1.2 | 3.5 | 1.6 |

Table 4.2: Size of each cross-compiled binary.

The light client was tested on various devices representative of the kinds of IoT devices this technology will eventually be put on. Different architectures and operating systems were targeted to ensure that the light client was operable on a variety of devices. Table 4.2 shows the cross-compiled size of the light client binary for the operating systems that were tested.

4.3.1 Mac OS and Unix

The first device tested was a Mac running Mac OS, which is Unix-based. The binary was compiled normally at a size of 1.2 MB.

4.3.2 ThinkPad and Linux

The next device was a ThinkPad laptop with Linux. The code had to be compiled specially for Linux amd64 and was put on the Linux laptop using SCP (Secure Copy Protocol). The size of the light client on this device is 3.5 MB, as seen in Table 4.2.

4.3.3 64 bit ARM

Figure 4-1: The GL iNet router, a sample 64 bit ARM device, on the left. The GL iNet plugged into a Verizon ethernet port on the right.

The next device tested was the GL iNet Travel Router B1300 [9]. This device uses the 64 bit ARM architecture. The firmware is based on OpenWrt/LEDE, and the device has 32 MB of NOR flash storage. The binary was cross compiled in order to run on the router. It had to be statically linked in order to draw in packages that the GL router did not have access to. As a result, the size of this binary ended up being about 1.6 MB. This was still sufficiently small to be placed on the router. The binary was sent to the router using SCP. The router had to be configured to allow port forwarding on port 30303, the port used for the Ethereum and Ropsten peer-to-peer networks.

4.4 Parameters

There are multiple parameters that must be considered in the testing of this device:

Latency: How long it takes for a transaction to be broadcast.
Peer Count: How many peers the light client is connected to.
Integrity: Whether the data inside the transaction is preserved.
Liveness: Whether the light client can continue to send transactions over time.

These are the parameters that were measured in the testing process. Latency is measured to confirm that the device can send transactions in a reasonable time frame. Peer count is measured because it represents the connectivity of the device. Integrity is at the core of the light client design, so it is important to verify that transaction data remains intact. Finally, liveness ensures that the light client can remain active over long periods of time and recover from failures.

Chapter 5

Preliminary Testing

5.1 Transaction Approval and Removal Time

Figure 5-1: A graph of transaction approval times over a 100 minute period.

In the test shown in Figure 5-1, the testing dashboard shows how the transaction confirmation time varies over time. The testing dashboard sent 1 transaction per minute for 100 minutes. The test begins once the node has accumulated 20 peers, as this represents the steady state of the node. The dashboard then monitors how long it takes for the transactions to be approved using a call to Infura. As seen in Figure 5-1, transaction times vary from less than 10 seconds to around 60 seconds for all 3 operating systems. The three operating systems have comparable transaction times throughout the 100 minutes, and all are able to keep sending transactions over time.

Figure 5-2: A graph of the time it takes for transactions to be removed from the local queue when using a peer limit.

Similarly, the time it takes for the transactions to leave the local queue is tracked, as seen in Figure 5-2. As mentioned in Section 3.2.1, one alternative to using a call to Infura to keep track of the nonce is to assume that a transaction has been sent after it was broadcast to a certain number of peers. In this test that number was 20. Figure 5-2 shows that the local queue removal time often spikes and then drops off. These spikes correspond to the peer count falling below 20, as the transaction is not removed from the queue until it has been sent to 20 peers. However, on all three devices this behavior was consistent and no transactions were dropped prematurely. While this test primarily measures latency, it also tests liveness. The code ran for 100 minutes to ensure that the system did not crash, and transactions continued to be sent throughout.
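The peer-limit rule for removing transactions from the local queue can be sketched as below. This is an illustrative model, not the light client's implementation: the class and method names are hypothetical, and the only detail taken from the text is that a transaction is removed once it has been broadcast to 20 peers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class PendingTransaction:
    first_broadcast_at: float
    peers_sent_to: Set[str] = field(default_factory=set)


class LocalQueue:
    """Hypothetical local queue that drops a transaction after enough broadcasts."""

    def __init__(self, peer_threshold: int = 20):
        self.peer_threshold = peer_threshold
        self.pending: Dict[str, PendingTransaction] = {}

    def add(self, tx_id: str, now: float) -> None:
        self.pending[tx_id] = PendingTransaction(first_broadcast_at=now)

    def record_broadcast(self, tx_id: str, peer_id: str, now: float) -> Optional[float]:
        """Record a broadcast; return the removal time once the threshold is met."""
        tx = self.pending[tx_id]
        tx.peers_sent_to.add(peer_id)  # a set, so repeat sends to one peer count once
        if len(tx.peers_sent_to) >= self.peer_threshold:
            del self.pending[tx_id]
            return now - tx.first_broadcast_at
        return None
```

Under this rule a transaction stays queued for as long as fewer than the threshold number of distinct peers have received it, which is consistent with removal times spiking whenever the peer count falls below 20.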

5.2 Peer Count Over Time

Figure 5-3: A graph of the peer count over time.

The testing dashboard can also monitor how the peer count changes over time. Figure 5-3 shows that the peer count rises quickly to around 20 peers and then levels off between 20 and 25 peers for all operating systems. This was tested over 100 minutes to see how the peer count would change. This test mainly measures peer count and liveness. Over the course of 100 minutes, the peer count does not drop below 20, meaning that the devices are still connected to the peer network.

5.3 Transaction Size

Figure 5-4: A graph of how the transaction times change with data size.

In this test, the testing dashboard monitored how the transaction time is affected by the transaction size. The testing dashboard sent transactions every minute for 100 minutes, incrementing the data size for each transaction. As seen in Figure 5-4, there was no clear correlation between data size and confirmation time for any device, as expected. For each transaction, the data received was identical to the data sent. This test therefore verifies the integrity of communication by ensuring that messages of varying sizes are sent correctly. The testing dashboard also analyzes how transaction cost changes with transaction size. As seen in Figure 5-5, the transaction cost increases at a constant rate with data size. The cost increases because data stored on the Ethereum chain is permanent and immutable. Therefore, the additional cost is essentially the cost of data storage as well as the cost of gas to get the transaction approved.

Figure 5-5: A graph of how the transaction cost changes with the data size.
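The constant rate of increase in cost follows from how Ethereum charges gas for transaction data. The sketch below uses the protocol's gas schedule as it stood in 2019 (a base of 21,000 gas per transaction, 4 gas per zero byte of data, and 68 gas per non-zero byte); these constants come from the Ethereum protocol rather than from this thesis, and the non-zero byte cost was later lowered by EIP-2028. The function names and the gas price in the usage note are hypothetical, and contract execution costs are not modeled.

```python
BASE_GAS = 21_000            # flat gas charged for any transaction
GAS_PER_ZERO_BYTE = 4        # gas per zero byte of transaction data
GAS_PER_NONZERO_BYTE = 68    # gas per non-zero byte (2019 schedule, before EIP-2028)


def intrinsic_gas(data: bytes) -> int:
    """Gas charged for a plain transaction carrying `data`."""
    zero_bytes = data.count(0)
    nonzero_bytes = len(data) - zero_bytes
    return (
        BASE_GAS
        + GAS_PER_ZERO_BYTE * zero_bytes
        + GAS_PER_NONZERO_BYTE * nonzero_bytes
    )


def cost_in_ether(data: bytes, gas_price_gwei: float) -> float:
    """Transaction cost in ether at an assumed gas price (1 gwei = 1e-9 ether)."""
    return intrinsic_gas(data) * gas_price_gwei / 1e9
```

For example, `cost_in_ether(payload, gas_price_gwei=10)` would price a payload at an assumed 10 gwei per gas. Because each additional byte adds a fixed amount of gas, cost grows linearly with data size for a fixed gas price.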

5.4 Stress Testing

While the previous tests sent transactions once every minute over a long period of time, the light client must also be stress tested to the maximum capacity of the device. This involved sending the maximum queue size of transactions from each device.

5.4.1 Maximum Queue Size

Multiple tests were run in which large batches of transactions were sent all at once. The testing dashboard sends the maximum capacity of the pending queue, which is 4096 transactions. As seen in Figure 5-6, the transaction times for these transactions were comparable to those in Figure 5-1: most are within 0-60 seconds. Furthermore, all devices have similar ranges, meaning that the light client performs comparably on these devices. This shows that the node can still perform well when the queue holds the maximum number of transactions.

Figure 5-6: The number of transactions sent within various time ranges for each device.

5.5 Device Comparisons

Operating System Comparisons
                       Unix    Linux    ARM 64
Average Latency (s):   20.6    45.9     35.5

Table 5.1: The average time to send a transaction per operating system. Taken across 4096 transactions.

In Table 5.1, the average latency of the light client on different operating systems is compared. Table 4.1 lists the average transaction time on the Ethereum network as 13.3 seconds. While the latency of the light client is within reason, it is slower than the average Ethereum latency. This could be because the tests in this thesis were run on nodes that only connected to 20-25 peers, whereas the statistics of the Ethereum network may have been taken on nodes with a much higher peer maximum. Furthermore, even though Ropsten and Ethereum are very similar networks, it is possible that transactions experience more latency on the Ropsten network. There is less motivation for peers and miners to join this network as the rETH coin does not have any monetary value [29]. There is also a difference of roughly 10 to 25 seconds in average transaction latency among the devices tested. This can be explained by the variance in the connectivity of the devices’ peers or the strength of the Wi-Fi connection.

Chapter 6

Discussion

The light client is significant because it allows IoT devices to join the Ethereum network. The light client differs from other Blockchain technologies in that it is lightweight, not computationally heavy, and its communications are guaranteed to be secure and immutable. The light client paves the way for new applications for IoT devices with limited resources.

6.1 Comparison to other Blockchains

Blockchains
                    Light Client   Ethereum   Bitcoin   Tangle   Neuromesh
Latency:            20.6 s         15 s       10 min    80 s     10 min
Binary Size (MB):   1.2            29.6       56        16.7     1

Table 6.1: Comparison of different Blockchain algorithms [5][6][13][29].

As seen in Table 6.1, the light client has one of the smallest sizes and fastest transaction times of the given blockchain algorithms. The nearest neighbor in size is the Neuromesh binary [13]. While the Neuromesh binary is slightly smaller than the light client, it is built upon Bitcoin and therefore has transaction times of around 10 minutes, roughly 29 times slower than the light client. The Tangle has a much slower average transaction time of 1 minute and 20 seconds, and the Tangle binary is much larger than the light client binary [6]. Furthermore, the sizes listed in this table are binary sizes alone. All of the Blockchain algorithms other than the light client currently store a large amount of chain data or header data. This chain data can often be many multiples larger than the binary size [29]. The light client greatly reduces the size of the blockchain algorithm while also maintaining a fast transaction time in comparison to other blockchains. Of the options compared here, it is the best fit for IoT devices to participate in blockchain.
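The latency comparison can be checked with a short calculation over the values in Table 6.1, converted to seconds. The dictionary and function names below are illustrative only; the figures are those listed in the table.

```python
# Latencies from Table 6.1, converted to seconds.
LATENCY_SECONDS = {
    "Light Client": 20.6,
    "Ethereum": 15.0,
    "Bitcoin": 10 * 60.0,
    "Tangle": 80.0,
    "Neuromesh": 10 * 60.0,
}


def slowdown_vs_light_client(name: str) -> float:
    """How many times slower (or faster, if below 1) `name` is than the light client."""
    return LATENCY_SECONDS[name] / LATENCY_SECONDS["Light Client"]


if __name__ == "__main__":
    for name in LATENCY_SECONDS:
        print(f"{name}: {slowdown_vs_light_client(name):.1f}x the light client latency")
```

With these figures, the Bitcoin-based systems take about 29 times as long as the light client and the Tangle about 4 times as long, while Ethereum itself is somewhat faster.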

6.2 Evaluation as a Communication Protocol

IoT Communication Protocols
            Light Client   MQTT     DDS      CoAP       UDP
Latency:    20.6 s         234 ms   234 ms   108.7 ms   106.1 ms

Table 6.2: Comparison of different IoT communication protocols [12].

The light client has much higher latency than other IoT communication protocols. This is because transactions must be communicated to many peers on the peer-to-peer network before they can be considered approved on the chain. However, the Ethereum light client communication protocol provides assurance that data was not compromised in transit. The light client authenticates the origin of a data source based on the unique Ethereum address that initiated the transaction. This can be trusted because of the proof of work consensus algorithm and the size of the Ethereum chain [24]. Furthermore, data can be encrypted before it is sent, ensuring that other nodes on the network cannot read and gain access to this data. While the light client does not achieve the same latency as other communication protocols, it is able to provide a higher level of security. All communications are guaranteed to be immutable and integral, a feature which can be invaluable to certain applications.

6.3 Applications

The light client is best suited for applications that require high communication integrity but not necessarily low latency.

6.3.1 Smart Cities

The light client should be used for mission critical infrastructure. For example, smart city technology often relies on correct communications. Smart cities can involve IoT devices that direct traffic, handle emergency situations, or monitor the environment [16]. In areas such as these, integrity can be crucial. If traffic systems broadcast incorrect information, human life can be at risk. Therefore, IoT devices in smart cities need an integrity-first communication protocol such as the light client.

6.3.2 Energy Delivery Systems

Likewise, errors in energy delivery system communications can lead to cyber-physical damages. Many urban critical infrastructure sectors rely on integral communications. For example, electric grid infrastructure requires device state information to be transferred over networks at regular intervals. Areas such as these rely on secure communication but may not be so time critical that the roughly 20 second lag of the light client is a problem [24].

6.3.3 Device Updates

One specific application of this light client, on which work has already begun, is its use with an agent. The agent can act as a command and control center for the device. The agent can perform more computationally intensive tasks, such as machine learning, white-listing, or configuration changes. This information can then be disseminated to the device via the light client. This allows for updates to be made to the device without requiring a reboot or a reset. Another use case for the agent-light client combination is device provisioning, where each device must be configured to send data to the right place and authenticate it on the network. Therefore, in combination with an agent, the light client can have a host of more complicated applications such as filtering, white-listing, and configuration updates [21].

6.4 Limitations

While the light client does have many significant applications, it also has limitations. An operating system is required for the light client to run. Similarly, even at the small size of 1.2 MB, some IoT devices will not have enough memory storage to run it. The client also has a high transaction time of around 20 seconds which may be problematic for some devices or systems. Furthermore, the cost to send a transaction is around 10 cents on average [8]. For devices or systems that need to send many transactions, this cost may be too high. Despite its limitations, the light client software is a much improved form of integral communications for IoT devices.
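To make the cost limitation concrete, the sketch below estimates the daily cost of a device at a given sending rate. It assumes the roughly 10 cent average cited above applies to every transaction; the function name and the example rates are illustrative.

```python
COST_PER_TRANSACTION_USD = 0.10  # average cost cited above [8]


def daily_cost_usd(
    transactions_per_hour: float,
    cost_per_transaction_usd: float = COST_PER_TRANSACTION_USD,
) -> float:
    """Estimated cost in US dollars of one day of sending at a constant rate."""
    return transactions_per_hour * 24 * cost_per_transaction_usd


if __name__ == "__main__":
    # One transaction per minute matches the rate of the steady state tests.
    for rate in (1, 60):
        print(f"{rate} tx/hour: about ${daily_cost_usd(rate):.2f} per day")
```

At one transaction per minute, the rate used in the steady state tests, this estimate comes to about 144 dollars per device per day, which illustrates why the protocol suits infrequent, high-value messages better than continuous telemetry.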

Chapter 7

Conclusion

The light client aims to address the problems with communication integrity for IoT devices. Public blockchains such as Ethereum have the ability to disseminate data in a scalable and distributed fashion. Future work on the light client will include establishing optimal conditions for its function on IoT devices. The light client will eventually be integrated with an agent that will interpret data from the light client and implement commands on the IoT endpoint. Building an agent for the light client will be the basis for an integrity-driven approach to performing updates for IoT devices at scale [24].

7.1 Summary of Contributions

The light client has a size of only 1.2 MB on Unix. It is a robust system that can send secure transactions onto an immutable blockchain. It is able to provide secure communication to IoT devices without requiring hardware changes or software restarts. This is fundamentally important as IoT devices have long lacked security, and previous solutions have often placed a burden on manufacturers [14].

The light client is small and lightweight. One hundred and sixty files were removed from the binary, totalling 264,496 lines of code removed. The binary was then further compressed to achieve a size of 1.2 MB when run on Unix. The size of the binary on other operating systems and architectures varies from 1.2 MB to 3.5 MB, a size that is sufficient for many IoT devices [24].

The light client has also been tested on a variety of architectures and operating systems. These include Unix, Linux, and ARM 64. The light client binary was individually cross-compiled for each device and each device was configured to allow for port-forwarding so that it could properly communicate with the peer-to-peer network.

A multitude of tests were run on each device, measuring the average transaction time, peer count, and ability to send transactions of varying sizes, among other parameters. The light client was determined to perform comparably to the original Ethereum source code. It outperformed many other blockchain algorithms and did so at a much smaller size. While the light client does not achieve the same latency as other IoT communication protocols, it supplies secure, immutable transactions which these protocols cannot.

7.2 Future Work

As mentioned in Section 6.3.3, one potential application for the light client is integration with a command-and-control agent. Work has already begun on this integration. The agent can have many responsibilities, such as determining certain processes and IP addresses that can be white-listed, and can then use the light client to distribute this information. This agent-client integration has already been tested, and future work could involve adding functionality such as device configuration updates [21].

Furthermore, the light client should be tested more robustly. While the light client has been tested on a variety of architectures and operating systems, it should be placed on more physical devices. The light client is meant to be a technology specifically for IoT devices, and it should therefore be tested on a larger distributed network of such devices.

In the future, the light client code will ideally be tested on hundreds of different IoT devices and architectures, including 32 bit systems. This will simulate the kind of environment where the light client is expected to be deployed, such as a power plant or a smart city grid. The light client still has many more applications that have only begun to be explored. This technology has tremendous potential to change the way IoT devices communicate with each other.

Appendix A

Tables

Cross Compiled Code Sizes
              Darwin   Windows   MIPS
Size (MB):    1.6      1.4       3.6

Table A.1: Sizes of the cross-compiled light client for operating systems and architectures that were not explicitly tested in this thesis.

Appendix B

Figures

Figure B-1: Sample output from the testing dashboard when testing for transaction time. Output is human-readable. The testing dashboard also includes a regex file to convert this output file into a dataset.

Figure B-2: Sample output from the testing dashboard when testing for peer count. Output is human-readable. The testing dashboard also includes a regex file to convert this output file into a dataset.

Bibliography

[1] Iotchain: A blockchain security architecture for the internet of things. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC 2018), pages 1–6, 2018.

[2] Iotex: A decentralized network for internet of things. 2018. https://iotex.io/white-paper.

[3] Our response to a without a blockchain has been built to outperform bitcoin. 2018.

[4] Ethereum Blockchain App Platform, (Accessed: 2019-02-13). https://www.ethereum.org.

[5] Bitcoin source code, (Accessed: 2019-05-14). https://github.com/bitcoin/bitcoin.

[6] IOTA source code, (Accessed: 2019-05-14). https://github.com/iotaledger.

[7] Shodan, (accessed April, 2018). www.shodan.io/.

[8] BitInfo Charts, (accessed May, 2019). https://bitinfocharts.com/ethereum/.

[9] iNet B1300, (accessed May, 2019). https://www.gl-inet.com/products/gl-b1300/.

[10] Infura - Scalable Blockchain Infrastructure, (accessed November, 2018). infura.io/.

[11] Riccardo Bonetto, Nicola Bui, Vishwas Lakkundi, Alexis Olivereau, Alexandru Serbanati, and Michele Rossi. Secure communication for smart iot objects: Protocol stacks, use cases and practical examples. 2012 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2012.

[12] Yuang Chen and Thomas Kunz. Performance evaluation of iot protocols under a constrained wireless access network. 2016 International Conference on Selected Topics in Mobile Wireless Networking (MoWNeT), 2016.

[13] Gregory Falco, Caleb Li, Pavel Fedorov, Carlos Caldera, Rahul Arora, and Kelly Jackson. Neuromesh: Iot security enabled by a blockchain powered botnet vaccine. ACM Proceedings: International Conference on Omni-Layer Intelligent Systems (COINS), 2019.

[14] Gregory Falco, Arun Viswanathan, Carlos Caldera, and Howard Shrobe. A master attack methodology for an ai-based automated attack planner for smart cities. IEEE Access, pages 48360–48373, August 28, 2018.

[15] Igor Fovino, Andrea Carcano, Thibault Murel, and Alberto Trombetta. Mod- bus/dnp3 state-based intrusion detection system. 2010 24th IEEE International Conference on Advanced Information Networking and Applications, 2010.

[16] Aditya Gaur, Bryan Scotney, Gerard Parr, and Sally McClean. Smart city architecture and its applications based on iot. Procedia Computer Science, 2015.

[17] Adam Gencer, Soumya Basul, Ittay Eyall, Robert van Renessel, and Emin Sirer. in bitcoin and ethereum networks. arXiv preprint arXiv:1801.03998, 2018.

[18] Jiyong Han, Minkeun Ha, and Daeyoung Kim. Practical security analysis for the constrained node networks: Focusing on the dtls protocol. 2015 5th International Conference on the Internet of Things (IOT), 2015.

[19] Peter Huitsing, Rodrigo Chandia, Mauricio Papa, and Sujeet Shenoi. Attack taxonomies for the modbus protocols. International Journal of Critical Infrastructure Protection, pages 37–44, December, 2008.

[20] Seoung K. Kim. Measuring ethereums peer-to-peer network. University of Illinois at Urbana-Champaign, 2017.

[21] Matthew Maloney, Elizabeth Reilly, Michael Siegel, and Gregory Falco. Cyber physical iot device management using a lightweight agent. The 2019 IEEE International Conference on Internet of Things (iThings-2019), 2019.

[22] Andrew Minteer. Analytics for the Internet of Things.

[23] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. 2008. https://bitcoin.org/bitcoin.pdf.

[24] Elizabeth Reilly, Matthew Maloney, Michael Siegel, and Gregory Falco. A smart city iot integrity-first communication protocol via an ethereum blockchain light client. Software Engineering Research and Practices for the Internet of Things, 2019.

[25] Serguei Popov. The tangle. 2015. https://iota.org/IOTAWhitepaper.pdf.

[26] Zhengguo Sheng, Shusen Yang, Yifan Yu, Athanasios Vasilakos, Julie McCann, and Kin Leung. A survey on the ietf protocol suite for the internet of things: standards, challenges, and opportunities. IEEE Wireless Communications, pages 91–98, 2013.

[27] Dinesh Thangavel, Xiaoping Ma, Alvin Valera, Hwee-Xian Tan, and Colin Tan. Performance evaluation of mqtt and coap via a common middleware. 2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2014.

[28] Marco Tiloca, Christian Gehrmann, and Ludwig Seitz. Robust and scalable dtls session establishment. ERCIM News, 2016.

[29] . Ethereum (eth) whitepaper. 2015. http://gavwood.com/paper.pdf.