Energy Efficiency in Wireless Sensor Networks, Through Data Compression

Srod Karim Master’s Thesis, Autumn 2017

Abstract

Wireless sensor networks are an ever-growing market in today’s society: smart homes with wirelessly connected and controllable devices, continuous monitoring in environmental, health care and industrial settings, and automated, self-adjusting systems in homes and factories are all growing technologies that are built using wireless sensor networks. In many of these systems, energy consumption can be a major concern, as most devices in the network rely on small batteries as their energy source. Because of this, any optimization of energy usage can greatly increase the lifetime of the devices. Often, radio communication between microcontrollers and other devices is the biggest energy drain in the system. Compressing sensor data before transferring it between nodes reduces the size of the messages transmitted over radio, which in turn reduces the energy consumed by the radio transmissions. However, energy is also required to perform the compression and decompression. It is therefore important to know when compression benefits the overall energy efficiency of a wireless sensor network, and when it is better to send the data uncompressed. In this thesis we look at typical properties of wireless sensor networks, wireless transfer protocols for sensor networks, and sensor data. We analyze the effects of compressing sensor data before sending it over the network, and perform experiments and measurements in a simple wireless sensor network to assess the effects data compression can have on the energy consumption in wireless sensor networks.

Acknowledgments

My thanks go to my advisor Kjetil Østerås, who provided guidance, advice, proofreading, suggestions and general feedback, and who helped me throughout my work on this thesis. I also want to thank Silicon Labs for giving me the opportunity, and for providing me with the microcontrollers and tools required to perform the experiments. I also owe thanks to my advisor at the University of Oslo, Associate Professor Arne Maus, for providing guidance, and for giving feedback on the structure and content of this thesis. Finally, I would like to thank Jonathan Ringstad for providing in-depth proofreading and valuable feedback on the structure and wording of the text.

Preface

Overview

In this thesis, we perform experiments and measurements to assess the effects of compression in wireless sensor networks. For this purpose, we have divided our thesis into six chapters that each cover one aspect of the work. In chapter 1, we provide motivation for our thesis, and look at previous work in the field of energy efficiency in wireless sensor networks. In chapter 2, we look at wireless sensor networks and wireless data transfer, including many factors that contribute to the overall energy usage of wireless transmissions. In chapter 3, we outline various encoding and compression techniques that can be useful for wireless sensor networks, and create a basis for the algorithms we use for our experiments. In chapter 4, we explain the experiments we perform, and examine the algorithms and datasets we use for our experiments. In chapter 5, we look at the results from our experiments, and assess the effects compression has on energy usage in wireless sensor networks. Finally, in chapter 6 we discuss potential future work in the field of energy efficient wireless data transfer.

Contents

1 Introduction
1.1 Motivation
1.2 Energy efficient hardware
1.2.1 Energy efficient processors
1.2.2 Energy efficient microcontrollers
1.3 Energy efficient software
1.3.1 Clockspeeds and dynamic speed scaling
1.3.2 Parallelization and vectorization
1.3.3 Wireless sensor networks and compression

2 Wireless data transfer
2.1 Wireless sensor networks
2.2 Energy usage in radio transmissions
2.2.1 Energy calculations
2.2.2 Transmission energy usage
2.2.3 Network packets
2.2.4 Receiver timing
2.3 Wireless transfer protocols
2.3.1 Wi-Fi
2.3.2 Bluetooth
2.3.3 ZigBee
2.3.4 Proprietary Protocols

3 Encoding and compression
3.1 Sensor Data
3.2 Reducing range or precision
3.3 Run-length encoding
3.4 ......
3.5 Huffman Encoding
3.6 ......
3.6.1 Prediction by partial matching
3.7 Lempel–Ziv 77

iv Contents Energy efficiency in wireless sensor Section Contents networks, through data compression

3.8 Lempel–Ziv–Welch
3.8.1 Sensor Lempel–Ziv–Welch
3.8.2 Lempel–Ziv–Welch with minicache

4 Experiments
4.1 Compression algorithms
4.1.1 Delta encoding
4.1.2 Variable-width delta encoding
4.1.3 ......
4.1.4 LZ4
4.1.5 LZW
4.1.6 PPMs
4.1.7 fpaq0
4.2 Datasets
4.2.1 Relative humidity
4.2.2 Sea surface temperature
4.2.3 Mixed data
4.3 Experiments
4.3.1 Compression efficiency
4.3.2 Energy cost of compression

5 Results and Conclusion
5.1 Compression efficiency
5.1.1 Results
5.1.2 Conclusion
5.2 Energy cost of compression
5.2.1 Results
5.2.2 Conclusion
5.3 Summary

6 Future work
6.1 Compression algorithms
6.2 Retained compression information
6.3 Struct delta encoding

7 Appendix
7.1 Results
7.1.1 Experiment 1
7.1.2 Experiment 2

1 Introduction

1.1 Motivation

Energy consumption is an emerging topic in the field of computer science. Excessive production and consumption of electricity can have hazardous effects on the environment, due to the use of non-renewable energy sources. Energy consumption is also a big concern for many companies with a focus on computation and IT. Power consumption has been a design constraint for many businesses, most importantly due to the high monetary cost of supplying power. The electricity costs of modern computers have become one of the biggest strains on the budget of many firms and data centers. In 2005, engineer Luiz André Barroso published an article in which he analyzed the performance costs of computer servers. In this article he stated that electricity costs made up roughly one third of the total cost of running the servers. He additionally theorized that energy could end up becoming the most expensive part of maintaining a computer server by the end of the decade, even more expensive than the computer hardware[1].

Energy consumption is also a major concern in mobile, battery-driven devices. For laptops and phones, long battery life is an important feature. Yet it can be hard to maintain long battery life, given the power consumption of the various components of modern laptops and phones[2][3]. Laptops and phones are not the only mobile devices that depend on batteries, as more and more devices using microcontrollers are being produced. Microcontrollers are small processors used in a variety of fields. These include remote controls, power tools, toys, office machines and automatically controlled products and devices such as sensor networks, control systems and implantable medical devices, as well as other embedded systems. Given their small size and autonomous nature, they tend to be heavily power constrained. Many microcontrollers reside in locations that are hard to access, such as medical devices implanted in the body, or distributed sensor networks operating in forests, seas, and other locations far from civilization. In some cases, changing the batteries of these products can be difficult or even impossible, further underlining the importance of energy conservation. Additionally, microcontrollers in IoT devices and wireless sensor networks in homes aim to be easily installable, without requiring rewiring by electricians or drawing long extension cords, and therefore have to rely on batteries. Finally, many devices operating on microcontrollers are intended to reduce energy consumption in larger systems, such as smart light bulbs that automatically turn on and off when people enter and leave the room, and that adjust their brightness depending on the time of day. It is of course important that the microcontrollers are energy efficient as well, to reduce the overall energy consumption of the system.

In this thesis, we want to add to the scientific knowledge of energy consumption in mobile devices, with a focus on wireless sensor networks. We want to measure the energy consumption of data transfer in wireless sensor networks, and assess how it is affected by applying compression algorithms to the data before transfer.

1.2 Energy efficient hardware

Due to the importance of power consumption, and the benefits energy efficiency can have, a market has emerged for the design and production of energy friendly solutions in large portions of the computer science landscape. There is a constant flow of newer and more effective hardware and software designed with energy consumption in mind; tools, peripherals, code and even specifications are being made more energy-aware. Perhaps most important is the development of energy efficient Central Processing Units (CPUs), as these can be the most energy hungry part of a microcontroller. With the rise of the Internet of Things (IoT) and smart devices, the market for energy friendly processors has grown. The current leader within this market is ARM, with a popular line of energy friendly processors. In 2013, ARM released a statement that nearly 60% of all mobile devices worldwide used at least one ARM-based chip[4].

1.2.1 Energy efficient processors

ARM offers three primary series of microprocessors for use with mobile devices and other devices that don’t require high processing power. Each processor series has its own characteristics and applications, most noticeably trading off computation speed for energy efficiency.


In addition to the processor hardware, ARM provides other incentives for energy efficiency. For example, ARM designed the THUMB and THUMB2 instruction sets for their processors. These are compact processor instruction sets which allow for highly dense code, leading to an even smaller energy footprint. Many of the processors also come with multiple sleep states or energy modes. These allow the processor to run at varying energy levels, with proportional energy usage and computational power. When a processor is put into a sleep state, the main CPU is turned off, so it consumes less power. While in sleep mode, the CPU is unable to execute instructions until it is woken up again by a wakeup signal. The wakeup signal can be sent from a timer, or from another peripheral. Sleep states are a great measure for cutting energy usage in programs with intervals where no computation is needed, and may increase the battery lifetime of the device. Such intervals occur when a program is waiting for incoming data, or waiting until a scheduled time to run its code.

1.2.2 Energy efficient microcontrollers

In addition to the energy efficient processors with multiple energy modes, many microcontrollers offer supplemental features to further reduce power consumption. Some microcontrollers even specialize in energy efficiency. This is done by designing an energy aware microcontroller layout, providing energy management controls, and incorporating energy-efficient on-chip peripherals. These can be peripherals such as low energy clocks, timers, serial interfaces, wireless communication, and more.

Since the processor is often the most power-consuming part of a microcontroller, it can be beneficial to use the processor as little as possible. In section 1.2.1 we discussed the idea of putting the processor into a sleep state when not in use, but there are additional ways to reduce processor usage. If a certain type of task is performed often in a program, it can be beneficial to outsource this task to a specialized peripheral. The peripheral has less flexibility than a full CPU, but may perform the specialized task at a lower energy cost. While the peripheral is performing its task, the main CPU can perform other tasks, or even go into a sleep state.

One popular peripheral used for offloading tasks from the processor is the Direct Memory Access (DMA) unit. This is a unit that allows for memory transfers without CPU intervention. The processor can configure the DMA, and then move on to perform other tasks, or be put into a sleep state. The DMA transfers memory as configured by the processor. It can transfer data within the local memory of the microcontroller, as well as between different peripherals. It can also be set up to handle incoming data and requests from other peripherals, and to wake the processor from sleep mode once the transfer is complete. This is especially useful for communication between the microcontroller and peripherals. The serial ports used for communication between the processor and peripherals typically run at a lower clock speed than the processor. This means that the processor typically wastes energy while communicating with a peripheral, by spending cycles waiting for the communication to continue. Using the DMA, the processor can instead go into a sleep mode. The DMA then copies the message from local memory to the serial port, or from the serial port to local memory, and wakes the CPU once the message transfer is complete, thereby saving the energy that would otherwise be spent on the processor simply waiting.

1.3 Energy efficient software

While it is important to have energy efficient CPUs and on-chip peripherals, it is also important to utilize these components to their fullest potential through energy aware programming. In addition to energy efficient hardware, software can be designed with energy consumption in mind. Everything from core functions such as operating systems and schedulers, down to simple algorithms such as sorting, can be optimized for energy efficiency. This is also a field of academic interest, as new studies and algorithms are constantly being developed. This thesis focuses primarily on the effects data compression has on energy consumption in wireless sensor networks, but there are many additional measures for minimizing energy usage in energy constrained devices and wireless sensor networks. We will discuss a few notable theses and studies that focus on energy efficient software.

1.3.1 Clockspeeds and dynamic speed scaling

One way to lower the energy cost of a CPU is to lower the clock speed it is running at. CPUs typically draw less power the lower the clock speed they run at. However, running at a lower clock speed also means that the processor performs fewer computations per second, often at a higher energy cost per computation. Often a good strategy is to complete tasks as fast as possible, and then put the processor into sleep mode until the next task needs to be started. An alternative technique is to run the processor at a clock speed that lets it complete a task just in time before the next task is initiated. Despite using more energy per computation, this technique can save energy by not having to go into sleep mode in the first place, and by not having downtime where no computation is performed.

In their 1995 article "A scheduling model for reduced CPU energy", Francis Yao, Alan Demers and Scott Shenker described a dynamic speed scaling algorithm known as "YDS", named after its three creators. The idea behind this algorithm is to adjust the clock speed of the processor at runtime, in a way that is optimal for energy usage. This idea is called dynamic speed scaling[5]. In dynamic speed scaling, the clock speed is adjusted during computation to complete tasks just in time before the next task is started. The clock speed is adjusted as necessary, depending on the amount of computation needed to complete a task, and the amount of time left until the next task needs to be started.

1.3.2 Parallelization and vectorization

A modern problem in computation is the dark silicon effect. This effect describes the phenomenon in which computers are limited in how many of their transistors they can use for any given operation. Moore’s law states that the number of transistors in a dense integrated circuit doubles about every two years[6]. However, powering all of these transistors becomes increasingly hard, especially at high clock speeds. This is because of the transistors’ heat dissipation, which causes the device to become too hot for the transistors to function properly. The higher the clock speed, the greater the heat dissipation. A common solution to this problem is to parallelize the workload. By cutting up a task into smaller parts, which can then be performed simultaneously, we can perform more tasks without increasing the clock speed or computation time. Parallelization can also increase the energy efficiency of a program, both by increasing the overall computation speed, and by reducing the heat dissipation.

In 2013, Juan M. Cebrián and Lasse Natvig published an article titled "Performance and energy impact of parallelization and vectorization techniques in modern microprocessors". In it, they analyze and evaluate the use of parallelization and vectorization for saving energy. They show that, depending on the processor and algorithm, it is possible to achieve energy savings by parallelizing the algorithms and running them on multiple threads. Additionally, vectorization can be used in place of, or in addition to, parallelization to further save energy[7]. Vectorization means that the processor performs an operation on multiple values simultaneously. This can be especially relevant for handling multimedia files such as images and audio, and for compression, which are tasks that are easily parallelizable or vectorizable.

1.3.3 Wireless sensor networks and compression

One popular use case for microcontrollers is as sensor nodes in wireless sensor networks. In a wireless sensor network, microcontrollers are used to collect data using sensors, perform calculations on the data if necessary, and send the data off to be processed once enough has been collected. For such a task, the most energy expensive part is often the process of transmitting and receiving data wirelessly. In an article from 2010, Gregory J. Pottie and William J. Kaiser state that around 3000 processor instructions can be executed for the same energy cost as transmitting a single bit 100 meters over radio[8]. This gives us an opportunity to save power by compressing data before transmitting it. By Pottie and Kaiser’s estimate, we save energy even if we spend up to roughly 3000 instructions to reduce the data size by a single bit.

The idea of using compression to reduce power usage is explored in Christopher M. Sadler and Margaret Martonosi’s article "Data Compression Algorithms for Energy-Constrained Devices in Delay Tolerant Networks". Here, they analyze multiple compression algorithms, and even develop their own algorithm, the Sensor Lempel–Ziv–Welch (S-LZW). S-LZW is a variation on the Lempel–Ziv–Welch algorithm, and is a compression algorithm designed specifically for sensor networks. In the article, they show that it can outperform other compression algorithms in terms of energy efficiency, depending on the network setup[9]. This article by Sadler and Martonosi is an important influence on our thesis. In this thesis, we want to build upon their work by measuring results for further compression algorithms, datasets, and compression sizes.
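The estimate by Pottie and Kaiser suggests a simple break-even check, sketched below in Python. The function name, the constant of 3000 instructions per transmitted bit (which by the estimate above applies to a 100 meter transmission), and the example numbers are illustrative assumptions and not measurements from this thesis. The sketch also ignores the cost of decompression and of packet overhead.

```python
# Rough break-even check based on the Pottie and Kaiser estimate quoted above.
# ASSUMPTION: transmitting one bit costs as much energy as ~3000 instructions.
INSTRUCTIONS_PER_BIT = 3000


def compression_pays_off(bits_saved, instructions_used,
                         instructions_per_bit=INSTRUCTIONS_PER_BIT):
    """Return True if the radio energy saved exceeds the CPU energy spent.

    Both sides are expressed in "instruction equivalents" of energy.
    """
    energy_saved = bits_saved * instructions_per_bit
    return energy_saved > instructions_used


# Hypothetical example: shrinking a message by 16 bytes (128 bits)
# at a cost of 50 000 instructions of compression work.
print(compression_pays_off(bits_saved=128, instructions_used=50_000))
```

Under these assumptions the decision depends only on the ratio between instructions spent and bits saved, which is why the measured energy cost of each algorithm matters in practice.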

2 Energy efficient wireless data transfer

The focus of this thesis is on reducing the energy usage of data transfer over a wireless network. In this chapter we will describe the components that make up data transfer in wireless sensor networks, and various factors that can have an effect on the energy usage.

2.1 Wireless sensor networks

Wireless Sensor Networks (WSNs) are a popular use for microcontrollers. In a wireless sensor network, numerous microcontrollers and other devices are connected in a mesh, sending information to one another and acting upon it. Usually, the microcontrollers in a sensor network are source nodes that gather sensor data such as temperature, humidity, pressure, and more. This data is then sent to a sink node for processing. The sink node is typically either a microcontroller that acts upon the received data to regulate a bigger system, or a more powerful machine with fewer restrictions that collects and analyzes the data, and acts upon it.

A wireless sensor network can have any topology. In some networks, the source nodes send data directly to the sink node. In others, the microcontrollers might have to route the data through many nodes before it reaches the sink. There may even be dedicated gateway nodes for routing the data. Figure 2.1 shows two typical network topologies.

Figure 2.1: Two popular network topologies for wireless sensor networks. (a) Star network topology, where all source nodes are connected directly to the sink. (b) Multi-hop network topology, where source nodes are connected to the sink either directly, or through other source nodes.

Figure 2.2: Illustration of a microcontroller acting as a single source node in a wireless sensor network. The power source is usually a small battery, but can also be a connection to the main power supply in a house, or an energy generator. (Diagram labels: Transceiver, Power Source, CPU, Sensor, External Memory.)

Wireless sensor networks have a wide variety of applications, as there are many tasks that rely on gathering sensor data in a region. They can be used to measure and log sensor data such as climate conditions, machine health, and factory conditions. The sensor data can then be collected and analyzed, and acted upon to improve conditions or keep a system stable. Sensor networks are also used for safety-critical tasks, such as detecting fires in forests and coal mines, floods, and other disasters. Early detection of natural disasters is crucial to preventing or mitigating the disaster, or to allow initiating evacuation and other life saving measures in a timely manner. Sensor networks are also used in many automated systems, for the purpose of measuring and automatically adjusting conditions in an area or system, such as the humidity and ventilation in a building.

When designing a wireless sensor network, it is important to keep in mind the characteristics that are unique to it. The most important characteristic of sensor networks is the power consumption constraints inherent in microcontrollers. The nodes must be able to stay powered by small batteries, or by harvesting energy from external sources such as solar power and wind energy. A wireless sensor network should ideally be able to handle changes to the node mesh, including adding new nodes to the mesh, removing nodes from it, and moving nodes around. Another consideration is the environmental conditions of the sensor network. The network must be able to cope with the environmental factors that surround it; the communication between the nodes needs to function regardless of the conditions of the environment it operates in.

2.2 Energy usage in radio transmissions

When a node broadcasts a message, its transmitter or transceiver is activated to perform the data transfer, and starts draining energy from the power source. Likewise, the receiver or transceiver on the receiving node is activated when listening for a broadcast message. In many wireless sensor networks, the transmitters and receivers are the biggest power drains in the system.

2.2.1 Energy calculations

When discussing the energy consumption in electrical circuits, we focus on the electrical potential energy provided by electrical outlets, generators or batteries, which is converted to other forms of energy, such as motion, light and heat. Once the electrical potential energy has been converted, it is considered used up, as a battery has a limited amount of potential energy to convert. Energy is typically denoted as E in mathematical formulas, and is measured in joules (J). One joule is the energy transferred to an object when a force of one newton moves that object one meter. Energy can be measured through a variety of methods. A practical way to measure electrical energy usage is through the following formula:

1 J = 1 V ∗ 1 C = 1 V ∗ 1 A ∗ 1 s

where V describes the voltage of the circuit, and C is the coulomb. The coulomb, which is the unit of electrical charge in the circuit, can also be described as 1 A ∗ 1 s, the charge transported by a current of one ampere in one second. Another useful formula expresses energy through power and time:

1 J = 1 W ∗ 1 s

where W describes the electrical power of a circuit, which is the rate at which electrical energy is being transferred in the circuit. These formulas tell us that we can reduce energy consumption by reducing the voltage or current of a circuit, or by increasing the share of time the circuit spends operating at reduced voltage and current.
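The two formulas above translate directly into code. The following Python sketch is only an illustration: the function names are our own, and the supply voltage, current draw and duration in the example are hypothetical values rather than measurements.

```python
def energy_joules(voltage_v, current_a, seconds):
    """Energy in joules from E = V * I * t (1 J = 1 V * 1 A * 1 s)."""
    return voltage_v * current_a * seconds


def energy_from_power(power_w, seconds):
    """Energy in joules from E = P * t (1 J = 1 W * 1 s)."""
    return power_w * seconds


# Hypothetical radio: 3.0 V supply, 10 mA current draw, active for 5 ms.
e = energy_joules(3.0, 0.010, 0.005)
print(f"{e * 1e6:.1f} microjoules")

# The same energy expressed through power: 3.0 V * 10 mA = 30 mW.
print(f"{energy_from_power(0.030, 0.005) * 1e6:.1f} microjoules")
```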

2.2.2 Transmission energy usage

The energy consumption of a transmission is affected by various factors. From section 2.2.1, we know that the energy consumption of a transmission can be calculated from the power of the transmission and the time it takes. One important factor that affects the time a transmission takes is the length of the message that is being transferred. The larger a message is, the more time, and thus energy, is required to broadcast it. Therefore, we can reduce the energy consumption of a transmission by reducing the length of the message being transferred.

The transmit power of a transmitter describes the power used for transmission, and can also be adjusted. By increasing the transmit power, the transmitted message will be able to reach further. Typically, transmit power is measured in decibel-milliwatts (dBm). The dBm value of a transmitter expresses the ratio between its power and one milliwatt on a logarithmic scale. The transmit power of microcontroller transmitters and transceivers typically ranges from -10dBm to 20dBm. Figure 2.3 shows the relation between dBm and watts for this range. The transmit power we select for an application affects the range of our transmissions. In general, we can double the distance that a transmission can reach by quadrupling the transmit power. It can be beneficial to choose a fitting transmit power for each given task. The transmit power should be as small as possible, to conserve energy, while being large enough for the transmissions to reach the receivers.
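The relation between dBm and milliwatts can be made concrete with a short Python sketch. It uses the standard definition of dBm, in which 0 dBm equals 1 mW and every 10 dB step is a factor of ten in power; the function names are our own.

```python
import math


def dbm_to_milliwatts(dbm):
    """Convert a power level in dBm to milliwatts: P = 10^(dBm / 10)."""
    return 10 ** (dbm / 10)


def milliwatts_to_dbm(milliwatts):
    """Convert a power in milliwatts to dBm: dBm = 10 * log10(P / 1 mW)."""
    return 10 * math.log10(milliwatts)


# The range typical for microcontroller transmitters, as stated in the text.
for level in (-10, 0, 10, 20):
    print(f"{level:>4} dBm = {dbm_to_milliwatts(level):g} mW")

# Quadrupling the transmit power corresponds to an increase of about 6 dB.
print(f"{milliwatts_to_dbm(4) - milliwatts_to_dbm(1):.2f} dB")
```

The loop shows why small changes in dBm matter for energy usage: going from -10dBm to 20dBm multiplies the transmit power by a factor of 1000.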

Figure 2.3: The relation between dBm and wattage for transmission power, from -10dBm to 20dBm. (Plot axes: power ratio in dBm, from -10 to 20, horizontal; power in milliwatts, from 0 to 100, vertical.)

Another factor that affects the range and energy usage of data transmission is the frequency of the radio waves it is sent over. Higher frequencies allow for faster message throughput at the cost of increased energy usage per byte sent. We discuss this factor further in section 2.3, in conjunction with wireless transfer protocols.

2.2.3 Network packets

Wireless networks rely on sending information in packets. Each packet is a finite-length set of data, and can be a self-contained message, or part of a bigger message. Packets consist of meta-information regarding the packet, and payload data. The meta-information can contain information on the connection, checksums, an ID, the packet size, and more. The payload consists of the actual data that is being transferred.

When sending a message wirelessly, it can be a good idea to split it up into multiple packets, rather than sending one big packet. Splitting a message into smaller packets increases the transmission overhead, as each packet requires its own meta-information. However, having multiple smaller packets reduces the cost of failed transmissions, as each packet costs less to resend. This makes the network communication more robust against packet loss or data corruption. For example, if one byte is corrupted in a single 512-byte packet, the entire 512-byte packet needs to be resent. If one byte is corrupted across eight 64-byte packets, only the 64-byte packet containing that byte needs to be resent. When picking a packet size for transferring data, it is wise to balance the overhead cost of sending more packets against the potential cost of losing a packet.

It is also possible to send packets with variable lengths, instead of a fixed length. When using variable-length packets, additional information needs to be sent in the header of the packet to specify its length. For example, when sending a variety of messages ranging from 8 to 32 bytes in size, it may be wiser to send them as variable-length packets ranging from 8 to 32 bytes, rather than sending all messages in 32-byte packets. This can be useful for compressed data, as the size of the compressed data usually isn’t known beforehand, and may not be consistent between different compressed messages.
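The trade-off between packet overhead and resend cost can be illustrated with a small Python sketch built on the 512-byte example above. The header size of 8 bytes is a hypothetical value chosen for illustration, as the actual amount of meta-information depends on the protocol, and the function names are our own.

```python
import math

HEADER_BYTES = 8  # ASSUMPTION: hypothetical per-packet meta-information size


def total_bytes(message_bytes, payload_bytes, header_bytes=HEADER_BYTES):
    """Bytes sent when a message is split into fixed-payload packets."""
    packets = math.ceil(message_bytes / payload_bytes)
    return packets * (payload_bytes + header_bytes)


def resend_bytes(payload_bytes, header_bytes=HEADER_BYTES):
    """Bytes that must be resent when a single packet is corrupted."""
    return payload_bytes + header_bytes


# One 512-byte packet compared with eight 64-byte packets.
for payload in (512, 64):
    print(f"payload {payload:>3}: "
          f"total {total_bytes(512, payload)} bytes, "
          f"resend {resend_bytes(payload)} bytes")
```

With these assumed numbers, the smaller packets cost 56 extra bytes when nothing goes wrong, but a single corrupted packet costs 72 bytes to resend instead of 520.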

2.2.4 Receiver timing

When wirelessly sending data from one device to another, the receiving device must be continuously on standby, listening for incoming messages. This can lead to a lot of energy being wasted on idly waiting for a message. In figure 2.4 we can see a receiver listening for ca. 20 milliseconds before receiving any actual data, thereby wasting some energy. One way to minimize the energy spent on listening for messages is to time the transmitter and receiver; the listening device should wake up just before the sending device sends out a message. In figure 2.5 we see a receiver listening for only 3 milliseconds instead, as the transmitter and receiver have agreed beforehand when to initiate transfer. We could try to minimize this waiting time even further, but then we risk that the transmitter starts transmitting too early, before the receiver has started listening. If the receiver starts listening after the transmitter has started sending data, the receiver will fail to receive the first part of the data, which can make the whole message unusable. For the power consumption of a receiver, it is important to pick an optimal listening margin, so that little time is spent listening for messages, without risking packet loss by starting to listen too late.
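As a rough illustration of the saving, the sketch below compares the energy spent listening for 20 ms and for 3 ms, the two listening times described above. The listening current of 10 mA and the supply voltage of 3.0 V are hypothetical values chosen for the example, and the function name is our own.

```python
def listening_energy_uj(listen_ms, current_ma, voltage_v):
    """Energy in microjoules spent idly listening, from E = V * I * t.

    Milliamperes times volts gives milliwatts, and milliwatts times
    milliseconds gives microjoules.
    """
    return voltage_v * current_ma * listen_ms


# ASSUMPTION: hypothetical receiver drawing 10 mA at 3.0 V while listening.
untimed = listening_energy_uj(listen_ms=20, current_ma=10, voltage_v=3.0)
timed = listening_energy_uj(listen_ms=3, current_ma=10, voltage_v=3.0)

saving = 1 - timed / untimed
print(f"untimed: {untimed:.0f} uJ, timed: {timed:.0f} uJ, "
      f"saving: {saving:.0%}")
```

As long as the listening current is the same in both cases, the relative saving depends only on the two listening times, not on the assumed current or voltage.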


[Plot: current in mA (0 to 10) against time in ms (0 to 50)]

Figure 2.4: Wireless transceiver waiting for and receiving data, without timing. The red portion shows the transceiver listening for a message, and the green portion shows the transceiver receiving the message.

[Plot: current in mA (0 to 10) against time in ms (0 to 50)]

Figure 2.5: Wireless transceiver waiting for and receiving data, with timing. The red portion shows the transceiver listening for a message, and the green portion shows the transceiver receiving the message.


2.3 Wireless transfer protocols

When multiple devices are communicating, a transfer protocol is required. A transfer protocol is a set of rules and guidelines for how to communicate with one another. It defines how connections should be established, maintained, and closed, and how data should be transferred between the devices. If two devices do not use the same protocol, or don’t follow the rules set by the protocol, they cannot safely and reliably transfer data.

When communicating wirelessly, it is very easy to end up with interference from other communications and broadcasts on the same or nearby frequencies. A wireless transfer protocol needs to be able to tolerate this kind of interference. The protocols also have to comply with laws and regulations concerning radio transmissions, which vary from country to country. Fortunately, there are many wireless transfer protocols that can handle these issues. They have been designed to comply with radio regulations, and to operate despite interference.

The full radio spectrum consists of a large range of frequencies. Most of it is reserved for specific uses, and a license is required to broadcast or transfer data on those frequencies. There are however some radio bands that can be used without a license, as long as one follows a defined set of regulations. These are referred to as unlicensed radio bands, and are popular for use in Internet of Things (IoT) devices and wireless sensor networks. A frequency band is a range of frequencies in the spectrum with a defined lowest and highest frequency. Radio bands are sometimes split into smaller frequency ranges called channels. Figure 2.6 displays a rough overview of the radio spectrum as designated by the Institute of Electrical and Electronics Engineers (IEEE), including some near-globally unlicensed bands. The regulations regarding usage and licensing of different radio bands can vary greatly from region to region.
For example, a frequency range that is available for unlicensed use in Japan might not be available in the United States. Many rules and regulations regarding telecommunication, and standards within wireless network communication that we use today, are maintained by the IEEE. IEEE 802 is a family of IEEE standards dealing with local area networks and metropolitan area networks, and contains many standards for wireless network protocols. Within the specifications set by IEEE, there are a few radio bands that can be used license-free almost worldwide. These are known as the 2.4GHz and 5GHz radio bands. These bands are also allocated for licensed uses, but do allow for unlicensed communication[11]. In addition to these, there are sub-1GHz frequency bands, ranging from 750MHz to 950MHz, which are available for unlicensed use. Which parts of this range can be used varies from region to region[12].


[Diagram: radio wave frequency in MHz on a logarithmic axis from 10^1 to 10^5, divided into the bands HF, VHF, UHF, L, S, C, X, Ku, K, Ka, V, W and G. Marked unlicensed bands: ca. 750-950 MHz, varies depending on region (IEEE 802.15.4); ca. 2.4-2.5 GHz (IEEE 802.11); ca. 5.25-5.85 GHz (IEEE 802.11)]

Figure 2.6: Radar-frequency bands according to IEEE standard, with focus on unlicensed radio bands, popular with IoT devices[10]

Choosing the correct communication frequency for each application can have a great impact on its performance and energy efficiency. Higher frequencies allow for higher data rates. This means that it is possible to transfer data in a shorter time, or transfer more data in the same time. However, higher frequencies have greater free-space path loss, which means that the signal will reach a shorter distance, and devices thus have shorter communication range. In addition, higher frequencies struggle more with obstruction, and reach even less far when communicating through walls, trees, or other structures. Finally, higher frequencies typically require more expensive hardware, as well as more energy to operate. In addition to these basic differences between frequencies, the 2.4GHz region is very crowded, which means that there is typically more noise and interference. This can lead to higher transmission latency and more packet losses.

We will look at several distinct transfer protocols, most of which are commonly used in wireless sensor networks, and describe their characteristics.
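The effect of frequency on free-space path loss can be estimated with the standard free-space formula, where the loss in dB is 20 log10(4πdf/c) for distance d, frequency f and speed of light c. The sketch below compares a sub-1GHz carrier with a 2.4GHz carrier. The 868 MHz frequency and the 100 m distance are example values chosen for illustration; the formula ignores antennas, obstacles and interference.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def free_space_path_loss_db(distance_m, frequency_hz):
    """Free-space path loss in dB between two isotropic antennas."""
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT
    )

if __name__ == "__main__":
    distance = 100.0  # metres, example value
    for name, freq in (("868 MHz", 868e6), ("2.4 GHz", 2.4e9)):
        loss = free_space_path_loss_db(distance, freq)
        print(f"{name}: {loss:.1f} dB")
```

At any fixed distance, the difference between the two carriers is 20 log10(2400/868), which is close to 9 dB in favour of the lower frequency.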


2.3.1 Wi-Fi

Wi-Fi is one of the most popular wireless local area networking protocols. It is primarily used for connecting devices to the Internet, through a wireless access point acting as a gateway. Wi-Fi commonly operates on the 2.4GHz and 5GHz bands, and has high data throughput. Newer Wi-Fi standards allow for a theoretical bandwidth of over 1Gbit/s. However, this also means that Wi-Fi has relatively high energy consumption, and requires expensive hardware. The high cost and high energy consumption typically make Wi-Fi unsuitable for wireless sensor networks. However, it might be worth considering Wi-Fi in networks where high throughput is critical, and where energy consumption is not a concern.

2.3.2 Bluetooth

Bluetooth is a popular short-range transfer protocol. It is maintained and marketed by the Bluetooth Special Interest Group (SIG), which is a group composed of over 30000 companies within telecommunication, computing, networking, and consumer electronics. In addition to developing the specification and protecting the trademark, Bluetooth SIG also maintains standards that manufacturers must meet before they can market their products as Bluetooth devices.

With the release of the Bluetooth 4.0 specification in 2010, the Bluetooth Special Interest Group added Bluetooth Low Energy to the Bluetooth specification. Bluetooth Low Energy is an alternative to regular Bluetooth, for energy-constrained devices. It allows for communicating with other Bluetooth Low Energy devices, as well as devices running regular Bluetooth 4.0 or above. Whereas regular Bluetooth connections are continuous, devices using Bluetooth Low Energy communicate in intervals, sleeping when not in use. This reduces the throughput of the device, but allows for low energy consumption in devices that don’t require a constant connection, such as nodes in wireless sensor networks. It also performs timed communication as described in 2.2.4.

Bluetooth benefits from being very generic and highly interoperable. Completely separate devices can communicate with each other, regardless of product manufacturer or transceiver vendor. Additionally, Bluetooth Low Energy devices can describe themselves, and what features they expose, using Generic Attribute (GATT) Profiles. These factors make Bluetooth great for small networks such as homes and smart cars, systems that benefit from a generic protocol with high interoperability and modest throughput. However, Bluetooth Low Energy is not well suited for longer distance communications, or for networks where the devices are frequently relocated. This can make it problematic in networks that are outdoors, or networks that get frequently modified. A final positive note for Bluetooth is that most modern computers, laptops and smartphones come with built-in Bluetooth adapters. This allows us to connect these devices directly to a wireless sensor network, without having to install additional proprietary hardware.

2.3.3 ZigBee

ZigBee is a transfer protocol specification developed by the ZigBee Alliance, a group of companies that maintain and publish the ZigBee standard. ZigBee is intended to be simpler and cheaper than other wireless network protocols such as Wi-Fi and Bluetooth. The ZigBee protocol operates at the 2.4GHz and sub-1GHz frequency bands in North America and Europe, and has a longer range than Bluetooth Low Energy, promising full home coverage. The ZigBee protocol is an open protocol, and the ZigBee Alliance allows using it non-commercially without having to pay a royalty. However, a membership in the ZigBee Alliance is required to get permission to create commercial products that use the ZigBee specification. Outdoors, ZigBee devices can communicate with nodes up to 200m away in open areas without occlusion, according to the specification.

2.3.4 Proprietary Protocols

In addition to popular wireless transfer protocols, many companies make use of their own proprietary protocols. A proprietary protocol can be used as long as it follows the rules set by IEEE, and any other applicable radio regulations. Some benefits of using proprietary protocols include not having to pay licensing fees for the protocol, and having greater control over the format of the transmissions, to better suit the specific wireless communication use case.

Chapter 3. Encoding and compression for energy efficient transfer of sensor data

In distributed sensor networks, source nodes need to send their data to a sink node. In some networks, the data might pass through multiple nodes before reaching the sink. Depending on the method of communication between nodes, the transfer of data can cost a significant amount of energy. In many wireless sensor networks, data transmission can be the part of the system that drains the most energy, and can be considered a limiting factor for the battery lifetime of the sensor nodes. Minimizing data sent between nodes may therefore yield a huge return in energy savings. This can be accomplished by compressing the data in the source microcontroller, and decompressing it again at the sink node. The microcontroller will have to spend power to compress and decompress the data, but the reduced data size will cause savings in energy for every hop it makes through the sensor network. Depending on the transmission power of the microcontrollers, radio frequency, path through the mesh, and other factors, these energy savings can outweigh the cost of compression and decompression. Depending on the transfer speed of the nodes, and distance between source and sink nodes, compression can also reduce the total time it takes for the message to reach the sink node. In time-critical sensor networks, such as fire detection sensor networks, this could help signal emergencies as quickly as possible. There are many different compression algorithms to choose between, each suited for a different task. In this chapter, we will look at the characteristics of sensor data, analyze what kind of compression algorithms may produce the best results, and look at some compression algorithms that we believe are well suited for compressing sensor data on a microcontroller.
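The break-even reasoning above can be written down as a simple comparison. The following is a sketch with hypothetical parameters: `energy_per_byte_uj` lumps together the sending and receiving cost of one byte over one hop, and `codec_energy_uj` is the combined cost of compressing at the source and decompressing at the sink. Neither value comes from the measurements in this thesis.

```python
def transfer_energy_uj(payload_bytes, hops, energy_per_byte_uj):
    """Radio energy needed to move a payload across a number of hops."""
    return payload_bytes * hops * energy_per_byte_uj

def compression_saves_energy(raw_bytes, compressed_bytes, hops,
                             energy_per_byte_uj, codec_energy_uj):
    """True if compressing first costs less energy than sending raw data."""
    raw_cost = transfer_energy_uj(raw_bytes, hops, energy_per_byte_uj)
    compressed_cost = codec_energy_uj + transfer_energy_uj(
        compressed_bytes, hops, energy_per_byte_uj
    )
    return compressed_cost < raw_cost

if __name__ == "__main__":
    # Hypothetical example: 100 bytes compressed to 60 bytes, 2 uJ per byte
    # per hop, and 100 uJ spent on compression plus decompression.
    for hops in (1, 3):
        print(hops, compression_saves_energy(100, 60, hops, 2.0, 100.0))
```

In this made-up example compression does not pay off over a single hop, but it does over three hops, since the saving is collected once per hop while the codec cost is paid only once.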


3.1 Sensor Data

Sensor data is a general term used to describe measurements taken from the physical environment, using sensor devices. These can be chemical or physical measurements such as temperature or acceleration, user inputs such as knobs or buttons, and more. There are many different sensors with unique characteristics. Even when measuring the same quantity, the sensor data can differ from device to device, as there are many different sensors that perform similar functions, each with their own characteristics. There are however some characteristics that are shared between many different sensor measurements.

Data taken with a sensor will often be smooth, meaning that nearby measurements have very close values. This is because samples taken close to each other in time and space tend to be similar. For example, when measuring temperature over time, the temperature typically won’t change more than a few degrees between measurements; big sudden increases or decreases in temperature are very rare. Additionally, sensor measurements tend to stay within a certain range, often oscillating around a central value. This causes repetition in values and patterns. For example, temperature data in one location over a few days will often increase and decrease in a sinusoidal fashion, with nearby measurements having similar values, and with repeating patterns. However, this pattern can be disturbed by a drastic weather change one day, or may drift after a few weeks to a different sinusoidal pattern with a different center and amplitude.

Not all sensor measurements are smooth or have repeating values, however. User input sensors, for example, can be very erratic and non-smooth. Another example is measuring the Global Positioning System (GPS) coordinates of a ship during a cruise trip. In that case the values will be smooth, but there might not be repeating values.
Another important point is that sensors have varying range and precision, and thus the values acquired from them have to be stored in varying bit sizes, including sizes that don’t align with bytes. The data can have a precision of less than 8 bits, somewhere between 8 and 16 bits, and so on. This can make the sensor values harder to work with together with compression, as many compression algorithms only perform well on data that is byte-aligned, or data that is of a specific bit length. For example, the popular temperature sensor DS18B20 reports temperatures with 9 to 12 bits of precision[13]. If we want to compress this data well, we may have to give up some precision by removing the least significant bits, or use 16 bits to represent the values.

Often it’s also possible to change the data to make it more compressible. This will typically involve a transformation function that changes the data from its original format into one that is easier to compress, with or without losing information. A transformation that loses information is called a lossy transformation, and one that doesn’t lose any information is called a lossless transformation.

3.2 Reducing range or precision

Sometimes, the full range or precision of a sensor is not needed, and the data can be transformed to a format that uses fewer bits and is byte-aligned, to improve compression. For example, if we have 10-bit precision sensor measurements, but we don’t need full precision, we can perform a lossy transformation by removing the two least significant bits from each value. This will reduce the size of the values to fit in a single byte each, at the cost of losing some precision. It will typically also increase the occurrence of specific values, which can further improve compression. However, this type of data transform is lossy, and should only be used if lower precision data is acceptable. Another option is to add a set value to each measurement, or subtract a set value from each of them. For example, if all our measurement values range from 200 to 350, we can subtract 200 from each value in order to fit each value in a single byte.
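Both transformations can be sketched in a few lines. The function names below are made up for this illustration. Dropping bits is lossy, while subtracting a known offset is lossless as long as the receiver knows the offset.

```python
def drop_low_bits(values, bits=2):
    """Lossy: discard the least significant bits of each value."""
    return [v >> bits for v in values]

def subtract_offset(values, offset):
    """Lossless: shift the value range so that it starts at zero."""
    return [v - offset for v in values]

def add_offset(values, offset):
    """Inverse of subtract_offset, performed by the receiver."""
    return [v + offset for v in values]

if __name__ == "__main__":
    ten_bit = [1023, 512, 5]
    print(drop_low_bits(ten_bit))         # each result fits in one byte
    shifted = subtract_offset([200, 275, 350], 200)
    print(bytes(shifted))                 # values 0..150 fit in single bytes
    print(add_offset(shifted, 200))       # original values restored
```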

3.3 Run-length encoding

Run-length encoding is a data encoding method sometimes used together with compression algorithms. Run-length encoding is performed by encoding the data into a list of (value, count) tuples. For example, the number sequence

1122221121123332112223322

can be represented as

(1, 2)(2, 4)(1, 2)(2, 1)(1, 2)(2, 1)(3, 3)(2, 1)(1, 2)(2, 3)(3, 2)(2, 2)

where each tuple shows what value comes next, and how many instances of that value there are in a row. Run-length encoding works best on data that has a small range of values, and long runs of the same value. Typically, run-length encoding will be performed as part of a bigger compression algorithm. For example, if a compression algorithm greatly reduces the number of unique values, run-length encoding can be used afterwards to further compress the data. This can be especially useful together with encoding methods that may output data with long runs of repeating values.
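A minimal sketch of run-length encoding and decoding, using the number sequence from the example above. How the (value, count) tuples are packed into bytes is left open here, as that depends on the value range and the longest run that has to be supported.

```python
from itertools import groupby

def rle_encode(values):
    """Encode a sequence as a list of (value, count) tuples."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand a list of (value, count) tuples back into the sequence."""
    return [value for value, count in pairs for _ in range(count)]

if __name__ == "__main__":
    sequence = [int(c) for c in "1122221121123332112223322"]
    encoded = rle_encode(sequence)
    print(encoded)
    assert rle_decode(encoded) == sequence
```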


3.4 Delta encoding

Delta encoding is a popular data encoding method used to transform data into a more compressible format. Instead of being represented by its own value, each value is represented by the change since the previous measurement. Figure 3.1 shows an example of a list of numbers being delta encoded.

Before delta encoding   66   78   75   100   125   120
After delta encoding    66   12   -3    25    25    -5

Figure 3.1: Example of delta encoding applied to a simple array of values. The first value remains the same after delta encoding. The rest show the difference between a value and the previous value.

We can delta encode a list of values

\[ (a_0, a_1, a_2, \ldots, a_{n-1}, a_n) \]

into a delta encoded list of the values

\[ (d_0, d_1, d_2, \ldots, d_{n-1}, d_n) \]

by performing the operations

\[ d_0 = a_0, \qquad d_k = a_k - a_{k-1} \quad \text{for } 1 \le k \le n. \]

To transform the values from their difference encoded form back to absolute values, we can use

\[ a_k = \sum_{i=0}^{k} d_i \]

An alternate way to express this is as follows:

\[ a_0 = d_0, \qquad a_k = a_{k-1} + d_k \quad \text{for } 1 \le k \le n. \]
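The encoding and decoding operations above translate directly into code. This is a minimal sketch using the values from figure 3.1; decoding is a running sum over the deltas.

```python
from itertools import accumulate

def delta_encode(values):
    """Keep the first value, then store each change from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild absolute values as the running sum of the deltas."""
    return list(accumulate(deltas))

if __name__ == "__main__":
    samples = [66, 78, 75, 100, 125, 120]
    deltas = delta_encode(samples)
    print(deltas)
    assert delta_decode(deltas) == samples
```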

In figure 3.2 we can see the effects of applying delta encoding to sea surface temperature measurements that were sampled once a minute. As we can see, the overall spread of values is much smaller in the delta encoded data. The original measurements range from −1.646 to −0.473, whereas the delta encoded data ranges from −0.275 to 0.262. The dataset got reduced from 385 unique values to only 146 unique values, less than half as many unique values. The values are also much more


concentrated on one region. In the original dataset, the most common value is −1.619 with 13 occurrences. In the delta encoded data, the most common value is 0, with 82 occurrences. By reducing the range of the values, and having values more concentrated on a smaller region, the data has been transformed to a format that is likely more compressible.

Delta encoding can however have drawbacks. Applying delta encoding to erratic and non-smooth data can result in data that is even less suited for compression. If nearby samples have values that are far from each other, delta encoding can further increase this distance, and make the data appear even more erratic. Fortunately, many sensor data measurements are smooth, and allow for improved compression with delta encoding. Another problem with delta encoding is that all previous delta encoded values are needed in order to decode one value. This means that we can’t decode a value in the middle of a delta encoded dataset without first decoding every previous value. If we lose any values during a wireless transfer, all following values will also be lost. It may therefore be wise to cut the input data into smaller chunks that are delta encoded separately.

[Plot with two panels, horizontal axis running from 0 to 1000: "Original sensor data" (temperature in °C) and "Delta encoded sensor data" (change in temperature in °C)]

Figure 3.2: Example showing the effects of delta encoding a sensor data set of sea surface temperatures.


3.5 Huffman Encoding

Huffman encoding is a popular character-by-character encoding method, introduced by David A. Huffman in his 1952 article "A Method for the Construction of Minimum-Redundancy Codes" [14]. It works by translating each character to its optimal bit-representation. The idea is to use a minimal number of bits for each character. For example, if we only have 3 unique characters, there is no need to spend 8 bits on each character in our data string, as 2 bits are enough to distinguish up to 4 different characters. Huffman encoding goes a step further, encoding whichever character is most used as only 1 bit. This ensures that the encoded output is as small as possible for a code that encodes one character at a time, while still being uniquely decodable. This pattern also scales well to larger numbers of characters. The more often a character occurs, the shorter the Huffman code for that character will be.

The following is an example of a Huffman encoding. Imagine we have a dataset with only three different characters, ’A’, ’B’, and ’C’, where one of the characters, ’B’, occurs more often than each of the other two characters. We can set the bit-string ’0’ to represent ’B’, ’10’ to represent ’A’, and ’11’ to represent ’C’. Thus, the string ’AABCBBCB’ would be encoded as the binary string ’101001100110’. This code uniquely decodes to the string ’AABCBBCB’, given that we know what character each bit sequence translates to, and is in fact an optimal code for encoding the characters in the dataset separately.


These are the steps we need to perform for a Huffman encoding:

1. Iterate through the input data string to count the unique characters, and how often each character is used.
2. Use the data from step 1 to create a Huffman translation table from character to its bit-string representation.
3. Read the character that is in the current input data position.
4. Find the bit-string W that it translates to, and write that to our output string.
5. Move the input forward by one character, and the output by however many bits W took.
6. Go to step 3, continue until all data is encoded.

When working with sensor data, each character is typically one sensor measurement value, or part of a sensor measurement value.
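The steps listed above can be sketched as follows. This is one possible way to build the translation table, using a priority queue to repeatedly merge the two least frequent groups of characters; the function names are made up for this illustration, and the output is kept as a string of ’0’ and ’1’ characters rather than packed bits to keep the example readable.

```python
import heapq
from collections import Counter

def huffman_table(data):
    """Steps 1 and 2: count characters and build the translation table."""
    freq = Counter(data)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, {character: code built so far}).
    heap = [(count, idx, {sym: ""})
            for idx, (sym, count) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)   # least frequent group
        c2, _, t2 = heapq.heappop(heap)   # second least frequent group
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

def huffman_encode(data, table):
    """Steps 3 to 6: translate character by character."""
    return "".join(table[ch] for ch in data)

def huffman_decode(bits, table):
    """Read bits until they match a code, then emit that character."""
    reverse = {code: ch for ch, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

if __name__ == "__main__":
    table = huffman_table("AABCBBCB")
    bits = huffman_encode("AABCBBCB", table)
    print(table, bits)
    assert huffman_decode(bits, table) == "AABCBBCB"
```

For the example string ’AABCBBCB’, the most frequent character ’B’ receives a 1-bit code and the other two characters receive 2-bit codes, giving 12 bits in total. The decoder needs the same table, so in practice the table, or the frequencies it was built from, has to be known to the receiver.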


3.6 Arithmetic coding

Arithmetic coding is a lossless compression method invented by Jorma J. Rissanen, published in his 1976 article "Generalized Kraft Inequality and Arithmetic Coding"[15]. Arithmetic coding works by representing a series of characters as a single arbitrary-precision fraction in the range [0.0, 1.0). At any point in the message, the data so far is represented as a range between two fractions.

To perform arithmetic coding we first need to create a probability model for our coding. The probability model is a model describing the likelihood of each character in our non-compressed data. It is represented through a series of connected intervals in the range [0.0, 1.0), with each interval corresponding to a character, where the size of the interval describes the likelihood of that character. For example, if we have an alphabet with only two characters, "a" and "b", with both characters having an equal probability of appearing in the string, we create a probability model with the interval [0.0, 0.5) representing "a", and the interval [0.5, 1.0) representing "b". This probability model can then be used to perform the arithmetic coding, by representing the string as a number. A number in the range [0.0, 0.5) represents the string "a", while a number in the range [0.5, 1.0) represents the string "b". We can then use the probability model to encode the next character in the string as well, by again dividing our previously mentioned ranges into further ones. A number in the range [0.0, 0.25) represents the string "aa", [0.25, 0.5) "ab", and so on, recursively. The end result is that we can encode any string using this alphabet as a single arbitrary-precision fraction in the range [0.0, 1.0).

If the characters have different probabilities of occurring in the string, we can give the characters intervals with varying size in the probability model. For example, if "a" is three times as likely to appear as "b", we can give it the range [0.0, 0.75), and "b" the range [0.75, 1.0).
This allows more common sequences to be represented using fewer bits, thus ensuring that we can encode the string optimally. Figure 3.3 shows the string "aaba" encoded using this probability model. The more accurately the probability model describes the string, the more optimally we can encode it, resulting in fewer bytes total once compressed.

An accurate probability model is required in order to compress effectively. Additionally, the decoder needs to have the same probability model as the encoder, to ensure that the data is correctly decoded. This can be done in multiple ways. One method is to define the probability model beforehand, and have it shared by the encoder and decoder. This method works well when the probability distribution of the data is well known and not expected to change. The downside is that it only optimally encodes messages that fit said probability model. It will not work well with data that doesn’t fit


[Diagram: the interval [0.00, 1.00) is narrowed one character at a time: "a" gives [0.00, 0.75), "aa" gives [0.00, 0.5625), "aab" gives [0.421875, 0.5625), and "aaba" gives [0.421875, 0.52734375)]

Figure 3.3: Arithmetic coding of the string "aaba" into a single arbitrary precision floating point number. Here, the letter "a" is three times as likely as "b", and therefore takes three times as much space on the probability model. Using this probability model, the string "aaba" can be coded to the floating point number in the interval [0.421875, 0.52734375) that takes the least amount of bits to write as an arbitrary precision floating point number. That number, together with an integer that defines the length of the input string, defines a unique arithmetic code that can be decoded into the input string.

the model, or with data with frequently changing probability distributions. Another method is to decide the probability model separately for each string. This can guarantee an optimal compression for all strings. However, it does mean that the encoder has to perform an extra pass over the string to first create the probability model. Additionally, the model has to be sent together with the compressed data in the message, as it is needed by the decoder to decompress the string.

A final method is to adaptively change the probability model during encoding and decoding, to better match the string at any point. The decoded data matches the original string as long as the probability model is the same when decoding as when encoding, between each character. This method is great for compressing data without knowing the probability distribution beforehand. The probability model is built during encoding and decoding, meaning that we don’t need to do an extra pass over the data first, and that we don’t need to send the probability model alongside the compressed data. Additionally, this method can sometimes compress even better than with an optimal probability model, as the probability model might better fit the string at any given character, rather than being optimal for the entire string as a whole. This final method is well suited for sensor data. Sensor data tends to have repeating values during short intervals, and repeating patterns of values are not uncommon either. The adaptive probability model will change over time to best fit these repeating values and patterns, achieving favorable compression.
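The interval narrowing shown in figure 3.3 can be reproduced with exact fractions. The sketch below uses a fixed probability model shared by encoder and decoder, which is the simplest of the three methods described above. It is not an adaptive coder, and it does not deal with the finite-precision arithmetic that a microcontroller implementation would need. The names `MODEL`, `arithmetic_interval` and `arithmetic_decode` are made up for this illustration.

```python
from fractions import Fraction

# Fixed model from the example: "a" is three times as likely as "b".
MODEL = {
    "a": (Fraction(0), Fraction(3, 4)),
    "b": (Fraction(3, 4), Fraction(1)),
}

def arithmetic_interval(message, model):
    """Narrow [0, 1) once per character and return the final interval."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        sym_low, sym_high = model[symbol]
        low, high = low + width * sym_low, low + width * sym_high
    return low, high

def arithmetic_decode(value, length, model):
    """Recover `length` characters from any number inside the interval."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        width = high - low
        for symbol, (sym_low, sym_high) in model.items():
            if low + width * sym_low <= value < low + width * sym_high:
                out.append(symbol)
                low, high = low + width * sym_low, low + width * sym_high
                break
    return "".join(out)

if __name__ == "__main__":
    low, high = arithmetic_interval("aaba", MODEL)
    print(float(low), float(high))
    # 1/2 lies inside the interval and is a single binary digit (0.1).
    print(arithmetic_decode(Fraction(1, 2), 4, MODEL))
```

Any number inside the final interval identifies the string, as long as the decoder also knows the length of the message. Here the fraction 1/2 is enough.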

3.6.1 Prediction by partial matching

Prediction by partial matching (PPM) is a statistical data compression algorithm based on arithmetic coding. It was developed by John Cleary and Ian Witten, and published in their 1984 article "Data Compression Using Adaptive Coding and Partial String Matching" [16]. PPM is an adaptive arithmetic coder. It dynamically creates and maintains a probability model while encoding character by character. When decoding, it dynamically recreates and maintains the same probability model, character by character. By using the contents of the string so far at any step to describe the probability model, it doesn’t require additional data to represent the probability model.

PPM uses a context model to calculate probabilities. At any point during the encoding and decoding process, series of characters that have occurred earlier in the string are used to define the probability model for the next character. For example, during decoding, if the character sequence "abc" has occurred frequently in the string so far, and the decoder has just decoded "ab", there is a high probability that the next character in the string is "c". Therefore, that character is given a big range in the probability model. The number of immediately preceding characters used to predict the next character is called the order of the PPM model. A PPM model of order 3 is written as PPM(3), and uses the sequence of the three previous characters to predict the next character, by looking at which characters commonly occurred after this sequence earlier in the string. If the three-character sequence hasn’t occurred yet, and thus no prediction can be made using it, the decoder tries to make a prediction using only the last two characters, and so on. PPM is especially good at encoding text, which often has repeating words, phrases and sentences, but can also work well with sensor data that has repeating sequences of values.
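The prediction step with fallback to shorter contexts can be sketched as below. This is a deliberately simplified illustration and not a complete PPM implementation: it only estimates the probabilities of the next character from the string seen so far, it has no escape mechanism for characters that have never followed a context, and it is not connected to an arithmetic coder. The function names are made up for this example.

```python
from collections import Counter, defaultdict

def build_contexts(history, max_order):
    """Count which character followed each context of length 0..max_order."""
    counts = defaultdict(Counter)
    for pos, symbol in enumerate(history):
        for order in range(max_order + 1):
            if pos - order >= 0:
                counts[history[pos - order:pos]][symbol] += 1
    return counts

def predict_next(history, max_order):
    """Estimate next-character probabilities from the longest known context."""
    counts = build_contexts(history, max_order)
    for order in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - order:]
        if context in counts:
            total = sum(counts[context].values())
            return {s: c / total for s, c in counts[context].items()}
    return {}

if __name__ == "__main__":
    # "cab" has been seen before and was followed by "c".
    print(predict_next("abcabcab", max_order=3))
    # No context ending in "x" has been seen; falls back to order 0.
    print(predict_next("abcx", max_order=3))
```

Because the counts are built only from characters that have already been processed, an encoder and a decoder that both run this procedure arrive at the same probabilities without exchanging the model.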


3.7 Lempel–Ziv 77

Lempel–Ziv 77 (LZ77) is a lossless data compression algorithm developed by Abraham Lempel and Jacob Ziv in 1977, in an article titled "A universal algorithm for sequential data compression"[17]. Together with the similar LZ78, released by Lempel and Ziv one year later, LZ77 is a well known algorithm that serves as a basis for many modern compression algorithms. In 2004, IEEE declared LZ77 and LZ78 key historical achievements in electronic engineering.

LZ77 is a dictionary coder. Dictionary coders are compression algorithms where the compressed code is a list of indexes to strings in a dictionary. LZ77 uses a sliding window as its dictionary. When compressing a substring of the input data, the algorithm looks at the previous characters inside the sliding window to see if there is a matching prefix. It then generates a code, (i, j, x), where i is the index of the prefix inside the window, j is the length of that prefix, and x is the first character after the prefix. It writes this code to the output, and moves the window forward by the length of the prefix plus 1 for the extra character x. If no prefix is found within the previous characters in the sliding window, we write a code with prefix length 0, so the code only describes the next character, x. After the compression, the output is a list of LZ77 codes.

To work with LZ77 using the original rules proposed by Lempel and Ziv, we need to define two values. First we need to define the window size, N, which tells us the size of our sliding window. Secondly we need to define a max substring size, Ls, which tells us the max length of a substring that can get compressed into one code. These two values need to be the same when encoding and decoding a message. It is best to either decide these values beforehand, or decide them separately for each compression, at the start of the compression. If the values are decided separately for each compression, they need to be sent to the decompressor as part of the compressed data’s message.
By having a larger window size N, we increase the chance of finding a matching prefix in the window, but we need more bits for the i values in the output codes to describe the larger range of possible indexes in the window. By having larger max substring size LS, we can store larger prefixes in a single output code, but we need more bits for the j values in the output codes to describe the larger range of possible substring lengths in the window. The window acts as a First In First Out (FIFO) buffer, with a head and a tail end. Whenever a character is added to the window, it is added to the head of the window, and a character is removed from the tail of the window. Figure 3.4 shows an example of Lempel–Ziv 77 applied to an input string "aabbbcbbbca", resulting in an list of LZ77 codes once compressed.


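| step | window, encoded | window, not encoded | remaining input | output list of LZ77 codes |
|------|-----------------|---------------------|-----------------|---------------------------|
| 0 | ????? | aabbb | cbbbca | |
| 1 | ????a | abbbc | bbbca | [(0,0,a)] |
| 2 | ??aab | bbcbb | bca | [(0,0,a),(4,1,b)] |
| 3 | abbbc | bbbca | | [(0,0,a),(4,1,b),(4,2,c)] |
| 4 | bbbca | | | [(0,0,a),(4,1,b),(4,2,c),(1,4,a)] |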

Figure 3.4: Step by step Lempel–Ziv 77 compression performed on the string "aabbbcbbbca", with window size N = 10 and max substring size Ls = 5. For each step, the content of the window, remaining input to compress and current state of the output list is displayed. The LZ77 code added in step 1 has prefix length j = 0, meaning that it only consists of the character "a". The code added in step 3 has prefix start position i = 4 and prefix length j = 2, meaning that the prefix here consists of data that is both in the encoded and yet to be encoded parts of the window.

These are the rules for encoding and decoding data with LZ77, as explained in the original article by Abraham Lempel and Jacob Ziv:

1. Fill the N bytes window with zeroes.
2. Add the first Ls characters from the input string to the window.
3. Look at the substring from character N − Ls to N in the window, and find the longest prefix p of this substring within the window, that starts before character N − Ls.
4. Set i equal to the position of p in the window, j equal to the length of p, and x equal to the first character after p in the input string.
5. Write the code (i, j, x) to the output.
6. Add the next j + 1 characters from the input string to the window.
7. Go to Step 3, continue until all data is compressed.


To decode it again at the sink, the following procedure can be used:

1. Fill the window with zeroes.
2. Add Ls unknown characters to the window. An unknown character is a character that is inside the window, but has not yet been decoded.
3. Read the next code, (i, j, x), from the input string. i, j, and x are as described in the encoding rules.
4. Write the substring (p, x) to the first unknown character in the window, as well as to the output string. p is as described in the encoding ruleset.
5. Add j + 1 unknown characters to the window.
6. Go to Step 3, continue until all data is decompressed.

Newer and more complex variants of LZ77 optimize on these rulesets in various ways. For example, some variants add an extra bit at the beginning of each code, to determine if the code consists of index, length, and character, or if it is just a character. This makes the codes that represent substrings with length 0 many bits smaller, at the cost of one extra bit per code.
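The original encoding and decoding rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the original article or from this thesis: the function names lz77_encode and lz77_decode are hypothetical, the window is modelled as a string, a NUL character stands in for the initial zeroes (shown as "?" in Figure 3.4), and the prefix search is a plain linear scan.

```python
FILL = "\0"  # assumed filler for the initially empty part of the window


def lz77_encode(data, N=10, Ls=5):
    """Return a list of (i, j, x) codes for the string `data`."""
    hist_len = N - Ls                 # already-encoded part of the window
    history = FILL * hist_len
    codes = []
    pos = 0
    while pos < len(data):
        lookahead = data[pos:pos + Ls]
        window = history + lookahead
        best_i, best_j = 0, 0
        max_j = len(lookahead) - 1    # keep one character free for x
        for i in range(hist_len):     # the prefix must start in the history
            j = 0
            while j < max_j and window[i + j] == lookahead[j]:
                j += 1                # a match may run into the lookahead
            if j > best_j:
                best_i, best_j = i, j
        x = lookahead[best_j]
        codes.append((best_i, best_j, x))
        consumed = data[pos:pos + best_j + 1]
        history = (history + consumed)[-hist_len:]
        pos += best_j + 1
    return codes


def lz77_decode(codes, N=10, Ls=5):
    """Rebuild the original string from a list of (i, j, x) codes."""
    hist_len = N - Ls
    history = FILL * hist_len
    out = []
    for i, j, x in codes:
        buf = list(history)
        for k in range(j):            # copy one character at a time so that
            buf.append(buf[i + k])    # overlapping prefixes work
        piece = "".join(buf[hist_len:]) + x
        out.append(piece)
        history = (history + piece)[-hist_len:]
    return "".join(out)
```

With N = 10 and Ls = 5, lz77_encode("aabbbcbbbca") returns the four codes listed in the last row of Figure 3.4, and lz77_decode turns them back into the input string.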


3.8 Lempel–Ziv–Welch

Lempel–Ziv–Welch (LZW), named after its creators Abraham Lempel, Jacob Ziv, and Terry Welch, is a well known and widely used string compression algorithm. It is a dictionary based sequential data compression algorithm based on Lempel–Ziv 78, and can compress most data streams fairly well. It was published by Terry Welch in 1984 in the article "A Technique for High-Performance Data Compression"[18].

Unlike LZ77, which uses a sliding window of the previous characters as its dictionary, LZW creates a separate dictionary that is built up from the previous characters. The dictionary is a list of arbitrary-length strings. The list is initialized with every unique single-character string in the given alphabet, and is added to whenever a new string is encountered. The compressed data is an array of dictionary entry indexes that together make up the original data. Given enough repeated substrings in the input data, these substrings are compactly compressed into single dictionary entry indexes for the compressed output. Additionally, LZW is designed in such a way that it doesn’t require the dictionary to be sent together with the data for decompression. Instead, the dictionary can be rebuilt at the end node, during the decompression.

Figure 3.5 shows a step by step example of a simple Lempel–Ziv–Welch compression. In this case, our alphabet contains only the characters "a", "b" and "c", and we are compressing the string "aabcbbbcaa". The dictionary is initialized as (a, b, c), and the final compressed result is (0, 0, 1, 2, 1, 7, 2, 3).

| step | remaining input | output list | dictionary |
|------|-----------------|-------------|------------|
| 0 | aabcbbbcaa | | (a,b,c) |
| 1 | abcbbbcaa | (0) | (a,b,c,aa) |
| 2 | bcbbbcaa | (0,0) | (a,b,c,aa,ab) |
| 3 | cbbbcaa | (0,0,1) | (a,b,c,aa,ab,bc) |
| 4 | bbbcaa | (0,0,1,2) | (a,b,c,aa,ab,bc,cb) |
| 5 | bbcaa | (0,0,1,2,1) | (a,b,c,aa,ab,bc,cb,bb) |
| 6 | caa | (0,0,1,2,1,7) | (a,b,c,aa,ab,bc,cb,bb,bbc) |
| 7 | aa | (0,0,1,2,1,7,2) | (a,b,c,aa,ab,bc,cb,bb,bbc,ca) |
| 8 | | (0,0,1,2,1,7,2,3) | (a,b,c,aa,ab,bc,cb,bb,bbc,ca) |

Figure 3.5: Step by step Lempel–Ziv–Welch compression performed on the string "aabcbbbcaa", with the alphabet (a, b, c). For each step, the remaining input to compress is displayed, as well as the current states of the output and dictionary.


When compressing data or byte-strings, the dictionary is initialized with 256 strings, representing all 256 unique single-byte strings.

LZW compression is performed through the following steps:

1. Initialize the first 256 entries in the dictionary, to contain all strings of length one.
2. Find the longest string W in the dictionary that matches the beginning of the current position of our input data.
3. Write the dictionary index for W to output.
4. Move input forward by the length of W.
5. Add W followed by the next character in the input to the dictionary.
6. Go to Step 2, continue until all data is compressed.
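The compression steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in this thesis: the function name lzw_compress and the alphabet parameter are hypothetical, and the alphabet is passed in explicitly so that the three-letter example of Figure 3.5 can be reproduced. For byte data, the alphabet would be the 256 single-byte strings.

```python
def lzw_compress(data, alphabet):
    """Return the list of dictionary indexes that encodes `data`.

    `alphabet` is the ordered list of single-character strings used to
    initialize the dictionary (assumed to cover every symbol in `data`).
    """
    dictionary = {symbol: index for index, symbol in enumerate(alphabet)}
    output = []
    pos = 0
    while pos < len(data):
        # Step 2: find the longest dictionary string W starting at pos.
        w = data[pos]
        while pos + len(w) < len(data) and w + data[pos + len(w)] in dictionary:
            w += data[pos + len(w)]
        # Step 3: write the index of W.
        output.append(dictionary[w])
        # Step 4: move the input forward by the length of W.
        pos += len(w)
        # Step 5: add W plus the next input character as a new entry.
        if pos < len(data):
            dictionary[w + data[pos]] = len(dictionary)
    return output
```

Called as lzw_compress("aabcbbbcaa", "abc"), the sketch returns [0, 0, 1, 2, 1, 7, 2, 3], the output list in the last row of Figure 3.5.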


The following steps can then be performed at the sink node to decompress the data into its original values:

1. Initialize the first 256 entries in the dictionary, to contain all strings of length one.
2. Read the first input dictionary index.
3. Look up in the dictionary what string, WCurrent, it translates into.
4. Write WCurrent to the output.
5. Move WCurrent into WPrevious.
6. Read the next dictionary entry index from input.
7. Check if this index exists in our dictionary.
8. If it does exist in the dictionary, set WCurrent to it.
9. If it doesn’t exist in the dictionary yet, set WCurrent equal to WPrevious followed by the first letter of WPrevious.
10. Add the next dictionary item to the dictionary, which is WPrevious followed by the first letter of WCurrent.
11. Write WCurrent to the output.
12. Go to step 5, continue until all data is decompressed.
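The decompression steps can be sketched in Python in the same way. As before, this is a minimal illustration: the name lzw_decompress and the explicit alphabet parameter are hypothetical, and the input is assumed to be a well-formed list of indexes produced by an LZW compressor that started from the same alphabet.

```python
def lzw_decompress(codes, alphabet):
    """Rebuild the original string from a list of dictionary indexes."""
    dictionary = list(alphabet)          # step 1: single-character entries
    if not codes:
        return ""
    w_previous = dictionary[codes[0]]    # steps 2 and 3
    output = [w_previous]                # step 4
    for code in codes[1:]:               # step 6
        if code < len(dictionary):       # steps 7 and 8: known index
            w_current = dictionary[code]
        else:                            # step 9: index not created yet
            w_current = w_previous + w_previous[0]
        # Step 10: the entry the compressor added after the previous code.
        dictionary.append(w_previous + w_current[0])
        output.append(w_current)         # step 11
        w_previous = w_current           # step 5
    return "".join(output)
```

For example, lzw_decompress([0, 0, 1, 2, 1, 7, 2, 3], "abc") returns "aabcbbbcaa". The index 7 in that list is read before the decompressor has created entry 7, which exercises the special case in step 9.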


3.8.1 Sensor Lempel–Ziv–Welch

In 2006, Christopher M. Sadler and Margaret Martonosi released an article titled "Data Compression Algorithms for Energy-Constrained Devices in Delay Tolerant Networks"[9]. In this article, they talk about the benefits of compression in sensor networks, and develop a compression algorithm they call the Sensor Lempel–Ziv–Welch (S-LZW). S-LZW can be seen as a subset of LZW with a few additional rules that are chosen to optimize for energy consumption, for most datasets. The rules were decided by performing tests on multiple datasets, and finding out what rules lead to the best average results. The rules are as follows:

• Use a dictionary with 512 entries. This means that our dictionary indexes are 9-bit unsigned integers, and thus the compressed data is an array of 9-bit integers.
• Compress the data in chunks of 2 pages, which equals 512 bytes.
• If the dictionary fills up before all 512 bytes are processed, freeze the dictionary, and continue to use it without adding any new entries.
• Transfer the compressed data in groups of 10 or fewer dependent packets, to minimize the cost of a packet loss.
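The first three rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from the article or from this thesis: the names slzw_compress_block and pack_9bit are hypothetical, the block is assumed to have already been cut to at most 512 bytes, the bit order used when packing is an arbitrary choice, and the grouping into packets is not shown.

```python
def slzw_compress_block(block: bytes, max_entries: int = 512):
    """LZW-compress one block with a dictionary capped at max_entries."""
    dictionary = {bytes([b]): b for b in range(256)}
    codes = []
    pos = 0
    while pos < len(block):
        end = pos + 1
        while end < len(block) and block[pos:end + 1] in dictionary:
            end += 1
        codes.append(dictionary[block[pos:end]])
        # Freeze the dictionary once it is full: keep using the existing
        # entries, but do not add new ones.
        if end < len(block) and len(dictionary) < max_entries:
            dictionary[block[pos:end + 1]] = len(dictionary)
        pos = end
    return codes


def pack_9bit(codes):
    """Pack 9-bit codes into bytes, most significant bit first."""
    out = bytearray()
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits in acc
    for code in codes:
        acc = (acc << 9) | code
        nbits += 9
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    if nbits:    # pad the final partial byte with zero bits
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)
```

Because the dictionary never grows beyond 512 entries, every code fits in 9 bits, so a block that compresses to k codes occupies ceil(9k / 8) bytes after packing.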

3.8.2 Lempel–Ziv–Welch with minicache

In addition to S-LZW, Sadler and Martonosi also propose a variant of LZW that makes use of a Mini-Cache (MC) in addition to the static dictionary, to further optimize results. The article calls this the S-LZW-MC. The primary idea of the mini-cache is to have an additional, smaller dictionary with dynamic content, acting as a cache. Unlike the primary dictionary, the cache only keeps the most recently used and created dictionary items in it.

When encoding, we can output a cache entry index instead of a dictionary entry index if the largest substring is in the cache. This is called a cache-hit. We also have to write an additional bit to the output, to signify that the data is in the cache. Since the cache has fewer entries than the dictionary, the index and cache bit take up less space than a dictionary index. If the largest substring isn’t in the cache, we write the dictionary entry index to the output, together with an additional bit to signify that the data isn’t in the cache. This is called a cache-miss, and the code representation takes up one bit more than in S-LZW without cache.

Compared to S-LZW without cache, cache-misses take up one more bit, and cache-hits take up a few fewer bits. Sensor data, which tends to repeat numbers heavily in short bursts, should frequently cache-hit, thus reducing the final compressed data size. Assuming a dictionary size of 512 entries, with 9-bit integer indexes, and a mini-cache of size 32, with 5-bit integer indexes, a cache-miss takes 10 bits, and a cache-hit takes up 6 bits. Compared to a system without cache, where all entries take up 9 bits, the system with cache requires a cache-hit rate of 25% to compress equally well. If we have a greater cache-hit rate, the mini-cache pays off. If we have a smaller cache-hit rate, the mini-cache is detrimental.
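The 25% figure follows from equating the average code length with and without the cache. Writing h for the cache-hit rate (a symbol introduced here for illustration), the break-even point is:

```latex
\begin{align*}
6h + 10(1 - h) &= 9 \\
10 - 4h &= 9 \\
h &= 0.25
\end{align*}
```

For any h above 0.25 the average code is shorter than 9 bits. For example, h = 0.5 gives an average of 8 bits per code.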

In order to work with cache, we have to modify the steps used during compression and decompression. Below are the steps needed for compression. The decompression is modified similarly.

1. Initialize the first 256 entries in the dictionary, to contain all strings of length one.
2. Find the longest string W in the dictionary that matches the beginning of the current position of our input data.
3. Check if W is present in the cache.
4. If W is in the cache, write the cache bit and cache index for W to the output. Then move W to the front of the cache.
5. If W is not in the cache, write the dictionary bit and dictionary index for W to the output. Then add W to the cache, removing the oldest entry if there are no more empty cache slots.
6. Remove W from the input.
7. Add W followed by the next character in the input to the dictionary and cache, removing the oldest entry if we’re out of empty cache slots.
8. Go to Step 2, continue until all data is compressed.
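The following Python sketch illustrates one way to realise the cache steps and the matching decompression. It is an assumption-laden illustration, not the article's or this thesis's implementation: the names mc_encode, mc_decode and encoded_bits are hypothetical, the cache is modelled as a most-recently-used list of dictionary indexes, and the output is a list of ("hit", cache index) or ("miss", dictionary index) tokens instead of a packed bit stream.

```python
def _touch(cache, idx, cache_size):
    """Move idx to the front of the cache, dropping the oldest entry if full."""
    if idx in cache:
        cache.remove(idx)
    cache.insert(0, idx)
    del cache[cache_size:]


def mc_encode(data: bytes, max_entries: int = 512, cache_size: int = 32):
    dictionary = {bytes([b]): b for b in range(256)}
    cache, tokens, pos = [], [], 0
    while pos < len(data):
        end = pos + 1
        while end < len(data) and data[pos:end + 1] in dictionary:
            end += 1
        idx = dictionary[data[pos:end]]
        if idx in cache:
            tokens.append(("hit", cache.index(idx)))
        else:
            tokens.append(("miss", idx))
        _touch(cache, idx, cache_size)
        if end < len(data) and len(dictionary) < max_entries:
            new_idx = len(dictionary)
            dictionary[data[pos:end + 1]] = new_idx
            _touch(cache, new_idx, cache_size)
        pos = end
    return tokens


def mc_decode(tokens, max_entries: int = 512, cache_size: int = 32) -> bytes:
    dictionary = [bytes([b]) for b in range(256)]
    cache, out, w_prev = [], [], None
    for kind, value in tokens:
        pending = None
        if w_prev is not None and len(dictionary) < max_entries:
            # The encoder already created and cached this entry; its string
            # is only known once the current token has been resolved.
            pending = len(dictionary)
            _touch(cache, pending, cache_size)
        idx = cache[value] if kind == "hit" else value
        w = w_prev + w_prev[:1] if idx == pending else dictionary[idx]
        if pending is not None:
            dictionary.append(w_prev + w[:1])
        _touch(cache, idx, cache_size)
        out.append(w)
        w_prev = w
    return b"".join(out)


def encoded_bits(tokens, hit_bits: int = 6, miss_bits: int = 10) -> int:
    """Size of the token stream with a 32-entry cache and 512-entry dictionary."""
    return sum(hit_bits if kind == "hit" else miss_bits for kind, _ in tokens)
```

The decoder mirrors every cache update of the encoder in the same order, which is what keeps the two caches identical without transmitting them. Comparing encoded_bits(tokens) against 9 * len(tokens) shows whether the cache paid off for a given input.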

4 Experiments

As part of this thesis we performed multiple experiments to assess the effects of data compression in wireless sensor networks. For this, we selected several encoding and compression algorithms, with multiple variations of some of them, that are compact enough to fit on the flash memory of the Flex Gecko together with the rest of the code needed to operate the microcontroller and to transmit and receive data. Our compression algorithms are suited for general-purpose data compression, and we expect them to work well with sensor data. To keep the scope of this thesis from becoming too large, we limited our choices to Lempel–Ziv based algorithms and arithmetic coding based algorithms. This gives us a wide range of compression algorithms, while also giving us some similar algorithms to compare to one another. In this chapter we will look at the encoding and compression algorithms we used for our experiments, the datasets we used, and the experiments that we performed, including which tools we used for each of them.

4.1 Compression algorithms

4.1.1 Delta encoding

We have shown in 3.4 how delta encoding can have a positive effect on the compression ratio of sensor data when used in combination with compression algorithms. For this reason, we have decided to perform all our measurements with and without delta encoding. When performing delta encoding in conjunction with the compression algorithms, the transmitter and receiver nodes need to apply delta encoding in addition to the compression algorithm. When the transmitter node sends data, it first delta encodes the data, then compresses the delta encoded data, before transmitting that. On the receiving end, the receiver decompresses the received message into the delta encoded data, and then decodes that back into its original format.

We have implemented two different delta encoding algorithms, each suited for a different case. For each dataset, we will use the delta encoding implementation we expect to perform best. Our first delta encoding algorithm is a simple byte-by-byte delta encoding implementation with overflow wrap-around. Figure 4.1 shows this delta encoding algorithm applied to an array of unsigned 8-bit integers. During the rest of this thesis, this delta encoding is denoted as "∆ encoding". When used together with a compression algorithm, it is denoted as "∆ +".

| | Type | Values |
|---|---|---|
| Before delta encoding | uint8 array | 66 78 75 100 125 120 |
| After delta encoding | uint8 array | 66 12 253 25 25 251 |

Figure 4.1: Example of delta encoding applied to an array of 8-bit unsigned integers. The first value is written directly to the output. The rest are a delta. If the delta overflows, the byte wraps around. This allows us to encode deltas outside the [−128, 127] range, without increasing the length of the data. We can see that the total byte-size remains unchanged.
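A byte-by-byte delta encoder with wrap-around can be sketched in Python as below. The function names are hypothetical and the sketch only illustrates the idea; it is not the implementation used for the measurements. Wrap-around is expressed with arithmetic modulo 256, and the first value is treated as a delta from an assumed starting value of 0, so it is written unchanged.

```python
def delta_encode_u8(values):
    """Replace each byte by its difference to the previous byte, modulo 256."""
    out = bytearray()
    previous = 0                      # first value is written unchanged
    for value in values:
        out.append((value - previous) % 256)
        previous = value
    return bytes(out)


def delta_decode_u8(encoded):
    """Undo delta_encode_u8 by accumulating the deltas modulo 256."""
    out = bytearray()
    previous = 0
    for delta in encoded:
        previous = (previous + delta) % 256
        out.append(previous)
    return bytes(out)
```

Because both directions work modulo 256, a jump such as 250 followed by 5 is stored as the single byte 11 and still decodes correctly, which is why the output is always exactly as long as the input.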

4.1.2 Variable-width delta encoding

In addition to potentially improving other compression algorithms by first delta encoding data, a delta encoding algorithm can also compress the data itself. This can be done by turning 16 and 32 bit arrays into arrays of 8-bit integers, as many differences between two adjacent sensor data measurements tend to be within a [−127, 127] range, which can be expressed as 8-bit signed integers. Figure 4.2 shows delta encoding performed on a simple array of 16-bit unsigned integers.

| | Type | Values |
|---|---|---|
| Before delta encoding | uint16 array | 966 978 975 1000 1125 1120 |
| After delta encoding | int8 array | 966 12 -3 25 125 -5 |

Figure 4.2: Example of delta encoding applied to an array of 16-bit unsigned integers. The first uint16 value is written directly to the output, using two bytes in the int8 array. We can see that the total byte-size is reduced from 12 bytes to 7.


In the cases where we get a delta value that is outside of the [−127, 127] range, we instead write −128 to the output, to signal an escape from the 8-bit integer array. After that, we write the original non-delta-encoded number to the output in its original format. We call this delta encoding implementation variable-width delta encoding. Figure 4.3 shows our variable-width delta encoding being applied to a short array of 16-bit unsigned integers.

Since we represent the delta as an 8-bit integer by default, and escape for values that cannot be represented this way, our delta encoding algorithm will work less well on datasets where a big portion of the changes between two neighboring values lies outside the [−127, 127] range. This can occur with datasets that are not smooth, and datasets with very high-precision values, where a difference of 128 between two measurements represents a very small difference in the environment. Nonetheless, it is applicable to a lot of sensor data. We expect this algorithm to perform better on datasets consisting of 16-bit and 32-bit integers than our first delta encoding algorithm, due to its ability to compress the data. For the rest of this thesis, we will denote our variable-width delta encoding as "vw∆ encoding". When used together with other compression algorithms, it will be denoted as "vw∆ +".

| | Type | Values |
|---|---|---|
| Before delta encoding | uint16 array | 966 978 975 1200 1125 1120 |
| Hex representation | | C6 03 D2 03 CF 03 B0 04 65 04 60 04 |
| After delta encoding | Mixed width array | 966 12 -3 1200 -75 -5 |
| Hex representation | | C6 03 0C FD 80 B0 04 B5 FB |

Figure 4.3: Example of variable-width delta encoding applied to an array of 16-bit unsigned integers. We can see that the total byte-size is reduced from 12 bytes to 9.


Our delta encoding algorithm functions as follows:

1. Read the first number from the input array into vp, and encode it directly into the output array, as its original type.
2. Read the next number from the input array into vc.
3. Check if the absolute difference between vp and vc is less than or equal to 127.
4. If yes, encode vc − vp to the output as an 8-bit signed integer.
5. If not, encode −128 to the output as an 8-bit signed integer, and then vc directly to the output, as its original type.
6. Set vp equal to vc.
7. Go to step 2, continue until the input array is completely encoded.
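The variable-width delta encoding steps can be sketched in Python for 16-bit unsigned input as follows. The function names are hypothetical, the input is assumed to be non-empty, and little-endian byte order is assumed because it matches the hex representation in Figure 4.3. The sketch is an illustration, not the code used on the microcontroller.

```python
import struct

ESCAPE = -128  # int8 marker: the next two bytes hold a full uint16 value


def vw_delta_encode_u16(values):
    """Encode a non-empty list of uint16 values as variable-width deltas."""
    out = bytearray(struct.pack("<H", values[0]))    # step 1
    previous = values[0]
    for current in values[1:]:                       # step 2
        delta = current - previous
        if abs(delta) <= 127:                        # steps 3 and 4
            out += struct.pack("<b", delta)
        else:                                        # step 5
            out += struct.pack("<bH", ESCAPE, current)
        previous = current                           # step 6
    return bytes(out)


def vw_delta_decode_u16(data):
    """Rebuild the list of uint16 values from the encoded bytes."""
    (previous,) = struct.unpack_from("<H", data, 0)
    values = [previous]
    pos = 2
    while pos < len(data):
        (delta,) = struct.unpack_from("<b", data, pos)
        pos += 1
        if delta == ESCAPE:
            (previous,) = struct.unpack_from("<H", data, pos)
            pos += 2
        else:
            previous += delta
        values.append(previous)
    return values
```

Since −128 is reserved as the escape marker, the decoder can always tell a one-byte delta from an escaped full-width value without any extra framing.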


4.1.3 Snappy

Snappy is a compression algorithm based on LZ77. It is developed and maintained by Google, and is used within the company[19]. The Snappy compression algorithm focuses first and foremost on being generally fast, while still maintaining a reasonable compression rate. Snappy uses a sliding window that is up to 65536 bytes long, and a compressed data output format that varies slightly from the original LZ77. The first bytes in the compressed data are not compressed with LZ77, but with variable length encoding instead. After that, Snappy uses codes for output, similar to the ones in LZ77. The codes are modified slightly from LZ77, and allow for more complex outputs, to better handle different cases that can occur when compressing. Additionally, Snappy will sometimes pick a non-optimal compression option, to reduce execution time.

The Snappy algorithm is also designed to not produce output that is much larger than the input. It starts by making a rough estimate of the compressed data size. If it estimates that the compression benefits are minimal, or that the compression is detrimental, it will not compress the data at all. Instead, it will simply add a few bytes to inform the decompressor that the data is not compressed. While this means that nothing is gained from the compression in these cases, it also means that it doesn’t spend time or energy on a compression that will most likely not be beneficial.

By using Snappy for our experiments, we will be able to observe how fast compression and decompression speeds affect the energy consumption in our wireless sensor network, compared to algorithms that are slower, but might compress better. Snappy is available online on Google’s GitHub page[19]. It is released under a license based on the Berkeley Software Distribution (BSD).


4.1.4 LZ4

LZ4 is a compression algorithm based on LZ77. It is developed and maintained by Yann Collet on GitHub[20]. Similarly to Snappy, LZ4 adds to the base LZ77 algorithm by using more complex output codes that can better handle various cases. The implementation also has preference options that allow for varying compression efficiency. There are faster and slower versions of LZ4 available, which trade speed for potential compression ratio. For this thesis, we went with the default version.

We decided to use LZ4 in our thesis as it is a widely used Lempel–Ziv based compression algorithm, used in a variety of operating systems and applications[21]. Similarly to Snappy, it is considered to be a fast algorithm, especially when decoding. LZ4 is available online on the official GitHub page[20], and is released under a BSD 2-Clause License.

4.1.5 LZW

For this thesis, we created a Lempel–Ziv–Welch implementation based on the LZW spec as described in the article "Data Compression Algorithms for Energy-constrained Devices in Delay Tolerant Networks"[9]. In addition to the description from the article, a lookup array was added to improve compression speed, and some settings were added that can be tweaked at compile time. We also created a variation of our LZW code that uses a minicache to potentially further improve the compression ratio, as described in 3.8.2. This is also implemented as described in the article. We denote this version of our program as LZW_MC.

We included LZW as it is a popular compression algorithm for general data, yet is simple to implement. It is also recommended in the previously mentioned article, and thus known to produce good compression ratios. The LZW and LZW_MC implementations are available online on GitHub, and are released under the MIT license[22].


4.1.6 PPMs

PPMs is a lightweight PPM implementation written and published by Dmitry Shkarin[23]. It is a PPM implementation that focuses on requiring low memory, which allows it to be used on our microcontroller. The program also allows for setting the order of the PPM model during execution. For this thesis, we decided to run three different orders, to see how the PPM model order affects the compression ratio of the algorithm, and how it affects the energy consumption. We decided on 5, 3 and 1, denoted as PPMs(5), PPMs(3) and PPMs(1) respectively. These will give us a good overview of how the compression and energy consumption change with PPM model order. We don’t expect model orders above 5 to work well with our dataset sizes, as we don’t expect many repeated patterns that are longer than 5 bytes in the datasets we have chosen to test with.

PPMs is an algorithm that aims to achieve high compression ratios, at the cost of computing time. By using it, we will be able to observe how much energy we can afford to spend on compression and decompression, in order to minimize our data transfer energy costs. PPMs is available online on a Russian website dedicated to collecting and publishing materials in the field of data compression[23]. It is released into the public domain, and can be used and modified without license.

4.1.7 fpaq0

fpaq0 is a compression algorithm based on arithmetic coding, created by Matt Mahoney[24]. It is a statistical data compression algorithm that uses arithmetic encoding together with an adaptive probability model. fpaq0 is an order 0 compression algorithm. This means that the probability model at any point is solely dependent on the number of occurrences of each symbol so far. Since it only considers the occurrence of symbols, instead of series of symbols, it can compress faster than PPM, and requires less memory to run. fpaq0 is especially good for data with a small range and a lot of repeating values. Since it does not take sequences of data into account, it can compress datasets with repeating values especially well, without requiring the data to have repeating sequences of values.

We included fpaq0 in this thesis as it is a lightweight arithmetic coder with an order 0 adaptive probability model. It is well suited for our datasets that have commonly repeated values, and will be interesting to compare to PPMs. fpaq0 is available online on Matt Mahoney’s webpage. There are also variations of the program available for download[24]. For this thesis we went with the original version, as the other versions required too much memory to run, and could not fit on the microcontroller. It is released under the GNU General Public License (GPL).


4.2 Datasets

To measure energy efficiency and compression rate, we ran our compression algorithms on sensor data. As measuring data is not part of the scope of this thesis, we decided to use previously measured data to perform our experiments on. For this thesis, we chose to use sensor datasets from a 2007 United States Coast Guard cruise through the Bering Sea on the Healy class ice breaker USCGC Healy (WAGB-20)[25]. This data contains a variety of measurements and metadata taken at regular intervals. The full data consists of many different measurements, all taken once a minute, starting on the 14th of April 2007 at 6:00 UTC. We have used these measurements to create three distinct datasets to use for our experiments, in order to analyze the effect of compression over a wide range of use cases.

This data was not measured by a wireless sensor network. However, the sensors used, and the data acquired, are similar to sensor measurements one would find in a wireless sensor network. Therefore, the results of our experiments using these datasets will be applicable to wireless sensor networks. For each of these datasets we will use 2048 bytes of data for our experiments. This gives us datasets that are big enough to give reliable results, without being too big to fit on the microcontrollers together with the compression algorithms.

4.2.1 Relative humidity

The first dataset is from relative humidity measurements in the Bering Sea. The sensor used has a precision of 1% relative humidity, with values ranging from 0% to 100%. The values are represented as 8-bit unsigned integers. While it is possible to represent the data as 7-bit unsigned integers, our chosen compression algorithms are designed to work with data aligned to 8-bit boundaries. An overview of this dataset is shown in figure 4.4. Due to the low range of values, low precision, and being 8 bits per value, this dataset is ideal for compression. It exhibits a high rate of repeated values and sequences. The delta encoded humidity data, shown in figure 4.5, has an even smaller range of values, which will likely further increase the effectiveness of the compression algorithms when used in conjunction with them.


[Plot: relative humidity in % (y-axis, 70 to 90) against time in minutes (x-axis, 0 to 2000).]

Figure 4.4: Humidity data

[Plot: change in relative humidity in % (y-axis, −3 to 3) against time in minutes (x-axis, 0 to 2000).]

Figure 4.5: Delta encoded humidity data.


4.2.2 Sea surface temperature

Our second dataset consists of measurements of sea surface temperature in the Bering Sea. The sensor used has a precision of 0.001 degrees centigrade, and ranges from -5 to 35 degrees. Figure 4.6 shows this dataset, and figure 4.7 shows the dataset after delta encoding has been applied to it. For this thesis, we converted the values in this dataset into 16-bit unsigned integers, without reducing the precision of the values. The Cortex M chip we used for our energy efficiency measurements can operate on floating point numbers, but there are many microcontrollers that cannot. In the interest of keeping these experiments applicable to these classes of MCUs as well, we will not be testing on floating point values. Additionally, we expect this conversion from floating point numbers to unsigned integers to improve the efficiency of our delta encoding algorithm, as it allows us to perform our variable-width delta encoding.

Compared to the humidity data, this dataset is less ideal for compression, due to consisting of 16-bit values, having higher precision and a wider range of values. This dataset will show us the effects of compression in a non-ideal scenario. It will also allow us to assess the effect of our variable-width delta encoding.
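The stated sensor range of -5 to 35 degrees at a precision of 0.001 degrees covers 40001 distinct values, which fits in the 65536 values of a 16-bit unsigned integer. The sketch below shows one possible lossless mapping. The offset, the scale and the function names are assumptions made for illustration; the text does not specify which conversion was actually used.

```python
OFFSET_C = 5.0   # assumed: shifts the lower bound of -5 degrees to 0
SCALE = 1000     # assumed: one integer step per 0.001 degrees


def temperature_to_u16(celsius: float) -> int:
    """Map a temperature in [-5, 35] degrees to an unsigned 16-bit integer."""
    value = round((celsius + OFFSET_C) * SCALE)
    if not 0 <= value <= 0xFFFF:
        raise ValueError("temperature outside the representable range")
    return value


def u16_to_temperature(value: int) -> float:
    """Inverse mapping back to degrees centigrade."""
    return value / SCALE - OFFSET_C
```

Under this mapping a change of 0.001 degrees becomes a change of 1 in the integer value, so a difference of up to 0.127 degrees between two neighboring measurements still fits in a single signed byte.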

[Plot: temperature in °C (y-axis, −1.6 to −0.4) against time in minutes (x-axis, 0 to 1000).]

Figure 4.6: Sea surface temperature data


[Plot: change in temperature in °C (y-axis, −0.3 to 0.3) against time in minutes (x-axis, 0 to 1000).]

Figure 4.7: Delta encoded sea surface temperature data.

4.2.3 Mixed data

Many sensor nodes in wireless sensor networks perform multiple sensor measurements, and send them to the sink all at once. The data that is transferred will typically consist of metadata and some measurements. Therefore, our third dataset is a mix of many of the measurements taken, in order to measure the effect compression has on energy consumption in these cases. Our mixed data comes in blocks of 32 bytes, where each block consists of a few measurements, as well as some metadata to go along with them. The format of each block is described in figure 4.8.

Since this dataset does not consist of a single array of 16-bit or 32-bit integers, using the variable-width delta encoding is not advised, and we will use the regular delta encoding instead. As the data from byte to byte won’t make for a smooth dataset, we don’t expect the delta encoding to have a positive effect on the compression ratio.

Name                      Data type   Description
POS_MV_Year               uint16_t    Year
POS_MV_Month              uint8_t     Month number
POS_MV_Day                uint8_t     Day number
POS_MV_Time               uint32_t    Seconds
POS_MV_Heading_Accuracy   uint8_t     Heading accuracy
POS_MV_Heading            uint16_t    Heading
POS_MV_LAT                uint32_t    Latitude
POS_MV_LON                uint32_t    Longitude
RMY_Humidity              uint8_t     Humidity
RMY_Baro                  uint32_t    Barometer
TSGF_SST                  uint16_t    Sea surface temperature
TSGF_Int_Temp             int16_t     Internal temperature
TSGF_Cond                 uint16_t    Conductivity
TSGF_Sal                  uint16_t    Salinity

Figure 4.8: The format for our mixed data dataset.
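The field sizes in figure 4.8 add up to exactly 32 bytes when the fields are stored without padding. The following sketch checks this with Python's struct module. The byte order (little-endian) and the absence of padding are assumptions, as is the sample block; only the field names, order and types come from the figure.

```python
import struct

# "<" selects little-endian with no padding (assumption, not stated in the thesis).
# H = uint16_t, B = uint8_t, I = uint32_t, h = int16_t, in the order of figure 4.8.
MIXED_BLOCK_FORMAT = "<HBBIBHIIBIHhHH"

FIELD_NAMES = (
    "POS_MV_Year", "POS_MV_Month", "POS_MV_Day", "POS_MV_Time",
    "POS_MV_Heading_Accuracy", "POS_MV_Heading", "POS_MV_LAT", "POS_MV_LON",
    "RMY_Humidity", "RMY_Baro", "TSGF_SST", "TSGF_Int_Temp",
    "TSGF_Cond", "TSGF_Sal",
)


def pack_block(values: dict) -> bytes:
    """Serialize one mixed data block into its 32 byte representation."""
    return struct.pack(MIXED_BLOCK_FORMAT, *(values[name] for name in FIELD_NAMES))


def unpack_block(block: bytes) -> dict:
    """Parse a 32 byte block back into named fields."""
    return dict(zip(FIELD_NAMES, struct.unpack(MIXED_BLOCK_FORMAT, block)))


print(struct.calcsize(MIXED_BLOCK_FORMAT))  # size of one packed block in bytes
```

Neighboring bytes in such a block belong to unrelated fields, which is why byte-wise delta encoding is not expected to help on this dataset.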

4.3 Experiments

For this thesis, we decided on two experiments that allow us to thoroughly assess the effects of compressing sensor data in wireless sensor networks. All code used for these experiments is available online on GitHub[26]. The code can be used to replicate these experiments, or can be built upon for future work and experiments.

4.3.1 Compression efficiency

For our first experiment, we wanted to analyze the compression efficiency of our compression algorithms. For this, we measured the compression ratio of the algorithms, without looking at the energy required to run them. This tells us if there is any potential benefit to performing compression on our sensor datasets in a wireless sensor network. In order for there to be any benefit to the energy consumption, the compression algorithm has to reduce the message payload on average. If the payload remains the same size after compression, the transmitter and receiver will use the same amount of energy on the transfer as if there was no compression, but the energy cost of compressing and decompressing the data will increase the overall energy usage of each device. If the payload grows through the use of compression, the transfer cost will be even greater than if there was no compression, in addition to the added cost of compression and decompression. By extension, the more we are able to compress, the more we can potentially benefit from the compression, as the transfer cost is reduced.

Additionally, we test the compression efficiency for different payload sizes. The amount of data that is sent during one transmission can vary greatly depending on the context. Some wireless sensor nodes are time sensitive, and data needs to be transferred immediately, resulting in messages with small payloads. In other cases, the node can collect many sensor measurements before sending them all at once, which leads to messages with bigger payloads. Sometimes a sensor node will collect a variety of measurements, while other times a sensor node may collect only one or two sensor inputs. In these different cases, we expect different levels of benefit from performing compression. To simulate this, and to test the different use cases, we measure the compression ratio for different payload sizes. For the humidity and temperature datasets, we measure for 2, 4, 8, 16, 32, 64, 128, 256 and 512 bytes of data. For the mixed dataset, we only perform measurements for payload sizes of 32 bytes and up, which is as small as the mixed data can get without separating the data blocks into smaller chunks. We expect the compression ratio to be worse when compressing smaller payloads, as the compression algorithms we use tend to adapt during compression, using data from earlier in the input. This experiment tells us the minimum amount of data we can compress at a time and still expect a benefit.

We perform this experiment on a regular computer rather than a microcontroller, and use the GNU Compiler Collection (GCC)[27] to compile our program. The output of the compression algorithms is the same regardless of the platform or compiler toolchain used, so the results are applicable to code running on a sensor node.
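The measurement described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: zlib from the Python standard library stands in for the algorithms used in the thesis, the input is a hypothetical slowly varying 8-bit series, and the average is taken as the mean of the per-chunk ratios, which the text does not specify.

```python
import zlib


def average_compression_ratio(data: bytes, chunk_size: int, compress=zlib.compress) -> float:
    """Mean of (uncompressed size / compressed size) over consecutive chunks."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    ratios = [len(chunk) / len(compress(chunk)) for chunk in chunks]
    return sum(ratios) / len(ratios)


# Hypothetical stand-in for a 2048 byte sensor dataset (not the thesis data).
sample = bytes(50 + (i // 64) % 4 for i in range(2048))

for size in (2, 4, 8, 16, 32, 64, 128, 256, 512):
    print(size, round(average_compression_ratio(sample, size), 2))
```

A ratio below one means that the compressed chunk is larger than the input. With very small chunks, the fixed overhead of the compressed format dominates, which matches the expectation that small payloads compress poorly.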

4.3.2 Energy costs of compression and transmission

In our second experiment, we measured the energy cost of performing compression and decompression on a microcontroller, as well as the energy cost of sending and receiving data, with and without compression. Through this experiment we intended to find out whether or not compressing data can be beneficial for the energy conservation of a sensor node. If it is beneficial, we want to find out how much energy we might save by compressing the data. Additionally, we could find out which of our selected compression algorithms performs best for each task.

To cover a variety of cases, we tested the effect of compression for multiple payload sizes, as we did in the first experiment. For this experiment, we went with 512 byte payloads and 64 byte payloads. We decided on 512 bytes because this is the size recommended in the S-LZW article, as described in 3.8.1. At this size we also expect high compression ratios, leading to a lot of energy being saved. As for 64 byte payloads, this is the smallest size where we expected a potential benefit for all of our datasets, after having performed our first experiment.

In the case of 512 bytes, we performed our compression algorithms on each chunk of 512 bytes, then split the result up into multiple smaller packets for transfer, again as described in 3.8.1. For this experiment, we chose to split the result into 64 byte packets to send over the network. By splitting the result into multiple smaller packets, we mitigate the potential loss of energy that can occur if one packet fails to transfer correctly, as explained in 2.2.3. To keep the scope of this thesis focused, we did not look into the energy cost of losing packets, nor what packet size is best for minimizing the risk. Splitting the result into packets means that the energy cost of message transfer is highly dependent on the number of packets required to transfer the compressed result. However, as we run the compressions multiple times, on different parts of our dataset, this trait is less pronounced in our results. While the overall energy consumption can vary depending on packet size, our choice of 64 byte packets gives us a good general idea of the energy consumption. In the case of compressing the data in chunks of 64 bytes, we performed our compression algorithm on each chunk, and transmitted each result in a variable length packet.

We created a program that can test compressing, decompressing, sending and receiving data. It has configurable options for whether the device should operate as a transmitter or receiver, which compression algorithm to use, and more. It performs each of the actions separately, and goes into a sleep state between each action, which makes it easy to measure how much energy is spent on each individual action. The code also performs receiver timing as explained in 2.2.4, in order to minimize the energy lost while the receiver is waiting for a packet to arrive. Another option that can be set in the program is the transmit power, in dBm. The transmit power of a transmitter is explained in 2.2.2. For our experiments, we measure transmissions at 10dBm and 19.5dBm. These represent two different broadcasting ranges. The energy usage of the transceiver also increases with the transmit power. 19.5dBm is the highest transmit power we can use for our chosen transceiver, and will help us analyze the maximum energy reduction we may achieve from using our compression algorithms. At 19.5dBm, the transceiver is measured to be able to transmit messages about 180 meters in dry weather[28]. By testing two values, we can analyze how an increase in transmit power affects the energy usage in the transmitter, with and without compression. We expect that there may be cases where compression is beneficial when transmitting at 19.5dBm, but not at 10dBm.

We performed our measurements in a small wireless network consisting of two microcontroller nodes. We used the Silicon Labs Flex Gecko starter kit for these nodes, as these kits come with integrated transceivers and sophisticated energy measurement tools. For wireless transmission, we used a Silicon Labs proprietary wireless transfer protocol, and broadcast on the 2.4GHz frequencies, as they are commonly used frequencies for sensor networks.
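The splitting of a compressed 512 byte chunk into 64 byte packets, described earlier in this section, can be sketched as follows. This is a simplified illustration: it assumes that the last packet is simply shorter when the compressed size is not a multiple of 64, and it ignores any packet headers, neither of which is specified in the text.

```python
import math

PACKET_SIZE = 64  # packet size chosen for the 512 byte experiments


def split_into_packets(compressed: bytes, packet_size: int = PACKET_SIZE) -> list[bytes]:
    """Cut the compressed result into packets of at most packet_size bytes."""
    return [compressed[i:i + packet_size] for i in range(0, len(compressed), packet_size)]


def packet_count(compressed_length: int, packet_size: int = PACKET_SIZE) -> int:
    """Number of packets needed to transfer compressed_length bytes."""
    return math.ceil(compressed_length / packet_size)


# One extra byte of compressed output can require a whole extra packet.
print(packet_count(128), packet_count(129))
```

This step behavior is why the transfer cost depends on the number of packets rather than smoothly on the compressed size.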

The Flex Gecko microcontroller uses an ARM Cortex M4 processor. The M4 is a powerful yet low energy processor design in the ARM Cortex M series. Additionally, it comes with on chip clocks and timers, and an extended energy management system, allowing for multiple energy modes, as described in 1.2.1[29]. We used the Simplicity Studio Integrated Development Environment (IDE) to compile the code and upload it to the microcontroller. The code was compiled using GCC-ARM, using Simplicity Studio’s default compilation settings.

To measure the energy usage on the microcontroller, we used the on-chip Advanced Energy Monitor (AEM), which gives us continuous information on the current draw and voltage of our devices. The AEM can measure currents ranging from 0.1µA to 95mA. The current measurements are accurate within 0.1mA for currents above 250µA, and within 1µA for currents below 250µA. The AEM samples at 10kHz, meaning 10 samples per millisecond. This gives us a good resolution for measuring the energy consumption of our actions.

While we use a specific microcontroller and wireless network protocol in this experiment, the results will be applicable to other devices and protocols, as the transmission and reception cost of packets is typically proportional to the radio frequency, packet length, and transmission power, as described in 2.2.2.
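As an illustration of how sampled current measurements translate into an energy figure, the sketch below integrates current samples taken at the 10kHz rate mentioned above. It assumes a constant supply voltage and a simple rectangle rule; the 3.3V value and the sample data are hypothetical, and the actual processing done by the measurement tools is not described in the text.

```python
SAMPLE_RATE_HZ = 10_000  # AEM sampling rate from the text (10 samples per millisecond)


def energy_microjoules(current_samples_a, supply_voltage_v):
    """Approximate energy as the sum of V * I * dt over all samples, in µJ."""
    dt = 1.0 / SAMPLE_RATE_HZ
    joules = sum(current * supply_voltage_v * dt for current in current_samples_a)
    return joules * 1e6


# Hypothetical action: 10 mA drawn for 1 ms (10 samples) at an assumed 3.3 V supply.
print(energy_microjoules([0.010] * 10, 3.3))
```

Because the program sleeps between actions, the samples belonging to one action can be separated from the next and integrated on their own.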

Results and Conclusion

In this chapter, we will look at the results from our two experiments, and form conclusions based on them.

5.1 Compression efficiency

In our first experiment, we looked at the compression ratio of different compression algorithms, and how they vary with the size of the input data. We performed the experiment using the compression algorithms described in the previous chapter. Additionally, we tested using our variable width delta encoding algorithm in combination with each of the other compression algorithms.

5.1.1 Results

Figures 5.1, 5.2 and 5.3 show the results of performing our compression algorithms on the humidity, temperature, and mixed datasets, respectively. The compression ratio is plotted on a logarithmic axis, to better display the difference between the results of the various compression algorithms. Additionally, we have split each plot into two parts, one for algorithms without delta encoding, and one for algorithms used together with delta encoding. This is done both to prevent the plots from becoming too complicated, and because the results tend to vary noticeably when delta encoding is used. Horizontal dashed lines are added to the plots, to show where the two parts of each plot overlap. The exact results of the experiments can be found in the appendix, under 7.1.1.

[Figure: Compression ratios for compression of humidity data. Two line plots sharing the x-axis "Bytes compressed" (2, 4, 8, 16, 32, 64, 128, 256, 512); the y-axis of each plot is "Average compression ratio (higher is better)", on a logarithmic scale. Top plot series: None, Snappy, fpaq0, PPMs(5), PPMs(3), PPMs(1), LZW, LZW_MC, LZ4. Bottom plot series: ∆ encoding, ∆ + Snappy, ∆ + fpaq0, ∆ + PPMs(5), ∆ + PPMs(3), ∆ + PPMs(1), ∆ + LZW, ∆ + LZW_MC, ∆ + LZ4.]

Figure 5.1: Compression ratio results when applying compression algorithm to our 2048 bytes humidity sensor data set. The top image shows the result of our compression algorithms without first delta encoding the input data. The bottom image shows the results of applying delta encoding, as well as combining delta encoding with our other algorithms.

[Figure: Compression ratios for compression of sea surface temperature data. Two line plots sharing the x-axis "Bytes compressed" (2, 4, 8, 16, 32, 64, 128, 256, 512); the y-axis of each plot is "Average compression ratio (higher is better)", on a logarithmic scale. Top plot series: None, Snappy, fpaq0, PPMs(5), PPMs(3), PPMs(1), LZW, LZW_MC, LZ4. Bottom plot series: vw∆ encoding, vw∆ + Snappy, vw∆ + fpaq0, vw∆ + PPMs(5), vw∆ + PPMs(3), vw∆ + PPMs(1), vw∆ + LZW, vw∆ + LZW_MC, vw∆ + LZ4.]

Figure 5.2: Compression ratio results when applying compression algorithm to our 2048 bytes sea surface temperature sensor data set. The top image shows the result of our compression algorithms without first delta encoding the input data. The bottom image shows the results of applying delta encoding, as well as combining delta encoding with our other algorithms.

[Figure: Compression ratios for compression of mixed data. Two line plots sharing the x-axis "Bytes compressed" (32, 64, 128, 256, 512); the y-axis of each plot is "Average compression ratio (higher is better)", on a logarithmic scale. Top plot series: None, Snappy, fpaq0, PPMs(5), PPMs(3), PPMs(1), LZW, LZW_MC, LZ4. Bottom plot series: ∆ encoding, ∆ + Snappy, ∆ + fpaq0, ∆ + PPMs(5), ∆ + PPMs(3), ∆ + PPMs(1), ∆ + LZW, ∆ + LZW_MC, ∆ + LZ4.]

Figure 5.3: Compression ratio results when applying compression algorithm to our 2048 bytes mixed sensor data set. The top image shows the result of our compression algorithms without first delta encoding the input data. The bottom image shows the results of applying delta encoding, as well as combining delta encoding with our other algorithms.

5.1.2 Conclusion

Figures 5.1 through 5.3 show the compression ratio of the compression algorithms, when applied to three different datasets. As expected, all our algorithms perform well on large data inputs. Additionally, for the large datasets that consist of only one data type, we generally get improved results when first performing delta encoding on the data. At lower input sizes, we see that a portion of our algorithms produce a compression ratio of less than one. In these cases, the compression algorithms expand the data, and thus result in higher transfer energy costs, in addition to the cost of compression.

For the humidity dataset, we see that we achieve favorable compression ratios with only 4 bytes, using fpaq0 without delta encoding. For the temperature dataset, we also achieve favorable compression ratios with 4 bytes, by using the variable width delta encoding algorithm on its own. For the mixed dataset, we required a minimum of 64 bytes to achieve favorable compression ratios. At that size, fpaq0 without delta encoding achieved promising results, as did all three PPMs algorithms, with and without delta encoding.

We see that our variable width delta encoding algorithm performed well on the 16-bit temperature data, as it reliably reduced the 16-bit array to an 8-bit array with some escape sequences, almost halving the dataset in most cases. For the humidity dataset, however, where the delta encoding does not perform any compression of its own, we see that it can have negative effects on the compression efficiency when used together with other algorithms on small inputs under 32 bytes. For larger inputs, it produced noticeably higher compression ratios, reaching as high as 4.93 when compressing 512 bytes with fpaq0. On the other hand, our delta encoding appears to have had strictly negative effects when applied to the mixed dataset. This is likely because the data is not smooth, unlike the other two datasets, which resulted in our delta encoding making the data appear even more irregular.

One characteristic we notice is that fpaq0 appears to lose compression efficiency given big enough input data. We suspect that this is caused by two factors. The first factor is the adaptive probability model becoming too complicated when there are too many unique values in it, causing each compressed byte to require more bits to represent. The second factor is how the sensor input data gradually changes over time, as we saw in figure 4.4. This causes the information given to the probability model by the earlier measurements to no longer accurately represent the data in the later measurements.
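The idea of reducing a 16-bit array to 8-bit deltas with escape sequences can be illustrated with the sketch below. The byte layout is a hypothetical one chosen for this example (a reserved escape byte followed by the full 16-bit value whenever a delta does not fit in one signed byte); the actual format of the thesis algorithm is defined in the earlier chapters and may differ.

```python
ESCAPE = 0x80  # hypothetical marker byte; never produced by a delta in -127..127


def vw_delta_encode(values):
    """Encode 16-bit values as one-byte deltas, escaping to full values when needed."""
    out = bytearray()
    previous = None
    for value in values:
        delta = None if previous is None else value - previous
        if delta is not None and -127 <= delta <= 127:
            out.append(delta & 0xFF)          # small change: one signed byte
        else:
            out.append(ESCAPE)                # first value or large change
            out += value.to_bytes(2, "big")   # followed by the full 16-bit value
        previous = value
    return bytes(out)


def vw_delta_decode(encoded):
    """Inverse of vw_delta_encode."""
    values = []
    i = 0
    while i < len(encoded):
        byte = encoded[i]
        if byte == ESCAPE:
            value = int.from_bytes(encoded[i + 1:i + 3], "big")
            i += 3
        else:
            delta = byte - 256 if byte > 127 else byte
            value = values[-1] + delta
            i += 1
        values.append(value)
    return values


# Hypothetical smooth series of 32 values: 64 bytes as raw 16-bit integers.
smooth = list(range(1000, 1032))
print(len(vw_delta_encode(smooth)))
```

For a smooth series only the first value needs an escape sequence, so the output approaches half the size of the raw 16-bit array, in line with the behavior described above.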

When it comes to the PPM algorithm, we observe that the model order generally had an insignificant effect on the resulting compression ratios. The mixed dataset was the only one of the three datasets where there was a noticeable difference between the different PPM model orders. We suspect that this was caused by the length of repeating patterns in our datasets. The humidity and temperature datasets did not have many repeating patterns with length greater than 2, causing all three PPM orders to perform approximately equally well. Meanwhile, the mixed dataset likely did not have many repeating patterns of length 4 or more, causing similar compression ratios for PPMs(3) and PPMs(5).

Given these results, we can see that there are potential benefits to the overall energy consumption of our devices when performing our compression algorithms on sensor data. For our humidity and temperature datasets there are potential benefits from compression even with inputs as small as 8 bytes, at least for some of the compression algorithms. For the mixed dataset, at least 64 bytes are required to achieve a compression ratio above one, which is equivalent to two mixed data blocks. For all of these cases, where reduced energy consumption is possible, the exact level of benefit depends on the cost of performing compression, and the cost of transferring data.
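The trade-off between compression cost and transfer cost can be made concrete with a small model for the transmitter. It assumes that the transfer cost is proportional to the payload length, with no fixed cost per packet, which is a simplification. The function names and the numbers in the example are hypothetical and are not measurements from this thesis.

```python
def break_even_ratio(e_tx_raw_uj: float, e_compress_uj: float) -> float:
    """Smallest compression ratio for which compressing does not cost extra energy.

    Solves e_compress + e_tx_raw / ratio = e_tx_raw for ratio.
    """
    if e_compress_uj >= e_tx_raw_uj:
        return float("inf")  # compression alone costs more than sending raw data
    return e_tx_raw_uj / (e_tx_raw_uj - e_compress_uj)


def transmitter_saving_uj(e_tx_raw_uj: float, e_compress_uj: float, ratio: float) -> float:
    """Energy saved on the transmitter by compressing (negative means a loss)."""
    return e_tx_raw_uj - (e_compress_uj + e_tx_raw_uj / ratio)


# Hypothetical costs: 900 µJ to send the raw payload, 300 µJ to compress it.
print(break_even_ratio(900.0, 300.0))
print(transmitter_saving_uj(900.0, 100.0, 2.0))
```

Under this model, an expensive algorithm needs a higher compression ratio than a cheap one before it pays off, which is why the compression ratio alone cannot decide which algorithm is best.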


5.2 Energy costs of compression and transmission

In our second experiment, we analyzed the effect our compression algorithms have on the energy usage in a wireless sensor network, by performing measurements in a wireless sensor network made of two nodes. We performed the test using the compression algorithms outlined in the previous chapter. Additionally, we tested using delta encoding algorithms in combination with each of the compression algorithms.

5.2.1 Results

For this experiment, we measured several components of the energy usage. For each combination of dataset and compression algorithm, we measured the energy usage of compression, decompression, data transmission at both 10dBm and 19.5dBm, and data reception. We performed these measurements both for payloads of 512 bytes and for smaller payloads of 64 bytes. All of the exact values can be found in the appendix at 7.1.2. In this chapter, we compare the information we are most interested in:

• For which compression algorithms and payload sizes does the decreased energy usage from smaller transmission make up for the energy cost of performing the compression?

• Likewise, in which cases does the decreased energy usage of reception from smaller transmission make up for the energy cost of decompression?

• What is the total cost for one hop, including compression, transmission, reception and decompression?

• How does the transmitter’s power ratio affect the energy usage; what is the difference between 10dBm and 19.5dBm?

To answer these questions, we have compiled sorted lists of the various combined energy usages. This gives us an easy way to determine the energy saved by performing compression, depending on the case. We decided to show energy usage graphs for a single transmitter node, a single receiver node, and the combined cost of one transmitter and one receiver. Figures 5.4 through 5.12 show the various energy costs for 512 bytes compressed at a time. Figures 5.13 through 5.21 show the same for 64 bytes compressed at a time.
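For the question about transmit power above, it can help to express the two dBm levels in linear units. The sketch below uses the standard definition of dBm as a power level relative to 1mW. Note that this is the radiated output power only; the energy drawn by the transceiver is what the measurements in this chapter report, and it does not have to scale by the same factor.

```python
def dbm_to_milliwatts(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts (0 dBm equals 1 mW)."""
    return 10 ** (dbm / 10)


low = dbm_to_milliwatts(10.0)
high = dbm_to_milliwatts(19.5)
print(round(low, 2), round(high, 2), round(high / low, 2))
```

The step from 10dBm to 19.5dBm corresponds to a factor of 10^0.95 in output power, which is close to nine.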

[Stacked bar chart; each bar is split into compression cost and transfer cost at 10dBm. The bar totals (joined energy usage of compression and transmission) are listed below.]

Compression algorithm   Total energy cost of transmitter, in µJ
∆ encoding              933.43
None                    928.05
∆ + LZ4                 637.1
LZ4                     635.29
fpaq0                   588.65
Snappy                  583.05
PPMs(5)                 579.6
LZW_MC                  500.89
LZW                     488.1
∆ + Snappy              481.87
∆ + LZW_MC              452.5
∆ + PPMs(5)             442.19
PPMs(1)                 402.32
PPMs(3)                 402.19
∆ + LZW                 382.39
∆ + fpaq0               334.13
∆ + PPMs(3)             302.43
∆ + PPMs(1)             302.26

Figure 5.4: The average energy cost of compressing and transmitting 512 bytes of humidity data at 10dBm, for various compression algorithms, with and without delta encoding.
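The totals in figure 5.4 can be turned into relative savings compared to sending uncompressed data. The sketch below does this for a few of the entries; the values are copied from the figure, while the function name and the selection of entries are only for illustration.

```python
BASELINE_UJ = 928.05  # "None" in figure 5.4

# A few transmitter totals from figure 5.4, in µJ.
totals_uj = {
    "∆ + PPMs(1)": 302.26,
    "∆ + fpaq0": 334.13,
    "LZW": 488.1,
    "∆ encoding": 933.43,
}


def relative_saving(total_uj: float, baseline_uj: float = BASELINE_UJ) -> float:
    """Fraction of the baseline energy saved (negative means more energy used)."""
    return (baseline_uj - total_uj) / baseline_uj


for name, total in totals_uj.items():
    print(name, f"{relative_saving(total) * 100:.1f}%")
```

A negative value, as for delta encoding on its own, means that the transmitter spends more energy than it would without compression.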

[Stacked bar chart; each bar is split into compression cost and transfer cost at 10dBm. The bar totals (joined energy usage of compression and transmission) are listed below.]

Compression algorithm   Total energy cost of transmitter, in µJ
PPMs(5)                 1,231.27
LZW_MC                  1,115.04
PPMs(3)                 1,113.42
PPMs(1)                 1,112.11
LZW                     1,030.69
None                    928.61
LZ4                     894.49
Snappy                  869.46
vw∆ + PPMs(5)           773.89
vw∆ + PPMs(1)           734.36
vw∆ + PPMs(3)           734.13
fpaq0                   718.98
vw∆ + LZW_MC            594.86
vw∆ + LZ4               557
vw∆ + Snappy            524.99
vw∆ encoding            520.88
vw∆ + LZW               486.96
vw∆ + fpaq0             407.17

Figure 5.5: The average energy cost of compressing and transmitting 512 bytes of sea surface temperature data at 10dBm, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into compression cost and transfer cost at 10dBm. The bar totals (joined energy usage of compression and transmission) are listed below.]

Compression algorithm   Total energy cost of transmitter, in µJ
∆ + PPMs(5)             1,291
∆ + PPMs(1)             1,231.79
∆ + PPMs(3)             1,231.67
∆ + LZW_MC              1,069.42
PPMs(5)                 1,025.66
PPMs(3)                 961.58
PPMs(1)                 961.04
∆ + fpaq0               951.02
None                    927.12
∆ encoding              926.86
LZW_MC                  887.12
fpaq0                   849.35
∆ + LZW                 835.68
∆ + LZ4                 791.31
∆ + Snappy              784.47
LZ4                     739.92
Snappy                  736.42
LZW                     701.27

Figure 5.6: The average energy cost of compressing and transmitting 512 bytes of mixed data at 10dBm, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into reception cost and decompression cost. The bar totals (joined energy usage of reception and decompression) are listed below.]

Compression algorithm   Total energy cost of receiver, in µJ
∆ encoding              730.59
None                    725.03
PPMs(5)                 540.25
fpaq0                   508.27
∆ + LZ4                 479.88
LZ4                     470.48
Snappy                  457.52
∆ + PPMs(5)             410.16
∆ + Snappy              386.4
LZW_MC                  385.42
LZW                     377.35
PPMs(1)                 371.16
PPMs(3)                 370.72
∆ + LZW_MC              336.78
∆ + fpaq0               324.9
∆ + PPMs(3)             285.61
∆ + LZW                 284.98
∆ + PPMs(1)             279.48

Figure 5.7: The average energy cost of receiving and decompressing 512 bytes of compressed humidity data, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into reception cost and decompression cost. The bar totals (joined energy usage of reception and decompression) are listed below.]

Compression algorithm   Total energy cost of receiver, in µJ
PPMs(5)                 1,191.63
PPMs(3)                 1,067.75
PPMs(1)                 1,059.21
LZW                     798.92
vw∆ + PPMs(5)           766.72
None                    734.77
LZW_MC                  728.25
vw∆ + PPMs(3)           723.68
vw∆ + PPMs(1)           722.63
Snappy                  706.22
LZ4                     666.91
fpaq0                   608.71
vw∆ + LZW_MC            460.6
vw∆ + Snappy            413.59
vw∆ + LZ4               410.7
vw∆ encoding            408.81
vw∆ + LZW               378.53
vw∆ + fpaq0             351.87

Figure 5.8: The average energy cost of receiving and decompressing 512 bytes of compressed temperature data, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into reception cost and decompression cost. The bar totals (joined energy usage of reception and decompression) are listed below.]

Compression algorithm   Total energy cost of receiver, in µJ
∆ + PPMs(5)             1,231.37
∆ + PPMs(3)             1,169.84
∆ + PPMs(1)             1,169.28
PPMs(5)                 978.94
PPMs(3)                 911.01
PPMs(1)                 907.61
∆ + LZW_MC              822.65
∆ + fpaq0               789.52
∆ encoding              727.25
None                    724.48
fpaq0                   706.42
LZW_MC                  677.02
∆ + LZW                 652.91
∆ + Snappy              612.62
∆ + LZ4                 596.76
Snappy                  569.84
LZ4                     550.73
LZW                     543.12

Figure 5.9: The average energy cost of receiving and decompressing 512 bytes of compressed mixed data, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into compression cost, transfer cost at 10dBm, reception cost and decompression cost. The bar totals are listed below.]

Compression algorithm   Total energy cost of transmitter and receiver, in µJ
∆ encoding              1,664.01
None                    1,653.08
PPMs(5)                 1,119.84
∆ + LZ4                 1,116.98
LZ4                     1,105.78
fpaq0                   1,096.92
Snappy                  1,040.57
LZW_MC                  886.31
∆ + Snappy              868.26
LZW                     865.45
∆ + PPMs(5)             852.34
∆ + LZW_MC              789.28
PPMs(1)                 773.48
PPMs(3)                 772.91
∆ + LZW                 667.37
∆ + fpaq0               659.03
∆ + PPMs(3)             588.04
∆ + PPMs(1)             581.73

Figure 5.10: The average energy cost of one full hop in a sensor network, including data compression, transmission at 10dBm, reception and decompression of 512 bytes of humidity data, for various compression algorithms, with and without delta encoding.
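The full hop totals in figure 5.10 correspond to the transmitter totals in figure 5.4 plus the receiver totals in figure 5.7. The sketch below checks this for a few entries, with values copied from the three figures. The tolerance of 0.02µJ is an assumption that allows for rounding in the printed values.

```python
# Totals in µJ for 512 bytes of humidity data at 10dBm, from figures 5.4, 5.7 and 5.10.
transmitter_uj = {"None": 928.05, "LZ4": 635.29, "fpaq0": 588.65, "LZW_MC": 500.89}
receiver_uj = {"None": 725.03, "LZ4": 470.48, "fpaq0": 508.27, "LZW_MC": 385.42}
reported_hop_uj = {"None": 1653.08, "LZ4": 1105.78, "fpaq0": 1096.92, "LZW_MC": 886.31}


def hop_total_uj(name: str) -> float:
    """Cost of one hop: compression and transmission plus reception and decompression."""
    return transmitter_uj[name] + receiver_uj[name]


for name, reported in reported_hop_uj.items():
    difference = abs(hop_total_uj(name) - reported)
    print(name, round(hop_total_uj(name), 2), reported, difference < 0.02)
```

The same decomposition applies to the other datasets, so a ranking that differs between the transmitter and receiver figures shows up as an intermediate position in the full hop figure.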

[Stacked bar chart; each bar is split into compression cost, transfer cost at 10dBm, reception cost and decompression cost. The bar totals are listed below.]

Compression algorithm   Total energy cost of transmitter and receiver, in µJ
PPMs(5)                 2,422.9
PPMs(3)                 2,181.17
PPMs(1)                 2,171.32
LZW_MC                  1,843.29
LZW                     1,829.61
None                    1,663.38
Snappy                  1,575.68
LZ4                     1,561.4
vw∆ + PPMs(5)           1,540.62
vw∆ + PPMs(3)           1,457.81
vw∆ + PPMs(1)           1,456.99
fpaq0                   1,327.68
vw∆ + LZW_MC            1,055.46
vw∆ + LZ4               967.7
vw∆ + Snappy            938.57
vw∆ encoding            929.69
vw∆ + LZW               865.49
vw∆ + fpaq0             759.04

Figure 5.11: The average energy cost of one full hop in a sensor network, including data compression, transmission at 10dBm, reception and decompression of 512 bytes of sea surface temperature data, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into compression cost, transfer cost at 10dBm, reception cost and decompression cost. The bar totals are listed below.]

Compression algorithm   Total energy cost of transmitter and receiver, in µJ
∆ + PPMs(5)             2,522.37
∆ + PPMs(3)             2,401.51
∆ + PPMs(1)             2,401.07
PPMs(5)                 2,004.61
∆ + LZW_MC              1,892.07
PPMs(3)                 1,872.59
PPMs(1)                 1,868.65
∆ + fpaq0               1,740.55
∆ encoding              1,654.11
None                    1,651.59
LZW_MC                  1,564.14
fpaq0                   1,555.76
∆ + LZW                 1,488.59
∆ + Snappy              1,397.09
∆ + LZ4                 1,388.07
Snappy                  1,306.26
LZ4                     1,290.64
LZW                     1,244.39

Figure 5.12: The average energy cost of one full hop in a sensor network, including data compression, transmission at 10dBm, reception and decompression of 512 bytes of mixed data, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into compression cost and transfer cost at 10dBm. The bar totals (joined energy usage of compression and transmission) are listed below.]

Compression algorithm   Total energy cost of transmitter, in µJ
PPMs(5)                 126.25
∆ + LZ4                 124.19
LZ4                     123.94
∆ + PPMs(5)             113.73
PPMs(1)                 112.36
PPMs(3)                 112.2
∆ encoding              106.99
None                    106.31
∆ + PPMs(3)             98.01
∆ + PPMs(1)             97.88
Snappy                  92.2
∆ + Snappy              91.58
LZW                     82.42
LZW_MC                  81.89
fpaq0                   81.41
∆ + LZW                 78.69
∆ + fpaq0               76.94
∆ + LZW_MC              76.73

Figure 5.13: The average energy cost of compressing and transmitting 64 bytes of humidity data at 10dBm, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into compression cost and transfer cost at 10dBm. The bar totals (joined energy usage of compression and transmission) are listed below.]

Compression algorithm   Total energy cost of transmitter, in µJ
PPMs(5)                 168.33
PPMs(1)                 162.2
PPMs(3)                 161.38
LZ4                     142.25
LZW_MC                  132.48
vw∆ + PPMs(5)           129.13
vw∆ + PPMs(3)           128.27
vw∆ + PPMs(1)           128.22
LZW                     120.04
Snappy                  110.82
None                    106.69
vw∆ + LZ4               105.18
fpaq0                   101.11
vw∆ + LZW_MC            89.12
vw∆ + LZW               80.17
vw∆ + fpaq0             75.06
vw∆ + Snappy            72.38
vw∆ encoding            68.02

Figure 5.14: The average energy cost of compressing and transmitting 64 bytes of sea surface temperature data at 10dBm, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into compression cost and transfer cost at 10dBm. The bar totals (joined energy usage of compression and transmission) are listed below.]

Compression algorithm   Total energy cost of transmitter, in µJ
∆ + PPMs(5)             194.44
∆ + PPMs(1)             193.61
∆ + PPMs(3)             193.61
PPMs(5)                 176.05
PPMs(1)                 173.04
PPMs(3)                 172.99
∆ + LZW_MC              155.75
LZW_MC                  148.25
∆ + LZ4                 143.04
LZ4                     140.95
∆ + fpaq0               131.79
fpaq0                   123.88
∆ + LZW                 120.99
LZW                     116.54
∆ + Snappy              112.27
Snappy                  108.99
∆ encoding              106.76
None                    106.6

Figure 5.15: The average energy cost of compressing and transmitting 64 bytes of mixed data at 10dBm, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into reception cost and decompression cost. The bar totals (joined energy usage of reception and decompression) are listed below.]

Compression algorithm   Total energy cost of receiver, in µJ
PPMs(5)                 153.97
PPMs(1)                 142.16
∆ + PPMs(5)             142.01
PPMs(3)                 140.5
∆ + PPMs(3)             129.01
∆ + PPMs(1)             125.59
fpaq0                   110.42
∆ + fpaq0               105.15
LZW_MC                  101.13
∆ encoding              98.46
None                    97.45
∆ + LZW_MC              96.08
LZW                     93.31
∆ + LZW                 92.67
∆ + LZ4                 91.35
Snappy                  91.13
∆ + Snappy              90.21
LZ4                     90.1

Figure 5.16: The average energy cost of receiving and decompressing 64 bytes of compressed humidity data, for various compression algorithms, with and without delta encoding.

[Stacked bar chart; each bar is split into reception cost and decompression cost. The bar totals (joined energy usage of reception and decompression) are listed below.]

Compression algorithm   Total energy cost of receiver, in µJ
PPMs(5)                 184.78
PPMs(1)                 177.6
PPMs(3)                 177.05
vw∆ + PPMs(1)           154.82
vw∆ + PPMs(5)           154.1
vw∆ + PPMs(3)           153.87
LZW_MC                  116.34
fpaq0                   113.74
LZW                     103.34
vw∆ + LZW_MC            102.23
vw∆ + fpaq0             100.48
LZ4                     98.56
Snappy                  97.84
None                    97.01
vw∆ + LZW               94.15
vw∆ + LZ4               86.98
vw∆ + Snappy            85.45
vw∆ encoding            85.71

Figure 5.17: The average energy cost of receiving and decompressing 64 bytes of compressed temperature data, for various compression algorithms, with and without delta encoding.


[Bar chart: total energy cost of receiver, in µJ (combined energy usage of reception and decompression), per compression algorithm. Each bar is split into reception cost and decompression cost. Totals, from highest to lowest: ∆ + PPMs(5) 189.5; ∆ + PPMs(1) 188.51; ∆ + PPMs(3) 186.71; PPMs(3) 178.89; PPMs(5) 177.86; PPMs(1) 174.56; ∆ + LZW_MC 126.28; fpaq0 124; ∆ + fpaq0 123.38; LZW_MC 122.66; ∆ + LZW 106.25; LZW 103.56; Snappy 98.71; ∆ + Snappy 98.22; LZ4 98.05; ∆ encoding 97.74; None 96.72; ∆ + LZ4 95.22.]

Figure 5.18: The average energy cost of receiving and decompressing 64 bytes of compressed mixed data, for various compression algorithms, with and without delta encoding.


[Bar chart: total energy cost of transmitter and receiver, in µJ, per compression algorithm. Each bar is split into compression cost, transfer cost at 10dBm, reception cost and decompression cost. Totals, from highest to lowest: PPMs(5) 280.22; ∆ + PPMs(5) 255.74; PPMs(1) 254.51; PPMs(3) 252.7; ∆ + PPMs(3) 227.03; ∆ + PPMs(1) 223.47; ∆ + LZ4 215.54; LZ4 214.04; ∆ encoding 205.45; None 203.76; fpaq0 191.83; Snappy 183.32; LZW_MC 183.02; ∆ + fpaq0 182.09; ∆ + Snappy 181.79; LZW 175.72; ∆ + LZW_MC 172.81; ∆ + LZW 171.36.]

Figure 5.19: The average energy cost of one full hop in a sensor network, including data compression, transmission at 10dBm, reception and decompression of 64 bytes of humidity data, for various compression algorithms, with and without delta encoding.


[Bar chart: total energy cost of transmitter and receiver, in µJ, per compression algorithm. Each bar is split into compression cost, transfer cost at 10dBm, reception cost and decompression cost. Totals, from highest to lowest: PPMs(5) 353.11; PPMs(1) 339.8; PPMs(3) 338.43; vw∆ + PPMs(5) 283.23; vw∆ + PPMs(1) 283.04; vw∆ + PPMs(3) 282.13; LZW_MC 248.82; LZ4 240.81; LZW 223.38; fpaq0 214.85; Snappy 208.66; None 203.7; vw∆ + LZ4 192.16; vw∆ + LZW_MC 191.35; vw∆ + fpaq0 175.54; vw∆ + LZW 174.32; vw∆ + Snappy 157.83; vw∆ encoding 153.73.]

Figure 5.20: The average energy cost of one full hop in a sensor network, including data compression, transmission at 10dBm, reception and decompression of 64 bytes of sea surface temperature data, for various compression algorithms, with and without delta encoding.


[Bar chart: total energy cost of transmitter and receiver, in µJ, per compression algorithm. Each bar is split into compression cost, transfer cost at 10dBm, reception cost and decompression cost. Totals, from highest to lowest: ∆ + PPMs(5) 383.94; ∆ + PPMs(1) 382.12; ∆ + PPMs(3) 380.32; PPMs(5) 353.91; PPMs(3) 351.88; PPMs(1) 347.6; ∆ + LZW_MC 282.03; LZW_MC 270.9; ∆ + fpaq0 255.17; fpaq0 247.88; LZ4 239; ∆ + LZ4 238.25; ∆ + LZW 227.24; LZW 220.09; ∆ + Snappy 210.49; Snappy 207.7; ∆ encoding 204.5; None 203.32.]

Figure 5.21: The average energy cost of one full hop in a sensor network, including data compression, transmission at 10dBm, reception and decompression of 64 bytes of mixed data, for various compression algorithms, with and without delta encoding.


5.2.2 Conclusion

Figures 5.4 through 5.21 show a variety of energy consumption measurements in a simple wireless sensor network. In each figure, a vertical line is drawn to show the energy usage without compression or encoding. This makes it easy to see which of the compression algorithms consumed less energy than sending the data uncompressed, for each use case.

From our results, we can see that there was almost always a compression algorithm that reduced the energy usage of the system for the payloads that we tested, even when using a transmit power of only 10dBm. We also see that compression can have a large effect on the energy usage. For example, figure 5.10 shows the effects of compression on our humidity data with 512 byte payloads. Here, using PPMs(1) with delta encoding reduced the energy usage of the transmission to about one third of the uncompressed cost.

We see that no single compression algorithm performed best for all use cases, and that the best performing algorithm depended heavily on the input data. In some cases, the distribution of energy in the wireless sensor network also had an effect on which algorithm performed best. For example, for transferring 64 bytes of humidity data, LZW with delta encoding used the lowest amount of energy overall, as shown in figure 5.19. However, if only the transmitter is energy critical, and the receiver can afford to use as much energy as it wants, then it might be better to pick the algorithm with the lowest energy cost on the transmitter side only, as displayed in figure 5.13. LZW_MC with delta encoding would then likely be the best among our compression algorithms for this use case.

The only case where none of our compression algorithms had a positive effect was the 64 byte mixed data. This is likely because it consists of only two measurements, and thus doesn't have many repeating values and patterns. There are still ways in which we could potentially reduce the energy consumption for this case as well. In chapter 6, Future work, we look at a few potential methods to improve our results.

As expected from the results of the first experiment, the different orders of PPMs produced relatively close compression ratios, which means that their total energy cost depended primarily on their compression and decompression costs. Unsurprisingly, PPMs(1) tended to outperform the other two in terms of compression and decompression cost, and thus tended to be more energy efficient.

Another interesting observation was that Snappy remained a safe compression algorithm to use, regardless of the dataset and payload size. In every case, Snappy performed either better than no compression, or in the worst case very slightly worse. It never consumed noticeably more energy than the no compression alternative.

As for the transmission power ratio's effect on energy consumption, figure 5.22 shows how increasing the power ratio affected the results from figure 5.19. The cost of transmission rose by about 100%, which means that much more weight was placed on the compression ratio of the algorithm than on its power consumption. In this case, LZW_MC and fpaq0 surpassed LZW as the more energy efficient compression algorithms.

[Bar chart: total energy cost of transmitter and receiver, in µJ, per compression algorithm. Each bar is split into compression cost, transfer cost at 19.5dBm, reception cost and decompression cost. Totals, from highest to lowest: PPMs(5) 344.23; PPMs(1) 317.29; PPMs(3) 315.47; ∆ + PPMs(5) 312.83; None 312.43; ∆ encoding 312.3; ∆ + LZ4 303.02; LZ4 302.92; ∆ + PPMs(3) 282.41; ∆ + PPMs(1) 278.88; Snappy 274.93; ∆ + Snappy 270.73; fpaq0 251.5; LZW 251.01; LZW_MC 245.64; ∆ + LZW 240.11; ∆ + fpaq0 233.61; ∆ + LZW_MC 230.07.]

Figure 5.22: The average energy cost of one full hop in a sensor network, including data compression, transmission at 19.5dBm, reception and decompression of 64 bytes of humidity data, for various compression algorithms, with and without delta encoding.
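The way the best algorithm shifts with the deployment scenario can be illustrated with a short sketch. The numbers below are copied from the 64 byte humidity measurements in the appendix (Figure 7.7) for a subset of the algorithms. The function names and the dictionary layout are our own illustrative choices and are not part of the experiment code.

```python
# Energy per 64-byte humidity payload, in µJ, copied from Figure 7.7.
# Tuple layout (an assumption of this sketch):
# (compression, 10dBm transmission, 19.5dBm transmission, reception, decompression)
MEASUREMENTS = {
    "None":           (0.00, 106.31, 214.98, 97.45, 0.00),
    "Snappy":         (2.47, 89.73, 181.33, 90.75, 0.38),
    "fpaq0":          (22.93, 58.48, 118.15, 85.58, 24.83),
    "LZW":            (8.30, 74.11, 149.40, 88.03, 5.28),
    "LZW_MC":         (18.89, 63.01, 125.62, 87.78, 13.35),
    "delta + fpaq0":  (23.71, 53.23, 104.75, 79.98, 25.17),
    "delta + LZW":    (9.18, 69.51, 138.26, 87.29, 5.38),
    "delta + LZW_MC": (17.92, 58.81, 116.07, 83.53, 12.56),
}


def transmitter_cost(row, high_power=False):
    """Compression plus transmission energy at 10dBm (or 19.5dBm)."""
    compression, tx_10dbm, tx_19_5dbm, _reception, _decompression = row
    return compression + (tx_19_5dbm if high_power else tx_10dbm)


def receiver_cost(row):
    """Reception plus decompression energy."""
    return row[3] + row[4]


def hop_cost(row, high_power=False):
    """Energy of one full hop: transmitter side plus receiver side."""
    return transmitter_cost(row, high_power) + receiver_cost(row)


def cheapest(cost):
    """Name of the algorithm that minimizes the given cost function."""
    return min(MEASUREMENTS, key=lambda name: cost(MEASUREMENTS[name]))


print("full hop, 10dBm:      ", cheapest(hop_cost))
print("transmitter only:     ", cheapest(transmitter_cost))
print("full hop, 19.5dBm:    ", cheapest(lambda row: hop_cost(row, high_power=True)))
```

With these values, the full hop at 10dBm favours LZW with delta encoding, while both the transmitter-only view and the full hop at 19.5dBm favour LZW_MC with delta encoding, in line with the discussion above.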


5.3 Summary

In this thesis, we analyzed the effects of compressing sensor data before sending it over the network, by performing experiments and measurements in a simple wireless sensor network. We also presented various methods to reduce energy consumption in wireless sensor networks, as well as comparisons between various datasets and compression algorithms. From our observations we conclude that compression can have a very positive effect on the energy consumption of a wireless sensor network. The effectiveness of compression varies with the use case and the sensor data being worked with, as well as with which compression algorithm is used. Through this thesis, we have shown that it can be valuable to find a compression algorithm that is appropriate for each given case. The energy savings can make up for the time and effort this may take.

Chapter 6. Future work

In this thesis we looked at the properties of wireless sensor networks, and showed through our experiments how compression can have a positive effect on the energy usage in wireless sensor networks. We also conducted energy measurements in a simple wireless sensor network, and analyzed the results from the experiments, giving an insight into the effects compression can have on the energy usage in wireless sensor networks. We hope that this thesis can be of use for companies and individuals working with wireless sensor networks, as well as being a basis and a stepping stone for future work in the field of wireless sensor networks. In this chapter we will look at a few potential avenues for further research within this field.

6.1 Compression algorithms

In this thesis we looked at a variety of compression algorithms when performing our measurements. We also tested the compression algorithms in combination with our delta encoding algorithms. For future studies, it may be of interest to look into further compression algorithms beyond the ones tested in this thesis. Some compression algorithms that may be interesting to experiment on include other Lempel–Ziv based algorithms, such as the Lempel–Ziv–Markov chain algorithm (LZMA) and Lempel–Ziv–Oberhumer (LZO). Another interesting avenue is compression algorithms that focus on using "Single Instruction, Multiple Data" (SIMD) based operations. Additionally, new data compression algorithms are being developed over time, as well as newer and better implementations of already existing algorithms.


6.2 Retained compression information

When measuring the compression efficiency of small packets, we compressed each packet individually, and measured the compression ratio and energy usage resulting from that. One interesting alternative is to preserve data between multiple compressions. In other words, we can keep compression information from one compression execution and use it in the next. For example, when using LZW, we build up a dictionary that we use for our compressed output. Instead of discarding the dictionary after each compression, we can keep it in memory, continue using it the next time we perform a compression, and continue building on it. This could give us the benefit of compressing large chunks of data, without having to save up the data on the sensor node and compress it all at once; instead, the data can be efficiently compressed and sent as it is measured.
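The idea of retaining the LZW dictionary between packets can be sketched as follows. This is a minimal illustration and not the LZW implementation measured in this thesis. The class names are hypothetical, codes are kept as plain integers rather than packed bits, the dictionary is allowed to grow without bound, and we assume that no packet is lost (a lost packet would leave the two dictionaries out of sync).

```python
class StatefulLZWEncoder:
    """LZW encoder that keeps its dictionary between calls to encode()."""

    def __init__(self):
        # Start with one entry per possible byte value.
        self.table = {bytes([i]): i for i in range(256)}

    def encode(self, packet: bytes) -> list:
        codes = []
        w = b""
        for value in packet:
            wc = w + bytes([value])
            if wc in self.table:
                w = wc
            else:
                codes.append(self.table[w])
                self.table[wc] = len(self.table)  # next free code
                w = bytes([value])
        if w:
            codes.append(self.table[w])  # flush at the packet boundary
        return codes


class StatefulLZWDecoder:
    """Mirror of the encoder; its dictionary also survives between packets."""

    def __init__(self):
        self.table = {i: bytes([i]) for i in range(256)}

    def decode(self, codes) -> bytes:
        out = bytearray()
        prev = None
        for code in codes:
            if code in self.table:
                entry = self.table[code]
            else:
                # The code refers to the entry the encoder created in its
                # previous step, which the decoder has not added yet.
                entry = prev + prev[:1]
            if prev is not None:
                self.table[len(self.table)] = prev + entry[:1]
            out += entry
            prev = entry
        return bytes(out)


# Three identical packets of hypothetical sensor readings.
packets = [bytes([20, 21] * 4)] * 3
encoder, decoder = StatefulLZWEncoder(), StatefulLZWDecoder()
for packet in packets:
    retained = encoder.encode(packet)
    fresh = StatefulLZWEncoder().encode(packet)  # dictionary discarded each time
    assert decoder.decode(retained) == packet
    print(len(fresh), "codes with a fresh dictionary,", len(retained), "with a retained one")
```

In this small example every packet costs 5 codes when the dictionary is discarded, while the retained dictionary needs 5, 2 and 3 codes for the three packets. On a real sensor node the dictionary size would have to be capped to fit in memory.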

6.3 Struct delta encoding

In this thesis, we tested delta encoding algorithms on all our datasets, and also tested them in combination with compression algorithms. For our mixed data dataset we ran a regular byte-by-byte delta encoding algorithm. One alternative delta encoding method that could improve the compression ratio of the mixed data when used in conjunction with compression algorithms is struct based delta encoding, where we store the delta between consecutive structs, instead of consecutive bytes. In other words, for each block of mixed data, we perform delta encoding for each variable in the mixed data block against the corresponding variable in the previous mixed data block. This can be especially useful when used together with the retained compression information strategy described in section 6.2.
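The difference between the two delta encoding methods can be sketched as follows. The record layout (three integer fields per measurement) and all function names are hypothetical and only serve as an illustration. The byte-by-byte variant assumes that the first byte is encoded against zero, which may differ from the implementation used in our experiments.

```python
from typing import List, Sequence, Tuple

Record = Tuple[int, ...]


def byte_delta_encode(data: bytes) -> bytes:
    """Byte-by-byte delta: each byte minus the previous byte, modulo 256."""
    out = bytearray()
    previous = 0
    for value in data:
        out.append((value - previous) & 0xFF)
        previous = value
    return bytes(out)


def struct_delta_encode(records: Sequence[Record]) -> List[Record]:
    """Struct delta: each field minus the same field of the previous record."""
    encoded = []
    previous = None
    for record in records:
        if previous is None:
            encoded.append(tuple(record))  # first record is stored as-is
        else:
            encoded.append(tuple(cur - prev for cur, prev in zip(record, previous)))
        previous = record
    return encoded


def struct_delta_decode(encoded: Sequence[Record]) -> List[Record]:
    """Inverse of struct_delta_encode: a running sum per field."""
    records = []
    previous = None
    for delta in encoded:
        if previous is None:
            record = tuple(delta)
        else:
            record = tuple(prev + d for prev, d in zip(previous, delta))
        records.append(record)
        previous = record
    return records


# Hypothetical mixed measurements: (temperature, humidity, pressure).
samples = [(2150, 431, 1013), (2151, 431, 1013), (2153, 430, 1012)]
deltas = struct_delta_encode(samples)
print(deltas)
assert struct_delta_decode(deltas) == samples
```

Because each field is compared with its own previous value, slowly changing measurements turn into small numbers and zeros, which is the kind of input the compression algorithms handle well. A byte-by-byte delta over the same records would instead subtract bytes belonging to unrelated fields.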

80 Bibliography

[1] Luiz André Barroso. “The Price of Performance”. In: Queue 3.7 (Sept. 2005), pp. 48–53. ISSN: 1542-7730. DOI: 10 . 1145 / 1095408 . 1095420. URL: http://doi.acm.org/10.1145/1095408.1095420. [2] Aqeel Mahesri and Vibhore Vardhan. “Power Consumption Breakdown on a Modern Laptop”. In: Proceedings of the 4th International Conference on Power-Aware Computer Systems. PACS’04. Portland, OR: Springer-Verlag, 2005, pp. 165–180. ISBN: 3-540-29790-1, 978-3-540-29790-1. DOI: 10 . 1007 / 11574859 _ 12. URL: http://dx.doi.org/10.1007/11574859_12. [3] Aaron Carroll and Gernot Heiser. “An Analysis of Power Consumption in a Smartphone”. In: Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference. USENIXATC’10. Boston, MA: USENIX Association, 2010, pp. 21–21. URL: http://dl.acm.org/citation.cfm?id=1855840.1855861. [4] Sarah Murry. ARM’s Reach: 50 Billion Chip Milestone. 2014. URL: http://www.broadcom.com/blog/chip-design/arms-reach- 50-billion-chip-milestone-/. [5] F. Yao, A. Demers, and S. Shenker. “A scheduling model for reduced CPU energy”. In: Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on. Oct. 1995, pp. 374–382. DOI: 10.1109/SFCS.1995.492493. [6] G. E. Moore. “Cramming more components onto integrated circuits, Reprinted from Electronics, volume 38, number 8, April 19, 1965, pp.114 ff.” In: IEEE Solid-State Circuits Society Newsletter 11.5 (Sept. 2006), pp. 33–35. ISSN: 1098-4232. DOI: 10.1109/N-SSC.2006.4785860.

81 Bibliography Energy efficiency in wireless sensor Section Bibliography networks, through data compression

[7] Juan M. Cebrián, Lasse Natvig, and Jan Christian Meyer. “Performance and energy impact of parallelization and vectorization techniques in modern microprocessors”. In: Computing 96.12 (2014), pp. 1179–1193. ISSN: 1436-5057. DOI: 10 . 1007 / s00607 - 013 - 0366 - 5. URL: http://dx.doi.org/10.1007/s00607-013-0366-5. [8] G. J. Pottie and W. J. Kaiser. “Wireless Integrated Network Sensors”. In: Commun. ACM 43.5 (May 2000), pp. 51–58. ISSN: 0001-0782. DOI: 10 . 1145 / 332833 . 332838. URL: http://doi.acm.org/10.1145/332833.332838. [9] Christopher M. Sadler and Margaret Martonosi. “Data Compression Algorithms for Energy-constrained Devices in Delay Tolerant Networks”. In: Proceedings of the 4th International Conference on Embedded Networked Sensor Systems. SenSys ’06. Boulder, Colorado, USA: ACM, 2006, pp. 265–278. ISBN: 1-59593-343-3. DOI: 10 . 1145 / 1182807 . 1182834. URL: http://doi.acm.org/10.1145/1182807.1182834. [10] “IEEE Standard Letter Designations for Radar-Frequency Bands”. In: IEEE Std 521-2002 (Revision of IEEE Std 521-1984) (2003), 0_1–3. DOI: 10.1109/IEEESTD.2003.94224. [11] “IEEE Standard for Information technology–Telecommunications and information exchange between systems Local and metropolitan area networks–Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications”. In: IEEE Std 802.11-2012 (Revision of IEEE Std 802.11-2007) (Mar. 2012), pp. 1–2793. DOI: 10.1109/IEEESTD.2012.6178212. [12] “IEEE Standard for Local and metropolitan area networks–Part 15.4: Low-Rate Wireless Personal Area Networks (LR-WPANs)”. In: IEEE Std 802.15.4-2011 (Revision of IEEE Std 802.15.4-2006) (Sept. 2011), pp. 1–314. DOI: 10.1109/IEEESTD.2011.6012487. [13] DS18B20 Programmable Resolution 1-Wire Digital Thermometer. English. Version 19-7487; Rev 4; 1/15. Maxim Integrated Products, Inc. 20 pp. [14] D. A. Huffman. “A Method for the Construction of Minimum-Redundancy Codes”. In: Proceedings of the IRE 40.9 (Sept. 
1952), pp. 1098–1101. ISSN: 0096-8390. DOI: 10.1109/JRPROC.1952.273898. [15] J. J. Rissanen. “Generalized Kraft Inequality and Arithmetic Coding”. In: IBM Journal of Research and Development 20.3 (May 1976), pp. 198–203. ISSN: 0018-8646. DOI: 10.1147/rd.203.0198.

82 Bibliography Energy efficiency in wireless sensor Section Bibliography networks, through data compression

[16] J. Cleary and I. Witten. “Data Compression Using Adaptive Coding and Partial String Matching”. In: IEEE Transactions on Communications 32.4 (Apr. 1984), pp. 396–402. ISSN: 0090-6778. DOI: 10.1109/TCOM.1984.1096090. [17] J. Ziv and A. Lempel. “A universal algorithm for sequential data compression”. In: IEEE Transactions on 23.3 (May 1977), pp. 337–343. ISSN: 0018-9448. DOI: 10.1109/TIT.1977.1055714. [18] T. A. Welch. “A Technique for High-Performance Data Compression”. In: Computer 17.6 (June 1984), pp. 8–19. ISSN: 0018-9162. DOI: 10. 1109/MC.1984.1659158. [19] Google Inc. Snappy. Version 1.1.4. Jan. 5, 2017. URL: http://google.github.io/snappy/. [20] Yann Collet. LZ4. Version 1.7.6. Apr. 9, 2017. URL: https://github.com/lz4/lz4/. [21] Yann Collet. LZ4 Homepage. 2017. URL: http://lz4.github.io/ lz4/ (visited on 06/30/2017). [22] Srod Karim. LZW. Version 1.1.2. July 31, 2017. URL: https://github.com/sind/lzw/. [23] D. Shkarin. PPMs. Version var.J. Feb. 21, 2006. URL: http://compression.ru/ds/. [24] Matt Mahoney. FPAQ0. Sept. 3, 2004. URL: http://mattmahoney. net/dc/. [25] . Sambrotto and Lamont-Doherty Earth Observatory (LDEO) Columbia University. HLY-07-01 SCS One Minute Data, Version 1.0. 2009. DOI: 10.5065/D6NG4NNJ. [26] Srod Karim. Compression Experiments. Version 1.0.0. Aug. 1, 2017. URL: https://github.com/Sind/compression-experiments. [27] Richard M. Stallman et al. Using the GNU Compiler Collection (GCC). English. Using the GNU Compiler Collection. Boston, 2017. [28] UG147: Flex Gecko 2.4 GHz, 20 dBm Range Test Demo User’s Guide. English. Version Rev. 0.3. Silicon Labs. 13 pp. [29] EFR32xG1 Wireless Gecko Reference Manual. English. Version Preliminary Rev. 0.6. Silicon Labs. 993 pp.

Chapter 7. Appendix

7.1 Results

In chapter 5 we analyzed the results from the experiments we outlined in 4.3, and created graphs to illustrate them. As it can be hard to read the exact measurement values from the graphs, we show here all of our measurements in tables, for ease of readability.

7.1.1 Experiment 1

Here we show the results from our first experiment, "Compression Efficiency", as outlined in 4.3.1. To make the results easier to read, we have colored compression ratios greater than one in green, and compression ratios less than one in red. The red fields represent undesirable compression ratios, while the green ones represent desirable compression ratios. Fields that are white indicate a compression ratio of one.
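As a reminder of how the ratios in the tables are read, the sketch below computes a compression ratio as uncompressed size divided by compressed size, which appears to match the values in the tables, and maps it to the color scheme described above. Python's zlib is used purely as a stand-in compressor; it is not one of the algorithms tested in this thesis, and the sample data is hypothetical.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Uncompressed size divided by compressed size; above 1.0 means the data shrank."""
    return len(original) / len(compressed)


def classify(ratio: float) -> str:
    """Color used for a ratio in the tables of this appendix."""
    if ratio > 1.0:
        return "green"
    if ratio < 1.0:
        return "red"
    return "white"


# Hypothetical, highly repetitive sensor readings (512 bytes).
readings = bytes([20, 21, 20, 22] * 128)

for size in (2, 8, 64, 512):
    chunk = readings[:size]
    ratio = compression_ratio(chunk, zlib.compress(chunk))
    print(size, "bytes:", round(ratio, 2), classify(ratio))
```

Very small inputs tend to end up red because the fixed overhead of the compressed format outweighs any savings, which is the same pattern seen in the leftmost columns of the tables.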


Algorithm / input size in bytes: 2 4 8 16 32 64 128 256 512

None 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

Snappy 0.50 0.67 0.80 0.89 1.09 1.26 1.39 1.54 1.67

fpaq0 0.86 1.12 1.50 1.92 2.27 2.46 2.49 2.35 2.16

PPMs(5) 0.33 0.53 0.83 1.23 1.70 2.16 2.61 3.05 3.41

PPMs(3) 0.33 0.53 0.82 1.22 1.68 2.16 2.63 3.08 3.45

PPMs(1) 0.32 0.54 0.84 1.25 1.73 2.24 2.78 3.26 3.66

LZW 0.50 0.74 1.02 1.21 1.44 1.67 1.88 2.12 2.31

LZW_MC 0.60 0.82 1.14 1.47 1.82 2.17 2.42 2.64 2.74

LZ4 0.67 0.80 0.89 0.98 1.15 1.28 1.38 1.46 1.52

∆ encoding 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

∆ + Snappy 0.50 0.67 0.80 0.89 1.08 1.28 1.49 1.76 1.96

∆ + fpaq0 0.67 0.90 1.22 1.66 2.24 2.98 3.74 4.42 4.93

∆ + PPMs(5) 0.33 0.50 0.79 1.21 1.79 2.48 3.20 3.85 4.34

∆ + PPMs(3) 0.33 0.50 0.78 1.20 1.79 2.52 3.24 3.97 4.52

∆ + PPMs(1) 0.33 0.52 0.79 1.22 1.83 2.60 3.43 4.21 4.79

∆ + LZW 0.50 0.70 0.95 1.17 1.49 1.85 2.25 2.69 3.11

∆ + LZW_MC 0.50 0.70 1.00 1.39 1.90 2.46 2.94 3.25 3.50

∆ + LZ4 0.67 0.80 0.89 0.94 1.13 1.30 1.41 1.53 1.59

Figure 7.1: Compression ratios for compression of humidity data, given different algorithms and input sizes.


Algorithm / input size in bytes: 2 4 8 16 32 64 128 256 512

None 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

Snappy 0.50 0.67 0.80 0.89 0.95 0.98 1.00 1.00 1.02

fpaq0 0.67 0.90 1.10 1.30 1.45 1.55 1.58 1.54 1.53

PPMs(5) 0.33 0.50 0.71 0.95 1.16 1.34 1.46 1.54 1.63

PPMs(3) 0.33 0.50 0.71 0.95 1.16 1.34 1.47 1.55 1.63

PPMs(1) 0.33 0.50 0.72 0.95 1.17 1.35 1.48 1.56 1.63

LZW 0.50 0.68 0.83 0.90 0.96 1.02 1.09 1.14 1.16

LZW_MC 0.50 0.68 0.84 0.97 1.08 1.16 1.23 1.28 1.31

LZ4 0.67 0.80 0.89 0.89 0.95 0.98 1.01 1.02 1.04

vw∆ encoding 1.00 1.33 1.59 1.76 1.86 1.91 1.94 1.95 1.96

vw∆ + Snappy 0.50 0.80 1.14 1.44 1.67 1.80 1.85 1.89 1.92

vw∆ + fpaq0 0.67 1.03 1.41 1.80 2.16 2.45 2.65 2.77 2.87

vw∆ + PPMs(5) 0.33 0.56 0.87 1.22 1.57 1.89 2.16 2.38 2.57

vw∆ + PPMs(3) 0.33 0.56 0.87 1.22 1.57 1.89 2.16 2.38 2.56

vw∆ + PPMs(1) 0.33 0.56 0.87 1.22 1.57 1.88 2.16 2.38 2.57

vw∆ + LZW 0.50 0.80 1.14 1.35 1.58 1.72 1.85 1.96 2.12

vw∆ + LZW_MC 0.50 0.80 1.03 1.34 1.62 1.84 2.01 2.12 2.22

vw∆ + LZ4 0.67 1.00 1.32 1.58 1.67 1.80 1.88 1.92 1.94

Figure 7.2: Compression ratios for compression of sea surface temperature data, given different algorithms and input sizes.


Algorithm / input size in bytes: 32 64 128 256 512

None 1.00 1.00 1.00 1.00 1.00

Snappy 0.94 1.00 1.11 1.19 1.27

fpaq0 0.99 1.08 1.15 1.19 1.22

PPMs(5) 0.88 1.12 1.38 1.60 1.75

PPMs(3) 0.88 1.11 1.38 1.60 1.76

PPMs(1) 0.88 1.11 1.36 1.56 1.71

LZW 0.87 1.01 1.15 1.29 1.41

LZW_MC 0.81 0.94 1.07 1.19 1.33

LZ4 0.94 1.01 1.12 1.22 1.28

∆ encoding 1.00 1.00 1.00 1.00 1.00

∆ + Snappy 0.94 0.96 1.04 1.12 1.19

∆ + fpaq0 0.92 0.98 1.03 1.07 1.09

∆ + PPMs(5) 0.87 1.01 1.16 1.29 1.36

∆ + PPMs(3) 0.87 1.01 1.17 1.29 1.37

∆ + PPMs(1) 0.87 1.01 1.16 1.27 1.34

∆ + LZW 0.87 0.95 1.05 1.13 1.18

∆ + LZW_MC 0.79 0.88 0.96 1.04 1.12

∆ + LZ4 0.94 0.99 1.07 1.15 1.20

Figure 7.3: Compression ratios for compression of mixed data, given different algorithms and input sizes.


7.1.2 Experiment 2

Here we show the results from our second experiment, "Energy Costs of compression and transmission", as outlined in 4.3.2. The tables show energy measurements for compressing the data, transmitting the data with a transmit power of 10dBm, transmitting the data with a transmit power of 19.5dBm, receiving the data, and decompressing the data. All energy measurements are in microjoules (µJ).
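The columns of these tables combine into the per-hop totals plotted in chapter 5. As a sketch, in notation of our own choosing that is not used elsewhere in this thesis, let $E_c$, $E_{tx}$, $E_{rx}$ and $E_d$ be the compression, transmission, reception and decompression energies for a given algorithm, and let $E_{tx}^{0}$ and $E_{rx}^{0}$ be the transmission and reception energies without compression. Compression then saves energy over one hop when the first inequality below holds. The worked line uses the 64 byte humidity measurements at 10dBm for LZW with delta encoding (Figure 7.7).

```latex
E_c + E_{tx} + E_{rx} + E_d \;<\; E_{tx}^{0} + E_{rx}^{0}

\underbrace{9.18 + 69.51 + 87.29 + 5.38}_{171.36\ \mu\mathrm{J}}
\;<\;
\underbrace{106.31 + 97.45}_{203.76\ \mu\mathrm{J}}
```

Both totals agree with the corresponding bars in figure 5.19.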

Columns, in order: compression algorithm, compression (µJ), 10dBm transmission (µJ), 19.5dBm transmission (µJ), reception (µJ), decompression (µJ).

None 0.00 928.05 1,866.36 725.03 0.00

Snappy 12.35 570.70 1,141.77 455.71 1.81

fpaq0 123.08 465.57 933.84 371.54 136.73

PPMs(5) 269.29 310.31 622.64 253.28 286.97

PPMs(3) 142.50 259.68 519.42 212.16 158.57

PPMs(1) 142.53 259.79 519.12 212.44 158.72

LZW 48.74 439.36 881.95 352.56 24.80

LZW_MC 139.08 361.81 726.31 290.32 95.10

LZ4 39.79 595.50 1,193.63 468.02 2.46

∆ encoding 4.66 928.77 1,867.17 729.16 1.43

∆ + Snappy 16.77 465.10 934.12 381.39 5.01

∆ + fpaq0 126.89 207.24 415.53 186.01 138.89

∆ + PPMs(5) 181.54 260.65 519.30 221.16 189.00

∆ + PPMs(3) 93.83 208.59 415.64 185.98 99.63

∆ + PPMs(1) 93.76 208.50 415.64 179.90 99.57

∆ + LZW 70.92 311.47 623.23 262.87 22.11

∆ + LZW_MC 140.34 312.15 623.44 261.91 74.87

∆ + LZ4 44.04 593.06 1,192.92 473.85 6.02

Figure 7.4: Energy usage for compression and transmission of humidity data, given different algorithms, when compressing 512 bytes at a time.


Columns, in order: compression algorithm, compression (µJ), 10dBm transmission (µJ), 19.5dBm transmission (µJ), reception (µJ), decompression (µJ).

None 0.00 928.61 1,867.23 734.77 0.00

Snappy 8.34 861.12 1,815.15 706.13 0.09

fpaq0 123.85 595.13 1,193.79 470.74 137.96

PPMs(5) 663.05 568.22 1,141.87 449.44 742.19

PPMs(3) 568.36 545.06 1,090.19 439.60 628.15

PPMs(1) 567.44 544.67 1,089.55 432.15 627.07

LZW 230.30 800.39 1,607.94 628.01 170.90

LZW_MC 392.37 722.67 1,452.37 568.84 159.41

LZ4 40.75 853.74 1,712.01 666.64 0.27

vw∆ encoding 2.34 518.55 1,037.96 408.53 0.27

vw∆ + Snappy 6.91 518.08 1,037.80 412.31 1.28

vw∆ + fpaq0 69.92 337.25 675.19 276.04 75.83

vw∆ + PPMs(5) 386.44 387.46 778.25 313.02 453.70

vw∆ + PPMs(3) 346.69 387.44 778.21 312.29 411.39

vw∆ + PPMs(1) 346.74 387.62 777.95 310.97 411.66

vw∆ + LZW 48.59 438.37 881.97 351.53 27.00

vw∆ + LZW_MC 156.46 438.41 882.47 353.90 106.70

vw∆ + LZ4 40.13 516.87 1,038.26 409.70 0.99

Figure 7.5: Energy usage for compression and transmission of sea surface temperature data, given different algorithms, when compressing 512 bytes at a time.


Columns, in order: compression algorithm, compression (µJ), 10dBm transmission (µJ), 19.5dBm transmission (µJ), reception (µJ), decompression (µJ).

None 0.00 927.12 1,867.52 724.48 0.00

Snappy 11.60 724.82 1,452.59 569.36 0.48

fpaq0 125.06 724.29 1,453.03 567.13 139.29

PPMs(5) 483.27 542.39 1,089.51 429.80 549.14

PPMs(3) 416.94 544.64 1,088.65 432.02 478.99

PPMs(1) 416.67 544.37 1,089.79 428.54 479.07

LZW 55.87 645.40 1,297.45 506.69 36.43

LZW_MC 215.96 671.16 1,348.73 527.19 149.83

LZ4 40.82 699.09 1,401.01 550.30 0.42

∆ encoding 2.43 924.43 1,884.22 726.58 0.67

∆ + Snappy 14.03 770.43 1,569.98 609.92 2.70

∆ + fpaq0 127.37 823.65 1,661.68 648.36 141.17

∆ + PPMs(5) 618.97 672.03 1,349.40 531.53 699.84

∆ + PPMs(3) 559.36 672.31 1,349.02 532.32 637.52

∆ + PPMs(1) 559.45 672.34 1,349.21 531.63 637.65

∆ + LZW 60.19 775.48 1,556.64 609.44 43.47

∆ + LZW_MC 242.80 826.62 1,661.39 653.77 168.88

∆ + LZ4 43.64 747.67 1,505.22 594.38 2.38

Figure 7.6: Energy usage for compression and transmission of mixed data, given different algorithms, when compressing 512 bytes at a time.


Columns, in order: compression algorithm, compression (µJ), 10dBm transmission (µJ), 19.5dBm transmission (µJ), reception (µJ), decompression (µJ).

None 0.00 106.31 214.98 97.45 0.00

Snappy 2.47 89.73 181.33 90.75 0.38

fpaq0 22.93 58.48 118.15 85.58 24.83

PPMs(5) 63.05 63.20 127.22 85.82 68.14

PPMs(3) 50.52 61.68 124.45 84.92 55.58

PPMs(1) 50.60 61.75 124.53 86.52 55.64

LZW 8.30 74.11 149.40 88.03 5.28

LZW_MC 18.89 63.01 125.62 87.78 13.35

LZ4 35.23 88.70 177.59 89.77 0.33

∆ encoding 0.68 106.31 213.15 98.14 0.32

∆ + Snappy 3.20 88.38 177.32 89.31 0.90

∆ + fpaq0 23.71 53.23 104.75 79.98 25.17

∆ + PPMs(5) 55.13 58.60 115.70 83.40 58.61

∆ + PPMs(3) 40.64 57.38 112.76 85.08 43.93

∆ + PPMs(1) 40.55 57.33 112.74 81.87 43.73

∆ + LZW 9.18 69.51 138.26 87.29 5.38

∆ + LZW_MC 17.92 58.81 116.07 83.53 12.56

∆ + LZ4 36.18 88.01 175.48 90.53 0.82

Figure 7.7: Energy usage for compression and transmission of humidity data, given different algorithms, when compressing 64 bytes at a time.


Columns, in order: compression algorithm, compression (µJ), 10dBm transmission (µJ), 19.5dBm transmission (µJ), reception (µJ), decompression (µJ).

None 0.00 106.69 213.13 97.01 0.00

Snappy 2.30 108.52 217.10 97.59 0.25

fpaq0 23.24 77.87 155.37 88.59 25.15

PPMs(5) 81.98 86.34 172.06 90.53 94.25

PPMs(3) 76.06 85.32 172.78 89.27 87.78

PPMs(1) 76.60 85.61 170.91 89.30 88.30

LZW 15.34 104.69 209.59 95.71 7.63

LZW_MC 37.54 94.94 192.39 92.31 24.03

LZ4 35.05 107.20 217.91 98.45 0.11

vw∆ encoding 0.14 67.88 137.20 85.57 0.14

vw∆ + Snappy 1.95 70.42 142.31 85.24 0.21

vw∆ + fpaq0 16.37 58.69 118.40 83.36 17.13

vw∆ + PPMs(5) 60.36 68.77 138.36 87.82 66.28

vw∆ + PPMs(3) 59.71 68.56 138.41 88.30 65.56

vw∆ + PPMs(1) 59.68 68.54 138.41 89.24 65.59

vw∆ + LZW 7.54 72.63 146.60 89.18 4.97

vw∆ + LZW_MC 19.54 69.58 140.09 88.10 14.13

vw∆ + LZ4 35.03 70.15 142.20 86.93 0.06

Figure 7.8: Energy usage for compression and transmission of sea surface temperature data, given different algorithms, when compressing 64 bytes at a time.


Columns, in order: compression algorithm, compression (µJ), 10dBm transmission (µJ), 19.5dBm transmission (µJ), reception (µJ), decompression (µJ).

None 0.00 106.60 213.16 96.72 0.00

Snappy 2.44 106.55 213.28 98.44 0.26

fpaq0 23.46 100.42 201.68 98.67 25.33

PPMs(5) 77.82 98.23 196.24 94.32 83.54

PPMs(3) 74.93 98.06 196.30 98.20 80.69

PPMs(1) 74.96 98.08 196.30 93.89 80.67

LZW 10.49 106.05 212.25 95.82 7.74

LZW_MC 36.94 111.30 223.21 96.32 26.34

LZ4 35.38 105.57 211.73 98.00 0.05

∆ encoding 0.35 106.41 213.17 97.71 0.03

∆ + Snappy 2.58 109.69 220.67 97.79 0.43

∆ + fpaq0 23.71 108.08 216.42 98.23 25.15

∆ + PPMs(5) 89.61 104.83 210.80 96.72 92.78

∆ + PPMs(3) 88.27 105.34 210.80 95.20 91.51

∆ + PPMs(1) 88.30 105.31 210.80 96.95 91.56

∆ + LZW 10.93 110.06 221.69 97.99 8.26

∆ + LZW_MC 38.95 116.80 235.45 97.85 28.42

∆ + LZ4 35.74 107.30 214.73 94.96 0.25

Figure 7.9: Energy usage for compression and transmission of mixed data, given different algorithms, when compressing 64 bytes at a time.
