Thesis submitted in fulfilment of the requirements for the award of the degree of Doctor of Engineering Sciences (Doctor in de ingenieurswetenschappen)

AGGREGATION AND RECOVERY METHODS FOR HETEROGENEOUS DATA IN IOT APPLICATIONS.

EVANGELOS K. ZIMOS

Supervisors: Prof. Nikos Deligiannis, Prof. Adrian Munteanu

FACULTY OF ENGINEERING
Department of Electronics and Informatics

Examining Committee

Advisor: Prof. Dr. Ir. Nikos Deligiannis, Vrije Universiteit Brussel
Advisor: Prof. Dr. Ir. Adrian Munteanu, Vrije Universiteit Brussel
Chair: Prof. Dr. Ir. Ann Nowé, Vrije Universiteit Brussel
Vice Chair: Prof. Dr. Ir. Rik Pintelon, Vrije Universiteit Brussel
Com. Member: Prof. Dr. Ir. Martin Timmerman, Vrije Universiteit Brussel
Secretary: Prof. Dr. Ir. Jan Lemeire, Vrije Universiteit Brussel
Ext. Member: Prof. Dr. Miguel Rodrigues, University College London
Ext. Member: Prof. Dr. Ir. Eli De Poorter, Universiteit Gent

“. . . Πάντες ἄνθρωποι τοῦ εἰδέναι ὀρέγονται φύσει. Σημεῖον δ᾿ ἡ τῶν αἰσθήσεων ἀγάπησις· καὶ γὰρ χωρὶς τῆς χρείας ἀγαπῶνται δι᾿ αὑτάς, καὶ μάλιστα τῶν ἄλλων ἡ διὰ τῶν ὀμμάτων. Οὐ γὰρ μόνον ἵνα πράττωμεν ἀλλὰ καὶ μηθὲν μέλλοντες πράττειν τὸ ὁρᾶν αἱρούμεθα ἀντὶ πάντων ὡς εἰπεῖν τῶν ἄλλων. Αἴτιον δ᾿ ὅτι μάλιστα ποιεῖ γνωρίζειν ἡμᾶς αὕτη τῶν αἰσθήσεων καὶ πολλὰς δηλοῖ διαφοράς . . .”

– Ἀριστοτέλης, Μετὰ τὰ Φυσικά Α΄

“. . . All men naturally desire knowledge. An indication of this is our esteem for the senses; for apart from their use we esteem them for their own sake, and most of all the sense of sight. Not only with a view to action, but even when no action is contemplated, we prefer sight, generally speaking, to all the other senses. The reason of this is that of all the senses sight best helps us to know things, and reveals many distinctions . . . ”

– Aristotle, Metaphysics I

Contents

Acknowledgements i

Synopsis vii

Acronyms viii

1 Introduction 1
1.1 Wireless Sensor Networks ...... 3
1.1.1 Sensor Node ...... 4
1.1.2 Deployment Phases and Topologies ...... 7
1.2 Data Aggregation and Challenges ...... 8
1.3 Contributions ...... 11
1.4 Applications ...... 15
1.4.1 Environmental monitoring ...... 17
1.5 Outline ...... 19

2 Copula Functions 23
2.1 Motivation ...... 24
2.2 Copula Definition ...... 25
2.2.1 Mathematical Background ...... 29
2.2.2 Sklar's Theorem ...... 33
2.2.3 Fréchet-Hoeffding Bounds ...... 34
2.2.4 Copulas and Random Variables ...... 36
2.2.5 Derivatives and Copula Density ...... 38


2.3 Dependence Concepts ...... 40
2.3.1 Linear Correlation ...... 40
2.3.2 Perfect Dependence ...... 41
2.3.3 Concordance ...... 41
2.3.4 Kendall's tau and Spearman's rho ...... 44
2.3.5 Tail Dependence ...... 46
2.4 Copula Families ...... 49
2.4.1 Elliptical Copulas ...... 50
2.4.2 Archimedean Copulas ...... 56
2.5 Copula Fitting ...... 61
2.5.1 Parametric Copula Estimation ...... 63
2.5.2 Non-Parametric Copula Estimation ...... 67
2.5.3 Choosing the Right Copula ...... 69
2.5.4 Dealing with Discrete Marginal Distributions ...... 70

3 Data Gathering via Multiterminal Coding and Copula Regression 73
3.1 Introduction ...... 73
3.1.1 Contributions ...... 75
3.2 Background on MT Source Coding ...... 77
3.2.1 Gaussian Regression as a Refining Stage ...... 79
3.3 The Proposed MT Source Code Design ...... 79
3.4 The Proposed Semi-Parametric Copula Regression ...... 82
3.4.1 Statistical Modelling for Diverse Sensor Data ...... 83
3.4.2 The Proposed Copula Regression Method ...... 85
3.5 Experiments ...... 89
3.5.1 Choice of the Appropriate Kernel Function ...... 93
3.5.2 Performance Evaluation of DPCM ...... 97
3.5.3 Performance Evaluation of the Copula Regression Algorithm ...... 99
3.5.4 Overall Performance of the Proposed System ...... 100
3.5.5 Performance Evaluation for Weaker Intra-Sensor Dependence Structure ...... 102
3.5.6 Comparison with DSC for Different WSN Topologies ...... 105
3.6 Conclusion ...... 107
3.7 Discussions ...... 108

4 Data Gathering via Compressed Sensing with Side Information 113
4.1 Problem Overview ...... 115
4.1.1 Application Scenario ...... 115
4.1.2 Prior Work ...... 116
4.1.3 Contributions ...... 117
4.2 Background ...... 118
4.2.1 Compressed Sensing ...... 119
4.2.2 Compressed Sensing with Side Information ...... 121
4.2.3 Distributed Compressed Sensing ...... 124
4.2.4 Data Gathering with CS ...... 125
4.3 Proposed Architecture based on Compressed Sensing with Side Information ...... 126
4.3.1 Data Gathering under Noise ...... 126
4.3.2 Successive Data Recovery ...... 129
4.3.3 Extension to Tree-Based Topologies ...... 131
4.4 Statistical Modelling for Diverse Data ...... 132
4.5 Copula-based Belief Propagation Recovery ...... 137
4.5.1 Background on Bayesian Inference ...... 138
4.5.2 Description of the Proposed Recovery Algorithm ...... 139
4.6 Recovery Algorithm for the Extended ℓ1-ℓ1 Problem ...... 142
4.7 Experiments ...... 143
4.7.1 Sparsifying Basis Selection ...... 144
4.7.2 Performance Evaluation of the Proposed Copula-based Algorithm ...... 145
4.7.3 Evaluation of the System Performance ...... 147
4.7.4 Evaluation of System Performance under Noise ...... 149
4.8 Conclusions ...... 152

5 Large-Scale Data Gathering via Compressive Demixing 157
5.1 Prior Art ...... 158


5.2 Contributions ...... 159
5.3 Background ...... 160
5.3.1 Source Separation ...... 161
5.3.2 Compressive Demixing ...... 163
5.3.3 Oracle Problem ...... 164
5.4 Proposed Scheme ...... 166
5.4.1 Joint Data Aggregation ...... 166
5.4.2 Joint Data Recovery via Compressive Demixing ...... 168
5.5 Experimental Evaluation ...... 169
5.5.1 Comparison with State of the Art ...... 170
5.5.2 Comparison with Successive Reconstruction Architecture ...... 172
5.6 Conclusions ...... 173

6 Conclusions & Future Study 175
6.1 Future Work ...... 178
6.2 General Directions ...... 182

Acknowledgements

Becoming a Doctor of Philosophy requires not only personal effort and discipline but also the help and support of people that were (or were meant to be) important in your life. During this Ph.D. journey, I was blessed to meet and cooperate with people that significantly contributed to my scientific evolution and changed my perspective on life.

Firstly, I would like to express my gratitude to my advisors, Prof. Nikos Deligiannis and Prof. Adrian Munteanu, for guiding me through the last four years and giving me the opportunity to accomplish my Ph.D. study. Moreover, I would like to thank the rest of my thesis committee: Prof. Ann Nowé, Prof. Rik Pintelon, Prof. Martin Timmerman, Prof. Jan Lemeire, Prof. Miguel Rodrigues and Prof. Eli De Poorter, for dedicating part of their time to evaluate the work presented in this thesis, as well as for their insightful comments and encouragement.

I am sincerely grateful to Dr. Dimitris Toumpakaris for our continuous information-theoretic talks, which started during my diploma thesis at the University of Patras and continued during my Ph.D. study. Dimitris was a mentor to me throughout all these years, a person with a positive spirit, a great source of inspiration and motivation. My sincere thanks also go to Prof. Miguel Rodrigues and Dr. João Mota for their significant support and guidance, especially during the last two years of my Ph.D. study. Both provided strict comments and fruitful scientific ideas targeting high-quality studies in the domain.

This Ph.D. journey would not have been successfully accomplished without working in a positive and productive environment. My scientific colleagues Athanassia, Ruxandra, Gabor, Bruno, Beerend, Petar, Jan, and Adriaan were also my dear friends that supported me at difficult times and enriched my life with pleasant memories. More importantly, this experience resulted in broadening my family, as my colleague and friend Andrei Sechelea became my best man.

My heartfelt thanks go to my parents, Kostas and Maria, as well as my beloved sister, Dimitra, for their unconditional love, support and guidance. They were always available to listen to my personal problems and wisely advise me. Last but not least, I would like to thank my wife Sofia, my personal “cardiologist”. She is a beautiful person in every aspect, characterized by endless love, trustworthiness, patience and compassion. She knows that she played the most important role in the whole experience.

Synopsis

Powered by technological advances in sensing hardware, wireless communications, cloud computing and data analysis, wireless sensor networks have become a key technology for the Internet of Things. Wireless sensor systems of various types promise to bring new perspectives to our lives by enabling various applications, such as smart cities, ambient-condition monitoring, air-pollution monitoring, smart retail, wearables, and many more.

We propose novel data aggregation and recovery mechanisms that address the following challenges: (i) achieving efficient data sensing and in-network compression so as to increase the power autonomy of the wireless sensor network, and (ii) providing robust data transmission against communication noise. Unlike previous data gathering schemes, which exploit underlying correlations among homogeneous sensor data, the proposed designs embody new recovery methods that leverage statistical dependencies among diverse sensor data.

Firstly, we propose a code design that combines multiterminal source coding with copula regression and predictive coding mechanisms. Experiments using real sensor measurements show that the proposed approach achieves significant compression improvements (up to 79.99% in distortion reduction) compared to state-of-the-art multiterminal and Wyner-Ziv coding schemes. Secondly, we propose a data gathering scheme that relies on compressed sensing with heterogeneous side information. Using actual sensor data from the US Environmental Protection Agency, we show that this design (a) provides significant improvements in mean-squared-error performance against state-of-the-art schemes that use (distributed) compressed sensing, and (b) offers robustness against measurement and communication noise. Thirdly,

we introduce two data aggregation and recovery schemes, based on convex optimization approaches, which address use cases where the statistical characterizations of the sensor data are not available a priori. In particular, we propose a scheme that performs signal recovery via the extended ℓ1-ℓ1 problem, as well as a mechanism based on the compressive demixing framework that results in significant communication rate reductions of up to 50% compared to (distributed) compressed sensing systems. We show the superior performance of the proposed data gathering schemes in real-life scenarios, such as environmental monitoring, smart cities and smart buildings; nevertheless, they can be employed in a plethora of Internet-of-Things applications.

Acronyms

ACF Auto-Correlation Function

ADMM Alternating Direction Method of Multipliers

AMP Approximate Message Passing

API Application Programming Interface

AWGN Additive White Gaussian Noise

BP-CS Belief Propagation - Compressed Sensing

BPDN Basis Pursuit De-Noising

cdf cumulative distribution function

CH Cluster Head

CHP Combined Heat & Power

CML Canonical Maximum Likelihood

CoSaMP Compressive Sampling Matching Pursuit

CS Compressed Sensing

CSS Chirp Spread Spectrum

DCS Distributed Compressed Sensing

DCT Discrete Cosine Transform


DoS Denial of Service

DSC Distributed Source Coding

EML Exact Maximum Likelihood

EPA Environmental Protection Agency

FFT Fast Fourier Transform

GMM Gaussian Mixture Model

HVAC Heating, Ventilation & Air Conditioning

IDS Intrusion Detection Systems

iid independently and identically distributed

IFM Inference Functions for Margins

IoT Internet of Things

IP Internet Protocol

KDE Kernel Density Estimation

LDPC Low-Density Parity-Check

LDPCA Low-Density Parity-Check Accumulate

LoRa Long Range

LoRaWAN Long-Range Wide Area Network

M2M Machine to Machine

MAP Maximum A Posteriori

MCP Mutual Coherence Property

MEMS Micro-Electro-Mechanical Systems

ML Maximum Likelihood

MMSE Minimum Mean-Squared Error

MSE Mean-Squared Error

MT Multi-Terminal

NFC Near Field Communication

NSP Null Space Property

OECD Organization for Economic Cooperation and Development

OMP Orthogonal Matching Pursuit

PC Personal Computer

PCA Principal Component Analysis

pdf probability density function

PN Peripheral Node

RFID Radio Frequency Identification

SW Slepian-Wolf

SWCQ Slepian-Wolf Coded Quantization

TCQ Trellis Coded Quantization

TEG Thermo-Electric power Generator

TSMP Time Synchronized Mesh Protocol

USQ Uniform Scalar Quantizer

UWB Ultra Wide-Band

WSN Wireless Sensor Network

WIA-PA Wireless Industrial Automation – Process Automation

WZ Wyner-Ziv


Chapter 1

Introduction

“Internet of Things” (IoT) is a paradigm that supports the pervasive presence of a variety of things (usually called objects) in the environment. The term was coined by the British entrepreneur Kevin Ashton while working at Auto-ID Labs; he was referring to a global network of objects connected through radio-frequency identification (RFID). These objects are capable of interacting (and/or cooperating) with each other through wireless (or wired) connections and unique addressing schemes in order to offer new services. IoT is expected to provide advanced connectivity of devices, systems, and services that enables machine-to-machine (M2M) communications, and covers a variety of protocols and applications in several domains [106]. As the number of IoT devices is expected to reach approximately 34 billion by 2020 [79], their interconnection will lead to the development of advanced applications in a plethora of domains, such as the smart grid [151] and smart cities [105].

A smart city is the vision for urban development that involves the secure integration of various IoT solutions in a city environment. The main goals are (i) to improve citizens’ quality of life, and (ii) to optimize resource allocation via proper management of city assets (e.g., power grids, water supply networks, transportation systems, public buildings). To this end, sophisticated monitoring systems equipped with smart devices are needed for real-time data collection and assessment. These systems will enhance the management of urban flows and allow for efficient decision making regarding various challenges. As realistic examples, consider the following scenarios:


Figure 1.1: Concept of a Smart City [34].

• Salt spreading crews can be dispatched only when specific bridges have icing conditions.

• Services and resources can be allocated a priori to water-main breaks, which occur due to corrosion and leaks in pipe materials, so as to avoid degradation of the water delivery and sewage treatment systems.

• The number of building inspectors can be reduced by finding alternative ways to monitor the condition of structures [218].

Furthermore, meticulous monitoring of ambient conditions and of pollution levels in air and water will result in a clean and healthy environment. Fig. 1.1 depicts the concept of a smart city [34].

IoT technologies involve various devices (objects), such as sensors, personal computers, tablets or smartphones, which are uniquely identifiable via a network IPv6 address and are able to (i) dynamically join the network, and (ii) efficiently collaborate and cooperate with each other so as to perform different tasks. Wireless sensor networks (WSNs) are a key enabling technology for various IoT applications, offering new perspectives. WSNs of various scales can be employed to collect information of diverse types, covering a wide variety of application areas. However, the deployment and integration of advanced WSN-driven technologies raise important challenges that need to be tackled. Some of these challenges refer to the power autonomy of the sensors, robust data transmission against communication noise and imperfections of wireless channels, as well as data security and privacy.

In the following sections we describe the main structure of typical WSN deployments, we detail the existing problems and challenges, and we mention cutting-edge data acquisition and reconstruction mechanisms proposed in the literature. Finally, we outline the contributions of this thesis by explaining how the proposed data aggregation and recovery schemes deal with these challenges and improve the reconstruction quality compared to their counterparts.

1.1 Wireless Sensor Networks

Wireless sensor networks (WSNs) have attracted a large variety of disciplines where close interaction with the physical world is essential. They are communication networks consisting of wireless sensor devices equipped with small (or miniature) batteries. These tiny devices are capable of measuring physical conditions, such as temperature, light, voltage, electricity, heat, pressure, seismic activity, and more. The distributed sensing capabilities provided by wireless communication paradigms make WSNs revolutionary data gathering systems, which are able to improve the reliability of infrastructure systems and extend the reach of current cyber infrastructures to the physical world. Moreover, their ease of deployment provides more flexible solutions for daily applications compared to alternative wired schemes.

A WSN comprises tiny sensor devices (alias, sensor nodes), which act as data generators, as well as network relays, gateways and clients. Typically, a large number of sensor devices is spread around a monitored area, called the sensor field, and forms a wireless communication network. These devices measure data of diverse types, which are then sent to a gateway node. The gathering procedure is based on a routing architecture that depends on the network scale and the application type. For instance, in large-scale networks multi-hop transmission [177] is usually preferred



in order to avoid rapid battery depletion, whereas in small-scale applications, such as in homes or buildings, direct transmission from the sensor to the sink can be employed. Finally, the information arrives at a management node through the Internet or a satellite link, where the user can assign monitoring missions and strategies.

As several technologies related to WSN deployments are maturing, the required budget for building such a network has been significantly reduced. As a result, several applications gradually expand from industrial and military domains to commercial and marketing fields. At the same time, communication protocols, such as Zigbee [16], LoRa [28], ISA 100.11a [109] and WirelessHART [44], have been developed for various application scenarios, such as wireless industrial automation–process automation (WIA-PA) [128].

Figure 1.2: Hardware structure of a WSN sensor device [15].

1.1.1 Sensor Node

The fundamental component of a WSN is the sensor node, i.e., the device whose main mission is to gather the required information, encode it and safely send it to other nodes for re-transmission or other types of processing. In general, each node comprises the following hardware parts [15]: the power management unit, the sensor, the wireless transceiver and the micro-controller. The hardware structure of such a device is depicted in Fig. 1.2. The role of the power management module is to

provide the required power for the node's operation; this can be achieved via batteries or harvesters that incorporate proper modules, such as photovoltaic panels and vibration harvesters [14, 204]. The sensor collects signals of various types, such as vibration, voltage, light, or the movement of molecules, transforms them into electrical signals and conveys them to the micro-controller. The latter component is the programmable processing unit that contains on-board memory and a small storage unit. It performs sensing operations, runs algorithmic encoding procedures, and collaborates with other devices via wireless communication. Finally, the node contains a wireless transceiver, namely, an RF module that sends the processed data over the wireless channel.

The design of a sensor device is constrained by important challenges. In particular, these devices should have a miniature size such that they can be employed in applications with space restrictions (e.g., wearable devices). Recent technological advances in micro-electromechanical systems (MEMS) have enabled the miniaturization of WSN nodes [221]. Two- and three-dimensional micro-sensitive designs of various levels can be realized, resulting in impressively miniature sensing elements. These elements, along with power-supply tools and signal conditioning circuits, can be integrated into a miniature combo, i.e., a MEMS-based sensor node.

Energy consumption on a WSN node imposes further restrictions on the hardware design. Although these devices are tiny and hence require a small amount of power to operate, the applications that they are involved in are constrained by the available battery supply. For example, whereas wearable devices can be easily recharged, sensor nodes installed in volcanoes (or on sea beds) measuring seismic activity (or pollution levels) cannot be easily replaced [15]. An alternative approach for powering sensors is ambient energy harvesting from external sources. A conventional way of achieving harvesting is via optical cell power generation. Recently, other methods have been introduced using piezoelectric crystals, thermoelectric power generators (TEGs), photovoltaic panels or modules for electromagnetic wave reception. Figure 1.3 shows some types of these harvesters.


(a) Piezoelectric mechanism [1]. (b) Photovoltaic panel [2].

Figure 1.3: Ambient energy harvesting devices.

(a) Sensor placement. (b) Waking up and detection.

(c) Connection to network. (d) Transmission.

Figure 1.4: Organizing and transmitting process of WSNs.


(a) Star topology. (b) Tree topology.

(c) Chain topology. (d) Cluster-based topology.

Figure 1.5: Different WSN topologies [15].

1.1.2 Deployment Phases and Topologies

As mentioned in the previous section, a typical WSN includes sensor nodes, which gather information of diverse types, and a sink (node) to which the collected information is delivered. The latter also acts as a gateway and, hence, facilitates the transfer of the collected data to the Internet. Sensor nodes can be deployed en masse, e.g., by being dropped from an aircraft, or carefully installed by humans or robots, as in factories. However, many of these nodes are prone to connectivity failures due to interference, noise, or moving obstacles. As such, the design and the maintenance of the WSN topology is a challenging task.

The deployment of sensor devices can be done in three phases [15]. First is the pre-deployment phase, where the designer builds an efficient deployment that (i) decreases the installation cost, (ii) does not require a priori organization and planning, (iii) offers flexibility in the sensors' arrangement, and (iv) promotes self-organization and fault tolerance. Note that fault tolerance is a parameter that indicates the system's immunity to failures caused by nodes, such that its functionality is preserved.


Figure 1.4 provides a step-wise description of the WSN deployment process during the pre-deployment phase. First, the nodes detect other nodes by exchanging status information with their neighbors (via broadcasting). Then the nodes self-organize into a connected network based on a topology, such as linear, star, tree or mesh (see Figure 1.5). Subsequently, the communication paths used for forwarding the collected information to the sink are determined. Since the power consumption at the nodes needs to be minimized, the inter-node transmission distance should be short.¹ Depending on the considered protocol, the inter-node distance may vary between rural and urban environments. For example, the Long Range (LoRa) protocol supports transmission of up to 22 km in rural areas and 2 km in cities [3].

The post-deployment phase is the second stage and deals with the maintenance of the network after its initial deployment [15]. Interference, noise, fading effects and node failures may result in a dynamic network environment and hence in changes of the WSN topology. Periodic changes of the topology may occur based on the sensing tasks (duty cycling) as well [33]. As a result, adaptation of the protocol operations based on the topology variations is required.

When the inter-sensor connectivity is severely affected by node failures, additional sensor devices may be installed in the network. This stage constitutes the re-deployment phase, where sensor devices are used either to replace malfunctioning nodes or to cope with changes in task dynamics. The network needs to be re-organized as the topology changes. However, the WSN designer should consider that strict energy consumption constraints and frequent maintenance are imposed as the scale of the network increases.

1.2 Data Aggregation and Challenges

Until now we have described the structure of WSNs, their deployment phases, as well as the various topologies that can be applied to them. In this section, we detail the main challenges in a WSN setup and we describe how existing studies try to meet them. Furthermore, we mention the weak points of these studies, which motivate our research in this thesis.

¹Detailed models for energy consumption can be found in [15, pp. 46-49].


Power consumption at the sensor nodes is a primary constraint when designing data aggregation strategies. Energy savings can be achieved by reducing the inter-node distance and by adopting suitable (application-driven) network topologies. Moreover, the system lifetime is prolonged when nodes perform efficient in-network compression on the collected data before their transmission. In this way, the sensed data size is reduced without losing significant information. More importantly, efficient compression performance should be combined with lightweight encoding at the node, because computationally expensive encoding schemes may lead to severe battery depletion. Thus, heavy processing tasks should be shifted towards the decoder, i.e., the sink.

Since data collected by sensor nodes have strong spatiotemporal dependencies, they contain redundancy. Data compression should take into account not only the intra-source data dependencies, but also the inter-source dependence structure. For example, if nodes A and B respectively measure temperature and humidity, an efficient compression scheme should leverage the marginal statistics among temperature values of node A—or among humidity values of node B—at different instants, as well as the joint statistics between temperature and humidity values of the same instance (or even different instances). In this case, optimized management of system resources, including power, processing capability, storage and transmission bandwidth, can be achieved.

Modern data aggregation mechanisms have been proposed in the literature that remove the redundancies between correlated sensor data via high-performance compression techniques, such as predictive coding [180, 217], collaborative wavelet transform coding [51, 56], and distributed source coding [45, 62, 63, 200]. All these schemes exploit the dependence structure among sensor data, providing significant reductions in the transmission rate. This translates to lower radio emissions from the nodes and hence increased power savings. However, each of these schemes has its own drawbacks. In particular, predictive coding and wavelet-based coding require excessive overhead information between nodes—and hence additional encoding complexity—due to inefficient handling of abnormal events. Moreover, distributed source coding performs well for a small number of nodes and requires accurate statistical modelling of the sensor data. A minimal sketch of how predictive (DPCM-style) coding removes temporal redundancy is given below.
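To make the idea of predictive coding concrete, the following sketch shows a first-order DPCM loop with uniform quantization of the prediction residual. It is an illustrative toy rather than the encoder used in this thesis; the step size q_step and the synthetic temperature trace are assumptions for the example.

```python
import numpy as np

def dpcm_encode(x, q_step=0.1):
    """First-order DPCM: quantize the difference between each sample
    and the decoder's reconstruction of the previous sample."""
    indices = np.empty(len(x), dtype=int)
    prev_rec = 0.0                       # predictor state, shared with the decoder
    for i, sample in enumerate(x):
        residual = sample - prev_rec     # temporal redundancy is removed here
        indices[i] = int(np.round(residual / q_step))
        prev_rec += indices[i] * q_step  # mimic the decoder to avoid drift
    return indices

def dpcm_decode(indices, q_step=0.1):
    return np.cumsum(indices * q_step)   # accumulate the dequantized residuals

# Slowly varying synthetic "temperature" trace: the residuals are small,
# so the quantization indices have low entropy and entropy-code well.
t = 20 + 0.5 * np.sin(np.linspace(0, 3, 200)) + 0.05 * np.random.randn(200)
idx = dpcm_encode(t)
rec = dpcm_decode(idx)
print("max reconstruction error:", np.abs(t - rec).max())  # bounded by q_step / 2
```

Because the encoder quantizes the residual against the decoder's own reconstruction, the quantization error stays bounded instead of accumulating over time.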


Figure 1.6: Multi-hop transmission in conventional WSN setups [134].

State-of-the-art data gathering schemes consider conventional statistical models, such as the multivariate Gaussian or Laplace model [45, 49, 167], in order to describe the inter-source data dependencies. These models are accurate when data are generated from homogeneous sources, namely sources with common statistical traits. However, as data of diverse types or modalities (e.g., temperature, humidity, light, various air pollutants) are typically monitored in numerous IoT applications, conventional modelling introduces inaccurate inference and constrains the compression performance. To this end, more sophisticated models that allow flexible joint statistical descriptions among data of heterogeneous types (i.e., data with different marginal statistics) need to be taken into account.

A gathering strategy should not only focus on the minimization of energy consumption, but also on providing a transmission plan that leads to balanced radio emission in the network. Traditional large-scale monitoring setups with myriads of nodes abide by the following transmission scenario (see Fig. 1.6): nodes operate in a multi-hop transmission scheme, where each sensor node n sends its own information message m_n to its neighbor n+1, along with the n−1 relayed messages received from the other sensors. The procedure ends when node N transmits its message m_N along with the N−1 relayed messages. It is clear that sensor nodes closer to the sink consume more power. This leads to unbalanced battery consumption in the network; namely, not all nodes send a similar amount of information, and thereby they do not consume a similar number of energy units. As a consequence, a group of devices may deplete their batteries faster than others, resulting in abnormal network operation that often requires duty cycling. To deal with this problem, compressive data gathering has

been proposed in [100, 134], which considers intelligent aggregation schemes based on the compressed sensing framework [39, 66] (a minimal sketch of this aggregation process is given at the end of this section). However, these designs leverage only the inter-source dependencies for data compression. Distributed compressed sensing (DCS) [19, 69] is another paradigm for compressive gathering that, contrary to [100, 134], exploits both spatial and temporal correlations between data sources. However, the compression performance can be further improved since, similarly to all the aforementioned designs, DCS does not manage to efficiently capture the dependence structure among sensor readings of heterogeneous types.

The design of data gathering mechanisms for WSNs is also restricted by other parameters, resulting in challenging trade-offs. First, when sensor nodes search for neighbors or when they wait until data from other nodes arrive, the average latency of the system may increase. Second, data aggregation schemes may offer good compression performance, but they typically require an additional channel encoding step at the sensor nodes to deal with communication noise and fading effects. The latter step adds redundancy to the compressed information, requires extra processing resources and consumes valuable power. Nevertheless, it is useful for protecting the sensor information and for eliminating re-transmissions. Therefore, we realise the need for devising new gathering schemes for WSNs that provide robustness against communication noise without requiring energy-expensive channel coding.
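The sketch below illustrates the compressive data gathering idea of [100, 134] under simplifying assumptions: each node i holds one reading x_i and adds its contribution Phi[:, i] * x_i (a column of a random Gaussian matrix, assumed to be shared via a common PRNG seed) to the M-dimensional running sum it relays, so every node transmits exactly M values regardless of its position in the route. The dimensions and the identity sparsifying basis are assumptions of the example, not the designs proposed in later chapters.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 30                        # N sensor nodes, M measurements (assumed values)

# Signal that is sparse in some basis; here the identity basis for simplicity.
x = np.zeros(N)
x[rng.choice(N, 5, replace=False)] = rng.standard_normal(5)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # known to all nodes via a shared seed

# In-network aggregation: node i adds Phi[:, i] * x[i] to the relayed M-vector,
# so the per-node transmission cost is M values for every node in the chain.
running_sum = np.zeros(M)
for i in range(N):
    running_sum += Phi[:, i] * x[i]

assert np.allclose(running_sum, Phi @ x)  # the sink receives y = Phi x
# The sink would then recover x from y, e.g., via an l1-minimization solver.
```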

1.3 Contributions

The main scope of this thesis is the design of new data aggregation and recovery mechanisms for WSN setups of various scales that meet the following challenges:

1. achieving efficient data sensing and in-network compression so as to increase the power autonomy of the wireless sensor network;

2. providing robust data transmission against communication noise.

These code designs can be applied in a plethora of IoT applications and provide superior performance compared to the state of the art. Without loss of generality, we focus on the applications of (i) air-pollution monitoring, which tends to be a serious problem in modern cities, and (ii) monitoring of ambient conditions (e.g., temperature, humidity, etc.) for smart buildings and homes. Nonetheless, the proposed IoT architectures are generic and, hence, can be applied to other applications as well (see Section 1.4 and Table 1.2). The contributions of this study are summarized as follows:

• To achieve efficient in-network compression (and hence reduce the power consumed by the sensor nodes), data gathering mechanisms should leverage accurate statistical descriptions of the sensor data. As mentioned in the previous section, IoT applications typically deal with data of heterogeneous types, i.e., data that have highly different ranges and marginal statistics but are intrinsically correlated. Thus, the use of conventional models (e.g., the multivariate Gaussian model or the Gaussian mixture model) for capturing the underlying dependence structure can be imprecise and can limit the compression performance of the system. To this end, this thesis proposes novel aggregation schemes relying on the statistical model of copula functions [158, 193]. Copula functions construct multivariate distributions by modeling the marginal distributions and the dependence structure separately (see the sketch after this item). As such, they capture complex dependencies among diverse data more accurately than existing modeling approaches [19, 69] and, hence, they inspire new data reconstruction schemes. Our data gathering schemes are based on theories from various domains, such as multiterminal coding [23, 213], compressed sensing with side information [152–154] and compressive demixing [145, 147], which address different challenges in WSNs of various scales. Hereunder, we detail the contributions of each code design.
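As an illustration of this separation of marginals and dependence, the following sketch builds a bivariate distribution with a Gaussian copula but deliberately mismatched marginals: a Gamma-distributed "pollutant concentration" and a Gaussian "temperature". The correlation value 0.8, the marginal parameters and the sample size are arbitrary assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rho, n = 0.8, 10_000                        # assumed dependence strength, sample size

# 1) Dependence structure: correlated standard Gaussians -> uniforms in [0, 1].
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
u = stats.norm.cdf(z)                       # each column is Uniform(0,1): the copula part

# 2) Marginals: push the uniforms through arbitrary inverse cdfs.
pollutant = stats.gamma.ppf(u[:, 0], a=2.0, scale=3.0)     # skewed, positive
temperature = stats.norm.ppf(u[:, 1], loc=20.0, scale=5.0) # symmetric

# The marginals differ completely, yet the rank dependence is preserved.
print("Spearman rho:", stats.spearmanr(pollutant, temperature).correlation)
```

The same recipe works with any pair of (continuous) marginal families, which is exactly the flexibility that conventional joint models lack.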

• Targeting small-to-medium scale WSNs—for IoT monitoring applications in smart homes and smart buildings—we propose a novel multiterminal source code design² [23, 213], which, contrary to prior work, utilizes both the intra- and the inter-sensor data dependencies. The former are exploited by applying simple differential pulse-code modulation [139, 180, 217] followed by arithmetic entropy coding at each distributed encoder. This approach limits the

²Multi-terminal source coding belongs to the broader family of distributed source coding [24, 194, 226].


encoding complexity and provides a flexible design that adapts to variations in the number of operating sensors. The inter-sensor correlations are exploited via a new regression method that is applied at the joint decoder and makes use of copula functions [158, 193]. Experiments using real sensor measurements show that the proposed approach achieves significant compression improvements (up to 79.99% in distortion reduction) compared to state-of-the-art multiterminal [49] and Wyner-Ziv coding schemes [60, 228].

• To deal with the problem of (unbalanced) energy consumption of the sensor nodes in large-scale WSN setups (e.g., air-pollution monitoring or the smart power grid in smart cities), we propose a compressive data gathering scheme, which performs compressed sensing of heterogeneous data gathered by a large number of wireless sensor devices within a large sensor field (e.g., large geographic areas or huge data centers). The data gathering procedure follows the compressed sensing principles [39, 66] described in [101] and leads to balanced energy consumption across the nodes. Contrary to previous data gathering schemes—based on classical CS [101], CS with side information [152, 234], and DCS [19]—that leverage the underlying dependencies among homogeneous sensor data, the proposed reconstruction architecture embodies a novel recovery algorithm—built upon belief-propagation principles [20, 112]—that leverages correlated information from multiple heterogeneous signals, called side information. As in the aforementioned code design, to efficiently capture the statistical dependencies among diverse sensor data, the proposed algorithm follows a copula-based approach for the statistical modeling. Experimentation based on heterogeneous air-pollution sensor measurements from a United States Environmental Protection Agency [4] dataset showed that the proposed copula-based design provides significant improvements in mean-squared-error performance against state-of-the-art schemes. This means that the required data rates for a given data reconstruction quality are significantly reduced, thereby resulting in less network traffic and a prolonged system lifetime.

• To address large-scale IoT applications where the statistical traits of the sources are not known a priori to the system designer, we propose a non-Bayesian recovery scheme based on an extension of the framework in [152, 154], which studies the problem of compressed sensing with side information. Our approach follows a multi-hop data transmission scenario, which lies in contrast with the transmission mechanisms in [19, 45]. Furthermore, unlike previous schemes [100, 101, 134, 142] that exploit only the intra- or inter-sensor data dependencies, our method leverages both types of dependencies. Since the number of correlated sources may be more than one, the algorithm can incorporate multiple correlated signals as side information, resulting in a multi-hypothesis-based CS scenario. To evaluate the performance of our framework, we use real air-pollution data taken from a database of the United States Environmental Protection Agency [4]. The experimental results show that, for a given data rate, the proposed method provides significant reductions in the mean squared error of the recovered data compared to the classical CS [100, 101] and DCS [19] methods.

• As mentioned in Section 1.2, the lifetime of a WSN depends on the number of transmissions over the wireless network. In a typical multi-hop scenario, where no compressive data gathering is assumed, the total number of transmissions is O(LN²), where L is the number of monitored sources and N is the number of sensors. Applying the proposed compressive data aggregation based on compressed sensing with multiple side information reduces the number of transmissions to O(Σ_l N·M_l), which is lower than the previous figure (M_l is the number of network transmissions required for data source l). This is because the proposed scheme leads to a systematically lower number of transmissions M_l, ∀l, than state-of-the-art compressive data gathering schemes [134], since it accounts for multiple side information signals at the data recovery stage; thus, it holds that M_l < N, ∀l ∈ {1, 2, ..., L}. To further reduce the transmission rates in a multi-sensory setup, we propose a compressive data aggregation and recovery scheme that, surprisingly, requires only O(NM) overall network transmissions, no matter how many sources are monitored, where M is the number of measurements required for the joint recovery of all information sources; it is clear that M ≪ Σ_l M_l. The proposed scheme is built upon the elegant theory of compressive demixing [145, 147], which allows for joint data aggregation as well as joint data recovery. To evaluate the performance of our framework, we consider the problem of air-pollution monitoring based on actual air-pollution sensor readings taken from a database of the United States Environmental Protection Agency [4]. The experimental results show that the required data rate for a given data reconstruction quality is reduced by up to 60% compared to the CS [100] and DCS [21] methods. (A small numerical illustration of these transmission counts follows this item.)
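To make these orders of magnitude concrete, the toy computation below plugs in hypothetical values (N = 100 sensors, L = 4 sources, M_l = 30 measurements per source, and M = 40 joint measurements). These numbers are assumptions chosen only for illustration, not figures from the experiments.

```python
# Hypothetical network parameters (assumptions for illustration only).
N = 100                 # sensors
L = 4                   # monitored sources
M_l = [30, 30, 30, 30]  # per-source measurements for CS with side information
M = 40                  # joint measurements for compressive demixing

plain_multihop = L * N**2            # O(L N^2): every node relays raw readings
cs_with_si = sum(N * m for m in M_l) # O(sum_l N M_l): one CS stream per source
demixing = N * M                     # O(N M): a single jointly mixed stream

print(plain_multihop, cs_with_si, demixing)  # 40000 12000 4000
```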

• Robust transmission in WSN setups is an important requirement. Channel imperfections, such as additive noise and fading effects [212], typically corrupt the messages sent by the sensor devices and may result in an abnormal network operation involving a large number of retransmissions. Our compressive gathering and recovery mechanisms provide robustness against communication and sensing noise. Moreover, we assume that transmission in the proposed schemes follows the recent low-power wireless networking protocol designed for the physical (PHY) layer of IoT architectures [28], namely, Long Range (LoRa) [3]. Thus, channel fading effects are assumed to be efficiently handled by chirp spread spectrum (CSS) modulation [25], a multipath/fading- and Doppler-shift-resistant technique adopted in LoRa. Finally, we should mention that this thesis mainly deals with the widely used additive white Gaussian noise (AWGN) channel model (a minimal sketch of this noise model follows this item). More sophisticated noise and fading models are left for future work.
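For concreteness, the lines below show the AWGN measurement model assumed throughout the thesis, with the noise variance set from a target SNR; the SNR value and the dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, snr_db = 30, 100, 20                     # assumed dimensions and target SNR

Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x = rng.standard_normal(N)

clean = Phi @ x
sigma = np.sqrt(np.mean(clean**2) / 10**(snr_db / 10))
y = clean + sigma * rng.standard_normal(M)     # AWGN-corrupted measurements

print("empirical SNR (dB):",
      10 * np.log10(np.mean(clean**2) / np.mean((y - clean)**2)))
```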

1.4 Applications

In the previous sections we described the problems that need to be tackled in WSN setups in order to apply efficient data gathering. Moreover, we enumerated the contributions of this study, with an emphasis on the applications of air-pollution monitoring for smart cities and of ambient-condition monitoring for smart homes/buildings. Nevertheless, the proposed schemes are applicable to many IoT applications. Based on the various characteristics of a use case—e.g., the network scale, or whether the data dependence structure is (a)symmetric—the system designer has to decide which aggregation scheme is more appropriate. Compressive gathering designs are more suitable for applications with numerous sensor nodes and therefore multi-hop transmission needs


to take place, whereas the multiterminal coding scheme is appropriate for WSNs of smaller scale, where flexibility is of the utmost importance.

The Internet of Things enables various potential applications that, in practice, affect numerous fields related to enterprises, individuals and society as a whole. Although it is challenging to envisage all potential applications of the Internet of Things, especially if we consider the continuous advances in technology as well as the diverse needs of potential users, Table 1.2 tabulates some example scenarios in different application domains [15, 155]. Nonetheless, the main idea is the integration and unification of various diverse domains into a single domain, while security, privacy, trust, and safety are taken into account. It is important to mention that the proposed data aggregation and recovery methods can provide efficient solutions for the implementation of these use cases.

Figure 1.7: Popularity of Internet-of-Things applications using: 1. monthly worldwide Google searches for the application; 2. monthly Tweets containing the application name and #IOT; 3. monthly LinkedIn posts that include the application name. All metrics valid for Q4/2014 [10].

Fig. 1.7 shows the popularity of several IoT-based applications based on Google, Twitter and LinkedIn data sources. We see that smart home applications, as well as applications related to smart cities, such as environmental monitoring and smart


metering, have attracted a lot of interest. This shows that IoT can define strategic technology paths for the next 5-10 years. In the following section we describe the problem of environmental monitoring, which is extensively considered in this thesis for the evaluation of the proposed data gathering and recovery architectures.

Fig. 1.7 also shows that applications related to wearable devices, such as smartphones, smart watches, or smart bands, are at the crux of the IoT vision. In these application scenarios, data of heterogeneous types are typically gathered. The proposed gathering schemes—and especially the design presented in Chapter 3—can leverage the dependencies among these heterogeneous data sources and provide efficient compression performance. However, the primary concern in our designs is to achieve energy savings at the encoder (i.e., the sensor node) via efficient compression performance and lightweight encoding. This is not of paramount importance in wearable-based applications, where the devices can be recharged every day (usually at night) and are designed to operate on a battery during the day.

Figure 1.8: Stationary monitors in Hong Kong [9].

1.4.1 Environmental monitoring

Public concern about the hazardous effects of air pollution on public health has increased significantly in recent years. For example, Google Search results reveal that there were almost 46 million searches related to “2014 Air Pollution”, while, at the same time, searches related to “2014 Nobel Prize” amounted to 27 million [11].


Table 1.1: Information about stationary monitors in selected cities [232].

City | Number of monitors | Coverage area (km²) | Coverage per monitor (number of football fields)
Beijing | 35 | 16,000 | 64,025
Hong Kong | 15 | 2,700 | 25,210
New York | 44 | 1,200 | 3,820
London | 123 | 1,600 | 1,822

Governments worldwide have taken initiatives and have made serious efforts to mitigate the dangerous impact of air pollution on our health, the environment and the global economy using cutting-edge air-pollution monitoring systems [232]. The goal of these systems is the acquisition of detailed air-pollution conditions over extensive geographical zones, such that scientists and policy makers can analyse this information and decide how to improve the living environment.

Conventional monitoring stations face two major problems, both stemming from the fact that they are heavy, bulky and very expensive [232]. First, due to their cost, their deployment in large geographic areas is typically sparse; see Fig. 1.8 and Table 1.1. This results in inaccurate air-pollution monitoring that does not capture fluctuations on an hourly (or daily) basis. Second, deployments with conventional monitoring stations lack versatility. The initial plan for the monitors' placement is based on various parameters, such as human activities and the urban arrangement. However, when these parameters change, the re-location of the monitoring centers is not an easy task due to their size and weight. Moreover, the placement of additional monitoring stations is not always feasible due to financial constraints.

The solution to these issues can be given by large-scale WSNs comprising small sensor devices that are capable of communicating the acquired information via the Internet. The deployment of these networks in urban and rural areas can provide meticulous air-pollution monitoring, which can revolutionize the citizens' way of living. An example of a city that uses efficient WSN setups for monitoring air quality is


Santander.³ In particular, fixed nodes in the order of thousands have been mounted on street lamps and wall facades, measuring concentrations of air pollutants as well as temperature, noise level and light intensity every minute. Moreover, the fixed nodes interact with mobile nodes mounted on vehicles, which offer increased flexibility to the monitoring process and extend the network coverage.

1.5 Outline

This introductory chapter aimed at familiarizing the reader with the structure of a WSN system, which constitutes the key enabling technology for various IoT applications. For such systems, we introduced the specific challenges that need to be taken into consideration by modern data aggregation and recovery mechanisms. In short, these challenges refer to (i) minimizing the energy consumed by the sensors, which can be achieved by efficient compression techniques exploiting the spatiotemporal dependencies among heterogeneous IoT data, and (ii) providing robust data sensing and transmission against measurement and communication noise.

In Section 1.2 we saw that existing gathering schemes, based either on predictive coding [180, 217], collaborative wavelet transform coding [51, 56], distributed source coding [45, 62, 63, 200] or (distributed) compressive sampling [19, 101], can cope with part of the aforementioned problems, but have drawbacks that limit their compression performance. In particular, these schemes do not account for accurate statistical modelling among heterogeneous sensor data. Focussing on the solution of this problem, we propose new data aggregation and recovery schemes that use the statistical model of copula functions (alias, copulas) [158, 193]. Copulas are capable of capturing wide ranges of dependence among data sources with different

³The related project is called SmartSantander [5], which created an experimental test facility for the research and experimentation of architectures, key enabling technologies, services and applications for the Internet of Things (IoT) in an urban landscape. The platform comprises sensors, actuators, cameras and screens that offer useful information to citizens. In particular, 1125 ultra-low-power devices (Waspmote model), which support various radio communication protocols for short- or long-range transmission, have been deployed to monitor different parameters such as noise, temperature, luminosity, CO and free parking slots. Each sensor has two radios communicating at 2.4 GHz (except for the parking sensors). On one end of the communication, DigiMesh is the protocol selected to send the environmental information; on the other, the IEEE 802.15.4 protocol is used to carry out experiments within the network. All the nodes within the SmartSantander network can be used to test new algorithms without any downtime, while citizens still receive information about their environment.

statistical characteristics and thus provide a more suitable solution for describing their joint statistics. Chapter 2 elaborates on the theoretical background of the copula model. Chapter 3 describes a new data gathering technology, which relies on the multiterminal coding paradigm [24, 49] and a novel copula regression algorithm; this scheme targets the monitoring of diverse ambient conditions in smart homes and smart buildings. Chapter 4 considers applications involving large-scale WSNs, such as air-pollution monitoring for smart cities. In particular, this chapter proposes two compressive data aggregation and reconstruction methods; one is based on the theory of Bayesian compressed sensing with heterogeneous side information [233], whereas the other relies on the extended ℓ1-ℓ1 convex optimization problem [234]. Focusing on similar use cases as in Chapter 4, an alternative data gathering scheme, built upon the compressive demixing paradigm [145, 147], is proposed in Chapter 5. Finally, Chapter 6 draws the conclusions of this thesis and highlights potential improvements and alternative schemes that are left as future work.


Table 1.2: IoT applications and example scenarios where data gathering and recovery is required [15, 155].

Application Domain | Example Scenarios

Environmental Monitoring | Detection of forest fires, i.e., monitoring of preemptive fire conditions in order to define alert zones; air-pollution monitoring, i.e., monitoring the concentration of various air pollutants in cities and industrial zones, or monitoring of toxic gases generated in farms; detection of landslide and avalanche occurrences by monitoring the soil moisture, vibrations and earth density; earthquake detection, especially for areas with intense seismic activity [15, 155].

Smart city | Smart parking to monitor and improve parking availability in the city; traffic congestion monitoring to optimize driving and walking routes; smart lighting that adapts based on the needs and the weather; waste management to optimize the trash collection routes; intelligent transportation systems, where roads and highways provide warning messages and proposed diversions based on unexpected events (e.g., hazardous climate conditions, accidents, traffic jams); structural-health monitoring, measuring vibrations and material conditions in public constructions; real-time sound monitoring in central zones [155].

Smart metering | Smart grids that monitor energy consumption levels and adapt based on the demand; monitoring of water, oil and gas levels in storage tanks; monitoring and optimization of the performance of solar energy plants; monitoring of water flow in water transportation systems [15, 155].

Smart home | Remote control of heating, ventilation and air conditioning (HVAC) over the internet; intelligent lighting control based on the daily demands and the time of the day; integration with the smart grid and smart meters in order to optimize the energy consumption of the house (e.g., high energy output from solar panels during noon can be used to run washing machines); a smart household security system that is integrated with a home automation system and offers security services, such as remote surveillance of IP-connected cameras and central locking of doors and windows; detection of smoke and CO leakages; smart home automation that allows elderly people and individuals with disabilities to remain at home, safe and comfortable (assistive domotics) [155].


Industrial control | M2M collaboration for auto-diagnosis and asset control; indoor air-quality monitoring of toxic gases to ensure the safety of workers and goods; temperature monitoring to control the temperature in fridges with sensitive merchandise; indoor asset location using active (ZigBee, UWB) and/or passive tags (RFID/NFC); vehicle auto-diagnosis, where real-time alarms are sent to warn of emergencies and provide advice [155].

Smart Agriculture | Wine quality enhancement via soil-moisture monitoring systems that control the amount of sugar in grapes and preserve the grapevine health; monitoring and control of the micro-climate conditions in greenhouses so as to optimize production and control quality; forecasting the weather conditions in fields so as to detect ice formation, rain, drought, snow or wind changes [15, 155].

Smart Retail | Supply chain control, i.e., monitoring of storage conditions and product tracking; smart payment via near-field communication (NFC) technology; smart shopping, where customers get advice based on their preferences and habits (e.g., allergies if it is about meal delivery); smart management of products, where products are rotated on shelves so as to automate the restocking process [155].

e-Health | Fall detection and assistance for elderly or disabled people; monitoring and control of conditions inside freezers that store medicines, organic elements and vaccines; monitoring of vital signs in training centres and fields; patient surveillance inside hospitals, or at home in the case of elderly or disabled people; measurement of ultraviolet sun rays in order to avoid exposure during certain periods of the day [155].

Chapter 2

Copula Functions

A plethora of applications for the Internet of Things involve various sensor nodes measuring data of heterogeneous types or modalities, such as various air pollutants, temperature, humidity, light, and pressure. These data types have highly different ranges and marginal statistics but are intrinsically correlated. To accurately express dependencies among diverse data sources, we propose the use of statistical models based on copula functions [158, 193]. Copula functions (also known as copulas) have the advantage that they allow data from individual sensors to have arbitrary marginal distributions, merging them into a multivariate pdf. Therefore, they can be used for coupling any discrete and/or continuous distributions. An appealing feature of copula modeling is that it can retain well-known parametric families for the marginal distributions, even though these may not be easily extendable to multivariate settings.

The copula method for understanding multivariate distributions has a relatively short history in statistics; most of its statistical applications have appeared in the last 15 years. However, copulas have been studied in the probability literature for about 50 years, and thus many of their properties are now widely known. Nevertheless, copula functions have only recently been explored in signal processing [62, 110, 156].


2.1 Motivation

To support the aforementioned argument and convince the reader of the merit of copula-based models in our code designs, we perform fitting tests based on actual sensor readings from well-established databases. In particular, focussing on applications of ambient-condition monitoring, we considered real-valued temperature and humidity data from the Intel Berkeley database [136]. Fig. 2.1 presents the histogram and various probability density functions (pdfs), namely, the Gaussian, the non-parametric and the Gaussian mixture model (GMM), fitted on the temperature and humidity sensor readings. For the parametric pdf estimation we used maximum likelihood estimators, whereas kernel density estimation (KDE) with an Epanechnikov kernel was considered for the non-parametric distributions. Based on the Quantile-Quantile plots (alias, Q-Q plots) in Figs. 2.3 and 2.4, as well as the p-values¹ from Kolmogorov-Smirnov fitting tests [143] in Table 2.1, we can clearly say that the non-parametric distribution describes both the temperature and the humidity sensor values more accurately.

Aiming at air-quality monitoring applications, we performed the same experiments based on carbon monoxide (CO) and nitrogen dioxide (NO2) sensor values collected by the United States Environmental Protection Agency (EPA) database [4]. Fig. 2.2 depicts the same pdfs (along with histograms) fitted on the real-valued data. The Q-Q plots in Figs. 2.5 and 2.6, and the p-values from the Kolmogorov-Smirnov fitting tests in Table 2.1, reveal that the non-parametric distribution describes the NO2 sensor values more accurately, whereas the GMM distribution best expresses the statistics of the CO readings.

1As a goodness-of-fit procedure we use the two-sample Kolmogorov-Smirnov test, namely, a nonparametric hypothesis test that evaluates the maximum difference between the cdfs of two data vectors, say x1 and x2, and also calculates an asymptotic p-value based on this reported maximum difference and the sample sizes [143]. The asymptotic p-value is a scalar value in (0, 1) and represents the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the hypothesis that the data in vectors x1 and x2 are from the same continuous distribution; this is typically called the "null hypothesis". In other words, the p-value is the answer to the following question: if the two samples were randomly sampled from identical populations, what is the probability that the two cdfs would be as far apart as observed? If the p-value is small, we conclude that the two datasets belong to populations with different distributions. The populations may differ in median, variability, or the shape of the distribution. The asymptotic p-value is very accurate for large sample sizes, and is reasonably accurate for sample sizes $n_1$ and $n_2$ such that $\frac{n_1 n_2}{n_1 + n_2} \geq 4$ [6]. Finally, if the resulting p-value of the test is too large, then we can infer that there is overfitting of the data [143, 202]; for example, this happens in normality tests when the data are standardized using the estimates of the mean value and the standard deviation [143]. To perform the Kolmogorov-Smirnov test in this chapter we used the MATLAB function kstest2 [6].
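For readers who prefer a runnable sketch, the same fitting procedure can be reproduced in Python; scipy's ks_2samp mirrors MATLAB's kstest2, while gaussian_kde uses a Gaussian rather than an Epanechnikov kernel, and the gamma-shaped stand-in data below are an illustrative assumption, so the numbers are not those reported in this chapter.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-in for real sensor readings (gamma-shaped, like humidity data).
readings = rng.gamma(shape=9.0, scale=2.5, size=2000)

# Fit a parametric Gaussian (maximum likelihood) and a non-parametric KDE.
mu, sigma = stats.norm.fit(readings)
kde = stats.gaussian_kde(readings)

# Draw synthetic samples from each fitted model and run the two-sample KS test
# against the original readings; larger p-values indicate a better fit.
synth_gauss = stats.norm(mu, sigma).rvs(size=2000, random_state=rng)
synth_kde = kde.resample(2000, seed=rng)[0]

print("Gaussian fit p-value:", stats.ks_2samp(readings, synth_gauss).pvalue)
print("KDE fit p-value:     ", stats.ks_2samp(readings, synth_kde).pvalue)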


Table 2.1: Asymptotic p-values when performing Kolmogorov-Smirnov fitting tests between the actual sensor values and the sets produced by the Gaussian, the non-parametric distribution with Epanechnikov kernel, and the Gaussian mixture model. The significance level is set to 5%.

                    Gaussian   Kernel Density Estimation   Gaussian Mixture Model
Temperature         0.0001     0.6171                      0.2237
Humidity            0.0001     0.1683                      0.0098
Carbon Monoxide     0.0004     0.5232                      0.0141
Nitrogen Dioxide    0.0052     0.0965                      0.4213

the NO2 sensor values, whereas the GMM distribution best expresses the statistics of the CO readings. These experiments show that data sources in a monitoring application indeed cannot be accurately described by conventional statistical models, such as the multivariate Gaussian distribution [49], the joint Gaussian mixture models (GMM) [178, 179], and the multivariate Laplacian distribution [72]. This is because these models require the marginal statistics to be of the same type, which is clearly not the case. To deal with this, we propose to use copula functions [158, 193], which allow data from individual sensors to have arbitrary marginal distributions, merging them into a multivariate pdf. In the next sections we proceed with the definition and the mathematical background of copulas; we describe various measures of dependence that are crucial for a deep understanding of the underlying data dependencies and are used in copula expressions; we detail the important copula families considered in this thesis; and we conclude with copula estimation techniques.

2.2 Copula Definition

The “operational” definition of a copula is the following [74]:

Definition 2.2.1. A copula function is a multivariate distribution function defined on the unit cube $[0,1]^L$, with $L$ uniformly distributed margins.

Based on this definition, a copula can be simply viewed as the original multivariate distribution function with transformed univariate margins. However, this definition


(a) Temperature sensor readings. (b) Humidity sensor readings.

Figure 2.1: Fitting of known distributions (Gaussian, KDE with an Epanechnikov kernel, and Gaussian mixture model) on various sensor readings from the Intel Berkeley database [136].


(a) Carbon monoxide sensor readings. (b) Nitrogen dioxide sensor readings.

Figure 2.2: Fitting of known distributions (Gaussian, KDE with an Epanechnikov kernel, and Gaussian mixture model) on various air pollutants from the EPA database [4].


(a) Gaussian. (b) Kernel Density Estimation. (c) Gaussian Mixture Model.

Figure 2.3: Quantile-Quantile plot for temperature sensor values (Intel Berkeley database [136]).

(a) Gaussian. (b) Kernel Density Estimation. (c) Gaussian Mixture Model.

Figure 2.4: Quantile-Quantile plot for humidity sensor values (Intel Berkeley database [136]).

(a) Gaussian. (b) Kernel Density Estimation. (c) Gaussian Mixture Model.

Figure 2.5: Quantile-Quantile plot for carbon monoxide sensor values (United States Environmental Protection Agency database [4]).


(a) Gaussian. (b) Kernel Density Estimation. (c) Gaussian Mixture Model.

Figure 2.6: Quantile-Quantile plot for nitrogen dioxide sensor values (United States Environmental Protection Agency database [4]).

hides some potential problems that may arise when constructing copulas via other techniques. To this end, we present a more abstract definition that follows the approach of Nelsen in [158]. First we focus on general multivariate distributions and then we study the special properties that characterize the copula subset.

2.2.1 Mathematical Background

The term "copula" comes from the Latin noun meaning a link, tie, or bond, referring to joining together. Indeed, a copula is defined as a function that couples one-dimensional marginal distribution functions into multivariate distribution functions. It is a multivariate distribution function defined on the unit $L$-cube $[0,1]^L$, with uniformly distributed margins.

Before proceeding to the mathematical background behind the definition of copulas, let us first provide some intuitive comments. Consider a tuple of random variables $X_1, X_2, \ldots, X_L$, with marginal distribution functions $F_{X_1}(x_1) = P[X_1 \leq x_1]$, $F_{X_2}(x_2) = P[X_2 \leq x_2]$, $\ldots$, $F_{X_L}(x_L) = P[X_L \leq x_L]$, respectively, and a joint distribution function $H(x_1, x_2, \ldots, x_L) = P[X_1 \leq x_1, X_2 \leq x_2, \ldots, X_L \leq x_L]$. Each tuple of real numbers $(x_1, x_2, \ldots, x_L)$ can be associated with $(L+1)$ numbers, $F_{X_1}(x_1), F_{X_2}(x_2), \ldots, F_{X_L}(x_L)$ and $H(x_1, x_2, \ldots, x_L)$, each of which lies in the interval $[0,1]$. In other words, each tuple $(x_1, x_2, \ldots, x_L)$ of real numbers leads to a point $(F_{X_1}(x_1), F_{X_2}(x_2), \ldots, F_{X_L}(x_L))$ in the $L$-unit space $[0,1]^L$, and this ordered tuple in turn corresponds to a number $H(x_1, x_2, \ldots, x_L) \in [0,1]$. Copula theory shows that this correspondence, where the value of the joint distribution function is assigned to each ordered tuple of values of the marginal distribution functions, is indeed a function. Such functions are copulas.

The theory of copulas is based on the fundamental theorem proposed by Sklar in [193]. Before getting into the details and the implications of this theorem, the reader should be familiar with the theoretical background presented in this section. First, we define the concept of an $H$-volume [74, 158]. We denote by $\bar{\mathbb{R}} = [-\infty, +\infty]$ the extended real line and by $\bar{\mathbb{R}}^L$ the extended $L$-space. We use vector notation for points in $\bar{\mathbb{R}}^L$, namely, $\mathbf{a} = [a_1, \ldots, a_L]$; we will write $\mathbf{a} \leq \mathbf{b}$, where $\mathbf{b} = [b_1, \ldots, b_L]$, when it holds that $a_l \leq b_l$ for all $l$. Moreover, for $\mathbf{a} \leq \mathbf{b}$, let $B = [\mathbf{a}, \mathbf{b}] = [a_1, b_1] \times \cdots \times [a_L, b_L]$ denote the $L$-box, namely, the Cartesian product of $L$ closed intervals. The vertices of $B$ are the points $\mathbf{c} = [c_1, \ldots, c_L]$, where each $c_l$ is equal to either $a_l$ or $b_l$. An $L$-place real function $H$ is a function whose domain, denoted by $\mathrm{Dom}H$, is a subset of $\bar{\mathbb{R}}^L$ and whose range, $\mathrm{Ran}H$, is a subset of $\mathbb{R}$.

Definition 2.2.2. Let $S_1, \ldots, S_L$ denote $L$ non-empty subsets of $\bar{\mathbb{R}}$, and let $H$ be an $L$-place real function with $\mathrm{Dom}H = S_1 \times \cdots \times S_L$. Also, let $B$ be an $L$-box with all vertices lying in $\mathrm{Dom}H$. Then the $H$-volume of $B$ is given by

$$V_H(B) = \sum_{\mathbf{c} \in B} \mathrm{sgn}(\mathbf{c})\, H(\mathbf{c}), \qquad (2.1)$$

where the sum is taken over all vertices $\mathbf{c}$ of $B$, and $\mathrm{sgn}(\mathbf{c})$ is the following function:

$$\mathrm{sgn}(\mathbf{c}) = \begin{cases} +1, & \text{if } c_l = a_l \text{ for an even number of } l\text{'s}, \\ -1, & \text{if } c_l = a_l \text{ for an odd number of } l\text{'s}. \end{cases}$$

For example, consider the trivariate case (i.e., $L = 3$), where $B = [a_1, b_1] \times [a_2, b_2] \times [a_3, b_3]$ is a 3-box and the corresponding $H$-volume is

$$\begin{aligned} V_H(B) = {} & H(b_1, b_2, b_3) - H(b_1, b_2, a_3) - H(b_1, a_2, b_3) - H(a_1, b_2, b_3) \\ & + H(b_1, a_2, a_3) + H(a_1, b_2, a_3) + H(a_1, a_2, b_3) - H(a_1, a_2, a_3). \end{aligned}$$
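To make Definition 2.2.2 concrete, the following minimal Python sketch (a helper of our own, not a library function) evaluates the $H$-volume of an $L$-box by summing $H$ over its $2^L$ vertices with the signs of Eq. (2.1):

import itertools
import math

def h_volume(H, a, b):
    """V_H(B) of the L-box B = [a1,b1] x ... x [aL,bL], per Eq. (2.1):
    sign is +1 when c_l = a_l for an even number of l's, -1 otherwise."""
    L = len(a)
    vol = 0.0
    for picks in itertools.product((0, 1), repeat=L):
        c = [b[l] if picks[l] else a[l] for l in range(L)]
        n_low = picks.count(0)              # number of l's with c_l = a_l
        vol += ((-1) ** n_low) * H(c)
    return vol

# Sanity check with the independence copula Pi(u) = u1*u2*u3 (L = 3):
Pi = lambda u: math.prod(u)
print(h_volume(Pi, [0.2, 0.2, 0.2], [0.7, 0.7, 0.7]))   # 0.5**3 = 0.125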

Alternatively, the $H$-volume of an $L$-box $B = [\mathbf{a}, \mathbf{b}]$ can be given by the $L$-th order difference of $H$ on $B$, namely,

$$V_H(B) = D_{\mathbf{a}}^{\mathbf{b}} H(\mathbf{t}) = D_{a_L}^{b_L} \ldots D_{a_1}^{b_1} H(\mathbf{t}), \qquad (2.2)$$

where

$$D_{a_l}^{b_l} H(\mathbf{t}) = H(t_1, \ldots, t_{l-1}, b_l, t_{l+1}, \ldots, t_L) - H(t_1, \ldots, t_{l-1}, a_l, t_{l+1}, \ldots, t_L). \qquad (2.3)$$

Then we proceed with the definition of an L-increasing function [74, 158]:

Definition 2.2.3. An $L$-variate real function $H$ is $L$-increasing if the $H$-volume is non-negative, i.e., $V_H(B) \geq 0$, for all $L$-boxes $B$ whose vertices lie in $\mathrm{Dom}H$.

We assume that the domain of $H$ can be expressed as $\mathrm{Dom}H = S_1 \times \cdots \times S_L$, where $S_l$ is the subset with smallest element $a_l$. We say that the function $H$ is grounded if $H(\mathbf{t}) = 0$ for all $\mathbf{t} \in \mathrm{Dom}H$ such that $t_l = a_l$ for at least one $l \in \{1, \ldots, L\}$. If each subset $S_l$ is non-empty and its greatest element is $b_l$, then $H$ has single-dimensional marginal functions (alias, margins) $H_l$ with $\mathrm{Dom}H_l = S_l$, where

$$H_l(x) = H(b_1, \ldots, b_{l-1}, x, b_{l+1}, \ldots, b_L), \qquad \forall x \in S_l. \qquad (2.4)$$

Note that marginal functions of higher dimensions can be defined in a similar way. We then proceed with a lemma, presented in [74, 158], on functions that are increasing in each argument:

Lemma 2.2.1. Let $H$ be a grounded $L$-increasing function with $\mathrm{Dom}H = S_1 \times \cdots \times S_L$, where $S_1, \ldots, S_L$ are non-empty subsets of $\bar{\mathbb{R}}$. We say that $H$ is increasing in each argument when the following holds: if $(t_1, \ldots, t_{l-1}, x, t_{l+1}, \ldots, t_L)$ and $(t_1, \ldots, t_{l-1}, x', t_{l+1}, \ldots, t_L)$ are in $\mathrm{Dom}H$ and $x \leq x'$, then

$$H(t_1, \ldots, t_{l-1}, x, t_{l+1}, \ldots, t_L) \leq H(t_1, \ldots, t_{l-1}, x', t_{l+1}, \ldots, t_L). \qquad (2.5)$$

Also, consider the following lemma [74, 158]:


Lemma 2.2.2. Let $H$ be a grounded $L$-increasing function with $\mathrm{Dom}H = S_1 \times \cdots \times S_L$, where $S_1, \ldots, S_L$ are non-empty subsets of $\bar{\mathbb{R}}$. If $\mathbf{x} = [x_1, \ldots, x_L]$ and $\mathbf{x}' = [x'_1, \ldots, x'_L]$ are any points in $\mathrm{Dom}H$, then

$$|H(\mathbf{x}) - H(\mathbf{x}')| \leq \sum_{l=1}^{L} |H_l(x_l) - H_l(x'_l)|. \qquad (2.6)$$

For the proof, we refer the reader to [186]. Subsequently, we give the definition of an $L$-dimensional cumulative distribution function, as in [74, 158].

Definition 2.2.4. An $L$-dimensional cumulative distribution function is a function $H$ with domain $\bar{\mathbb{R}}^L$ such that $H$ is grounded, $L$-increasing, and $H(\infty, \ldots, \infty) = 1$.

We clearly see that Lemma 2.2.1 implies that the marginal functions of an $L$-dimensional distribution function are cumulative distribution functions (cdfs), denoted by $F_{X_1}, \ldots, F_{X_L}$. Subsequently, we proceed with the definition of a copula function [158]:

Definition 2.2.5. An $L$-dimensional copula is a function $C$ with domain $[0,1]^L$ such that:

1. $C$ is grounded and $L$-increasing.

2. $C$ has margins $C_l$, $l = 1, \ldots, L$, that satisfy $C_l(u_l) = u_l$ for every $u_l = F_{X_l}(x_l)$ taking values in $[0,1]$.

For any $L$-dimensional copula $C$ with $L \geq 3$, each $l$-dimensional margin of $C$ is also an $l$-copula. Equivalently, an $L$-dimensional copula $C : [0,1]^L \to [0,1]$ has the following properties:

1. $\forall \mathbf{u} \in [0,1]^L$: $C(\mathbf{u}) = 0$ if at least one coordinate of $\mathbf{u}$ is 0, and $C(\mathbf{u}) = u_l$ if all coordinates of $\mathbf{u}$ equal 1 except $u_l$.

2. $\forall \mathbf{a}, \mathbf{b} \in [0,1]^L$ with $a_l \leq b_l$, $\forall l \in \{1, \ldots, L\}$: $V_C([\mathbf{a}, \mathbf{b}]) \geq 0$.

As copulas are multivariate cdfs, they induce a probability measure on $[0,1]^L$ via

$$V_C([0, u_1] \times \cdots \times [0, u_L]) = C(u_1, \ldots, u_L), \qquad (2.7)$$

and a standard extension to arbitrary (not necessarily $L$-box) Borel subsets of $[0,1]^L$. From measure theory we know that there is a unique probability measure on the Borel subsets of $[0,1]^L$ which coincides with $V_C$ on the set of $L$-boxes of $[0,1]^L$. We also denote this probability measure by $V_C$. Based on Definition 2.2.5, we can say that a copula $C$ is a cdf with uniformly distributed margins. The following theorem [158] follows directly from Lemma 2.2.2.

Theorem 2.2.3. Let $C$ be an $L$-dimensional copula. Then, for all $\mathbf{u} = [u_1, \ldots, u_L]$ and $\mathbf{u}' = [u'_1, \ldots, u'_L]$ in $[0,1]^L$, it holds that:

$$|C(\mathbf{u}) - C(\mathbf{u}')| \leq \sum_{l=1}^{L} |u_l - u'_l|. \qquad (2.8)$$

Thus, $C$ is uniformly continuous on $[0,1]^L$.

2.2.2 Sklar’s Theorem

Sklar's theorem [193] is a fundamental result that is widely used in the theory of copulas.

Theorem 2.2.4 (Sklar's Theorem [193]). Let $H$ be an $L$-dimensional distribution function with marginal cdfs $F_{X_1}, \ldots, F_{X_L}$. Then there exists an $L$-dimensional copula $C$ such that

$$H(\mathbf{x}) = C\left(F_{X_1}(x_1), \ldots, F_{X_L}(x_L)\right), \qquad \forall \mathbf{x} \in \bar{\mathbb{R}}^L, \qquad (2.9)$$

where $\mathbf{x} = [x_1, \ldots, x_L]$ is an instantiation of the random vector $\mathbf{X} = [X_1, \ldots, X_L]$. If $F_{X_1}, \ldots, F_{X_L}$ are all continuous, then $C$ is unique; otherwise, $C$ is uniquely determined on $\mathrm{Ran}F_{X_1} \times \cdots \times \mathrm{Ran}F_{X_L}$, where $\mathrm{Ran}F$ denotes the range of $F$. Conversely, if $C$ is an $L$-dimensional copula and $F_{X_1}, \ldots, F_{X_L}$ are distribution functions, then the function $H$ defined above is an $L$-dimensional distribution function with marginal cdfs $F_{X_1}, \ldots, F_{X_L}$.

For the proof, see [193]. Note that in our framework the components of the random vector $\mathbf{X}$ represent sensed information or, as we may call it, data sources. Furthermore, they may represent data sources transformed to other domains, such as the discrete cosine transform domain. Sklar's Theorem says that the univariate marginal distributions and the multivariate dependence structure can be separated in the case of continuous joint distribution functions. For us this is a very important remark, since sensor data produced from continuous distributions are typically monitored in IoT applications.

The dependence structure can be captured by a copula function. Let $F_{X_l}$ be a univariate distribution function and $F_{X_l}^{-1}$ its inverse, where $F_{X_l}^{-1}(t) = \inf\{x_l \in \mathbb{R} \mid F_{X_l}(x_l) \geq t\}$ for all $t$ in $[0,1]$, using the convention that the infimum of the empty set is $-\infty$. Then, we present a corollary [158] based on Sklar's theorem:

Corollary 2.2.4.1. Let $H$ be an $L$-dimensional distribution function with continuous marginal cdfs $F_{X_1}, \ldots, F_{X_L}$, and let $C$ be a copula that satisfies (2.9). Then for any $\mathbf{u} = [u_1, \ldots, u_L]$ in $[0,1]^L$ it holds that:

$$C(u_1, \ldots, u_L) = H\left(F_{X_1}^{-1}(u_1), \ldots, F_{X_L}^{-1}(u_L)\right). \qquad (2.10)$$

If the margins are not assumed to be continuous, then we refer the reader to [141, 158].
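Both directions of Sklar's theorem can be sketched numerically; in the following Python illustration, the marginal choices (exponential and gamma) and the Gaussian copula are arbitrary assumptions made for the example only:

import numpy as np
from scipy import stats

F1, F2 = stats.expon(scale=2.0), stats.gamma(a=3.0)     # hypothetical margins
R = np.array([[1.0, 0.7], [0.7, 1.0]])                  # copula correlation
biv_norm = stats.multivariate_normal(mean=[0.0, 0.0], cov=R)

def H(x1, x2):
    """Joint cdf built via Eq. (2.9) with a Gaussian copula:
    H(x1, x2) = C(F1(x1), F2(x2))."""
    z = stats.norm.ppf([F1.cdf(x1), F2.cdf(x2)])
    return biv_norm.cdf(z)

def C(u1, u2):
    """Copula recovered from H via Eq. (2.10): C(u) = H(F1^-1(u1), F2^-1(u2))."""
    return H(F1.ppf(u1), F2.ppf(u2))

print(H(2.0, 3.0), C(0.5, 0.5))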

2.2.3 Fréchet-Hoeffding Bounds

In this section we describe the upper and lower bounds of a copula function. Let $M$, $\Pi$ and $W$ be functions defined on $[0,1]^L$, where

$$M(\mathbf{u}) = \min(u_1, \ldots, u_L), \qquad (2.11)$$
$$\Pi(\mathbf{u}) = u_1 \cdots u_L, \qquad (2.12)$$
$$W(\mathbf{u}) = \max(u_1 + \cdots + u_L - L + 1,\ 0). \qquad (2.13)$$

Both $M$ and $\Pi$ are $L$-dimensional copulas for $L \geq 2$, whereas $W$ is not a copula for $L \geq 3$. The following theorem was introduced in [82] and defines the well-known Fréchet-Hoeffding bounds inequality.


Theorem 2.2.5. If $C$ is any $L$-copula, then for every $\mathbf{u}$ in $[0,1]^L$,

$$W(\mathbf{u}) \leq C(\mathbf{u}) \leq M(\mathbf{u}). \qquad (2.14)$$

If the reader is interested in more details, we refer to [150]. Although the Fréchet-Hoeffding lower bound $W$ is not a copula for $L \geq 3$, it is the best possible lower bound, as described in Theorem 2.2.6.

Theorem 2.2.6. For any $\mathbf{u} = [u_1, \ldots, u_L]$ in $[0,1]^L$, $L \geq 3$, there is an $L$-dimensional copula $C$ (depending on $\mathbf{u}$) such that

$$C(\mathbf{u}) = W(\mathbf{u}). \qquad (2.15)$$
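As a brief numerical aside, the bounds of Theorem 2.2.5 are easy to check; a minimal sketch with the independence copula playing the role of $C$:

import numpy as np

def M(u):  return float(np.min(u))                           # upper bound, Eq. (2.11)
def Pi(u): return float(np.prod(u))                          # independence copula, Eq. (2.12)
def W(u):  return max(float(np.sum(u)) - len(u) + 1.0, 0.0)  # lower bound, Eq. (2.13)

rng = np.random.default_rng(0)
for u in rng.uniform(size=(1000, 3)):                        # L = 3
    assert W(u) <= Pi(u) <= M(u)                             # Theorem 2.2.5 holds
print("Frechet-Hoeffding bounds verified on 1000 random points.")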

For the proof, see [158]. Let $\bar{C}$ denote the joint survival function of $L$ uniformly distributed random variables in $\mathbf{U} = [U_1, \ldots, U_L]^T$ with joint cdf $C(\mathbf{u})$, namely, $\bar{C}(u_1, \ldots, u_L) = \Pr(U_1 > u_1, \ldots, U_L > u_L)$. Then, we have the following definition [158]:

Definition 2.2.6. Let $C_1$ and $C_2$ be copulas. We say that $C_1$ is smaller than $C_2$ (and we write $C_1 \prec C_2$) if

$$C_1(\mathbf{u}) \leq C_2(\mathbf{u}) \quad \text{and} \quad \bar{C}_1(\mathbf{u}) \leq \bar{C}_2(\mathbf{u}), \qquad \forall \mathbf{u} \in [0,1]^L. \qquad (2.16)$$

In the bivariate case (i.e., $L = 2$), it holds that:

$$\bar{C}_1(u_1, u_2) \leq \bar{C}_2(u_1, u_2) \Leftrightarrow 1 - u_1 - u_2 + C_1(u_1, u_2) \leq 1 - u_1 - u_2 + C_2(u_1, u_2) \Leftrightarrow C_1(u_1, u_2) \leq C_2(u_1, u_2). \qquad (2.17)$$

Moreover, in the bivariate case, the Fréchet-Hoeffding lower bound $W$ is always smaller than every bivariate copula. Similarly, a bivariate copula is always smaller than the Fréchet-Hoeffding upper bound $M$. This ordering of the copula set is named the concordance ordering; it is a partial order, as not every pair of copulas is comparable according to it. Nevertheless, some copula families are totally ordered. Hence, we say that a copula family $\{C_\theta\}$ with a single parameter $\theta$ is positively ordered if $C_{\theta_1} \prec C_{\theta_2}$ when $\theta_1 \leq \theta_2$.

2.2.4 Copulas and Random Variables

Let $\mathbf{X} = [X_1, \ldots, X_L]^T$ be a random vector with continuous marginal distribution functions (or cdfs) $F_{X_1}, \ldots, F_{X_L}$ and joint distribution function $H$. Then $\mathbf{X}$ has a unique copula $C$, given by (2.9). The multivariate distribution of $\mathbf{X}$ can be written as:

$$H(x_1, \ldots, x_L) = \Pr(X_1 \leq x_1, \ldots, X_L \leq x_L) = C\left(F_{X_1}(x_1), \ldots, F_{X_L}(x_L)\right). \qquad (2.18)$$

The transformation of the form

$$X_l \mapsto F_{X_l}(X_l), \qquad l \in \{1, \ldots, L\}, \qquad (2.19)$$

is known as the probability integral transform [91] and forms a standard tool in simulation methodology. As known from probability theory, if the joint distribution $H$ takes the form $H(x_1, \ldots, x_L) = F_{X_1}(x_1) \times \cdots \times F_{X_L}(x_L)$ for all $x_1, \ldots, x_L$ in $\mathbb{R}$, then the variables $X_1, \ldots, X_L$ are independent. The following theorem comes as a result of Theorem 2.2.4 [158].
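A brief sketch of the probability integral transform (2.19) follows; the gamma margin is an arbitrary assumption, and the rank-based variant stands in for the empirical cdf when the true margin is unknown:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=5000)      # non-Gaussian sensor-like data

u_true = stats.gamma(a=2.0, scale=3.0).cdf(x)       # Eq. (2.19) with the true cdf
u_emp = stats.rankdata(x) / (len(x) + 1.0)          # empirical-cdf surrogate

# Both transformed samples should be approximately uniform on [0, 1]:
print(stats.kstest(u_true, "uniform").pvalue)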

Theorem 2.2.7. Let $\mathbf{X} = [X_1, \ldots, X_L]^T$ be a vector of continuous random variables with copula $C$. Then $X_1, \ldots, X_L$ are independent iff $C = \Pi$.

Copulas have the following property: when strictly monotone transformations are applied to the random variables, copulas either change in certain (usually simple) ways or are invariant. This property can be useful in IoT applications where data need to be transformed in order to exhibit a desired structure. If $F_{X_l}$ is continuous and $t(\cdot)$ is a strictly monotone function whose domain contains $\mathrm{Ran}X_l$, then the cdf of $t(X_l)$ is also continuous. We have the following theorem [74]:


Theorem 2.2.8. Let $\mathbf{X} = [X_1, \ldots, X_L]^T$ be a random vector of continuous random variables with copula $C(u_1, \ldots, u_L)$, where $u_l = F_{X_l}(x_l)$. If $\tau_1(\cdot), \ldots, \tau_L(\cdot)$ are strictly increasing functions on the ranges $\mathrm{Ran}X_1, \ldots, \mathrm{Ran}X_L$, respectively, then the transformed vector $[\tau_1(X_1), \ldots, \tau_L(X_L)]^T$ also has the copula $C$.

Proof. The proof is presented in [74]. Let $F_{X_1}, \ldots, F_{X_L}$ denote the cdfs of $X_1, \ldots, X_L$ and let $G_1, \ldots, G_L$ denote the corresponding cdfs of $\tau_1(X_1), \ldots, \tau_L(X_L)$. Moreover, we assume that $[X_1, \ldots, X_L]^T$ has a copula $C$, whereas $[\tau_1(X_1), \ldots, \tau_L(X_L)]^T$ has a copula $C_\tau$. Since each function $\tau_k(\cdot)$ is strictly increasing, we have $G_k(x) = \Pr(\tau_k(X_k) \leq x) = \Pr(X_k \leq \tau_k^{-1}(x)) = F_{X_k}(\tau_k^{-1}(x))$ for any $x \in \mathbb{R}$. Hence, it can be written that:

$$\begin{aligned} C_\tau\left(G_1(x_1), \ldots, G_L(x_L)\right) &= \Pr\left(\tau_1(X_1) \leq x_1, \ldots, \tau_L(X_L) \leq x_L\right) \\ &= \Pr\left(X_1 \leq \tau_1^{-1}(x_1), \ldots, X_L \leq \tau_L^{-1}(x_L)\right) \\ &= C\left(F_{X_1}(\tau_1^{-1}(x_1)), \ldots, F_{X_L}(\tau_L^{-1}(x_L))\right) \\ &= C\left(G_1(x_1), \ldots, G_L(x_L)\right). \end{aligned} \qquad (2.20)$$

Given that the variables $X_1, \ldots, X_L$ are continuous, it holds that $\mathrm{Ran}G_1 = \cdots = \mathrm{Ran}G_L = [0,1]$. Thus, $C_\tau = C$ on $[0,1]^L$. ∎

From Sklar's Theorem we know that $C$ can separate the $L$-dimensional cdf $H$ from the single-dimensional margins $F_{X_l}$. The next theorem shows that another function $\hat{C}$, named the survival copula, can separate the $L$-dimensional joint survival function from the marginal survival functions [74].

Theorem 2.2.9. Let $\mathbf{X} = [X_1, \ldots, X_L]^T$ be a random vector of continuous random variables with copula $C(u_1, \ldots, u_L)$, where $u_l = F_{X_l}(x_l)$. If $\tau_1(\cdot), \ldots, \tau_L(\cdot)$ are strictly increasing functions on the ranges $\mathrm{Ran}X_1, \ldots, \mathrm{Ran}X_L$, respectively, then $[\tau_1(X_1), \ldots, \tau_L(X_L)]^T$ has a copula $C_{\tau_1(X_1), \ldots, \tau_L(X_L)}$. Furthermore, let $\tau_l(\cdot)$ be strictly decreasing for some $l$, and assume, without loss of generality, that $l = 1$. Then

$$C_{\tau_1(X_1), \ldots, \tau_L(X_L)}(u_1, \ldots, u_L) = C_{\tau_2(X_2), \ldots, \tau_L(X_L)}(u_2, \ldots, u_L) - C_{X_1, \tau_2(X_2), \ldots, \tau_L(X_L)}(1 - u_1, u_2, \ldots, u_L). \qquad (2.21)$$

Proof. The proof can be found in [74]. As before, let $F_{X_1}, \ldots, F_{X_L}$ denote the cdfs of $X_1, \ldots, X_L$ and let $G_1, \ldots, G_L$ denote the corresponding cdfs of $\tau_1(X_1), \ldots, \tau_L(X_L)$. Then we can write:

$$\begin{aligned} C_{\tau_1(X_1), \ldots, \tau_L(X_L)}\left(G_1(x_1), \ldots, G_L(x_L)\right) &= \Pr\left(\tau_1(X_1) \leq x_1, \ldots, \tau_L(X_L) \leq x_L\right) \\ &= \Pr\left(X_1 > \tau_1^{-1}(x_1), \tau_2(X_2) \leq x_2, \ldots, \tau_L(X_L) \leq x_L\right) \\ &= \Pr\left(\tau_2(X_2) \leq x_2, \ldots, \tau_L(X_L) \leq x_L\right) \\ &\quad - \Pr\left(X_1 \leq \tau_1^{-1}(x_1), \tau_2(X_2) \leq x_2, \ldots, \tau_L(X_L) \leq x_L\right) \\ &= C_{\tau_2(X_2), \ldots, \tau_L(X_L)}\left(G_2(x_2), \ldots, G_L(x_L)\right) \\ &\quad - C_{X_1, \tau_2(X_2), \ldots, \tau_L(X_L)}\left(F_{X_1}(\tau_1^{-1}(x_1)), G_2(x_2), \ldots, G_L(x_L)\right) \\ &= C_{\tau_2(X_2), \ldots, \tau_L(X_L)}\left(G_2(x_2), \ldots, G_L(x_L)\right) \\ &\quad - C_{X_1, \tau_2(X_2), \ldots, \tau_L(X_L)}\left(1 - G_1(x_1), G_2(x_2), \ldots, G_L(x_L)\right). \end{aligned}$$

This concludes the proof. ∎

Theorems 2.2.8 and 2.2.9 [74] clearly show that the copula $C_{\tau_1(X_1), \ldots, \tau_L(X_L)}$ can be described in terms of the copula $C_{X_1, \ldots, X_L}$ and the margins. For more examples, we refer the reader to [74, 158].

2.2.5 Derivatives and Copula Density

The mixed partial derivative $\frac{\partial^L C(\mathbf{u})}{\partial u_1 \cdots \partial u_L}$ of a copula $C$ exists for almost all vectors $\mathbf{u} = [u_1, \ldots, u_L]$ in $[0,1]^L$. In [158] it is proven that the first-order partial derivatives satisfy $0 \leq \frac{\partial C(\mathbf{u})}{\partial u_l} \leq 1$ for almost every $\mathbf{u} \in [0,1]^L$. Based on this, we can write

$$C(u_1, \ldots, u_L) = A_C(u_1, \ldots, u_L) + S_C(u_1, \ldots, u_L), \qquad (2.22)$$

where

$$A_C(u_1, \ldots, u_L) = \int_0^{u_1} \cdots \int_0^{u_L} \frac{\partial^L C(s_1, \ldots, s_L)}{\partial s_1 \cdots \partial s_L}\, ds_L \ldots ds_1, \qquad (2.23)$$

and

$$S_C(u_1, \ldots, u_L) = C(u_1, \ldots, u_L) - A_C(u_1, \ldots, u_L). \qquad (2.24)$$

If the margins are continuous, there is no point $\mathbf{u}$ with positive point mass, i.e., no $\mathbf{u}$ such that $V_C(\{\mathbf{u}\}) > 0$. If $C = A_C$ on $[0,1]^L$, then the copula $C$ is absolutely continuous, and the copula density can be derived as follows:

$$c(u_1, \ldots, u_L) = \frac{\partial^L C(u_1, \ldots, u_L)}{\partial u_1 \cdots \partial u_L}. \qquad (2.25)$$

Based on (2.9), the copula density takes the following form:

$$c(u_1, \ldots, u_L) = \frac{h(x_1, \ldots, x_L)}{\prod_{l=1}^{L} f_{X_l}(x_l)}, \qquad (2.26)$$

where $h(x_1, \ldots, x_L)$ denotes the joint pdf and $f_{X_l}(x_l)$ the marginal pdfs. Equation (2.26) clarifies that the multivariate pdf can be described by the copula density and the product of the marginal distributions.
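Equation (2.26) can be evaluated directly; the following sketch does so for a bivariate Gaussian joint pdf with standard normal margins (an illustrative choice), which reproduces the Gaussian copula density discussed later in Section 2.4.1:

import numpy as np
from scipy import stats

R = np.array([[1.0, 0.6], [0.6, 1.0]])
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=R)

def copula_density(u1, u2):
    """c(u1, u2) = h(x1, x2) / (f1(x1) f2(x2)), Eq. (2.26), Gaussian margins."""
    x = stats.norm.ppf([u1, u2])     # map marginal cdf values back to data values
    return joint.pdf(x) / stats.norm.pdf(x).prod()

print(copula_density(0.3, 0.8))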

If $C = S_C$, then $C$ is singular and the copula density is zero almost everywhere in $[0,1]^L$. The support of a copula is the complement of the union of all open subsets $A$ of $[0,1]^L$ with $V_C(A) = 0$. When $C$ is singular, its support has Lebesgue measure zero (the converse is also true).

Remark. Until now, we have assumed that the marginal distributions are continuous, which implies that the copula is unique [193]. However, some interesting applications in various fields, such as marketing, psychometrics, finance, and biostatistics, require statistical modelling of multivariate discrete data, such as binary, categorical, and count data [169]. In this case, a copula function that captures the dependence structure among discrete data sources is not unique, but it is still well-defined [158].

2.3 Dependence Concepts

Copula functions offer an efficient way of studying and measuring underlying dependencies among various data. Theorem 2.2.8 [74] implies that the properties of copulas do not change when strictly increasing transformations are applied. Pearson's linear correlation is a widely used dependence measure. However, it may be imprecise, as it captures only dependencies of a linear type, i.e., correlations. In the following subsections, we review some basic properties of linear correlation and provide some copula-based dependence measures.

2.3.1 Linear Correlation

Definition 2.3.1. Let $[X_1, X_2]^T$ be a 2-dimensional random vector. The linear correlation coefficient is defined as [74, 158]

$$\rho(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sqrt{\mathrm{Var}(X_1)\,\mathrm{Var}(X_2)}}, \qquad (2.27)$$

where $\mathrm{Cov}(X_1, X_2)$ denotes the covariance of the random variables $X_1$ and $X_2$, whereas $\mathrm{Var}(X_1)$ and $\mathrm{Var}(X_2)$ are their respective non-zero finite variances.

As mentioned before, the linear correlation coefficient measures linear dependence. When the linear dependence is perfect, namely, $X_2 = \alpha_1 X_1 + \alpha_0$, where $\alpha_1 \in \mathbb{R}\setminus\{0\}$ and $\alpha_0 \in \mathbb{R}$, we have $|\rho(X_1, X_2)| = 1$; the converse holds as well. In any other case, it holds that $-1 < \rho(X_1, X_2) < 1$. Moreover, the linear correlation coefficient satisfies the following property [74, 158]:

$$\rho(\alpha_1 X_1 + \alpha_0, \beta_1 X_2 + \beta_0) = \mathrm{sign}(\alpha_1 \beta_1)\, \rho(X_1, X_2), \qquad (2.28)$$

where $\alpha_1, \beta_1 \in \mathbb{R}\setminus\{0\}$ and $\alpha_0, \beta_0 \in \mathbb{R}$. Equation (2.28) shows that the linear correlation coefficient is invariant when strictly increasing linear transformations are applied.

In addition, linear correlation can be easily handled under linear operations. Let


$A$, $B$ be $m \times l$ matrices, $\mathbf{a}, \mathbf{b} \in \mathbb{R}^m$, and let $\mathbf{X}_1$, $\mathbf{X}_2$ be random $l$-vectors; it can be written that:

$$\mathrm{Cov}(A\mathbf{X}_1 + \mathbf{a}, B\mathbf{X}_2 + \mathbf{b}) = A\, \mathrm{Cov}(\mathbf{X}_1, \mathbf{X}_2)\, B^T, \qquad (2.29)$$

where $(\cdot)^T$ signifies the transpose of a matrix. Hence, if $\boldsymbol{\gamma} \in \mathbb{R}^l$, then

$$\mathrm{Var}\left(\boldsymbol{\gamma}^T \mathbf{X}\right) = \boldsymbol{\gamma}^T\, \mathrm{Cov}(\mathbf{X})\, \boldsymbol{\gamma}, \qquad (2.30)$$

where $\mathrm{Cov}(\mathbf{X}) \triangleq \mathrm{Cov}(\mathbf{X}, \mathbf{X})$. Linear correlation is widely used because (i) it can easily be computed, and (ii) it arises naturally in well-known elliptical distributions, such as the Gaussian and the Student's t-distributions. However, in most cases data are not elliptically distributed, and measures of linear dependence may then prove inaccurate. Even when the data are elliptically distributed, the use of linear correlation as defined by (2.27) can be imprecise; for heavy-tailed members of the family, such as the $t_2$-distribution, the linear correlation coefficient cannot even be defined, because the required second moments do not exist.

2.3.2 Perfect Dependence

Theorem 2.2.5 defines the Fréchet-Hoeffding inequality

$$W(\mathbf{u}) \leq C(\mathbf{u}) \leq M(\mathbf{u}) \qquad (2.31)$$

for every $L$-copula $C$. In the bivariate case ($L = 2$), both bounds are copulas. Moreover, the bounds $W(\mathbf{u})$ and $M(\mathbf{u})$ are the bivariate distribution functions of $(U, 1-U)^T$ and $(U, U)^T$, respectively, where $U$ is a random variable uniformly distributed on $[0,1]$. In this case, the 2-dimensional $W(\mathbf{u})$ describes perfect negative dependence and the 2-dimensional $M(\mathbf{u})$ perfect positive dependence.

2.3.3 Concordance

Let $[X_1, X_2]^T$ be a random vector with continuous components, and let $[x_1, x_2]^T$ and $[x'_1, x'_2]^T$ be two different realizations. We say that $[x_1, x_2]^T$ and $[x'_1, x'_2]^T$ are concordant when $(x_1 - x'_1)(x_2 - x'_2) > 0$, and discordant when $(x_1 - x'_1)(x_2 - x'_2) < 0$.


Consider the following theorem [74, 158]:

Theorem 2.3.1. Let $[X_1, X_2]^T$ and $[X'_1, X'_2]^T$ be two independent random vectors with continuous components, and let $H$ and $H'$ be their respective multivariate distribution functions with common margins $F_{X_1}$ (of both $X_1$ and $X'_1$) and $F_{X_2}$ (of both $X_2$ and $X'_2$). Moreover, let $C$ and $C'$ be two copula functions such that $H(x_1, x_2) = C(F_{X_1}(x_1), F_{X_2}(x_2))$ and $H'(x_1, x_2) = C'(F_{X_1}(x_1), F_{X_2}(x_2))$. We denote by $Q$ the following quantity:

$$Q = \Pr\left[(X_1 - X'_1)(X_2 - X'_2) > 0\right] - \Pr\left[(X_1 - X'_1)(X_2 - X'_2) < 0\right], \qquad (2.32)$$

i.e., the difference between the concordance and discordance probabilities of the random vectors. Then, it holds that

$$Q = Q(C, C') = 4 \iint_{[0,1]^2} C'(u, v)\, dC(u, v) - 1.$$

Proof. This proof is presented in [158]. Since we assume random vectors with continuous components, it holds that $\Pr((X_1 - X'_1)(X_2 - X'_2) > 0) = 1 - \Pr((X_1 - X'_1)(X_2 - X'_2) < 0)$ and, hence, $Q = 2\Pr((X_1 - X'_1)(X_2 - X'_2) > 0) - 1$. It is also valid that $\Pr((X_1 - X'_1)(X_2 - X'_2) > 0) = \Pr(X_1 > X'_1, X_2 > X'_2) + \Pr(X_1 < X'_1, X_2 < X'_2)$. The computation of these probabilities can be done by integration over the respective distributions, that is,

$$\begin{aligned} \Pr(X_1 > X'_1, X_2 > X'_2) &= \Pr(X'_1 < X_1, X'_2 < X_2) \\ &= \iint_{\mathbb{R}^2} \Pr(X'_1 < x_1, X'_2 < x_2)\, dC\left(F_{X_1}(x_1), F_{X_2}(x_2)\right) \\ &= \iint_{\mathbb{R}^2} C'\left(F_{X_1}(x_1), F_{X_2}(x_2)\right) dC\left(F_{X_1}(x_1), F_{X_2}(x_2)\right) \\ &= \iint_{[0,1]^2} C'(u, v)\, dC(u, v). \end{aligned} \qquad (2.33)$$

In a similar way, we can calculate:

$$\Pr(X_1 < X'_1, X_2 < X'_2) = \iint_{[0,1]^2} \left(1 - u - v + C'(u, v)\right) dC(u, v). \qquad (2.34)$$


Since the copula $C$ has uniformly distributed random variables as arguments, it holds that $E(U) = E(V) = \frac{1}{2}$. Thus,

$$\Pr(X_1 < X'_1, X_2 < X'_2) = \iint_{[0,1]^2} C'(u, v)\, dC(u, v), \qquad (2.35)$$

and

$$\Pr\left[(X_1 - X'_1)(X_2 - X'_2) > 0\right] = 2 \iint_{[0,1]^2} C'(u, v)\, dC(u, v), \qquad (2.36)$$

which concludes the proof. ∎

Theorem 2.3.1 [158] results in the following corollary, presented in [74, 158]:

Corollary 2.3.1.1. Let $C$, $C'$ and $Q$ be the quantities defined in Theorem 2.3.1. Then, the following properties are valid:

1. $Q(C, C') = Q(C', C)$.

2. If $C \prec C'$, then $Q(C, C'') \leq Q(C', C'')$ for any copula $C''$.

3. $Q(C, C') = Q(\hat{C}, \hat{C}')$, where $\hat{C}$ denotes the survival copula.

Then we proceed with the definition of a dependence measure [185]:

Definition 2.3.2. A real-valued dependence measure $\eta_{X_1, X_2}$ is defined as a measure of concordance between two continuous variables $X_1$ and $X_2$ if $\eta_{X_1, X_2}$ satisfies the following properties:

1. The measure $\eta_{X_1, X_2}$ is defined for all pairs $(X_1, X_2)$.

2. It holds that $-1 \leq \eta_{X_1, X_2} \leq 1$, $\eta_{X_1, X_1} = 1$, and $\eta_{X_1, -X_1} = -1$.

3. It is valid that $\eta_{X_1, X_2} = \eta_{X_2, X_1}$.

4. If $X_1$ and $X_2$ are independent, then $\eta_{X_1, X_2} = \eta_\Pi = 0$.

5. It is valid that $\eta_{-X_1, X_2} = \eta_{X_1, -X_2} = -\eta_{X_1, X_2}$.

6. If $C$ and $C'$ are two copulas such that $C \prec C'$, then $\eta_C \leq \eta_{C'}$.

7. If (i) $\{(X_{1,n}, X_{2,n})\}$ is a sequence of pairs of continuous random variables with respective copulas $\{C_n\}$, and (ii) $\{C_n\}$ converges point-wise to a copula $C$, then it holds that $\lim_{n \to \infty} \eta_{C_n} = \eta_C$.

Definition 2.3.2 [185] results in some additional outcomes. In particular, if $X_2$ is an increasing function of $X_1$, then $\eta_{X_1, X_2} = \eta_M = 1$. Correspondingly, if $X_2$ is a decreasing function of $X_1$, then $\eta_{X_1, X_2} = \eta_W = -1$. Moreover, if $\tau_1(\cdot)$ and $\tau_2(\cdot)$ are strictly increasing functions on the respective ranges $\mathrm{Ran}X_1$ and $\mathrm{Ran}X_2$, then $\eta_{\tau_1(X_1), \tau_2(X_2)} = \eta_{X_1, X_2}$.
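This invariance is easy to observe empirically; in the Python sketch below, Pearson's linear correlation changes under strictly increasing (non-linear) transformations, whereas Kendall's tau, a concordance measure introduced formally in the next subsection, does not:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=20000)
x1, x2 = z[:, 0], z[:, 1]
y1, y2 = np.exp(x1), np.exp(3.0 * x2)     # strictly increasing transformations

print(stats.pearsonr(x1, x2)[0], stats.pearsonr(y1, y2)[0])      # not invariant
print(stats.kendalltau(x1, x2)[0], stats.kendalltau(y1, y2)[0])  # invariant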

2.3.4 Kendall’s tau and Spearman’s rho

Two extensively used dependence (or concordance) measures are Kendall's tau and Spearman's rho. These measures are very useful for non-elliptical distributions, where the use of Pearson's correlation coefficient might not be accurate. Estimators and further details on Kendall's tau and Spearman's rho can be found in the related literature [40, 119, 123, 126], whereas other dependence measures are described in [187]. First, we provide the definition of Kendall's tau [158]:

Definition 2.3.3. Let $[X_1, X_2]^T$ be a random vector and $[X'_1, X'_2]^T$ an independent copy of it. Kendall's tau is defined as:

$$\tau(X_1, X_2) = \Pr\left[(X_1 - X'_1)(X_2 - X'_2) > 0\right] - \Pr\left[(X_1 - X'_1)(X_2 - X'_2) < 0\right]. \qquad (2.37)$$

Clearly, Kendall's tau equals the probability of concordance minus the probability of discordance. Kendall's tau can be calculated based on the following theorem [158]:

Theorem 2.3.2. Let $[X_1, X_2]^T$ be a random vector with continuous components, described by a copula $C$. Kendall's tau is given by:

$$\tau(X_1, X_2) = Q(C, C) = 4 \iint_{[0,1]^2} C(u, v)\, dC(u, v) - 1. \qquad (2.38)$$

The integral in (2.38) is the expected value of $C(U, V)$ over $[0,1]^2$, where both variables $U$ and $V$ are uniformly distributed. Hence,

$$\tau(X_1, X_2) = 4E\left[C(U, V)\right] - 1. \qquad (2.39)$$
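Definition 2.3.3 suggests a direct estimator: count concordant minus discordant pairs. A naive O(n^2) Python sketch, checked against scipy's implementation:

import numpy as np
from scipy import stats

def kendall_tau_naive(x1, x2):
    """Concordance-count estimate of Eq. (2.37), averaged over all pairs."""
    n, s = len(x1), 0.0
    for i in range(n):
        s += np.sign((x1[i] - x1[i + 1:]) * (x2[i] - x2[i + 1:])).sum()
    return 2.0 * s / (n * (n - 1))

rng = np.random.default_rng(3)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=500)
print(kendall_tau_naive(x[:, 0], x[:, 1]), stats.kendalltau(x[:, 0], x[:, 1])[0])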

Now, we proceed with the definition of the other dependence measure, Spearman's rho [158].

Definition 2.3.4. Spearman's rho for a random vector $[X_1, X_2]^T$ is defined as:

$$\rho_s(X_1, X_2) = 3\left(\Pr\left[(X_1 - X'_1)(X_2 - X''_2) > 0\right] - \Pr\left[(X_1 - X'_1)(X_2 - X''_2) < 0\right]\right), \qquad (2.40)$$

where $[X_1, X_2]^T$, $[X'_1, X'_2]^T$ and $[X''_1, X''_2]^T$ are independent copies.

The combination of Theorem 2.3.1 and Corollary 2.3.1.1 yields the following theorem [158] for the calculation of Spearman's rho:

Theorem 2.3.3. Let $[X_1, X_2]^T$ be a random vector with continuous components, described by a copula $C$. Spearman's rho for $[X_1, X_2]^T$ is given by:

$$\rho_s(X_1, X_2) = 3Q(C, \Pi) = 12 \iint_{[0,1]^2} uv\, dC(u, v) - 3 = 12 \iint_{[0,1]^2} C(u, v)\, du\, dv - 3. \qquad (2.41)$$

Hence, if $U = F_{X_1}(X_1)$ and $V = F_{X_2}(X_2)$, with $F_{X_1}$ and $F_{X_2}$ respectively being the marginal distribution functions, then we can write:

$$\rho_s(X_1, X_2) = 12E[UV] - 3 = \frac{E[UV] - \frac{1}{4}}{\frac{1}{12}} = \frac{\mathrm{Cov}(U, V)}{\sqrt{\mathrm{Var}(U)\,\mathrm{Var}(V)}} = \rho(U, V), \qquad (2.42)$$

since $E[U] = E[V] = \frac{1}{2}$ and $\mathrm{Var}(U) = \mathrm{Var}(V) = \frac{1}{12}$ for uniformly distributed random variables. In other words, Spearman's rho is Pearson's linear correlation coefficient applied to the transformed variables $U$ and $V$.

An important remark is that both Kendall's tau and Spearman's rho satisfy the properties of a dependence measure. This is stated by the following theorem [158].

Theorem 2.3.4. Let $X_1$ and $X_2$ be continuous random variables described by a copula $C$. Kendall's tau and Spearman's rho satisfy the properties in Definition 2.3.2.

Proof. See [158]. 


Apart from the properties described in Definition 2.3.2, concordance measures possess some additional properties. First, recall that, for a random vector $[X_1, X_2]^T$ described by a copula $C$:

$$\text{if } C = M \text{ then } \tau_C = \rho_C = 1; \qquad (2.43)$$
$$\text{if } C = W \text{ then } \tau_C = \rho_C = -1. \qquad (2.44)$$

The following theorem states that the converse is also true.

Theorem 2.3.5. Let $X_1$ and $X_2$ be continuous random variables described by a copula $C$, and let $\eta$ be either Kendall's tau or Spearman's rho. Then it is true that:

1. $\eta_{X_1, X_2} = 1$ iff $C = M$.

2. $\eta_{X_1, X_2} = -1$ iff $C = W$.

Proof. See [75]. ∎

The definitions of Kendall's tau and Spearman's rho reveal that both are increasing functions with respect to the concordance ordering given in Definition 2.2.6. More importantly, when we deal with continuous random variables, Kendall's tau and Spearman's rho can attain every value in $[-1, 1]$; this is not the case for Pearson's correlation coefficient [75]. Both Kendall's tau and Spearman's rho can be extended to dimensions higher than 2. In this case, the pairwise measures are grouped in an $L \times L$ matrix, similarly to Pearson's correlation coefficient.

2.3.5 Tail Dependence

Tail dependence is a concept that applies to bivariate distributions; it expresses the dependence in either the upper-right or the lower-left quadrant tail. It is a useful concept, especially for studying the dependence between extreme values. Importantly, the tail dependence between two continuous random variables $X_1$ and $X_2$ is a copula property, meaning that it is invariant when strictly increasing transformations are applied to $X_1$ and $X_2$.


Definition 2.3.5. Let $[X_1, X_2]^T$ be a random vector with continuous components. The coefficient of upper-tail dependence is equal to:

$$L_u = \lim_{u \to 1^-} \Pr\left(X_2 > F_{X_2}^{-1}(u) \mid X_1 > F_{X_1}^{-1}(u)\right), \qquad (2.45)$$

where $F_{X_1}$ and $F_{X_2}$ are the marginal distribution functions. The existence of the limit $L_u \in [0,1]$ is a prerequisite. If $L_u \in (0, 1]$, then $X_1$ and $X_2$ are asymptotically dependent in the upper tail. In contrast, they are asymptotically independent in the upper tail if $L_u = 0$.

The tail dependence can also be expressed as [74]:

$$\Pr\left(X_2 > F_{X_2}^{-1}(u) \mid X_1 > F_{X_1}^{-1}(u)\right) = \frac{1 - \Pr\left(X_1 \leq F_{X_1}^{-1}(u)\right) - \Pr\left(X_2 \leq F_{X_2}^{-1}(u)\right) + \Pr\left(X_1 \leq F_{X_1}^{-1}(u),\, X_2 \leq F_{X_2}^{-1}(u)\right)}{1 - \Pr\left(X_1 \leq F_{X_1}^{-1}(u)\right)}. \qquad (2.46)$$

Definition 2.3.6. Let $C$ be a bivariate copula such that the limit

$$L_u = \lim_{u \to 1^-} \frac{1 - 2u + C(u, u)}{1 - u} \qquad (2.47)$$

exists. Then, if $L_u \in (0, 1]$, the copula $C$ has upper-tail dependence; if $L_u = 0$, it has upper-tail independence.

Let $(U, V)$ be a 2-dimensional random vector with uniformly distributed components, and let $C$ be a bivariate copula describing $(U, V)$. It holds that:

$$\Pr(V \leq v \mid U = u) = \frac{\partial C(u, v)}{\partial u}. \qquad (2.48)$$

Equation (2.48) provides a similar expression when conditioning on $V$. Then, the limit can be expressed as:

$$\begin{aligned} L_u &= \lim_{u \to 1^-} \frac{\bar{C}(u, u)}{1 - u} = -\lim_{u \to 1^-} \frac{d\bar{C}(u, u)}{du} \\ &= \lim_{u \to 1^-} \left(2 - \frac{\partial C(s, t)}{\partial s}\Big|_{s=t=u} - \frac{\partial C(s, t)}{\partial t}\Big|_{s=t=u}\right) \\ &= \lim_{u \to 1^-} \left(\Pr(V > u \mid U = u) + \Pr(U > u \mid V = u)\right). \end{aligned} \qquad (2.49)$$

If the copula $C$ is exchangeable, then the expression of the limit $L_u$ can be reduced to

$$L_u = \lim_{u \to 1^-} 2 \Pr(V > u \mid U = u). \qquad (2.50)$$

Lower-tail dependence can be defined in a similar way. If $L_l = \lim_{u \to 0^+} \frac{C(u, u)}{u}$ exists and $L_l \in (0, 1]$, then $C$ has lower-tail dependence; if $L_l = 0$, then it has lower-tail independence. An alternative expression for $L_l$, which is very important for copulas without closed-form expressions, is the following:

$$\begin{aligned} L_l &= \lim_{u \to 0^+} \frac{C(u, u)}{u} = \lim_{u \to 0^+} \frac{dC(u, u)}{du} \\ &= \lim_{u \to 0^+} \left(\frac{\partial C(s, t)}{\partial s}\Big|_{s=t=u} + \frac{\partial C(s, t)}{\partial t}\Big|_{s=t=u}\right) \\ &= \lim_{u \to 0^+} \left(\Pr(V < u \mid U = u) + \Pr(U < u \mid V = u)\right). \end{aligned} \qquad (2.51)$$

If $C$ is an exchangeable copula, then we can write:

$$L_l = \lim_{u \to 0^+} 2 \Pr(V < u \mid U = u). \qquad (2.52)$$

The bivariate survival copula can be expressed as [158]:

$$\hat{C}(u, v) = u + v - 1 + C(1 - u, 1 - v), \qquad (2.53)$$

whereas the joint survival function is

$$\bar{C}(u, v) = 1 - u - v + C(u, v) = \hat{C}(1 - u, 1 - v). \qquad (2.54)$$

Hence, it follows that

$$\lim_{u \to 1^-} \frac{\bar{C}(u, u)}{1 - u} = \lim_{u \to 1^-} \frac{\hat{C}(1 - u, 1 - u)}{1 - u} = \lim_{u \to 0^+} \frac{\hat{C}(u, u)}{u}. \qquad (2.55)$$

The last equation shows that the upper-tail coefficient of the copula $C$ is equal to the lower-tail coefficient of the survival copula $\hat{C}$. Similarly, the lower-tail coefficient of the copula $C$ equals the upper-tail coefficient of $\hat{C}$.
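An empirical counterpart of (2.45) at a fixed high threshold is sketched below; for Gaussian data the estimate decays toward zero as the threshold approaches 1, consistent with the asymptotic upper-tail independence of the Gaussian copula. The helper is ours and the simulated data are illustrative.

import numpy as np

def upper_tail_coeff(x1, x2, u=0.95):
    """Empirical Pr(X2 > F2^-1(u) | X1 > F1^-1(u)), cf. Eq. (2.45),
    using empirical marginal quantiles at level u."""
    q1, q2 = np.quantile(x1, u), np.quantile(x2, u)
    mask = x1 > q1
    return float(np.mean(x2[mask] > q2))

rng = np.random.default_rng(5)
g = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=100000)
print(upper_tail_coeff(g[:, 0], g[:, 1], u=0.95))
print(upper_tail_coeff(g[:, 0], g[:, 1], u=0.999))   # smaller: tail independence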

2.4 Copula Families.

Copulas provide an efficient statistical tool for constructing multivariate distributions, even when the marginal distributions are of diverse types. To build a variety of stochastic models with various properties, such as asymmetries in the dependence structure or heavy tails, we need to have at our disposal different sets of copulas, called copula families. To this end, extensive research has been conducted on the construction of families with specific properties. This section presents the most important of them, which are widely used in the literature and are used in this thesis. The fact that several copulas are available for fitting stochastic models on a dataset seems convenient, but there are some important aspects that need to be taken into account. In particular, any copula fitting procedure may lead to inaccurate modelling if copula families with unnecessary characteristics, such as light tails or exchangeability, are considered. The same holds when the family lacks a specific behaviour and cannot efficiently capture the dependence structure of a dataset. To this end, an analysis of the copula characteristics should always be carried out by the researcher (or the data scientist) before using a family. The "goodness" of a copula family $\{C_\theta\}$, where $\theta$ is a parameter taking values in a subset $\Theta \subseteq \mathbb{R}^L$, $L \geq 1$, depends on the following characteristics [111]:


• Interpretability. The members of a copula family should have some stochastic interpretation, such that we can identify "natural" situations where this family could be used. For example, the elliptical family is useful in applications where the dependence structure appears symmetric.

• Dependence range. The different members of a copula family should cover a variety of dependence ranges, tail dependencies, and asymmetries.

• Ease of handling. Closed-form expressions of the copulas in a family provide ease of calculation, especially when the number of margins increases. Alternatively, copulas should be easily simulated via a known method.

In the literature [75, 83, 88, 114, 158], there exist various bivariate and multivariate copula families, which are separated into two categories: implicit and explicit copula functions. Implicit copulas do not have a simple closed-form density expression, but they are implied by well-known multivariate distribution functions. The second category refers to explicit copulas, which are not derived from multivariate distribution functions, but have simple closed-form expressions.

2.4.1 Elliptical Copulas

A characteristic example of implicit copulas is the family of elliptical copulas, which are associated with elliptical distributions. They provide symmetric expressions and are attractive for applications where the dimensionality of the variables (i.e., $L$) increases.

We say that a random vector $\mathbf{X} = [X_1, X_2, \ldots, X_L]$ follows an elliptical distribution, i.e., $\mathbf{X} \sim \mathcal{E}(\boldsymbol{\mu}, \Sigma, g)$, where $\boldsymbol{\mu} \in \mathbb{R}^L$ is the mean vector, $\Sigma$ the covariance matrix, and $g : [0, \infty) \to [0, \infty)$ the generator, if it can be expressed as [158]:

$$\mathbf{X} = \boldsymbol{\mu} + R A \mathbf{U}. \qquad (2.56)$$

In Equation (2.56), $\Sigma = A A^T$ signifies the Cholesky decomposition [203] of $\Sigma$, $\mathbf{U}$ is an $L$-dimensional random vector uniformly distributed on the sphere

$$\mathcal{S}^{L-1} = \left\{\mathbf{u} \in \mathbb{R}^L : u_1^2 + u_2^2 + \cdots + u_L^2 = 1\right\},$$

and $R$ is a variable independent of $\mathbf{U}$ that is drawn from the following pdf:

$$f(r) = \frac{2\pi^{L/2}}{\Gamma\left(\frac{L}{2}\right)}\, r^{L-1}\, g\left(r^2\right), \qquad r > 0. \qquad (2.57)$$

An elliptical distribution has a density function given by:

$$h_g(\mathbf{x}) = \frac{1}{\sqrt{|\Sigma|}}\, g\left((\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right), \qquad \mathbf{x} \in \mathbb{R}^L. \qquad (2.58)$$

As the density of an elliptical distribution is constant on ellipses, namely, the contour lines of the distribution are ellipses, this family is named elliptical. The scaled components $\frac{X_1}{\sigma_1}, \ldots, \frac{X_L}{\sigma_L}$ of an elliptical distribution are identically distributed according to a distribution function $F_g$. However, when the components are not identical, a copula function of a multivariate elliptical distribution can be used to increase modelling flexibility. This results in a meta-elliptical distribution [80], which is defined as follows [111]:

Definition 2.4.1. Let $\mathbf{X}$ be a random vector such that $\mathbf{X} \sim \mathcal{E}(\boldsymbol{\mu}, \Sigma, g)$, and let $\frac{X_l}{\sqrt{\sigma_l}} \sim F_g$ for all $l \in \{1, 2, \ldots, L\}$. The copula of the random vector

$$\left[\frac{X_1}{\sqrt{\sigma_1}}, \frac{X_2}{\sqrt{\sigma_2}}, \ldots, \frac{X_L}{\sqrt{\sigma_L}}\right] \qquad (2.59)$$

is called an elliptical copula.

As copulas are invariant under monotone transformations, the construction of a meta-elliptical distribution with given marginal distributions can be simply done by considering $\mathbf{X} \sim \mathcal{E}(\mathbf{0}, R, g)$, with $R$ being the correlation matrix. Hence, we can construct the implicit elliptical copula of the vector $\mathbf{X}$ as [111, 158]

$$C(u_1, \ldots, u_L) = G\left(G_1^{-1}(u_1), \ldots, G_l^{-1}(u_l), \ldots, G_L^{-1}(u_L)\right), \qquad (2.60)$$

where $u_l \in [0,1]$, $l \in \{1, 2, \ldots, L\}$, $G$ is the multivariate cdf of $\mathbf{X}$, $G_l$ is the univariate marginal cdf of $X_l$, and $G_l^{-1}$ is the inverse function of $G_l$. A target random vector $\mathbf{Y}$ that follows a meta-elliptical distribution with marginal cdfs $F_1, \ldots, F_L$ can be obtained from [111, 158]

$$Y_l = F_l^{-1}\left(G_l(X_l)\right), \qquad (2.61)$$

where $F_l^{-1}$ is the inverse function of $F_l$. Typically, an elliptical copula does not have a closed-form expression. In the next paragraphs, we describe the two most well-known copulas of the elliptical family, the Gaussian (or normal) and the Student's t copulas [111, 158].

Gaussian copula.

The Gaussian copula is derived when the generator takes the form [111, 158]:

$$g(t) = \left(\frac{1}{\sqrt{2\pi}}\right)^L \exp\left(-\frac{t}{2}\right). \qquad (2.62)$$

Then the Gaussian copula $C_G$ can be written as [111, 158]

$$C_G(\mathbf{u}) = \Phi_{R_G}\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_l), \ldots, \Phi^{-1}(u_L)\right), \qquad (2.63)$$

where $\mathbf{u} = [u_1, \ldots, u_l, \ldots, u_L]^T$ and $u_l = F_l(x_l)$, $l = 1, 2, \ldots, L$. Also, $\Phi_{R_G}$ denotes the standardized multivariate Gaussian distribution with correlation matrix $R_G$, whereas $\Phi$ denotes the standardized univariate Gaussian distribution. Based on Equation (2.58), the normal copula density is given by [53]

$$c_G(\boldsymbol{\xi}) = \frac{1}{\sqrt{|R_G|}} \exp\left(-\frac{1}{2}\, \boldsymbol{\xi}^T \left(R_G^{-1} - I\right) \boldsymbol{\xi}\right), \qquad (2.64)$$

where $\boldsymbol{\xi} = [\Phi^{-1}(u_1), \Phi^{-1}(u_2), \ldots, \Phi^{-1}(u_L)]$ and $I$ is the $L \times L$ identity matrix. Figures 2.7(a) and 2.7(b) respectively depict the bivariate Gaussian copula function and the corresponding Gaussian copula density as a function of the marginal cdf values, where actual electricity generation data from an on-site solar panel and a wind turbine are used [7].

The Gaussian copula provides a simple closed-form expression for capturing a symmetric dependence structure among data. Being parameterized only by the correlation matrix $R_G$, it simplifies the modelling procedure but may result in inaccurate handling of extreme values.

In the next section, we describe another copula that belongs to the elliptical family, i.e., the Student's t copula [111, 158], which deals with this issue more efficiently.

Student’s t copula.

The Student's t copula (alias, t-copula) has a generator function of the form [111, 158]:

$$g(t) = c_t \left(1 + \frac{t}{\nu}\right)^{-\frac{L+\nu}{2}}, \qquad (2.65)$$

where $c_t$ is a suitable normalization constant and $\nu$ is the degrees of freedom. The multivariate t-copula function [158] can be expressed as

$$C_t(\mathbf{u}) = T_{R_t, \nu}\left(t_\nu^{-1}(u_1), t_\nu^{-1}(u_2), \ldots, t_\nu^{-1}(u_L)\right), \qquad (2.66)$$

where $T_{R_t, \nu}$ is the standardized multivariate t-distribution with $\nu$ degrees of freedom and correlation matrix $R_t$, and $t_\nu^{-1}$ denotes the inverse of the univariate t-distribution. The t-copula density is [158]

$$c_t(\boldsymbol{\eta}) = |R_t|^{-\frac{1}{2}}\, \frac{\Gamma\left(\frac{\nu+L}{2}\right) \left[\Gamma\left(\frac{\nu}{2}\right)\right]^{L-1}}{\left[\Gamma\left(\frac{\nu+1}{2}\right)\right]^{L}}\, \frac{\left(1 + \frac{\boldsymbol{\eta}^T R_t^{-1} \boldsymbol{\eta}}{\nu}\right)^{-\frac{\nu+L}{2}}}{\prod_{l=1}^{L} \left(1 + \frac{\eta_l^2}{\nu}\right)^{-\frac{\nu+1}{2}}}, \qquad (2.67)$$

where $\boldsymbol{\eta} = [\eta_1, \eta_2, \ldots, \eta_L]$, $\eta_l = t_\nu^{-1}(u_l)$, $l = 1, 2, \ldots, L$, and $\Gamma(\cdot)$ is the gamma function. Figures 2.8(a) and 2.8(b) respectively show the bivariate t-copula function and the corresponding t-copula density as a function of the marginal cdf values, where actual electricity generation data from an on-site solar panel and a wind turbine are used [7]. Compared to the Gaussian copula, the t-copula is characterized by an additional parameter (apart from the correlation matrix), namely, the degrees of freedom. This provides extra flexibility to the model, especially for better expressing the dependencies among extreme values [32].
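The t-copula (2.66) admits an equally simple sampler via the standard normal variance-mixture construction of the multivariate t-distribution; a sketch:

import numpy as np
from scipy import stats

def sample_t_copula(R, nu, n, rng):
    """Draw from the t-copula of Eq. (2.66): X = Z / sqrt(W / nu),
    Z ~ N(0, R), W ~ chi^2_nu, then U_l = t_nu(X_l)."""
    z = rng.multivariate_normal(np.zeros(len(R)), R, size=n)
    w = rng.chisquare(nu, size=n)
    x = z / np.sqrt(w / nu)[:, None]
    return stats.t(df=nu).cdf(x)

rng = np.random.default_rng(7)
U = sample_t_copula(np.array([[1.0, 0.83], [0.83, 1.0]]), nu=10, n=10000, rng=rng)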


(a) Bivariate Gaussian copula. (b) Bivariate Gaussian copula density.

Figure 2.7: Bivariate Gaussian copula function and the corresponding copula density function fitted on actual electricity generation data from a wind turbine ($X_1$) and a solar panel ($X_2$) [7]. The linear correlation parameter is equal to $\rho_G = 0.8325$.


(a) Bivariate Student's t copula. (b) Bivariate Student's t copula density.

Figure 2.8: Bivariate Student's t copula function and the corresponding copula density function fitted on actual electricity generation data from a wind turbine ($X_1$) and a solar panel ($X_2$) [7]. The linear correlation parameter is equal to $\rho_t = 0.8325$ and the degrees of freedom are found to be $\nu = 10$. KDE is used for the calculation of the marginal cdf values.


2.4.2 Archimedean Copulas

Archimedean copulas are the most popular explicit copulas because they are easily derived. Contrary to elliptical copulas, which are suitable for modelling datasets with a symmetric dependence structure, Archimedean copulas capture wide ranges of dependence [158]. However, being parameterized by a single parameter, they lack some modelling flexibility.

An Archimedean copula $C_a$ can be represented by [158]

$$C_a(u_1, \ldots, u_L) = g_a^{-1}\left(g_a(u_1; \theta_a) + \cdots + g_a(u_L; \theta_a); \theta_a\right), \qquad (2.68)$$

where $g_a : [0,1] \times \Theta \to [0, \infty)$ is a continuous, strictly decreasing and convex function such that $g_a(1; \theta_a) = 0$, and $\theta_a$ is the copula parameter defined in the range $\Theta$. The function $g_a(u; \theta_a)$ is called the Archimedean generator, and its pseudo-inverse, defined by

$$g_a^{[-1]}(u; \theta_a) = \begin{cases} g_a^{-1}(u; \theta_a), & 0 \leq u \leq g_a(0; \theta_a), \\ 0, & g_a(0; \theta_a) \leq u \leq \infty, \end{cases} \qquad (2.69)$$

has to be completely monotonic of order $L$ [149]. In the next paragraphs, we present the most widely used Archimedean copulas [148].

Frank copula.

The Frank copula is a symmetric Archimedean copula that exhibits neither upper- nor lower-tail dependence [158]. The standard expression for members of this family of $L$-copulas is

$$C_{Fr}(u_1, \ldots, u_L) = -\frac{1}{\theta_{Fr}} \log\left(1 + \frac{\prod_{l=1}^{L}\left(e^{-\theta_{Fr} u_l} - 1\right)}{\left(e^{-\theta_{Fr}} - 1\right)^{L-1}}\right), \qquad (2.70)$$

where $\theta_{Fr} > 0$. The limiting case $\theta_{Fr} \to 0$ corresponds to the independence copula $\Pi^L$. For the bivariate case ($L = 2$), the parameter $\theta_{Fr}$ can also take negative values. The generator of the Frank copula family is given by:

$$g_{Fr}(u; \theta_{Fr}) = -\log\left(\frac{e^{-\theta_{Fr} u} - 1}{e^{-\theta_{Fr}} - 1}\right), \qquad (2.71)$$

whereas the Frank copula density function can be expressed as:

$$c_{Fr}(\mathbf{u}) = \left(\frac{\theta_{Fr}}{1 - e^{-\theta_{Fr}}}\right)^{L-1} \mathrm{Li}_{-(L-1)}\left(H_F(\mathbf{u})\right)\, \frac{\exp\left(-\theta_{Fr} \sum_{l=1}^{L} u_l\right)}{H_F(\mathbf{u})}, \qquad (2.72)$$

where $\mathbf{u} = [u_1, \ldots, u_L]$ and $\mathrm{Li}_{-s}(z)$ denotes the polylogarithm of order $-s$ at $z$, given by the following expression [104]:

$$\mathrm{Li}_{-s}(z) = \sum_{k=1}^{\infty} \frac{z^k}{k^{-s}}. \qquad (2.73)$$

Also, the function $H_F(\mathbf{u})$ is given by

$$H_F(\mathbf{u}) = \left(1 - e^{-\theta_{Fr}}\right)^{1-L} \prod_{l=1}^{L} \left(1 - e^{-\theta_{Fr} u_l}\right). \qquad (2.74)$$

Copulas of this type were introduced by Frank [81]. Figures 2.9(a) and 2.9(b) respectively show the bivariate Frank copula function and the corresponding copula density as a function of the marginal cdf values, where actual electricity generation data from an on-site solar panel and a wind turbine are used [7].

Mardia-Takahasi-Clayton copula.

Contrary to the Frank copula, the Mardia-Takahasi-Clayton copulas are asymmetric. The standard expression for members of this family of $L$-copulas is

$$C_{Cl}(u_1, \ldots, u_L) = \left(\max\left\{\sum_{l=1}^{L} u_l^{-\theta_{Cl}} - (L - 1),\ 0\right\}\right)^{-\frac{1}{\theta_{Cl}}}, \qquad (2.75)$$

where $\theta_{Cl} \geq -\frac{1}{L-1}$, $\theta_{Cl} \neq 0$. The limiting case $\theta_{Cl} \to 0$ corresponds to the independence copula. The pseudo-inverse of the Archimedean generator of this family is given by

$$g_{Cl}^{-1}(u; \theta_{Cl}) = \left(\max\{1 + \theta_{Cl}\, u,\ 0\}\right)^{-\frac{1}{\theta_{Cl}}}. \qquad (2.76)$$

In [149], it is proven that for every $L$-dimensional Archimedean copula $C$ and for every $\mathbf{u} = [u_1, \ldots, u_L] \in [0,1]^L$, $C_{Cl}(\mathbf{u}) \leq C(\mathbf{u})$ for $\theta_{Cl} = -\frac{1}{L-1}$.
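For $\theta_{Cl} > 0$, this family also admits a fast sampler via the classical Marshall-Olkin (frailty) construction; the sketch below follows that standard algorithm from the copula literature, not a method specific to this thesis:

import numpy as np

def sample_clayton(theta, L, n, rng):
    """Marshall-Olkin sampling for Clayton copulas (theta > 0):
    V ~ Gamma(1/theta, 1), E_l ~ Exp(1), then U_l = (1 + E_l / V)^(-1/theta)."""
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    e = rng.exponential(size=(n, L))
    return (1.0 + e / v) ** (-1.0 / theta)

rng = np.random.default_rng(8)
U = sample_clayton(theta=2.0, L=3, n=10000, rng=rng)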


(a) Bivariate Frank copula. (b) Bivariate Frank copula density.

Figure 2.9: Bivariate Frank copula function and the corresponding copula density function fitted on actual electricity generation data from a wind turbine ($X_1$) and a solar panel ($X_2$) [7]. The copula parameter is equal to $\theta_{Fr} = 1.4901$. KDE is used for the calculation of the marginal cdf values.


The Mardia-Takahasi-Clayton copula density is given by [104]

$$c_{Cl}(\mathbf{u}) = \left(\prod_{k=1}^{L-1} (\theta_{Cl}\, k + 1)\right) \left(\prod_{l=1}^{L} u_l\right)^{-\theta_{Cl} - 1} \left(1 + t_{\theta_{Cl}}(\mathbf{u})\right)^{-\left(L + \frac{1}{\theta_{Cl}}\right)}, \qquad (2.77)$$

where $t_{\theta_{Cl}}(\mathbf{u}) = \sum_{l=1}^{L} \left(u_l^{-\theta_{Cl}} - 1\right)$. Copulas of this type can be derived from two types of multivariate distributions: the Pareto distribution by Mardia [140] and the Burr distribution by Takahasi [205]. Bivariate expressions can be found in [121], whereas multivariate ones can be found in [54, 89]. Note that a special member of this family can be derived from the bivariate first-type logistic model of Gumbel [97]. Oakes [163] showed that bivariate copulas expressed as in Equation (2.75) are associated with Clayton's model [52]. For this reason (and also for the sake of simplicity), this family is widely known as the Clayton copulas. In Figures 2.10(a) and 2.10(b), the bivariate Mardia-Takahasi-Clayton copula function and the corresponding density are respectively shown as a function of the marginal cdf values, where actual electricity generation data from an on-site solar panel and a wind turbine are used [7].

Gumbel-Hougaard copula.

Gumbel-Hougaard is another member of the Archimedean copula family, introduced by Gumbel in [98] and by Hougaard in [107]. Similarly to the Clayton copula, it is an asymmetric copula, with higher probability concentrated in the right tail. It is suited not only to random variables that are positively correlated but also, particularly, to those in which high values are more strongly correlated than low values. The standard expression for members of this family of $L$-copulas is

$$C_{Gb}(u_1, \ldots, u_L) = \exp\left(-\left(\sum_{l=1}^{L} (-\log u_l)^{\theta_{Gb}}\right)^{\frac{1}{\theta_{Gb}}}\right), \qquad (2.78)$$

where $\theta_{Gb} \geq 1$. For $\theta_{Gb} = 1$ we obtain the independence copula, whereas the limit of $C_{Gb}$ for $\theta_{Gb} \to \infty$ is the co-monotonicity copula [158]. The pseudo-inverse of the Archimedean generator


(a) Bivariate Gumbel-Hougaard copula. (b) Bivariate Gumbel-Hougaard copula density.

Figure 2.10: Bivariate Mardia-Takahasi-Clayton copula function and the corresponding copula density function fitted on actual electricity generation data from a wind turbine ($X_1$) and a solar panel ($X_2$) [7]. The copula parameter is equal to $\theta_{Cl} = 0.0596$. KDE is used for the calculation of the marginal cdf values.

of this family is given by

$$g_{Gb}^{-1}(u; \theta_{Gb}) = \exp\left(-u^{\frac{1}{\theta_{Gb}}}\right). \qquad (2.79)$$

The Gumbel-Hougaard copula density is given by [104]

$$c_{Gb}(\mathbf{u}) = \theta_{Gb}^{L}\, \frac{\exp\left(-t_\theta(\mathbf{u})^{\alpha}\right)}{t_\theta(\mathbf{u})^{L}}\, P_{L,\alpha}\left(t_\theta(\mathbf{u})^{\alpha}\right) \prod_{l=1}^{L} \frac{(-\log u_l)^{\theta_{Gb} - 1}}{u_l}, \qquad (2.80)$$

where $\alpha = \frac{1}{\theta_{Gb}}$, $t_\theta(\mathbf{u}) = \sum_{l=1}^{L} (-\log u_l)^{\theta_{Gb}}$,

$$P_{L,\alpha}(x) = \sum_{k=1}^{L} a_{Lk}^{Gb}(\alpha)\, x^k, \qquad (2.81)$$

and

$$a_{Lk}^{Gb}(\alpha) = (-1)^{L-k} \sum_{l=k}^{L} \alpha^l\, s(L, l)\, S(l, k) = \frac{L!}{k!} \sum_{l=1}^{k} \binom{k}{l} \binom{\alpha l}{L} (-1)^{L-l}, \qquad (2.82)$$

where $l \in \{1, 2, \ldots, L\}$. Here, $S$ and $s$ denote the Stirling numbers of the second kind and of the first kind, respectively, given by the recurrence relations

$$S(n+1, k) = S(n, k-1) + k\, S(n, k), \qquad (2.83)$$
$$s(n+1, k) = s(n, k-1) - n\, s(n, k), \qquad (2.84)$$

with $S(0, 0) = s(0, 0) = 1$ and $s(n, 0) = s(0, n) = S(n, 0) = S(0, n) = 0$ for $n \geq 1$. In Figures 2.11(a) and 2.11(b), the bivariate Gumbel-Hougaard copula function and the corresponding density are respectively shown as a function of the marginal cdf values, where actual electricity generation data from an on-site solar panel and a wind turbine are used [7].

2.5 Copula Fitting

In [114] Joe writes:

"... Statistical modelling usually means that one comes up with a simple (or mathematically tractable) model without knowledge of the physical aspects of the situation.


(a) Bivariate Gumbel-Hougaard copula. (b) Bivariate Gumbel-Hougaard copula density.

Figure 2.11: Bivariate Gumbel-Hougaard copula function and the corresponding copula density function fitted on actual electricity generation data from a wind turbine ($X_1$) and a solar panel ($X_2$) [7]. The copula parameter is equal to $\theta_{Gb} = 1.2226$. KDE is used for the calculation of the marginal cdf values.


The statistical model needs to be 'real' and is not an end but a means of providing statistical inferences... My view of multivariate modelling, based on experience with multivariate data, is that models should try to capture important characteristics, such as the appropriate density shapes for the univariate margins and the appropriate dependence structure, and otherwise be as simple as possible. The parameters of the model should be in a form most suitable for easy interpretation (e.g., a parameter is interpreted as either a dependence parameter or a univariate parameter but not some mixture)..."

With these words, Joe highlights two of the most important aspects of statistical modelling. First, a model is not merely a complex mathematical tool; it should also be meaningful for the application to which it is applied. In the previous section, we described various copula families that can capture different types of dependence structure; the members of a copula family should have some stochastic interpretation, such that we can identify "natural" situations where this family could be used. Second, according to Joe, each model should be parameterized in such a way that the parameters are easily interpreted. Thus, copula parameters should be closely related to the dependence concepts described in Section 2.3. This section deals with the second aspect, and particularly with methods for estimating copula parameters. As we will see, the main idea involves the decomposition of complex multivariate modelling problems into simpler statistical problems.

2.5.1 Parametric Copula Estimation

Let \mathbf{x}^{(t)} = [x_1^{(t)}, \ldots, x_l^{(t)}, \ldots, x_L^{(t)}] denote a realization (alias, a sample) of the random process \mathbf{X}^{(t)} = [X_1^{(t)}, \ldots, X_l^{(t)}, \ldots, X_L^{(t)}] at instant t, described by the joint pdf f_{\mathbf{X}^{(t)}}(x_1^{(t)}, \ldots, x_L^{(t)}). Also, let f_{X_l}(x_l^{(t)}) and F_{X_l}(x_l^{(t)}) respectively denote the marginal pdf and cdf of the variable X_l^{(t)}, and let \theta be the K \times 1 vector of parameters taking values in the parameter space \Theta. If t is an observation, we denote the likelihood function of t by L_t(\theta) = f_{\mathbf{X}^{(t)}}(x_1^{(t)}, \ldots, x_L^{(t)}; \theta) and its logarithmic expression by \ell_t(\theta) = \ln L_t(\theta). If the realizations of the random vector \mathbf{X}^{(t)} are independently and identically distributed (i.i.d.), the likelihood function can be written as:

L(\theta) = \prod_{t=1}^{T} L_t(\theta) = \prod_{t=1}^{T} f_{\mathbf{X}^{(t)}}\big(x_1^{(t)}, \ldots, x_L^{(t)}; \theta\big).   (2.85)

Therefore, the log-likelihood function can be expressed as follows:

\ell(\theta) = \ln L(\theta) = \ln \prod_{t=1}^{T} L_t(\theta) = \sum_{t=1}^{T} \ln L_t(\theta) = \sum_{t=1}^{T} \ell_t(\theta) = \sum_{t=1}^{T} \ln f_{\mathbf{X}^{(t)}}\big(x_1^{(t)}, \ldots, x_L^{(t)}; \theta\big),   (2.86)

where T is the total number of observations. We say that \hat{\theta}_{ML} is the maximum likelihood (ML) estimator if

\ell\big(\hat{\theta}_{ML}\big) \geq \ell(\theta), \quad \forall \theta \in \Theta.   (2.87)

Under regularity conditions, the ML estimator \hat{\theta}_{ML} is characterized by the property of asymptotic normality [58]. Then, it holds that

\sqrt{T}\,\big(\hat{\theta}_{ML} - \theta_0\big) \rightarrow \mathcal{N}\Big(0, \big(\tfrac{1}{T}\mathcal{J}(\theta_0)\big)^{-1}\Big),   (2.88)

where \theta_0 denotes the true value of the parameter vector we want to estimate, and \mathcal{J}(\theta_0) denotes the Fisher information matrix [118] with

\frac{1}{T}\mathcal{J}(\theta_0) = -\mathbb{E}\left[\frac{d^2 \ln f_{\mathbf{X}}(x_1, \ldots, x_L; \theta)}{d\theta\, d\theta^T}\bigg|_{\theta=\theta_0}\right].   (2.89)


This property is of fundamental importance, as it provides the asymptotic distribution of any maximum likelihood estimator. Moreover, in some cases where the sample distribution of the ML estimators cannot be derived, the asymptotic distribution is the only means we have for inference. Using the expression of the copula density in (2.26), the log-likelihood in (2.86) can be written as:

\ell(\theta) = \sum_{t=1}^{T} \ln \Big\{ c\big(F_{X_1}(x_1^{(t)}), \ldots, F_{X_L}(x_L^{(t)})\big) \prod_{l=1}^{L} f_{X_l}(x_l^{(t)}) \Big\} = \sum_{t=1}^{T} \ln c\big(F_{X_1}(x_1^{(t)}), \ldots, F_{X_L}(x_L^{(t)})\big) + \sum_{t=1}^{T} \sum_{l=1}^{L} \ln f_{X_l}(x_l^{(t)}).   (2.90)

Assuming uniform margins, we obtain:

\ell(\theta) = \sum_{t=1}^{T} \ln c\big(u_1^{(t)}, \ldots, u_l^{(t)}, \ldots, u_L^{(t)}\big).   (2.91)

The log-likelihood function for the Gaussian copula family takes the following form:

\ell_G(\mathbf{R}_G) = -\frac{T}{2} \ln |\mathbf{R}_G| - \frac{1}{2} \sum_{t=1}^{T} \boldsymbol{\xi}^{(t)T} \big(\mathbf{R}_G^{-1} - \mathbf{I}\big) \boldsymbol{\xi}^{(t)},   (2.92)

where \boldsymbol{\xi}^{(t)} = \big[\Phi^{-1}(u_1^{(t)}), \ldots, \Phi^{-1}(u_L^{(t)})\big]. The ML estimate of \mathbf{R}_G is given by [137]

\hat{\mathbf{R}}_{G,ML} = \frac{1}{T} \sum_{t=1}^{T} \boldsymbol{\xi}^{(t)} \boldsymbol{\xi}^{(t)T}.   (2.93)

Clearly, the Gaussian copula family allows for a closed-form expression of the ML estimator. However, this is not the case for every copula family; in general, the estimation of the related parameters may have to be carried out via numerical optimization methods.
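As a brief illustration of (2.93), the following Python sketch computes the ML estimate of the normal-copula correlation matrix from a matrix u of marginal cdf values; the final rescaling to a unit diagonal is a common practical step rather than part of (2.93) itself.

import numpy as np
from scipy.stats import norm

def fit_gaussian_copula(u):
    # ML estimate of the normal-copula correlation matrix, eq. (2.93):
    # R = (1/T) sum_t xi^(t) xi^(t)T with xi^(t) = Phi^{-1}(u^(t)).
    # `u` is a (T, L) array of marginal cdf values.
    xi = norm.ppf(u)                  # componentwise inverse standard normal
    R = xi.T @ xi / u.shape[0]
    d = np.sqrt(np.diag(R))           # practical step: renormalize so that
    return R / np.outer(d, d)         # R has a unit diagonal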


An example is the Student’s t-copula, where the log-likelihood function is given by

\ell_t(\mathbf{R}_t, \nu) \propto -\frac{T}{2} \ln |\mathbf{R}_t| - \frac{\nu + L}{2} \sum_{t=1}^{T} \ln \Big(1 + \frac{1}{\nu}\, \boldsymbol{\eta}^{(t)T} \mathbf{R}_t^{-1} \boldsymbol{\eta}^{(t)}\Big) + \frac{\nu + 1}{2} \sum_{t=1}^{T} \sum_{l=1}^{L} \ln \bigg(1 + \frac{\big(\eta_l^{(t)}\big)^2}{\nu}\bigg),   (2.94)

where \boldsymbol{\eta}^{(t)} = \big[\eta_1^{(t)}, \ldots, \eta_L^{(t)}\big] and \eta_l^{(t)} = t_\nu^{-1}\big(u_l^{(t)}\big). The aforementioned method is called exact maximum likelihood (EML) and may result in serious computational expense, especially when the number of marginal distributions increases. This is because EML involves the joint estimation of the marginal parameters and the parameters of the dependence structure. However, this approach does not leverage the fact that the parameters in a copula expression are separated into (a) marginal distribution parameters \theta_1, \ldots, \theta_L, and (b) common dependence structure parameters \beta. To this end, we modify the log-likelihood in (2.90) as follows:

\ell(\theta) = \sum_{t=1}^{T} \ln c\big(F_{X_1}(x_1^{(t)}; \theta_1), \ldots, F_{X_L}(x_L^{(t)}; \theta_L); \beta\big) + \sum_{t=1}^{T} \sum_{l=1}^{L} \ln f_{X_l}\big(x_l^{(t)}; \theta_l\big),   (2.95)

where the parameter vector is \theta = [\theta_1, \ldots, \theta_L, \beta]. We can now split the estimation problem into two steps. In particular, the first step deals with the parameter estimation of the marginal distributions, which can be achieved by:

\hat{\theta}_l = \arg\max_{\theta_l} \sum_{t=1}^{T} \ln f_{X_l}\big(x_l^{(t)}; \theta_l\big).   (2.96)

In the second step, we use the estimates of the marginal distributions to compute the estimates of the copula parameter(s) as:

\hat{\beta} = \arg\max_{\beta} \sum_{t=1}^{T} \ln c\big(F_{X_1}(x_1^{(t)}; \hat{\theta}_1), \ldots, F_{X_L}(x_L^{(t)}; \hat{\theta}_L); \beta\big).   (2.97)

This method is called inference functions for margins (IFM) and has lower complexity than EML. In general, it holds that

\hat{\theta}_{EML} \neq \hat{\theta}_{IFM}.   (2.98)

As studied in [114], the IFM method can be more efficient compared to exact ML estimation. The IFM method also suggests that the estimation of \beta can be decoupled from the identification of the marginals. To further extend this idea, we consider another approach where the observed data \big(x_1^{(t)}, \ldots, x_L^{(t)}\big) are transformed into their respective cdf values \big(u_1^{(t)}, \ldots, u_L^{(t)}\big), resulting in the following estimation method:

\hat{\beta} = \arg\max_{\beta} \sum_{t=1}^{T} \ln c\big(u_1^{(t)}, \ldots, u_L^{(t)}; \beta\big).   (2.99)

Equation (2.99) shows that \hat{\beta} can be viewed as the ML estimator given the observed margins; in contrast to IFM, the marginal distributions need not be parametric. As it is based on the empirical distributions for the data transformation, this method is called canonical maximum likelihood (CML).
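The sketch below illustrates CML for a bivariate Clayton copula, whose log-density is available in closed form: the data are first mapped to pseudo-observations via scaled ranks (an empirical-cdf transform), and the copula parameter is then found by a one-dimensional numerical maximization, in the spirit of (2.99). The upper bound of 20 on the parameter is an arbitrary illustrative choice.

import numpy as np
from scipy.optimize import minimize_scalar

def pseudo_obs(x):
    # Empirical-cdf transform: ranks rescaled to lie strictly inside (0, 1).
    ranks = np.argsort(np.argsort(x)) + 1.0
    return ranks / (len(x) + 1.0)

def clayton_loglik(theta, u, v):
    # Log-likelihood of the bivariate Clayton copula density.
    return np.sum(np.log(1.0 + theta)
                  - (theta + 1.0) * (np.log(u) + np.log(v))
                  - (2.0 + 1.0 / theta) * np.log(u**-theta + v**-theta - 1.0))

def fit_clayton_cml(x1, x2):
    # CML estimate of the Clayton parameter, cf. eq. (2.99).
    u, v = pseudo_obs(x1), pseudo_obs(x2)
    res = minimize_scalar(lambda th: -clayton_loglik(th, u, v),
                          bounds=(1e-4, 20.0), method='bounded')
    return res.x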

2.5.2 Non-Parametric Copula Estimation

Although this thesis does not consider non-parametric estimation of copulas, we believe that, for the sake of completeness, a description of the most well-known non-parametric copula estimation methods should be provided to the reader. Empirical copulas were introduced in [59]. Let \mathbf{x}^{(t)} = [x_1^{(t)}, \ldots, x_l^{(t)}, \ldots, x_L^{(t)}] denote a realization (alias, a sample) of the random process \mathbf{X}^{(t)} = [X_1^{(t)}, \ldots, X_l^{(t)}, \ldots, X_L^{(t)}] at instant t. The empirical copula distribution is given by

\hat{C}\Big(\frac{t_1}{T}, \ldots, \frac{t_l}{T}, \ldots, \frac{t_L}{T}\Big) = \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\big[x_1^{(t)} \leq x_1^{(t_1)}, \ldots, x_l^{(t)} \leq x_l^{(t_l)}, \ldots, x_L^{(t)} \leq x_L^{(t_L)}\big],   (2.100)

where x_l^{(t_l)} are the order statistics and 1 \leq t_1, \ldots, t_L \leq T. The empirical copula

frequency is given by the following expression:

\hat{c}\Big(\frac{t_1}{T}, \ldots, \frac{t_l}{T}, \ldots, \frac{t_L}{T}\Big) = \frac{1}{T} if \big(x_1^{(t_1)}, \ldots, x_l^{(t_l)}, \ldots, x_L^{(t_L)}\big) belongs to the sample, and 0 otherwise.

The relationship between the empirical copula distribution and frequency is

\hat{C}\Big(\frac{t_1}{T}, \ldots, \frac{t_l}{T}, \ldots, \frac{t_L}{T}\Big) = \sum_{i_1=1}^{t_1} \cdots \sum_{i_l=1}^{t_l} \cdots \sum_{i_L=1}^{t_L} \hat{c}\Big(\frac{i_1}{T}, \ldots, \frac{i_l}{T}, \ldots, \frac{i_L}{T}\Big).   (2.101)

Alternatively, the previous relationship can be written as

\hat{c}\Big(\frac{t_1}{T}, \ldots, \frac{t_l}{T}, \ldots, \frac{t_L}{T}\Big) = \sum_{i_1=1}^{2} \cdots \sum_{i_l=1}^{2} \cdots \sum_{i_L=1}^{2} (-1)^{i_1 + \cdots + i_l + \cdots + i_L} \times \hat{C}\Big(\frac{t_1 - i_1 + 1}{T}, \ldots, \frac{t_l - i_l + 1}{T}, \ldots, \frac{t_L - i_L + 1}{T}\Big).   (2.102)

Empirical copulas can be used for the estimation of dependence measures. For example, Spearman's rho can be estimated as:

\hat{\rho} = \frac{12}{T^2 - 1} \sum_{t_1=1}^{T} \sum_{t_2=1}^{T} \bigg[\hat{C}\Big(\frac{t_1}{T}, \frac{t_2}{T}\Big) - \frac{t_1 t_2}{T^2}\bigg].   (2.103)
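For completeness, the following sketch evaluates the empirical copula of (2.100) and the resulting Spearman estimate of (2.103) for bivariate data; the double sum makes it cubic in T, so it is intended for small samples only.

import numpy as np

def empirical_copula(x, t):
    # Empirical copula C(t_1/T, ..., t_L/T) of eq. (2.100);
    # `x` is a (T, L) data matrix and `t` holds integers 1 <= t_l <= T.
    xs = np.sort(x, axis=0)                       # order statistics
    thresh = np.array([xs[t[l] - 1, l] for l in range(x.shape[1])])
    return np.mean(np.all(x <= thresh, axis=1))

def spearman_from_copula(x):
    # Spearman's rho estimate of eq. (2.103) for bivariate data.
    T = x.shape[0]
    acc = sum(empirical_copula(x, [t1, t2]) - t1 * t2 / T**2
              for t1 in range(1, T + 1) for t2 in range(1, T + 1))
    return 12.0 / (T**2 - 1) * acc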

Identification of an Archimedean copula.

In [59], an empirical method was developed to identify Archimedean copula families. Let \mathbf{X} be an L-dimensional random vector, C the associated copula with generator g(\cdot), and K(\cdot) the function defined as

K(u) = \Pr\big(C(u_1, \ldots, u_L) \leq u\big).   (2.104)

In [90], it is proven that

K(u) = u + \sum_{l=1}^{L-1} \frac{(-1)^{l}\, g(u)^{l}}{l!}\, \kappa_{l-1}(u),   (2.105)

where \kappa_l(u) = \frac{\partial \kappa_{l-1}(u)}{\partial g(u)} and \kappa_0(u) = \frac{1}{g'(u)}.


In the bivariate case, this formula reduces to

K(u) = u - \frac{g(u)}{g'(u)}.   (2.106)

A non-parametric estimate of K is given by

\hat{K}(u) = \frac{1}{T} \sum_{i=1}^{T} \mathbf{1}[\theta_i \leq u],   (2.107)

where

\theta_i = \frac{1}{T - 1} \sum_{t=1}^{T} \mathbf{1}\big[x_1^{(t)} \leq x_1^{(i)}, \ldots, x_L^{(t)} \leq x_L^{(i)}\big].   (2.108)

The idea is then to fit \hat{K}(\cdot) by choosing a copula from an Archimedean family.
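A small sketch of (2.107)-(2.108): the empirical Kendall function is compared against the parametric K implied by (2.106) for a candidate Clayton generator g(u) = u^{-theta} - 1. We use a strict inequality inside theta_i, which is one common convention for handling the diagonal term; fitting then amounts to choosing the generator parameter whose K curve is closest to the empirical one on a grid of u values.

import numpy as np

def kendall_K_empirical(x):
    # Empirical estimate of K(u) = Pr(C(U) <= u), eqs. (2.107)-(2.108).
    T = x.shape[0]
    theta = np.array([np.sum(np.all(x < x[i], axis=1)) / (T - 1.0)
                      for i in range(T)])
    return lambda u: np.mean(theta <= u)

def kendall_K_clayton(u, theta):
    # Parametric K from eq. (2.106) with Clayton generator g(u) = u^{-theta} - 1,
    # which gives K(u) = u + u (1 - u^theta) / theta.
    return u + u * (1.0 - u**theta) / theta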

2.5.3 Choosing the Right Copula

Copula functions provide, in fact, the means for expressing the dependence structure among different heterogeneous sources. The choice of an appropriate copula family to fit the data is of paramount importance. Typically, in the majority of applications, a parametric family of copulas is chosen among others and fitted to the data by estimating the parameters of the family [70,183]. In general, there is no systematic, rigorous methodology for choosing the appropriate copula for an application: nothing guarantees that the selected copula family will converge to the real dependence structure of the considered dataset [70]. Nevertheless, there exist some attempts towards providing a methodology for appropriate copula selection, mainly focusing on finance applications. In particular, the study in [70] proposes some methods to choose copulas based on the empirical copula introduced by Deheuvels in [59]. In the case of the Archimedean family, the choice is based on

Kendall's processes, as mentioned in Section 2.5.2. In the general case, the \ell_p norm is used as a distance measure between a considered copula and the empirical copula. Finally, a Bayesian approach for copula selection has been proposed in [108]. However, the authors focus mainly on the bivariate case and do not provide a theoretical Bayesian framework to demonstrate the convergence of their method.


2.5.4 Dealing with Discrete Marginal Distributions

We conclude this chapter by describing how one can deal with applications involving multivariate heterogeneous discrete data. This is not directly within the scope of this thesis, since here we deal with applications where the marginal distributions are continuous. Nevertheless, multivariate copula modelling with discrete margins is applied in many diverse fields, such as biostatistics, marketing and finance, and hence provides a bridge between the proposed data gathering mechanisms and applications related to these domains. The literature on copula modeling of discrete data covers a wide range of copula families and estimation methods. For copula functions with a closed form, MLE is straightforward. The probability mass function (pmf) for L-dimensional data can be computed by taking 2^L finite differences of the copula. This approach involves intense computations and becomes infeasible in high dimensions. Therefore, it is most frequently used with copulas that are fast to compute. For instance, a variation of the Farlie-Gumbel-Morgenstern (FGM) copula family [158] is used in [127], a mixture of max-id copulas in [159], and a copula function based on a finite normal mixture is proposed in [160]. However, these copulas achieve limited dependence: the mixture of max-id copulas is not able to capture negative dependence, whereas the FGM and finite normal mixture copulas capture weak dependence compared to more flexible elliptical copulas. Significant efforts have been reported on applying elliptical copulas, and specifically the Gaussian copula, to discrete data. This is due to the many attractive properties of elliptical copulas, such as their ability to capture a wide range of dependence structures, both positive and negative. However, many elliptical copulas (including the Gaussian copula) cannot be written down in closed form, and hence MLE via finite differences is not easy to compute. Instead, likelihood-based estimation requires integration of the Gaussian copula density over a rectangle, which can be evaluated either numerically or by simulation techniques [92,113]. Moreover, Bayesian methods provide another alternative for estimation in elliptical copula-based models (for further information, we refer the reader to [171] for the Gaussian copula and to [197] for the skewed t-copula). In general, both frequentist and Bayesian techniques are computationally intensive, and

may not scale easily to higher dimensions. Finally, estimation can be based on composite likelihood estimation techniques [214]. While such methods are faster, they are not as statistically efficient as MLE.


Chapter 3

Data Gathering via Multiterminal Coding and Copula Regression

3.1 Introduction

Figure 1.7 presents the popularity scores of several emerging IoT applications in a wide span of disciplines, ranging from smart cities and smart water systems to smart metering, home automation, and smart farming and agriculture. It can be observed that application scenarios related to smart homes and smart buildings have attracted considerable interest during the last two years. Thus, in this chapter1, we address use cases connected with these domains, such as ambient-condition monitoring in indoor environments (e.g., temperature and humidity monitoring in data centers). Typically, these scenarios involve wireless sensor networks of small-to-medium scale, where wireless devices collect diverse correlated information, such as light, pressure, temperature, or humidity data, process it, and then transmit it to central nodes for storage and/or further processing [15]. This information is then uploaded to the cloud by integrating WSNs into the generic internet infrastructure via the 6LoWPAN/IPv6

1The study presented in this chapter is partially based on [235].

standard [196].

The main scope of this study is to provide an efficient and secure way for gathering the sensed information (by sensor nodes) at the sink. As mentioned in Chapter 1, data aggregation and recovery mechanisms for WSNs face specific problems and challenges. In particular, since wireless sensors are typically powered by batteries that cannot easily be changed or recharged, the primary constraint in the design of WSNs is energy consumption. Power savings can be achieved by reducing the radio emission of the sensors, which, therefore, calls for efficient compression of the transmitted data. To this end, the intra- and inter-sensor data dependencies need to be effectively leveraged without increasing the computational effort at the sensor nodes and without requiring inter-sensor communication. The encoding complexity at the sensor nodes should be as low as possible and the computational burden should be shifted towards energy-robust central nodes (e.g., base stations, fusion centers). Moreover, the encoding design needs to be flexible in terms of rate allocation so as to avoid continuous reconfiguration.

Existing works [139,180,217] propose conventional compression algorithms for WSNs, where low-memory differential pulse-code modulation (DPCM) [93] followed by entropy coding is used. Such compression schemes exploit the intra-sensor data dependencies, namely, the dependencies among consecutive samples collected by each sensor. In order to leverage the dependence among data collected by different nodes, the conventional predictive coding paradigm requires that data be exchanged between the nodes, which in turn implies that inter-sensor communication is established. However, this encoding strategy introduces additional radio transmission requirements for the sensors, thereby leading to rapid battery depletion.

An alternative strategy for efficient data compression in WSNs adheres to distributed source coding (DSC), a paradigm that leverages inter-sensor data dependencies at the decoder side. DSC was initiated by Slepian and Wolf [195], who showed that, by separate encoding, two correlated sources can be compressed to their joint entropy with vanishing decoding error probability as the code length goes to infinity. Later, Wyner and Ziv [227] established the rate-distortion bound for lossy compression with decoder side information. They showed that when the source and the side information are jointly Gaussian and the mean-squared error


(MSE) is used as the distortion metric, there is no performance loss incurred by not using the side information at the encoder. Recently, this no-rate-loss property has been extended to the case where the source and the side information are binary and correlated by means of the Z-channel [61]. Berger [23] and Tung [213] introduced the multiterminal (MT) source coding problem, which refers to separate lossy encoding and joint decoding of two (or more) correlated sources. From a theoretical perspective, the problem is shown to be challenging: an achievable rate region for the general MT problem is still unknown, but inner and outer bounds have been devised [23,213]. Theoretical studies have focused on special cases such as the quadratic Gaussian, where Gaussian sources and a quadratic distortion criterion are assumed [220].

Towards practical implementations of DSC for WSNs, a two-sensor Slepian-Wolf (SW) coding scheme for temperature monitoring was deployed in [167], where rate adaptation was achieved by means of an entropy tracking algorithm. An alternative SW design using Raptor codes was proposed in [62,64] for cluster-based WSNs that measure temperature data. Instead of using SW coding as in [62,64,167], the work in [45] devised a Wyner-Ziv (WZ) code construction for WSNs measuring temperature data, which comprised quantization followed by binarization and LDPC encoding. Focusing on the application of wind farm monitoring, an MT code construction [230] was developed to compress wind speed measurements in [201]. Existing DSC designs consider a limited number of sensors (typically two or three), since SW coding for many data sources is difficult to implement in practice. To address this limitation, the authors of [231] and [49] proposed to replace SW coding with entropy coding, thereby obtaining practical code constructions.

3.1.1 Contributions

Prior studies consider the compression of homogeneous data types, such as temperature [62,64,167] or wind speed data [201]. However, many up-to-date applications involve various sensors of heterogeneous modalities measuring diverse yet correlated data (e.g., temperature, humidity, light). In this study, we propose a novel MT source coding scheme that achieves efficient compression by leveraging the dependencies among diverse data types produced by multiple heterogeneous data sources.


Our specific contributions are as follows:

• We propose a novel code design for multisensory WSNs, where both intra- and inter-sensor data dependencies are exploited via DPCM and MT source coding, respectively. The proposed system combines the merits of conventional predictive coding [139,180,217], where only intra-sensor data dependencies are leveraged, and of DSC systems [49,62,64,201,231], which leverage only inter-sensor dependencies.

• The proposed design is characterized by (i) lightweight encoding, as it applies DPCM to utilize the intra-sensor data dependencies instead of complex vector quantization or trellis-coded quantization (TCQ) as in other works [138]; (ii) optimized compression performance, as it deploys a scalar Lloyd-Max quantizer at each encoder instead of the simple uniform scalar quantizer (USQ) used in other schemes [49]; and (iii) flexibility, since, contrary to classical DSC systems [84,86,132,172,199,229,230], no system reconfiguration is required when a subset of sensors is not functional.

• Previous studies [45,49,167] have focused on WSNs collecting homogeneous data and have used a multivariate normal or Laplace distribution to describe the inter-sensor data dependencies. However, in this work, we exploit the data structure among multiple sensors that collect data of different types. In order to accurately express the symmetric and asymmetric dependencies across diverse data sources, we propose the use of statistical models based on copula functions2 [158,193]. To this end, we use well-known Elliptical copulas, such as the normal and Student's t copulas, as well as the Clayton copula, which belongs to the Archimedean copula family. We show that copula functions capture the dependencies among heterogeneous data sources more accurately than the conventional multivariate modeling approach.

• The proposed system embodies a copula regression method to leverage the inter-sensor data dependencies at the decoder. In contrast to alternative copula

2Despite their long history in econometrics, copula functions have only recently been explored in signal processing [62, 110, 156].


regression approaches [161,170], the proposed algorithm provides for accurate inference at a reasonable complexity.

• The proposed coding scheme is evaluated using real sensor measurements taken from the well-established Intel-Berkeley database [136].

The remainder of the chapter is organized as follows: Section 3.2 gives a brief description of MT source coding without SW compression. Section 3.3 presents the proposed coding design, whereas Section 3.4 elaborates on the proposed semi-parametric copula regression. Experimental results are provided in Section 3.5. Section 3.6 draws the conclusions of the work.

3.2 Background on MT Source Coding

We consider a WSN comprising L sensors that collect data produced by correlated sources X_1, X_2, \ldots, X_L, which take values from L continuous alphabets \mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_L and are drawn i.i.d. according to the joint pdf f_{\mathbf{X}}(x_1, x_2, \ldots, x_L). Each sensor, indexed by l \in \mathcal{I}_L = \{1, 2, \ldots, L\}, gathers a sequence of n source samples and forms a data block \mathbf{x}_l = [x_l(1), x_l(2), \ldots, x_l(n)]. The data are encoded using L separate encoding functions

\phi_l : \mathcal{X}_l^n \rightarrow \{1, \ldots, 2^{nR_l}\}, \quad l \in \mathcal{I}_L,   (3.1)

where each \phi_l compresses the source block \mathbf{x}_l at rate R_l by assigning to it a discrete index \phi_l(\mathbf{x}_l). The joint decoder is a function

\theta : \{1, \ldots, 2^{nR_1}\} \times \cdots \times \{1, \ldots, 2^{nR_L}\} \rightarrow \hat{\mathcal{X}}_1^n \times \cdots \times \hat{\mathcal{X}}_L^n,   (3.2)

that reconstructs the data blocks of all sensors, [\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2, \ldots, \hat{\mathbf{x}}_L], based on the observed index tuple [\phi_1(\mathbf{x}_1), \phi_2(\mathbf{x}_2), \ldots, \phi_L(\mathbf{x}_L)]. Let d_l(\cdot) be a distortion measure for sensor l, defined as

d_l : \mathcal{X}_l \times \hat{\mathcal{X}}_l \rightarrow \mathbb{R}_+.

Given a distortion tuple \mathbf{D} = [D_1, D_2, \ldots, D_L], the rate tuple \mathbf{R} = [R_1, R_2, \ldots, R_L] is achievable if, for any \epsilon > 0, there exist a large enough n, L source encoder


functions φl, and a decoder function θ such that the distortion constraint

\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big[d_l\big(x_l(i), \hat{x}_l(i)\big)\big] \leq D_l + \epsilon

be satisfied for each l \in \mathcal{I}_L. The achievable rate-distortion region \mathcal{R}^*(\mathbf{D}) is the convex hull of all achievable rate tuples \mathbf{R}.

Two code designs for the two-terminal Gaussian MT problem are proposed in [230], where TCQ is combined with SW coding. In the first scheme, labeled as asymmetric SW Coded Quantization (SWCQ), each data block of one source, say X_1, is quantized and entropy encoded so as to act as side information to encode the corresponding data block of X_2 by means of WZ coding. Then the decoded information is linearly combined to produce side information that is used to further refine X_1. Finally, the reconstructed data blocks, denoted by \tilde{\mathbf{x}}_1 and \tilde{\mathbf{x}}_2, respectively, are passed to a linear estimator that yields the final decoded estimates [\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2]. In the second scheme [230], referred to as symmetric SWCQ, the data blocks produced by both sources are quantized and compressed using symmetric SW coding, based on the concept of channel code partitioning. At the decoder, symmetric SW decoding is followed by inverse quantization to reconstruct the two blocks. Similarly to asymmetric SWCQ, a linear estimation step is finally applied. However, extending the designs in [230] to multiple sources is challenging, as practical SW coding based on channel codes becomes difficult to implement. To increase flexibility at the expense of compression performance, the authors of [231] studied the specific MT source coding scenario where SW coding is replaced with simple entropy coding. A practical realization of this scheme is presented in [49], where L sensors monitor homogeneous data types. Each encoder performs USQ followed by arithmetic entropy encoding. At the decoder, after recovering the blocks \tilde{\mathbf{x}}_l from all sensors, a second estimation stage is applied, where the dependencies among the sensed data are exploited through Gaussian regression, as explained below.


3.2.1 Gaussian Regression as a Refining Stage

Let the random vector \mathbf{X} = [X_1, \ldots, X_l, \ldots, X_L], which describes the data produced by all sensors at instant i \in \{1, 2, \ldots, n\}, follow a multivariate normal distribution, \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}_X, \boldsymbol{\Sigma}_X), with mean value \boldsymbol{\mu}_X and covariance matrix \boldsymbol{\Sigma}_X. Moreover, let the quantization noise Z_l, which corrupts each component X_l in \mathbf{X}, be additive, independent of X_l, and temporally independent. The dequantized data random variable at instant i from the l-th sensor is given by \tilde{X}_l = X_l + Z_l. The variance of the quantization noise Z_l can be calculated as

\sigma_{Z_l}^2 = \frac{\int_{x_l \in Q[x_l(i)]} \big(\tilde{x}_l(i) - x_l\big)^2 f_{X_l}(x_l)\, dx_l}{\int_{x_l \in Q[x_l(i)]} f_{X_l}(x_l)\, dx_l},   (3.3)

where Q[x_l(i)] and \tilde{x}_l(i) are, respectively, the quantization index and the reconstructed value assigned to x_l(i), and f_{X_l}(x_l) is the marginal pdf of X_l.
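As a hedged illustration of (3.3), the sketch below numerically evaluates the quantization-noise variance for a single quantization cell (a, b) with reconstruction point x_rec; the cell boundaries and the marginal pdf are assumptions supplied by the caller.

from scipy.integrate import quad

def quant_noise_variance(f_marginal, a, b, x_rec):
    # Quantization-noise variance of eq. (3.3) for the cell (a, b)
    # with reconstruction point x_rec and marginal pdf f_marginal.
    num, _ = quad(lambda x: (x_rec - x) ** 2 * f_marginal(x), a, b)
    den, _ = quad(f_marginal, a, b)
    return num / den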

The vectors \mathbf{X} and \tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_l, \ldots, \tilde{X}_L] are assumed to be jointly Gaussian, i.e.,

\begin{bmatrix} \tilde{\mathbf{X}} \\ \mathbf{X} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}_{\tilde{X}} \\ \boldsymbol{\mu}_X \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{\tilde{X}\tilde{X}} & \boldsymbol{\Sigma}_{\tilde{X}X} \\ \boldsymbol{\Sigma}_{\tilde{X}X}^T & \boldsymbol{\Sigma}_{XX} \end{bmatrix} \right),   (3.4)

where \boldsymbol{\Sigma}_{XX} = \boldsymbol{\Sigma}_{X\tilde{X}} = \boldsymbol{\Sigma}_{\tilde{X}X}^T = \boldsymbol{\Sigma}_X, and \boldsymbol{\Sigma}_{\tilde{X}\tilde{X}} = \boldsymbol{\Sigma}_X + \boldsymbol{\Sigma}_Z, with \boldsymbol{\Sigma}_Z being a diagonal matrix with nonzero elements \boldsymbol{\Sigma}_Z(l, l) = \sigma_{Z_l}^2 and (\cdot)^T denoting matrix transpose. Given the dequantized data \tilde{\mathbf{X}}, the final estimate \hat{\mathbf{X}} is given by the conditional mean \boldsymbol{\mu}_{X|\tilde{X}} of \mathbf{X}\,|\,\tilde{\mathbf{X}} \sim \mathcal{N}(\boldsymbol{\mu}_{X|\tilde{X}}, \boldsymbol{\Sigma}_{X|\tilde{X}}), that is, [223]

\hat{\mathbf{X}} = \boldsymbol{\mu}_X + \boldsymbol{\Sigma}_{\tilde{X}X}^T \boldsymbol{\Sigma}_{\tilde{X}\tilde{X}}^{-1} \big(\tilde{\mathbf{X}} - \boldsymbol{\mu}_{\tilde{X}}\big).   (3.5)
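A minimal sketch of the refining step (3.5), assuming zero-mean quantization noise so that \boldsymbol{\mu}_{\tilde{X}} = \boldsymbol{\mu}_X; all inputs (mean, covariance, per-sensor noise variances) are assumed to be estimated elsewhere.

import numpy as np

def gaussian_refine(x_tilde, mu_x, Sigma_X, sigma2_Z):
    # Conditional-mean estimate of eq. (3.5):
    # x_hat = mu_X + Sigma_X (Sigma_X + Sigma_Z)^{-1} (x_tilde - mu_X),
    # using Sigma_X~X = Sigma_X and Sigma_X~X~ = Sigma_X + Sigma_Z.
    Sigma_Z = np.diag(sigma2_Z)       # diagonal quantization-noise covariance
    gain = Sigma_X @ np.linalg.inv(Sigma_X + Sigma_Z)
    return mu_x + gain @ (x_tilde - mu_x)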

3.3 The Proposed MT Source Code Design

The architecture of the proposed MT source coding system is depicted in Fig. 3.1. Unlike prior studies, we assume that the sensors collect heterogeneous correlated data (e.g., temperature, humidity, and light). Furthermore, contrary to prior studies [49, 231], we consider a design that removes the redundancies between consecutive


[Block diagram: each sensor l (encoder) performs DPCM encoding followed by entropy encoding; the sink (joint decoder) performs entropy decoding and DPCM decoding for each sensor stream, followed by the copula regression stage.]

Figure 3.1: The proposed system architecture.

data samples, which are collected by each sensor and are highly correlated. With the aim to make use of this temporal correlation, each sensor l \in \mathcal{I}_L gathers a block of n readings, \mathbf{x}_l = [x_l(1), \ldots, x_l(n)], and applies DPCM encoding [174], the block diagram of which is shown in Fig. 3.2. The DPCM encoder comprises a linear prediction function and a Lloyd-Max [93] scalar quantization function Q(\cdot). For the l-th sensor, the prediction value for the data sample at instant i is given by

v_l(i) = \sum_{j=1}^{m} a_j\, x_l(i - j),   (3.6)

where m is the memory length of the predictor. The coefficients a_j, j = 1, \ldots, m, are chosen so as to minimize the MSE between x_l(i) and v_l(i), and are estimated by solving the Yule-Walker equations [174]. A Lloyd-Max scalar quantizer with M reconstruction levels is used to quantize the prediction error w_l(i) = x_l(i) - v_l(i). The reconstruction points and the quantization regions of the quantizer are determined during a training period. The l-th DPCM encoder outputs a block \mathbf{q}_l =

[Q[w_l(1)], Q[w_l(2)], \ldots, Q[w_l(n)]] containing the quantization indices of the prediction errors \mathbf{w}_l = [w_l(1), w_l(2), \ldots, w_l(n)]. The DPCM block \mathbf{q}_l is then arithmetic entropy encoded [225] at rate R_l = \frac{1}{n} H(\mathbf{q}_l) bits/sample. Arithmetic coding is very efficient, allowing for a compression rate that is very close to the empirical entropy. At the joint decoder, the bitstream received from each sensor l \in \mathcal{I}_L is arithmetic entropy decoded, producing the quantization indices \mathbf{q}_l. The reconstructed values \tilde{\mathbf{w}}_l = [\tilde{w}_l(1), \tilde{w}_l(2), \ldots, \tilde{w}_l(n)] of the prediction errors are then calculated via inverse quantization. Subsequently, the decoder applies DPCM decoding per sensor to estimate the source blocks \tilde{\mathbf{x}}_l = [\tilde{x}_l(1), \tilde{x}_l(2), \ldots, \tilde{x}_l(n)] for all sensors. Upon reconstructing the source blocks, the joint decoder performs an additional estimation stage, where the dependence structure among the heterogeneous data collected by the various sensors is exploited. To express the joint statistics among diverse correlated data, the proposed system adheres to a modeling approach based on copula functions [158]. Namely, the vector

\tilde{\mathbf{x}}(i) = [\tilde{x}_1(i), \tilde{x}_2(i), \ldots, \tilde{x}_L(i)],



Figure 3.2: DPCM encoder of memory length m for sensor l.

Figure 3.3: Proposed DPCM decoder of memory length m for sensor l.

containing the dequantized values from all sensors at instant i, is passed to the proposed copula regression algorithm that outputs a refined version, denoted by

\hat{\mathbf{x}}(i) = [\hat{x}_1(i), \hat{x}_2(i), \ldots, \hat{x}_L(i)].

Fig. 3.3 presents the proposed DPCM decoder for each sensor l. The copula regression algorithm that we devise here is described in the next section. The final estimates \hat{\mathbf{x}}(i) are calculated for all i = 1, 2, \ldots, n, and are used to estimate the source blocks \hat{\mathbf{x}}_l = [\hat{x}_l(1), \hat{x}_l(2), \ldots, \hat{x}_l(n)] for all sensors.
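The following Python sketch mirrors the encoder/decoder pair of Figs. 3.2 and 3.3, with a uniform quantizer of width `step` standing in for the trained Lloyd-Max quantizer, and the first m samples assumed to be sent uncoded as a warm-up; both are simplifications for illustration only.

import numpy as np

def dpcm_encode(x, a, step):
    # Closed-loop DPCM encoder: predict from the last m *reconstructed*
    # samples, as in eq. (3.6), and quantize the prediction error.
    m = len(a)
    rec = list(x[:m])                 # warm-up samples sent as-is
    indices = []
    for i in range(m, len(x)):
        v = sum(a[j] * rec[i - 1 - j] for j in range(m))
        q = int(round((x[i] - v) / step))
        indices.append(q)
        rec.append(v + q * step)      # encoder tracks the decoder state
    return indices, list(x[:m])

def dpcm_decode(indices, warmup, a, step):
    # Matching DPCM decoder: the same predictor run on reconstructed samples.
    m = len(a)
    rec = list(warmup)
    for q in indices:
        v = sum(a[j] * rec[-1 - j] for j in range(m))
        rec.append(v + q * step)
    return np.array(rec)

For instance, the temperature predictor derived later in Section 3.5 corresponds to a = [2.7520, -2.6647, 0.9127].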

3.4 The Proposed Semi-Parametric Copula Regression

Focusing on homogeneous data types, previous studies [45,49,167] express the inter-sensor data dependencies using the multivariate normal distribution. This implies that the marginal distributions are assumed normal and that the dependence structure among the data is considered to be linear. However, in Chapter 2, we studied the characteristics of various sensor signals taken from well-established databases, such

as the Intel-Berkeley database and the Environmental Protection Agency database, and we showed that data collected by sensor nodes typically exhibit different statistical traits. Based on the dataset used in the experimental evaluation of this chapter, Figs. 3.4(a) and 3.4(b) depict the marginal pdfs of the data collected by a temperature and a humidity sensor, respectively. Clearly, the margins do not follow a Gaussian distribution. Therefore, we deal with heterogeneous information sources, and these assumptions may be inaccurate due to, for example, variations in signal dimensionality across diverse modalities. In this chapter, we abide by the approach described in Chapter 2 and thus use the statistical model of copula functions [158,193] in order to accurately express the dependencies among diverse data sources. Copula functions can combine heterogeneous sensor data with disparate marginal distributions into a multivariate pdf.

3.4.1 Statistical Modelling for Diverse Sensor Data

Chapter 2 elaborates on the various bivariate and multivariate copula families that exist in the literature [75,83,88,115,158]. The most well-established of them are the Elliptical and the Archimedean copula families. Elliptical copulas provide symmetric expressions and are suitable for applications where the number of variables (i.e., L) increases. Typical examples of Elliptical copulas are the normal copula and the t-copula (alias, Student's copula). It is worth mentioning that Elliptical copulas facilitate the design of regression schemes, since they allow for a non-constant degree of association (i.e., the Spearman rank coefficient) between the response variable and the covariates [170]. Contrary to Elliptical copulas, Archimedean copulas are easily derived and capture a wide range of dependence. However, they lack modeling flexibility as they are usually parameterized by a single parameter. Among others, the Clayton copula is a widely used Archimedean copula, since it has a simple closed-form expression.

Some of the information presented in this section is also described in a more detailed way in Chapter 2. Nevertheless, to ease the understanding of the proposed design, we highlight some important aspects that need to be taken into account. In our system, copula functions are used to model the statistical distribution of the

Figure 3.4: Example marginal pdfs obtained using kernel density estimation for an appropriate smoothing window of bandwidth h_{X_l}. (a) Real-valued temperature data (h_{X_l} = 0.9315), and (b) real-valued humidity data (h_{X_l} = 1.6166).

random vector \mathbf{X} = [X_1, \ldots, X_l, \ldots, X_L] that describes the sensor values at instant i (see Fig. 3.2). Let F_{X_1}(x_1), \ldots, F_{X_L}(x_L) be the continuous marginal distribution functions of the random variables in \mathbf{X}. We describe the statistical dependencies in \mathbf{X} using a copula-based model. Namely, according to Sklar's theorem [158,193], if F_{\mathbf{X}} is the L-dimensional joint distribution of \mathbf{X}, there exists an L-dimensional copula function C : [0,1]^L \rightarrow [0,1] such that

F_{\mathbf{X}}(x_1, \ldots, x_L) = C\big(F_{X_1}(x_1), \ldots, F_{X_L}(x_L)\big).   (3.7)

If the marginal distributions are continuous, an assumption that holds in our case, the copula function is unique. We are interested in having a description of the multivariate pdf of \mathbf{X}. To that end, we use the copula density function, denoted by c\big(F_{X_1}(x_1), \ldots, F_{X_L}(x_L)\big). In particular, we differentiate the expression in (3.7) with respect to u_l = F_{X_l}(x_l), for all l \in \mathcal{I}_L. Thus, the multivariate pdf of the sensor data can be written as:

f_{\mathbf{X}}(x_1, \ldots, x_L) = c\big(F_{X_1}(x_1), \ldots, F_{X_L}(x_L)\big) \prod_{l=1}^{L} f_{X_l}(x_l).   (3.8)

Given the marginal pdfs of the random variables, an appropriate copula function

that best captures the dependencies among the sensor data should be selected. In this work, we consider the multivariate normal, t-, and Clayton copulas. For the mathematical formulas of these copulas, we refer the reader to Chapter 2. Finally, we are interested in deriving the conditional pdf of the random variables \mathbf{X}^{2 \to L} = \{X_2, \ldots, X_L\} given X_1. This can be derived as follows:

f_{\mathbf{X}^{2\to L}|X_1}(x_2, \ldots, x_L \,|\, x_1) = \frac{f_{\mathbf{X}}(x_1, \ldots, x_L)}{f_{X_1}(x_1)} = \frac{c\big(F_{X_1}(x_1), \ldots, F_{X_L}(x_L)\big) \prod_{l=1}^{L} f_{X_l}(x_l)}{f_{X_1}(x_1)} = c\big(F_{X_1}(x_1), \ldots, F_{X_L}(x_L)\big) \prod_{l=2}^{L} f_{X_l}(x_l).   (3.9)

3.4.2 The Proposed Copula Regression Method

The proposed code design embodies a copula regression method that takes as input the reconstructed sensor values \tilde{\mathbf{x}}(i) = [\tilde{x}_1(i), \tilde{x}_2(i), \ldots, \tilde{x}_L(i)] and produces a refined estimate, denoted by \hat{\mathbf{x}}(i) = [\hat{x}_1(i), \hat{x}_2(i), \ldots, \hat{x}_L(i)].

During a training stage, the model parameters, namely, either the correlation matrix \mathbf{R}_g of the normal copula, the correlation matrix \mathbf{R}_t and the degrees of freedom \nu of the t-copula, or the parameter \delta_{Cl} of the Clayton copula, as well as the continuous marginal pdfs and cdfs of the sensor data, are estimated. The correlation matrix \mathbf{R}_g of the normal copula is parametrically estimated using standard Maximum Likelihood Estimation (MLE) [29]. Regarding the t-copula, the correlation matrix \mathbf{R}_t and the degrees-of-freedom parameter \nu are parametrically estimated using approximate MLE [29], where the copula function is fitted by maximizing an objective function that approximates the profile log-likelihood for the degrees-of-freedom parameter. Moreover, the parameter of the Clayton copula is estimated using MLE. The marginal pdfs are non-parametrically estimated using KDE [31]. Specifically, given \tau training samples from sensor l \in \mathcal{I}_L, the KDE estimator is

\hat{f}_{X_l}(x_l) = \frac{1}{\tau h_{X_l}} \sum_{\lambda=1}^{\tau} K\Big(\frac{x_l - x_\lambda}{h_{X_l}}\Big),   (3.10)

where K(\cdot) is the kernel function and h_{X_l} is the bandwidth of the smoothing window. The kernel function is usually chosen to be a smooth unimodal function with a peak at zero. In the literature [31], various kernel functions, such as the Gaussian and the Epanechnikov kernels, have been proposed. Although the Epanechnikov kernel is optimal in the MSE sense [77], the accuracy of the non-parametric estimate depends less on the shape of the kernel function K(\cdot) than on the value of its bandwidth h_{X_l} [219]. For accurate density estimation, an appropriate selection of the bandwidth value is important, since small or large values can lead to under- or over-smoothed estimators, respectively. We have used the asymptotically optimal choice for the bandwidth h_{X_l}. Its calculation is based on the minimization of the mean integrated squared error (MISE) between the density estimator \hat{f} and the true density function f (for more details we refer the reader to [31]). The bandwidth value that minimizes the MISE in an asymptotic sense is given by the following formula [31]:

h_{X_l,\text{opt}} = \left(\frac{\gamma(K)}{n\, \zeta(f)}\right)^{1/5},   (3.11)

where \zeta(f) = \int f''(x)^2\, dx, \gamma(K) = \alpha(K)/\sigma_K^4, and \alpha(K) = \int K^2(z)\, dz; \sigma_K^2 denotes the variance of the kernel function and n the size of the sample. This bandwidth value cannot be used in practice, since it involves the unknown density function f. However, it is useful as it shows how smoothing windows should decrease with the sample size, proportionately to n^{-1/5}, and it quantifies the effect of the curvature of f through the factor \zeta(f) [31]. A practical method for evaluating the optimal bandwidth for Gaussian distributions is given in [31], whereas other methods involve cross-validation [30] or plug-in bandwidths [190]. In MATLAB, some of these methods for optimal bandwidth selection are implemented in the function ksdensity.

Apart from using a fixed bandwidth, KDE can also be modified such that it assigns larger bandwidth values in regions where the data are sparse and smaller values where the data are closely clustered together. This methodology is called adaptive or variable-bandwidth KDE and can prove valuable in multidimensional fitting problems [207]. An interesting study has been proposed in [191], where an optimized kernel density estimate using a Gaussian kernel function

with bandwidths locally adapted to the data is implemented.

The smooth estimate of the corresponding marginal cdf, \hat{F}_{X_l}, is constructed by integrating \hat{f}_{X_l}. That is,

\hat{F}_{X_l}(x_l) = \int_{-\infty}^{x_l} \hat{f}_{X_l}(x)\, dx = \frac{1}{\tau} \sum_{\lambda=1}^{\tau} k\Big(\frac{x_l - x_\lambda}{h_{X_l}}\Big),   (3.12)

where k(x) = \int_{-\infty}^{x} K(q)\, dq.
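A compact sketch of (3.10) and (3.12) with a Gaussian kernel, where the smoothed cdf is obtained by summing Gaussian cdfs; this mirrors, for a scalar query point, what MATLAB's ksdensity provides.

import numpy as np
from scipy.stats import norm

def kde_pdf_cdf(train, h):
    # Gaussian-kernel KDE: returns callables for the marginal pdf (3.10)
    # and the smooth marginal cdf (3.12) built from the training samples.
    train = np.asarray(train, dtype=float)
    pdf = lambda x: np.mean(norm.pdf((x - train) / h)) / h
    cdf = lambda x: np.mean(norm.cdf((x - train) / h))
    return pdf, cdf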

This work focuses on slowly-varying data sources, such as temperature and humidity, which are modelled as stationary processes, namely, random processes whose statistical properties are time independent or do not change for a given period of time [174]. Thus, the entries of the correlation matrices \mathbf{R}_g (for the normal copula) or \mathbf{R}_t (for the t-copula), as well as the parameter of the Clayton copula, are assumed to be constant for a given period of time, and offline estimation is sufficient. A similar approach has been followed in previous works [49,62]. Nevertheless, the use of online estimators for the statistical parameters is left for future research.

Moreover, it is important to mention that, to estimate the statistics of the marginal distributions and the copula function, the considered dataset contains readings of the diverse sources with similar timestamps. However, wireless systems often experience delays in receiving the data. Depending on the duration of the delay, the inter-source statistics may or may not be affected; this depends on how fast the source statistics vary in time. In the considered monitoring scenarios, the source statistics vary slowly in time, and hence the dependence structure among readings with different timestamps does not change significantly. That said, we understand that the performance of the proposed copula-based algorithm is not significantly affected by potential delays.

Once the parameters of the models have been determined, the refined estimates \hat{\mathbf{x}}(i) are calculated based on the proposed regression algorithm. The algorithm refines the reconstructed sensor value \tilde{x}_l(i), corresponding to the l-th sensor, using the other elements of \tilde{\mathbf{x}}(i), which correspond to the other sensors. Let the response variable be X_l and let the variables \{X_\varsigma : \varsigma \in \mathcal{I}_L \setminus \{l\}\} denote the covariates. Existing copula regression methods [161,170] would estimate the values in \hat{\mathbf{x}}(i) using the conditional mean \mathbb{E}[X_l \,|\, X_1, \ldots, X_\varsigma], \varsigma \in \mathcal{I}_L \setminus \{l\}. This approach works well when the number of covariates is small (two or three). However, when the dimensionality is high, as in our system, exact inference can be intractable, thereby requiring computationally expensive Monte Carlo sampling methods [87]. Another approach for predicting the refined estimates in the vector \hat{\mathbf{x}}(i) considers the MLE problem, which can be written as

\hat{x}_l = \arg\max_{x_l} f_{\mathbf{X}^{1\to\varsigma}|X_l}(x_1, \ldots, x_\varsigma \,|\, x_l) = \arg\max_{x_l} \frac{c\big(F_{X_1}(x_1), \ldots, F_{X_\varsigma}(x_\varsigma), F_{X_l}(x_l)\big)\, f_{X_l}(x_l) \prod_{d=1}^{\varsigma} f_{X_d}(x_d)}{f_{X_l}(x_l)}   (3.13)

= \arg\max_{x_l} c\big(F_{X_1}(x_1), \ldots, F_{X_L}(x_L)\big) \prod_{d=1}^{\varsigma} f_{X_d}(x_d) = \arg\max_{x_l} c\big(F_{X_1}(x_1), \ldots, F_{X_L}(x_L)\big),   (3.14)

where f_{\mathbf{X}^{1\to\varsigma}|X_l}(x_1, \ldots, x_\varsigma \,|\, x_l) = \frac{f_{\mathbf{X}}(x_1, \ldots, x_\varsigma, x_l)}{f_{X_l}(x_l)}, \varsigma \in \mathcal{I}_L \setminus \{l\}, and f_{\mathbf{X}}(x_1, \ldots, x_\varsigma, x_l) is given by the expression in Equation (3.8). We solve (3.14) by using an algorithm that delivers accurate inference at reasonable complexity. In particular, the cdf of the response variable, F_{X_l}(x_l), is sampled until the considered copula density c_g(\boldsymbol{\xi}), c_t(\boldsymbol{\eta}), or c^{(Cl)}(\mathbf{u}) is maximized. This is expressed by the following optimization problem:

u_l^* = \arg\max_{u_l = F_{X_l}(x_l)} c\big(u_1, \ldots, u_l, \ldots, u_L\big),   (3.15)

where u_l \in [0, 1] and c(\cdot) is replaced by the expression of the respective copula density function. The solution of (3.15) is found numerically. The objective function in (3.15) is not necessarily concave, meaning that local maxima may appear. We therefore avoid “hill-climbing” search principles [94], which may converge to a local maximum; the algorithm rather performs an exhaustive search over all values of u_l that span the region [0, 1] and finds the global maximum (within the step-size accuracy) of the copula density. The sampling step is chosen to be u_{st} = 0.001 such that we strike a balance between: (a) the decoding complexity level, where

larger step values speed up the optimization process, and (b) the accuracy of the inference, where smaller values provide more meticulous copula sampling. Finally, the refined estimate of the sensor value is given by

\hat{x}_l(i) = \hat{F}_{X_l}^{-1}(u_l^*).   (3.16)

The procedure is described in Algorithm 1. Initially, the algorithm refines the value of the l-th sensor using the other elements of \tilde{\mathbf{x}}(i), which correspond to the dequantized values of the remaining sensors. Subsequently, the algorithm replaces the corresponding dequantized estimate \tilde{x}_l(i) with the refined value \hat{x}_l(i) in \tilde{\mathbf{x}}(i) and continues with refining the value of the next sensor. The same procedure repeats for all unrefined symbols in \tilde{\mathbf{x}}(i), yielding the refined estimate vector \hat{\mathbf{x}}(i). The sensor indices are processed sequentially with increasing values of l. Moreover, as shown in Algorithm 1, the proposed methodology can be straightforwardly adapted to cope with the case where a subset of sensors, denoted by \mathcal{I}_c, is not operating (due to, for example, battery depletion or duty cycling for extending the lifetime of the system3). In this case, the vector \tilde{\mathbf{x}}_e(i) contains only the components of \tilde{\mathbf{x}}(i) that correspond to the effective sensors, which are indexed in the set \mathcal{I}_e = \mathcal{I}_L \setminus \mathcal{I}_c. Furthermore, only the columns and rows of the correlation matrices \mathbf{R}_g (for the normal copula) or \mathbf{R}_t (for the t-copula) that correspond to the effective sensors are kept, and the remaining rows and columns are removed.
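The sketch below captures the core of Algorithm 1 under generic interfaces: the per-sensor marginal cdf and quantile functions and the joint copula density are assumed to be supplied by the training stage, and the grid step matches u_{st} = 0.001.

import numpy as np

def copula_refine(x_tilde, cdfs, quantiles, copula_density, u_step=0.001):
    # Sequential refinement of the dequantized vector x_tilde (Algorithm 1):
    # for each sensor, exhaustively search u_l in (0, 1) maximizing the copula
    # density (eq. (3.15)), then map back via the inverse marginal cdf (3.16).
    L = len(x_tilde)
    u = np.array([cdfs[l](x_tilde[l]) for l in range(L)])
    grid = np.arange(u_step, 1.0, u_step)
    x_hat = np.array(x_tilde, dtype=float)
    for l in range(L):                   # sensors processed in order
        best_c, best_u = -np.inf, u[l]
        for ul in grid:
            u_try = u.copy()
            u_try[l] = ul
            c = copula_density(u_try)
            if c > best_c:
                best_c, best_u = c, ul
        u[l] = best_u                    # refined value replaces the old one
        x_hat[l] = quantiles[l](best_u)
    return x_hat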

3.5 Experiments

We evaluate the proposed system using actual sensor readings from the Intel-Berkeley Research lab database [136]. The database contains data collected from 54 Mica2Dot sensors equipped with weather boards, monitoring diverse physical parameters (that is, humidity, temperature, and light) in an indoor office and laboratory environment. To conduct our experiments, we randomly selected L = 21 sensors from the database (see Fig. 3.5), taken from two different rooms, 11 of which harvest temperature data

3A subset of sensors is periodically turned off during specific time periods.


Algorithm 1: Proposed copula regression algorithm for refining the dequantized sensor values.

1: Inputs: Set of effective sensors \mathcal{I}_e = \{l_1, \ldots, l_{L_e}\}, vector \tilde{\mathbf{x}}_e(i) with dequantized sensor values in \mathcal{I}_e, copula parameters, marginal statistics.
2: Output: Refined estimates \hat{\mathbf{x}}_e(i).
3: Modify \mathbf{R}_g (or \mathbf{R}_t) given the set \mathcal{I}_e = \{l_1, \ldots, l_{L_e}\}.
4: for l \in \mathcal{I}_e do
5:   c_l^* = 0;
6:   u_l^* = 0;
7:   for u_l = 0 \to 1 with step u_{st} do
8:     Create the vector \mathbf{u} = [\hat{F}_{X_{l_1}}(\tilde{x}_{l_1}(i)), \ldots, u_l, \ldots, \hat{F}_{X_{l_{L_e}}}(\tilde{x}_{l_{L_e}}(i))].
9:     if Elliptical copula then
10:      Calculate \boldsymbol{\xi} = \Phi^{-1}(\mathbf{u}) (or \boldsymbol{\eta} = t_\nu^{-1}(\mathbf{u})).
11:      Calculate c_g(\boldsymbol{\xi}) using (2.64) or c_t(\boldsymbol{\eta}) using (2.67).
12:      if c_g(\boldsymbol{\xi}) \geq c_l^* (or c_t(\boldsymbol{\eta}) \geq c_l^*) then
13:        c_l^* = c_g(\boldsymbol{\xi}) (or c_l^* = c_t(\boldsymbol{\eta}));
14:        u_l^* = u_l;
15:      end if
16:    else
17:      Calculate c^{(Cl)}(\mathbf{u}) using (2.77).
18:      if c^{(Cl)}(\mathbf{u}) \geq c_l^* then
19:        c_l^* = c^{(Cl)}(\mathbf{u});
20:        u_l^* = u_l;
21:      end if
22:    end if
23:  end for
24:  Calculate \hat{x}_l(i) = \hat{F}_{X_l}^{-1}(u_l^*).
25:  Replace \tilde{x}_l(i) with \hat{x}_l(i) in \tilde{\mathbf{x}}_e(i).
26: end for
27: Set \hat{\mathbf{x}}_e(i) = \tilde{\mathbf{x}}_e(i).

(in °C) and the other ten collect humidity data (in %)4. The sensor readings from the Intel-Berkeley database exhibit various levels of dependence, such as strong

4We considered sensors measuring data of a single source, which does not have to be the same for all nodes. Namely, each sensor gathers readings of either humidity or temperature and then transmits this information to the sink. In other setups, a sensor can be part of a bigger device that also includes other sensors, i.e., a device that has sensors for monitoring humidity, pressure, temperature, etc. In this case, readings from different sources are encoded and sent separately to the sink. The readings of diverse types belonging to the same device have very strong dependence, and this can significantly enhance the recovery performance of the joint decoder, since the proposed copula-based regression algorithm has the ability to leverage the stronger dependencies among the heterogeneous data sources (see Section 3.5.5). Nevertheless, we should mention that the dependence structure among the multi-sensor devices still plays an important role and has to also be strong. This is because we propose a multivariate copula-based algorithm, where dependencies with other devices should also be exploited.


Figure 3.5: Floor plan with the sensor locations considered in the Intel Berkeley Research lab database [136].

(\rho_{1,3} = 0.9764), medium (\rho_{8,18} = 0.7574), and weak (\rho_{9,14} = 0.5311), where \rho_{l_1,l_2} denotes Spearman's rank correlation coefficient between the data of sensors l_1 and l_2. The collected data were split into a training and an evaluation set, without an overlap between the two. The former, which consisted of the initial 15% of the data, was used to derive the parameters of the proposed coding scheme and the proposed copula-function-based model. Given the training dataset, we derived two different predictors of memory length m = 3: one for the sensors measuring temperature and one for those sensing humidity. The derived predictor coefficients, which led to minimum-MSE predictors [see (3.6)], were found to be a_{t,1} = 2.7520, a_{t,2} = -2.6647, a_{t,3} = 0.9127 for the temperature data, and a_{h,1} = 1.1613, a_{h,2} = -0.0490, a_{h,3} = -0.1124 for the humidity data. Furthermore, following the semi-parametric approach described in Section 3.4, we estimated the correlation matrix

\mathbf{R}_g of the normal copula density, the correlation matrix \mathbf{R}_t and the degrees of freedom \nu for the t-copula density, the parameter \delta_{Cl} of the Clayton copula, as well as the marginal pdfs \hat{f}_{X_l} and cdfs \hat{F}_{X_l} of the sensor values. To compare the proposed modeling approach against the state of the art [49], we also fitted the multivariate Gaussian model on the data; namely, we estimated parametrically5 the mean values

\hat{\mu}_{X_l} and the standard deviations \hat{\sigma}_{X_l} of the marginal distributions for the sensor values, as well as the corresponding covariance matrix \boldsymbol{\Sigma}_X. The estimated parameters

5The parameters were estimated using the MATLAB function fitdist.

91 Chapter 3. Data Gathering via Multiterminal Coding and Copula Regression for the different statistical models are reported in Table 3.1. The degrees-of-freedom parameter of the t-copula function model was found to be ν = 6.6995. Finally, we calculated the parameter of the Clayton copula via ML estimation, which was found to be δCl = 0.5302.

Table 3.1: Mean Values and Standard Deviations for the Multivariate Gaussian Model, as well as the Bandwidth of the Smoothing Window for the Copula Function Model.

              Gaussian Model                  Copula Model
Sensor ID     Mean \hat{\mu}_{X_l}   St. dev. \hat{\sigma}_{X_l}   Bandwidth h_{X_l}
1 (Temp.)     23.2250    4.1648    0.9315
2 (Hum.)      33.9024    7.8912    1.6166
3 (Temp.)     22.6966    3.1606    0.7681
4 (Hum.)      37.4034    5.1743    0.9993
5 (Temp.)     21.8933    2.4164    0.4031
6 (Hum.)      38.5627    5.3227    0.9811
7 (Temp.)     22.2168    2.4620    0.0054
8 (Hum.)      35.3782    5.7556    0.4494
9 (Temp.)     22.0934    2.4939    1.1969
10 (Hum.)     35.9978    5.8755    0.5175
11 (Temp.)    22.2770    3.3557    1.0858
12 (Hum.)     36.7319    6.6400    1.3081
13 (Temp.)    22.0698    2.9606    0.5910
14 (Hum.)     37.3494    5.4444    1.0532
15 (Temp.)    21.4909    2.7720    0.4276
16 (Hum.)     37.5436    5.3022    0.7690
17 (Temp.)    21.0885    2.8714    0.4685
18 (Hum.)     38.4008    4.8944    0.7394
19 (Temp.)    20.5121    2.9205    0.5148
20 (Hum.)     40.0336    5.5524    0.9118
21 (Temp.)    20.8336    2.5813    0.3595

During the training stage, we also configure the Lloyd-Max quantizer and the arithmetic entropy coder that are deployed to compress the data from each sensor in


Figure 3.6: Fitting of non-parametric functions on temperature data collected by the Intel-Berkeley database [136], where the (a) Gaussian, (b) Laplacian, (c) Box, and (d) Epanechnikov kernels have been used.

our system. Specifically, we determine the reconstruction values and the partitions of each quantizer, as well as the source statistics used in the arithmetic coders. During the evaluation stage, the data from each sensor l \in \mathcal{I}_{21} were aggregated into data blocks, each consisting of n = 40 consecutive samples. The block length is chosen to strike a balance between good compression performance and delay. The quantizer of each sensor uses the same number of quantization levels, M, resulting in a stream of k = n \times \log_2 M bits that is passed to each entropy encoder.

3.5.1 Choice of the Appropriate Kernel Function

First, we evaluate the impact of the fitting accuracy when different kernel functions are considered for the non-parametric estimation of the marginal pdfs. Fig. 3.6 depicts the fitting accuracy of different non-parametric distributions on the temperature data collected by sensor 1. The distributions use the Gaussian, the Laplacian, the Box, and the Epanechnikov kernels. Moreover, using Kolmogorov-Smirnov fitting tests, Table 3.3 shows that the fitting accuracy of the different distributions is


Figure 3.7: KDE on temperature data collected by the Intel-Berkeley database [136], where the (a) fixed-bandwidth and (b) variable-bandwidth Gaussian kernels have been considered.

quite similar; this agrees with prior results such as [219], where it has been proven that the choice of the kernel function is less significant than the appropriate choice of the bandwidth of the smoothing window. For the sake of completeness, we also provide Fig. 3.7, which depicts the fitted density functions via KDE when either a fixed or a variable bandwidth [191] is used.

Table 3.2 shows the performance of the proposed system for different kernels. For illustrative purposes, we have assumed the normal copula regression for the refinement stage. The compression performance is expressed in terms of the effective distortion (in MSE), \frac{1}{L_e} \sum_{l \in \mathcal{I}_e} D_l, versus the effective rate (in bits/sample), \frac{1}{L_e} \sum_{l \in \mathcal{I}_e} R_l, for different quantization levels, that is, M = 8, 16, 32, 64, 128, or 256. Here we have assumed that L_e = 21, namely, all considered sensors are active. We see that, for all different kernel types, the effective distortion performance is very similar. Nevertheless, the Gaussian kernel provides slightly better results than the other functions and, hence, is used in our experiments6.

6The performance of the system for adaptive KDE is not tested here and is left for future work.


(a) Square-root rule (32 bins)

(b) Integer rule (16 bins)

Figure 3.8: Normalized histograms based on temperature data collected by the Intel-Berkeley database [136], where (a) square-root, and (b) integer rules have been used for binarization.


Before concluding this section, we provide a comment on the choice of the bin size and how it affects the goodness of fit. The histogram in Fig. 3.6 uses equal-width bins, which is a widely-adopted approach. Determining an optimal number of bins is not an easy task, since strong assumptions on the shape of the actual data distribution have to be made. Fortunately, there exist some guidelines and rules of thumb, such as Sturges' formula, Doane's formula, Scott's rule for normally distributed data, the square-root rule, and the integer rule (see [188]). Using the MATLAB function histogram, we plot various normalized histograms for different binarization methods in Fig. 3.8. We see that, by changing the bin-width, the goodness of fit is affected, and KDE finds the right balance between over- and under-smoothing/fitting. Here we derive the number of bins \mathcal{B} using the following formula:

\mathcal{B} = \left\lceil \frac{x_{\max} - x_{\min}}{\beta} \right\rceil,

where \beta is the bin-width and x_{\max}, x_{\min} are the maximum and minimum values in the considered dataset. The choice of \beta is made such that we cover the data range and reveal the shape of the underlying distribution.
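A small sketch of the bin-count formula above, together with the square-root rule mentioned in the text; both assume equal-width bins.

import numpy as np

def num_bins(x, beta):
    # B = ceil((x_max - x_min) / beta) for a given bin-width beta.
    return int(np.ceil((np.max(x) - np.min(x)) / beta))

def num_bins_sqrt(x):
    # Square-root rule: approximately sqrt(N) bins for N samples.
    return int(np.ceil(np.sqrt(len(x))))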

Table 3.2: Effective rate vs. effective distortion for the Gaussian, Laplacian, Box, and Epanechnikov kernel functions (L_e = 21).

Eff. Rate          Eff. Distortion
(all schemes)      Gaussian   Laplacian   Box      Epanechnikov
1.1161             1.5316     1.5331      1.5334   1.5327
1.8870             0.3306     0.3318      0.3320   0.3311
2.7818             0.1035     0.1041      0.1040   0.1039
3.7408             0.0324     0.0329      0.0329   0.0327
4.7228             0.0057     0.0061      0.0064   0.0059
5.6577             0.0019     0.0023      0.0022   0.0020


Table 3.3: Asymptotic p-values when performing Kolmogorov-Smirnov fitting tests between the actual temperature readings and the sets produced by the non-parametric distribution with different kernels. The significance level is set to 5%.

Kernel Function    Bandwidth   p-value
Gaussian           0.4332      0.3792
Laplacian          0.4332      0.3331
Box                0.4332      0.2917
Epanechnikov       0.4332      0.3553

3.5.2 Performance Evaluation of DPCM

We assess the impact of DPCM (including Lloyd-Max quantization) on the performance of the system. Particularly, we compare the system in [49], which applies USQ followed by arithmetic encoding, against our approach, which deploys DPCM with Lloyd-Max scalar quantization and arithmetic entropy encoding. In both systems, a Gaussian regression step is performed at the decoder after inverse quantization. The comparisons are conducted for two scenarios. In the first, all sensors are active, whereas in the second a random subset of sensors \mathcal{I}_c may not operate, resulting in a setup with L_e effective sensors (see Section 3.4.2).

Figs. 3.9(a) and 3.9(b) illustrate the performance of the compared systems for L_e = 21 and L_e = 12, respectively7. As in Section 3.5.1, the compression performance is expressed in terms of the effective distortion versus the effective rate for different quantization levels. It is clear that the proposed approach leads to a substantially higher compression performance, delivering a significant effective rate reduction of up to 36.64% (when L_e = 21) and 38.35% (when L_e = 12) for a similar distortion level. These results underline the benefit of using a DPCM scheme with an optimized Lloyd-Max quantizer to leverage the intra-sensor dependencies in our setting.

7 Le is a parameter in our system. Without loss of generality, we can consider different values of Le without affecting the overall behaviour of the system since (a) the sensor readings from the Intel-Berkeley database exhibit various levels of dependencies, and (b) the indices of the effective sensors are chosen randomly.
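To make the encoder side concrete, the following minimal single-sensor sketch combines a Lloyd-Max quantizer with first-order closed-loop DPCM. The first-order predictor, the quantizer trained on residuals of past data, and the uncoded first sample are illustrative choices, not a transcription of the thesis implementation.

```python
import numpy as np

def lloyd_max(train, M, iters=100):
    """Design an M-level Lloyd-Max scalar quantizer from training samples."""
    levels = np.quantile(train, (np.arange(M) + 0.5) / M)   # initial codebook
    for _ in range(iters):
        edges = (levels[:-1] + levels[1:]) / 2              # nearest-neighbour partition
        idx = np.digitize(train, edges)
        for m in range(M):                                  # centroid condition
            cell = train[idx == m]
            if cell.size:
                levels[m] = cell.mean()
    return np.sort(levels)

def dpcm_encode(x, levels):
    """First-order closed-loop DPCM; the first sample is sent uncoded."""
    edges = (levels[:-1] + levels[1:]) / 2
    idx, pred = np.empty(len(x) - 1, dtype=int), x[0]
    for n in range(1, len(x)):
        idx[n - 1] = np.digitize(x[n] - pred, edges)        # quantize the residual
        pred += levels[idx[n - 1]]                          # decoder-matched prediction
    return x[0], idx

def dpcm_decode(x0, idx, levels):
    return np.concatenate(([x0], x0 + np.cumsum(levels[idx])))

# Usage: train on residuals of past readings, then encode/decode a new block.
rng = np.random.default_rng(0)
past = 20 + np.cumsum(rng.normal(0, 0.2, 5000))             # slowly-varying training signal
levels = lloyd_max(np.diff(past), M=16)
block = 20 + np.cumsum(rng.normal(0, 0.2, 100))
x0, idx = dpcm_encode(block, levels)
reconstruction = dpcm_decode(x0, idx, levels)
```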


[Figure 3.9 plots: effective distortion (dB) versus effective rate (bits/sample), with curves for "DPCM with Gaussian regression" and "Entropy coding with Gaussian regression"; panel (a) Le = 21, panel (b) Le = 12.]

Figure 3.9: Rate-distortion performance comparison between entropy coding and DPCM when only Gaussian regression is employed at the decoder. The number of effective sensors is (a) Le = 21 and (b) Le = 12.


3.5.3 Performance Evaluation of the Copula Regression Algorithm

We now evaluate the performance improvement achieved by the proposed copula-based regression algorithm. To this end, we remove the DPCM component from the system; namely, the collected data are quantized with the Lloyd-Max quantizer and the quantization indices are entropy encoded. In particular, we compare the following schemes:

(a) the baseline scheme using entropy coding without a refinement stage (i.e., no regression);

(b) the scheme in [49] that combines entropy coding and Gaussian regression;

(c) the proposed scheme that combines entropy coding and normal copula regression;

(d) the proposed scheme that combines entropy coding and t-copula regression; and

(e) the proposed scheme that combines entropy coding and Clayton copula regression.

The effective rate-distortion performance of the system is given in Figs. 3.10(a) and 3.10(b) for Le = 21 and Le = 12, respectively. It is worth observing that the Gaussian regression method in [49] induces a higher distortion of the decoded data compared to the simple case where no regression is applied, especially at low rates. This is because, in the low-rate regime, the vectors $\mathbf{X}$ and $\tilde{\mathbf{X}}$ in (3.4) are not jointly Gaussian and, thus, Gaussian regression leads to poor final estimates. When the rate increases, the assumption of a joint Gaussian distribution becomes more accurate and better estimates are therefore obtained. The proposed copula regression algorithm, however, systematically outperforms the Gaussian regression scheme in [49] for all copula models. More importantly, the improvements grow as the encoding rate decreases: at low rates, where the quantization of the data is coarse, the copula regression schemes offer a significant improvement in reconstruction quality. Table 3.4 presents the average percentage distortion reductions obtained by comparing each of the schemes (a), (c)-(e) with the state-of-the-art scheme (b).


The improvements in distortion reduction refer to the cases Le = 12 and Le = 21. These improvements show that copula-based models express the joint statistics among heterogeneous data more accurately than the multivariate Gaussian model. Furthermore, the t-copula function results in higher modeling accuracy than the normal copula, which is attributed to the ability of the former to better express the dependencies between extreme values [32]. Finally, the best performance is obtained when the Clayton copula is used, because this copula efficiently captures the asymmetric dependencies among sensor data.
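As a toy illustration of the refinement principle, the sketch below performs copula regression in the simplest possible setting: a bivariate normal copula with empirical marginals, fitted by inverting Kendall's tau, and a Monte Carlo estimate of the conditional mean. The actual method of Section 3.4 is multivariate and semi-parametric (KDE marginals) and also supports the t- and Clayton copulas; this snippet only conveys the mechanics.

```python
import numpy as np
from scipy import stats

def ecdf(sample):
    """Empirical CDF mapped strictly inside (0, 1)."""
    s = np.sort(sample)
    return lambda x: (np.searchsorted(s, x, side="right") + 0.5) / (len(s) + 1)

def copula_regress(train_x, train_y, x_obs, n_mc=10_000, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of E[Y | X = x_obs] under a bivariate normal copula."""
    Fx = ecdf(train_x)
    tau, _ = stats.kendalltau(train_x, train_y)
    rho = np.sin(np.pi * tau / 2)                 # Kendall's tau inversion (normal copula)
    z_obs = stats.norm.ppf(Fx(x_obs))             # conditioning value in the latent space
    z = rho * z_obs + np.sqrt(1 - rho**2) * rng.standard_normal(n_mc)
    u = stats.norm.cdf(z)                         # samples from the conditional copula
    return np.quantile(train_y, u).mean()         # map back through Y's quantile function
```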

Table 3.4: Average effective distortion gains (in %) using the system in [49] as a reference.

Regression Type                          Le = 12   Le = 21
No regression [system (a)]               4.96      39.07
Normal copula regression [system (c)]    47.05     57.31
t-copula regression [system (d)]         65.93     75.90
Clayton copula regression [system (e)]   79.22     87.55

3.5.4 Overall Performance of the Proposed System

In this section, we compare the proposed system, configured with DPCM and normal, t-, or Clayton copula regression, against the system in [49]. The results reported in Figs. 3.11(a) and 3.11(b) show that the proposed system significantly outperforms the benchmark. Specifically, the configuration with normal copula regression achieves an average effective distortion reduction of 80.45% (when Le = 21) and 77.99% (when Le = 12). When the t-copula regression method is applied, average effective distortion reductions of 85.03% (when Le = 21) and 81.28% (when Le = 12) are observed. The best performance is achieved when the Clayton copula is used, where the average distortion reductions are 93.23% (when Le = 21) and 92.06% (when Le = 12), respectively. The resulting effective rate gains are the same as in Section 3.5.2, as they are attributed to the DPCM encoder in our system. Contrary to the state-of-the-art system in [49], the proposed code design leverages both the intra-sensor data correlation, by means of DPCM, and the inter-sensor correlation, using copula functions.


[Figure 3.10 plots: effective distortion (dB) versus effective rate (bits/sample) for systems (a)-(e); panel (a) Le = 21, panel (b) Le = 12.]

Figure 3.10: Rate-distortion performance comparison for Gaussian, normal copula, t-copula, and Clayton copula regression employed at the decoder. At the encoder, arithmetic entropy coding is performed for all configurations. The number of effective sensors is (a) Le = 21 and (b) Le = 12.

Moreover, the proposed copula-based approach allows for capturing the dependencies among diverse data more accurately than the multivariate Gaussian model.

3.5.5 Performance Evaluation for Weaker Intra-Sensor Dependence Structure

In the previous experiments, the proposed method was evaluated using actual sensor readings from the Intel-Berkeley database, which are collected once per minute; we refer to this data as dataset A. Due to the high sampling rate, consecutive sensor readings are highly correlated and, in this case, DPCM provides significant rate savings, as shown in Sections 3.5.2 and 3.5.4. In order to assess the performance of the proposed method under weak intra-sensor dependencies, we sub-sample the data in the Intel-Berkeley database, namely, we consider sensor readings collected once every 30 minutes; we refer to them as dataset B. Fig. 3.12 compares the autocorrelation function (ACF) of the temperature readings of sensor 1 for datasets A and B. It is clear that the ACF decays faster for dataset B, as the sampling interval is larger. We compare the performance of the proposed system with DPCM and Clayton copula regression against the state-of-the-art system in [49] for dataset B. Moreover, we include in the comparison the system that applies entropy encoding and Clayton copula regression. The reason for choosing the Clayton copula for regression is that it delivers the best MSE performance, as shown in Sections 3.5.3 and 3.5.4. For dataset B, the Clayton copula parameter was found to be δCl = 0.5870. The effective rate-distortion performance of all schemes is given in Fig. 3.13 for Le = 21. The results reveal that, even when the intra-sensor dependence is weak, the proposed method outperforms the system in [49]. In particular, average rate savings of 1.1493 bits/sample are obtained, whereas the average distortion reductions are 89.65%. The rate gain due to DPCM is smaller than in Section 3.5.4, where dataset A is considered, but still significant. Furthermore, the system with entropy coding and Clayton copula regression delivers better MSE performance than the state-of-the-art system in [49], obtaining an average reduction of 88.03%. Thus, the proposed copula regression algorithm delivers more accurate inference than the Gaussian regression of [49], since copulas can efficiently capture the dependencies among the sensor data.
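For reference, one standard way to obtain a Clayton parameter such as the value of δCl reported above is to invert Kendall's tau, since the Clayton family satisfies τ = δ/(δ + 2). The thesis fits copulas with the semi-parametric procedure of Section 2.5, so the snippet below should be read as an illustration of the parameter's meaning rather than as the exact estimator used.

```python
import numpy as np
from scipy import stats

def clayton_delta(x, y):
    """Clayton parameter via Kendall's tau inversion:
    tau = delta / (delta + 2)  =>  delta = 2 * tau / (1 - tau)."""
    tau, _ = stats.kendalltau(x, y)
    return 2 * tau / (1 - tau)

# Sanity check: delta = 0.5870 corresponds to tau = 0.5870 / 2.5870 = 0.227.
```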


[Figure 3.11 plots: effective distortion (dB) versus effective rate (bits/sample), with curves for "Entropy coding with Gaussian regression", "DPCM with normal copula regression", "DPCM with t-copula regression", and "DPCM with Clayton copula regression"; panel (a) Le = 21, panel (b) Le = 12.]

Figure 3.11: Rate-distortion performance comparison between the state-of-the-art system presented in [49] and the proposed system using DPCM and (normal, t-, and Clayton) copula regression at the decoder. The number of effective sensors is (a) Le = 21 and (b) Le = 12.


[Figure 3.12 plot: ACF versus lags, with curves for 1-minute and 30-minute sampling.]

Figure 3.12: Autocorrelation function calculated for the temperature readings of sensor 1 during the training period, when data sampling is performed every (a) 1 minute or (b) 30 minutes.

[Figure 3.13 plot: effective distortion (dB) versus effective rate (bits/sample), with curves for "DPCM with Clayton copula regression", "Entropy coding with Clayton copula regression", and "Entropy coding with Gaussian regression".]

Figure 3.13: Rate-distortion performance comparison between the proposed system, the system with entropy coding and Clayton copula regression, and the state of the art [49] (dataset B).


[Figure 3.14 diagram: clusters 1-3, each comprising a cluster head (CH) and peripheral nodes, communicating with the base station.]

Figure 3.14: WSN architecture with clusters.

3.5.6 Comparison with DSC for Different WSN Topologies

Finally, we compare our system, which applies DPCM and Clayton copula regression, with a state-of-the-art DSC system [60, 228]. The DSC system in [60, 228] abides by the following WSN topology [62]: the sensors are separated into smaller groups, called clusters, as depicted in Fig. 3.14. Each cluster contains a cluster head (CH) and a number of peripheral nodes (PNs). The data collected from the CH are intra-encoded and communicated to a central node (the decoder), where they serve as side information for decoding the data from the PNs, which are Wyner-Ziv [227] encoded. The readings of each sensor are uniformly quantized and the resulting quantization indices are split into bit-planes. The CH performs arithmetic entropy encoding of the bit-planes sequentially, starting from the most significant one. The PNs perform Slepian-Wolf [195] encoding of the bit-planes using Low-Density Parity-Check Accumulate (LDPCA) codes [215]. A multivariate Gaussian distribution is used to describe the statistical dependencies among the sensor readings of the CH and the PNs.
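The bit-plane splitting performed on the quantization indices can be sketched as follows; the subsequent entropy coding of each plane (arithmetic at the CH, Slepian-Wolf/LDPCA at the PNs) is omitted.

```python
import numpy as np

def bitplanes(indices, n_bits):
    """Split non-negative quantization indices into bit-planes, MSB first."""
    idx = np.asarray(indices, dtype=np.uint32)
    return [((idx >> b) & 1).astype(np.uint8) for b in range(n_bits - 1, -1, -1)]

# Example: 3-bit indices [5, 2, 7] -> planes [1,0,1] (MSB), [0,1,1], [1,0,1] (LSB).
```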


[Figure 3.15 plot: effective distortion (dB) versus effective rate (bits/sample), with curves for the proposed system, single-cluster DSC, and two-cluster DSC.]

Figure 3.15: Rate-distortion performance comparison of the proposed system with single- and two-cluster DSC designs.

We consider two different configurations of this topology. In the first, all 21 sensors form one large cluster with sensor 1 as the CH; we refer to this as the single-cluster topology. In the second, the WSN is divided into two clusters; the first comprises the 11 sensors measuring temperature, whereas the second includes the sensors that monitor humidity. Fig. 3.15 shows that the proposed design significantly outperforms the DSC system for both the single-cluster and the two-cluster topologies. In particular, our system offers average rate savings of 1.2662 bits/sample and 1.0088 bits/sample compared to the single- and two-cluster DSC systems, respectively. The corresponding average effective distortion reductions are 90.14% and 90.25%. Thus, the proposed design efficiently leverages the inter-sensor dependencies among heterogeneous data, i.e., both temperature and humidity. We also see that the two-cluster DSC system outperforms the single-cluster configuration, showing that this DSC design is more suitable for homogeneous data; grouping sensors that measure homogeneous data (i.e., temperature or humidity) is thus required to improve its performance.


3.6 Conclusion

We proposed a novel MT source code design that addresses IoT applications of small-to-medium scale, such as the monitoring of ambient conditions in smart homes and smart buildings. Our design is based on distributed source coding principles and yields significant compression gains compared to the prior art, because it takes into account both the inter- and intra-sensor data dependencies. Firstly, to express the dependence structure among the diverse data types collected by the various sensors (such as humidity or temperature sensors), we proposed the use of multivariate copula functions belonging to the Elliptical and the Archimedean families. Our system provides accurate statistical inference via regression, by means of a proposed algorithm that delivers accurate estimation at a reasonable complexity. Secondly, to leverage the intra-sensor data dependencies, we used a predictive quantization technique, namely, DPCM. We evaluated the performance of the proposed scheme through experimentation, using actual sensor measurements from the well-established Intel-Berkeley database [136]. We summarize our conclusions:

• The use of DPCM with an optimized Lloyd-Max quantizer leverages the intra-sensor dependencies and leads to substantially higher compression performance. We compared the design using DPCM and Gaussian regression as a refining stage to the state-of-the-art design [49], and we showed that a significant effective rate reduction of up to 36.64% (when Le = 21) and 38.35% (when Le = 12) is achieved for a similar distortion level.

• The novel copula regression method used as a refining stage at the joint decoder results in significant distortion reductions. These improvements showed that, when dealing with data of heterogeneous types, copula modeling provides more accurate expressions of the joint statistics than the multivariate Gaussian model. Moreover, the Clayton copula provides the best performance among the considered copula families, because it efficiently captures the asymmetric dependencies among sensor data.

• The proposed system combines the assets of both DPCM and copula regression and provides superior performance against the state-of-the-art design [49]. Notice that the best performance is achieved when the Clayton copula is used, where the average distortion reductions are 93.23% (when Le = 21) and 92.06% (when Le = 12), respectively.

• We showed that the proposed design outperforms the state of the art even when the intra-sensor dependencies are not very strong. In particular, we showed that average rate savings of 1.1493 bits/sample are obtained, whereas the average distortion reductions are 89.65%. As expected, the rate savings due to DPCM are smaller than in the case of strong intra-sensor dependencies, but still significant. Furthermore, the system with entropy coding and Clayton copula regression delivers better MSE performance than the state-of-the-art system in [49], obtaining an average reduction of 88.03%.

• We compared the performance of the proposed scheme to a state-of-the-art DSC system [60, 228] that uses a cluster-based topology and Wyner-Ziv coding. We showed that the proposed design performs better than the DSC system for both the single-cluster and the two-cluster topologies. More importantly, we observed that the two-cluster DSC system, where each cluster contains sensors of the same type, performs better than the single-cluster configuration, showing that this DSC design is more suitable for homogeneous data.

• Finally, the proposed scheme is flexible, since it does not require reconfiguration when a subset of sensors is not operating.

3.7 Discussions

Apart from the contributions of this chapter, we want to discuss some related topics that need considerable attention. The first topic concerns the incorporation of cloud services into the proposed design and their effect on it. The decoding procedure of the proposed multiterminal design can be performed in two different ways: either at the sink node or at a cloud server that communicates with the sink. Regarding the first approach, which is considered in this thesis as well as in the majority of state-of-the-art studies (see Section 3.1), the sink node is assumed to be unlimited in terms of power resources, such that it can perform the computationally expensive decoding procedure. In practice, however, the computational and storage capabilities of the sink are limited. This problem is solved by following the second approach, where decoding is performed at cloud servers with powerful hardware and guaranteed storage of the sensed data in case of equipment failure during operation. However, some important aspects need to be taken into account. In particular, the cloud servers should account for the communication delays based on the timestamps of the sensor data and then update the statistics of the time-varying marginal distributions and copula parameters. In this way, decoding will be based on up-to-date statistical parameters, achieving recovery performance in line with our experimental evaluation. An experimental evaluation that compares the different approaches is left outside the scope of this thesis.

Moreover, let us say a word about complexity at the encoder and the decoder sides. As mentioned in Section 1.1.1 (see also Fig. 1.2), each of the modules in a sensor node has a specific role and involves energy consumption that needs to be taken into account when designing a WSN system. In particular, the sensor nodes spend energy on data sensing, data storage (buffering), processing for source and channel coding, and data transmission over the wireless channel. In general, a good encoder for WSN nodes should consider two aspects: lightweight encoding and efficient compression. These aspects are typically conflicting: to achieve very good compression performance, complex algorithms need to be used. State-of-the-art schemes for multisensor setups attempt to provide efficient compression mechanisms that rely on arithmetic coding [49], an efficient technique that compresses data close to the source entropy. Compared to plain entropy coding, the proposed encoding scheme, which uses DPCM followed by arithmetic entropy coding, has marginally higher complexity. However, since we deal with slowly-varying data sources, the subtraction of adjacent values reduces the dynamic range over which compression is applied [174]; therefore, the data rate achieved by the proposed design is significantly reduced, as confirmed in our experiments, and as such it significantly reduces the transmission power required by the sensors. Entropy coding based on the Huffman technique [206] has a computational burden similar to that of distributed source coding schemes; however, both approaches involve more complex encoding procedures than arithmetic entropy coding.


[Figure 3.16 bar chart: current savings (%) versus quantization levels M ∈ {8, 16, 32, 64, 128, 256}, for transmitter currents of 120mA (20dBm) and 44mA (14dBm); the savings range from 49.24% down to -11.36%.]

Figure 3.16: Current savings (in %) for different quantization levels M when the transmitter requires (a) 120mA (power level 20dBm) or (b) 44mA (power level 14dBm) for sending a packet with a 12-byte payload.

In general, DSC-based designs vary in complexity depending on the type of quantization (e.g., uniform scalar or TCQ) and/or the channel code (e.g., LDPC [132], Turbo [133], or Raptor codes [62]). Moreover, some DSC designs allow for distributed joint source-channel coding [229] that, depending on the type of application, can provide additional reductions in the data rates of the system. To show the impact of the proposed DPCM encoding procedure on the energy resources of a sensor node, we performed the following experiment. Firstly, we varied the number of quantization levels M ∈ {8, 16, 32, 64, 128, 256} and calculated the average number of CPU cycles per sensor (over 10³ iterations) required for the compression of 10³ sensor readings⁸. We then computed the corresponding amount of current (in mA) based on the specifications of a Texas Instruments MSP430 device⁹. Secondly, relying on the compression performance at each quantization level, we computed the number of excess packets that would be transmitted if no compression was applied, and calculated the amount of current (in mA) that would be

8 The encoding algorithm ran on a single-threaded CPU fixed at 2.0 GHz with 512 MB of RAM. Note that although this method provides a high-level overview of the algorithmic complexity (expressed in CPU cycles), it is a broad approximation.
9 Further information can be found at http://www.ti.com/lit/wp/slay015/slay015.pdf

needed for their transmission¹⁰ via Long Range (LoRa) [3], a recent low-power wireless networking protocol specifically designed for IoT architectures [28]. Fig. 3.16 shows that, for different power levels at the transmitter (i.e., 120mA at 20dBm and 44mA at 14dBm), the current savings are high, especially when the number of quantization levels decreases, as the proposed encoding scheme is efficient and lightweight; the gains are smaller when lower distortion is required (i.e., a large value of M). Moreover, we see that when the transmitter's power level decreases (i.e., 44mA at 14dBm) and high reconstruction quality is required (M = 256), the proposed scheme does not provide energy savings but rather burdens the encoder with additional complexity. However, these results are calculated based on a MATLAB implementation of the encoding algorithm; more optimized implementations are expected to yield further energy savings.

At the decoder side, the complexity increases significantly compared to the encoder. This is a deliberate design choice, since the sink does not have power limitations, as opposed to the sensor nodes, where each consumed Joule plays an important role. The decoding complexity depends on the reconstruction algorithm used for the initial estimates (i.e., entropy decoding in [49] and DPCM decoding in the proposed scheme), as well as on the complexity introduced by the refining stage. DPCM and entropy decoding have similar complexity levels, although DPCM decoding can use fewer quantization levels, as explained before. At the refining stage, the different regression methods used to refine the block symbols assign different complexity levels to the considered schemes. The multivariate Gaussian regression used in [49] is a less complex inference method than the proposed semi-parametric copula regression method, because (i) the former has closed-form expressions, and (ii) the latter uses a sampling methodology on the copula density, namely, a procedure that requires considerable processing resources. Nevertheless, the proposed copula regression algorithm provides more accurate inference, as it accounts for the heterogeneity among the data sources.

10 For our calculations, we relied on the LoRa energy consumption calculator available from Semtech at http://www.semtech.com/wireless-rf/rf-transceivers/sx1272/.
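The savings in Fig. 3.16 follow from a simple charge balance: the extra charge spent on encoding is traded against the charge saved by transmitting fewer packets. The sketch below reproduces only the arithmetic of this trade-off; every number in it is hypothetical and not taken from the thesis measurements or the Semtech calculator.

```python
# Toy charge balance behind a "current savings" figure (all values hypothetical).
q_tx_packet = 0.12      # charge to transmit one 12-byte packet
q_cpu_encode = 1.5      # extra charge spent on DPCM + arithmetic coding
packets_plain = 100     # packets needed if readings are sent uncompressed
packets_coded = 55      # packets needed after compression at a given M

q_plain = packets_plain * q_tx_packet
q_coded = packets_coded * q_tx_packet + q_cpu_encode
savings = 100 * (q_plain - q_coded) / q_plain
print(f"current savings: {savings:.1f}%")   # negative when the encoding cost dominates
```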


Chapter 4

Data Gathering via Compressed Sensing with Side Information

The previous chapter introduced a novel data aggregation and reconstruction scheme that is based on MT coding and copula regression and targets WSN setups of small-to-medium scale. The reader could reasonably pose the following question: can this efficient code design be realized in large-scale WSNs? The answer is affirmative: the design can be extended to setups with multiple sensors by adopting a cluster-based topology similar to the one mentioned in Chapter 3. However, this approach has a significant drawback: it does not account for balanced energy consumption in a large-scale network, since it behaves similarly to traditional code designs (see Section 1.2). Namely, the cluster heads that are closer to the sink have to relay the information received from distant cluster-head nodes, thereby leading to faster depletion of their batteries. Moreover, since the clusters would have to be organized in multi-level structures, an extension of this system to large-scale setups would add complexity and would require duty cycling (or reconfiguration) when part of the sensors are not active. As a result, one of its assets, namely flexibility, might be compromised.

To tackle the problem of unbalanced power consumption in the network, this chapter¹ introduces a novel data aggregation and reconstruction mechanism for large-scale networks, which performs compressed sensing of heterogeneous data gathered by a large number of wireless sensor devices within a geographic area. Unlike previous data gathering schemes, which exploit underlying correlations among homogeneous sensor data, the proposed design embodies a novel recovery algorithm, built upon belief-propagation principles [55, 135], that leverages correlated information from multiple heterogeneous signals, called side information. To efficiently capture the statistical dependencies among diverse sensor data, the proposed algorithm makes use of the statistical model of copula functions.

Apart from the proposed Bayesian solution, this chapter introduces an alternative algorithm that deals with use cases where statistical characterizations of the signals are not available. This algorithm relies on the extended version of the $\ell_1$-$\ell_1$ optimization problem [152-154]. Experimentation based on heterogeneous air-pollution sensor measurements from the United States Environmental Protection Agency database [4] showed that both the proposed copula-based design and the design with the algorithm for the extended $\ell_1$-$\ell_1$ problem provide significant improvements in mean-squared-error performance against state-of-the-art schemes using classical compressed sensing, compressed sensing with side information (or reference-based compressed sensing), and distributed compressed sensing, and offer robustness against measurement and communication noise. Moreover, the copula-based design provides the best performance among all designs, meaning that it is the preferred method when statistical descriptions of the signals can be inferred a priori.

The remainder of the chapter is organized as follows: Section 4.1 briefly describes the key challenges met in a large-scale setup, whereas Section 4.2 reviews prior work. Section 4.3 details the proposed data aggregation and recovery architecture. Section 4.4 describes the copula-based statistical model for capturing diverse data types, whereas Section 4.5.2 elaborates on the proposed belief-propagation algorithm. The experimental results are provided in Section 4.7, whereas Section 4.8 draws the conclusions of the work.

1 Part of the work presented in this chapter is published in [233, 234].


4.1 Problem Overview

The technology behind wireless sensor networks and the Internet of Things enables sensing, collecting, and communicating data in urban and rural environments. Such large-scale heterogeneous data collection poses various challenges in view of the limitations in the transmission, computation, and energy resources of the associated wireless devices. Apart from the problem of balanced energy consumption mentioned above, this study takes into account other challenges that are also mentioned in Chapter 1. A prime challenge is the design of data gathering schemes that effectively leverage dependencies among diverse (i.e., heterogeneous) data types. Another key challenge is the minimization of power consumption on sensor devices, which operate under austere limitations in energy resources imposed by small batteries or harvesters. To this end, efficient designs should exploit both intra- and inter-sensor data dependencies without increasing the computational effort at the wireless sensors and without requiring excessive inter-sensor communication. Furthermore, increased power savings can be achieved when sensors communicate over small distances, namely, from neighbor to neighbor, rather than directly to a sink. Therefore, data gathering is usually accomplished through multi-hop wireless transmission [15]. Finally, as information is sent over error-prone wireless channels, robustness against communication noise is another significant design aspect.

4.1.1 Application Scenario

To motivate the application domain considered in this chapter, we refer the reader to Figure 1.7 (Chapter 1), where it can be seen that use cases related to smart cities have attracted considerable interest. This study therefore focuses on the problem of air-quality monitoring, a pernicious problem, especially as a plethora of regions worldwide urbanize rapidly. Recently, the Organization for Economic Co-operation and Development (OECD) reported that, by the year 2050, outdoor air pollution is projected to be the top environmental cause of mortality, ahead of dirty water and lack of sanitation [8]. Human health is just one dimension of the air-pollution problem. Unclean air also affects crop growth and reduces the productivity of agriculture [192]. Moreover, it affects the climate in a complex manner, contributing to the global rise in temperature, a phenomenon known as "global warming". Air-pollution monitoring involves a large number of wireless sensor devices that aggregate huge amounts of heterogeneous data over an urban area. According to the US Environmental Protection Agency (EPA) [4], the data comprise measurements of several air pollutants, such as carbon monoxide (CO), nitrogen dioxide (NO2), and sulfur dioxide (SO2).

4.1.2 Prior Work

Prior studies on data aggregation propose predictive compression techniques for WSNs, such as DPCM followed by entropy coding [180, 217]. Other approaches consider collaborative wavelet transform coding [56] or clustered data aggregation [130]. However, these techniques require excessive transmission of overhead information and, hence, additional encoding complexity due to inefficient handling of abnormal events. An alternative strategy adheres to DSC, a theory that leverages inter-sensor data correlation at the decoder side. DSC is a promising technique for WSNs, as it shifts the computational burden towards the sink node. However, previous works on DSC [45, 62, 63, 200] perform well only for a limited number of sensors, typically two or three. To deal with this problem, Chapter 3 proposed an alternative design based on DSC principles, specifically on multiterminal source coding, that can perform efficient compression for multisensor setups. Although this design provides a very good solution for small-to-medium networks, it cannot offer balanced energy consumption when extended to large-scale setups. Distributed compressed sensing (DCS) [19, 69] is another approach to the data aggregation problem, where random measurements are transmitted from each sensor and the data are jointly recovered at the decoder by leveraging the spatiotemporal correlations. Since the work of Haupt et al. [100, 101], compressed sensing (CS) [39, 66] has been used to devise an efficient technique for data gathering and recovery in large-scale WSNs. The technique is tailored to a multi-hop transmission paradigm and disperses the communication costs over all sensors along a given data gathering route. This balances the power consumption of the sensing devices and extends the lifetime of the sensor network. The work in [142] extends this framework by proposing a data recovery algorithm that combines principal component analysis (PCA) with CS for grid network topologies. Finally, an alternative scheme that considers multi-hop routing is presented in [134], where only the spatial correlation of the data is exploited at the sink node to recover the sensor readings.

4.1.3 Contributions

Prior studies on (distributed) compressed sensing [19, 69, 101, 152] consider signals produced by homogeneous sources, namely, signals of the same type and with identical structural or statistical traits. Many current applications, however, involve various sensors of heterogeneous modalities measuring diverse yet correlated data (e.g., various air pollutants, temperature, humidity, light). In this study, we propose a novel large-scale data aggregation framework that efficiently exploits both spatial and temporal dependencies among diverse data types produced by multiple heterogeneous data sources. Our specific contributions are as follows:

• We propose a new data sensing and recovery method, which builds upon the concept of Bayesian CS [20, 112]. Our algorithm advances over this concept by incorporating multiple side-information signals, gleaned from heterogeneous correlated sources. This is in contrast to prior studies [152-154, 178], which consider signal recovery aided by a single side-information signal.

• Previous CS approaches describe the underlying statistical dependencies among correlated sensor readings using (i) models with common sparse supports [69] or sparse common components [19, 69], (ii) simple additive models [22, 131], or (iii) joint GMMs [178, 179]. In this work, however, we model the underlying correlations among heterogeneous data sources by using copula functions [158, 193]. Copula functions construct multivariate distributions by modeling the marginal distributions and the dependence structure separately. As such, they capture complex dependencies among diverse data more accurately than existing modeling approaches [19, 69]. Furthermore, we propose to explore copula-based graphical models (based on belief propagation [55, 135]) for recovering the heterogeneous signals.


• To provide an efficient recovery solution when the statistical descriptions of the signals (the signal of interest as well as the side-information signals) are not available, we devise a novel recovery algorithm that solves the extended version of the $\ell_1$-$\ell_1$ optimization problem (see Section 4.6). Although this method does not outperform the proposed copula-based method, it outperforms the state-of-the-art solutions.

• Experimental results, using actual air-pollution sensor measurements from the US EPA [4], show that, for a given data rate, the proposed method provides significant reductions in the relative error (see Section 4.7) of the recovered data compared to classical CS [101], CS with side information [152], and DCS [19] methods. Alternately, it reduces the required data rates for a given data reconstruction quality, thereby resulting in less network traffic and prolonged system lifetime.

• We show that the proposed design offers increased robustness against measurement and communication noise compared to classical CS [101], CS with side information [152], and DCS [19] methods, thereby constituting a well-fitted solution for large-scale data monitoring.

4.2 Background

The pioneering work of Kotelnikov, Nyquist, Shannon, and Whittaker provided the theoretical foundations for sampling continuous-time bandlimited signals (for further details, see [122, 162, 189, 222]). They showed that signals provided by various data sources, such as sound, images, or video, can be perfectly reconstructed from uniformly-spaced samples taken at a rate of twice the highest frequency present in the signal of interest; this rate is widely known as the Nyquist rate. In many important and emerging applications, the resulting Nyquist sampling rate is so high that it yields far too many samples. Building devices capable of acquiring samples at such rates may simply be too costly, or even physically impossible. To address these challenges, compressed sensing (CS) has emerged as a new framework for signal acquisition and sensor design.


[Figure 4.1 illustration: $\ell_p$ balls centered at $\mathbf{x}$ growing until they intersect the subspace $A$, for $p = 1, 2, \infty$ and the quasi-norm $p = 1/2$.]

Figure 4.1: Best approximation $\hat{\mathbf{x}}$ of the two-dimensional vector $\mathbf{x}$ by a one-dimensional subspace $A$, using the $\ell_p$ norms with $p = 1, 2, \infty$, and the $\ell_p$ quasi-norm with $p = 1/2$ [71].

CS enables large reductions in the sampling and computation costs for sensing signals that have a sparse or compressible representation. The fundamental idea behind CS is the following: instead of first sampling at a high rate and then compressing the sampled data, we would like to find ways to directly sense the data in a compressed form, at a lower sampling rate. Fundamental studies by Candès, Romberg, and Tao [37-39], and by Donoho [66], showed that a signal having a sparse representation can be exactly recovered from a small set of linear, nonadaptive measurements. This impressive result suggests that it may be possible to sense sparse signals by taking far fewer measurements; hence the name compressed sensing. In the next sections, we briefly explain the fundamentals of compressed sensing theory and mention some extensions, such as CS with side information and distributed compressed sensing.

4.2.1 Compressed Sensing

In the classical CS framework, the signal of interest $\mathbf{x} \in \mathbb{R}^N$ can be written in the form $\mathbf{x} = \Psi\mathbf{s}$, where $\mathbf{s} \in \mathbb{R}^{N_0}$ is its $K$-sparse representation (i.e., $\|\mathbf{s}\|_0 = K$) and $\Psi \in \mathbb{R}^{N \times N_0}$ is an orthonormal or overcomplete basis, called a dictionary. Let $\Phi \in \mathbb{R}^{M \times N}$ be another matrix, called the sensing (or encoding) matrix, such that the measurement matrix $\mathbf{A} = \Phi\Psi$ satisfies either the mutual coherence property (MCP) [67], the restricted isometry property (RIP) [38], or the null space property (NSP) [42]. CS theory states that $\mathbf{x}$ can be recovered using the measurement matrix $\mathbf{A}$ and


$M \ll N$ linear random measurements $\mathbf{y} = \Phi\mathbf{x} = \mathbf{A}\mathbf{s}$ by solving, for example, the Basis Pursuit problem [47]:

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \|\mathbf{s}\|_1 \quad \text{s.t.} \quad \mathbf{A}\mathbf{s} = \mathbf{y}. \qquad (4.1)$$

CS theory states that (4.1) has a unique solution equal to $\mathbf{s}$ whenever the number of measurements $M$ is sufficiently large. In particular, when the entries of the matrix $\mathbf{A}$ are drawn i.i.d. from a Gaussian distribution and the number of measurements satisfies

$$M > 2K \log\frac{N}{K} + \frac{7}{5}K, \qquad (4.2)$$

then it is guaranteed that Basis Pursuit has a unique solution with high probability [42]. This is known to be a tight bound for Gaussian matrices [18].
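As a quick numerical check of (4.2), the snippet below evaluates the bound for the setting used later in Fig. 4.2 ($K = 70$, $N = 1000$):

```python
import numpy as np

K, N = 70, 1000
bound = 2 * K * np.log(N / K) + (7 / 5) * K   # right-hand side of (4.2)
print(int(np.ceil(bound)))                    # -> 471 measurements
```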

Since we deal with the recovery of sparse signals, a first thought could be to use the $\ell_0$-norm instead of the $\ell_1$-norm in Eq. (4.1). However, an objective function of the form $\|\mathbf{s}\|_0$ is non-convex, and hence Problem (4.1) becomes very difficult to solve; in fact, one can show that, for a general matrix $\mathbf{A}$, finding a solution is NP-hard. Therefore, by replacing the $\ell_0$-norm with the $\ell_1$-norm, we transform a computationally intractable problem into a tractable one. However, it may not be immediately obvious why the solutions of the two problems are similar.

There are intuitive reasons to expect that the use of $\ell_1$ minimization will indeed promote sparsity. Consider the following example: we wish to approximate a two-dimensional sparse vector $\mathbf{x} \in \mathbb{R}^2$ using a point in a one-dimensional affine space $A$. If the approximation error is measured via the $\ell_p$ norm, then our task is to find the $\hat{\mathbf{x}} \in A$ that minimizes $\|\mathbf{x} - \hat{\mathbf{x}}\|_p$. The choice of $p$ has a significant impact on the value of the approximation error. Fig. 4.1 provides a geometrical explanation of the problem: finding the closest point in $A$ to $\mathbf{x}$ using an $\ell_p$ norm can be viewed as growing an $\ell_p$ sphere centered on $\mathbf{x}$ until it intersects with $A$. It can be observed that, as $p$ becomes larger, the approximation error tends to spread out evenly among the coefficients, whereas smaller values of $p$ result in a more uneven error distribution; in the latter case, the error tends to be sparse² [57]. This example

2 This problem can be generalized to higher dimensions.


shows that the solution of the $\ell_1$ minimization problem coincides exactly with the solutions of the $\ell_p$ minimization problem for any $p < 1$ and, notably, is sparse.

The use of $\ell_1$ minimization to promote or exploit sparsity has a long history, dating back at least to the work of Beurling on Fourier transform extrapolation from partial observations [26]. Moreover, the $\ell_1$ minimization approach received significant attention in the statistics literature as a method for variable selection in regression, known as the Lasso [209]. The Lasso is also used in compressed sensing for signal recovery in the presence of measurement noise. In particular, to describe the generation of the measurements, we consider the following model:

$$\mathbf{y} = \mathbf{A}\mathbf{s} + \mathbf{z},$$

where $\mathbf{z} \in \mathbb{R}^M$ is an additive noise vector³. The recovery of the sparse representation is done via the Lasso or Basis Pursuit De-Noising (BPDN) problem [47]:

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{s}\|_2^2 + \gamma\|\mathbf{s}\|_1, \qquad (4.3)$$

where $\gamma$ is a parameter that controls the trade-off between sparsity and reconstruction fidelity. Instead of assuming that $\mathbf{s}$ is strictly sparse (i.e., $\|\mathbf{s}\|_0 = K$), several works [20] (including ours) focus on compressible signals, i.e., signals whose coefficients, when sorted in order of decreasing magnitude, decay exponentially.
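For completeness, the sketch below solves (4.3) with the iterative soft-thresholding algorithm (ISTA), one standard solver among many; the problem sizes and the noise level are arbitrary.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, gamma, n_iter=500):
    """Solve the BPDN/Lasso problem (4.3) by iterative soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the data term
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        s = soft(s + A.T @ (y - A @ s) / L, gamma / L)
    return s

# Usage: recover a 10-sparse vector from 80 noisy Gaussian measurements.
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0 / np.sqrt(80), (80, 256))
s_true = np.zeros(256)
s_true[rng.choice(256, 10, replace=False)] = rng.standard_normal(10)
y = A @ s_true + 0.01 * rng.standard_normal(80)
s_hat = ista(A, y, gamma=0.02)
```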

4.2.2 Compressed Sensing with Side Information

In the previous section, we saw that, in a typical CS scenario, we attempt to recover the (structured) target signal $\mathbf{s}$ based on the linear measurements $\mathbf{y}$ and the measurement matrix $\mathbf{A}$. It is important to notice that this recovery is feasible only when the target signal $\mathbf{s}$ is structured, namely, (approximately) sparse or compressible. Suppose that during decoding we also have access to another signal that is similar to the target signal. This occurs when reconstructing sequences of signals (e.g.,

3 Here we assume that the errors $(\mathbf{y} - \mathbf{A}\mathbf{s})$ are generated i.i.d. from a Gaussian distribution, as this is the typical case in the compressed sensing framework [71]; in this case, the $\ell_2$ norm is used for the data-fidelity term. If the errors are assumed to follow a Laplace distribution, the $\ell_2$ norm will still provide an estimate, which can be further improved by replacing the $\ell_2$ norm with the $\ell_1$ norm, since the latter is minimized by the median of the errors, which is the ML estimator when the errors are drawn i.i.d. from a Laplace distribution.

video [117] and estimation problems [43]) or, more importantly, when we have access to prior similar signals, such as in sensor networks [69], multiview cameras [210], and medical imaging [46]. In this case, CS theory can be modified to leverage a signal correlated with the signal of interest, called side information, which is assumed to be available a priori at the decoder in order to aid reconstruction [152-154, 178]. In the theory of CS with side information, the decoder aims at recovering the signal $\mathbf{x}$ based on the measurements $\mathbf{y}$, the matrix $\mathbf{A}$, and a side-information vector, say $\mathbf{w}$, correlated with $\mathbf{s}$. The work in [152-154] provides guarantees for a particular way of integrating side information into CS. This is done by adding to the objective of

Basis Pursuit the $\ell_1$ norm of the difference between the signals $\mathbf{s}$ and $\mathbf{w}$, yielding the so-called $\ell_1$-$\ell_1$ minimization problem:

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \left( \|\mathbf{s}\|_1 + \|\mathbf{s} - \mathbf{w}\|_1 \right) \quad \text{s.t.} \quad \mathbf{A}\mathbf{s} = \mathbf{y}. \qquad (4.4)$$

In [152], the authors also studied the case where the objective function in (4.4) replaces the term $\|\mathbf{s} - \mathbf{w}\|_1$ with $\|\mathbf{s} - \mathbf{w}\|_2$; this is known as $\ell_1$-$\ell_2$ minimization and, as we will see later, it behaves similarly to classical CS. The following theorem (Theorem 3 in [152]) provides a lower bound on the number of measurements such that the $\ell_1$-$\ell_1$ minimization problem has a unique solution with high probability. Before we proceed to the theorem, we present some definitions that are also considered in [152]:

• The Gaussian width (aperture) of a cone $C \subset \mathbb{R}^N$ is defined as $\pi(C) = \mathbb{E}_{\mathbf{g}}\left[\sup\{\mathbf{g}^\top\mathbf{z} : \mathbf{z} \in C \cap B_N(0,1)\}\right]$, where $\mathbf{g} \in \mathbb{R}^N$ is drawn i.i.d. from a standard normal distribution, $\mathbb{E}_{\mathbf{g}}$ denotes the expected value w.r.t. $\mathbf{g}$, and $B_N(0,1)$ denotes the unit $\ell_2$-norm ball in $\mathbb{R}^N$.

• The tangent cone of a convex function $f : \mathbb{R}^N \to \mathbb{R}$ at a point $\mathbf{s}^\star$ is defined as $T_f(\mathbf{s}^\star) = \operatorname{cone}\{\mathbf{s} - \mathbf{s}^\star : f(\mathbf{s}) \le f(\mathbf{s}^\star)\}$, where $\operatorname{cone} C = \{\alpha\mathbf{c} : \alpha > 0, \mathbf{c} \in C\}$ is the cone generated by the set $C$.

• If $\mathbf{s}^\star \in \mathbb{R}^N$ is the vector to reconstruct and $\mathbf{w} \in \mathbb{R}^N$ the side information, and $s^\star(i)$ and $w(i)$ denote the $i$-th elements of $\mathbf{s}^\star$ and $\mathbf{w}$, respectively, then $\bar{h} = |\{i : s^\star(i) > 0,\ s^\star(i) > w(i)\} \cup \{i : s^\star(i) < 0,\ s^\star(i) < w(i)\}|$ and $\bar{\zeta} = |\{i : w(i) \neq s^\star(i) = 0\}| - |\{i : w(i) = s^\star(i) \neq 0\}|$.

N 7 ζˉ π (T (s?))2 2hˉ log + K + . (4.5) f1 ≤ ζˉ 5 2 K + 2  

Namely, if N 7 ζˉ M 2hˉ log + K + + 1, (4.6) ≥ ζˉ 5 2 K + 2   then s? is the unique solution of Problem (4.4), with probability at least

1 1 + exp (λ M)2 , −2 M −  

M where λM is the expected length of a vector in R drawn from a standard normal distribution.

Fig. 4.2 shows the rate of successful reconstruction (alias, success rate) as a function of the number of measurements [152]. Note that successful recovery happens ? ˆs s 2 when k −s? k < 0.01, where ˆs is the solution of the Basis Pursuit problem or the k k2 ` ` minimization problem. It is clear that the ` ` minimization requires less 1 − 1 1 − 1 measurements for successful reconstruction than standard CS. Also, notice that the ` ` minimization problem is also included in the comparison and it can be seen 1 − 2 that its performance is similar to CS [152]. Other studies were presented in [120,168,184,216], where prior information about the sparsity structure of the sparse representation s is used instead of a correlated side-information signal. In [216], although sufficient conditions for exact reconstruc- tion and error bounds were derived, the performance of the algorithm was not fully characterized. A different scheme was proposed in [210], where the authors made the assumption that the prediction error between the side information and the signal is

123 Chapter 4. Data Gathering via Compressed Sensing with Side Information

Rate of successful reconstruction 1.0

0.8 !1-!1 bound (5) CS bound (4) 0.6

0.4 !1-!2 !1-!1 CS !1-!2 bound (7) 0.2

0 0 100 200 300 400 500 600 700 Number of measurements m Figure 2. Reconstruction rate of standard CS (1), - minimization, and Figure 4.2: Reconstruction rate of standard CS, ` ` minimization, and 1 − 1 `1 `2 minimization. The vertical lines are the bounds. For the experiments,− K = 70, N = 1000 and the side information is 28- sparse. Each algorithm was run 50 times, and the success rate is the average value [152]. sparser than the signal itself. Hence, the estimation error was recovered instead of the sparse signal.

4.2.3 Distributed Compressed Sensing

To exploit spatiotemporal dependencies among multiple signals, Duarte et al. [19,69] proposed distributed compressed sensing (DCS). DCS typically assumes a number of sensors measuring signals that are each individually sparse in some basis and also correlated across sensors. Each sensor independently encodes its signal by projecting it onto another, incoherent basis and then transmits just a few of the resulting coefficients to a single collection point. Under the right conditions, a decoder can jointly reconstruct each of the signals precisely. DCS utilises a joint sparsity model to describe both the intra- and inter-sensor dependencies in the network. In particular, N the sensor signals xτ R , τ 1, 2,...,T , can be written as xτ = zc + zτ , where ∈ ∈ { }

T is the number of sensors. The vector zc = Ψszc is common to all sensor signals x , with s = K , whereas the vectors z = Ψs are the unique parts of each τ k zc k0 c τ τ x that all have the same sparsity s = K in the same orthonormal dictionary τ k τ k τ

124 4.2. Background

The sink node obtains M repetitions randomly generates another receives the final weighted sum of the sensors readings, 2 N and transmits, calculates thevalue the weightedφj,1x(1) sum i=1 φj,ix(i) i.e., y(j) = i=1 φj,ix(i). The index randomly. To generates generalize, another each node i generates weighted sum, where j = 1 ! !

1, the procedure is initialized by sensor. Subsequently,1, which randomly nodeand sends2 generates it to node number3. To generalize, eachN node Sink Node Figure 4.3: Multi-hop transmission in a large-scale WSN using CS [100, 134].

N N Ψ R × [19]. The optimization problem takes the form: ∈

T

ˆsall = arg min szc 1 + γτ sτ 1 sall k k k k τ=1 ! (4.7) X s. t. A s = y ext ∙ all all

T T T T T T where sall = s1 ... sT and yall = y1 ... yT are the extended sparse signal

vector and the extended measurements vector, respectively. Aext is the extended sensing matrix given by

A A 0 0 0 1 1 ∙ ∙ ∙ A2 0 A2 0 0  ∙ ∙ ∙ Aext = ...... ,  ......   . . . . .    A 0 0 0 A   T ∙ ∙ ∙ T    (τ) (τ) where Aτ = Φ Ψ, with Φ being the sensing matrix that corresponds to sensor

τ. In general, it should hold that γτ > 0. In our experiments, we assumed that γ = = γ = 1. 1 ∙ ∙ ∙ T

4.2.4 Data Gathering with CS

Instead of following a classical multi-hop scenario, where each node sends to its neighbour both its own information and the relayed information from other nodes, previous works [100, 134] assumed a different approach in which a weighted sum of readings is transmitted. Let x(i) R denote the reading of sensor i 1, 2,...,N . ∈ ∈ { }

125 Chapter 4. Data Gathering via Compressed Sensing with Side Information

As seen in Fig. 4.3, in the approach of [100, 134], the procedure is initialized by sensor 1, which randomly generates a number φj,1 and transmits the value φj,1x(1) to node 2. Subsequently, node 2 randomly generates another number φj,2, calculates 2 the weighted sum i=1 φj,ix(i) and sends it to node 3. To generalize, each node i generates a randomP number φj,i, computes the value φj,ix(i), adds it to the sum of i the previous relayed values and sends n=1 φj,nx(n) to node i + 1. The procedure repeats until node N sends its informationP to the sink node, which receives the final N weighted sum of the sensors readings, i.e., y(j) = i=1 φj,ix(i). The procedure is repeated for M times, each indexed by j = 1,...,MP. Hence, the sink node obtains M weighted sums y(j) M , which can be expressed as { }j=1

y = [φ ... φ ... φ ] x 1 i N ∙ = Φ x, (4.8) ∙ where y = [y(1) . . . y(j) . . . y(M)]T is the vector with the weighted sums (a.k.a. T measurements), φi = [φ1,i . . . φj,i . . . φM,i] is the column vector with the randomly generated numbers at the node i, and x = [x(1) . . . x(i) . . . x(N)]T is the vector with readings from different sensors. The reconstruction of the sensor values x at the sink node can be done via classical CS algorithms, such as OMP [211], CoSaMP [157], AMP [68] or BP-CS [20]. In order to avoid sending the sensing matrix from the sensors to the sink node, the following strategy is adopted: the sink node broadcasts a random seed to the entire network; using this global seed, each sensor generates its own seed based on its id. The sensing coefficients are generated by a pseudo-random number generator that is pre-installed on every sensor. These coefficients can be reproduced at the sink given that the identification numbers of all sensors are known.

4.3 Proposed Architecture based on Compressed Sensing with Side Information

The architecture proposed in [100, 134] exploits the intra-source correlation among the sensor readings using CS. In an air-pollution monitoring setting, if the sensors

126 4.3. Proposed Architecture based on Compressed Sensing with Side Information

same sensor, different source i (L) ) n=1 φj,n xL(n) : XL : ! "

) i φ(2)x (n) Pollutants : n=1 j,n 2 (sources) : X2 : !

i "#(1) L n=1 φj,nx1(n) i) X1 : ! 1, the procedureSensor is initialized id:. Subsequently, by sensor 1, which node 2 randomly generatestoi"# node numberi + 1. The procedureN repeats until

Sink

R !

Reconstructed : signals R !

R !

Figure 4.4: The schema of the proposed system for gathering and reconstruct- ing diverse sensor data. For each source Xl, the multi-hop trans- mission among the sensors takes place for Ml repetitions until the measurements vector yl is fully formed at the sink node.

127 Chapter 4. Data Gathering via Compressed Sensing with Side Information

gather values from different pollutants, alias, sources (e.g., CO, NO2, SO2), the reconstruction of each source is conducted independently, ignoring the underlying inter-source data dependencies. In this work, we propose a novel data gathering and recovery design that leverages both intra- and inter-source data correlations among sensor readings. We consider a large-scale WSN comprising N wireless devices that form a multi- hop route to the sink, as depicted in Fig. 4.4. Each device i 1, 2,...,N ob- ∈ { } serves the correlated sources X1,X2,...,XL that take values in their correspond- ing continuous alphabets , ,..., . We denote as x (i) a reading produced X1 X2 XL l from source X , l 1, 2,...,L and observed at sensor i. A tuple of readings l ∈ { } (i) T x = [x1(i), . . . , xL(i)] is assumed to be drawn i.i.d. according to the joint prob- ability density function (pdf) fX(x1, x2, . . . , xL).

4.3.1 Data Gathering under Noise

During the data aggregation stage, the nodes first acquire the correlated tuples of readings and they proceed to the transmission of the readings of each source separately. Initially, they start transmitting the values of source X1 using the data aggregation mechanism described in Section 4.2.4. When all measurements of source

X1 are gathered at the sink node, we repeat the data gathering procedure for the rest of the sources, namely, X2,...,XL. For each source l 1, 2,...,L , the sink node obtains M weighted sums ∈ { } l y (j) Ml , which can be expressed as { l }j=1

$$\mathbf{y}_l = \left[\boldsymbol{\phi}_1^{(l)} \dots \boldsymbol{\phi}_i^{(l)} \dots \boldsymbol{\phi}_N^{(l)}\right] \cdot \mathbf{x}_l = \Phi^{(l)} \cdot \mathbf{x}_l, \qquad (4.9)$$

where $\mathbf{y}_l = [y_l(1) \dots y_l(j) \dots y_l(M_l)]^\top$ is the vector with the weighted sums (a.k.a. measurements), $\boldsymbol{\phi}_i^{(l)} = [\phi_{1,i}^{(l)} \dots \phi_{j,i}^{(l)} \dots \phi_{M_l,i}^{(l)}]^\top$ is the column vector with the randomly generated numbers at node $i$, grouped in the matrix $\Phi^{(l)}$, and $\mathbf{x}_l = [x_l(1) \dots x_l(i) \dots x_l(N)]^\top$ is the vector with the readings from the different sensors observing source $X_l$.

In data acquisition systems, the value $\phi_{j,i}^{(l)}x_l(i)$ that contributes to the $j$-th measurement and is sent from node $i$ to its neighbor is contaminated with noise (e.g., due to quantization). This noise is usually modelled as additive white Gaussian noise (AWGN) $Z_{\text{meas},j}^{(l)} \sim \mathcal{N}(0, \sigma_{\text{meas},j}^{(l)})$, where $\sigma_{\text{meas},j}^{(l)}$ is the standard deviation. In addition, the value $\phi_{j,i}^{(l)}x_l(i)$ is corrupted by additive white Gaussian transmission noise $Z_{\text{trans},j}^{(l)} \sim \mathcal{N}(0, \sigma_{\text{trans},j}^{(l)})$ with standard deviation $\sigma_{\text{trans},j}^{(l)}$. Hence, the $j$-th measurement received at the sink node is written as

$$y_l(j) = \sum_{i=1}^{N} \phi_{j,i}^{(l)} x_l(i) + \sum_{i=1}^{N} z_{\text{meas},j}^{(l)}(i) + \sum_{i=1}^{N} z_{\text{trans},j}^{(l)}(i) = \sum_{i=1}^{N} \phi_{j,i}^{(l)} x_l(i) + \sum_{i=1}^{N} \left[ z_{\text{meas},j}^{(l)}(i) + z_{\text{trans},j}^{(l)}(i) \right] = \sum_{i=1}^{N} \phi_{j,i}^{(l)} x_l(i) + z_l(j), \qquad (4.10)$$

where $z_{\text{meas},j}^{(l)}(i)$ and $z_{\text{trans},j}^{(l)}(i)$ are i.i.d. realizations of $Z_{\text{meas},j}^{(l)}$ and $Z_{\text{trans},j}^{(l)}$, respectively. Also, in (4.10), $z_l(j)$ denotes the aggregate noise that corrupts the measurement $y_l(j)$ due to sensing and transmission, and is generated from the normal distribution $\mathcal{N}(0, \sigma_{z,j}^{(l)})$ with standard deviation⁴

$$\sigma^{(l)}_{z,j} = \sqrt{N \left[ \left( \sigma^{(l)}_{\text{meas},j} \right)^2 + \left( \sigma^{(l)}_{\text{trans},j} \right)^2 \right]}. \quad (4.11)$$

We denote by $z^{(l)}_{\text{MS}}(j) = \sum_{i=1}^{N} z^{(l)}_{\text{meas},j}(i)$ and $z^{(l)}_{\text{TR}}(j) = \sum_{i=1}^{N} z^{(l)}_{\text{trans},j}(i)$ the total measurement and transmission noise values applied to the $j$-th measurement of source $X_l$. We assume that $z^{(l)}_{\text{MS}}(j)$ and $z^{(l)}_{\text{TR}}(j)$ are drawn from $\mathcal{N}(0, \sigma^{(l)}_{\text{MS},j})$ and $\mathcal{N}(0, \sigma^{(l)}_{\text{TR},j})$, respectively, where $\sigma^{(l)}_{\text{MS},j}$ and $\sigma^{(l)}_{\text{TR},j}$ denote the corresponding standard deviations for the $j$-th measurement of source $X_l$. The measurements in (4.9) are then written as

$$\mathbf{y}_l = \boldsymbol{\Phi}^{(l)} \cdot \mathbf{x}_l + \mathbf{z}_l, \quad (4.12)$$

where $\mathbf{z}_l = [z_l(1) \ldots z_l(j) \ldots z_l(M_l)]^T$ is the aggregate noise vector drawn i.i.d. from $\mathcal{N}(0, \sigma^{(l)}_z)$; hence, the subscript $j$ in (4.11) can be omitted.

^4 This holds since the measurement noise and the transmission noise are assumed to be described by independent random variables.
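To make the gathering model tangible, the following minimal numpy sketch simulates the noisy aggregation of (4.12) for one source, with the aggregate noise standard deviation formed per (4.11). It is an illustration only: the names (simulate_gathering, sigma_meas, sigma_trans) are ours, and a dense Gaussian matrix is used as a stand-in for the pseudo-randomly generated weights.

    import numpy as np

    def simulate_gathering(x_l, M_l, sigma_meas=0.0, sigma_trans=0.0, seed=0):
        """Simulate the noisy compressive gathering of (4.12) for one source.

        x_l        : length-N vector of sensor readings for source X_l
        M_l        : number of weighted sums (measurements) formed at the sink
        sigma_meas : std of the per-node measurement (quantization) noise
        sigma_trans: std of the per-hop transmission noise
        """
        rng = np.random.default_rng(seed)
        N = x_l.size
        Phi = rng.standard_normal((M_l, N))   # randomly generated weights phi_{j,i}
        # Aggregate noise std per (4.11): each of the N nodes contributes
        # independent measurement and transmission noise to every weighted sum.
        sigma_z = np.sqrt(N * (sigma_meas**2 + sigma_trans**2))
        z = sigma_z * rng.standard_normal(M_l)
        y_l = Phi @ x_l + z                   # equation (4.12)
        return y_l, Phi

    # Example: N = 1000 nodes, 300 measurements, moderate noise.
    x = np.random.randn(1000)
    y, Phi = simulate_gathering(x, M_l=300, sigma_meas=0.1, sigma_trans=0.05)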


4.3.2 Successive Data Recovery

Upon receiving the set of measurements from all the sources, the sink node proceeds to the data recovery stage, which deals with reconstructing the sensor readings $\{\mathbf{x}_l\}_{l=1}^{L}$ based on the measurements $\{\mathbf{y}_l\}_{l=1}^{L}$ and the matrices $\{\boldsymbol{\Phi}^{(l)}\}_{l=1}^{L}$. Typically, the signal vectors $\{\mathbf{x}_l\}_{l=1}^{L}$ can be respectively projected on a dictionary $\boldsymbol{\Psi}$, resulting in their representation vectors $\{\mathbf{s}_l = \boldsymbol{\Psi}^{-1} \mathbf{x}_l\}_{l=1}^{L}$. This transformation is used in order to leverage the desired structure of the representation vectors during CS recovery. In our use case, several transform bases, such as the discrete cosine transform

(DCT) basis $\boldsymbol{\Psi}_{\text{DCT}}$, can be used for providing efficient compressible descriptions. Applying classical CS algorithms, such as OMP [211], CoSaMP [157], AMP [68] or BP-CS [20], to independently recover each signal vector $\mathbf{x}_l$, based solely on the measurements $\mathbf{y}_l$ and the matrix $\boldsymbol{\Phi}^{(l)}$, would only leverage the intra-source signal correlation (baseline system). A more sophisticated scheme would consider applying DCS [19, 69] so as to recover the ensemble of signals $\{\mathbf{x}_l\}_{l=1}^{L}$ using the ensemble of measurement vectors $\{\mathbf{y}_l\}_{l=1}^{L}$ and the matrices $\{\boldsymbol{\Phi}^{(l)}\}_{l=1}^{L}$. Indeed, DCS leverages both the inter- and intra-source dependencies by assuming a joint sparse model between the sources. As we will show in our experimental results, DCS does not efficiently capture the underlying dependencies among heterogeneous sources, such as various air pollutants, which have different ranges and marginal statistics. Instead, we introduce a Bayesian CS recovery method that leverages diverse correlated signals as side information by using the statistical model of copula functions [158, 193]. Based on this method, we propose an architecture that performs successive reconstruction of the sensor readings $\{\mathbf{x}_l\}_{l=1}^{L}$. In particular, the joint decoder at the sink node is separated into $L$ recovery stages. At the $l$-th stage, the reconstruction of the representation vector $\mathbf{s}_l$ is done via our proposed recovery algorithm $\mathcal{R}(\mathbf{y}_l, \boldsymbol{\Phi}^{(l)}, \hat{\mathbf{s}}_1, \ldots, \hat{\mathbf{s}}_{l-1})$, which uses the corresponding gathered measurements $\mathbf{y}_l$ and the matrix $\boldsymbol{\Phi}^{(l)}$, as well as all the previously reconstructed representation vectors $\hat{\mathbf{s}}_1, \ldots, \hat{\mathbf{s}}_{l-1}$, as multiple side information. The final estimate of the signal vector is then computed by $\hat{\mathbf{x}}_l = \boldsymbol{\Psi}_{\text{DCT}} \hat{\mathbf{s}}_l$. In Section 4.4, we review some key points on copula functions (for further details see Chapter 2), whereas in Section 4.5.2 we propose the method that deploys belief-propagation based reconstruction.
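The successive recovery architecture then amounts to the driver loop sketched below, under the assumption that recover implements the side-information-aware algorithm $\mathcal{R}(\cdot)$ of Section 4.5.2 (its belief-propagation internals are omitted here) and that the DCT is the sparsifying dictionary; the function name and signature are illustrative.

    import numpy as np
    from scipy.fftpack import idct

    def successive_recovery(ys, Phis, recover):
        """Successively reconstruct L sources, reusing earlier estimates as side info.

        ys     : list of measurement vectors [y_1, ..., y_L]
        Phis   : list of sensing matrices [Phi^(1), ..., Phi^(L)]
        recover: callable implementing R(y_l, Phi_l, s_hat_1, ..., s_hat_{l-1})
        """
        s_hats, x_hats = [], []
        for y_l, Phi_l in zip(ys, Phis):
            # Stage l: all previously reconstructed representations act as side info.
            s_hat = recover(y_l, Phi_l, *s_hats)
            s_hats.append(s_hat)
            # Final signal estimate: x_hat_l = Psi_DCT * s_hat_l (inverse DCT).
            x_hats.append(idct(s_hat, norm='ortho'))
        return x_hats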

In practice, the number of measurements of each source, $M_l$, can be determined by trial-and-error, assuming that the signals are stationary in the sense that their sparsities (not necessarily their sparsity patterns) and dependencies are more or less constant.

The proposed method reduces the number of messages sent over the network, in view of the following argument: in a typical multi-hop scenario, where no compressive data gathering is assumed, the total overall message complexity^5 is $\mathcal{O}(LN^2)$. When applying compressive data aggregation, the corresponding complexity is $\mathcal{O}\left(\sum_l N M_l\right)$, which is lower than the previous figure because $M_l < N$, $\forall l \in \{1, 2, \ldots, L\}$. As we will show in the experimental results, our scheme leads to a systematically lower number of measurements $M_l$, $\forall l$, than state-of-the-art compressive data gathering schemes [134] because it accounts for multiple side information signals at the data recovery stage. It is important to notice that the proposed compressive data gathering design has the following attribute: if some of the measurement symbols are contaminated by noise or are even lost, the decoder can still recover the data at the expense of lower relative-error performance (alternatively, a higher number of transmissions in the network).

4.3.3 Extension to Tree-Based Topologies

In practical settings, designing an extremely long multi-hop routing path via many sensors is difficult. This is due to variations in the density of sensor devices in urban and rural areas. For example, in an urban environment the sensor density is significantly higher, mainly because: (a) non line-of-sight transmissions between sensors in a city area call for shorter inter-sensor distances, and (b) urban regions contain numerous pollution sources and, hence, more refined pollution monitoring is required. To address this issue, our design accounts for the fact that networks are organized in a tree-based structure [15]. In a typical tree-based design, the sink node comprises a number of children nodes, called root nodes, each of which aggregates the sensor values of its assigned subtree. Within each subtree, the proposed data gathering scheme is applied: each parent node waits until receiving the values from its children nodes, adds its own value and then transmits the weighted sum to the next parent

5Here we provide an upper bound of the complexity of each scheme to aid the reader’s intuition.


node. The procedure repeats until the root nodes receive the weighted sums from their children nodes. The root nodes are capable of transmitting information to the sink node via the Internet. Fig. 4.5 depicts the considered topology for some regions in the USA, based on the sensor coordinates from the EPA database.

Figure 4.5: Tree-based network structure based on the sensor coordinates from the EPA database.

The number of sensors in a subtree depends on the distance between neighboring sensors, which in turn is defined by the transmission protocol. In this work, we assume that the transmission in the PHY layer follows a recent low-power wireless networking protocol specifically designed for IoT architectures [28], named Long Range (LoRa) [3]. When the tree-structured topology of the network changes, the decoder can still perform successful recovery. The reasoning is the following: in a practical setup, when a sensor sends its information (i.e., the weighted sum) to a neighbour, the packet also includes the sensor id. Therefore, the sink receives the measurement symbol (i.e., the weighted sum from the N-th node) along with the sensor ids that contributed to the calculation of the specific symbol. The latter means that the sink knows exactly

which coefficients of the sensing matrix are used for this calculation. In case of a topology change, the receiver will receive the new list of sensor ids that contributed to the measurement symbol and will assign the corresponding coefficients of the sensing matrix to perform successful recovery.

Figure 4.6: Fitting of known distributions (Laplace, Cauchy and KDE with a Gaussian kernel) on DCT-transformed CO readings (EPA database [4]).
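One way to realize the sensor-id bookkeeping described above is to derive every sensing coefficient pseudo-randomly from the pair (sensor id, measurement index), so that the sink can regenerate exactly the matrix entries that participated in a received symbol. The sketch below is a hypothetical realization; the thesis does not prescribe a specific PRNG construction.

    import numpy as np

    def coefficient(sensor_id, j, seed=1234):
        """Regenerate the pseudo-random weight phi_{j,i} from (sensor id, row index)."""
        rng = np.random.default_rng(hash((seed, sensor_id, j)) % (2**32))
        return rng.standard_normal()

    def row_from_ids(contributing_ids, j, all_ids):
        """Rebuild the j-th row of the sensing matrix at the sink.

        Only sensors whose ids were reported in the packet contribute;
        after a topology change the sink simply uses the new id list.
        """
        return np.array([coefficient(i, j) if i in contributing_ids else 0.0
                         for i in all_ids])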

4.4 Statistical Modelling For Diverse Data

In Chapter 2 we study the statistical behavior of various sensor signals containing readings from diverse data sources and we see that they have highly different ranges and marginal statistics. Nevertheless, these signals are intrinsically correlated and hence an efficient modelling tool is needed to capture the dependence structure. As a result, Chapter 2 proposes the use of copula functions [158, 193] to model the dependencies among these heterogeneous signals. Here we need to slightly deviate from this approach. Specifically, we need to describe the dependence structure in the

sparsifying domain (i.e., the DCT domain) among the representation vectors of these signals, namely, $\{\mathbf{s}_l\}_{l=1}^{L}$.

Figure 4.7: Fitting of known distributions (Laplace, Cauchy and KDE with a Gaussian kernel) on DCT-transformed NO2 readings (EPA database [4]).

First, we study the marginal statistics of the representations. Figs. 4.6, 4.7 and 4.8 show various pdfs fitted on the DCT coefficients of several air pollutants collected in the EPA database [4]. It is clear that the fitting accuracy of different distributions, such as the Laplace, the Cauchy and the non-parametric distribution (obtained via the KDE method), depends on the data type. In particular, using Kolmogorov-Smirnov tests [143], Table 4.1 shows that CO and SO2 are more accurately described by the Cauchy distribution, whereas the Laplace distribution suits the NO2 data. Thus, we realize that the representation vectors are also heterogeneous and, in order to accurately express their dependencies, we need to use copula functions. This is because, as mentioned in Chapter 2, copulas allow data from individual sensors to have arbitrary marginal distributions, merging them into a multivariate pdf.

Although Chapter 2 elaborates on the copula model, we mention some of the

key points of this model so as to ease the understanding of the proposed design. Copula functions are used to model the statistical distribution of the random vector $\mathbf{S}^{(i)} = [S_1(i), \ldots, S_l(i), \ldots, S_L(i)]^T$, where $S_l(i)$ denotes the random variable that describes the DCT coefficients from the source $X_l$, observed from sensor $i$.^6 We assume that the realizations of the vector $\mathbf{S}^{(i)}$, namely, $\mathbf{s}^{(i)}$, are drawn i.i.d. from their joint pdf $f_{\mathbf{S}^{(i)}}(s_1, \ldots, s_L)$, which remains the same for all sensors $i \in \{1, 2, \ldots, N\}$. Given this assumption, in the rest of the description we omit the sensor index $i$. Although some of the following information has already been mentioned in Chapter 2, we find it useful to describe some key aspects required for the understanding of this gathering scheme.

Figure 4.8: Fitting of known distributions (Laplace, Cauchy and KDE with a Gaussian kernel) on DCT-transformed SO2 readings (EPA database [4]).

Let $F_{S_1}(s_1), \ldots, F_{S_l}(s_l), \ldots, F_{S_L}(s_L)$ be the marginal cumulative distribution functions (cdfs) of the random variables in $\mathbf{S}$. As explained in Chapter 2, Sklar's theorem [193] states that a unique $L$-dimensional copula function $C : [0,1]^L \to [0,1]$

^6 Section 4.7.1 explains why we use the DCT as sparsifying basis.


Table 4.1: Asymptotic p-values when performing Kolmogorov-Smirnov fitting tests between the DCT coefficients of the actual datasets and the sets produced by the Laplace, the Cauchy and the non-parametric distribution with Gaussian kernel. The significance level is set to 5%.

        Laplace   Cauchy   KDE
CO      0.0031    0.6028   9.4218 × 10^-20
NO2     0.5432    0.1441   2.2777 × 10^-21
SO2     0.0471    0.9672   1.0626 × 10^-21

exists such that

$$F_{\mathbf{S}}(s_1, \ldots, s_l, \ldots, s_L) = C\big(u_1, \ldots, u_l, \ldots, u_L\big), \quad (4.13)$$

where $F_{\mathbf{S}}$ is the $L$-dimensional joint distribution of $\mathbf{S}$ and $u_l = F_{S_l}(s_l)$. Differentiation of the expression in (4.13) with respect to $s_l$, for all $l \in \{1, 2, \ldots, L\}$, yields the multivariate pdf of the DCT coefficients

$$f_{\mathbf{S}}(s_1, \ldots, s_L) = c\big(u_1, \ldots, u_l, \ldots, u_L\big) \times \prod_{l=1}^{L} f_{S_l}(s_l), \quad (4.14)$$

where $c(\cdot)$ is the copula density and $f_{S_l}(s_l)$ the marginal pdf of $S_l$. Given the marginal pdfs of the random variables, an appropriate copula function that best captures the dependencies among the data should be selected. In this chapter, we consider various copulas, such as the multivariate Gaussian and Student's t copulas from the elliptical family, as well as the Frank, the Clayton and the Gumbel copulas from the Archimedean family. More information about these functions is provided in Chapter 2. For the marginal statistics we abide by parametric approaches. In particular, based on Kolmogorov-Smirnov fitting tests on various pollutants (see Table 4.1), we have considered the following pdfs to describe the DCT coefficients of the data:

1. the Laplace distribution $\mathcal{L}(\mu_s, b_s)$:

$$f_{S_l}(s_l; b_s) = \frac{1}{2 b_s} \exp\left( -\frac{|s_l - \mu_s|}{b_s} \right), \quad (4.15)$$

136 4.5. Copula-based Belief Propagation Recovery

where $b_s$ is the scale parameter and $\mu_s$ is the mean value.

2. the Cauchy (or Lorentz) distribution

$$f_{S_l}(s_l; \alpha_s, \beta_s) = \left( \pi \beta_s \left[ 1 + \left( \frac{s_l - \alpha_s}{\beta_s} \right)^2 \right] \right)^{-1}, \quad (4.16)$$

where $\beta_s$ is the scale parameter that specifies the half-width at half-maximum, and $\alpha_s$ is the location parameter.

Other univariate distributions for the marginal statistics of DCT coefficients have been proposed in [78, 124].
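For concreteness, the following sketch evaluates the copula factorization (4.14) for a bivariate example that pairs a Cauchy marginal (CO-like coefficients) with a Laplace marginal (NO2-like coefficients) under a Gaussian copula. The closed-form Gaussian copula density used here is standard, while the parameter values are placeholders rather than the estimates reported in Section 4.7.

    import numpy as np
    from scipy.stats import norm, cauchy, laplace

    def gaussian_copula_density(u, R):
        """Gaussian copula density c(u_1,...,u_L) with correlation matrix R."""
        q = norm.ppf(u)                              # map uniforms to normal quantiles
        Rinv = np.linalg.inv(R)
        quad = q @ (Rinv - np.eye(len(u))) @ q
        return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(R))

    def joint_pdf(s, R):
        """Joint pdf of (4.14) for two heterogeneous DCT coefficients:
        a Cauchy marginal (CO-like) and a Laplace marginal (NO2-like)."""
        marginals = [cauchy(loc=0.0, scale=0.65),    # placeholder parameters
                     laplace(loc=0.0, scale=2.3)]
        u = np.array([m.cdf(x) for m, x in zip(marginals, s)])
        dens = np.prod([m.pdf(x) for m, x in zip(marginals, s)])
        return gaussian_copula_density(u, R) * dens

    R = np.array([[1.0, 0.7], [0.7, 1.0]])
    print(joint_pdf(np.array([0.3, -1.2]), R))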

4.5 Copula-based Belief Propagation Recovery

In many applications, such as large-scale data gathering in WSN setups, we have access to considerably more a priori information on the coefficient structure, in addition to signal sparsity, which previous approaches to CS fail to leverage [41]. By exploiting this prior information, we can make CS better, stronger, and faster. To that end, we abide by a probabilistic, Bayesian approach to sparse signal recovery based on graphical models (GM) [27, 125]. This approach allows for representing the prior information of the representation vectors. In particular, the posterior pdf can be written as follows:

$$f_{\mathbf{S}_l|\mathbf{Y}_l,\mathbf{S}^{l-1}}\big(\mathbf{s}_l \mid \mathbf{y}_l, \mathbf{s}^{l-1}\big) \propto f_{\mathbf{Y}_l|\mathbf{S}_l,\mathbf{S}^{l-1}}\big(\mathbf{y}_l \mid \mathbf{s}_l, \mathbf{s}^{l-1}\big) \cdot f_{\mathbf{S}_l|\mathbf{S}^{l-1}}\big(\mathbf{s}_l \mid \mathbf{s}^{l-1}\big)$$
$$\stackrel{(*)}{=} f_{\mathbf{Y}_l|\mathbf{S}_l}\big(\mathbf{y}_l \mid \mathbf{s}_l\big) \cdot f_{\mathbf{S}_l|\mathbf{S}^{l-1}}\big(\mathbf{s}_l \mid \mathbf{s}^{l-1}\big) \quad (4.17)$$
$$= \prod_{j=1}^{M_l} f_{Y_l|\mathbf{S}_l}\big(y_l(j) \mid \mathbf{s}_l\big) \cdot \prod_{i=1}^{N} f_{S_l|S^{l-1}}\big(s_l(i) \mid s^{l-1}(i)\big), \quad (4.18)$$

where $\mathbf{S}^{l-1} = \{\mathbf{S}_1, \ldots, \mathbf{S}_{l-1}\}$, $\mathbf{s}^{l-1} = \{\mathbf{s}_1, \ldots, \mathbf{s}_{l-1}\}$, $s^{l-1} = \{s_1, \ldots, s_{l-1}\}$, and the equality $(*)$ holds due to the Markov chain $\mathbf{S}^{l-1} \to \mathbf{S}_l \to \mathbf{Y}_l$. Moreover, from (4.17) to (4.18), we used the assumption that the measurement noise at different sensors is independent, and also that each sensor observes independent realizations of the random vector $\mathbf{S}$. Optimal reconstruction via the maximum a posteriori (MAP) or

the minimum mean-squared-error (MMSE) criteria is difficult—or even infeasible—due to the complexity of the posterior distribution in (4.18). To overcome this issue, similarly to [20], we resort to an approximate Bayesian inference technique for CS recovery named loopy belief propagation [55, 135], which allows for rapid sensing and recovery procedures and offers high recovery accuracy [20, 41]. Apart from belief propagation, alternative inference methods have been proposed in the literature, such as Markov chain Monte Carlo methods [208] and variational Bayes algorithms [27, 116]. In Section 4.5.1 we provide a short description of these methods along with their assets and drawbacks. Subsequently, in Section 4.5.2 we describe the proposed copula-based belief-propagation method.

4.5.1 Background on Bayesian Inference

Approximate Bayesian inference algorithms on GMs typically leverage the factorization properties of probability distributions, the likelihood of which can be efficiently computed in a distributed manner by manipulating the intermediate factors [41]. In other words, these algorithms are able to decompose the required computation into calculations that are local to each node in the GM. Markov chain Monte Carlo (MCMC) sampling methods [208] provide computational means of doing approximate inference. The basic idea is to represent the pdf by a discrete set of samples, which is carefully weighted by some evidence likelihood so that the inference is consistent; guarantees typically improve with larger sample sizes. The main drawback of MCMC methods is that they are computationally very intensive, and they also suffer from difficulties in diagnosing convergence [224]. Variational Bayes (VB) methods [27, 116] are another category of approximate inference methods that have been successfully used for a wide range of models, whereas new applications are constantly being explored. The mean-field approximation, which is a well-established VB example, essentially decouples all vertices in a GM, and then introduces a parameter, called a variational parameter, for each vertex. These variational parameters are iteratively updated such that the cross-entropy between the approximate and true probability distributions is minimized. This method facilitates inference in an efficient and principled manner and provides a bound on the marginal likelihood to be calculated. The main drawback

of VB methods is that the equations for optimizing the variational approximation have to be worked out by hand, a process which is both time consuming and error prone [224]. Finally, loopy belief propagation methods provide an alternative solution to approximate Bayesian inference. The sum-product (also known as belief propagation) and max-product (also known as min-sum) algorithms are routinely applied to graphical models with cycles. The main drawback of these methods is that their convergence and correctness guarantees for directed acyclic graphs^7 do not hold in general; they are only guaranteed to converge for tree-structured graphs [125]. Nevertheless, as mentioned in [20], these methods can demonstrate quite promising performance when CS-LDPC [20] matrices are used to reduce the number of loops and message encoding accounts for damping (message damped belief propagation, a.k.a. MDBP) [173]. More importantly, loopy belief propagation allows for using sparse sensing matrices and can therefore accelerate the encoding and decoding procedures [41].

4.5.2 Description of the proposed recovery algorithm

We now describe the proposed belief-propagation algorithm, which modifies the algorithm in [20] such that the recovery of the target signal $\mathbf{s}_l$ is based not only on the measurements $\mathbf{y}_l$, but also on the previously reconstructed signals $\hat{\mathbf{s}}_1, \ldots, \hat{\mathbf{s}}_{l-1}$. The sensing matrix is given by $\boldsymbol{\Phi}^{(l)} = \boldsymbol{\Theta}^{(l)} \boldsymbol{\Psi}^{T}_{\text{DCT}}$, where $\boldsymbol{\Theta}^{(l)}$ is a Rademacher matrix [20] and $\boldsymbol{\Psi}_{\text{DCT}}$ is the DCT matrix. The measurements are then created by $\mathbf{y}_l = \boldsymbol{\Phi}^{(l)} \mathbf{x}_l = \boldsymbol{\Theta}^{(l)} \boldsymbol{\Psi}^{T}_{\text{DCT}} (\boldsymbol{\Psi}_{\text{DCT}} \mathbf{s}_l) = \boldsymbol{\Theta}^{(l)} \mathbf{s}_l$. Similarly to LDPC matrices [85], Rademacher matrices are sparse with non-zero entries equal to $\{-1, +1\}$; thus, contrary to dense sub-Gaussian CS encoding matrices [39, 66], they allow for faster encoding and decoding procedures and they require less storage space on the sensor nodes. As in [20], the row weight ($\ell$) and the column weight ($\rho$) of $\boldsymbol{\Theta}^{(l)}$ are assumed to be constant. In practice, to generate this specific form of $\boldsymbol{\Phi}^{(l)}$ on the sensor nodes, we abide by the description in Section 4.3.2 with some differences: the DCT matrix and the positions of the non-zero entries have to be pre-stored on the sensors. The

7A directed acyclic graph is a directed graph without directed loops. For more information on the graph theory and the specific terminology, we refer the reader to [41].

generation of the Rademacher entries is done based on Section 4.3.2, where a signum function is applied on the pseudo-randomly generated values. A graphical representation of the belief-propagation decoding procedure is given by a Tanner graph, a factor graph that captures the statistical dependencies between the variables (see Fig. 4.9). The nodes in the Tanner graph are separated into:

(a) factor nodes $y_l(j)$, $j = 1, \ldots, M_l$, which carry the measurement symbols (dark-grey squares); (b) variable nodes $s_l(i)$, $i = 1, \ldots, N$, with the signal coefficients (white circles); (c) assistant nodes (light-grey squares), which store soft-decision information for each variable node. An edge occurs when there is a non-zero element in the $\boldsymbol{\Theta}^{(l)}$ matrix and it is associated with a negative or a positive sign. At the $\lambda$-th iteration, a variable node $s_l(i)$ sends a message $q^{(\lambda)}_{i \to j}[s_l(i)]$ to each neighboring factor node $y_l(j)$, and a factor node $y_l(j)$ sends a message $r^{(\lambda)}_{j \to i}[s_l(i)]$ to each neighbor $s_l(i)$ (see Fig. 4.9). To include multiple heterogeneous side information signals, the messages are modified as

$$r^{(\lambda)}_{j \to i}[s_l(i)] = \frac{1}{C^{(\lambda)}_{j \to i}} \sum_{\mathbf{s}^{(j)}_l \setminus i} g_j\big(y_l(j) \mid \mathbf{s}^{(j)}_l\big) \prod_{i' \in \mathcal{N}(j) \setminus i} q^{(\lambda)}_{i' \to j}[s_l(i')], \quad (4.19)$$

$$q^{(\lambda+1)}_{i \to j}[s_l(i)] = \frac{1}{C^{(\lambda+1)}_{i \to j}} \, q^{(0)}_{i \to j}[s_l(i)] \prod_{j' \in \mathcal{M}(i) \setminus j} r^{(\lambda)}_{j' \to i}[s_l(i)], \quad (4.20)$$

where $q^{(0)}_{i \to j}[s_l(i)]$ is the initial message sent from the variable node $s_l(i)$ to the neighboring measurement node $y_l(j)$. In (4.19), $\mathcal{N}(j)$ denotes the set of the neighbors of the measurement node $y_l(j)$, $\mathbf{s}^{(j)}_l$ the variables in $\mathbf{s}_l$ that are neighbors to the factor node $y_l(j)$, and $\mathbf{s}^{(j)}_l \setminus i$ the set of variables in the vector $\mathbf{s}^{(j)}_l$ with the variable that corresponds to $s_l(i)$ excluded. The function $g_j(\cdot)$ is the factor function^8 that applies the constraints imposed by the neighbors of the factor node $y_l(j)$ via the sensing matrix $\boldsymbol{\Phi}^{(l)}$. Moreover, $\mathcal{M}(i)$ is the set of the neighbors of the variable node $s_l(i)$, while the normalization factors $C^{(\lambda)}_{j \to i}$ and $C^{(\lambda+1)}_{i \to j}$ guarantee that $\sum_{s_l(i)} r^{(\lambda)}_{j \to i}[s_l(i)] = 1$ and $\sum_{s_l(i)} q^{(\lambda+1)}_{i \to j}[s_l(i)] = 1$, respectively. Under the proposed

^8 In the noiseless case, $g_j\big(y_l(j) \mid \mathbf{s}^{(j)}_l\big) = \delta\big(y_l(j) - \sum_{i \in \mathcal{N}(j)} \phi^{(l)}_{j,i} s_l(i)\big)$, where $\delta(\cdot)$ denotes a Dirac delta function. In the presence of AWGN, the function $g_j(\cdot)$ is the normal distribution $f_{Z_j}(\cdot)$ of the measurement noise that corresponds to the factor node $j$, i.e., $g_j\big(y_l(j) \mid \mathbf{s}^{(j)}_l\big) = f_{Z_j}\big(y_l(j) - \sum_{i \in \mathcal{N}(j)} \phi^{(l)}_{j,i} s_l(i)\big)$.

$$q^{(0)}_{i \to j}[s_l(i)] = f_{S_l|\mathbf{S}^{l-1}}\big(s_l \mid \mathbf{s}^{l-1} = \hat{\mathbf{s}}^{l-1}\big) = \frac{f_{\mathbf{S}^{l}}(s_1 = \hat{s}_1, \ldots, s_{l-1} = \hat{s}_{l-1}, s_l)}{f_{\mathbf{S}^{l-1}}(s_1 = \hat{s}_1, \ldots, s_{l-1} = \hat{s}_{l-1})} = \frac{c(\hat{u}_1, \ldots, \hat{u}_{l-1}, u_l)}{c(\hat{u}_1, \ldots, \hat{u}_{l-1})} \times f_{S_l}(s_l), \quad (4.21)$$

where $\hat{u}_k = F_{S_k}(s_k = \hat{s}_k)$, $k = 1, \ldots, l-1$. Note that the calculation of (4.21) is based on (4.14). Message passing takes place for a maximum number of $\Lambda$ iterations, and the final MAP estimate $\hat{s}_{l,\text{MAP}}(i)$, which depends on the stored soft-decision information at the assistant node of $s_l(i)$, is

$$\hat{s}_{l,\text{MAP}}(i) = \arg\max_{s_l(i)} \; q^{(0)}_{i \to j}[s_l(i)] \prod_{j \in \mathcal{M}(i)} r^{(\Lambda)}_{j \to i}[s_l(i)]. \quad (4.22)$$

Based on the per-symbol MAP estimates, we form the reconstructed vector $\hat{\mathbf{s}}_l$, which in turn gives the final reconstructed values $\hat{\mathbf{x}}_l = \boldsymbol{\Psi}_{\text{DCT}} \hat{\mathbf{s}}_l$. As described in [20], the messages sent between neighboring nodes are vectors containing $p$ samples of the pdf. At the coefficient nodes, the multiplication of messages, as described by (4.20), corresponds to element-wise multiplication between vectors. At the measurement nodes, the message update is performed by (4.19), which is done in the frequency domain via the Fast Fourier Transform (FFT). The reader should notice that the proposed method makes use of sparse sensing matrices $\boldsymbol{\Phi}$, such as the Rademacher matrices, so as to accelerate the encoding and decoding procedures. In general, sparse matrices have the following advantages:

• During encoding, matrix-vector multiplications of the type $\boldsymbol{\Phi} \mathbf{x}_l$ can be performed very efficiently, since their cost is proportional to the number of non-zeros in the sensing matrix (see the sketch after this list).

• The measurement vectors $\{\mathbf{y}_l\}$ can be updated quickly if some coordinates change. This is very important for processing data streams in WSN-based applications, where topologies vary and/or sensor devices may leave or enter the network.
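As an illustration of these two points, the sketch below builds a sparse Rademacher matrix with scipy.sparse and applies it with cost proportional to its non-zeros. For brevity it only fixes the row weight, whereas the construction in [20] also fixes the column weight; occasional duplicate column draws within a row are ignored.

    import numpy as np
    from scipy.sparse import csr_matrix

    def rademacher_matrix(M, N, row_weight, seed=0):
        """Sparse Rademacher matrix: each row has `row_weight` entries in {-1, +1}.

        Simplified sketch: duplicate (row, col) draws, which csr_matrix sums,
        are rare for N >> row_weight and are ignored here.
        """
        rng = np.random.default_rng(seed)
        rows = np.repeat(np.arange(M), row_weight)
        cols = rng.integers(0, N, size=M * row_weight)
        vals = rng.choice([-1.0, 1.0], size=M * row_weight)
        return csr_matrix((vals, (rows, cols)), shape=(M, N))

    Theta = rademacher_matrix(M=300, N=1000, row_weight=8)
    s = np.random.randn(1000)
    y = Theta @ s          # sparse matvec: cost proportional to the non-zeros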


Although sparse matrices are characterized by these strong assets, they have the following weakness: they are directly applicable only to the case where the target signal $\mathbf{x}_l$ is approximately sparse in the canonical basis, namely, when it holds that $\mathbf{x}_l = \mathbf{s}_l$ [41]. If the signal does not have this desired structure, which is true in the majority of use cases (including air-quality monitoring), it should be projected on a linear transformation matrix $\boldsymbol{\Psi}$. Then, the recovery method should first recover the representation vector of the signal $\mathbf{x}_l$ before computing the final signal estimate $\hat{\mathbf{x}}_l$. This approach was followed in the belief-propagation method proposed in [20] and is also followed in this study. Finally, it is important to clarify that when the signal is not structured, the decoder should compute the product matrix $\boldsymbol{\Phi} \boldsymbol{\Psi}$, which in general may not be sparse; this results in $\mathcal{O}(N M_l)$ computations. Nevertheless, this figure can be much lower when special bases—such as discrete Fourier, wavelet and discrete cosine transforms—are used as dictionaries [41]. Section 4.7.1 explains why we consider the DCT as a sparsifying basis in this study.
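The only ingredient of the above procedure that differs from [20] is the prior message (4.21). The sketch below computes it on a grid of p pdf samples for the case of a single side-information signal (l = 2, for which the denominator copula density equals one); all names are illustrative, and for l > 2 the scalar copula ratio is replaced by the ratio of higher-dimensional copula densities.

    import numpy as np
    from scipy.stats import norm, laplace, cauchy

    def initial_message(grid, s1_hat, rho, f_l, F_l, F_1):
        """Prior message q^{(0)} of (4.21) on a grid of p candidate values,
        for one side-information signal under a bivariate Gaussian copula.

        grid   : p candidate values of the target coefficient s_l(i)
        s1_hat : reconstructed side-information coefficient
        rho    : Gaussian-copula correlation between the two sources
        f_l/F_l: marginal pdf/cdf of the target coefficient
        F_1    : marginal cdf of the side-information coefficient
        """
        q1 = norm.ppf(F_1(s1_hat))
        q2 = norm.ppf(F_l(grid))
        # Bivariate Gaussian copula density at (u_hat_1, u_l):
        expo = (2 * rho * q1 * q2 - rho**2 * (q1**2 + q2**2)) / (2 * (1 - rho**2))
        c = np.exp(expo) / np.sqrt(1 - rho**2)
        msg = c * f_l(grid)
        return msg / msg.sum()        # normalize to a valid belief

    grid = np.linspace(-10, 10, 256)
    m = initial_message(grid, s1_hat=1.5, rho=0.7,
                        f_l=cauchy(scale=0.65).pdf, F_l=cauchy(scale=0.65).cdf,
                        F_1=laplace(scale=2.3).cdf)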

4.6 Recovery Algorithm for the Extended $\ell_1$-$\ell_1$ Problem

In Section 4.5.2, we describe a reconstruction method that uses multiple side information signals during recovery. This method provides a Bayesian solution to the problem of CS with side information using copulas, and it can be used to perform successive recovery of the signals $\mathbf{s}_1, \ldots, \mathbf{s}_L$, as described in Section 4.3.2. In this section, we introduce an alternative algorithm that is not based on Bayesian principles; this is a very useful approach when statistical characterizations of the representation signals are not available. The recovery algorithm is based on the recovery method for the $\ell_1$-$\ell_1$ problem proposed in [152], where we modified the optimization problem so as to leverage multiple side information signals. Therefore, we say that this algorithm solves the extended $\ell_1$-$\ell_1$ problem.


Figure 4.9: Tanner graph that visualises belief propagation. The messages are beliefs, which are updated at the assistant nodes (light-grey squares) at each iteration in order to aid the final per-symbol MAP estimate.

The extended $\ell_1$-$\ell_1$ problem can take the following form:

$$\hat{\mathbf{s}}_l = \arg\min_{\mathbf{s}_l} \left( \|\mathbf{s}_l\|_1 + \sum_{k=1}^{l-1} \omega_k \|\mathbf{s}_l - \hat{\mathbf{s}}_k\|_1 \right) \quad \text{s.t.} \quad \mathbf{y}_l = \mathbf{A}^{(l)} \mathbf{s}_l, \quad (4.23)$$

where $\mathbf{A}^{(l)} = \boldsymbol{\Phi}^{(l)} \boldsymbol{\Psi}$ is the measurement matrix of $X_l$, $\mathbf{s}_l = \boldsymbol{\Psi}^{-1} \mathbf{x}_l$, $\forall l \in \{1, 2, \ldots, L\}$, are the compressible representations of the signals $\{\mathbf{x}_l\}_{l=1}^{L}$ with $\hat{\mathbf{s}}_l$ denoting the corresponding reconstructed vector, and $\{\omega_k > 0\}_{k=1}^{l-1}$ are the weights establishing a tradeoff between signal sparsity and fidelity to the side information signal $\hat{\mathbf{s}}_k$. In this thesis, we consider a simple form of the problem in (4.23), where $\omega_k = 1$, $\forall k \in \{1, \ldots, l-1\}$.
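Since (4.23) with $\omega_k = 1$ is a plain convex program, it can be prototyped directly with a generic solver before resorting to a dedicated ADMM implementation adapted from [152]. The following cvxpy sketch is such a prototype, not the solver used in the experiments; all names are illustrative.

    import cvxpy as cp
    import numpy as np

    def extended_l1l1(y_l, A_l, side_infos, omegas=None):
        """Solve the extended l1-l1 problem (4.23) with equality constraints.

        y_l        : measurement vector of source X_l
        A_l        : measurement matrix A^(l) = Phi^(l) Psi
        side_infos : previously reconstructed representations s_hat_1..s_hat_{l-1}
        omegas     : weights (defaults to 1, as considered in this thesis)
        """
        n = A_l.shape[1]
        s = cp.Variable(n)
        if omegas is None:
            omegas = [1.0] * len(side_infos)
        objective = cp.norm1(s) + sum(w * cp.norm1(s - sk)
                                      for w, sk in zip(omegas, side_infos))
        problem = cp.Problem(cp.Minimize(objective), [A_l @ s == y_l])
        problem.solve()
        return s.value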


Table 4.2: Pairwise Copula Parameter Estimates

PARAMETERS               (CO, NO2)   (NO2, SO2)   (CO, SO2)
Correlation                0.7025      0.8126       0.8563
Degrees of freedom, ν      35.56       35.56        490.95
δ (Clayton)                1.5770      2.3004       2.7655
δ (Frank)                  6.6760      8.8767       11.0249
δ (Gumbel)                 2.0877      2.5874       3.1619

4.7 Experiments

We evaluate the performance of the proposed copula-based design using sensor readings from the air pollution database of the US EPA [4]. We consider a tree-based network architecture with $N = 1000$ sensors, as described in Section 4.3.3, which comprises 15 subtrees defined by the sensor density in a geographic area.^9 Since the transmission of the sensor values is enabled by LoRa [3], we assume subtrees where the inter-sensor distance does not exceed 2 km for urban areas and 22 km for rural areas. We consider $6 \times 10^5$ values of three pollutants, namely, CO, NO2 and SO2, collected during 2015. The data are separated equally into a training and an evaluation set, without overlap between the two. During the training stage, we perform Kolmogorov-Smirnov fitting tests on the DCT coefficient values of each pollutant, separated into blocks of $N$ readings, to define the most appropriate marginal distributions. The results reported in Table 4.1 show that the Cauchy distribution is the most appropriate for CO and SO2, and the Laplace for NO2. The parameters of the distributions are estimated via ML estimation, resulting in $\hat{\beta}_{\text{CO}} = 0.6511$ and $\hat{\beta}_{\text{SO}_2} = 0.9476$ for the Cauchy distributions, and $\hat{b}_{\text{NO}_2} = 2.3178$ for the Laplace. We also estimated the mean values of the DCT coefficients, which appear to be very close to zero for all distributions. Additionally, we estimate the parameters of the different copula functions. Using standard ML estimation [29], we calculate the correlation matrix $\mathbf{R}_G$, the pairwise correlation values of which are presented in Table 4.2. Moreover, we estimate the correlation matrix $\mathbf{R}_t$ and the degrees-of-freedom parameter $\nu$ via approximate ML

9Each subtree is formed by sensors within only one of the following states: CA, NV, AZ, NC, SC, VA, WV, KY, TN, MA, RI, CT, NY, NJ, MD.


estimation [29]. The latter method fits a Student's t-copula by maximizing an objective function that approximates the profile log-likelihood for the degrees-of-freedom parameter. For the ensemble of the three pollutants we find that $\nu = 89.91$, whereas the values corresponding to each pair of pollutants are tabulated in Table 4.2. Table 4.2 also reports the pairwise maximum-likelihood estimates [90] of the $\delta$ parameters that correspond to different bivariate Archimedean copulas. We consider only bivariate Archimedean copulas since they are parameterized by a single parameter, at the cost of some modelling flexibility; as a result, multivariate Archimedean copulas provide less accurate statistical modelling compared to multivariate elliptical copulas [158]. Finally, we assume that the values sent from each sensor node are discretized using an analog-to-digital converter with a bit-depth of 16 bits.

Table 4.3: Average percentage of the number of coefficients of the data with an absolute value below a given threshold τ.

        τ     DCT     Haar    Daubechies-2   Daubechies-4
SO2    0.1   28.80   19.10      25.50          23.80
       0.2   48.50   35.80      46.30          44.00
       0.4   74.10   67.00      77.50          77.30
CO     0.1   25.90   16.40      22.60          19.40
       0.2   44.50   31.20      40.30          38.70
       0.4   70.30   58.20      69.80          69.20
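As a rough indication of how such estimates can be obtained, the two-step (pseudo-maximum-likelihood) sketch below fits the marginals of Table 4.1, maps the data to normal scores, and takes their empirical correlation as an estimate of R_G. The full pipeline of [29, 90], including the Student's t degrees of freedom and the Archimedean δ parameters, is not reproduced here.

    import numpy as np
    from scipy.stats import cauchy, laplace, norm

    def gaussian_copula_correlation(co, no2, so2):
        """Two-step estimate of the Gaussian-copula correlation matrix R_G.

        Marginals follow the K-S results of Table 4.1:
        Cauchy for CO and SO2, Laplace for NO2.
        """
        u = np.column_stack([
            cauchy(*cauchy.fit(co)).cdf(co),     # probability-integral transform
            laplace(*laplace.fit(no2)).cdf(no2),
            cauchy(*cauchy.fit(so2)).cdf(so2),
        ])
        q = norm.ppf(np.clip(u, 1e-9, 1 - 1e-9)) # normal scores, guarded at 0/1
        return np.corrcoef(q, rowvar=False)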

4.7.1 Sparsifying Basis Selection

We first identified a good sparsifying basis $\boldsymbol{\Psi}$ for the data. Following the network architecture described at the beginning of the section, we organized the training data into blocks of $N$ readings per pollutant. In order to form a block $\mathbf{x}_l$, readings must have the same timestamp and be measured by neighboring stations, adhering to the LoRa [3] transmission distance criteria. We projected the data in each block onto different sets of bases, including the discrete cosine transform (DCT), the Haar, the Daubechies-2, and the Daubechies-4 continuous wavelet transform (CWT) bases; for the CWT we experimentally found that the scale parameter $\alpha = 4$ led to the best compaction performance. Since the resulting representation $\mathbf{s}_l$ is a compressible signal, we calculated the number of coefficients in $\mathbf{s}_l$ whose absolute value is


below a given threshold τ. Table 4.3 reports the results for SO2 and CO, averaged over all the blocks in the training set. It shows that the DCT yielded the sparsest representations.
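The compaction criterion behind Table 4.3 is simple to reproduce: transform each block and count the fraction of coefficients whose magnitude falls below τ. A minimal sketch, assuming the training blocks are stacked row-wise in a numpy array (the random data below is a stand-in for the EPA blocks):

    import numpy as np
    from scipy.fftpack import dct

    def compressibility(blocks, tau):
        """Average percentage of DCT coefficients with magnitude below tau.

        blocks : (num_blocks, N) array, one block of co-located readings per row
        """
        coeffs = dct(blocks, norm='ortho', axis=1)  # project each block onto the DCT basis
        return 100.0 * np.mean(np.abs(coeffs) < tau)

    blocks = np.random.randn(100, 1000)             # stand-in for EPA training blocks
    for tau in (0.1, 0.2, 0.4):
        print(tau, compressibility(blocks, tau))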

4.7.2 Performance Evaluation of the Proposed Copula-based Algorithm

In the first set of experiments, we assess the performance of the proposed copula-based belief propagation algorithm of Section 4.5.2 vis-à-vis the ADMM-based algorithm for the $\ell_1$-$\ell_1$ minimization problem [152], which is the state-of-the-art method for CS recovery with side information, and the Bayesian CS algorithm in [20]. Initially, we consider the scenario of two pollutants, where CO readings are reconstructed using NO2 readings as side information. We consider five different bivariate copulas for modelling the joint distribution, namely, the Gaussian, the Student's t, the Clayton, the Frank and the Gumbel copula. The comparison

is expressed in terms of the $\ell_2$ relative error norm^10, $\frac{\|\mathbf{x}_{\text{CO}} - \hat{\mathbf{x}}_{\text{CO}}\|}{\|\mathbf{x}_{\text{CO}}\|}$, of the reconstructed CO data versus the number of measurements $M_l$. Fig. 4.10(a) shows that the proposed algorithm and the ADMM-based method manage to efficiently exploit the side information and improve the performance compared to the CS method in [20], except for the case where the number of measurements is small (< 200); this is because convex optimization-based methods for sparse recovery typically require more measurements than non-convex and belief propagation methods. Furthermore, it is clear that the proposed algorithm systematically outperforms the ADMM-based algorithm for all the considered copula functions. The best performance is achieved by the Student's t-copula function, providing relative-error reductions of up to 47.3% compared to the ADMM-based algorithm. Subsequently, to demonstrate that additional side information of good quality boosts the belief-propagation based recovery, we assume that CO is recovered having as side information data from the other two pollutants, namely, NO2 and SO2. Fig. 4.10(b) depicts the normalized relative-error performance of the proposed algorithm when the number of side-information signals ranges from zero to two. It

10We choose the relative error since it is directly connected with the rate of successful recon- struction [152].

Figure 4.10: Performance comparison of the proposed recovery algorithm when CO signals are reconstructed using as side information (a) only signals of NO2, and (b) both signals of NO2 and SO2. The no side-information case is also included.

is evident that the higher the number of side-information signals, the higher the performance of our algorithm. We also observe that the Student's t-copula leads to a higher performance than the Gaussian copula; this is because the former embeds more parameters than the latter, thus allowing for more flexibility and a better handling of extreme values [32].

4.7.3 Evaluation of the System Performance

In another set of experiments, we evaluate the performance of the proposed successive reconstruction architecture (see Section 4.3) that makes use of either the proposed copula-based algorithm (see Section 4.5.2) or the proposed algorithm for the extended $\ell_1$-$\ell_1$ problem (see Section 4.6).

First, we focus on the scenario where two pollutants are monitored. We compare the following schemes: (i) the proposed system with successive data recovery using the proposed copula-based algorithm (the Student's t-copula model has been considered, as it performs better than the other copulas, as shown in Section 4.7.2), (ii) the proposed system with successive data recovery using the proposed algorithm for the extended $\ell_1$-$\ell_1$ problem, (iii) the DCS setup^11 [20], and (iv) the baseline system where each source is independently reconstructed using Bayesian CS [20]. The performance evaluation of each scheme is expressed in the aggregate rela-

tive error for all source signals, namely, $\sum_{l=1}^{L} \frac{\|\mathbf{x}_l - \hat{\mathbf{x}}_l\|}{\|\mathbf{x}_l\|}$, versus a common number of measurements $M_l$ per source. Fig. 4.11(a) shows that the proposed systems and the DCS scheme manage to leverage both the intra- and inter-source data dependencies and provide improved reconstruction performance compared to the baseline system. Moreover, the system with the proposed algorithm for the extended $\ell_1$-$\ell_1$ problem outperforms DCS when $M_l < 500$. However, this situation changes in the large-measurements regime, where DCS improves its reconstruction performance and provides higher reductions in relative error due to its efficient joint sparsity model. More importantly, the proposed system with the Student's t-copula based recovery algorithm constantly outperforms all other schemes, bringing relative-error

11Note that, in the classical DCS scenario, each signal of interest is constructed by many read- ings of the same sensor. In order to have a fair comparison with our design, we have modified this framework by assuming that each signal of interest contains readings from different sensors observing the same source.

Figure 4.11: Performance comparison of the proposed successive reconstruction architecture with the DCS, the ADMM-based and the baseline systems when we assume (a) two air pollutants (CO and NO2), and (b) three air pollutants (CO, NO2 and SO2).

savings of up to 15.3% and 13.8% against the proposed system for the extended $\ell_1$-$\ell_1$ recovery problem and the DCS scheme, respectively.

Second, we focus on the monitoring case of three air pollutants. The same reconstruction schemes as in the two-pollutant case have been considered. Fig. 4.11(b) shows that DCS delivers superior performance compared to the baseline system, which is more noticeable in the regime of $M_l > 250$. Moreover, the proposed design with the algorithm for the extended $\ell_1$-$\ell_1$ problem outperforms DCS with relative-error reductions of up to 18.3%. Finally, the system with the copula-based algorithm provides superior performance against all other systems, with significant relative-error reductions that reach up to 19.3% against DCS. It is important to notice that both proposed designs significantly outperform the state-of-the-art schemes when the number of measurements is small.

4.7.4 Evaluation of System Performance under Noise

Now, we evaluate the robustness of the proposed successive recovery architecture of Section 4.3 against measurement and communication noise, when either the proposed copula-based algorithm (see Section 4.5.2) or the proposed algorithm for the extended $\ell_1$-$\ell_1$ problem (see Section 4.6) is used. As explained in Section 4.3, for each source, both types of noise are modeled as AWGN with a standard deviation $\sigma_z$.^12 In this experiment, we varied the noise level by assuming that $\sigma_z \in \{0, 2, 5\}$ and we calculated the aggregate relative error as a function of the measurements $M_l$, which are assumed to be common for all sources. Apart from noise, our design is affected by fading and shadowing effects. Small-scale fading [212] is mainly caused by relative movement between transmitter and receiver (i.e., Doppler shifts), whereas large-scale fading mainly consists of distance attenuation and shadow fading [129]. In this work, channel fading effects are assumed to be handled by the chirp spread spectrum (CSS) modulation [25], a multipath/fading and Doppler shift resistant technique adopted in LoRa [3].

12In this experiment we have assumed that the standard deviation of the noise is the same for all sources. Hence, we have omitted the superscript (l).

Figure 4.12: Performance comparison of the proposed system with the copula-based algorithm against the DCS scheme, when we assume different noise levels ($\sigma_z = 0, 2, 5$) for (a) two sources (CO and NO2), and (b) three sources (CO, NO2 and SO2).


Successive recovery via copula-based algorithm. Firstly, we evaluate the robustness of the proposed system with the copula-based algorithm; as before, the Student's t-copula has been considered due to its better fitting accuracy compared to the other families. The performance comparison takes place with respect to the DCS scheme, as well as the baseline system. With respect to the two-pollutant case, Fig. 4.12(a) shows that the proposed system delivers superior performance compared to the competing systems for moderate ($\sigma_z = 2$) and strong ($\sigma_z = 5$) noise. Moreover, we observe that the proposed algorithm is robust against noise, especially when the number of measurements is small. In particular, the relative error exhibits an average deviation of 5.8% ($\sigma_z = 2$) and 18.7% ($\sigma_z = 5$) compared to the noiseless case. When three pollutants are monitored, the proposed system constantly outperforms the DCS scheme and the baseline system under moderate and strong noise corruption. Moreover, as depicted in Fig. 4.12(b), the proposed design continues to demonstrate robustness against noise, where the relative error shows an average deviation of 4.2% ($\sigma_z = 2$) and 10.1% ($\sigma_z = 5$) compared to the noiseless case. It is clear that the robustness of the proposed system increases for a larger number of pollutants.

Successive recovery via the algorithm for the extended $\ell_1$-$\ell_1$ problem. Secondly, we assess the robustness of the proposed algorithm for the extended $\ell_1$-$\ell_1$ problem. As in the copula-based system, the performance comparison takes place with respect to the DCS scheme and the baseline system. We varied the noise level by assuming that $\sigma_z \in \{0, 2, 5\}$ and calculated the aggregate relative error, $\sum_{l=1}^{L} \frac{\|\mathbf{x}_l - \hat{\mathbf{x}}_l\|}{\|\mathbf{x}_l\|}$, as a function of the measurements $M_l$, which are assumed to be common for all sources.

Focussing on the case of three air pollutants, we consider a moderate ($\sigma_z = 2$) and a strong ($\sigma_z = 5$) noise scenario, along with the noiseless case ($\sigma_z = 0$). As depicted in Fig. 4.13, when $\sigma_z = 2$ the relative-error reductions against the baseline scenario and the DCS setup reach up to 27.4% and 21.2%, respectively. When $\sigma_z = 5$, the corresponding improvements amount to 27.9% and 20.4%. It should be mentioned that, when $\sigma_z = 5$, the performance of the proposed design is significantly higher than that of DCS until the number of measurements reaches $M = 550$; above $M = 550$ measurements, DCS provides better results. It can be observed that the proposed scheme provides robustness against noise, especially when the number of measurements is small. In particular, the relative error increases on average by 3.1% ($\sigma_z = 2$) and 8.4% ($\sigma_z = 5$) compared to the noiseless case.

Figure 4.13: Performance comparison of the proposed system with the algorithm for the extended $\ell_1$-$\ell_1$ problem against the DCS and the baseline systems for different noise levels ($\sigma_z = 0, 2, 5$). Three air pollutants (CO, NO2 and SO2) have been considered.

4.8 Conclusions

This chapter introduced two novel compressive data aggregation and reconstruction mechanisms targeting IoT applications with large-scale WSN setups. We focused on an important use case related to smart cities, i.e., the air-pollution monitoring

problem. The novelty of our work lies in the reconstruction stage, where we proposed two novel algorithms for signal recovery: a Bayesian reconstruction algorithm and a method for solving the extended $\ell_1$-$\ell_1$ problem. The former algorithm is built upon belief propagation principles [55, 135] and uses the statistical model of copula functions [158, 193] for the iterative message-passing procedure. The latter algorithm is based on convex optimization and is very useful for applications where the statistical characterizations of the monitored data are not available a priori, or vary dynamically. Experimentation on the real EPA dataset resulted in the following conclusions:

• We evaluated the performance of the proposed copula-based algorithm and showed that it delivers the best relative-error performance compared to the ADMM-based algorithm for the $\ell_1$-$\ell_1$ minimization problem [152], which is the state-of-the-art method for CS recovery with side information, and the Bayesian CS algorithm in [20]. Contrary to these state-of-the-art recovery methods, the proposed copula-based algorithm can leverage multiple information signals as side information to further improve reconstruction. Note that the best performance is achieved when the Student's t copula is used, which means that the dependence structure is symmetric (see Chapter 2).

• We compared the performance of the two proposed data gathering schemes against the prior art [19, 101] and showed that both provide significant improvements in relative error. These improvements are pronounced when the number of measurements is small. Nevertheless, the copula-based gathering scheme performs better than the system with the extended $\ell_1$-$\ell_1$ method. This means that copulas enable the implementation of powerful reconstruction algorithms that leverage the dependence structure among heterogeneous data types to provide superior relative-error performance.

• Although the system with the extended $\ell_1$-$\ell_1$ method does not reach the performance level of the copula-based system, it still provides superior performance against the state of the art. This is because the extended $\ell_1$-$\ell_1$ algorithm, similarly to the copula-based algorithm, can exploit multiple side information signals during reconstruction. Moreover, it is a very good option


when the source statistics are not known a priori, or they change rapidly. In the latter case, the copula-based algorithm should be modified such that it incorporates online estimators of the model parameters; this remains out of the scope of this thesis and is further discussed in the final chapter as future work.

• We compared the proposed designs against the state-of-the-art schemes when measurement and communication noise is present. We showed that the proposed schemes are more robust against noise of various levels.

• Clearly, the proposed compressive data gathering mechanisms enable balanced power consumption in the network, as nodes send similar amounts of encoded information and hence consume similar energy for the data transmission. Moreover, the proposed schemes offer low encoding complexity and a decoding stage that is efficient as it leverages both intra- and inter-signal dependencies, with the sensing basis needing to be known only at the sink.

We saw that our designs meet the demands of a large-scale monitoring application more accurately. However, they can be further improved in the following aspect: during the data gathering stage, the measurements vector $\mathbf{y}_l$ corresponds to sensor readings of a specific source $X_l$. Thus, if multiple sources need to be monitored, the same procedure has to be repeated independently. In the next chapter, we explain how we can avoid this and achieve significant rate savings.

Finally, we conclude this study by recapitulating aspects related to the complexity of the proposed compressive data gathering method. During the data sensing and compression procedures, the nodes do not apply traditional compression mechanisms, such as entropy coding, but rather compute scalar values as described in Section 4.3. This procedure is very lightweight, and if we also consider that the number of transmissions in the network is vastly reduced (see Section 1.2), we conclude that encoding is of low complexity. At the decoder, the decoding procedure is computationally intense, since we make use of a novel copula-based loopy belief propagation algorithm, where the messages exchanged in the Tanner graph are vectors with pdf samples. This requires multiplication and convolution operations, which increase the complexity. In Chapter 6, we describe how this algorithm can be

simplified using Gaussian approximations for the updating rules, a procedure also adopted in generalized approximate message passing [176].

Chapter 5

Large-Scale Data Gathering via Compressive Demixing

Despite their superior performance compared to the state of the art, the compressive data gathering schemes proposed in Chapter 4 have the following drawback: these designs provide novelty in the data recovery part, where an efficient copula-based reconstruction algorithm and a method for the solution of the extended $\ell_1$-$\ell_1$ problem are proposed. However, they follow a gathering procedure similar to the state of the art [19, 101], where the measurements vector $\mathbf{y}_l$ aggregated at the sink contains the required information to recover the readings of all sensor nodes regarding only source $X_l$, namely, the signal^1 $\mathbf{x}_l = [x_l(1), \ldots, x_l(n), \ldots, x_l(N)]^T$. Thus, if multiple sources need to be monitored, the data aggregation procedure needs to be iterated (independently) for all information sources. This becomes very inconvenient when the number of sources increases.

Here we follow a more efficient approach. We devised a code design that encodes the sensor signals $\{\mathbf{x}_l\}_{l=1}^{L}$, which contain readings from all sources collected by the nodes, into a single low-dimensional measurement vector $\mathbf{y}$. Therefore, the gathering procedure is not repeated for each signal independently, thereby resulting in significant reductions in the overall network data rates and hence significant power savings.

1Note that we abide by the notations in Chapter 4, where N denotes the number of nodes and L the number of monitored sources.

The proposed design is based on the compressive demixing paradigm [145, 147] and outperforms the state-of-the-art schemes, which are based on (distributed) compressed sensing, as well as the proposed designs in Chapter 4 that rely on compressed sensing with (multiple) side information.

Outline. The remainder of this chapter is structured as follows: Section 5.1 provides a short description of the state of the art and Section 5.2 the contributions of this chapter. Section 5.3 includes the background on the compressive demixing theory, which is required for the understanding of the proposed data aggregation and recovery mechanism. Section 5.4 details our design, whereas Section 5.5 provides a comparison of the proposed scheme to the state of the art, as well as to the designs proposed in Chapter 4, by experimentation on a real air-quality dataset from the United States EPA. Finally, Section 5.6 draws the conclusions of this chapter.

5.1 Prior Art

Related studies on data acquisition and recovery systems are also reviewed in the previous chapter, since they address similar problems. Collaborative wavelet transform coding and clustered data aggregation are two techniques proposed in [56] and [130], respectively, which require excessive transmission of overhead information—and hence additional encoding complexity—due to inefficient handling of abnormal events. An alternative strategy adheres to distributed source coding (DSC) [62], a theory initiated by Slepian and Wolf [195] that motivated code designs to leverage inter-sensor data correlation at the decoder side, but performs well only for a limited number of nodes. To tackle this problem, Chapter 3 proposes a gathering scheme that delivers efficient compression performance for multi-sensor setups. This design is based on MT coding [23, 213] and a novel copula regression algorithm that accounts for the heterogeneity among data sources. However, when we deal with large-scale setups—where the sensor nodes are in the order of thousands—this scheme results in unbalanced energy consumption in the network (a problem that is also met when traditional code designs are applied). Focussing on the problem of data aggregation for large-scale WSNs, Haupt et al.

proposed an intelligent design where compressed sensing (CS) principles are used to balance the power consumption of the sensing devices [100]. A similar gathering scheme that considers multi-hop routing and includes a network capacity analysis was presented in [134]. Both schemes apply independent recovery of each information source at the sink via Basis Pursuit [47], an approach that leverages only spatial dependencies among sensor data and results in $\mathcal{O}\left(\sum_{l=1}^{L} N M_l\right)$ transmissions in the network, with $M_l$ the number of measurements required for the $l$-th data source, where $M_l < N$. This method provides significant rate reductions compared to traditional multi-hop schemes, where the overall number of transmissions mounts to $\mathcal{O}(N^2 L)$ [134]. In Chapter 4, we proposed two novel compressive gathering schemes based on the theory of compressed sensing with (multiple) side information. The first design uses a novel copula-based reconstruction algorithm built upon belief-propagation principles, whereas the second recovers the sensor signals via an algorithm that solves the extended $\ell_1$-$\ell_1$ problem.^2 These schemes leverage both the spatial dependencies between sensor readings of the same source and the dependence structure among sensor readings of different sources. The number of transmissions $M_l$ required for each source $l$ is smaller than in [100, 134], as recovery accounts for multiple side information signals at the data recovery stage. Finally, distributed compressed sensing (DCS) [21] provides an alternative solution that uses spatiotemporal dependencies among sensor data to achieve joint recovery. Using a sophisticated model to describe the joint sparsity among sensor signals, DCS needs a total number of transmissions that is similar to [100] for high relative error and similar to [234] when improved reconstruction quality is needed.

5.2 Contributions

The aforementioned compressive gathering designs emphasized improving the reconstruction quality at the sink, without considering that significant amounts of energy are consumed because readings of different sources are gathered separately. Instead of focussing on how to improve joint reconstruction upon the state of the art (an approach followed in Chapter 4), we introduce a mechanism that improves the

^2 The work based on the extended $\ell_1$-$\ell_1$ framework is published in [234].

data gathering procedure by jointly aggregating sensor readings from all data sources into a single measurements vector. As a result, a total amount of only $\mathcal{O}(NM)$ transmissions is needed, where $M$ is the number of measurements required for the joint recovery of all information sources. Our method uses the compressive demixing paradigm [145, 147], which allows for joint gathering as well as joint recovery of the sensor signals. Despite its wide applicability in various domains, such as image processing, machine learning and statistics [145], to the best of our knowledge, this is the first time that compressive demixing is used in the context of data gathering. To evaluate the performance of our framework, we consider the problem of air pollution monitoring based on actual air-pollution sensor readings taken from a database of the United States Environmental Protection Agency (EPA) [4]. The experimental results show that the proposed method significantly reduces the required data rates for a given data reconstruction quality compared to compressed sensing [100] and DCS [21] methods, thereby resulting in less network traffic and prolonged system lifetime. Furthermore, the proposed system shows robustness against measurement and communication noise without introducing excessive computation on the sensor nodes.

5.3 Background

This section provides the necessary theoretical background on the compressive demixing paradigm. Compressive demixing is used to recover (alias, demix) multiple signals—also known as constituents—based on a low-dimensional projection of their sum onto a known random basis, as well as prior information about their structures. In our context:

• The constituents are the representations $\{s_l \in \mathbb{R}^{N'}\}$, $l = 1, \dots, L$, of the sensor signals $\{x_l \in \mathbb{R}^N\}$ that are computed by projection of the latter onto a dictionary basis $\Psi \in \mathbb{R}^{N \times N'}$, i.e., $x_l = \Psi s_l$, $l = 1, \dots, L$. Similarly to Chapter 4, a sensor signal $x_l = [x_l(1), \dots, x_l(n), \dots, x_l(N)]^T$ contains readings of all nodes $n = 1, \dots, N$ regarding only the $l$-th source. Hence, different signals refer to readings of different sources.

• The recovery (alias, demixing) of the constituents is done at the sink (i.e., the


Figure 5.1: Atomic gauge and other level sets.

joint decoder), which has access to the measurements' vector $y \in \mathbb{R}^M$, $M < N$, namely, a low-dimensional projection of the sum of the constituents onto a known random basis $\Phi \in \mathbb{R}^{M \times N}$ (i.e., the sensing matrix). Contrary to the proposed frameworks in Chapter 4 and the state-of-the-art methods [21, 100, 134, 234], where measurements are computed based on readings of only one source—hence the procedure has to be repeated for every source—the proposed method generates the measurements using readings from all sources. In Section 5.4.1, we detail how this measurement vector is gathered at the sink, and in Section 5.4.2 we describe how the measurements are used to recover the constituents via the compressive demixing framework.

5.3.1 Source Separation

Before describing the compressive demixing paradigm, we first consider the root problem of source separation (alias, demixing) [145], which targets determining two (unobserved) constituent signals $s_1, s_2 \in \mathbb{R}^N$ from the observed signal

$$\zeta = s_1 + U s_2, \qquad (5.1)$$

where $U$ is a known orthogonal matrix. To achieve an efficient demixing procedure, the structured constituents must look different from one another. In particular, the two structured signals must be incoherent, namely, their structures must be very different from each other. This is the role of the basis $U$: it models the relative orientation of the structures, and it also provides a convenient proxy for incoherence. To generate a convex program that solves this problem, two requirements have to be met. First, convex functions that promote the desired structure of the component vectors $s_1$ and $s_2$ have to be identified. Second, these convex functions have to be combined into a convex optimization problem. Moreover, we need a way to describe the structure of the constituents. To this end, we use the following definitions.

Definition 5.3.1. A signal $s \in \mathbb{R}^N$ is said to be atomic when it is constructed by summation of a small number of scaled atoms, given in the atomic set $\mathcal{A}_s \subset \mathbb{R}^N$ [145].

Definition 5.3.2. The atomic gauge $\|s\|_{\mathcal{A}_s}$ of a signal vector $s \in \mathbb{R}^N$ is defined as [42]:

$$\|s\|_{\mathcal{A}_s} = \inf \left\{ \lambda > 0 : s \in \lambda \cdot \mathrm{conv}(\mathcal{A}_s) \right\}, \qquad (5.2)$$

where $\mathrm{conv}(\cdot)$ denotes the convex hull of the atomic set $\mathcal{A}_s$. An illustration of the atomic gauge for an atomic set with five atoms is presented in Fig. 5.1, where the unit ball of $\|\cdot\|_{\mathcal{A}_s}$ is the heavy line representing the closed convex hull of $\mathcal{A}_s$. The dashed lines are dilations of the unit ball and represent other level sets of the gauge. Atomic gauges are ubiquitous in the literature on inverse problems. Some common examples include the $\ell_1$ norm, the $\ell_\infty$ norm and the operator norm. The $\ell_1$ norm is widely used to promote sparsity [66].
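To make Definition 5.3.2 concrete, the gauge in (5.2) can be evaluated by a small linear program. The following minimal sketch (Python with cvxpy, our own illustrative choices rather than tools used in the thesis) verifies that, for the atomic set of signed canonical basis vectors $\{\pm e_i\}$, the atomic gauge coincides with the $\ell_1$ norm:

```python
import numpy as np
import cvxpy as cp

# Minimal numerical sketch: for the atomic set {+e_i, -e_i}, the atomic
# gauge of Eq. (5.2) reduces to the l1 norm.
N = 8
rng = np.random.default_rng(0)
s = rng.standard_normal(N)

atoms = np.hstack([np.eye(N), -np.eye(N)])  # columns are the 2N atoms
c = cp.Variable(2 * N, nonneg=True)         # conic-combination weights

# gauge(s) = min sum(c)  s.t.  s = atoms @ c, c >= 0
# (s lies in lambda*conv(A) exactly when such weights sum to at most lambda)
prob = cp.Problem(cp.Minimize(cp.sum(c)), [atoms @ c == s])
prob.solve()
print(prob.value, np.linalg.norm(s, 1))     # the two values coincide
```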

Now we assume that the component signals $s_1$ and $s_2$ are atomic w.r.t. their atomic sets $\mathcal{A}_{s_1}$ and $\mathcal{A}_{s_2}$. By applying the atomic gauge functions to the component signals, we obtain values $\|s_1\|_{\mathcal{A}_{s_1}}$ and $\|s_2\|_{\mathcal{A}_{s_2}}$ that are relatively small. This suggests that the demixing problem could be solved by searching for component vectors that generate the observed vector $\zeta$ and whose atomic gauges take small values. As a result, we have the following optimization problem [145]:

$$[\hat{s}_1, \hat{s}_2] = \arg\min_{s_1, s_2} \left\{ \alpha_1 \|s_1\|_{\mathcal{A}_{s_1}} + \alpha_2 \|s_2\|_{\mathcal{A}_{s_2}} \right\} \quad \text{s.t.} \quad \zeta = s_1 + U s_2, \qquad (5.3)$$


where $\alpha_1 = 1$, and $\alpha_2$ is a regularization parameter that trades off the relative importance of the atomic gauges. An extension of the source separation problem involves the demixing of more than two constituents $\{s_l \in \mathbb{R}^N\}_{l=1}^{L}$, where the observed vector takes the following form:

$$\zeta = U_1 s_1 + \cdots + U_L s_L = \sum_{l=1}^{L} U_l s_l, \qquad (5.4)$$

where $\{U_l\}$ are known orthogonal matrices that encode the relative orientation of the constituent vectors. Note that each matrix $U_l$ in the set $\{U_l\}$ is a Haar-distributed rotation independent of all the others [147]. This problem is termed multiple demixing [144]. As before, the constituents are assumed to be atomic w.r.t. their atomic sets $\{\mathcal{A}_{s_l}\}_{l=1}^{L}$. Problem (5.3) is then modified as follows:

$$[\hat{s}_1, \dots, \hat{s}_L] = \arg\min_{\{s_l\}} \sum_{l=1}^{L} \alpha_l \|s_l\|_{\mathcal{A}_{s_l}} \quad \text{s.t.} \quad \sum_{l=1}^{L} U_l s_l = \zeta, \qquad (5.5)$$

where $\{\alpha_l\}$ are the regularization parameters and $\alpha_1 = 1$.
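A hedged numerical sketch of Problem (5.5) for $L = 2$, with the $\ell_1$ norm as atomic gauge, follows; the dimensions, sparsity levels and the cvxpy solver are illustrative assumptions rather than the thesis setup:

```python
import numpy as np
import cvxpy as cp

# Sketch of multiple demixing, Problem (5.5), for L = 2 sparse constituents.
rng = np.random.default_rng(1)
N, S = 100, 5

def sparse_vec(rng, N, S):
    v = np.zeros(N)
    v[rng.choice(N, S, replace=False)] = rng.standard_normal(S)
    return v

s1_true, s2_true = sparse_vec(rng, N, S), sparse_vec(rng, N, S)
U, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random (Haar) rotation
zeta = s1_true + U @ s2_true                       # observed sum, Eqs. (5.1)/(5.4)

s1, s2 = cp.Variable(N), cp.Variable(N)
alpha2 = 1.0                                       # regularization weight (alpha1 = 1)
prob = cp.Problem(cp.Minimize(cp.norm(s1, 1) + alpha2 * cp.norm(s2, 1)),
                  [s1 + U @ s2 == zeta])
prob.solve()
print(np.linalg.norm(s1.value - s1_true), np.linalg.norm(s2.value - s2_true))
```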

5.3.2 Compressive Demixing

Consider the following scenario: the observed signal is not the superposition $\zeta$ of the component signals, as described by Equation (5.4), but solely an undersampled version of $\zeta$, denoted by $y$ and described by the following model:

$$y = \Phi \zeta = \Phi \sum_{l=1}^{L} U_l s_l, \qquad (5.6)$$

where $\Phi \in \mathbb{R}^{M \times N}$ is the sensing matrix representing the linear mapping of the superposition vector $\zeta$ to the so-called measurements vector $y$. This is the problem

of compressive demixing, which can be solved by the following optimization problem:

$$[\hat{s}_1, \dots, \hat{s}_L] = \arg\min_{\{s_l\}} \sum_{l=1}^{L} \alpha_l \|s_l\|_{\mathcal{A}_{s_l}} \quad \text{s.t.} \quad \Phi \sum_{l=1}^{L} U_l s_l = y, \qquad (5.7)$$

where $\{\alpha_l\}$ are the regularization parameters and $\alpha_1 = 1$.

We are interested in (approximately) sparse or compressible representations of the component signals. Thus, the atomic gauge function is the $\ell_1$ norm [145], which allows for the formulation of convex optimization problems. Note that, conceptually, the $\ell_0$ pseudo-norm would be more appropriate, as we search for the sparsest possible constituents that generate the observed signal $\zeta$. However, the $\ell_0$ pseudo-norm leads to non-convex problems, which are difficult to solve. The compressive demixing problem can also be modified so as to include the case where the measurements vector $y$ is corrupted by additive white Gaussian noise (AWGN). If the measurements are affected by noise, a conic constraint is required; i.e., the minimization problem (5.7) needs to be changed as follows:

$$[\hat{s}_1, \dots, \hat{s}_L] = \arg\min_{\{s_l\}} \sum_{l=1}^{L} \alpha_l \|s_l\|_{\mathcal{A}_{s_l}} \quad \text{s.t.} \quad \left\| \Phi \sum_{l=1}^{L} U_l s_l - y \right\|_2 < \epsilon, \qquad (5.8)$$

for carefully chosen $\epsilon > 0$.
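The sketch below (again with illustrative sizes and cvxpy) extends the previous one to the compressive setting of Problems (5.7) and (5.8): only $M < N$ noisy measurements of the rotated sum are observed, and the noise is handled through the conic constraint:

```python
import numpy as np
import cvxpy as cp

# Sketch of compressive demixing, Problems (5.7)/(5.8): recover L sparse
# constituents from M < N noisy measurements of their rotated sum. The
# sizes and the noise radius epsilon are illustrative assumptions.
rng = np.random.default_rng(2)
N, M, L, S = 200, 140, 3, 5

Us = [np.linalg.qr(rng.standard_normal((N, N)))[0] for _ in range(L)]
s_true = []
for _ in range(L):
    v = np.zeros(N)
    v[rng.choice(N, S, replace=False)] = rng.standard_normal(S)
    s_true.append(v)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)     # Gaussian sensing matrix
noise = 0.01 * rng.standard_normal(M)
y = Phi @ sum(U @ s for U, s in zip(Us, s_true)) + noise   # Eq. (5.6) plus AWGN

s_vars = [cp.Variable(N) for _ in range(L)]
residual = Phi @ sum(U @ s for U, s in zip(Us, s_vars)) - y
eps = 1.1 * np.linalg.norm(noise)                  # conic-constraint radius, Eq. (5.8)
prob = cp.Problem(cp.Minimize(sum(cp.norm(s, 1) for s in s_vars)),
                  [cp.norm(residual, 2) <= eps])
prob.solve()
```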

5.3.3 Oracle Problem

McCoy and Tropp also studied the problem of compressive demixing from a different point of view, in which the same goal as in Problem (5.7) is pursued and reconstruction guarantees are also provided [146]. In particular, they showed that the demixing

problem can be solved by:

$$[\hat{s}_1, \dots, \hat{s}_L] = \arg\min_{\{s_l\}} \left\| \Phi^{\dagger} \left( \Phi \sum_{l=1}^{L} U_l s_l - y \right) \right\|_2^2 \quad \text{s.t.} \quad \|s_1\|_1 \le \|s_1^*\|_1, \; \|s_2\|_1 \le \|s_2^*\|_1, \; \dots, \; \|s_L\|_1 \le \|s_L^*\|_1, \qquad (5.9)$$

where $\Phi \in \mathbb{R}^{M \times N}$ is the sub-sampling matrix with elements drawn i.i.d. from a Gaussian distribution. The Moore-Penrose pseudo-inverse $\Phi^{\dagger}$ is included in the consistency term to ensure that the recovery procedure is independent of the conditioning of $\Phi$. Moreover, $\{s_l^*\}_{l=1}^{L}$ are signals, termed optimal points, which should be good approximations of the true constituents $\{s_l\}_{l=1}^{L}$ in order to enable a successful demixing procedure. Unfortunately, in our scenario, the sink cannot have access to optimal points for signal recovery and hence Problem (5.9) cannot be used; for this reason, we refer to it as the "oracle problem." Problem (5.9) comes with reconstruction guarantees that can be used for defining the number of measurements in our setup. Theorem A in [146] states that Problem (5.9) succeeds if $M$ is slightly larger than the sum of the statistical dimensions associated with the $\ell_1$-norm constraints. In [42], an upper bound for the statistical dimension of the $\ell_1$ norm of an $S$-sparse vector in $\mathbb{R}^N$ is provided, equal to $2S \log\left(\frac{N}{S}\right) + \frac{7}{5}S$. Thus, if each optimal point $s_l^*$ has sparsity $S_l$, an upper bound $\mathcal{B}$ on the sum of the statistical dimensions is

$$\mathcal{B} = 2 \sum_{l=1}^{L} S_l \log(N) + \frac{7}{5} \sum_{l=1}^{L} S_l - 2 \sum_{l=1}^{L} S_l \log S_l. \qquad (5.10)$$

Thus, for a given $M$, there is a limit on the number $L$ of signals that we can demix, depending on their sparsity. The formula in (5.10) provides guidance on how to select $M$, although it is a somewhat loose upper bound. An exact expression can be computed numerically as in Equation (4.4) of [18].
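In practice, (5.10) can serve as a quick feasibility check when dimensioning the system; a minimal helper (illustrative Python, with a hypothetical function name and example numbers) is:

```python
import numpy as np

def measurement_budget(N, sparsities):
    """Upper bound B of Eq. (5.10) on the sum of the statistical dimensions;
    M should be chosen slightly larger than this (loose) bound."""
    S = np.asarray(sparsities, dtype=float)
    return float(np.sum(2.0 * S * np.log(N / S) + 1.4 * S))  # 1.4 = 7/5

# Illustrative numbers: three constituents in R^1000 with transform-domain
# sparsities 20, 30 and 40.
print(measurement_budget(1000, [20, 30, 40]))
```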


5.4 Proposed Scheme

Consider a large-scale WSN comprising $N$ sensor nodes that form a multi-hop route to the sink, as depicted in Fig. 4.4. Each node $n \in \mathcal{I}_N = \{1, 2, \dots, N\}$ observes the correlated sources $X_1, X_2, \dots, X_L$, which take values in their corresponding continuous alphabets $\mathcal{X}_1, \mathcal{X}_2, \dots, \mathcal{X}_L$. We denote by $x_l(n) \in \mathbb{R}$ the reading produced by source $X_l$, $l \in \mathcal{J}_L = \{1, 2, \dots, L\}$, as observed at the $n$-th node.

5.4.1 Joint Data Aggregation

Before commencing the description of the data aggregation procedure, we assume that the sensor nodes have access to the matrices $\{Q^{(l)} = \Phi U_l\}_{l=1}^{L}$, where $q^{(l)}_{ji}$ denotes the $(j,i)$-th element of the matrix $Q^{(l)}$. The proposed gathering procedure is initiated by node 1, which measures $[x_1(1), \dots, x_L(1)]$, computes the sum $q^{(1)}_{j1} x_1(1) + \cdots + q^{(L)}_{j1} x_L(1)$ and sends it to node 2. Subsequently, node 2 measures $[x_1(2), \dots, x_L(2)]$, computes the sum $\sum_{i=1}^{2} q^{(1)}_{ji} x_1(i) + \cdots + \sum_{i=1}^{2} q^{(L)}_{ji} x_L(i)$ and sends it to node 3. To generalize, each node $n$ measures the readings $[x_1(n), \dots, x_L(n)]$ that correspond to the different sources, and transmits the value

$$\sum_{i=1}^{n} q^{(1)}_{ji} x_1(i) + \cdots + \sum_{i=1}^{n} q^{(L)}_{ji} x_L(i)$$

to device $n + 1$. The procedure continues until node $N$ sends its information to the sink node, which receives the final weighted sum of the sensor readings

$$y(j) = \sum_{i=1}^{N} q^{(1)}_{ji} x_1(i) + \cdots + \sum_{i=1}^{N} q^{(L)}_{ji} x_L(i) = q^{(1)}_j x_1 + \cdots + q^{(L)}_j x_L, \qquad (5.11)$$

where $x_l = [x_l(1), \dots, x_l(N)]^T$, $\forall l \in \mathcal{J}_L$, is the sensor signal that contains readings of the same source $X_l$ collected by all nodes, and $q^{(l)}_j = [q^{(l)}_{j1}, \dots, q^{(l)}_{jn}, \dots, q^{(l)}_{jN}]$. It is important to notice that in [100, 134, 234], sensor nodes send weighted sums containing readings of a single source; thus, the same gathering procedure has to be repeated for the other sources as well. However, in our scheme, the node messages


[Figure: schematic of the proposed system, from the pollutant sources observed by the sensor nodes, through the in-network computation of the weighted sums, to the sink, where compressive demixing reconstructs the signals.]

Figure 5.2: Proposed system for jointly gathering and jointly reconstructing the correlated sensor data.

are constituted by readings from all sources and hence these extra repetitions are avoided. As a result, the overall number of transmissions in the network is reduced. The aforementioned procedure is repeated $M$ times, with each repetition indexed by $j = 1, \dots, M$. The measurements can then be written in the following matrix form:

$$y = \left[ q^{(1)}_1, \dots, q^{(1)}_j, \dots, q^{(1)}_M \right]^{T} x_1 + \cdots + \left[ q^{(L)}_1, \dots, q^{(L)}_j, \dots, q^{(L)}_M \right]^{T} x_L, \qquad (5.12)$$

where $y = [y(1) \; \dots \; y(j) \; \dots \; y(M)]^T$ is the measurements' vector. Since

$$Q^{(l)} = \left[ q^{(l)}_1, \dots, q^{(l)}_j, \dots, q^{(l)}_M \right]^{T},$$

Equation (5.12) can be written as

$$y = Q^{(1)} x_1 + \cdots + Q^{(L)} x_L. \qquad (5.13)$$
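As a cross-check of Eqs. (5.11)-(5.13), the following minimal simulation (our own illustrative Python sketch; the sizes and random matrices are assumptions, not the thesis setup) runs the chain aggregation node by node and verifies that the sink indeed ends up with $y = \sum_l Q^{(l)} x_l$:

```python
import numpy as np

# Chain aggregation of Section 5.4.1: each node adds its randomly weighted
# readings of all L sources to the running sum forwarded along the route.
rng = np.random.default_rng(3)
N, M, L = 50, 10, 3

x = [rng.standard_normal(N) for _ in range(L)]       # x_l(n): reading of source l at node n
Q = [rng.standard_normal((M, N)) for _ in range(L)]  # Q^(l) = Phi U_l, pre-shared with the nodes

y = np.zeros(M)
for n in range(N):                                   # node n adds q_jn^(l) x_l(n) for all j, l
    for l in range(L):
        y += Q[l][:, n] * x[l][n]                    # one scalar contribution per index j

# Equivalent matrix form, Eq. (5.13)
y_check = sum(Q[l] @ x[l] for l in range(L))
assert np.allclose(y, y_check)
```

Note that each node transmits only $M$ scalars in total (one per repetition $j$), irrespective of the number of sources $L$.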

The proposed data gathering strategy can readily address networks organized in a tree-based structure, similarly to [100, 134, 234]. In particular, each parent node waits until it receives the weighted sums from its children nodes, adds its own randomly scaled values, and then transmits the updated sum to the next parent node. The procedure repeats until the root nodes receive the weighted sums from their children nodes, as the sketch below illustrates. The difference with [100, 134, 234] is that the measurements are now computed based on signals from all sources instead of only a specific source.
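A recursive sketch of this tree-based variant follows; the tree, the sizes and helper names such as `aggregate` are illustrative assumptions:

```python
import numpy as np

# Tree-based variant: each parent collects its children's partial sums,
# adds its own randomly weighted readings, and forwards the result upwards.
def aggregate(node, children, Q, x, M):
    partial = np.zeros(M)
    for child in children.get(node, []):             # collect children's sums
        partial += aggregate(child, children, Q, x, M)
    for l in range(len(Q)):                          # add own weighted readings
        partial += Q[l][:, node] * x[l][node]
    return partial                                   # forwarded to the parent

# Example: a 7-node binary tree rooted at node 0 (the sink's entry point).
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
rng = np.random.default_rng(4)
N, M, L = 7, 4, 2
Q = [rng.standard_normal((M, N)) for _ in range(L)]
x = [rng.standard_normal(N) for _ in range(L)]
y = aggregate(0, children, Q, x, M)                  # equals sum_l Q^(l) x_l
assert np.allclose(y, sum(Q[l] @ x[l] for l in range(L)))
```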

5.4.2 Joint Data Recovery via Compressive Demixing

Upon receiving the measurements, the sink proceeds to recover $\{x_l, \forall l \in \mathcal{J}_L\}$ based on $y$, the sensing matrix $\Phi$ and the rotation matrices $\{U_l\}$. In order to apply the compressive demixing framework [145], the sensor signals $\{x_l\}$ should demonstrate an a priori known structure, such as sparsity or compressibility. Typically, the signals $\{x_l\}$ do not exhibit any desired structure. Nevertheless, in WSN deployments, sensors demonstrate spatial correlation among their readings, an observation that enables $\{x_l\}$ to be described more compactly in transform domains, such as the wavelet

transform or the DCT. We choose the DCT as the data-sparsifying transform in order to align with prior work [134]. Thus, each signal $x_l$ has a representation $s_l = \Psi^{-1} x_l$, where $\Psi$ is the DCT matrix. Instead of assuming that the representations $\{s_l\}$ are strictly sparse (i.e., $\|s_l\|_0 = K$), we focus on compressible representations, i.e., signals whose coefficients decay according to a power law when sorted in order of decreasing magnitude. To recover the compressible representations, we solve the following convex optimization problem:

$$[\hat{s}_1, \dots, \hat{s}_L] = \arg\min_{s_1, \dots, s_L \in \mathbb{R}^N} \sum_{l=1}^{L} \alpha_l \|s_l\|_1 \quad \text{s.t.} \quad \sum_{l=1}^{L} A^{(l)} s_l = y, \qquad (5.14)$$

where $A^{(l)} = \Phi U_l \Psi = Q^{(l)} \Psi$ is known as the measurement matrix corresponding to source $X_l$, and $\alpha_l > 0$ are the parameters that trade off the relative importance of the regularizers. Here, we have assumed that $\alpha_l = 1$, $\forall l \in \mathcal{J}_L$. The problem in (5.14) is quite similar to Problem (5.7) and can be solved using the cvx solver [95]. After solving (5.14), the final estimates of the signal vectors are calculated as $\hat{x}_l = \Psi \hat{s}_l$, $\forall l \in \mathcal{J}_L$.
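The following hedged end-to-end sketch puts the recovery step together: it builds an orthonormal DCT synthesis matrix, forms $A^{(l)} = Q^{(l)} \Psi$, and solves (5.14) with cvxpy, our Python stand-in for the cvx solver [95] named in the thesis; all sizes are illustrative.

```python
import numpy as np
import cvxpy as cp
from scipy.fft import idct

# End-to-end recovery sketch for Problem (5.14).
# Psi is the orthonormal inverse-DCT synthesis matrix, so x_l = Psi s_l.
rng = np.random.default_rng(5)
N, M, L = 128, 60, 2

Psi = idct(np.eye(N), norm='ortho', axis=0)        # synthesis matrix
s_true = []
for _ in range(L):
    s = np.zeros(N)
    s[rng.choice(20, 3, replace=False)] = rng.standard_normal(3)  # low-frequency support
    s_true.append(s)
x_true = [Psi @ s for s in s_true]

Q = [rng.standard_normal((M, N)) / np.sqrt(M) for _ in range(L)]  # stands in for Q^(l) = Phi U_l
y = sum(Q[l] @ x_true[l] for l in range(L))        # gathered measurements, Eq. (5.13)

A = [Q[l] @ Psi for l in range(L)]                 # A^(l) = Q^(l) Psi
s_vars = [cp.Variable(N) for _ in range(L)]
prob = cp.Problem(cp.Minimize(sum(cp.norm(s, 1) for s in s_vars)),
                  [sum(A[l] @ s_vars[l] for l in range(L)) == y])
prob.solve()
x_hat = [Psi @ s.value for s in s_vars]            # final estimates x_l = Psi s_l
```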

5.5 Experimental Evaluation

As in Chapter 4, in order to evaluate the performance of the proposed system, we consider the problem of air-pollution monitoring. In particular, we used $6 \times 10^5$ actual sensor readings of three air pollutants, namely CO, NO2 and SO2, from the United States Environmental Protection Agency database [4], measured during 2015. We considered a tree-based network architecture with $N = 1000$ sensors, comprising 15 subtrees defined by the sensor density in the geographic area.³ The transmission of the sensor values is assumed to be conducted via LoRa, a recent low-power wireless networking protocol specifically designed for IoT architectures, which allows for extremely low-rate data transmission over long ranges.

³Each subtree corresponds to one of the following states: CA, NV, AZ, NC, SC, VA, WV, KY, TN, MA, RI, CT, NY, NJ, MD.


[Figure: aggregate relative error (y-axis) versus total measurements, 200-2000 (x-axis), for the Baseline, DCS, Proposed, and Oracle-problem systems.]

Figure 5.3: Performance evaluation of the proposed system against the baseline and the DCS systems using actual measurements from the EPA database.

In addition, we assume that the values exchanged between sensors are discretized using an analog-to-digital converter with a bit depth of 16 bits.

5.5.1 Comparison with State of the Art

We compare the proposed design against (i) the system in [100, 134], where each air pollutant is independently recovered via Basis Pursuit (baseline system), (ii) the DCS system [21], and (iii) the system that recovers the signals based on the oracle compressive demixing problem presented in [146]. It is important to note that the comparison with system (iii) is done only for illustration purposes, as this scheme requires a priori knowledge of signal approximations (optimal points) and, hence, it is not realizable in a practical setting like ours. Moreover, DCS assumes that each signal $x_l$ is constructed from many readings of the same sensor. To have a fair comparison with our design, we have modified this framework by assuming that each signal $x_l$ contains readings from different sensor devices observing the same source. The performance comparison is expressed in aggregate relative error, i.e.,

$\sum_{l=1}^{L} \frac{\|x_l - \hat{x}_l\|_2}{\|x_l\|_2}$, as a function of the total number of measurements. Fig. 5.3 shows

[Figure: aggregate relative error versus total measurements for the Baseline, DCS, Proposed, and Oracle systems at noise levels $\sigma_Z = 2, 5, 10$.]

Figure 5.4: Performance evaluation of the proposed system against the baseline and the DCS systems for various noise standard deviations: $\sigma_Z = 2, 5, 10$.

that, for a similar reconstruction quality, the proposed method achieves a reduction of up to 45% in the number of measurements compared to DCS. This means that the transmissions among the sensors are also reduced by the same factor, thereby significantly prolonging their battery life. Fig. 5.4 depicts the same performance comparison under the assumption that the measurements are corrupted by noise. The noise accounts for both the sensing procedure and the transmission and can be modelled as AWGN: $y = \sum_{l=1}^{L} A^{(l)} s_l + z$, where $z$ is drawn i.i.d. from $\mathcal{N}(0, \sigma_z^2)$, with $\sigma_z$ denoting the standard deviation of the noise. We varied $\sigma_z = 2, 5, 10$ so as to include weak, moderate and strong noise corruption of the measurements. Fig. 5.4 shows that, for similar reconstruction quality, the proposed method provides significant reductions in the number of measurements compared to the baseline and the DCS systems, even when the noise increases. Moreover, the proposed scheme provides robustness against noise, since the relative error increases on average by only 0.5% ($\sigma_z = 2$), 2.1% ($\sigma_z = 5$) and 4.2% ($\sigma_z = 10$) compared to the noiseless case.
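For reference, the two quantities used throughout this evaluation can be stated compactly; the helpers below (illustrative Python, with hypothetical names) implement the aggregate relative error and the AWGN measurement model:

```python
import numpy as np

# Aggregate relative error of Section 5.5.1 and the AWGN measurement model
# y = sum_l A^(l) s_l + z used for Figs. 5.4 and 5.6.
def aggregate_relative_error(x_true, x_hat):
    return sum(np.linalg.norm(x - xh) / np.linalg.norm(x)
               for x, xh in zip(x_true, x_hat))

def noisy_measurements(A, s, sigma_z, rng):
    y = sum(Al @ sl for Al, sl in zip(A, s))
    return y + sigma_z * rng.standard_normal(y.shape)  # z ~ N(0, sigma_z^2), i.i.d.
```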


5.5.2 Comparison with Successive Reconstruction Architecture

We perform an experimental evaluation of the proposed system against a system that uses the successive reconstruction architecture proposed in Chapter 4. In particular, we perform a comparison among the following systems: (i) the proposed scheme using compressive demixing principles, (ii) the scheme that recovers the signals using the oracle problem described in Section 5.3.3, (iii) the system that uses the successive recovery architecture combined with the copula-based algorithm described in Section 4.5.2 (the Student's t copula is used due to its superior performance, as shown in Section 4.7.3), and (iv) the system that uses the successive recovery architecture with the proposed algorithm for the extended $\ell_1$-$\ell_1$ problem (see Section 4.6). Fig. 5.5 depicts the performance comparison of the four schemes in the noiseless case, i.e., when the measurements are not contaminated with measurement and/or sensing noise. It can be seen that both systems that make use of the successive reconstruction architecture outperform the systems that recover the signals via compressive demixing principles when the total number of measurements is small ($< 700$). However, this changes for a larger number of measurements: both the proposed design and the system based on the oracle problem provide significant rate savings of up to 50%. As we mentioned in Section 5.5.1, this leads to a vast reduction of the information transmitted from sensor to sensor and, hence, a prolonged system lifetime. It is important to highlight that, although the system using the oracle problem provides improved performance compared to the proposed design, it cannot be used in our setup, as its reconstruction requires approximations of the constituent signals to be known at the sink. Fig. 5.6 shows the same performance comparison as in Fig. 5.5 in the presence of noise. The situation remains the same as in the noiseless case: the systems using successive reconstruction outperform the compressive-demixing-based schemes in the small-measurement regime, whereas the situation is reversed for larger numbers of total measurements. The important observation in this set of experiments concerns robustness against noise. In particular, we see that the systems that perform signal recovery via compressive demixing prove to be significantly more robust than the systems with the successive recovery architecture. This is because the successive architecture introduces error propagation during reconstruction, resulting in less noise-resilient schemes.


[Figure: aggregate relative error versus total measurements for the Oracle-problem, Proposed, extended $\ell_1$-$\ell_1$, and Copula-based systems.]

Figure 5.5: Performance evaluation of the proposed system against successive reconstruction via (i) the proposed algorithm for the extended $\ell_1$-$\ell_1$ problem and (ii) the proposed copula-based algorithm. Actual measurements from the EPA database are used.

5.6 Conclusions

We introduced a new data gathering and recovery design well-suited for large-scale IoT-based applications. The proposed design leverages the theory of compressive demixing [145, 147] and performs joint aggregation and joint reconstruction of the sensor signals. In particular, we considered the application scenario of air-pollution monitoring. Based on experimentation using real-valued data from the EPA dataset [4], we draw the following conclusions:

• We compared the performance of the compressive gathering scheme proposed in this chapter to the state-of-the-art compressive gathering schemes based on (distributed) compressed sensing [21, 100], and we showed that it provides savings of up to 60% in the required transmission rate.


[Figure: aggregate relative error versus total measurements for the Oracle-problem, Proposed, extended $\ell_1$-$\ell_1$, and Copula-based systems at noise levels $\sigma_z = 2, 5$.]

Figure 5.6: Performance evaluation of the proposed system against successive reconstruction via (i) the proposed algorithm for the extended $\ell_1$-$\ell_1$ problem and (ii) the proposed copula-based algorithm, for various noise standard deviations: $\sigma_Z = 2, 5$. Actual measurements from the EPA database are used.

• Moreover, we compared the scheme proposed in this chapter with the two novel compressive gathering mechanisms proposed in Chapter 4. We saw that the designs that abide by a successive recovery architecture, powered by either the proposed copula-based algorithm or the extended $\ell_1$-$\ell_1$ framework, provide better recovery performance than the compressive demixing framework when the number of measurements is low. Nevertheless, when high reconstruction quality (i.e., low relative error) is required, compressive demixing outperforms the mechanisms in Chapter 4, resulting in significant rate reductions of up to 50%.

• The design proposed in this chapter provides robustness against measurement and communication noise, similarly to the proposed compressive gathering mechanisms in Chapter 4.

• Finally, it offers a form of data encryption by projecting the sensor values onto a randomly generated sensing basis.

Chapter 6

Conclusions & Future Study

This thesis considered the problem of data aggregation and recovery in Internet-of-Things applications of various scales. In particular, we proposed code designs that can be applied to wireless sensor networks—i.e., the enabling technology for IoT systems—and can efficiently deal with various challenges, such as (i) reducing the power consumption of the sensor nodes, (ii) providing balanced energy consumption in the network, and (iii) offering robust data transmission. Initially, we proposed gathering mechanisms that leverage source statistics during signal recovery at the sink; thus, they follow a Bayesian approach. Prior studies, which also abide by this approach, use conventional statistical models to describe the dependence structure among sensor data, such as the multivariate Gaussian model. Therefore, they treat data sources as being of the same type—alternately, sources of homogeneous types. However, Chapter 2 showed that this is not a realistic assumption, since typical IoT applications involve heterogeneous sensor data. To that end, we introduced Bayesian designs that make use of copula functions [158, 193], a powerful mathematical tool that accounts for the heterogeneity among sensor data. Copulas provide joint distributions where the marginal distributions of the data sources can be arbitrary. Apart from Bayesian solutions, we also introduced compressive data gathering schemes based on convex optimization. These schemes (i) are not dependent on the source statistics, and (ii) are efficient in large-scale WSNs, since

they are based on dimensionality reduction principles. Although we focused on applications that have attracted a lot of interest in recent years, such as ambient-condition and air-quality monitoring, we provided generic schemes that can be applied to many applications where rate reductions, (balanced) battery consumption, and robust transmission are of paramount importance. Moreover, our primary concern was to provide designs that are optimized based on the scale of the network. First, we targeted small-to-medium use cases. In Chapter 3 we introduced a code design based on distributed source coding principles. This design combined DPCM and Lloyd-Max quantization with a novel copula regression method acting as an extra refining stage at the joint decoder. The new regression method provides accurate statistical inference, as it accounts for the data heterogeneity, at reasonable complexity. Experimentation showed that the proposed method results in significant rate savings, as well as distortion reductions, compared to prior art on multiterminal source coding [49] and Wyner-Ziv coding [60, 228]. Apart from efficient in-network compression, this design provides flexibility, since the decoding procedure can be performed even if some of the sensors are inactive. Nevertheless, an extension of this technique to large-scale networks requires multi-level cluster-based topologies, which add complexity to the system and, more importantly, suffer from the following problem: they result in unbalanced power consumption in the network. This means that sensor nodes closer to the sink have to relay sensed information from distant nodes and hence have their batteries depleted more rapidly. As a result, we resorted to compressive data gathering techniques, which account for this issue and also offer other advantages, such as low encoding complexity and robustness against noise. Chapter 4 introduced a new compressive data gathering and recovery scheme that targets large-scale IoT applications, such as air-pollution monitoring in large geographic areas. To recover the sensor readings at the sink, a new copula-based recovery algorithm that solves the problem of compressed sensing with multiple side information was proposed. Based on experiments with real data from the EPA database, we showed that the proposed design offers improvements in the overall network rate compared to state-of-the-art methods, such as the ADMM-based

algorithm for the $\ell_1$-$\ell_1$ problem [152] and the distributed compressed sensing paradigm [20], even in the presence of noise. Besides the aforementioned Bayesian code designs, where source statistics are leveraged during reconstruction through powerful copula-based modeling, we also investigated schemes based on linear convex optimization. This approach is very useful for applications where (i) statistical traits of the sources are not available a priori during reconstruction, or (ii) source statistics change very fast. Therefore, Chapter 4 also proposed an alternative compressive gathering scheme that performs joint sensor data recovery via an extended version of the $\ell_1$-$\ell_1$ framework. This scheme provides better performance compared to (distributed) compressed sensing, but still underperforms the scheme proposed in the same chapter that makes use of the copula-based recovery method. Finally, Chapter 5 introduced the fourth data acquisition and recovery mechanism for large-scale IoT applications. This design leverages the compressive demixing theory, which enables both joint data aggregation and joint data recovery. Therefore, it provides significant rate savings compared to state-of-the-art methods based on Basis Pursuit recovery and DCS, which reach up to 60%. Moreover, we compared this design against the compressive gathering schemes proposed in Chapter 4, and we saw that the rate reductions are smaller when low reconstruction quality is targeted. Nonetheless, in the low relative-error regime (high reconstruction quality), it delivers the best performance, with rate gains of up to 50%. Although we have made a comparison among all the proposed compressive gathering schemes, we have not included the system proposed in Chapter 3. The reason for this is clear: the MT code design with copula regression targets small-to-moderate setups, where balanced battery consumption is not a primary constraint and the main scope is to reduce the network rate such that the required power for data transmission is reduced. The same goal needs to be achieved in large-scale setups but, in order to prolong the lifetime of the network, balanced energy consumption is also desired there.


6.1 Future Work

We give some directions on how the proposed data gathering and recovery methods can be improved with respect to various aspects. In particular, we describe how the proposed schemes could be compared with other systems based on competitive statistical models (e.g., the joint GMM)¹ and related inference methods; how the data recovery of the proposed schemes can be enhanced so that they can deal with dynamic source statistics, algorithmic convergence and noise corruption; and how the compressive demixing approach described in Chapter 5 could be adapted so as to account for data heterogeneity. In particular, our future work will focus on the following aspects:

• The GMM can provide a competitive solution for accurately describing a large variety of marginal statistics. Copulas could be used to provide a multivariate pdf where all the marginal statistics follow GMM distributions. Another approach, also considered in recent studies on compressive signal recovery and classification [178, 179], is to model the joint statistics through the joint GMM. To incorporate this model into the proposed MT code design (Chapter 3), the refining stage should be based on Gaussian mixture regression (GMR) methods. Recent studies on this topic have been presented in [35, 36], where Expectation-Maximization (EM) iterative learning algorithms are used to retrieve partial output data by specifying the desired inputs. Future work could compare the reconstruction performance of the proposed copula-based MT design with that of a system that uses the GMR methods [35, 36] to refine the DPCM decoded values. The performance evaluation would reveal which model is more accurate for the joint statistics and the level of inference provided by the different regression methods. Apart from that, it would be interesting to compare the complexity at the joint decoder side.

• The joint GMM can be also used in the belief-propagation recovery algorithm proposed in Chapter 4. This can be addressed by modifying the initial messages

¹These systems have not been proposed in the state of the art; however, they could provide alternative solutions to the problems considered in this thesis.


of Eq. (4.20), namely,

$$q^{(0)}_{i \to j}[s_l(i)] = f_{S_l \mid S^{l-1}}\!\left(s_l \mid s^{l-1} = \hat{s}^{l-1}\right),$$

such that they contain the corresponding conditional pdf expressions based on the joint GMM.

• The reconstruction algorithms proposed in Chapter 4 provide superior performance compared to state-of-the-art methods in terms of relative error. In particular, the copula-based belief-propagation method provides the most accurate recovery, since it accounts for the heterogeneity among the sensor signals. However, this algorithm has a weakness that may affect the quality of reconstruction: as it is a loopy belief-propagation method, convergence is not always guaranteed; this is the reason why the proposed algorithm uses a maximum number of iterations as a stopping criterion. To deal with this issue, we may explore alternative techniques based on VB inference, where convergence is guaranteed [224]. However, providing closed-form expressions for the update conditions in a VB algorithm is not a trivial task, since expressions of multivariate copulas are involved. Our first step would be to provide update conditions for copulas with simple expressions (e.g., the Gaussian copula) involving only two margins, and then to address more generic cases. Moreover, another aspect that needs to be considered is the complexity of the VB techniques. In [103], we can see that tree-structured VB inference for sparse signals can be less complex (w.r.t. computation time) than MCMC [102] and Bayesian CS (BCS) [112] methods; however, a recovery-complexity comparison with loopy belief-propagation techniques, such as ours or the one proposed by Baron et al. in [20], has not been made.

• The copula-based belief-propagation method proposed in Chapter 4 can be further improved in how it handles noise corruption of the measurements. In Section 4.7.4, we showed that it provides robustness against measurement and communication noise, modelled as AWGN. However, an alternative Bayesian CS method has recently been proposed in [176]. This recovery technique is called generalized approximate message passing (GAMP) and, although it


is used for recovering a single structured signal (including variations with a Markov-tree prior [198] for compressive imaging), it has not so far been extended to include additional side information signals during reconstruction (i.e., the problem of CS with multiple side information). The GAMP algorithm is an iterative method that relies on relaxed belief propagation (RBP) [175]; it achieves fast and robust CS recovery, as it simplifies the message-passing procedure in the bipartite Tanner graph by using Gaussian approximations. In particular, the update messages in the GAMP algorithm are not sampled pdfs, as in the proposed copula-based algorithm and in [20], but only two values that describe the pdfs, i.e., the mean and the variance. The calculation of the update messages is done at two levels, the input and the output level, each characterized by its own function [176]. The input function models the statistical characteristics of the target signal, whereas the output function describes the statistical dependence between the measurements vector and the noise. Our future goal will be to provide closed-form expressions for the message updates at the input level, such that the input function describes the dependence between the target signal and the side information signals. The derivation of these expressions is not an easy task due to the complex expressions of copula functions; however, it seems feasible for specific functions, such as the Gaussian copula. One important observation is that GAMP is used not only for recovery, but also for binary classification [236]. The latter is done by setting the output function to be of a specific type (e.g., the logistic sigmoid or probit function [27]). Therefore, by extending the GAMP method to cases involving heterogeneous signals, we can address use cases where side information can be used either for recovery (e.g., WSNs) or for binary classification (e.g., image classification based on metadata).

• The Bayesian algorithms proposed in this thesis—namely, the copula regression algorithm in Chapter 3 for the refinement of the DPCM decoded values, and the copula-based belief propagation algorithm in Chapter 4 for the successive recovery of the sensor signals—rely on the assumption that the statistics describe sensor values varying slowly in time. Hence, similarly to many previous studies (e.g., [49]), offline estimation is considered. However, for the sake

of completeness, and in order to include use cases where the source statistics vary significantly in time (i.e., they are not stationary processes), future work can focus on incorporating online estimators for the statistical parameters of the data sources into the proposed mechanisms. Online estimation of the copula parameters is not an easy task; finding the best way to model asymmetries and time-variations of the dependence structure is still an open question [65]. Several attempts at solving this problem mainly address applications involving financial time-series data [65, 96, 166, 181]. However, these studies and the majority of research on copulas are conducted at the bivariate level. Application scenarios like the ones investigated in this thesis involve modeling in higher dimensions, and hence finding a numerically tractable model that is flexible enough to capture real data behavior becomes increasingly complicated [73]. To account for higher dimensions, the studies in [165, 182] introduce hierarchical Archimedean copulas, while the authors in [164] propose factor copula models. In addition, the vine copula constructions proposed by Aas et al. [12] provide a flexible approach that estimates the parameters sequentially. Similarly, the literature offers different ways to specify time-varying copulas: besides the regime-switching models, the model developed in [99] is based on copula parameters that are transformed by a latent Gaussian autoregressive process. A combination of the two approaches is proposed in [50] to estimate regime-switching vine copulas with the time-varying feature. In [17], an extension of the stochastic autoregressive copula model to higher dimensions using vine models is also presented. However, the latter models are estimated sequentially based on a simulated maximum likelihood estimator. The most recent study focusing on asset portfolios is presented in [13], where two different modeling approaches that account for time-variations in the dependence structure are proposed. The first approach is based on the study in [50], where a regime-switching copula is proposed to capture the variations of the dependence structure over time. In this framework, copulas are static within one regime, but vary across regimes. Since the variations between the regimes cannot be known in advance, they are assumed to be governed by a latent Markov process. The second approach to model time-varying dependencies consists of


dynamic copulas whose parameters are allowed to vary with every discrete time step. To create dynamic elliptical copulas, the dynamic conditional correlation (DCC) model proposed in [76] is applied to multivariate elliptical copulas. As future work, it would be interesting to extend the latter two methods to our use cases so as to provide an updating method for the copula parameters of the proposed data gathering mechanisms.

• Motivated by state-of-the-art studies on the problem of source separation for two or more constituent signals, in Chapter 5 we solved the compressive demixing problem based on convex programming. However, similarly to compressed sensing theory, the problem of compressive demixing could also be addressed from a Bayesian perspective. Variational Bayes inference has already been used for the JSM-1 distributed compressed sensing paradigm in [48], where multivariate Gaussian models were considered for the common and the innovation components of the joint sparsity framework. One of our future goals would be to develop a copula-based variational Bayes technique that addresses the problem of compressive demixing. Similarly to our previous discussion on VB techniques, we acknowledge that the main difficulty of this approach will be the derivation of closed-form update rules, which would accelerate the decoding procedure.

6.2 General Directions

Apart from these specific improvements on the already proposed mechanisms, we present some general directions for future data gathering strategies. Our personal belief is that, among other IoT applications, scenarios related to smart homes, smart cities and eHealth will experience significant growth. In general, modern data gathering strategies should take into consideration objectives related to the following topics:

• Security in IoT applications remains a technological barrier that needs considerable attention. The success of several applications related to smart homes, connected cars and smart metering relies on robust security capabilities.


As the volume of sensitive data conveyed over the IoT drastically increases, so does the risk of device (and server/network) manipulation, data falsification, and data and IP theft. Therefore, industrial IoT technology providers should find efficient solutions for the remaining engineering issues so as to enable the design of secure IoT systems.

• In order to fully leverage the potential of several IoT scenarios, the related architectures should be enhanced with cloud solutions and Big Data technologies. The massive amount of data transferred over the IoT could be cleaned and processed using advanced machine learning mechanisms; these could be very helpful for extracting patterns and enabling efficient decision making. Moreover, since some applications deal with crucial information, cloud technologies could provide a competitive alternative to traditional storage systems, making data more accessible.

• User acceptability is another important topic that needs validation, especially for applications that are not operational today and still require research. Examples include car-to-car communication and enhanced assisted living for the purpose of relaying safety-critical information.

• Another future challenge deals with innovation on the available object platforms. Apart from developers and engineers, experienced users should also be leveraged for the development of innovative applications. A similar approach is used in applications related to Interactive Machine Learning (IML), where experienced users validate the results of traditional methods. Moreover, more innovation is needed in the way non-experienced users could communicate with smart objects.

• In order to realise the vision of the IoT, several use cases related to different application domains, such as smart homes, smart manufacturing, smart cities and smart power management, should cooperate. This could result in several cross-use-case issues that need to be tackled. To this end, agent-driven applications that test systems in physical spaces in relation to the human scale should be deployed.


List of publications

Journals

E. Zimos, D. Toumpakaris, A. Munteanu, N. Deligiannis, “Multiterminal Source Coding with Copula Regression for Wireless Sensor Networks Gathering Diverse Data,” IEEE Sensors Journal, 17(1):139-150, 2017.

N. Deligiannis, E. Zimos, D. Ofrim, Y. Andreopoulos, and A. Munteanu, “Distributed joint source-channel coding with copula-function-based correlation modeling for wireless sensors measuring temperature,” IEEE Sensors Journal, 15(8):4496–4507, 2015.

N. Deligiannis, J. F. C. Mota, E. Zimos, and M. R. D. Rodrigues, “Heterogeneous Networked Data Recovery from Compressive Measurements Using a Copula Prior,” under revision at IEEE Transactions on Communications, 2017.

Conference proceedings

E. Zimos, J. F. C. Mota, M. R. D. Rodrigues and N. Deligiannis, “Bayesian compressed sensing with heterogeneous side information,” IEEE Data Compression Conference (DCC), pages 191-200, Snowbird, Utah, USA, Apr. 2016.

E. Zimos, J. F. C. Mota, M. R. D. Rodrigues and N. Deligiannis, “Internet-of-Things Data Aggregation Using Compressed Sensing with Side Information,” IEEE International Conference on Telecommunications (ICT), pages 1-5, Thessaloniki, Greece, May 2016.

A. Sechelea, T. Do Huu, E. Zimos, and N. Deligiannis, “Twitter Data Clustering and Visualization”, IEEE International Conference on Telecommunications (ICT), pages 1-5, Thessaloniki, Greece, May 2016.

N. Deligiannis, E. Zimos, D. Ofrim, Y. Andreopoulos, and A. Munteanu, “Distributed joint source-channel coding with raptor codes for correlated data gathering in wireless sensor networks,” 9th International Conference on Body Area Networks, pages 279-285, Sept. 2015.

Bibliography

[1] [Online]. Available: http://www.digikey.com/en/articles/techzone/2011/dec/fundamentals-of-piezoelectric-shock-and-vibration-sensors.

[2] [Online]. Available: http://www.libelium.com/products/plug-sense/technical-overview/.

[3] [Online]. Available: https://www.lora-alliance.org/.

[4] [Online]. Available: http://www3.epa.gov/airdata/.

[5] [Online]. Available: http://www.smartsantander.eu/.

[6] [Online]. Available: https://nl.mathworks.com/help/stats/kstest2.html#btn37ur.

[7] [Online]. Available: http://traces.cs.umass.edu/index.php/Smart/Smart.

[8] [Online]. Available: http://www.oecd.org/dac/dcr2012.htm.

[9] Environmental Protection Department of Hong Kong. Air Quality Health Index. [Online]. Available: http://www.aqhi.gov.hk/en.html.

[10] IoT Analytics. [Online]. Available: https://iot-analytics.com/.

[11] World Health Organization. 7 Million Premature Deaths Annually Linked to Air Pollution. [Online]. Available: http://www.who.int/mediacentre/news/releases/2014/air-pollution/en/.

[12] Kjersti Aas, Claudia Czado, Arnoldo Frigessi, and Henrik Bakken. Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182–198, 2009.

[13] Matthias Daniel Aepli. Portfolio Risk Forecasting: On the Predictive Power of Multivariate Dynamic Copula Models. PhD thesis, University of St. Gallen, 2015.


[14] Huseyin Dogus Akaydin, Niell Elvin, and Yiannis Andreopoulos. Energy harvesting from highly unsteady fluid flows using piezoelectric materials. Journal of Intelligent Material Systems and Structures, 21(13):1263–1278, 2010.

[15] Ian F Akyildiz and Mehmet Can Vuran. Wireless sensor networks, volume 4. John Wiley & Sons, 2010.

[16] ZigBee Alliance et al. Zigbee specification, 2006.

[17] Carlos Almeida, Claudia Czado, and Hans Manner. Modeling high dimensional time-varying dependence using d-vine scar models. arXiv preprint arXiv:1202.2008, 2012.

[18] Dennis Amelunxen, Martin Lotz, Michael B McCoy, and Joel A Tropp. Living on the edge: Phase transitions in convex programs with random data. Information and Inference, page iau005, 2014.

[19] Dror Baron, Marco F Duarte, Shriram Sarvotham, Michael B Wakin, and Richard G Baraniuk. An information-theoretic approach to distributed compressed sensing. In 45th Annu. Allerton Conf. Commun., Control, and Computing, 2005.

[20] Dror Baron, Shriram Sarvotham, and Richard G Baraniuk. Bayesian compressive sensing via belief propagation. IEEE Trans. Signal Process., 58(1):269–280, 2010.

[21] Dror Baron, Michael B Wakin, Marco F Duarte, Shriram Sarvotham, and Richard G Baraniuk. Distributed compressed sensing. IEEE Trans. Inf. Theory, 52(12):5406–5425, 2006.

[22] Parmida Beigi, Xiaoyu Xiu, and Jie Liang. Compressive sensing based multiview image coding with belief propagation. In Asilomar Conf. Signals, Syst., Comput. IEEE, 2010.

[23] Toby Berger. Multiterminal source coding. Inf. Theory Approach Commun., 229:171–231, 1977.

[24] Toby Berger. Multiterminal source coding. The information theory approach to communications, 229:171–231, 1977.

[25] Albert J Berni and William D Gregg. On the utility of chirp modulation for digital signaling. IEEE Trans. Commun., 21(6):748–751, 1973.

[26] Arne Beurling. Sur les intégrales de Fourier absolument convergentes et leur application à une transformation fonctionelle. In Ninth Scandinavian Mathematical Congress, pages 345–366, 1938.

[27] C. Bishop. Pattern recognition and machine learning (Information Science and Statistics), 1st edn. 2006, corr. 2nd printing edn. Springer, New York, 2007.

[28] Martin Bor, John Edward Vidler, and Utz Roedig. LoRa for the Internet of Things. MadCOM 2016, 2016.

[29] Eric Bouyé, Valdo Durrleman, Ashkan Nikeghbali, Gaël Riboulet, and Thierry Roncalli. Copulas for finance: a reading guide and some applications. Available: http://ssrn.com/abstract=1032533, 2000.

[30] Adrian W Bowman. An alternative method of cross-validation for the smoothing of density estimates. Biometrika, pages 353–360, 1984.

[31] Adrian W Bowman and Adelchi Azzalini. Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford University Press, 1997.

[32] W. Breymann, A. Dias, and P. Embrechts. Dependence structures for multivariate high-frequency data in finance. Quantitative Finance, 3(1):1–14, 2003.

[33] Michael Buettner, Gary V Yee, Eric Anderson, and Richard Han. X-mac: a short preamble mac protocol for duty-cycled wireless sensor networks. In Proceedings of the 4th international conference on Embedded networked sensor systems, pages 307–320. ACM, 2006.

[34] Francesco Calabrese, Kristian Kloeckl, and Carlo Ratti. Wikicity: Real-time location- sensitive tools for the city. IEEE Pervasive Computing, July-September 2007.

[35] S. Calinon. Robot Programming by Demonstration: A Probabilistic Approach. EPFL/CRC Press, 2009. EPFL Press ISBN 978-2-940222-31-5, CRC Press ISBN 978-1-4398-0867-2.

[36] S. Calinon, F. Guenter, and A. Billard. On learning, representing and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man and Cybernetics, Part B, 37(2):286–298, 2007.

[37] Emmanuel J Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory, 52(2):489–509, 2006.

[38] Emmanuel J Candes and Terence Tao. Decoding by linear programming. IEEE Trans. Inf. Theory, 51(12):4203–4215, 2005.

[39] Emmanuel J Candès and Michael B Wakin. An introduction to compressive sampling. IEEE Signal Process. Mag., 25(2):21–30, 2008.


[40] Philippe Capéraà and Christian Genest. Spearman's ρ is larger than Kendall's τ for positively dependent random variables. Journal of Nonparametric Statistics, 2(2):183–194, 1993.

[41] Volkan Cevher, Piotr Indyk, Lawrence Carin, and Richard G Baraniuk. Sparse signal recovery and acquisition with graphical models. IEEE Signal Processing Magazine, 27(6):92–103, 2010.

[42] Venkat Chandrasekaran, Benjamin Recht, Pablo A Parrilo, and Alan S Willsky. The convex geometry of linear inverse problems. Found. Computational Mathematics, 12(6):805–849, 2012.

[43] Adam Charles, M Salman Asif, Justin Romberg, and Christopher Rozell. Sparsity penalties in dynamical system estimation. In Information Sciences and Systems (CISS), 2011 45th Annual Conference on, pages 1–6. IEEE, 2011.

[44] Deji Chen, Mark Nixon, and Aloysius Mok. Why WirelessHART. In WirelessHART, pages 195–199. Springer, 2010.

[45] Feng Chen, Marcin Rutkowski, Christopher Fenner, Robert C Huck, Shuang Wang, and Shukang Cheng. Compression of distributed correlated temperature data in sensor networks. In Data Compression Conf. (DCC), pages 479–479. IEEE, 2013.

[46] Guang-Hong Chen, Jie Tang, and Shuai Leng. Prior image constrained compressed sensing (piccs): a method to accurately reconstruct dynamic ct images from highly undersampled projection data sets. Medical physics, 35(2):660–663, 2008.

[47] Scott Shaobing Chen, David L Donoho, and Michael A Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1998.

[48] Wei Chen and Ian J Wassell. A decentralized bayesian algorithm for distributed compressive sensing in networked sensing systems. IEEE Transactions on Wireless Communications, 15(2):1282–1292, 2016.

[49] Samuel Cheng. Multiterminal source coding for many sensors with entropy coding and Gaussian process regression. In Proc. Data Compress. Conf. (DCC), page 480. IEEE, 2013.

[50] Lorán Chollete, Andréas Heinen, and Alfonso Valdesogo. Modeling international financial returns with a multivariate regime-switching copula. Journal of Financial Econometrics, page nbp014, 2009.

[51] Alexandre Ciancio, Sundeep Pattem, Antonio Ortega, and Bhaskar Krishnamachari. Energy-efficient data representation and routing for wireless sensor networks based on

a distributed wavelet compression algorithm. In Int. Conf. Inform. Process. Sensor Networks, pages 309–316. ACM, 2006.

[52] David G Clayton. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65(1):141–151, 1978.

[53] Robert T Clemen and Terence Reilly. Correlations and copulas for decision and risk analysis. Management Science, 45(2):208–224, 1999.

[54] R Dennis Cook and Mark E Johnson. A family of distributions for modelling non- elliptically symmetric multivariate data. Journal of the Royal Statistical Society. Series B (Methodological), pages 210–218, 1981.

[55] Robert G Cowell. Probabilistic networks and expert systems: Exact computational methods for Bayesian networks. Springer Science & Business Media, 2006.

[56] Mark Crovella and Eric Kolaczyk. Graph wavelets for spatial traffic analysis. In Annu. Joint Conf. IEEE Comput. and Commun. (INFOCOM), volume 3, pages 1848–1857. IEEE, 2003.

[57] Mark A Davenport, Marco F Duarte, Yonina C Eldar, and Gitta Kutyniok. Intro- duction to compressed sensing. Preprint, 93(1):2, 2011.

[58] Russell Davidson, James G MacKinnon, et al. Estimation and inference in econometrics. 1993.

[59] Paul Deheuvels. La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance. Acad. Roy. Belg. Bull. Cl. Sci. (5), 65(6):274–292, 1979.

[60] Nikos Deligiannis, Adrian Munteanu, Shuang Wang, Shukang Cheng, and Peter Schelkens. Maximum likelihood Laplacian correlation channel estimation in layered Wyner-Ziv coding. IEEE Trans. Signal Process., 62(4):892–904, 2014.

[61] Nikos Deligiannis, Andrei Sechelea, Adrian Munteanu, and Samuel Cheng. The no- rate-loss property of Wyner-Ziv coding in the Z-channel correlation case. IEEE Commun. Lett., 18(10):1675–1678, 2014.

[62] Nikos Deligiannis, Evangelos Zimos, Dragos Ofrim, Yiannis Andreopoulos, and Adrian Munteanu. Distributed joint source-channel coding with copula-function- based correlation modeling for wireless sensors measuring temperature. IEEE Sensor J., 15(8):4496–4507, 2015.


[63] Nikos Deligiannis, Evangelos Zimos, Dragos Mihai Ofrim, Yiannis Andreopoulos, and Adrian Munteanu. Distributed joint source-channel coding with raptor codes for correlated data gathering in wireless sensor networks. In Int. Conf. Body Area Networks, pages 279–285. ICST, 2014.

[64] Nikos Deligiannis, Evangelos Zimos, Dragos Mihai Ofrim, Yiannis Andreopoulos, and Adrian Munteanu. Distributed joint source-channel coding with raptor codes for correlated data gathering in wireless sensor networks. In Int. Conf. Body Area Networks (Bodynets), pages 279–285, 2014.

[65] Alexandra Dias and Paul Embrechts. Modeling exchange rate dependence dynamics at different time horizons. Journal of International Money and Finance, 29(8):1687–1705, 2010.

[66] David L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52:1289–1306, 2006.

[67] David L Donoho and Xiaoming Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inf. Theory, 47(7):2845–2862, 2001.

[68] David L Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proc. Nat. Academy Sci., 106(45):18914–18919, 2009.

[69] Marco F Duarte, Shriram Sarvotham, Dror Baron, Michael B Wakin, and Richard G Baraniuk. Distributed compressed sensing of jointly sparse signals. In Asilomar Conf. Signals, Syst., Comput., pages 1537–1541, 2005.

[70] Valdo Durrleman, Ashkan Nikeghbali, Thierry Roncalli, et al. Which copula is the right one? Working paper, Groupe de Recherche Opérationnelle, Crédit Lyonnais, March 2001.

[71] Yonina C Eldar and Gitta Kutyniok. Compressed sensing: theory and applications. Cambridge University Press, 2012.

[72] Torbjørn Eltoft, Taesu Kim, and Te-Won Lee. On the multivariate Laplace distribution. IEEE Signal Processing Letters, 13(5):300–303, 2006.

[73] Paul Embrechts and Marius Hofert. Statistics and quantitative risk management for banking and insurance. Annual Review of Statistics and Its Application, 1:493–514, 2014.

[74] Paul Embrechts, Filip Lindskog, and Alexander McNeil. Modelling dependence with copulas. Technical report, Department of Mathematics, ETH Zürich, Zurich, 2001.

[75] Paul Embrechts, Alexander McNeil, and Daniel Straumann. Correlation and dependence in risk management: properties and pitfalls. Risk Management: Value at Risk and Beyond, pages 176–223, 2002.

[76] Robert Engle. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics, 20(3):339–350, 2002.

[77] Vassiliy A Epanechnikov. Non-parametric estimation of a multivariate probability density. Theory of Probability and Its Applications, 14(1):153–158, 1969.

[78] T Eude, R Grisel, Hocine Cherifi, and R Debrie. On the distribution of the DCT coefficients. In IEEE Int. Conf. Acoust., Speech, and Signal Process. (ICASSP), pages V–365. IEEE, 1994.

[79] Dave Evans. The Internet of Things: How the next evolution of the Internet is changing everything. Whitepaper, Cisco Internet Business Solutions Group (IBSG), 1:1–12, 2011.

[80] Kai-Tai Fang, Samuel Kotz, and Kai Wang Ng. Symmetric multivariate and related distributions, volume 36. Chapman & Hall/CRC, 1990.

[81] Maurice J Frank. On the simultaneous associativity of F(x, y) and x + y − F(x, y). Aequationes Mathematicae, 19(1):194–226, 1979.

[82] Maurice Fréchet. Les tableaux dont les marges sont données. Trabajos de Estadística, 11(1):3–18, 1960.

[83] Edward W Frees and Emiliano A Valdez. Understanding relationships using copulas. North Amer. Actuarial J., 2(1):1–25, 1998.

[84] Maria Fresia, Luc Vandendorpe, and H Vincent Poor. Distributed source coding using raptor codes for hidden Markov sources. IEEE Trans. Signal Process., 57(7):2868–2875, 2009.

[85] Robert G Gallager. Low-density parity-check codes. IRE Trans. on Inform. Theory, 8(1):21–28, 1962.

[86] Javier Garcia-Frias. Compression of correlated binary sources using turbo codes. IEEE Commun. Lett., 5(10):417–419, 2001.

[87] Alan E Gelfand and Adrian FM Smith. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410):398–409, 1990.


[88] Christian Genest and Jock Mackay. The joy of copulas: bivariate distributions with uniform marginals. Amer. Statistician, 40(4):280–283, 1986.

[89] Christian Genest and R Jock MacKay. Copules archimédiennes et familles de lois bidimensionnelles dont les marges sont données. Canadian Journal of Statistics, 14(2):145–159, 1986.

[90] Christian Genest and Louis-Paul Rivest. Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association, 88(423):1034–1043, 1993.

[91] Christian Genest and Louis-Paul Rivest. On the multivariate probability integral transformation. Stat. & Probability Lett., 53(4):391–399, 2001.

[92] Alan Genz. Numerical computation of multivariate normal probabilities. Journal of computational and graphical statistics, 1(2):141–149, 1992.

[93] Allen Gersho and Robert M Gray. Vector quantization and signal compression. Springer, 1992.

[94] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, Clifford Stein, et al. Introduction to algorithms. MIT Press, 1990.

[95] Michael Grant, S Boyd, and Y Ye. CVX: MATLAB software for disciplined convex programming. [Online]. Available: http://cvxr.com/cvx/, 2015.

[96] Dominique Guegan and Jing Zhang. Change analysis of a dynamic copula for measuring dependence in multivariate financial data. Quantitative Finance, 10(4):421–430, 2010.

[97] Emil J Gumbel. Bivariate logistic distributions. Journal of the American Statistical Association, 56(294):335–349, 1961.

[98] Emil Julius Gumbel. Distributions des valeurs extrêmes en plusieurs dimensions. Publ. Inst. Statist. Univ. Paris, 9:171–173, 1960.

[99] Christian M Hafner and Hans Manner. Dynamic stochastic copula models: Estimation, inference and applications. Journal of Applied Econometrics, 27(2):269–295, 2012.

[100] Jarvis Haupt, Waheed U Bajwa, Michael Rabbat, and Robert Nowak. Compressed sensing for networked data. IEEE Signal Process. Mag., 25(2):92–101, 2008.

[101] Jarvis Haupt and Robert Nowak. Signal reconstruction from noisy random projec- tions. IEEE Trans. Inf. Theory, 52(9):4036–4048, 2006.

[102] Lihan He and Lawrence Carin. Exploiting structure in wavelet-based Bayesian compressive sensing. IEEE Trans. Signal Process., 57(9):3488–3497, 2009.

[103] Lihan He, Haojun Chen, and Lawrence Carin. Tree-structured compressive sensing with variational Bayesian analysis. IEEE Signal Process. Lett., 17(3):233–236, 2010.

[104] Marius Hofert, Martin Maechler, and Alexander J McNeil. Estimators for Archimedean copulas in high dimensions. arXiv preprint arXiv:1207.1708, 2012.

[105] Robert G Hollands. Will the real smart city please stand up? Intelligent, progressive or entrepreneurial? City, 12(3):303–320, 2008.

[106] Jan Holler, Vlasios Tsiatsis, Catherine Mulligan, Stefan Avesand, Stamatis Karnouskos, and David Boyle. From Machine-to-machine to the Internet of Things: Introduction to a New Age of Intelligence. Academic Press, 2014.

[107] Philip Hougaard. A class of multivariate failure time distributions. Biometrika, 73(3):671–678, 1986.

[108] David Huard, Guillaume Evin, and Anne-Catherine Favre. Bayesian copula selection. Computational Statistics & Data Analysis, 51(2):809–822, 2006.

[109] ISA. ISA-100.11a-2009: Wireless systems for industrial automation: Process control and related applications. International Society of Automation, Research Triangle Park, NC, USA, 2009.

[110] Satish G Iyengar, Pramod K Varshney, and Thyagaraju Damarla. A parametric copula-based framework for hypothesis testing using heterogeneous data. IEEE Trans. Signal Process., 59(5):2308–2319, 2011.

[111] Piotr Jaworski, Fabrizio Durante, Wolfgang Karl Härdle, and Tomasz Rychlik. Copula theory and its applications: proceedings of the workshop held in Warsaw, 25-26 September 2009, volume 198. Springer Science & Business Media, 2010.

[112] Shihao Ji, Ya Xue, and Lawrence Carin. Bayesian compressive sensing. IEEE Trans. Signal Process., 56(6):2346–2356, 2008.

[113] Harry Joe. Approximations to multivariate normal rectangle probabilities based on conditional expectations. Journal of the American Statistical Association, 90(431):957–964, 1995.

[114] Harry Joe. Multivariate models and multivariate dependence concepts, volume 73. CRC Press, 1997.


[115] Harry Joe. Dependence modeling with copulas. CRC Press, 2014.

[116] Michael I Jordan, Zoubin Ghahramani, Tommi S Jaakkola, and Lawrence K Saul. An introduction to variational methods for graphical models. Machine learning, 37(2):183–233, 1999.

[117] Li-Wei Kang and Chun-Shien Lu. Distributed compressive video sensing. In IEEE Int. Conf. Acoust., Speech, and Signal Process. (ICASSP), pages 1169–1172. IEEE, 2009.

[118] Steven M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993.

[119] M Kendall and A Stuart. Handbook of statistics. Griffin & Company, London, 1979.

[120] M Amin Khajehnejad, Weiyu Xu, A Salman Avestimehr, and Babak Hassibi. Weighted ℓ1 minimization for sparse recovery with prior information. In IEEE Int. Symp. Inf. Theory (ISIT), pages 483–487. IEEE, 2009.

[121] George Kimeldorf and Allan Sampson. Uniform representations of bivariate distributions. Communications in Statistics–Theory and Methods, 4(7):617–627, 1975.

[122] Vladimir Aleksandrovich Kotelnikov. On the carrying capacity of the ether and wire in telecommunications. In Material for the First All-Union Conference on Questions of Communication, Izd. Red. Upr. Svyazi RKKA, Moscow, volume 1, 1933.

[123] William H Kruskal. Ordinal measures of association. Journal of the American Statistical Association, 53(284):814–861, 1958.

[124] Edmund Y Lam. Analysis of the DCT coefficient distributions for document coding. IEEE Signal Process. Lett., 11(2):97–100, 2004.

[125] Steffen L Lauritzen. Graphical models, volume 17. Clarendon Press, 1996.

[126] EL Lehmann and HJM D'Abrera. Nonparametrics: Statistical methods based on ranks. Holden-Day, 1975.

[127] Jialiang Li and Weng Kee Wong. Two-dimensional toxic dose and multivariate logistic regression, with application to decompression sickness. Biostatistics, 12(1):143–155, 2011.

[128] Wei Liang, Xiaoling Zhang, Yang Xiao, Fuqiang Wang, Peng Zeng, and Haibin Yu. Survey and experiments of WIA-PA specification of industrial wireless network. Wireless Communications and Mobile Computing, 11(8):1197–1212, 2011.

196 [129] Bao Hua Liu, Brian Otis, Subhash Challa, Paul Axon, Chun Tung Chou, and Sanjay Jha. On the fading and shadowing effects for wireless sensor networks. In Int. Conf. Mobile Adhoc and Sensor Syst. (MASS), pages 51–60. IEEE, 2006.

[130] Chong Liu, Kui Wu, and Jian Pei. An energy-efficient data collection framework for wireless sensor networks by exploiting spatiotemporal correlation. IEEE Trans. Parallel and Distributed Syst., 18(7):1010–1023, 2007.

[131] Yu Liu, Xuqi Zhu, and Lin Zhang. Noise-resilient distributed compressed video sensing using side-information-based belief propagation. In IEEE Int. Conf. Network Infrastructure and Digital Content (IC-NIDC), pages 350–390. IEEE, 2012.

[132] Angelos D Liveris, Zixiang Xiong, and Costas N Georghiades. Compression of binary sources with side information at the decoder using LDPC codes. IEEE Commun. Lett., 6(10):440–442, 2002.

[133] Angelos D Liveris, Zixiang Xiong, and Costas N Georghiades. A distributed source coding technique for correlated images using turbo-codes. IEEE Commun. Lett., 6(9):379–381, 2002.

[134] Chong Luo, Feng Wu, Jun Sun, and Chang Wen Chen. Efficient measurement generation and pervasive sparsity for compressive data gathering. IEEE Trans. Wireless Commun., 9(12):3728–3738, 2010.

[135] David JC MacKay. Information theory, inference and learning algorithms. Cambridge University Press, 2003.

[136] S Madden. Intel Berkeley Research Lab data, 2003.

[137] Jan R Magnus and Heinz Neudecker. Matrix differential calculus with applications in statistics and econometrics. John Wiley & Sons, 1995.

[138] Michael W Marcellin and Thomas R Fischer. Trellis coded quantization of memoryless and Gauss-Markov sources. IEEE Trans. Commun., 38(1):82–93, 1990.

[139] Francesco Marcelloni and Massimo Vecchio. A simple algorithm for data compression in wireless sensor networks. IEEE Commun. Lett., 12(6):411–413, 2008.

[140] Kanti V Mardia. Multivariate Pareto distributions. The Annals of Mathematical Statistics, pages 1008–1015, 1962.

[141] Albert W Marshall. Copulas, marginals, and joint distributions. Lecture Notes–Monograph Series, pages 213–222, 1996.


[142] Riccardo Masiero, Giorgio Quer, Daniele Munaretto, Michele Rossi, Joerg Widmer, and Michele Zorzi. Data acquisition through joint compressive sensing and principal component analysis. In IEEE Global Telecommun. Conf. (GLOBECOM), pages 1–6. IEEE, 2009.

[143] Frank J Massey Jr. The Kolmogorov-Smirnov test for goodness of fit. Amer. Statistical Assoc. J., 46(253):68–78, 1951.

[144] Michael B McCoy. A geometric analysis of convex demixing. PhD thesis, California Institute of Technology, 2013.

[145] Michael B McCoy, Volkan Cevher, Quoc Tran Dinh, Afsaneh Asaei, and Luca Baldassarre. Convexity in source separation: Models, geometry, and algorithms. IEEE Signal Process. Mag., 31(3):87–95, 2014.

[146] Michael B McCoy and Joel A Tropp. The achievable performance of convex demixing. arXiv:1309.7478, 2013.

[147] Michael B McCoy and Joel A Tropp. Sharp recovery bounds for convex demixing, with applications. Foundations of Computational Mathematics, 14(3):503–567, 2014.

[148] Alexander J. McNeil, Rüdiger Frey, and Paul Embrechts. Quantitative risk management: concepts, techniques and tools. Princeton University Press, Princeton (N.J.), 2005.

[149] Alexander J McNeil and Johanna Nešlehová. Multivariate Archimedean copulas, d-monotone functions and ℓ1-norm symmetric distributions. Ann. Stat., pages 3059–3097, 2009.

[150] Piotr Mikusiński, Howard Sherwood, and M Taylor. The Fréchet bounds revisited. Real Analysis Exchange, 17:759–764, 1992.

[151] Olivier Monnier. A smarter grid with the internet of things. Texas Instruments White Paper, 2013.

[152] João F. C. Mota, Nikos Deligiannis, and Miguel R. D. Rodrigues. Compressed sensing with prior information: Optimal strategies, geometry, and bounds. IEEE Trans. Inf. Theory, accepted for publication, 2017.

[153] João F. C. Mota, L. Weizman, Nikos Deligiannis, Yonina Eldar, and Miguel R. D. Rodrigues. Reference-based compressed sensing: A sample complexity approach. In IEEE Int. Conf. Acoust., Speech and Signal Process. (ICASSP), 2016.

[154] João FC Mota, Nikos Deligiannis, and Miguel RD Rodrigues. Compressed sensing with side information: Geometrical interpretation and performance bounds. In IEEE Global Conf. Signal and Inform. Process. (GlobalSIP), pages 512–516, 2014.

[155] Subhas Chandra Mukhopadhyay and NK Suryadevara. Internet of things: Challenges and opportunities. In Internet of Things, pages 1–17. Springer, 2014.

[156] Jared S Murray, David B Dunson, Lawrence Carin, and Joseph E Lucas. Bayesian Gaussian copula factor models for mixed data. Journal of the American Statistical Association, 108(502):656–665, 2013.

[157] Deanna Needell and Joel A Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal., 26(3):301–321, 2009.

[158] Roger B. Nelsen. An Introduction to Copulas (Springer Series in Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

[159] Aristidis K Nikoloulopoulos and Dimitris Karlis. Multivariate logit copula model with an application to dental data. Statistics in Medicine, 27(30):6393–6406, 2008.

[160] Aristidis K Nikoloulopoulos and Dimitris Karlis. Finite normal mixture copulas for multivariate discrete data modeling. Journal of Statistical Planning and Inference, 139(11):3878–3890, 2009.

[161] Hohsuk Noh, Anouar El Ghouch, and Taoufik Bouezmarni. Copula-based regression estimation and inference. Journal of the American Statistical Association, 108(502):676–688, 2013.

[162] Harry Nyquist. Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers, 47(2):617–644, 1928.

[163] David Oakes. A model for association in bivariate survival data. Journal of the Royal Statistical Society. Series B (Methodological), pages 414–422, 1982.

[164] Dong Hwan Oh and Andrew J Patton. Modeling dependence in high dimensions with factor copulas. Journal of Business & Economic Statistics, 35(1):139–154, 2017.

[165] Ostap Okhrin, Yarema Okhrin, and Wolfgang Schmid. Properties of hierarchical Archimedean copulas. Statistics & Risk Modeling with Applications in Finance and Insurance, 30(1):21–54, 2013.

[166] Tatsuyoshi Okimoto. New evidence of asymmetric dependence structures in international equity markets. Journal of Financial and Quantitative Analysis, 43(3):787–815, 2008.


[167] Frank Oldewurtel, Marcin Foks, and Petri Mähönen. On a practical distributed source coding scheme for wireless sensor networks. In Proc. IEEE Veh. Technol. Conf. (VTC Spring), pages 228–232. IEEE, 2008.

[168] Samet Oymak, M Amin Khajehnejad, and Babak Hassibi. Recovery threshold for optimal weight ℓ1 minimization. In IEEE Int. Symp. Inf. Theory (ISIT), pages 2032–2036. IEEE, 2012.

[169] Anastasios Panagiotelis, Claudia Czado, and Harry Joe. Pair copula constructions for multivariate discrete data. Journal of the American Statistical Association, 107(499):1063–1072, 2012.

[170] Rahul A Parsa and Stuart A Klugman. Copula regression. Variance: Advancing the Science of Risk, 5:45–54, 2011.

[171] Michael Pitt, David Chan, and Robert Kohn. Efficient Bayesian inference for Gaussian copula regression models. Biometrika, pages 537–554, 2006.

[172] S Sandeep Pradhan and Kannan Ramchandran. Distributed source coding using syndromes (DISCUS): Design and construction. IEEE Trans. Inf. Theory, 49(3):626–643, 2003.

[173] Marco Pretti. A message-passing algorithm with damping. Journal of Statistical Mechanics: Theory and Experiment, 2005(11):P11008, 2005.

[174] John G Proakis and Masoud Salehi. Communication systems engineering. Prentice Hall, 2002.

[175] Sundeep Rangan. Estimation with random linear mixing, belief propagation and compressed sensing. In Annu. Conf. Inform. Sci. Syst. (CISS), pages 1–6. IEEE, 2010.

[176] Sundeep Rangan. Generalized approximate message passing for estimation with random linear mixing. In IEEE Int. Symp. Inf. Theory (ISIT), pages 2168–2172. IEEE, 2011.

[177] Shalli Rani and Syed Hassan Ahmed. Multi-hop Routing in Wireless Sensor Networks: An Overview, Taxonomy, and Research Challenges. Springer, 2015.

[178] Francesco Renna, Liming Wang, Xin Yuan, Jianbo Yang, Galen Reeves, Robert Calderbank, Lawrence Carin, and Miguel RD Rodrigues. Classification and reconstruction of compressed GMM signals with side information. In IEEE Int. Symp. Inf. Theory (ISIT), pages 994–998. IEEE, 2015.

[179] Francesco Renna, Liming Wang, Xin Yuan, Jianbo Yang, Galen Reeves, Robert Calderbank, Lawrence Carin, and Miguel Rodrigues. Classification and reconstruction of high-dimensional signals from low-dimensional noisy features in the presence of side information. [Online]. Available: http://arxiv.org/abs/1412.0614, 2014.

[180] Dragos Ioan Sacaleanu, Rodica Stoian, Dragos Mihai Ofrim, and Nikos Deligiannis. Compression scheme for increasing the lifetime of wireless intelligent sensor networks. In Proc. 20th European Signal Process. Conf. (EUSIPCO), pages 709–713. IEEE, 2012.

[181] Irving De Lira Salvatierra and Andrew J Patton. Dynamic copula models and high frequency data. Journal of Empirical Finance, 30:120–135, 2015.

[182] Cornelia Savu and Mark Trede. Hierarchies of Archimedean copulas. Quantitative Finance, 10(3):295–304, 2010.

[183] Olivier Scaillet and Jean-David Fermanian. Nonparametric estimation of copulas for time series. Journal of Risk, 5:25–54, 2003.

[184] Jonathan Scarlett, Jamie S Evans, and Subhrakanti Dey. Compressed sensing with prior information: Information-theoretic limits and practical decoders. IEEE Trans. Signal Process., 61(2):427–439, 2013.

[185] M Scarsini. On measures of concordance. Stochastica, 8:201–218, 1984.

[186] Berthold Schweizer and Abe Sklar. Probabilistic metric spaces. Courier Corporation, 2011.

[187] Berthold Schweizer and Edward F Wolff. On nonparametric measures of dependence for random variables. The Annals of Statistics, pages 879–885, 1981.

[188] David W Scott. Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons, 2015.

[189] Claude Elwood Shannon. Communication in the presence of noise. Proceedings of the IRE, 37(1):10–21, 1949.

[190] Simon J Sheather and Michael C Jones. A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society. Series B (Methodological), pages 683–690, 1991.

[191] Hideaki Shimazaki and Shigeru Shinomoto. Kernel bandwidth optimization in spike rate estimation. Journal of computational neuroscience, 29(1-2):171–182, 2010.


[192] Drew Shindell, Johan CI Kuylenstierna, Elisabetta Vignati, Rita van Dingenen, Markus Amann, Zbigniew Klimont, Susan C Anenberg, Nicholas Muller, Greet Janssens-Maenhout, Frank Raes, et al. Simultaneously mitigating near-term climate change and improving human health and food security. Science, 335(6065):183–189, 2012.

[193] M Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8:229–231, 1959.

[194] D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources. IEEE Transactions on Information Theory, 19(4):471–480, Jul. 1973.

[195] David Slepian and Jack K Wolf. Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory, 19(4):471–480, 1973.

[196] George Smart, Nikos Deligiannis, Rosario Surace, Valeria Loscri, Giancarlo Fortino, and Y Andreopoulos. Decentralized time-synchronized channel swapping for ad hoc wireless networks. IEEE Trans. Veh. Technol., 2016.

[197] Michael S Smith, Quan Gan, and Robert J Kohn. Modelling dependence using skew t copulas: Bayesian inference and applications. Journal of Applied Econometrics, 27(3):500–522, 2012.

[198] Subhojit Som and Philip Schniter. Compressive imaging using approximate message passing and a Markov-tree prior. IEEE Trans. Signal Process., 60(7):3439–3448, 2012.

[199] Vladimir Stankovic, Angelos D Liveris, Zixiang Xiong, and Costas N Georghiades. On code design for the Slepian-Wolf problem and lossless multiterminal networks. IEEE Trans. Inf. Theory, 52(4):1495–1507, 2006.

[200] Vladimir Stanković, Lina Stanković, Shuang Wang, and Samuel Cheng. Distributed compression for condition monitoring of wind farms. IEEE Trans. Sustainable Energy, 4(1):174–181, 2013.

[201] Vladimir Stankovic, Lina Stankovic, Shuang Wang, and Samuel Cheng. Distributed compression for condition monitoring of wind farms. IEEE Trans. Sustain. Energy, 4(1):174–181, 2013.

[202] Dag J Steinskog, Dag B Tjøstheim, and Nils G Kvamstø. A cautionary note on the use of the Kolmogorov-Smirnov test for normality. Monthly Weather Review, 135(3):1151–1157, 2007.

[203] Gilbert Strang. Introduction to linear algebra. Wellesley-Cambridge Press, 2011.

[204] Sujesha Sudevalayam and Purushottam Kulkarni. Energy harvesting sensor nodes: Survey and implications. IEEE Communications Surveys & Tutorials, 13(3):443–461, 2011.

[205] Koiti Takahasi. Note on the multivariate Burr's distribution. Annals of the Institute of Statistical Mathematics, 17(1):257–260, 1965.

[206] D Taubman and MW Marcellin. JPEG 2000: Image compression fundamentals, practices and standards. Kluwer Academic Publishers, 2001.

[207] George R Terrell and David W Scott. Variable kernel density estimation. The Annals of Statistics, pages 1236–1265, 1992.

[208] Andrew Thomas, David J Spiegelhalter, and WR Gilks. BUGS: A program to perform Bayesian inference using Gibbs sampling. Bayesian Statistics, 4(9):837–842, 1992.

[209] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.

[210] Maria Trocan, Thomas Maugey, James E Fowler, and B´eatrice Pesquet-Popescu. Disparity-compensated compressed-sensing reconstruction for multiview images. In IEEE Int. Conf. Multimedia and Expo (ICME), pages 1225–1229, 2010.

[211] Joel Tropp and Anna C Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory, 53(12):4655–4666, 2007.

[212] David Tse and Pramod Viswanath. Fundamentals of wireless communication. Cambridge University Press, 2005.

[213] Sui-Yin Tung. Multiterminal source coding. PhD thesis, Cornell University, May, 1978.

[214] Cristiano Varin, Nancy Reid, and David Firth. An overview of composite likelihood methods. Statistica Sinica, pages 5–42, 2011.

[215] David Varodayan, Anne Aaron, and Bernd Girod. Rate-adaptive codes for distributed source coding. Signal Processing, 86(11):3123–3130, 2006.

[216] Namrata Vaswani and Wei Lu. Modified-CS: Modifying compressive sensing for problems with partially known support. IEEE Trans. Signal Process., 58(9), 2010.

[217] Massimo Vecchio, Raffaele Giaffreda, and Francesco Marcelloni. Adaptive lossless entropy compressors for tiny IoT devices. IEEE Trans. Wireless Commun., 13(2):1088–1100, 2014.


[218] Ovidiu Vermesan and Peter Friess. Internet of Things: From research and innovation to market deployment. River Publishers, Aalborg, 2014.

[219] Matt P Wand and M Chris Jones. Kernel smoothing. CRC Press, 1994.

[220] Jia Wang, Jun Chen, and Xiaolin Wu. On the sum rate of Gaussian multiterminal source coding: New proofs and results. IEEE Trans. Inf. Theory, 56(8):3946–3960, 2010.

[221] Brett A Warneke and Kristofer SJ Pister. MEMS for distributed wireless sensor networks. In Int. Conf. Electronics, Circuits and Syst. (ICECS), volume 1, pages 291–294. IEEE, 2002.

[222] Edmund Taylor Whittaker. XVIII. On the functions which are represented by the expansions of the interpolation-theory. Proceedings of the Royal Society of Edinburgh, 35:181–194, 1915.

[223] Christopher KI Williams. Regression with Gaussian processes. In Mathematics of Neural Networks, pages 378–382. Springer, 1997.

[224] John Winn and Christopher M Bishop. Variational message passing. Journal of Machine Learning Research, 6(Apr):661–694, 2005.

[225] Ian H Witten, Radford M Neal, and John G Cleary. Arithmetic coding for data compression. Commun. ACM, 30(6):520–540, 1987.

[226] A. D. Wyner and J. Ziv. The rate-distortion function for source coding with side information at the decoder. IEEE Transactions on Information Theory, 22(1):1–10, Jan. 1976.

[227] Aaron D Wyner and Jacob Ziv. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory, 22(1):1–10, 1976.

[228] Zixiang Xiong, Angelos D Liveris, and Samuel Cheng. Distributed source coding for sensor networks. IEEE Signal Process. Mag., 21(5):80–94, 2004.

[229] Qian Xu, Vladimir Stankovic, and Zixiang Xiong. Distributed joint source-channel coding of video using raptor codes. IEEE J. Sel. Areas Commun., 25(4):851–861, 2007.

[230] Yang Yang, Vladimir Stankovic, Zixiang Xiong, and Wei Zhao. On multiterminal source code design. IEEE Trans. Inf. Theory, 54(5):2278–2302, 2008.

[231] Yang Yang and Zixiang Xiong. Distributed source coding without Slepian-Wolf compression. In Proc. IEEE Int. Symp. Inf. Theory (ISIT), pages 884–888. IEEE, 2009.

[232] Wei Ying Yi, Kin Ming Lo, Terrence Mak, Kwong Sak Leung, Yee Leung, and Mei Ling Meng. A survey of wireless sensor network based air pollution monitoring systems. Sensors, 15(12):31392–31427, 2015.

[233] Evangelos Zimos, João F. C. Mota, Miguel R. D. Rodrigues, and Nikos Deligiannis. Bayesian compressed sensing with heterogeneous side information. In IEEE Data Compression Conference, 2016.

[234] Evangelos Zimos, João FC Mota, Miguel RD Rodrigues, and Nikos Deligiannis. Internet-of-things data aggregation using compressed sensing with side information. In Int. Conf. Telecommun. (ICT). IEEE, 2016.

[235] Evangelos Zimos, Dimitris Toumpakaris, Adrian Munteanu, and Nikos Deligiannis. Multiterminal source coding with copula regression for wireless sensor networks gathering diverse data. IEEE Sensors Journal, 2016.

[236] Justin Ziniel, Philip Schniter, and Per Sederberg. Binary linear classification and feature selection via generalized approximate message passing. In Annu. Conf. Inform. Sci. Syst. (CISS), pages 1–6. IEEE, 2014.
