
Information Theory for the Analysis of Large Spatio-Temporal Datasets

Yannick Allard Michel Mayrand Dan Radulescu

Prepared by: OODA Technologies Inc. 4580 Circle Rd. Montréal (Qc), H3W 1Y7

PWGSC Contract Number: W7707-145677 Technical Authority: Bruce McArthur, Defence Scientist

The scientific or technical validity of this Contract Report is entirely the responsibility of the contractor and the contents do not necessarily have the approval or endorsement of the Department of National Defence of Canada.

Contract Report DRDC-RDDC-2017-C051 February 2017


© Her Majesty the Queen in Right of Canada, as represented by the Minister of National Defence, 2017 © Sa Majesté la Reine (en droit du Canada), telle que représentée par le ministre de la Défense nationale, 2017


Prepared By: OODA Technologies Inc.
4580 Circle Rd.
Montréal (Qc), H3W 1Y7
514-476-4773

Prepared For: Defence Research & Development Canada, Atlantic Research Centre
9 Grove Street, PO Box 1012
Dartmouth, NS B2Y 3Z7
902-426-3100 ext. 359

Scientific Authority: Bruce McArthur, Defence Scientist
Contract Number: W7707-145677
Call Up Number: 14
Project: Vessel Traffic Analysis of Large Spatio-Temporal Datasets
Report Delivery Date: February 17, 2017

Executive Summary

One important aspect of Maritime Domain Awareness (MDA) is the aggregation of data and information to help formulate an accurate description of vessel activity, with the Automatic Identification System (AIS) being a key data source. However, the introduction of AIS, together with other sources of MDA-related data, has also posed a challenge: the overabundance of data makes manual analysis prohibitively time-consuming. In response, there has been a growing interest in techniques for automated or semi-automated analysis of vessel traffic from large-volume datasets. These techniques draw from research areas such as data mining (the computational process of discovering patterns in large datasets), spatio-temporal trajectory analysis, information theory, and visual analytics (analytical reasoning supported by highly interactive visual interfaces). All of these approaches have been used in support of MDA-related capabilities, which include anomaly detection, traffic route extraction, area of interest analysis, collision risk analysis, vessels of interest analysis, vessel tracking and data fusion. In this study, we focus our investigation on information theory, which has so far seen little overlap with the maritime domain despite having significant potential.

In this report, we present standard measures and techniques in information theory and their application to large spatio-temporal datasets, in particular maritime AIS datasets. Based on the literature survey conducted and on current challenges in MDA, information theory should, by its very nature, be able to handle problems specific to AIS data as well as problems that arise in spatio-temporal data mining. A demonstration of a possible application over a one-month dataset has been implemented.

The entropy and spatial diversity measures were applied as local measures at multiple scales and on selected attributes within Canada’s Exclusive Economic Zone. Their potential was assessed through visual inspection. The results highlight that transit zones, or shipping routes, exhibit fairly stable characteristics, resulting in lower entropy and spatial diversity values than areas where multiple activities occur.
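As a rough illustration of what a local, multi-scale measure looks like in practice, the sketch below bins reports into latitude/longitude cells of a chosen size and computes the Shannon entropy of one binned attribute per cell. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function names, the 45-degree course bins and the sample reports are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    # p * log2(1/p) summed over observed labels; zero when only one label occurs
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def local_entropy(reports, cell_deg, attribute_bin):
    """Entropy of a binned attribute inside each cell_deg x cell_deg grid cell.

    reports: iterable of (lat, lon, value) tuples (assumed layout).
    attribute_bin: function mapping a raw value to a discrete bin label.
    """
    cells = defaultdict(list)
    for lat, lon, value in reports:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(attribute_bin(value))
    return {key: shannon_entropy(values) for key, values in cells.items()}


def cog_bin(cog_deg, width=45.0):
    """Hypothetical binning of course over ground into 45-degree sectors."""
    return int((cog_deg % 360.0) // width)


# Hypothetical reports: (latitude, longitude, course over ground in degrees)
reports = [
    # lane-like traffic: all courses fall in the same sector
    (44.60, -63.50, 95.0), (44.62, -63.48, 100.0),
    (44.61, -63.45, 110.0), (44.63, -63.41, 120.0),
    # mixed activity: courses spread over four sectors
    (47.55, -52.70, 10.0), (47.56, -52.68, 100.0),
    (47.57, -52.66, 190.0), (47.58, -52.64, 280.0),
]

for scale in (0.5, 0.1):  # cell size in degrees
    print(scale, local_entropy(reports, scale, cog_bin))
```

At the coarser scale the lane-like cell has zero entropy while the mixed cell reaches the maximum for four equally used sectors, which mirrors the qualitative contrast described above. Changing `cell_deg` changes how reports are grouped, which is why the choice of scale matters.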

Temporal behaviour was not investigated; however, the results suggest that implementing a spatio-temporal scale selection process should lead to the extraction of meaningful maritime processes from large datasets. Where processing is concerned, one should consider implementing information-theoretic measures within a framework designed for large-scale cloud computing to speed up information extraction, even in an exploratory study.

In conclusion, it was shown that information-theoretic measures have the potential to be used over large maritime datasets in a data mining context, and that this potential remains largely untapped based on the available literature.


The use or disclosure of the information on this sheet is subject to the restrictions on the title page of this document. Study Report RISOMIA Call-up 14


Contents

Executive Summary
Contents
List of Figures
List of Tables

1 Introduction

2 Methodology
  2.1 Summary of the Literature Survey
  2.2 Summary of Search Terms

3 AIS Data
  3.1 AIS Data and Information Product Sources
  3.2 AIS Capabilities and MDA Usage
  3.3 Problems with AIS Data
  3.4 Other Maritime Data Sources
    3.4.1 Earth Observation Data and Information Product Sources
    3.4.2 Earth Observation Capabilities and MDA Usage
    3.4.3 Other Public Information Products
  3.5 Spatio-Temporal and AIS Data Mining

4 Information Theory Basic Concepts
  4.1 Information-Theoretic Measures
    4.1.1 Entropy
      4.1.1.1 Entropy
      4.1.1.2 Rényi Entropy
    4.1.2 Conditional Entropy
    4.1.3
    4.1.4 Kullback-Leibler Divergence
    4.1.5 Self-Information or Surprisal
  4.2 Spatio-Temporal Extension of Information Theory
    4.2.1 Co-occurrence-based Spatial Entropy
    4.2.2 Distance Ratios Spatial Diversity
    4.2.3 Markov Random Field-based Spatial Entropy
  4.3 Summary

5 Applications of Information Theory
  5.1 Maritime Domain Applications
    5.1.1 Pattern Discovery in Maritime Data
    5.1.2 Measures of Diversity Based on Distance and Co-occurrence
    5.1.3 Vessel Imaging
    5.1.4 Feature Extraction and Categorization
    5.1.5 Vessel Report Quality Assessment
  5.2 Non-Maritime Domain Applications
    5.2.1 Psychology
    5.2.2 Image Processing
    5.2.3 Information Theory for KDD
    5.2.4 Decision Trees
    5.2.5 Entropy Based Time Series Analysis in Biomedical Applications
  5.3 Application in Visualization and Interactive Analysis
  5.4 Summary

6 Proof-of-Concept Demonstration
  6.1 Potential Applications of Information Theory for Maritime Domain Awareness
    6.1.1 Global and Local Entropy Computation
    6.1.2 Co-occurrence-based Spatial Entropy for Pattern Extraction
    6.1.3 Spatial Diversity Applied to Vessel/Data Attributes Using Distance Ratios
    6.1.4 Other Avenues and Considerations
    6.1.5 Space-time Scale Selection for Spatio-Temporal Analysis
      6.1.5.1 Multigrid Methods
      6.1.5.2 Space-time Permutation Scan Statistic
      6.1.5.3 Scale-Space Analysis
      6.1.5.4 Number of Empty Grid Cells for Spatial Entropy
      6.1.5.5 Entropy-based Scale Saliency
    6.1.6 Summary of the Proposed Proof-of-Concept Demonstration
  6.2 Application of Information-Theoretic Measures over a Large Maritime Dataset
    6.2.1 Description of the Dataset
    6.2.2 Global Entropy of the Dataset
    6.2.3 Multiscale Analysis and Measures Comparison

7 Conclusion

Bibliography

Appendix A Conferences and books
  A.1 Conferences
  A.2 Books

Appendix B Administration and User Guide
  B.1 Administration guide
    B.1.1 Requirements
    B.1.2 Quick Installation Guide Using Docker
  B.2 User guide
    B.2.1 Creating a MSARI database from raw AIS compressed files
    B.2.2 Java script
    B.2.3 SQL Scripts
    B.2.4 Bash Script

List of Figures

4.1 Venn diagram representing the different entropies and the mutual information.
4.2 Two neighborhoods of the same equivalence class (4-2-1-1), as the actual values of the neighbours are not considered, only their numbers. Source: reference [76].
6.1 Multi-level application of information theory for analysis of large spatio-temporal datasets
6.2 A multigrid example
6.3 Entropy of the course over ground attributes over multiple scales
6.4 Entropy of the speed attribute over multiple scales
6.5 Entropy of the ship type attribute over multiple scales
6.6 Spatial diversity of the course over ground attributes over multiple scales
6.7 Spatial diversity of the speed attribute over multiple scales
6.8 Spatial diversity of the ship type attribute over multiple scales
6.9 Entropy and spatial diversity of the course over ground attributes at 0.5 and 0.1 degrees
6.10 Entropy and spatial diversity of the speed attribute at 0.5 and 0.1 degrees
6.11 Entropy and spatial diversity of the ship type attribute at 0.5 and 0.1 degrees
6.12 Ferry route map (from http://newfoundland.hilwin.nl/PHP/en/gettingthere.php)
6.13 Entropy and spatial diversity of the course over ground attributes at 0.5, 0.1 and 0.05 degrees
6.14 Entropy and spatial diversity of the speed attribute at 0.5, 0.1 and 0.05 degrees
6.15 Entropy of the course over ground attribute at 2.0 degree
6.16 Entropy of the speed attribute at 2.0 degree
6.17 Entropy of the ship type attribute at 2.0 degree
6.18 Entropy of the course over ground attribute at 1.0 degree
6.19 Entropy of the speed attribute at 1.0 degree
6.20 Entropy of the ship type attribute at 1.0 degree
6.21 Entropy of the course over ground attribute at 0.5 degree
6.22 Entropy of the speed attribute at 0.5 degree
6.23 Entropy of the ship type attribute at 0.5 degree

List of Tables

2.1 MDA and bibliography reference. The references in bold face use information-theoretic measures.
2.2 Search Keywords
3.1 Potential of coastal AIS and S-AIS for vessel tracking (based on [36] and [37]).
3.2 Public data sources and information product capabilities.
5.1 Information theory concepts as they relate to visualization. Excerpt from Table 1 in Chen and Jänicke article ([94]).
6.1 Normalized Entropy, or relative entropy, for different attributes of an AIS message for the complete dataset

List of Acronyms

AIS Automatic Identification System
aMCI amnestic mild cognitive impairment
AOI Area Of Interest
COG Course Over Ground
DRDC Defence Research and Development Canada
EO Earth Observation
ETA Estimated Time of Arrival
ID3 Iterative Dichotomiser 3
IMO International Maritime Organization
IR infrared
KDD Knowledge Discovery and Data Mining
LWIR Long Wavelength IR
MDA Maritime Domain Awareness
MMSPE Multivariate Multiscale Permutation Entropy
MMSI Maritime Mobile Service Identity
MSARI Maritime Situational Awareness Research Infrastructure
MSSIS Maritime Safety and Security Information System
PE Permutation Entropy
RCM RADARSAT Constellation Mission
RE Relative Entropy
ROT Rate of Turn
RS2 Radarsat 2
S-AIS Space Automatic Identification System
SAR Synthetic Aperture Radar
SOG Speed Over Ground
SRRE Square Root Relative Entropy
UAV Unmanned Aerial Vehicle
VOI Vessel of Interest

Part 1

Introduction

In recent years, Automatic Identification System (AIS) has become the primary source of information for building a comprehensive and timely Maritime Domain Awareness (MDA) [1, 2]. However, the introduction of AIS, together with other sources of MDA-related data, has also posed a challenge, in that there is an overabundance of data, making manual analysis, at minimum, time-consuming, if not completely infeasible [3, 4].

In response, there has been a growing interest in the application of techniques for automated or semi-automated analysis of vessel traffic from large volume datasets. These techniques draw from research areas such as: data mining – the computational process of discovering patterns in large data sets; spatio-temporal trajectory analysis; and visual analytics, analytical reasoning supported by highly interactive visual interfaces. In turn, vessel traffic analysis has been used in support of MDA-related capabilities that include anomaly detection, traffic route extraction, area of interest analysis, collision risk analysis, Vessel of Interest (VOI) analysis, vessel tracking and data fusion.

Applying analysis techniques to large sets of maritime traffic data in order to extract knowledge will facilitate vessel traffic analysis and management for maritime analysts, and improve decision- making in the maritime domain.

Information theory is the mathematical study of the encoding of information. The field defines a number of measures of information, such as entropy, that can be generalized or applied to types of information other than encoded communications, and it has been applied successfully in many fields. Since maritime traffic data belongs to the broad class of spatio-temporal data, assessing the applicability and performance of information-theoretic measures on such data is highly relevant. This project aims to assess the potential of information theory, specifically its spatio-temporal extension, for the analysis of large maritime datasets.
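For orientation, the entropy measure referred to above is conventionally written as follows for a discrete random variable. This is the standard textbook (Shannon) form, offered as a sketch; the symbols X, x_i, p and N are notation assumed here rather than taken from this report.

```latex
H(X) = -\sum_{i=1}^{N} p(x_i)\,\log_2 p(x_i),
\qquad 0 \le H(X) \le \log_2 N
```

With base-2 logarithms the result is expressed in bits. As an illustration, an attribute that takes four values with equal probability has H = 2 bits, while an attribute that always takes the same value has H = 0 (using the convention 0 log 0 = 0).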

The first objective of this study is to provide a survey of standard measures and techniques in information theory and their current applications in the maritime domain, as well as in other fields. The study focuses on the following main contexts of application:

1. Vessel traffic analysis methods that are applicable to large spatio-temporal datasets, with a focus on AIS data;


2. Extraction of vessel traffic characteristics (spatial and temporal) from large datasets.

In addition to the literature survey, the performance of information-theoretic metrics over a large spatio-temporal dataset of maritime surveillance has been assessed via the development of a proof-of-concept demonstration.

This study can be seen as a complement to the 2013 Defence Research and Development Canada (DRDC) study entitled Information Mining Technologies to Enable Discovery of Actionable Intelligence to Facilitate Maritime Situational Awareness [5], which focuses on evaluating the performance of existing data mining software on maritime data. As reported there, the most challenging issues in maritime traffic data mining are the volume of data and its spatio-temporal characteristics. These should be the deciding factors in the selection (or development) of an appropriate data mining tool for this kind of data [5].

This document presents the findings and results of an information theory study related to the maritime domain, and is organized as follows:

• Section 2 gives an overview of the methodology and results of the literature survey.

• Section 3 presents the characteristics and MDA usage of AIS data as well as the complementary data sources that can additionally be used to build a comprehensive and timely MDA.

• Section 4 provides an introduction to information theory, its scope, as well as a review of the major measures and algorithms related to the field.

• Section 5 examines the findings of the literature survey, classifies them based on problem domain, and elaborates on their link to information theory.

• Section 6 proposes different avenues for the application of information theory to a large maritime AIS dataset, and discusses the implementation of a proof-of-concept demonstration of entropy measures using a one-month AIS dataset.

• Section 7 summarizes the information learned during the implementation of this call-up, provides recommendations for further research, and serves as the general conclusion to this document.

Part 2

Methodology

This chapter presents a brief summary of the search methodology, as well as bibliography statistics, and is organized as follows:

• Section 2.1 provides an overview of the resulting bibliography;

• Section 2.2 presents the list of search keywords.

2.1 Summary of the Literature Survey

The resulting bibliography includes about 121 references. These can be clustered into reference type classes approximately as follows:

• 4 books;

• 42 articles in technical journals;

• 29 conference papers;

• 18 theses;

• 11 conferences (set of slides);

• 12 technical reports;

• 1 conference proceedings (SAGEO 2013);

• 1 article in a book;

• 3 miscellaneous.

The literature survey highlights that little research has been conducted in relation to the maritime domain. Table 2.1 presents the bibliography directly related to the maritime domain.


Table 2.1: MDA and bibliography reference. The references in bold face use information-theoretic measures.

MDA interests | Related publications
Situational awareness | [6], [7], [8], [9], [10]
Vessel track analysis, Trajectory anomaly detection | [11], [12], [13]
Vessel detection / recognition | [14]
AIS reliability, AIS anomaly detection | [15], [16], [17], [18], [19], [20], [21]
Collision prevention | [22], [23]
Visualization analytics | [24], [25], [26], [27], [28], [29]
Classification | [30]

2.2 Summary of Search Terms

The literature survey was mainly performed by searching open literature on the web. Google was used as the search engine. Table 2.2 provides the list of search keywords used to query the Google search engine, and their justification.


Table 2.2: Search Keywords

Keyword Category | Search Keywords | Justification
Maritime | 'Vessel traffic', 'Sea traffic', 'Maritime domain awareness' / 'MDA', 'Automatic Identification System' / 'AIS' | Keywords pertaining to the maritime domain are important to ground the context of the search to DRDC's sphere of interest.
Information Theory | 'Information theory', 'Entropy', 'Shannon', 'Information-theoretic', 'Kullback-Leibler', 'Relative entropy', 'Information gain', 'Mutual information', 'Joint entropy', 'Conditional entropy', 'Sample entropy', 'Permutation entropy' | When searching for information theory applications, it is often more efficient to look for references to specific quantity names.
Known fields of application | 'Physics', 'Gaming', 'Biology', 'Psychology', 'Economics', 'Cosmology', 'Computing', 'Signal Processing', 'Communication', 'Cryptography' | Fields of study familiar to the research team, known to make use of information-theoretic techniques.
Other technical terms | 'Image registration', 'Decision Trees', 'spatial analysis', 'spatio-temporal', 'ID3' | Technical terms related to the characteristics of the dataset (spatio-temporal) and some of the algorithms and analysis techniques related to these types of datasets.

Part 3

AIS Data

Ships equipped with an AIS transceiver broadcast messages to other vessels, shore-based stations and satellites equipped with AIS receivers. These messages can provide information about the ship’s identity, voyage and navigational status, among other things. As they are broadcast using a non-secure channel, the messages can be gathered by anyone with AIS receivers.

This chapter is organized as follows:

• Section 3.1 gives a brief description of AIS, along with the major data providers;

• Section 3.2 presents the capabilities and MDA usage of AIS-derived information;

• Section 3.3 highlights problems related to the usage of AIS data in an operational context;

• Section 3.4 provides an overview of some other data sources used to build a comprehensive MDA;

• Finally, Section 3.5 presents relevant results from a previous investigation on AIS spatio-temporal data mining.

3.1 AIS Data and Information Product Sources

Since 2002, all ships of 300 gross tonnage or more engaged on international voyages, all cargo ships of 500 gross tonnage or more (whether or not engaged on international voyages), and all passenger ships regardless of size have been required to carry an AIS transceiver. Although the main purpose of AIS is collision avoidance, a diverse array of government and private organizations worldwide pursue the expanded use of AIS for security purposes. These organizations include port authorities, police, coast guard, border security, and vessel owners. As a result, an extensive community with a shared interest in AIS data is developing.

Two kinds of AIS reception are available: terrestrial, usually called coastal AIS, and satellite or space-based, named Space Automatic Identification System (S-AIS). The first type refers to AIS messages gathered by receivers located on the coast or on board ships, with a range limited by Earth curvature. The latter refers to AIS messages collected by receivers installed on constellations of satellites, which have virtually global coverage. Coastal AIS and S-AIS can be obtained from commercial and governmental entities owning networks of receivers.
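To give a feel for how Earth curvature bounds coastal reception, the following sketch computes the purely geometric line-of-sight distance between a shore receiver and a ship antenna. It is an illustration only: it assumes a spherical Earth, ignores atmospheric refraction (which generally extends VHF range somewhat) and any terrain, and the antenna heights are hypothetical values, not figures from this report.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical Earth assumed


def horizon_distance_m(antenna_height_m):
    """Geometric distance from an antenna to its horizon (no refraction)."""
    return math.sqrt(2.0 * EARTH_RADIUS_M * antenna_height_m
                     + antenna_height_m ** 2)


def line_of_sight_range_km(receiver_height_m, ship_antenna_height_m):
    """Largest separation at which the two antennas still see each other."""
    total_m = (horizon_distance_m(receiver_height_m)
               + horizon_distance_m(ship_antenna_height_m))
    return total_m / 1000.0


# Hypothetical heights: 50 m shore mast, 15 m ship antenna
print(round(line_of_sight_range_km(50.0, 15.0), 1), "km")
```

The range grows only with the square root of antenna height, which is why coastal coverage stays local even with tall masts, whereas a satellite receiver sees a far larger footprint.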

Perhaps the most popular source of coastal AIS is the Maritime Safety and Security Information System (MSSIS), a low-cost, unclassified, near real-time AIS data collection and distribution network, first developed by Volpe for the U.S. Navy. According to [31], “more than 70 countries have joined the MSSIS network”, making it a suitable one-stop source for streaming global coastal AIS data.

As for S-AIS, the main sources are commercial:

• exactEarth¹: A Canadian company offering a range of products all based on the S-AIS data it gathers. Products include S-AIS feed and archives, a web-based viewing tool and density maps. According to [32], exactEarth provided the space-based AIS data stored in the DRDC-Atlantic Maritime Situational Awareness Research Infrastructure (MSARI) database.

• ORBCOMM/Skywave²: Since early 2015, ORBCOMM has owned the Canadian company Skywave Mobile Communications, which is the main competitor of exactEarth in the Canadian market [33].

• Spire³: The above sources are typically small networks of large satellites. Spire capitalizes on an emerging complementary trend, which is to deploy a large network of small satellites (CubeSat⁴ type).

3.2 AIS Capabilities and MDA Usage

Both types of AIS data, coastal and space-based, are very similar. Indeed, an AIS transceiver broadcasts the same information regardless of its location (open sea, close to the coast or inland), and thus regardless of the type of receiver collecting the information.

Among the types of transceivers, class A equips ships that are legally required to use AIS, and class B is for ships that are not required to carry AIS but choose to broadcast their information. Class B systems are simpler and less expensive, and offer limited capabilities.

There are 27 AIS message types, covering the wide range of all possible communications. The most widely used types of class A are 1, 3 and 5 ([34]):

Message type 1: Scheduled position report.

Message type 3: Special position report, response to interrogation.

Message type 5: Scheduled static and voyage related vessel data report.

¹ http://www.exactearth.com/products/exactais
² http://www.orbcomm.com/en/networks/satellite-ais
³ https://spire.com/products/sense/
⁴ http://www.cubesat.org/

A ship’s identity is usually described with a Maritime Mobile Service Identity (MMSI) number, a call sign, an International Maritime Organization (IMO) number (not mandatory for all types of vessels), a name, a type (e.g., fishing, tug, etc.) and dimensions. Voyage information includes an Estimated Time of Arrival (ETA) and a destination, while position information includes a position, a Speed Over Ground (SOG), a Rate of Turn (ROT), a Course Over Ground (COG), a heading and a navigation status. Class B message types (18, 19 and 25) offer information similar to that of class A types 1, 3 and 5, respectively.
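The grouping of fields described above can be sketched as two simple record types, one for position reports and one for static and voyage data. This is only an illustrative data model for analysis code: the class names, field names, units and optional defaults are assumptions made here and do not reproduce the AIS wire format or any particular decoder's output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PositionReport:
    """Position-related fields (as carried by message types such as 1 and 3)."""
    mmsi: int
    timestamp: float                     # seconds since epoch (assumed unit)
    latitude: float                      # decimal degrees
    longitude: float                     # decimal degrees
    sog_knots: Optional[float] = None    # Speed Over Ground
    cog_deg: Optional[float] = None      # Course Over Ground
    rot: Optional[float] = None          # Rate of Turn
    heading_deg: Optional[float] = None
    nav_status: Optional[str] = None


@dataclass
class StaticVoyageReport:
    """Identity and voyage fields (as carried by message type 5)."""
    mmsi: int
    call_sign: Optional[str] = None
    imo_number: Optional[int] = None     # not mandatory for all vessel types
    name: Optional[str] = None
    ship_type: Optional[str] = None      # e.g., fishing, tug
    length_m: Optional[float] = None
    beam_m: Optional[float] = None
    destination: Optional[str] = None
    eta: Optional[str] = None            # Estimated Time of Arrival


# Hypothetical values for illustration
position = PositionReport(mmsi=316000000, timestamp=0.0,
                          latitude=44.6, longitude=-63.5, sog_knots=12.3)
static = StaticVoyageReport(mmsi=316000000, destination="HALIFAX")
print(position.mmsi == static.mmsi)  # the MMSI links the two record types
```

Keeping most fields optional reflects the fact that individual messages may arrive with attributes unavailable; the MMSI is the natural join key between the two records.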

The information is broadcast every 2 to 10 seconds while moving, depending on the speed of the vessel, and every 3 minutes while at anchor [35].
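The reporting intervals quoted above translate directly into per-vessel message counts. The short calculation below is plain arithmetic on those intervals, assuming a vessel stays at one reporting rate for a full day; the function name is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def reports_per_day(interval_s):
    """Number of reports a vessel emits in a day at a fixed interval."""
    return SECONDS_PER_DAY / interval_s


for label, interval_s in (("moving, 2 s interval", 2),
                          ("moving, 10 s interval", 10),
                          ("at anchor, 3 min interval", 180)):
    print(f"{label}: {reports_per_day(interval_s):.0f} reports/day")
# moving, 2 s interval: 43200 reports/day
# moving, 10 s interval: 8640 reports/day
# at anchor, 3 min interval: 480 reports/day
```

A single moving vessel can therefore produce between roughly nine thousand and forty thousand position reports per day, about two orders of magnitude more than an anchored one.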

Part of the information contained in AIS messages is filled in automatically using onboard equipment. This information is mostly position related: latitude, longitude, timestamps, SOG, etc. The remaining information is entered manually by the crew, either at system initialization for static data (e.g., MMSI, IMO number) or at the beginning of every new voyage for voyage-related data (e.g., destination, ETA).

Although the information contained in the messages is the same for coastal and space-based AIS, the two offer different capabilities for vessel tracking. As suggested by Alessandrini et al. [36], the potential for vessel tracking can be assessed along five dimensions: spatial coverage, vessel coverage, probability of detection, refresh rate and timeliness. Based on [36] and [37], Table 3.1 describes these dimensions for both coastal and space-based AIS.

3.3 Problems with AIS Data

Five main problems arise when dealing with AIS data to build a comprehensive MDA:

1. Quality issues : From the executive summary of [38]:

AIS data quality is reflected in large part by the degree to which its application differs from its intended design. Though an AIS transponder is largely automated, opportunities exist for variability in practices, misconfiguration, and intentional misuse. These unintended behaviours generate an abundance of anomalies that the security community has an interest in monitoring and sharing, especially in cases indicative of malicious intent.

It was observed in [39] that only 4% of S-AIS messages have no error. For instance, 92% of S-AIS messages have incomplete attributes (e.g., destination not available) and 2% have attributes out of range (e.g., heading greater than 360). Also, as reported in [40], approximately 20% of false alarms in anomalous events detection are generated from AIS messages.

2. Identity ambiguity : Identity ambiguity results from the quality issues mentioned above. AIS messages contain information that is incomplete, imprecise, and sometimes inaccurate. Using these messages to uniquely define a vessel without any other source of information is challenging.


Table 3.1: Potential of coastal AIS and S-AIS for vessel tracking (based on [36] and [37]).

Dimension | Coastal AIS | S-AIS
Spatial coverage | Limited by Earth curvature | Global
Vessel coverage | Vessels legally required (equipped with class A transceiver), voluntary vessels (equipped with class B transceiver) | Vessels legally required (equipped with class A transceiver), voluntary vessels (equipped with class B transceiver)
Probability of detection | All vessels with their AIS system on, covered and in range; also depends on contextual conditions (weather, radio interference, etc.) | Depends on the density of ships in the field of view and on the decollision technology of the source. Also requires that the vessel's AIS system is on.
Refresh rate | Adequate (not applicable) | Depends on the number of satellites and their rate of operations. Depends also on the probability of detection: more than one pass may be required to refresh vessel position near dense regions.
Timeliness | No latency | Depends on signal processing time, and number and location of ground stations.

3. Probability of detection : According to [37], between 15% and 85% of ships are missed at each satellite pass. Therefore, a maritime picture built from only one source of S-AIS data would be incomplete.

4. Refresh rate and latency : S-AIS data has important latency issues, which impact the MDA. Firstly, the latest position for a given vessel as reported by S-AIS may be a few minutes to several days old [37]. Secondly, there may be gaps in the vessel track, which could be confused with an intentional AIS transmission break.

5. Large volume of data : The introduction of AIS dramatically increased the volume of maritime positional data. AIS positional information is broadcast every 2 to 12 seconds, depending on the speed of the vessel. At this frequency of reporting, and considering that there are about 165,000⁵ vessels worldwide, roughly 2.1 million AIS reports could in fact be generated worldwide in approximately 2 minutes [32], and the number of S-AIS providers is expected to increase significantly over the next five years. Moreover, the increasing popularity of AIS class B will continue to increase the volume of positional data. Although this information explosion represents a great opportunity for MDA, it comes with significant challenges in terms of data management and analytic capabilities (software and hardware).

⁵ http://www.exactearth.com/products/exactais
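The quality issues listed under the first problem above (incomplete attributes, attributes out of range) lend themselves to simple automated screening. The sketch below shows one possible check over decoded reports represented as dictionaries. The field names, the choice of which fields count as required, and the range limits are assumptions made for illustration; they are not the checks used in [39].

```python
REQUIRED_FIELDS = ("mmsi", "latitude", "longitude", "heading", "destination")


def check_report(report):
    """Return a list of issue labels for one decoded AIS report (a dict)."""
    issues = []
    # Incomplete attributes: field absent, None or empty string
    for field in REQUIRED_FIELDS:
        if report.get(field) in (None, ""):
            issues.append(f"missing:{field}")
    # Out-of-range attributes (assumed limits)
    lat = report.get("latitude")
    lon = report.get("longitude")
    heading = report.get("heading")
    if lat is not None and not -90.0 <= lat <= 90.0:
        issues.append("out_of_range:latitude")
    if lon is not None and not -180.0 <= lon <= 180.0:
        issues.append("out_of_range:longitude")
    if heading is not None and not 0.0 <= heading <= 360.0:
        issues.append("out_of_range:heading")
    return issues


def error_free_share(reports):
    """Fraction of reports for which no issue was found."""
    clean = sum(1 for r in reports if not check_report(r))
    return clean / len(reports)


# Hypothetical reports
good = {"mmsi": 316000000, "latitude": 44.6, "longitude": -63.5,
        "heading": 90.0, "destination": "HALIFAX"}
bad = {"mmsi": 316000001, "latitude": 44.6, "longitude": -63.5,
       "heading": 400.0, "destination": ""}
print(check_report(bad))
print(error_free_share([good, bad]))
```

Running such checks before any entropy computation makes it possible either to discard flagged reports or to treat "missing" as its own attribute value, a choice that would affect the resulting measures.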

3.4 Other Maritime Data Sources

S-AIS alone is not sufficient to provide the necessary information for a complete and comprehensive MDA, as it suffers from many problems mentioned in section 3.3. Spaceborne earth observation systems can provide a partial solution to these problems, and complement other information products that are necessary for building a more complete MDA. Ship position data can be

collected by active or passive sensors providing detection capabilities that vary depending on specific parameters (e.g. resolution, spatial coverage, update rate, latency), conditions (e.g. meteorological or physical oceanographic data) and target vessel properties (e.g. size and orientation) [41].

This additional position information can be effectively used with AIS data to detect anomalies such as a ship going dark or spoofing its position. Spaceborne Synthetic Aperture Radar (SAR) and high-resolution optical imaging systems are the main systems of interest for MDA support.

Other data sources can also be gathered from public or private databases to enhance the MDA and provide additional information to better analyze AIS and EO data. Earth Observation (EO) and AIS, even fused, are still isolated from the rest of the MDA data, including ship registers, reports on ship inspection, incidents and illegal fishing, protected areas, ship routes, etc.

This section presents earth observation assets, their capabilities and related problems, and the application of derived information products in MDA generation. An overview of contextual information is then provided. While the application of information-theoretic techniques to these sources falls outside the scope of the study, they are presented for completeness.

3.4.1 Earth Observation Data and Information Product Sources

High-resolution and timely geospatial information with global access and coverage is increasingly important. Constellations of optical and radar satellites will play a major role in this task. Spaceborne SAR is the only sensor that has all-weather, day-and-night, high-resolution imaging capabilities.

Maritime surveillance has become a major application of spaceborne SAR, due to its capability to detect some forms of suspicious or illegal activities such as illegal fishing and oil pollution. In Canada, RADARSAT-2 is used to provide near real-time surveillance of ships approaching the east and west coasts. The upcoming three-satellite RADARSAT Constellation Mission (RCM) will further expand Canadian maritime surveillance capability. Several SAR systems are in orbit and can provide end-users with a variety of images of different frequencies and polarizations.

There is a large number of high resolution optical satellites currently in orbit. Eighty-five high-resolution optical imagers are identified in [42]. The resolutions provided by existing and planned space-based sensors will be more than adequate for many civilian and military applications. These are suitable for performing ship detection and sometimes even ship classification and identification. They can also provide a wealth of additional information after visual inspection of the images given the spatial resolution they possess. The next section will cover the capabilities of SAR and optical satellites in more detail.

OODA Technologies Inc. Study Report RISOMIA Call-up 14

3.4.2 Earth Observation Capabilities and MDA Usage

Much information of interest for the MDA can be obtained from spaceborne imaging sensors, especially from spaceborne SAR systems. A very good review of the current SAR systems and their capabilities for ocean monitoring is provided in [43]. SAR systems are used in many operational applications such as oil spill monitoring, detection of hazards such as icebergs (see [44]), bottom topography change measurement, and wind field measurement. SAR can also provide a measurement of the surface wave field in all weather conditions.

Spaceborne SAR can also detect vessels and wakes, which are the two basic approaches to ship detection using SAR, and considerable work on this topic appears in the literature (see, e.g., [44]). From [42]:

Advantages of SAR include wide area coverage obtained in scansar mode and all-weather operation capability. However, among its current limitations are its limited use for non-metallic vessels, the highly restricted and controlled access to data imposed by some satellite operators, and the relatively small number of spacecraft equipped with SAR, compared to the number of optical imaging satellites currently in service. ... The authors note that vessels under 20 m in length were undetected because they do not need to report AIS/LRIT data and do not appear in SAR images.

High-resolution imaging satellites could be used to detect such small vessels and complement wide-area surveillance. In addition, high-resolution optical satellites with submetre to few metres resolution can provide more detailed information.

As depicted in this section, both spaceborne SAR and high resolution optical systems can provide positional information about detected vessels as well as, in some cases, information about ship type, length and heading. Moreover, spatial resolution can provide further superstructure detail, thereby enhancing the ability to classify a vessel by type. In addition, it is possible to derive information products related to oil spills, hazards to navigation, wind speed and wave field from SAR sensors. However, the usage of EO data for compiling the MDA is affected by its inability to provide information about the identity of detected vessels.

3.4.3 Other Public Information Products

AIS and EO data provide positional information about vessels, and partial identity and voyage information. As mentioned in the sections above, these data are not perfect, as they provide only a partial picture for MDA. Public information sources can be used as additional sources of information to enhance the MDA.

Table 3.2 presents the main public data sources of interest and describes the main capabilities of each.

Table 3.2: Public data sources and information product capabilities.

| Data source/information product | Information | Potential for MDA |
|---|---|---|
| Ship registers ([45], [46], [47]) | Mostly static identity information (name, MMSI, IMO, call sign, type). | Identity disambiguation, AIS-related anomaly detection. |
| Ship inspection ([48], [49]) | Vessel identity information, inspection details, deficiencies and measures applied. | Identify vessels of interest and cue EO gathering. |
| Ship incidents ([50], [51], [52]) | Incident description and localization, implied vessel identity. | Identify vessels and areas of interest and cue EO gathering. |
| Protected areas ([53], [54]) | Geometry, status and type of the protected area. | Identify vessels of interest and cue EO gathering. |
| Illegal fishing activities ([55], [56]) | Vessel identity history (name, flag, owner), reporting regional fisheries management organizations, infraction details. | Identify vessels of interest and cue EO gathering. |
| Ship routes ([57], [58]) | Geometry of routes per ship types and time periods. | Deviation from typical ship routes. |
| Bathymetry charts ([59]) | Seafloor depth. | Identify areas of interest and cue EO gathering. |
| Social Media | Unstructured text, hashtags, pictures and web links. | Can provide boat pictures, position and identity of small near-shore vessels undetected by standard means, or identity of persons on board. |
| Press Clippings ([60]) | Unstructured textual and visual information of various kinds. | Contextual information about owners, crimes. Pictures, etc. |
| Ship manifest | Cargo description, passengers and crew listing. | Identify vessels of interest and cue EO gathering, AIS-related anomaly detection (e.g., ship type). |
| Vessel arrival reports | ETA, destination. | Identify vessels of interest and cue EO gathering, AIS-related anomaly detection. |

3.5 Spatio-Temporal and AIS Data Mining

The report from St-Hilaire and Hadzagic ([5]), produced by OODA Technologies in an earlier call-up task, provides an in-depth review of spatio-temporal and AIS data mining methods and applications. As reported, specific features ([61]) of spatial data limit the use of general purpose data mining algorithms. Among them:

1. “the spatial relationships among the variables, which implies that proximity to certain entities influence one’s characteristics”;

2. “the spatial structure of errors”;

3. “the presence of mixed distributions as opposed to commonly assumed normal distributions, which is the direct effect of the scale of observation”;

4. “observations that are not independent and identically distributed (i.i.d.)”;

5. “spatial auto correlation among the features, which calls for the use of spatial data mining techniques”;

6. “nonlinear interactions in feature space”.


From [5]: “the complexity of spatial data and implicit spatial relationships limit the usefulness of conventional data mining techniques for extracting spatial patterns”. Readers are referred to [62] for current state-of-the-art algorithms in spatial/geographical data mining.

Spatio-temporal data mining also requires explicit or implicit modeling of spatio-temporal auto correlation and constraints. [5] identifies two important data types that are commonly considered in spatio-temporal data mining:

• spatio-temporal block level data, when a fixed region is partitioned in a grid of a finite number of cells. Such a grid provides the adjacency information of cells. Spatio-temporal smoothing, inference and predictions are some commonly used statistical techniques for processing this kind of data type.

• spatio-temporal point data, which is an event of interest occurring and indexed by the spatial point. A spatially and temporally continuous point process typically refers to trajectories of moving objects over time, which consist of sampled locations at specific timestamps.
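The two data types above can be related with a short sketch: point records are aggregated into block-level counts by binning them on a regular grid in space and time. The function name `to_block_counts`, the cell size in degrees, the time window in seconds and the sample positions are all hypothetical choices made for illustration; they are not taken from [5].

```python
from collections import Counter

def to_block_counts(points, cell_deg, window_s):
    """Aggregate (lat, lon, t) point records into spatio-temporal block counts.

    Each block is identified by (row index, column index, time-window index).
    """
    counts = Counter()
    for lat, lon, t in points:
        key = (int(lat // cell_deg), int(lon // cell_deg), int(t // window_s))
        counts[key] += 1
    return counts

# Hypothetical AIS-like point data: (latitude, longitude, seconds since start).
points = [
    (44.60, -63.55, 10),
    (44.62, -63.57, 50),
    (44.60, -63.55, 3700),
    (45.10, -63.55, 20),
]

# Point data kept as a time-ordered sequence (a trajectory if from one vessel).
trajectory = sorted(points, key=lambda p: p[2])

# Block-level data: counts per 0.5 degree cell and per one-hour window.
blocks = to_block_counts(points, cell_deg=0.5, window_s=3600)
for key, count in sorted(blocks.items()):
    print(key, count)
```

Binning discards the ordering of reports inside a block, which is why the trajectory is kept separately as a sorted sequence.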

A frequent problem in spatio-temporal data mining with many applications is spatio-temporal co-occurrence pattern mining, where two or more different object-type instances are often located in spatial and temporal proximity.

The most important AIS data mining technique involves the extraction and definition of motion patterns. When detecting anomalies in ship motion, an anomaly detection algorithm is subsequently applied. Definition of motion patterns can be quite complex when dealing with a network of origins/destinations with multiple connecting paths. AIS reports are timestamped, but some traditional data mining techniques lose the time stamp and represent a navigation trajectory as a set, rather than a sequence, of consecutive latitude-longitude vessel positions.

Several approaches have been used in AIS data mining, for example:

• In [11], Pallotta et al. (2013) adapted the DBSCAN algorithm to the maritime application of deriving motion patterns.

• In [63], de Souza et al. (2016) use Hidden Markov Models to improve fishing pattern detection.

• In [21], Mascaro et al. (2014) tackle anomaly detection with Bayesian Networks, learning them from real world AIS data.

A more detailed list is provided in [5]. The rest of this document introduces the concepts of information theory. This theory has been largely ignored in spatio-temporal data mining applied in the maritime domain. Basic concepts of information-theoretic measures are provided, and their applications in various domains, including the maritime domain, are presented. Potential application concepts are then introduced, and a demonstration on real AIS data is presented.


Part 4

Information Theory Basic Concepts

Information theory studies the transmission (communication), processing (encryption, encoding/decoding), utilization, and extraction of information. This field was introduced by C. Shannon in 1948 with his milestone paper “A Mathematical Theory of Communication”.

Many reference documents on information theory can be found on the web (books, articles, lecture notes, etc.). They usually fall into two categories: mathematics and communication theory [64, 65, 66, 67, 68]; and applications [69, 70]. In this section, attention will be focused on the latter.

4.1 Information-Theoretic Measures

4.1.1 Entropy

One important quantity introduced by information theory is entropy (a.k.a. Shannon entropy or information entropy). It was the first measure presented in Shannon’s famous paper in 1948. From [71]: “the inspiration for adopting the word entropy in information theory came from the close resemblance between Shannon’s formula and very similar known formulae from statistical mechanics” defined by J.W. Gibbs. Some variations of this entropy have been developed. We present here two of them.

4.1.1.1 Shannon Entropy

This is the first and the most common version of information entropy:

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i) \tag{4.1}$$

where b is the base of the logarithm used. Common values of b are 2, Euler’s number e, and 10, and the unit of entropy is the shannon for b = 2, the nat for b = e, and the hartley for b = 10. When b = 2, the units of entropy are also commonly referred to as bits.

Shannon entropy is a measure of the expected value of the information, where information is “purely a quantitative measure of communicative exchanges” [72], a quantity which can be understood mathematically and physically. It is based solely on the frequency of occurrence of observed variables. Entropy as defined by Shannon was first used to estimate the minimal storage needed for encoding a variable over a communication channel.

Shannon entropy was invented originally for signal/communication analysis, but since its underlying concepts can be generalized to other sources of information such as images, spatial data and much more, it can be used to extract some features from these other forms of information. Among others, it was applied in signal processing, cryptography, biology, cosmology, quantum physics and economics.
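As a minimal sketch of equation 4.1, the entropy of a discrete variable can be estimated from the relative frequencies of its observed values. The function name `shannon_entropy` and the ship-type labels are hypothetical and serve only as an illustration; the relative frequency of each value is used as the estimate of P(x_i).

```python
import math
from collections import Counter

def shannon_entropy(observations, base=2):
    """Empirical Shannon entropy (eq. 4.1) of a sequence of discrete observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total              # relative frequency used as P(x_i)
        entropy -= p * math.log(p, base)
    return entropy

# Hypothetical ship-type labels, for illustration only.
uniform = ["cargo", "tanker", "fishing", "passenger"]
skewed = ["cargo"] * 7 + ["tanker"]

print(shannon_entropy(uniform))   # about 2 bits: four equally likely types
print(shannon_entropy(skewed))    # lower: one type dominates
```

Changing `base` to `math.e` or 10 gives the same quantity in nats or in base-10 units.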

4.1.1.2 Rényi Entropy

Rényi entropy is a generalization of Shannon entropy. It is defined ([73]) as:

$$H_\alpha(X) = \frac{1}{1 - \alpha} \log\left(\sum_{i=1}^{n} p_i^{\alpha}\right) \tag{4.2}$$

where α is the order of the entropy and must satisfy α ≥ 0 and α ≠ 1. From Wikipedia [73]:

As α approaches zero, the Rényi entropy increasingly weighs all possible events more equally, regardless of their probabilities. In the limit for α → 0, the Rényi entropy is just the logarithm of the size of the support of X. The limit for α → 1 is the Shannon entropy. As α approaches infinity, the Rényi entropy is increasingly determined by the events of highest probability.

The Rényi entropy is important in ecology and statistics as an index of diversity ([73]). The value of α then defines the sensitivity of the diversity value to rare versus abundant categories (such as species in the case of ecology) by modifying how the weighted mean of the proportional abundances of these categories is calculated.
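The behaviour of the order α described above can be checked numerically with a short sketch of equation 4.2. The function names and the probability vector are hypothetical; the Shannon entropy is recomputed here only for comparison with the α → 1 limit.

```python
import math

def renyi_entropy(probs, alpha, base=2):
    """Renyi entropy of order alpha (eq. 4.2) for a probability vector."""
    if alpha < 0 or alpha == 1:
        raise ValueError("alpha must satisfy alpha >= 0 and alpha != 1")
    power_sum = sum(p ** alpha for p in probs if p > 0)
    return math.log(power_sum, base) / (1 - alpha)

def shannon_entropy(probs, base=2):
    """Shannon entropy (eq. 4.1) of a probability vector, for comparison."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]    # hypothetical distribution

print(renyi_entropy(probs, 0))       # logarithm of the support size
print(renyi_entropy(probs, 0.999))   # close to the Shannon entropy
print(shannon_entropy(probs))
print(renyi_entropy(probs, 50))      # dominated by the most probable event
```

For this distribution the values decrease as α grows, from the logarithm of the support size towards the surprisal of the most probable event.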

4.1.2 Conditional Entropy

Conditional entropy is defined as [74]:

$$H(X|Y) = -\sum_{x \in V_x} \sum_{y \in V_y} P(x, y) \log P(x|y) \tag{4.3}$$

where $V_x$ and $V_y$ are the set of values for the x and y dimensions of an observation. Conditional entropy can be used to identify one-way associations by selecting the most important attributes. The following measure:

$$X \Leftarrow Y = \frac{H(X|Y)}{\log |V_x|} \tag{4.4}$$

can be used to rank attributes in increasing order [74].

Figure 4.1: Venn diagram representing the different entropies and the mutual information.
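Equations 4.3 and 4.4 can be sketched as follows, estimating all probabilities from counts of observed (x, y) pairs. The function names and the two toy datasets are hypothetical; a value of 0 means that Y fully determines X, and a value of 1 means that Y carries no information about X.

```python
import math
from collections import Counter

def conditional_entropy(pairs, base=2):
    """H(X|Y) as in eq. 4.3, estimated from a list of (x, y) observations."""
    joint = Counter(pairs)
    marginal_y = Counter(y for _, y in pairs)
    n = len(pairs)
    h = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n                      # P(x, y)
        p_x_given_y = count / marginal_y[y]   # P(x | y)
        h -= p_xy * math.log(p_x_given_y, base)
    return h

def normalized_association(pairs, base=2):
    """Eq. 4.4: H(X|Y) / log|V_x|; lower values mean Y tells more about X."""
    n_values_x = len({x for x, _ in pairs})
    if n_values_x < 2:
        return 0.0                            # a constant X has no uncertainty
    return conditional_entropy(pairs, base) / math.log(n_values_x, base)

# Hypothetical (ship type, attribute) observations.
determined = [("fishing", "slow")] * 4 + [("cargo", "fast")] * 4
unrelated = [("fishing", "A"), ("fishing", "B"), ("cargo", "A"), ("cargo", "B")]

print(normalized_association(determined))   # attribute fully explains ship type
print(normalized_association(unrelated))    # attribute explains nothing
```

Ranking candidate attributes Y by this value in increasing order puts the most informative attribute first.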

4.1.3 Mutual Information

Mutual information measures the mutual dependence between two variables. More specifically, it “measures the amount of information that can be obtained about one random variable by observing another” [75]:

$$I(X; Y) = \sum_{x, y} p(x, y) \log\left(\frac{p(x, y)}{p(x)\,p(y)}\right) \tag{4.5}$$
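A minimal sketch of equation 4.5, with probabilities estimated from counts, is given below. The function name and the toy data are hypothetical. The first dataset uses y = x², a dependence that is not linear, and the second uses two independent variables.

```python
import math
from collections import Counter

def mutual_information(pairs, base=2):
    """I(X; Y) as in eq. 4.5, estimated from a list of (x, y) observations."""
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    n = len(pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log(p_xy / (p_x * p_y), base)
    return mi

# Hypothetical data: y is a nonlinear function of x.
quadratic = [(x, x * x) for x in (-2, -1, 0, 1, 2)]
# Hypothetical data: every combination occurs equally often (independence).
independent = [(x, y) for x in "ab" for y in "cd"]

print(mutual_information(quadratic))     # positive: observing x reveals y
print(mutual_information(independent))   # zero: no shared information
```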

4.1.4 Kullback-Leibler Divergence

The Kullback-Leibler divergence has many synonyms: relative entropy, information divergence and information gain. It is a measure of the difference between two probability distributions P and Q. It is the amount of information lost when Q is used to approximate P:

$$D_{KL}(p(X)\,\|\,q(X)) = \sum_{x \in X} p(x) \log\left(\frac{p(x)}{q(x)}\right) \tag{4.6}$$
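Equation 4.6 can be sketched for two discrete distributions stored as dictionaries. The function name and the two distributions (an observed distribution of headings and a uniform reference) are hypothetical. Returning infinity when q assigns zero probability to an outcome that p supports is a common convention, stated here as an assumption.

```python
import math

def kl_divergence(p, q, base=2):
    """D_KL(p || q) as in eq. 4.6 for two distributions over the same outcomes."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0:
            continue                  # terms with p(x) = 0 contribute nothing
        q_x = q.get(outcome, 0.0)
        if q_x == 0:
            return math.inf           # q gives zero mass where p does not
        total += p_x * math.log(p_x / q_x, base)
    return total

# Hypothetical distributions of vessel headings in an area.
observed = {"north": 0.7, "south": 0.2, "east": 0.1}
reference = {"north": 1 / 3, "south": 1 / 3, "east": 1 / 3}

print(kl_divergence(observed, reference))
print(kl_divergence(reference, observed))   # differs: the measure is not symmetric
print(kl_divergence(observed, observed))    # zero for identical distributions
```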

4.1.5 Self-Information or Surprisal

Self-information, also known as surprisal, is a measure of the information content associated with an event in a probability space. The smaller the probability of an event, the larger the surprisal associated with the information that the event occurs. By definition, the measure of surprisal is non-negative and additive. If an event C is the intersection of two independent events A and B, then the amount of information knowing that C has happened equals the sum of the amounts of information of event A and event B, respectively:

$$I(A \cap B) = I(A) + I(B) \tag{4.7}$$

$$I(A) = \log\left(\frac{1}{P(A)}\right) = -\log(P(A)) \tag{4.8}$$

where P(A) is the probability of occurrence of the event A.
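The additivity property of equation 4.7 can be verified numerically with a short sketch of equation 4.8. The function name and the two probabilities are hypothetical; independence of the two events is assumed, so the probability of their intersection is the product of their probabilities.

```python
import math

def surprisal(probability, base=2):
    """Self-information (eq. 4.8) of an event with the given probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(probability, base)

# Hypothetical independent events A and B.
p_a, p_b = 0.5, 0.125
p_joint = p_a * p_b                      # P(A and B) under independence

print(surprisal(p_a), surprisal(p_b))    # rarer event carries more information
print(surprisal(p_joint))                # equals the sum of the two values above
```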

4.2 Spatio-Temporal Extension of Information Theory

Shannon’s entropy describes the average information content of a distribution. Even though the elements of this distribution can be mapped to cells on a grid, their distribution is not in and of itself spatial, in the sense that the relation and adjacency between events is not considered. Leibovici, Claramunt, Le Guyader & Brosset (2014) [8], Leibovici (2009) [7] and Tupin et al. (2000) [76] explore spatial entropy-based measures taking into account the closeness of objects and different classes (or types) of objects. Three definitions of spatial entropy are retained here: one based on co-occurrences of events, one on the ratio of closeness between different classes of events, and one on Markov random fields.

4.2.1 Co-occurrence-based Spatial Entropy

The co-occurrence formula 4.9 does not consider the notion of categories of events or objects, and considers instead only their surroundings in its approach:

$$H_k(C_{OO}, d) = -\frac{1}{\log(N_{C_{OO}})} \sum_{c_{OO}}^{N_{C_{OO}}} p_{c_{OO}} \log(p_{c_{OO}}) \tag{4.9}$$

where $C_{OO}$ is the multivariate co-occurrence of events with different or same attributes. A co-occurrence takes place between k attributes (k is also called the order of co-occurrence) of N observed events within distance d of each other (the co-location distance). The number of interacting attributes is intrinsic to the expression of co-location. A degree 2 co-location looks for pairs of events occurring within distance d of each other, while a degree 3 co-location looks for triplets occurring within a distance d of each other. From a practical point of view, a pre-defined configuration of attributes can be limiting when dealing with complex multi-attribute data. From a data mining perspective, it is more useful to focus on the co-occurrence as an integral event or as a starting point in data exploration and to subsequently converge on influential attributes. Note that the equation is taken from [8]; the subscript OO, which is not defined there, seems to refer to the occurrence of a value.

Entropy is a global measure, in that it produces a single result characterizing the entire dataset. However, it is often desirable to quantify a subset of the data, from a local area of interest. [8] shows that it is possible and relevant to consider local entropy values and suggests an interesting alternative to locality by looking at a fixed number of nearest neighbours. Subdividing the spatial domain in a grid and computing the spatial entropy independently within each cell can also be applied to obtain a local representation of spatial entropy, as it limits the number of events by artificially setting a distance d around the central point of a cell.
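A rough sketch of the co-occurrence-based measure of equation 4.9, restricted to order k = 2, is given below. Several choices are assumptions made for illustration and are not confirmed by [8]: events are planar points with a single attribute, a co-occurrence is an unordered pair of attributes of two events within distance d, and the normalizing count is taken to be the number of distinct co-occurrence types actually observed. The function name and the sample events are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_entropy(events, d):
    """Normalized entropy of order-2 co-occurrences within distance d.

    events: list of (x, y, attribute) tuples in a planar coordinate system.
    """
    pair_counts = Counter()
    for (x1, y1, a1), (x2, y2, a2) in combinations(events, 2):
        if math.hypot(x1 - x2, y1 - y2) <= d:
            pair_counts[tuple(sorted((a1, a2)))] += 1   # unordered attribute pair
    n_types = len(pair_counts)        # distinct co-occurrence types observed
    if n_types < 2:
        return 0.0                    # zero or one type: no uncertainty
    total = sum(pair_counts.values())
    h = -sum((c / total) * math.log(c / total) for c in pair_counts.values())
    return h / math.log(n_types)

# Hypothetical events: C = cargo, F = fishing, in three distant groups.
mixed = [(0, 0, "C"), (1, 0, "C"), (100, 0, "F"), (101, 0, "F"),
         (200, 0, "C"), (201, 0, "F")]
single = [(0, 0, "C"), (1, 0, "F")]

print(cooccurrence_entropy(mixed, d=2))    # three equally frequent pair types
print(cooccurrence_entropy(single, d=2))   # a single pair type
```

A local variant could apply the same function to the events of one grid cell, or to the nearest neighbours of a given event.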


4.2.2 Distance Ratios Spatial Diversity

Using a slightly more generic formula, Leibovici, Claramunt et al. [77, 7, 8] explore a weighted measure of entropy called spatial diversity:

$$H_S(C) = -\sum_{c} \frac{d_c^{int}}{d_c^{ext}} \, p_c \log(p_c) \tag{4.10}$$

where $p_c$ is the probability distribution of a categorical variable C (the number of observations of the value c over the total number of observations), $d_c^{int}$ is the average distance computed between all pairs of observations which have the same value c of the categorical variable C (called the intra-distance), and $d_c^{ext}$ is the average distance computed between all pairs of observations with different values (c and c', where c ≠ c') of the categorical variable C. When considering few object attributes, the co-occurrence and spatial diversity measures offer similar outcomes. However, the latter provides more flexibility when dealing with large or undetermined numbers of attributes or classes, as the comparative factor of the formula is not bound to a pre-defined set.

As mentioned in [8], both of the proximity-centric entropy measures can be extended to the time domain. A co-occurrence can be said to happen within time t and distance d, and diversity can be checked over average time proximity between class objects.
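The distance-ratio weighting of equation 4.10 can be sketched as follows, written with a leading minus sign as in the Shannon formula so that the value is non-negative (an assumption about the sign convention). Here the external distance of a class is taken as the mean distance between its observations and all observations of other classes, classes with fewer than two observations are skipped, and coincident points are not handled; these are simplifications for illustration. The function names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def mean_distance(pairs):
    """Mean Euclidean distance over an iterable of (point, point) pairs."""
    dists = [math.dist(a, b) for a, b in pairs]
    return sum(dists) / len(dists)

def spatial_diversity(observations):
    """Distance-ratio weighted entropy; observations are ((x, y), category)."""
    n = len(observations)
    total = 0.0
    for c in {cat for _, cat in observations}:
        same = [pos for pos, cat in observations if cat == c]
        other = [pos for pos, cat in observations if cat != c]
        if len(same) < 2 or not other:
            continue                                   # distances undefined
        d_int = mean_distance(combinations(same, 2))   # intra-distance
        d_ext = mean_distance((a, b) for a in same for b in other)
        p_c = len(same) / n
        total -= (d_int / d_ext) * p_c * math.log(p_c)
    return total

# Hypothetical layouts with identical class proportions.
clustered = [((0, 0), "A"), ((1, 0), "A"), ((10, 0), "B"), ((11, 0), "B")]
interleaved = [((0, 0), "A"), ((10, 0), "A"), ((1, 0), "B"), ((11, 0), "B")]

print(spatial_diversity(clustered))     # classes spatially separated: low value
print(spatial_diversity(interleaved))   # classes mixed in space: higher value
```

Both layouts have the same Shannon entropy, since the class proportions are identical; only the distance ratio distinguishes them.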

4.2.3 Markov Random Field-based Spatial Entropy

Specifically developed for image processing to assess the heterogeneity of a pixel’s neighborhood, Tupin et al. (2000) [76] introduced a spatial entropy measure based on Markov Random Field properties. Spatial entropy for a defined neighborhood centred on a pixel is computed as follows:

$$H = -\sum_{k} \sum_{x} p(x, N^k) \log(p(x|N^k)) \tag{4.11}$$

where k is the number of equivalence classes, x is the label of the pixel or grid cell, and $N^k$ is an equivalence class configuration as defined in [76]. While this measure was developed for textural characterization of SAR images, it is extensible to any type of labelled grid. It introduces the concept of global regularity of neighbouring cells by using equivalence class configuration. Note that defining the equivalence class is necessary only if the number of classes is high (this number is 256 on a grey-scaled image). The concept of equivalence class is represented in figure 4.2.

The equivalence class is solely based on the number of neighbours sharing different values. In figure 4.2, the equivalence class of the neighborhood would be 4-2-1-1.


Figure 4.2: Two neighborhoods of the same equivalence class (4-2-1-1), as the actual values of the neighbours are not considered, only their numbers. Source: reference [76].

4.3 Summary

This chapter presented information-theoretic measures along with some of their spatial extensions. These measures are commonly used to quantify the amount of information in a dataset. They can be used to assess the relative information content of two different distributions. The fact that they quantify information is the key to successful applications in widely different domains (see section 5).

To represent the spatial properties of the data in information-theoretic measures, some extensions have been proposed in the literature. They are mostly based on the distance between data points or on their co-occurrence within a defined neighborhood. Although the reported results are encouraging, these extensions are still far less mature than their original counterparts, and more study would be required to assess their full potential for the quantification of spatial information content.

One of the main advantages of information theory over other statistical methods is that it is nonparametric. Nonparametric methods estimate the underlying distributions from the data without making strong assumptions about the structures of the distributions. The performance of parametric statistical methods can be severely affected when the assumed distribution is not correct. Due to its nonparametric aspect, information theory can deal with a variety of problems, in which many other techniques would either fail or at least require the a priori extraction of statistics or the usage of domain knowledge.

In one- and two-way association, information theory quantifies the dependency between random variables and is able to capture nonlinear dependence. In contrast, cross correlation usually assumes a form of linear relationship between two sets of data. Information theory therefore makes much weaker assumptions than other methods or frameworks.

Considering information theory in the context of a large AIS dataset shows why it has strong potential to deal with this particular kind of data. As mentioned in section 3.3, five main problems arise when dealing with AIS data, but information theory should be able to deal with most of them in the context of data mining:

• Quality issues: missing attributes or wrong values might have an impact on the estimation of the underlying data distribution. However, assuming that missing and wrong values should be evenly distributed, there is a good chance that, in the context of large datasets, the impact might not be large enough to hide the information content.

• Identity ambiguity: since information theory is applied here to data mining rather than to ship tracking, this aspect should not influence the results of applying information theory to maritime data.

• Probability of detection: probability of detection, which can be problematic, is expected to be less critical when using information theory. Even if not all ships are detected, distributions in the data should remain fairly consistent with reality. Of course, further study would be required, and mutual information between distributions with different probability of detection would be a measure of choice in this particular case.

• Refresh rate and latency: this problem is not relevant in data mining for pattern extraction, except if it needs to be performed in real time. Data mining and pattern extraction is very different from ship tracking required in building a comprehensive MDA.

• Large volume of data: since information theory is nonparametric and estimates the underlying distribution from the data, a larger volume usually means more precise estimates and therefore a better quantification of the information.

Also, as presented in section 3.5, specific features of spatial data limit the use of general purpose data mining algorithms. Here again, information theory gives the impression that it can deal with any of them:

1. spatial relationships among variables, which implies that proximity to certain entities influence one’s characteristics: spatial extension of information theory has been proposed in the literature for this particular aspect;

2. spatial structure of errors: errors and missing data are not expected to disturb information theory for pattern extraction over large datasets;

3. the presence of mixed distributions as opposed to commonly assumed normal distributions, which is the direct effect of the scale of observation: information theory is nonparametric, and, as such, no assumption is made with regards to the underlying distribution;

4. observations that are not independent and identically distributed (i.i.d.): information theory estimates the distribution directly from the data;

5. spatial autocorrelation among features, which calls for the use of spatial data mining techniques: spatial extension of information theory has been proposed;

6. nonlinear interactions in feature space: information theory is able to capture and quantify nonlinear dependencies.


As one can see, information theory seems to be a candidate of choice for data mining of large AIS datasets, as most of the problems these datasets pose can be handled by information theory. The next chapter will present examples of application of information-theoretic measures in a wide variety of domains.

Part 5

Applications of Information Theory

This chapter explores records related to information theory research and applications that either have been or have the potential to be employed for the advancement of maritime situation awareness. This section is organized as follows:

• Section 5.1 lists past efforts to link information-theoretic notions to maritime data.

• Section 5.2 presents applications in other fields of study tangential to the maritime domain. The purpose of the latter topics is to highlight commonalities with the maritime domain wherever possible, and to suggest additional or complementary solutions to existing problems.

• Section 5.3 highlights the common points between information theory and visualization, and presents how information-theoretic measures can be used in that particular domain.

• Section 5.4 serves as a summary to this chapter.

5.1 Maritime Domain Applications

5.1.1 Pattern Discovery in Maritime Data

In their 2015 paper, Liu, Dobias and Eisler [6] set out to explore a novel approach to pattern discovery in maritime data using Shannon’s entropy for the spatial distribution of S-AIS and RADARSAT-2 (RS2) data. The process involves splitting a Cartesian area of interest into sub-blocks forming a grid and computing the probability of detection for each cell. From the probability densities, a global entropy value for the region is then calculated through direct application of Shannon’s entropy formula.

Shannon’s tenets on entropy remain true in that an even distribution of detections yields a high entropy value, while an irregular (structured) distribution yields a low entropy value. However, the result is highly dependent on the grid’s size. The authors attempted to strike a balance between cell size and the resulting number of empty cells (as small cells lead to more empty cells). In general, smaller cells yield a lower entropy level as detection clusters stand out more distinctly relative to neighbouring cells. In other words, a fine grid exposes the structure in detections, while a coarse grid hides it. For performance reasons, the number of cells can be kept to a measured minimum.
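The grid-based procedure described above can be sketched as follows. This is not the implementation of [6]: the normalization by the logarithm of the number of cells (so that a perfectly even spread gives 1.0) is an assumption, as are the function name, the unit-square extent, and the synthetic detections (an even spread and a narrow east-west lane). At least two cells per side are required, and all points are assumed to lie inside the extent.

```python
import math
import random
from collections import Counter

def grid_entropy(points, extent, n_cells):
    """Normalized Shannon entropy of detections over an n_cells x n_cells grid.

    points: (x, y) positions; extent: (xmin, ymin, xmax, ymax); n_cells >= 2.
    """
    xmin, ymin, xmax, ymax = extent
    counts = Counter()
    for x, y in points:
        i = min(int((x - xmin) / (xmax - xmin) * n_cells), n_cells - 1)
        j = min(int((y - ymin) / (ymax - ymin) * n_cells), n_cells - 1)
        counts[(i, j)] += 1
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n_cells * n_cells)   # 1.0 for a perfectly even spread

random.seed(0)
extent = (0.0, 0.0, 1.0, 1.0)
# Synthetic detections: evenly spread, and concentrated along a narrow lane.
spread = [(random.random(), random.random()) for _ in range(5000)]
lane = [(random.random(), random.uniform(0.49, 0.51)) for _ in range(5000)]

for n_cells in (4, 16, 64):
    print(n_cells,
          round(grid_entropy(spread, extent, n_cells), 3),
          round(grid_entropy(lane, extent, n_cells), 3))
```

Printing the values for several grid sizes makes the dependence on cell size visible for a given set of detections.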

Liu et al., in [6], computed spatial entropy values for data collected in Areas of Interest (AOI) in the Canadian Atlantic and Pacific waters over the course of one year. Results show the existence of patterns in recorded detections, with entropy values tending to remain around 0.7. Some variability in entropy over certain months may be indicative of fishing seasons or a rise in commercial shipping.

The analysis provided by Liu et al. in [6] provides insight into the existence of patterns of vessel detection data but is insufficient to ascertain the nature and inner mechanics of its discoveries. Notably, the spatial nature of this entropy measure does not take into account the proximity of detections relative to each other. The notion of structure here is only indicative of some cells having more detections than others, irrespective of their geographic properties. Therefore the direct use of entropy here is of limited usefulness: it can confirm the existence of structure, but does not help in understanding it. Other methodologies have proven more efficient in detecting and mapping traffic patterns from vessel data, especially traffic lanes.

Pallotta, Vespe & Bryan, 2013 [11, 12] aggregate AIS vessel reports into tracks before applying clustering methods to identify common traffic lanes. The route extraction workflow, for all its simplicity, produces remarkable results and proves a valuable tool in deconstructing complex traffic patterns. By analyzing the number of vessel reports required to produce stable routes, the authors observe varying complexity levels among distinct areas of interest. For example, the Strait of Gibraltar requires 10 days of AIS data to uncover 80% of its extracted routes. For the same time period, only 8% of extracted routes may be uncovered in the Indian Ocean. While many factors contribute to the discrepancy between results, such as scale, availability of data and density, the authors remark on parallels that can be drawn between the complexity of an area of interest and Shannon’s entropy. Specifically, since entropy can be used to quantify information uncertainty, identifying areas of high traffic regularity also leads to more reliable anomaly detection. This would warrant further investigation.

5.1.2 Measures of Diversity Based on Distance and Co-occurrence

In 2014, Leibovici et al. [8] subjected co-occurrence-based spatial entropy (eq. 4.9) and distance-ratio spatial diversity (eq. 4.10) to a test case of a structurally evolving dataset. The goal was to test each method’s ability to capture changes in spatial structure. Spatial diversity is shown to be most sensitive to modification in spatial distribution, while co-occurrence with short to medium collocation distances does a slightly better job of capturing the speed of modifications of a spatial distribution across multiple observation times. Both equations 4.9 and 4.10 produce global values from local components. To that end, the authors also explore the impact of local contributions of collocation and diversity by restraining each weight factor to a vicinity of n nearest neighbours. Results show that collocation is more resilient to a sample size restriction, which is expected since it is dependent on local distance contributions. The diversity function loses some of its sensitivity to structural change as the number of nearest neighbours is reduced. Intuitively, this makes sense, as considering all neighbours amounts to computing the global entropy measure for each case, and any reduction in scope translates into some loss of information.

On the whole, co-occurrence seems more difficult to implement and requires several trials with varying collocation distances in order to identify an optimal value, where the amount and slope of structural change over time are the most important factors.

Leibovici et al. also applied co-occurrence-based spatial entropy (eq. 4.9) and distance-ratio spatial diversity (eq. 4.10) to a maritime case study covering a set of recorded activities in the Bay of Brest, France. These activities are recorded as spatio-temporal points over a one-year period and include AIS data of commercial shipping. Due to the comparative nature of the formulae, activities are compared in pairs, where one activity is always AIS data and the other is one of the other available datasets (fishing, water sports, etc.). Data is modelled as polygons of activity, and their crossing is labelled as a conflict zone. The authors were able to demonstrate that models of local entropy contributions are useful in identifying local clusters of conflict between activities. Similar procedures can certainly be beneficial to other maritime domain applications.

5.1.3 Vessel Imaging

Vessel detection techniques based on imaging make use of various forms of saliency, “a perceptual quality which makes some items in the world stand out from their neighbours” [78]. In general, the process involves amplifying features of vessels relative to their background, allowing detection algorithms to pick up vessel characteristics.

In 2015, Cruz & Bernardino [14] attempt to automate the detection of life rafts from infrared (IR) images obtained by sensors mounted on Unmanned Aerial Vehicles (UAVs). Life rafts are expected to be hotter than the surrounding water and therefore emit Long Wavelength IR (LWIR) at higher intensity. Water generally appears as a uniform dark surface in IR images, whereas vessels stand out as brighter angular objects. In order to increase saliency, the image contrast can be increased, which produces practical results. However, when no ship is present, an increase in contrast leads to an increase in noise, which highlights randomly distributed hotspots very similar to vessel hotspots. Noisy salient images without vessels can therefore lead to the detection of false positives.

An important distinction between salient images containing a vessel and salient images showing only noise is the distribution of hotspots. In noisy images, hotspots are uniformly distributed throughout the frame, whereas images with a vessel will feature hotspots only on the vessel. Images containing a vessel are therefore more structured, and Shannon’s entropy is a good discriminant between the two. By computing a probability distribution over the pixels of an image, it is possible to find the corresponding entropy value. Images with vessels should have lower entropy than images without vessels.
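A minimal sketch of this discriminant follows. It assumes one possible reading of the text, namely that the salient image is normalized so that its pixel values sum to 1 and is then treated as a probability distribution over pixel positions; this is not necessarily the exact procedure of [14]. The function name `saliency_entropy` and both toy images are hypothetical.

```python
import math

def saliency_entropy(saliency):
    """Shannon entropy (bits) of a saliency map normalized to sum to 1.

    saliency: 2-D list of non-negative values. Zero-valued pixels are skipped.
    """
    values = [v for row in saliency for v in row if v > 0]
    total = sum(values)
    return -sum((v / total) * math.log2(v / total) for v in values)

# Hypothetical 8 x 8 salient images.
vessel = [[0.0] * 8 for _ in range(8)]
for r in (3, 4):
    for c in (2, 3, 4, 5):
        vessel[r][c] = 1.0  # hotspots confined to a small "vessel" patch

# Hotspots spread over the whole frame, standing in for amplified noise.
noise = [[1.0 if (r + c) % 2 == 0 else 0.0 for c in range(8)] for r in range(8)]

print(saliency_entropy(vessel), saliency_entropy(noise))
```

In this toy case the vessel image concentrates its saliency on 8 pixels while the noise image spreads it over 32, so the vessel image yields the lower entropy, as the text expects.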

Cruz & Bernardino [14] show that, despite their algorithm’s overall good performance, there are a significant number of missed detections and a few false positives. Unmanned life rafts are particularly difficult to detect due to their small size and low heat emissions.


5.1.4 Feature Extraction and Categorization

A straightforward application of Shannon’s entropy to maritime domain data can be found in Chen, 2014 [10] as a tool for determining data reliability when using multiple sensors on the same target. Each sensor is rated for a set of operating conditions such as visibility, target reflection properties, time of day, etc. The rating reflects the probability that a sensor reports a reliable measure for a given condition. Ratings r_i are normalized over the m sensors so that the sum of probabilities is equal to 1. The set of sensors with ratings r_i therefore has a probability distribution p_i of measure reliability:

$$p_i = \frac{r_i}{\sum_{j=1}^{m} r_j} \tag{5.1}$$

The normalized entropy H_u for each operating condition is:

$$H_u = -\frac{1}{\ln m} \sum_{i=1}^{m} p_i \ln p_i \tag{5.2}$$

This reliability entropy can be used as a weight in establishing the influence of each operating condition on sensor data. A low entropy rating indicates that the influence of a given attribute is high.
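Equations 5.1 and 5.2 can be evaluated directly. The sketch below is illustrative only: the function name `normalized_entropy` and the sensor ratings are invented, and [10] may handle edge cases (such as zero ratings) differently.

```python
import math

def normalized_entropy(ratings):
    """Normalized reliability entropy H_u for one operating condition.

    ratings: positive ratings r_i of m > 1 sensors for that condition.
    """
    m = len(ratings)
    total = sum(ratings)
    probs = [r / total for r in ratings]                 # eq. 5.1
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(m)                               # eq. 5.2

# Hypothetical ratings of three sensors under two operating conditions.
fog = [0.9, 0.2, 0.1]       # sensors behave very differently in fog
daylight = [0.8, 0.8, 0.8]  # sensors behave alike in daylight

print(f"fog: {normalized_entropy(fog):.3f}, daylight: {normalized_entropy(daylight):.3f}")
```

Equal ratings give H_u = 1, meaning the condition does not separate the sensors. The fog condition gives a lower value, which, following the text, marks it as the more influential attribute.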

5.1.5 Vessel Report Quality Assessment

Iphar, Napoli & Cyril, 2015 [15] and [16] raise the issue of AIS message reliability given the lack of oversight over message inputs. The system is inherently vulnerable as there is ample room for incorrect inputs, intentional falsification and spoofing of data. While the study focuses mainly on identifying the pitfalls of AIS, the authors suggest, for future research, that an information-theoretic approach could be used for data quality assessment by comparing each incoming message and its fields against a historical reference, thereby establishing a measure of integrity. The uncertainty measure associated with a vessel and the area it is reporting from can be used to determine the reliability of the message. The use of information theory for such an application was offered only as a suggestion, and no implementation or testing was done by the authors.

5.2 Non-Maritime Domain Applications

5.2.1 Psychology

Our very own perceptions hint that our minds obey information-theoretic rules. Attneave suggested in 1954 [79] that information along visual contours is concentrated in regions of high magnitude of curvature. Experiments have shown, for example, that representations of objects where only peaks and corners were kept and joined by straight lines were still easily identifiable. Similarly, in other experiments, high-curvature segments were deleted from drawings, which then proved much harder to identify. Using Shannon’s definition of information entropy, Feldman & Singh, 2005 [80] set out to give these observations a formal mathematical basis.

They started out by subdividing a curve into segments of length ∆S and describing a tangent at each sample point. The angle α between successive tangents then defines a curvature κ such that:

$$\kappa = \frac{\alpha}{\Delta S} \tag{5.3}$$

Assuming that the angle α follows a von Mises distribution (a circular analogue of the normal distribution), the curvature is similarly distributed. Inserting this distribution p(κ) into the surprisal u(κ) = − log p(κ) gives:

$$u(\kappa) \approx -\log A_0 - b\,(\Delta S)^2 \cos(\Delta S\,\kappa) \tag{5.4}$$

where A_0 is a constant and b is a parameter describing the spread in the distribution of α. Ignoring the constant terms, it follows that the surprisal is proportional to − cos(∆Sκ) and therefore increases with the magnitude of curvature.
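The proportionality above can be tried on a sampled track. The sketch below is a simplification rather than the method of [80]: it drops the constant and scale factors of equation 5.4, uses the turning angle α = ∆Sκ at each vertex of a polyline, and sums − cos(α) along the track. The function names and the two toy tracks are hypothetical.

```python
import math

def turning_angles(track):
    """Signed turning angle at each interior vertex of a polyline of (x, y) points."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(track, track[1:], track[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        # Wrap the heading difference into [-pi, pi].
        angles.append(math.atan2(math.sin(a2 - a1), math.cos(a2 - a1)))
    return angles

def relative_surprisal(track):
    """Sum of -cos(alpha) over the vertices of the track.

    This equals the summed surprisal only up to the additive constant and the
    scale factor of eq. 5.4, so it is meaningful for comparing tracks that
    have the same number of vertices.
    """
    return sum(-math.cos(a) for a in turning_angles(track))

# Hypothetical tracks with the same number of points.
straight = [(float(i), 0.0) for i in range(6)]
zigzag = [(float(i), float(i % 2)) for i in range(6)]

print(relative_surprisal(straight), relative_surprisal(zigzag))
```

The straight track reaches the minimum possible value (every turning angle is zero), while the zigzag track, with a 90 degree turn at each vertex, scores higher.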

Recall that the sum of all surprisals weighted by their probability corresponds¹ to the measure of entropy as defined by Shannon. The cumulative curve surprisal, ∫_C u(κ) or its discrete equivalent Σ_C u(κ), is called cumulative information in [80], which states: “This integration is particularly interesting in light of the relationship between Shannon (1948) information and complexity, widely recognized in the statistical and machine-learning literatures.” Possible research relative to the maritime domain could employ the above curve entropy to categorize vessel tracks or shipping lanes and to identify anomalous vessel tracks when a vessel’s path does not match the expected entropy of its trajectory. Other potential applications involve the study of seasonal changes of shipping lanes, or the classification of track entropy by vessel type.

¹ http://journal.frontiersin.org/article/10.3389/fphy.2016.00047/full

5.2.2 Image Processing

In video compression, a key factor responsible for information reduction is that images in a typical movie evolve little from one frame to the next. It is therefore more economical to store the changes between frames rather than the frames themselves. Key frames act as foundation points from which the video is constructed, and enable video navigation by skipping to the next key frame. In order to maximize the efficiency of the compression algorithm, key frames are chosen when the image changes drastically, such as at scene changes. The number of pixel colour transitions that occur from one frame to the next can be characterized as a distance. The more pixels needing an update, the greater the distance. Maximizing key frame distance is key to compression efficiency and therefore to storage efficiency, which leads to real-life savings for video streaming services. The authors of [81] make use of Relative Entropy (RE) (also called Kullback-Leibler divergence) and Square Root Relative Entropy (SRRE) as a computationally efficient distance measure between neighbouring frames. The distance between the i-th frame (f_i) and the (i+1)-th frame (f_{i+1}) can be described as follows:

$$d_{RE}(f_i, f_{i+1}) = \sum_{j=1}^{m} p_i(j) \log \frac{p_i(j)}{p_{i+1}(j)} \tag{5.5}$$

where p_i is the probability distribution of image pixels derived from the RGB channels. Taking the square root of d_RE, we obtain SRRE, which is shown to amplify the distances between frames. It is considered a more potent discriminant of frame distance when two frames are similar. There are also applications where it is desirable to find the minimal distance between two images.
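Equation 5.5 and its square root can be sketched as follows. Several simplifications are assumed here and are not taken from [81]: frames are flat lists of single-channel intensities rather than RGB images, the histogram uses four bins, and a small constant `eps` is added to every bin so that the logarithm stays defined when a bin is empty. All names are hypothetical.

```python
import math
from collections import Counter

def histogram(frame, bins=4, levels=256, eps=1e-9):
    """Smoothed probability distribution of the pixel values of a frame.

    frame: flat list of integer intensities in [0, levels).
    eps: smoothing constant (an assumption of this sketch) that avoids empty bins.
    """
    width = levels // bins
    counts = Counter(min(v // width, bins - 1) for v in frame)
    total = len(frame) + eps * bins
    return [(counts.get(j, 0) + eps) / total for j in range(bins)]

def d_re(p, q):
    """Relative entropy between two distributions, as in eq. 5.5."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q))

def d_srre(p, q):
    """Square root relative entropy; max() guards against rounding below zero."""
    return math.sqrt(max(d_re(p, q), 0.0))

# Hypothetical frames: b is nearly identical to a, c simulates a scene change.
p_a = histogram([10] * 50 + [200] * 50)
p_b = histogram([10] * 48 + [200] * 52)
p_c = histogram([10] * 5 + [200] * 95)

d_ab, d_ac = d_re(p_a, p_b), d_re(p_a, p_c)
print(f"RE   a-b: {d_ab:.5f}  a-c: {d_ac:.5f}")
print(f"SRRE a-b: {d_srre(p_a, p_b):.5f}  a-c: {d_srre(p_a, p_c):.5f}")
```

For the two similar frames the relative entropy is well below 1, so taking the square root enlarges it, which is consistent with the amplification described above.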

The process of image registration focuses on aligning two images of the same or very similar subjects at different rotation angles such that the common features are superimposed. A common example of registration use is in collating multiple satellite images to form a large composite map. When taken at large enough intervals, they can also serve to analyze the evolution of geographical features such as river migration or flooding. In terms of information theory, this process corresponds to maximizing mutual information, as defined in equation 4.5, and is centred on the notion of grey values in one image likely corresponding to similar grey values in another. A probability distribution is derived for each grey value by counting its occurrence against all other grey value counts. For two images A and B, the mutual information I(A, B) can be described as

$$I(A, B) = H(B) - H(B \mid A) \tag{5.6}$$

where H(B) is Shannon’s entropy of image B and H(B|A) is the conditional entropy of B given A. As mutual information is a measure of the dependence between two variables, registration is achieved when mutual information is maximized. In other words, maximizing the mutual information corresponds to minimizing the amount of uncertainty about B when A is known. I(A, B) can also be written in terms of joint entropy, where

$$I(A, B) = H(A) + H(B) - H(A, B). \tag{5.7}$$

That is, mutual information is obtained by adding the individual grey value entropies of the two images and subtracting their joint entropy. When the individual entropies change little during the search, maximizing mutual information is therefore equivalent to minimizing joint entropy.

In order to achieve registration, one image is generally transformed relative to the other. While there are several different types of transformations such as perspective and scaling, it is easier to consider rigid transformations when attempting to understand registration. Rigid transformations involve only translation and rotation. As part of the search process for mutual information maximization, one image is displaced until a peak in mutual information is found. Egnal and Daniilidis, 2000 [82] detail several implementation methods. Pluim, Maintz & Viergever, 2003 [83] provide a comprehensive description of image registration along with more detail on intermediate steps.
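The search can be illustrated on a deliberately small case. The sketch below is not one of the methods detailed in [82] or [83]: it uses one-dimensional "images" (lists of grey values), a circular shift as the only transformation, and an exhaustive search. Mutual information is computed from equation 5.7. The names `reference`, `moving`, `shift` and `best_shift` are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a Counter of occurrences."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mutual_information(a, b):
    """I(A, B) = H(A) + H(B) - H(A, B) for two equal-length grey value sequences."""
    return entropy(Counter(a)) + entropy(Counter(b)) - entropy(Counter(zip(a, b)))

def shift(row, k):
    """Circular left shift of a list by k positions."""
    return row[k:] + row[:k]

def best_shift(reference, moving):
    """Exhaustively search for the shift of `moving` that maximizes mutual information."""
    return max(
        range(len(moving)),
        key=lambda k: mutual_information(reference, shift(moving, k)),
    )

reference = [0, 0, 0, 1, 1, 2, 3, 3, 2, 1, 0, 0]
# The moving image is displaced by 3 positions and has inverted grey values,
# as might happen when the two images come from different sensors.
moving = [3 - v for v in shift(reference, 3)]

k = best_shift(reference, moving)
print(k, mutual_information(reference, shift(moving, k)))
```

The inversion of grey values does not prevent alignment, since mutual information only requires that grey values in one image predict those in the other, not that they be equal. Practical registration replaces the exhaustive search with an optimizer over translation and rotation parameters.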

In Tupin, Sigelle & Maitre, 2000 [76], MRF-based spatial entropy is used as a textural discriminant for land cover segmentation and classification for SAR imagery. Interesting results are obtained.


With respect to MDA research, relative entropy may be used to measure long-term changes in any MDA situation rendered as an image. Heat maps of vessel detection density in an area of interest, for example, can be compared from day to day to reveal subtle changes. Relative entropy has potential applications in MDA research, especially in anomaly detection, but further research is required to confirm this.

5.2.3 Information Theory for KDD

Datasets commonly represent objects as row entries in a table, and columns as attributes of those objects. Attribute importance and the relations between attributes are central to Knowledge Discovery and Data Mining (KDD) and pattern recognition. Many KDD algorithms can be conceptualized as efforts to identify relationships or to find some form of structure in data attributes, as seen in Yao, 2003 [74]. Within tables, associations and relationships can exist between different attributes or between values of the same attribute. These are also referred to as horizontal or vertical relationships. In an information theory framework, the relationships between object attributes are characterized by concepts of entropy minimization. A set of values for a given attribute will exhibit high entropy if the values are spread evenly, so that none is more predictable than another. Regularity, and therefore structure, is associated with low entropy values. Structure within a column or among different columns indicates that the dataset can be classified into distinct populations. Various decision tree algorithms make use of this concept to create efficient branching.

Conversely, the maximum entropy principle is applied when information about a subject is limited or lacking. In broad terms, the principle states that it is best to adopt the most conservative view of a situation when there is insufficient data to say otherwise [84]. In terms of entropy and the preceding example, this means that the probability distribution of an attribute with partial data available should be chosen such that the entropy is maximized. Another way to put it is that, among the class models consistent with what is known, the most fitting one is the one that yields the most uncertainty (or least information). The maximum entropy principle has been applied in various fields, including epidemiology, as seen in Leblanc, 1990 [85], natural language processing, in Berger, Della Pietra, 1996 [86], computer imaging, sensing and others.
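A small numerical illustration of the principle follows. It is a sketch under invented values, not an example taken from [84]: an attribute has three possible values, only the probability of the first is assumed known (0.5), and a coarse grid search looks for the split of the remaining probability that maximizes entropy.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

known = 0.5  # hypothetical partial knowledge: P(first value) = 0.5

# Every candidate respects the known probability; the rest is split in steps of 0.01.
candidates = [(known, q / 100, 0.5 - q / 100) for q in range(0, 51)]
best = max(candidates, key=entropy)

print(best, entropy(best))
```

The search selects the even split of the unknown part, (0.5, 0.25, 0.25). Any other choice would assert a preference between the second and third values that the available data does not support.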

This topic has overarching applications to data analysis in general. It is useful for classifying data and for extracting relations between attributes. Applied to MDA and spatio-temporal data, the intensity of information-theoretic measures, when mapped on a spatial grid, can help highlight meaningful patterns and spatial structures within the data.

5.2.4 Decision Trees

Datasets are often described in tabular form, with each row representing an object or repeated observations of an object, and columns representing an object’s attributes. Columns may also characterize objects in more encompassing terms, as classes. The distinction between attributes and classes is somewhat abstract and generally attributable to scale. We often think of classes as enveloping a set of attribute values describing an object. For example, we can think of patient symptoms as attributes, and of the ensuing diagnosis as a class. From an algorithmic point of view, the distinction between class and attribute is mostly a question of perspective, where a set of known attributes is used to determine another. In real-world applications, we are often interested in predicting an attribute’s value given a set of other known attributes describing an object.

Iterative Dichotomiser 3 (ID3), described by Quinlan in 1986 [87], is a decision tree machine learning algorithm for the classification of categorical data. It is a supervised learning algorithm, meaning it requires training data in order to build a model. Data is organized in a tree structure, with object attributes as nodes and outcomes as branches. The tree leaves are the resulting outcomes of every decision path and therefore the classes to be learned. The process of creating a tree model is iterative and starts by looking at a subset of the training set called a window. A tree is built from the window that correctly classifies the subset and is subsequently used to classify the entire dataset. If the tree succeeds in correctly classifying all the data, the process is stopped. Otherwise, a selection of the incorrectly classified rows is added to the window and a new tree is built. A decision tree can completely represent any dataset. In other words, any table of row objects and column attributes can be represented in a tree format without loss of information. However, ID3 is not built as a substitute for a table but rather as a concise alternative that is good enough.

The crux of the problem is to find a simple tree that can represent the entire dataset as well as possible. This comes down to selecting the root and all subsequent nodes in a manner that separates the dataset as efficiently as possible. It turns out that a good measure for partitioning data is information gain. The information gain measures the change in entropy from a previous state to a new state partitioned on an attribute. The attribute generating the highest information gain is chosen to branch the dataset.

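$$\mathrm{gain}(G_A) = H(G_A) - \sum_{v \in \mathrm{Values}(S_A)} \frac{|G_{A_v}|}{|G_A|}\, H(G_{A_v}) \tag{5.8}$$

where H(G_A) is the total entropy at the top level of the tree, G_{A_v} is the subset of G_A for which the attribute S_A takes the value v, and the second term is the average entropy of the subtrees rooted at that attribute, weighted by the number of elements.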
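The attribute selection step can be sketched with the standard ID3 information gain, that is, the entropy of the class labels minus the size-weighted entropy of the subsets obtained by splitting on an attribute. The toy table, its column names and the function names below are invented for illustration and do not come from [87].

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted

# Hypothetical training table: predict "in_lane" from the other two attributes.
rows = [
    {"type": "cargo", "speed": "fast", "in_lane": "yes"},
    {"type": "cargo", "speed": "slow", "in_lane": "yes"},
    {"type": "fishing", "speed": "slow", "in_lane": "no"},
    {"type": "fishing", "speed": "fast", "in_lane": "no"},
]

gains = {a: information_gain(rows, a, "in_lane") for a in ("type", "speed")}
root = max(gains, key=gains.get)
print(gains, root)
```

Here "type" separates the classes perfectly and would be chosen as the root, while "speed" provides no gain. A full ID3 implementation would apply the same selection recursively to each subset.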

Li and Claramunt, 2006 [88] introduce a spatial decision tree for geo-referenced datasets. A spatial diversity coefficient is included in the entropy term of the information gain. The rules governing this coefficient state that dissimilar objects in proximity to one another increase diversity, while similar objects in proximity to one another decrease diversity. This measure of spatial diversity was introduced in equation 4.10 and replaces Shannon’s entropy in spatial ID3. The extension is applied to a land-use classification problem, where data attributes have been previously aggregated to form spatial clusters. This can be related to MDA if one considers applying a clustering algorithm, or an aggregation of some sort, to build such clusters. A spatial ID3 could then be applied to identify each cluster using a model of expert knowledge. Applied over successive temporal windows, it could reveal the spatial evolution of maritime patterns and structures.


5.2.5 Entropy Based Time Series Analysis in Biomedical Applications

Entropy measures can expose core properties of sequences of recorded events that can then be used to make better predictions. Lopez, Sole et al., 2016 [89] study various entropy measures in an attempt to identify important features for diagnosing Essential Tremor in patients. The methodology involves recording patients either writing or drawing on an electronic drawing pad, then extracting features from recorded data (coordinates, pressure, pen angle, etc.). Finally, entropy features are added to the dataset of recorded observations before applying machine learning algorithms to a training set. The guiding principle is that entropy features can uncover salient information that can be picked up by machine learning algorithms in order to improve their predictive results.

Permutation Entropy (PE) and Multivariate Multiscale Permutation Entropy (MMSPE) are shown to contribute most to the classification abilities of chosen machine learning algorithms. It should be noted, however, that PE algorithms are particularly well-suited for physiologic measurements as they are designed to catch patterns in a sequence of observations. These algorithms could be well-suited for measuring complexity in vessel tracks, which contain many points and are prone to repeated patterns. The methodology employed in Lopez, Sole et al. [89] could prove helpful to the maritime domain with some alterations in the choice of information-theoretic measures.
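For reference, a minimal sketch of permutation entropy in its usual ordinal-pattern form (due to Bandt and Pompe) is given below. It is not the implementation used in [89]: the embedding order of 3, the absence of a time delay, the handling of ties by position and the two toy series are all assumptions.

```python
import math
from collections import Counter

def permutation_entropy(series, order=3):
    """Normalized permutation entropy of a sequence, between 0 and 1.

    Each window of `order` consecutive samples is reduced to its ordinal
    pattern (the ranking of its values); the entropy of the pattern
    frequencies is divided by log(order!). Ties are broken by position.
    """
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(len(series) - order + 1)
    )
    total = sum(patterns.values())
    h = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(order))

# Hypothetical series, e.g. successive headings or speeds along a vessel track.
steady = list(range(20))  # monotone: a single ordinal pattern
erratic = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]

print(permutation_entropy(steady), permutation_entropy(erratic))
```

The monotone series contains only one ordinal pattern and scores zero, while the erratic series scores higher. Applied to vessel tracks, the input would be a derived series such as heading or speed rather than raw positions.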

Through a similar approach, Bian, Ouyang et al., 2016 [90] make use of PE and weighted PE (WPE) to diagnose amnestic mild cognitive impairment (aMCI) in patients with type 2 diabetes by looking at their resting state EEG signals. In this case, PE and WPE reveal lower complexity in the EEG signals of aMCI patients compared to the control group and are sufficient to establish a working diagnosis.

In a study on the predictability of defibrillation outcomes in cardiac arrest by Chicote, Irusta et al., 2016 [91], features of sample entropy, fuzzy entropy and conditional entropy were used to generate reliable predictors of success. As with previous entropy methods, these quantities detect patterns in time series measures. The study serves to highlight the subtle alterations required from entropy methods when tackling the specific needs of an application domain. In this case, the amplitude of the ventricular fibrillation waveform is an important predictor of success and required a modified version of conditional entropy to include a fixed set of voltage levels.

5.3 Application in Visualization and Interactive Analysis

From [92]: “While our world is highly dimensional mathematically, we can only perceive lower dimensions. This leads to the definition of visualization as mapping from higher into lower dimensional space.” From [93]: “Currently, the visual analysis process is mostly operated by the user through trial and error in an ad hoc manner where important parameters for visualization algorithms need to be frequently updated and refined before satisfactory visualization results are obtained”.

From [93]: “One major cause of difficulties in the visual analysis of large datasets is the lack of quantitative metrics to measure the visualization quality relative to the amount of information contained in the data. As the size of data grows even larger, these problems will become even worse since the user’s ability to move and process the data will be severely limited”.

Chen and Jänicke, in [94], present a theoretical study of the conceptual connection between information theory and visualization. They argue that communication, the field from which information theory originates, and visualization strongly resemble each other, for instance:

• “data abstraction usually results in data compression”;

• “creating and viewing a visualization is usually an information discovery process”;

• “the messages in a visualization are not guaranteed to be received by a viewer”;

• “the quality of a visualization is often measured by probabilistic experiments”;

These resemblances suggest a strong connection between information theory and visualization. This view is supported by Wang and Shen (2011) [93], who state that the

visualization process can be treated as an information channel, i.e., a visual communication channel that attempts to communicate the information in the source data to the destination, the viewer. In a typical visualization pipeline, the data need to be transformed by a sequence of steps such as denoising, filtering, visual mapping, and projection. Each of the transformation steps in the visualization pipeline can be thought of as an encoding process where the goal is to preserve the maximum amount of information from the input and generate the output for the next stage of the pipeline.

A very good overview of fields where information-theoretic measures have been used in visualization and computer graphics is presented in [94]. They include, but are not limited to: scene and shape complexity analysis, light source placement, view selection in mesh rendering, focus of attention in volume rendering, feature highlighting in time-varying volume visualization, and measuring aesthetics. Specifically for scientific visualization, Wang and Shen [93] present an overview of applications of information theory, which provides greater detail than the high-level overview in [94]. As presented in [93], information theory could be used in:

• View selection in volume rendering: From [93]:

Good viewpoints reveal essential information about the underlying data. Therefore, presenting them sooner to viewers can improve both the speed and efficiency of data understanding ... View selection has its practical value in large-scale data visualization when interactive rendering cannot be achieved. ... Consequently, the ratio between visibility and noteworthiness should be somewhat even for all voxels to maximize view entropy.

• Streamline seeding and selection: From [93]:

The concept of entropy can be applied to detect salient regions and generate streamlines for flow visualization. ... high entropy regions correspond to a larger degree of variation in the vector directions. These regions are usually near the critical points or other important flow features such as separation lines. Streamline seeds can be placed accordingly to enhance these important features.


• Transfer function for multimodal data: From [93]:

multimodal visualization complicates the transfer function design because multiple values at every data point need to be considered. The challenge for multimodal visualization is how to fuse multiple parameters in the high-dimensional transfer function space to enable easy and intuitive transfer functions designed in the 2D screen space. In this work, the authors considered the joint occurrence of multiple features from one or multiple variables by utilizing the concept of mutual information.

• Selection of representative isosurfaces: From [93]:

Isosurface rendering is one of the most popular techniques to visualize volumetric datasets. ... The key issue is how to select salient isovalues such that the surfaces extracted are informative and representative. The similarity between isosurfaces is usually evaluated using mutual information.

• Level-of-detail selection for multiresolution volume visualization: From [93]:

Building a multiresolution data hierarchy from a large-scale dataset allows us to visualize the data at different scales and balancing image quality and computation speed. ... The key to multiresolution volume visualization is to select appropriate LODs that highlight important features in the image for rendering. The goal is to maximize the amount of information contained in the image under a certain constraint about the computation cost.

• Time-varying and multivariate data analysis: From [93]:

Identifying important regions in the data enables effective data reduction, viewing, and understanding, which provides a scalable solution to handle large-scale data. ... First, a data block itself contains a different amount of information. For example, a data block evenly covering a wide range of values contains more information than another block with uniform values everywhere. Second, a data block conveys a different amount of information with respect to other blocks in the time sequence. For instance, a data block conveys more information if it has less common information with other blocks at different timesteps. Therefore, intuitively, a data block is important if it contains more information by itself and its information is more unique with respect to other blocks. By defining the importance as the amount of data change over time, they employed the conditional entropy to measure the importance of data blocks quantitatively. The importance value of each block varies over time, indicating its temporal behavior. Clustering all these importance curves for the volume allows classification of data blocks and importance-driven visualization of time-varying datasets.

• Information channel between objects and viewpoints: the best viewpoint is achieved when mutual information is minimized when dealing with a set of objects in a field of view.

Chen in [95] suggests using information theory in visual analytics. As he mentions in [95], “visual analytics must integrate and interpret findings at both microscopic and macroscopic levels”. Analysts perform well with macroscopic patterns, but they can have quite a challenge with microscopic attributes and relations. Information-theoretic techniques can be used to ease handling of the latter (quotations from [95]):

• Information foraging and sensemaking:

from an information-theoretic view, information scent only makes sense if it’s connected to the broader information-foraging context, including the search’s goal, the analyst’s prior knowledge, and the contextual situation. This broader context implies a deeper connection between the information-theoretic view and the various analytic tasks in situation awareness.

• Evidence and beliefs: From [95], “People review their beliefs when new information becomes available”, and “adapt their search path according to revised beliefs as the process progresses. This adaptive strategy is similar to the profit maximization assumption of information-foraging theory”.

• Saliency and novelty: From [95]:

are essential information properties to visual analytics. A salient feature or pattern is prominent in that it stands out perceptually, conceptually, or semantically. In contrast, novelty characterizes the uniqueness of information. ... Information theory can be used to define saliency as statistical outliers in a semantic or visual feature space. Novelty, on the other hand, can be defined as statistical outliers along a specific dimension of the space, such as the temporal dimension.

• Structural holes and brokerage: From [95]:

structural holes are a topological property of a social network ... The structural-hole theory can guide foragers in selecting potentially information-rich paths... An information-theoretic view brings us a macroscopic level of insight.

• Macroscopic views of information content: From [95]:

information entropy is a useful system-level metric for information uncertainty in a large-scale dynamic system. ... To compare and differentiate distributions, one can use information-theoretic metrics such as information bias, which measures the degree to which a subsample differs from the entire sample that it belongs to. We can easily identify high-profile thematic patterns in terms of term frequencies. Low-profile thematic patterns are information-theoretic outliers from the mainstream keyword distributions. Low-profile patterns are as important as high-profile patterns in analytical reasoning because they tell us something that we are not familiar with, and so something novel.

With regard to time step selection in time-varying data, Wand et al. [96] assume “a Markov sequence model for the time-varying data (i.e., any time step t is dependent on time step t − 1, but independent of older time steps)”, and state that “the heuristic of our algorithm is to maximize the joint entropy of the selected time steps.”

Chen and Jänicke [94] define theory and measurements for qualifying visual information. They explain these theories with examples that demonstrate the natural use of information theory in many existing visualization techniques. Table 5.1, a subset of Table 1 in [94], presents the fundamental concepts of information theory as they relate to the subject of visualization.

Table 5.1: Information theory concepts as they relate to visualization. Excerpt from Table 1 in Chen and Jänicke [94]. Each row gives a concept in information theory, followed by its relation to visualization.

Fundamental Concepts: A possible mathematical framework that underpins the subject of visualization.

Major Quantities and Properties: Quantitative measurements about the data and visualization space, and the relationship between input and output of a process or subsystem at different stages of a visualization pipeline.

Entropy: Measuring information content; salience in visualization.

Mutual Information: Uncertainty reduction in visualization; information-assisted visualization.

Major Theorems: Many theorems can be used to explain visualization phenomena and events.

Information Balance: Given two visualizations, A and B, the amount of information about A contained in B is the same as that about B in A; overview + detail; multi-view visualization.

Data Processing Inequality: After visual mapping, the visualization normally cannot contain more information than the original data; information cannot be recovered after being degraded by some processes or subsystems in a visualization pipeline.

Channel Types: Providing a theoretical basis for classifying visualization subsystems.

Noiseless Channel: Not common in practical visualization pipelines.

Noisy Channel: Most visualization processes and subsystems can be affected by noise.

Channel Capacity: It can be adapted to define the maximum amount of information that can be visualized or displayed, but a major extension is necessary when considering channels with memory and interaction.

Redundancy: Efficiency of a visual mapping; error detection and correction.

Large datasets are now produced on a regular basis. This increase in data volume does not mean that the amount of information pertinent in a given context scales proportionally. “This suggests an information-aware solution for data simplification or reduction” [93]. Information theory can be applied to explain a very large number of phenomena and events in visualization. From [94]:

It can provide visualization with a theoretic framework, underpinning many aspects of visualization, including but not limited to visual mapping, interaction, user studies, quality metrics, and knowledge-assisted visualization.

Despite their potential in scientific visualization, information-theoretic techniques do have some inherent limitations. As noted by Wang et al. in [93]:

• “information theory considers the dataset as a collection of distributions, which may not be suitable for extracting specific spatial structures embedded in the underlying features.”

• “when using histograms, the result can be sensitive to the level of discretization, i.e., the number of bins. This problem can be remedied by using various probability density estimation techniques.”

• “although it works well with frequency probability (in terms of frequencies of occurrence of events, or by relative proportions in populations or collectives), its application with Bayesian probability (in terms of degree of rational belief) is not clear. Bayesian probability is more difficult to apply practically since human observers’ input needs to be incorporated. Frequency probability allows us to access the information content of a dataset, but Bayesian probability allows scientists to update their belief when the new evidence is presented or the new result is generated.”

5.4 Summary

As mentioned in chapter 4, information theory can be applied in many, vastly different fields. Applications in statistical inference, natural language processing, cryptography and cryptanalysis, data compression, neurobiology, the evolution and function of molecular codes, model selection in ecology, thermal physics, quantum computing, linguistics, plagiarism detection, pattern recognition, anomaly detection, bioinformatics, gambling and investing, can be found in the literature. A few applications were found for maritime data analysis, but not as many as expected.

The main application across all domains is to measure the quantity of information using the entropy. Most applications then adapt the entropy to their particular context of application. Mutual information is also widely used to assess the dependence between random variables for a wide variety of problems, as it is able to handle nonlinear dependence, which is useful in many domains where information sources are disparate and complementary.
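The point about nonlinear dependence can be illustrated with a minimal sketch. The data below are synthetic, the function names are hypothetical, and base-2 logarithms are an assumption made for this example only. The second variable is the square of the first, so the linear correlation coefficient vanishes while the mutual information does not.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic data: y depends on x only through x**2 (a nonlinear relation).
xs = [-2, -1, 0, 1, 2] * 20
ys = [x * x for x in xs]

r = pearson(xs, ys)              # linear correlation misses the dependence
mi = mutual_information(xs, ys)  # mutual information detects it
```

Since y is fully determined by x in this toy case, the mutual information equals the entropy of y.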

Three observations were drawn from the literature survey. First, the number of articles applying information-theoretic techniques to the maritime domain was lower than expected, and marginal compared to other data analysis approaches (data mining, spatial analysis, etc.). However, this gap also points to the research potential of finding new results related to the maritime domain. Second, in most of the articles, information theory was not the main focus of the paper, but rather a tool to support another mathematical analysis technique. This is consistent with another statement made in chapter 4: although information-theoretic measures do provide some metadata, they are somewhat limited by the fact that they are aggregated scalar quantities that can only be compared to one another within the same dataset. Nevertheless, even in this supporting role, the contribution of information theory is still valuable for selecting parameter ranges or speeding up a given algorithm.

It was difficult to extract advantages and disadvantages for each of the articles in the literature survey, partly because the articles were few in number, sometimes only partially linked to information theory, and often orthogonal to each other in terms of domains. As expected, entropy was the most common measure found in the literature survey, while a few papers focused on other measures. Many of the applications using Shannon's information theory were extrapolated from his work, some successfully, others unsuccessfully, in some cases with difficulty in correctly interpreting results or with misuse of concepts unfamiliar to the authors. There was no generic study of information theory and its relation to, or comparison with, other techniques such as spatio-temporal analysis. Again, comparisons are made in very specific examples and domains, rather than across the mathematical field of study. Most papers required very few a priori assumptions. In fact, it was the other way around: information theory provided the means to select the more relevant attributes or scale.

Part 6

Proof-of-Concept Demonstration

Based on the literature survey of the different measures of information and the particular characteristics of an AIS dataset, one can see some application contexts of interest. This chapter highlights the potential of information-theoretic measures to analyze AIS datasets. For the proof-of-concept demonstration, the scope has been defined as follows:

1. The method shall be capable of being integrated into DRDC Atlantic’s Canadian Naval Experimental C2 System;

2. The implemented method will be applied to a DRDC-supplied AIS dataset, covering a period of one month, and having global extent.

As the provided dataset covers only one month of data, the temporal aspect is of limited importance at a global scale, as large variations in patterns are not expected. In a more local investigation, daily patterns, such as those of a fishing fleet or a ferry, should become visible. The spatial and temporal aspects should be seen as scale selections for the analysis of changing patterns highlighted by a specific measure over specific attributes. It might not be possible to demonstrate the implications of both scales within the limits of this call-up.

One underlying hypothesis of the proof-of-concept is that the exact nature of what we want to find is unknown. Information-theoretic measures will therefore be applied in the context of an exploratory study, and, if needed, in an unsupervised segmentation / classification context.

This chapter is organized as follows:

• Section 6.1 presents candidates for the demonstration of information theory over maritime datasets.

• Section 6.2 presents the results of the implementation of the information-theoretic measure as a proof-of-concept demonstration.

6.1 Potential Applications of Information Theory for Maritime Domain Awareness

This section presents the potential applications and information-theoretic measures that are of interest for maritime domain awareness. Following the literature survey, one can see that information theory can be used at several levels of a data mining process where each step feeds the next one. For instance, as represented in figure 6.1, it could be used to select relevant attributes, extract spatio-temporal patterns over these attributes using segmentation and classification procedures, and so on. Applied at multiple time intervals, one could obtain multiple maps of segmented regions and proceed with an analysis of the temporal variations of segmented patterns.

Figure 6.1: Multi-level application of information theory for analysis of large spatio-temporal datasets

When analyzing spatio-temporal data with information theory, one must always have in mind the aspect of scale, or how to maximize the information content following the application of the algorithms / methods. As such, one should always include a step for determining the appropriate scale for the available data. On that topic, several approaches are presented in section 6.1.5.

For instance, the approach proposed in [6] is relevant and is based on a certain percentage of empty cells on a grid, while a pure multiscale approach, even if very simple, such as using multiple grids (shown in section 6.1.5.1) of different cell sizes, would be more pertinent for an exploratory study. As we do not know the patterns to be observed, the computation of information-theoretic measures at several scales makes more sense.

It seems more intuitive to use a multiscale approach, as different information-theoretic measures might react differently from one another with regard to the scale. For instance, co-occurrence based spatial entropy, as defined in [8], may require a coarser scale than a measure of spatial diversity using ratio distances between pairs of observations as defined in [77].

Most information-theoretic measures, as well as the extension of entropy to the spatial domain, are applied to discrete attributes. However, some attributes present in AIS data are continuous by nature, such as speed and heading. Continuous attributes must therefore be discretized into a given number of bins prior to the application of information-theoretic measures.
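As an illustration of this discretization step, the following minimal sketch bins speed and heading values. The function names, the speed cut points (in knots) and the number of compass sectors are hypothetical choices made for the example; they are not values prescribed by this study.

```python
def bin_speed(speed_knots, edges=(0.5, 5.0, 15.0, 25.0)):
    """Return the index of the speed bin.

    The edges are illustrative cut points in knots (an assumption);
    a value below edges[0] falls in bin 0, above edges[-1] in the last bin.
    """
    for i, edge in enumerate(edges):
        if speed_knots < edge:
            return i
    return len(edges)


def bin_heading(heading_deg, n_sectors=8):
    """Map a heading in degrees to one of n_sectors equal compass sectors.

    Sector 0 is centred on north, so that 350 and 10 degrees share a bin
    instead of being split by the 0/360 wrap-around.
    """
    width = 360.0 / n_sectors
    return int(((heading_deg % 360.0) + width / 2.0) // width) % n_sectors


# Example: discretize a few (speed, heading) observations.
observations = [(0.0, 350.0), (12.0, 10.0), (30.0, 90.0), (4.9, 180.0)]
binned = [(bin_speed(s), bin_heading(h)) for s, h in observations]
```

Centring the sectors on the cardinal directions is one design choice among several. As noted in [93], results based on histograms can be sensitive to the number of bins, so the binning itself should be treated as a parameter of the analysis.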

One other aspect is that information-theoretic measures are either global or local. When a measure is computed locally, a heat map is produced, i.e., a grid of some sort showing the intensity of the underlying information-theoretic measure. As a first step, a visual analysis might reveal interesting patterns, and should be considered for this proof-of-concept. However, further analysis with an unsupervised segmentation of the grid, i.e., segmenting the grid into regions of similar characteristics, should be performed to obtain more meaningful and actionable results. The selection of a clustering algorithm and likelihood function falls outside the scope of this study.

6.1.1 Global and Local Entropy Computation

A simple application of information-theoretic measures over a large database, the measure of structuredness, is presented by Yao in [74], and is directly applicable to an AIS database. As suggested by Yao, the entropy of an attribute, as defined by equation 4.1, could be computed for each and every available attribute in the AIS message.

Moreover, a lower entropy suggests that the probability distribution is uneven, and consequently one may have better prediction using that attribute. The attribute entropy serves as a measure of diversity or unstructuredness. An attribute with a larger domain divides the database into more, smaller classes and should have a higher entropy value.
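A minimal sketch of this attribute ranking is given below. It assumes that records are available as dictionaries of discrete values; the field names and sample records are hypothetical, and the base-2 logarithm is an assumption, since the base used in equation 4.1 is not restated here.

```python
import math
from collections import Counter


def attribute_entropy(records, attribute):
    """Shannon entropy, in bits, of one attribute over a list of records."""
    counts = Counter(record[attribute] for record in records)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical, already-discretized AIS records.
records = [
    {"ship_type": "cargo", "nav_status": "under way", "mmsi": 1},
    {"ship_type": "cargo", "nav_status": "under way", "mmsi": 2},
    {"ship_type": "cargo", "nav_status": "under way", "mmsi": 3},
    {"ship_type": "fishing", "nav_status": "under way", "mmsi": 4},
]

attributes = ["ship_type", "nav_status", "mmsi"]
entropies = {a: attribute_entropy(records, a) for a in attributes}

# Attributes ordered from most structured (lowest entropy) to least.
ranking = sorted(attributes, key=entropies.get)
```

In this toy case, the identifier-like attribute has the largest domain and therefore the highest entropy, while the constant attribute has zero entropy.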

The implementation of such a measure could be used for attribute selection, which could later be used for:

• Indexing of databases to speed query and direct analysis toward meaningful attributes.

• Partition of the AIS dataset using the most useful attributes.

• Spatial diversity computed over the selected attributes (co-occurrence-based entropy, distance ratio, etc.).

This quantification of attribute importance could be seen as the primary step of a data mining pipeline using information theory. In addition, this algorithm is mostly domain agnostic. It presents the advantage of being applicable to any database available in a Naval C2 system.

The analysis of attribute entropy should also be made locally to assess the existence of spatial patterns. It should also be made in a multiscale context as different attributes should show different patterns at different scales. The division of an area of interest in multiple grids with different cell sizes should be considered as it is a direct and easy way to implement an exploratory multiscale analysis.

At a global scale, identity attributes should not be considered. As the scale leans toward a more local level, specific attributes should be investigated to highlight particular patterns. For instance, identity attributes and destination information could be introduced into the analysis, as they might provide interesting insight for MDA, but in an initial experiment, speed, course over ground, and ship type should be considered first, as they might be easier to analyze.

6.1.2 Co-occurrence-based Spatial Entropy for Pattern Extraction

Co-occurrence of attribute values within grid cells, as presented by Leibovici et al. in [8], should reveal interesting patterns in vessel traffic data. As mentioned in that paper, the proximity of occurrences for a particular class, or bin of values, is crucial when assessing the spatial distribution of a categorical variable. However, in our context of application, investigating the scale at which to compute the measure is of tremendous importance. In order to effectively compute and analyze this measure on an AIS dataset, one would have to perform the following tasks:

• Determine the grid-scale for the spatial analysis of interest, or choose to perform in a multi- scale process using multiple grids;

• Compute the co-occurrence-based spatial entropy as defined in [8].
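The second task can be sketched as follows. This is a simplified reading of a co-occurrence-based entropy, not necessarily the exact definition given in [8]: it counts unordered pairs of categories among observations closer than a collocation distance, and takes the Shannon entropy of the resulting distribution. The function name, the distance threshold, the absence of normalization and the sample points are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_entropy(points, max_dist):
    """Entropy, in bits, of category pairs co-occurring within max_dist.

    points is a list of (x, y, category). Pairs of observations farther
    apart than max_dist are ignored (simplifying assumption).
    """
    pair_counts = Counter()
    for (x1, y1, c1), (x2, y2, c2) in combinations(points, 2):
        if math.hypot(x1 - x2, y1 - y2) <= max_dist:
            pair_counts[tuple(sorted((c1, c2)))] += 1
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0  # no co-occurrence at this scale
    return sum((c / total) * math.log2(total / c) for c in pair_counts.values())


# Two hypothetical layouts: separate "lanes" versus mixed traffic.
segregated = [(0, 0, "cargo"), (1, 0, "cargo"), (2, 0, "cargo"),
              (10, 0, "pleasure"), (11, 0, "pleasure"), (12, 0, "pleasure")]
mixed = [(0, 0, "cargo"), (1, 0, "cargo"), (2, 0, "pleasure"),
         (3, 0, "pleasure"), (4, 0, "cargo")]

h_segregated = cooccurrence_entropy(segregated, max_dist=1.5)
h_mixed = cooccurrence_entropy(mixed, max_dist=1.5)
```

In the segregated layout only same-type pairs co-occur, whereas the mixed layout also produces cargo/pleasure pairs, which raises the entropy of the co-occurrence distribution. The value depends strongly on the chosen distance, which is the scale issue raised above.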

This method should be applied after a subset of attributes has been selected. Ideally, a quantitative method should select the subset automatically to avoid any subjective input. This is where the measures proposed by [74] can be used to select more structured attributes. At larger scales, it is believed that ship type, speed, heading, navigational status, and perhaps destination could provide meaningful and interesting results, while, at a local scale, identity information can be introduced.

These local entropy maps could then be segmented into regions to help highlight a particular phenomenon using both the entropy measures as well as the underlying attributes. The temporal evolution of identified patterns could be obtained by applying the measure over successive time periods.

This information-theoretic measure could be useful in answering questions like: Are there distinct lanes for commercial shipping and other crafts? Do pleasure crafts mix with other types of vessels? One problem that might arise is that, at a selected scale, one grid cell can contain a mix of patterns. It is unclear if and how information-theoretic measures can deal with this situation.

6.1.3 Spatial Diversity Applied to Vessel/Data Attributes Using Distance Ratios

The implementation of the measure of spatial diversity using ratio distances between pairs of observations, as defined by Claramunt et al. in [77], is also of interest for highlighting vessel traffic patterns over space and time, as it is reported to be the most responsive to spatial structuring and its evolution. The scale of analysis is also important for deriving meaningful trends and patterns.

In order to effectively compute and analyze this measure on an AIS dataset, one would have to perform the following tasks:

• Determine the grid-scale for spatial analysis, possibly using the method presented by Liu et al. in [6], or use a multiscale approach, and apply it to each detection using a k-nearest neighbors strategy;

• Compute the spatial diversity of discrete attributes.
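The second task can be sketched as below. The sketch assumes the commonly cited form of the distance-ratio measure, in which each class term of the Shannon entropy is weighted by the ratio of the mean distance between members of the class (intra-distance) to the mean distance between members of the class and all other observations (extra-distance). The exact definition in [77] should be checked before any implementation; the handling of degenerate classes and the sample data are assumptions.

```python
import math
from collections import defaultdict


def mean_distance(group_a, group_b, same_group=False):
    """Mean Euclidean distance between two groups of (x, y) points.

    With same_group=True, only distinct unordered pairs are used.
    Returns None when no pair exists.
    """
    total, count = 0.0, 0
    for i, (x1, y1) in enumerate(group_a):
        for j, (x2, y2) in enumerate(group_b):
            if same_group and j <= i:
                continue
            total += math.hypot(x1 - x2, y1 - y2)
            count += 1
    return total / count if count else None


def spatial_diversity(points):
    """Distance-ratio weighted entropy of a list of (x, y, cls) points."""
    by_class = defaultdict(list)
    for x, y, cls in points:
        by_class[cls].append((x, y))
    n = len(points)
    h = 0.0
    for cls, members in by_class.items():
        others = [(x, y) for x, y, c in points if c != cls]
        d_int = mean_distance(members, members, same_group=True)
        d_ext = mean_distance(members, others)
        if d_int is None or not d_ext:
            continue  # assumption: skip classes without a usable ratio
        p = len(members) / n
        h += (d_int / d_ext) * p * math.log2(1.0 / p)
    return h


# Hypothetical layouts with identical class proportions.
clustered = [(0, 0, "A"), (1, 0, "A"), (10, 0, "B"), (11, 0, "B")]
interleaved = [(0, 0, "A"), (10, 0, "A"), (1, 0, "B"), (11, 0, "B")]

d_clustered = spatial_diversity(clustered)
d_interleaved = spatial_diversity(interleaved)
```

Both layouts have the same classical entropy, but the measure is lower when each class is spatially grouped and higher when the classes are interleaved, which is the responsiveness to spatial structuring mentioned above.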

It is recommended that this measure be applied in a local and multiscale context to investigate the scale effect. Some small areas of interest could be selected to see the effect of the measure on identity attributes in a special context (ferry passing regularly at some specific point) even if this behaviour should be more observable using coastal AIS data.

A visual analysis should then be performed as a first step. If this measure is computed following a supervised/unsupervised segmentation/classification process, the following tasks could be performed:

• Automatically compute the spatial diversity of regions/patterns.

• Analyze local/global variability.

However, this value-added procedure is out of the scope of the project, as it would require more resources than available.

The results and additional processing are the same as in section 6.1.2, unless applied at the regions/patterns level. In the referenced paper, it is applied at the class level and not on the attribute level. This algorithm seems well-suited for application on a segmented grid but can still be applied directly at the attribute level if such an attribute is discretized.

6.1.4 Other Avenues and Considerations

This section presents other avenues to demonstrate the potential of information theory over large AIS datasets. However, they should not be considered for implementation within this project, as their complexity exceeds what the limited available resources allow.

One could use entropy as a measure of complexity in the time series as a demonstration. This quantity of information could be applied to ship paths (or changes in direction and speed) in order to categorize ships. For instance, fishing ship path characteristics are completely different from those of commercial shipping vessels. Applications would look for differences with the measured profile and could range anywhere from analyzing shipping lane changes (in entropy) due to seasonal activity, to detecting the anomalous behaviour of individual ship tracks in a given area.
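As a minimal sketch of this idea, the entropy of the binned changes in heading along a track can serve as a crude complexity score. The function name, the bin count and the two synthetic tracks below are hypothetical; a real implementation would first have to assemble AIS reports into tracks.

```python
import math
from collections import Counter


def heading_change_entropy(headings, n_bins=8):
    """Entropy, in bits, of binned successive heading changes of one track.

    headings is a time-ordered list of headings in degrees. Each change
    is wrapped to [-180, 180) before binning (n_bins is an assumption).
    """
    width = 360.0 / n_bins
    bins = Counter()
    for previous, current in zip(headings, headings[1:]):
        delta = (current - previous + 180.0) % 360.0 - 180.0
        bins[int((delta + 180.0) // width)] += 1
    n = sum(bins.values())
    if n == 0:
        return 0.0  # a track with fewer than two reports
    return sum((c / n) * math.log2(n / c) for c in bins.values())


# Synthetic tracks: a steady transit and an erratic, fishing-like path.
transit = [90.0] * 10
erratic = [0.0, 90.0, 45.0, 200.0, 190.0, 10.0, 300.0, 120.0]

h_transit = heading_change_entropy(transit)
h_erratic = heading_change_entropy(erratic)
```

The steady transit yields zero entropy, while the erratic path spreads its heading changes over several bins and scores higher. The same approach could be applied to binned speed changes.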

On the implementation side, this requires treating the data as tracks, perhaps leading to more complexity. It is believed that implementing a simpler demonstration that still highlights the potential of information theory could serve as a starting point to investigate more complex analysis, like this one, afterwards.

Also, one must keep in mind the aspect of scale during analysis. Using a single scale of analysis for the proof-of-concept would clearly be at odds with known patterns of the maritime domain. Vessel traffic is expected to present distinct patterns at different scales depending on the region of interest. Patterns should be affected by scale whether one examines near shore traffic or high sea transit. An exploratory and simple multiscale analysis must be considered for the proof-of-concept. More details are provided in section 6.1.5 on multiscale analysis possibilities.

6.1.5 Space-time Scale Selection for Spatio-Temporal Analysis

The need for multiscale representation arises when designing methods for automatically analyzing and deriving information from signals that are results of real-world measurements [97]. The type of information that can be obtained is to a large extent determined by the relationship between the size of actual structures and patterns in the data, and the size, or resolution, at which we look at them. Some of the very fundamental problems in spatio-temporal data processing concern what type of operators to use, where to apply them, and how large they should be in the spatio-temporal domain. If these problems are not appropriately addressed, then the task of interpreting operator responses can be very hard.

MDA data can easily be related to images and videos in the context of data processing, especially in the case of scale saliency and selection. It is easy to see the RMP at one time as an image with a collection of point vectors representing the vessels, and the temporal evolution of these points as a video (i.e., vessels are moving).

The correct scale at which to apply information-theoretic measures is an essential parameter to be determined. This section presents several strategies that could be applied in our context and from which the proof-of-concept to be developed could benefit. However, due to project constraints, the simplest strategy should be considered for an exploratory analysis.

6.1.5.1 Multigrid Methods

From Wikipedia [98]:

Multigrid (MG) methods in numerical analysis are algorithms for solving differential equations using a hierarchy of discretizations. They are an example of a class of techniques called multiresolution methods, very useful in problems exhibiting multiple scales of behavior.

Figure 6.2: A multigrid Example

This representation, as presented in figure 6.2, could be used with maritime datasets by, for example, computing the local entropy of attributes for each grid cell and each grid scale. The visualization of the result or its numerical analysis could be used to highlight patterns that arise/disappear at different scales.
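The computation of a local entropy per grid cell and per grid scale can be sketched as follows. The function names, the planar coordinates, the cell sizes and the sample points are hypothetical; an actual implementation would also have to deal with geographic coordinates and much larger volumes.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def local_entropy_grid(points, cell_size):
    """Entropy of the attribute values falling in each non-empty cell.

    points is a list of (x, y, value); the result maps (col, row) to
    the entropy of the values observed in that cell.
    """
    cells = defaultdict(list)
    for x, y, value in points:
        cells[(int(x // cell_size), int(y // cell_size))].append(value)
    return {cell: entropy(values) for cell, values in cells.items()}


def multigrid_entropy(points, cell_sizes):
    """One local entropy map per cell size (one map per grid scale)."""
    return {size: local_entropy_grid(points, size) for size in cell_sizes}


# Hypothetical observations with a discrete attribute (ship type).
points = [(0.5, 0.5, "cargo"), (0.6, 0.4, "cargo"),
          (1.5, 0.5, "fishing"), (1.6, 0.5, "fishing")]

maps = multigrid_entropy(points, cell_sizes=[1.0, 2.0])
```

With the finer grid, each cell is homogeneous and has zero entropy; with the coarser grid, the two groups fall in the same cell, which then carries one bit of entropy. This is the kind of scale-dependent behaviour the multigrid representation is meant to expose.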

6.1.5.2 Space-time Permutation Scan Statistic

Kulldorff et al. [99] developed the space-time permutation scan statistic method, which uses cylinder-shaped scanning windows, each being a possible candidate for a pattern of interest, in their case the detection of a disease outbreak.

The circular base represents the geographical area of the potential outbreak. A typical approach is to first iterate over a finite number of geographical grid points, and then gradually increase the circle radius from zero to some maximum value defined by the user, iterating over the zip code [the data] in the order in which they enter the circle. In this way, both small and large circles are considered, all of which overlap with many other circles. The height of the cylinder represents the number of days [or time dimension], with the requirement that the last day is always included together with a variable number of preceding days, up to some maximum defined by the user. ... This means that we will evaluate cylinders that are geographically large and temporally short, forming a flat disk, those that are geographically small and temporally long, forming a pole, and every other combination in between.

The advantages of this method according to the authors are:

• user-friendly

• only requires case data

• easily adapting to spatial only and temporal only variation.

• spatial adjustment by day-of-week interaction

• ability for dealing with missing information

On the other hand, when dealing with very little data, the results of this method will fluctuate if some of the data is missing or incomplete. Another observation is that the boundary of the detected outbreak may not fit the boundary of the true outbreak. The use of circles for the base of the cylinder window will tend to make detected outbreaks approximately circular. The authors tried other shapes for the scanning window but found that the circular base remains the best choice, as it retains the ability to detect noncircular outbreak areas. The article mentions that the space-time permutation scan statistic may be applicable to other domains where early detection is an important issue. The application of this method to derive information-theoretic measures at multiple scales can easily be conceptualized.

6.1.5.3 Scale-Space Analysis

Scale-space representation is a special type of multiscale representation, “which was developed by the computer vision community to handle image structures at different scales in a consistent manner. The basic idea is to embed the original signal into a one-parameter family of gradually smoothed signals, in which the fine scale details are successively suppressed” [97].

For instance, in image processing, the linear scale-space representation of an image is a set of images obtained by the successive convolution of the image with a Gaussian kernel of varying scale. The scale parameter is actually the variance of the Gaussian filter. As the variance increases, it results in the smoothing of the image with a larger and larger filter, thereby removing more and more of the details that the image contains. Details that are significantly smaller than the filter are to a large extent removed from the image at a given scale parameter. From [100]:

The motivation for generating a scale-space representation of a given data set originates from the basic observation that real-world objects are composed of different structures at different scales. This implies that real-world objects, in contrast to idealized mathematical entities such as points or lines, may appear in different ways depending on the scale of observation.

This assumption is also true for the maritime domain. However, this technique may not be suitable for our problem as it would require filtering out some data, and this might have an unforeseen effect on the result. Varying the neighborhood size around a center point might be a more suitable approach in the case of MDA.

6.1.5.4 Number of Empty Grid Cells for Spatial Entropy

Liu et al. [6] use spatial entropy as a measure of randomness in the spatio-temporal distributions of ship detections, which are S-AIS detections in a given time window. They investigate the dependence of spatial entropy values on the grid size used. It is noted that the value of spatial entropy depends on the grid cell size once the area of interest is divided into a grid.

Their conclusions are the following:

• Too few grid cells, meaning that the average point density per cell is too high, lead to spatial entropy values close to one, “creating the appearance of a uniformly distributed set”;

• With too many grid cells, and thus a very low average point density in each cell, “spatial entropy would drop much more than expected for a random set”.

The spatial entropy, computed with the probability of detection in a grid cell, is expressed as:

\[ S = -\sum_{i} p_i \ln(p_i) \tag{6.1} \]

From [6]:

Then, assuming that $N_i$ out of $N$ detection points are in the $i$th sub-block, the probability of a detection being in the $i$th block will be:

\[ P_i(b) = N_i(b)/N \tag{6.2} \]

and ... [the spatial entropy] then becomes:

\[ S(b) = \frac{1}{2\ln(B/b)} \sum_{i=1}^{(B/b)^2} P_i(b) \ln\left(1/P_i(b)\right) \tag{6.3} \]

Note that the factor preceding the sum keeps the maximum entropy value normalized to 1, since the grid contains $(B/b)^2$ cells and $\ln((B/b)^2) = 2\ln(B/b)$. It was determined that the most reasonable values corresponded to densities between one and two points per grid cell. Given their application context, namely determining whether or not there is a detection in a cell, this approach makes sense. It might make less sense in a context where patterns or processes need to be mined at multiple scales. Following this technique in our context, one should aim to have a certain percentage of empty cells (40 percent, for instance).
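As an illustration, equation (6.3) and the fraction of empty cells can be computed with a few lines of Python. This is a minimal sketch and not the implementation used in [6]: it assumes a square area of side B with planar coordinates in [0, B], and the function name `normalized_spatial_entropy` is hypothetical.

```python
import math
from collections import Counter

def normalized_spatial_entropy(points, B, b):
    """Sketch of equation (6.3): points are (x, y) pairs in a square of side B,
    divided into square cells of side b. Returns (S(b), fraction of empty cells)."""
    n_side = round(B / b)                      # number of cells per side, B/b
    if n_side < 2:
        raise ValueError("need at least 2 cells per side")
    # Count detections per cell; points on the upper edge go to the last cell.
    counts = Counter(
        (min(int(x / b), n_side - 1), min(int(y / b), n_side - 1))
        for x, y in points
    )
    total = len(points)
    s = 0.0
    for n_i in counts.values():                # empty cells contribute nothing
        p = n_i / total                        # equation (6.2)
        s += p * math.log(1.0 / p)
    empty_fraction = 1.0 - len(counts) / (n_side ** 2)
    return s / (2.0 * math.log(n_side)), empty_fraction
```

Evaluating this function for several values of b on the same set of detections gives both the entropy curve and the percentage of empty cells at each grid size, which is the quantity suggested above as a selection criterion.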

6.1.5.5 Entropy-based Scale Saliency

The notion of saliency can be introduced in terms of local signal complexity or unpredictability. An important issue in salient point detection is the automatic selection of the scale at which the salient points will be detected and the local features will be extracted. From Kadir and Brady [101]:

Finally, there is often an implicit, but difficult to quantify requirement that the salient regions be relevant to the task of interest – in other words, the regions or descriptions subsequently extracted from them are somehow characteristic of the scene contents they are intended to signify.

They propose scale saliency:

a novel method for measuring the saliency of image regions and selecting optimal scales for their analysis ... the approach offers a more general model of feature saliency compared with conventional ones, such as those based on kernel convolutions, for example wavelet analysis, since such techniques define saliency and scale only with respect to a particular set of basis morphologies.

They estimate the information content in circular neighborhoods at different scales in terms of the entropy. Local extremes of changes in entropy across scales are detected and the saliency of each point at a certain scale is defined in terms of both the entropy and its rate of change at the scale in question. In addition, they developed a clustering algorithm in order to form salient regions from groups of salient points that are detected at similar locations and scales.

The scale selection in this method is limited to the spatial dimension. However, it is easy to see how an image can be related to an MDA dataset of limited spatial and temporal extent. As such, this method is directly applicable to large AIS datasets.
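The idea can be sketched for point data such as AIS reports. The following Python example is a simplified illustration under stated assumptions, not the algorithm of [101]: distances are planar, attribute values are already discretized, and the inter-scale weight (total change of the local histogram between consecutive radii, scaled by the radius) is a simplification chosen for brevity. The function names are hypothetical.

```python
import math
from collections import Counter

def histogram(values):
    """Relative frequency of each discrete attribute value."""
    total = len(values)
    return {v: c / total for v, c in Counter(values).items()}

def entropy(p):
    """Shannon entropy (base 2) of a histogram given as {value: probability}."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)

def scale_saliency(center, samples, radii):
    """samples: list of ((x, y), value); radii: strictly increasing list.
    Returns [(radius, saliency)] for radii where the local entropy peaks."""
    hists = []
    for r in radii:
        vals = [v for (x, y), v in samples
                if math.hypot(x - center[0], y - center[1]) <= r]
        hists.append(histogram(vals) if vals else {})
    H = [entropy(h) for h in hists]
    peaks = []
    for k in range(1, len(radii) - 1):
        if H[k] > H[k - 1] and H[k] >= H[k + 1]:       # local maximum across scales
            keys = set(hists[k]) | set(hists[k - 1])
            change = sum(abs(hists[k].get(q, 0.0) - hists[k - 1].get(q, 0.0))
                         for q in keys)
            # Simplified inter-scale weight: rate of histogram change times scale.
            weight = radii[k] * change / (radii[k] - radii[k - 1])
            peaks.append((radii[k], H[k] * weight))
    return peaks
```

Applied to every cell centre or report position, the returned radii indicate the neighbourhood sizes at which the local attribute distribution is most complex and changes fastest.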

6.1.5.5.1 Spatio-Temporal Saliency

Filipovych and Ribeiro [102] developed “an approach for determining scales of sub-volumes of interest given the locations of spatio-temporal features”. The proposed method works “by measuring the average variation of local motion content calculated on subsequences of motion filter responses” on a video feed.

They argue that:

There are two main problems with the saliency measurement approach for automatic scale selection in the spatio-temporal domain [as described in [101]]. First, the saliency measurement ... is based on calculating $p_D(q, s, x)$ on all values in the spatio-temporal volume D. We propose an alternative approach that uses the average spatial entropy for saliency calculation. Secondly, all values in $p_D(q, s, x)$ are assumed to be independent and identically distributed (i.i.d., uniform prior). As the scale of the analyzing subregion increases, the zero responses (i.e., static pixels) dominate the distribution.

Filipovych and Ribeiro [102] “propose an improvement to the temporal descriptiveness of the distributions $p_D(q, s, x)$”. They analyze the evolution of the average information. In this way, temporal variation is included in the scale detection process. Their algorithm begins “by calculating Shannon’s entropy of local image attributes inside cylindrical spatio-temporal volumes (e.g., intensity, filter response) over a range of scales”. From [102]:

$$ H_D(s, x) = -\int_{q \in D} p_D(q, s, x) \log_2 p_D(q, s, x)\, dq \tag{6.4} $$

The scales $s = (s_1, \ldots, s_n)$ represent the size parameters of the analyzed volumes (e.g., spatio-temporal cylinder’s radius and length). Once the local entropy values are at hand, a set of candidate scales is selected for which the entropy $H_D$ has local maxima.

They demonstrate that their scale selection approach improves the performance of action recognition algorithms in a video feed.

Vessels that change position between two time periods can serve as an equivalent of such a motion filter response, since motion can easily be derived from the position information contained in an AIS dataset. This scale saliency approach could be transposed and tested in the maritime domain to evaluate the scale of the patterns that are present in the context of MDA.

6.1.6 Summary of the Proposed Proof-of-Concept Demonstration

Three potential demonstrations of information-theoretic measures over large spatio-temporal datasets were proposed in sections 6.1.1 - 6.1.3. It was decided to limit the investigation to the basic entropy (proposed in section 6.1.1), computed first over the entire dataset to assess the structuredness of the data, and then locally for some attributes in a multiscale context. The spatial diversity (proposed in section 6.1.3) was also implemented and computed locally in a multiscale context. The two information-theoretic measures were compared with each other by visual inspection of the obtained values. Conditional entropy (introduced in section 4.1.2) and mutual information (introduced in section 4.1.3), which assess one- and two-way association respectively, could have been calculated to follow up on any interesting patterns or behaviour, but were not implemented because of project constraints.

Multiscale analysis will be carried out using the method described in section 6.1.5.1. This method is fast to implement, and applying it to attributes such as speed and heading should yield interesting results with regard to traffic patterns or particular areas, and serve to highlight the behaviour and potential of information-theoretic measures applied over large spatio-temporal datasets. The scales will be selected in an ad-hoc manner, but will cover a range of values sufficient to highlight both large and small patterns or areas. We propose to limit the investigation to the following cell sizes: 2.0 x 2.0 degrees, 1.0 x 1.0 degrees, 0.5 x 0.5 degrees, 0.1 x 0.1 degrees and 0.05 x 0.05 degrees.

The temporal aspect could be addressed by processing specific time periods one after the other. However, this analysis will not be carried out as part of the proof-of-concept.

6.2 Application of Information-Theoretic Measures over a Large Maritime Dataset

This section presents the results of the application of information-theoretic measures over a large maritime dataset. The dataset consisted of one month of space-based AIS data with global coverage, acquired in April 2014. It contains messages of all types, but is mostly composed of type 1 (83.7%), type 3 (9%) and type 5 (2.3%) messages, with a negligible amount of type 2 (0.1%). The other message types are ignored as they are negligible in quantity or irrelevant for this study as they lack position information. Information-theoretic measures were computed globally over the whole dataset and then applied in the Canadian EEZ on the East Coast at multiple scales, in an exploratory manner, to assess the behaviour of such measures in the maritime domain for information mining.

6.2.1 Description of the Dataset

The dataset consists of 30 archive files of raw Satellite AIS data (source: exactEarth) containing a total of 87,612,712 AIS reports. These files were fed to the DRDC-A MSARI Java application, which populated a PostgreSQL database schema (MSARI DB). The dataset covers the period of April 1, 2014, 00:00:00 to April 30, 2014, 23:44:37. It also contains some AIS reports (178,355) that mostly overlap with the end of March, and which were not removed. The data covers the whole planet. Unlike other sources such as MSSIS, the data provided by exactEarth was not sampled at a fixed rate.

AIS data are mostly of types 1, 3 and 5, and the entropy and spatial diversity were computed over the course over ground, speed and ship type attributes. These were chosen under the assumption that patterns at global and local scales should stand out using a visual inspection of the obtained values. It was reasonable to assume that maritime traffic patterns are affected strongly by these attributes.

6.2.2 Global Entropy of the Dataset

Table 6.1 presents the global normalized entropy (using a base 2 logarithm) of the different attributes of AIS messages. Only valid values were used in the computation. Magnitude was not considered in the case of the Rate of Turn attribute. Entropy values are normalized by dividing the entropy (the expected surprisal) by the binary log of the number of classes.

$$ H_{norm} = H / \log_2(\text{number of distinct values}) \tag{6.5} $$
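Equation (6.5) can be sketched in Python as follows. This is an illustrative example rather than the code used in the study: the function name is hypothetical, and `n_classes` is assumed to be the number of bins listed for the attribute in Table 6.1 (for instance 360 for Course over Ground).

```python
import math
from collections import Counter

def normalized_entropy(values, n_classes):
    """Shannon entropy (base 2) of the observed values, divided by
    log2(n_classes) so that the result lies between 0 and 1."""
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    counts = Counter(values)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over the values actually observed.
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / math.log2(n_classes)
```

A value close to 1 means the observed values are spread almost evenly over the available classes, and a value close to 0 means that a few classes dominate.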

The high entropy values of the Course over Ground and True Heading attributes are expected at a global scale. However, they suggest that the analysis of AIS data would benefit from a more local or multiscale perspective, as Course over Ground and True Heading should be important components of MDA.

Attributes related to the identity of a vessel, such as IMO number or MMSI, have been left out of this analysis. As one might suspect, at a global scale, the normalized entropy should be close to one, due to the nature of the attribute itself. But, at a more local scale, this attribute might be extremely relevant, for example, the identity of a ferry boat crossing an area several times a day.

Attributes related to dimensions are considered independently in this table; however, it might prove more useful to consider them in combination in order to obtain the overall dimension of the ship. This was not performed as part of this study. The application of other information-theoretic measures, such as mutual information, might also provide interesting insight, for example if there is a two-way association between draught and ship type.

Table 6.1: Normalized Entropy, or relative entropy, for different attributes of an AIS message for the complete dataset

| Attribute | Range of values | Number of bins | Normalized Entropy |
|---|---|---|---|
| AIS Type 1-3 | | | |
| Navigation Status | [0 − 15] | 16 | 0.33 |
| Rate of Turn | [0 − 127] | 128 | 0.5 |
| Speed over Ground | [0 − 103] | 104 | 0.64 |
| Course over Ground | [0 − 360[ | 360 | 0.99 |
| True Heading | [0 − 360[ | 360 | 0.99 |
| AIS Type 5 | | | |
| Ship Type | [0 − 99] | 100 | 0.54 |
| Dimension to Bow | [0 − 520] | 52 | 0.76 |
| Dimension to Stern | [0 − 520] | 52 | 0.55 |
| Dimension to Port | [0 − 63] | 64 | 0.31 |
| Dimension to Starboard | [0 − 63] | 64 | 0.30 |
| Draught | [0 − 255] | 26 | 0.46 |

Section 6.2.3 presents the results of the computation of the entropy and spatial diversity over three different attributes, namely: the COG, the speed and the ship type. These computations were performed locally, meaning that the region of interest has been divided using a grid and the information-theoretic measures have been computed within each grid cell using only the values present within a cell. The attributes were selected based on a priori assumptions about their relative importance in uncovering traffic patterns in a spatio-temporal maritime dataset.

6.2.3 Multiscale Analysis and Measures Comparison

Figures 6.3 – 6.5 provide the results of the computation of the entropy. Figures 6.6 – 6.8 provide the results of the computation of spatial diversity applied locally at multiple scales over a month of AIS data in the Canadian EEZ. Figures 6.9, 6.10 and 6.11 provide a side by side comparison of entropy and spatial diversity over the two smaller scales.

Entropy is normalized using the equation presented in 6.2.2, applied within each grid cell. As each cell in the grid has a varying number of reports and attribute values, the log operation is recomputed for each cell to take into account only the values that are present. The resulting normalized entropy is bounded between 0 and 1 and enables the comparison of cells among themselves.
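The per-cell computation described above can be sketched as follows. This is a minimal illustration and not the project code: the function names are hypothetical, reports are assumed to be (latitude, longitude, value) tuples with already discretized values, the minimum of two reports follows the rule stated later in this section, and returning 0 for a cell with a single distinct value is an assumption made here to avoid dividing by log2(1) = 0.

```python
import math
from collections import Counter, defaultdict

def cell_index(lat, lon, cell_deg):
    """Row and column of the grid cell containing (lat, lon)."""
    return (math.floor((lat + 90.0) / cell_deg),
            math.floor((lon + 180.0) / cell_deg))

def cell_entropies(reports, cell_deg, min_reports=2):
    """reports: iterable of (lat, lon, value).
    Returns {cell: normalized entropy} for cells with enough reports."""
    cells = defaultdict(list)
    for lat, lon, value in reports:
        cells[cell_index(lat, lon, cell_deg)].append(value)
    result = {}
    for cell, values in cells.items():
        if len(values) < min_reports:
            continue                       # measure not computed for this cell
        counts = Counter(values)
        n = len(values)
        h = sum((c / n) * math.log2(n / c) for c in counts.values())
        k = len(counts)                    # distinct values present in the cell
        result[cell] = h / math.log2(k) if k > 1 else 0.0
    return result
```

Calling `cell_entropies` once per cell size (2.0, 1.0, 0.5, 0.1 and 0.05 degrees) yields one map per scale.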

Spatial diversity was also computed for each grid cell, taking into account only the reports present in a cell. In our study, the λ parameter was set to 0. In the original paper, where λ was set to one, Claramunt applied the measure to rooms, which are static objects that require at least one step to pass from one to another; distance was measured in steps, and a distance of zero steps made no sense. In our problem, we use the great circle distance, and two different reports can occur at the same position; hence we set the value to zero.
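A possible per-cell computation is sketched below. It assumes the commonly cited form of Claramunt's spatial diversity, in which each term of the entropy is weighted by the ratio of the mean intra-class distance to the mean inter-class distance, with λ substituted when a distance cannot be computed; the definition given in section 6.1.3 takes precedence if it differs. The haversine formula on a spherical Earth, the function names and the choice to skip a class whose external distance is zero are assumptions made for this illustration.

```python
import math
from collections import defaultdict
from itertools import combinations, product

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def mean_distance(pairs):
    dists = [great_circle_km(p, q) for p, q in pairs]
    return sum(dists) / len(dists)

def spatial_diversity(reports, lam=0.0):
    """reports: list of ((lat, lon), value) for one grid cell."""
    classes = defaultdict(list)
    for pos, value in reports:
        classes[value].append(pos)
    n = len(reports)
    total = 0.0
    for value, members in classes.items():
        others = [pos for pos, v in reports if v != value]
        d_int = mean_distance(combinations(members, 2)) if len(members) > 1 else lam
        d_ext = mean_distance(product(members, others)) if others else lam
        p = len(members) / n
        if d_ext > 0:  # assumption: skip the term when the ratio is undefined
            total += (d_int / d_ext) * p * math.log2(1.0 / p)
    return total
```

With this form, classes that are spatially clustered and far from the others lower the measure, while interleaved classes raise it, possibly above the ordinary entropy bound.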

The diversity factor removes the upper bound of the entropy, so a different normalization procedure is required in order to retain the same heat map scheme. This normalization rescales all results between the minimum and maximum spatial diversity values, so that the displayed values are constrained between 0 and 1. For every cell diversity H, the normalization is calculated as follows:

$$ H_{norm} = \frac{H - \min(H)}{\max(H) - \min(H)} \tag{6.6} $$

which is the generic equation for normalizing a variable. This makes the comparison between diversity maps and entropy maps more difficult but allows us to have a common scheme for displaying results. The minimum and maximum used in the normalization are those calculated over all grid cells.
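Equation (6.6) is straightforward to express in code. The short Python sketch below is illustrative only; the function name is hypothetical, and mapping all cells to 0 when every value is identical is an assumption made to avoid a division by zero.

```python
def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # assumption: constant input maps to 0
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `min_max_normalize([0.0, 1.0, 2.0, 100.0])` maps the first three values to 0.0, 0.01 and 0.02: a single cell with a very large value compresses all the others toward zero on the display.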

It is important to note that both entropy and spatial diversity are not computed in a grid cell if fewer than two reports are present in that cell.

An overall observation is that the spatial diversity is globally lower than the entropy, which suggests that the different values in each cell are not evenly distributed. Some patterns, particularly the shipping lanes, exhibit lower entropy at multiple scales for the speed and COG attributes, which implies that, for these attributes, the values are more homogeneous within these spatial patterns. It might be worthwhile, in this case, to investigate the application of spatio-temporal scale selection techniques or saliency measures to extract the optimal scale that describes this pattern. Moreover, these patterns are much less visible when looking at the spatial diversity for all attributes. However, in the latter case, the applied normalization function might have had an impact on the resulting display. This is discussed later. Finally, cells in the area of ports have a low speed entropy.

Figure 6.3: Entropy of the course over ground attribute over multiple scales. Panels: (a) scale of 2 degrees, (b) scale of 1 degree, (c) scale of 0.5 degree, (d) scale of 0.1 degree.

Figure 6.4: Entropy of the speed attribute over multiple scales. Panels: (a) scale of 2 degrees, (b) scale of 1 degree, (c) scale of 0.5 degree, (d) scale of 0.1 degree.

Figure 6.5: Entropy of the ship type attribute over multiple scales. Panels: (a) scale of 2 degrees, (b) scale of 1 degree, (c) scale of 0.5 degree, (d) scale of 0.1 degree.

Figure 6.6: Spatial diversity of the course over ground attribute over multiple scales. Panels: (a) scale of 2 degrees, (b) scale of 1 degree, (c) scale of 0.5 degree, (d) scale of 0.1 degree.

Figure 6.7: Spatial diversity of the speed attribute over multiple scales. Panels: (a) scale of 2 degrees, (b) scale of 1 degree, (c) scale of 0.5 degree, (d) scale of 0.1 degree.

Figure 6.8: Spatial diversity of the ship type attribute over multiple scales. Panels: (a) scale of 2 degrees, (b) scale of 1 degree, (c) scale of 0.5 degree, (d) scale of 0.1 degree.

Figure 6.9: Entropy and spatial diversity of the course over ground attribute at 0.5 and 0.1 degrees. Panels: (a) entropy at 0.5 degrees, (b) diversity at 0.5 degrees, (c) entropy at 0.1 degrees, (d) diversity at 0.1 degrees.

Figure 6.10: Entropy and spatial diversity of the speed attribute at 0.5 and 0.1 degrees. Panels: (a) entropy at 0.5 degrees, (b) diversity at 0.5 degrees, (c) entropy at 0.1 degrees, (d) diversity at 0.1 degrees.

Figure 6.11: Entropy and spatial diversity of the ship type attribute at 0.5 and 0.1 degrees. Panels: (a) entropy at 0.5 degrees, (b) diversity at 0.5 degrees, (c) entropy at 0.1 degrees, (d) diversity at 0.1 degrees.

Figures 6.13 and 6.14 present a zoomed portion of the entropy and spatial diversity over the Gulf of St. Lawrence for scales of 0.5, 0.1 and 0.05 degrees. These figures highlight the low entropy of both the course over ground and speed attributes at the entrance of the St. Lawrence River and within the shipping lanes. In addition, the two attributes seem to be linked over shipping lanes, suggesting that the application of conditional entropy or mutual information might serve to highlight other patterns in the dataset.

The ferry route between Nova Scotia and Newfoundland is also visible in these figures. However, it is only visible at scales of 0.1 and 0.05 degrees, which further supports the need to investigate scale selection or saliency algorithms. Refer to figure 6.12 for a map of the actual ferry routes in the investigated area.

On the spatial diversity plot, the ferry route appears as a region of slightly increased spatial diversity against a background of low spatial diversity. More investigation is needed to determine the reason for this behaviour. However, one assumption is that the presence of very few different values can cause the external distance to be quite low, therefore increasing the spatial diversity value.

Figure 6.12: Ferry Route map (from http://newfoundland.hilwin.nl/PHP/en/gettingthere.php)

Across scales, we can see a decrease in coverage for all of the investigated attributes. This is an implementation effect: when an insufficient number of reports is present within a grid cell, the entropy and spatial diversity are not computed. As a consequence, the ferry route between Blanc Sablon and Sept-Iles in Quebec Province is only visible at the 0.5 degree scale on the entropy map of all the attributes investigated.

Figures 6.15 to 6.23 present the entropy of the previously used attributes with global coverage. The computation time can be quite long, depending on the time scale and the region being studied. For most of the graphs presented in this report, the calculations were performed on an Ubuntu Server 14.04 machine with an Intel i7 processor (4 cores) and 32 GB of RAM. For a month of AIS data, calculating the entropy took 20 minutes for the world map with a 2x2 degree grid and 90 minutes with a 1x1 degree grid. Calculating the diversity took 10 to 20 times longer than the entropy calculations.

The shipping lanes are clearly visible and exhibit lower entropy at multiple scales for the speed, COG and ship type attributes, which implies that, for these attributes, the values are more homogeneous within these spatial patterns. In the global coverage case, the application of a scale selection algorithm should be useful to select the correct scale for these patterns.

Figure 6.13: Entropy and spatial diversity of the course over ground attribute at 0.5, 0.1 and 0.05 degrees. Panels: (a) entropy at 0.5 degrees, (b) diversity at 0.5 degrees, (c) entropy at 0.1 degrees, (d) diversity at 0.1 degrees, (e) entropy at 0.05 degrees, (f) diversity at 0.05 degrees.

Figure 6.14: Entropy and spatial diversity of the speed attribute at 0.5, 0.1 and 0.05 degrees. Panels: (a) entropy at 0.5 degrees, (b) diversity at 0.5 degrees, (c) entropy at 0.1 degrees, (d) diversity at 0.1 degrees, (e) entropy at 0.05 degrees, (f) diversity at 0.05 degrees.

Figure 6.15: Entropy of the course over ground attribute at 2.0 degree

Figure 6.16: Entropy of the speed attribute at 2.0 degree

Figure 6.17: Entropy of the ship type attribute at 2.0 degree

Figure 6.18: Entropy of the course over ground attribute at 1.0 degree

Figure 6.19: Entropy of the speed attribute at 1.0 degree

Figure 6.20: Entropy of the ship type attribute at 1.0 degree

Figure 6.21: Entropy of the course over ground attribute at 0.5 degree

Figure 6.22: Entropy of the speed attribute at 0.5 degree

Figure 6.23: Entropy of the ship type attribute at 0.5 degree

Both entropy and spatial diversity were computed using all the available data. For the entropy, this means that we are more confident about the obtained results, as we have a better approximation of the probability distribution. However, this seems to be problematic for spatial diversity when compared to a limited time window, as more samples will affect the distance ratio used for its calculation. Further investigation would be needed to assess the extent of the impact. Moreover, one can observe that spatial diversity tends toward zero as the cell size is decreased. This is a by-product of the normalization process: the maximum spatial diversity value occurring in a cell is very high compared to the other cells of the grid, so the normalization used for visual display tends to push most of the values toward zero. It would be interesting to use the actual spatial diversity values rather than the normalized ones for further investigation or more advanced analysis, as the simple normalization function used in this project seems to be problematic, at least for visual inspection of the results.

This visual inspection suggests that information-theoretic measures could be used to extract patterns from a spatio-temporal dataset. Automatic extraction of spatio-temporal features could be achieved by feeding the results into a more complete information extraction pipeline. As an example, a clustering algorithm could be applied over the resulting images, followed by other types of analysis, such as shape extraction on the obtained regions (for instance, elongated regions are more likely to be shipping lanes).
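One very simple form of such a pipeline is sketched below in Python. It is a hypothetical illustration and was not part of this study: low-entropy cells are grouped into 4-connected regions, and each region is given a crude elongation score based on the aspect ratio of its bounding box. The threshold, the function names and the use of `None` for cells where the measure was not computed are all assumptions.

```python
from collections import deque

def low_entropy_regions(grid, threshold):
    """grid: list of rows of normalized entropy values (None = not computed).
    Returns a list of regions, each a set of (row, col) cells at or below threshold."""
    rows, cols = len(grid), len(grid[0])

    def is_low(r, c):
        return grid[r][c] is not None and grid[r][c] <= threshold

    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not is_low(r, c):
                continue
            region, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:                       # breadth-first flood fill
                cr, cc = queue.popleft()
                region.add((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen and is_low(nr, nc)):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            regions.append(region)
    return regions

def elongation(region):
    """Aspect ratio of the bounding box: 1.0 for a square, larger for elongated regions."""
    rs = [r for r, _ in region]
    cs = [c for _, c in region]
    height = max(rs) - min(rs) + 1
    width = max(cs) - min(cs) + 1
    return max(height, width) / min(height, width)
```

A bounding-box ratio underestimates the elongation of diagonal lanes, so a real pipeline would more likely rely on a rotation-invariant shape descriptor.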

Following the analysis of the obtained results, spatio-temporal scale saliency or selection approaches could be implemented to avoid a subjective scale representation of the patterns.

Part 7

Conclusion

Information theory is the mathematical study of the encoding of information; it covers the transmission (communication), processing (encryption, encoding/decoding), utilization, and extraction of information. The field defines measures of the quantity of information, such as entropy, that can be generalized or applied to other types of information besides encoded communications. Information theory has been successfully applied to many different fields.

Results of the literature survey showed that although information theory has been applied in many heterogeneous fields, very little research has been done in the maritime domain. The main application across all domains is the measurement of information quantity using entropy. Most applications then adapt entropy to their particular context. Mutual information is also widely used to assess the correlation of random variables for a wide variety of problems, as it is able to handle nonlinear dependence, which is useful in many domains where information sources are disparate and complementary. For spatio-temporal data specifically, it can be used to extract motion patterns and their appropriate scale.

By its very nature, information theory should be able to handle problems specific to AIS data, as well as those arising when performing spatio-temporal data mining. AIS data are suitable for information-theoretic analysis in a spatio-temporal context, and provide interesting results.

The entropy and spatial diversity measures were applied locally at multiple scales, on selected attributes of AIS data, within Canada’s Exclusive Economic Zone. Visual inspection was used to assess their potential to highlight patterns in this kind of data. It was demonstrated that transit zones, or shipping routes, exhibit fairly stable characteristics resulting in lower entropy and spatial diversity intensity, compared to areas where multiple activities seem to occur.

In a multiscale context, large shipping lanes are visible at different scales. However, smaller patterns, such as ferry routes, are only visible at very fine scales. Temporal behaviour was not investigated; however, results suggest that the implementation of a spatio-temporal scale selection process should lead to the extraction of meaningful maritime processes over large datasets. As far as processing is concerned, one should consider implementing the information-theoretic measures within a framework designed for large-scale cloud computing to speed up information extraction, even in an exploratory study, as the runtime required over large datasets quickly becomes prohibitive on a single processor, especially for a complex computation such as spatial diversity.

Information-theoretic measures have the potential to be used over large maritime datasets in a data mining context, and this potential, to our knowledge, remains largely untapped based on the conducted literature survey. Further investigation should be considered to truly assess their full potential when applied over large maritime spatio-temporal datasets. For instance, investigating more attributes at a local scale, testing one- and two-way association measures, and implementing a space-time scale selection mechanism should be considered as a potential future research avenue. Performance comparison between information theory and another theoretical framework should also be made in a specific context of applications.

Bibliography

[1] exactEarth. Applications of satellite-ais (s-ais) for national defence & security. Technical report, exactEarth, 2015. URL http://cdn2.hubspot.net/hub/183611/file-28645722-pdf/Collateral_for_Download/Defence_whitepaper.pdf.

[2] Anthony W Isenor, Marie-Odette St-Hilaire, Sean Webb, and Michel Mayrand. Msari: A database for large volume storage and utilisation of maritime data. The Journal of Navigation, pages 1–15, 2016. doi: http://dx.doi.org/10.1017/S0373463316000540. URL https://www.cambridge.org/core/journals/journal-of-navigation/article/msari-a-database-for-large-volume-storage-and-utilisation-of-maritime-data/3E7BA68539C63E2FB9E3FC6CA474737D.

[3] Tim Hammond, Mark McIntyre, David M Chapman, and Anna-Liesa S Lapinski. The implications of self-reporting systems for maritime domain awareness. Technical report, DTIC Document, December 2006. URL http://cradpdf.drdc-rddc.gc.ca/PDFS/unc64/p527761.pdf.

[4] KG Aarsæther and T Moan. Computer Vision and Ship Traffic Analysis: Inferring Maneuver Patterns from the Automatic Identification System, chapter 31, pages 177–182. 2009. URL http://www.cesos.ntnu.no/cesos/Files_joomla/KGAarsaether_TMoan_TRANSNAV2009.pdf.

[5] Marie-Odette St-Hilaire and Melita Hadzagic. Information mining technologies to enable discovery of actionable intelligence to facilitate maritime situational awareness: I-mine. Technical report, DRDC, January 2013. URL http://cradpdf.drdc-rddc.gc.ca/PDFS/unc198/p800505_A1b.pdf.

[6] Min Jing Liu, Peter Dobias, and Cheryl Eisler. Fractal patterns in coastal detections on approaches to canada. 2015. URL http://www.orlabanalytics.ca/jaor/archive/v7/n2/jaorv7n2p80.pdf.

[7] Didier G Leibovici. Defining spatial entropy from multivariate distributions of co-occurrences. COSIT 2009, pages 392–404, 2009. URL http://www.academia.edu/877795/Defining_spatial_entropy_from_multivariate_ distributions_of_co-occurrences.

[8] Didier G Leibovici, Christophe Claramunt, Damien Le Guyader, and David Brosset. Local and global spatio-temporal entropy indices based on distance-ratios and co-occurrences distributions. International Journal of Geographical Information Science, 28(5):1061–1084,

81 Study Report RISOMIA Call-up 14

2014. URL https://www.researchgate.net/profile/Christophe_Claramunt/ publication/262938981_Local_and_global_spatio- temporal_entropy_indices_based_on_distance-ratios_and_co- occurrences/links/55b500f208aed621de02d82f.pdf?origin=publication_detail.

[9] Bilal Idiri. M´ethodologie d’extraction de connaissances spatio-temporelles par fouille de donn´eespour l’analyse de comportements `arisques: application `ala surveillance maritime. PhD thesis, Ecole Nationale Sup´erieuredes Mines de Paris, 2013. URL https://halshs.archives-ouvertes.fr/tel-01124006/document.

[10] Jinbiao Chen. Study on maritime search and rescue decision-support system. International Journal of Innovative Science, 1(9):123–129, November 2014. URL http://www.ijiset.com/v1s9/IJISET_V1_I9_20.pdf.

[11] Giuliana Pallotta, Michele Vespe, and Karna Bryan. Vessel pattern knowledge discovery from ais data: A framework for anomaly detection and route prediction. Entropy, 15(6): 2218–2245, 2013. URL http://www.cmre.nato.int/employment/current- vacancies/doc_download/685-vessel-pattern-knowledge-discovery-from-ais- data-a-framework-for-anomaly-detection-and-route-prediction.

[12] Giuliana Pallotta, Michele Vespe, and Karna Bryan. Traffic knowledge discovery from ais data. In 2013 16th International Conference on Information Fusion (FUSION), pages 1996–2003. IEEE, July 2013. URL http://fusion.isif.org/proceedings/fusion2013/html/pdf/Friday,%20%2012% 20July%202013/13.10-14.50/58.Special%20Session%20Context- based%20Information%20Fusion%20I%20Barbaros%20%20B/2- 281_128Traffic%20Knowledge.pdf.

[13] Junier B. Oliva. Anomaly detection and modeling of trajectories. Master’s thesis, Carnegie Mellon University, School of Computer Science, August 2012. URL http://reports-archive.adm.cs.cmu.edu/anon/2012/CMU-CS-12-133.pdf.

[14] Gon¸caloCruz and Alexandre Bernardino. Image saliency applied to infrared images for unmanned maritime monitoring. In International Conference on Computer Vision Systems, pages 511–522. Springer, 2015. URL http://vislab.isr.ist.utl.pt/publications/15-ICVS-GCruz.pdf.

[15] Cl´ement Iphar, Aldo Napoli, and Ray Cyril. Data quality assessment for maritime situation awareness. In ISSDQ 2015-The 9th International Symposium on Spatial Data Quality, volume 2, 2015. URL https://hal-mines-paristech.archives- ouvertes.fr/hal-01269684/file/Iphar_ISSDQ.pdf.

[16] Cl´ement Iphar, Aldo Napoli, and Cyril Ray. Detection of false ais messages for the improvement of maritime situational awareness. In Oceans’ 2015, 2015. URL https://hal-mines-paristech.archives-ouvertes.fr/hal-01203049/document.

[17] Marco Guerriero, Stefano Coraluppi, Craig Carthel, and Peter Willett. Analysis of ais intermittency and vessel characterization using a hidden markov model. In GI Jahrestagung (2), 2010. URL http://natorto.cbw.pl/uploads/2010/1/NURC-FR-2010-002.pdf.

82

The use or disclosure of the information on this sheet is subject to the restrictions on the title page of this document. BIBLIOGRAPHY

[18] Christoffer Brax. Anomaly detection in the surveillance domain. PhD thesis, Orebro¨ University, 2011. URL https://www.diva-portal.org/smash/get/diva2:431243/FULLTEXT01.pdf.

[19] Fotios Katsilieris, Paolo Braca, and Stefano Coraluppi. Detection of malicious ais position spoofing by exploiting radar information. In 2013 16th International Conference on Information Fusion (FUSION). IEEE, July 2013. URL http://fusion.isif.org/proceedings/fusion2013/html/pdf/Thursday,%2011% 20July%202013/13.10-14.50/35.Special%20Session%20Trust%20inBarbaros%20A/3- %20172_140%20Detection%20of%20malicious.pdf.

[20] Jeroen HM Janssens. Outlier selection and one-class classification. PhD thesis, Tilburg University, June 2013. URL http://jeroenjanssens.com/jeroenjanssens-thesis.pdf.

[21] Steven Mascaro, Ann E Nicholso, and Kevin B Korb. Anomaly detection in vessel tracks using bayesian networks. International Journal of Approximate Reasoning, 55(1):84–98, 2014. URL http://ceur-ws.org/Vol-818/paper13.pdf.

[22] Floris Goerlandt, J Montewka, E Sonne Ravn, M H¨anninen,and A Mazaheri. Analysis of the near-collisions using ais data for the selected locations in the baltic sea. Report, Department of Applied Mechanics, January 2012. URL http://efficiensea.org/files/mainoutputs/wp6/d_wp6_2_03.pdf.

[23] Agnieszka Lazarowska. Ship’s trajectory planning for collision avoidance at sea based on ant colony optimisation. Journal of Navigation, 68(02):291–307, 2015. URL http://www.gdansk.pan.pl/images/Zgloszenia/N_TECHN_Nagroda_OddzPAN/ LAZAROWSKA_Agnieszka/LAZAROWSKA%20Agnieszka_publikacja.pdf.

[24] Niels Willems. Visualization of vessel traffic. PhD thesis, October 2011. URL http://www.win.tue.nl/vis1/home/cwillems/public/phdthesis.pdf.

[25] Urˇska Demˇsarand Kirsi Virrantaus. Space–time density of trajectories: exploring spatio-temporal patterns in movement data. International Journal of Geographical Information Science, 24(10):1527–1542, 2010. URL https://mycourses.aalto.fi/pluginfile.php/222850/mod_folder/content/0/ DemsarArticle.pdf?forcedownload=1&usg=AFQjCNGQKgrrRFdF7R8QEeAoljLMHpvh_w.

[26] Sabarish Senthilnathan Muthu. Visualization, statistical analysis, and mining of historical vessel data. Master’s thesis, University of New Brunswick, 2015. URL http: //www2.unb.ca/~estef/UNB_Home_files/theses/2015_MScE_Sabarish_Muthu.pdf. [27] Roeland Scheepens, Niels Willems, Huub van de Wetering, Gennady Andrienko, Natalia Andrienko, and Jarke J van Wijk. Composite density maps for multivariate trajectories. IEEE Transactions on Visualization and Computer Graphics, 17(12):2518–2527, 2011. URL http://www.win.tue.nl/vis1/home/cwillems/public/infovis11.pdf.

[28] Ove Daae Lampe, Johannes Kehrer, and Helwig Hauser. Visual analysis of multivariate movement data using interactive difference views. In VMV, pages 315–322, 2010. URL http://www.ii.uib.no/vis/publications/pdfs/lampe10difference.pdf.

83 OODA Technologies Inc.

The use or disclosure of the information on this sheet is subject to the restrictions on the title page of this document. Study Report RISOMIA Call-up 14

[29] Mar´ıaJos´eRiveiro. Visual analytics for maritime anomaly detection. 2011. URL http://oru.diva-portal.org/smash/get/diva2:381336/FULLTEXT06.pdf. [30] Tal Meir. Acoustic ship classification. The DHS Science Conference - Fifth Annual University Network Summit, 2011. URL https://www.orau.gov/dhssummit/presentations/studentday/meir.pdf. [31] Volpe. Maritime safety and security information system (mssis), January 2015. URL https://www.volpe.dot.gov/infrastructure-systems-and- technology/situational-awareness-and-logistics/maritime-safety-and. [32] A Isenor, M-O St-Hilaire, S Webb, and M Mayrand. Msari: A database for large volume storage and utilization of maritime data. Journal of Navigation, 2016. [33] Peter B. de Selding. Orbcomm, exactearth compete for key canadian satellite-ais contract, February 2016. URL http://spacenews.com/orbcomms-skywave-exactearth-compete- for-key-canadian-government-satellite-ais-contract/. [34] James K.E. Tunaley. Utility of various ais messages for maritime awareness. In 8th Advanced SAR Workshop, Canada, 2013. [35] U.S. Department of Homeland Security and U.S. Coast Guard. U.s. navigation center. https://www.navcen.uscg.gov/?pageName=AISMessagesA, 2017. URL https://www.navcen.uscg.gov/?pageName=AISMessagesA. [36] Alfredo Alessandrini, Pietro Argentieri, Marlene Alvarez Alvarez, Thomas Barbas, Conor Delaney, Virginia Fernandez Arguedas, Vincenzo Gammieri, Harm Greidanus, Fabio Mazzarella, Michele Vespe, et al. Data driven contextual knowledge from and for maritime situational awareness. Context-Awareness in Geographic Information Services (CAGIS 2014), page 39, 2014. [37] Eric Meger. Limitations of satellite ais: Time machine wanted! October 2013. URL http://store.cloudeo-ag.com/sites/default/files/product_images/6% 20Limitations%20of%20Satellite%20AIS%20-%20Time%20Machine%20Wanted.pdf. [38] Dan Radulescu, Marie-Odette St-Hilaire, Yannick Allard, and Tim Hammond. Sharing ais-related anomalies (sara). 
Technical Report Unpublished, DRDC Atlantic, March 2016. [39] Michel Mayrand. Maritime situational awareness research infrastructure (msari): Performance analysis, monitoring and optimization. Technical report, DRDC Atlantic, March 2014. [40] Aungon Nag Radon, Ke Wang, Uwe Glasser, Hans Wehn, and Andrew Westwell-Roper. Contextual verification for false alarm reduction in maritime anomaly detection. In Big Data (Big Data), 2015 IEEE International Conference on, pages 1123–1133. IEEE, 2015. [41] European Commission. Blue hub – transform data into knowledge, 2016. URL https://bluehub.jrc.ec.europa.eu/research_areas_maritime/. [42] Bannister N. P. and Neyland D. L. Maritime domain awareness with commercially accessible electro-optical sensors in space. International Journal of Remote Sensing, 36(1), 2015.

84

The use or disclosure of the information on this sheet is subject to the restrictions on the title page of this document. BIBLIOGRAPHY

[43] Kazuo Ouchi. Recent trend and advance of synthetic aperture radar with selected topics. Remote Sensing, 5(2):716–807, 2013.

[44] Desmond Power, Bing Yue, and James Youden. Summary of previous research in iceberg and ship detection and discrimination in sar. Technical Report DRDC-RDDC-2014-C79, C-Core, December 2013.

[45] Ihs fairplay. https://www.ihs.com/products/maritime-world-ship-register.html. URL https://www.ihs.com/products/maritime-world-ship-register.html.

[46] Mars itu. http://www.itu.int/en/ITU-R/terrestrial/mars/Pages/default.aspx. URL http://www.itu.int/en/ITU-R/terrestrial/mars/Pages/default.aspx.

[47] Shipspotting. http://www.shipspotting.com/. URL http://www.shipspotting.com/.

[48] Tokyo motion of understanding. http://212.45.16.136/isss/public_apcis.php?Action=getSearchForm. URL http://212.45.16.136/isss/public_apcis.php?Action=getSearchForm.

[49] Paris motion of understanding. https://www.parismou.org/inspection-search. URL https://www.parismou.org/inspection-search.

[50] Transportation safety board of canada marine. http://www.tsb.gc.ca/eng/stats/marine/index-ff.asp. URL http://www.tsb.gc.ca/eng/stats/marine/index-ff.asp.

[51] Imo marine casualties and incidents database. https://gisis.imo.org/Public/MCI/Default.aspx. URL https://gisis.imo.org/Public/MCI/Default.aspx.

[52] National Geospatial Agency. World wide threat to shipping reports. http://msi.nga. mil/NGAPortal/MSI.portal?_nfpb=true&_st=&_pageLabel=msi_portal_page_64. URL http://msi.nga.mil/NGAPortal/MSI.portal?_nfpb=true&_st=&_pageLabel=msi_ portal_page_64.

[53] Protected planet. http://www.protectedplanet.net/. URL http://www.protectedplanet.net/.

[54] Northwest atlantic fisheries organization. http://www.nafo.int/data/frames/data.html. URL http://www.nafo.int/data/frames/data.html.

[55] Trygg Mat Tracking. Combined iuu vessel list, 2016. URL http://iuu-vessels.org/iuu.

[56] Global fishing watch. http://globalfishingwatch.org/. URL http://globalfishingwatch.org/.

[57] exactearth density maps. http://www.exactearth.com/products/exactais-density-maps. URL http://www.exactearth.com/products/exactais-density-maps.

[58] Nceas analysis products. https://www.nceas.ucsb.edu/globalmarine2008/impacts. URL https://www.nceas.ucsb.edu/globalmarine2008/impacts.

85 OODA Technologies Inc.

The use or disclosure of the information on this sheet is subject to the restrictions on the title page of this document. Study Report RISOMIA Call-up 14

[59] Bathymetric products. http://www.charts.gc.ca/data-gestion/bathy/bathymetri-eng.asp. URL http://www.charts.gc.ca/data-gestion/bathy/bathymetri-eng.asp. [60] Maasmond Maritime. Daily collection of maritime press clippings. http://newsletter.maasmondmaritime.com/ShippingNewsPdf/magazine.pdf. URL http://newsletter.maasmondmaritime.com/ShippingNewsPdf/magazine.pdf. [61] S. Shekhar, M.R. Evans, J.M. Kang, and P. Mohan. Identifying patterns in spatial information: a survey of methods. John Wiley & Sons, Inc. WIREs Data Mining Knowledge Discovery, 1:193–214, 2011. [62] H. J. Miller. Handbook of geographic information science, eds: J. p. wilson and a. s. fotheringham, blackwell publishing geospatial ontology development and semantic analytics, 2008. [63] Erico N de Souza, Kristina Boerder, Stan Matwin, and Boris Worm. Improving fishing pattern detection from satellite ais using data mining and machine learning. PloS one, 11 (7):e0158248, 2016. URL http://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0158248. [64] Raymond W Yeung. A first course in information theory. Springer Science & Business Media, 2012. URL http://iest2.ie.cuhk.edu.hk/~whyeung/post/draft7.pdf. [65] Thomas M Cover and Joy A Thomas. Elements of information theory. John Wiley & Sons, 2012. URL http://coltech.vnu.edu.vn/~thainp/books/Wiley_-_2006_- _Elements_of_Information_Theory_2nd_Ed.pdf. [66] Marine Minier. Th´eoriede l’information. 2012. URL http://perso.citi.insa-lyon.fr/mminier/images/Th_Info.pdf. Lecture notes. [67] Marie-Pierre B´ealand Nicolas Sendrier. Th´eoriede l’information et codage (notes de cours). 2012. URL http://igm.univ-mlv.fr/~beal/Enseignement/TheorieInfo/info.pdf. [68] F Bavaud, J.-C. Chappelier, and J. Kohlas. Introduction `ala th´eoriede l’information et ses applications, February 2008. URL http://icwww.epfl.ch/~chappeli/it/pdf/FullCourseEPFL-FR.pdf. Lecture notes. [69] David JC MacKay. Information theory, inference and learning algorithms. 
Cambridge university press, 2003. URL http://www.inference.phy.cam.ac.uk/itprnn/book.pdf. [70] J.G. Daugman. Information theory and coding, 2009. URL http://www.cl.cam.ac.uk/teaching/0809/InfoTheory/InfoTheoryLectures.pdf. Lecture notes. [71] Entropy (information theory). URL https://en.wikipedia.org/wiki/Entropy_(information_theory). [72] T. Roszak. The Cult of Information: The Folklore of Computers and the True Art of Thinking. Lutterworth Press, 1986. ISBN 9780718826741. URL https://books.google.ca/books?id=QGStQgAACAAJ.

86

The use or disclosure of the information on this sheet is subject to the restrictions on the title page of this document. BIBLIOGRAPHY

[73] Renyi entropy. URL https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy. [74] YY Yao. Information-theoretic measures for knowledge discovery and data mining. In Entropy Measures, Maximum Entropy Principle and Emerging Applications, pages 115–136. Springer, 2003. URL http://www2.cs.uregina.ca/~yyao/PAPERS/information_measures.pdf. [75] Information theory. URL https://en.wikipedia.org/wiki/Information_theory. [76] Florence Tupin, Marc Sigelle, and Henri Maitre. Definition of a spatial entropy and its use for texture discrimination. In Image Processing, 2000. Proceedings. 2000 International Conference on, volume 1, pages 725–728. IEEE, 2000. URL https://www.semanticscholar.org/paper/Definition-of-a-Spatial-Entropy-and- its-Use-for-Tupin-Sigelle/df05eed5fe788289c4ca6fe7b1eddd2538f07cd8/pdf. [77] Christophe Claramunt. A spatial form of diversity. In International Conference on Spatial Information Theory, pages 218–231. Springer, 2005. URL http://christophe.claramunt.free.fr/images/papers/ClaramuntCosit05.pdf. [78] Visual salience. URL http://www.scholarpedia.org/article/Visual_salience. [79] Fred Attneave. Some informational aspects of visual perception. Psychological review, 61 (3):183, 1954. URL http://wexler.free.fr/library/files/attneave%20(1954) %20some%20informational%20aspects%20of%20visual%20perception.pdf. [80] Jacob Feldman and Manish Singh. Information along contours and object boundaries. Psychological review, 112(1):243–252, 2005. URL http: //ruccs.rutgers.edu/images/personal-jacob-feldman/papers/feldman_singh.pdf. [81] Yuejun Guo, Qing Xu, Shihua Sun, Xiaoxiao Luo, and Mateu Sbert. Selecting video key frames based on relative entropy and the extreme studentized deviate test. Entropy, 18(73), March 2016. URL http://www.mdpi.com/1099-4300/18/3/73/htm. [82] Geoffrey Egnal and Kostas Daniilidis. Image registration using mutual information. 2000. URL http: //repository.upenn.edu/cgi/viewcontent.cgi?article=1119&context=cis_reports. 
[83] Josien PW Pluim, JB Antoine Maintz, and Max A Viergever. Mutual-information-based registration of medical images: a survey. IEEE transactions on medical imaging, 22(8): 986–1004, 2003. URL http://www.isi.uu.nl/People/Josien/Papers/Pluim_TMI_2003.pdf. [84] Maximum entropy probability distribution. Wikipedia. URL https://en.wikipedia.org/wiki/Maximum_entropy_probability_distribution. [85] Leblanc Raymond. The Maximum Entropy Principle as a Basis for Statistical Models in Epidemiology. PhD thesis, McGill University, 1990. URL http://digitool.library.mcgill.ca/thesisfile74600.pdf. [86] Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra. A maximum entropy approach to natural language processing. URL http://www.isi.edu/natural-language/people/ravichan/papers/bergeretal96.pdf.

87 OODA Technologies Inc.

The use or disclosure of the information on this sheet is subject to the restrictions on the title page of this document. Study Report RISOMIA Call-up 14

[87] J. Ross Quinlan. Induction of decision trees. Machine learning, 1(1):81–106, 1986. URL http://hunch.net/~coms-4771/quinlan.pdf. [88] Xiang Li and Christophe Claramunt. A spatial entropy-based decision tree for classification of geographical information. Transactions in GIS, 10(3):451–467, 2006. URL http:// citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.596.669&rep=rep1&type=pdf.

[89] Karmele L´opez-de Ipi˜na,Jordi Sol´e-Casals,Marcos Faundez-Zanuy, Pilar M. Calvo, Enric Sesa, Unai Martinez de Lizarduy, Patricia De La Riva, Jose F. Marti-Masso, Blanca Beitia, and Alberto Bergareche. Selection of entropy based features for automatic analysis of essential tremor. Entropy, 18(184), June 2016. URL http://www.mdpi.com/1099-4300/18/5/184/pdf.

[90] Zhijic Bian, Gaoxiang Ouyang, Zheng Li, Qiuli Li, Lei Wang, and Xiaoli Li. Weighted-permutation entropy analysis of resting state eeg from diabetics with amnestic mild cognitive impairment. Entropy, 18(307), May 2016. URL http://www.mdpi.com/1099-4300/18/8/307/pdf.

[91] Chicote Beatriz, Irusta Unai, Alcaraz Ra´ul,Rieta Jos´eJoaqu´ın, Aramendi Elisabete, Isasi Iraia, Alonso Daniel, and Ibarguren Karlos. Application of entropy-based features to predict defibrillation outcome in cardiac arrest. entropy, 9, September 2016. URL http://www.mdpi.com/1099-4300/18/9/313/htm.

[92] Andreas Holzinger, Matthias Dehmer, and Igor Jurisica. Knowledge discovery and interactive data mining in bioinformatics-state-of-the-art, future challenges and research directions. BMC bioinformatics, 15(6):1, 2014.

[93] Chaoli Wang and Han-Wei Shen. Information theory in scientific visualization. Entropy, 13 (1):254–273, 2011.

[94] Min Chen and Heike J¨anicke. An information-theoretic framework for visualization. IEEE Transactions on Visualization and Computer Graphics, 16(6):1206–1215, 2010.

[95] Chaomei Chen. An information-theoretic view of visual analytics. IEEE Computer Graphics and Applications, 28(1):18–23, 2008.

[96] Chaoli Wang, Hongfeng Yu, and Kwan-Liu Ma. Importance-driven time-varying data visualization. IEEE Transactions on Visualization and Computer Graphics, 14(6): 1547–1554, 2008.

[97] Tony Lindeberg. Scale-space theory: A basic tool for analyzing structures at different scales. Journal of applied statistics, 21(1-2):225–270, 1994.

[98] Multigrid methods. URL https://en.wikipedia.org/wiki/Multigrid_method.

[99] Martin Kulldorff, Richard Heffernan, Jessica Hartman, Renato Assun¸cao,and Farzad Mostashari. A space–time permutation scan statistic for disease outbreak detection. Plos med, 2(3):e59, 2005.

[100] Scale space. URL https://en.wikipedia.org/wiki/Scale_space.

88

The use or disclosure of the information on this sheet is subject to the restrictions on the title page of this document. BIBLIOGRAPHY

[101] Timor Kadir and Michael Brady. Scale saliency: A novel approach to salient feature and scale selection. In Visual Information Engineering, 2003. VIE 2003. International Conference on, pages 25–28. IET, 2003.

[102] Roman Filipovych and Eraldo Ribeiro. Determining the scale of interest regions in videos. In 2009 16th IEEE International Conference on Image Processing (ICIP), pages 985–988. IEEE, 2009. URL http://cs.fit.edu/~eribeiro/papers/filipovych_ribeiro_icip2009.pdf.

89 OODA Technologies Inc.

The use or disclosure of the information on this sheet is subject to the restrictions on the title page of this document. Study Report RISOMIA Call-up 14

This page is intentionally left blank.

90

Appendix A

Conferences and books

A.1 Conferences

Information theory and spatial analysis are alive and well. As proof, here is a subset of the most important conferences and workshops on these subjects:

• WIMAKS'16 2nd International Workshop on Maritime Flows and Networks (http://www.world-seastems.cnrs.fr/index.php?page=page_ERC_WS_51)

• Maritime Anomaly Detection (2011) (https://mad.uvt.nl/mad/mad2011-proceedings.pdf)

• Conference on Spatial Information Theory (COSIT) (http://www.cosit.info)

• Marine Environmental Observation Prediction and Response Network (MEOPAR): AIS Applications, Analysis and Data Management Techniques (http://meopar.ca, http://meopar.ca/memberuploads/discussions/AIS_Workshop_Agenda_Sept.24,_2015.pdf)

• Spatial Analysis and GEOmatics (SAGEO) (https://sageo2016.sciencesconf.org/)

• 13th International VTS Symposium 2016 (https://www.vts-symposium2016.my/)

• 1st International Workshop on Maritime Domain Data Mining at ICDM 2016 (https://www.linkedin.com/pulse/mddm-2016-1st-international-workshop-maritime-domain-michele-vespe)

A.2 Books

Some books found in the literature survey:

• Maritime Networks: Spatial structures and time dynamics (https://books.google.ca/books?id=IvCoCgAAQBAJ)

• Statistics for Spatio-Temporal Data (https://books.google.ca/books/about/Statistics_for_Spatio_Temporal_Data.html?id=-kOC6D0DiNYC)

Appendix B

Administration and User Guide

In this appendix, we provide the instructions for reproducing the database environment used in this research and for executing the code delivered in this Call-up. The deliverables for this Call-up consist of a USB key containing:

1. database: A PostgreSQL MSARI database backup (call14.sql.gz) and a large table (reports_with_more_attributes.sql.gz) in compressed SQL format.

2. code: A copy of the Java and SQL code used in the investigation of this Call-up.

3. docker image: A compressed Docker image consisting of a PostgreSQL 9.4 server with the PostGIS 2.2 extension and a Java 1.7 environment within an Ubuntu 14.04 OS container.

4. docker build files: All the necessary files to recreate and reconfigure the Docker image.

5. literature repository: All the references found during the literature survey.

6. bibliography: A PDF bibliography of the documents found for the literature survey. Please note that additional documents found after the literature survey are listed in the bibliography of the final report.

7. final report: The final report and the LaTeX files used to produce it.

The sections are divided as follows:

1. Section B.1.1 lists the requirements before installing the Docker image;

2. Section B.1.2 explains how to load and run the Docker image and how to load the database used for this study;

3. Section B.2.1 describes how to create an AIS MSARI database from raw AIS data files;

4. Section B.2.2 shows how to execute the Java code delivered as part of this project in order to compute entropy and spatial diversity over attributes of the MSARI database;

5. Section B.2.3 explains how to use the provided SQL scripts;

6. Section B.2.4 describes how to run the Bash script reports_with_more_attributes.sh.

B.1 Administration guide

B.1.1 Requirements

The Statement of Work specifies that the demo environment be delivered as a Docker image to be run as a container. The deliverables include a Docker image to be run under the latest version of Docker (1.12). By design, data written inside a Docker container is not persistent: all database data will be lost when the container is removed unless you use the Docker volume feature. The instructions contained in this user guide will allow you to persist the data on your system. Please follow them carefully.

The environment required for this study is a PostgreSQL server with the PostGIS extension and a Java environment. The deliverables have been designed so that you can either use the Docker environment or use your own PostgreSQL server and Java environment. If you choose the latter, please note that most of the tests were done on a Linux environment (Ubuntu 14.04) with Java 1.8 and PostgreSQL 9.3, but any computer with Linux or Windows can be used for running the code and scripts. As a minimum, we recommend Java (JDK) 1.6 or higher (https://java.com/en/download/manual.jsp) and PostgreSQL 9.2 or higher (https://www.postgresql.org/download/) with the PostGIS module 2.1 or higher (http://postgis.net/install/). Windows executables for all these packages are available on their respective web sites. Before going further, make sure that the environment variables ($JAVA_HOME, $PATH, etc.) point to the proper packages and executables and that the configuration setup (network permission access, proxy settings, port information, username and password for accessing the database server, etc.) has been implemented and tested.

B.1.2 Quick Installation Guide Using Docker

In this section, we list the steps to install the ready-to-use Docker image on a single computer. We assume that the user has full administration privileges (connected as root directly or through the sudo su command) or, as a minimum, is a member of the docker group on his/her computer. Please contact your system administrator if you need any help in configuring your computer and your Docker environment.

1. Select a computer, preferably with Linux (Ubuntu or CentOS) already installed. If you use another operating system, make sure that it satisfies the requirements for running Docker. Also verify that there is enough memory on that computer; we recommend 8 GB or more. The more memory you have, the larger and more complex the queries you will be able to run.

2. Make sure that Docker is installed. We recommend the latest version if possible and, at a minimum, Docker 1.10.3 or higher. For this project, Docker version 1.12.5 was used.

3. Copy the Docker image directory, the Docker build file directory and the large database backup file (located on the USB key in the docker images, docker build files and database directory respectively) onto the computer. Whenever possible, copy the files to a local disk of the user computer (in /opt for example) and not to a remote file server. This will significantly improve the performance of the installation and of your development environment.

4. Load the image into the local Docker image store using the command:

gzip -d < postgresql_for_call14.tar.gz | docker load

In order to persist the data of the PostgreSQL database, you need to create a docker volume:

docker volume create --name pg_data

The data will be stored in /var/lib/docker/volumes/pg_data. Make sure that you have enough disk space for storing the database (approximately 100 GB). Take into account that you will require more space if you plan to add more raw AIS data files.

5. Start the Docker image with the command

docker run -d --name=postgresql -i -t -h pgserver -p 5433:5432 \
    -v pg_data:/var/lib/postgresql drdc/postgresql_for_call14

Use -p 5433:5432 if you already have PostgreSQL installed on your computer. Otherwise, you can use -p 5432:5432.
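The disk-space requirement mentioned in step 4 can be checked before the volume is created. The following is a minimal sketch and is not part of the delivered scripts: the function name has_free_gb is hypothetical, and the directory and threshold in the usage comment are taken from the text above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the deliverables): succeeds when the
# file system holding directory $1 has at least $2 GB available.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -P prints one POSIX-format line per file system; -k reports 1 KB blocks.
    # Column 4 of the second line is the available space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 1
    # Integer comparison in KB (1 GB = 1048576 KB).
    [ "$avail_kb" -ge $((need_gb * 1048576)) ]
}

# Example use (assumes the default Docker data directory):
# has_free_gb /var/lib/docker 100 || echo "Less than 100 GB free" >&2
```

The check is advisory only. The threshold should be raised if additional raw AIS data files are to be loaded.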

At this point, your PostgreSQL server should be running. To test it, you can:

1. enter the Docker container using the command:

docker exec -it postgresql bash

The default user for this container is the postgres user. If you need to be root, use instead:

docker exec -u 0 -it postgresql bash

2. connect remotely from your computer, using the host name and the port number of the Docker container. For this, you will need to install the PostgreSQL client package on your computer. Please note that the default password for connecting to the database server as user postgres is postgres.

psql -U postgres -h 172.17.0.2 -p 5432

If you have problems connecting to your Docker container, verify the IP address given to the container by viewing the /etc/hosts file inside the container. The hostname for the container was given in the docker run command and was set to pgserver. If you want your computer to be able to resolve this name into its IP address, you will have to either edit your /etc/hosts file manually or use the Docker network tool (man docker-network).
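The /etc/hosts lookup described above can be scripted. The sketch below is an illustration under stated assumptions rather than a delivered tool: ip_for_host is a hypothetical function name, and the container name postgresql and host name pgserver in the usage comment are those used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print the first address that a hosts-format file
# (read from standard input) associates with the host name given as $1.
ip_for_host() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            # field 1 is the address, the remaining fields are host names
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    '
}

# Example use against the running container:
# docker exec postgresql cat /etc/hosts | ip_for_host pgserver
```

The printed address can then be passed to psql with the -h option in place of the 172.17.0.2 value shown in the commands of this section.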


To complete the database installation, you need to create the database and load the backup provided with the deliverables:

createdb -U postgres -h 172.17.0.2 -p 5432 call14_db

gzip -d < call14.sql.gz | psql -U postgres -h 172.17.0.2 -p 5432 call14_db

The loading will take several hours. This backup contains all the important scripts and the results table used in the final report. The backup also contains the data model. If you want to create a new MSARI database, the data model can be found in /opt/msari/msari_template_Dec15_2016.sql. For example, to create a fresh, empty MSARI database in order to import a new batch of raw AIS data files, proceed as follows:

createdb -U postgres -h 172.17.0.2 -p 5432 call14_new_db

psql -U postgres -h 172.17.0.2 -p 5432 call14_new_db < /opt/msari/msari_template_Dec15_2016.sql

The last item to restore is the non-MSARI table reports_with_more_attributes.sql.gz. Load the SQL file into the database with the following command:

gzip -d < reports_with_more_attr.sql.gz | psql -U postgres -h 172.17.0.2 -p 5432 call14_db

This last table is not an MSARI table per se, but it is much easier to use in the context of information-theoretic research as it contains all the important attributes in a single table. In section B.2.4, we describe the script that can be run on the MSARI database in order to generate this table.
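Since loading the backup takes several hours, it may be worth verifying the compressed files before starting. The following is a minimal sketch under stated assumptions: check_backup is a hypothetical helper that is not part of the deliverables, and it only tests the integrity of the gzip archive, not the SQL it contains.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 exists and is an intact gzip archive.
check_backup() {
    [ -f "$1" ] || { echo "missing file: $1" >&2; return 1; }
    # gzip -t tests the compressed data without writing any output.
    gzip -t "$1" 2>/dev/null || { echo "corrupt archive: $1" >&2; return 1; }
}

# Example use before the long-running load (file name as in the text above):
# check_backup call14.sql.gz &&
#     gzip -d < call14.sql.gz | psql -U postgres -h 172.17.0.2 -p 5432 call14_db
```

A file truncated during the copy from the USB key would be reported here rather than partway through the load.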

B.2 User guide

B.2.1 Creating a MSARI database from raw AIS compressed files

The database creation and loading process is as follows:

1. If not already done, create a new database and load the MSARI data model using the latest template as explained above.

2. Transfer the raw AIS data files (single-day files, gz-compressed or not). Use scp for fast and easy transfer over your network; this assumes that SSH is set up correctly. Copy the files to a local directory on your computer, then transfer them to the Docker container. This is done with the command:

docker cp EXACTEARTH_2014-04-01.log.gz postgresql:/opt/msari/MSARILoadingDir/AIS

Note that the raw AIS files must all be in a directory named AIS within the loading directory. Note also that this command uses the Docker container name postgresql and not the hostname pgserver. The files you transferred are owned by root, so you need to change their ownership to the postgres user. Connect to the container using docker exec -u 0 -it postgresql bash and run the command:

chown -R postgres:postgres /opt/msari/MSARILoadingDir/AIS

3. Stay in the Docker container but switch to the user postgres with the command su postgres, then go to the directory /opt/msari. Make sure that dbconfig.properties is set up for file loading:

DBName=call14_db
ServerIP=127.0.0.1
Port=5432
Username=postgres
Password=your_password
DebugOnOff=off
ArchivingOutputOnOff=off
ArchivingOutputDir=/opt/archiveMSARI
MSARIOnOffSwitch=on
InputMode=file
FileLoadingInputDir=/msari/MSARILoadingDir

4. Make sure that startMSARI, MSARI.jar, dbconfig.properties and msarisources.xml are in the same directory and that startMSARI has executable permission. If you change the location of the raw data files, adjust dbconfig.properties accordingly. Make sure that your Java environment variable and Java binary path are defined. In a command line shell, start the MSARI process on the main server:

./startMSARI

Do not close the shell, otherwise the MSARI process will be lost. The loading can take from a couple of hours to a day or two depending on the number of files you are loading. When the loading of the files is completed, the loading directory will be empty and the MSARI process will terminate by itself. By monitoring the content of the loading directory, you can follow the progress of the loading until the directory is empty. If a power failure or another interruption occurs, simply restart the MSARI process with the above command and it will continue where it was before the interruption.

5. You can follow the filling of the database from your computer by checking its size periodically with the following command:

psql -U postgres -h 172.17.0.2 -p 5432 call14_db \
    -c "SELECT pg_size_pretty(pg_database_size('call14_db'))"

6. When all the files are loaded, run the command for vacuuming:

vacuumdb -U postgres -h 172.17.0.2 -p 5432 -f -z call14_new_db

This process can take several hours (allow up to 30 minutes per day of data). Note that this needs to be executed only once.

When completed, MSARI is ready and optimized for query.
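The progress check described in step 4 (watching the loading directory until it is empty) can be scripted. The following is a minimal sketch, assuming that the raw files sit directly in the AIS subdirectory of the loading directory and that each file disappears from it once loaded. The function names remaining_files and progress are illustrative and are not part of MSARI.

```python
from pathlib import Path


def remaining_files(loading_dir):
    """Return the sorted names of regular files still waiting in loading_dir."""
    directory = Path(loading_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


def progress(total_files, loading_dir):
    """Fraction of the transferred files that are no longer in loading_dir."""
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    left = len(remaining_files(loading_dir))
    # Clamp in case extra files were added after the initial transfer.
    return 1.0 - min(left, total_files) / total_files
```

For instance, if 30 daily files were transferred, progress(30, "/opt/msari/MSARILoadingDir/AIS") run inside the container gives the fraction already consumed.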

B.2.2 Java script

The computation of the entropy and spatial diversity can be launched using the Java build tool ant. You can run the Java code within the Docker container, or on your own computer if you copy the code directory and modify build.xml with your own database server parameters. If you log into the Docker container:

• cd /opt/call14/code

• Edit the header of the build.xml file with your simulation parameters. See below for parameter descriptions.

• ant main

The MSARI database was designed to accept reports from very different sources, with attributes stored in separate tables according to their type. Consequently, to enable the computation of information-theoretic measures on most of the available attributes, an SQL script needs to be executed on the selected subset of data to transform it into a form more suitable for queries and to create various stored procedures in the database. If the data needs to be reorganized for the time period specified in the build file by the start and end dates, this SQL process is performed. It may take a long time and consume a lot of memory if the time period is too large.

Computation parameters can be modified directly in the build.xml file. The parameters that can be modified are:

• server: valid PostgreSQL database server in the form: jdbc:postgresql://your_ip_address:port_number/database_name

• username: valid user name for the PostgreSQL database

• password: password corresponding to the user name

• startDate: reports having a timestamp later than the startDate are used in the computation (example: 2014-04-01)

• endDate: reports having a timestamp earlier than the endDate are used in the computation (example: 2014-04-16)

• attribute: name of the attribute on which the computation is made, valid attributes are listed below

• binSize: size of the bins for numeric data. For example, if your data range from 0 to 360, setting a bin size of 10 will create 36 bins. The bin size can be lower than one (example: 0.5). If binSize is set to zero, no binning is performed; a bin size of zero should be used for textual values, such as the destination.

• gridSizeInDegrees: size of the grid cells which divide the world or your area of interest

• computeDiversity: a boolean value indicating whether the spatial diversity should be computed; it is optional because computing it takes a long time

• westLong: west longitude of the bounding box of interest (± decimal notation)

• eastLong: east longitude of the bounding box of interest (± decimal notation)

• northLat: north latitude of the bounding box of interest (± decimal notation)

• southLat: south latitude of the bounding box of interest (± decimal notation)

• outputTableName: name of the database table where the output is written (example: global_entropy_heatmap_1_degree)
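To make the roles of binSize and gridSizeInDegrees concrete, the following is a minimal sketch of how values could be binned, how a Shannon entropy could be computed from the binned values, and how a position could be mapped to a grid cell. It is not the Java implementation shipped with the deliverables. The floor-based binning, the base-2 logarithm, the grid origin at 180°W, 90°S and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def bin_value(value, bin_size):
    """Map a value to its bin index; a bin size of zero leaves it unbinned."""
    if bin_size == 0:
        return value  # textual values (e.g. destination) are used as-is
    return math.floor(value / bin_size)


def shannon_entropy(values, bin_size=0):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(bin_value(v, bin_size) for v in values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def grid_cell(longitude, latitude, grid_size_in_degrees):
    """Column and row of the cell containing a position.

    Assumed convention: cell (0, 0) has its corner at 180°W, 90°S.
    """
    col = math.floor((longitude + 180.0) / grid_size_in_degrees)
    row = math.floor((latitude + 90.0) / grid_size_in_degrees)
    return col, row
```

With this convention, headings in the range 0 to 360 (360 excluded) with a bin size of 10 fall into 36 bins, and reports would be grouped by grid_cell before the entropy of the chosen attribute is computed per cell.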

Available attributes for the computation are listed below:

• report_id

• mmsi

• source_name

• data_type

• entity_type

• report_timestamp

• latitude

• longitude

• altitude

• geom

• raw_data

• quality

• message_type

• callsign

• imo_number

• name

• ship_type

• ais_source_id

• dimension_to_bow

• dimension_to_port

• dimension_to_starboard

• dimension_to_stern

• eta_month

• eta_day

• eta_hour

• eta_minute

• draught

• destination_location

• navigational_status

• rate_of_turn

• speed

• course

• heading

B.2.3 SQL Scripts

The SQL scripts are located in the code/scripts deliverables directory. Three scripts are used by the Java application:

• get_cell_reports.sql

• world_grid_gen.sql

• reports_with_more_attributes.sql

These scripts are already loaded in the PostgreSQL database backup provided with the deliverables. In addition, the Java application automatically loads these three SQL scripts into the database defined in the configuration file.

The other SQL scripts were used at some point within this Call-up and were included in the deliverables as they can be reused for research purposes. Each has a short description at the top of the script. To load a script, you can copy and paste the content of the file into the PostgreSQL console or you can load the script remotely using:

psql -U postgres -h hostname -p port_number database_name < code/scripts/name_of_the_script.sql

Note that the scripts are designed to work only within the MSARI data model. Also, some of the scripts may depend on others, so you will need to load any dependencies as well.

B.2.4 Bash Script

There is a single Bash script in the code/scripts deliverables directory: reports_with_more_attributes.sh

Like its SQL counterpart (reports_with_more_attributes.sql), this script takes the MSARI tables and produces a single table in which each AIS report and a subset of the ship attributes are grouped in the same row. This format is easier to use when applying information-theoretic algorithms. The MSARI database was designed to accommodate heterogeneous kinds of maritime data sources, not to be convenient for information-theoretic analysis.

The SQL script does not work for long periods (longer than a day) because it takes too much RAM. The Bash script cuts the query into smaller time periods and accumulates the results in a single table.
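The chunking idea can be sketched as follows. This is an illustration of the principle only, not a transcription of reports_with_more_attributes.sh: the one-day default window, the half-open interval convention and the function name time_windows are assumptions.

```python
from datetime import datetime, timedelta


def time_windows(start, end, step=timedelta(days=1)):
    """Yield consecutive [window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    window_start = start
    while window_start < end:
        # The last window is truncated so that it never runs past `end`.
        window_end = min(window_start + step, end)
        yield window_start, window_end
        window_start = window_end
```

Each window would then drive one query restricted to that time range, with the rows appended to the same output table, so that memory use is bounded by the size of a single window rather than by the whole period.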

Before running the script, edit it to set the variable dbname and the other settings for your environment (such as the remote hostname and port number, if applicable). Also set the input parameters for your query, such as the start and end times and the minimum and maximum latitude and longitude of the region of interest. Finally, choose a name for the table that will be created in the database, making sure that it does not conflict with an existing table. You can avoid entering a password if you define a hidden .pgpass file (see footnote 4) in your home directory, which allows you to connect to your database without interaction.
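For reference, each line of a .pgpass file has the form hostname:port:database:username:password, and on Unix systems the file is ignored unless its permissions are restricted to the owner (0600). The following is a minimal sketch that builds such a line and writes the file; the helper names pgpass_line and write_pgpass are illustrative, and the connection values shown in the usage note are simply those used elsewhere in this appendix.

```python
import os
from pathlib import Path


def _escape(field):
    """Escape backslashes and colons, as the .pgpass format requires."""
    return str(field).replace("\\", "\\\\").replace(":", "\\:")


def pgpass_line(host, port, database, user, password):
    """Build one entry of the form hostname:port:database:username:password."""
    return ":".join(_escape(f) for f in (host, port, database, user, password))


def write_pgpass(path, lines):
    """Write the entries and restrict the file to its owner (mode 0600)."""
    target = Path(path)
    target.write_text("\n".join(lines) + "\n")
    os.chmod(target, 0o600)
    return target
```

A call such as write_pgpass(Path.home() / ".pgpass", [pgpass_line("172.17.0.2", 5432, "call14_db", "postgres", "your_password")]) would overwrite any existing file, so existing entries should be preserved by hand if there are any.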

Afterwards, to run the script, open any Linux command shell and enter the command:

/path_to_script_directory/reports_with_more_attributes.sh

When completed, the result is a new table in the database with the name defined in the script. The script may take a long time to execute, depending on the time interval chosen.

4 https://www.postgresql.org/docs/9.4/static/libpq-pgpass.html
