Visualization Techniques for Network Analysis and Link Analysis Algorithms

Visualization Techniques for Network Analysis and Link Analysis Algorithms Ying Zhao1, Ralucca Gera1, Quinn Halpin2 and Jesse Zhou3 1Naval Postgraduate School, Monterey, CA, U.S.A. 2Cornell University, Ithaca, NY, U.S.A. 3JZ Tech Consulting, San Francisco, CA, U.S.A. Keywords: Visualization, Data-Driven Documents (D3), Network Analysis, Lexical Link Analysis (LLA), Smart Data, Automatic Dependent Surveillance-Broadcast, ADS-B. Abstract: Military applications require big distributed, disparate, multi-sourced and real-time data that have extremely high rates, high volumes, and diverse types. Warfighters need deep models including big data analytics, network analysis, link analysis, deep learning, machine learning, and artificial intelligence to transform big data into smart data. Explainable deep models will play a more essential role for future warfighters to understand, interpret, and therefore appropriately trust, and effectively manage an emerging generation of artificially intelligent machine partners when facing complex threats. In this paper, we show how visualization is used in two typical deep models with two use cases: network analysis, which addresses how to display and present big data both in the exploratory and discovery process, and link analysis, which addresses how to display and present the smart data generated from these processes. By using various visualization tools such as D3, Tableau, and lexical link analysis, we derive useful information from discovering big networks to discovering big data patterns and anomalies. These visualizations become intepretable and explainable deep models that can be readily used by warfighters and decision makers to achieve the sense making and decision making superiority. 1 INTRODUCTION and decision makers: The first requirement is to show the process of discovering knowledge, exploring in- The US Department of Defense (DoD) faces chal- sight from big data and building actionable deep mod- lenges that demand more deep models to produce in- els. The second requirement is to comprehend the telligent, autonomous, and symbiotic systems to sup- resultant smart data and deep models. Visualization port situation awareness and decision making supe- provides one of the important components for the two riority. Military applications require big distributed, needs. There are two the research questions to address disparate, multi-sourced, and real-time data that have in this paper as follows: extremely high rates, high volumes and diverse types. 1. How to display and present big data, both in the Warfighters need deep models including big data an- exploratory and discovery process? alytics, network analysis, link analysis, deep learn- 2. How to display and present the smart data gener- ing, machine learning, and artificial intelligence to ated from these processes? transfer big data into smart data. Warfighters then can apply the insight and knowledge generated from We show how visualization is used in two typical big data for decision making and actions. Explainable deep models with two use cases: network analysis, deep models will play a more essential role for future which addresses the first research question and link warfighters to understand, interpret, and therefore ap- analysis, which addresses the second research ques- propriately trust, and effectively manage an emerg- tion. They both have the characteristics of discovering ing generation of artificially intelligent machine part- and exploring new and high-value information where ners when facing complex threats. Researchers need warfighters lack useful information. The process be- consider two requirements for understandable, inter- longs to the new frontiers of deep analytics with the pretable, and explainable deep models for warfighters potentials to handle so-called “unknown unknowns” 561 Zhao, Y., Gera, R., Halpin, Q. and Zhou, J. Visualization Techniques for Network Analysis and Link Analysis Algorithms. DOI: 10.5220/0008377805610568 In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019), pages 561-568 ISBN: 978-989-758-382-7 Copyright c 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved KDIR 2019 - 11th International Conference on Knowledge Discovery and Information Retrieval scenarios: We do not know if there are any unknowns that has been inferred, (c) greedily placing new sen- in a battlespace. sors on a highest degree neighboring node of the cur- Many commercially-available data manipulation rent monitor, (d) greedily placing new sensors on and visualization tools exist today. Typical spread- highest undiscovered degree node neighboring an in- sheet tools (e.g., scatter and line plots, bar and pie ferred node. The algorithms have a probability based charts, and bubble and radar charts) are widely used restarting feature which varies between restarting at to visualize statistical characteristics to support data- a previously discovered but unmonitored node and driven reasoning and decision making. Engineers use a random teleportation to an unexplored node some- MATLAB, Octave, and Python libraries to display nu- where in the network. The algorithms stop when there meric data and analysis results. Developers use Data- is an attempt to place a sensor on a node where a sen- driven Documents (D3) and Javascript to manipu- sor already exists. late document object models (DOM) within browsers We say that a monitor on node i detects an edge i j and generate dynamic and interactive visualizations. if i and i j are incident, and i detects the label i j of the Business users use Tableau to generate insights from edge (i.e. the monitor discovers the label of the other relational databases and data cubes. end node of i j) (Davis et al., 2016). This then implies In this paper, we focus on several types of visual- that a monitor on node i detects a node j if there is an ization and how they are useful for two deep models edge i j connecting them. We also allow the monitor such as network analysis and link analysis. Types of on i to discover the deg j (Davis et al., 2016). visualization considered in this paper include statis- The following screenshots overview the interac- tical, topical, network, temporal and geospatial. The tive site visualizing the progression in discovering a data types considered include unstructured, structured network. Figure 1 displays a network before the dis- and network data. covery process. 2 NETWORK ANALYSIS Visualizing data complements the analysis process and enables understanding of the “why” behind the “what” is observed (CED3, 2018). Inferring the structure of an unknown network is of interest to researchers in the government/military, academia and industry. Generally, the ground truth of a network is not known because it is extremely Figure 1: An example of a network to be discovered. large or because complete information about it is not available. Thus, researchers make decisions based on The first column identifies the search algorithm to the inferred network, whose information is still in- be used, the network to be analyzed, and the statistical complete, but which acts as the true network. The properties of the network as well as the network to be degree of incompleteness of information is not gen- discovered. erally known, since the true network is unidentified The rest of Figure 1 is divided into four quadrants. and there are no standard techniques to measure their The top left quadrant, shows the network to be discov- topological difference. This visualization project al- ered, whose nodes and edges turn green and blue, re- lows for the exploration of pre-loaded network exam- spectively, as the network is being lit/discovered. The ples, uploaded networks or the creation of new net- bottom left quadrant shows a temporal progression of works using the left navigation toolbar the network as it is discovered, both for nodes and Our visualization presents the output of novel edges, color-coded with blue and green to match the algorithms previously introduced to infer nodes first quadrant. Notice that the x-axis doesn’t go be- and edges in an unknown network (Davis et al., yond 60% of monitored nodes, since by that time ei- 2016)(Gera et al., 2017)(Wijegunawardana et al., ther the whole network is discovered, or there is no 2017)(Chen et al., 2017). Our algorithms utilize sen- particular strategy needed for the left over part of the sors that have the capability to detect neighboring network. The upper right quadrant, shows the left- nodes (and their labels) and the edges incident to the over network that has not yet been discovered; it starts sensor. The algorithms infer the networks through a with the original network, and the discovered part is combination of (a) random walks, (b) greedily plac- being taken away. The bottom right quadrant dis- ing new sensors on a highest degree neighboring node plays a network’s heat map, a node being colored dark 562 Visualization Techniques for Network Analysis and Link Analysis Algorithms green if 100% of the neighbors have been discovered, type displays no particular structure being random. and white if 0% of the neighbors have been discov- The second network is built from four cliques, a mod- ered, with intermediate percentages represented be- ular network identifying how the algorithms work dif- tween white and green. ferently if there is structure present. This visualization was created with the goal of supporting decision makers in planning the discovery 2.1 First Case Study: A Random of a network, by (1) allowing to visually see what por- Network tion of the network has been discovered much like a map would, (2) what percent of the network has been We first consider an ER random graph with 30 nodes identified, measured both for edges and nodes, and and 40 edges. Figure 2 displays a temporal snapshot (3) what portion of the network has better coverage of the first two quadrants on network discovery using through the heat map. The visualization is particu- a random walk as the inference algorithm to light up larly useful in testing the patterns and performance of the network.

Load more