International Technology Alliance in Network & Information Sciences

One Year Peer Review Report

July 2014

Prepared for: Peer Review Panel

Prepared by: Dinesh Verma & David Watson, IBM, 19 Skyline Drive, Hawthorne, NY 10549, USA & MP 137, IBM Hursley, Hursley Park, Winchester, Hants., SO21 2JN, UK

Distribution authorized to U.S. Government agencies and their contractors; test and evaluation (December 2009). Other requests for this document shall be referred to Director, U.S. Army Research Laboratory, ATTN: AMSRD-ARL-CI-IA, 2800 Powder Mill Road, Adelphi, MD 20783-1197


1. Introduction
1.1 What is the Peer Review Report (PRR)?
1.2 Overview of ITA
1.2.1 ITA Mission
1.2.2 Alliance Members
1.2.3 The Ways of Working
1.3 Document Organization
2. Technical Area 5: Coalition Interoperable Secure & Hybrid Networks
2.1 Overview
2.1.1 Accomplishments Highlights
2.1.2 Other Important information / facts related to TA 5
2.2 Project 1 – Hybrid Networks: Performance and Metrics
2.2.1 Introduction
2.2.2 Research Impact
2.2.3 Technical Accomplishments
2.2.4 References for Project 1
2.3 Project 2 – Security/Network Management and Control
2.3.1 Introduction
2.3.2 Research Impact
2.3.3 Technical Accomplishments
2.3.4 References for Project 2
2.4 Project 3 – Security for Distributed Services
2.4.1 Introduction
2.4.2 Research Impact
2.4.3 Technical Accomplishments
2.4.4 References for Project 3
3. Technical Area 6: Distributed Coalition Information Processing for Decision Making
3.1 Overview
3.1.1 Accomplishments Highlights
3.1.2 Other Important information/facts related to TA6
3.2 Project 4 – Human-Information Interaction
3.2.1 Introduction
3.2.2 Research Impact
3.2.3 Technical Accomplishments
3.2.4 References for Project 4
3.3 Project 5 – Distributed Coalition Services
3.3.1 Introduction
3.3.2 Research Impact
3.3.3 Technical Accomplishments
3.3.4 References for Project 5
3.4 Project 6 – Collective Sensemaking Under Uncertainty
3.4.1 Introduction
3.4.2 Research Impact
3.4.3 Technical Accomplishments
3.4.4 References for Project 6


4. Experimentation Framework
4.1 Overview
4.2 Experimentation as a Service – The Reference Framework for Experimentation
4.2.1 Operating System Platform Layer
4.2.2 Emulation and Orchestration Layer
4.2.3 Data Management Layer
4.2.4 Services Layer
4.2.5 Decision Layer
4.2.6 Automation, Monitoring & Visualization Tools
4.3 Datasets
4.3.1 3G/4G Cellular Network Measurement from UMASS
4.3.2 Multicast Mobile Node Movement Data from BBN
4.3.3 SYNCOIN Data
4.3.4 GeoLife Trajectories
4.3.5 Mobility Patterns of San Francisco Taxi Cabs
4.3.6 A Collection of Data Sets for Wireless/Mobile Data from CRAWDAD
4.4 Exploitation and Ongoing Efforts
Appendix A. ITA Program Metrics
Appendix B. Technology Transitions


1. Introduction

The International Technology Alliance is a ground-breaking collaborative effort across several universities, industrial research laboratories, and government research organizations in the United States and the United Kingdom. This document provides an evaluation report of the activities and achievements of the alliance during year eight of the program (May 2013 – May 2014). In this introductory section, we discuss the purpose of this evaluation report, give a brief overview of the alliance, and describe the organization of the document.

1.1 What is the Peer Review Report (PRR)?

This Peer Review Report (PRR) provides a summary of the activities and achievements of the alliance for years seven and eight of its existence. Under the rules for the governance of the alliance's activities, technical oversight of the alliance is provided by a Peer Review Board, which comprises technical experts in the field of network and information sciences. The Peer Review Report has been developed as input to the Peer Review Board, providing a concise summary of the technical activities of the alliance. The PRR provides an overview of the organization, accomplishments, and ongoing activities of the alliance.

1.2 Overview of ITA

This document describes the fourth Biannual Program Plan (BPP2013, or simply, BPP13) for the International Technology Alliance in Network and Information Sciences. The projects and tasks described in this BPP are to be undertaken between May 12, 2013 and May 11, 2014. The U.S./UK International Technology Alliance in Network and Information Sciences (NIS-ITA or, simply, ITA) was formed to tackle some of the most difficult fundamental science and technology challenges facing our armed forces. It is one of the first research consortia to be established under a 2002 Memorandum of Understanding between the U.S. and UK Governments to co-operate on defense research. ITA was awarded on May 11, 2006, on behalf of the U.S. Army Research Laboratory and the UK Ministry of Defence, to a consortium led by International Business Machines (IBM).

ITA is a fundamental research program that takes the best features of the UK Defence Technology Centres and the U.S. Army Collaborative Technology Alliances and applies them internationally in a ground-breaking collaborative effort. Each area of research within ITA is conducted by teams whose members are drawn from both countries and comprise world-class academics, leading industrial researchers, and key government scientists, all focused on enhancing our ability to conduct coalition operations and on addressing the operational challenges that arise due to differences in equipment, training, doctrine, culture, and trust amongst coalition members. Following a successful first five-year phase (May 12, 2006 – May 11, 2011), the ITA program started its second five-year phase with a research program outlined in the third Biannual Program Plan (BPP2011 or BPP11) and is now in the fourth Biannual Program Plan (BPP13). Technology transition from fundamental research to operational use in the field remains a particular focus, both in the contract arrangements and in program leadership.

Funding for the program is provided 50:50 by the U.S. and UK governments, and it is a condition of the program that the funding is spent 50:50 in the U.S. and UK, with a minimum of 40% of the funding going to academia. The program comprises two technical areas of research, each receiving roughly 50% of the total funding. Some projects and areas in the program may have more U.S. participants while others may have more UK participants, but the overall program is roughly balanced between the two nations. With twenty-five partner organizations split across academia and industry in two countries, the ITA program is a major undertaking in terms of collaborative research. From the outset the emphasis has been on collaboration, and no significant activity in the program is undertaken in only one country, or by one partner. The program has enabled multiple new collaborations between industry and academia in both countries, opening up new channels of discussion between, for example, U.S. universities and UK industry.

1.2.1 ITA Mission

The ITA program was conceived to deliver new insights into some of the complex and difficult issues facing coalition forces in the theatre of urban warfare, through the use of a wide range of skills and expertise. A central aim of the program is to show that more progress can be made than would be expected from in-country or single-supplier research. The program has three primary goals: (a) make advances in the fundamental science of coalition operations, (b) promote collaboration among U.S. and UK researchers, and (c) transition the fundamental science to meet the needs of U.S. and UK armed forces.

(Figure: ITA builds from fundamental science, through transitions to the US/UK, to US/UK collaboration.)

ITA is focused on facilitating and enhancing the formation of dynamic coalitions that bring together a number of different partners into a single operation or mission. This requires a degree of transience of existence and membership, distinct from persistent alliances with permanent infrastructure. Each coalition partner will bring different cultures and social perspectives, different policies and procedures, and different systems and networks, all supported by a variety of technologies. These must be brought into harmony to achieve a common goal. Creating revolutionary technologies that enable seamless operation between coalition partners is the primary mission of the alliance.

Research Agenda

The science agenda of the ITA is highly influenced by the ever-evolving operational realities that military forces need to adapt to while engaging in complex coalition operations. Emerging trends impacting coalition operations include:

• Counterinsurgency operations are complex, implying that:
o Increased emphasis is needed on (a) information and analysis at the lowest levels, (b) shortened decision-making time-scales, and (c) exploitation of a wider array of information sources due to problem complexity.
• Continued growth in the volume of data, especially informal information with limited structure, implying that:
o To support decision-making, disparate information must be transformed into relevant knowledge.
• Processing power and storage capacity are increasing faster than communications bandwidth, implying that:
o To efficiently use bandwidth resources, data and services must be smart and smartly positioned within networks.
• Increased use of commercial cellular networks, implying that:
o Hybrid networks that exploit and interoperate with commercial wireless networks are key.
• Enhancing coalition decision making depends on secure communications and information networks, implying that:
o The end-to-end problem of data-to-decision (in coalition environments) must be addressed.

In response to these emerging trends, ITA researchers are focused on making technical advances through cross-disciplinary research in the following two technical areas (TAs)1:

• TA5: Coalition Interoperable Secure and Hybrid Networks (CISHN), with the objective to investigate and develop the fundamental underpinnings for secure hybrid wireless networking that enables adaptable and interoperable communication and information services for military coalition operations. Key challenges include efficient and adaptive secure networking that adapts rapidly to dynamic missions and ad hoc team formation, operates without reliance on centralized network or security services, and supports the composability of disparate security and networking systems. An underlying challenge is to treat networking and security jointly. The research in TA5 is grouped into the following three projects, details of which are given later in this document:
1. P1: Hybrid Networks: Performance and Metrics
2. P2: Security/Network Management and Control
3. P3: Security for Distributed Services
• TA6: Distributed Coalition Information Processing for Decision Making (DCIPDM), with the objective to investigate and develop the fundamental underpinnings for exploiting and managing an agile network of data and information sources for effective understanding and decision making across the coalition for dynamic complex problems. The overall challenge is to develop the fundamental science to underpin a two-way end-to-end socio-technical chain reaching from "data-to-decision" (and from "decision-to-data"), resolving "complex problems" while operating in a coalition environment. The research in TA6 is grouped into the following three projects, details of which are given later in this document:
4. P4: Human-Information Interaction
5. P5: Distributed Coalition Services
6. P6: Collective Sensemaking Under Uncertainty

As the aforementioned emerging trends note, it is becoming increasingly evident that to support coalition decision making, the communication network must be closely linked with the information infrastructure, and security mechanisms must support both seamlessly. Thus the two TAs, and the projects within them, are interlinked and should be treated as a whole, jointly addressing challenges that support cross-cutting research themes in the areas of:

• Content and Context Aware Networking; and
• Decision-Making to Support Mission Command.

1 The TA numbering (TA5, TA6) follows on from the four TAs of the first phase of the ITA.


Despite the synergistic nature of the technical areas and the projects within them, the division into TAs and projects provides a convenient mechanism for administrative purposes. Each project team is led by a project champion, i.e., a researcher who has primary responsibility for coordinating the cross-Atlantic, multi-organization team that conducts the research in the project. The technical achievements and collaboration in each of the projects are described in detail in subsequent chapters of this document. The technical work in each area is led by a team of four technical area leaders: one from academia to promote focus on the fundamental science, one from industry to promote focus on science that can be transitioned, one from U.S. Army research to keep the focus on the needs of the U.S. Army, and one from the UK Ministry of Defence to keep the focus on the needs of the UK military. Collaborative Alliance Managers (one from the U.S. and one from the UK) and the consortium managers (one from the U.S. and one from the UK) provide overall coordination among all technical areas.

All project champions (a.k.a. PCs), technical area leaders (a.k.a. TALs) and consortium managers are active researchers in the alliance who perform administrative tasks as an adjunct part of their research activities within the program.

1.2.2 Alliance Members

Success in a vision like ITA requires three key competencies: (a) adventurous thinking that results in blue-sky, high-risk, high-reward research activities; (b) military knowledge and experience in delivering military systems to the U.S. and UK; and (c) the ability to coordinate and bridge the gap between fundamental research and military transition. The research philosophy for ITA envisions adventurous thinking emanating primarily (but not exclusively) from the universities and military expertise coming from the two governments and industrial members, with IBM, as the research facilitator, providing the bridge between the two by leveraging its large research organization and commercial transition expertise. In addition to UK MoD and U.S. ARL researchers, the alliance includes foremost academic researchers from eight U.S. and eight UK top-tier universities. Our industrial alliance members fall into two categories: large system integrators with extensive defense and commercial presence (The Boeing Company, Honeywell, and CGI), and small/medium enterprises (SMEs) with renowned expertise in the specialized fields that the ITA program is focused upon. The alliance has created a joint U.S./UK team that has successfully met the needs of ITA.

IBM's Watson Research Center at Yorktown Heights in New York is one of the largest IT research facilities in the world. IBM's Hursley laboratory is the largest IT research and development (R&D) facility in the UK. Indeed, IBM is one of the largest employers of information sciences professionals (researchers, engineers, and developers) in both the U.S. and the UK. Few other companies can claim such a commanding in-house research capability in both countries. IBM's strength in information sciences is augmented by its willingness and ability to work collaboratively with experts in other companies and technical fields to bring innovative solutions to the marketplace. IBM has research collaboration agreements with several universities and industrial consortium members, and is involved in a large number of concurrent research projects with several other consortium members. We have leveraged our existing relationships to build a strong consortium for the ITA program. IBM has a reputation for innovation in technology and services, and has succeeded in maintaining a very good rate of transition of fundamental research into commercial products and services. To reach this position, IBM has developed open, collaborative, and effective ways of managing and transitioning research.


IBM is also one of the world's leading business consultancies (after acquiring PricewaterhouseCoopers Consulting in 2002). We will draw on that capability to assist ARL and MoD with the technology transition process.

1.2.2.1 Academic Members of the Alliance

The alliance includes some of the most prominent university professors, widely recognized as titans in their fields of research. Our partner universities include the University of Massachusetts, Amherst; University of California, Los Angeles; University of Cambridge; Cardiff University; Imperial College of Science and Technology; Columbia University; University of York; Royal Holloway College; Pennsylvania State University; University of Southampton; University of Aberdeen; Rensselaer Polytechnic Institute; Carnegie Mellon University; University of Maryland; and the Royal Military College of Science at Cranfield University.

HBCU/MI Requirements: City University of New York (CUNY) is our HBCU consortium member. CUNY has one of the pre-eminent graduate research programs in the U.S. among the historically black colleges and universities (HBCUs), and the expertise and technical excellence of CUNY professors is strong in related technical areas.

1.2.2.2 Industrial Members of the Alliance

IBM has teamed with two leading U.S. defense system integrators to bring additional military domain experience to the alliance. The Boeing Company has a rich heritage of nurturing fundamental research in network and information technology, leading to the development of technologies that are then integrated into more mature capabilities for its client base. Honeywell is one of the world's largest aerospace research and technology development organizations, whose business spans from sensors to systems. The alliance's capability is further strengthened by several small/medium enterprises with subject matter experts in technical areas related to the ITA program. ARA Klein is a leader in the field of decision-making and coalition planning. BBN Technologies is an expert in creating military networks, widely recognized for its contributions to the creation of the Internet. Roke Manor Research has a long-standing track record of defense communications research in the UK. LogicaCMG, now CGI, brings expertise about the needs of the UK armed forces to the consortium. Systems Engineering & Assessment Ltd. (SEA) is a defense contractor with extensive experience in military security issues, and possesses a renowned niche specialty in the area of sensor development and associated technology. EADS, now Airbus, joined the consortium in Autumn 2013 as part of a successful research proposal in the BPP13 plan. Together, IBM and its industrial partners have the combination of in-house research, military expertise, and pull-through capability required by ITA in both the U.S. and the UK.

1.2.2.3 Government Members of the Alliance

Researchers from the United Kingdom Ministry of Defence and the United States Army Research Laboratory are participating in the research and transition programs undertaken during the lifetime of ITA. In addition to providing technical and management oversight of the program, government researchers also participate in the various projects described in subsequent sections. The list of all members of the alliance is provided in the figure below. The alliance members include some of the most prominent researchers in their respective technical areas in each of the countries.


1.2.3 The Ways of Working

The alliance members firmly believe that the fundamental research under ITA must challenge traditional assumptions, break down current barriers, depart from compartmentalized thinking, and pursue revolutionary solutions. In order to develop technology that cuts across traditional vertical boundaries and delivers the needed fundamental insights, the alliance has developed a horizontal research delivery model. Six cross-cutting research projects are defined that deliberately span the two technical areas, with each area contributing to and/or exploiting, directly or indirectly, the research in these projects. Each project team comprises leading academic, industrial, and government researchers from both the U.S. and UK, working seamlessly together to develop a new model of international collaboration. Moreover, each project team includes faculty members and industry researchers who have expertise in more than one area. Each project consists of several activities that focus on specific research problems. The activities conducted within these projects may change over the course of the program; projects and activities may be terminated and new ones defined. But the basic principle of pursuing cross-area, cross-organizational research projects will remain unchanged.

A strong emphasis within the alliance has been on promoting collaboration among researchers across the U.S. and the UK, across industrial, academic, and government organizations, and across the technical areas. The alliance has promoted such collaboration using a variety of techniques. IBM has been acting as the glue that brings several universities together, by hosting students in its summer internship programs, promoting staff exchanges, and facilitating collaborative research. There have been various meetings and visits among the researchers in the alliance, resulting in the development of friendships and partnerships across the different participating members. A number of workshops and conferences have been hosted jointly, and various papers and technical activities have been undertaken that include researchers across technical areas and across institutions. Web-based collaboration tools and a public web-site act as the conduit for sharing information, and a significant increase in collaboration was promoted by virtual and face-to-face interactions among the members of the program. As a result, a significant number of collaborative papers, co-organized workshops, and shared research programs have been produced directly or indirectly by the program over the past years, and the alliance plans to continue this in the current BPP.

1.3 Document Organization

The remainder of this document provides a summary of the technical achievements of each technical area, with a section devoted to each. The first part of each section enumerates the key technical accomplishments and significant collaborations in the corresponding technical area during the past year of the program. The details of each of the projects within the technical area then follow. Each project description begins with an introductory section that describes the objectives of the research in that project and the set of individual tasks within it. This is followed by sections that enumerate and briefly describe the technical accomplishments in each of the tasks. Detailed descriptions of these accomplishments can be found in the references and publications enumerated in these summaries.


2. Technical Area 5: Coalition Interoperable Secure & Hybrid Networks

2.1 Overview

The main objective of our research is to further the understanding of the performance, interoperability, and security of hybrid coalition networks. We expect such advances to inform the design, management, and control of dynamic coalition operations with the goal of jointly meeting both networking and security objectives. The use of hybrid networks for coalition military operations introduces both opportunities and challenges. On one hand, hybrid networks (such as those that interoperate with commercial cellular networks) may reduce both infrastructure and operational cost and improve network availability and bandwidth. On the other hand, hybrid networks (such as those with disparate trust and security levels) introduce challenges in security/network management and control. Further, the opportunities and challenges introduced by hybrid coalition networks necessitate novel (re)design of distributed services to best exploit multiple communication modalities while mitigating various security threats.

Project P1 – Hybrid Networks: Performance and Metrics: To date, most work on the design and control of military tactical ad-hoc networks assumes the absence of any infrastructure. However, these networks often operate in the presence of extensive commercial 3G/4G cellular networks and 802.11 access connected to the public Internet or private intranets. This available infrastructure offers the opportunity to build "hybrid networks", which can offer greater robustness. This project focuses on the design, monitoring, and control of such networks. It is divided into three tasks. The first focuses on the efficient monitoring and measurement of such a network along with its optimization. The second focuses on the control of such networks. Finally, whereas the first two tasks focus primarily on unicast communications, the third task focuses on multicast in such networks.

Project P2 – Security/Network Management and Control: Supporting agile management of computational and communication resources, and of the applications executing on these resources, with minimum human involvement is essential for maintaining hybrid coalition networks. Moreover, such management must also account for security concerns driven by military coalition requirements. While most state-of-the-art approaches handle resource and security management independently, we believe that supporting an integrated and coherent approach to network-wide analysis of network and security configuration is vital for enhancing the efficacy of coalition operations. This project focuses on three aspects of this problem. The first two are the development and application of cloud-based and data-streaming techniques to resource and security management, and the third is the development and application of the declarative networking paradigm to account for security in coalition network environments.

Project P3 – Security for Distributed Services: One of the key challenges in coalition operations is to foster collaboration and to support flexible information exchange amongst coalition members while protecting sensitive information from misuse. We envision that in such environments, distributed services that span multiple coalition members will collaboratively gather information from multiple sources and filter, fuse, and query it with the goal of meeting various mission objectives.
Further complicating matters, there may be wide variability in computational power, available power resources, and bandwidth between different members of the coalition. The main focus of this project is on protecting the storage, retrieval, processing, and fusion of sensitive information from multiple coalition domains. In particular, this project seeks to develop solutions to quantify and analyze cross-domain information flows and to develop fundamental cryptographic primitives (such as verifiable outsourcing using Quadratic Span Programs) for hybrid coalition networks.

2.1.1 Accomplishments Highlights

Key highlights and contributions for each project in TA5 include:

Project 1 – Hybrid Networks: Performance and Metrics

• Derivation of fundamental conditions for additive link metric tomography, along with the development of efficient algorithms for monitor placement, path selection, and link metric calculation.
• Robust measurement design for network tomography in the presence of failures (joint work with P5.1).
• Derivation of fundamental conditions for node failure localization via Boolean network tomography.
• Characterization of the performance of multipath data transport in a hybrid cellular/WiFi environment as a function of transfer size, number of paths, and rate/route controller.
• Development of path and energy management algorithms to provide robust, high-performance, low-energy data transfers in a multipath environment.
• Development of algorithms for optimal multicasting in hybrid wireless networks where nodes have point-to-point capabilities, broadcast capabilities, or both; the algorithms trade off approximability and time complexity.
• Development of upper bounds on multicast capacity, along with algorithms that can achieve a constant factor of this capacity.

Project 2 – Security/Network Management and Control

• Development of a poly-log competitive online approximation algorithm for embedding an application graph onto a physical micro-cloud network.
• Development and evaluation of Villagecache, a system that leverages locality of interest and temporally shifts downloads to serve communities with poor network connectivity. Preliminary results from a deployment in sub-Saharan Africa indicate significant improvements in availability and response latencies.
• Development of a distributed signature-based intrusion detection algorithm for multipath routing attacks, with proof of correctness. Empirical performance results show that detection delay depends linearly on network throughput.
• Development of a new policy-information-leakage-aware query planning algorithm with proof of query plan safety. The algorithm ensures that no generated query plan results in unacceptable authorization information leakage.
• Development of network-aware query planning algorithms that adapt to network variability in mobile ad hoc networks (MANETs).
• Development of optimal and near-optimal joint routing and caching schemes for wireless data streaming environments.
• Development of algorithms to authenticate the freshness of high-volume data streams outsourced to untrusted parties (e.g., coalition partners).

Project 3 – Security for Distributed Services

• Acceleration of Fully Homomorphic Encryption (FHE) using Field Programmable Gate Arrays (FPGAs), which for the first time allows the use of FHE in practice.
• Development of new models and metrics for quantifying the information release of dynamic secrets, which are secrets (like one's location, passwords, and encryption keys) that change over time. The model was used to prove some surprising results, e.g., that security can diminish if change is too frequent.
• Development of a new, general-purpose programming language called Wysteria, with which one can write generic, mixed-mode secure multiparty computations (SMCs), resulting in about a 30x speedup. Wysteria's implementation has been made available to the academic community.
• An accelerated implementation of verifiable keyword search on encrypted data whose performance is only 10x slower than the UNIX grep (global regular expression print) utility.
• Generalized authenticated MACs for arithmetic circuits, which allow a public verifier to validate computations on authenticated data without knowledge of the secret key, and allow the secret-key owner to verify the validity of the computations without needing to know the original (authenticated) inputs.
• Development of a new technique for SMC that uses the RAM model, rather than the circuit model, while preserving the same security guarantees. For sub-linear and multiple queries over a static dataset, this approach yields orders-of-magnitude speedups.

2.1.2 Other Important information / facts related to TA 5

TA5 research is actively engaged in two ongoing transition projects. The first project, on "Policy Controlled Database Federation", developed a database proxy based on the Gaian Database and the Policy Management Toolkit. The proxy will allow fine-grained access to data in multiple distributed databases. Its first instantiation is for the U.S. Army Research Laboratory (ARL) multimodal signature database (MMSDB). The second transition project, on "Enabling Unattended Asset Interoperability Using Controlled English", will involve several ITA technologies, including the Controlled English (CE) Store from TA6, the Virtual Information Exchange (VIE) from TA5, the Policy Management Toolkit (PMT) from TA5 (formerly TA2), and the Distributed State Machine (DSM) from TA5. The scenario being highlighted to show the capabilities of these technologies involves the sharing of information across different domains of authority, and involves authentication and access control capabilities.

Research results from Project 1 on network tomography have been compiled into a toolkit, which has been made available to all members of the consortium. In addition, parts of this toolkit have been transferred to ARL for its internal use. Also, the algorithms developed under the multipath data transport task have all been deployed on the Android platform, and several research groups outside of ITA (NSF FIA; UCL, Belgium) are currently using portions of this deployed code. Elements from demonstrations developed by this task, including the convoy scenario, have been transitioned into the ITA Experimentation Framework.

Research output from Project 2 on the micro-cloud is being prototyped (https://www.usukitacs.com/node/2708) for CERDEC as the infrastructure to support dynamic deployment of capabilities to (and from) Command and Control and edge posts such as forward operating bases, sensor systems, and platoons. A capability demonstration is currently planned and will be an initial step toward transition. Further, the output from the declarative networking task on secure and collaborative query processing is being transitioned in stages to the GaianDB software product. Algorithms from the stream processing task on adaptive query planning have been made available as extensions to SEEP, an open-source Java-based stream processing engine.

Research results from the security project have been released as a library, which will be made available to all members of the consortium. This includes the FHE (Fully Homomorphic Encryption) library, an FHE acceleration library using FPGAs (Field Programmable Gate Arrays), a verifiable outsourced keyword search library (also deployed as an experimental service on the IBM-US public cloud infrastructure), and a toolkit of algorithms for tracking information flow release as extensions to probabilistic programming languages.

Finally, TA5 research has contributed significantly to the development of the joint ITA/CTA Experimentation-as-a-Service platform (most of which is scheduled to be available by the end of 2014), including the Simple Policy Service, Micro-Cloud as a Service, federated database authentication as a service, and a semantic enrichment service.


2.2 Project 1 – Hybrid Networks: Performance and Metrics

2.2.1 Introduction

Project 1 seeks to better understand the design, monitoring, and control of hybrid networks. The project comprises the following three tasks.

Task 1 – Tomography of Hybrid Coalition Networks pertains to obtaining accurate and timely knowledge of the state of the network. Network state is essential for the effective utilization of network resources in operations such as capacity planning and traffic engineering; in cases of anomalous network behaviours, it is also crucial for diagnosing the sources of the anomaly. Traditional solutions obtain such information using monitoring agents distributed throughout the network to directly measure the states of network elements (nodes/links). Such solutions do not satisfy the needs of hybrid coalition networks, where part of the network belongs to coalition partners or third parties whose native monitoring systems either do not satisfy the required level of trust or are not integrated with the main monitoring system. This challenge calls for network tomographic approaches that can distill the needed information from measurements collected from scattered external nodes employed as monitors (a minimal sketch of the underlying inference model is given at the end of this introduction). The research issues investigated during the year were: (i) fundamental conditions for consistent network state recovery (a.k.a. network identifiability), and efficient algorithms for supporting such recovery via optimized placement of monitors and construction of measurement paths; (ii) practical methods for determining the partial network state that is consistently recoverable in cases where complete recovery is not feasible, and an efficient monitor placement algorithm to optimize the recoverable partial state under monitor resource constraints; (iii) robust measurement design in the presence of unreliable links to optimize the recovery of the remaining link states; and (iv) fundamental bounds on the capability of localizing simultaneous failures from observed failures of measurement paths.

Task 2 – Multipath Control in Hybrid Coalition Networks is developing mechanisms that can take advantage of the path diversity often present in military hybrid environments so as to provide robust, high-performance communications to the end soldier. This diversity can be a consequence of the presence of coalition networks that overlap spatially and/or use different technologies (e.g., WiFi and 3G/4G cellular). The research issues investigated since the BPP began include: (i) quantifying and understanding the benefits of multipath data transport in realistic environments; (ii) understanding how path outages occur and developing mechanisms to manage such outages in a graceful manner; (iii) understanding the energy-performance trade-offs that arise when using diverse technologies for creating multipath transfers, and developing mechanisms that leverage these trade-offs to provide good performance at low energy consumption; and (iv) the development of mechanisms to harness path diversity even when end hosts are not multi-homed.

Task 3 – Robust Coalition Multicast in Dynamic Wireless Networks is studying both theoretical and practical aspects of multicasting in hybrid wireless networks, which have not received much attention in the past. The goal of this task is to develop foundational theory and algorithms that will eventually lead to practical multicast protocols in hybrid wireless networks.
The research issues investigated this year included: (i) optimization algorithms for hybrid wireless networks where nodes may have point-to-point wireless links, broadcast wireless links, or both; (ii) development of bounds and optimization algorithms for scheduling multiple multicast sessions in multi-channel/multi-radio (MC/MR) networks; (iii) a preliminary study of the space of trade-offs for robust multicast in dynamic wireless networks; and (iv) a preliminary study of the k-anycast problem in a declarative network setting.
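To make the tomographic inference problem in Task 1 concrete, the following is a minimal sketch of the standard additive link-metric model that this line of work builds on; the notation here is chosen for illustration and is not taken from the cited papers.

```latex
% Additive link-metric tomography (illustrative notation).
% x_j : unknown metric of link j (e.g., delay); y_i : measured metric of path p_i.
\[
  y_i = \sum_{j \in p_i} x_j
  \qquad\Longleftrightarrow\qquad
  \mathbf{y} = R\,\mathbf{x},
  \qquad
  R_{ij} =
  \begin{cases}
    1 & \text{if path } p_i \text{ traverses link } j,\\
    0 & \text{otherwise.}
  \end{cases}
\]
% All n link metrics are uniquely recoverable (the network is "identifiable")
% iff rank(R) = n; otherwise only those links whose unit vectors lie in the
% row space of R are identifiable, which is what optimized monitor placement
% and measurement path construction aim to maximize.
```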


2.2.2 Research Impact

2.2.2.1 Task 1 – Tomography of Hybrid Coalition Networks

Technical merit: The key technical contributions for the past year are: (i) we developed fundamental conditions that characterize the identifiability of network state in terms of observable network parameters such as network topology and monitor placement, based on which we developed efficient algorithms for optimized monitor placement and measurement path construction that significantly improve the state of the art [1]2 [2]3 (Best Paper at ICDCS’13 and Best Paper Nominee at IMC’13); (ii) we provided an efficient algorithm to determine the identifiable partial network state in cases where the entire state is not identifiable, based on which we developed a polynomial-time monitor placement algorithm that is provably optimal for networks with mild resilience (no single point of failure) [3]4; (iii) we proposed a general framework for robust measurement design in the presence of stochastic link failures, so as to maximize the information on network state extracted from the available measurements [4]5; and (iv) we established fundamental bounds on the capability of localizing failed nodes from observed path failures, using which we formally quantified the impact of probing mechanisms on network failure localization.

Synergistic value of collaboration: Research activities in this project were performed jointly between US industry (IBM), US academia (UMass), and UK academia (Imperial College). Part of the activities (item (iv) above) was the result of collaboration with P5.1 (Managing Distributed Service Deployments in Hybrid Wireless Networks). All the activities benefited significantly from detailed discussions, including inputs from the perspective of defense applications, with collaborators at ARL (Ananthram Swami) and DSTL (Jessica Lowe).

Scientific challenge: The main challenge in this task comes from the lack of understanding of how the solvability of network tomography problems relates to observable/controllable network parameters. Although many of the above problems admit a brute-force solution (e.g., enumeration of all possible monitor placements, measurement paths, or failure events), such solutions scale very poorly. Hence, our research aims at providing solutions with low complexity.

Military relevance: The research in this task addresses the need of hybrid coalition networks to support services vital to coalition forces over a substrate consisting of possibly not-completely-trusted, not-fully-instrumented elements. By providing the services with accurate and timely knowledge of the state of the network, we address several military needs identified in the new BPP: (i) agile and automated security/network management (by providing awareness of network conditions); (ii) new tomographic monitoring methodologies suitable for composite networks; and (iii) operation in untrusted or even adversarial environments (by not relying on in-network monitoring agents).

Exploitation & technology transition potential: As a result of this work, we have developed a network tomography toolkit with functions that support the entire flow of network tomography, including monitor placement, selection/construction of measurement paths, determination of identifiable network elements, and inference of network state. We have shared this toolkit with other consortium members through the ITA Experimentation Framework. Our toolkit can be transitioned as a stand-alone monitoring facility that provides in-depth understanding of network performance/health in coalition environments. Moreover, it can be integrated with higher-layer applications (e.g., distributed services) to enable network-aware, self-optimizing/self-healing services for military users. Furthermore, software implementing some of the tomography algorithms developed in this work has been transferred to ARL for its use.

2 Liang Ma, Ting He, Kin K. Leung, Ananthram Swami, and Don Towsley, “Identifiability of Link Metrics based on End-to-end Path Measurements,” ACM IMC (Best Paper Nominee), October 2013. https://www.usukitacs.com/node/2483
3 Liang Ma, Ting He, Kin K. Leung, Don Towsley, and Ananthram Swami, “Efficient Identification of Additive Link Metrics via Network Tomography,” IEEE ICDCS (Best Paper Award), July 2013. https://www.usukitacs.com/node/2593
4 Liang Ma, Ting He, Kin K. Leung, Ananthram Swami, and Don Towsley, “Monitor Placement for Maximal Identifiability in Network Tomography,” IEEE INFOCOM, April 2014. https://www.usukitacs.com/node/2484
5 Srikar Tati, Simone Silvestri, Ting He, and Tom La Porta, “Robust Network Tomography in the Presence of Failures,” IEEE ICDCS, June 2014. https://www.usukitacs.com/node/2735

2.2.2.2 Task 2 – Multipath Control in Hybrid Coalition Networks

Technical merit: The key technical contributions for the past year include: (i) a thorough characterization of the benefit of multipath data transport in a hybrid environment, which accounts for content download/upload length, control strategy, and number of paths [6]6; (ii) the development of fluid models of data transport in a mobile environment7; (iii) the development of novel mechanisms for handling path outages, specifically those traversing WiFi [7]8; (iv) a characterization of energy consumption profiles for devices executing data downloads/uploads over multiple paths, along with the development of mechanisms to select the combination of paths used to perform downloads so as to provide high throughput at low energy consumption [8]9; (v) an end-to-end delay analysis of multipath data transport and a study of how “buffer bloat” affects the performance of multipath data transport10; (vi) modelling of the benefits that arise from using multipath algorithms in the presence of traffic splitting and path diversity11; and (vii) the prototyping and demonstration of the potential benefits of multipath multisource data transport12.

Synergistic value of collaboration: All of the research activities have involved the University of Cambridge, IBM USA, UMass-Amherst, and Roke Manor, either as equal collaborators or as sounding boards for some of the ideas. It cannot be stressed strongly enough that the research output would have been neither as deep nor as broad in the absence of such collaboration. For example, the IMC paper received high ratings because of the care put into the experimental design and data analysis. IBM and UMass-Amherst led the experimental design effort with support from Cambridge, while Cambridge led the analysis effort with support from IBM and UMass-Amherst. Similar statements apply to several other completed studies.

6 Y.-C. Chen, Y.-S. Lim, R.J. Gibbens, E.M. Nahum, R. Khalili, and D. Towsley, “A measurement-based study of Multipath TCP performance in wireless networks,” ACM IMC 2013, Nov. 2013. https://www.usukitacs.com/node/2596
7 A. Bejan, R.J. Gibbens, Y.-S. Lim, and D. Towsley, “A performance analysis study of multipath routing in a hybrid network with mobile users,” Proc. of ITC 2013, Sept. 2013.
8 Y.-S. Lim, Y.-C. Chen, E. Nahum, D. Towsley, and K.-W. Lee, “Cross-layer path management in multi-path transport protocol for mobile devices,” INFOCOM 2014. https://www.usukitacs.com/node/2642
9 Y.-S. Lim, E. Nahum, D. Towsley, and R. Gibbens, “How Green is Multipath TCP for Mobile Devices?” to appear at 4th Workshop on All Things Cellular: Operations, Applications and Challenges, 2014. https://www.usukitacs.com/node/2719
10 Y.-C. Chen and D. Towsley, “On Bufferbloat and Delay Analysis of MultiPath TCP in Wireless Networks,” IFIP Networking 2014.
11 A. Bejan, R. Gibbens, R. Hancock, H. Tripp, A. Freeman, and D. Towsley, “On Enhancing Multipath Data Transport via Traffic Splitting,” AFM 2014. https://www.usukita.org/node/2738
12 J. Kim, Y.-C. Chen, R. Khalili, D. Towsley, and A. Feldmann, “Multi-source Multipath HTTP: A Proposal,” ACM SIGMETRICS 2014.


Scientific challenge: The main challenge of this task is to handle the uncertainties and dynamics that arise in military settings when using multipath route/rate control in a hybrid network setting. Path disruptions occur randomly and without warning; thus, resource allocation methods must be developed that handle such disruptions robustly and seamlessly. Moreover, these methods must be cognizant of the energy limitations of untethered mobile devices and must be able to take advantage of path diversity even when end hosts are not multi-homed.

Military relevance: Future tactical operations will frequently have multiple different communications technologies available. These may be available civilian infrastructures such as 3G/4G or WiFi, networks administered by different coalition partners, or future tactical networks based on layered communications systems, for example using specialized military radios in conjunction with militarized COTS technology such as tactical cellular networks. This network diversity provides an opportunity to deliver improved robustness and throughput to units by allowing them to form end-to-end connections over multiple paths through these different networks. Our task is focused on developing the mechanisms needed to harness this diversity to provide quality of service and fair sharing, in an energy-efficient manner, with minimal management overhead.

Exploitation & technology transition potential: The multipath approach can be deployed incrementally, and can achieve significant benefits even where routers and other network infrastructure are entirely unmodified. (One of the outputs of this work will be a quantified understanding of how much can be gained by coordinated enhancements to end-systems and infrastructure.) This makes the technology highly attractive for exploitation in future networks, where these are based on evolving and extending rather than replacing the current capability. We have already seen how experimental demonstrations can greatly help us describe our approach to enhancing the performance and robustness of networks through the use of multipath flow control and routing. All of our mechanisms have been deployed in Android-based handsets and are readily available for trials in the field. Earlier contacts with CERDEC suggest that this would fit with some of their thinking for the deployment of future coalition networks and would create opportunities for further refinement and testing of our basic findings in increasingly challenging and realistic situations.

2.2.2.3 Task 3 – Robust Coalition Multicast in Dynamic Wireless Networks

Technical merit: The key technical contributions for the past year include: (i) algorithms with guaranteed approximation factors for multicasting in hybrid wireless networks with both point-to-point and broadcast wireless transmission technologies [9]13; (ii) new bounds and optimization algorithms for scheduling multiple multicast sessions in Multi-Channel/Multi-Radio (MC/MR) wireless networks; (iii) a study of robust multicast, in particular an investigation of the fundamental trade-off between the number of nodes supporting multicast and the gains in connectivity in terms of the number of “node-disjoint paths” among nodes in the multicast group [10]14, together with preliminary investigations into more practically achievable measures of path redundancy that are prevalent in mobile and dynamic networks; (iv) a fundamental (analytical) characterization of broadcast percolation thresholds in a large family of multi-layered hybrid networks that can model coalition network scenarios [11]15, a stepping stone for studying multicast percolation; and (v) preliminary research into the development of distributed heuristics for the k-anycast problem, based on a bound-and-prune extension of Prim’s Minimum Spanning Tree algorithm within the declarative networking paradigm.

Synergistic value of collaboration: Research activities in this project were performed jointly between BBN, Cambridge University, IBM, and Roke Manor. All the research activities were inherently the result of multi-institution collaborations, including the problem formulation phase. The ongoing research on declarative anycast is due to a collaboration between a subset of researchers from this task and those from P2.2 (declarative multicast). The research has benefited significantly from visits by researchers from Cambridge and IBM to BBN, face-to-face technical interchange meetings at Maryland and Imperial College, and several military-relevant inputs from ARL (Ananthram Swami) and DSTL (Stuart Farquhar).

Scientific challenge: Coalition wireless network systems are challenging because they are heterogeneous, spanning a diverse choice of wireless technologies and differing policies across coalition partners, and because of their dynamic and resource-constrained nature. Even simple problems involving multicast are computationally hard to solve (NP-complete), and the heterogeneous and dynamic network multicast problems are no easier. The technical challenge is therefore to develop practical algorithms that not only yield approximation guarantees, but also have reasonable time complexity.

Military relevance: The research in this task addresses the need for multicast communications in tactical wireless networks. The usage of voice, and eventually video, call groups is expected to rise in current and future tactical wireless networks, and hence the importance of multicast will potentially grow. MC/MR networks are rising in popularity in the military due to the desire for higher network throughput; hence new research to efficiently schedule as many concurrent multicast sessions as possible is highly relevant. Finally, military networks suffer from frequent outages and dynamics; hence research on improving the robustness of multicast communications is highly relevant.

Exploitation & technology transition potential: As a result of this work, we plan to discuss opportunities for transitioning the algorithms that have come or will come out of the research in this task to real military MC/MR systems such as the Wireless Network After Next (WNAN). There is potential to transition hybrid network multicast algorithms to programs within CERDEC. Finally, declarative implementations of multicast and anycast, once realized, would eventually become part of the ITA Experimentation Framework.

13 P. Basu, C. K. Chau, A. Iu. Bejan, R. Gibbens, and S. Guha, “Efficient Multicast in Hybrid Wireless Networks,” ITA Annual Fall Meeting, Maryland, September 2013. https://www.usukitacs.com/node/2515
14 R. Irwin and P. Basu, “Robust Multicast Clouds,” Proc. MILCOM 2013, San Diego, CA, November 2013. https://www.usukitacs.com/node/2372
15 S. Guha, D. Towsley, C. Capar, A. Swami, and P. Basu, “Layered Percolation,” http://arxiv.org/abs/1402.7057 (selected for oral presentation at NetSci 2014, Berkeley, CA, June 2014). https://www.usukitacs.com/node/2742

2.2.3 Technical Accomplishments

2.2.3.1 Task 1 – Tomography of Hybrid Coalition Networks

(Figure: the network tomography workflow proceeds from fundamental understanding to monitor placement, measurement design, and network state recovery; each stage is annotated with the corresponding publications [MHLST2013], [MHLTS2013], [MHLST2014], [TSHP2014], [MHSTLL2014].)

Fig. 1 Workflow of network tomography


Fig. 1 illustrates the high-level workflow in applying the methodology of network tomography. In Task 1, our investigation has covered all stages of this workflow, where the labels represent our publications addressing challenges in each stage. Our specific accomplishments are as follows.

Paper [1]2, “Identifiability of Link Metrics Based on End-to-end Path Measurements” (Best Paper Nominee at IMC’13), investigates the fundamental condition on network topology and monitor placement required to uniquely infer link metrics from path measurements; moreover, it also proposes a linear-complexity monitor placement algorithm that satisfies this condition with the minimum number of monitors. Paper [2]3, “Efficient Identification of Additive Link Metrics via Network Tomography” (Best Paper at ICDCS’13), further proposes a quadratic-complexity path construction algorithm and a linear-complexity link identification algorithm that, respectively, construct the paths used for probing the network and deduce the states of individual links from the path measurements. Both papers provide breakthrough techniques that are orders of magnitude more efficient than existing solutions when applied to large networks.

Paper [3]4, “Monitor Placement for Maximal Identifiability in Network Tomography” (INFOCOM’14), complements the above works by addressing cases where complete identifiability cannot be achieved (e.g., due to limited monitor resources). It proposes a linear-complexity algorithm to determine the set of identifiable links under a given monitor placement, based on which it further proposes a polynomial-time (O(n^4)) algorithm to place a given number of monitors so as to maximize the number of identifiable links. We show that the proposed monitor placement algorithm is provably optimal for networks without a single point of failure (2-connected networks) and empirically near optimal for several real networks.

Paper [4]5, “Robust Network Tomography in the Presence of Failures” (ICDCS’14), studies the problem of measurement path selection in the context of unreliable links. It proposes a formal framework for selecting measurement paths as a constrained optimization of the expected path rank (the number of linearly independent paths that are successfully measured) under a limited measurement budget. We show the NP-hardness of an optimal solution, and present a polynomial-time algorithm with a guaranteed approximation ratio as well as a reinforcement-learning algorithm that addresses the challenge of an unknown failure distribution. Both algorithms are shown to perform significantly better than state-of-the-art path selection algorithms (joint work with Project 5 Task 1).

Paper [5]16, “Node Failure Localization via Network Tomography”, investigates the network’s fundamental capability of diagnosing and localizing failed nodes from observed path states. It proposes a novel measure, maximum identifiability, to quantify this capability as the maximum number of simultaneous failures that can be uniquely localized. We establish tight upper/lower bounds on this measure under a set of representative probing mechanisms providing different trade-offs between the flexibility of probing and the cost of implementation. These bounds enable formal optimization of the design of network monitoring systems based on a quantifiable trade-off between cost and benefit.
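As a concrete illustration of the linear-algebraic core shared by papers [1]–[4], the sketch below builds a routing matrix from candidate measurement paths, tests which individual links are identifiable, and recovers link metrics by least squares. It is a toy restatement of the underlying model under assumed notation, not the released ITA toolkit; the brute-force rank tests used here are exactly what the papers' efficient graph algorithms avoid.

```python
# Toy additive link-metric tomography: routing matrix, identifiability, recovery.
# Illustrative only; the ITA toolkit uses far more efficient graph algorithms.
import numpy as np

def routing_matrix(paths, num_links):
    """paths: list of measurement paths, each a list of link indices."""
    R = np.zeros((len(paths), num_links))
    for i, path in enumerate(paths):
        R[i, path] = 1.0
    return R

def identifiable_links(R, tol=1e-9):
    """Link j is identifiable iff the unit vector e_j lies in the row space
    of R, i.e., appending e_j to R does not increase its rank."""
    base_rank = np.linalg.matrix_rank(R, tol)
    links = []
    for j in range(R.shape[1]):
        e = np.zeros((1, R.shape[1]))
        e[0, j] = 1.0
        if np.linalg.matrix_rank(np.vstack([R, e]), tol) == base_rank:
            links.append(j)
    return links

def recover_metrics(R, y):
    """Least-squares solution of y = R x (exact when rank(R) = #links)."""
    x, *_ = np.linalg.lstsq(R, y, rcond=None)
    return x

# Example: 4 links probed by 4 monitor-to-monitor paths (full-rank system).
paths = [[0], [0, 1], [1, 2], [2, 3]]
R = routing_matrix(paths, 4)
y = R @ np.array([1.0, 2.0, 3.0, 4.0])   # simulated path measurements
print(identifiable_links(R))              # -> [0, 1, 2, 3]
print(recover_metrics(R, y))              # -> [1. 2. 3. 4.]
```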

2.2.3.2 Task 2- Multipath Control in Hybrid Coalition Networks

Paper [6], "A measurement-based study of Multipath TCP performance in wireless networks", reports on a careful measurement study of multipath TCP (MPTCP) in a hybrid network consisting of WiFi and 3G/4G cellular. It studies the download and upload latencies of file transfers for sizes ranging from 8KB to 32MB, for different numbers of paths (1, 2, and 4) and different congestion controllers. Briefly, it shows that MPTCP does at least as well as the best single-path TCP data transfer and often substantially better, especially as file sizes increase; in addition, four paths provide better performance than two. Paper [7], "Cross-layer path management in multi-path transport protocol for mobile devices", addresses the problem of lossy wireless paths and how they should be dealt with in the context of multipath data transport. It reports severe performance degradation in a hybrid WiFi/cellular environment as an end host moves in and out of range of WiFi access points. A mechanism is then proposed that uses WiFi retransmission statistics to determine when to deactivate WiFi-based paths and SNR readings to determine when to activate them. Measurements indicate that this mechanism is effective in handling impaired WiFi channels. Paper [8], "How Green is Multipath TCP for Mobile Devices?", reports the most complete set of measurements to date of the energy consumption of MPTCP, accounting for different SNRs and different network loads. Using these measurements, it proposes a mechanism to select the combination of paths that provides good performance at low energy consumption. Measurements indicate that the resulting mechanism is quite promising.
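The following minimal sketch illustrates the kind of cross-layer path-management policy proposed in [7]; the threshold values and names are invented for illustration and are not those of the paper:

# Sketch of a cross-layer MPTCP path manager: deactivate the WiFi subflow when
# link-layer retransmission statistics indicate a lossy channel, and reactivate
# it only when the SNR recovers. Thresholds are hypothetical.

RETX_DEACTIVATE = 0.30   # deactivate WiFi above 30% link-layer retransmissions
SNR_ACTIVATE_DB = 20.0   # reactivate WiFi once SNR exceeds 20 dB

class WifiPathManager:
    def __init__(self):
        self.wifi_active = True

    def update(self, retx_ratio: float, snr_db: float) -> bool:
        """Return whether the WiFi subflow should be active after this sample."""
        if self.wifi_active and retx_ratio > RETX_DEACTIVATE:
            self.wifi_active = False   # channel impaired: fall back to cellular only
        elif not self.wifi_active and snr_db > SNR_ACTIVATE_DB:
            self.wifi_active = True    # channel recovered: re-enable WiFi subflow
        return self.wifi_active

# Example trace: walking out of, then back into, WiFi range.
mgr = WifiPathManager()
for retx, snr in [(0.05, 30), (0.40, 12), (0.50, 8), (0.10, 15), (0.02, 25)]:
    print(retx, snr, "->", "WiFi on" if mgr.update(retx, snr) else "WiFi off")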

2.2.3.3 Task 3 - Robust Coalition Multicast in Dynamic Wireless Networks

Research in this task this year has revolved around six different threads and has resulted in the following accomplishments. The first thread involved the development of algorithms for optimal multicasting in hybrid wireless networks where some nodes may have point-to-point wireless connectivity (e.g., 4G cellular or SATCOM), some may have broadcast wireless connectivity (e.g., WiFi in ad hoc mode), and some may have both capabilities. This work resulted in the paper "Efficient Multicast in Hybrid Wireless Networks" [9], which focuses on the underlying optimization problem: a hybrid of two well-known NP-hard graph optimization problems, the Minimum Steiner Tree problem (for point-to-point links) and the Minimum Steiner Connected Dominating Set problem (for broadcast links). We consider both edge- and node-weighted versions of this problem and use distinctly different methodologies to formulate two algorithms with guaranteed approximation factors. The first approximation algorithm formulates the problem as a directed hypergraph, converts it to a characteristic graph, and then applies a known Directed Steiner Tree computation algorithm on the resultant graph. The second approximation algorithm effectively combines algorithms for Set Cover and Steiner Tree to find a hybrid structure (a toy illustration of the Steiner Tree building block appears below). We further demonstrate, by means of simulation modeling of standard deployment scenarios, that while the first algorithm outperforms the second in terms of tree cost (since the latter has a relatively poor approximation factor of O(log n)), the latter outperforms the former in terms of complexity and other practical considerations. Finally, using algorithmic ideas from percolation theory, we demonstrate the trade-off between network connectivity and multicast cost when hybrid capability is added incrementally to the nodes of a MANET-only deployment.

The second thread focuses on developing new bounds and optimization algorithms for scheduling multiple multicast sessions in multi-channel multi-radio (MC/MR) wireless networks. Specifically, we consider the problem of maximizing the uniform multicast throughput across multiple sessions in such networks. This is a highly challenging problem because it requires making joint decisions about channel assignment, scheduling, and routing, and it becomes intractable for general networks. To gain insight into the problem, we first consider a simpler network topology in which all nodes are within each other's transmission range (aka "Parking Lot"). In spite of its simplicity, this topology is practically important since it is encountered in several real-world settings. Further, as we show, a solution to the multicast MC/MR scheduling problem for this network serves as a building block for more general scenarios that are otherwise hard to analyze.
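As a toy, centralized illustration of the Steiner Tree building block named in the first thread (this is not the hybrid algorithm of [9]; the example graph is invented), the following sketch computes the classic metric-closure 2-approximation of the Minimum Steiner Tree:

# Classic metric-closure 2-approximation for the Minimum Steiner Tree.
import itertools
import networkx as nx

def steiner_tree_approx(G, terminals):
    # 1. Metric closure: complete graph on the terminals, weighted by
    #    shortest-path distance in G.
    closure, paths = nx.Graph(), {}
    for u, v in itertools.combinations(terminals, 2):
        d, p = nx.single_source_dijkstra(G, u, v, weight="weight")
        closure.add_edge(u, v, weight=d)
        paths[(u, v)] = p
    # 2. Minimum spanning tree of the metric closure.
    mst = nx.minimum_spanning_tree(closure, weight="weight")
    # 3. Expand each MST edge back into its shortest path in G.
    T = nx.Graph()
    for u, v in mst.edges():
        nx.add_path(T, paths.get((u, v)) or paths[(v, u)])
    return T

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1), ("b", "c", 1), ("a", "d", 3),
                           ("c", "d", 1), ("b", "e", 2), ("e", "d", 1)])
T = steiner_tree_approx(G, ["a", "c", "e"])
print(sorted(T.edges()))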


We note that the original problem of maximizing the uniform multicast throughput across multiple sessions remains NP-hard even for the Parking Lot network. However, the special structure of this network has enabled us to derive upper bounds on the multicast capacity. Further, we have designed two polynomial-time approximation algorithms that are guaranteed to achieve a constant factor of this capacity for this network with arbitrary multicast group memberships. The first algorithm has an approximation ratio of d_max, where d_max is the maximum group participation degree of any node; it ensures that any end-to-end path is at most 2 hops long. The second algorithm has an approximation ratio of 4, but its end-to-end paths can be up to d_max hops long. Thus, these algorithms provide different trade-offs between achievable throughput and total number of transmissions. We will be submitting this work as a paper to INFOCOM 2015. We are now working on using the Parking Lot topology as a building block for more general topologies. As an example, consider a hybrid network where multiple Parking Lot networks are connected via a cellular base station; the nodes in a multicast session could be distributed across multiple Parking Lot networks. We can use the results for the basic Parking Lot topology by decomposing this hybrid network into a collection of independent networks that are coupled by a single constraint.

In the third thread, we investigated robust multicast in dynamic wireless networks. In the paper "Robust Multicast Clouds" [10], we argue for designing tunable reliability explicitly into the act of topology construction, based on the severity of forecasted dynamics in the network. This paper studies the number of nodes in the network supporting multicast traffic for a group of nodes; with more nodes supporting multicast traffic, the number of available redundant paths increases, enhancing reliability. We investigate the fundamental trade-off between the number of nodes supporting multicast communications and the gains in the number of node-disjoint paths among nodes in the multicast group (a small sketch of this robustness measure follows this paragraph). Traditionally, wired and wireless approaches to multicast focus on constructing some sort of tree-based topology, with the benefit of having the minimum amount of resources allocated and a simplified routing protocol without redundant paths. Others have argued for ring-based topologies or augmented trees and rings allowing for alternate paths, and still others have explored mesh-based routing for multicast. We formulate a set of mixed integer linear programs (MILPs) for determining the nodes participating in the multicast group; we denote the set of nodes supporting multicast traffic as the multicast cloud. We investigate the fundamental trade-offs of the fidelity of the cloud over a wide range of parameters, accounting for various types of multicast topology allocation strategies (e.g., tree- and mesh-based approaches). We find that many nodes are needed to construct a single path among all nodes in a multicast group, but each additional node-disjoint path requires only a fraction of additional nodes. We characterize this sub-linear cost growth in terms of the number of nodes needed to provide connectivity to the cloud, and we characterize how the diameter of the cloud decreases as more nodes participate in group communications.
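The sketch below (not the MILP formulation of [10]; the grid topology and group membership are invented) measures the robustness notion used above: the number of node-disjoint paths between multicast group members, which by Menger's theorem equals their pairwise node connectivity.

# Robustness of a multicast cloud as the minimum number of node-disjoint
# paths guaranteed between any pair of group members.
import itertools
import networkx as nx

def group_robustness(G, group):
    """Minimum pairwise node connectivity over all member pairs."""
    return min(nx.node_connectivity(G, s, t)
               for s, t in itertools.combinations(group, 2))

# Hypothetical cloud: a 4x4 grid of relay nodes supporting a 3-member group.
cloud = nx.grid_2d_graph(4, 4)
group = [(0, 0), (3, 3), (0, 3)]
print("node-disjoint paths guaranteed within the cloud:",
      group_robustness(cloud, group))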
Since the aforementioned traditional notion of robustness (i.e., construction of node-disjoint paths) may be expensive in terms of resource consumption, and perhaps overkill, we are currently studying less stringent robustness metrics involving "braided" paths for multicast. By carefully overprovisioning links at certain locations in the network at the current instant of time, we can make multicast delivery robust to mobility in the near future. We want to characterize the trade-off space between multicast delivery ratio, cost of multicast forwarding, and cost of topology gathering as a function of mobility parameters. In contrast with several heuristic studies proposing mesh-based multicast protocols, our approach attempts to capture the fundamental trade-offs between the metrics and parameters mentioned above in the asymptotic limit as well as for finite networks.

In the fourth thread, we have developed new fundamental results for an interesting class of hybrid networks, which form a stepping-stone for further study in the multicast setting. In the paper "Layered Percolation" [11], we consider the problem of percolation in "hybrid" networks constructed by taking a union of multiple sparser networks. One class of hybrid coalition networks can be modeled using this approach (with some simplifying assumptions), where each layer can correspond to a UK or a US tactical communications network. We study the emergence of long-range connectivity (or the network-wide broadcast property) of a homogeneous multiplex network formed by merging M random site-percolating instances of the same graph G with single-layer site-occupation probability q; we argue that when q exceeds a threshold q_c(M) = Θ(1/M), a spanning cluster appears in the multilayer network. This means that significantly sparser connectivity is required in each participating network in order to perform a broadcast in the "union" network. Using a configuration-model approach, we find q_c(M) exactly for random graphs with arbitrary degree distributions. For multilayer percolation in a general graph G, we show that q_c/M < q_c(M) < -ln(1 - p_c)/M for all M ∈ Z^+, where q_c and p_c are the site and bond percolation thresholds of G, respectively. We show a close connection between multilayer percolation and mixed (site-bond) percolation, since both provide a smooth bridge between pure-site and pure-bond percolation. We also find excellent approximations and bounds on layered percolation thresholds for regular lattices using the aforesaid connection. The insights on "broadcast" connectivity developed in this work are likely to be very useful for developing analogous thresholds for "multicast" connectivity in hybrid coalition networks.

Finally, we are collaborating with researchers in Project 2 (Declarative Networks) on developing new distributed algorithms for k-anycast, which can provide a critical service for content-centric multicasting. In content-centric networking, communications are driven by content and host/node metadata instead of host/node addresses, enabling effective and efficient communication in challenging environments characterized by short-lived connectivity and highly dynamic network topologies. The "delivery semantics" here is whether a publishing node is interested in sending content to any, some, or all nodes whose queries match the published content. We formalize this problem as k-anycast, where delivery is attempted to some k matching nodes. In contrast to traditional multicast, only the number k is known a priori, not the actual addresses of the matching terminal nodes. This is also relevant in battlefield scenarios where it is virtually impossible to reach every node in the multicast group; relaxing this requirement to the simpler requirement of reaching some k nodes that meet appropriate policies can significantly reduce resource consumption. Since k-anycast is NP-hard, we are developing distributed heuristics based on a bound-and-prune extension of Prim's Minimum Spanning Tree algorithm, which are amenable to practical implementation within the declarative network engine, DSM.
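As a centralized toy version of the k-anycast idea (the actual ITA work develops distributed bound-and-prune heuristics for the DSM engine; the graph, costs, and matching set below are invented), one can grow a Prim-style tree from the publisher and stop as soon as k matching nodes have been reached:

# Centralized k-anycast sketch: Prim-style tree growth with early stopping
# once k content-matching nodes are in the tree.
import heapq

def k_anycast_tree(adj, source, matches, k):
    """adj: {node: [(cost, neighbor), ...]}; matches: set of matching nodes."""
    reached = {source} & matches
    in_tree, tree_edges = {source}, []
    frontier = [(c, source, v) for c, v in adj[source]]
    heapq.heapify(frontier)
    while frontier and len(reached) < k:
        c, u, v = heapq.heappop(frontier)
        if v in in_tree:
            continue
        in_tree.add(v)
        tree_edges.append((u, v, c))
        if v in matches:
            reached.add(v)
        for c2, w in adj[v]:
            if w not in in_tree:
                heapq.heappush(frontier, (c2, v, w))
    return tree_edges, reached

adj = {
    "s": [(1, "a"), (4, "b")],
    "a": [(1, "s"), (1, "b"), (2, "c")],
    "b": [(4, "s"), (1, "a"), (1, "d")],
    "c": [(2, "a")],
    "d": [(1, "b")],
}
edges, hit = k_anycast_tree(adj, "s", matches={"c", "d"}, k=2)
print("tree edges:", edges, "matched:", hit)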

2.2.4 References for Project 1

[1] Liang Ma, Ting He, Kin K. Leung, Ananthram Swami, and Don Towsley, "Identifiability of Link Metrics Based on End-to-end Path Measurements", ACM IMC (BEST PAPER NOMINEE), October 2013. https://www.usukitacs.com/node/2483

[2] Liang Ma, Ting He, Kin K. Leung, Don Towsley, and Ananthram Swami, "Efficient Identification of Additive Link Metrics via Network Tomography", IEEE ICDCS (BEST PAPER AWARD), July 2013. https://www.usukitacs.com/node/2593

[3] Liang Ma, Ting He, Kin K. Leung, Ananthram Swami, and Don Towsley, "Monitor Placement for Maximal Identifiability in Network Tomography", IEEE INFOCOM, April 2014. https://www.usukitacs.com/node/2484

[4] Srikar Tati, Simone Silvestri, Ting He, and Tom La Porta, "Robust Network Tomography in the Presence of Failures", IEEE ICDCS, June 2014. https://www.usukitacs.com/node/2735

[5] Liang Ma, Ting He, Ananthram Swami, Don Towsley, Kin K. Leung, and Jessica Lowe, "Node Failure Localization via Network Tomography", submitted to ACM IMC, 2014. https://www.usukitacs.com/node/2736


[6] Y.-C. Chen, Y.-s. Lim, R.J. Gibbens, E.M. Nahum, R. Khalili, D. Towsley. “A measurement-based study of Multipath TCP performance in wireless networks”, ACM IMC 2013, Nov. 2013. https://www.usukitacs.com/node/2596

[7] Y.-s. Lim, Y.-C. Chen, E. Nahum, D. Towsley, K.-W. Lee, "Cross-layer path management in multi-path transport protocol for mobile devices", IEEE INFOCOM 2014. https://www.usukitacs.com/node/2642

[8] Y.-S. Lim, E. Nahum, D. Towsley, R. Gibbens. “How Green is Multipath TCP for Mobile Devices?” to appear at 4th Workshop on All Things Cellular: Operations, Applications and Challenges, 2014. https://www.usukitacs.com/node/2719

[9] [BCBGG2013] P. Basu, C. K. Chau, A. Iu. Bejan, R. Gibbens, and S. Guha, "Efficient Multicast in Hybrid Wireless Networks", ITA Annual Fall Meeting, Maryland, September 2013. https://www.usukitacs.com/node/2515

[10] [IB2013] R. Irwin and P. Basu, "Robust Multicast Clouds", Proc. MILCOM 2013, San Diego CA, November 2013. https://www.usukitacs.com/node/2372

[11] [GTCSB2014] S. Guha, D. Towsley, C. Capar, A. Swami, and P. Basu, "Layered Percolation", http://arxiv.org/abs/1402.7057 (selected for oral presentation at NetSci 2014, Berkeley CA, June 2014). https://www.usukitacs.com/node/2742


2.3 Project 2 – Security/Network Management and Control

2.3.1 Introduction

The modern tactical communications environment presents increasingly complex challenges for network and security management and control. Multiple independent trends17 conspire to complicate the task of the military user in deploying, configuring and exploiting the communications and processing capabilities that are available. These demands mean that Hybrid Coalition Networks (HCNs) increasingly deploy (and are roadmapped to deploy) with higher processing and bandwidth capabilities. As a result, future coalition operations envision processor- and data-intensive real-time services such as location tracking, intrusion detection, context sharing, and multimedia analytics to aid data-to-decision at the tactical edge. These analytics services will enable each commander on the ground to gather relevant information and take mission-critical decisions, while the same information is aggregated up the command chain for a more global view of operations and high-level decisions. An overarching objective of this project is therefore to look towards these attractive cloud, data-centre and communication technologies and determine which of them can be utilised in the security-constrained hybrid coalition network and tactical scenario. We have started to show that many of these technologies cannot be directly applied in their current form, as they are designed on assumptions that do not match the constraints of the tactical environment.

The project is structured into three tasks. Two new tasks (Task 1 and Task 3) look at how emerging technologies can be adapted to the specific needs of secure military tactical networks, and the continuing task (Task 2) looks at how these complex technologies (and, more generally, other technologies) can be provably verified to meet security and other goals.

• Task 1 – Mobile Micro Cloud: looks at the problem of users requesting information and how the task of gathering and processing the request is devolved to the "cloud". How should the nodes of this cloud be arranged? How should the workloads and load balancing of the nodes be managed? How can this be done within the context of dynamic coalition networks and security constraints?

o This task has developed novel algorithms to address the placement problem; carried out real-world experimentation in limited-bandwidth environments; and developed militarily relevant scenarios, application pattern abstractions and security constraints to underpin a future tactical cloud middleware.

• Task 2 - A Declarative Infrastructure for Network and Security Management: looks at the more general problem of provably designing and implementing these complex distributed algorithms in a secure way. Under what conditions can certain security properties be guaranteed? How can a complex distributed system be shown to be secure, not only in theory, but after implementation?

17 Technologies are more sophisticated, and their behavior is more difficult to understand and predict intuitively; military operations are both more diverse and more fluid, meaning that communications requirements change in style and scope from one day to the next; and the demands on communications capability continue to grow and diversify, emphasizing the need to manage the security properties of the network jointly with its other performance attributes.


o This task has enhanced the declarative framework for the specification, simulation and analysis of distributed applications, and then used it to (i) identify a new type of multipath network threat that would not be detected by current intrusion detection algorithms and (ii) discover, formulate, and investigate a fundamentally important new data protection problem in coalition environments: policy information leakage and how to stop it.

• Task 3 - Data Stream Processing in Hybrid Coalition Networks: looks at the problem of processing and distributing the ever-increasing volumes of real-time data that enter the network "on the fly". This approach adapts the relatively new idea of stream processing, and seeks to extend its applicability from core data centers to the distributed, dynamic HCN environment, where the specifics of the military domain pose significant challenges.

o In the Network thread we introduced frameworks, algorithms and system prototypes that enable stream processing applications to operate efficiently in distributed and dynamic hybrid coalition network environments. In particular, we investigated three fundamental problems: load balancing of streaming data and computations, adaptive query planning, and optimal integration of real-time data streams with archived cached data contents. In the Security thread, we introduced a framework for authenticating the freshness of data when streaming computations are outsourced to untrusted or partially trusted third parties (e.g., a public cloud or a coalition member), and a risk analysis framework that analyzes large volumes of security alerts and computes the reputation of diverse entities based on domain knowledge and interactions between entities.

2.3.2 Research Impact

2.3.2.1 Task 1 – Mobile Micro-Cloud

Technical merit: The key technical contributions for the past year are: (1) developed a poly-log competitive online approximation algorithm for embedding an application graph onto a physical micro-cloud network; (2) conducted extensive experimental studies and prototype deployments in sub-Saharan Africa for enabling media sharing and redelivery at the edge for bandwidth-limited networks; (3) identified key design issues in the delivery of content to mobile devices and showed that mobility has a significant impact on response latencies (up to 75x, based on analysis of 130 million HTTP transaction records); and (4) formulated the military and security aspects of the tactical cloud so that abstract research is targeted toward interesting and challenging problems, and toward aspects that are specific and unique to future military operational needs.

Synergistic value of collaboration: Research activities in this project were performed jointly between IBM-US, Imperial College, Roke Manor, and UCLA. Shiqiang Wang (Imperial) was an intern at IBM-US in 2013 and will be an intern at ARL/IBM-US in 2014. A joint session on the Apollo factfinder and the mobile micro-cloud was held at the 2014 ITA/CTA bootcamp.

Scientific challenge: The main challenges in this project come from the facts that (1) networks are dynamic, resulting in underspecified graphs unlike those of the traditional cloud environment, and (2) the cloud environment is fragmented, heterogeneous, and resource-constrained (e.g., limited bandwidth and energy), requiring new algorithms for application mapping. The overarching research challenge of this task is therefore first to understand, and then quantify, the differences between the two environments; and second to propose new techniques and extensions to current systems that mitigate these differences, and to assess their security properties.


Military relevance: The research in this project addresses a key challenge in warfighting scenarios, as highlighted by the ACITA 2012 keynote talk18: reducing the information gap between higher-level military units (e.g., division, battalion) and lower-level units (e.g., platoon, squad). We worked with a number of military tactical advisors to develop a set of future use-case scenarios as exemplars of how a tactical micro-cloud could be used and the benefits it might deliver.

Exploitation & technology transition potential: The mobile micro-cloud prototype has been proposed to CERDEC as the infrastructure to support dynamic deployment of capabilities to (and from) Command and Control and edge posts such as forward operating bases, sensor systems, and platoons. A capability demonstration is currently planned and will be an initial step toward transition.

2.3.2.2 Task 2 - A Declarative Infrastructure for Network and Security Management

Technical merit: In Task 2, the key technical contributions for the past year are in the formulation and development of decentralised solutions for (i) detection of distributed intrusions in network traffic and (ii) query planning protocols in federated databases that preserve policy confidentiality. Multipath TCP (MPTCP) protocols, although able to improve network performance in hybrid wireless networks, are subject to a new type of distributed signature-based attack: malicious content can be split and sent across multiple paths, so that no monitor deployed on an individual path has full coverage of the network traffic, and none is able to perform signature-based intrusion detection. Inspired by the distributed and asynchronous features of our declarative networking framework, we have formalised this distributed signature-based intrusion detection problem as an asynchronous online exact string-matching problem and proposed a new algorithm that we first tested using our declarative networking simulation environment19 20. We have proved via a theorem that if every packet in a multipath connection is captured by at least one monitor, and if there exists a malicious pattern in the connection, then our algorithm will detect it [5]21. To demonstrate the effectiveness of our algorithm we have implemented it, set up a testbed, and conducted a comprehensive set of experiments to find signatures in MPTCP traffic. We were able to demonstrate, under different network throughputs (WiFi-WiFi, WiFi-LAN, and LAN-LAN), that the behavior of our proposed algorithm is independent of signature size, of the number of MPTCP connections, and of the number and position of signatures in the flows; the detection time grows linearly with the throughput. This is a significant advance, as this is the first distributed signature-based intrusion detection algorithm addressing a key open problem in MPTCP secure networking22. As a parallel subtask, we have identified, formulated, and investigated a fundamentally important new data protection problem in coalition database environments.

18 H. Mueller, "Challenges, Technologies, and Collaborations", ACITA 2012 Keynote, available at https://www.usukitacs.com/sites/default/files/201208%20ITA_keynote_hmuller_final_unmarked%20%28RGC%29.pdf 19 Jiefei Ma, Franck Le, Alessandra Russo, Jorge Lobo, "Enhancing Specification and Analysis of Declarative Distributed Computing and Comparison with Existing Approaches", Annual Fall Meeting 2013 (https://www.usukita.org/node/2517). 20 Andrei Petric, Jiefei Ma, David Wood, Franck Le, Alessandra Russo and Jorge Lobo, "A simulation environment for declarative distributed applications", Annual Fall Meeting 2013 (https://www.usukita.org/node/2518). 21 Jiefei Ma, Franck Le, Jorge Lobo, Alessandra Russo, "Detecting Distributed Signature-based Intrusion: The Case of Multi-Path Routing Attacks", CCS 2014 (under review). 22 Internet Engineering Task Force, RFC 6824, "TCP Extensions for Multipath Operation with Multiple Addresses", http://tools.ietf.org/html/rfc6824, January 2013.


This problem concerns detecting and preventing the leakage of access control policy information. We propose the definition of LA-safe (i.e., leakage-safe) query plans: plans that only leak policy information according to (meta-)policies that users have defined about what policy information they are willing to leak. We then developed a query planning algorithm that uses this meta-policy information to restrict the search to what we call V´-compliant query plans, and we proved via a theorem that V´-compliant query planning always produces LA-safe query plans. We thoroughly evaluated the new query planning algorithm using a new simulator that we developed.

Synergistic value of collaboration: The research conducted in Task 2 within the first year has been made possible by the close collaboration between IBM-US, IBM-UK, Imperial College and Penn State University. For example, the foundational work on detection of distributed intrusions in network traffic has benefitted from the well-established research collaboration between Imperial College (Dr Russo, Dr Jiefei Ma, Dr Jorge Lobo) and IBM-US (Dr Franck Le, David Wood), drawing in particular on Dr Le's in-depth expertise in network protocols and MPTCP. The implementation of the DN simulation environment, demonstrated at the Annual Meeting in September 2013, was extensively supported by David Wood at IBM-US. Research activities on the decentralised query planning protocol were performed jointly between Penn State University (Prof. Peng Liu, Mingyi Zhao, Qiang Zeng), IBM-US (Dr. Seraphin Calo), IBM-UK (Graham Bent, Patrick Dantressangle, David Vyvyan, Tom Berman), and Imperial College (Dr. Jorge Lobo, Dr. Poonam Yadav).

Scientific challenge: The main challenge in this project comes from the need to tackle real open problems in the area of secure networking for which no solution has previously been proposed. Multipath TCP has been demonstrated to improve network performance, but at the same time the extension of TCP with multipath capability has introduced a number of new threats22; splitting attack signatures across multiple paths is one of them. The challenge is to provide a distributed solution that is no worse than existing intrusion detection solutions for standard TCP. Snort23 is one of the most popular open-source tools for intrusion detection over standard TCP. It uses the Aho-Corasick algorithm24 for multi-pattern string matching, matching the payload of each packet against rules defining the signatures of known attacks. It presupposes, however, complete access to the traffic flow in order to successfully detect known attacks. In MPTCP, even if multiple monitors are deployed on different network paths, none of them has the full coverage of the network traffic needed to detect intrusion signatures. We have been able to develop a fully distributed algorithm whereby monitors on different paths locally scan and process their traffic, coordinate their actions, and exchange states in order to detect attacks that split signatures across multiple paths. As in Aho-Corasick, the time complexity of our algorithm is linear in the size of the input, and detection time grows linearly with the throughput. The technical challenges have been to guarantee minimum memory consumption during the (distributed) scanning process and minimum inter-monitor communication.
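As a drastically simplified illustration of this state-exchange idea (a single pattern, two monitors, and a synchronous hand-off of automaton state; the actual algorithm is asynchronous and multi-pattern via Aho-Corasick), the sketch below uses a single-pattern KMP automaton whose state, a single integer, is all that one monitor needs to forward to the monitor holding the next in-sequence packets:

# Distributed signature detection sketch: each monitor scans only its own
# packets and exchanges just the automaton state with the next monitor.

def kmp_failure(pattern):
    fail, k = [0] * len(pattern), 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def scan(pattern, fail, state, data):
    """Advance the KMP automaton over `data`; return (new_state, matched?)."""
    for ch in data:
        while state and ch != pattern[state]:
            state = fail[state - 1]
        if ch == pattern[state]:
            state += 1
        if state == len(pattern):
            return state, True
    return state, False

pattern = "EVILSIG"
fail = kmp_failure(pattern)
# The signature is split across two subflows seen by different monitors.
monitor1_bytes, monitor2_bytes = "xxEVIL", "SIGyy"
state, hit1 = scan(pattern, fail, 0, monitor1_bytes)      # monitor 1 scans locally
state, hit2 = scan(pattern, fail, state, monitor2_bytes)  # monitor 2 resumes from the exchanged state
print("detected:", hit1 or hit2)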
Experimentally, we have demonstrated that both of these measures (memory consumption and inter-monitor communication) are constant, independent of the size and number of the flows and of the number of patterns. The second scientific challenge has been to guarantee data protection in coalition networks. A coalition network enforcing all pairwise data confidentiality protection policies (amid data sharing among parties) can nevertheless still suffer from policy information leakage. In the literature, there is no formal characterization of the policy information leakage problem; it is an open problem how to rigorously prove that a query planning algorithm prevents policy information leakage, and query planners that can stop such leakage are yet to be developed. Our work proposes a formal definition of policy information leakage and an algorithm that aims to minimize such leakage.

23 SNORT Users Manual 2.6, http://manual.snort.org/. Accessed 2014. 24 A. V. Aho and M. J. Corasick, "Efficient string matching: an aid to bibliographic search", Communications of the ACM 18, 6 (1975), 333–340.


Military relevance: The research in this task addresses the need to manage secure networking in hybrid, dynamic military networks. As dynamic coalition operations are more likely to use hybrid network infrastructures, secure networking requires new techniques capable of detecting more advanced, distributed attacks while preserving efficient performance. Project 1 demonstrated in BPP11 the benefit of the Multipath TCP (MPTCP) protocol in simultaneously using multiple paths in hybrid wireless networks, ultimately improving network performance; but there is clearly a need to integrate new secure management services within these new network protocols. We have initially used our Declarative Networking infrastructure and simulation environment to support the design and fast prototyping of a new algorithm for multipath signature-based intrusion detection. We have implemented the algorithm in C and conducted a comprehensive set of experiments using an MPTCP testbed that we have also developed, in order to test the execution of the algorithm under different conditions (e.g., different multipath network throughputs, different flows, multiple patterns, different sizes of flows). We aim to conduct formal analysis using our DN tool to verify relevant classes of properties. Our research on secure distributed query processing addresses two other unique security research challenges associated with military coalition operations: (a) policy confidentiality is an essential security requirement in coalition operations, but it has not been addressed in either distributed or federated databases; (b) data services and information exchange operations in coalition scenarios need to be protected by autonomous pairwise policies, but how to enforce pairwise policies has not really been researched in either distributed or federated databases (both assume a stitched-together policy). In both cases, authorization information leakage can expose the intention of the querying party and can seriously hurt the trust relationships between coalition parties.

Exploitation & technology transition potential: Among the research and results of this task, we expect both the implementation of multipath signature-based intrusion detection and the collaborative query processing technology to have the most transition potential. For the former, we are currently exploring the possibility of undertaking industrial transitions through the Academic Centre of Excellence in Cyber Security Research (ACE-CSR) at Imperial College and its close collaboration with GCHQ. As for the collaborative query processing technology, through a series of weekly teleconferences and joint implementation-testing-debugging work with the GaianDB team led by Patrick Dantressangle, David Vyvyan and Graham Bent, we have taken several steps towards transitioning our secure collaborative query processing technology into the GaianDB software product. In particular, several enhancements of the GaianDB software (prototype version) have already been made, including flexible execution of in-network joins inside GaianDB networks. In addition, we plan to continue the exploitation of our Declarative Networking framework by investigating its use in a recently funded EPSRC project at Imperial College, led by Dr Lupu and Dr Russo, on the development of new solutions for run-time intelligent cloud protection.

2.3.2.3 Task 3 - Data Stream Processing in Hybrid Coalition Networks

Technical merit: The key technical contributions of this task in the past year include:

• Developed a novel framework for studying the fundamental problem of jointly processing real-time stream data that must be matched against archival content data in a hybrid coalition network. The problem was addressed in the context of a hybrid network that supports in-network caching: MANET users can access all content data remotely at a back-end server via a cellular infrastructure, but part of the content data may also be cached and accessed locally at certain nodes within the MANET. The key technical contributions include the model conception/formulation as a joint caching and routing problem, the derivation of fundamental complexity results, and distributed and approximate solutions.


• Developed a novel framework for network-aware adaptive query planning in a mobile ad hoc network (MANET). The core idea is to generate and deploy replicated query plans that can cope with different network conditions and then use dynamic routing of data tuples to select the query plan that minimizes network cost. The framework supports stateful multi-input database query operators and fault-tolerance mechanisms with strong reliability guarantees such as exactly-once delivery. We also developed a simulator environment based on the JiST/SWANS simulator for scalable evaluations of Java-based stream processing applications.

• Developed a framework called EdgeStreams addressing the fundamental problem of where and when each streaming computation should be performed in a hybrid coalition network. The core idea is to use an augmented stream application graph which explicitly encodes the decision (and its cost) of whether to execute each streaming computation locally or offload it to another node. This enables a unified approach to dynamic load balancing across both network and processing resources using distributed routing algorithms of low complexity. It also allows online and balanced scheduling of incoming stream application requests with worst-case bounds relative to an offline solution. An initial prototype of the EdgeStreams framework has been implemented and demonstrated with a face detection application on video streams.

• Addressed for the first time the problem of authenticating the temporal freshness of data outsourced for computation to data streaming databases in the cloud (i.e., ensuring that an untrustworthy third-party server cannot omit the latest data or return out-of-date data). Developed an authentication framework for multi-version key-value data streams consisting of a formal model and a lightweight authentication scheme based on a novel data structure, the INCBM-Tree, built by an incremental algorithm to support efficient authentication and verification of high-volume data streams (the general hash-tree principle behind such schemes is sketched after this list). A theoretical bound on the error rate of the INCBM-Tree was derived and used to prove that its cost is constant. Our implementation and evaluation using real-world benchmark data show that the INCBM-Tree achieves an order of magnitude higher throughput for data stream authentication than existing approaches.

• Investigated the problem of ranking the risks of high-value assets/targets based on streams of security alerts in a heterogeneous network. Technical contributions include (1) the design of a formal framework to analyze the reputation and risk of diverse entities in a heterogeneous network; (2) the development of scalable algorithms to explore the networking structures and identify potentially risky entities that may be overlooked by a discrete risk score; (3) the development of a highly flexible monitoring infrastructure and system that can incorporate data sources from multiple domains; and (4) experimental evaluation with real traces from a large enterprise network.

• Investigated the problem of modeling social influence and its impact on continuously evolving systems. Technical contributions include (1) a novel quantitative framework, inspired by the spiking activities of multi-neuron systems, to model the social influence of a prior collective (modeled as a stream of past opinion ratings) on subsequent individual decision making;
(2) evaluation on a very large dataset of 28M ratings on 1.7M assets, which demonstrates the model's ability to separate social biases introduced by prior ratings and to effectively predict the long-term cumulative growth of ratings through a scalable estimation model based solely on early rating trajectories.
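The INCBM-Tree itself is a novel structure and is not reproduced here; as background, the following minimal sketch shows only the general hash-tree (Merkle) principle behind such authentication schemes: the data owner signs a small digest per window of updates, and the untrusted server must supply a logarithmic-size proof that a returned record is present and current. The record contents are invented for illustration.

# Merkle-tree sketch of authenticated outsourced data: the owner signs only
# the root; the server proves membership of any record with a short proof.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])            # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    proof, level, i = [], [h(x) for x in leaves], index
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = i ^ 1                             # sibling index at this level
        proof.append((level[sib], sib < i))     # (sibling hash, sibling-is-left?)
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

epoch = [b"k1=v3", b"k2=v7", b"k3=v1", b"k4=v9"]  # one signed window of updates
root = merkle_root(epoch)                          # owner signs this digest
proof = merkle_proof(epoch, 2)                     # server proves record k3=v1
print("verified:", verify(b"k3=v1", proof, root))

Freshness follows from signing one root per time window: a server that returns a record from a stale window cannot produce a proof against the latest signed root.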

Synergistic value of collaboration: Research activities in this task were performed jointly between all partners at IBM, UMass, and Imperial College London. The task successfully combined the expertise in security and networking of IBM and UMass with the data-stream management and distributed systems expertise of Imperial. Some of the projects were also performed in collaboration with Ph.D. research interns from the University of Illinois at Chicago and Georgia Tech. The collaboration among people with different backgrounds significantly helped in the identification of the problems and contributed to the development of the final solutions. On a separate thread, we held discussions with Paul Sullivan to derive ITA applications and scenarios that could be supported by the stream processing model. In addition, the researchers of this task held discussions and meetings with researchers of other ITA tasks, such as the micro-cloud task P2.1, to identify common topics of interest.

Scientific challenge: The scientific challenges of this project are rooted in the lack of methods for efficient and secure stream processing computations in distributed and dynamic hybrid coalition networks (HCNs). Extensive research has applied the data stream processing model to commercial applications, such as social networking sites and financial risk calculations, which typically run in Internet data centers where high volumes of data traffic are processed centrally by high-performance networked computer clusters at low cost. HCNs differ from data centers in both nature and objectives and yield a unique set of scientific challenges that have not yet been addressed. The objective is resilience of critical mission operations rather than low cost. Centralized data processing is neither feasible nor desirable due to network bandwidth limitations. Data sources and processing resources are heterogeneous, distributed, and connected by a hybrid, energy-constrained mobile network. Resource allocation depends on the location of data sources, network security policies and mission context. HCNs have multiple administrative domains and require security frameworks that can handle dynamic streams. Furthermore, the existence of multiple untrusted or semi-trusted coalition entities creates new security issues related to the authentication of outsourced data streaming computations.

Military relevance: The BPP13 Call for Whitepapers identified content- and context-aware networking as a topic of key interest to the military and the ITA program. This task is in the area of in-network processing applied specifically to data streams, with the content and the prioritized real-time information needs of the decision makers as context. The goal is to understand how to manage and configure the network so that the constraints imposed by the available networking resources and the security policies are taken into account while executing the queries required to satisfy the information needs of the decision makers. The task is focused on challenges encountered in a coalition tactical network: it considers significantly different models of network resources and security than those typically considered in the literature. Indeed, the goal of in-network processing in HCNs is not to optimize some homogeneous metric (such as revenue) but to ensure that, despite failures and dynamics in the network and computing nodes, critical-mission continuous queries on moving data streams are provided resources and are able to execute. The research in the task addresses the issues surrounding processor- and data-intensive real-time applications, including situational awareness, location tracking, intrusion detection, context sharing, and multimedia analytics, needed to aid data-to-decision at the tactical edge.
These analytics services could provide situational awareness to troops deployed in the field or enable each commander to gather relevant information and take mission-critical decisions; this information could also be sent to a remote command center for a more global view of operations and high-level decisions. On the network front, our research on load balancing and network-aware adaptive query planning aims to support the efficient execution of such real-time applications in dynamic hybrid coalition networks while providing efficient sharing of compute and communication resources, reliability guarantees, and the expressive stream query programming models needed for practical application development. The research in the Optimal Caching and Routing in Hybrid Networks effort could be used for any in-theater operations where both computing and storage are required to perform an operation, such as facial or scene recognition. On the security front, the research in authentication aims to verify the integrity and freshness of information originating from multiple administrative domains. In particular, if the data storage (e.g., a cloud) is deployed in a hostile environment and subject to tampering, adversaries may provide fake data or replay old data to mislead military operations. It is therefore of critical importance that the correctness and temporal freshness of the data streams can be efficiently verified. In addition, as data volumes keep increasing and sensors have limited resources (batteries, computational power), a lightweight solution is more desirable and practical for field deployment. The research addressed both challenges and developed a novel authentication mechanism to efficiently achieve secure, precise and fresh delivery of critical data. In a similar vein, our security research on the application of stream processing to risk management in intrusion detection systems addresses one critical challenge facing military coalition operations: how to better understand and rank the risks of high-value assets/targets in heterogeneous networks. When security incidents happen, the proposed approach allows more effective resource allocation and prioritization for further investigation. Finally, our research on modeling social influence in continuously evolving systems has applicability to collective decision-making problems that arise in military coalition operations. We believe that our framework can aid studies modeling complex social processes and untangling manipulations and biases within social environments, and can provide insights toward the design of platforms that aggregate individual opinions.

Exploitation & technology transition potential: Data stream processing is a relatively new paradigm for analytics, and one of our goals in this task is to adopt and evaluate the potential of this technology for tactical operations. Our goal is to pursue transition opportunities on three fronts: (a) knowledge transfer to US Army and UK MoD/Dstl partners; (b) contributing data stream processing capabilities to existing ITA assets that have already made a transition; and (c) enhancements to various open source and commercial offerings that make them more suitable for tactical operations. Our progress this year toward these goals has been satisfactory. Toward (a), we have made use of regular ITA meetings and other briefing opportunities to perform this knowledge transfer and exchange ideas with US and UK Government collaborators. We have also engaged in discussions with Paul Sullivan to derive scenarios and corresponding stream processing applications that provide real-time analytics and situational awareness. Toward (b) and (c), we have implemented our adaptive stream query planning framework as an extension of the Java-based JiST/SWANS network simulator. The simulator allows rapid prototyping and testing of experiments at larger scale than a real-world prototype. We also developed an initial prototype of the EdgeStreams framework using a Virtual Machine implementation on top of an existing data stream processing framework (IBM InfoSphere Streams) and demonstrated its real-time monitoring and load balancing capabilities25. On the security front, we implemented and demonstrated the proposed stream freshness authentication system on top of HBase and experimentally evaluated it using real-world benchmark data (YCSB: Yahoo! Cloud Serving Benchmark). Similarly, the Asset Risk Scoring with Mutually Reinforced Reputation Propagation system was prototyped and evaluated on a real enterprise network based on large streams of network data. We are currently planning to port our Java-based simulator modules for adaptive query planning to work on SEEP (Imperial's Java-based stream processing engine), on the ITA CORE/EMANE emulator, and on real-world mobile devices to perform more realistic performance evaluations.

25 ABM Musa, Theodoros Salonidis, Seraphin Calo, "Data Stream Processing for Hybrid Coalition Networks (HCNs): Demo", Annual ITA Fall Meeting. https://www.usukita.org/node/2583


Ongoing efforts also include interfacing the VM-based EdgeStreams framework with the ITA CORE/EMANE emulator, both to provide stream analytics capabilities to CORE/EMANE and to support realistic experimental evaluations of stream processing frameworks in an emulated hybrid network environment. Finally, we are pursuing the possibility of offering some of the above data stream processing frameworks either as standalone ITA assets or as part of a cloud-based ITA/CTA experimentation facility.

2.3.3 Technical Accomplishments

Although the work in this Project is centred around particular technologies (cloud computing, declarative networking and stream processing in the three tasks respectively), the higher-level collective accomplishment of the Project is to focus research on the network and security boundary: understanding the networked aspects of security, the subtleties and complications of their interaction, and the additional constraints that are generated. In philosophical terms, networking is the art of connecting things together and sharing information; security is the art of isolating knowledge, creating walls and restricting information flows. The work in Project 2 explores the tension between these two goals by focusing on the practical aspects of particular technologies rather than on generic abstract theories.

To that end (as examples) we can see the micro-cloud team (task 1) developing algorithms to cope with the placement restrictions as a result of security constraints; the declarative networking team (task 2) using their techniques to understand the intrusion detection properties of multipath TCP; and the network streaming team (task 3) developing dual network and security frameworks to assess the common challenge from alternative perspectives.

Overall, therefore, the work of Project 2 has its eye firmly on the applicability of the research to (shorter-term) future deployments, where addressing the twin challenges of networks and security is fundamental to the nature of the military domain.

2.3.3.1 Task 1 – Mobile Micro-Cloud

Our goal in this task is to develop cloud technologies to manage data services in tactical networks, with the aim of improving small-unit operations by providing critical, timely, mission-relevant situational awareness. The key challenge we face is to establish infrastructure in a rapid and timely manner so that intelligence can be gathered in support of Command and Control from distributed sources such as command posts, platoons, and sensor resources. We adopt a two-pronged approach to address this challenge: (a) develop key algorithmic techniques to address resource mapping issues in tactical cloud environments, and (b) conduct extensive experiments based on real-world micro-cloud scenarios to develop a deep understanding of the protocol and design issues.

In cloud environments, one of the basic problems is the efficient assignment of physical resources to user application processing and communication requests. In its full generality, this application placement problem is notoriously hard; therefore, heuristic algorithms are employed in common practice, which may unknowingly suffer from poor performance compared to the optimal solution.


In papers [1]26 and [2]27, we focus on developing practical application placement algorithms with provable performance bounds. The crux of our approach is to restrict the placement to "cycle-free" path assignments, which enables us to obtain an exact optimal algorithm for linear application graph placement and online approximation algorithms for tree application graph placement (a toy dynamic program for the linear case is sketched at the end of this subsection). In our formulation, we jointly consider node and link assignments, and we also incorporate multiple types of computational resources at nodes, as well as practical domain and conflict constraints related to security and access-control policies.

We conducted a pilot study in sub-Saharan Africa with resource-poor networks (e.g., low-bandwidth, high-latency connectivity to the core). This study found that the quality of experience for local network users when viewing videos or sharing media content is quite poor due to the network's constraints. We designed VillageCache [3]28, a system that maintains local information at the edge of the network and re-delivers copies of uploaded media content at the network edge. We time-shift media uploads to better utilize bandwidth and make video file uploads more attainable for users in poorly connected networks. We evaluated VillageCache in a lab and a test environment and found that the system saves bandwidth and provides a drastic improvement in user quality of experience compared to typical rural networks in developing regions.

Another key challenge in delivering content to devices at the edge is the mobility of the users. The current methodology for delivering content in mobile networks is to route a request from the device to the core, where the nearest edge node is assigned without taking into account the device's actual location. Due to the hierarchical nature of such networks and device mobility, current edge node assignment algorithms fail and can result in increased response latencies. In paper [4]29 we study the performance of such algorithms in a hierarchical network with about 1.5 million mobile devices and over 130 million HTTP accesses in one day, and identify that ~20% of the traffic is affected by improper assignment of edge nodes, which can result in up to 75x increased response latencies.
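The toy dynamic program below illustrates the linear-chain placement case referenced above, under strong simplifications (a complete cost matrix and no capacity, domain, or conflict constraints, all of which the actual algorithms in [1] handle); the cost values are invented:

# Optimal placement of a linear application graph (components 1..k) onto
# physical nodes, minimizing processing cost plus the cost of the links
# carrying traffic between consecutive components.

def place_linear_chain(proc_cost, link_cost):
    """proc_cost[c][n]: cost of running component c on node n.
    link_cost[m][n]: cost of the physical path carrying traffic m -> n."""
    k, N = len(proc_cost), len(proc_cost[0])
    best = list(proc_cost[0])   # best[n]: cheapest placement of the prefix ending at node n
    choice = []
    for c in range(1, k):
        prev = [min(range(N), key=lambda m: best[m] + link_cost[m][n])
                for n in range(N)]
        best = [best[prev[n]] + link_cost[prev[n]][n] + proc_cost[c][n]
                for n in range(N)]
        choice.append(prev)
    # Backtrack the optimal assignment.
    end = min(range(N), key=lambda n: best[n])
    total, placement, n = best[end], [end], end
    for prev in reversed(choice):
        n = prev[n]
        placement.append(n)
    placement.reverse()
    return total, placement

proc = [[3, 1, 4], [2, 2, 1], [5, 1, 2]]   # 3 components x 3 nodes
link = [[0, 1, 3], [1, 0, 1], [3, 1, 0]]
cost, placement = place_linear_chain(proc, link)
print("total cost:", cost, "component->node:", placement)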

2.3.3.2 Task 2 - A Declarative Infrastructure for Network and Security Management

In Task 2 we have tackled our research challenges along parallel lines of work, through close collaboration between Imperial College, IBM-US, IBM-UK and Penn State University.

Declarative Networking: We have enhanced our Declarative Networking (DN) framework in several ways. We have refined our DN language to make the description of distributed algorithms more compact and the computational model of state transitions much simpler. We have improved the scalability of our analysis approach by developing a new transition-system-based approach for the analysis of DN specifications, which has enabled us to overcome a bottleneck limitation of the ASP solver used in our previous, purely logic-based computational approach. We have also conducted a detailed survey of related techniques for the analysis of network routing protocol specifications and compared our analysis results with key approaches presented in the literature.

26 S. Wang, M. Zafer, and K. Leung, "Online Application Placement with Provable Performance Guarantees in Cloud Environments", under submission. https://www.usukitacs.com/node/2580 27 S. Wang, R. Urgaonkar, T. He, M. Zafer, K. Chan, and K. Leung, "Mobility-Induced Service Migration in Mobile Micro-Clouds", under submission. https://www.usukitacs.com/node/2669 28 P. Schmitt, R. Raghavendra, and E. Belding, "VillageCache: Media Upload Gathering and Redelivery at the Edge", under submission. https://www.usukitacs.com/node/2645 29 G. Tu, R. Ganti, M. Srivatsa, and S. Lu, "Are CDN and Distributed Computing Working Well", under submission. https://www.usukitacs.com/node/2638


Outcomes of this comparative investigation show that, although scalability is a well-known key open issue in the formal analysis of routing protocols, our approach can support the analysis task of "always converges" for the BGP protocol over network topologies with dispute wheels twice the size of those considered by existing advanced analysis approaches30. A systematic, comprehensive characterization of the computational complexity of verification in declarative distributed systems is yet to come; there are empirical results that could be lifted from specific analyses of particular algorithms, but no general results. We have provided a first stepping-stone in this direction. We have considered the case where the network is a fixed connected graph and the nodes are homogeneous and memory-bounded. In spite of the limited computational power of each single node, we have shown that verification of convergence properties for this kind of system is already PSPACE-complete in the size of the network. We have done so by showing how CTL_Q, a variant of CTL, can be used to express properties of the declarative networking system. This connection will be useful for further analysis of such systems. We have also extended our Declarative Networking framework with a simulation environment that allows developers to experiment with distributed algorithms, written in our language, over simulated networks with dynamic topologies. We have made use of this environment to support fast prototyping30 of our new distributed pattern-matching algorithm for multipath signature-based intrusion detection [5]21.

Multipath signature-based intrusion detection: As a second contribution of this task, we have developed a new multipath signature-based intrusion detection algorithm that is able to detect signature-based attacks distributed across multiple paths via MPTCP protocols. A first prototype of a distributed pattern-matching algorithm was developed in Declarative Networking and demonstrated in [8]31 using our simulation environment. During this stage we were able to detect errors in our algorithm through the analysis of simulation traces. Building on these findings, we then developed a multipath signature-based intrusion detection algorithm in C, together with a multipath TCP network environment as a testbed, to evaluate the effectiveness of our solution in real network scenarios. We have been able to demonstrate that the delay in detecting a signature changes linearly with the throughput, whereas the ratio of communications between monitors decreases as the throughput increases. These results are encouraging. The algorithm has also been proved to satisfy a set of properties, and its C implementation can be used in real MPTCP network settings.

Stopping policy information leakage in coalition environments: As a third contribution of this task, we have developed decentralized query planning protocols that preserve policy confidentiality. A query execution plan is essentially a partial order of join operations performed at different nodes. A trusted central planner, knowing all pairwise policies, is itself a potential threat to policy confidentiality. We have developed a decentralized query planning technique and proved that it always produces leakage-safe query plans. We have developed a new query-planning algorithm and tested it using a simulator that we have also developed.
Accomplishment 1: In Paper [5]21 we have identified a new type of network threat that may occur in the presence of MPTCP and for which existing signature-based intrusion detection mechanisms prove inadequate. We have proposed a distributed signature-based intrusion detection algorithm that defines the N-IDS problem in terms of a distributed exact string-matching problem. Monitors located on

30 Jiefei Ma, Franck Le, Alessandra Russo, Jorge Lobo, “Enhancing Specification and Analysis of Declarative Distributed Computing and Comparison with Existing Approaches”, Annual Fall Meeting 2013 (https://www.usukita.org/node/2517).
31 Andrei Petric, Jiefei Ma, David Wood, Frank Le, Alessandra Russo and Jorge Lobo, “A simulation environment for declarative distributed applications”, Annual Fall Meeting 2013 (https://www.usukita.org/node/2518).


different paths are modeled as automata running over partially observed input strings. Asynchronous communication among the automata allows them to maintain a global state of the string matching. Different sub-flows, carrying split portions of the signature, may be received by different monitors. Each monitor scans each received packet locally and broadcasts its automaton state to all the other monitors. The broadcast enables the monitors to synchronize their local scans, and the communication overhead is small since the information exchanged is merely automaton states. Through comprehensive experimental results we have shown that the performance of the algorithm depends only on the network throughput: the delay in detecting the signature grows linearly with the throughput, whereas the communication ratio decreases as the throughput increases.

Accomplishment 2: As described in our first paper, “Enforcement of Autonomous Authorizations in Collaborative Distributed Query Evaluation” [6]32 (under review by IEEE Transactions on Knowledge and Data Engineering, 2nd revision, two reviewers have accepted the paper as is), and subsequent paper, “Preventing Authorization Information Leakage in Collaborative Distributed Query Processing” [8]31 (under review by the ESORICS-14 conference), we have identified, formulated, and investigated a fundamentally important new data protection problem in coalition environments: the discovery of policy information leakages and how to stop them. Our main technical contributions are as follows. (a) This is a new problem. We found that a coalition network enforcing all of its data confidentiality protection policies (amid data sharing among parties) can still suffer from policy information leakages. Such leakages can expose the intention of the querying party and can seriously hurt the trust relationships between coalition parties. (b) We formulated this problem formally; to the best of our knowledge, this is the first formulation of the problem in the literature. The formulation rigorously defines “authorization info leakages” and “LA-safe query plans.” (c) We developed two new query planning algorithms: the first introduces a new join operator, split-join, which significantly reduces communication cost by pushing partial query computation to data sources without violating any access control policies; the second, denoted “V´-compliant query planning”, aims to stop unacceptable policy information leakages. (d) We proved a key theorem stating that V´-compliant query planning always produces LA-safe query plans. (e) We developed a new simulator in Java that simulates the collaborative data querying aspect of a coalition network, and we thoroughly evaluated the two query planning algorithms against over 10,000 queries from multiple perspectives, including the number of acceptable leakages over time, the impact of query selectivity, the impact of the overlapping ratio of authorized views, the impact of link costs, the impact of network types, the query response time, the benefits of split joins, and the tradeoffs between acceptable leakages and query processing costs.

Accomplishment 3: Paper [7]33 describes our enhanced declarative framework for the specification, simulation and analysis of distributed applications.
Specifically, it presents new features of our extended DN language and the new transition system-based analysis approach, and it shows how convergence queries can be answered by analyzing the resulting transition graph. These queries include not only the standard properties of whether the protocol sometimes, always or never converges, but also the detection of oscillations and the ability to identify whether oscillations are transient or permanent. The simulation environment is also presented, together with results on the application of the DN framework to two distributed problems: pattern formation in multi-robot systems and distributed pattern matching.
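To make the state-exchange idea in Accomplishment 1 concrete, the following toy Java sketch (the two-monitor setup and all names are ours, not the C implementation described above) runs a KMP-style matching automaton whose state one monitor hands to the next, so a signature split across two sub-flows is still detected.

    // Toy two-monitor sketch: a KMP automaton state is "broadcast" from the
    // monitor that saw the first sub-flow to the monitor that sees the next,
    // so a signature split across MPTCP paths is still detected.
    class KmpAutomaton {
        private final String pattern;
        private final int[] fail;   // classic KMP failure function
        KmpAutomaton(String p) {
            pattern = p;
            fail = new int[p.length()];
            for (int i = 1, k = 0; i < p.length(); i++) {
                while (k > 0 && p.charAt(i) != p.charAt(k)) k = fail[k - 1];
                if (p.charAt(i) == p.charAt(k)) k++;
                fail[i] = k;
            }
        }
        // Advance from 'state' over a payload chunk; -1 signals a full match.
        int scan(int state, String chunk) {
            for (char c : chunk.toCharArray()) {
                while (state > 0 && c != pattern.charAt(state)) state = fail[state - 1];
                if (c == pattern.charAt(state)) state++;
                if (state == pattern.length()) return -1;
            }
            return state;
        }
    }

    public class MultipathMonitors {
        public static void main(String[] args) {
            KmpAutomaton a = new KmpAutomaton("ATTACK");
            String subflow1 = "xxATT";             // bytes seen by monitor 1
            String subflow2 = "ACKyy";             // bytes seen by monitor 2
            int state = a.scan(0, subflow1);       // monitor 1 scans, broadcasts state
            int result = a.scan(state, subflow2);  // monitor 2 resumes from that state
            System.out.println(result == -1 ? "signature detected" : "no match");
        }
    }

Note how only a single integer crosses between monitors, which is why the communication overhead observed in the experiments is dominated by automaton-state messages rather than payload.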

32 Qiang Zeng, Mingyi Zhao, Peng Liu, Seraphin Calo, Jorge Lobo, “Enforcement of Autonomous Authorizations in Collaborative Distributed Query Evaluation”, IEEE TKDE, 2013 (under review).
33 Jiefei Ma, Frank Le, Alessandra Russo, and Jorge Lobo, Declarative Framework for Specification, Simulation and Analysis of Distributed Applications, IEEE TKDE, 2014 (submitted).


2.3.3.3 Task 3 - Data Stream Processing in Hybrid Coalition Networks (HCNs)
Our research in this task during this year has been conducted in a network thread and a security thread. In the network thread, we introduced frameworks, algorithms and system prototypes that enable stream processing applications to operate efficiently in distributed and dynamic hybrid coalition network environments. In particular, we investigated three fundamental problems: load balancing of streaming data and computations, adaptive query planning, and optimal integration of real-time data streams with archived cached data contents. In the security thread, we introduced a framework for authenticating the freshness of data when streaming computations are outsourced to untrusted or partially trusted third parties (i.e., a public cloud or a coalition member), and a risk analysis framework that analyzes large volumes of security alerts and computes the reputation of diverse entities based on domain knowledge and interactions between entities.

Network thread: Adaptive request routing and content caching in hybrid networks. Unlike the traditional stream processing model, which operates exclusively on streamed real-time data, a wide range of stream analytic applications for the tactical edge requires combining real-time stream data with archived content data. A representative example is a situational awareness scenario where real-time streamed data tuples (e.g., faces or objects extracted from images) form requests that need to be matched against content data (e.g., known face/object images) stored in an archival database in the back-end cloud. A major challenge is how to jointly process streamed data and archival content data in an optimal manner, given that the content data can be distributed across the hybrid coalition network. In our research described in [9]34 35, we addressed this general problem in the context of a hybrid network that supports in-network caching: MANET users can access all content data remotely at a back-end server via a cellular infrastructure, but part of the content data may also be cached and accessed locally at certain nodes within the MANET. Although prior work has demonstrated the dramatic benefits of in-network content caching in wired networks (e.g., Internet CDNs), there is limited research on the challenges and potential benefits introduced by caching in hybrid coalition networks, where it may be even more beneficial to offload scarce communication resources by caching requested content in the field for rapid and efficient retrieval later by others deployed there.

In our research we addressed the following fundamental question: how should nodes route their streaming data tuple requests between the cellular infrastructure and an in-MANET cache, and what in-MANET caching policy should be adopted, to minimize expected overall network delay? We mathematically formulated this as a joint caching and routing problem, modeling the cellular path as either i) a congestion-insensitive fixed-delay path or ii) a congestion-sensitive path modeled as an M/M/1 queue. We proved that, under the assumption of stationary, independent requests, it is optimal to adopt static caching (i.e., to keep a cache’s content fixed over time) based on content popularity.
We also proved that it is optimal to route to in-MANET caches for content cached there, and that requests for the remaining content should be routed via the cellular infrastructure in the congestion-insensitive case, with traffic split between the in-MANET cache and the cellular infrastructure in the congestion-sensitive case. Finally, we developed a low-complexity distributed algorithm for the joint routing/caching problem and demonstrated its efficacy via simulation.
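For intuition, the two cellular-path models above can be written compactly. The display below is our own sketch under assumed notation (x_i in {0,1} indicates that content i is cached in the MANET, λ_i is its request rate, d_m the in-MANET access delay, d_c the fixed cellular delay, and μ the cellular service rate); the paper's exact formulation may differ.

    \[
    D_{\mathrm{fixed}}(x) = \sum_i \lambda_i \bigl( x_i d_m + (1 - x_i)\, d_c \bigr),
    \qquad
    D_{\mathrm{M/M/1}}(x) = \sum_i \lambda_i x_i d_m + \frac{\Lambda}{\mu - \Lambda},
    \quad \Lambda = \sum_i \lambda_i (1 - x_i),
    \]

subject to a cache capacity constraint on x and Λ < μ. The congestion-insensitive objective decomposes per content, which is consistent with the popularity-based static caching result quoted above; in the M/M/1 case the queueing term Λ/(μ − Λ) couples the contents, which is what makes traffic splitting attractive.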

34 Mostafa Dehghan, Anand Seetharam, Ting He, Theodoros Salonidis, Jim Kurose, and Don Towsley, “Optimal Caching and Routing in Hybrid Networks,” submitted to IEEE Milcom 2014, https://www.usukita.org/node/2724
35 Jim Kurose, Dan O’Keeffe, Peter Pietzuch, Theodoros Salonidis, Anand Seetharam, Don Towsley, “Adaptive Data Stream Processing for Hybrid Coalition Networks: Challenges and Initial Solutions,” Annual ITA Fall Meeting, Palisades, Sep. 2013. https://www.usukita.org/node/2543


Our ongoing work on this topic is focused on two threads. In the first thread we aim to develop distributed algorithms for the joint routing/caching problem that dynamically adapt to changing content popularity at short time scales. In the second thread, we seek to understand the complexity of the general problem. To this end, we have derived complexity results for the most general case of multiple in-MANET caches and heterogeneous users. We have proven that the joint routing/caching problem is NP-hard, both when the un-cached path has a fixed delay and when it has a congestion-sensitive delay. We have also proven that certain special cases of the fixed-delay setting (two in-MANET caches, and the case where each mobile user accesses just one file) can be solved in polynomial time. Based on these insights, we have developed an approximation algorithm that is within a factor of two of optimal.

Network thread: Adaptive stream query planning in mobile ad hoc networks (MANETs). In this work we introduced a framework for adaptive planning of streaming queries in MANETs. Streaming queries are expressed as complex dataflow graphs that support a wide range of stateful or stateless operations on various types of real-time data. The database community has proposed query planning techniques that adapt to changes in data characteristics (load profiles and operator selectivity/cardinality), but such approaches target single-server or datacenter environments where network variability is relatively low and centralized control is available. The distributed and highly dynamic nature of MANETs makes it extremely challenging to deploy and execute streaming queries efficiently and reliably: while a query plan executes, changes in the network mean that any fixed plan will eventually become outdated, leading to low performance or outright failure of processing. Additional challenges include providing strong reliability guarantees such as exactly-once delivery, and supporting complex stream processing queries involving, e.g., database join operators or the custom user-defined operators needed for face recognition applications.

To address these challenges, in 35 we introduced a new network-aware framework for adaptive query planning in MANETs. The framework consists of two phases: (i) query plan generation, where we pre-compute and deploy a replicated query plan with enough redundancy to cope with MANET dynamics, and (ii) dynamic tuple routing, where operator-hosting nodes monitor MANET costs and use a distributed shortest-path algorithm to route in-flight tuples toward the lowest-cost downstream query replica operator. Our initial simulation experiments showed that this approach achieves more effective stream processing in dynamic networks than static query planning, which computes a single deployment plan for a given dataflow graph. Since 35, we have extended our framework to enable streaming queries with both stateless operators and multi-input stateful operators such as streaming database window joins. To support multi-input stateful operators, we introduced a novel lightweight joint cost measurement technique that enables join operator replicas to coordinate state migration in response to changes in network conditions. Our framework also enables a fault-tolerance model with strong reliability guarantees of exactly-once delivery. More specifically, we introduced novel fault-tolerance techniques which extend existing upstream backup techniques to the case of replicated query plans and exploit the diversity available in such plans to speed up failure recovery.

Our subsequent focus has been on implementing and evaluating the above design choices through simulation. To this end, we have implemented our adaptive stream query planning framework as an extension of the Java-based Jist/Swans network simulator, which allows rapid prototyping and testing of experiments at larger scale than a real-world prototype. Our ongoing experiments include: investigation of the performance of different replication factors for networks of different size and dynamicity/mobility; investigation of the effect of different network measurement mechanisms and heuristic cost metrics (e.g., hop count, link bandwidth) on system throughput and latency; performance evaluation of our fault-tolerant delivery mechanisms, in particular the effectiveness of our


path-diverse recovery technique in reducing recovery time; and performance comparison with existing network-aware reliable stream processing techniques, including tuple replication and dynamic query re-planning. We are also planning to port our Java-based simulator modules for adaptive query planning to the ITA CORE-EMANE emulator and to real-world mobile devices for more realistic performance evaluations.
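As an illustration of the dynamic tuple-routing phase, the sketch below (hypothetical class and replica names, our own) captures the per-tuple decision each operator-hosting node makes: route the in-flight tuple to the downstream replica with the lowest currently measured path cost.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: each operator-hosting node forwards a tuple to the
    // cheapest currently reachable replica of the next operator in the plan.
    class ReplicaRouter {
        // measured MANET path cost to each replica of the downstream operator
        private final Map<String, Double> pathCost = new HashMap<>();

        void updateCost(String replica, double cost) { pathCost.put(replica, cost); }

        String route() {
            return pathCost.entrySet().stream()
                    .min(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElseThrow(() -> new IllegalStateException("no replica reachable"));
        }
    }

    public class TupleRoutingDemo {
        public static void main(String[] args) {
            ReplicaRouter router = new ReplicaRouter();
            router.updateCost("joinReplicaA", 3.2);  // e.g., hop count or latency
            router.updateCost("joinReplicaB", 1.7);
            System.out.println("send tuple to " + router.route()); // joinReplicaB
        }
    }

In the full framework the costs come from the MANET measurement layer and a distributed shortest-path computation; here a plain map stands in for both.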

Network thread: Load balancing of streaming data and computations in hybrid coalition networks. A basic problem in stream processing applications at the tactical edge is deciding where and when each computation takes place. For example, in a situational awareness scenario, a compute-intensive stream processing operation such as face/object detection can be executed at the mobile sensor, at an edge network node, or in the cloud, and this decision may depend on the availability of processing and communication resources. In addition, the computation and communication resources must be shared efficiently among the multiple stream applications using the network. A potential approach for static versions of this problem is to use existing general-purpose graph placement methods, which perform an optimal mapping of stream applications onto network resources. This approach typically requires solving complex static optimization problems and incurs high (re-)computation and communication (migration) costs whenever computations need to move.

We have introduced a framework called EdgeStreams that exploits a special property of stream processing: we can move data in addition to computations. The core idea is the novel concept of an augmented stream application graph, whose edges model the cost of moving and processing data along different computation options (including the decision of whether to execute a computation locally or offload it to another node). This framework enables a unified way of treating dynamic data and computation decisions as a routing problem, which can be solved in a distributed manner. Furthermore, routing on such augmented application graphs can be performed in logarithmic time (as opposed to polynomial time on arbitrary graphs). We also showed that if the edge costs of augmented graphs are exponential functions of utilization, incoming stream applications are placed in a balanced fashion across both computation and communication resources, and the worst-case solution is within a logarithmic factor of the optimal (unrealistic) case where all incoming stream applications and their requirements are known in advance. In 36, we developed an initial prototype of the EdgeStreams framework using a Virtual Machine implementation on top of IBM InfoSphere Streams and demonstrated its real-time monitoring and load balancing capabilities in a face detection application on sensed video streams. Ongoing efforts include experimental evaluation of the EdgeStreams framework using this prototype implementation.
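The following fragment (our own toy, not the EdgeStreams code) illustrates the exponential edge-cost idea: each placement option is priced by an exponential function of resource utilization, so a min-cost choice naturally steers new work away from hot spots.

    // Toy illustration of exponential congestion pricing on an augmented graph:
    // an edge is cheap while its resource is idle and steeply priced near
    // capacity, so min-cost routing balances load across options.
    public class AugmentedEdgeCost {
        static double cost(double used, double capacity) {
            return Math.pow(2.0, used / capacity) - 1.0;  // exponential in utilization
        }
        public static void main(String[] args) {
            double runLocal = cost(0.9, 1.0) + 0.0;   // busy local CPU, no transfer
            double offload  = cost(0.2, 1.0) + 0.3;   // idle remote CPU + link cost
            System.out.println(runLocal < offload ? "run locally" : "offload");
        }
    }

The exponential form is what yields the logarithmic competitive guarantee mentioned above; a linear cost would not penalize near-saturated resources strongly enough.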

Security thread: Outsourcing multi-version key-value stores with verifiable data freshness. In [10]37, 38 39 we identified one of the major gaps in current solutions for outsourced data streaming databases: the lack of efficient and practical mechanisms to ensure the temporal freshness of the data (i.e., that an

36 ABM Musa, Theodoros Salonidis, Seraphin Calo, Data Stream Processing for Hybrid Coalition Networks (HCNs): Demo, Annual ITA Fall Meeting, https://www.usukita.org/node/2583
37 Yuzhe Tang, Ting Wang, Xin Hu, Jiyong Jang, Peter Pietzuch, Ling Liu, Authentication of Freshness for Outsourced Multi-Version Key-Value Stores, submitted to the 2014 Annual Computer Security Applications Conference (ACSAC). https://www.usukita.org/node/2640


untrustworthy third-party server cannot omit the latest data or return out-of-date data). While a large corpus of work has focused on authentication of streaming data, efficient freshness authentication remains a challenging and understudied problem. Our contributions to this new and important problem are as follows: 1) We designed an authentication framework for multi-version key-value data streams. The framework consists of three interacting parties in different administrative domains: data owner, data user and public cloud. A formal model was created, and a lightweight authentication scheme was designed that is publicly verifiable and provides freshness and correctness guarantees for outsourced data streaming databases. 2) We developed a new data structure called the INCBM-Tree, which embeds Bloom filters into a Merkle hash tree. An incremental construction algorithm was also designed to build the INCBM-Tree in order to support efficient authentication and verification of high-volume data streams. A theoretical bound on the error rate of the INCBM-Tree was also derived and used to prove that its expected cost is constant. 3) We implemented the proposed system on top of HBase and experimentally evaluated it using real-world benchmark data (YCSB: Yahoo! Cloud Serving Benchmark). The results show that the INCBM-Tree achieves an order of magnitude higher throughput for data stream authentication than existing work. For data owners and end users with limited computing power, the INCBM-Tree can be a practical solution for authenticating the freshness of outsourced data while reaping the benefits of broadly available cloud services. An initial prototype of the system has been implemented and was demonstrated at the ICDE 2014 conference 38.
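A minimal structural sketch of the idea, with all names and parameters ours: each tree node carries a Bloom filter over the keys in its subtree, and the filter is bound into the node's Merkle hash so the server cannot misreport it.

    import java.security.MessageDigest;
    import java.util.BitSet;

    class IncbmNode {
        final BitSet bloom = new BitSet(1024); // summarizes keys in this subtree
        byte[] hash;                           // Merkle hash binding children + bloom

        void addKey(String key) {              // two toy hash positions per key
            bloom.set(Math.floorMod(key.hashCode(), 1024));
            bloom.set(Math.floorMod((key + "#2").hashCode(), 1024));
        }
        boolean mayContain(String key) {       // false = key definitely absent
            return bloom.get(Math.floorMod(key.hashCode(), 1024))
                && bloom.get(Math.floorMod((key + "#2").hashCode(), 1024));
        }
        void seal(byte[] childDigests) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(childDigests);
            md.update(bloom.toByteArray());    // the filter itself is authenticated
            hash = md.digest();
        }
    }

    public class IncbmSketch {
        public static void main(String[] args) throws Exception {
            IncbmNode node = new IncbmNode();
            node.addKey("key42@version7");
            node.seal(new byte[32]);           // stand-in for child hashes
            System.out.println(node.mayContain("key42@version7")); // true
            System.out.println(node.mayContain("key42@version9")); // very likely false
        }
    }

A verifier walking down such a tree can use the authenticated filters to rule out the existence of a fresher version without touching most leaves, which is one way to see where a constant expected verification cost can come from.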

Security thread: Asset Risk Scoring in Enterprise Networks with Mutually Reinforced Reputation Propagation. The research in this thread considered stream processing applications that address a critical challenge facing military coalition operations: how to better understand and rank the risks of high-value assets/targets in a heterogeneous network. Current threat protection systems, e.g., intrusion detection and prevention (IDS/IPS) systems, regularly generate thousands of alerts every day. Plagued by false positives and irrelevant events, it is often impractical to analyze and respond to all the alerts. The main challenges in addressing this problem are 1) how to identify important risk factors from the enormous number of noisy alerts generated by typical IDS/IPS systems, which are labor intensive and time consuming to investigate, and 2) how to efficiently capture the mutually reinforced relationship between different entities’ risks and use the framework to process the large streams of data received from different data sources.
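The mutual-reinforcement idea can be illustrated with a HITS-style iteration (a toy of our own; the framework described below is more general): host risk and external-entity risk are recomputed from each other over the observed interaction graph until the scores converge.

    import java.util.Arrays;

    // Toy mutually reinforced risk propagation: a host is riskier if it
    // interacts with risky external entities, and an external entity is
    // riskier if risky hosts interact with it (HITS-style iteration).
    public class RiskPropagation {
        public static void main(String[] args) {
            // adjacency: hosts x external entities (1 = observed interaction)
            int[][] a = { {1, 1, 0}, {0, 1, 0}, {0, 0, 1} };
            double[] host = {1, 1, 1}, ext = {1, 1, 1};
            for (int iter = 0; iter < 50; iter++) {
                double[] h = new double[3], e = new double[3];
                for (int i = 0; i < 3; i++)
                    for (int j = 0; j < 3; j++) {
                        h[i] += a[i][j] * ext[j];   // host risk from entities
                        e[j] += a[i][j] * host[i];  // entity risk from hosts
                    }
                host = normalize(h);
                ext = normalize(e);
            }
            System.out.println("host risk scores: " + Arrays.toString(host));
        }
        static double[] normalize(double[] v) {
            double s = 0;
            for (double x : v) s += x * x;
            s = Math.sqrt(s);
            for (int i = 0; i < v.length; i++) v[i] /= s;
            return v;
        }
    }

The point of the iteration is that an entity with a modest discrete alert score can still surface as risky because of whom it interacts with, which is the behavior the full system exploits at enterprise scale.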

38 Yuzhe Tang, Ting Wang, Xin Hu, Reiner Sailer, Ling Liu, Peter Pietzuch, Secure Outsourcing of Big Key-Value Storage with Freshness Authenticity, IEEE International Conference on Data Engineering (ICDE) 2014, https://www.usukita.org/node/2584
39 Yuzhe Tang, Ting Wang, Xin Hu, Reiner Sailer, Peter Pietzuch, INCBM-TREE: Outsourcing Intensive Data Stream with Authenticated Freshness, Annual ITA Fall Meeting 2013, short paper. https://www.usukita.org/node/2542


In 40 we proposed a formal framework to model the dynamics of risk based on the mutual reinforcement principle, and implemented a prototype system that collects intensive data streams from multiple data sources and correlates large amounts of alerts to derive meaningful risk scores and rankings for the different entities in the network. The key technical contributions include: 1) a formal framework to analyze the reputation and the risk of diverse entities in a heterogeneous network (e.g., hybrid coalition networks); 2) the design and implementation of a monitoring infrastructure to collect data streams from multiple sources; 3) scalable algorithms that explore the network structure and identify potentially risky entities that may be overlooked by a discrete risk score; and 4) a highly flexible system that can incorporate data sources from multiple domains, experimentally evaluated with real network traces from a large enterprise. The system helps analysts make informed decisions on the allocation of resources and the prioritization of further investigation, so that proper defense mechanisms can be developed at an early stage. In addition, the system can be fully distributed and executed in parallel, and is thus capable of scaling to large networks.

Security thread: Quantifying Herding Effects in Crowd Wisdom. In many diverse settings, the aggregated opinions of others play an increasingly dominant role in shaping individual decision making. One key prerequisite of harnessing the “crowd wisdom” is the independence of individuals’ opinions, yet in real settings collective opinions are rarely simple aggregations of independent minds. Recent experimental studies document that disclosing prior collective opinions distorts individuals’ decision making as well as their perceptions of quality and value, highlighting a fundamental disconnect from current modeling efforts: how should we model social influence and its impact on systems that are constantly evolving? In [11]41, we developed a mechanistic framework to model the social influence of prior collective opinions (e.g., online product ratings) on subsequent streams of individual decision making. We find our method successfully captures the dynamics of rating growth, helping us separate social influence bias from inherent value. Using large-scale longitudinal customer rating datasets, we demonstrate that our model not only effectively assesses social influence bias, but also accurately predicts the long-term cumulative growth of ratings based solely on early rating trajectories. We believe our framework will play an increasingly important role as our understanding of social processes deepens. It promotes strategies to untangle manipulations and social biases, and provides insights towards a more reliable and effective design of online rating systems.

2.3.4 References for Project 2
[1] Online Application Placement with Provable Performance Guarantees in Cloud Environments, S. Wang, M. Zafer, and K. Leung, Under Submission, https://www.usukitacs.com/node/2580
[2] Mobility-Induced Service Migration in Mobile Micro-Clouds, S. Wang, R. Urgaonkar, T. He, M. Zafer, K. Chan, and K. Leung, Under Submission, https://www.usukitacs.com/node/2669
[3] VillageCache: Media Upload Gathering and Redelivery at the Edge, P. Schmitt, R. Raghavendra, and E. Belding, Under Submission, https://www.usukitacs.com/node/2645

40 Xin Hu, Ting Wang, Marc Ph. Stoecklin, Douglas L. Schales, Jiyong Jang, Reiner Sailer, Asset Risk Scoring in Enterprise Network with Mutually Reinforced Reputation Propagation, IEEE International Workshop on Cyber Crime (IWCC), co-located with the IEEE Oakland conference, May 2014, San Jose, CA. https://www.usukitacs.com/node/2660
41 Ting Wang, Dashun Wang, Fei Wang, Quantifying Herding Effects in Crowd Wisdom, ACM KDD 2014, Aug 2014, New York, NY. https://www.usukita.org/node/2743


[4] Are CDN and Distributed Computing Working Well, G. Tu, R. Ganti, M. Srivatsa, and S. Lu, Under Submission, https://www.usukitacs.com/node/2638
[5] Jiefei Ma, Frank Le, Jorge Lobo, Alessandra Russo, “Detecting Distributed Signature-based Intrusion: The Case of Multi-Path Routing Attacks”, CCS 2014 (under review).
[6] Qiang Zeng, Mingyi Zhao, Peng Liu, Seraphin Calo, Jorge Lobo, “Enforcement of Autonomous Authorizations in Collaborative Distributed Query Evaluation”, IEEE TKDE, 2013 (under review).
[7] Jiefei Ma, Frank Le, Alessandra Russo, and Jorge Lobo, Declarative Framework for Specification, Simulation and Analysis of Distributed Applications, IEEE TKDE, 2014 (submitted).
[8] Mingyi Zhao, Peng Liu, Qiang Zeng, Jorge Lobo, Preventing Authorization Information Leakage in Collaborative Distributed Query Processing, ESORICS’14, under review, https://www.usukitacs.com/node/2725
[9] Mostafa Dehghan, Anand Seetharam, Ting He, Theodoros Salonidis, Jim Kurose, and Don Towsley, “Optimal Caching and Routing in Hybrid Networks,” submitted to IEEE Milcom 2014, https://www.usukita.org/node/2724
[10] Yuzhe Tang, Ting Wang, Xin Hu, Jiyong Jang, Peter Pietzuch, Ling Liu, Authentication of Freshness for Outsourced Multi-Version Key-Value Stores, Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) 2014, https://www.usukita.org/node/2640
[11] Ting Wang, Dashun Wang, Fei Wang, Quantifying Herding Effects in Crowd Wisdom, ACM KDD 2014, Aug 2014, New York, NY. https://www.usukita.org/node/2743


2.4 Project 3 – Security for Distributed Services

2.4.1 Introduction One of the key challenges in coalition operations is to foster collaboration and support flexible information exchange amongst coalition members while protecting sensitive information from misuse. We envision that in such environments, distributed services that span multiple coalition members will collaboratively gather information from multiple sources and filter, fuse and query them with the goal of meeting various mission objectives. In this project our main focus is to enable such secure distributed services that perform computations over potentially sensitive information from multiple coalition domains.

This project sets out to address two key areas that are critical to security in distributed coalition services: • The development of theory and practice to support fine grained tracking of information release in distributed coalition services, and • The development of novel cryptographic primitives to support outsourced and distributed computations in hybrid coalition networks

Task 1 - Secure Information Flows is developing techniques to track information release in distributed coalition services. In particular, the focus is on tracking “aggregate” information released across two or more information flows between different coalition settings. The task explores computational models that are amenable to quantitative information flow analysis in various coalition settings. It will initially develop information release tracking models in a simple two-party client-server setting and then expand the scope to multi-party coalition settings. These models will be enhanced to cover evolving and uncertain secrets. This would allow (for example) fine-grained tracking of information release across coalition domains and support knowledge-based access control policies.

Task 2 - Delegated, Outsourced and Distributed Computation is developing security technologies which allow data from one security domain to be processed in a different security domain in a way which respects the security requirements of the original security policy. The task investigates potential solutions based on homomorphic cryptography and its related primitives; homomorphic cryptography allows data to be protected in such a way that certain processing can be performed on the data without losing the security guarantees of the cryptography. This would allow (for example) data processing to be outsourced from a resource-constrained operational device to an untrusted data service in the hybrid network (via a commercial cellular network) without compromising security.

2.4.2 Research Impact
Technical merit: The key technical contributions for the past year include:
• Acceleration of Fully Homomorphic Encryption (FHE) using Field Programmable Gate Arrays (FPGAs), which for the first time allows the use of FHE in practice.
• New models and metrics for quantifying the information release of dynamic secrets, which are secrets (like one's location, passwords, and encryption keys) that change over time. We used the model to prove some surprising results, e.g., that security can diminish if change is too frequent.


• Developed a new, general-purpose programming language called Wysteria, with which one can write generic, mixed-mode secure multiparty computations (SMCs), resulting in about a 30x speedup. Wysteria's implementation has been made available to the academic community.
• Accelerated implementation of verifiable keyword search on encrypted data, whose performance is only 10x slower than the UNIX Global Regular Expression Parser (GREP) utility operating on unencrypted text.
• Developed a new technique for SMC that uses the RAM model, rather than the circuit model, while preserving the same security guarantees. For sub-linear computations and repeated queries over a static dataset, this approach results in orders-of-magnitude speedups.
• Developed a general-purpose programming language for writing authenticated data structures (ADSs), which permit outsourcing storage and processing while verifying that query results are computed correctly (i.e., the storage provider is not lying about the result). Our language subsumes all prior work while retaining the same performance guarantees.
• Generalized homomorphic MACs for arithmetic circuits, allowing a public verifier to validate computations on authenticated data without knowledge of the secret key, and allowing the secret-key owner to verify the validity of the computations without needing to know the original (authenticated) inputs.

Synergistic value of collaboration: Research activities in this project were performed jointly between IBM-US, IBM-UK, UMD, RHUL and CUNY, with substantial guidance from research staff in CESG and Dstl. One of the hallmarks of this collaboration is the joint effort on accelerating FHE using FPGAs, led by RHUL and IBM-UK with active participation from all of the project members. This effort started with a research paper on realizing FHE by IBM-US, followed by several extensions (e.g., adding verifiability) by CUNY and UMD, followed by an implementation of the FHE library by IBM-US and software (randomization reuse) and hardware (FPGA) acceleration by RHUL and IBM-UK. Yet another example is the joint effort led by CUNY on an implementation of verifiable keyword search that is now hosted on the IBM-US cloud infrastructure. This effort started with a research paper by IBM-US, followed by algorithmic extensions and an implementation by CUNY, followed by hosting on the ITA experimentation facility by IBM-US.

In addition to collaboration within the project, P3 researchers have worked closely with researchers across TA5 and TA6. Examples include work with P1 on security issues in multi-path TCP, and work with P2 that has resulted in the quantification of security issues in location-based services offered under the micro-cloud paradigm. Collaboration with TA6 researchers continued in BPP13, resulting in an extension of the Controlled English store with trust assessment.

Scientific challenge: The main challenge in this project comes from the need to support distributed coalition services that are both computationally efficient (e.g., in a hybrid tactical network) and sufficiently secure (e.g., adequately addressing the challenges arising from adversaries and ad hoc team formation in coalition networks). The project operates under the premise that coalition services require gathering, querying and processing data from disparate coalition domains. Our work in task 1 examines the problem of information aggregation arising from answering multiple queries (two or more). The goal here is to develop the theory and practice of quantifying the aggregate information released across multiple queries, and to develop solutions that throttle queries (e.g., accept/reject them) based on information release policies. Task 2 examines the problem of outsourced execution of queries on untrusted (but presumably resourceful) coalition entities. The goal here is not only to develop cutting-edge scientific research in cryptography, as can be inferred from the quality of its publication venues, but also to provide accelerated implementations of novel cryptographic primitives that allow them to be used in practice. Making


progress here requires novel techniques (e.g., based on homomorphic encryption) with rigorous security analysis, as well as foundational questions concerning the limits of secure communication and computation.

Military relevance: A defining feature of a “coalition” is the operational deployment of varied information systems and networks with distinct security policies. Data might originate from one network and be processed by an intermediate coalition party; the results may in turn be made available to a third coalition partner. Further complicating matters, there may be wide variability in computational power, available power resources, and bandwidth between different members of the coalition. The aim of this project is to develop solutions that quantify and analyze cross-domain information flows, and to advance the state of cryptographic techniques (such as homomorphic cryptography) for ensuring security in this setting.

Exploitation & technology transition potential: As a result of this work, we plan to exploit the research outcomes through the development of open-source libraries and ITA transition projects.

In April 2013 IBM-US released an FHE library42 in C++ with an open challenge to the academic community to improve its performance. Since then, several extensions have been developed using BGV batching techniques 43 that allow the solution to be scaled on devices such as Field Programmable Gate Arrays (FPGAs). To illustrate how FHE can be used in practice, we have developed a demonstration of secure two-party oblivious transfer in the context of a military ‘blue on blue’ avoidance scenario44.

While FHE offers a method to securely outsource arbitrary computations, we have also studied verifiable outsourcing of restricted classes of functions (such as Fast Fourier transforms, polynomial multiplication, and arithmetic circuits). To illustrate this capability we have developed an implementation of verifiable keyword search over outsourced text documents. A CUNY-led effort has deployed the verifiable keyword search solution on IBM-US cloud infrastructure45; standard performance benchmarks indicate only a 10x slowdown relative to the GREP (Global Regular Expression Parser) utility, which operates on unencrypted text.

In addition, several algorithms for tracking information release have been open sourced as extensions to probabilistic programming languages46. These solutions support information release tracking over dynamic secrets, mixed-mode SMC computations with guarantees on aggregate information release, and game-theoretic models of information release (between an information provider and an information consumer).

42 https://github.com/shaih/HElib
43 Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. (Leveled) fully homomorphic encryption without bootstrapping. In Shafi Goldwasser, editor, ITCS, pages 309-325. ACM, 2012.
44 Accelerating Fully Homomorphic Encryption. https://www.usukitacs.com/node/2728.
45 https://bitbucket.org/wes_ccny/verifiable-delegation
46 https://github.com/plum-umd/qif


We are also engaged in an ongoing transition project on “Policy Controlled Database Federation”, which is developing a database proxy based on the Gaian Database and the Policy Management Toolkit. The proxy will allow fine-grained access to data in multiple distributed databases; authentication and access control are key aspects of its design. Its first instantiation is for the U.S. Army Research Laboratory (ARL) multimodal signature database (MMSDB). Another transition project, “Enabling Unattended Asset Interoperability Using Controlled English”, will involve several ITA technologies, including the Controlled English (CE) Store, the Virtual Information Exchange (VIE), the Policy Management Toolkit (PMT), and the Distributed State Machine (DSM). The scenario highlighted to show the capabilities of these technologies involves sharing information across different domains of authority, with authentication and access control capabilities.

2.4.3 Technical Accomplishments
The project has made substantial technical progress in the last year, resulting in high quality publications in leading conference venues (including three papers in the IEEE Security and Privacy Symposium 2014, two papers in the Public Key Cryptography Conference 2014 and one paper in EuroCrypt 2014). In task 1 we have obtained fundamental results on the quantification of aggregate information release over time-varying secrets, and used the insights to study two applications: security in multi-path TCP and secure location-based services on micro-clouds. In task 2, besides accelerating cryptographic primitives such as FHE and verifiable keyword search, we have obtained foundational results on authenticated data structures, which allow an untrusted prover to carry out operations whose results can be verified as authentic by a qualified verifier.

2.4.3.1 Task 1: Secure Information Flow Since the beginning of BPP11 we have been working on methods for protecting information flows in coalition systems. Two themes of research have been (1) tracking information release quantitatively, to use it as a basis for a security policy, and (2) using secure multiparty computation to reduce release in the first place, but nevertheless tracking this release using the same techniques. We have considered several applications including security-oriented query analysis, security analysis of multi-path TCP and secure location-based services in a micro-cloud.

In BPP13 we have deepened and generalized this work. Our earlier work on mobility traces reasoned about a particular sort of sensitive information flow in a particular setting; the community still lacks strong theoretical foundations for quantifying the information flowing from dynamic (time-varying) secrets. To fill this gap, we developed formal models and metrics for this purpose in our paper [1], which appeared in IEEE S&P'1447. In particular, our model of information flow describes probabilistic, interactive systems (sensor networks are a good example), and we consider adaptive adversaries, whose interactions with the system can influence the system and be influenced by it. We built an analysis based on this model and ran a series of experiments to understand how different parameters affect information flows (the code and results are available online). In particular:

47 P. Mardziel and M. Hicks, Quantifying Information Flow for Time Varying Secrets, IEEE Security and Privacy Symposium, 2014, https://www.usukitacs.com/node/2566.


• Frequent change of a secret can increase leakage, even though intuition might initially suggest that frequent changes should decrease it. The increase occurs when there is an underlying order that can be inferred and used to guess future (or past) secrets (see the toy sketch after this list).
• Wait-adaptive adversaries, who can carefully choose when to exploit a target system, can derive significantly more gain than adversaries who cannot make this choice. Thus, ignoring the adversary’s adaptivity (as in prior work on static secrets) might lead one to conclude secrets are safe when they really are not.
• A wait-adaptive adversary’s expected gain increases monotonically with time, whereas a non-adaptive adversary’s gain might not.
• Adversaries that are capable of influencing the system they are observing can learn exponentially more information than adversaries who cannot provide inputs.
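The first bullet can be seen in a tiny experiment. The toy below is entirely our own modeling choice (a deterministically drifting secret versus an independently re-drawn one) and is not the analysis from [1]; it merely shows that changing a secret often is no protection when the change follows an inferable order.

    import java.util.Random;

    public class DynamicSecretDemo {
        public static void main(String[] args) {
            Random rnd = new Random(7);
            int N = 64, trials = 100_000, hitsOrdered = 0, hitsFresh = 0;
            for (int t = 0; t < trials; t++) {
                int s0 = rnd.nextInt(N);            // adversary learns the secret once
                int ordered = (s0 + 5) % N;         // next secret: predictable drift
                int fresh = rnd.nextInt(N);         // next secret: independent draw
                if ((s0 + 5) % N == ordered) hitsOrdered++; // always right, by the known order
                if ((s0 + 5) % N == fresh) hitsFresh++;     // right with probability 1/N
            }
            System.out.printf("guess rate, ordered change: %.3f%n", hitsOrdered / (double) trials);
            System.out.printf("guess rate, fresh change:   %.3f%n", hitsFresh / (double) trials);
        }
    }

With an inferable drift the adversary's one observation determines all future secrets (guess rate 1.0), whereas truly fresh secrets bring the rate down to about 1/N.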

We are now collaborating with Carlos Cid and his group from Royal Holloway to bring insights from game theory to our setting; one observation that has emerged is that gain for an adversary may not always be zero-sum with the loss to the defender, and that both can and should be modeled independently.

Another technology for privacy-preserving computation among coalition partners is secure multiparty computation (SMC), and we previously worked on a query analysis for reasoning about "adversary knowledge." Inspired by this analysis, we developed a new analysis (published in PLAS'1248) whose purpose is to optimize a protocol by breaking it into smaller protocols, so long as the overall knowledge profile is unaffected. For one application, this resulted in a 30x speedup. Most recently we have developed a new, high-level programming language for SMCs called Wysteria. Wysteria supports mixed-mode programs, which combine local, private computations with synchronous SMCs, making them a suitable target for the above-mentioned analysis. Wysteria complements a standard feature set with built-in support for secret shares and with wire bundles, a new abstraction that supports generic, N-party computations. Our paper [2], which appeared at IEEE S&P'1449, contains a formalization of Wysteria, its refinement type system, and its operational semantics. We show that Wysteria programs have an easy-to-understand single-threaded interpretation and prove that this view corresponds to the actual multi-threaded semantics. Wysteria's implementation and examples are freely available. Current efforts have focused on expanding the applications it can support.
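For readers unfamiliar with the secret-share primitive Wysteria builds in, the following miniature (illustrative Java of our own, not Wysteria syntax) shows additive secret sharing: a value is split so that no single party learns it, yet shares of two values can be added locally and only the reconstruction reveals the sum.

    import java.security.SecureRandom;

    public class AdditiveShares {
        static final long Q = (1L << 31) - 1;  // arithmetic modulo a prime
        public static void main(String[] args) {
            SecureRandom rnd = new SecureRandom();
            long s = 12345, t = 999;
            // Split each secret into two random-looking additive shares.
            long s1 = Math.floorMod(rnd.nextLong(), Q), s2 = Math.floorMod(s - s1, Q);
            long t1 = Math.floorMod(rnd.nextLong(), Q), t2 = Math.floorMod(t - t1, Q);
            // Each party adds its shares locally, learning nothing...
            long r1 = Math.floorMod(s1 + t1, Q), r2 = Math.floorMod(s2 + t2, Q);
            // ...and only combining both result shares reveals the sum.
            System.out.println(Math.floorMod(r1 + r2, Q)); // 13344
        }
    }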

Finally, our work on RAM-model secure computation50 addresses the inherent limitations of the circuit-model secure computation considered in almost all previous work: implementations of general-purpose secure computation have assumed the underlying computation is represented as a circuit. While theoretical developments using circuits are sensible (and common), compiling typical programs, which assume a von Neumann-style Random Access Machine (RAM) model, to efficient circuits can be challenging. One significant challenge is handling dynamic memory accesses to an array in which the

48 P. Mardziel, M. Hicks, J. Katz and M. Srivatsa, Knowledge-Oriented Secure Multiparty Computation, ACM PLAS 2012, https://www.usukitacs.com/node/2003
49 P. Mardziel and M. Hicks, WYSTERIA: A Programming Language for Generic, Mixed-Model Multiparty Computations, IEEE Security and Privacy Symposium, 2014, https://www.usukitacs.com/node/2565.
50 C. Liu, Y. Huang, E. Shi, J. Katz and M. Hicks. Automating Efficient RAM-model Secure Computation, IEEE Security and Privacy Symposium, 2014, https://www.usukitacs.com/node/2688.


memory location being read/written depends on secret inputs. A typical program-to-circuit compiler makes an entire copy of the array upon every dynamic memory access, resulting in a huge circuit when the data size is large. Theoretically speaking, generic approaches for translating RAM programs into circuits incur, in general, an O(TN) blowup in efficiency, where T is an upper bound on the program's running time and N is the memory size. We have developed the first automated approach for RAM-model secure computation in the semi-honest model. We define an intermediate representation called SCVM and a corresponding type system suited for RAM-model secure computation. Leveraging compile-time optimizations, our approach achieves about two orders of magnitude speedup compared to both circuit-model secure computation and the state-of-the-art RAM-model secure computation.

In addition to developing new theory for secure information flow, we have applied this theory to several applications:

(1) Security analysis of multi-path TCP: We reported a new vulnerability in multi-path TCP51 [3] that stems from the interdependence between the multiple sub-flows in an MPTCP connection. MPTCP congestion control algorithms are designed to achieve resource pooling and fairness with single-path TCP users at shared bottlenecks. Therefore, multiple MPTCP sub-flows are inherently coupled with each other, resulting in potential side-channels that can be exploited to infer cross-path properties52.

(2) Secure location-based services in micro-clouds53: While prior work has examined the ‘how to enforce location security’ problem, the question of ‘where to enforce location security’ has not been extensively studied. We examined the ‘where’ problem and, in particular, the tradeoffs between enforcing location security at a device vs. at a micro-cloud edge server. Using real location traces (taxicab traces from San Francisco, Shanghai and Stockholm) we showed that device-based solutions suffer either from a high false positive rate (about a 25% probability of not meeting the desired security requirement) or from low utility (about 600 meters of additional error in the obfuscated location data).

2.4.3.2 Task 2: Delegated, Outsourced and Distributed Computation
Since the beginning of BPP11, the aim of this task has been to study cryptographic techniques for information services that span multiple security domains. Homomorphic cryptography provides one way for certain computations to be performed on data in a manner which preserves data confidentiality and integrity: e.g., homomorphic encryption54 allows arbitrary computations on encrypted data without revealing the data. However, most existing schemes are simply not efficient enough for practical applications. We have adopted a two-pronged approach to address this issue: (i) hardware acceleration mechanisms for FHE using FPGAs, and (ii) efficient solutions for simpler/restricted classes of computations (such as Fast Fourier transforms, polynomial multiplication, and arithmetic circuits).
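For readers unfamiliar with homomorphic encryption, the sketch below shows the basic idea on a much simpler scheme than the lattice-based FHE studied in this task: a textbook Paillier cryptosystem (additively homomorphic only), in which multiplying two ciphertexts yields an encryption of the sum of the plaintexts. It is for intuition only and omits standard hardening.

    import java.math.BigInteger;
    import java.security.SecureRandom;

    public class PaillierToy {
        public static void main(String[] args) {
            SecureRandom rnd = new SecureRandom();
            BigInteger p = BigInteger.probablePrime(512, rnd);
            BigInteger q = BigInteger.probablePrime(512, rnd);
            BigInteger n = p.multiply(q), n2 = n.multiply(n);
            BigInteger lambda = p.subtract(BigInteger.ONE)
                    .multiply(q.subtract(BigInteger.ONE)); // simplified private key
            BigInteger g = n.add(BigInteger.ONE);           // standard choice g = n + 1

            BigInteger c1 = enc(BigInteger.valueOf(20), g, n, n2, rnd);
            BigInteger c2 = enc(BigInteger.valueOf(22), g, n, n2, rnd);
            // Homomorphism: a ciphertext product decrypts to the plaintext sum.
            System.out.println(dec(c1.multiply(c2).mod(n2), lambda, n, n2)); // 42
        }
        static BigInteger enc(BigInteger m, BigInteger g, BigInteger n,
                              BigInteger n2, SecureRandom rnd) {
            BigInteger r;
            do { r = new BigInteger(n.bitLength(), rnd).mod(n); }
            while (r.signum() == 0 || !r.gcd(n).equals(BigInteger.ONE));
            return g.modPow(m, n2).multiply(r.modPow(n, n2)).mod(n2);
        }
        static BigInteger dec(BigInteger c, BigInteger lambda,
                              BigInteger n, BigInteger n2) {
            // L(x) = (x - 1) / n; with g = n + 1, the scaling factor is lambda^{-1} mod n.
            BigInteger u = c.modPow(lambda, n2).subtract(BigInteger.ONE).divide(n);
            return u.multiply(lambda.modInverse(n)).mod(n);
        }
    }

Paillier supports only additions on ciphertexts; the point of the FHE work described below is to support arbitrary circuits, which is what makes it so much more expensive and why acceleration matters.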

51 Z. Shafiq, F. Le and M. Srivatsa, Cross-Path Inference Attacks on Multipath TCP, HotNets 2013, https://www.usukitacs.com/node/2491.
52 Z. Shafiq, F. Le, and M. Srivatsa. Cross-Path Inference and Attacks on Multipath TCP. Submitted to ACM CCS 2014. https://www.usukitacs.com/node/2683
53 S. Arunkumar, M. Srivatsa and R. Rajarajan, Location Information Flow Control - Where to Enforce?, Submitted to MILCOM 2014, https://www.usukitacs.com/node/2624.
54 C. Gentry. Fully homomorphic encryption using ideal lattices. In STOC 2009


Gentry’s initial FHE scheme 54 had a per-gate computation overhead that scaled as O(λ^4) in the security parameter λ, resulting in a very inefficient scheme. This was a consequence of the construction of his scheme, which involves computing the ‘(augmented) decryption circuit’ homomorphically via an innovative bootstrapping theorem. As a result, attempts have been made to improve the scalability of fully homomorphic schemes by removing the need for bootstrapping. One recent optimization (BGV12) proposed a radically new approach to fully homomorphic encryption in a paper entitled ‘(Leveled) Fully Homomorphic Encryption without Bootstrapping’ that dramatically improves performance. This approach uses a scheme based on polynomial rings with integer coefficients and removes the need for any bootstrapping through a new construction of the fully homomorphic encryption scheme. The BGV scheme scales as O(λ^2), or O(λ·L^3) for L-level arithmetic circuits, i.e., quasi-linear in the security parameter. This can be improved further using a batching technique that reduces the per-gate computation from quasi-linear in the security parameter to poly-logarithmic.

In April 2013 IBM-US released an FHE library in C++ based on the BGV scheme 55, with an open challenge to the community to improve on the scheme and implementation. Implementations of the scheme have been reported (e.g., 56, 57); all of them describe binary message implementations, limiting the message space within each ciphertext to d bits, where d is the degree of the ciphertext polynomial. In BPP13 we developed a novel implementation of this scheme for message spaces of d*log(p) bits, making use of the BGV batching techniques (batch size p) to optimize the scheme so that it can be efficiently implemented on devices such as Field Programmable Gate Arrays (FPGAs). Whilst FPGAs provide an unmatched ability to tailor their circuitry, leading to better performance at lower power, the skills required to program FPGAs are often beyond the expertise of most skilled software programmers. Recent work 58 has shown how to bridge the gap between programming software and programming hardware using a new object-oriented language called LIME that can be compiled for the JVM to run on a general-purpose processor, or into a synthesizable hardware description language to run on a target FPGA/GPU. LIME extends Java with features that carry OO concepts into efficient hardware. IBM has developed a prototype cross-compiler, termed ‘Liquid Metal’, which provides LIME as a single unified programming language and a runtime for programming hybrid computers comprised of multi-core processors, GPUs, and FPGAs. We have developed an implementation of the modified BGV FHE scheme coded in the LIME language, thereby allowing cross-compilation onto an FPGA to obtain the necessary performance improvements. To illustrate how FHE can be used in practice, we have developed a demonstration of secure two-party oblivious transfer in the context of a military ‘blue on blue’ avoidance scenario44.

55 Shai Halevi and Victor Shoup. Design and Implementation of a Homomorphic-Encryption Library. Manuscript, IBM Research, 2013.
56 Craig Gentry, Shai Halevi, and Nigel P. Smart. Fully homomorphic encryption with polylog overhead. In Pointcheval and Johansson [PJ12], pages 465-482.
57 Craig Gentry, Shai Halevi, and Nigel P. Smart. Homomorphic evaluation of the AES circuit. In Reihaneh Safavi-Naini and Ran Canetti, editors, CRYPTO, volume 7417 of LNCS, pages 850-867. Springer, 2012.
58 David F. Bacon, Rodric Rabbah, and Sunil Shukla. FPGA Programming for the Masses. ACM Queue, 2013.


Another thread of work has been to explore efficient secure outsourcing solutions for simpler classes of computations. Our work has addressed the ability to add verification to outsourced computations59,60 [4], [5] (verification was not a part of the original FHE scheme). We have developed a generalization of homomorphic MACs for arithmetic circuits59 that allows anyone to validate computations (expressed as an arithmetic circuit) on authenticated data without knowledge of the secret key. At Eurocrypt 2013, Catalano and Fiore proposed two realizations of homomorphic MACs that support a restricted class of computations (arithmetic circuits of polynomial degree) and are practically efficient, but fail to achieve both succinctness and composability at the same time. In [4] we generalize the work of Catalano and Fiore in several ways. First, we abstract away their results using the notion of encodings with limited malleability. Next, we generalize their constructions to work with graded encodings and, more abstractly, with k-linear groups. The main advantage of this latter approach is that it allows for homomorphic MACs that are (somewhat) composable while retaining succinctness. Interestingly, our construction uses graded encodings in a generic way; thus, all of its limitations (limited composability and non-constant tag size) depend solely on the fact that currently known multilinear maps share similar constraints. This means, for instance, that our scheme would support arbitrary circuits (of polynomial depth) if we had compact multilinear maps with an exponential number of levels.

We have formalized the notion of Verifiable Oblivious Storage (VOS)60 [5], in which a client outsources the storage of data to a server while ensuring privacy of the data and the verifiability and obliviousness of accesses to that data. VOS generalizes the notion of Oblivious RAM (ORAM) in that it allows the server to perform computation, and it also explicitly considers data integrity and freshness. We show that allowing server-side computation enables us to circumvent the known lower bound on the bandwidth required for ORAM constructions. Specifically, for large block sizes we can construct a VOS scheme with constant bandwidth per query; furthermore, answering queries requires only poly-logarithmic server-side computation. We also show how to apply our VOS construction to achieve a dynamic proof-of-retrievability scheme that is asymptotically more bandwidth-efficient than the existing state of the art.

In BPP13, we developed a practical protocol for verifiable keyword search on dynamically changing files stored on untrusted servers. The problem can be formalized as follows: a client stores a document D with an untrusted server and later wants to verify whether a given keyword w appears in D. Our starting point is the protocol by Benabbas, Gennaro and Vahlis from CRYPTO'1161 (work from ITA BPP11): in its most efficient implementation, after uploading the document to the server, the client needs only constant (i.e., independent of |D|) memory and computation time to query the server and verify its response. However, the server's workload is the primary computational limitation of this protocol: to respond to a query, the server must read the entire file, performing exponentiations along the way. Another limitation of the original protocol is that it only deals with a static document D; even a small modification to D would require the client to re-authenticate the entire file.

59 D. Catalano, D. Fiore, R. Gennaro and L. Nizzardo, Generalizing Homomorphic MACs for Arithmetic Circuits, PKC 2014, https://www.usukita.org/node/2631.
60 D. Apon, J. Katz, E. Shi and A. Thiruvengadam, Verifiable Oblivious Storage, PKC 2014, https://www.usukitacs.com/node/2359
61 Siavosh Benabbas, Rosario Gennaro, Yevgeniy Vahlis: Verifiable Delegation of Computation over Large Datasets. CRYPTO 2011: 111-131, https://www.usukitacs.com/node/1776


To address these limitations, we developed a basic indexing technique that dramatically improves the server's query response time without compromising the client's efficiency. In fact, keyword searches in our modified protocol run in constant time, without substantially changing the storage requirements or the cost of the initial upload phase. Moreover, the same indexing technique yields an efficient way to handle updates to the file D; the parties' workload increases only to O(log m), where m is the total number of updates up to that point. We showed that our approach is practically feasible by presenting a prototype implementation of the Benabbas-Gennaro-Vahlis protocol with our modifications. Our prototype shows that the client can easily be implemented on a mobile device (e.g., almost any modern smartphone), and that even over very large files both the client's and the server's work remains feasible.

Finally, we have developed solutions for building authenticated data structures (ADSs)62 [6]. An ADS is a data structure whose operations can be carried out by an untrusted prover, the results of which a verifier can efficiently check as authentic. This is done by having the prover produce a compact proof that the verifier can check along with each query result. ADSs thus support outsourcing data maintenance and processing tasks to untrusted servers without loss of integrity. Past work on ADSs has focused on particular data structures (or limited classes of data structures), one at a time, often with support only for particular operations. Our paper [6] presents a generic method, using a simple extension to an ML-like functional programming language we call lambdaAuth, with which one can program authenticated operations over any data structure constructed from standard type constructors, including recursive types, sums, and products. The programmer writes the data structure largely as usual; it can then be compiled to code to be run by the prover and the verifier. Using a formalization of lambdaAuth, we prove that all well-typed lambdaAuth programs result in code that is secure under the standard cryptographic assumption of collision-resistant hash functions. We have implemented our approach as an extension to the OCaml compiler and have used it to produce authenticated versions of many interesting data structures, including binary search trees, red-black trees, skip lists, and more. Performance experiments show that our approach is efficient, giving up little compared to the hand-optimized data structures developed previously.
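lambdaAuth generates this kind of prover/verifier code automatically from ordinary data-structure definitions; the hand-written Java miniature below (all names ours, and much simpler than what the compiler produces) shows the single primitive the verifier ultimately relies on: checking a Merkle authentication path against a trusted root digest.

    import java.security.MessageDigest;

    public class MerklePathVerify {
        static byte[] h(byte[]... parts) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) md.update(p);
            return md.digest();
        }
        // Recompute the root from a leaf and the sibling hashes along its path.
        static boolean verify(byte[] root, byte[] leaf,
                              byte[][] siblings, boolean[] weAreLeft) throws Exception {
            byte[] cur = h(leaf);
            for (int i = 0; i < siblings.length; i++)
                cur = weAreLeft[i] ? h(cur, siblings[i]) : h(siblings[i], cur);
            return MessageDigest.isEqual(root, cur);
        }
        public static void main(String[] args) throws Exception {
            byte[] l0 = "alpha".getBytes(), l1 = "beta".getBytes();
            byte[] root = h(h(l0), h(l1));   // the verifier keeps only this digest
            // The untrusted prover returns l0 plus the sibling hash h(l1) as proof.
            System.out.println(verify(root, l0,
                    new byte[][]{ h(l1) }, new boolean[]{ true })); // true
        }
    }

Under collision resistance of the hash, a prover cannot produce a passing proof for a value that is not in the structure, which is the guarantee the lambdaAuth type system extends to arbitrary recursive data types.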

2.4.4 References for Project 3

[1] P. Mardziel and M. Hicks, Quantifying Information Flow for Time Varying Secrets, IEEE Security and Privacy Symposium, 2014, https://www.usukitacs.com/node/2566.
[2] P. Mardziel and M. Hicks, WYSTERIA: A Programming Language for Generic, Mixed-Model Multiparty Computations, IEEE Security and Privacy Symposium, 2014, https://www.usukitacs.com/node/2565.
[3] Z. Shafiq, F. Le and M. Srivatsa, Cross-Path Inference Attacks on Multipath TCP, HotNets 2013, https://www.usukitacs.com/node/2491.
[4] D. Catalano, D. Fiore, R. Gennaro and L. Nizzardo, Generalizing Homomorphic MACs for Arithmetic Circuits, PKC 2014, https://www.usukita.org/node/2631.
[5] D. Apon, J. Katz, E. Shi and A. Thiruvengadam, Verifiable Oblivious Storage, PKC 2014, https://www.usukitacs.com/node/2359.
[6] A. Miller, M. Hicks, J. Katz and E. Shi, Authenticated Data Structures, Generically, POPL, July 2013, https://www.usukitacs.com/node/2502.

62 Andrew Miller, Michael Hicks, Jonathan Katz, and Elaine Shi, Authenticated Data Structures, Generically, POPL, July 2013, https://www.usukitacs.com/node/2502


3. Technical Area 6: Distributed Coalition Information Processing for Decision Making

3.1 Overview

Technical Area 6 aims to develop fundamental underpinnings to support the exploitation and management of an agile network of data and information sources. The intention is to better enable effective understanding and decision-making across the coalition for complex problems in this dynamic environment.

The overall challenge is to develop the fundamental science to underpin a two-way, end-to-end socio-technical chain reaching from 'data to decision' (and back, from decision to data), resolving complex problems while operating in a coalition environment. In addressing this challenge the research considers a number of related perspectives on task-relevant data and information, including: an understanding of the data itself, its sources and any measurable factors; consideration of the network location of such data/information and any related services; and techniques for finding task-relevant information within this disparate data in the dynamic network context. In addition to these data- and network-centric considerations, TA6 also takes account of relevant human, social and cultural considerations, providing insight and perspective on human-level issues in such an environment.

The TA6 research builds on the earlier research from the BPP11 timeframe and extends most of its core themes, including the work on collective cognition and network simulation, supporting human-human-machine working via natural language and Controlled Natural Language (CNL), management of information-level services in the hybrid network environment, improved extraction of structured Controlled English (CE) from unstructured data sources, and techniques for dealing with uncertain information in a coalition context with variable levels of trust between partners. In addition, new complementary focus areas are introduced, such as improved representation and reasoning techniques in CE, service management approaches that are open to non-technical users, and better support for collaborative intelligence analysis in the coalition context. As before, a number of technical themes are present throughout the work, often crossing project boundaries: for example, the focus on the potential for Controlled English to support human-machine working in multiple problem-solving contexts, and the handling of various kinds of uncertainty in the coalition information provisioning space, including unstable network services, varying levels of trust, and conflicting sources.

Project 4 – Human-Information Interaction: The human users of the coalition network environment are a key consideration for the capabilities and services that must be provided, and the manner in which they should be made available. This project considers a number of factors and capabilities in this context, starting with how to configure the socio-technical elements of the coalition environment in order to optimize collective sensemaking capabilities, using multi-agent simulation techniques. This work extends that of BPP11 by considering cognitively sophisticated agents with semantically-enriched forms of inter-agent communication. A key element of this project is the consideration of improving human-machine interaction with respect to the processing of unstructured textual resources, the communication of human/machine knowledge and information, reasoning over this information, and the understanding of reasoning outputs, all supported by the use of Controlled English.
The project also addresses improving the informational support provided to coalition decision-makers in a manner that is sensitive to various aspects of the decision-making context, using conversational human-machine interaction to help resolve the relevant semantic features of a decision problem and to cope with uncertainties relating to decision complexity, network availability, and the temporal horizon of the decision-making process.


Project 5 – Distributed Coalition Services: This project addresses the ability to offer services and information to end users in a tactical network. The specification of services by non-technical end users is addressed, along with rapid policy checking and recommendations for easing policies that preclude a service from being offered. For services on the tactical network, deployment, monitoring and redeployment are addressed to maximize the usefulness of the services. The services making up applications are expected to be hosted on a mix of mobile and fixed assets (i.e., hybrid wireless networks). A key element of the work here is to develop algorithms for monitoring distributed service deployments and adapting deployments to improve utility, considering both monitor placement and inferring the cause of degradation and failures of services. Another thrust of this research is to enable users of a system to quickly understand and specify services and guiding policies in a coalition environment, and then quickly configure specific service instances to meet ad hoc needs. For this, a user-oriented policy model that can be understood by non-technical people is being defined, and feedback provided to the users on which policies may conflict with, or preclude, a service being offered.

Project 6 – Collective Sensemaking Under Uncertainty: Effective decision-making in coalition operations relies on collaborative acquisition of information from hard and soft information sources, its dissemination across coalition member boundaries, and eventually its collective analysis in support of the development of knowledge and situation awareness for the tactical tasks at hand. A key element of this project is concerned with individual and collective methods to acquire, evaluate, integrate, and interpret information in order to make intelligence analysis at the network edge and at higher echelons more effective for improved situational awareness. The project is taking forward previous BPP11 work in trust, quality and value of information, with the goal of establishing a principled framework for reasoning about the uncertainties that arise from risk-value trade-offs, both for information sources that acquire and share information, and for information consumers that analyze it and use it for decision-making. In addition, Project 6 seeks to develop the scientific foundations of a methodology to enable robust linking and reasoning "from data to decision" and vice versa to solve complex coalition dynamic problems. With strong links to Projects 4 and 5, mechanisms are being established for understanding information needs, efficiently acquiring and integrating relevant information from different sources, and presenting it in a fashion that minimizes information overload and increases the quality of decisions.
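As a loose illustration of the dependency-graph style of diagnosis described for Project 5 above, the following minimal sketch (with a hypothetical `depends_on` map; this is not the project's algorithm) shortlists root-cause candidates by intersecting the possible causes of every degraded service and discarding any candidate that should also have degraded a healthy one.

```python
# Minimal root-cause shortlisting over a service dependency graph.
# depends_on[s] lists the services s directly calls; a fault anywhere in
# the transitive closure of s's dependencies (or in s itself) can degrade s.
depends_on = {
    "ui": ["fusion"],
    "fusion": ["sensor_feed", "gaian_db"],
    "sensor_feed": [],
    "gaian_db": [],
}

def possible_causes(service):
    seen, stack = set(), [service]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(depends_on.get(s, []))
    return seen  # the service itself plus all transitive dependencies

def shortlist(degraded, healthy):
    # A candidate must explain every degraded service...
    candidates = set.intersection(*(possible_causes(s) for s in degraded))
    # ...and must not imply that a healthy service should also be degraded.
    for s in healthy:
        candidates -= possible_causes(s)
    return candidates

print(shortlist(degraded={"ui", "fusion"}, healthy={"sensor_feed"}))
# e.g. {'fusion', 'gaian_db'} (set ordering may vary)
```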

3.1.1 Accomplishments Highlights

Key highlights and contributions for each project in TA6 include:

Project 4 – Human-Information Interaction:
• a cognitive computational model of collaborative problem solving providing a proof of concept regarding the feasibility of implementing complex, team-based tasks in the ACT-R cognitive architecture, together with an ACT-R-based cognitive social simulation capability (ACT-R/CSSC) to support studies into collective or team cognition.
• a research framework to represent deep linguistic semantics in CE and to apply linguistic reasoning to extract CE facts and rules from NL sentences, based upon integration of DELPH-IN linguistic resources, together with an analysis of uncertainty and ambiguity in NL, showing how even simple sentences require ambiguity resolution using domain reasoning.
• the definition and validation, including initial trials with human subjects, of a conversational protocol for free-flowing human-machine, machine-machine, and machine-human exchanges, incorporating NL and CE, applied to use cases in a tactical coalition operation context (information collection from humans, fusion, sensor tasking including handling of uncertain factors such as resource availability).

Project 5 – Distributed Coalition Services:


• contributions in robust service monitoring (achieving high availability of time-sensitive management data in a MANET), topology inferencing in the context of service deployment in MANETs (combining service level monitoring and an approximation of the network topology), and integrated frameworks for diagnosing the cause of degradation in distributed service environments (isolating faults in a highly dynamic environment based on service dependency graphs).
• contributions in autonomous configuration of services (algorithms to configure distributed services based on cost and information gain), and sharing assets in a coalition environment governed by high-level human readable policies (comparing the performance of asset-task assignment under policies based on asset ownership versus those based on coalition team membership).

Project 6 – Collective Sensemaking Under Uncertainty:
• a Semantic Obfuscation Framework which utilizes Subjective Logic (SL) to provide a tractable method for deciding what semantic information to share and with whom, how to systematically obfuscate, and how to reason about the uncertainty with which a recipient can make inferences, together with new operators for SL to provide improved fusion of uncertain and obfuscated information (a sketch of standard SL fusion follows this list).
• a subjective ontology-based data access system (SOBDA), which addresses the challenge of representing and aggregating uncertain data at large scale by allowing decision-makers (DM) to query large data silos containing uncertain information in terms of a common schema, together with a framework to address bottlenecks in peer-to-peer federated databases (e.g., GaianDB).
• a principled model for representing and reasoning about the provenance of information sources and analyses in argumentation schemes, and a means to reason about preferences over evidence on the basis of timeliness, reliability, source trustworthiness, and accuracy of evidence derivation, together with an integrated Collaborative Intelligence Spaces tool that supports flexible and rigorous collaborative analysis.
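The new ITA SL operators themselves are beyond the scope of this overview, but the flavour of Subjective Logic fusion can be conveyed with a minimal sketch. The code below implements the standard cumulative fusion operator as published by Jøsang (not the new ITA operators), and the example opinions are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    b: float          # belief
    d: float          # disbelief
    u: float          # uncertainty (b + d + u == 1)
    a: float = 0.5    # base rate (prior probability absent evidence)

def cumulative_fuse(A: Opinion, B: Opinion) -> Opinion:
    # Josang's cumulative fusion of two independent bodies of evidence
    # about the same proposition; assumes k > 0, i.e. the two sources do
    # not both claim total certainty (u = 0).
    k = A.u + B.u - A.u * B.u
    b = (A.b * B.u + B.b * A.u) / k
    d = (A.d * B.u + B.d * A.u) / k
    u = (A.u * B.u) / k
    if A.u + B.u != 2 * A.u * B.u:
        a = (A.a * B.u + B.a * A.u - (A.a + B.a) * A.u * B.u) \
            / (A.u + B.u - 2 * A.u * B.u)
    else:
        a = (A.a + B.a) / 2
    return Opinion(b, d, u, a)

# Two partially trusted reports about the same event:
fused = cumulative_fuse(Opinion(b=0.6, d=0.2, u=0.2),
                        Opinion(b=0.4, d=0.2, u=0.4))
print(fused)  # belief rises and uncertainty shrinks as evidence accumulates
```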

3.1.2 Other Important information/facts related to TA6

Project 4 features strong inter-disciplinary US-UK collaborations across all tasks, including on cognitive modelling (Southampton, CMU, ARL, IBM UK, Airbus), natural language processing (IBM UK, Boeing, Cambridge, ARL), and context-aware assistance (Cardiff, IBM UK, PSU, UCLA, IBM US, ARL). There is also strong cross-project collaboration around the inter-linking themes of cognition and human-machine interaction. Significant opportunities exist for exploitation and technology transition in the use of the ACT-R/CSSC as a generic platform for undertaking research into team or collective cognition, for example in applications to investigate the effect of specific technologies (e.g. collaborative intelligence analysis systems) or other socio-technical interventions (e.g. security policies or communication policies) on team performance. To position P4 results for transition, the experimentation framework (CE Store) underpins work across the project. Transitions have taken place with DSTL, using CE for intelligence analysis with a focus on storytelling (Project J), modelling cyber assets, modelling links between mission objectives and assets (Mission Assurance and Configuration using Requirements Traceability), and (involving DSTL staff working at IBM UK) exploration of a CE agent for analysing Chinese. Discussions have begun with IBM product groups on incorporating aspects of the research, including Information Virtualisation with CE as a business level language.


There is ongoing transition with ARL on the applicability of CE and the experimentation framework, and various collaborations between P4 members and the NS-CTA, TerraHarvest, and ARL anomaly determination teams. P4's conversational interface (with real-time text-to-voice and voice-to-text) was demonstrated to CERDEC and other parties at a large US technology event. The CE infrastructure was used to orchestrate a high profile NS-CTA experiment briefed to the "RMB" and deemed a great success for the NS-CTA and a good example of cross-program collaboration with the ITA. Further ARL transitions include supporting work with Fraunhofer (FKIE) on the Battle Management Language, and discussions with the ARL language team.

Project 5 also features strong US-UK collaboration, with PSU, Imperial and ARL working together on fault diagnosis and on inferring service topologies. PSU and IBM collaborate closely with TA5 (P1) on network tomography approaches. RPI collaborates with IBM US and UK on autonomous service configuration, while Cardiff and IBM have worked together on asset sharing models and algorithms. P5 links closely to P4 on integrating CE into the asset sharing framework, and to P6 on using CE to define a commander's or analyst's objectives. Work on topology inferencing and fault diagnosis has been transitioned into a DTRA (Defense Threat Reduction Agency) project. The ITA Information Fabric is being open sourced as a basis for the Land Open System Architecture (LOSA) Common Open Infrastructure (Land) (COIL) implementation. It is also part of the LOSA COIL standard, UK MOD Defstan 23-14, and it has been used as part of the UK Air C2 JBRIDGES concept. The service-based deployment of the ITA Information Fabric has been transitioned to the Dstl CCS (C4ISR Concepts & Solutions) programme. Patents have been filed, or are in the process of disclosure, on service composition and high-level policy management.

Project 6 is built on strong academic, industrial, and government collaboration to address crosscutting issues of trust, uncertainty, reasoning, information acquisition, and inference management. Key collaborations include: Aberdeen, ARL, CMU, IBM US and UCLA on developing new subjective logic operators; Aberdeen, CMU and CUNY on developing argument schemes to represent probabilistic evidence; and Aberdeen, Honeywell and UCLA on the development of the collaborative intelligence analysis space. P6 collaborates with P4 on the use of CE to express uncertainty in crowd-sourced sensing. P6 researchers are pursuing diverse opportunities for transitioning ideas into exploitable technologies. The inference and trust management work has been positioned as a multi-tiered architecture on top of the ITA Information Fabric with Android mobile devices at the edges. The HyperArg reasoning engine enables argumentation-based decision making while minimizing cognitive overload and integrating information coming from distributed sources with varying degrees of trust. The CISpaces tool, already demonstrated to the scientific community, will assist coalition partners in the generation and resolution of hypotheses to attain situational awareness and determine courses of action. One concrete transition vehicle is the MIPS (Management of Information Processing Services) system, which combines the Information Fabric, the Gaian Database, and CE to enable agile composition of information processing pipelines across a coalition. The CISpaces tool, the HyperArg reasoning engine, and potentially the multi-tiered Inference Management Firewall can all be made available as composable services in MIPS. Additionally, DSTL expects to transition the CISpaces tool, if successful, into a multinational applied research programme in support of the 5-eyes intelligence community.


3.2 Project 4 - Human-Information Interaction

3.2.1 Introduction

Project 4 seeks a better understanding of how to support human agents in achieving benefit from the informational environments in which they operate. Coalitions operate within a complex environment in which human thought and action are shaped by the capabilities of information and communication technologies, making it easier to generate and acquire information for use in mission-relevant tasks and to communicate and share information. However, as more information becomes available it becomes harder for people to make sense of it within a useful timeframe. Project 4 seeks scientific advances for enabling humans to interact with information resources, taking account of the nature of the human cognitive system.

Task 1: Collective Cognition in Military Coalition Environments aims to experimentally evaluate the interactions that occur between cognitive, social, informational and technological factors in the context of tasks that involve the coordinated interaction of multiple individuals working within a distributed, network-enabled environment. The task involves research into a generic capability for running computer simulation experiments that feature cognitively-sophisticated synthetic agents as experimental participants, and research into cognitive modelling techniques within a cognitive architecture, ACT-R. Research in the first year has extended ACT-R to support cognitive social simulation experiments, and has built a simulation capability to support novel investigations into team cognition. A problem solving task in the ELICIT framework is being modelled which draws on cognitive, social, informational and technological factors, and is suitable for testing the ability of the simulation capability to support experimental investigations of socially-distributed cognition in complex task environments.

Task 2: Fact Extraction and Reasoning Using Controlled Natural Language aims to improve machine-assisted fact extraction from Natural Language (NL) text and the application of these facts in analytic and problem-solving tasks, to reduce the cognitive load on humans in a collaborative coalition environment. This combines a human- and machine-readable controlled natural language, ITA Controlled English (CE), with recent developments in linguistics. CE assists collaborative reasoning and communication between man and machine, and can lead to a better understanding of conclusions. Deeper linguistic knowledge and language processing capabilities can be used to extract finer details from text and to understand how CE can be made more expressive. The research issues are to integrate the linguistic resources of the English Resource Grammar (ERG) and Minimal Recursion Semantics (MRS) with CE, to handle uncertainty and ambiguity in NL by using assumptions to reason over alternative interpretations, and to transform linguistic semantics into domain semantics to extract CE facts and rules from sentences.

Task 3: Coalition Context-Aware Assistance for Decision Makers aims to improve the informational support provided to coalition decision-makers in a manner that is sensitive to their decision-making context. The research uses conversational human-machine interaction to elucidate and model the key semantic features of a decision problem, and to combine these features with semantically-enriched representations of information to put human agents in touch with relevant coalition resources. The key advances are conversational techniques to support human-machine communication (e.g. for resource identification and tasking), the ability to capture new "local knowledge" and semantic models dynamically from users at the edge of the network, and the factoring-in of contextual uncertainties (e.g. quality-of-information and network availability) into the exploitation of information. The central research issue is to define and validate a conversational protocol for exchanges between humans, and between humans and machines, able to handle information collection from humans and sensors, and information fusion.
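One plausible reading of such a protocol is a confirm loop, sketched minimally below in Python: the machine plays back a structured, CE-like interpretation of free-text input and only commits the fact once the human confirms it. The regular-expression patterns and CE-like phrasing are invented for illustration; the protocol validated in this task covers far richer NL, CE and graphical exchanges.

```python
import re

# (regex over the user's NL, template for the CE-like interpretation)
PATTERNS = [
    (re.compile(r"(?:i saw|there is) (?:a |an )?(?P<thing>[\w ]+?) "
                r"(?:at|near) (?:the )?(?P<place>[\w ]+)", re.I),
     "there is a {thing} that is located near the place {place}."),
]

def interpret(nl):
    # Map free text to a single candidate CE-like sentence, if any.
    for pattern, template in PATTERNS:
        match = pattern.search(nl)
        if match:
            return template.format(**match.groupdict())
    return None

def converse(nl, confirm):
    interpretation = interpret(nl)
    if interpretation is None:
        return None      # a real system would ask a clarifying question
    # Playing the interpretation back removes ambiguity before committing.
    return interpretation if confirm(interpretation) else None

fact = converse("I saw a white truck near the river crossing",
                confirm=lambda ce: True)  # a real UI would ask the user
print(fact)
# there is a white truck that is located near the place river crossing.
```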


3.2.2 Research Impact

Technical merit: The key technical contributions for the past year include:
• an initial cognitive computational model of collaborative problem solving in a specific task context (the ELICIT task), implemented using the ACT-R cognitive architecture and providing a proof of concept regarding the feasibility of implementing complex, team-based tasks in ACT-R [1]63. (Task 1)
• an ACT-R-based cognitive social simulation capability (the ACT-R/CSSC) to support studies into collective or team cognition [2]64. (Task 1)
• the design of a human experiment and supporting software platform to gather data on the effect of different information sharing strategies on human task performance on a variant of the ELICIT task, and to investigate how guesses at the solution in the early stages of the task affect subsequent team performance [3]65. (Task 1)
• a research framework to represent the deep ERG and MRS linguistic semantics in CE and to apply linguistic reasoning to extract CE facts and rules from NL sentences, based upon integration of these DELPH-IN linguistic resources and transformation of the ERG and MRS semantics into domain semantics guided by the CE conceptual model [5]66. (Task 2)
• the analysis of uncertainty and ambiguity in NL, showing how even simple sentences require ambiguity resolution using domain reasoning 67, applying CE reasoning to represent NL uncertainties [7]68, and extending mechanisms in computational linguistics to allow the CE domain model to guide NL disambiguation processing 69. (Task 2)
• the extraction of CE facts and rules from the ELICIT sentences and the use of these facts to perform the ELICIT task, providing a detailed example of CE for reasoning, visualisation of assumptions and their impact, and transformation of sentences into rules 70 [5]66. (Task 2)
• the definition and validation, including initial trials with human subjects, of a conversational protocol for free-flowing human-machine, machine-machine, and machine-human exchanges

63 Smart, P. R., & Sycara, K. (2014) Cognitive Social Simulation and Collective Sensemaking: An Approach Using the ACT-R Cognitive Architecture. 6th International Conference on Advanced Cognitive Technologies and Applications (COGNITIVE'14), Venice, Italy. (https://www.usukitacs.com/node/2618)
64 Smart, P. R., Richardson, D. P., Sycara, K., & Tang, Y. (2014) Towards a Cognitively Realistic Computational Model of Team Problem Solving Using ACT-R Agents and the ELICIT Experimentation Framework. 19th International Command and Control Research Technology Symposium (ICCRTS'14), Alexandria, Virginia, USA.
65 Smart, P. R., Sycara, K., & Tang, Y. (2014) Using Cognitive Architectures to Study Issues in Team Cognition in a Complex Task Environment. SPIE Defense, Security, and Sensing: Next Generation Analyst II, Baltimore, Maryland, USA. (https://www.usukitacs.com/node/2563)
66 Mott, D., Poteet, S., Xue, P., & Copestake, A. (2014), Natural Language Fact Extraction and Domain Reasoning using Controlled English, DELPH-IN 2014, Portugal. https://www.usukitacs.com/node/2675 & http://www.delph-in.net/2014/Mot:Pot:Xue:14.pdf
67 Mott, D. (2014) On Interpreting ELICIT sentences, Jan 2014, https://www.usukitacs.com/node/2603
68 Xue, P., Poteet, S., Kao, A., Mott, D., & Giammanco, C. (2014) Representing Uncertainty in CE (accepted for MILCOM 14), https://www.usukitacs.com/node/2677
69 Mott, D. (2014) CE-based mechanisms for handling ambiguity in Natural Language, Feb 2014, https://www.usukitacs.com/node/2612
70 Mott, D. (2014) Conceptualising ELICIT sentences, Apr 2014, https://www.usukitacs.com/node/2604


[8]71, incorporating NL, CE, and graphics, applied to use cases in a tactical coalition operation context (information collection from humans, fusion, sensor tasking) [9]72. (Task 3)
• a model of resource allocation in network systems as a "stochastic knapsack problem" to handle uncertain factors such as the unreliable wireless medium or variable quality of sensor outputs [10]73, and a systems architecture 74 to integrate with the conversational interface. (Task 3)
• a technology integration experiment showing how key research elements can be combined to support rapid but informed decision-making capabilities at lower echelons in coalition operations 75 [11]76. (Task 3)

Synergistic value of collaboration:
The research on cognitive modelling based on ACT-R is principally being undertaken by Southampton in collaboration with CMU and ARL. Input from CMU is particularly important as the underlying ACT-R cognitive architecture was originally developed at CMU. IBM UK, in partnership with Southampton, leads the effort to integrate CE into the simulation capability based upon the CE Store, and this has generated discussions on language and cognition. Airbus Group (formerly EADS) and Southampton have worked to identify focus areas for cognitive social simulation experiments. The choice of the ELICIT task has influenced the Task 2 NL and CE reasoning research by providing a challenging analytic task that requires these techniques.
NL processing and CE reasoning research is being undertaken by IBM UK and Boeing in collaboration with Cambridge and ARL. Boeing's linguists and IBM UK have jointly researched the handling of NL ambiguities and uncertainties. Collaboration with Cambridge has enabled deeper understanding of the ERG and MRS, has provided validation of our work against the state of the art, has initiated work on distributional semantics for proposing new additions to the CE model when unknown words are found in text, and has introduced us to the wider DELPH-IN community, to which we have contributed novel work on the mapping of linguistic semantics to domain semantics. ARL is working with IBM UK to explore the NL and CE systems for analytic tasks. Collaboration is occurring with P6.1 on the link between CE and argumentation, with a view to making arguments more accessible to users. Collaboration is occurring across P4 on CE for NL, reasoning, knowledge representation and cognition.
Research on context-aware assistance is led by Cardiff, with experience in matching resources to mission tasks, in collaboration with IBM UK, who lead the NL processing and the CNL-based modelling and reasoning mechanisms that underpin the conversational interfaces to the user. UCLA collaborates on crowdsourcing

71 Preece, A., Braines, D., Pizzocaro, D., & Parizas, C. (2014) Human-Machine Conversations to Support Multi-Agency Missions, ACM SIGMOBILE Mobile Computing and Communications Review, 18(1):75-84, 2014. https://www.usukitacs.com/node/2630
72 Preece, A., Gwilliams, C., Parizas, C., Pizzocaro, D., Bakdash, J., & Braines, D. (2014) Conversational Sensing, Next-Generation Analyst II (SPIE DSS), 2014. https://www.usukitacs.com/node/2569
73 Hu, N., Pizzocaro, D., Johnson, M. P., La Porta, T., & Preece, A. (2013) Resource Allocation With Non-Deterministic Demands and Profits, in Proc IEEE MASS, 2013. https://www.usukitacs.com/node/2275
74 Hu, N., La Porta, T., Pizzocaro, D., & Preece, A. (2013) A System Architecture for Decision-Making Support on ISR Missions With Stochastic Needs and Profit, in Proc Ground/Air Multisensor Interoperability, Integration, and Networking for Persistent ISR IV (SPIE Vol 8742), 2013.
75 Braines, D., de Mel, G., Gwilliams, C., Parizas, C., Pizzocaro, D., & Preece, A. (2014) Agile Sensor Tasking for COIST using Natural Language Knowledge Representation and Reasoning, SPIE Ground/Air Multisensor Interoperability, Integration, and Networking for Persistent ISR V, 2014.
76 Braines, D., Preece, A., de Mel, G., & Pham, T. (2014) Enabling CoIST Users: D2D at the Network Edge, FUSION 2014. https://www.usukitacs.com/node/2697


soft information and local knowledge from humans, with particular focus on uncertainty in the quality of information (aligned with their activities in P6). PSU leads research on handling uncertainties in resource availability, and provides a bridge to P5 research on the distributed coalition services layer that provides information services and access to assets; PSU also provides links to the NS-CTA QoI research. Collaboration on the design and conduct of experimentation is occurring between IBM US on technology integration experiments and scenarios, ARL on human-behavioural experiments, and Cardiff, IBM UK and UCLA on conversational interfaces. Collaboration will occur with West Point cadets at IBM UK in June/July.

Scientific challenge:
The underlying challenge for Project 4 is to create better computational models of the cognitive and interactional processes involved in human-machine collaboration on complex tasks, in order to inform the design of computer systems that may more effectively assist humans in these tasks. Task 1 seeks to understand how features of military coalition environments affect socially-distributed cognitive processes via the use of cognitive computational modelling techniques and the ACT-R cognitive architecture. Specific challenges are: to develop a generic capability for running cognitive social simulation experiments using ACT-R, to undertake experiments with small teams of human subjects in a cognitively-demanding task, to develop cognitive computational models of human task performance, and to test the integrity of the simulation capability and evaluate the fidelity of the computational models of task behaviour. Task 2 seeks ways to apply a user's conceptual model of a domain to the extraction of CE facts from NL sentences and to perform problem solving by inference of high value information from these facts. Specific challenges are: to transform the linguistic semantics derived from the ERG NL parsing system into domain-specific CE facts, to guide disambiguation of sentences using the domain model, to represent uncertainties in sentences, to support the user in expressing their knowledge and reasoning in CE in order to perform complex problem solving, and to present the rationale (including uncertainties) for conclusions to the user in a meaningful way. Task 3 seeks to use conversational Natural Language interaction to open up the process by which a non-technical decision maker, at or near the network edge in a coalition context, can access the best set of information resources to meet their problem-solving needs. Specific challenges are: to combine NL processing and CE to assist the decision maker in elucidating and modelling the semantic features of their problem and linking those features to available resources, and to automate a formal matching procedure between the user's intent and ISR resources, playing back CE to the user to provide a more formal, less ambiguous statement of the requirements.

Military relevance:
Research into team cognition is important for understanding the effect of the military coalition environment on the dynamics of collaborative problem solving and distributed decision making, and the outputs of Task 1 research are intended to provide insights into the socio-technical and human factors interventions that might be required to improve team performance in military coalitions.
We aim to use the ACT-R/CSSC to extend the results of ELICIT experiments undertaken in respect of the NATO Network-Enabled Capability Command & Control Maturity Model 77. By supporting cognitive social simulation experiments that factor in the dimensions characterised by this model, we aim to further our understanding of the interaction between cognitive, social, informational and technological factors in different C2 organisational environments.

77 Alberts, D. S., Huber, R. K., & Moffat, J. (2010) NATO NEC C2 Maturity Model. Command and Control Research Program, Department of Defense, USA.


Sources of NL information are key to military personnel in performing analytic and other cognitive tasks, and the outputs of Task 2 research into the extraction of CE facts from NL text are intended to increase the range of applications that could support the military in these tasks, by making more domain-specific information available for assisted and automated reasoning. The research into extending the capabilities of CE could facilitate collaboration between man and machine in cognitive tasks that require inference of high value information, fusion of information from different sources, handling of uncertainties, hypothetical reasoning, provision of explanations for conclusions, and analysis of the impact of uncertain information.
An emerging generation of intelligence analysts and information users is familiar with the capabilities of context-aware (typically mobile) computing devices for providing access to relevant sources (e.g. "apps") in tactical decision-making situations. Task 3 seeks to support such users in information collection, fusion, and asset tasking via human-machine conversational interactions requiring relatively low training overhead. This could support decision makers by identifying solution options in terms of the best coalition information resources (78 gives evidence for the need for such support) whilst considering the effect of uncertainty (problem vagueness, source value and availability), by refining users' queries to enable assistance from the wider social network of coalition experts, and by capturing local knowledge at the tactical edge.

Exploitation & technology transition potential:
For task 1, a significant opportunity for exploitation and technology transition concerns the use of the ACT-R/CSSC as a generic platform for undertaking research into team or collective cognition without disrupting the operational environment. The capability is intended to serve as a tool for use in organizational environments where team-based problem solving and distributed decision-making are of critical significance. The ACT-R/CSSC could be applied to investigate the effect of specific technologies (e.g. collaborative intelligence analysis systems) or other socio-technical interventions (e.g. security policies or communication policies) on team performance, extending previous ACT-R work focused on individual human agents.
For task 2, transitions have taken place with DSTL, using CE for: intelligence analysis with a focus on storytelling (Project J); modelling cyber assets; modelling links between mission objectives and assets (Mission Assurance and Configuration using Requirements Traceability); and (involving DSTL staff working at IBM UK) exploration of a CE agent for analysing Chinese. Discussions have begun with IBM product groups on incorporating aspects of the research, including Information Virtualisation with CE as a business level language. Task 2 has supported Cheryl Giammanco (ARL) in CE conceptual model generation to support future experimentation with analysts in the Agri-Development Team, with a view to potential transition opportunities.
For task 3, there is ongoing transition with ARL on the applicability of CE and the experimentation framework, and various collaborations between P4 members (IBM UK, IBM US, Cardiff) and the NS-CTA, TerraHarvest, and ARL anomaly determination teams. The conversational interface (with real-time text-to-voice and voice-to-text) was demonstrated to CERDEC and other parties at a large US technology event. The CE infrastructure was used to orchestrate a high profile NS-CTA experiment briefed to the "RMB" and deemed a great success for the NS-CTA and a good example of cross-program collaboration with the ITA. Further ARL transitions include supporting work with Fraunhofer (FKIE) on the Battle Management Language, and discussions with the ARL language team. The conversational interface is being applied in a project to support real-time analysis of social media in relation to the 2014 NATO summit being hosted in the UK.
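As a toy illustration of the Task 2 style of assisted reasoning described above, in which conclusions carry a rationale and may rest on explicit assumptions, the following sketch uses invented CE-like fact strings and rules; the real CE Store is far richer (its rules can, for example, generate further rules).

```python
# Toy forward chaining over CE-style facts, recording rationale so a
# conclusion can be traced back to the facts and assumptions behind it.
facts = {
    "the person Alpha was seen near the depot",
    "the person Alpha is a member of the cell X",
}
assumptions = {"the sighting report R1 is reliable"}  # usable but flagged
rules = [
    ({"the person Alpha was seen near the depot",
      "the sighting report R1 is reliable"},
     "the person Alpha is linked to the depot"),
    ({"the person Alpha is linked to the depot",
      "the person Alpha is a member of the cell X"},
     "the cell X is linked to the depot"),
]

def forward_chain(facts, assumptions, rules):
    known = set(facts) | set(assumptions)
    rationale = {}                        # conclusion -> premises used
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                rationale[conclusion] = premises
                changed = True
    return known, rationale

def depends_on_assumption(fact, rationale, assumptions):
    # Walk the rationale graph to see whether a conclusion rests on any
    # unproven assumption: the uncertainty the analyst should be shown.
    if fact in assumptions:
        return True
    return any(depends_on_assumption(p, rationale, assumptions)
               for p in rationale.get(fact, ()))

known, rationale = forward_chain(facts, assumptions, rules)
print(depends_on_assumption("the cell X is linked to the depot",
                            rationale, assumptions))  # True
```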

78 Bakdash, J., Pizzocaro, D., & Preece, A., Soldier Decision-Making for Allocation of Intelligence, Surveillance, and Reconnaissance Assets, 19th ICCRTS, 2014. https://www.usukitacs.com/node/2571


3.2.3 Technical Accomplishments

The type of military activity that we aim to support, such as intelligence analysis, planning and sensemaking, can involve complex and collaborative problem solving that requires human cognition. To understand how such activities can be assisted by computers, our research seeks a better understanding of human cognition, including reasoning and the use of language.
Task 1 is researching a cognitively realistic computational model of human agents, using the ACT-R framework 79 80 to simulate and explore collaborative problem solving; papers are offered on the initial construction of an ACT-R cognitive model, on a simulation capability to support cognitive social simulation experiments using ACT-R cognitive models, on the theoretical motivations and research areas for conducting such experiments, and on the exploration of the relations between the ACT-R cognitive framework and language.
Task 2 is researching the cognitive task of extracting facts from NL sentences, and reasoning with those facts, including the use of rationale; papers are offered on the use of a CE conceptual model to guide extraction of CE facts from NL sentences and problem solving using these facts, on the integration of the ERG 81 and MRS 82 with the CE system, and on the handling of uncertainties in NL sentences using assumptions.
Task 3 is researching how conversational interfaces can aid the dynamic exchange of knowledge between man and machine in a more natural way whilst ensuring that the information is accurately conveyed in terms of a shared conceptual model. Papers are offered on the creation of a computational model for NL-CE conversational interactions, on the application of this model to tasks in the ISR domain with a pilot experiment, on resource allocation under uncertain conditions, and on a human-in-the-loop technology integration experiment to assist tactical decision-making.
Human cognition has a close relationship to language, and all the tasks use CE, and the CE Store, as a common language to represent knowledge and reasoning. This allows sharing of common domain models as well as fostering a common approach to designing knowledge representations and reasoning strategies. Each task has created challenges for the CE modelling and reasoning infrastructure, and this has led to a more powerful CE capability and better examples of its use. We are using the "ELICIT task" (from the ELICIT framework, 83 84 85 designed to advance understanding of collective performance), which requires collaborating human and machine agents to identify participants and targets in a possible terrorist attack on the basis of pieces of textual information

79 Anderson, J. R. (2007) How Can the Human Mind Occur in the Physical Universe? Oxford University Press, Oxford, UK.
80 Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y. (2004) An integrated theory of the mind. Psychological Review, 111(4), 1036-1060.
81 Flickinger, D. (2007) The English Resource Grammar, LOGON technical report #2007-7, www.emmtee.net/reports/7.pdf
82 Copestake, A., Flickinger, D., Sag, I. A., & Pollard, C. (2005) Minimal Recursion Semantics: an introduction. Research on Language and Computation, 3(2-3):281-332.
83 Ruddy, M. (2007) ELICIT - The experimental laboratory for investigating collaboration, information sharing and trust. 12th International Command and Control Research and Technology Symposium (ICCRTS), Newport, Rhode Island, USA.
84 Manso, M. (2012) N2C2M2 Validation using abELICIT: Design and Analysis of ELICIT runs using software agents. 17th International Command and Control Research and Technology Symposium, Fairfax, Virginia, USA.
85 Manso, M., & Ruddy, M. (2013) Comparison between human and agent runs in the ELICIT N2C2M2 validation experiments. 18th International Command and Control Research and Technology Symposium (ICCRTS), Alexandria, Virginia, USA.


(factoids). As recommended by the 2013 peer review, we aim to construct an integrated CE-based demonstration 86 to combine the Task 1 simulation capability with Task 2 NL processing and Task 3 conversational interfaces, grounded on the ELICIT task; the use of the CE Store will facilitate future integration into the experimentation framework.
Project 4 is collaborating with Task 5.2 on the specification of mission requirements and asset management policies via a conversational interface, and their representation in a CE model. Project 4 is collaborating with Task 6.1 on the link between assumptions and argumentation for collaborative reasoning 87; the exploration of uncertainty in NL processing is providing Task 6.1 with a representative example of the use of assumptions, and the argumentation algorithms could extend CE reasoning capabilities. Further collaboration may occur on translating the Task 6.1 argumentation process, expressed in logic programming rules, into CE, in the hope that this will be more intuitive to the user. Our work on conversational interfaces is also relevant to Task 6.3, where crowdsourcing experiments are planned in the use of such interfaces to explore issues of QoI where information is reported by users interactively.
Experimentation is being conducted to gain feedback from humans exposed to our technology: a variant on the ELICIT framework 88 is being used to run experiments involving human and synthetic agents in problem-solving; informal experiments on building conceptual models and reasoning are being undertaken with ARL; and pilot studies have been undertaken on the conversational interface for assisting input of information about events captured in photographs.

Task 4.1 Collective Cognition in Military Coalition Environments

Cognitive Social Simulation and Collective Sensemaking: An Approach Using the ACT-R Cognitive Architecture [1]63
Smart, P. R., & Sycara, K. (2014) Cognitive Social Simulation and Collective Sensemaking: An Approach Using the ACT-R Cognitive Architecture. 6th International Conference on Advanced Cognitive Technologies and Applications (COGNITIVE'14), Venice, Italy. (https://www.usukitacs.com/node/2618)
This paper describes the approach taken to the development of an ACT-R cognitive model to support studies into socially-distributed cognition. The focus of the modelling effort is the ELICIT task, a task previously used to investigate sensemaking performance within teams of subjects in the context of the ELICIT experimentation framework. The paper presents the results of a knowledge analysis of the task domain, undertaken in order to better understand the conceptual structures required by agents performing the ELICIT task. The knowledge analysis also revealed the various inferences that were supported by task-relevant information. The results of the knowledge analysis were presented in the form of a CommonKADS knowledge model, a semi-formal approach to knowledge representation commonly used in the early stages of knowledge engineering initiatives. The knowledge model was subsequently used as a specification for a prototype reasoning system, developed using the C Language Integrated Production System (CLIPS) expert system shell. This prototype was used to evaluate the knowledge model specification and check that the various inferences captured by the model were sufficient to yield a solution to the problem presented by the ELICIT task. The revised knowledge model specification was subsequently used to implement the ‘ELICIT Cognitive

86 Mott, D. ITA Project 4 common demonstration, May 2014, https://www.usukitacs.com/node/2691
87 Toniolo, A., Norman, T. J., Sycara, K., & Mott, D. "On the benefits of argumentation schemes in deliberative dialogue for collaborative decision-making". ITA Annual Fall Meeting, 2013.
88 Ruddy, M. (2007) ELICIT - The experimental laboratory for investigating collaboration, information sharing and trust. 12th International Command and Control Research and Technology Symposium (ICCRTS), Newport, Rhode Island, USA.


Computational Model’ (ECCM), an ACT-R cognitive model capable of performing the ELICIT task. The paper provides an overview of the ECCM in terms of the various data structures and production rules that support agent-level cognitive processing. In addition, the paper describes how multiple instances of the ECCM can be instantiated within the ACT-R environment in order to support the implementation of multi-agent (i.e., social) simulation experiments. The paper concludes with a discussion of some of the limitations of the current ECCM, and describes how these limitations can be addressed in future work. In general, the paper describes our approach to the development of an initial ACT-R cognitive model that is capable of solving the ELICIT task, and illustrates how conventional knowledge engineering techniques can be applied to the problem of developing ACT-R models that must operate within complex task environments.

Towards a Cognitively Realistic Computational Model of Team Problem Solving Using ACT-R Agents and the ELICIT Experimentation Framework [2]64
Smart, P. R., Richardson, D. P., Sycara, K., & Tang, Y. (2014) Towards a Cognitively Realistic Computational Model of Team Problem Solving Using ACT-R Agents and the ELICIT Experimentation Framework. 19th International Command and Control Research Technology Symposium (ICCRTS'14), Alexandria, Virginia, USA. (https://www.usukitacs.com/node/2619)
The main aim of this paper is to describe the simulation capability that was developed to support cognitive social simulation experiments using the ACT-R architecture, referred to as the ACT-R/CSSC. The ACT-R/CSSC comprises a number of extensions to the core ACT-R architecture, implemented as ACT-R modules (the main way in which the functionality of the existing ACT-R architecture is extended). Each of the new modules is described in some detail in the paper. The modules include the messaging module (used to support inter-agent communication), the language module (used to support the interpretation of messages exchanged by ACT-R agents), the web module (used to support the interaction of ACT-R agents with shared information repositories), and the self module (which supports the representation of self-related information, such as an agent's personality characteristics, attitudes and behavioural predispositions). These modules work together to support the implementation of simulation experiments similar to those undertaken in the context of the ELICIT experimentation framework, which was developed as part of the DoD's Command and Control Research Program (CCRP). The paper also describes a number of memory-resident databases that are used to store information about particular experimental simulations (e.g. experimental configuration information and experimental results). The content of these databases can be exported at the conclusion of a simulation experiment in order to support the visualization and analysis of simulation results. A specific tool, called the ACT-R/CSSC Results Viewer, is briefly described in the paper. This tool is capable of importing the data exported by the ACT-R/CSSC, and provides features that enable end-users to study the cognitive processing routines of individual agents at each step in the simulation. In general, the paper provides a detailed discussion of the architecture and functionality of the ACT-R/CSSC. It also describes how the features of the ACT-R/CSSC can be used to support simulation experiments similar to those undertaken with the ELICIT experimentation platform.

Using Cognitive Architectures to Study Issues in Team Cognition in a Complex Task Environment [3]65
Smart, P. R., Sycara, K., & Tang, Y. (2014) Using Cognitive Architectures to Study Issues in Team Cognition in a Complex Task Environment. SPIE Defense, Security, and Sensing: Next Generation Analyst II, Baltimore, Maryland, USA. (https://www.usukitacs.com/node/2563)
The main aim of this paper is to outline a number of areas of research that can be undertaken using the ACT-R/CSSC. These include the use of the ACT-R/CSSC to explore issues in agent communication, social trust and influence, and the interaction between memory processes and problem-solving performance in highly dynamic information environments. Importantly, each of these areas of research is


intended to capitalize on the availability of cognitively-sophisticated agent implementations in order to run experiments that would be difficult to perform with conventional (non-cognitive) synthetic agents. The core claim is that, by factoring in the kinds of constraints and capabilities exhibited by the human cognitive system, cognitive architectures enable us to assess the role that cognitive-level variables play in supporting (or subverting) task performance. Conventional agent-based simulations sometimes fail to yield this sort of understanding precisely because they do not aim to represent many of the features associated with human cognitive processing in the context of complex, collaborative tasks.

Exploring the use of Controlled English for communication with ACT-R agents [4]89
Mott, D., Stone, P., & Richardson, D., (2013) Exploring the use of Controlled English for communication with ACT-R agents, ITA Annual Fall Meeting, 1st - 3rd October 2013, https://www.usukitacs.com/node/2499
This paper explores some of the implications of using ITA Controlled English as the language for inter-agent communication in the Task 1 ACT-R collaborative sensemaking experiments. The original proposal suggested the need for a communication language that would convey semantic information to assist collaborative cognition, and CE is a useful candidate for this purpose. In addition, its use would enable other human and non-human agents to be involved, either from Project 4 or from other ITA projects. The paper considers how transformations may occur between ACT-R representational structures (called ‘chunks’) and CE, based upon a mapping between the semantics of ACT-R chunks and CE sentences. Although a basic mapping is relatively straightforward, the use of CE led to additional considerations that would probably not have arisen had a more computational language, such as OWL, been chosen as the communication language. CE permits alternative ways of expressing the same logical information, in effect defining different styles of communication, and it is speculated that stylistic information might be useful in communicating information over and above the basic logic of the statements. In more general terms, the use of CE could raise interesting questions as to how language might affect cognition within the context of the Project 4 research.

Task 4.2: Fact Extraction and Reasoning Using Controlled Natural Language

Natural Language Fact Extraction and Domain Reasoning using Controlled English [5]66
Mott, D., Poteet, S., Xue, P., & Copestake, A. (2014), Natural Language Fact Extraction and Domain Reasoning using Controlled English, DELPH-IN 2014, Portugal. https://www.usukitacs.com/node/2675 & http://www.delph-in.net/2014/Mot:Pot:Xue:14.pdf

This paper describes the research undertaken to explore the ELICIT identification task as an example of a complex problem involving many cognitive tasks that would have to be modelled in order to be achieved by a computer agent. Successful problem solving requires the analysis of NL sentences based on common sense knowledge, the representation of the domain, the representation of a problem solving strategy, the construction of suitable reasoning rules, the running of the rules, the making of assumptions, the detection of inconsistencies, and the presentation of rationale. The paper describes research into how CE technology can contribute to the solution of these problems. Knowledge of the domain was represented in a CE conceptual model that defined the concepts and the logical rules to perform reasoning. Two simplifications were made in order to achieve initial results. Firstly, it was decided to focus on one aspect of the identification task ('who' were the agents responsible?). Secondly, analysis

89 Mott, D., Stone, P., & Richardson, D., (2013) Exploring the use of Controlled English for communication with ACT-R agents, ITA Annual Fall Meeting, 1st - 3rd October 2013, https://www.usukitacs.com/node/2499


reported in 67 showed that even the relatively simple sentences used in the ELICIT task contained significant ambiguities that required domain and common sense reasoning, and automatically interpreting all of the NL sentences would be highly complex, as much common sense knowledge would have to be extracted and represented. Therefore it was decided initially to have a human convert the original sentences into simpler sentences that were still Natural Language but avoided any ambiguities, thus temporarily avoiding the complex problems of common-sense interpretation. These simpler sentences were analysed by the ERG system into an MRS linguistic semantic representation, which was further analysed by the CE-based linguistic-to-domain semantic transformation, resulting in the extraction of domain CE facts. As also noted in 67, some of the NL sentences expressed rules rather than facts, and research was undertaken to determine how the CE rules for processing a sentence could themselves generate other CE rules that were needed to solve the ELICIT identification task. The paper describes an extension of the CE meta-model which allowed rules to be objects generated by other rules. The domain model and extracted rules were run to identify the 'who' component, and this provided a rationale graph of the reasoning leading to the conclusions, showing dependency on an assumption. Following 69, this assumption could be explored in the rationale graph, allowing the analyst to understand this source of uncertainty. The research is also being used as an example of the use of assumptions, provided to Project 6 for collaboration on argumentation. A reduced version of this paper is provided in 90, and some of the ideas (using the earlier Stanford parser technology) were presented at the RuleML 2013 conference 91.

Using the English Resource Grammar to extend fact extraction capabilities [6]92
Mott, D., Poteet, S., Xue, P., Kao, A., & Copestake, A., (2013) Using the English Resource Grammar to extend fact extraction capabilities, ITA Annual Fall Meeting, 1st - 3rd October 2013, https://www.usukitacs.com/node/2498
This paper describes research into how the information available from the DELPH-IN community's linguistic systems can be utilised by the CE-based reasoning system to analyse NL sentences into CE facts. Supported by the PET parser 93, the ERG (English Resource Grammar) 81 and MRS (Minimal Recursion Semantics) 82 provide a deep but complex analysis of the sentences, and construct a rich linguistically-based semantic representation. It is this MRS information that must be transformed into the domain semantics that may be written as the CE facts to be extracted. The paper explores the steps (and the initial implementation) towards understanding how this transformation may occur, and describes how the MRS output is converted into a low level CE representation, and how this may be transformed by CE rules, first into generic semantics (involving situations) and then into domain specific semantics. Additional consideration is given to how new items may be added to the lexicon, and how the basic grammatical knowledge in the ERG can be expressed in CE diagrams and semantic frames, potentially allowing the user to change the grammar (as further described in 94). The paper describes the analysis of an example SYNCOIN sentence, and a more advanced demonstration was given at AFM13 involving the

90 Mott, D., Braines, D., Xue, P., & Poteet, S. (2014), ITA Controlled English and its applications, paper for future submission, https://www.usukitacs.com/node/2676
91 Xue, P., Poteet, S., Kao, A., Mott, D., & Braines, D. (2013) Constructing Controlled English for Both Human Usage and Machine Processing, RuleML, July 2013, https://www.usukitacs.com/node/2507
92 Mott, D., Poteet, S., Xue, P., Kao, A., & Copestake, A., (2013) Using the English Resource Grammar to extend fact extraction capabilities, ITA Annual Fall Meeting, 1st - 3rd October 2013, https://www.usukitacs.com/node/2498
93 The PET parser, http://moin.delph-in.net/PetTop
94 Mott, D. (2013) Is "male" a mass noun: searching for mal-rules and justifications, Nov 2013, https://www.usukitacs.com/node/2564


Representing Uncertainty in CE [7]68
Xue, P., Poteet, S., Kao, A., Mott, D., & Giammanco, C. (2014) Representing Uncertainty in CE (accepted for MILCOM 2014) https://www.usukitacs.com/node/2677

An important consideration in the extraction of facts from NL sentences is the analysis and handling of uncertainty and ambiguity, since it is important that the user of the facts be aware of any uncertainties, and their degree, that underlie the facts. Uncertainties in language may derive from a variety of sources, including specific expressions of uncertainty ("the cat may be sleeping") and ambiguities ("the man hit the dog with the stick"). This paper focuses on the first source, and shows that the SYNCOIN dataset contains a number of different types of linguistic expression suggesting some degree of uncertainty in the mind of the speaker (using words such as "may", "possible", "remember", "claim"). The paper considers how such linguistic expressions could be represented in CE as the target of fact extraction. The mapping of linguistic expressions onto numeric estimates of uncertainty is considered unsatisfactory; an approach in which linguistic expressions are categorised into different qualitative kinds of uncertainty is more appropriate. The paper proposes that qualitative uncertainty be represented as different types of CE assumptions, and extends the CE syntax for this purpose. Such typed assumptions could be presented to the user to show the sources of uncertainty, and this may also be of benefit to the future work of task 3, as described in [8]71. The research extends 69, which applies similar mechanisms to address the handling of word ambiguity by managing alternative interpretations together with the use of domain knowledge from the CE model to rule out inconsistent interpretations, a novel extension to selectional restrictions in NL processing. Earlier work 98 on the possible use of CE to address issues of cross-cultural ambiguities was presented at ICCRTS13.
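To make the idea of typed qualitative assumptions concrete, the following minimal Python sketch shows how linguistic hedges might be mapped onto qualitative uncertainty categories attached to extracted facts. The category names and the tiny hedge lexicon are our own illustrative assumptions, not the CE syntax extension actually proposed in the paper.

# Illustrative sketch: map linguistic hedges to qualitative uncertainty
# types that could annotate extracted facts as typed assumptions.
# The categories and lexicon below are hypothetical examples, not the
# CE extension proposed in the paper.
HEDGE_TYPES = {
    "may": "possibility",        # "the cat may be sleeping"
    "possible": "possibility",
    "remember": "recollection",  # uncertainty of memory
    "claim": "attribution",      # reported, unverified by the speaker
}

def qualify(sentence):
    """Return the qualitative uncertainty types suggested by a sentence."""
    words = sentence.lower().split()
    return sorted({HEDGE_TYPES[w] for w in words if w in HEDGE_TYPES})

print(qualify("witnesses claim the device may be hidden nearby"))
# -> ['attribution', 'possibility']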

95 Mott, D., Poteet, S., Kao, A., Xue, P., & Copestake, A. (2013) ITAFM13 demonstration of exploring enhanced fact extraction capabilities using Controlled English (summary), ITA Fall Meeting September 2013, https://www.usukitacs.com/node/2671
96 Mott, D., Poteet, S., Kao, A., Xue, P., & Copestake, A. (2013) ITAFM13 demonstration of exploring enhanced fact extraction capabilities using Controlled English, ITA Fall Meeting September 2013, https://www.usukitacs.com/node/2672
97 Mott, D., Poteet, S., Xue, P., Kao, A., & Copestake, A. (2013) Fact Extraction using Controlled English and the English Resource Grammar, DELPH-IN Summit, http://www.delph-in.net/2013/david.pdf
98 Poteet, S., Xue, P., Kao, A., Mott, D., Braines, D., & Giammanco, C. (2013) Controlled English for Effective Communication during Coalition Operations, 18th ICCRTS, June 2013, https://www.usukitacs.com/node/2506


Task 3: Coalition Context-Aware Assistance for Decision Makers

Human-Machine Conversations to Support Multi-Agency Missions [8]71
Preece, A., Braines, D., Pizzocaro, D., & Parizas, C. (2014) Human-Machine Conversations to Support Multi-Agency Missions, ACM SIGMOBILE Mobile Computing and Communications Review, 18(1):75-84, 2014. https://www.usukitacs.com/node/2630

This paper proposes a Natural Language-based conversational model intended to support human-machine working in coalition (multi-agency) settings. The main novel feature of the model is to support the flow of conversations from full Natural Language into a form of Controlled Natural Language, CE, amenable to machine processing and automated reasoning, including high-level information fusion tasks. The use of CE, for example in feeding back sentence interpretations from machine to human, is intended to ameliorate traditional problems of ambiguity in Natural Language human-machine communication and to provide rich (yet formal) semantic information in a format more amenable to understanding by human users. The paper shows how the conversational interactions supported by the model include requests (by human or machine) for expansions and explanations (rationale) of machine-processed information. The work is situated in domains such as emergency response, environmental monitoring, policing and security, where the need for sensor and information networks to assist human users across multiple agencies requires effective human-machine cooperation at a tactical level: human users need to task the network to help them achieve mission objectives, while humans (sometimes the same individuals) are also sources of mission-critical information. The model is validated by means of its application to a previously developed surveillance vignette 99 exhibiting a number of use cases.

Conversational Sensing [9]72
Preece, A., Gwilliams, C., Parizas, C., Pizzocaro, D., Bakdash, J., & Braines, D. (2014) Conversational Sensing, Next-Generation Analyst II (SPIE DSS), 2014. https://www.usukitacs.com/node/2569

This paper builds on the work described in [8]71 to argue that information fusion and situational awareness for Intelligence, Surveillance and Reconnaissance (ISR) activities can be seen as a conversational process among actors (both human and machine) at or near the tactical edges of a network. Motivated by use cases in the domain of Company Intelligence Support Team (CoIST) tasks, the paper shows how the conversational protocol enables interactions such as: turning eyewitness reports from human observers (both soldier and civilian sources) into actionable information; fusing information from humans and physical sensors (with associated quality metadata); and assisting human analysts to make the best use of available sensing assets in an area of interest (governed by management and security policies). Examples are provided of various alternative styles for user feedback, including NL, CNL and graphical feedback. The paper includes results from a pilot experiment with human subjects showing that a prototype conversational agent is able to gather usable CNL information from untrained human subjects.

Resource Allocation With Non-Deterministic Demands and Profits [10]73
Hu, N., Pizzocaro, D., Johnson, M. P., La Porta, T., & Preece, A. (2013) Resource Allocation With Non-Deterministic Demands and Profits, in Proc IEEE MASS, 2013. https://www.usukitacs.com/node/2275

This paper presents a model of resource allocation in network systems that addresses the issue that, because of uncertain factors such as an unreliable wireless medium or variable quality of sensor outputs, it is not practical to assume that the demands and profits of tasks are deterministic and known a priori (both may in fact be stochastic, following certain distributions). The work focuses on a specific case in which both demands and profits follow normal distributions (later extended to Poisson and Binomial variables), formulated as a stochastic knapsack problem. Tunable parameters are introduced to configure two probabilities: one limits the capacity overflow rate with which the combined demand is allowed to exceed the available supply, and the other sets the minimum chance at which the expected profit is required to be achieved.
Novelty lies in the way the paper defines relative (rather than constant) values for random variables in given conditions, and utilises them to search for the best resource allocation solutions. Heuristics are proposed with different optimality/efficiency trade-offs, and results show that the algorithms run relatively fast and provide results considerably closer to the optimum than previous approaches. A companion paper 74 presents a systems architecture incorporating this work with the conversational interface and shows how the approach can be used in a coalition setting to handle the demands of competing missions, with use cases based on our surveillance vignette 99.
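As a concrete illustration of the chance-constrained formulation, the following Python sketch greedily packs tasks with normally distributed demands so that the probability of the combined demand exceeding the supply stays below a configured overflow rate. The parameter names and the greedy rule are our own assumptions; this is a minimal stand-in, not the paper's heuristics.

# Minimal sketch of chance-constrained task selection (not the paper's
# algorithms): demands are independent normal variables, and a task is
# accepted only if the overflow probability stays below `max_overflow`.
from math import erf, sqrt

def overflow_prob(mu, var, capacity):
    """P(total demand > capacity) for a N(mu, var) total demand."""
    if var == 0:
        return 0.0 if mu <= capacity else 1.0
    z = (capacity - mu) / sqrt(var)
    return 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))

def greedy_select(tasks, capacity, max_overflow):
    """tasks: list of (mean_demand, var_demand, expected_profit)."""
    chosen, mu, var = [], 0.0, 0.0
    # favour high expected profit per unit of expected demand
    for t in sorted(tasks, key=lambda t: t[2] / t[0], reverse=True):
        if overflow_prob(mu + t[0], var + t[1], capacity) <= max_overflow:
            chosen.append(t)
            mu, var = mu + t[0], var + t[1]
    return chosen

tasks = [(4.0, 1.0, 10.0), (3.0, 4.0, 9.0), (5.0, 0.5, 6.0)]
print(greedy_select(tasks, capacity=10.0, max_overflow=0.05))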


Enabling CoIST Users: D2D at the Network Edge [11]76
Braines, D., Preece, A., de Mel, G., & Pham, T. (2014) Enabling CoIST Users: D2D at the Network Edge, FUSION 2014 (submitted) https://www.usukitacs.com/node/2697

This paper presents a human-in-the-loop technology integration experiment to illustrate the impact of our research on rapid but informed decision-making at lower echelons in coalition operations. The paper situates the experiment in the context of a vignette (adapted and extended from 99) and shows how several elements of the research (including the conversational protocol and uncertain resource demands introduced in the above papers) combine to: capture local information reporting; automatically infer high-value information based on background knowledge; automatically raise intelligence tracking tasks; and match, rank and propose appropriate assets to tasks, taking into account contextual factors such as environmental and distributed network conditions. The approach also brings in ontology-based resource matching capabilities from earlier BPP11 work 100, and uses CE as a human-friendly, but machine-processable, language that is expressive enough to serve as a single common format for both human and machine processing to support information collection from humans, information fusion, inference, and asset tasking. This demonstration capability is designed to operate in a lightweight distributed environment (built on elements of the ITA Experimental Framework) at the edge of the network and has been used in a more formal human-in-the-loop experiment to explore the potential of such systems [9]72.

3.2.4 References for Project 4

[1] Smart, P. R., & Sycara, K. (2014) Cognitive Social Simulation and Collective Sensemaking: An Approach Using the ACT-R Cognitive Architecture. 6th International Conference on Advanced Cognitive Technologies and Applications (COGNITIVE'14), Venice, Italy. https://www.usukitacs.com/node/2618
[2] Smart, P. R., Richardson, D. P., Sycara, K., & Tang, Y. (2014) Towards a Cognitively Realistic Computational Model of Team Problem Solving Using ACT-R Agents and the ELICIT Experimentation Framework. 19th International Command and Control Research Technology Symposium (ICCRTS'14), Alexandria, Virginia, USA. https://www.usukitacs.com/node/2619
[3] Smart, P. R., Sycara, K., & Tang, Y. (2014) Using Cognitive Architectures to Study Issues in Team Cognition in a Complex Task Environment. SPIE Defense, Security, and Sensing: Next Generation Analyst II, Baltimore, Maryland, USA. https://www.usukitacs.com/node/2563
[4] Mott, D., Stone, P., & Richardson, D. (2013) Exploring the use of Controlled English for communication with ACT-R agents, ITA Annual Fall Meeting, 1st - 3rd October 2013, https://www.usukitacs.com/node/2499
[5] Mott, D., Poteet, S., Xue, P., & Copestake, A. (2014) Natural Language Fact Extraction and Domain Reasoning using Controlled English, DELPH-IN 2014, Portugal. https://www.usukitacs.com/node/2675 & http://www.delph-in.net/2014/Mot:Pot:Xue:14.pdf

99 Preece, A., Pizzocaro, D., Braines, D., Mott, D., de Mel, G., & Pham, T. (2012) Integrating Hard and Soft Information Sources for D2D Using Controlled Natural Language, in Proc 15th International Conference on Information Fusion, 2012.
100 Preece, A., Norman, T., de Mel, G., Pizzocaro, D., Sensoy, M., & Pham, T. (2013) Agilely Assigning Sensing Assets to Mission Tasks in a Coalition Context, IEEE Intelligent Systems, 28(1):57-63, 2013.


[6] Mott, D., Poteet, S., Xue, P., Kao, A., & Copestake, A. (2013) Using the English Resource Grammar to extend fact extraction capabilities, ITA Annual Fall Meeting, 1st - 3rd October 2013, https://www.usukitacs.com/node/2498
[7] Xue, P., Poteet, S., Kao, A., Mott, D., & Giammanco, C. (2014) Representing Uncertainty in CE (accepted for MILCOM 2014) https://www.usukitacs.com/node/2677
[8] Preece, A., Braines, D., Pizzocaro, D., & Parizas, C. (2014) Human-Machine Conversations to Support Multi-Agency Missions, ACM SIGMOBILE Mobile Computing and Communications Review, 18(1):75-84, 2014. https://www.usukitacs.com/node/2630
[9] Preece, A., Gwilliams, C., Parizas, C., Pizzocaro, D., Bakdash, J., & Braines, D. (2014) Conversational Sensing, Next-Generation Analyst II (SPIE DSS), 2014. https://www.usukitacs.com/node/2569
[10] Hu, N., Pizzocaro, D., Johnson, M. P., La Porta, T., & Preece, A. (2013) Resource Allocation With Non-Deterministic Demands and Profits, in Proc IEEE MASS, 2013. https://www.usukitacs.com/node/2275
[11] Braines, D., Preece, A., de Mel, G., & Pham, T. (2014) Enabling CoIST Users: D2D at the Network Edge, FUSION 2014. https://www.usukitacs.com/node/2697


3.3 Project 5 – Distributed Coalition Services

3.3.1 Introduction In Project 5 we address the ability to offer services and information to end users in a tactical network. We address the specification of services by non-technical end users, including rapid policy checking and generating recommendations for easing policies that are too restrictive and preclude a service from being offered. We address deployment, monitoring and redeployment to maximize the usefulness of the services on a tactical network. The services making up applications are expected to be hosted on a mix of mobile and fixed assets (i.e., hybrid wireless networks).

Service composition must be agile and resilient, in the sense that services need to be composed rapidly with a high degree of fitness-for-purpose, but also comply with high-level doctrinal constraints (not just security-type policies). There must be a verifiable link at all times between the decision intent of commanders, and the composed information services. Likewise, the performance and correct operation of deployed services must be monitored, and services may require redeployment based on network conditions.

In this project we execute two tasks. The first task addresses service deployment, monitoring and reconfiguration to provide the best set of capabilities to a decision maker. The second task addresses service and policy specification, and policy checking for coalition and security purposes.

Task 1 - Managing Distributed Service Deployments in Hybrid Wireless Networks
In this task we develop algorithms for monitoring distributed service deployments and adapting deployments to improve utility. For monitoring we consider both monitor placement and inferring the cause of degradation and failures of services. The purpose of the monitoring is to provide enough information to determine the relationships between the component services that make up a composite service, and to gather enough information that the cause of service degradation and failure can be determined. We monitor both the physical graph of the network and the service graph. These graphs must often be learned, because in some cases information is not made available (due to coalition policy), or because of the dynamics of the missions (service graph) or communication network (physical graph). We collaborate with P5.2 on service graphs. To determine the cause of outages, we must overcome the partial information gathered by the monitors: some information may not be directly learned because of the dynamics of the network and coalition ownership of network components. We infer the causes of failures by applying network tomography, collaborating with P1.1 (TA5) on tomography techniques.
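A minimal Python sketch of the tomography idea, under our own simplifying assumptions (boolean probe outcomes, known path-to-link mappings): links on any successful end-to-end path are exonerated, and the remaining links on failed paths become the failure suspects. This is an illustration of the principle, not the task's algorithms.

# Illustrative boolean network tomography: infer suspect links from
# end-to-end probe results over known paths (our simplification).
def localize_faults(paths, path_ok):
    """paths: {path_id: set of link ids}; path_ok: {path_id: probe succeeded?}."""
    good_links = set()
    for p, links in paths.items():
        if path_ok[p]:
            good_links |= links  # every link on a good path is working
    suspects = set()
    for p, links in paths.items():
        if not path_ok[p]:
            suspects |= links - good_links  # unexplained links on bad paths
    return suspects

paths = {"p1": {"a-b", "b-c"}, "p2": {"a-b", "b-d"}, "p3": {"b-d", "d-e"}}
path_ok = {"p1": True, "p2": False, "p3": False}
print(localize_faults(paths, path_ok))  # suspects: {'b-d', 'd-e'}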

Task 2 - Policy Driven Cross-Coalition Service Provisioning with Humans in the Loop
In this task we strive to enable users of a system to quickly understand and specify services and guiding policies in a coalition environment, and then quickly configure specific service instances to meet ad hoc needs. We define a user-oriented policy model that can be understood by non-technical people, and provide feedback to the users on which policies may conflict or preclude a service from being offered. For the policy and service model we use Controlled English (CE). CE is convenient and powerful because it is both human readable and machine processable (supporting knowledge representation and reasoning). This is true both for the specification of policy and configuration and for the corresponding


feedback generated by analysis. As part of satisfying service configurations we provide support for policy conflict resolution, that is, policy negotiation/relaxation if services are not implementable given the set of policies currently in force. We reuse the ITA Experimentation Framework to determine the performance characteristics of service configuration in the context of high mission dynamics. We collaborate with P4.3 (TA6) on the CE aspects of this work.
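To illustrate the kind of policy checking and relaxation feedback described above, here is a toy Python sketch. The policy representation (named deny rules over request attributes) is our own simplified structure, not the CE policy model; the policy names are hypothetical.

# Toy sketch of policy checking with relaxation feedback (our own
# simplified structures, not the CE policy model): a service request is
# checked against deny rules, and blocking policies are reported back
# as candidates for negotiation/relaxation.
def check_service(request, policies):
    """request: dict of attributes; policies: list of (name, deny_predicate)."""
    blocking = [name for name, denies in policies if denies(request)]
    if not blocking:
        return "permitted", []
    return "denied", blocking  # feedback: which policies to relax

policies = [
    ("no-video-over-coalition-link",
     lambda r: r["media"] == "video" and r["link"] == "coalition"),
    ("uk-assets-uk-users-only",
     lambda r: r["asset_owner"] == "UK" and r["user_nation"] != "UK"),
]
req = {"media": "video", "link": "coalition",
       "asset_owner": "UK", "user_nation": "US"}
print(check_service(req, policies))
# -> ('denied', ['no-video-over-coalition-link', 'uk-assets-uk-users-only'])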

3.3.2 Research Impact

Technical merit: The key technical contributions for the past year include:

Managing Distributed Service Deployments in Hybrid Wireless Networks (Task 1): We made contributions in three main areas: robust monitoring, topology inference in the context of service deployment in MANETs, and integrated frameworks for diagnosing the cause of degradation in distributed service environments. For robust monitoring, we tackled two problems. In the first, we attempt to harvest as much data as we can for management purposes; we show that by using intelligent combinations of epidemic and delay-tolerant techniques, we can achieve high availability of time-sensitive management data in a MANET[1]101. In the second, we focus on gathering data in real time by monitoring from endpoints using network tomography. We collaborated with P1.1 on developing monitor-placement algorithms for network tomography that are resilient against network dynamics and failures; our results show the algorithms provide good diagnosis coverage in the presence of failures[2]102. For topology inference we extended work performed on a DTRA project to apply to MANETs and adapted it to service deployments[3]103; our results show that the algorithms run fast enough to be effective in MANETs. We developed a cross-layer framework for isolating faults in a distributed service environment by combining service-level monitoring with an approximation of the network topology[4]104. Our work on generating service dependency graphs and isolating faults in a distributed service environment, discussed in the last peer review report, was accepted and published105,106. In these works, we developed algorithms for determining inter-dependencies between component services, and techniques to isolate faults in a highly dynamic environment based on service dependency graphs.

Policy Driven Cross-Coalition Service Provisioning with Humans in the Loop (Task 2): We made progress on two fronts: autonomous configuration of services, and sharing assets in a coalition environment governed by high-level (human-readable) policies. For autonomous configuration, we defined two main algorithms to configure distributed services, one based on cost and the other on

101 P. Novotny, A. Wolf, B. K. Ko, “Delay Tolerant Harvesting of Monitoring Data in Mobile Ad Hoc Networks,” in submission.
102 S. Tati, S. Silvestri, T. He, T. F. La Porta, “Robust Network Tomography in the Presence of Failures,” Proc. of IEEE ICDCS, 2014.
103 S. Silvestri, B. Holbert, P. Novotny, T. La Porta, and A. Swami, “Inferring Network Topologies in MANETs Applied to Service Deployment,” Penn State INSR Tech. Report, NAS-TR-0177-2014.
104 S. Tati, P. Novotny, B. J. Ko, A. Wolf, A. Swami, T. F. La Porta, “Diagnosing Degradation of Services in Hybrid Wireless Tactical Networks,” SPIE Ground/Air Multisensor Interoperability, Integration, and Networking for Persistent ISR IV, 2013.
105 P. Novotny, A. L. Wolf, and B. J. Ko, “Discovering Service Dependencies in Mobile Ad Hoc Networks,” IFIP/IEEE International Symposium on Integrated Network Management, Ghent, Belgium, May 2013, pp. 527-533.
106 P. Novotny, A. L. Wolf, and B. J. Ko, “Fault Localization in MANET-Hosted Service-Based Systems,” 31st International Symposium on Reliable Distributed Systems, Irvine, California, October 2012, pp. 243-248.


information gain. The configuration itself can be performed in a centralized or distributed way[5]107. For asset sharing, we defined two methods of sharing assets in a coalition environment: asset-based (derived from a traditional model where access is granted based on asset ownership) and team-based (where access is granted based on a user's team assignment, often associated with a particular mission). We evaluated these approaches for different levels of team and ownership heterogeneity and mobility. Our results show that the asset-based model performs slightly better than the team-based approach, and that the team-based approach is robust against mobility[6]108,109.

Synergistic value of collaboration: Several collaborations have been ongoing in the project. In 5.1, PSU, Imperial and ARL have been working together on the cross-layer framework for diagnosing degradation and faults in a distributed service environment and on inferring service topologies. PSU and IBM US, both in P5.1 and P1.1 in TA5, have been collaborating on network tomography approaches to diagnosing degradation in dynamic environments. Imperial and IBM US have been collaborating on learning service dependency graphs, isolating faults using these graphs, and gathering time-sensitive data in MANETs.

RPI, IBM US and IBM UK have collaborated on autonomous service configuration. This includes defining the algorithms and performing evaluation on the ITA Information Fabric. Cardiff and IBM US have worked together on asset sharing models and algorithms. This team has also worked with P4.1 on integrating CE into the asset sharing framework. This team is further working with P6.3 on using CE to define the commander’s objectives.

Imperial and RPI have worked together on jointly treating service composition and deployment.

Scientific challenge: There are several scientific challenges addressed by the work in P5, starting with the constraints imposed by a tactical network, which require solutions that are distributed and can work within policy constraints. The three most important challenges we face are the dynamics of the network and services, the lack of full knowledge when making decisions, and limitations due to policy.

The dynamics arise from the properties of tactical networks (network dynamics) and tactical operations (mission dynamics). These dynamics demand algorithms that operate on a short time scale and are robust against changes to network topology. The time scale is specifically addressed in the work on inferring network topology, diagnosing faults, gathering time-sensitive information, and autonomous service configuration; these algorithms are designed to run fast enough that meaningful results are produced even as networks evolve. Dynamics are also addressed in the work on tomography and autonomous service deployment: the tomography algorithms are robust to broken links, as may occur in a MANET, while for autonomous service deployment, distributed implementations of the algorithms allow them to remain robust in the face of a changing network.

107 S. Y. Shah, B. Szymanski, P. Zerfos, C. Bisdikian, C. Gibson, D. Harries, “Autonomous Configuration of Spatially Aware Sensor Services in Service Oriented WSNs,” Proc. IEEE PerCom Demos, San Diego, CA, March 22, 2013, pp. 312-314.
108 C. Parizas, D. Pizzocaro, A. Preece, and P. Zerfos, “Sharing Smart Environment Assets in Dynamic Multi-Partner Scenarios,” in Proc 6th IEEE/IFIP International Workshop on Management of the Future Internet, 2014.
109 C. Parizas, D. Pizzocaro, A. Preece, and P. Zerfos, “Sharing Policies for Multi-Partner Asset Management in Smart Environments,” in Proc IEEE NOMS, 2014.


Partial knowledge is a key characteristic caused by the difficulty of collecting information in a dynamic, coalition environment. In particular, the work on topology inference, tomography, and building service dependency graphs addresses partial information. In all cases the algorithms operate without full knowledge of the network in terms of topology or failure characteristics.

The limitations due to policy are directly addressed by the work on asset sharing in P5.2, which specifically targets challenges in resource allocation.

Military relevance: The military context for this research is based on the ideas of cross-coalition communities of interest110 and the NATO Edge C2111 model wherein partners in a task-focused team are permitted to readily share information assets. Hybrid wireless networks provide the communication and computational resources underlying the application-level services vital to the mobile, coalition warfighter. In this sense, the primary concern is the availability and performance of those services. The services making up applications are expected to be hosted on a mix of mobile and fixed assets. This is in line with the trend to use warfighters as information gatherers and to enable warfighter devices to execute intelligent apps.

To support decision-making, our integrated policy representation is intended to bridge from relatively high-level decision needs to the generation of solution-specific information from dynamically configured services. The negotiation of policy constraints at or near the network edge reflects the mission-command style of military command which promotes decentralized, delegated freedom of action, leaving subordinates relatively free in terms of how best to achieve their missions.

For tactical networks, we accommodate the fact that devices are becoming more powerful when compared to communications resources by examining distributed services in which all devices may execute part of a service; we also directly address HWNs that use cellular networks and are more stable than MANETs.

For cross-coalition scenarios we address incompatibilities between the language used to express policy at source and that used at the enforcement points; the desire for service provision to reflect, and adjust to, changes in operational needs in a consistent fashion; and the requirement of military users operating at the network end-points to have access to timely and relevant information.

Exploitation & technology transition potential: A portion of the work on topology inference and fault diagnosis has been transitioned into a DTRA project, and vice versa. In particular, as part of the DTRA work, a topology inference algorithm was developed.

110 Pham, T., Cirincione, G., Verma, D. and Pearson, G. (2008) “Intelligence, surveillance, and reconnaissance fusion for coalition operations”, in Proc 11th International Conference on Information Fusion.
111 Alberts, D. S., Huber, R. K., & Moffat, J. (2010) “NATO NEC C2 Maturity Model”, Command and Control Research Program, Department of Defense, USA.


We have adapted that algorithm to run at a faster time scale and applied it to service deployments. These results will be fed back to the DTRA project.

The ITA Information Fabric is being open sourced as a basis for the Land Open System Architecture (LOSA) Common Open Infrastructure (Land) (COIL) implementation. It is also part of the LOSA COIL standard, UK MOD Defstan 23-14. The service-based deployment of the ITA Information Fabric has been transitioned to the Dstl CCS programme. The Management of Information Processing Services (MIPS) framework continues to be transitioned to Dstl.

One patent was filed on providing composite sensor services112. A second patent disclosure is in process on management of high-level policies.

3.3.3 Technical Accomplishments

In task 5.1, we made progress on monitoring tactical networks so that the causes of service failure or degradation can be found. This required gathering time-series monitoring data, isolating the performance of links (via tomography), determining network topologies (via inference) to optimize service deployment, and determining the interplay between service graphs and network topology (cross-layer monitoring).

In task 5.1, “Delay Tolerant Harvesting of Monitoring Data in Mobile Ad Hoc Networks101,”[1] we are concerned with reliably harvesting time-bounded, time-sensitive time-series data collected in a mobile ad hoc network (MANET) environment. The data describe the state of the network and hosted applications. Harvesting is used in time-series analyses requiring a global view of distributed monitoring data. The MANET environment challenges data harvesting, due to inherently unstable and unpredictable connectivity and the resource limitations of wireless devices. We present an epidemic, delay-tolerant method to improve the availability of time-series monitoring data in the presence of network instabilities, asymmetries, and partitions. The method establishes a network-wide synchronization overlay to incrementally and efficiently move data to intermediate nodes. We have implemented the algorithm in Java EE and evaluated it in the CORE and EMANE MANET emulation environments. Our method can achieve up to 99.8% and 98.8% availability in the grouped and independent movement scenarios, respectively.
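The core mechanism can be illustrated with a minimal Python sketch, under our own simplifying assumptions: whenever two nodes are in contact they synchronise their stores of timestamped monitoring records (an anti-entropy round), so data drifts toward a harvesting node even across partitions. This is an illustration of the epidemic idea, not the Java EE implementation described in the paper.

# Minimal sketch of epidemic, delay-tolerant harvesting: opportunistic
# pairwise synchronisation of monitoring records (our simplification).
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}  # (origin, timestamp) -> record

    def record(self, t, value):
        self.store[(self.name, t)] = value

    def sync(self, other):
        """Exchange records both ways (one anti-entropy round)."""
        merged = {**self.store, **other.store}
        self.store = dict(merged)
        other.store = dict(merged)

a, b, harvester = Node("a"), Node("b"), Node("h")
a.record(1, "cpu=0.9")
b.record(2, "link=down")
a.sync(b)          # opportunistic contact while partitioned from h
b.sync(harvester)  # a later contact delivers both nodes' data
print(sorted(harvester.store))  # [('a', 1), ('b', 2)]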

In “Robust Network Tomography in the Presence of Failures102,”[2] we study the problem of selecting paths to improve the performance of network tomography applications in the presence of network element failures; for example, links in a MANET can be broken. We model the robustness of paths in network tomography by a metric called expected rank, and formulate an optimization problem covering two complementary performance metrics: robustness and probing cost. The problem aims at maximizing the expected rank under a budget constraint on the probing cost. We prove that the problem is NP-hard. Under the assumption that the failure distribution is known, we propose an algorithm called RoMe with a guaranteed approximation ratio.

112 Providing a Sensor Composite Service Based on Operational and Spatial Constraints, IBM-US, RPI, YOR9-2013-0519.


However, since evaluating the expected rank is generally hard, we provide a bound that can be evaluated efficiently. We also consider the case in which the failure distribution is not known, and propose a reinforcement learning algorithm to solve our optimization problem, using RoMe as a subroutine; this is applicable when network characteristics are unknown or network dynamics are high. We ran a wide range of simulations under realistic network topologies and link failure models to evaluate our solution against a state-of-the-art path selection algorithm. Results show that our approaches improve link identifiability by 5-6 times over existing approaches in the face of network failures.
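To give a feel for the budgeted selection problem, the Python sketch below greedily adds the path with the best expected marginal gain in link coverage (a crude stand-in for expected rank) per unit probing cost. The data structures and the coverage proxy are our own assumptions; RoMe itself and its approximation guarantee are not reproduced here.

# Sketch of budgeted robust path selection (illustrative proxy, not RoMe):
# each path has a probing cost and a survival probability; gain is the
# expected number of newly covered links per unit cost.
def select_paths(candidates, budget):
    """candidates: list of (path_id, links, cost, p_survive)."""
    chosen, covered, spent = [], set(), 0.0
    remaining = list(candidates)
    while remaining:
        def gain(c):
            pid, links, cost, p = c
            return p * len(links - covered) / cost
        best = max(remaining, key=gain)
        pid, links, cost, p = best
        if spent + cost > budget or gain(best) == 0:
            break  # budget exhausted or no path adds coverage
        chosen.append(pid)
        covered |= links
        spent += cost
        remaining.remove(best)
    return chosen, covered

cands = [("p1", {"e1", "e2"}, 2.0, 0.9),
         ("p2", {"e2", "e3"}, 1.0, 0.6),
         ("p3", {"e4"}, 1.0, 0.95)]
print(select_paths(cands, budget=3.0))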

In “Inferring Network Topologies in MANETs Applied to Service Deployment103,”[3] we adapt an algorithm for inferring network topologies in the Internet to MANETs. In this algorithm, we probe the network to determine its topology. Given that many tactical networks are coalition networks, we assume partner nodes may not reply with information about their own topology, which leads to holes in the learned network topology. We first augment the partially derived topology with virtual links to restore known connectivity, and then define rules by which these virtual links and nodes can be combined to yield a topology closer to the ground truth. Our ability to accurately infer the network topology is limited by two main factors: the observability of the network (when probing, not all paths are exercised), and the dynamics of the network while the algorithm is running. Our results show that for networks of 50 or so nodes, the algorithm runs fast enough to accurately infer a network topology so that distributed services may be efficiently deployed.
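The virtual-link step can be sketched in a few lines of Python, under our own simplifications: where probing shows two known nodes can reach each other but no learned path connects them (for example, because non-cooperating coalition nodes sit in between), a virtual link is inserted to restore the known connectivity. The merging rules of the paper are not reproduced.

# Illustrative topology augmentation: insert virtual links for observed
# connectivity that the partially learned graph cannot explain.
def reachable(adj, a, b):
    seen, stack = {a}, [a]
    while stack:
        n = stack.pop()
        if n == b:
            return True
        for m in adj.get(n, ()):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return False

def augment(adj, observed_pairs):
    """Add a virtual link for each observed-but-unexplained connection."""
    virtual = []
    for a, b in observed_pairs:
        if not reachable(adj, a, b):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
            virtual.append((a, b))
    return virtual

adj = {"s": {"r1"}, "r1": {"s"}, "d": {"r2"}, "r2": {"d"}}
print(augment(adj, [("s", "d")]))  # -> [('s', 'd')] inserted as a virtual link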

In “Diagnosing Degradation of Services in Hybrid Wireless Tactical Networks104”[4] we consider a problem related to service management and deployment in tactical military networks. We provide a cross-layer framework to detect and diagnose the causes of performance degradation as part of service management; it includes a monitoring model of services and a network model for hybrid wireless networks, and we give a working example in tactical military networks to illustrate the framework. In the service monitoring model of the cross-layer framework, we monitor the performance of services at various clients. At the service layer, we utilize infrequent updates from various nodes together with service deployment information to derive a service layer dependency graph (SLDG), whose nodes are clients and services. In the network model of the cross-layer framework, we require only an approximate network topology, unlike network-layer-based techniques, which require an accurate one. The major benefit of this cross-layer framework is that we can map the service links onto the approximate network topology to infer the causes of unreliable services at the network layer.

In task 5.2 we made progress on two important avenues for service construction and deployment. In the first, we determined how to select information assets based on information needs. In the second we determined how assets could best be shared in a coalition environment with heterogeneous device ownership and team membership.

In task 5.2, “Autonomous Configuration of Spatially Aware Sensor Services in Service Oriented WSNs107,”[5] we explore how to dynamically configure complex services that produce information highly relevant to the user's interest. We focus on the design of a service-oriented system that can capture the concept of relevancy and can compose complex services from services hosted on sensor or mobile nodes. Sensor service configuration is a more elaborate form of service composition: the process not only performs service composition but also takes into account the implicit and


explicit requirements specified by the user of the service, incorporating them into the system as hard and soft constraints. Such requirements include user-specified policies and restrictions, spatio-temporal relevancy constraints on the services hosted at the edge of the network, and the modes of operation of the service. We present and evaluate Cost Based Model (CBM) and Gain Based Model (GBM) approaches to capture the relevancy of services hosted on WSN nodes in composite service configuration. The system is resilient to failures, operates in manual and autonomous recovery modes, and supports three service configuration methods: distributed, centralized and hybrid.
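A toy Python contrast of the two selection philosophies follows. The scoring formulas, field names, and candidate services are our own illustrative stand-ins; the paper's CBM and GBM models are more elaborate.

# Toy contrast of cost-based vs gain-based service selection (our own
# stand-in formulas, not the paper's CBM/GBM models).
def pick(candidates, model):
    """candidates: list of dicts with cost, gain and spatial relevance."""
    if model == "CBM":  # cheapest service per unit of relevance
        return min(candidates, key=lambda c: c["cost"] / max(c["relevance"], 1e-9))
    if model == "GBM":  # highest relevance-weighted information gain
        return max(candidates, key=lambda c: c["gain"] * c["relevance"])
    raise ValueError(model)

cands = [
    {"name": "cam-north", "cost": 2.0, "gain": 0.6, "relevance": 0.9},
    {"name": "uav-wide",  "cost": 5.0, "gain": 0.9, "relevance": 0.7},
]
print(pick(cands, "CBM")["name"], pick(cands, "GBM")["name"])
# -> cam-north uav-wide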

In “Sharing Smart Environment Assets in Dynamic Multi-Partner Scenarios108,”[6] we extend our initial work performed earlier in this BPP. We compare two asset sharing approaches: the first is based on a traditional asset ownership model, while the second (novel) one is based on a formalisation of an edge team-based model in which users are grouped into cross-partner teams and, as team members, have access to assets belonging to all the partners participating in the team. We further experiment with the second, unexplored team-based sharing model by testing its behaviour under different user mobility patterns and extreme asset ownership models, investigating its impact on MSTA-P, a policy-regulated version of an existing asset-task assignment protocol developed as part of the prior BPP. For the protocol's evaluation we implement a multi-partner operation scenario using an open-source, agent-based, discrete-time simulation environment. The algorithm has two phases: in the first, queries are generated and a list of potential information assets that can be used to respond to each query is formed; in the second, the information assets are picked. We conclude that the asset-centric model performs better than the team-centric one, but not by a large margin (this is important because, based on evidence in 111, there may be other organisational reasons for favouring an edge-based approach, so our result shows that the performance degradation is marginal in those cases). We observe that the protocol behaves better when the team moves together than when users move individually, and that the team-centric sharing model behaves efficiently in response to extreme ownership models for team heterogeneity greater than 75%.
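The two sharing models can be summarised by a pair of access checks, sketched below in Python with our own simplified data structures (the paper's policy formalisation is richer):

# Minimal sketch of the two sharing models (our simplified access checks):
# asset-based access follows ownership; team-based access follows
# cross-partner team membership.
def asset_based_access(user, asset):
    return asset["owner"] == user["partner"]

def team_based_access(user, asset, teams):
    """teams: {team name: set of partners contributing assets to it}."""
    return asset["owner"] in teams.get(user["team"], set())

user = {"partner": "US", "team": "recon-1"}
asset = {"id": "uav-42", "owner": "UK"}
teams = {"recon-1": {"US", "UK"}}
print(asset_based_access(user, asset))        # False: user's partner is not the owner
print(team_based_access(user, asset, teams))  # True: owner UK contributes to the team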

In collaboration between tasks 5.1 and 5.2 we have initiated work on “A Multi-layer Service Management for Sensor Networks”. In this work we combine the dynamic service placement algorithms (“Placement Service”) developed in Task 5.1 with the dynamic service composition algorithms (“Configuration Service”) developed in Task 5.2. We propose a comprehensive, situation-aware management of services in service-oriented systems in which the Placement Service and Configuration Service work together to perform efficient service placement and configuration in sensor networks. The Placement Service positions services on network nodes while optimizing given network-level parameters, such as distance between nodes and bandwidth on links. The Configuration Service produces efficient composite services contingent on hard constraints defined by policies and soft constraints such as geospatial relevancy to the area of interest. Through a bidirectional feedback mechanism, the Configuration Service provides composite service graphs (and thus topology) to the Placement Service, which then migrates services in the network to optimize network-level communication among the nodes in the composite service graph; based on live monitoring data, the Placement Service continuously looks for an optimal placement of services. This work is ongoing; a sketch of the control loop appears below.
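The following Python sketch illustrates the bidirectional loop only (the service names come from the text; the loop body and the stub callables are our own illustration, not the project's implementation):

# Sketch of the Configuration Service / Placement Service feedback loop.
def manage(compose, place, monitor, max_rounds=10):
    """Alternate composition and placement until monitoring reports health."""
    placement = {}
    for _ in range(max_rounds):
        graph = compose(placement)           # Configuration Service: composite service graph
        placement = place(graph, placement)  # Placement Service: migrate services onto nodes
        if monitor(placement):               # live monitoring: is the placement healthy?
            break
    return placement

# stub services, purely to make the sketch runnable
stub_compose = lambda placement: {("source", "fuse"), ("fuse", "sink")}
stub_place = lambda graph, prev: {svc: "node-1" for edge in graph for svc in edge}
stub_monitor = lambda placement: True
print(manage(stub_compose, stub_place, stub_monitor))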

3.3.4 References for Project 5

[1] P. Novotny, A. Wolf, B. K. Ko, “Delay Tolerant Harvesting of Monitoring Data in Mobile Ad Hoc Networks,” in submission. https://www.usukita.org/node/2693


[2] S. Tati, S. Silvestri, T. He, T. F. La Porta, “Robust Network Tomography in the Presence of Failures,” Proc. of IEEE ICDCS, 2014. https://www.usukitacs.com/node/2706

[3] S. Silvestri, B. Holbert, P. Novotny, T. La Porta, and A. Swami, “Inferring Network Topologies in MANETs Applied to Service Deployment,” Penn State INSR Tech. Report, NAS-TR-0177-2014. https://www.usukitacs.com/node/2734

[4] S. Tati, P. Novotny, B. J. Ko, A. Wolf, A. Swami, T. F. La Porta, “Diagnosing Degradation of Services in Hybrid Wireless Tactical Networks,” SPIE Ground/Air Multisensor Interoperability, Integration, and Networking for Persistent ISR IV, 2013. https://www.usukitacs.com/node/2328

[5] S. Y. Shah, B. Szymanski, P. Zerfos, C. Bisdikian, C. Gibson, D. Harries, “Autonomous Configuration of Spatially Aware Sensor Services in Service Oriented WSNs,” Proc. IEEE PerCom Demos, San Diego, CA, March 22, 2013, pp. 312-314. https://www.usukitacs.com/node/2276

[6] C. Parizas, D. Pizzocaro, A. Preece, and P. Zerfos, “Sharing Smart Environment Assets in Dynamic Multi-Partner Scenarios,” in Proc 6th IEEE/IFIP International Workshop on Management of the Future Internet, 2014. https://www.usukitacs.com/node/2576


3.4 Project 6 - Collective Sensemaking Under Uncertainty

3.4.1 Introduction

Project 6 seeks to understand, reason about, and manage the uncertainties that occur in the various phases of the end-to-end collective sensemaking process that supports coalition operations. Effective decision-making in coalitions relies on collaborative acquisition of information from hard and soft sources; its dissemination across coalition member boundaries; and eventually its collective analysis in support of the development of knowledge and situation awareness for the tactical tasks at hand. During this process of sensemaking, as information from multiple sources (and of multiple modalities) coalesces for the benefit of analysts and decision makers, care must be taken to accommodate the multiple challenges and operational constraints on the generation, distribution, and analysis processes. These include dealing with uncertainties in the information that may emerge in the information generation and analysis processes, or that are purposefully introduced. The challenges also include the mismatch between the high-level semantics used in forming queries for desired information at the command level and the lower-level semantics typically associated with the data collected from various coalition sources. Furthermore, the deluge of uncertain data collected should be managed and presented meaningfully to avoid overwhelming its intended recipients (analysts, decision-makers), to facilitate its collaborative analysis, and to guide additional information acquisition. The project comprises the following three tasks:

Task 1 - Information Fusion and Inference Management under Uncertainties is concerned with reasoning about the uncertainties and risk-value trade-offs that occur for both producers and consumers during information sharing. During the review period the task team developed a constructive theory and associated multi-tiered architecture for systematically obfuscating information for the utility-vs.-risk trade-off and deriving bounds on it; generalized the approach to ontology-driven semantic information with an associated reasoning framework and decision procedure; and developed a family of operators for fusing uncertain and obfuscated information based on a geometric interpretation of Subjective Logic.

Task 2 - In-time Assembly and Presentation of Relevant Information in Support of Coalition Decision Making is concerned with effective information processing and coalition decision support at the tactical edge, which is characterized by dynamically changing data, large volumes of highly uncertain information from distributed sources, and, most crucially, the need for time-pressed decisions on the evolving situation. During the review year we developed the scientific foundations of a methodology to enable robust linking and reasoning “from data to decision” and vice versa to solve complex, dynamic coalition problems.

Task 3 - Collaborative Intelligence Analysis is concerned with individual and collective methods to acquire, evaluate, integrate, and interpret information in order to make intelligence analysis at the network edge and at higher echelons more effective for improved situational awareness. During the review year we developed a framework that uses evidential reasoning, hypothesis-driven crowd-sourcing, and collective intelligence spaces for effective collaboration among partners as they acquire and share intelligence, and bring their distinct expertise and resources to bear towards analyzing information.

3.4.2 Research Impact

Technical merit: The key technical contributions of this project for the review period are as follows:

• A Semantic Obfuscation Framework which utilizes Subjective Logic to provide a tractable method for deciding what semantic information to share and with whom; how to systematically obfuscate; and how to reason about the uncertainty with which a recipient can make inferences. [Task 6.1]


• A constructive framework based on well-defined information-theoretic metrics for systematically devising information sharing mechanisms based on obfuscation to achieve desired operating points in a utility-risk-deception space, and to reason about achievable regions in this space. [Task 6.1]
• New operators for Subjective Logic (SL), based on a geometric interpretation of SL and on variations to the classical deduction operator, to provide improved fusion of, and reasoning about, uncertain and obfuscated information. [Task 6.1]
• Dempster-Shafer argumentation schemes for effective formulation and assessment of competing hypotheses, chains of supporting reasons, and chains of attacks among conflicting reasons, and Markov Argumentation Random Fields for rigorous evaluation of hypotheses. [Tasks 6.2 & 6.1]
• A subjective ontology-based data access system (SOBDA), which addresses the challenge of representing and aggregating uncertain data at large scale by allowing decision-makers to query large data silos containing uncertain information in terms of a common schema. [Task 6.2]
• A principled Integer Linear Programming based framework guaranteed to find optimal plans on known workloads, based on live statistics dynamically collected from data sources, in order to address bottlenecks of state-of-the-art peer-to-peer federated databases (e.g., GaianDB). [Task 6.2]
• A principled model for representing and reasoning about the provenance of information sources and analyses in argumentation schemes, and a means to reason about preferences over evidence on the basis of timeliness, reliability, source trustworthiness, and accuracy of evidence derivation. [Task 6.3]
• An integrated Collaborative Intelligence Spaces tool that supports flexible collaborative analysis with rigorous automated reasoning and innovative HUMINT acquisition through argumentation schemes, an Evidential Reasoning Service, and Hypothesis-Driven Crowd Sourcing mechanisms. [Task 6.3]
• A principled probabilistic model to automatically aggregate noisy and possibly conflicting crowd-sourced information received in spatial event detection tasks while simultaneously inferring the truth of events and the reliability of information suppliers in an unsupervised manner. [Task 6.3]

Synergistic value of collaboration: The project's research is highly synergistic and collaborative, leveraging the complementary expertise of the academic, industrial, and government researchers in the team to tie together the crosscutting issues.
Collaborations have occurred at multiple scales and include: the Aberdeen, ARL, CMU, IBM UK, IBM US, UCLA (6.1, 6.2) collaboration on developing new Subjective Logic operators; the Aberdeen, CMU, IBM US (6.1, 6.2) collaboration on models of multi-agent probabilistic information disclosure policies; the Aberdeen, IBM US, UCLA (6.1) collaboration on systematic inference management for quantitative and semantic information; the Aberdeen, ARL, CMU, IBM US (6.2, 6.3) collaboration developing Markov Argumentation Random Fields to enable the assessment of probabilities on accepting and rejecting evidence, hypotheses and reasons for collaborative analysis; the Aberdeen, CMU, CUNY (6.2, NS-CTA) collaboration on developing Dempster-Shafer Argument Schemes to represent probabilistic evidence; the Aberdeen, IBM UK collaboration (6.3, P4) on argumentation schemes in Controlled English (CE) based deliberative dialogues113 to support human-machine conversation;

113 A. Toniolo, T. J. Norman, K. Sycara and D. Mott, “On the benefits of argumentation schemes in deliberative dialogue for collaborative decision-making”. ITA Annual Fall Meeting, 2013.


the Aberdeen, Honeywell, UCLA (6.3) collaboration on the CISpaces tool; and the Cardiff, IBM UK, UCLA (6.3, P4) collaboration on using CE to express uncertainty in crowd-sourced sensing.

Scientific challenge: This project aims at a comprehensive understanding of how large volumes of uncertain sensory information from heterogeneous sources are acquired, shared, and processed as part of collaborative sensemaking in coalitions. The complex interplay of uncertain information quality, dynamic trust relationships, and conflicting interests presents challenges for both the producers and consumers of information. For the consumers of information, the computational system must enable the formulation of hypotheses and the making of decisions with inconsistent, uncertain, and incomplete information from a multiplicity of sources of varied competence and trustworthiness; trace the chains of reasoning associated with hypotheses; and, in the light of new knowledge, reject hypotheses that are no longer plausible or reinstate previously rejected hypotheses that are again plausible. Moreover, this must be done in a manner that permits collaborative decision-making and does not overload cognitive capabilities. For the producers of information, the computational system must provide the ability to evaluate and understand the implications of sharing various forms of information with recipients of uncertain trustworthiness, side knowledge, computational prowess, and collusion behaviour, together with mechanisms for modulating the form and quality of information so as to achieve a desired level of risk vs. utility. Critical to solving these challenges is for the computational system to bridge the semantic gap between raw data and the inferences that can be made from it, and to do so for multidimensional quantitative and qualitative data of high volume, velocity, and variety. In realistic settings, two separate phases must be considered: data to information, where raw data is transformed through aggregation and assessment of uncertainty, and information to analysis, where the information is further filtered and analysed to support decision making. The challenge in the first phase is to resolve conflicting information, taking into consideration the sensitivity of conclusions to the introduction of new data. The second phase utilises the experience of the analyst in the generation and analysis of competing hypotheses from foraged information; the challenge here is to capture the reasoning used to construct and critically challenge hypotheses in order to prevent cognitive biases. While both phases must deal with reasoning under uncertainty, the manner in which this reasoning takes place must be tailored to the task at hand. Argumentation provides us with a common means to model the defeasibility of inferences from data, through information, to analysis; similarly, across subtasks, Dempster-Shafer theory and Subjective Logic provide a common set of approaches to model probabilistic reasoning. A background sketch of a standard Subjective Logic fusion operator follows.
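For readers unfamiliar with Subjective Logic, the Python sketch below implements the standard (textbook) cumulative fusion operator for binomial opinions, showing how two uncertain reports about the same proposition combine. It does not reproduce the new geometrically motivated operators developed in the project; the dogmatic case uses a simplified equal-weight average.

# Standard Subjective Logic cumulative fusion of two binomial opinions
# (belief b, disbelief d, uncertainty u, base rate a). Textbook operator,
# shown only as background for the discussion above.
def cumulative_fusion(o1, o2):
    b1, d1, u1, a1 = o1
    b2, d2, u2, a2 = o2
    k = u1 + u2 - u1 * u2
    if k == 0:  # both opinions dogmatic (u1 == u2 == 0): equal-weight average
        return ((b1 + b2) / 2, (d1 + d2) / 2, 0.0, (a1 + a2) / 2)
    b = (b1 * u2 + b2 * u1) / k
    d = (d1 * u2 + d2 * u1) / k
    u = (u1 * u2) / k
    if u1 + u2 == 2 * u1 * u2:
        a = (a1 + a2) / 2
    else:
        a = (a1 * u2 + a2 * u1 - (a1 + a2) * u1 * u2) / (u1 + u2 - 2 * u1 * u2)
    return (b, d, u, a)

# two sensors report on "vehicle present", with different confidence
print(cumulative_fusion((0.7, 0.1, 0.2, 0.5), (0.4, 0.2, 0.4, 0.5)))
# -> roughly (0.69, 0.15, 0.15, 0.5); fused uncertainty is lower than either input's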
Military relevance: As military operations become more complex, high-pressure, and faster-paced, with frequent unforeseen events, coalitions require more effective acquisition and sharing of large volumes of complex data, and collaborative decision-making in the presence of uncertainties.

Our research is addressing these needs from several synergistic perspectives: (i) Task 6.1 is providing theoretically sound mechanisms, efficient architectures, and decision procedures for sharing semantic and quantitative information while quantifying risk-utility trade-offs, guiding sharing decisions, and robustly fusing uncertain trust information; (ii) Task 6.2 is assisting decision-makers in the formulation and evaluation of hypotheses and reasons in complex and dynamic environments, and providing answers for queries issued against uncertain data in a distributed, scalable fashion; and (iii) Task 6.3 is enabling better and quicker analysis via automated competing hypothesis theory and provenance vetting, with a more consistent outcome, lower false alarm rate, and more complete, refined and robust analytical products. The technology being developed in this project will be a vital "force multiplier" by enabling higher-quality, faster, defendable decision-making, together with better management of information sharing.

Exploitation & technology transition potential: Through its close ties with researchers at ARL and DSTL, the project team is pursuing diverse opportunities for transitioning ideas into exploitable technologies. The inference and trust management methods in Task 6.1 have been realized as a multitiered architecture on top of the ITA Information Fabric with Android mobile devices at the edges114.


The HyperArg reasoning engine software115 from Task 6.2 enables argumentation-based decision making while minimizing cognitive overload and integrating information from distributed sources. The CISpaces tool from Task 6.3116 will assist in the collaborative generation and resolution of hypotheses to attain situational awareness and determine courses of action. A potential transition vehicle is the MIPS (Management of Information Processing Services) system, which combines the Information Fabric, the Gaian Database, and CE to enable agile composition of information processing pipelines across a coalition. The CISpaces tool, the HyperArg reasoning engine software, and the Inference Management Firewall can all be made available as composable services in MIPS. There is the potential to transition Task 6.3 technologies to both military and civilian use through Honeywell's established Condition Based Maintenance products and its nascent sensor management and fusion products. Additionally, DSTL expects to transition the CISpaces tool into a multinational applied research programme in support of the 5-eyes intelligence community.

3.4.3 Technical Accomplishments

In this section we provide summaries of selected papers resulting from our research, representing the specific technical accomplishments of this project. To help the reader contextualize the papers and their relationships, we note that the research across the three tasks seeks to deliver a comprehensive understanding of how large volumes of uncertain sensory information from heterogeneous sources are acquired, shared, and processed as part of collaborative sensemaking in coalitions. Unified by their shared emphasis on uncertainty, the three tasks focus on complementary themes associated with this objective. The first two tasks build upon prior work started under BPP11, whereas the third task charts a new direction of research by focusing on the collaboration aspect of sensemaking. Collectively the three tasks bring to bear a suite of mathematical methods and tools, such as subjective logic, argumentation logic, evidential reasoning, information theory, and crowd-sourced sensing, and validate the resulting mechanisms through experimentation with research-grade software prototypes.

Task 1 examines in depth the sharing of quantitative and semantic information from the perspective of both information producers and consumers. It approaches its objective of information fusion and inference management under uncertainties through systematic control of risk, value, and detectability from the perspective of information producers, using tools from information theory, subjective logic, and ontology-based reasoning. Task 2 complements the first task by focusing on the decision maker at the edge of the network. To address the challenges of uncertainty and information overload when making decisions using data from disparate sources in a dynamic context, it brings to bear an argumentation-based decision support system, frameworks for representing and aggregating over uncertain data, and the efficient, peer-to-peer evaluation of queries. Finally, Task 3 complements both Tasks 1 and 2 by focusing on the role of collaboration during sensemaking, and brings to bear argumentation schemes and automated reasoning to help structure evidence during sensemaking; crowd-sourcing to harness diverse expertise; and collective intelligence spaces across a coalition that take into account resource constraints.

114 S. Pipes, S. Chakraborty, “Multitiered Inference Management Architecture for Participatory Sensing,” Crowdsensing (Workshop in conjunction with IEEE PerCom), 2014. https://www.usukitacs.com/node/2586
115 Y. Tang, K. Sycara, F. Cerutti, J. Z. Pan, A. Fokoue. Demo. ITA Annual Fall Meeting, 2013.
116 A. Toniolo, T. Dropps, R. W. Ouyang, J. A. Allen, D. P. Johnson, T. J. Norman, N. Oren and M. Srivastava, “Argumentation-based collaborative intelligence analysis in CISpaces”. In Proc. of the 5th Intl. Conf. on Argument and Computation (Demo track), Pitlochry, Scotland, Sept 2014. https://www.usukitacs.com/node/2703


Figure 1: Overview of Project 6 Research Agenda

Figure 1 shows the relationships among the three tasks. Given the crosscutting themes underlying the tasks, much of the research and many of the resulting papers span multiple tasks; for clarity of exposition, each paper is classified under the task where it is the best fit.

3.4.3.1 Task 1: Information Fusion and Inference Management under Uncertainties

Papers [1], [2], [3]: “Protecting Data Against Unwanted Inferences”117, “Obfuscation of Semantic Data: Restricting the Spread of Sensitive Information”118, and “Multitiered Inference Management Architecture for Participatory Sensing”119

These three papers are representative of the comprehensive research undertaken by Aberdeen, IBM, and UCLA in Task 6.1 investigating the challenging problem of sharing information from the perspective of the risk-value trade-off faced by the producer of information. During BPP11 we introduced120 the concept of information producers exercising control over the inferences that recipients can make, by specifying allowable and sensitive inferences as whitelists and blacklists of inferences respectively. While introducing a systematic risk-vs.-utility space in which to reason about information sharing from a producer's perspective, our early research was restricted in the type of information it dealt with, ad hoc in the mechanisms it employed (termed “obfuscation” or “masking”) to manage the risk-vs.-utility trade-off, and lacking in appropriate quantifiable metrics. During the first year of BPP13, we have made significant advances in key aspects of the inference management problem that collectively address these limitations.

117 S. Chakraborty, N. Bitouze, M. Srivastava, L. Dolecek, “Protecting Data Against Unwanted Inferences,” IEEE Information Theory Workshop, 2013. (https://www.usukitacs.com/node/2364) 118 F. Cerutti, G. R. de Mel, T. J. Norman, N. Oren, A. Parvizi, P. Sullivan and A. Toniolo, "Obfuscation of Semantic Data: Restricting the Spread of Sensitive Information," Submitted to 27th International Workshop on Description Logics, 2014. (https://www.usukitacs.com/node/2670) 119 S. Pipes, S. Chakraborty, “Multitiered Inference Management Architecture for Participatory Sensing,” Crowdsensing (Workshop in conjunction with IEEE PerCom), 2014. (https://www.usukitacs.com/node/2586) 120 S. Chakraborty, K. R. Raghavan, M. Srivastava, C. Bisdikian, L. Kaplan, “Balancing Value and Risk In Information Sharing Through Obfuscation,” IEEE 15th Intl. Conference on Information Fusion (FUSION), 2012.

82

First, in Paper [1] (a collaboration with a US Intelligence Advanced Research Projects Activity project at UCLA) we develop an information-theoretic framework to study the competing goals of utility and risk of information leakage as they arise when a provider shares information with a recipient as part of a coalition operation or to obtain a service. We formally define utility and risk parameters using information-theoretic notions, derive a bound on the region spanned by these parameters, provide constructive schemes for achieving certain boundary points of this region, and improve the region by sharing data over aggregated time slots. Our work motivates multiple facets of the problem for further study, e.g., the effect of side-channel information and of correlation between the shared data samples over time. Tools from coding theory may also prove valuable in designing schemes that achieve better trade-off points, especially in the case of multiple time slots. In recent work we have expanded the trade-off space beyond utility and risk to also consider detectability (or, equivalently, deceptiveness), i.e., the probability that the information recipient detects the obfuscation, thereby lowering its trust in the information provider and affecting future transactions. We have also developed iDeceit121, a framework that systematically substitutes sensitive data segments with synthetic data while ensuring the statistical plausibility of the entire information stream. Formally, the problem is formulated as one of finding an alternative trajectory through a state-space model over time that is statistically plausible while avoiding the states designated as sensitive. Besides developing the theoretical machinery, we have also created a proof-of-concept realization for real-time sharing of location traces from Android smartphones. Second, in Paper [2], we extend the scope of our work from quantitative information, such as time series of sensory measurements, to qualitative information, where concepts are linked by an ontology that provides a semantic description of the domain. The paper introduces a Semantic Obfuscation Framework, which provides computationally effective machinery for deriving a measure of “closeness” between two semantic concepts that can then be exploited for (i) deciding what information to share and with whom, (ii) systematically obfuscating concepts, and (iii) reasoning about the uncertainty, expressed using the Subjective Logic formalism, with which the recipient of obfuscated information can make various inferences. The framework also enables the representation of contextual knowledge when reasoning about the obfuscation process, and has utility beyond the military domain, which has been the primary target case study, in domains such as social networks. Lastly, in Paper [3] we explore how inference management might be systematically realized both at the core of a networked information fabric and in at-the-edge devices. Specifically, the paper describes a multitiered architecture for realizing an inference management firewall (IMF) that employs context-aware information masking techniques for systematic management of the risk-vs.-value trade-off of sensor data.
This architecture combines two complementary assets into a single, integrated solution: (i) an implementation of an approach that treats potential inferences from shared data as primitives with which to reason about consumer knowledge (ipShield122); and (ii) an information-sharing middleware that enables efficient, robust and policy-managed exchange of information between a diverse range of network devices (the ITA Information Fabric). This architecture will enable a network-wide view of policy management and finer control over the risk-utility trade-off than is presently possible. The paper identifies key requirements, including: (i) binding of policy to information flow to ensure consistent application throughout the network; (ii) resilience to in-network threats to information confidentiality, integrity and availability; and (iii) an expressive policy language for the enforcement of information-sharing rules both within the network and at network endpoints.
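As a concrete illustration of requirement (i), the sketch below shows one possible shape of a policy object bound to a single information flow, combining the whitelist/blacklist treatment of inferences introduced under BPP11 with a default-deny masking decision. The names, flow identifier and rule structure are hypothetical, and this is not the ipShield or Information Fabric interface.

    # Hypothetical inference-management policy bound to one information flow.
    POLICY = {
        "flow": "accelerometer->coalition_partner_B",  # flow the policy is bound to
        "whitelist": {"activity_level"},               # inferences the producer allows
        "blacklist": {"keystrokes", "location"},       # sensitive inferences to block
    }

    def decide(requested_inference, policy):
        # Decide how the firewall treats a consumer's requested inference.
        if requested_inference in policy["blacklist"]:
            return "mask"    # obfuscate data that enables this inference
        if requested_inference in policy["whitelist"]:
            return "share"   # explicitly allowed by the producer
        return "deny"        # default-deny anything unclassified

    for inference in ("activity_level", "keystrokes", "gait"):
        print(inference, "->", decide(inference, POLICY))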

121 S. Chakraborty, “Balancing Behavioral Privacy and Information Utility in Sensory Data Flows,” Ph.D. Thesis, UCLA Electrical Engineering, 2014. (https://www.usukitacs.com/node/2643) 122 S. Chakraborty, C. Shen, K. R. Raghavan, M. Millar, M. Srivastava, “ipShield: A Framework For Enforcing Context-Aware Privacy,” USENIX NSDI, 2014. (https://www.usukitacs.com/node/25870)

83

Paper [4] (joint with Task 6.2) “Reasoning About the Impacts of Information Sharing”123
Whereas Papers [1-3] focus on the mechanisms that help information producers enforce desired utility-risk trade-offs during information sharing, Paper [4] addresses a complementary aspect of the risk-value trade-off faced by information producers, namely the risk-value trade-off presented when specific information is shared. Typically, information is shared between entities in order to achieve joint goals, or to allow allies to achieve their goals. However, information can be intercepted by malicious entities, causing harm to the information provider and the coalition. This paper identifies what information should be shared to maximise utility. It considers information sharing as taking place over a communications network, with edges representing communications links between nodes representing agents. These nodes have a likelihood of forwarding a message to others (depending on the exact nature of the message), and the receipt of a message by an agent has a positive or negative impact on utility. After formalising the problem, the paper describes a decision procedure allowing an information transmitter to determine precisely which messages are best for it to send. Our model and decision procedure capture interesting features, including the ability to provide misinformation, and to send information along multiple paths in order to transmit it in the best possible manner.
Paper [5] “Subjective Logic Operators in Trust Assessment: An Empirical Study”124
Paper [5] looks at the information sharing problem from the perspective of the recipient, and specifically at the issue of trust. A trust and reputation system seeks to associate a level of trust with information based on the level of trust placed in the information providers. To do so, trust in the information providers must be fused into a single rating, with which the information's reliability can be discounted. Subjective Logic (SL) provides a popular framework in which to compute trust, as it considers not only levels of belief and disbelief in information, but also the uncertainty associated with this information. These values are combined into an opinion, and Subjective Logic provides various operators over such opinions. This paper identifies certain desiderata that Subjective Logic operators should satisfy in a trust system, and demonstrates that existing operators fail to do so. It then describes a family of operators, based on a geometric interpretation of Subjective Logic, which do meet the desiderata. The paper evaluates these new operators, and identifies situations in which they should be used instead of the existing operators. Another of our papers125 also explored alternatives to the classical SL deduction operator. Given antecedent and consequent propositions, the new operators form opinions of the consequent that match the variance of the consequent posterior distribution given opinions on the antecedent and the conditional rules connecting the antecedent with the consequent. As a result, the uncertainty of the consequent maps to the spread of the probability projection of the opinion. Monte Carlo simulations demonstrated this

123 Y. Tang, F. Cerutti, N. Oren, C. Bisdikian, "Reasoning about the Impacts of Information Sharing," Accepted in Information Systems Frontiers journal, Springer, 2014. (https://www.usukitacs.com/node/2700). Conference version: C. Bisdikian, Y. Tang, F. Cerutti, N. Oren, "A Framework for Using Trust to Assess Risk in Information Sharing," Proceedings of the Second International Conference of Argument Technologies (AT 2013), pp 135-149, Beijing, China, August 1-2, 2013. (https://www.usukitacs.com/node/2335). 124 F. Cerutti, L. M. Kaplan, T. J. Norman, N. Oren and A. Toniolo, "Subjective Logic Operators in Trust Assessment: an Empirical Study," Accepted for Information Systems Frontiers Journal, Springer, 2014. (https://www.usukitacs.com/node/2559). Conference version: F. Cerutti, A. Toniolo, N. Oren, and T. J. Norman, “An Empirical Evaluation of Geometric Subjective Logic Operators,” Proceedings of the Second International Conference of Argument Technologies (AT 2013), pp 90-104, Beijing, China, August 1-2, 2013 (https://www.usukitacs.com/node/2334). 125 L. Kaplan, M. Sensoy, Y. Tang, S. Chakraborty, C. Bisdikian, G. de Mel, “Reasoning Under Uncertainty: Variations of Subjective Logic Deduction,” IEEE 16th International Conference on Information Fusion (FUSION), 2013. (https://www.usukitacs.com/node/2345)

84

connection for the new operators and evaluated the quality of fusing opinions from multiple agents before and after deduction.
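For readers unfamiliar with the formalism, the following is a minimal sketch of standard Subjective Logic: an opinion (b, d, u, a) with b + d + u = 1, its probability projection, and the classical trust-discounting operator. The geometric operators proposed in Paper [5] are deliberately not reproduced here, and the numerical values are illustrative.

    from dataclasses import dataclass

    @dataclass
    class Opinion:
        b: float  # belief
        d: float  # disbelief
        u: float  # uncertainty
        a: float  # base rate (prior probability)

        def projection(self):
            # Probability projection E = b + a * u.
            return self.b + self.a * self.u

    def discount(trust, op):
        # Classical SL discounting: A's opinion about x, given A's trust
        # in source B ("trust") and B's reported opinion about x ("op").
        return Opinion(b=trust.b * op.b,
                       d=trust.b * op.d,
                       u=trust.d + trust.u + trust.b * op.u,
                       a=op.a)

    trust_in_B = Opinion(b=0.6, d=0.1, u=0.3, a=0.5)  # moderate trust in B
    report = Opinion(b=0.8, d=0.1, u=0.1, a=0.5)      # B's confident report
    print(discount(trust_in_B, report).projection())

Note how discounting moves mass from belief and disbelief into uncertainty as trust in the source weakens; the desiderata studied in Paper [5] concern precisely how such operators should behave across the opinion space.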

3.4.3.2 Task 2: In-time Assembly and Presentation of Relevant Information in Support of Decision Making
Paper [6] “Reasoning about uncertain information and conflict resolution through trust revision”126
Uncertainty is a core feature of many domains in which agents are expected to operate. In such environments, information sources such as sensors provide agents with state information. However, in the context of a military coalition environment, different sets of sensors can be controlled by different coalition partners, each with their own capabilities. In order to reason about the state of the environment, an agent must therefore request potentially noisy, incomplete, or misleading information from potentially untrustworthy agents. By obtaining information from multiple sources and performing information fusion, the agent can build a more accurate representation of its environment than by utilizing its own sensors alone. Various researchers have examined aspects of this problem: work in computational trust (e.g., Jøsang et al.127) is intended to allow an agent to determine which other agents should be asked for information, while work on information fusion (e.g., Rogers et al.128) examines how incomplete and noisy information from different sources, i.e., sensors, should be combined in order to obtain a true picture of the environment. However, neither line of work in isolation considers both sources of uncertainty in the domain, namely the trustworthiness of information sources as well as the incompleteness and vagueness of the provided information. We have made three key technical contributions to the problem of reasoning about uncertain information and conflict resolution, summarized below (a small sketch of the underlying Dempster-Shafer evidence combination follows the list): • First, we have combined Description Logics (DLs) and Dempster-Shafer theory (DST) to create a computationally tractable framework for reasoning about uncertain information obtained from different agents. This framework enables us to detect conflicts in uncertain information due to constraints imposed by the domain. It also enables us to resolve such conflicts. • Second, we have shown how trust in uncertain information can be revised when additional information is received. We model this problem as an optimization problem, and propose heuristics that allow us to identify high-quality solutions. • Finally, through simulations, we show that our approach can successfully handle highly misleading information in challenging settings. The simulations also show that the approach is robust in the face of strategic liars.
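For background, the following minimal sketch shows classical Dempster's rule of combination over a small frame of discernment, the basic evidence-fusion step that the framework builds on; the DL integration, conflict detection and trust revision of Paper [6] are not reproduced, and the mass assignments are invented for illustration.

    from itertools import product

    def combine(m1, m2):
        # Dempster's rule for two mass functions keyed by frozenset focal elements.
        combined, conflict = {}, 0.0
        for (a, x), (b, y) in product(m1.items(), m2.items()):
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + x * y
            else:
                conflict += x * y  # mass assigned to contradictory pairs
        if conflict >= 1.0:
            raise ValueError("total conflict: sources are irreconcilable")
        return {k: v / (1.0 - conflict) for k, v in combined.items()}

    # Two sensors report on whether a contact is Hostile (H) or Friendly (F).
    H, F, HF = frozenset("H"), frozenset("F"), frozenset("HF")
    m1 = {H: 0.7, HF: 0.3}   # sensor 1: fairly sure the contact is hostile
    m2 = {F: 0.4, HF: 0.6}   # sensor 2: weak evidence of friendly
    print(combine(m1, m2))   # fused masses after normalizing out conflict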

126 M. Sensoy, A. Fokoue, J. Z. Pan, T. J. Norman, Y. Tang, N. Oren, and K. Sycara, “Reasoning about uncertain information and conflict resolution through trust revision,” Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'13), 2013. (https://www.usukitacs.com/node/2221) 127 A. Jøsang and R. Ismail, “The beta reputation system,” in Proc. of the 15th Bled Electronic Commerce Conference e-Reality: Constructing the e-Economy, pages 48-64, 2002. 128 A. Rogers, R. K. Dash, and N. R. Jennings, “Computational mechanism design for information fusion within sensor networks,” in Proceedings of The 9th International Conference on Information Fusion, 2006.

85

We are currently extending our framework by developing the theoretical foundations required to implement a Subjective Ontology-Based Data Access (SOBDA) system129. Querying through an SOBDA system allows agents to obtain answers to queries issued in the vocabulary of a common schema, rather than in the vocabulary of every specific data source. This allows for a seamless integration of multiple sources of uncertain data.
Paper [7] (joint with Task 6.1 and NS-CTA) “Dempster-Shafer Argument Schemes”130
This paper introduces a class of argument schemes for Dempster-Shafer rules for reasoning about uncertain information, using critical questions to capture the assumptions behind the rules and tracking symbolic Dempster-Shafer evidence131. A rich body of work concerns itself with reasoning with uncertainty. Less attention has been given to the dual of this approach, namely reasoning about uncertainty. More specifically, while most existing frameworks determine how uncertain some conclusion is, given some uncertain premises and rules for processing this uncertainty, little work addresses which rules should be applied in which context in order to obtain correct conclusions in the presence of uncertain information. The ability to argue both with, and about, uncertainty leads to a complete, extensible system for uncertain reasoning. Here, the proposed Dempster-Shafer Argument Schemes provide foundations for a system of argumentation that combines logical reasoning and the Dempster-Shafer theory. The system allows arguments to be constructed from facts expressed in a predicate logic, each of which has an associated numerical measure expressed using the Dempster-Shafer theory. Rules capable of dealing with uncertainty are applied to uncertain facts, resulting in new facts with an associated level of uncertainty. These facts in turn can be used to determine which rules should be applied, and with what confidence, and the process repeats over these additional rules and inferences. Such a system identifies likelihoods of conclusions due both to uncertainty regarding facts in the world and to differences (again due to uncertainty) regarding which reasoning rules should be applied.
Paper [8] (joint with Task 6.1 and Task 6.3) “Argumentation Random Field”132
This paper proposes a new approach to integrating probabilities and argumentation, termed Markov Argumentation Random Fields (MARFs), which establishes a probabilistic measure over the compatibility of the formulated hypotheses, along with their reasons, by taking into account all the available knowledge and data. It builds on the connection between conditional independence and the acceptability status of arguments. MARFs seamlessly integrate symbolic argumentation with probabilistic graphical models for formulating hypotheses from inconsistent, uncertain and incomplete knowledge and facts. MARFs compile knowledge, in the form of inference rules for deriving new conclusions and defeat rules for rejecting conclusions and their reasons, into a graphical model based on Markov Random Fields (MRFs). The resulting MRFs can then compute the potentials and probabilities of the formulated hypotheses and their reasons, as well as decisions on accepting or rejecting these hypotheses and their reasons. The established potentials are essentially measures of the compatibility of the formulated hypotheses along with their

129 J. Garcia, A. Fokoue, J. Z. Pan, T. J. Norman, Y. Tang, N. Oren, and K. Sycara, “Ontology-Based Data Access over Subjective Ontologies,” submitted to ISWC 2014 (Doctoral Consortium Track). (https://www.usukitacs.com/node/2696) 130 Y. Tang, N. Oren, S. Parsons and K. Sycara, "Dempster-Shafer Argument Schemes," Tenth International Workshop on Argumentation in Multi-Agent Systems, May 2013. (https://www.usukita.org/node/2343) 131 Y. Tang, C-W Hang, S. Parsons and M. P. Singh, "Dempster-Shafer Argument Schemes," Fourth International Conference on Computational Models of Argument, September 2012. (https://www.usukita.org/node/2224) 132 Y. Tang, A. Toniolo, K. Sycara, and N. Oren, "Argumentation Random Field," Eleventh International Workshop on Argumentation in Multi-Agent Systems, May 2014. (https://www.usukita.org/node/2635). Conference version: Y. Tang, N. Oren, A. Toniolo, and K. Sycara, "Markov Argumentation Random Field," submitted to COMMA 2014. (https://www.usukitacs.com/node/2686)

86

reasons by taking into account all the available knowledge and data. The probabilities are then computed by normalizing the computed potentials over all possible assignments of hypotheses, reasons and their acceptability status (accepted, rejected, or undecided). Furthermore, the resulting model can be lifted to enable learning knowledge from data as well as evaluating the sensitivity of conclusions to changing environments. While the practice of linchpin analysis (formulating and assessing competing hypotheses and their chains of reasoning) in intelligence analysis is based on a list of human-actionable guidance without a formal model, MARFs establish a rigorous mathematical foundation for a computational model enabling effective decision support for linchpin analysis.
Paper [9] “An Offline Optimal SPARQL Query Planning Approach to Evaluate Online Heuristic Planners”133
An empirical evaluation of a system for decentralized ontological reasoning over GaianDB identified important performance bottlenecks for typical reasoning query workloads134. To address them, we have extended the heuristic-based planner of DB2RDF135 to account for network cost, and we have also developed a principled approach to evaluate and improve heuristic-based query planners. Obtaining good performance for declarative query languages in a centralized or distributed setting requires an optimized total system, with an efficient data layout, good data statistics, and careful query optimization. One key piece of such a system is a query planner that translates a declarative query into a concrete execution plan with minimal cost. This problem has been extensively studied, in particular in the relational database literature. The traditional solution builds a cost model that, based on data statistics, is able to estimate the cost of a given query execution plan. However, since the number of execution plans can be extremely large, only a small subset of all valid plans is constructed (using heuristics and/or greedy approaches that consider plans likely to have a low cost). The costs of those selected candidate plans are then estimated using the cost model, and the cheapest plan is selected for execution. The chosen plan is a local optimum and is not guaranteed to be globally optimal. Even with sub-optimal plans, the performance of an optimizer is still considered satisfactory if it performs better (in terms of evaluation times) than other competing optimizers. Yet, there is an alternative metric for how well an optimizer performs: how far its locally optimal plans are from the globally optimal plans. However, finding a global optimum is challenging, and this is one of the reasons why heuristic planners were devised in the first place. To the best of our knowledge, there is no practical mechanism for assessing how good these planners are, i.e., whether they produce optimal plans given the data layout and statistics available. Our main contribution is an efficient offline technique for finding optimal plans for the graph query planning problem (in a centralized or decentralized environment). The problem is NP-hard, as shown by prior work. Our approach finds optimal plans by casting the problem as an integer linear programming (ILP) problem. Although ILP is also NP-hard in the worst case, in practice highly optimized solvers exist that efficiently solve our formulation of the query planning problem (see Section 4 of the paper).
Furthermore, we show that our ILP formulation can be used to evaluate the effectiveness of any greedy/heuristic planning solution. In fact, the two approaches can potentially be combined, with the ILP

133 A. Fokoue, M. Bornea, J. Dolby, A. Kementsietsidis, and K. Srinivas, “An Offline Optimal SPARQL Query Planning Approach to Evaluate Online Heuristic Planner,” to appear at the 15th International Conference on Web Information System Engineering (WISE 2014). (https://www.usukitacs.com/node/2685) 134 P. Ravindra, A. Fokoue, P. Dantressangle, J. Z. Pan, P. D. Stone, K. Sycara, Y. Tang, “Peer-to-Peer Ontological Reasoning in Military Coalition Data Networks,” ITA Fall Meeting 2013. (https://www.usukitacs.com/node/2501) 135 M. A. Bornea, J. Dolby, A. Kementsietsidis, K. Srinivas, P. Dantressangle, O. Udrea, B. Bhattacharjee, “Building an efficient RDF store over a relational database,” SIGMOD Conference 2013: 121-132. http://dl.acm.org/citation.cfm?doid=2463676.2463718

87

formulation being used to precompile specific queries that may occur frequently within a workload, or to test the heuristic solution and determine how far it is from optimal.
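The evaluation idea can be illustrated in miniature: exhaustively search a tiny plan space for the optimum and measure the gap to a greedy planner. The triple patterns, cardinalities and join-size cost model below are toy assumptions, not the paper's ILP formulation or DB2RDF's statistics.

    from itertools import permutations

    # Hypothetical triple patterns with estimated result cardinalities.
    card = {"?x type Soldier": 500, "?x locatedIn ?r": 2000, "?r partOf AreaB": 10}

    def plan_cost(order):
        # Cost = total size of intermediate results; each join is assumed
        # to keep 0.1% of the cross product (a toy selectivity).
        size, cost = card[order[0]], 0.0
        for pattern in order[1:]:
            size = 0.001 * size * card[pattern]
            cost += size
        return cost

    optimal = min(permutations(card), key=plan_cost)  # offline exhaustive search

    # Greedy heuristic: start from the smallest pattern, then repeatedly
    # append whichever remaining pattern yields the cheapest partial plan.
    greedy = [min(card, key=card.get)]
    while len(greedy) < len(card):
        rest = [p for p in card if p not in greedy]
        greedy.append(min(rest, key=lambda p: plan_cost(greedy + [p])))

    print("optimal:", plan_cost(optimal), "greedy:", plan_cost(tuple(greedy)))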

3.4.3.3 Task 3: Collaborative Intelligence Analysis
Paper [10] “Making informed decisions with provenance and argumentation schemes”136
A key technical focus of this research is to develop means through which collaborative analysis can be informed by the provenance of both the evidence and the analysis process. In this paper we presented the first principled method of linking provenance data with the analysis of competing hypotheses through argumentation schemes. We employ the W3C standard PROV Data Model (PROV-DM) to represent provenance information137. The PROV-DM enables the representation of entities (e.g., SPOT reports, results of crowd-sourced queries), activities (e.g., generation of a SPOT report, issuing of a query to a targeted population) and actors (e.g., Commander Special Operations Team A, all coalition patrols operating within area B in the last 24 hours), along with information such as when reports were created. The model we have developed enables analysts to evaluate hypotheses while exploring and assessing the provenance of supporting evidence through: • An argumentation scheme that explicitly introduces provenance data into the reasoning process. This scheme refines the argumentation scheme from expert opinion, which is widely used in legal domains. • An argumentation scheme for defining provenance-based preferences. This scheme provides a principled means to express preferences over elements of a provenance record, enabling an analyst to explore and declare orderings over evidence on the basis of timeliness, source trustworthiness, the reliability of the activities generating the evidence, and distance from primary sources. We use stereotypical patterns in provenance graphs suitable for these kinds of reasoning that capture primary sources, the generation of new information entities (such as through analysis by an expert), and the goal(s) of analyses (intelligence requirements). These knowledge representation and reasoning mechanisms have been implemented in the Evidential Reasoning Service (ERS), which is integrated with the Collaborative Intelligence Spaces (CISpaces) sense-making infrastructure.
Paper [11] “CISpaces, towards a system for online collaborative argument construction and debate”138
The contribution represented by this paper is a framework that provides virtual spaces for assisting individual and collaborative sense-making. We discuss how our approach, combining argumentation schemes, evidential and provenance reasoning, and hypothesis-driven crowd-sourcing, offers the means to support on-line debate in a wide range of application contexts. In addition to the support of collaborative intelligence analysis in coalitions, we argue that this approach is directly applicable to domains including e-participation in online policymaking, and exploiting shared intelligence to address business or scientific innovation challenges. A key distinguishing factor of our approach, and one that makes it uniquely applicable to such a variety of domains, is in how we integrate structured, collaborative analysis and

136 A. Toniolo, F. Cerutti, N. Oren, T. J. Norman and K. Sycara, “Making informed decisions with provenance and argumentation schemes,” in Proceedings of the 11th International Workshop on Argumentation in Multi-Agent Systems, Paris, France, May 2014. (https://www.usukitacs.com/node/2629) 137 http://www.w3.org/TR/prov-dm/ 138 A. Toniolo, T. J. Norman, N. Oren, R. W. Ouyang, M. Srivastava, T. Dropps, J. A. Allen, D. P. Johnson and G. De Mel, “CISpaces, towards a system for online collaborative argument construction and debate,” in Proceedings of Arguing on the Web 2.0, Amsterdam, The Netherlands, June 2014. (https://www.usukitacs.com/node/2687)

88

crowd sourcing. Further, the use of crowd-sourced opinions in the sense-making process brings important challenges in the detection, mitigation and modelling of bias. To integrate crowd-sourced information into the structured analysis of competing hypotheses, we exploit the argument scheme from a generally accepted opinion. This scheme is used to represent the defeasible inference that some statement A can plausibly be taken to be true if a significant majority in a group accepts that A is true. As with other argument schemes (e.g., expert opinion, cause to effect, and the provenance-based schemes introduced in Paper [10]), this inference may be undermined through critical questions that, for example, focus on potential biases in the opinions gathered. The explicit modelling of biases through the hypothesis-driven crowd-sourcing service provides a rigorous underpinning for this reasoning. We use a combination of signature analysis, comparison of overlapping data from different members of the crowd, and test tasks with known correct responses to detect bias and probabilistically mitigate it. This combination of techniques enables evidence from the crowd to be considered by analysts in concert with other forms of evidence in a principled manner (see Figure 2).
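A toy encoding of the idea behind the provenance-based preference scheme of Paper [10] is sketched below. The record fields informally mirror PROV-DM's entity/activity/agent vocabulary, and the node contents are invented; this is not the Evidential Reasoning Service implementation.

    from datetime import datetime

    # Two conflicting evidence nodes with PROV-style attribution.
    node1 = {"id": "Node1", "generated_by": "Compile",
             "time": datetime(2014, 5, 2, 20, 15), "author": "Analyst2"}
    node2 = {"id": "Node2", "generated_by": "Compile",
             "time": datetime(2014, 5, 2, 18, 40), "author": "Analyst1"}

    def prefer_on_timeliness(a, b):
        # Provenance-based preference: when two nodes conflict, prefer
        # the evidence generated more recently.
        return a if a["time"] >= b["time"] else b

    print(prefer_on_timeliness(node1, node2)["id"], "preferred on timeliness")

In the full scheme, timeliness is only one of several orderings (source trustworthiness, reliability of the generating activity, distance from primary sources) that an analyst can declare over a provenance record.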

[Figure 2 comprises three linked panes: a debate view shared with other users, provenance records for individual argument nodes (generating action, timestamp, author), and crowd-sourced probability estimates for contested claims, with a timeliness-based provenance preference used to resolve a conflict between two nodes. The worked example concerns whether heroin consumption increased after its legalisation in Platzspitz Park, Zurich.]

Figure 2: An example of integrating crowd-sourced and other evidence in on-line debate through CISpaces
Paper [12] “Truth Discovery in Crowdsourced Detection of Spatial Events”139
The ubiquity of smartphones has led to the emergence of mobile crowdsourcing tasks such as the detection of spatial events as smartphone users move around in their daily lives. However, the credibility of the detected events can be negatively impacted by unreliable participants providing low-quality data. Consequently, a major challenge for quality control is to discover the true events from diverse and noisy participants' reports. This truth discovery problem is distinct from its online counterpart in that it involves uncertainties in both participants' mobility and their reliability. Unfortunately, these two types of uncertainty cannot be easily decoupled unless continuous location tracking, which raises privacy and

139 R. W. Ouyang, M. Srivastava, A. Toniolo, and T. J. Norman, "Truth Discovery in Crowdsourced Detection of Spatial Events," ITA Technical Report, 2014. (https://www.usukitacs.com/node/2637)

89

energy issues, is performed. In Paper [12], we propose a new method to tackle this truth discovery problem through principled probabilistic modeling. In particular, we integrate the modeling of participants' location-visit indicators, the truth of events, location popularity and three-way participant reliability in a unified framework. The proposed model is thus capable of efficiently handling these various types of uncertainty and automatically discovering true events without any supervision or the need for continuous location tracking. Experimental results on both real-world and synthetic datasets demonstrate that our proposed method outperforms existing state-of-the-art truth discovery approaches in the mobile crowdsourcing environment.
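A much-simplified version of the iteration at the heart of such unsupervised truth discovery is sketched below: it alternates between reliability-weighted voting on events and re-estimating each participant's reliability from agreement with the current truth estimate. It omits the location-visit, location-popularity and three-way reliability modelling of the actual paper, and the reports are invented.

    reports = {  # reports[(participant, event)] = 1 (occurred) or 0 (did not)
        ("p1", "e1"): 1, ("p1", "e2"): 1,
        ("p2", "e1"): 1, ("p2", "e2"): 0,
        ("p3", "e1"): 0, ("p3", "e2"): 1,
    }
    events = {e for _, e in reports}
    participants = {p for p, _ in reports}
    reliability = {p: 0.8 for p in participants}  # optimistic prior

    for _ in range(10):
        # Step 1: reliability-weighted vote on whether each event is true.
        truth = {}
        for e in events:
            votes = [(reliability[p], v) for (p, ev), v in reports.items() if ev == e]
            score = sum(r for r, v in votes if v) / max(sum(r for r, _ in votes), 1e-9)
            truth[e] = 1 if score >= 0.5 else 0
        # Step 2: reliability = fraction of a participant's reports agreeing.
        for p in participants:
            mine = [(ev, v) for (pp, ev), v in reports.items() if pp == p]
            reliability[p] = sum(v == truth[ev] for ev, v in mine) / len(mine)

    print(truth, reliability)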

3.4.4 References for Project 6
[1] Supriyo Chakraborty, Nicolas Bitouze, Mani Srivastava, Lara Dolecek, “Protecting Data Against Unwanted Inferences,” IEEE Information Theory Workshop, 2013. (https://www.usukitacs.com/node/2364)

[2] Federico Cerutti, Geeth R. de Mel, Timothy J. Norman, Nir Oren, Artemis Parvizi, Paul Sullivan and Alice Toniolo, "Obfuscation of Semantic Data: Restricting the Spread of Sensitive Information," Submitted to the 27th International Workshop on Description Logics, 2014. (https://www.usukitacs.com/node/2670)

[3] Stephen Pipes, Supriyo Chakraborty, “Multitiered Inference Management Architecture for Participatory Sensing,” Crowdsensing (Workshop in conjunction with IEEE PerCom), 2014. (https://www.usukitacs.com/node/2586)

[4] Yuqing Tang, Federico Cerutti, Nir Oren, Chatschik Bisdikian, "Reasoning about the Impacts of Information Sharing," Accepted in Information Systems Frontiers journal, Springer, 2014. (https://www.usukitacs.com/node/2700). Conference version: Chatschik Bisdikian, Yuqing Tang, Federico Cerutti, Nir Oren, "A Framework for Using Trust to Assess Risk in Information Sharing," Proceedings of the Second International Conference of Argument Technologies (AT 2013), pp 135-149, Beijing, China, August 1-2, 2013. (https://www.usukitacs.com/node/2335).

[5] Federico Cerutti, Lance M. Kaplan, Timothy J. Norman, Nir Oren and Alice Toniolo, "Subjective Logic Operators in Trust Assessment: an Empirical Study," Accepted for Information Systems Frontiers Journal, Springer, 2014. (https://www.usukitacs.com/node/2559). Conference version: Federico Cerutti, Alice Toniolo, Nir Oren, and Timothy J. Norman, “An Empirical Evaluation of Geometric Subjective Logic Operators,” Proceedings of the Second International Conference of Argument Technologies (AT 2013), pp 90-104, Beijing, China, August 1-2, 2013. (https://www.usukitacs.com/node/2334)

[6] Murat Sensoy, Achille Fokoue, Jeff Z. Pan, Timothy J. Norman, Yuqing Tang, Nir Oren, and Katia Sycara, “Reasoning about uncertain information and conflict resolution through trust revision,” Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'13), 2013. (https://www.usukitacs.com/node/2221)

[7] Yuqing Tang, Nir Oren, Simon Parsons and Katia Sycara, "Dempster-Shafer Argument Schemes," Tenth International Workshop on Argumentation in Multi-Agent Systems, May 2013. (https://www.usukita.org/node/2343)

[8] Yuqing Tang, Alice Toniolo, Katia Sycara, and Nir Oren, "Argumentation Random Field," Eleventh

90

International Workshop on Argumentation in Multi-Agent Systems, May 2014. (https://www.usukita.org/node/2635)

[9] Achille Fokoue, Mihaela Bornea, Julian Dolby, Anastasios Kementsietsidis, and Kavitha Srinivas, “An Offline Optimal SPARQL Query Planning Approach to Evaluate Online Heuristic Planner,” to appear at the 15th International Conference on Web Information System Engineering (WISE 2014). (https://www.usukitacs.com/node/2685)

[10] A. Toniolo, F. Cerutti, N. Oren, T. J. Norman and K. Sycara, “Making informed decisions with provenance and argumentation schemes”. In Proceedings of the 11th International Workshop on Argumentation in Multi-Agent Systems, Paris, France, May 2014. https://www.usukitacs.com/node/2629

[11] A. Toniolo, T. J. Norman, N. Oren, R. W. Ouyang, M. Srivastava, T. Dropps, J. A. Allen, D. P. Johnson and G. De Mel, “CISpaces, towards a system for online collaborative argument construction and debate”. In Proceedings of Arguing on the Web 2.0, Amsterdam, The Netherlands, June 2014. (https://www.usukitacs.com/node/2687)

[12] Robin Wentao Ouyang, Mani Srivastava, Alice Toniolo, and Timothy J. Norman, "Truth Discovery in Crowdsourced Detection of Spatial Events," ITA Technical Report, 2014. (https://www.usukitacs.com/node/2637)

91

4. Experimentation Framework

4.1 Overview

Research in network sciences can greatly benefit from experimentation in an emulation environment where most, if not all, of the parameters can be controlled and experiments can easily be reproduced, enabling the comparison of results from multiple executions of the same experiment. Experimentation in an emulation environment is a valuable tool for the validation, verification and comparison of theoretical concepts, algorithms, technologies and methodologies. It facilitates the testing of technologies and of deployment scaling under more realistic constraints, thus exposing gaps in the theory that may need special attention and identifying additional areas on which the research might focus and develop.

Currently, researchers can gain access to only a small number of purpose-built facilities, each offering a specific combination of tools for experimentation and subject to restrictions and limitations of various types. To overcome some of these problems, research conducted under the ITA defined a common and extensible framework, the ITA Experimentation Framework, for rapid prototyping and experimentation. That research has been extended through a collaboration between ARL and IBM UK to define the concept of Experimentation-as-a-Service (EaaS), which enables the provisioning of reconfigurable, ad hoc and on-demand experimentation environments, and to design and implement a framework that accomplishes these goals.

4.2 Experimentation as a Service – The Reference Framework for Experimentation

The EaaS framework comprises a layered architecture and a software stack that enable the provisioning of reconfigurable, ad hoc and on-demand experimentation environments. It is based on standard hardware and open-standards software, and also allows for the integration of externally connected commercial off-the-shelf (COTS) resources and assets such as sensors, mobile devices, radio bearers, databases and applications, all interconnected via real and emulated networks.

The framework's layered architecture (Figure 1) comprises a set of hardware and software components grouped into functional layers: (a) Operating System Platform; (b) Emulation and Orchestration; (c) Data Management; (d) Services; and (e) Decision. Cutting across all the layers are the Monitoring, Management and Visualization tools.

92

Figure 1 - EaaS Framework layered architecture
The Operating System Platform and Emulation and Orchestration layers, together with the Monitoring, Management and Visualization tools, provide the core functionality for experimentation. The remaining layers, Data Management, Services and Decision, are functional placeholders for research implementations of theoretical output and for other, more mature technologies developed as part of other research programs.

4.2.1 Operating System Platform Layer
The Operating System Platform layer is a pluggable component responsible for the provisioning of virtualized cluster/cloud-based and/or bare-metal hardware resources in support of a given experiment. The EaaS framework has been designed to integrate easily with open-standards and proprietary provisioning technologies.

Virtualized cluster or cloud-based resources provide the ability to perform multiple experiments simultaneously, but special attention must be given to issues such as unbalanced workloads and undesirable network crosstalk. In the context of this research, the EaaS framework makes use of ARL's Dynamically Allocated Virtual Clustering Management System (DAVC)140.

DAVC is an experimentation infrastructure that provides the means to dynamically create, instantiate and manage virtual clusters within a cloud computing environment based on resource utilization and availability. DAVC configures the software defined network components of the cloud to avoid network crosstalk.

4.2.2 Emulation and Orchestration Layer
The Emulation and Orchestration layer is based on a Linux open-source software stack that includes tools for mobile ad hoc network emulation and discrete-event network simulation, combined with ITA

140 K. Marcus and J. Cannata, “Dynamically Allocated Virtual Clustering Management System" in SPIE Defense Sensing and Security, pp. 87420X-87420X-10, 2013.

93

technologies for mobility modeling and a declarative networking overlay, a messaging service infrastructure, and data capture and replay functionality.

The emulation, simulation and orchestration functionality is provided by the Extendable Mobile Ad-hoc Network Emulator (EMANE)141, the ns-3142 discrete-event simulator and the Common Open Research Emulator (CORE)143,144. The mobility modeling functionality is provided by two technologies, the ITA Universal Mobility Modeling Framework (UMMF) Mobility Modeling Tool145 and NRL's Mobile Network Modeling (MNM) Tools146, while the ITA Analysis Framework for Declarative Networks147 provides the declarative networking overlay. The messaging service infrastructure is based on the publish/subscribe paradigm using the MQ Telemetry Transport (MQTT)148,149 protocol, and data capture and replay are provided through a collection of ad hoc scripts.
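To illustrate how components exchange messages over this infrastructure, a minimal publish/subscribe sketch using the open-source Eclipse Paho MQTT client follows; the broker address, topic name and payload format are placeholders rather than the framework's actual configuration.

    import time
    import paho.mqtt.client as mqtt

    BROKER, TOPIC = "localhost", "experiment/run1/position"

    def on_message(client, userdata, msg):
        # Each message might carry, e.g., a node position update.
        print(f"{msg.topic}: {msg.payload.decode()}")

    subscriber = mqtt.Client()
    subscriber.on_message = on_message
    subscriber.connect(BROKER, 1883)
    subscriber.subscribe(TOPIC)
    subscriber.loop_start()            # handle network traffic in the background

    publisher = mqtt.Client()
    publisher.connect(BROKER, 1883)
    publisher.publish(TOPIC, "node07,51.0607,-1.3180")  # sample reading
    time.sleep(1)                      # give the subscriber time to receive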

4.2.3 Data Management Layer

The Data Management Layer employs the more mature technologies developed as part of the ITA programme and provides a level of abstraction and services for the seamless integration of data sources and data sinks with high-level, coarse-grained, policy-controlled access. In addition, a placeholder for structured and unstructured data (in the form of data sets, e.g., SYNCOIN) is also provided. The ITA technologies incorporated in this layer are: (a) the Information Fabric150; (b) the Gaian Dynamic Distributed Federated Database151,152; and (c) the Policy Management Library153,154.

141 US Naval Research Laboratory, “EMANE,” available at http://cs.itd.nrl.navy.mil/work/emane/index.php 142 “NS-3 Consortium,” available at http://www.nsnam.org/consortium/about/ 143 US Naval Research Laboratory, “Common Open Research Emulator (CORE),” available at http://cs.itd.nrl.navy.mil/work/core/ 144 J. Ahrenholz, “Comparison of CORE Network Emulation Platforms,” in Proceedings of IEEE MILCOM Conference, pp. 864-869, Oct. 2010. 145 A. Medina, G. Gursun, P. Basu, and I. Matta, “On the Universal Generation of Mobility Models,” in Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2010 IEEE International Symposium on, pp. 444-446, 2010. 146 US Naval Research Laboratory, “Mobile Network Modeling (MNM) Tools,” available at http://www.nrl.navy.mil/itd/ncs/products/mnmtools 147 A. Russo, “A Declarative Framework for the Analysis of Routing Protocols,” (Feb. 2013), available at https://www.usukitacs.com/node/2291 148 D. Locke, “MQ Telemetry Transport (MQTT) V3.1 Protocol Specification,” available at http://www.ibm.com/developerworks/webservices/library/ws-mqtt/index.html 149 MQTT.ORG, “MQTT: MQ Telemetry Transport,” available at http://mqtt.org/

94

4.2.4 Services Layer
The Services Layer is a placeholder for the experimentation and prototypes originating from the current fundamental research in Distributed Coalition Information Processing for Decision Making (DCIPDM)155 in the ITA programme. Research in Services Composition156 and in Sensor Asset Matching157 has already produced prototypes for experiments, which have been tested in the EaaS Framework in small-scale scenarios and, as part of the 2013-2014 biannual research, will expand to larger scale. We expect that as more prototypes emerge from the research activities around Information Analytics and Services Deployment, these will become available in the framework.

Two new components from the ITA NS-CTA collaboration, the Apollo Fact-Finder158 tool and the Medusa Crowd Sensing159 application, have been added to the Services Layer of the EaaS framework. Apollo is an assured-information distillation tool for discovering facts in noisy social (human-centric) sensing data. Medusa is a framework for crowd sensing that provides a high-level abstraction for specifying the steps required to complete a crowd-sensing task.

4.2.5 Decision Layer
The Decision Layer is mostly a placeholder for the experiments and prototypes originating from the current fundamental research in Distributed Coalition Information Processing for Decision Making (DCIPDM) in the ITA programme. However, research around Controlled Natural Languages has already produced working prototypes, such as the Controlled English Store160,161, which are included in the EaaS framework. We expect that as more prototypes emerge from the research activities around Contextual Services, Information Exploitation and Information Visualization, these will also become available in the framework.

4.2.6 Automation, Monitoring & Visualization Tools
The EaaS framework is extensible and architected so that the execution and control of experiments can be achieved through partially scripted automation of the test runs. In addition, the framework allows for scripted monitoring and report generation to simplify the process of

154 D. Agrawal, S. Calo, K. Lee, J. Lobo, and D. Verma, Policy Technologies for Self-Managing Systems, IBM Press, 2008. 155 ITA Consortium, “International Technology Alliance - Technical Areas (2011-2016) | ITACS,” available at https://www.usukita.org/BPP11-TAP_pub 156 R. Dilmaghani, S. Geyik, K. Grueneberg, J. Lobo, S. Yousaf Shah, B. K. Szymanski, and P. Zerfos, “Policy-aware Service Composition in Sensor Networks,” in Proceedings of the IEEE SCC, 2012. 157 A. Preece, D. Pizzocaro, K. Borowiecki, G. de Mel, M. Gomez, W. Vasconcelos, A. Bar-Noy, M. P. Johnson, T. La Porta, H. Rowaihy, G. Pearson, and T. Pham, “Sensor Assignment to Missions in a Coalition Context: The SAM Tool,” in Proceedings of INFOCOM 2009, 2009. 158 UIUC, “Apollo: Toward Fact-finding for Social (Human Centric) Sensing,” available at http://apollo2.cs.illinois.edu/ 159 M.-R. Ra, B. Liu, T. F. La Porta, and R. Govindan, “Medusa: A Programming Framework for Crowd-sensing Applications,” in Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, pp. 337-350, ACM, New York, NY, USA, 2012. 160 IBM, “IBM Controlled Natural Language Processing Environment,” available at http://ibm.co/RDIa53 161 P. Ping Xue, S. Poteet, D. Mott, and D. Braines, “Constructing Controlled English for Both Human Usage and Machine Processing,” in 8th International Symposium on Rules (RuleML 2013), Seattle, 2013.

95

monitoring and analysing system load and network traffic. Users of the framework are encouraged to create and contribute such automation scripts to a library of scripts and assets for reuse across tests by all researchers.
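As an example of the kind of contributed automation, the short script below (interface and file names are placeholders) wraps a packet capture around a timed test run; capturing typically requires elevated privileges.

    import subprocess
    import time

    # Start tcpdump, writing raw packets to a file for later analysis.
    capture = subprocess.Popen(["tcpdump", "-i", "eth0", "-w", "run1.pcap"])
    time.sleep(60)         # ... the experiment runs for its allotted time ...
    capture.terminate()    # stop the capture; run1.pcap is kept for analysis
    capture.wait()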

The EaaS framework incorporates a number of standard operating-system and open-source monitoring tools. Examples include tcpdump162 and iptraf163 for capturing and monitoring network traffic; argus, originally designed to monitor servers and network connections in mission-critical ISP (Internet Service Provider) environments; nmon for monitoring Linux performance; and rrdtool164 for handling time-series data.

The EaaS framework also includes the Scripted Display Tool (sdt3d)165, a map visualization tool developed and maintained by the US Naval Research Laboratory (NRL). sdt3d uses NASA's World Wind166 3D interactive world viewer and provides 3D visualization capability with overlay annotation of nodes, links, icons, images and text. sdt3d overlay annotations can be placed at geographic coordinates and dynamically updated to move in synchronization with experimental emulations and simulations and/or with input from real mobility and sensing devices in the field. sdt3d is written in Java using World Wind's open-source SDK, supports multi-platform deployment, and is integrated and tested on Linux, Mac OS X and Windows. sdt3d is licensed under the NRL Code 5520 Software License167.

4.3 Datasets

4.3.1 3G/4G Cellular Network Measurement from UMASS
This data set is well suited to evaluating network performance in hybrid wireless networks. It contains TCP traces over WiFi and over three major commercial cellular network providers in the US (Verizon, AT&T and Sprint), collected in three areas of Massachusetts: Boston, Amherst, and Sunderland. The measurements were conducted to evaluate the performance of multi-path TCP between a wired server residing at UMass and a mobile client. The server is configured as a multi-homed host, connected via two Intel Gigabit Ethernet interfaces to two subnets. The mobile client has a built-in 802.11 interface and is wired to three additional cellular broadband data devices to connect to the cellular services. The data contains packet traces, collected using the tcpdump tool, from both interfaces of the UMass server and from all interfaces used by the client. For each experiment, the mobile client downloaded a 100 MB file from the UMass server to measure TCP throughput, loss rate, and round-trip time.

162 L. Martin Garcia et al., “Tcpdump/Libpcap,” available at http://tcpdump.org 163 G. Paul Java, “IPTraf - An IP Network Monitor,” available at http://iptraf.seul.org/ 164 T. Oetiker, “RRDtool - About RRDtool,” (Nov. 2011), available at http://oss.oetiker.ch/rrdtool/ 165 US Naval Research Laboratory, “The Scripted Display Tools (sdt/sdt3d),” available at http://pf.itd.nrl.navy.mil/sdt/sdt.html 166 http://worldwind.arc.nasa.gov/index.html 167 US Naval Research Laboratory, “NRL Code 5520,” available at http://cs.itd.nrl.navy.mil/products/License.php

96

4.3.2 Multicast Mobile Node Movement Data from BBN
This is a network dataset synthesized from various sources available in the public domain. The data set contains the position data of more than 1,000 nodes as deployed in a division assembly area in an operational scenario. The division comprises several armored brigades, which in turn comprise units in specific roles such as Infantry, Guards, Fusiliers, and various Regiments (e.g., tank, artillery). In addition to the geolocation information, the nodes are associated with various levels in a command-and-control hierarchy that is nine levels deep. Each node also belongs to one or more multicast groups, each of which has specific data-load characteristics.

4.3.3 SYNCOIN Data
The Synthetic Counter-Insurgency (SYNCOIN) data set was developed under a MURI grant for “unified research on network-based hard/soft information fusion”. The data models considered in the development of the SYNCOIN data set are intended to mimic the analysis of various unstructured messages derived from the tactical intelligence effort in the counter-insurgency domain. The data set captures fusion-model elements such as agent, location, time, event, and equipment.

4.3.4 GeoLife Trajectories
This dataset comprises GPS trajectories collected by Microsoft Research Asia from 178 users in Beijing over a period of about three years. Each GPS trajectory is represented by a sequence of time-stamped points, each containing latitude, longitude and altitude. The dataset contains 17,621 trajectories, with a total distance of about 1.2 million kilometers and a total duration of more than 48,000 hours.
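As an illustration of working with such trajectories, the helper below computes a path length from consecutive latitude/longitude fixes using the haversine great-circle distance; the sample points are made up.

    from math import radians, sin, cos, asin, sqrt

    def haversine_km(p, q):
        # Great-circle distance between two (lat, lon) points in km.
        lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
        h = sin((lat2 - lat1) / 2) ** 2 + \
            cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(h))  # Earth radius of about 6371 km

    def trajectory_length_km(points):
        return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))

    sample = [(39.9042, 116.4074), (39.9100, 116.4130), (39.9150, 116.4200)]
    print(trajectory_length_km(sample))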

4.3.5 Mobility Patterns of San Francisco Taxi Cabs This dataset contains mobility traces of taxicabs in San Francisco, USA. It contains GPS coordinates of approximately 500 taxis collected over 30 days in the San Francisco Bay Area in February 2009.

4.3.6 A Collection of Data Sets for Wireless/Mobile Data from CRAWDAD
CRAWDAD, the Community Resource for Archiving Wireless Data At Dartmouth, is a wireless-network data resource for the research community hosted at Dartmouth College. The archive stores wireless trace data from many contributing locations and is maintained by the CRAWDAD staff, who develop tools for collecting, anonymizing, and analyzing the data.

4.4 Exploitation and Ongoing Efforts

During this BPP period, we have made extensive efforts to educate ITA members about the ITA experimentation framework and its assets, and have demonstrated how they can leverage the framework. We have held multiple tutorials during this BPP period, including an AFM workshop focused on experimentation, individual sessions at bootcamps, and sessions at TA5 and TA6 meetings. We have also collaborated closely with the ARL Network Science Collaborative Technology Alliance (NS-CTA) on their adoption of the ITA EaaS Framework to conduct experiments. ITA researchers in various projects have actively used, and are extending, the ITA EaaS with new assets (for a list of these assets and experiments, please refer to the individual projects in the preceding sections).

97

Appendix A. ITA Program Metrics

This appendix includes the bibliography of scholarly publications, invention disclosures, patents, graduate students/post-docs supported, degrees awarded, staff rotations and visits, and workshops/seminars/short courses held.
A.1 Bibliography of Scholarly Publications
A.2 Invention Disclosures and Patents
A.3 Graduate Students and Post Doctorates Supported
A.4 Degrees Awarded
A.5 Staff Rotations and Visits
A.6 Workshops, Seminars, and Short Courses

98

Bibliography of Scholarly Publications

The list of references will be collected and provided in separate volumes of this report.

99

Invention Disclosures and Patents

Type Key: (F) Filed, (D) Disclosed, (P) Published.
Year | Type | Description | Note
2008 | D | Finding Context Based Similarity via Computing with Time | RPI Disclosure #1234
2008 | F | Discovering Network Topology from Routing Information | IBM-US
2008 | F | A System and Method for Obtaining Network Link State Information from Distance Vector Routing Tables | IBM-US
2008 | F | Patent Application for Classification/Curve-Fitting Technique | IBM-US & York
2008 | D | Web 2.0 interface to Semantic Web Data | GB820080647: IBM-UK
2008 | D | Semantic pingback | GB820080652: IBM-UK
2008 | D | Distributed semantic querying | GB820080654: IBM-UK
2008 | F | An apparatus for propagating a query | GB920070076: IBM-UK, filed in US 09/19/2007
2008 | F | An apparatus for storing a logical statement | GB920070077: IBM-UK, filed in US 09/19/2007
2008 | F | An apparatus for enabling connections | GB920070149: IBM-UK, filed in US 09/19/2007
2008 | F | Automated propagation of non-conflicting queries in distributed databases | GB920080233: IBM-UK, filed in US 10/09/2008
2008 | F | Dynamic context definitions in distributed databases | GB920080237: IBM-UK, filed in US 10/09/2008
2008 | F | Method for trusting data source provenance in distributed databases | GB920080239: IBM-UK, filed in US 10/09/2008
2008 | F | Node-level sub-queries in distributed federated databases | GB920080238: IBM-UK, filed in US 10/09/2008
2008 | P | Web 2.0 interface to Semantic Web Data | GB820080647: IBM-UK
2008 | P | Semantic pingback | GB820080652: IBM-UK
2009 | F | A method for distributing a query | GB920090014: IBM-UK, filed at the European Patent Office 5/1/2009
2010 | F | Adaptive Remote Decision Making Under Quality Of Information Requirements | IBM-US

100

Year | Type | Description | Note
2010 |   | A mechanism for content management in wireless mobile networks | 40369, IBM-US
2011 | D | Mobile Application hinted Caching at Wireless Cellular Networks | 40669, IBM-US
2011 | F | A mechanism for content management in wireless mobile networks | 40640, IBM-US
2011 | F | Assigning gateways for heterogeneous wireless mobile networks | 40681, Cambridge UK and IBM-US
2011 | F | Characterizing and Selecting Providers of Relevant Information Based on Quality of Information Metrics | 40591, IBM-US
2012 | D | Method to assess trust in route information through a network | IBM-UK

2012 | D | Method to dynamically route information through the fastest most trusted path in a network | IBM-UK
2012 | D | A Method for Growth Perfect Power-Law Graphs | RPI, 40739
2012 | F | Systems and Methods for Provisioning Sensing Resources for Mobile Sensor Networks | IBM-US
2012 | D | A method for value and context aware communications networking | IBM-US
2012 | F | Scalable common infrastructure for information collection from networked devices | IBM-US
2010 | F | Secret key exchange for wireless and sensor networks | IBM-US
2012 | D | Policy-aware Service Composition | IBM-US
2012 | D | Rate adaptive transmission of wireless broadcast packets | IBM-US
2011 | D | A method for saving bandwidth in transmission of compressed data | IBM-US
2012 | D | Reliable Data Broadcast in Wireless Networks | IBM-US

2012 | D | A Method and Apparatus for Policy Aware Service Composition | IBM-US

101

Year | Type | Description | Note
2012 | D | Multi-tier indexing Methodology for scalable mobile device data collection | IBM-US

2013 | F | Real-time information request response system among wireless users | IBM-US, YOR8-2012-1178
2013 | D | A method for secure user authentication in dynamic networks | IBM-UK, GB8-2012-0420
2013 | D | Reliable data broadcast in wireless networks | IBM-US, YOR8-2012-0884
2013 | D | Method and system for privacy preserving query in federated coalition networks | IBM-US, YOR8-2012-1091
2013 | D | A method for bounded budget monitor deployment in monitoring networks via end to end probes | IBM-US, YOR8-2012-1324
2013 | F | Scalable messaging appliance based on centralized controller | IBM-US, YOR8-2013-0526
2013 | F | Controlled English and trust | IBM-UK, GB8-2012-0659
2013 | D | Method and system for sensor service composition with operational and spatial constraints | IBM-US, YOR8-2012-1460
2013 | F | Privacy preserving query method and system | IBM-US, YOR9-2013-0268
2013 | F | QA based on context-aware, real-time information from mobile devices | IBM-US, YOR8-2012-0371
2013 | D | Cross-path attacks on Multi-path TCP | IBM-US, YOR8-2013-41487

2013 | F | Methods to prevent multipath TCP inference attacks | IBM-US, YOR8-2013-1200
2013 | D | A novel approach for applying "Possible World" interpretation to fact extraction | IBM-UK, GB8-2013-0278
2013 | D | Method and system for cross domain gathering of relevant information | IBM-US, YOR8-2013-0244

102

Year | Type | Description | Note
2013 | F | Enabling content uploads through edge caching and time shifted uploads in poorly connected networks | IBM-US, YOR8-2013-1486
2013 | D | Optimal testing procedures for sequential processes with delayed observations | IBM-US, YOR8-2011-1110
2013 | F | Providing a Sensor Composite Service Based on Operational and Spatial Constraints | IBM-US & RPI, YOR9-2013-0519
2014 | F | Method and apparatus for optimized execution of data stream processing applications in heterogeneous and dynamic network environments | IBM-US, YOR8-2013-1905
2014 | D | Location information control using location anonymization | IBM-UK, GB8-2013-0585
2014 | P | A method and apparatus for online application placement in hierarchical cloud computing environments | IBM-US, YOR8-2013-1913
2014 | F | Snoop virtual receiver time | IBM-US, YOR8-2014-0392
2014 | F | Methods and apparatus for matching untagged data sources to untagged data analysis applications | IBM-US, YOR8-2014-0409
2014 | F | System and method to rapidly deploy and update applications across multiple virtual machines | IBM-US, YOR8-2014-0397
2014 | F | System and apparatus to extend cloud computing to on-premises data | IBM-US, YOR8-2014-0396
2014 | D | Enabling local content sharing through edge caching and time shifted uploads in poorly connected networks | IBM-US, YOR8-2013-1820

Disclosed 27 Published 3 Filed 29


Graduate Students, Post-docs and Staff Rotations

Partial list of graduate students:
Chang Liu, University of Massachusetts
Guan-Hua Tu, University of California, Los Angeles
Darren Richardson, University of Southampton
Hengfei Li, University of Aberdeen
Anthony Etuk, University of Aberdeen
Jhonatan Garcia, University of Aberdeen
Yousaf Shah, Rensselaer Polytechnic Institute
Buster Holzbauer, Rensselaer Polytechnic Institute
Tom Babbitt, Rensselaer Polytechnic Institute, US Army Officer
Daniel Apon, University of Maryland
James Parker, University of Maryland
Aseem Rastogi, University of Maryland
Sriram Keelveedhi, University of Maryland
Yupeng Zhang, University of Maryland
Konstantinos Vamvourellis, City University of New York
Irippuge Milinda Perera, City University of New York
Diarmuid Ó Séaghdha (Dermot O'Shay), Cambridge University
Will Webberley, Cardiff University
Chris Gwilliams, Cardiff University
Diego Pizzocaro, Cardiff University
Siddharth Mehrotra, Carnegie Mellon University
Gita Sukthankar, Carnegie Mellon University
Sadaf Zahedi, University of California, Los Angeles
Younghun Kim, University of California, Los Angeles
Zainul Charbiwala, University of California, Los Angeles
Thao Le, University of Aberdeen
Mingyi Zhao, Pennsylvania State University
Daniele Masato, University of Aberdeen
Supriyo Chakraborty, University of California, Los Angeles
Sepideh Nazemi Gelyan, Imperial College
Shiqiang Wang, Imperial College


Christos Parizas, Cardiff
Steven Okamoto, Carnegie Mellon University
Anand Seetharam, University of Massachusetts
Kasturi Rangan Raghavan, University of California, Los Angeles
S. Yousaf Shah, Rensselaer Polytechnic Institute
Aishwarya Thiruvengadam, University of Maryland
Piotr Mardziel, University of Maryland
Liang Ma, Imperial College
Nan Hu, Pennsylvania State University
Geeth de Mel, University of Aberdeen
Yeon-sup Lim, University of Massachusetts
Qiang Zeng, Pennsylvania State University
Zhengguo Sheng, Imperial College
Bruce Leow, Imperial College
George Tychogiorgos, Imperial College
Cagatay Capar, University of Massachusetts
Boulat Bash, University of Massachusetts
Victoria Manfredi, University of Massachusetts
Harold Chi Liu, Imperial College
Petr Novotny, Imperial College
Salvatore Scellato, Cambridge University
Srikar Tati, Pennsylvania State University
Nanxi Chen, University of California, Los Angeles
Gustavo Marfia, University of California, Los Angeles
Eugenio Giordano, University of California, Los Angeles
Youngtae Noh, University of California, Los Angeles
Michael Meisel, University of California, Los Angeles
Chris Burnett, University of Aberdeen
David Emele, University of Aberdeen
Chee Yen Leow, Imperial College
Caleb Vincent, Rensselaer Polytechnic Institute


Partial list of Post-doctorates:
Yung-Chih Chen, University of Massachusetts
Yuqing Tang, Carnegie Mellon University
Jean Oh, Carnegie Mellon University
Wentao Ouyang, UCLA
Alice Toniolo, University of Aberdeen
Federico Cerutti, University of Aberdeen
Matthew Hammer, University of Maryland
Yan Huang, University of Maryland
Petr Novotny, Imperial College
Mario Gomez, University of Aberdeen
David Emele, University of Aberdeen
Hang Zhao, Columbia University
Sahin Cem Geyik, Rensselaer Polytechnic Institute
Eyuphan Bulut, Rensselaer Polytechnic Institute
Arkady Yerukhimovich, University of Maryland
Stephen Magill, University of Maryland
Dominique Schroeder, University of Maryland
Hong-Sheng Zhou, University of Maryland
Konrad Borowiecki, Cardiff
Diego Pizzocaro, Cardiff
Andrei Bejan, Cambridge
Matthew Johnson, UCLA
Wei Zhang, Royal Holloway
Nir Oren, University of Aberdeen
Chris Burnett, University of Aberdeen
Jiefei Ma, Imperial College
Yang Song, IBM-US
Cagatay Capar, University of Massachusetts
Yun Hou, Imperial College
Sudarshan Vasudevan, University of Massachusetts
Uichin Lee, University of California, Los Angeles
Eric Yu-En Lu, Cambridge University


Juan Estevez Tapiador
Murat Sensoy, University of Aberdeen
Martin Kollingbaum, Carnegie Mellon University
Jie Bao, Rensselaer Polytechnic Institute
Paul Smart, University of Southampton
Trung Dong Huynh, University of Southampton
Felipe Meneguzzi, Carnegie Mellon University
Sid Chau, Cambridge University


Degrees Awarded

Degrees Awarded | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014
PhD | 7 | 5 | 6 | 13 | 3 | 8 | 4 | 2
Masters | 2 | 3 | 1 | 3 | 1 | 0 | 4 | 0

PhD Awardees:
Yung-Chih Chen, UMASS
Alice Toniolo, Aberdeen
Thao Le, Aberdeen
Geeth de Mel, Aberdeen
Steven Okamoto, CMU
Srikar Tati, PSU
Petr Novotny, Imperial
Gita Sukthankar, CMU
Sadaf Zahedi, UCLA
Younghun Kim, UCLA
Zainul Charbiwala, UCLA
David Emele, Aberdeen
Daniele Masato, Aberdeen
Sahin Cem Geyik, RPI
Eyuphan Bulut, RPI
Arkady Yerukhimovich, UMD
Konrad Borowiecki, Cardiff
Diego Pizzocaro, Cardiff
Jiefei Ma, Imperial
Fangfei Chen, PSU
Chris Burnett, Aberdeen
Marcin Szczodrak, CUNY
Joel Branch, RPI
Joung-Sik Lee, RPI
Omer Horvitz, UMD
Gita Reese, UMD


Sewook Jung, UCLA
Nir Oren, Aberdeen
Ellen (Xiao Lan) Zhang, UMASS
Leeger Yu, UMD
Vrizlynn L. L. Thing, Imperial
Uichin Lee, UCLA
Shane Balfe, RHUL
Junning Liu, UMASS
Hosam Rowaihy, PSU
Qiming Lu, RPI
Alberto Schaeffer Filho, Imperial
Thomas Schmid, UCLA
Steffen Reidt, RHUL
Victoria Manfredi, UMASS
Sharanya Eswaran, PSU
Harold Liu, Imperial
Ivan Wang-Hei Ho, Imperial
Yun Hou, Imperial
Luke Dickens, Imperial
Yuqing Tang, CUNY
Christopher Seaman, CUNY
Matthew Johnson, CUNY
Bruno Ribeiro, UMASS
Yow-Tzu Lim, York

Masters Awardees:
Lei Chen, RPI
Irippuge Milinda Perera, CUNY
Yousaf Shah, RPI
Christos Parizas, Cardiff
Manu Bansal, UCLA
Zhengguo Sheng, Imperial
Laura Balzano, UCLA
Hang Zhao, Columbia


Maritza Johnson, Columbia
Thomas Babbitt, US Army Officer, West Point instructor / RPI
Christopher Morrell, US Army Officer, West Point instructor / RPI


Staff Rotations and Visits

The following staff rotations and visits occurred within the ITA since the prior PRR was published in 2012.

Scope | Type | Whom | From | To | Timing | BPP Quarter
Across Countries | Internship | Anthony Etuk | Aberdeen | IBM US | Summer 2012 | BPP11 Q7/Q8
Across Countries | Visit | Y. Shah & B. Szymanski | RPI | Cardiff | One week | BPP11 Q7/Q8
Across Countries | Visit | K. Sycara | CMU | Aberdeen | One week | BPP11 Q8
Across Countries | Internship | Nick Clawson | Westpoint | IBM UK | Summer 2013, 3 weeks | BPP13 Q1
Across Countries | Internship | William Viana | Westpoint | IBM UK | Summer 2013, 3 weeks | BPP13 Q1
Same Country | Internship | Y. Shah | RPI | IBM US | Summer 2013 | BPP13 Q1/Q2
Same Country | Internship |  | UCLA | IBM US | Summer 2013 | BPP13 Q1/Q2
Across Countries | Internship | C. Parizas | Cardiff | IBM US | Summer 2013 | BPP13 Q1/Q2
Same Country | Internship | Chang Liu | UMASS | IBM US | Summer 2013 | BPP13 Q1/Q2
Same Country | Internship | Yeon-sup Lim | UMASS | IBM US | Summer 2013 | BPP13 Q1/Q2
Same Country | Internship | Liang Ma | Imperial | IBM US | Summer 2013 | BPP13 Q1/Q2
Same Country | Rotation | G. de Mel | IBM US | ARL | July 15 to Oct 31 2013 | BPP13 Q1/Q2
Across Countries | Visit | A. Swami, K. Chan | ARL | Imperial | Two days | BPP13 Q2
Across Countries | Visit | A. Swami, K. Chan | ARL | Cambridge | One day | BPP13 Q2
Same Country | Visit | S. Farquhar | DSTL | Cambridge | One day | BPP13 Q2
Across Countries | Visit | C. Giammanco | ARL | IBM UK | One day | BPP13 Q3
Across Countries | Visit | K. Chan, P. Yu | ARL | Imperial | One day | BPP13 Q3
Same Country | Visit | Don Towsley | UMASS | BBN | Once per month | BPP13 Q4
Across Countries | Visit | Richard Gibbens | Cambridge | BBN & UMASS | Ten days | BPP13 Q4
Across Countries | Rotation | F. Cerutti | Aberdeen | UCLA | One month | BPP13 Q4
Across Countries | Visit | K. Sycara | CMU | Aberdeen | One week | BPP13 Q4
Across Countries | Internship | J. Garcia | Aberdeen | IBM US | Summer 2014 | BPP13 Q5
Across Countries | Rotation | J. Pan | Aberdeen | IBM US | One month | BPP13 Q5
Across Countries | Internship | C. Parizas | Cardiff | ARL | Summer 2014 | BPP13 Q5
Same Country | Visit | G. de Mel | IBM US | ARL | Two days, July 2014 | BPP13 Q5
Across Countries | Internship | Jiung Kim | Westpoint | IBM UK | Summer 2014, 3 weeks | BPP13 Q5
Across Countries | Internship | Michael Shares | Westpoint | IBM UK | Summer 2014, 3 weeks | BPP13 Q5
Across Countries | Internship | Jonathan Thiess | Westpoint | IBM UK | Summer 2014, 3 weeks | BPP13 Q5


Workshops, Seminars and Short Courses

Report on ITA Bootcamp 2013

A successful ITA Bootcamp was held at the Marriott Hotel, Cardiff, UK, from the 18th to 21st of June 2013. There were 81 attendees from the US and UK, with government, industry and academia well represented. The Bootcamp was effectively a kick-off for the new BPP13: after an opening plenary with an overview from each TA and an update on the experimentation framework, the TAs and Projects had two days to work closely together beginning their BPP13 tasks, with a wrap-up plenary on the final afternoon. The Bootcamp presentations are available at https://www.usukitacs.com/node/2314

Report on Annual Conference of ITA, ACITA 2013

A successful ITA Annual Fall Meeting (ITA FM) was held at the Palisades Center, Palisades, New York, USA from October 1st to 3rd. With 150 attendees from the US and UK, the meeting was well attended by government, industry, and academia. The ITA FM featured scientific paper presentations (16 long papers), poster presentations (24 short papers), ITA technology demonstrations (18 demos), and three technical workshops. Special emphasis was placed on providing an overview of the ITA program and its accomplishments to the new Army Research Laboratory (ARL, US sponsor) director. An informal Peer Review was hosted, in addition to discussions regarding program close-out, documentation, and deliverables. Full details are available at www.usukitacs.com/annual_fall_meeting.

ITA Face to Face Technical Meetings

ITA Face to Face meetings were held at Imperial College London in early January, with thanks to Kin Leung for arranging them. There were three days of meetings: one day for TA5, a cross-over day for joint TA meetings, and a TA6 day which included meetings with various people from the UDRC programme. Alongside these meetings, an ITA overview and poster session was run for Dr Russell. The meetings were well attended and productive. Additional details can be found at www.usukitacs.com/f2f

ITA / CTA Experimentation Demonstrations

The ITA experimentation framework and various ITA researchers were involved in the NS-CTA experiments shown at the NS-CTA RMB in November 2013, along with some of the CE work from the ITA. This helped NS-CTA to show significant progress in collaboration, experimentation, and military relevance, contributing to the decision to extend their program for five years.

ITA SIG Events (UK)

Two one-day ITA briefing days were held at IBM Hursley in early July, with around 50 MOD and industry partners attending. The days consisted of overview presentations in the morning, followed by demos and informal discussions during the afternoon. The turnout was good and there was a lot of interest, with follow-up contact and discussion. The presentations are available on the public ITA site at https://www.usukitacs.com/node/2741


Appendix B. Technology Transitions

This appendix details technology transitions which have occurred within the ITA:
B.1 Joint US/UK Technology Transitions
B.2 Joint US/UK/Commercial Technology Transitions
B.3 UK Technology Transitions
B.4 US Technology Transitions
B.5 Commercial Technology Transitions


Joint US/UK Technology Transitions

Coalition Warfare Program – Integration of Gaian Database onto NIFC BICES Network

OSD's Coalition Warfare Program (CWP) funded a US-UK transition project on "ITA Policy Controlled Information Query & Dissemination" in 2011. The goal of this CWP project was to develop an extensible capability for performing distributed federated query and information dissemination across a coalition network of distributed, disparate data/information sources with access-controlled policies. The US Army Research Laboratory (ARL) and UK Defence Science and Technology Laboratory (Dstl) led the CWP project, with software development by IBM UK and IBM US. The CWP project exploited three key technology components developed within the ITA, namely the Gaian Database, the Access Policy Decision and Enforcement mechanisms, and extensions to the Kerberos ticketing framework.

A key metric of success for a CWP project was the transition of coalition-related technology from TRL 3-4 to TRL 6 or higher. Thus, the end goal of this CWP project was to demonstrate the GaianDB and policy technology within an operational environment at the NATO Intelligence Fusion Centre (NIFC) at RAF Molesworth. An initial demonstration of this technology in a stand-alone environment was undertaken at the NIFC in November 2011, using a data set comprising 140,000 documents. In 2012/13 the system was modified to include the secure authentication mechanism based on a Kerberos ticketing framework, and this was accredited and integrated onto the NIFC Battlefield Information Collection and Exploitation System (BICES) network. The system was successfully demonstrated to the senior management team at the NIFC and the team was awarded a medal of excellence for the work undertaken.
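The federated query pattern at the heart of this project can be sketched in a few lines. The code below is a minimal conceptual illustration only; every name in it is hypothetical, and the fielded system used the Gaian Database, its policy decision/enforcement points and Kerberos tickets rather than anything resembling this simplified code.

```python
# Conceptual sketch of a policy-controlled federated query across coalition
# partners' data stores. Illustrative only: not the GaianDB or WPML APIs.
from dataclasses import dataclass

@dataclass
class Record:
    source: str          # which coalition partner holds the record
    releasable_to: set   # nations the record may be released to
    payload: dict

def policy_allows(record: Record, requester_nation: str) -> bool:
    """Access-control policy: release only to listed coalition partners."""
    return requester_nation in record.releasable_to

def federated_query(sources, predicate, requester_nation):
    """Fan a query out to each partner's store, filter by policy, merge."""
    results = []
    for source in sources:
        for record in source:  # each store would be remote in reality
            if predicate(record) and policy_allows(record, requester_nation):
                results.append(record)
    return results

# Example: a US analyst queries across US and UK document stores.
us_store = [Record("US", {"US", "UK"}, {"doc": "logistics report"})]
uk_store = [Record("UK", {"UK"}, {"doc": "UK-only assessment"}),
            Record("UK", {"US", "UK"}, {"doc": "shared intel summary"})]

hits = federated_query([us_store, uk_store], lambda r: True, "US")
print([r.payload["doc"] for r in hits])  # the UK-only record is filtered out
```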


Joint US/UK/Commercial Technology Transitions

Accelerating Fully Homomorphic Encryption (FHE)

The efficiency of FHE can be measured as the ratio between the time taken to compute a circuit homomorphically and the time taken to compute it in the clear. We can express this efficiency in terms of a security parameter λ, where typically, for a sufficient level of security, λ is chosen to be ~100. Gentry's initial scheme had a per-gate computation overhead that scaled as O(λ^4), resulting in a very inefficient scheme. As a result, attempts have been made to improve the scalability of fully homomorphic schemes. One recent optimisation (Brakerski, et al., 2012) proposed a radically new approach to fully homomorphic encryption in a paper entitled '(Leveled) Fully Homomorphic Encryption without Bootstrapping' that dramatically improves performance. This new approach, which we refer to as the BGV scheme after the authors, is based on polynomial rings with integer coefficients and removes the need for any bootstrapping due to a new approach to the construction of the fully homomorphic encryption scheme. The BGV scheme scales as O(λ^2), or O(λ·L^3) for L-level arithmetic circuits, i.e. quasi-linear in the security parameter, and this can be improved further using a batching technique that allows the per-gate computation to be reduced from quasi-linear in the security parameter to polylogarithmic.
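To make the improvement concrete, here is a rough worked comparison of the per-gate overheads at the typical security level; the arithmetic is ours and is for illustration only, not a measurement from the cited papers:

```latex
\frac{T_{\mathrm{homomorphic}}}{T_{\mathrm{clear}}} \sim
\begin{cases}
O(\lambda^{4}) \approx 100^{4} = 10^{8} & \text{Gentry's scheme, } \lambda \approx 100 \\
O(\lambda^{2}) \approx 100^{2} = 10^{4} & \text{BGV (leveled)} \\
O(\mathrm{polylog}\,\lambda) & \text{BGV with batching}
\end{cases}
```

On this crude accounting, moving from Gentry's scheme to leveled BGV buys roughly four orders of magnitude in per-gate overhead at λ ≈ 100, which is what makes hardware acceleration of the remaining cost worthwhile.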

In April 2013 IBM released an FHE library in C++ based on the BGV scheme, and ITA Project 3.2 has implemented a modified version of the algorithm in SAGE and confirmed the scaling. IBM and Dstl have jointly funded this transition project, with Dstl contributing additional funding to support the theoretical aspect of the work undertaken by Wei Zhang, supervised by Carlos Cid at RHUL. IBM's contribution has been to assemble a multidisciplinary team from Hursley in the UK and Yorktown in the US to investigate the use of Field Programmable Gate Arrays (FPGAs) to accelerate the performance of the BGV algorithm at lower power. To do this the project has made use of a new object-oriented language called Liquid Metal (LIME) that can be compiled for the JVM or into a synthesizable hardware description language that can run on the FPGA. The project is also investigating how applications that make use of FHE can be written in a high-level description language from which the required homomorphic circuit equivalent can be generated automatically. The capability will be demonstrated using a two-party oblivious transfer application at the September Fall Meeting.

This work is being jointly funded by UK Dstl, IBM UK Ltd and IBM Research.
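For readers unfamiliar with homomorphic evaluation, the self-contained toy below makes the "homomorphic versus in-the-clear" efficiency ratio concrete. It deliberately uses textbook Paillier encryption, which is additively homomorphic only; it is not the BGV scheme, the IBM library, or the RHUL SAGE implementation discussed above.

```python
# Toy demonstration of the homomorphic-vs-clear efficiency ratio using
# textbook Paillier encryption (additively homomorphic only). NOT the BGV
# scheme discussed above; purely a small illustration of the overhead metric.
import math, random, time

p, q = 1009, 1013                 # toy primes; real schemes use ~1024+ bits
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)      # Python 3.9+
mu = pow(lam, -1, n)              # valid here because gcd(lam, n) == 1

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    # With generator g = n + 1:  c = (1 + n)^m * r^n  (mod n^2)
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n

m1, m2 = 123, 456
c1, c2 = encrypt(m1), encrypt(m2)

t0 = time.perf_counter()
c_sum = (c1 * c2) % n2            # homomorphic addition: one modular multiply
t1 = time.perf_counter()
s = m1 + m2                       # the same addition in the clear
t2 = time.perf_counter()

assert decrypt(c_sum) == (m1 + m2) % n
print("homomorphic/clear time ratio ~", (t1 - t0) / max(t2 - t1, 1e-9))
```

Even in this toy, one encrypted addition costs a big-integer modular multiplication versus a single machine addition in the clear, which is the overhead ratio the FHE literature (and the FPGA work above) aims to drive down.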


UK Technology Transitions

Gaian Database in Operational IPA System

The Gaian Database has been incorporated into MOD's operational IPA application to provide extended query capabilities. Due to the classification of the project, no details of the integration or the system are available for this report.

Project J – Use of Controlled Natural Language in Operational Analytical Solutions

This transition activity was a short (20 day) consultancy on the potential role of a Controlled Natural Language (in this case the ITA Controlled English language) in helping operational analysts more effectively collect, analyse and share intelligence information in a variety of domains, in a context where the areas of interest (and therefore the underlying models) may need to evolve quickly as the situation develops. The funding client for this work was a consortium of three UK government agencies with the remit to fund innovative and potentially disruptive research ideas in this space. The project ran during December 2013 and January 2014 and was delivered both as a detailed technical report and as a presentation and demonstration of the capability at one of the client locations in London, UK. The project showed the potential usage of Controlled English in this setting as anticipated, but based on client interactions throughout the project we also extended the scope to focus more on the "storytelling" aspects of an intelligence analyst's role, and showed the potential for a number of storytelling-related capabilities that build upon the dynamic and flexible basis provided by the underlying Controlled English representation. In this transition activity we therefore took the idea of storytelling back to basics and investigated the ability to compose a story from actors and scenes in order to better communicate the various parts of an evolving situation. One example of this is the idea of using a comic style of representation for reporting conversations between subjects (e.g. in social media, telephone calls, or other observable mediums). An example is shown in the figure below:

Figure 1: An example of storytelling through comic book rendering

Whilst the capability demonstrated within this short project was very basic, the potential value of such a capability is great, especially in terms of conveying a large volume of contextual information (through imagery) around the core conversational elements. The client was pleased with the work delivered under this transition contract, and as a research team we have opened a number of new topic areas that could be further investigated in later research or other transition activities.
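As a rough illustration of the "actors and scenes" composition idea, the sketch below arranges a reported conversation into comic-style panels. The data model is invented for this report and is not the project's Controlled English representation.

```python
# Hypothetical sketch of composing a story from actors and scenes, as in the
# comic-style reporting idea above. Names and structure are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Actor:
    name: str
    portrait: str                # imagery conveying context about the actor

@dataclass
class Scene:
    setting: str                 # background context (location, time, medium)
    utterances: list = field(default_factory=list)  # (actor, speech) pairs

def render_comic(scenes):
    """Render each scene as a text 'panel'; a real system would draw imagery."""
    for i, scene in enumerate(scenes, 1):
        print(f"--- Panel {i}: {scene.setting} ---")
        for actor, speech in scene.utterances:
            print(f"  [{actor.name}] {speech}")

alice = Actor("Subject A", "portrait_a.png")
bob = Actor("Subject B", "portrait_b.png")
scene = Scene("telephone call, 21:40, city centre",
              [(alice, "Meet at the usual place."), (bob, "On my way.")])
render_comic([scene])
```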


Mission Assurance and Configuration using Requirements Traceability

The network element of a system provides the benefits of connectivity between disparate systems, including within coalitions. In order to understand the behaviours of networked systems it is not sufficient just to capture network laydowns or other technical architectures. Instead, the people, processes and technology need to be understood alongside their respective interactions. A key challenge the military faces is how mission objectives can be related to the assets available (technical and people). This project, conducted with City, York and Bournemouth Universities, investigated how concepts in requirements engineering, augmented with system survivability techniques, can provide practical means to achieve mission assurance and mission configurability. Requirements traceability is achieved using satisfaction arguments, while also capturing assumptions and constraints. Further, requirements descriptions using controlled natural language were demonstrated as a first step in providing a machine-traceable approach that would allow automated detection and response in the future, such as for network hardening. Further work will develop a prototype toolset that can be trialled with defence users.

MIPS 1&2 - Fabric and Gaian

The Management of Information Processing Services (MIPS) project set out with two main objectives: notifying analysts when relevant new information has arrived, and the automatic processing of the new information. Within these objectives a number of significant challenges were addressed during the project. To achieve the first objective, the development team had to demonstrate the capability for specific analysts to be "tipped-off" in real time that textual reports and sensor data have been received that are relevant to their analytical tasks, including the possibility that such reports have been made available by other nations. In the case of the second objective, the project had to demonstrate the capability for the infrastructure to automatically initiate processing of input data as it arrives, consistent with satisfying the analytical goals of teams of analysts, in as efficient a manner as possible (including the case where data is made available by more than one nation). Using the Information Fabric middleware, the development team created a service-based information processing infrastructure to achieve the objectives set by the customer. The challenges in designing and implementing the MIPS infrastructure, its architecture, and a worked example use case are described in the associated project documentation.

MIPS was updated in 2014. The updates include: allowing the three core components of Fabric (Fabric Manager, Broker and Registry) to run in the background as Windows Services; the addition of a new GUI facility for adding services to MIPS; provision of a new logging system for MIPS services to utilize, with a complementary GUI for centrally viewing/managing log information; and fixes for Dstl-identified issues with the previous version. As part of the update, extensions to the Gaian database were made to support MongoDB and Accumulo as data sources. This allows Gaian to federate data from these NoSQL databases.

C4ISR LeTacCIS Roadmap

This transition contract was for a scoping study to investigate how different technologies and commercial models could benefit future MOD capabilities. It is expected that future capabilities will be open and modular, where modularity is a technical characteristic (allowing pluggable subsystems to be deployed) and openness is a commercial one (identifying contractual models that provide flexibility for MOD to embed new subsystems in existing assets). This transition included IBM UK, Cardiff University and Imperial College. The deliverables from the project included: the definition of a process framework for identifying and assessing emerging technologies relevant to long-term network and communications trends; recommendations for further exploitation of ITA technologies by MOD; identification of an initial set of technology areas, and supporting research currently relevant to MOD stakeholders, that can be taken through the framework process; and identification of a framework for assessing new commercial and civilian business and contractual models that would better enable MOD to procure and support open systems.

C4ISR LeTacCIS Experimentation

This transition project was run in conjunction with the roadmap project described previously. It consisted of a scoping study into the establishment of an Open Systems Experimentation and Trade-off Facility for Dstl, and developed options for a multi-year programme for the implementation of such a facility. The role of such a facility would be to deliver evidence and identify the benefits of new systems based on a mixture of paper-based assessment, analysis of technologies, simulation, emulation, laboratory testing and field trials. This transition project included a demonstration of the ITA Experimentation Framework, and it is expected that further transition work will be conducted to develop this framework further, including the deployment of the ITA Experimentation Framework into Dstl.

Data Exchange for Cyber System Analysis

This transition project developed a draft information model for a Cyber Systems Analysis Toolset using the ITA Controlled English (CE). Following the clarification of Dstl requirements, the project used a set of scenarios and use cases to assess different information model representations. Alternative approaches to representing cyber-physical systems were assessed, including common representations such as XML – in particular the extraction of information from existing tool sets such as MOOD, which is widely used within MOD for developing MODAF architectural models. The project demonstrated a set of ontologies and metadata linked to cyber system elements in a representative model.

Land Open Systems Architecture (LOSA)

IBM UK fielded a proposed Systems Information Exploitation solution with Paradigm UK in 2011. The solution fielded provided verification and validation of the following:
• It is practicable to gather data from a range of vehicles from various manufacturers and present this data in a common format to Authority staff using existing Authority infrastructure and applications;
• Data can be gathered from legacy platforms with or without a common data bus, such as GVA, Ethernet, CAN or SAE J1939, and automatically uploaded to a data warehouse within the existing Authority infrastructure;
• Information sources from platforms can be published for exploitation by users on DII, and these users can subscribe to these data sources remotely from DII, i.e. data can be captured and stored from different sensors available on the platform, as required by the user.

In 2012 IBM provided the Information Fabric technology in support of a Dstl field experiment (ICC) and the first Land Open Systems Architecture (LOSA) RED trial in Caerwent. The LOSA comprises a variety of subsidiary elements, including Generic Vehicle, Base and Soldier Architectures, which are linked via the Common Open Interface Land (COIL) node. The node enables information exchange with other domains: Joint, Air and Maritime. In 2013 IBM were tasked to define the Platform Independent Models (PIM) to enable the generation of Platform Specific Models (PSM) during the implementation of the Land Open Systems Architecture (LOSA) Common Open Interface Land (COIL) Node for use in field trials in 2014. IBM led a collaboration of leading defence corporations to define the data transfer requirements and interfaces between platforms, soldiers, bases and the wider networked capability in order to generate Information Exchange Requirements.


IBM defined the architecture and relevant LOSA COIL Node information and enabled the production of a new COIL Defence Standard, 23-14. The IBM team provided verification of the Defence Standard through experiments, exploiting COTS technology and outputs from ITA research. The following reports are available to the Authority (either Dstl or DE&S):
• Systems Information Exploitation Steering Group - SIE Route Map, Data Ownership, Information Exchange Requirement and Defence Standard 25-24 review, reference: OREC1-SIE1, dated: 09 Feb 10;
• Systems Information Exploitation Pilot Programme, reference: OREC2, dated: 28 May 2010;
• Urgent Statement of User Requirement (USUR) for the Delivery of Health and Usage Monitoring (HUMS) / System Information Exploitation (SIE) Capability, dated: 11 May 10;
• Systems Information Exploitation Verification Validation and Test, Caerwent, reference: TX-RP-01584-S-PDGM, dated: Feb 11.

ESII Task 27 – Enabling C2 for Air

The ITA Information Fabric formed a component of a hybrid information layer for Task 27 of the Enabling Secure Information Infrastructure (ESII) programme. The purpose of the task was to demonstrate a hybrid information layer, consisting of the Information Fabric and RTI's Data Distribution Service (DDS), that provided a common set of services independent of the underlying messaging middleware. DDS is an implementation of an open standard from the Object Management Group (OMG) and is used on several existing MOD programmes. DDS is designed for real-time use and therefore requires certain qualities of service from the underlying network. The Information Fabric is more flexible in its network requirements and will operate in more dynamic environments. The combination of the two messaging middleware systems was successfully demonstrated to MOD stakeholders running an Air C2 scenario.
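The hybrid-layer pattern is essentially a common facade over two middleware backends. The sketch below illustrates the idea with hypothetical class names; it is not the actual Information Fabric or DDS API.

```python
# Illustrative sketch of a hybrid information layer: one common pub/sub
# interface fronting two messaging middlewares. All names are hypothetical.
from abc import ABC, abstractmethod

class Middleware(ABC):
    @abstractmethod
    def publish(self, topic: str, message: str): ...
    @abstractmethod
    def subscribe(self, topic: str, handler): ...

class FabricAdapter(Middleware):
    """Stands in for the Information Fabric: tolerant of dynamic networks."""
    def __init__(self): self.handlers = {}
    def publish(self, topic, message):
        for h in self.handlers.get(topic, []): h(message)
    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

class DDSAdapter(FabricAdapter):
    """Stands in for DDS: same interface, real-time QoS underneath."""

class HybridInformationLayer(Middleware):
    """Common service layer: routes each topic to the appropriate backend."""
    def __init__(self, realtime: Middleware, besteffort: Middleware):
        self.realtime, self.besteffort = realtime, besteffort
    def _backend(self, topic):
        return self.realtime if topic.startswith("track.") else self.besteffort
    def publish(self, topic, message):
        self._backend(topic).publish(topic, message)
    def subscribe(self, topic, handler):
        self._backend(topic).subscribe(topic, handler)

layer = HybridInformationLayer(DDSAdapter(), FabricAdapter())
layer.subscribe("track.air", lambda m: print("C2 received:", m))
layer.publish("track.air", "hostile track 42 at 51.5N 0.1W")
```

The design point this mirrors is that applications code against the common layer, so real-time traffic can ride DDS while traffic on dynamic tactical networks rides the Fabric, without application changes.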


US Technology Transitions

ITA Transition - ITA Experimentation Framework

Currently, researchers can gain access to only a small number of purpose-built facilities with specific combinations of tools for experimentation, subject to restrictions and limitations of various types. To overcome some of these problems, research was conducted under the ITA to define a common and extensible framework, the ITA Experimentation Framework, for rapid prototyping and experimentation. Since late 2012, that research has been extended through a US Army Research Laboratory (ARL) funded transition/collaboration between ARL and IBM UK to define the concept of Experimentation-as-a-Service (EaaS), which enables the provisioning of reconfigurable, ad hoc and on-demand experimentation environments, and to design and implement the framework to accomplish these goals. The EaaS framework makes use of open source technologies; includes additional ITA assets such as the Controlled English (CE) Store, the Services Composition Framework, the Information Fabric, the GaianDB, the Watson Policy Management Libraries and the ITA Analysis Framework for Declarative Networks; and shall be continually updated to serve as the core infrastructure for experimentation. Also as part of this transition/collaboration, the experimentation framework has been deployed on the Network Science Research Laboratory (NSRL) computer cluster at ARL for use by the ARL-funded research community, and is also available as a self-contained "Experimentation in a Box" virtual machine. The initial practical use of the ARL EaaS framework was a multi-genre network science experiment conducted by the Network Science Collaborative Technology Alliance (NS CTA).
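Conceptually, EaaS lets a researcher describe an experiment environment as data and have it provisioned on demand. The sketch below is a guess at what such a request might look like; the spec fields and the provision function are invented for illustration and are not the actual EaaS interface.

```python
# Hypothetical sketch of an Experimentation-as-a-Service request: describe
# the environment as data, then provision emulated nodes and ITA assets.
# This API is invented for illustration; it is not the real EaaS interface.

EXPERIMENT_SPEC = {
    "topology": {"nodes": 12, "mobility": "random-waypoint"},  # emulated net
    "assets": ["CE Store", "GaianDB", "Information Fabric"],   # ITA components
    "dataset": "example-mobility-traces",                      # placeholder
    "duration_minutes": 30,
}

def provision(spec):
    """Stand up an emulated environment matching the spec (stubbed here)."""
    return {"nodes": [f"vm-{i}" for i in range(spec["topology"]["nodes"])],
            "services": {asset: "running" for asset in spec["assets"]}}

env = provision(EXPERIMENT_SPEC)
print(len(env["nodes"]), "emulated nodes;", list(env["services"]))
```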

Enabling Unattended Asset Interoperability Using Controlled English

This transition project is funded by ARL and is looking into the potential role of Controlled English in a number of applied situations relating to the general task of achieving improved asset interoperability, especially for unattended sensors that must work in a more autonomous and efficient mode due to their isolation. The main focus areas so far for this project have been the following:
• A collaborative demonstration with CERDEC: highlighting the potential value of an intelligent agent "digital assistant", as per the current P4T3 research, but in a context relevant to CERDEC interests. This work was demonstrated with CERDEC at a US-only technology demonstration event in January 2014, as part of a wide-ranging set of demonstrations from numerous industry and government teams.
• Collaboration with the NS-CTA program: under this transition contract, staff from IBM UK worked closely with ARL and NS-CTA researchers to use the Controlled English language to orchestrate the overall execution of the high-profile experiment that was demonstrated to the leadership board (RMB) of the NS-CTA program in November 2013. The use of Controlled English in this context provided a structured but flexible basis against which the many experiment permutations could be expressed and subsequently executed. This usage of Controlled English was in addition to the planned usage of the CE Store as a simple inference engine to demonstrate one component within the NS-CTA emulation environment.
• Anomaly determination: Controlled English is being explored as a possible flexible language to facilitate the management and deployment of various feature extraction, classification and anomaly determination algorithms, as part of the ARL-funded project that is working in this area.

The diagram below shows the overall context for this funded transition project and highlights the desire to draw together a number of distinct focus areas within ARL as a result of using the Controlled English language and common models within the CE environment to span the identified projects and new focus areas too.

Figure 2: The overall scope and context for this transition project

This project is funded to run from October 2013 until January 2016 and comprises input from IBM UK, IBM US and Cardiff University.

NS CTA Multi-Genre Network Science Experimentation - CE Integration

To actively support clear synergies between certain ITA research components and the needs of the NS CTA programme, ARL funded a short, focused transition project to apply the ITA Experimentation as a Service (EaaS) framework to, and develop specific Controlled English (CE) models, agents and workflows for, the high-profile NS CTA "Multi-Genre Network Science Experimentation on an Emulated Tactical Network". This work involved close collaboration with multiple parties on the NS CTA programme, and integration of a number of their research components in the context of the ARL-designed overall scenario. The Controlled English language and the CE Store were used in two distinct roles within this work: 1) as an inference engine designed to take into account background intelligence and social network information to infer additional task-relevant information gathering; and 2) as the overall orchestration component, used to execute the numerous experiment iterations, invoke services, consume responses and generate summary output for reporting and analysis purposes. The work was performed in close collaboration with ARL staff and NS CTA researchers in late 2013, with the experiment itself carried out in the NSRL (Network Science Research Laboratory) at the ARL Adelphi location. The ITA/NS CTA collaboration greatly benefitted NS CTA researchers, while also presenting a number of potential opportunities for future work. The results of the experiment were presented to NS CTA senior management and stakeholders during the NS CTA Review Management Board annual meeting, and this work was very well received and praised by both ARL and NS CTA management.


Distributed Composite Network Science Experimentation

The ITA/CTA Experimentation program has the goal of accelerating the pace of composite network science research progress. Some of the key priorities identified are:
• Integrated experimentation that advances deeper collaboration among academic, industry, and ARL researchers and further connects basic research with Army relevance.
• Creating, executing, recording, reproducing, and extending complex distributed network experimentation with research capabilities that can readily be enlisted by Alliance researchers.
• Collaborative experimentation exploiting shared research capabilities within the ITA, CTA, other ARL and CERDEC research efforts, and other network science research programs.

The execution of rigorous research experimentation requires new approaches that will also yield new research infrastructure and practices. This project addresses these fundamental challenges by advancing the design, execution, analysis, and reproduction of experiments that explore cross-network interactions. Specifically, it provides the means to execute joint NS CTA/ITA network science research using a novel joint NS CTA/ITA distributed virtual experimentation facility that provides access to powerful experimentation assets, software services, and research technologies that can be used within the facility by researchers addressing key network science challenges. It includes systems to federate services across multiple administrative domains, as encountered in a coalition context, and systems to provide policy-based control of experimental infrastructure. The experimentation facility will bring together the various technologies being created, and a suite of research tools will rapidly expand the capabilities of the network science researchers.

Policy-controlled DB Federation

In this project we developed a database proxy that provides fine-grained policy controls allowing authorization and filtering of data at the table, row, and cell level, based on arbitrary user profile data. It integrates the Gaian database (an extension of Apache Derby) and WPML to enable policy on database read operations. The proxy is packaged as a configurable VM that can be instantiated for an arbitrary database schema. Policy authoring templates are automatically generated from the database schema and content. The software was delivered to the Army Research Laboratory to provide fine-grained policy controls over access to their Multi-modal Signature Database (MMSDB).


Result sets from MMSDB queries issued in the client portal are filtered by the policy enforcement proxy, with minimal changes to the existing client software and database. Before resulting data is returned to the client, policies are evaluated to determine if the user or role is authorized to access the data. Policies can be authored to filter data at the row, table or column level of a result set. The system utilizes various technologies developed in the International Technology Alliance in Network and Information Science (ITA) for policy-controlled information sharing and dissemination. Use of the Policy Management Library provides a mechanism for the management and evaluation of policies to support finer-grained access to the data in the MMSDB system.
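The filtering behaviour described above can be illustrated with a small sketch. The policy model here is invented for this report; the deployed proxy used WPML policies and the Gaian database rather than this code.

```python
# Conceptual sketch of fine-grained policy filtering of a query result set
# at the table, row, and cell level. Policy model invented for illustration.
REDACTED = "***"

def allowed(policy, role, table, column=None):
    """Column rules override table rules; anything not denied is allowed."""
    rule = policy.get((role, table, column)) or policy.get((role, table, None))
    return rule != "deny"

def filter_results(rows, table, role, policy):
    """Reject denied tables; drop denied rows; redact denied cells."""
    if not allowed(policy, role, table):
        raise PermissionError(f"role {role!r} may not read table {table!r}")
    out = []
    for row in rows:
        if row.get("classification") == "restricted" and role != "analyst-us":
            continue                               # row-level filtering
        out.append({col: (val if allowed(policy, role, table, col) else REDACTED)
                    for col, val in row.items()})  # cell/column-level filtering
    return out

policy = {("guest", "signatures", "location"): "deny"}   # one column rule
rows = [{"id": 1, "location": "38.99,-76.95", "classification": "open"},
        {"id": 2, "location": "51.28,-1.09",  "classification": "restricted"}]
print(filter_results(rows, "signatures", "guest", policy))
# -> [{'id': 1, 'location': '***', 'classification': 'open'}]
```

Because the filtering sits in a proxy between client and database, exactly as described above, the client portal and the MMSDB schema need little or no modification.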

Enabling Unattended Asset Interoperability

The purpose of this work is to show how ITA technologies, such as the Controlled English Store (CE Store), WPML, DSM, and semantic enrichment, can be used to support the definition of mission needs, the matching of assets to fulfill these needs, and the generation of specific sensor/asset configurations that could assist with meeting mission requirements and asset employment, all in the context of Data-to-Decision. Coalition-wide distributed decision making is a key requirement for multi-party operations, be they military or humanitarian. This requires establishing infrastructure in a rapid and timely manner so that intelligence can be gathered in support of command and control (C2) from distributed sources such as command posts, platoons, sensor resources, and so forth. Such infrastructures should also be able to adapt over time to accommodate new sources – especially sensor resources – and decision procedures, so that new capabilities can be deployed, redeployed, and removed seamlessly. This is the focus of the transition work with the US Army Research Laboratory (ARL) and its sister organization, the United States Army Communications-Electronics Research, Development and Engineering Center (CERDEC).

The mobile micro-cloud is proposed as the infrastructure to support dynamic deployment of capabilities to (and from) C2 and edge posts such as forward operating bases, sensor systems, and platoons. The planned capability demonstration will show: how computation can be pushed from C2 to edge nodes so that relevant information is gathered in support of C2's decision-making life cycle; how different analytics can be deployed and activated (or deactivated) at the edge nodes based on the needs of operations; how optimal resource consumption can be achieved by delegating computations to available (or idle) resources; and auto-discovery of micro-cloud nodes, enabling scalability. All of the above capabilities will be achieved while adhering to strict policy controls deployed by the coalition.

SOAR Program

"Data to Decisions" (D2D) is a top priority for the DoD. "Proliferation of sensors and large data sets are overwhelming analysts with data because they lack the tools to efficiently process, store, analyze and retrieve it." The DoD is investing in Data Management, Analytics and User Interface Technologies that will be available for reuse across the various solution components. As a result, ARL has an internal research program to support D2D by leveraging research within the ITA and other related research programs, especially the Network Science CTA. The ARL initiative focuses on aligning the collection, processing, and presentation of data, text and imagery with decision making. The scope is to conduct multi-disciplinary research to address the D2D challenges and provide the ability to access, combine, and present information from widely disparate sources in a manner that enables timely and effective decision making.

The SOAR program recruits students and postdoctoral candidates to work with ITA and ARL researchers to carry out collaborative projects while located on the premises of the US Army Research Laboratory in Adelphi, MD. This supports and contributes to the ARL Open Campus concept. Technical areas that are being addressed include:
• Decision-based Adaptive Collection: development of methodologies to (semi-)automate decision-based collection of relevant data and information from disparate sources to enhance the D2D process;
• Extracting Structure from Text and Metadata Techniques for Video Tagging: development of statistical and linguistic methods to process and extract semantics from potentially high-value text, and to develop metadata representations for automated tagging aligned with tactical decision requirements;
• Trust in Information: development of methodologies to assess the trust in information sources and in the information itself, to raise the confidence of (semi-)automated decision making in coalition networks;
• Policy Assessments: development of a context-based reasoning framework to assess the impact of policy violations within a coalition D2D life cycle.


Commercial Technology Transitions

Gaian Database in IBM SmartCloud Monitoring - Application Insight: Product

The first commercial transition of the Gaian Database has been into the IBM SmartCloud Monitoring - Application Insight product. The following is a short summary of the product capabilities.

Customers deploying applications to "public" clouds fall into at least two categories: enterprise customers using cost-controlled private clouds constructed to emulate public clouds, and smaller customers for whom public clouds are the production environments. In both cases, application teams don't have the ability to deploy management servers and other monitoring components to the cloud infrastructure. Nor are they likely to have the budget or manpower to purchase and manage a typically complex APM solution. Instead, they need a lightweight solution: one that works with multiple provisioning solutions, is relatively simple to use, fast to deploy, and thus also fast to deliver value. Ideally, it would also have the flexibility and range of features required to ensure workloads are hitting target thresholds and, if they're falling short, to identify and address the root problem to mitigate any business impact.

As the name suggests, SmartCloud Monitoring - Application Insight allows application owners to see how their cloud-hosted applications are performing, ensuring that they're getting the performance they expect from the cloud, and that their applications are serving customers and delivering value to the business. Detailed monitoring capabilities are embedded in virtual machine images stored in one of the supported provisioning tools: Amazon EC2, VMware vCenter, and IBM SmartCloud Provisioning. Each time a new VM instance is provisioned from that base image, monitoring starts seamlessly and automatically. The integration with the provisioning engine allows each new VM to be automatically discovered by the fabric node and associated with the correct business application, so existing application dashboards are updated to reflect the new virtual machine in seconds.

Key to getting the intended value from workloads executing in a public cloud is establishing what kind of demand those workloads are facing, how well the cloud is scaling (or not scaling) to meet the demand, and what kind of experience the end users of cloud-based applications are actually getting. Despite its lightweight architecture, SmartCloud Monitoring - Application Insight delivers on all three of those disciplines. To begin with, it tracks both query volumes and user response times, displaying these in intuitive, "clickable" graphs in the application health dashboard. Application performance metrics can be visually correlated with Linux operating system metrics, to help determine if resource constraint issues with the virtual machine itself are actually causing an apparent application problem. An innovative IBM-developed distributed database (the Gaian Database) is used to collect and centralize monitoring data, drawing it from each node for federated analysis. This helps reduce the bulk and complexity of the monitoring infrastructure: a particularly compelling point, since the entire goal is to improve application execution, not to consume cloud resources for the monitoring process that the applications themselves might have needed.
