CAN UNCLASSIFIED

Analysis of Autonomous Cyber Operation Simulation Environments

Delfin Montano
Solana Networks Inc.

Prepared by: Delfin Montano, Solana Networks Inc.
via General Dynamics Mission Systems–Canada, 1941 Robertson Road, Ottawa, Ontario K2H 5B7
PSPC Contract Number: W7714-115274/001/SV
Technical Authority: Adrian Taylor, Defence Scientist
Contractor's date of publication: September 2019

Defence Research and Development Canada
Contract Report DRDC-RDDC-2019-C257
November 2019


IMPORTANT INFORMATIVE STATEMENTS

This document was reviewed for Controlled Goods by Defence Research and Development Canada using the Schedule to the Defence Production Act.

Disclaimer: This document is not published by the Editorial Office of Defence Research and Development Canada, an agency of the Department of National Defence of Canada but is to be catalogued in the Canadian Defence Information System (CANDIS), the national repository for Defence S&T documents. Her Majesty the Queen in Right of Canada (Department of National Defence) makes no representations or warranties, expressed or implied, of any kind whatsoever, and assumes no liability for the accuracy, reliability, completeness, currency or usefulness of any information, product, process or material included in this document. Nothing in this document should be interpreted as an endorsement for the specific use of any tool, technique or process examined in it. Any reliance on, or use of, any information, product, process or material included in this document is at the sole risk of the person so using it or relying on it. Canada does not assume any liability in respect of any damages or losses arising out of or in connection with the use of, or reliance on, any information, product, process or material included in this document.

Template in use: EO Publishing App for CR-EL Eng 2019-01-03-v1.dotm

© Her Majesty the Queen in Right of Canada (Department of National Defence), 2019 © Sa Majesté la Reine en droit du Canada (Ministère de la Défense nationale), 2019

ARMOUR Task 11 Document # CAD00010262

SOLANA NETWORKS ANALYSIS OF AUTONOMOUS CYBER OPERATIONS SIMULATION ENVIRONMENTS

This document was prepared by General Dynamics Mission Systems–Canada under Public Works and Government Services Canada Contract No. W7714-115274/001/SV. Use and dissemination of the information herein shall be in accordance with the Terms and Conditions of Contract No. W7714-115274/001/SV.

Prepared By:

General Dynamics Mission Systems–Canada 1941 Robertson Road Ottawa, Ontario K2H 5B7

Date: 16 September 2019
Reference No.: HCJ

RELEASE RECORD

Approver Role    | Approver Name  | Approval Date
DPME             | Mike Gingell   | 16 September 2019
Project Manager  | Karen Arsenault | 16 September 2019

REVISION STATUS SHEET

Revision             | Date              | Comments
CAD00010262 Baseline | 16 September 2019 | Initial Release

Preliminary Research Investigation Report

Analysis of Autonomous Cyber Operation Simulation Environments

Version 2.1

List of Authors

Delfin Y. Montuno [email protected]

Abstract

Autonomous cyber operations is an important area of research, and so is having a simulation and modeling tool that can optimize, assess, and enable the exploration of strategic autonomous algorithms, especially against adversarial artificial-intelligence agents in networks supporting mission-critical operations. To help inform decisions about procuring or developing cyber simulation environments for the purpose of automated cyber operation experimentation, we investigate the existing literature on cyber simulation theories, methods, tools, and results, and categorize the solutions according to their simulation objectives by analyzing their key characteristics, including problem abstraction issues. Based on the investigation results, we formulate an attributes enumeration template for analyzing simulation environments in generic terms, and illustrate its application to selected solutions. Next, we tailor the attributes enumeration template to solutions that assess security posture with intelligent actors, both benign and adversarial, and illustrate the application of this goal-specific template to selected solutions. Finally, we discuss sample building-block solutions and propose a simple buy-or-build decision-making process for obtaining cyber simulation environments.

Résumé

L’émergence des opérations autonomes en ligne en tant que domaine de recherche important a entraîné le besoin d’outils de simulation et de modélisation qui peuvent optimiser, évaluer et permettre l’exploration d’algorithmes stratégiques autonomes. Ces algorithmes permettraient aux systèmes de protéger les opérations essentielles à la mission contre les agents intelligents artificiels adverses déployés sur le réseau. Pour éclairer les décisions concernant l’acquisition ou le développement d’environnements de cybersimulation à des fins d’expérimentation de cyberopérations automatisées, nous examinons la documentation existante sur les théories, méthodes, outils et résultats des solutions de cybersimulation. Nous classons les travaux existants en fonction de leurs objectifs de simulation en analysant leurs caractéristiques clés, y compris les problèmes d’abstraction. À partir des résultats de l’enquête, nous formulons un modèle d’énumération des attributs pour analyser les environnements de simulation en termes génériques et illustrons l’application du modèle d’attributs à des solutions choisies. Ensuite, nous adaptons le modèle d’énumération des attributs pour analyser les solutions qui traitent de l’évaluation de la posture de sécurité pour les scénarios impliquant des acteurs intelligents, tant bénins que conflictuels. Nous illustrons l’application du gabarit d’énumération des attributs propres aux objectifs à des solutions choisies. Enfin, nous discutons d’exemples de solutions de base et proposons un processus de prise de décision simple d’achat ou de construction pour l’obtention d’environnements de cybersimulation.

Table of Contents

LIST OF TABLES ...... VII
LIST OF FIGURES ...... VIII
1. INTRODUCTION ...... 1
1.1 Background ...... 1
1.2 Objective ...... 2
1.3 Cyber Security Simulation and Modeling ...... 4
1.4 A Characteristics Analysis Framework ...... 5
1.5 Attributes of a Cyber Security Simulation Environment (CSSE) ...... 7
1.5.1 Technical Attributes – Simulation and Modeling Support ...... 7
1.5.2 Quality Attributes ...... 9
2. SUMMARY OF CSSES REVIEW ...... 11
2.1 Attack Related Solutions ...... 11
2.2 Defense Related Solutions ...... 13
2.3 Cyber Solution Enablers ...... 19
2.4 Applications ...... 23
2.5 Others ...... 27
3. APPLICATION OF ATTRIBUTES ENUMERATION TO DIFFERENT PROBLEMS ...... 28
3.1 Sample Attributes Enumeration Results Using a Generic Template ...... 28
3.1.1 Red Teaming Simulation [Applebaum-2016] ...... 28
3.1.2 Simulation for Deception [Durkota-2016] ...... 30
3.1.3 Simulation with Learning [Elderman-2017] ...... 31
3.1.4 Simulation for Assessment [Moskal-2018] ...... 32
3.1.5 Cyber-Physical Simulation [Niu-2015] ...... 33
3.1.6 Gamified Learning Simulation [Gallagher-2017 and Morton-2018] ...... 34
3.1.7 Breach and Attack Simulation [SafeBreach-2017] ...... 35
3.2 CSSE Specific Attributes Enumeration Template for Security Posture Applications ...... 36
3.2.1 Assessing Security Posture Specific Issues ...... 37
3.2.2 Assessing Most Security Posture Issues ...... 39
3.2.2.1 Protection as the Only Key Security Posture Issue ...... 41
3.2.3 Attributes Enumeration Template for Security Posture Assessment CSSE ...... 43
3.3 To Build or Buy a Cyber Range Solution for Security Status Assessment ...... 51
3.3.1 A Decision Process ...... 53
4. SOLUTION OPTIONS AND TRADEOFF DISCUSSION ...... 54
4.1 Sample Potential Building Blocks ...... 54
4.1.1 Cytoscape ...... 54
4.1.2 RANGE ...... 57
4.1.3 DLV Planning System ...... 58
4.1.4 GNS3 (Graphical Network Simulator-3) ...... 62
4.1.5 Cloud Options ...... 63
4.2 Detailed Attribute Analysis of Three Cyber Security Simulation Solutions ...... 64
5. RESEARCH CONCLUSION ...... 69
5.1 Future Work ...... 70
5.1.1 Adapting Recent Enhancements of Reinforcement Learning ...... 70
5.2 Summary ...... 71
6. REFERENCES ...... 73
APPENDIX A: SURVEY OF EXISTING CYBER SIMULATION SOLUTIONS AND ENVIRONMENTS ...... 93
A.1. Adversarial Machine Learning ...... 94
A.1.1 [Vorobeychik-2013] Operational Decisions in Adversarial Environments ...... 94
A.1.2 [Wagner-2017b] Security and Machine Learning ...... 94
A.2. Algorithm Implementation ...... 95
A.2.1 [Durkota-2015a, Durkota-2015b, and Durkota-2016] Finding Approximate Algorithms in Honeypots Allocation ...... 95
A.2.2 [Hamilton-2008] Managing Search Space in Adversary Modelling & Simulation ...... 95
A.2.3 [Moore-2017] Feature Extraction and Feature Selection for Classifying Threats ...... 95
A.2.4 [Yoginath-2015] Addressing Timing Issues in Modeling and Simulating Complex Systems ...... 96
A.3. Attack Planning and Adversary Emulation ...... 97
A.3.1 [Applebaum-2016] Emulating Red Team Behavior ...... 97
A.3.2 [Applebaum-2017] Analyzing Emulated Attacker Behavior ...... 99
A.3.3 [Futoransky-2010a] Building Computer Network Attacks ...... 99
A.3.4 [Futoransky-2010b] Simulating Cyber-Attacks ...... 100
A.3.5 [Grant-2018] Speeding Up Cyber Attack Planning ...... 100
A.3.6 [Holm-2016] Enabling Reproducible Cyber Security Experiment ...... 100
A.3.7 [Miller-2018] Automating Adversary Emulation ...... 100
A.4. Attack Tree/Graph Enhancements ...... 101
A.4.1 [Gadyaskaya-2016] Using Timed Automata in Attack-Defense Trees ...... 101
A.4.2 [Gjøl-2017] Evaluating Timed Automata in Attack-Defense Trees ...... 101
A.4.3 [Wang-2013] Exploring Attack Graph for Security Hardening ...... 101
A.5. Breach and Attack Simulation (BAS) ...... 102
A.5.1 [SafeBreach-2017] A Breach and Attack Simulation Solution ...... 106
A.6. Cloud Security ...... 108
A.6.1 [Zhang-2014] Exploiting and Resolving Cloud Vulnerability ...... 108
A.7. Data Generation ...... 109
A.7.1 [Krieder-2015] Specifying Attack Sequence ...... 109
A.7.2 [Kuhl-2007] Modeling and Simulating Cyber Attack ...... 109
A.8. Defense Assessment ...... 110
A.8.1 [Acosta-2017] Risk Analysis with Execution-Based Model Generation ...... 110
A.8.2 [Baiardi-2016] Assessing and Managing Risk ...... 110
A.8.3 [Bergin-2015] Cyber-Attack and Defense Simulation Framework ...... 111
A.8.4 [Hwang-2015] Operational Exercise Integration Recommendations ...... 111
A.8.5 [Marshall-2012] Distributed Cyber Operations Development ...... 112
A.8.6 [Moskal-2018] Cyber Threat Assessment ...... 112
A.8.7 [Probst-2016] The Attack Navigator ...... 112
A.8.8 [Smith-2017] Digital Warfighting for Systems Engineering ...... 113
A.8.9 [Thompson-2018] Agent-Based Cybersecurity Modeling Framework ...... 113
A.8.10 [Wagner-2017a] Quantifying Mission Impact ...... 113
A.8.11 [Xue-2014] Security Concepts for Autonomous Vehicle Networks ...... 113
A.9. Defense by Confusion, Deception, Evasion, and Concealment ...... 113
A.9.1 [Durkota-2015a, Durkota-2015b, Durkota-2016] Deceiving with Honeypots ...... 114
A.9.2 [Ferguson-2018] Deception for Cyber Defense ...... 114
A.9.3 [Jajodia-2017] Cyber Deception with Probabilistic Logic ...... 114
A.9.4 Moving Target Defense ...... 115
A.9.4.1 [Lei-2018] Incomplete Information Markov Game Theoretic Approach ...... 115
A.9.4.2 [Vorobeychik-2013] Operational Decisions in Adversarial Environments ...... 115
A.9.4.3 [Wheeler-2014] Evaluating Moving Target Network Defense Mechanisms ...... 115
A.9.4.4 [Zhuang-2012] Effectiveness Study of Moving Target Network Defense ...... 115
A.9.4.5 [Zhuang-2015] Analyzing Moving Target Defense Systems ...... 116
A.10. Defense through Reinforcement Learning ...... 117
A.10.1 [Ben-Asher-2015] Understanding Challenges of Cyber War ...... 117
A.10.2 [Chung-2016] Game Theory with Learning for Monitoring ...... 118
A.10.3 [Elderman-2017] Learning in Cyber Security Simulation ...... 119
A.10.4 [Niu-2015] Defense and Control in Cyber-Physical Systems ...... 119
A.11. Education, Training, and Development ...... 121
A.11.1 Cyber Ranges [Franklin-2018, Zoto-2018, and Vykopal-2017] ...... 121
A.11.2 Gamified Learning Environments [Gallagher-2017] ...... 122
A.11.2.1 ESCALATE ...... 123
A.11.2.2 Circadence Project Ares ...... 123
A.11.2.3 Capture The Packet Training Tool ...... 125
A.11.2.4 Cyber Attack Academy ...... 125
A.11.3 [Fite-2014] Simulating Cyber Operations ...... 126
A.11.4 [Marshall-2014] Cyber Warfare Training Prototype Development ...... 126
A.11.5 [Sommestad-2015b] Experimenting Operational Cyber Security in CRATE ...... 126
A.11.6 [Stytz-2010] Addressing Simulation Cyber Warfare Technologies Issues ...... 127
A.11.7 [Weiss-2017] Cybersecurity Education and Assessment in EDURange ...... 127
A.12. Human Factors ...... 128
A.12.1 [Buchler-2018] Human Dynamics in Cybersecurity Performance ...... 128
A.12.2 [Rosque-2019] Cognitive Complexity in Cyber Range Event ...... 128
A.12.3 [Veskler-2018] Human Cognition in Cybersecurity Operations ...... 128
A.13. Intrusion Detection, Response, and Monitoring ...... 129
A.13.1 [Jajodia-2016] Temporal Probabilistic Logic for Security Events Monitoring ...... 129
A.13.2 [Krall-2016] Rare-Event Simulation for Infiltration Monitoring ...... 129
A.13.3 [Ramaki-2015] Algorithm for Multi-Step Attack Scenario Detection ...... 129
A.13.4 [Rege-2017] Temporal Assessment of Cyber Intrusion Chains ...... 130
A.13.5 [Shameli-2012] Survey and Taxonomy of Intrusion Response Systems ...... 130
A.13.6 [Shameli-2014] Cyber-Attack Response System Using Risk Impact Tolerance ...... 130
A.13.7 [Zonouz-2014] Game Theoretic Intrusion Response and Recovery Engine ...... 130
A.14. MDP-based and POMDP-based Approaches ...... 132
A.14.1 [Hoffmann-2015] Accurate Modeling of Real Attackers ...... 132
A.14.2 [Lisý-2016] Counterfactual Regret Minimization in Security Games ...... 133
A.14.3 [Sarraute-2012] POMDPs Model Hackers More Accurately ...... 133
A.14.4 [Sarraute-2013] Suitability of POMDP Model for Penetration Testing ...... 133
A.14.5 [Shmaryahu-2016] Constructing Plan Trees for Penetration Testing ...... 134
A.14.6 [Wang-2016] Markov Game Model-Based Analysis of Network Security ...... 134
A.15. Payoffs Quantification ...... 135
A.15.1 [Chatterjee-2015] Quantifying Uncertainties in Cyber Attacker Payoffs ...... 135
A.15.2 [Chatterjee-2016] Propagating Uncertainties in Cyber Attacker Payoffs ...... 135
A.15.3 [Rass-2017] Quantifying Losses and Payoffs in Advanced Persistent Threats ...... 135
A.16. Securing Defense ...... 137
A.16.1 Defense Optimization ...... 137
A.16.1.1 [Hota-2016] Addressing Security Investments with Interdependent Assets ...... 137
A.16.1.2 [Nguyen-2017] Multi-Stage Attack Graph Security Games ...... 137
A.16.1.3 [Yang-2017] Optimization Model of Network Attack-Defense Game ...... 138
A.16.2 Interdiction Techniques ...... 138
A.16.2.1 [Letchford-2013] Optimal Interdiction of Attack Plans ...... 138
A.16.2.2 [Nandi-2016] Protecting via Interdicting Attack Graphs ...... 138
A.17. Simulation Systems ...... 139
A.17.1 [Kavak-2016] Characterization of Cybersecurity Simulation Scenarios ...... 139
A.17.2 [Leblanc-2011] Overview of Cyber Attacks and Networks Operations Simulation ...... 140
A.17.3 [Moskal-2013] Simulating Attack Behaviors ...... 141
A.17.4 [Moskal-2014] Fusing Context Models in Network Attack Simulation ...... 141
A.17.5 [Noel-2015a] Analyzing Mission Impacts of Cyber Actions ...... 141
A.18. Software Vulnerability ...... 143
A.18.1 [Song-2015] Automating Software Vulnerability Analysis ...... 143
A.19. Validating Changes, Mitigation Strategies ...... 144
A.19.1 [Atighetchi-2017] An Approach for Security Validation ...... 144
A.19.2 [Backes-2017] Simulated Penetration Testing and Mitigation Analysis ...... 144
A.20. Others ...... 145
A.20.1 [Dowse-2018] Need for Trusted Autonomy in Military Cyber Security ...... 145
A.20.2 [Gundersen-2017] State of the Art of Reproducibility in Artificial Intelligence ...... 145
A.20.3 [Holm-2017] Effectiveness of a Typical Script Kiddie ...... 146
APPENDIX B: PROBLEM ABSTRACTION IN CYBER SIMULATION WITH LEARNING STRATEGIES ...... 147
B.1. [Ben-Asher-2015] Understanding Challenges of Cyber War ...... 149
B.1.1 Problem Definition ...... 149
B.1.2 Results ...... 149
B.1.3 Instance Based Learning Theory (IBLT) ...... 149
B.1.4 Problem Modeling ...... 150
B.1.4.1 Player Definition ...... 150
B.1.4.2 Game Rules ...... 150
B.1.4.3 Learning Strategy ...... 151
B.1.5 Assumptions ...... 151
B.1.6 Summary ...... 152
B.2. [Chung-2016] Game Theory with Learning for Monitoring ...... 153
B.2.1 Problem Definition ...... 153
B.2.2 Results ...... 153
B.2.3 Problem Modeling ...... 153
B.2.3.1 Game States, Actions, Transitions, Rewards, and Rules ...... 153
B.2.4 Learning Models ...... 154
B.2.4.1 Markov Game ...... 154
B.2.4.2 Minimax Q-Learning ...... 155
B.2.4.3 Naïve Q-Learning ...... 155
B.2.5 Assumptions ...... 155
B.2.6 Summary ...... 155
B.3. [Elderman-2017] Learning in Cyber Security Simulation ...... 157
B.3.1 Problem Definition ...... 157
B.3.2 Results ...... 157
B.3.3 Problem Modeling ...... 157
B.3.3.1 Network Model ...... 157
B.3.3.2 Player Models ...... 157
B.3.3.3 Game Rules ...... 158
B.3.4 Learning Models ...... 159
B.3.5 Assumptions ...... 161
B.3.6 Summary ...... 161
B.4. [Niu-2015] Defense and Control in Cyber-Physical Systems ...... 162
B.4.1 Problem Definition ...... 162
B.4.2 Results ...... 162
B.4.3 Solution Strategy ...... 162
B.4.3.1 Formulation of the Cyber-Physical System Interaction ...... 162
B.4.3.2 Formulation of the Cyber System Defence ...... 164
B.4.3.3 Formulation of the Physical System Control Dynamics ...... 166
B.4.4 Assumptions ...... 166
B.4.5 Summary ...... 167
APPENDIX C: PROBLEM ABSTRACTION IN CYBER SIMULATION WITH DECEPTION STRATEGIES ...... 168
C.1. [Durkota-2016] Deceiving with Honeypots ...... 168
C.1.1 Problem Definition ...... 168
C.1.2 Results ...... 169
C.1.3 Problem Modeling ...... 169
C.1.3.1 Player Definition ...... 169
C.1.3.2 Computing Optimal Hardening Strategy [Durkota-2015a] ...... 169
C.1.4 Deception Strategy ...... 170
C.1.5 Assumptions ...... 170
C.1.6 Summary ...... 170
APPENDIX D: PROBLEM ABSTRACTION IN CYBER SIMULATION FOR THREAT ASSESSMENT ...... 172
D.1. [Moskal-2018] Cyber Threat Assessment ...... 173
D.1.1 Problem Definition ...... 173
D.1.2 Results ...... 173
D.1.3 Problem Modeling ...... 173
D.1.3.1 Network Model ...... 173
D.1.3.2 Attacker Behavior Model (ABM) ...... 173
D.1.4 Assessment Strategy ...... 174
D.1.5 Assumptions ...... 175
D.1.6 Summary ...... 175


List of Tables

Table 1 Attributes Enumeration Template ...... 28
Table 2 Red Teaming Simulation (CALDERA) [Applebaum-2016] ...... 29
Table 3 Simulation for Deception [Durkota-2016] ...... 30
Table 4 Simulation with Learning [Elderman-2017] ...... 31
Table 5 Simulation for Assessment [Moskal-2018] ...... 32
Table 6 Cyber-Physical Simulation [Niu-2015] ...... 33
Table 7 Gamified Learning Simulation (Project Ares) [Gallagher-2017, Morton-2018] ...... 34
Table 8 Breach and Attack Simulation [SafeBreach-2017] ...... 35
Table 9 Resources to evaluate for security statuses ...... 44
Table 10 Resources Attributes ...... 46
Table 11 Resources Attack Attributes ...... 47
Table 12 Simulation Engine Support Attributes ...... 49
Table 13 Pros and Cons for Building ...... 51
Table 14 Pros and Cons for Buying ...... 52
Table 15 Resources and Their Attack Attributes for CALDERA, CRaaS, and SafeBreach ...... 65
Table 16 Support Attributes of CALDERA, CRaaS, and SafeBreach ...... 67
Table 17 Breach and Attack Simulation Solutions Extracted from [Barros-2018, Harvey-2018b, and Kumar-2018] ...... 102
Table 18 Learning Strategies in Selected Literature ...... 147
Table 19 Pros and Cons of Selected Learning Strategies ...... 148
Table 20 Deception Strategies in Selected Literature ...... 168
Table 21 Pros and Cons of Selected Deception Strategies ...... 168
Table 22 Assessment Strategies in Selected Literature ...... 172
Table 23 Pros and Cons of Selected Assessment Strategies ...... 172


List of Figures

Figure 1 A Cyber Security Simulation Environment (CSSE) ...... 6
Figure 2 CSSE with exposed behavioral aspect ...... 6
Figure 3 CSSE with two users and refined behavioral aspects ...... 7
Figure 4 Flowchart of an Assessment with Learning Simulation Tool ...... 44
Figure 5 Sussman’s world of blocks planning problem example reproduced from Fig. 1 of [Eiter-2000] ...... 60


1. INTRODUCTION

1.1 Background

Autonomous Cyber Operations (ACO) is an important area of research for DRDC Ottawa. There is a near-term requirement to research the use of adversarial artificial intelligence agents in cyber environments, in part by observing how humans approach cyber operations, and in part by exploring how artificial agents can learn to perform cyber operations through experimentation. However, the complexity of the cyber environment makes performing such research challenging. Production computer networks are not well suited to experimentation, particularly where the outcomes can include breaking needed functionality.

Simulated cyber environments range from fully operational networks designed for human-in-the-loop activities to highly abstracted game-like simulations. At one end of the realism range, the engagement of penetration testers in production environments may be viewed as a kind of simulation; while the environment is real, the penetration testers operate under rules of engagement. However, penetration testers are expensive to employ, cannot run faster than real time, and provide only a few examples of attacker activities per engagement. Staged cyber events, such as red vs. blue competitions, provide more flexibility at the expense of realism: network environments are deliberately left vulnerable, resources are more abstracted, and user activities are simulated. However, they are still restricted to running in real time and require significant investments of participants’ time. Further abstractions of the cyber environment provide different trade-offs in capability and accuracy. For example, an attack graph, properly informed with factual and complete data, represents a possibility space of attack vectors to the extent that its descriptive language allows. Divorcing the environment from a real system in this way allows for processing and analysis faster than real time, but an attack graph is static and does not encompass all categories of cyber activities. Highly abstracted games, such as the one described in [Elderman-2017], simplify an attack/defence scenario into a game with simple elements representing abstracted actions and stochastic outcomes. Choosing the appropriate level of abstraction in a simulation thus depends on requirements for ease of use, complexity, cost, and realism.
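To make the highly abstracted end of this range concrete, the following minimal sketch reduces an attack/defence scenario to abstract actions with stochastic outcomes, in the spirit of (but not taken from) the game described in [Elderman-2017]; the asset names, probabilities, and function names are our own illustrative assumptions.

```python
import random

# Illustrative toy game: each round the attacker targets an abstract
# "asset" while the defender hardens one; the attack then succeeds
# stochastically, with a lower success probability on a hardened asset.
ASSETS = ["workstation", "server", "database"]  # assumed toy network

def play_round(rng, base_success=0.5, hardened_penalty=0.3):
    attack = rng.choice(ASSETS)   # abstracted attacker action
    defend = rng.choice(ASSETS)   # abstracted defender action
    p = base_success - (hardened_penalty if attack == defend else 0.0)
    return attack, defend, rng.random() < p  # stochastic outcome

rng = random.Random(42)  # seeded so runs faster than real time, repeatably
wins = sum(play_round(rng)[2] for _ in range(1000))
print(f"attacker success rate over 1000 rounds: {wins / 1000:.2f}")
```

Because the entire environment is a few lines of state and a random draw, thousands of episodes can be generated far faster than real time, which is precisely the trade-off against realism discussed above.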

A simulated cyber environment will be required to enable research in ACO. In support of that goal, the purpose of this task is to investigate and analyze cyber simulation theories, methods, tools, and results in the academic literature, open source communities, and commercial solutions. Since the requirements for a simulated cyber environment are still being developed, this analysis considers a range of possible requirements and how well-suited specific environments would be for different use cases.

General Dynamics Mission Systems–Canada “Use, duplication or disclosure of this document or any of the information contained herein is subject to the restriction on the title page of this document.”

1 28 March 2019

1.2 Objective

This document compiles the information required to make appropriate decisions about procuring or developing cyber simulation environments for the purpose of automated cyber operations experimentation. In doing so, we investigate cyber simulation theories, methods, tools, and results in the academic literature, open source communities, and the commercial space. This report is organized as follows:

• Section 2, Summary of CSSEs Review, summarizes the investigation and analysis of 20 aspects of existing cyber simulation environments (detailed in Appendix A) and the problem abstraction issues of selected environments involving learning (Appendix B), deception (Appendix C), and security assessment (Appendix D). The summary groups all the surveyed literature under five high-level categories: attack-related solutions, defense-related solutions, enablers, applications, and others.
• Section 3, Application of Attributes Enumeration to Different Problems, formulates a generic attributes enumeration template and applies it to seven categories of cyber simulation solutions based on an analysis of the technical and non-technical characteristics of each environment. We then tailor the attributes enumeration template to cyber simulation solutions that focus primarily on security posture assessment, elaborating the template as we apply it in depth to the class of solutions for security posture assessment with learning capabilities. Finally, we discuss the general issues associated with building or buying cyber simulation solutions.
• Section 4, Solution Options and Tradeoff Discussion, describes sample component solutions that could be part of a final cyber simulation solution, and presents the results of applying the detailed attributes enumeration templates formulated in Section 3.2.3 to three cyber security simulation solutions;
• Section 5, Research Conclusion, provides a conclusion with a list of possible future work and a summary;
• Section 6, References, provides a list of references;
• Appendix A: Survey of Existing Cyber Simulation Solutions and Environments, provides a detailed survey of 20 aspects of cyber simulation environments and solutions. The aspects cover adversarial machine learning, algorithm implementation, attack planning and adversary emulation, attack tree and graph enhancements, breach and attack simulation, cloud security, test data generation, cyber security defense with learning, cyber security defense using deception and evasion, security posture assessment, training and education, human factors, intrusion detection with response and monitoring, MDP- and POMDP-based approaches, securing defense, software vulnerability, validating changes and mitigation strategy selection, and other specialized or generic simulation systems.
• Appendix B: Problem Abstraction in Cyber Simulation with Learning Strategies, describes in detail the problem definition, results, learning models, and assumptions of four cyber simulation solutions;
• Appendix C: Problem Abstraction in Cyber Simulation with Deception Strategies, describes in detail the problem definition, results, deception strategy, and assumptions of a cyber simulation solution;


• Appendix D: Problem Abstraction in Cyber Simulation for Threat Assessment, describes the problem definition, results, assessment strategy, and assumptions of a cyber simulation solution.

Note that in the following discussion, we will be using the following terms interchangeably unless indicated otherwise.

• cyber simulation environment (CSE) • cyber security simulation environment (CSSE) • cyber simulation solution (CSS) • cyber security simulation solution (CSSS)

Before surveying cyber security simulation environment (CSSE) related literature and analyzing the attributes of CSSEs, we first describe the concept of cyber security simulation and modeling, a framework for analyzing the characteristics of a CSSE, and the technical and non-technical attributes of a CSSE.


1.3 Cyber Security Simulation and Modeling

Our intended purpose for simulation is to have a means of extracting results in a controlled environment. To simulate, we need one or more models to represent the reality we want to simulate. Each model could be refined or decomposed into sub-components. These models interact with one another within a simulation environment that orchestrates the simulation process to obtain the results we are looking for. Maria [Maria-1997] provides a general introduction to modeling and simulation, in which the author answers the following questions:

“
• What is modeling?
• What is simulation?
• What is simulation modeling and analysis?
• What types of problems are suitable for simulation?
• How to select simulation software?
• What are the benefits and pitfalls in modeling and simulation?
“
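The decomposition described above — models that can be refined into sub-components, orchestrated by an environment that extracts results — can be sketched in code. This is our own illustrative rendering, not an API from [Maria-1997]; all class and method names are assumptions.

```python
class Model:
    """Base class for a model of some piece of reality; subclasses
    refine or decompose the behavior as the scenario requires."""
    def step(self, env):
        pass  # interact with the environment once per simulation tick

class Counter(Model):
    """Trivial stand-in model that just counts the ticks it observes."""
    def __init__(self):
        self.events = 0
    def step(self, env):
        self.events += 1

class SimulationEnvironment:
    """Orchestrates the simulation: advances every model each tick,
    then extracts the results in a controlled way."""
    def __init__(self, models):
        self.models = models
    def run(self, ticks):
        for _ in range(ticks):
            for m in self.models:
                m.step(self)
        return {type(m).__name__: vars(m) for m in self.models}

results = SimulationEnvironment([Counter()]).run(ticks=10)
print(results)  # → {'Counter': {'events': 10}}
```

Replacing `Counter` with attacker, defender, and network models (Section 1.5.1) yields the structure shown in Figure 1.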

In our investigation, we are interested in what a CSSE should be able to support. Ideally, we should be able to investigate almost all cyber security issues using such an environment. Because the scope of cyber security issues is broad, the approaches to cyber security simulation environments are correspondingly varied, as described in Appendix A. To understand these different approaches, we need to categorize the different objectives of such environments. Some of them are as follows:

• Training of personnel o Educational . Conceptual . Decision making o Mission-specific o Operational, including defense and attack • Assessment of security posture o Configuration weaknesses o Denial of Service (DoS) o Distributed DoS (DDoS) o Mission capability o Malware defense o Threat assessment o Breach and attack • Design and Analysis of security solutions o Network segmentation and communication isolation mechanisms o Learning techniques and game theory . Reinforcement learning: Q-Learning, Monte-Carlo learning . Neural Network . Deep Q network o Confusion, deception, evasion, and concealment mechanisms . Allocation of honey pots


. Dynamic network configuration . Moving target defense o Intrusion detection techniques . Placement of sensors . Fusion of sensor data o Performance and analysis of candidate mechanisms o Optimization and improvement of existing solutions • Generation of test data o Attack scenarios o Intrusion data o DoS, DDoS o Network configuration o Vulnerability data

1.4 A Characteristics Analysis Framework

As is evident from the above, each simulation objective can be further refined into more specific areas of investigation. For example, in the design and analysis of security solutions, we could be exploring various learning techniques. In such a case, we would like the environment to easily support the different learning techniques. It should also be noted that the described objectives may be interrelated. For example, we may need to generate various attack scenarios for the learning examples described above.

We describe here a top-down view of a generic cyber simulation environment. The goal is to have a framework for organizing the various characteristics that arise from different simulation objectives. The environment should be able to support all the components and their interacting relationships to achieve its simulation goal.

Figure 1 shows the cyber security simulation environment we will use to discuss what to look for in such an environment. It has four models (users, network, physical system, and manager) and an object (report). The users act on the network to get a network response and sense the network for any changes in its state. The manager sets/defines the network and the physical system. The physical system reports back to the network, and the network reports back to the manager, which generates a report for the person running the simulation. The details of each component are described in Section 1.5.1.
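The interactions just described can be sketched as follows. This is our own illustrative rendering of the Figure 1 components, with all class names, state fields, and actions assumed for the example.

```python
class Network:
    """The network model: users act on it and sense its state."""
    def __init__(self):
        self.state = {"compromised": False}
    def act(self, action):
        # A user acts on the network and receives the network response.
        if action == "exploit":  # assumed toy attacker action
            self.state["compromised"] = True
        return self.state
    def sense(self):
        # A user senses the network for any changes in its state.
        return dict(self.state)

class PhysicalSystem:
    """The physical system model, which reports back to the network."""
    def report(self):
        return {"plant": "nominal"}

class Manager:
    """The manager model: sets/defines the network and physical system,
    and generates the report object for the person running the simulation."""
    def __init__(self):
        self.network = Network()
        self.physical = PhysicalSystem()
    def report(self):
        return {**self.network.sense(), **self.physical.report()}

mgr = Manager()
mgr.network.act("exploit")  # an attacker-type user acting on the network
print(mgr.report())         # → {'compromised': True, 'plant': 'nominal'}
```

The user, network, physical system, and manager roles map one-to-one onto the boxes of Figure 1; the returned dictionary plays the part of the report object.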


Figure 1 A Cyber Security Simulation Environment (CSSE)

Figure 1 shows multiple users interacting with a network. It can be refined to bring out the behavioral aspect of each model as shown in Figure 2.

Figure 2 CSSE with exposed behavioral aspect

Figure 3 CSSE with two users and refined behavioral aspects

Note that some simulation objectives will not require a physical system and associated behavior model if no explicit physical system is being controlled in the simulation. In Figure 3, we show how the behavioral aspect of each model affects its interaction with its neighbor. Although the number of users could be many, only two are shown.

1.5 Attributes of a Cyber Security Simulation Environment (CSSE)

In this section, we describe the technical and non-technical attributes of cyber security simulation solutions.

1.5.1 Technical Attributes – Simulation and Modeling Support

Here, we describe the components of the cyber security simulation environment (CSSE) in more detail. The detailed attributes of each component will depend on the simulation objective and the specifics of the simulation scenario. Nevertheless, each component, if present, will have the following high-level attributes.

• User model, which could represent a human, machine, or software, allows for modeling the following types of users according to their behavior and the simulation scenarios, at the right level of detail:
  o attacker – one who attacks the network
  o defender – one who defends the network
  o normal user – one who uses the services of the network


  o malware – software that is designed to disrupt, damage, or gain unauthorized access to the network
  o bot – an autonomous program that interacts with the network for whatever its programmed purposes
• Network model allows for simulating network operation and usage abstraction according to simulation scenarios (behavior, configurations, …) at the right level of detail as needed, such as the following:
  o packet size, loss, and delay
  o nodes, connectivity, and subnets
  o sensors, firewalls, …
  o timing constraints
  o network characteristics such as wireless or not
  o services
• Manager model helps set up and initialize the network configuration according to the simulation scenarios
• Physical system model simulates a physical system that is under the control of the network
• Report provides the results of the simulation, including the simulation setup configuration so that results can be reproduced

Implicit within the environment are the following supporting attributes:
• Attack mechanism modeling
• Defence mechanism modeling
• Multi-agent modeling
• Collaborative modeling
• Test data generators
• Simulation engine that supports all of the above and controls the timely execution of the models, be it in a discrete or a continuous event system specification formalism
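At the core of these supporting attributes is the simulation engine. As a rough illustration of what a discrete-event engine does, the following minimal sketch (the class and event names are our own, not taken from any surveyed tool) schedules and dispatches timestamped events in order:

```python
import heapq

class CSSEEngine:
    """Minimal discrete-event simulation kernel (illustrative only)."""
    def __init__(self):
        self.clock = 0.0
        self._queue = []   # heap of (time, seq, action)
        self._seq = 0      # tie-breaker so equal-time events stay ordered

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.clock + delay, self._seq, action))
        self._seq += 1

    def run(self, until=float("inf")):
        while self._queue and self._queue[0][0] <= until:
            self.clock, _, action = heapq.heappop(self._queue)
            action(self)   # an event handler may schedule further events

# Example: an attacker probes every 2 time units; a sensor logs each probe.
log = []
def probe(engine):
    log.append(("probe", engine.clock))
    if engine.clock < 6:
        engine.schedule(2.0, probe)

engine = CSSEEngine()
engine.schedule(1.0, probe)
engine.run(until=10.0)
print(log)   # probes at t = 1.0, 3.0, 5.0, 7.0
```

A continuous or hybrid formalism (e.g., DEVS with continuous segments) would replace the fixed event queue with integration steps, but the scheduling and dispatch role of the engine is the same.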


1.5.2 Quality Attributes

Equally important to the technical attributes are the quality attributes of a CSSE. These concern its ease of setup, use, maintenance, verification, and validation, and the ease of understanding the output results. They are usually applicable to all types of CSSEs. We list them below; more can be added as they arise.

• Ease of setup
  o Test data generation
  o Scenario setup, including attack, network usage, network defense
  o Model definition and initialization
• Ease of verification and validation
  o Model consistency check
  o Simulation scenario consistency check
  o Model reflecting accurately the reality it is modeling
• Ease of usage
  o Low complexity, easy to use
  o Highly scalable
  o Intuitive graphical user interface
  o Realistic abstraction at multiple levels
  o Allow for control and programmability of models
  o Allow for interacting with a human interface
  o Allow for iterative enhancement of the models and validation
  o Allow for definition of performance metrics, analysis tools, and data collection
  o Control of the scope of the test/simulation
  o Visualization support
• Output and analysis
  o Useful feedback
  o Allow for visualization of results
  o Customizable detailed reports
  o Provide statistics on performance measures
  o Report and analysis of simulation results
  o Provide meaningful feedback such as remediation actions
  o Accumulation of cyber knowledge
• Ease of maintenance
  o Tool/model updates/enhancements
  o Tool tests
  o Database
  o Model library management
  o Model library coverage
  o Test cases
  o Ease of fine tuning
• Performance
  o Reliable
  o Repeatability
  o Ubiquitous


  o Maturity
  o Model reusability
  o Standard compliance or based on standards
  o Cover a wide range of problem types and scenarios
  o Cover different types of data generation
  o Modular, can be composed, plug and play
  o Handle timing issues correctly and accurately if necessary
  o Traceability of occurred scenarios
  o Documentation
  o Customer support
  o Active user community


2. SUMMARY OF CSSES REVIEW

This section summarizes the cyber simulation solutions surveyed in Appendix A and analyzed in more detail in Appendices B, C, and D. To facilitate the presentation, we use five high-level categories to group the 20 different solution aspects. Thus, some solutions are grouped under attack related solutions; some under defense related solutions; some under cyber solution enablers; some under applications; and the rest under others. For each solution, we describe what problem is being solved, why it is being solved, and what approach is being taken. Note that only some surveyed articles in each aspect are presented here for illustrative purposes and that the grouping is not entirely orthogonal.

2.1 Attack Related Solutions

This section describes solutions that are mainly focused on the attack aspects of cyber security, such as attackers manipulating defenses, attackers planning attacks, encoding attacks for analysis, and simulating breach and attack.

• Attackers manipulating defenses (See Appendix A.1 for more details)
Attackers can manipulate different aspects of a cyber defense, especially if the defense deploys machine learning to predict attacker behavior. The attacker can generate false instances so as to cause the learner to predict wrongly on future relevant instances. [Vorobeychi-2013], described in Appendix A.1.1, uses a linear program to compute an optimal operational policy based on the predictions from a learning algorithm. Instead of filtering the false instances, the authors consider randomized operational decisions to offset the influence of false instances. In their experiment, the authors use the TREC spam corpora from 2005-2008 to perform a 10-fold cross-validation and analysis of their results. To compute optimal randomized operational decisions, they use the output of an arbitrary probabilistic classifier in a linear optimization program together with their machine learning predictions, operational constraints, and adversarial model. Thus, the simulation environment is basically the computation of the linear optimization program and the machine learning program.
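As a greatly simplified illustration of the kind of optimization involved (not the authors' actual formulation, which couples the classifier output with an adversarial model), consider choosing randomized per-message filtering probabilities under an inspection budget. With a single linear constraint, this toy linear program has a greedy fractional-knapsack optimum:

```python
def optimal_filter_policy(spam_probs, costs, budget):
    """Maximize sum(p_i * x_i) s.t. sum(c_i * x_i) <= budget, 0 <= x_i <= 1.
    With one linear constraint the LP optimum is greedy in p_i / c_i order."""
    order = sorted(range(len(spam_probs)),
                   key=lambda i: spam_probs[i] / costs[i], reverse=True)
    x = [0.0] * len(spam_probs)
    remaining = budget
    for i in order:
        take = min(1.0, remaining / costs[i])
        x[i] = take
        remaining -= take * costs[i]
        if remaining <= 0:
            break
    return x

# Three messages: classifier spam probabilities, per-message inspection costs.
policy = optimal_filter_policy([0.9, 0.6, 0.2], [1.0, 1.0, 1.0], budget=1.5)
print(policy)   # [1.0, 0.5, 0.0]: inspect the likeliest spam first
```

The fractional value 0.5 is exactly a randomized operational decision: that message is inspected with probability one half.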

• Attackers planning attacks (See Appendix A.3 for more details)
Attack plans are the outcome of attacker behavior. To generate an attack plan, we need to first understand the attacker's characteristics such as knowledge, capability, and goals. Similarly, generating a defense plan requires understanding the corresponding defender characteristics. [Applebaum-2016] describes how to emulate both attacker and defender behavior and uses MITRE ATT&CK to describe techniques, tactics, and procedures. The ATT&CK model is encoded in the [Eiter-2000] planning language and implemented using the DLV [ ] planning system. They build a red teaming tool called CALDERA composed of two primary systems:
  o Virtual Red Team System (ViRTS) – the software infrastructure for instantiating and emulating a red team adversary
  o Logic for Advanced Virtual Adversaries (LAVA) – the logical model for plan generation and execution

• Encoding attacks for analysis (See Appendix A.4 for more details)


We can use attack trees, attack graphs, and attack-defense trees to analyze attack events and sequences. With the stochastic timed semantics enhancement to these representations suggested in [Gadyaskaya-2016], we can reason not only about the existence of an attack sequence but also about the time bound and probability of success of such an attack sequence. The authors use a Python script to translate the attack-defense tree, the description of the cost of atomic attacker actions, their execution time, and their probability of succeeding into timed automata. This output is then used in UPPAAL [Behrmann-2004] or UPPAAL STRATEGO [David-2015] model checking to answer some of the time reachability questions in the attack-defense tree.
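A much simpler, untimed analysis of the same representations can be sketched as follows: propagating success probabilities through an attack tree's AND/OR structure. This is illustrative only; the node encoding is our own, and the surveyed timed-automata analysis is far richer:

```python
def success_prob(node):
    """Probability that an attack tree node succeeds, assuming independent
    atomic actions (illustrative; the surveyed work uses timed automata)."""
    kind = node[0]
    if kind == "leaf":                 # ("leaf", probability)
        return node[1]
    probs = [success_prob(c) for c in node[2]]
    if kind == "AND":                  # all children must succeed
        p = 1.0
        for q in probs:
            p *= q
        return p
    if kind == "OR":                   # at least one child succeeds
        fail = 1.0
        for q in probs:
            fail *= (1.0 - q)
        return 1.0 - fail
    raise ValueError(kind)

# Breach = (phish OR exploit) AND escalate, with made-up probabilities.
tree = ("AND", "breach", [
    ("OR", "entry", [("leaf", 0.3), ("leaf", 0.5)]),
    ("leaf", 0.8),
])
print(round(success_prob(tree), 3))   # 0.52
```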

• Simulating breach and attack (See Appendix A.5 for more details)
This class of cyber security solutions tries to find vulnerabilities before the adversaries do, on an automated and continual basis. It is like pen testing but without the compromising effect. There are at least 17 vendors of such products. A more detailed description of such a product is given in Section 3.1.7 and Appendix A.5.1.


2.2 Defense Related Solutions

This section describes solutions that are mainly focused on the different defense aspects of cyber security, such as assessing defense mechanisms; defensive techniques by confusion, deception, and concealment; defensive techniques by evasion and moving target defense; defensive techniques with reinforcement learning; securing defense by optimization; securing defense by interdicting attacks; securing defense with intrusion detection, response, and monitoring; securing defense by securing software vulnerabilities; and validating changes and deploying mitigation strategies.

• Assessing defense mechanisms (See Appendix A.8 for more details)
In defense, there are many aspects that could be assessed. Some of these are as follows:

  o Risk [Acosta-2017]1, [Baiardi-2016]2, [Probst-2016]3
  o Operation [Hwang-2015]4, [Marshall-2012]5
  o Threat [Moskal-2018]
  o Mission [Smith-2017]6, [Wagner-2017a]7

1 A number of tools are used here. Some of them are the Common Open Research Emulator (CORE) [Ahrnholz-2008] for designing and running network scenarios using a graphical interface; Multi-Generator (MGEN) [MGEN-2016] for generating traffic; the REPTree algorithm [Hall-2009] for generating classifiers; and MulVAL [Ou-2006] for its underlying reasoning engine, which also allows users to introduce custom interaction rules.
2 The authors describe the capabilities of Haruspex, “a suite of tools to assess and manage the risk posed by an information and communication technology system.” Some of the tool components are a Monte Carlo method to handle intelligent agents implementing chains of attacks to reach their goals; tools to describe agents and the target system with its vulnerabilities; and tools to analyze simulation results to select countermeasures.
3 The authors develop an “attack navigator” under the TREsPASS project. “This navigator makes it possible to say which attack opportunities are possible, which of them are the most urgent, and which countermeasures are most effective.”
4 “This report presents a summary of the state of the art in cyber assessment technologies and methodologies and prescribes an approach to the employment of cyber range operational exercises (OPEXs).”
5 “For our HLA (High Level Architecture) environment, the Modeling Architecture for Technology, Research and Experimentation (MATREX [MATREX]) 1.3 NG Run time Infrastructure (RTI) was used with the Advanced Testing Capability (ATC) [McCray-2008 and MATREX-2012b] or One Semi-Automated Forces (OneSAF) to generate the scenario platforms. Messages to intercept, deny, or delay are generated by the ATC, OneSAF or Battle Command Management Services (BCMS [MATREX-2012a]) federates. The Cyber Operation Service (CYOS) was created to handle the Cyber Ops duties. It is a Java federate that uses ProtoCore [Snively-2006] to connect to the RTI. It determines when cyberattacks can be made and implements some of the attack types that are not currently implemented by other federates. The CYOS was added to the BCMS federates and full documentation will be in the BCMS User's Guide [MATREX-2012a].”
6 The authors describe the development of Early Synthetic Prototyping (ESP), which is “a physics-based game environment to rapidly assess how technologies might be employed on the battlefields.” ESP consists of various components. Some of these are Operation Overmatch, “a small unit first person shooter”; the Office of the Secretary of Defense (OSD) Mission Engineering concept [Gold-2016], which “treats the end-to-end mission as the system in the operational context to drive performance requirements for individual systems”; computer aided engineering (CAE) simulations to analyze technological solutions; an interface to “inform tradespace tools such as Army WSTAT [Edwards-


  o Cyber-physical systems [Bergin-2015]8, [Thompson-2018]9, [Xue-2014]10

Here we expand on threat assessment as described in [Moskal-2018]. The problem concerns the threat of network vulnerability with respect to attackers of different abilities. To conduct the experiment, the authors use a Virtual Terrain (VT) to represent the network, an Attacker Behavior Model (ABM) to represent the attacker, and an Adversary Intent Module (AIM) to represent intent context models in their simulator called CASCADES (Cyber Attack Scenario and Network Defense Simulator). More details of this surveyed work are given in Section 3.1.4, Appendix 7.8.6, and Appendix D.

• Defensive techniques by confusion, deception, and concealment (See Appendix A.9 for more details)
Attackers need a clear and accurate view of your systems to launch effortless and successful attacks. Attackers can be fooled by the presence of decoys [Durkota-2016] (more details in Appendix C and Section 3.1.2), by manipulating true payoffs and controlling attacker perception and response [Ferguson-2018]11, and by controlling the true and false responses to scan requests [Jajodia-2017]. The authors in [Jajodia-2017] “implemented a Java prototype of the proposed framework. Each experiment was run on an 8-core, 8-processor machine with 20GB of RAM. The optimization problems were solved using the IBM CPLEX library, and the Jung library for handling network graphs.” Cauldron software [Jajodia-2011] is used to scan for software on each node of a network and to find vulnerabilities that are consistent with the NVD. The NS2 network simulator [Issariyakul-2008] is used to add network connections.

• Defensive techniques by evasion and moving target defense (See Appendix A.9.4 for more details)
A static target allows for the steady accumulation of attack surface over time. Changing the network configuration over time distorts the intruders' accumulated attack surface [Wheeler-2014]. The author develops a dynamic computer network model called Dynamic Virtual Terrain (DVT) to support the evaluation of Moving Target Network Defense Mechanisms (MT-NDMs) through

2012] and Marine Corps FACT [Browne-2012]; data mining and machine learning solutions; and a scoring mechanism [Ross-2016].
7 “The simulation framework is implemented using NetLogo [Netlogo-2015]. Matlab release 2014b [Matlab-2014] and Python 2.7 are used for data aggregation across simulation runs and the calculation of statistical measures.”
8 “The paper describes a notional framework to support this type of (cyber-attack and defense simulation) modeling, as well as a detailed use case and example cyber-attack simulation system.” The example is about DDoS (Distributed Denial of Service) being injected into the simulator or emulator framework. The framework includes the QualNet network simulator.
9 “We develop and implement a discrete-time agent-based simulation model in Java based on the modeling framework …, which we then use to perform simulation experiments of malware spreading through a mobile tactical network …. All experiments are conducted on an Intel Core i7 processor operating at 2.40 GHz with 16 GB of memory running Windows 10.”
10 Only algebraic, spectral, and graphical characterizations of the security concepts are provided.
11 No implementation detail is provided.


cyber attack simulation. CASCADES (Cyber Attack Scenario And network Defense Simulator) supports the development and implementation of DVT. It is based on many of the same concepts used in the design of MASS (Multistage Attack Scenario Simulator). The author describes the CASCADES architecture, which was under development at the time, but not its implementation.

When the target is moving, the attackers have to ensure that their exploration surface is indeed the attack surface, a concept described in [Lei-2018]. When a target can move or change, we then need to consider how often the change should occur, what kind of change it should be, and which part should be changed. This is also considered in [Lei-2018]. The authors formally define IIMG-MTD12 and design an optimal strategy generation algorithm but do not describe its implementation.

[Zhuang-2015] presents the theory and concept related to moving target defense systems without going into the implementation aspects while [Zhuang-2012] investigates its effectiveness. To do so, the authors in [Zhuang-2012] “devised a high-level simulation that reflects a MTD (Moving Target Defense) system. The simulation testbed was built on an existing network security simulator called NeSSi2 [Schmidt-2010]. NeSSi2 is an open-source, discrete-event based network security simulator with extensive support for constructing complex application-level scenarios based on a simulated TCP/IP protocol stack.”

• Defensive techniques with reinforcement learning (See Appendices 7.10 and 8 for more details)
Knowing what the attackers can do is a good start for devising defensive actions. Learning how an attack and defense sequence may play out helps in taking the right action at the decisive moment. Moreover, when initial knowledge of the situation is not available, being able to learn allows for various attack and defense scenarios to be investigated. There are many forms of learning. [Ben-Asher-2015] studies the decision making processes that drive the dynamics of cyber-war involving multiple cognitive, learning agents making decisions using Instance-Based Learning (IBL) Theory. “The CyberWar game and the cognitive agents were implemented using Netlogo [Wilensky-1999], a multi-agent simulation environment where each agent was an IBL model.” [Ben-Asher-2015]

Learning with limited opponent information appears to be useful according to [Chung-2016]. Although the authors compare the performance of different decision making algorithms by simulation, they provide no details of the simulation environment, but they plan to test and evaluate their work further by embedding it in a security analytics testbed [Cao-2015a] and by using factor graphs [Cao-2015b].

On the other hand, [Elderman-2017] finds Monte Carlo learning with a Softmax exploration strategy to be the most effective at performing both the defensive and attacking roles among the learning strategies they studied. The cyber security simulation is about “two learning agents playing against each other on a cyber network. The simulation is modelled as a Markov game with

12 IIMG-MTD: Incomplete Information Markov Game theoretic approach to defensive strategy generation for Moving Target Defense


incomplete information in which the attacker tries to hack the network and the defender tries to protect it and stop the attacker.” [Elderman-2017] The authors describe the experimental setup but not the simulation environment.
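The Softmax exploration strategy used by such learners can be sketched as follows (a generic illustration of softmax action selection over Q-values, not the authors' code):

```python
import math, random

def softmax_action(q_values, temperature=1.0, rng=random):
    """Pick an action index with probability proportional to exp(Q/T)."""
    # Subtract the max for numerical stability before exponentiating.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(q_values) - 1

random.seed(0)
counts = [0, 0, 0]
for _ in range(10000):
    counts[softmax_action([1.0, 2.0, 0.5], temperature=0.5)] += 1
print(counts)   # the Q=2.0 action dominates, but the others are still explored
```

Lowering the temperature makes the policy greedier; raising it makes exploration more uniform.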

Learning can also be applied in the defense and control of cyber-physical systems [Niu-2015]. The authors evaluate their proposed theory on the yaw-channel control of an unmanned aerial vehicle (UAV) on the physical side. On the cyber side, the UAV is assumed to be controlled by a base station through a wireless network that suffers from cyber-attacks. They model the effects of an attack as delay and packet loss rate of control packets. The authors, however, do not explicitly describe the environment used to run the simulation.

• Securing defense by optimization (See Appendix A.16 for more details)
Often we have limited resources. For scenarios modeled by Bayesian attack graphs in which the countermeasures consist only of disabling nodes, [Nguyen-2017] describes strategies to protect network data from cyber-attacks. The authors use “a simulation-based methodology, called empirical game-theoretic analysis (EGTA) [Wellman-2016], to construct and analyze game models over the heuristic strategies.”

[Hota-2016] finds that it is better to cooperate than to be selfish in a multiple-defenders scenario. Here, each defender manages its own set of assets interdependently with other assets that are managed by others in delivering some overall service. The authors “use the MATLAB tool CVX [Grant-2008] for computing the best response of a defender and the social optimum.”

• Securing defense by interdicting attacks (See Appendix A.16.2 for more details)
Being able to defend proactively is the best security strategy. [Letchford-2013] describes both deterministic and Markov Decision Process (MDP)-based interdiction plans. Two solutions are presented for the former: one based on a mixed-integer program that leverages partial satisfaction planning techniques, and the other based on greedy heuristics. For their experimental evaluation, the authors “used the pathways planning domain from the 2006 international planning competition (IPC)13. … All computational experiments were performed on a 64 bit 2.6.18-164.el5 computer with 96 GB of RAM and two quad-core hyperthreaded Intel Xeon 2.93 GHz processors, unless otherwise specified. We did not make use of any parallel or multi-threading capabilities, restricting a solver to a single thread, when relevant. Integer programs were solved using CPLEX version 12.4.” They also “leverage the extensive heuristic planning work in AI, and use an off-the-shelf state-of-the-art heuristic planner, such as SGPLAN [Chen-2006].”

Allowing the defender to randomize its commitment strategy is left for future work. On the other hand, [Nandi-2016] finds an optimal interdiction plan on an attack graph to minimize the loss due to security breaches. “All the experiments are performed on synthetic attack graphs … All the experiments are carried out on a laptop with an Intel Core i7 2.70 GHz processor and 16 GB RAM. The algorithms are implemented in Python 3.4 with Gurobi [Gurobi-2015] as the solver.” [Nandi-2016]
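A toy version of the interdiction problem can be sketched as follows: pick a budget-limited set of edges to remove from an attack graph so that the attacker's entry point can no longer reach the critical asset. The graph, node names, and brute-force search here are our own illustration; the surveyed works solve much larger instances with integer programming:

```python
from itertools import combinations

def reachable(edges, start):
    """Nodes reachable from start over directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def best_interdiction(edges, entry, asset, budget):
    """Smallest edge set (up to budget) whose removal cuts entry -> asset.
    Brute force for illustration; real solvers use integer programming."""
    for k in range(budget + 1):
        for cut in combinations(edges, k):
            kept = [e for e in edges if e not in cut]
            if asset not in reachable(kept, entry):
                return list(cut)
    return None   # no cut within budget

# Two attack paths: entry -> web -> db and entry -> mail -> db.
edges = [("entry", "web"), ("web", "db"), ("entry", "mail"), ("mail", "db")]
cut = best_interdiction(edges, "entry", "db", budget=2)
print(cut)   # [("entry", "web"), ("entry", "mail")]: both entry edges cut
```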

13 Go to https://ipc2018-classical.bitbucket.io/ for more recent info on IPC.


• Securing defense with intrusion detection, response, and monitoring (See Appendix A.13 for more details)
Knowing when and where an attack will occur with high probability is always an advantage [Jajodia-2016]. The authors “develop a prototype implementation of three algorithms for the defender that optimize the defender's objectives”. “Our code was implemented in Java with IBM ILOG CPLEX 12.6 for solving (integer) linear programs. All the experiments were run on an Intel Xeon CPU clocked at 2.40 GHz, running Linux, with 24 GB RAM.”

Some events are so rare in their occurrence that we need rare-event simulation to study them [Krall-2016]. Here the authors apply “the importance sampling method to estimate the likelihood of a successful network infiltration, given that sufficiently many network alerts have not been generated to achieve said success. … The simulations are performed utilizing Java. Code is used from the Apache Commons Mathematics Library to calculate Z-statistics for constructing the confidence intervals.”
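The idea behind importance sampling can be illustrated on a generic rare-event problem: estimating a small tail probability by sampling from a shifted proposal distribution and reweighting by the density ratio. This sketch is unrelated to the authors' infiltration model:

```python
import math, random

def rare_event_prob(threshold, n_samples, rng):
    """Estimate P(Z > threshold) for standard normal Z by importance
    sampling: draw from N(threshold, 1) and reweight by the density ratio."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)          # proposal centred on the event
        if x > threshold:
            # weight = phi(x) / phi(x - threshold) for unit-variance normals
            weight = math.exp(-x * x / 2 + (x - threshold) ** 2 / 2)
            total += weight
    return total / n_samples

rng = random.Random(42)
est = rare_event_prob(4.0, 200_000, rng)
print(est)   # close to the true tail probability P(Z > 4) ≈ 3.17e-05
```

Naive Monte Carlo with the same sample count would see the event only a handful of times; the shifted proposal makes roughly half the samples informative.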

Predicting and detecting a sequence is more useful than detecting a single isolated event. [Ramaki-2015] proposes a real time episode correlation algorithm for multi-step attack scenario detection. “The proposed correlation framework has been implemented in Java. The application has been tested on a system with 4 GB RAM, an Intel Core2 2.60 GHz CPU, and the Windows 7 operating system.”

Knowing how adversaries carry out and adapt during the course of cyberattacks allows us, the defenders, to respond appropriately [Rege-2017]. The authors use a combination of qualitative observations and quantitative data science to address the problem. A time series representation of the qualitative data is created for temporal quantitative analysis of the cyberattack process. Moreover, data science methods14 such as hierarchical clustering analysis are also applied to the time series.
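Agglomerative hierarchical clustering of the kind applied to such time series can be sketched as follows (single linkage, pure Python, with made-up alert-count series; the authors' actual data and tooling differ):

```python
def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering (illustrative stand-in for the
    hierarchical clustering applied to attack time series)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None   # (distance, i, j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters

# Toy "time series" of per-hour alert counts for five attack sessions.
series = [[1, 2, 1], [2, 2, 1], [9, 8, 9], [8, 9, 9], [1, 1, 2]]
clusters = agglomerative(series, 2)
print(clusters)   # [[0, 1, 4], [2, 3]]: low- vs high-activity sessions
```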

• Securing defense by securing software vulnerabilities (See Appendix A.18 for more details)
Having no software vulnerabilities is certainly more secure. However, if any are present, we need to know at least how to fix them. [Song-2015] describes two aspects of the solution, in the context of the DARPA Cyber Grand Challenge (CGC), consisting of reverse engineering and vulnerability analysis. “The CGC organizers defined a problem space that they believed to be tractable, yet difficult. Specifically, DARPA developed DECREE15 (DARPA Experimental Cyber Research Evaluation Environment), an open source Linux extension designed for controlled cybersecurity experiments. The DECREE OS executes files in a modified Executable and Linking Format and supports only seven system calls: terminate, send, receive, fdwait, rand,

14 Specifically, they are agglomerative hierarchical clustering [Maimon-2005] and a peak-valley detection algorithm [Nopiah-2008 and Schneider-2011].
15 “DARPA has made the DECREE environment and tools available to the public (https://cgc.darpa.mil).” [Song-2015]


allocate, and deallocate. DARPA also developed a tool suite to test and evaluate automated cybersecurity solutions as well as a limited assembly language library to provide a system call API, setjmp and longjmp exception handlers, and floating-point arithmetic APIs.”

• Validating changes and deploying mitigation strategies (See Appendix A.19 for more details)
In this subsection, we consider preventive and mitigating defensive techniques. Having an approach for validating the security properties of prototype systems before deployment is very desirable. [Atighetchi-2017] considers first evaluating the attack surface of a new unit, then emulating the deployment of the unit, and then logically analyzing its combination with other units. The authors in [Atighetchi-2017] use “a tailored approach to security evaluations involving a strategic combination of model-based quantification, emulation, and logical argumentation.” They use “automated system scanning tools to create an ontological model of a specific deployment of the overlay network that was setup in a testbed containing an actual specific virtual network environment involving VMs connected through a range of routable IP addresses. … Unlike Attack Surface Reasoning, which uses a testbed containing actual VMs, we simply represent each component through a Java object16 with local method calls between those objects representing networked interactions between distributed components in the real world. The emulation framework focuses on emulating communication flows in terms of being able to communicate, but it does not currently model performance related aspects of the communication, e.g., latency and throughput.”

Sometimes, we have to choose one out of a number of possible mitigating solutions. [Backes-2017] proposes to use simulated penetration testing to determine which subset of the mitigating strategies to apply. They “implemented the mitigation analysis algorithm on top of the FD planning tool [Helmert-2006]. Our experiments were conducted on a cluster of Intel Xeon E5-2660 machines running at 2.20 GHz. We terminated a run if the Pareto frontier was not found within 30 minutes, or the process required more than 4 GB of memory during execution. … In our evaluation we focus on coverage values, so the number of instances that could be solved within the time (memory) limits. We investigate how coverage is affected by (1) scaling the network size, (2) scaling the number of fix actions, and (3) the mitigation budget, respectively the attacker budget.”

16 Specifically, it is the Plain Old Java Objects (POJOs) in Java.


2.3 Cyber Solution Enablers

This section describes various types of enablers of cyber security simulation solutions. The enablers concern the practicality of solutions, the availability of test data, optimizing human factors, accurately representing the problem space, and quantifying payoffs.

• Finding practical algorithmic solutions (See Appendix A.2 for more details)
A number of cyber security related problems are hard problems. For example, “computing the attacker’s optimal attack plan is NP-hard” [Durkota-2016]. Finding practical approximations, such as pruning and bounding the search space [Durkota-2015b and Durkota-2015a], is needed. The authors in [Durkota-2016] use attack graphs (AGs) to compactly represent possible attack plans (attacker strategies) in their game models. MulVAL [Ou-2006] is used to automatically construct AGs from information collected by network scanning tools (i.e., Nessus or OpenVAS). The action costs and success probabilities can be estimated using the Common Vulnerability Scoring System [Mell-2006] or other data sources. They then model the honeypot allocation problem as an extensive-form game that captures the attacker’s limited information about the network, allowing for deception.

“The problem of computing the optimal attack policy can be represented as a finite horizon Markov Decision Process (MDP). We use domain-specific MDP algorithms to find the optimal strategies [Durkota-2015b].” All the experiments in [Durkota-2015b] “were run in single thread on 8-core Intel i7 3.5GHz processor with 10GB memory limit. Attack graphs for the experiments were generated with MulVAL [Ou-2006] tool, augmented with additional cost and probability information.” Although the authors in [Durkota-2015a] experimentally evaluate and compare their proposed approximate models and the corresponding algorithms, they do not describe the experimental environment used. They do mention that “All experiments were run on a 2-core 2.6 GHz processor with 32GB memory limitation and 2 h of runtime.”
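The finite-horizon MDP formulation can be sketched with backward induction over a toy attack graph. The states, actions, costs, and probabilities below are invented; [Durkota-2015b] solve such MDPs with domain-specific algorithms over MulVAL-generated attack graphs:

```python
# Backward induction on a toy finite-horizon MDP for an optimal attack policy.
# States are attacker footholds; each action has a cost, success probability,
# and successor state.
ACTIONS = {
    "outside": [("exploit_web", 2.0, 0.6, "web"), ("phish_user", 1.0, 0.3, "lan")],
    "web":     [("pivot_lan", 1.5, 0.5, "lan")],
    "lan":     [],                 # goal: attacker has reached the LAN
}
GOAL_REWARD = 10.0

def solve(horizon):
    """V[s] = best expected value with the given number of steps left."""
    V = {s: (GOAL_REWARD if s == "lan" else 0.0) for s in ACTIONS}
    policy = {}
    for _ in range(horizon):
        newV, newPi = {}, {}
        for s, acts in ACTIONS.items():
            best, arg = V[s], None          # stopping keeps the current value
            for name, cost, p, nxt in acts:
                qval = -cost + p * V[nxt] + (1 - p) * V[s]
                if qval > best:
                    best, arg = qval, name
            newV[s], newPi[s] = best, arg
        V, policy = newV, newPi
    return V, policy
```

In this toy, with two steps left the cheap phishing action dominates the costlier web exploit from the outside foothold.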

[Hamilton-2008] uses a probability based search technique to curtail the search-space explosion issues. “The search described above was implemented using a move set developed by the Disruptive Technology Office (formerly ARDA) during the Cyber Strategy and Tactics Workshop. This workshop, held at the Space and Naval Warfare Center (SPAWAR) in San Diego, conducted attacker - defender cyber warfare exercises where the blue teams (defenders) were pitted against red teams (attackers) in a series of three cyber warfare scenarios. The missions and scenarios tested included timing based attacks, which exploited the need for certain cyber resources to be available during particular times.”

Classifying threats requires feature extraction and feature selection; we might need “to remove non-salient features and determine an appropriate level of dimensionality for” differentiating cyberattacks from normal operating conditions [Moore-2017]. There is no specific simulation environment used in [Moore-2017], but a number of tools are used at the different stages of the methodology. For example, “The CDX network traffic data under analysis was collected at AFIT [Mullin-2007] and was captured in the libpcap file format (commonly denoted as pcap), which is the standard format for network capture tools such as TCPDump [TCPDump and Fuentes-2005] and Wireshark [Wireshark]. … Features were extracted from the CDX data using fullstats.v1.0 (Fullstats), a script created by Moore and Zuev [Moore-2005] available at BRASIL Downloads [BRASIL]. … The ANN-SNR feature selection method [Bauer-2000] is a weight-based approach and is an extension of the work of Steppe and Bauer [Steppe-1996] and Belue and Bauer [Belue-1995].”
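As a rough illustration of SNR-style feature selection, the sketch below ranks features by a two-class signal-to-noise ratio. This is a much simpler stand-in for the weight-based ANN-SNR method of [Bauer-2000], which derives feature saliency from trained network weights rather than class statistics:

```python
# Rank features by |mean(attack) - mean(normal)| / (sd(attack) + sd(normal)),
# a simple signal-to-noise score for two-class data.
import statistics

def snr_rank(features, labels):
    """features: name -> list of values; labels: 1 = attack, 0 = normal."""
    scores = {}
    for name, values in features.items():
        a = [v for v, y in zip(values, labels) if y == 1]   # attack class
        n = [v for v, y in zip(values, labels) if y == 0]   # normal class
        spread = statistics.stdev(a) + statistics.stdev(n)
        scores[name] = abs(statistics.mean(a) - statistics.mean(n)) / spread if spread else float("inf")
    return sorted(scores, key=scores.get, reverse=True)
```

Features with low scores are the non-salient ones a practitioner would remove to reduce dimensionality.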

• Generating test data (See Appendix A.7 for more details) In cyber simulation, we will need different kinds of test data. We may need attack sequences to guide an attacker according to the goals of an attack plan [Krieder-2015]. The author defines the Attack Guidance Template (AGT), which is one of the context models defined within XML files used in CASCADES.

If we are simulating intrusion detection [Kuhl-2007] of cyberattack scenarios, we will need scenarios based on [Sudit-2005]. In [Kuhl-2007], the authors develop an object-oriented Java simulation model having various features, one of which is ‘Exporting a modeled network to a “Virtual Terrain” XML file or importing a network into the model from a “Virtual Terrain” XML file (The concept of a “Virtual Terrain” is used to represent networks in the Information Fusion realm).’

• Considering human factors (See Appendix A.12 for more details) Human factors could be a liability if not properly incorporated in any cyber security solution. Human dynamics affect the performance of cyber security operations, and teams have to be assessed on their ability to maintain services, respond to incidents, and accomplish time-constrained tasks [Buchler-2018]. To collect the needed data, the authors “participated in data collection at the Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC) to understand the key features of effective team processes defined by outcome measures of scored team success. We collected data from wearable social sensors to assess face-to-face interactions and using a 16-point teamwork instrument called OAT (Observational Assessment of Teamwork) to assess teamwork and leadership behaviors in cyber defense.” To assess, the authors use various tools such as “Exploratory factor analysis (EFA) with principal component analysis was used for factor extraction. Principal component analysis summarizes the relationships among the original variables in terms of a smaller set of dimensions/factors. … We tested the adequacy of the factor analysis for our 15 variable data set using the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA) and the Bartlett’s test of Sphericity. … The Sociometric Badges © (Humanyze Inc.) provided a social sensing platform to track all the individual and team interaction behaviors for six of the eight teams over the two-day MACCDC. … We conducted a Bayesian statistical analysis to predict MACCDC scored team performance using all of our derived measures as potential variables in a Bayesian Multiple Linear Regression.”
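To make the factor-extraction mechanics concrete, the sketch below computes the leading eigenvalue of a 2×2 correlation matrix, the smallest possible case of the principal component analysis used in [Buchler-2018]. Their analysis covers 15 variables and also tests adequacy with KMO and Bartlett's test, which this two-variable toy omits:

```python
# Two-variable core of principal component analysis: after standardizing,
# the 2x2 correlation matrix [[1, c], [c, 1]] has leading eigenvalue 1 + |c|.
import statistics

def leading_factor(x, y):
    zx = [(v - statistics.mean(x)) / statistics.stdev(x) for v in x]
    zy = [(v - statistics.mean(y)) / statistics.stdev(y) for v in y]
    c = sum(a * b for a, b in zip(zx, zy)) / (len(zx) - 1)   # correlation
    eigenvalue = 1 + abs(c)               # variance along the first component
    explained = eigenvalue / 2            # fraction of total variance (2 vars)
    return eigenvalue, explained
```

When the two variables are perfectly correlated, a single factor explains all the variance, which is the sense in which PCA "summarizes the relationships among the original variables".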

The cognitive complexity of a cyber event operating environment affects cognitive workloads and human functions. It needs to be assessed, as does the assessment method used [Rosque-2019]. In [Rosque-2019], the authors present “a conceptual foundation and empirical framework for modeling C(OEcre), the cognitive complexity of cyber range event operating environments. This includes theory-grounded definitions, an assessment capable of being automated given a cyber range event operating environment description, and a validation method.” However, its implementation is not described.

Similar to [Rosque-2019], [Veskler-2018] deals with the role of human cognitive complexity in cybersecurity operations. The authors in [Veskler-2018] describe three major approaches to modeling a network: simple replication (where an existing network or its relevant parts are copied), pure simulation, and hybrid network emulation. For the latter two approaches, we refer the readers to the numerous examples described in [Veskler-2018].

• Accurately representing the problem space (See Appendix A.14 for more details) Accurate representation of the security problem space determines the meaningfulness of extracted results. Here we consider mainly the uncertainty model of actions and the interaction model of the players. Depending on the assumptions made about the action model, we can use various approaches such as a Markov Decision Process (MDP) or a Partially Observable MDP (POMDP) [Hoffmann-2015]. Similarly, for the interaction model, we can use an explicit network graph, monotonic actions, or general actions [Hoffmann-2015]. Based on the proposed model taxonomy, there are nine classes of planning problems, some of which have a corresponding concrete pentesting-related model.

[Lisý-2016 17 and Wang-2016 18] consider the MDP approach and [Sarraute-2012 19 and Shmaryahu-2016 20] consider the POMDP approach.
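The distinguishing feature of the POMDP-based models is the belief state. The sketch below shows the Bayes update an attacker agent would apply after a noisy scan; the states and observation likelihoods are invented for illustration:

```python
# Bayes belief update that separates POMDP pentesting models from plain MDPs:
# the attacker maintains a distribution over hidden host configurations and
# refines it from noisy scan observations.
def belief_update(belief, obs_likelihood):
    """b'(s) is proportional to P(obs | s) * b(s)."""
    unnorm = {s: obs_likelihood[s] * p for s, p in belief.items()}
    z = sum(unnorm.values())
    if z == 0:
        raise ValueError("observation impossible under current belief")
    return {s: p / z for s, p in unnorm.items()}

belief = {"patched": 0.5, "unpatched": 0.5}        # prior over the target host
# A scan shows a vulnerable banner, far more likely if the host is unpatched:
belief = belief_update(belief, {"patched": 0.1, "unpatched": 0.8})
```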

• Quantifying payoffs (See Appendix A.15 for more details) Usually payoffs are used to weigh decision making. Therefore, being able to quantify payoffs in the presence of uncertainties is vital. [Chatterjee-2015 21] provides a framework to quantify and reason about attacker payoff uncertainties. The uncertainties due to attacker types and their attack plans, target protective conditions, and the state of the system are represented as conditional probabilities.
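A minimal version of weighting payoffs by such conditional probabilities might look as follows; all attacker types, plans, probabilities, and payoff values here are hypothetical, and [Chatterjee-2015] treat much richer mixed uncertainties:

```python
# Expected attacker payoff under uncertainty about attacker type and plan:
# E[payoff] = sum over types t and plans a of P(t) * P(a | t) * payoff(a).
ATTACKER_TYPES = {"criminal": 0.7, "nation_state": 0.3}          # P(type)
PLAN_GIVEN_TYPE = {                                              # P(plan | type)
    "criminal":     {"ransomware": 0.8, "espionage": 0.2},
    "nation_state": {"ransomware": 0.1, "espionage": 0.9},
}
PAYOFF = {"ransomware": 5.0, "espionage": 12.0}                  # attacker payoff

def expected_attacker_payoff():
    return sum(
        pt * pp * PAYOFF[plan]
        for t, pt in ATTACKER_TYPES.items()
        for plan, pp in PLAN_GIVEN_TYPE[t].items()
    )
```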

On the other hand, [Rass-2017] suggests considering outcomes as ordinal “… (as in conventional game theory)”. All theory sketched in [Rass-2017] “has been implemented in R (including full-fledged support of multi-criteria game theory based on distributions), to validate the method and to compute the results for the examples.” The authors also “strongly rely on TVA 22 [Jajodia-2005] and human expertise in our model building, which we believe to be a viable approximation of how security risk management works in practice.”

17 The authors implemented CFR+, a variant of the counterfactual regret minimization algorithm, within some publicly available code base.
18 The authors formulate the equilibrium strategy as a nonlinear problem and implemented their solution “in MATLAB on a computer equipped with a 600-MHZ Pentium III and 128 MB of RAM.”
19 “The experiments are run on a machine with an Intel Core2 Duo CPU at 2.2 GHz and 3 GB of RAM. The 4AL algorithm is implemented in Python. To solve and evaluate the POMDPs generated by Level 4, we use the APPL toolkit.” Their algorithm is called 4AL because it “addresses network attack at 4 different levels of abstraction.”
20 Empirical study of their solution is left to future work.
21 Their study explores the mathematical treatment of mixed payoff uncertainties. Application of their approach will be the focus of their future work.
22 Topological Vulnerability Analysis


2.4 Applications This section describes the applications of cyber security simulation solutions, such as in cloud infrastructure and in educating, training, and developing personnel.

• Considering cloud infrastructure (See Appendix A.6 for more details) When applying cybersecurity solutions, the context might dictate an unanticipated response. [Zhang-2014] finds that it is much more cost-effective to exploit prevalent vulnerabilities over a public IaaS (Infrastructure as a Service) cloud than over the traditional in-house computing environment. At the same time, the traditional environment has a much lower cost-loss ratio for the cloud defenders. For both players, taking prompt action can be much more rewarding. The authors conduct their experiments on public images over Amazon EC2 (Elastic Compute Cloud). “Open source organizations like BitNami 23 and Ubuntu, IT companies such as Oracle and Amazon itself, and arbitrary number of individual contributors have published over 6,000 public images. Like potential attackers, we do penetration test over these images by launching corresponding VMs in order to analyze the weakness of running instances in the cloud.” [Zhang-2014]

• Educating, training, and developing personnel (See Appendix A.11 for more details) Cyber security simulation solutions (CSSSs) have many applications. Educating newcomers to the cybersecurity field, improving the performance of cybersecurity personnel, and experimenting with and investigating cybersecurity issues are some of the high-level usages of CSSSs. General purpose solutions such as the cyber ranges described in Appendix A.11.1 and the gamified learning environments described in Appendix A.11.2 are available for the above purposes. There are also special purpose simulation solutions as described in Appendix A.17, some of which are as follows:

• [Marshall-2014] argues for the analysis of cyber warfare effects and [Marshall-2015] describes a prototype. The authors in [Marshall-2015] provide the analysis in their prototype called COBWebS (Cyber Operations Battlefield Web Service). It is “a Service Oriented Architecture (SOA) based software application that simulates the effects of C2 (Command and Control) communications between synthetic entities and the MCS (Mission Command Systems).” “The COBWebS server leverages the MCA-WS (Mission Command Adapter – Web Service) to provide a series of CNA (Computer Network Attack) services that other simulation models can use to trigger the effects of cyber-attacks on mission command devices.” Ultimately, the authors would like to see “COBwebS transitioning to the OneSAF (One Semi-Automated Forces) baseline when it matures.”

• [Sommestad-2015b] has an environment “to test hypothesis and tools related to security assessments and situational awareness in the cyber security domain.”

23 http://bitnami.org/


The author in [Sommestad-2015b] uses CRATE (Cyber Range And Training Environment) developed and operated by the Swedish Defense Research Agency (FOI). “Physically, CRATE consist of approximately 750 severs, a number of switches and various auxiliary equipment (e.g. for remote access). During an experiment these servers are instrumented with between five and twenty VirtualBox machines of various types. Instrumentation is straightforward for any type of operating system and application software that VirtualBox can handle, but most applications of CRATE use a mix of Linux and Windows machines. In addition to the typical enterprise software, simplified versions of social network services, industrial control systems and search engines are available. … Instrumentation is done through a web based graphical user interface. This interface allows the experiment designer to manipulate parameters controlling operating system, application software, firewall configurations, network topology, users, windows domains, and more. After the design stage, as set of scripts will deploy machines that matches the design. While the aim is to have the whole instrumentation process fully automated, the experimenter often need to perform some configurations on machines in order to make them fit for the experiment at hand. For example, an experimenter may want to change a web server’s configuration to make it vulnerable to some attack or plant files in network folders with user credentials. This can be done by creating a new template with these configurations or by interacting with the machine after deployment.”

• [Styz-2010] focuses on analyzing the decision-making results caused by various attack scenarios. The authors in [Styz-2010] use two types of simulators: a cyber simulator and a traditional simulator. The traditional simulator basically simulates the network operations and services that are affected by bandwidth availability, information availability, information uncertainty, information compromise, and loss of communication and services. The cyber simulator simulates the effect of various attack scenarios by controlling how the operations and services are affected without causing any actual damage. Some of the parameters used are “type of cyber event (attack or defense), the probability that the cyber event will be successful, the nominal states of the cyber event, and the simulation time when the cyber event begins. … The recipient systems compute the probability that the cyber attack would penetrate the simulated cyber defenses that protect the traditional simulation host and, if the attack is successful, the initial cyber attack state that the host system should enter and the effect upon the decision-makers’ data.”

• [Weiss-2017] describes EDURange “for teaching security analysis skills to undergraduate students”. EDURange is a public cloud-based framework for cybersecurity exercises. Specifically, the exercises are hosted on Amazon EC2 (Elastic Compute Cloud).

• [Moskal-2013 and Moskal-2014] focus on simulating attack behaviors.


The authors use a simulator called Multistage Attack Scenario Simulator (MASS) to analyze and predict complex network attack strategies. MASS 24 “comprises an attack generation core and four context models: Virtual Terrain v2 (VT.2), Vulnerability Hierarchy (VH), Scenario Guidance Template (SGT), and Attack Behavior Models”.

• [Noel-2015a] analyzes mission impacts of cyber actions (AMICA). “AMICA currently includes libraries for operational (kinetic) missions, computing infrastructure on which missions depend, cyber attacker TTPs, and cyber defensive TTPs.

Behavioral and temporal aspects of the system (workflow, timing constraints, required resources, etc.) are implemented through executable process models and stochastic discrete-event simulation (in iGrafx). Structural and functional aspects (environmental constraints, mission and system dependencies, event flows, etc.) are maintained through MIT Lincoln Laboratory’s Network Environment Oracle (NEO), and MITRE’s Cyber Command System (CyCS) [CyCS] and CyGraph [Noel-2015b]. CyCS contains a directed graph comprising the information and system dependencies of each mission function. NEO contains additional topological and vulnerability information that is not captured in CyCS. CyGraph provides topological and attack graph-focused visualization of the environment and cyber attack progress. The initial state of the structural cyber (attack graph) model is generated from the network topology, firewall rules, and system vulnerabilities via the Government Off-The-Shelf (GOTS) tool TVA (Topological Vulnerability Analysis) [Jajodia-2005 and Jajodia-2010]. In this way, we leverage established tools for dependency knowledge management and automated model building.

To capture workflows, decision points, workloads, resources, and temporal constraints, AMICA employs a technique called Mission-Level Modeling (MLM) [MLM]. MLM leverages BPMN (Business Process Model Notation) to define, refine, and verify operational processes, decisions, and information flows among producer/consumer systems and people. It supports model libraries and parameterization to quickly assemble new prototypes. MLM handles the high degree of concurrency inherent in information-sharing operations, and explores impacts on MOEs/MOPs (Measure of Effectiveness/Performances) through simulation of mission models.

MLM is based on BPMN and discrete-event simulation, implemented in iGrafx (a commercial tool). MLM replaces static tools such as Visio and PowerPoint, providing an executable, visual model to support stakeholder collaboration to develop and validate new concepts. This provides a single model for qualitative and quantitative analysis, and enables rapid prototyping and reuse through a single modeling standard.

24 In [Moskal-2013], MASS is called CyberSim.


The TVA tool [Jajodia-2005 and Jajodia-2010] provides the network topology and vulnerable attack paths through the network. This represents the initial state of the network, before cyberattacks and defenses are simulated. TVA initializes NEO, which maintains dynamic cyber state under simulation and provides choices for next possible cyber states. Similarly, CyCS maintains dynamic simulation state for mission dependencies.”


2.5 Others This section describes various issues related to cybersecurity simulation solutions that do not fit nicely into any of the above sections. More details are available in Appendix A.20. In brief, it considers the challenges of trusted autonomy in military cyber security [Dowse-2018]; the reproducibility of results from artificial intelligence-based solutions [Gundersen-2017]; and the effectiveness of script kiddies in compromising network services [Holm-2017].

The author in [Dowse-2018] only “outlines the principles of cyber security and future challenges” and claims the solutions “most likely will only come with assistance from trusted autonomy.” Thus, there is no mention of the cyber security simulation environment.

The authors in [Gundersen-2017] are interested in quantifying the state of reproducibility of empirical AI research using their proposed reproducibility metrics to measure the different degrees of reproducibility. As such, only a survey of existing literature is needed and no cyber security simulation is used.

The authors in [Holm-2017] use the cyber range CRATE (Cyber Range And Training Environment) to host the experiment. The main tools used are:
• AutoVAS for identifying vulnerabilities in machines
• SVED [Holm-2016] for running exploits against those machines considered by AutoVAS
• CRATEweb for instrumenting target machines

“CRATE was used as the base experimental platform for conducting tests. Approximately 350 virtual machines (VM) were instrumented for the test. None of these 350 systems were created for the experiment presented in this paper. They were created by a plethora of persons for various purposes during the years the cyber range has been operational. Most of them was [sic] created for use in cyber defense exercises. A large number of different operating systems are included among these 350 machines, such as Windows-based (e.g., XP, 7, Server 2003 and Server 2012) and Linux-based (e.g., Ubuntu, Kubuntu, Gentoo, Debian and CentOS). As several of the machines were created several years before the test as potential targets in exercises, many contain a large number of software vulnerabilities, making them suitable for the test at hand. Out of the 350 machines, 329 were vulnerable and 204 contained vulnerabilities with matching exploits in Metasploit.” [Holm-2017]


3. APPLICATION OF ATTRIBUTES ENUMERATION TO DIFFERENT PROBLEMS This section describes the results of applying the process of attributes enumeration to different problems. We first derive a generic attributes enumeration template and then apply it to seven classes of CSSEs. Next, we consider a specific attributes enumeration template for security posture applications. Lastly, we discuss the general issues of building or buying solutions for security status assessment.

3.1 Sample Attributes Enumeration Results Using a Generic Template Based on the characteristics analysis framework and technical attributes list described in Sections 1.4 and 1.5, we derive the attributes enumeration template shown in Table 1. We use it to bring out the attributes of the following cyber simulation solutions [(CALDERA) Applebaum-2016, Durkota-2016, Elderman-2017, Moskal-2018, Niu-2015, (Project Ares) Gallagher-2017, Morton-2018 and SafeBreach-2017]. The results are shown, respectively, in Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, and Table 8.

Table 1 Attributes Enumeration Template

Rows (modeled objects): User, Network, Manager, Physical System.
Columns: Behavior; Action Model (Sense, Act, Respond, Report); Comments.

As shown in Table 1, the attributes enumeration template considers four high-level objects, {User, Network, Manager, Physical System}, that could be modeled. Each one of them can have a different behavior that defines how the simulation process will proceed. These behaviors can materialize themselves in the form of actions that cause changes to the environment. Note that the template does not enumerate all the attributes in all their details. It does, however, provide a means of associating key attributes to key components for which more details can be investigated. For an example of such a detailed discussion, see Section 3.2.

As the subject of cyber security simulation solutions covers a broad spectrum of problems, we expect varied solution types. This becomes obvious as we examine Table 2 to Table 8. First of all, most of them do not have cyber-physical systems. Secondly, although all of them have attacker and defender behaviors, their specific actions differ due to the type of problem being addressed. Thirdly, some (Table 2, Table 4 and Table 7) execute in a more interactive mode than others (Table 3). Fourthly, BAS (Table 8) is usually executed in a production environment, while the others are mostly simulated in a virtual environment. Lastly, some managers, especially those involved in controlling (Table 6), learning (Table 4), BAS (Table 8) and training (Table 2 and Table 6) are more involved than others.
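One way to make the template machine-checkable when comparing many solutions is to encode it as a small data structure. The encoding below is our own illustration, not part of the surveyed reports; the field names simply mirror the template:

```python
# Encode the Table 1 attributes enumeration template as Python dataclasses.
from dataclasses import dataclass, field
from typing import Optional

ACTIONS = ("sense", "act", "respond", "report")

def _empty_actions():
    return dict.fromkeys(ACTIONS)          # each action-model cell starts empty

@dataclass
class ObjectProfile:
    behavior: Optional[str] = None
    actions: dict = field(default_factory=_empty_actions)
    comments: str = ""

@dataclass
class AttributesEnumeration:
    user: ObjectProfile = field(default_factory=ObjectProfile)
    network: ObjectProfile = field(default_factory=ObjectProfile)
    manager: ObjectProfile = field(default_factory=ObjectProfile)
    physical_system: ObjectProfile = field(default_factory=ObjectProfile)

# Start filling in a profile (cf. the passive defender of Table 2):
caldera = AttributesEnumeration()
caldera.user.behavior = "Defender"
caldera.user.comments = "Currently a passive defender"
```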

3.1.1 Red Teaming Simulation [Applebaum-2016]


Here, we enumerate the attributes of CALDERA as described in [Applebaum-2016], a brief description of which is given in Appendix A.3.1. From [Applebaum-2016], we know that CALDERA consists of two primary systems:
“• Virtual Red Team System (ViRTS) – the software infrastructure used to instantiate and emulate a red team adversary.
• Logic for Advanced Virtual Adversaries (LAVA) – the logical model that CALDERA uses to choose which actions to execute.”
A simulated environment is used to test the capabilities of the LAVA reasoning engine. The simulation is built around a high-level representation of the ATT&CK model encoded into the [Eiter-2000] planning language and implemented using the DLV [ ] planning system wrapped inside of scripts.

Table 2 Red Teaming Simulation (CALDERA) [Applebaum-2016]

User1: Attacker. Sense: remotely probe and enumerate hosts. Act: escalate footholds, gather credentials, exfiltrate data. Respond: select which attack action. Report: NA. Comments: also keep results of attack actions and known host vulnerabilities.
User2: Defender. Sense: NA. Act: log in/out of the PC and turn the PC on/off. Respond: NA. Report: NA. Comments: currently, a passive defender.
Network: Node and connectivity model. Sense: NA. Act: NA. Respond: NA. Report: NA. Comments: connectivity is static; node vulnerability is increased or decreased.
Manager: Manage attacker and defender behaviors. Sense: NA. Act: formulate the best attack plans. Respond: execute attack action. Report: report the game result. Comments: set up and orchestrate the game.
Physical System: NA.


3.1.2 Simulation for Deception [Durkota-2016] Here, we enumerate the attributes of the cyber security simulation solution described in [Durkota-2016], the detail of which is given in Appendices 7.2.1, 7.9.1 and 9.

The authors use attack graphs (AGs) to compactly represent possible attack plans (attacker strategies) in their game models. MulVAL [Ou-2006] is used to automatically construct AGs from information collected by network scanning tools (i.e., Nessus or OpenVAS). The action costs and success probabilities can be estimated using the Common Vulnerability Scoring System [Mell-2006] or other data sources. They then model the honeypot allocation problem as an extensive-form game that captures the attacker’s limited information about the network, allowing for deception.

“Computing the defender’s strategy is difficult because: i) the number of possible defender actions grows combinatorially with the number of honeypots and host-types, ii) computing the attacker’s optimal attack plan is NP-hard problem, and iii) computing the defender’s strategy in Stackelberg equilibrium is an NP-hard problem for imperfect-information games. We describe the game model in more detail along with several approximation algorithms in [Durkota- 2015a].” [Durkota-2016]

Table 3 Simulation for Deception [Durkota-2016]

User1: Attacker. Sense: limited knowledge of network. Act: NA. Respond: NA. Report: NA. Comments: not actively involved.
User2: Defender. Sense: attacker knowledge and actual network information. Act: decide location and number of honey pots. Respond: allocate honey pots. Report: NA. Comments: to deceive attacker based on its limited knowledge of the network.
Network: Node and connectivity model in form of attack graph. Sense: NA. Act: NA. Respond: NA. Report: NA. Comments: connectivity is static, except for the addition of honey pots.
Manager: Based on attacker and defender knowledge, help defender decide honey pot allocation. Sense: NA. Act: determine the allocation options of honey pots. Respond: NA. Report: report the choices of honey pot allocation options. Comments: set up and orchestrate the search for honey pot allocation.
Physical System: NA.


3.1.3 Simulation with Learning [Elderman-2017] Here, we enumerate the attributes of the cyber security simulation solution described in [Elderman-2017], the detail of which is given in Appendices 7.10.3 and 8.3.

The cyber security simulation is about “two learning agents playing against each other on a cyber network. The simulation is modelled as a Markov game with incomplete information in which the attacker tries to hack the network and the defender tries to protect it and stop the attacker.” [Elderman-2017] The authors describe the experimental setup but not the simulation environment.
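A minimal sketch of such a Markov game loop is given below. [Elderman-2017] use several nodes with multiple security attributes each and more elaborate learning on both sides, so this one-node, epsilon-greedy toy only shows the overall shape; all numbers and the reward rule are invented:

```python
# Two agents in a toy Markov game over one network node: the attacker learns
# (epsilon-greedy, one-step Q-update) how much to raise its attack value,
# while the defender blindly raises defense each turn.
import random

random.seed(0)
LEVELS = (1, 2, 3)                    # attack increments the attacker can pick
q = {a: 0.0 for a in LEVELS}          # attacker action values (stateless toy)
ALPHA, EPS = 0.1, 0.2

def episode(steps=10, margin=3):
    attack = defense = 0
    for _ in range(steps):
        greedy = max(q, key=q.get)
        a = random.choice(LEVELS) if random.random() < EPS else greedy
        attack += a
        defense += random.choice((0, 1, 2))   # defender raises defense blindly
        reward = 1.0 if attack > defense + margin else 0.0
        q[a] += ALPHA * (reward - q[a])       # one-step update toward reward
        if reward:
            return True                        # attacker breached the node
    return False                               # defender held out

wins = sum(episode() for _ in range(200))
```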

Table 4 Simulation with Learning [Elderman-2017]

User1: Attacker. Sense: NA. Act: increase attack. Respond: select which node to attack. Report: NA. Comments: local to each user.
User2: Defender. Sense: increase detection. Act: increase defense. Respond: select to defend or to detect. Report: NA. Comments: local to each user.
Network: Node and connectivity model. Sense: NA. Act: NA. Respond: NA. Report: NA. Comments: connectivity is static; node vulnerability is increased or decreased.
Manager: Manage attacker and defender behaviors. Sense: check if attacker is detected or not. Act: determine if the game is over or not. Respond: NA. Report: report the game result. Comments: set up and orchestrate the game.
Physical System: NA.


3.1.4 Simulation for Assessment [Moskal-2018] Here, we enumerate the attributes of the cyber security simulation solution described in [Moskal- 2018], the detail of which is given in Appendices 7.8.6 and 10.

The authors “propose a framework to represent cyber attacker behaviors and apply the methodology to an existing cyber attack simulator to measure the effects network configurations and cyber attacker behaviors might have on the overall network security. … To conduct these experiments, the Cyber Attack Scenario and Network Defense Simulator (CASCADES) is used to model and simulate a variety of cyber attack scenarios. The network, the attacker, and the intent context models are represented in CASCADES as the VT (Virtual Terrain), ABM (Attack Behavior Model), and Adversary Intent Module (AIM), respectively. … the CASCADES system, which is written in Java and allows thousands of concurrent simulations of the same or different configurations of the context models. ” [Moskal-2018]

Table 5 Simulation for Assessment [Moskal-2018]

User1: Attacker. Sense: reconnaissance. Act: breach, exfiltration. Respond: NA. Report: NA. Comments: has capability to breach and some knowledge of the network.
User2: Defender. Sense: NA. Act: NA. Respond: NA. Report: NA. Comments: does not actively defend.
Network: Node and connectivity model. Sense: NA. Act: NA. Respond: NA. Report: NA. Comments: connectivity is static.
Manager: Manage attacker behavior. Sense: NA. Act: assess attacker behavior. Respond: NA. Report: assessment of the network. Comments: set up and assess attacker behavior.
Physical System: NA.


3.1.5 Cyber-Physical Simulation [Niu-2015]

Here, we enumerate the attributes of the cyber security simulation solution described in [Niu-2015], the details of which are given in Appendices 7.10.4 and 8.4.

The authors evaluate their proposed theory on the yaw-channel control of an unmanned aerial vehicle (UAV) on the physical side. On the cyber side, the UAV is assumed to be controlled by a base station through a wireless network that suffers from cyber attacks. They model the effects of the attack as the delay and packet loss rate of control packets. The authors, however, do not explicitly describe the environment used to perform the simulation.

Table 6 Cyber-Physical Simulation [Niu-2015]

• User1 (Attacker)
  Sense: NA; Act: Try to overload the wireless system; Respond: Increase attack; Report: NA
  Comments: NA
• User2 (Defender)
  Sense: Sense network load condition; Act: Control physical system according to network load; Respond: Decide response to network load and cause of network load; Report: NA
  Comments: Defend the network to effectively control the physical system
• Network model (Node and wireless connectivity)
  Sense: NA; Act: NA; Respond: NA; Report: NA
  Comments: Wireless connectivity is affected by network load.
• Manager (Manage attacker, network, and physical system behaviors)
  Sense: NA; Act: Coordinate defender actions; Respond: Determine if the physical system is still within control; Report: Report the status of the network and the physical system
  Comments: Set up and orchestrate the control and attack scenario
• Physical System (Control the UAV 25)
  Sense: According to the control signal; Act: According to the control signal and environmental effects; Respond: Stability; Report: Report to controlling network
  Comments: Wirelessly controlled

25 UAV: Unmanned Aerial Vehicle


3.1.6 Gamified Learning Simulation [Gallagher-2017 and Morton-2018]

Here, we enumerate the attributes of the cyber security simulation solution described in [Gallagher-2017 and Morton-2018], a description of which is given in Appendix A.11.2.2.

The authors in [Gallagher-2017] describe “a mission-based cyber training platform that allows both offensive and defensive oriented participants to test their skills in a game-based virtual environment against a live or virtual opponent. The system builds realistic virtual environments to perform the training in an isolated and controlled setting. Dynamic configuration supports unique missions using a combination of real and/or virtual machines, software resources, tools, and network components.”

Table 7 Gamified Learning Simulation (Project Ares) [Gallagher-2017, Morton-2018]

• User1 (Attacker)
  Sense, Act, Respond: Mission based actions; Report: NA
  Comments: Attacker can be a person or an AI opponent.
• User2 (Defender)
  Sense, Act, Respond: Mission based actions; Report: NA
  Comments: Defender can be a person or an AI player.
• Network model (Node and connectivity)
  Sense: NA; Act: NA; Respond: NA; Report: NA
  Comments: Connectivity is static.
• Manager (Manage attacker and defender behaviors)
  Sense: Check learner’s progress; Act: Update scenario knowledge base; Respond: Provide learner advice and assessment; Report: Report the game result
  Comments: Set up, orchestrate the game, score and advise players.
• Physical System
  Sense: NA; Act: NA; Respond: NA; Report: NA


3.1.7 Breach and Attack Simulation [SafeBreach-2017]

Here, we enumerate the attributes of the cyber security simulation solution described in [SafeBreach-2017], the details of which are given in Appendix A.5.1.

“SafeBreach offers a BAS platform that combines a deep playbook of breach methods based on actual attacks, active investigations and unique research with deployed simulators that play the role of a virtual hacker. These simulators are deployed in critical segments of the network, in the cloud and on endpoints for a kill chain operation – infiltration, lateral movement and exfiltration of data. … Three different types of simulators are supported. Most environments can make use of host-based simulators supported on Windows, Mac OS X and Linux operating systems and deployed as lightweight agents on endpoint or server systems. Network simulators are deployed as virtual machines on VMware, Citrix or Hyper-V servers and run network breach methods. A third class is the cloud simulators. They act as infiltration and exfiltration points, located in the enterprise cloud infrastructure and are more focused as they participate in network breach methods only.”26

Table 8 Breach and Attack Simulation [SafeBreach-2017]

• User1 (Attacker)
  Sense, Act, Respond: Simulate across the entire kill chain; Report: NA
  Comments: Thinks and acts like a hacker
• User2 (Defender)
  Sense: NA; Act: Select corrective action to take; Respond: Perform corrective action; Report: NA
  Comments: Learns and acts based on all challenges from attackers
• Network model, in real production (Node and connectivity)
  Sense: NA; Act: Select the network devices and/or connectivity change to make; Respond: Make change; Report: NA
  Comments: Changes to network configuration and connectivity
• Manager (Continuously test the system)
  Sense: NA; Act: Orchestrate and execute breach scenarios; Respond: Determine required corrective actions; Report: Report the simulation result
  Comments: Orchestrate and execute breach scenarios
• Physical System
  Sense: NA; Act: NA; Respond: NA; Report: NA
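The three simulator classes in the quoted description can be summarized as a small capability map. The structure below is a hypothetical sketch based on that description only; it does not reflect any actual SafeBreach configuration format, and the phase assignments are our reading of the quote.

```python
# Hypothetical capability map for the three simulator classes described
# above and the kill-chain phases each participates in; names mirror the
# quoted description, not any real product configuration.
simulator_classes = {
    "host":    {"deployed_on": ["Windows", "Mac OS X", "Linux"],
                "phases": ["infiltration", "lateral-movement", "exfiltration"]},
    "network": {"deployed_on": ["VMware", "Citrix", "Hyper-V"],
                "phases": ["lateral-movement"]},      # network breach methods
    "cloud":   {"deployed_on": ["enterprise cloud"],
                "phases": ["infiltration", "exfiltration"]},
}

def simulators_for(phase):
    """Which simulator classes can play a role in a given kill-chain phase?"""
    return [name for name, spec in simulator_classes.items()
            if phase in spec["phases"]]
```

Such a map makes explicit which segments of the network need which simulator class for a full kill-chain exercise.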

26 https://www.scmagazine.com/home/reviews/safebreach-breach-attack-simulation-platform/


3.2 CSSE Specific Attributes Enumeration Template for Security Posture Applications

In this section, we formulate an attributes enumeration template specifically for the class of CSSEs that focus on assessing security posture, building on the generic template described in Section 3.1. Starting with the definition of security posture, we illustrate various issues involved in doing such an assessment, analyze a few relevant CSSEs in Sections 3.2.1 and 3.2.2, and then formulate the CSSE specific attributes enumeration template in Section 3.2.3. The application of this CSSE specific template to three CSSEs is described in Section 4.2. Finally, we conclude this section with a buy-or-build discussion in Section 3.3.

According to NIST (National Institute of Standards and Technology), security posture is defined as “The security status of an enterprise’s networks, information, and systems based on information assurance resources (e.g., people, hardware, software, ‘and [sic]’ policies) and capabilities in place to manage the defense of the enterprise and to react as the situation changes.” From this definition, we understand that the two key factors contributing to the security posture of networks, information, and systems are the information assurance resources and the capabilities in place to manage their defense and react to change.

To be more specific about security posture, we need to go into detail about what constitutes good security posture. Some of the details are given below. Additional issues may be added as they arise.

• Prioritized assets are protected
• Prioritized services/missions are guaranteed
• Deployed security controls are operational as intended
• Risk level is under control
• Personnel are trained and aware of security issues all the time
• Any resource changes do not adversely affect any of the above

To ease further discussion, we categorize security posture as having to deal with the following issues.

• Protection of assets, services, and missions
• Normal operation of security controls
• Viability of risk control measures
• Security training of personnel
• Correction of adverse change effects

How cyber security simulation can address some or all of the above issues is the subject of investigation in this section. In Section 3.2.1, we describe existing solutions that deal with specific issues of security posture. In Section 3.2.2, we describe the class of breach and attack simulation (BAS) solutions, which cover a broader range of security posture issues. In Section 3.2.2.1, we consider only the issues related to the protection aspect of the security posture.


3.2.1 Assessing Security Posture Specific Issues

As there are many security posture issues, it is naturally more manageable to deal with them one at a time. With the exception of the solutions described in Appendix A.5, Breach and Attack Simulation (BAS) (considered in more detail in Section 3.2.2, Assessing Most Security Posture Issues), most existing solutions behave as described in Appendices 7.8, Defense Assessment, and 7.11, Education, Training, and Development. Here we recapitulate those solutions under the security posture categories given above.

From Appendix A.8, we have the following:

• Protection of assets, services, and missions
  o The solutions in both [Bergin-2015] and [Xue-2014] consider the secure operation of autonomous vehicles. The denial of services is the key issue in [Bergin-2015], while protecting the network dynamics from being discovered is that of [Xue-2014].
  o In [Marshall-2012], the authors propose a capability to assess potential mission command applications. In [Smith-2017], the authors describe a rapid digital assessment framework for measuring “progress towards mission accomplishment through test and evaluation in the mission context.”
  o For the authors in [Moskal-2018], it is about how effectively secured different network configurations are against various attack behaviors.
  o In [Thompson-2018], the authors consider the spread of malware in mobile tactical networks.
• Operation of security controls
  o In [Wagner-2017a], the authors examine and quantify network segmentation and communication isolation as cyber defensive measures.

From Appendix A.11, we have the following:

• Security training of personnel
  o The cyber ranges described in [Zoto-2018, Vykopal-2017, and Weiss-2017] are used for education and training. In [Marshall-2014], the authors investigate how to best incorporate cyber warfare effects in cyber simulation tools for training purposes. In [Stytz-2010], the authors focus on providing decision makers the opportunity to learn from the results of various attack scenarios.
  o ESCALATE, Project Ares, Capture The Packet Training (CPT) tool, and Cyber Attack Academy (CAA) are a few examples of gamified learning environments used for training cybersecurity personnel.

From the above solution samples, we note that most solutions are for very specific purposes and high-level applications, such as the secure operation of autonomous vehicles that might be affected by denial of service attacks or network interference [Bergin-2015 and Xue-2014], mission command applications [Marshall-2012], progress towards mission accomplishment [Smith-2017], and cyber warfare effects [Marshall-2014].


On the other hand, the rest of the solutions deal with the more fundamental aspects of security posture such as network configurations [Moskal-2018], spread of malware [Thompson-2018], network segmentation and communication isolation [Wagner-2017a], security training solutions [Stytz-2010, Vykopal-2017, Weiss-2017, and Zoto-2018], and gamified learning environments.

Nevertheless, some of the cyber ranges are also used for cyber research, development, and testing and digital forensic analysis [Vykopal-2017], in addition to cyber security education and training [Vykopal-2017, Weiss-2017, and Zoto-2018].

Note also that none of the above deal with assessing or improving risk and managing change controls. Thus, it is desirable to have a cyber security simulation solution that covers the fundamentals of network security posture in a broad manner such as the class of breach and attack simulation (BAS) solutions which is described in Appendix A.5 and will be discussed next in more detail from the security posture assessment point of view.


3.2.2 Assessing Most Security Posture Issues27

Table 17 of Appendix A.5 lists 17 breach and attack simulation (BAS) solutions that address security posture issues in various ways. According to the authors of [Barros-2018], “Breach and attack simulation tools help make security postures more consistent and automated.” And they “simulate a broad range of malicious activities (including attacks that would circumvent their current controls), enabling customers to determine the current state of their security posture.”

In this section, we do not describe each and every one of them; rather, we consider the union of their notable features and weaknesses, pointing out only some major differences among them. First of all, BAS solutions are different from so-called penetration testing tools such as Core Impact, Immunity CANVAS, and Rapid7 Metasploit. Nevertheless, some of them (Picus Security and XM Cyber, for example) combine BAS capabilities with penetration and exploitation capabilities, with an option to switch from one mode to the other.

According to the authors in [Barros-2018], the BAS solutions are expected to:

• Ensure the effectiveness of existing security controls
• Obtain an objective and repeatable measure of the network security posture
• Determine the right security control mix versus the realistic risks outstanding
• Determine the exposure to certain threat scenarios

BAS solutions ensure functioning security controls through continuous and consistent simulated tests. They are automated and consider multi-stage attack chains, not just detecting the presence of published and known vulnerabilities as a vulnerability assessment (VA) tool does. As for security controls, they do not just audit for their presence but also test for their normal operation and effectiveness under network traffic and/or process overload conditions. “For example, Verodin and Circumventive highlight that they are deployed to check the effectiveness of security controls and to provide evidence for selecting and deploying security technologies.” [Barros-2018]

BAS solutions assume that some security controls are already present and somehow optimally configured. Based on this assumption, BAS solutions try to validate that the network is actually working correctly as intended. Examining the observed BAS use cases and the emerging core BAS capabilities and functions indicated in [Barros-2018], we can see how well BAS solutions could assess security posture. Note that each use case statement is preceded by the Observed Use Case acronym, “OUC”, followed by a number and a hyphen “-”; each capability and function statement is preceded by the Emerging Core Capability acronym, “ECC”, followed by a number and a hyphen. Listing them under the security posture issue headings specified in the above section, we have the following:

• Protection of assets, services, and missions
  o OUC2-Enhanced penetration testing and red team operation

27 The content of this section is based mainly on the Gartner Report [Barros-2018].


  o ECC1-The ability to test all phases of an attack, from the exploit to post-exploitation, persistence and maintaining access (some BAS tools place much higher importance on post-attack activities)
  o ECC5-An out of band (OOB) attack method library, ideally based on or mapped to MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) and pre-ATT&CK frameworks (library should have individual tests and attack campaigns)
  o ECC6-Content updates for the library to continuously deliver simulation content relevant to the threats of the moment
  o ECC7-A customizable attack/campaign logic that is flexible enough to address present and likely future threats
• Operation of security controls
  o OUC1-Control effectiveness measurement
  o OUC3-Reporting and prioritizing security controls
  o OUC5-Reporting on security controls
  o OUC6-Proof of controls and compliance
  o ECC4-Functions to test both network and endpoint security controls
• Viability of risk control measures
  o ECC8-Some form of severity prioritization for test findings
• Security training of personnel 28
• Correction of adverse change effects
  o OUC4-Security purchase improvement
  o ECC2-The capabilities to test continuously, periodically and on-demand
• Other security posture issues
  o ECC3-The ability to deliver safe tests with no or extremely low chances of interfering with business operations, and no to low user interference when deployed on production assets
  o ECC9-Integration with external tools, such as security operations, analytics and reporting (SOAR) or SIEM, to gather logs and alerts, and test detection
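ECC5 and ECC7 together suggest an attack-method library keyed to kill-chain phases and mapped to ATT&CK. The following is a minimal sketch: the library layout and entry names are hypothetical, while the technique IDs are real ATT&CK identifiers.

```python
# Illustrative sketch of ECC5: an attack-method library whose entries are
# mapped to MITRE ATT&CK technique IDs.  The library layout and entry names
# are hypothetical; the technique IDs are genuine ATT&CK identifiers.
attack_library = {
    "spearphishing-payload": {
        "phase": "infiltration",
        "attack_techniques": ["T1566"],   # Phishing
    },
    "remote-service-hop": {
        "phase": "lateral-movement",
        "attack_techniques": ["T1021"],   # Remote Services
    },
    "dns-tunnel-exfil": {
        "phase": "exfiltration",
        "attack_techniques": ["T1048"],   # Exfiltration Over Alternative Protocol
    },
}

def campaign(phases):
    """Assemble an attack campaign covering the given kill-chain phases (ECC7)."""
    return [name for name, entry in attack_library.items()
            if entry["phase"] in phases]
```

ECC6 (content updates) then amounts to shipping new entries into `attack_library` as the threat landscape changes.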

As is evident from the above, BAS solutions cover a wide scope with respect to assessing security posture, though not all issues equally well. Moreover, the ability of BAS solutions to test continuously, periodically and on demand may be able to catch adverse effects of changes in an organization, policy, standards, technology, personnel, and network. However, BAS solutions do not seem to adequately address testing the viability of risk control measures, such as testing the effectiveness of the incident response process. If BAS solutions are to be considered for security posture, we definitely need to start with a detailed list of what constitutes the assessment of security posture for each of these issues and then evaluate how well the BAS solutions would satisfy our requirements.

Before going on to the next section, we list here the strengths and weaknesses of BAS solutions as described in [Barros-2018]. Note that each strength statement is preceded by the letter “S”

28 Although security training of personnel is not explicitly covered by either the OUCx or ECCx, BAS solutions are helpful in making the defenders better over time.


followed by a number and a hyphen, and each weakness statement is preceded by the letter “W” followed by a number and a hyphen. This is done for ease of future reference.

Strengths of BAS Solutions: “
• S1-It is one of the first repeatable and objective ways to test security as deployed in the real world.
• S2-It can be used for continuous or, at least, frequent testing.
• S3-It can test a wide range of controls and assets.
• S4-It does not require expensive human talent and ‘one-off’ methodologies.
• S5-It can test not just the exploitation, but post exploit attacker actions and even insiders.
• S6-It can simulate particular actors and specific malware that is of interest.
• S7-It helps train the security team/blue team. ”

Weaknesses of BAS Solutions: “
• W1-The nascent market is characterized by a huge difference in technology coverage and test types.
• W2-It lacks the depth (and ‘off the books’ creativity and adaptability) of a real red team test.
• W3-It puts one more set of action items to the security team plate.
• W4-Tools lack robust prioritization of the findings.
• W5-Detection process testing may have many manual steps.
• W6-BAS does not simulate application penetration testing.
• W7-BAS tools do not test recovery and remediation procedures.
• W8-BAS tools that have exploitation and penetration testing functions are a safety risk. ”

3.2.2.1 Protection as the Only Key Security Posture Issue

If protection of assets and services is the main concern of security posture and BAS solutions are potential solutions, then we should be tracking the observed use case OUC2, the emerging core capabilities ECC1, ECC5, ECC6, and ECC7, strengths S3, S6, and S7, and weakness W2 listed in the previous section and repeated below.

• OUC2-Enhanced penetration testing and red team operation
• ECC1-The ability to test all phases of an attack, from the exploit to post-exploitation, persistence and maintaining access (some BAS tools place much higher importance on post-attack activities)
• ECC5-An out of band (OOB) attack method library, ideally based on or mapped to MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) and pre-ATT&CK frameworks (library should have individual tests and attack campaigns)


• ECC6-Content updates for the library to continuously deliver simulation content relevant to the threats of the moment
• ECC7-A customizable attack/campaign logic that is flexible enough to address present and likely future threats
• S3-It can test a wide range of controls and assets.
• S6-It can simulate particular actors and specific malware that is of interest.
• S7-It helps train the security team/blue team.
• W2-It lacks the depth (and "off the books" creativity and adaptability) of a real red team test.

The basis for most of the protection concerns is the attacker characteristics, as indicated in the following:

• [OUC2, S6, S7, W2] with respect to penetration testing and red team operation
• [ECC1, ECC5, ECC6, ECC7] with respect to all phases of an attack method

Based on the above discussion, we consider next a security posture specific CSSE attributes enumeration template.


3.2.3 Attributes Enumeration Template for Security Posture Assessment CSSE

Here we elaborate on the attributes enumeration template described in Section 3.1. Before describing the template in more detail for security posture assessment, we first describe the useful preparation associated with such an assessment and revisit the definition of security posture assessment. Next, we describe a process flow of an assessment with learning capability. For assessment, protecting the network and personnel resources is the main concern. As such, the attributes associated with the resources are listed in Table 10 and their respective attack attributes in Table 11. We also list in Table 12 the attributes of different components needed to support the simulation process.

Before the security posture of a network is assessed, it is prudent to have done the following:

• Defined the security posture objective
• Listed the aspects that must be addressed in assessing security posture
• Done as much as possible of the following to secure the network:
  o patch the network
  o know your system and the risk of unpatched software and misconfigurations
  o have built-in resiliency
  o follow some standard operating procedure for keeping the network secure

Now, repeating the definition of security posture according to NIST, we have it as “The security status of an enterprise’s networks, information, and systems based on information assurance resources (e.g., people, hardware, software, ‘and [sic]’ policies) and capabilities in place to manage the defense of the enterprise and to react as the situation changes.”

From the definition, we note that security posture covers a set of resources and their ability to remain secure, in their current hardened operating condition, against any future change events, to an acceptable risk level. These changes could be normal or adversarial. Normal changes that have unintended side effects have to be detected and resolved. Moreover, all adversarial changes have to be detected and mitigated. However, the resources themselves may not be absolutely secure, and thus there is a need for setting an acceptable risk level. This acceptable risk level is based on the set of hardened resources at some required operating conditions.

Considering the resources and the security statuses against which they are to be evaluated, we list in Table 9 the five key resources that are involved in the security status of the networks, personnel, information, and systems.


Table 9 Resources to evaluate for security statuses

• Assets, services, and mission: securely available
• Security controls: effectively operating
• Incident response measures: viably controlling risk level
• Personnel: security-wise savvy
• Changes in personnel, policy, standards, technology: all the above are still true with no adverse security effects

Assessment of security posture thus means the evaluation or estimation of the nature or quality of the security status. In this investigation, we are concerned with the evaluation of security quality. If we are just concerned with the estimation of security nature, we may end up merely listing the presence of security measures. For example, with respect to assets, services, and mission, we may be content with just their availability, not necessarily including their integrity under some form of attack.

To derive the most benefit from an assessment of an organization and its network, we need to present to the assessment tool an accurate picture of the organization and its network (henceforth called the resources) and an expected set of security objectives for each aspect, according to each resource, of the security posture.

Figure 4 Flowchart of an Assessment with Learning Simulation Tool


The assessment with learning simulation tool, as shown in Figure 4, uses the four inputs (objectives, resources, acceptable risk level, incident response measures) to come up with the security status of the resources. Comparing the expected security objectives to the estimated security status, the tool compiles the delta, which indicates the list of discrepancies and deficiencies found. If this delta is manageable by incident response measures to within a tolerable risk level or is already below the acceptable risk level, then the security posture of the resources (organization and network) is in line with the expected security objectives. Otherwise, we may need to do one of the following:

• Accept the new risk level, as there could be situations wherein no hardening or response measures are available or possible at the moment. In such cases, we also need to note the risk level change for timely future resolution.
• Harden the resources (organization and network) and/or update the incident response measures and repeat the assessment process.
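The decision flow above can be sketched as a loop, under the simplifying assumption that objectives and estimated security status reduce to comparable numeric risk scores per resource; all function and field names here are hypothetical, not part of any cited tool.

```python
# Minimal sketch of the assessment loop of Figure 4, assuming that the
# security status and objectives can be reduced to comparable numeric
# scores per resource.  The simulate() and harden() callables stand in
# for the simulation engine and hardening actions, respectively.
def assess(objectives, resources, acceptable_risk, response_measures,
           simulate, harden, max_iterations=3):
    """Iterate: simulate -> compute delta -> accept, or harden and repeat."""
    for iteration in range(1, max_iterations + 1):
        # Estimated security status of each resource
        status = simulate(resources, response_measures)
        # Delta between expected objectives and estimated status
        delta = {r: objectives[r] - status[r] for r in objectives}
        residual_risk = max(delta.values())
        if residual_risk <= acceptable_risk:
            return {"within_risk": True, "iterations": iteration,
                    "residual_risk": residual_risk}
        # Otherwise harden the resources and reassess
        resources = harden(resources, delta)
    # No further hardening helped: accept (and record) the higher risk level
    return {"within_risk": False, "iterations": max_iterations,
            "residual_risk": residual_risk}
```

The returned record corresponds to the outputs listed below Figure 4: acceptance on the first iteration, acceptance after hardening iterations, or acceptance of a newly found higher risk level.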

The outputs of such an assessment with learning simulation tool are as follows:

• The assessed resource is within the acceptable risk level per the associated set of security objectives after the first assessment iteration.
• The assessed resource is within a newly accepted higher risk level, requiring some changes in the associated set of security objectives after the first iteration.
• The assessed resource is within the acceptable risk level per the associated set of security objectives after two or more assessment iterations, after doing one or more of the following:
  o harden the resource
  o revise the associated incident response measures
  o change the acceptable risk level to the newly found higher risk level, requiring some changes in the associated set of security objectives

Next, we list the attributes associated with the resources, the actors, and the simulation engine components in Table 10, Table 11, and Table 12, respectively. For each resource in Table 10, we consider model attributes to describe its features, operational attributes to describe its normal behavior or expected operating condition, and metrics to evaluate how well it should behave for consideration in the risk level analysis. For each resource in Table 11, we consider sample attacks that could affect its behavior and the associated detailed attributes that may be needed depending on the required level of abstraction. Note that a similar table for the normal usage attributes of the resources, though possible, is not presented here. In Table 12, we present the attributes associated with various supporting components of the simulation engine.
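One way to hold such a resource entry in machine-readable form is a simple record per resource; the keys and the coverage check below are an illustrative sketch, not part of any cited template.

```python
# Hypothetical encoding of one resource entry: model attributes,
# operational attributes, and the metric questions used in the risk-level
# analysis.  All key names are illustrative only.
network_resource = {
    "resource": "Network",
    "model_attributes": ["network graph", "attack graph", "activity thread"],
    "operational_attributes": ["Availability", "Performability", "Connectedness"],
    "metrics": [
        "How many users/tasks can be on-lined at the same time?",
        "How fast should the response time be for each task/user?",
        "How many isolated network components should there be?",
    ],
}

def coverage(resource):
    """Check that there is at least one metric question per operational attribute."""
    return len(resource["metrics"]) >= len(resource["operational_attributes"])
```

A simple check like `coverage()` can flag resources whose operational attributes have no corresponding metric before the assessment is run.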

Note that the contents of the tables below are for illustrative purposes only and thus are not exhaustive. These attributes cover the technical aspect; for the non-technical aspect, refer to Section 1.5.2. A detailed attributes enumeration using the templates described here is illustrated for three CSSEs for assessing security posture in Section 4.2.


Table 10 Resources Attributes

• Network
  Model attributes: network graph, attack graph, activity thread 31
  Operational attributes: Availability, Performability, Connectedness
  Metrics 29,30:
  o How many users/tasks can be on-lined at the same time?
  o How fast should the response time be for each task/user?
  o How many isolated network components should there be?
• Service
  Model attributes: service access profile
  Operational attributes: Availability, Integrity
  Metrics:
  o How many users can use the service at the same time?
  o Are the services performing the tasks they are deployed for?
• Information
  Model attributes: information access control
  Operational attributes: Availability, Integrity, Confidentiality
  Metrics:
  o Is the information available at the current time?
  o Is the information intact at the current time?
  o Has the information been exfiltrated at the current time?
• Mission
  Model attributes: resource requirement profile, execution plan
  Operational attributes: Probability of successful completion
  Metrics:
  o What resources will be needed?
  o Are the resources available?
  o How well could the availability of the needed resources be guaranteed?
  o How long will the mission take?
  o What is the weakest link in the resources?
• Security Controls
  Model attributes: deployment profile
  Operational attributes: Availability, Integrity
  Metrics:
  o Are they operating?
  o Are they located where they are supposed to be?
  o Are they cooperating as expected?
• Incident Response Measure
  Model attributes: response measures plan
  Operational attributes: Effectiveness, Coverage
  Metrics:
  o How effective is the response with respect to time, cost, and effort?
  o How well does the response mitigate different incidents?
  o How many types of incidents are covered?
• Personnel
  Model attributes: personnel profile
  Operational attributes: Level of security knowledge, Level of security awareness, Level of security savviness
  Metrics:
  o How security savvy are the personnel?
• Changes
  Model attributes: change plan 32, attack plan 33, threat intelligence profile 34
  Operational attributes: Consequences on all of the above
  Metrics:
  o How does each change affect any of the above assets?
  o How resilient are the assets to any of the changes?

29 A question related to the objectives could require either a categorical or quantitative answer depending on how the question is asked. We defer to future work the question of what is the right type of question to ask.
30 The objectives are usually set with respect to some form of attack scenarios.
31 Activity threads define the paths an adversary has taken. Attack graphs identify and enumerate paths an adversary could take. [Pendergast-2014]

Table 11 Resources Attack Attributes (columns: Resources; Attack Attributes; Other Detailed Attributes 35)

Network
• Attack attributes: Denial of Service (DOS), Malware, Persistent Threat Attack (PTA)
• Detailed attributes: Network traffic, Network capacity, Wireless environment, Link bandwidth, Network topology, Delay and loss, Packet size

Service
• Attack attributes: DOS, Malware, PTA
• Detailed attributes: Type of services, Protocol, Hardware requirements

Information
• Attack attributes: PTA (data exfiltration), Malware
• Detailed attributes: Location in the network, Access profile, Type of information, Value of information

Mission
• Attack attributes: DOS, Malware, PTA, Electronic jamming
• Detailed attributes: List of required resources, Each resource usage requirement, Context of mission, Execution timing profile

Security Control
• Attack attributes: Disabled, Overloaded, Bypassed
• Detailed attributes: Location in the network, Types of control, Threshold values

Incident Response Measure
• Attack attributes: No proper response measures, Ineffective response measures, Ineffective implementation of response measures, Wrong response measures
• Detailed attributes: List of measures against a list of incident scenarios, Execution plan for each of the measures

Personnel
• Attack attributes: Phishing, Watering hole, USB, Chat room
• Detailed attributes: Security knowledge of personnel, Security training history of personnel, Security incident history of personnel, Security responsibility of personnel, Job history of personnel

Changes
• Attack attributes: Personnel, policy, standards, and technology changes; PTA; Threat intelligence
• Detailed attributes: New hire, job change, retirement, and layoff; New policy and policy revision, such as in the firewall rules, sensor thresholds, and fusion algorithms; New standards and changes in security recommendations; New hardware/software, processes, configurations, and solutions, or changes in existing ones; New deployment and environment; Updates from threat intelligence feeds; Changes in the resources due to PTA; Time-stamped records of changes

32 The change plan may involve the commission or omission of assets after maintenance, repair, replacement, and improvement operations. It may also include changes in personnel, policy, standards, and technology.
33 This kind of change is adversarial and is not the result of a sanctioned operation.
34 Any update in the threat intelligence feeds.
35 The need for most of these attributes depends on the level of abstraction.

Table 12 Simulation Engine Support Attributes (columns: Support Model; Attributes; Detailed Attributes)

Planning
• Attributes: AI decision planning [Applebaum-2016]; Reinforcement learning [Elderman-2017]; Instance-based learning [Ben-Asher-2015]
• Detailed attributes: Controlled depth and breadth search [Applebaum-2016]; Markov Decision Process, Monte-Carlo learning, Q-learning [Elderman-2017]; Collection of instances and similarity matching for decision making [Ben-Asher-2015]

Behavior
• Attributes: Defender behavior; Attacker behavior; Normal user behavior
• Detailed attributes: Kill chain, Capability, Opportunity, Intent, Knowledge [Moskal-2018]; Malware

Execution
• Attributes: Training and learning (Red teaming, cyber range; Gamified learning); Breach and attack; Defense assessment; Incident response; Research and development of algorithms
• Detailed attributes: virtual 36; virtual on-line 37; constructive 38; constructive on-line 39

Data Generation
• Attributes: Training; Mission; Test; Attack/Defence; Network configuration; Network traffic pattern; Mission tasks; Training lessons/scenarios; Security tutorial
• Detailed attributes: PTA-based attack scenarios; DDOS; Intrusion; Virtualized network configuration with and without vulnerabilities

Analytic
• Attributes: Model verification and validation
• Detailed attributes: Update knowledge base; Update algorithm; Suggest remediation measures; Prioritize resource security; Prioritize remediation measures

Report Generation
• Attributes: Result of assessment; Intermediate result of assessment; Customized result
• Detailed attributes: Training assessment report; Attack consequences/scenarios; Resource assessment result; Simulation configuration

Actor Creation
• Attributes: Actor of various behavior, skill, knowledge, and intent; Actor with attacker behavior; Actor with defender behavior; Actor with normal user behavior; Real person as an actor
• Detailed attributes: AI-based actor (green, blue, red, white, yellow) 40

Gaming and Training
• Attributes: Encouragement; Guidance
• Detailed attributes: Badges, rewards, certification; Player assessment; Player guidance; Playback

Simulation Set up/Reset
• Attributes: Assessment; Training; Incident response validation
• Detailed attributes: Easily and rapidly set up/reset simulation sessions for different assessment scenarios

Knowledge Base Support
• Attributes: Knowledge base
• Detailed attributes: Repository of learning experience and generated data for training, test, mission, attack, network, actors, and others

36 A simulation involving real people operating simulated systems [https://en.wikipedia.org/wiki/Live,_virtual,_and_constructive].
37 A virtual simulation done on a live system without affecting the normal functions of the live system.
38 A simulation involving simulated people operating simulated systems [https://en.wikipedia.org/wiki/Live,_virtual,_and_constructive].
39 A constructive simulation done on a live system without affecting the normal functions of the live system.
40 Obtained from https://www.paloaltonetworks.com/solutions/initiatives/cyberrange-under-hood: “Green Team: Creates “legitimate” user and server application traffic. Red Team: Acts as malicious users and malicious or compromised servers. Yellow Team: Simulates innocent users clicking on phishing links or unknowingly installing malicious applications and compromising the network’s security. White Team: Launches cyberattack scenarios, creates traffic and monitors the success or failure of the Blue Team in terms of incident handling and scenario response. Blue Team: Represents the network operations center, security operations center and Cyber Incident Response Team.”

3.3 To Build or Buy a Cyber Range Solution for Security Status Assessment

To make the right decision whether to build or buy a specific solution, we will need to have answers to the following questions:
• What are the requirements?
  o See Section 1.5.1 for examples of technical requirements
  o See Section 1.5.2 for examples of non-technical requirements
• When is it needed?
• What is the budget?

Then the decision is simply choosing the option that gives us a solution satisfying all of the following conditions:
• Meets all the requirements
• Is presently available
• Is within budget

Simple as the decision process may seem, the conditions present some contradictory realities. Meeting all the requirements is easier if we build the solution ourselves; buying can also meet them easily when the solution is for a well-studied problem that has long been solved. However, building a solution will usually take longer than buying an existing one, assuming some existing solution meets the first condition. As for budget, the upfront and long-term costs are usually easier to estimate when buying than when building.

Nevertheless, each option has its own advantages and disadvantages as shown in Table 13 for building and in Table 14 for buying.

Table 13 Pros and Cons for Building

Pros:
• Able to customize the solution to one's needs because of one or more of the following:
  o no solutions exist due to missing must-have features
  o existing solutions do not completely satisfy needs
  o existing solutions are not compatible with one's environment
• Increased flexibility and adaptability of the solution if hired or in-house capability exists
• Own the solution
• Gain competitive advantage if no similar solution exists and is available

Cons:
• Need hired or in-house expertise on the subject matter and the development process
• May take a long while before something useful is available
• Need a track record of building customized solutions
• Harder to estimate total cost and on-going cost
• Need planning and tracking of the project
• Need on-going hired or in-house support for maintenance, training, and future development
• Complexities and hidden costs

Table 14 Pros and Cons for Buying

Pros:
• What is available may cover one's needs completely or mostly
• Can use the available solutions immediately
• Can test drive potential solutions
• Established on-going training, maintenance, development, and support system
• To some extent, the solutions have been “tested”
• Minimal development effort if the solutions satisfy most of the must-have requirements
• Easier to estimate total and on-going cost
• Known considerations and limitations

Cons:
• Need time and effort to determine which one is most suitable
• What is available may not completely meet one's needs
• No guarantee a satisfying solution exists
• Need to ensure the selected one is extensible to cover one's needs
• Need to rely on the vendors to resolve issues
• Need gap analysis to understand what is missing and when it will be available, if at all
• Vendors own the solution
• Depend on the vendors for product functionality, especially for future feature development
• May not have full control of the on-going cost
• May have to make up for the limitations if they are not too restrictive

3.3.1 A Decision Process

Usually, developing from scratch is not necessary, as we can often buy what we need. Unless it is obvious that there is no way around it, we should avoid building as much as possible. To minimize development, we should buy whatever we can 41 and develop only what we need to satisfy our requirements.

Keeping in mind the pros and cons of buying and building described above, we can use a decision process as follows:

1. We buy the best solution that satisfies our requirements, if one exists.
2. Otherwise, we buy the solution that best satisfies most of our needs, if one exists and the requirement gap can reasonably be covered to an acceptable level by one or more of the following:
   a. Reducing or deferring some of the requirements
   b. Developing missing features within the solution by in-house or hired expertise, including the vendors
   c. Developing missing features outside of the solution by in-house or hired expertise, including the vendors
   d. Combining it with one or more of the other existing solutions
3. Otherwise, we build the solution incrementally according to the prioritized requirement list.
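The three-step decision process above can be sketched as a small procedure. This is an illustrative sketch only: the data structures (a requirement set, candidate records with a `features` set and a `gap_coverable` flag) are hypothetical stand-ins for a real evaluation.

```python
# Illustrative sketch of the buy-vs-build decision process.
# All names and data structures are hypothetical, for exposition only.

def choose(requirements, candidates):
    """Return a (decision, solution) pair following steps 1-3."""
    # Step 1: buy a solution that satisfies every requirement, if one exists.
    full = [c for c in candidates if requirements <= c["features"]]
    if full:
        return "buy", max(full, key=lambda c: len(c["features"]))

    # Step 2: buy the solution with the smallest requirement gap, provided
    # the gap can be covered (deferral, in-house or vendor development,
    # or combination with other existing solutions).
    coverable = [c for c in candidates if c.get("gap_coverable", False)]
    if coverable:
        return "buy+extend", min(coverable,
                                 key=lambda c: len(requirements - c["features"]))

    # Step 3: otherwise, build incrementally from the prioritized list.
    return "build", sorted(requirements)

decision, what = choose({"attack-graph", "reporting"},
                        [{"features": {"reporting"}, "gap_coverable": True}])
print(decision)  # buy+extend
```

The ordering matters: a full match short-circuits any gap analysis, mirroring step 1 of the process.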

41 For example, see CyberRange as a Service™ at https://www.tsieda.com/single- post/2018/07/08/CyberRange-as-a-Service. For more information, see [TSI-2018]


4. SOLUTION OPTIONS AND TRADEOFF DISCUSSION

This section describes some potential building blocks for cyber security simulation solutions and lists the attributes of three cyber security simulation solutions from the security posture assessment perspective.

4.1 Sample Potential Building Blocks

In this section, we describe some of the potential building blocks for use in the development of a cyber security simulation tool for security posture assessment. They are:

• Cytoscape, described in more detail in Section 4.1.1, is a generic graph generation, visualization, analysis, and manipulation tool. It may be used to display and manually manipulate an attack graph once it is defined in or loaded into the system;
• RANGE, described in more detail in Section 4.1.2, is implemented on top of Cytoscape as a plug-in and is used to create, generate, display, and manipulate realistic network models;
• DLV^K, described in more detail in Section 4.1.3, is a planning system built on top of the disjunctive logic programming system DLV (DataLog with Disjunction, ∨) using a new logic-based planning language called K. CALDERA encodes a high-level representation of the ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) model into the planning language to generate attack plans;
• GNS3, described in more detail in Section 4.1.4, is a graphical network simulator for building, designing, and testing networks in a virtual environment; and
• Cloud Options, described in more detail in Section 4.1.5, covers the different aspects of security in the SPI (Software, Platform, and Infrastructure) stack.

4.1.1 Cytoscape

On its web page, Cytoscape is described as “an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data. Although Cytoscape was originally designed for biological research, now it is a general platform for complex network analysis and visualization. Cytoscape core distribution provides a basic set of features for data integration, analysis, and visualization. Additional features are available as Apps (formerly called Plugins). Apps are available for network and molecular profiling analyses, new layouts, additional file format support, scripting, and connection with databases.”

Its core data structure is essentially a network graph with nodes and links that can store application-specific attributes. In addition to populating this core data structure with attributes, its software components allow for visualization, selection, and filtering of the stored data, and provide an interface to external methods. These methods are implemented as Apps for application-specific analysis and data manipulation. In visualizing the network graph, we can use a variety of layout algorithms, including cyclic, tree, force-directed, edge-weight, and yFiles Organic 42 layouts.
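To make the attributed-graph data model concrete, the sketch below mimics it in plain Python (Cytoscape itself is a Java platform; the node names, attributes, and mapping functions here are hypothetical illustrations, not Cytoscape's API). It also shows the lookup-table versus interpolation distinction for attribute-to-visual mappings described in [Shannon-2003].

```python
# Hypothetical illustration of an attributed network-graph model of the
# kind Cytoscape's core data structure provides (not Cytoscape's API).
nodes = {
    "host-1": {"node_type": "server", "os": "linux", "vulns": 2},
    "host-2": {"node_type": "workstation", "os": "windows", "vulns": 0},
}
edges = {("host-1", "host-2"): {"bandwidth_mbps": 100, "wireless": False}}

# Attribute-to-visual mapping: a lookup table for the discrete attribute,
# linear interpolation for the continuous one (cf. [Shannon-2003]).
shape_by_type = {"server": "rectangle", "workstation": "ellipse"}

def node_size(vulns, lo=10, hi=50, max_vulns=5):
    """Interpolate a node size from the vulnerability count."""
    return lo + (hi - lo) * min(vulns, max_vulns) / max_vulns

styles = {n: (shape_by_type[a["node_type"]], node_size(a["vulns"]))
          for n, a in nodes.items()}
print(styles["host-1"])  # ('rectangle', 26.0)
```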

42 “The organic layout style is based on the force-directed layout paradigm. When calculating a layout, the nodes are considered to be physical objects with mutually repulsive forces, like, e.g., protons or electrons. The connections between nodes also follow the physical analogy and are considered to be


To ease visual analysis, “an attribute-to-visual mapping allows data attributes to control the appearance of their associated nodes and edges. Cytoscape supports a wide variety of visual properties, such as node color, shape, and size; node border color and thickness; and edge color, thickness, and style; a data attribute is mapped to a visual property using either a lookup table or interpolation, depending on whether the attribute is discrete valued or continuous.” [Shannon-2003]

“To reduce the complexity of a large molecular interaction network, it is necessary to selectively display subsets of nodes and edges. Nodes and edges may be selected according to a wide variety of criteria, including selection by name, by a list of names, or on the basis of an attribute. More complex network selection queries are supported by a filtering toolbox that includes
• a Minimum Neighbors filter, which selects nodes having a minimum number of neighbors within a specified distance in the network;
• a Local Distance filter, which selects nodes within a specified distance of a group of preselected nodes;
• a Differential Expression filter, which selects nodes according to their associated expression data; and
• a Combination filter, which selects nodes by arbitrary and/or combinations of other filters.” [Shannon-2003]
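The first two filters quoted above are simple breadth-first-search computations. As an illustration only (this is not Cytoscape's filter API), they can be sketched over a plain adjacency-list graph:

```python
from collections import deque

# Sketch (not Cytoscape's actual API) of the Minimum Neighbors and
# Local Distance filters, over a plain adjacency-list graph.
def within(graph, start, distance):
    """All nodes reachable from `start` in at most `distance` hops (BFS)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == distance:
            continue
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen

def minimum_neighbors(graph, k, distance=1):
    """Nodes with at least k neighbors within `distance` hops."""
    return {n for n in graph if len(within(graph, n, distance)) - 1 >= k}

def local_distance(graph, preselected, distance):
    """Nodes within `distance` hops of any preselected node."""
    return set().union(*(within(graph, p, distance) for p in preselected))

g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}   # a 5-node path
print(sorted(minimum_neighbors(g, 2)))    # [1, 2, 3]
print(sorted(local_distance(g, {0}, 2)))  # [0, 1, 2]
```

On the 5-node path, only the interior nodes have two immediate neighbors, and two hops from node 0 reach nodes 1 and 2, matching the filters' definitions.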

To customize Cytoscape, “Plug-in modules provide a powerful means of extending the Core to implement new algorithms, additional network analyses, and/or biological semantics. Plug-ins are given access to the Core network model and can also control the network display. Although the Cytoscape Core is Open Source, plug-ins are separable software that may be protected under any license the plugin authors desire.” [Shannon-2003] Separating the application-specific analyses and semantics from the Cytoscape Core gives users great freedom and flexibility to accommodate a wider range of applications.

According to the Cytoscape web page, Apps/plug-ins “may be developed by anyone using the Cytoscape open API based on Java™ technology and App community development is encouraged. Most of the Apps are freely available from Cytoscape App Store.”

Some examples of Apps that might be of interest are listed below:
• Enhance Search: performs “search on multiple attributes using logical operators and wildcards. Locate nodes and edges of interest based on their attributes and highlight them on the network.”
• Network Randomizer:

springs attached to the pair of nodes. These springs produce repulsive or attractive forces between their end points if they are too short or too long. The layout algorithm simulates these physical forces and rearranges the positions of the nodes in such a way that the sum of the forces emitted by the nodes and the edges reaches a (local) minimum.” [http://docs.yworks.com/yfiles/doc/developers-guide/smart_organic_layouter.html]


“Network Randomizer is a Cytoscape app for generating random networks, as well as randomizing the existing ones, by using multiple random network models. Further, it can process the statistical information gained from these networks in order to pinpoint their special, non-random characteristics.”
• Path Explorer: “PathExplorer enables the user to choose source and destination nodes and highlights the paths from and to them respectively. It also enables the user to exclude nodes and edges based on their attributes thus filtering the paths to desired properties. The user can change the width of the paths in order to suit his/her visual style. Finally, after obtaining the desired paths, the user can select them and save them as a network for future study.”

With respect to cyber security, we may need to investigate Cytoscape further regarding the following:
• multi-user access to the same network graph
• network graphs evolving over time
• graph traversal
• apps for traversing a network graph according to attacker and defender behavior


4.1.2 RANGE

RANGE (Rapid Network Generation and modeling tool) is an application that runs on the Cytoscape platform. It was developed to allow easy generation of realistic network models for use in cyber security research. Each network has a topology and consists of nodes and links. Each node can be a device or host with associated attributes such as interfaces, installed software and vulnerabilities, and IP address. If a device is a firewall, firewall-specific attributes can be specified, such as source object, source ports, source MAC address, destination object, destination ports, destination MAC address, rule index, action, and protocol. Each link can have attributes such as source and destination network interfaces and connectivity type, such as wireless or wired. For more detail, see the Cyber Data Exchange Format (CDE) specified in [Campbell-2019].

A network can be manually created with nodes filled with attributes and then connected as needed. It can also be imported “from several external scanning tools, such as Nmap 43 in their native formats to facilitate the creation and modelling of existing networks. These existing networks can be modified to add additional software and vulnerabilities in order to create models for further analysis.” [Campbell-2019] The CDE data model, specified in JavaScript Object Notation (JSON), allows for exporting a created model to external storage and external analysis tools and for importing models from other cyber research applications.
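A JSON-based model of this kind can be round-tripped losslessly for export and import. The sketch below only illustrates the flavour of such a model; every field name in it is a hypothetical stand-in, since the authoritative CDE schema is the one specified in [Campbell-2019].

```python
import json

# Illustrative node/link records in the spirit of a CDE-like JSON model.
# Field names are hypothetical; the authoritative schema is [Campbell-2019].
model = {
    "nodes": [
        {"id": "fw-1", "type": "firewall",
         "interfaces": [{"name": "eth0", "ip": "10.0.0.1"}],
         "rules": [{"index": 1, "src": "any", "dst": "10.0.1.0/24",
                    "dst_ports": [80, 443], "protocol": "tcp",
                    "action": "allow"}]},
        {"id": "web-1", "type": "host",
         "interfaces": [{"name": "eth0", "ip": "10.0.1.10"}],
         "software": [{"name": "httpd",
                       "vulnerabilities": ["CVE-XXXX-YYYY"]}]},
    ],
    "links": [
        {"source": "fw-1", "target": "web-1", "wireless": False},
    ],
}

exported = json.dumps(model)          # export for external analysis tools
assert json.loads(exported) == model  # import round-trips losslessly
```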

Once loaded or created in RANGE, the network model can be further modified in terms of its attributes and/or augmented in the number and type of nodes and links already present. Moreover, Visual Layout in RANGE allows the user to display the network in the different layout topologies available in Cytoscape, along with the associated attributes, easing the manual manipulation and visual analysis of a large network.

Though not described in [Campbell-2019], RANGE is able to generate an attack graph from its network model, or to use its exported CDE-formatted data to generate the corresponding attack graph. This attack graph, in CDE format, can also be imported back into Cytoscape for further refinement.

43 Nmap: the Network Mapper – a free security scanner available from https://nmap.org/


4.1.3 DLV^K Planning System

DLV^K is a planning system built on top of the disjunctive logic programming system DLV (DataLog with Disjunction, ∨) using a new logic-based planning language called K.

The authors in [Faber-1999] describe DLV as “a knowledge representation system, based on disjunctive logic programming, which offers front-ends to several advanced KR formalisms.” The DLV system incorporates new techniques for the computation of answer sets of disjunctive logic programs. ‘These techniques try to “push” the query goals in the process of model generation (query goals are often present either explicitly, like in planning and diagnosis, or implicitly in the form of integrity constraints). This way, a lot of useless models are discarded “a priori” and the computation converges rapidly toward the generation of the “right” answer set.’ [Faber-1999] More information on DLV is available at http://www.dlvsystem.com/dlv/ and in [Grasso-2011].

In [Eiter-2000], the authors propose the “new logic-based planning language, called K. Transitions between states of knowledge can be described in K, and the language is well suited for planning under incomplete knowledge. Nonetheless, K also supports the representation of transitions between states of the world (i.e., states of complete knowledge) as a special case, proving to be very flexible. .... This novel system allows for solving hard planning problems, including secure planning under incomplete initial states, which cannot be solved at all by other logic-based planning systems such as satisfiability planners.”

The syntax 44 of the language K allows us to describe the following:
• Fluents for describing relevant properties of the domain of discourse
• Actions for changing the current state by modifying the truth values of some fluents if some preconditions hold in the current state
• Causation rules for defining static and dynamic dependencies
• Initial state constraints for specifying initial constraints
• Conditional executability for specifying executability based on a set of truth values
• Safety restrictions for ensuring all rules and executability conditions satisfy certain syntactic restrictions
• Action description, which “is a pair ⟨D, R⟩, where D is a finite set of action and fluent declarations and R is a finite set of safe executability conditions, safe causation rules, and safe initial state constraints” [Eiter-2000]
• Planning domain, which “is a pair PD = ⟨Π, AD⟩, where Π is a normal stratified datalog program (referred to as background knowledge) that is safe (in the standard LP [Logic Programming] sense), and AD is an action description.” [Eiter-2000]
• Query q for querying the planning domain
• “Planning problem P = ⟨PD, q⟩ is a pair of a planning domain PD and a query q” [Eiter-2000]

44 For more details, please refer to [Eiter-2000].


The semantics 45 of the language K allows us to understand how the planning is executed.
• Instantiation defines the legal instantiation of action and fluent declarations
• State is “a consistent set of ground fluent literals” [Eiter-2000]
• State transition is “a tuple t = ⟨s, A, s′⟩, where s, s′ are states and A is a set of action atoms” [Eiter-2000]
• For a positive planning domain (that is, if no default negation occurs in it) and a state s, a set A of actions is called the executable action set with respect to the state s iff for each a in A there exists an executability condition satisfying certain conditions 46

A sample program structure 47 is shown below, with various (some optional) sections that start with a keyword followed by a colon.
“
fluents: ⟨fluent declarations⟩
actions: ⟨action declarations⟩
always: ⟨rules⟩
initially: ⟨initial rules⟩
[noConcurrency.]
[securePlan.]
goal: q? (i) 48

where the fluent and action declarations are sequences of declarations as defined in Section 2.1 of [Eiter-2000], and the rules (resp., the initial rules) are a sequence of causation rules and executability conditions which apply to any (resp., any initial) state.

By default, DLV^K will look for plans allowing concurrent actions (that is, plans that may contain transitions ⟨s, A, s′⟩ with |A| > 1). By specifying

noConcurrency.

the user can ask for sequential plans. In the presence of

securePlan.

or the command-line option -FPsec, DLV^K will only compute secure plans, as opposed to the default situation where all (optimistic) plans are computed and the user interactively decides whether to check their safety.”

45 Ibid. 46 Ibid. 47 Ibid. 48 “(i)” specifies the maximum number of steps allowed to reach the goal.


For an example of a DLV^K application, we reproduce here the simple Blocks World instance from [Eiter-2000], shown in Figure 5.

Figure 5 Sussman’s world of blocks planning problem example reproduced from Fig.1 of [Eiter-2000]

In this example, our goal is to have block c on top of block b and block b on top of block a, starting with blocks a and b somewhere on the table and block c on top of block a. Moreover, we would like to achieve the goal state in three steps, each consisting of only one move.

As the planning problem is P = ⟨PD, q⟩, where PD = ⟨Π, AD⟩ and q is the query, we have P = ⟨⟨Π, AD⟩, q⟩, where Π is the static background knowledge and AD is the action description for the blocks world in the above example. More specifically, Π consists of the following facts and rules:

block(a). block(b). block(c).
location(table).
location(B) :- block(B).

Similarly, AD has one action, move, and two fluents, on and occupied, as shown in the following:

fluents: on(B,L) requires block(B), location(L).
         occupied(B) requires location(B).
actions: move(B,L) requires block(B), location(L).

The associated rules and initial rules are as follows:

always: executable move(B,L) if not occupied(B), not occupied(L), B <> L.
        inertial on(B,L).
        caused occupied(B) if on(B1,B), block(B).
        caused on(B,L) after move(B,L).
        caused -on(B,L1) after move(B,L), on(B,L1), L <> L1.
initially: on(a,table). on(b,table). on(c,a).


noConcurrency.

With the aim of achieving the goal state in three steps, we specify the query as follows:

goal: on(c,b), on(b,a), on(a,table)? (3)

“Intuitively, the executable rule says that a block B can be moved onto a location L ≠ B if both B and L are clear (note that the table is always clear, as it is not a block). The causation rules for on and -on specify the effect of a move. It is worthwhile noting that the totality of these fluents is not enforced. Both on(x,y) and -on(x,y) may happen to be not true at a given instant of time.” [Eiter-2000]

Executing the above program on DLV^K produces the following result in three steps:

PLAN: move(c,table,0), move(b,a,1), move(c,b,2)
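As an independent sanity check of this plan (in Python rather than DLV^K, and ignoring the trailing time index on each move), a small sketch can replay the three moves against the initial state and confirm the goal is reached:

```python
# Replay the computed plan against the blocks-world instance above.
# State maps each block to the location it sits on; a location is
# "occupied" when some block sits on it, mirroring the K rules.
def apply_move(on, block, dest):
    assert all(b != block for b in on.values()), f"{block} is occupied"
    assert dest == "table" or all(b != dest for b in on.values()), \
        f"{dest} is occupied"
    on = dict(on)
    on[block] = dest
    return on

state = {"a": "table", "b": "table", "c": "a"}               # initially
for block, dest in [("c", "table"), ("b", "a"), ("c", "b")]:  # the plan
    state = apply_move(state, block, dest)

print(state == {"a": "table", "b": "a", "c": "b"})  # True: goal reached
```

Each move's preconditions (both the block and its destination are clear) mirror the executable rule, and the state update mirrors the caused on/-on rules.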

For more information on the DLV^K system, please refer to [Eiter-2000, Eiter-2003a, Polleres-2001, and Shah-2006].


4.1.4 GNS3 (Graphical Network Simulator-3) GNS3 is a free graphical network simulator that allows users to “build, design, and test your network in a risk-free virtual environment” [GNS3]. It has the following key features:

• Real-time network simulation for pre-deployment testing without the need for network hardware by running the OS that emulates real behavior of network hardware • Connect GNS3 to any real network • Run and test multiple (20+) hardware vendors without the need for hardware • Customized topologies and labs within GNS3 for network certification training • Create dynamic network maps for troubleshooting and proof of concept testing

From GNS3's feature list, we note that it is intended mainly for proving out and understanding the expected functionality of a network. It is not intended for “managing Virtual Machines or appliances in a production environment” [GNS3-Security]. Its current focus is not on “hardening the application against every possible threat … we do encourage contributions of this nature to the project …” [GNS3-Security]. We can therefore surmise that a lot of work is still needed for it to become a working cyber security simulator. Even if GNS3 does become one, we would still need to ensure the faithful replication of a real network in the virtual environment, as with all other cyber security simulators.

Moreover, GNS3, itself, has security issues [GNS3-Security], some of which are listed below:

• “The choice of making GNS3 usable without the need of a VM and multi-platform offers a powerful solution to users even with low resources computers but increases the attack surface. The best is to consider that if someone has access to a running GNS3 server, he has access to the account where the server is running. We try to limit that, but due to the nature of the experimentation running in GNS3 the GNS3 server has powerful control on the account.” • “When the GNS3 server is run on a local computer, it has access to the entire filesystem. This allows users to use images located in different locations of the file system. To summarize; if the GNS3 server process is compromised, it will have access to the entire file system.” • “Injection of packets into the virtual networks: Due to limitation of some , the virtual network listens on all IPs. This means you can inject packet inside and exploit a security hole in a running image.” • “GNS3 is a wrapper on proprietary and open source technologies. This means the security level depends of the security of each technology.”

General Dynamics Mission Systems–Canada “Use, duplication or disclosure of this document or any of the information contained herein is subject to the restriction on the title page of this document.”

62 28 March 2019

4.1.5 Cloud Options

Using the cloud does not do away with security concerns. Although there is a rightful sense of a transfer of operational responsibility to the cloud providers, the security accountability ultimately remains with the users. Moreover, the security issues do not simply disappear. “Cybersecurity technology is exactly the same, whether in-house or in the cloud. Cloud servers and in-house servers run the same operating systems. Therefore, they have the same configuration issues. Additionally, they speak the same network protocols. Finally, cryptography is the same math and logic wherever it runs. What’s very different is control and visibility.” 49

Depending on the SPI (Software, Platform, Infrastructure) stack level, we need to be concerned with different aspects of security. For example:

• With IaaS (Infrastructure as a Service), we need to secure access, patching, and configuration.
• With PaaS (Platform as a Service), we need to secure the OS and software environment.
• With SaaS (Software as a Service), we might have to rely on what the providers provide, or test the providers’ security ourselves.

Thus, as we move up the SPI stack, we lose control and visibility, and we need to cooperate more with the providers to ensure the whole SPI stack is secure.
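The division of work described above can be summarized as a simple lookup. This is an illustrative sketch only, following the breakdown in the text; the level names are standard, but the exact split of responsibilities varies by provider and is not a formal shared-responsibility matrix.

```python
# Hypothetical mapping of what the cloud consumer must still secure at each
# SPI level, following the breakdown in the text above. Illustrative only.
CONSUMER_SECURES = {
    "IaaS": ["access", "patching", "configuration"],
    "PaaS": ["OS", "software environment"],
    "SaaS": ["verification/testing of provider-supplied controls"],
}

def consumer_must_secure(level):
    """Return the security aspects left to the consumer at a given SPI level."""
    return CONSUMER_SECURES[level]
```

Note that the list shrinks as we move up the stack, mirroring the loss of control and visibility.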

Moreover, the authors in [Zhang-2014] find that it is more cost-effective for attackers to exploit cloud vulnerabilities, and therefore more cost-effective for defenders to secure the cloud as soon as possible. In other words, defenders have less time to procrastinate.

49 https://blog.learningtree.com/what-are-the-cybersecurity-challenges-associated-with-cloud-computing/


4.2 Detailed Attribute Analysis of Three Cyber Security Simulation Solutions

In this section, we highlight the key characteristics of three cyber security simulation solutions (CALDERA [Applebaum-2016], CRaaS50 [TSI-2018], and SafeBreach [SafeBreach-2017]) for security posture assessment and then provide a detailed analysis of their attributes.

Below, we list the key characteristics of each cyber security simulation solution.

CALDERA
• Focuses on post-compromise – after the perimeter has been breached
• Focuses mainly on automated intelligent adversarial behavior – what damage it can cause
• Uses the ATT&CK kill chain for simulating adversarial behavior
• Uses logic planning to decide the best option given uncertainty of the state

CRaaS51
• Focuses on the infrastructural support for building a cyber range
• No consideration of adversarial behavior, which has to be developed and layered on top of the infrastructure

SafeBreach
• Focuses on both the breach and “attack” aspects of adversarial behavior, although the attack is not actually executed
• Focuses not only on the adversarial behavior but also on preventative aspects such as operational controls working as planned, software being up-to-date, and configurations being properly set
• Can be automated
• Can be run continuously
• Running it on a real network could pose a non-negligible potential risk

From the above characteristics, we make the following non-exhaustive observations:

• All solutions except SafeBreach (i.e., CALDERA and CRaaS) require an accurate and proper level of network abstraction.

50 CRaaS (Cyber Range as a Service) is provided by Technical System Integrators (https://www.tsieda.com/). It allows developers, consumers, and administrators to perform cyber training, testing, and exercises on a replica of their production infrastructure in a fast and economical manner.
51 A similar solution is VxRail. Though it could be used as a cyber range infrastructure as CRaaS is, it is more focused on real business usage. Otherwise, they both have similar attributes, as listed in Table 22 and Table 23.


• If SafeBreach is used on an accurately and properly replicated CRaaS infrastructure, then there is no potential risk.
• CRaaS will have many resources attributes but few, if any, resources attack attributes.

Next, we use the resources attributes template of Table 10 and the resources attack attributes template of Table 11 to derive the resources and their attack attributes for CALDERA, CRaaS, and SafeBreach in Table 15. Similarly, we use the simulation engine support attributes template of Table 12 to derive the support attributes of the same three solutions in Table 16.

Table 15 Resources and Their Attack Attributes for CALDERA, CRaaS, and SafeBreach

Network
• Resource attribute – CALDERA: network graph consisting of node and link; CRaaS: replicated network infrastructure including services; SafeBreach: real network
• Attack attribute – CALDERA: persistent threat, ATT&CK kill chain; CRaaS: not considered; SafeBreach: are configurations still appropriate? optimized to detect and prevent threats, ATT&CK kill chain

Service
• Resource attribute – CALDERA: not explicitly considered; CRaaS: considered; SafeBreach: real services and software
• Attack attribute – CALDERA: not explicitly considered; CRaaS: not considered; SafeBreach: software properly maintained and updated?

Information
• Resource attribute – CALDERA: considered; CRaaS: considered; SafeBreach: real information
• Attack attribute – CALDERA: data exfiltration; CRaaS: not considered; SafeBreach: data exfiltration

Mission
• Resource attribute – not considered by any of the three solutions
• Attack attribute – not considered by any of the three solutions

Security Controls
• Resource attribute – CALDERA: not considered; CRaaS: considered; SafeBreach: check if correctly implemented
• Attack attribute – CALDERA: not considered; CRaaS: not considered; SafeBreach: check if working as planned

Incident Response Measure
• Resource attribute – CALDERA: not considered; CRaaS: not considered; SafeBreach: ensure compliance with regulatory mandates
• Attack attribute – CALDERA: not considered; CRaaS: not considered; SafeBreach: provide impact of a breach

Personnel
• Resource attribute – CALDERA: train blue team, emulate red team; CRaaS: training has to be built on top of the replicated infrastructure; SafeBreach: help improve network security and thus indirectly train blue team
• Attack attribute – CALDERA: red teaming; CRaaS: not considered; SafeBreach: empower red teaming by allowing members to test different breach and what-if scenarios

Changes
• Resource attribute – CALDERA: detect adversarial behavior; CRaaS: no detection of changes; SafeBreach: continuously test security controls and challenge security defenses
• Attack attribute – CALDERA: formulate attack plan; CRaaS: changes captured only at the time of replication; SafeBreach: due to constant changes in personnel, process, and technology


Table 16 Support Attributes of CALDERA, CRaaS, and SafeBreach

Planning
• CALDERA: AI decision planning (Logic for Advanced Virtual Adversaries, LAVA) [Applebaum-2016]
• CRaaS: not considered
• SafeBreach: manages and executes breach scenarios, but with no description of specific techniques

Behavior
• CALDERA: defender behavior, attacker behavior, ATT&CK chain
• CRaaS: not considered
• SafeBreach: The Hacker’s PlaybookTM of breach methods52, use of the entire ATT&CK kill chain

Execution
• CALDERA: red teaming with uncertainty in the world (Virtual Red Team System, ViRTS)
• CRaaS: default infrastructure operation
• SafeBreach: continuous simulation

Data Generation
• CALDERA: training, attack/defence, network data
• CRaaS: not considered
• SafeBreach: breach and what-if scenarios

Analytic
• CALDERA: analyze attack results; no model verification and validation support
• CRaaS: not considered
• SafeBreach: correlate and analyze all breach methods, and present information useful for corrective actions

Report Generation
• CALDERA: result of assessment
• CRaaS: audit trail
• SafeBreach: result of simulation

Actor Creation
• CALDERA: actor of various behavior, skill, knowledge, intent
• CRaaS: not considered
• SafeBreach: hacker-like behavior

Gaming and Training
• CALDERA: not gamified; training but no explicit guidance and encouragement
• CRaaS: not considered
• SafeBreach: not gamified; training but no explicit guidance and encouragement

Simulation Set up/Reset
• CALDERA: considered
• CRaaS: management of the infrastructure
• SafeBreach: considered

Knowledge Base Support
• CALDERA: activity of the adversary
• CRaaS: not considered
• SafeBreach: hacker’s behavior, what-if scenarios

52 The Hacker’s Playbook is available here: https://safebreach.com/SafeBreach-Third-Edition-Hackers-Playbook-Findings


5. RESEARCH CONCLUSION

Most, if not all, of the cyber security simulation (CSS) solutions surveyed in this report are not built on a common cyber simulation environment/platform. This is understandable given the wide scope of the problem being solved. The contributing reasons could be one or more of the following:

• Solution objective: the aspect of the security problem of interest being different
• Problem complexity: the set of security components of interest being different
• Abstraction levels of interest: the level of detail in the components and solutions being different
• Maturity levels of solution: the choice of available solutions used – exploratory or established
• Assumptions: the detail required of the output result; the available information and certainty in the user, action, and state space

Thus each CSS solution has its own set of attributes that may have some commonality with the attributes of other CSS solutions. To elicit these attributes at a high level, we propose considering only the following:

• User model: the components that actively interact with the network model and with one another
• Network model: the components that provide services and react to the user model
• Manager model: the component that manages the user and network configuration and the report model
• Physical system model: the physical system that is being controlled/affected by the user and network models
• Report model: the component that generates the report of the simulation result
• Environment support model: includes but is not limited to the execution of the above models to create simulation results; examples of other components are planning algorithms, user behavior, cost functions, reward functions, test data generation, and knowledge base support
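The decomposition above can be sketched as a set of class skeletons. All class and field names below are hypothetical placeholders chosen to mirror the list in the text, not an implementation of any surveyed CSS.

```python
from dataclasses import dataclass, field

@dataclass
class UserModel:            # actively interacts with the network model
    actors: list = field(default_factory=list)

@dataclass
class NetworkModel:         # provides services, reacts to the user model
    nodes: list = field(default_factory=list)
    links: list = field(default_factory=list)

@dataclass
class PhysicalSystemModel:  # physical system controlled/affected by the above
    devices: list = field(default_factory=list)

@dataclass
class ReportModel:          # generates the report of the simulation result
    results: list = field(default_factory=list)

@dataclass
class ManagerModel:         # manages user/network configuration and reporting
    user: UserModel = field(default_factory=UserModel)
    network: NetworkModel = field(default_factory=NetworkModel)
    report: ReportModel = field(default_factory=ReportModel)

@dataclass
class EnvironmentSupport:   # planning, behavior, reward, data generation, KB
    components: dict = field(default_factory=dict)
```

The manager owning the user, network, and report models reflects the text's description of its coordinating role; a concrete CSS would flesh out each class with domain-specific state and behavior.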

To extract a more detailed set of attributes of a CSS solution, we propose a procedure, illustrated on a hypothetical CSS solution with a learning strategy for assessing security posture, consisting of the following steps:

• First, list the objectives of the CSS solution
• Based on what the objectives are and how they are to be achieved, determine the attributes of the following:
  o input model
  o input model behavior
  o output metrics
  o attack model
  o environmental support components
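The two steps above can be expressed as executable pseudocode. The group names come from the list in the text; the `derive` callback is a hypothetical user-supplied rule capturing how each objective maps to attributes.

```python
# Attribute groups from the elicitation procedure described in the text.
ATTRIBUTE_GROUPS = (
    "input model",
    "input model behavior",
    "output metrics",
    "attack model",
    "environmental support components",
)

def elicit_attributes(objectives, derive):
    """Step 1: start from the solution's objectives. Step 2: for each
    attribute group, record how each objective is achieved in that group."""
    attributes = {group: [] for group in ATTRIBUTE_GROUPS}
    for objective in objectives:
        for group in ATTRIBUTE_GROUPS:
            attributes[group].extend(derive(objective, group))
    return attributes
```

For example, a posture-assessment objective might only contribute to the attack model: `elicit_attributes(["assess security posture"], lambda o, g: ["learning strategy"] if g == "attack model" else [])`.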


We also propose a simple procedure to decide when to buy or to build a CSS solution based on the principle of buying as much as one can and building as much as one needs without reinventing the wheel. Moreover, we describe sample building block solutions and tradeoffs.

5.1 Future Work

We list here some research topics for future consideration.
• Track the maturity of BAS products, as they are most in line with network assessment through simulation
• Investigate the impact of adversarial learning techniques on those cyber solutions using an on-line learning strategy
• Investigate planning solutions that involve probabilistic actions, such as those described in [Eiter-2003b]
• Consider formulating a building block solution for our cyber simulation environment
• Look out for a cyber simulation solution platform that allows for plug-and-play of component solutions
• Reiterate the research challenge posed in [Hoffmann-2015]: “Can we exploit the restrictions for designing effective algorithms and, with that, practically feasible yet more accurate simulated pentesting tools?” A related question is: “Are there feasible ways of making the MDP approximation of the POMDP model more accurate?”

We describe next a set of possible investigations to adapt the more recent findings of reinforcement learning (RL), as applied to games, natural language processing, and image processing, to the cyber security domain.

5.1.1 Adapting Recent Enhancements of Reinforcement Learning

In [Elderman-2017], the authors studied how different types of reinforcement learning techniques can be used to formulate the attacker's sequential decisions. Though Monte Carlo learning was shown to perform best among those studied, reinforcement learning has a fundamental flaw: it learns from scratch [Kurenkov-2018a]. Learning from scratch takes time, and time may not be a luxury available in the cyber security domain. Therefore, enhancements that expedite the reinforcement learning process will be of major interest. Another challenge of RL is the difficulty of defining the reward function.
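To make the Monte Carlo approach concrete, the sketch below runs every-visit Monte Carlo control for an attacker on a toy three-stage "attack chain". The environment, success probability, and rewards are invented for illustration and are far simpler than the attacker-versus-defender game studied in [Elderman-2017].

```python
import random

ACTIONS = ("probe", "exploit")
GOAL = 3          # stage 3 = target compromised
GAMMA = 0.9       # discount factor
P_SUCCESS = 0.6   # assumed chance an exploit advances the chain

def step(state, action, rng):
    """Advance the attack chain one step: -1 cost per action, +10 at the goal."""
    if action == "exploit" and rng.random() < P_SUCCESS:
        state += 1
    reward = 10.0 if state == GOAL else -1.0
    return state, reward, state == GOAL

def mc_control(episodes=2000, eps=0.1, seed=0):
    """Every-visit Monte Carlo control with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    n = dict.fromkeys(Q, 0)
    for _ in range(episodes):
        state, done, steps, traj = 0, False, 0, []
        while not done and steps < 50:           # epsilon-greedy rollout
            if rng.random() < eps:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward, done = step(state, action, rng)
            traj.append((state, action, reward))
            state, steps = nxt, steps + 1
        ret = 0.0                                # accumulate returns backwards
        for s, a, r in reversed(traj):
            ret = r + GAMMA * ret
            n[(s, a)] += 1
            Q[(s, a)] += (ret - Q[(s, a)]) / n[(s, a)]
    return Q
```

After training, the greedy policy prefers "exploit" at every stage, since only exploiting advances the chain toward the payoff. The reward constants also illustrate the reward-design difficulty noted above: the learned policy is only as sensible as these invented numbers.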

A partial list of potential enhancements from [Kurenkov-2018b] is given below.

• Hindsight Experience Replay, to deal with sparse rewards in reinforcement learning
• Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning, to develop zero-shot task generalization capabilities in reinforcement learning, where the learning agent is given only instructions for the new task and is expected to do well without any initial experience of the new task
• Learning Goal-Directed Behavior, to deal with the correct assignment of credit over long periods of time and with sparse rewards, using notions of goals
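The first item, Hindsight Experience Replay (HER), can be sketched in a few lines: a failed episode is replayed as if the state it actually reached had been the intended goal, so a sparse reward signal still yields informative transitions. The tuple layout and the sparse reward function below are illustrative assumptions, not the published algorithm's interface.

```python
def sparse_reward(state, goal):
    """0 on reaching the goal, -1 otherwise (the sparse-reward setting)."""
    return 0.0 if state == goal else -1.0

def her_relabel(trajectory):
    """trajectory: list of (state, action, next_state, goal) tuples.
    Returns extra transitions relabelled with the achieved final state as
    the goal, with rewards recomputed accordingly."""
    achieved = trajectory[-1][2]   # the state the episode actually reached
    return [
        (state, action, next_state, achieved,
         sparse_reward(next_state, achieved))
        for state, action, next_state, _ in trajectory
    ]
```

For example, an episode that aimed for goal 5 but ended at state 3 produces relabelled transitions in which the final step earns reward 0 instead of -1, giving the learner something to learn from.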


• Kickstarting Deep Reinforcement Learning, to kick-start the training of a new ‘student’ agent using a previously-trained ‘teacher’ agent, leveraging ideas from policy distillation and population-based training
• Meta-Reinforcement Learning of Structured Exploration Strategies, to explore new situations effectively by leveraging the experience of exploring prior tasks
• Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, to acquire the reward function automatically based on adversarial inverse reinforcement learning
• Meta-learning in Reinforcement Learning, to tune meta-parameters in reinforcement learning to the environmental dynamics and the animal performance, where meta-learning is about learning how to learn: making learning new skills faster after having learned similar skills

From [Alexander-2018], we have the following enhancements to reinforcement learning:

• Algorithms for Inverse Reinforcement Learning [Ng-2000], to extract the reward function
• Apprenticeship Learning via Inverse Reinforcement Learning [Abbeel-2004], which also tries to recover the reward function from observation of the task being performed

To deal with scalability issues, there is the study of hierarchical reinforcement learning, which aims to “address these issues by learning to operate on different levels of temporal abstraction” [Flet-Berliac-2019]. Another solution is to use Deep Q Networks 53, a combination of Q-learning with deep learning, where the Q-learning table is replaced with a neural network that tries to approximate the Q values.
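The "replace the Q table with a function approximator" idea behind Deep Q Networks can be shown in miniature. For brevity, a linear approximator over one-hot features stands in for the deep network (with one-hot features this reduces exactly to tabular Q-learning); a real DQN adds a multi-layer network, experience replay, and a target network. All sizes and constants below are illustrative.

```python
N_STATES, N_ACTIONS = 4, 2
GAMMA, ALPHA = 0.9, 0.1   # discount factor and learning rate (assumed)

def phi(state, action):
    """One-hot feature vector for a (state, action) pair."""
    x = [0.0] * (N_STATES * N_ACTIONS)
    x[state * N_ACTIONS + action] = 1.0
    return x

def q(w, state, action):
    """Approximate Q value: dot product of weights and features,
    replacing the table lookup Q[state][action]."""
    return sum(wi * xi for wi, xi in zip(w, phi(state, action)))

def td_update(w, state, action, reward, next_state, done):
    """One semi-gradient Q-learning step on the weight vector."""
    target = reward if done else reward + GAMMA * max(
        q(w, next_state, a) for a in range(N_ACTIONS))
    delta = target - q(w, state, action)
    return [wi + ALPHA * delta * xi for wi, xi in zip(w, phi(state, action))]
```

Scaling to large state spaces then amounts to swapping `phi` and the dot product for a learned network, which is exactly where the scalability gain of a DQN comes from.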

5.2 Summary

The document surveyed the simulation theories, methods, tools, and results of 20 types of cyber simulation environments in the academic literature, open source communities, and commercial space. It provided an in-depth analysis of six cyber simulation solutions with respect to the problem being solved, the results found, the assumptions made, and the solution strategy. In trying to find a common set of attributes, we recognized that the list of relevant attributes is as diverse as the problems being solved and the solution strategies being used. To elicit the attributes of each type of cyber simulation solution, we first proposed a generic attributes template for the technical aspects of a cyber simulation solution. The non-technical attributes are mostly applicable to all types of cyber simulation solutions. We applied this attributes enumeration template to seven cyber simulation solutions.

Next, we noted that the attributes of a cyber simulation solution depend on the following factors:
• problem definition (e.g., assessment, deception, learning, training, education, data generation, development)
• problem context (e.g., wireless, wired, heterogeneous network, with/without physical system)

53 See https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/ or https://skymind.ai/wiki/deep-reinforcement-learning for more information.


• level of abstraction (e.g., node and link, node of different device types and link of different bandwidths, network/communication connectivity, control of remote vehicle, mission evaluation, service availability)
• solution strategy (e.g., deception, learning, planning, network interference)
• user behavior (e.g., DDOS, exfiltration, malware)
• output (e.g., training, risk level, mission completion probability)
• simulation engine support components (e.g., planning, actor behavior, knowledge base, analytics, data generation)

Considering the above factors and elaborating on the generic attribute template, we proposed an attributes enumeration template specifically for a cyber simulation with learning solution that is designed to provide security posture assessment. The procedure used to formulate this CSSE-specific attribute enumeration template was applied to three sample CSSs and can also be used for other types of CSSEs.

Finally, we described a procedure for deciding when to buy or to build a cyber simulation solution and a set of potential building block solutions for building a final CSS solution.


6. REFERENCES

[Abbeel-2004] P. Abbeel and A. Y. Ng, “Apprenticeship Learning via Inverse Reinforcement Learning,” in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004. http://people.eecs.berkeley.edu/~russell/classes/cs294/s11/readings/Abbeel+Ng:2004.pdf

[Acosta-2017] J. Acosta, E. Padilla, J. Homer, and S. Ou. 2017. “Risk Analysis with Execution-Based Model Generation.” CSIAC Journal of Cyber Security and Information Systems 5 (1). https://www.csiac.org/journal-article/risk-analysis-with-execution-based-model-generation/7/.

[Alexander-2018] J. Alexander, “Learning from humans: what is inverse reinforcement learning?” The Gradient, June 2018. https://thegradient.pub/learning-from-humans-what-is-inverse-reinforcement-learning/

[Alpcan-2010] T. Alpcan and T. Başar (2010). Network Security: A Decision and Game-Theoretic Approach. Cambridge University Press, Cambridge.

[Applebaum-2016] A. Applebaum, D. Miller, B. Strom, C. Korban, and R. Wolf. 2016. “Intelligent, Automated Red Team Emulation.” In Proceedings of the 32nd Annual Conference on Computer Security Applications, 363–373. ACSAC ’16. New York, NY, USA: ACM. https://doi.org/10.1145/2991079.2991111.

[Applebaum-2017] A. Applebaum, D. Miller, B. Strom, H. Foster, and C. Thomas. 2017. “Analysis of Automated Adversary Emulation Techniques.” In Proceedings of the Summer Simulation Multi-Conference, 16:1–16:12. SummerSim ’17. San Diego, CA, USA: Society for Computer Simulation International. http://dl.acm.org/citation.cfm?id=3140065.3140081.

[Ahrenholz-2008] J. Ahrenholz, C. Danilov, T. R. Henderson, and J. H. Kim (2008). “CORE: A real-time network emulator,” IEEE Military Communications Conference (pp. 1–7). IEEE.

[Atighetchi-2017] M. Atighetchi, F. Yaman, D. Last, C. N. Paltzer, M. Caiazzo, and S. Raio. 2017. “A Flexible Approach Towards Security Validation.” In Proceedings of the 2017 Workshop on Automated Decision Making for Active Cyber Defense, 7–13. SafeConfig ’17. New York, NY, USA: ACM. https://doi.org/10.1145/3140368.3140370.

[Avatao-2019] Avatao (last accessed on Jan. 21, 2019). https://avatao.com/.

[Backes-2017] M. Backes, J. Hoffmann, R. Künnemann, P. Speicher, and M. Steinmetz. 2017. “Simulated Penetration Testing and Mitigation Analysis.” arXiv:1705.05088 [cs], May. http://arxiv.org/abs/1705.05088.


[Baiardi-2016] F. Baiardi, F. Tonelli, and A. D. R. D. Biase. 2016. “Assessing and Managing Risk by Simulating Attack Chains.” In 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 581–84. https://doi.org/10.1109/PDP.2016.50.

[Barros-2018] A. Barros and A. Chuvakin, “Utilizing Breach and Attack Simulation Tools to Test and Improve Security,” Gartner Report, May 2018. https://www.gartner.com/doc/3875421/utilizing-breach-attack-simulation-tools

[Bauer-2000] K. W. Bauer Jr, S. G. Alsing, and K. A. Greene, “Feature screening using signal-to-noise ratios,” Neurocomputing 2000; 31: 29–44.

[Ben-Asher-2013a] N. Ben-Asher, V. Dutt, and C. Gonzalez (2013a). Accounting for integration of descriptive and experiential information in a repeated prisoner’s dilemma using an instance-based learning model. In: Kennedy B, Reitter D, Amant RS (eds) Proceedings of the 22nd Annual Conference on Behavior Representation in Modeling and Simulation, Ottawa, Canada, BRIMS Society.

[Ben-Asher-2013b] N. Ben-Asher, C. Lebiere, A. Oltramari, and C. Gonzalez (2013b). Balancing fairness and efficiency in repeated societal interaction. In: Proceedings of the 35th Annual Meeting of the Cognitive Science Society, Humboldt Universität, Berlin, 31 July–3 August 2013.

[Ben-Asher-2015] N. Ben-Asher and C. Gonzalez, “Cyber War Game: A Paradigm for Understanding New Challenges of Cyber War,” in S. Jajodia et al. (eds.), Cyber Warfare, Advances in Information Security 56, pp. 207–220, 2015.

[Benzel-2011] T. Benzel, “The science of cyber security experimentation: the DETER project,” Proceedings of the 27th Annual Computer Security Applications Conference, ACM (2011), pp. 137–148.

[Bergin-2015] D. L. Bergin, “Cyber-attack and defense simulation framework,” Journal of Defense Modeling and Simulation: Applications, Methodology, Technology, 2015, vol. 12, no. 4, pp. 383–392.

[Berman-2014] M. Berman, J. S. Chase, L. Landweber, A. Nakao, M. Ott, D. Raychaudhuri, R. Ricci, and I. Seskar, “GENI: A federated testbed for innovative network experiments,” Computer Networks, 61 (2014), pp. 5–23.

[Brown-2003] B. Brown, A. Cutts, D. McGrath, D. M. Nicol, T. P. Smith, and B. Tofel (2003). “Simulation of cyber attacks with applications in homeland defense training,” in E. M. Carapezza (Ed.), Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Defense and Law Enforcement II (pp. 63–71).

[Browne-2012] D. C. Browne, T. R. Ender, W. Yates, and M. O’Neal (2012). “FACT: Enabling systems engineering as a web service,” 15th Annual Systems Engineering Conference, San Diego, CA. Retrieved from http://www.dtic.mil/ndia/2012system/ttrack514877.pdf


[Buchler-2018] N. Buchler, P. Rajivan, L. R. Marusich, L. Lightner, and C. Gonzalez. 2018. “Sociometrics and Observational Assessment of Teaming and Leadership in a Cyber Security Defense Competition.” Computers & Security 73 (March): 114–36. https://doi.org/10.1016/j.cose.2017.10.013.

[Behrmann-2004] G. Behrmann, A. David, and K. G. Larsen: A tutorial on UPPAAL. In: Bernardo, M., Corradini, F. (eds.) SFM-RT 2004. LNCS, vol. 3185, pp. 200–236. Springer, Heidelberg (2004). doi:10.1007/978-3-540-30080-9_7

[Belue-1995] L. M. Belue and K. W. Bauer Jr, “Determining input features for multilayer perceptrons,” Neurocomputing 1995; 7: 111–121.

[BRASIL] BRASIL Downloads. http://www.cl.cam.ac.uk/research/srg/netos/brasil/downloads/ (2019, accessed 30 May 2019).

[Bunchball-2016] Bunchball, “Gamification 101: An Introduction to the Use of Game Dynamics to Influence Behavior.” Retrieved from https://www.bunchball.com/gamification101.

[Buttyán-2016] L. Buttyán, M. Félegyházi, and G. Pék (2016). “Mentoring talent in IT security – A case study,” in 2016 USENIX Workshop on Advances in Security Education (ASE16).

[Campbell-2019] B. Campbell, B. Schofield, and M. Gingell, “RApid Network Generation and modElling tool (RANGE) quick start guide,” Defence Research and Development Canada, Contract Report, DRDC-RDDC-2019-C057, March 2019.

[Cao-2015a] P. Cao, E. C. Badger, Z. T. Kalbarczyk, R. K. Iyer, A. Withers, and A. J. Slagell, “Towards an unified security testbed and security analytics framework,” in Proceedings of the 2015 Symposium and Bootcamp on the Science of Security. ACM, 2015, p. 24.

[Cao-2015b] P. Cao, E. Badger, Z. Kalbarczyk, R. Iyer, and A. Slagell, “Preemptive intrusion detection: Theoretical framework and real-world measurements,” in Proceedings of the 2015 Symposium and Bootcamp on the Science of Security. ACM, 2015, p. 5.

[Chatterjee-2015] S. Chatterjee, M. Halappanavar, R. Tipireddy, M. Oster, and S. Saha. 2015. “Quantifying Mixed Uncertainties in Cyber Attacker Payoffs.” In 2015 IEEE International Symposium on Technologies for Homeland Security (HST), 1–6. https://doi.org/10.1109/THS.2015.7225287.

[Chatterjee-2016] S. Chatterjee, R. Tipireddy, M. Oster, and M. Halappanavar. 2016. “Propagating Mixed Uncertainties in Cyber Attacker Payoffs: Exploration of Two-Phase Monte Carlo Sampling and Probability Bounds Analysis.” In 2016 IEEE Symposium on Technologies for Homeland Security (HST), 1–6. https://doi.org/10.1109/THS.2016.7568967.

[Chen-2006] Y. Chen, B. W. Wah, and C. W. Hsu, “Temporal planning using subgoal partitioning and resolution in SGPlan,” Journal of Artificial Intelligence Research, 26:323–369, 2006.


[Chung-2016] K. Chung, C. A. Kamhoua, K. A. Kwiat, Z. T. Kalbarczyk, and R. K. Iyer. 2016. “Game Theory with Learning for Cyber Security Monitoring.” In 2016 IEEE 17th International Symposium on High Assurance Systems Engineering (HASE), 1–8. https://doi.org/10.1109/HASE.2016.48. http://assured-cloud-computing.illinois.edu/files/2014/03/Game-Theory-with-Learning-for-Cyber-Security-Monitoring.pdf

[Circadence-2018] Circadence, “Gamifying Cybersecurity for Team-Based Training and Assessments,” 2018. https://marketing.circadence.com/acton/media/36273/gamified-cyber-security-training

[Cohen-1999] F. Cohen (1999). “Simulating cyber attacks, defences, and consequences,” Computers & Security (pp. 479–518). Elsevier Science Ltd.

[Constantini-2007] K. C. Costantini (2007). “Development of a cyber attack simulator for network modeling and cyber security analysis.” Unpublished manuscript, Department of Industrial and Systems Engineering, Rochester Institute of Technology, Rochester, New York. Retrieved from https://scholarworks.rit.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=6703&context=theses

[CTF365-2019] CTF365 (last accessed on Jan. 21, 2019). Capture The Flag 365. https://ctf365.com/.

[CyCS] The MITRE Corporation, Cyber Command System (CyCS), web page. http://www.mitre.org/research/technology-transfer/technology-licensing/cyber-command-system-cycs.

[David-2015] A. David, P. G. Jensen, K. G. Larsen, M. Mikucionis, and J. H. Taankvist: UPPAAL STRATEGO. In: Baier, C., Tinelli, C. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. LNCS, vol. 9035, pp. 206–211. Springer, Heidelberg (2015). doi:10.1007/978-3-662-46681-0_16. ISBN: 978-3-662-46680-3

[DHS-2006] Department of Homeland Security, National Cyber Security Division (2006). “Cyber Storm exercise report.” Available at https://www.hsdl.org/?abstract&did=466697

[Dowse-2018] A. Dowse, 2018. “The Need for Trusted Autonomy in Military Cyber Security.” In Foundations of Trusted Autonomy, 203–13. Studies in Systems, Decision and Control. Springer, Cham. https://doi.org/10.1007/978-3-319-64816-3_11.

[Durkota-2015a] K. Durkota, V. Lisy, B. Bosansky, and C. Kiekintveld, “Approximate Solutions for Attack Graph Games with Imperfect Information,” in M. H. R. Khouzani et al. (Eds.): GameSec 2015, LNCS 9406, pp. 228–249, 2015. https://link-springer-com.proxy.library.carleton.ca/content/pdf/10.1007%2F978-3-319-25594-1_13.pdf


[Durkota-2015b] K. Durkota, V. Lisy, B. Bošansky, and C. Kiekintveld. 2015. “Optimal Network Security Hardening Using Attack Graph Games.” In Proceedings of the 24th International Conference on Artificial Intelligence, 526–532. IJCAI’15. Buenos Aires, Argentina: AAAI Press. http://dl.acm.org/citation.cfm?id=2832249.2832322. http://www.lancaster.ac.uk/staff/suchj/acyse2015-proceedings/ACySe2015_submission_Durkota.pdf

[Durkota-2016] K. Durkota, V. Lisy, C. Kiekintveld, B. Bosansky, and M. Pechoucek. 2016. “Case Studies of Network Defense with Attack Graph Games.” IEEE Intelligent Systems 31 (5): 24–30. https://doi.org/10.1109/MIS.2016.74. http://agents.fel.cvut.cz/~durkota/files/ieee16.pdf

[EDURange-2019] EDURange (last accessed on Jan. 21, 2019). http://www.edurange.org/.

[Edwards-2012] S. Edwards, “Whole Systems Trades Analysis Tool,” Briefing, PEO Ground Combat Systems (2012).

[Eiter-2000] T. Eiter, W. Faber, N. Leone, G. Pfeifer, and A. Polleres, “Planning under Incomplete Knowledge,” in: Lloyd J. et al. (eds) Computational Logic—CL 2000. Lecture Notes in Computer Science, vol. 1861. Springer, Berlin, Heidelberg.

[Eiter-2003a] T. Eiter, W. Faber, N. Leone, G. Pfeifer, and A. Polleres, “A logic programming approach to knowledge-state planning, II: The DLV system,” Artificial Intelligence, vol. 144, 2003, pp. 157–211.

[Eiter-2003b] T. Eiter and T. Lukasiewicz, “Probabilistic Reasoning about Actions in Nonmonotonic Causal Theories,” UAI, pp. 192–199, 2003. https://arxiv.org/ftp/arxiv/papers/1212/1212.2461.pdf

[Elderman-2017] R. Elderman, L. Pater, A. Thie, M. Drugan, and M. Wiering, “Adversarial Reinforcement Learning in a Cyber Security Simulation,” International Conference on Agents and Artificial Intelligence (ICAART), 2017. https://www.rug.nl/research/portal/en/publications/adversarial-reinforcement-learning-in-a-cyber-security-simulation(32575f79-ec81-487d-a1d0-8371d47a84f9).html. http://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/CyberSec_ICAART.pdf

[Emulab-2019] Emulab – Network Emulation Testbed Home. https://www.emulab.net. As of 21 January 2019.

[Faber-1999] W. Faber, N. Leone, and G. Pfeifer, “Pushing Goal Derivation in DLP Computations,” LPNMR ’99 Proceedings of the 5th International Conference on Logic Programming and Nonmonotonic Reasoning, pp. 177–191, December 1999. https://www.researchgate.net/publication/2880453_Pushing_Goal_Derivation_in_DLP_Computations

General Dynamics Mission Systems–Canada “Use, duplication or disclosure of this document or any of the information contained herein is subject to the restriction on the title page of this document.”

77 28 March 2019

[Ferguson-2018] K. Ferguson-Walter, D. Lafon, and T. Shade. 2018. “Game Theory for Adaptive Defensive Cyber Deception,” SPAWAR Systems Center Pacific Technical Report 3141, July 2018.

[Fite-2014] B.K. Fite, “Simulating Cyber Operations: A Cyber Security Training Framework,” SANS Institute InfoSec Reading Room, February 2014. https://www.sans.org/reading-room/whitepapers/bestprac/simulating-cyber-operations-cyber-security-training-framework-34510

[Flet-Berliac-2019] Y. Flet-Berliac, “The Promise of Hierarchical Reinforcement Learning,” The Gradient, March 2019. https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/

[Franklin-2018] C. Franklin Jr., “7 University-Connected Cyber Ranges to Know Now,” DarkReading, March 2018. https://www.darkreading.com/cloud/7-university-connected-cyber-ranges-to-know-now/d/d-id/1331224

[Fuentes-2005] F. Fuentes and D.C. Kar, “Ethereal vs. Tcpdump: a comparative study on packet sniffing tools for educational purpose,” J Comput Sci College 2005; 20: 169–176.

[Futoransky-2010a] A. Futoransky, L. Notarfrancesco, G. Richarte, and C. Sarraute. 2010. “Building Computer Network Attacks.” ArXiv:1006.1916 [Cs], June. http://arxiv.org/abs/1006.1916.

[Futoransky-2010b] A. Futoransky, F. Miranda, J. Orlicki, and C. Sarraute. 2010. “Simulating Cyber-Attacks for Fun and Profit.” ArXiv:1006.1919 [Cs], June. http://arxiv.org/abs/1006.1919.

[Gadyatskaya-2016] O. Gadyatskaya, R. R. Hansen, K. G. Larsen, A. Legay, M. C. Olesen, and D. B. Poulsen. 2016. “Modelling Attack-Defense Trees Using Timed Automata.” In Formal Modeling and Analysis of Timed Systems, 35–50. Lecture Notes in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-44878-7_3.

[Gallagher-2017] P.S. Gallagher and F. DiGiovanni, “Learning Cyber Operations Through Gaming: An overview of current and up and coming gamified learning environments,” Journal of Cyber Security and Information Systems, vol. 5, no. 4, Serious Games to Enhance Defense Capabilities – Modeling & Simulation Special Edition. https://www.csiac.org/journal-article/learning-cyber-operations-through-gaming-an-overview-of-current-and-up-and-coming-gamified-learning-environments/

[Gjøl-2017] P. Gjøl Jensen, K. G. Larsen, A. Legay, and D. B. Poulsen. 2017. “Quantitative Evaluation of Attack Defense Trees Using Stochastic Timed Automata.” In GraMSec 2017 - The Fourth International Workshop on Graphical Models for Security. Santa Barbara, United States. https://hal.inria.fr/hal-01640091.

[Gold-2016] R. Gold, “Mission Engineering,” October 24-27, 2016, Springfield, VA. Retrieved from http://www.acq.osd.mil/se/briefs/2016-10-26-NDIA-SEC-Gold-ME.pdf


[GNS3] Graphical Network Simulator-3, https://www.gns3.com/

[GNS3-Security] Graphical Network Simulator-3 Security, https://docs.gns3.com/1ON9JBXSeR7Nt2-Qum2o3ZX0GU86BZwlmNSUgvmqNWGY/index.html#header

[Grant-2008] M. Grant, S. Boyd, and Y. Ye, “CVX: Matlab software for disciplined convex programming,” 2008.

[Grant-2018] T. Grant. 2018. “Speeding up Planning of Cyber Attacks Using AI Techniques: State of the Art.” In ICCWS 2018 13th International Conference on Cyber Warfare and Security, 235.

[Grasso-2011] G. Grasso, N. Leone, M. Manna, and F. Ricca, “ASP at Work: Spin-off and Applications of the DLV System,” in Logic Programming, Knowledge Representation, and Nonmonotonic Reasoning, M. Balduccini and T.C. Son (Eds.): Gelfond Festschrift, LNAI 6565, pp. 432–451, 2011.

[Gundersen-2017] O. E. Gundersen and S. Kjensmo. 2017. “State of the Art: Reproducibility in Artificial Intelligence.” In AAAI Digital Library.

[Gurobi-2015] Gurobi Optimization, “Gurobi optimizer reference manual,” http://www.gurobi.com, 2015.

[Hall-2009] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. 2009. “The WEKA Data Mining Software: An Update,” SIGKDD Explorations.

[Hamilton-2008] S.N. Hamilton and W.L. Hamilton, “Adversary Modelling and Simulation in Cyber Warfare,” In: Jajodia S., Samarati P., Cimato S. (eds) Proceedings of the IFIP TC 11 23rd International Information Security Conference. SEC 2008. IFIP – The International Federation for Information Processing, vol 278. Springer, Boston, MA. https://pdfs.semanticscholar.org/e97b/17bbbdc1e645ca1914032369c08b97229e82.pdf

[Harvey-2018a] C. Harvey, “Breach and Attack Simulation: Find Vulnerabilities before the Bad Guys Do,” October 2018, eSecurity Planet. https://www.esecurityplanet.com/threats/breach-and-attack-simulation.html

[Harvey-2018b] C. Harvey, “11 Top Breach and Attack Simulation (BAS) Vendors,” December 2018, eSecurity Planet. https://www.esecurityplanet.com/products/top-breach-and-attack-simulation-bas-vendors.html

[Harwell-2013] S.D. Harwell and C.M. Gore, “Synthetic Cyber Environments for Training and Exercising Cyberspace Operations,” M&S Journal, vol. 8, no. 2 (2013), pp. 36-48.

[Hazon-2011] N. Hazon, N. Chakraborty, and K. Sycara (2011) Game theoretic modeling and computational analysis of n-player conflicts over resources. In: Proceedings of the 2011 IEEE International Conference on Privacy, Security, Risk and Trust and IEEE International Conference on Social Computing. IEEE, Los Alamitos, pp. 380–387. doi:10.1109/PASSAT/SocialCom.2011.178

[Helmert-2006] M. Helmert. The Fast Downward planning system. Journal of Artificial Intelligence Research, 26:191–246, 2006.

[Hertwig-2004] R. Hertwig, G. Barron, E.U. Weber, and I. Erev (2004) Decisions from experience and the effect of rare events in risky choice. Psychol Sci 15(8):534–539. doi:10.1111/j.0956-7976.2004.00715.x

[Hoffmann-2015] J. Hoffmann, ‘Simulated Penetration Testing: From “Dijkstra” to “Turing Test++”,’ Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling. http://fai.cs.uni-saarland.de/hoffmann/ff.html

[Holm-2016] H. Holm and T. Sommestad. 2016. “SVED: Scanning, Vulnerabilities, Exploits and Detection.” In Military Communications Conference, MILCOM 2016-2016 IEEE, 976–981. IEEE.

[Holm-2017] ———. 2017. “So Long, and Thanks for Only Using Readily Available Scripts.” Information and Computer Security 25 (1): 47–61. https://doi.org/10.1108/ICS-08-2016-0069.

[Hota-2016] A. R. Hota, A. A. Clements, S. Sundaram, and S. Bagchi. 2016. “Optimal and Game-Theoretic Deployment of Security Investments in Interdependent Assets.” In 7th International Conference on Decision and Game Theory for Security - Volume 9996, 101–113. GameSec 2016. New York, NY, USA: Springer-Verlag New York, Inc. https://doi.org/10.1007/978-3-319-47413-7_6.

[Hwang-2015] N. J. Hwang and K. B. Bush. 2015. “Operational Exercise Integration Recommendations for DoD Cyber Ranges.” TR-1187. MIT Lincoln Laboratory, Lexington, MA.

[Issariyakul-2008] T. Issariyakul and E. Hossain, Introduction to Network Simulator NS2. New York, NY, USA: Springer, 2008.

[Jajodia-2005] S. Jajodia, S. Noel, and B. O’Berry, “Topological Analysis of Network Attack Vulnerability,” in Managing Cyber Threats: Issues, Approaches and Challenges, Springer, 2005.

[Jajodia-2010] S. Jajodia and S. Noel, “Topological Vulnerability Analysis,” in Cyber Situational Awareness, Advances in Information Security 46, Springer, 2010.

[Jajodia-2011] S. Jajodia, S. Noel, P. Kalapa, M. Albanese, and J. Williams, “Cauldron mission-centric cyber situational awareness with defense in depth,” in Proc. MILCOM, Nov. 2011, pp. 1339–1344.

[Jajodia-2016] S. Jajodia, N. Park, E. Serra, and V. S. Subrahmanian. 2016. “Using Temporal Probabilistic Logic for Optimal Monitoring of Security Events with Limited Resources.” Journal of Computer Security 24 (6): 735–91. https://doi.org/10.3233/JCS-160555.


[Jajodia-2017] S. Jajodia, N. Park, F. Pierazzi, A. Pugliese, E. Serra, G. I. Simari, and V. S. Subrahmanian. 2017. “A Probabilistic Logic of Cyber Deception.” IEEE Transactions on Information Forensics and Security 12 (11): 2532–44. https://doi.org/10.1109/TIFS.2017.2710945.

[ ] K Planning System. http://www.dlvsystem.com/k-planning-system/

[Kavak-2016] H. Kavak, J.J. Padilla, D. Vernon-Bido, R.J. Gore, and S.Y. Diallo, “A Characterization of Cybersecurity Simulation Scenarios,” 19th Communications and Networking Simulation Symposium (CNS’16), Pasadena, CA, USA. https://www.researchgate.net/publication/299820368_A_Characterization_of_Cybersecurity_Simulation_Scenarios

[Kelton-2004] W.D. Kelton, R. P. Sadowski, and D. T. Sturrock. 2004. Simulation with ARENA, Third Edition, McGraw-Hill, Boston, MA.

[Krall-2016] A. L. Krall, M. E. Kuhl, S. F. Moskal, and S. J. Yang. 2016. “Assessing the Likelihood of Cyber Network Infiltration Using Rare-Event Simulation.” In 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 1–7. Athens, Greece: IEEE. https://doi.org/10.1109/SSCI.2016.7849913.

[Kuhl-2007] M.E. Kuhl, J. Kistner, K. Costantini, and M. Sudit, “Cyber attack modeling and simulation for network security analysis,” Proceedings of the 39th conference on Winter simulation, pp. 1180-1188, Washington, D.C., December 9-12, 2007.

[Kumar-2018] C. Kumar, “8 Cyber Attack Simulation Tools to Improve Security,” December 2018, Geekflare. https://geekflare.com/cyberattack-simulation-tools/

[Kurenkov-2018a] A. Kurenkov, “Reinforcement learning’s foundational flaw,” The Gradient, July 2018. https://thegradient.pub/why-rl-is-flawed/

[Kurenkov-2018b] A. Kurenkov, “How to fix reinforcement learning,” The Gradient, July 2018. https://thegradient.pub/how-to-fix-rl/

[Lantz-2010] B. Lantz, B. Heller, and N. McKeown, “A network in a laptop: rapid prototyping for software-defined networks,” In Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks, ACM (2010).

[Lathrop-2002] S.D. Lathrop, G.J. Conti, and D.J. Ragsdale (2002). “Information warfare in the trenches,” Unpublished manuscript, US Military Academy, West Point, NY. Retrieved from http://www.rumint.org/gregconti/publications/iwar.doc

[Leblanc-2011] S.P. Leblanc, A. Partington, I. Chapman and M. Bernier, “An Overview of Cyber Attack and Computer Network Operations Simulation,” Proceedings of the 2011 Military Modeling & Simulation Symposium, pp. 92-100, 2011. https://studydaddy.com/attachment/55236/cuktok2lyx.pdf


[Leeuwen-2015] B.V. Leeuwen, V. Urias, W. Stout, and B. Wright, “Emulytics at Sandia National Laboratories,” MODSIM World 2015, pp. 1–10.

[Lei-2018] C. Lei, H.-Q. Zhang, L.-M. Wan, L. Liu, and D.-H. Ma. 2018. “Incomplete Information Markov Game Theoretic Approach to Strategy Generation for Moving Target Defense.” Computer Communications 116 (January): 184–99. https://doi.org/10.1016/j.comcom.2017.12.001.

[Lejarraga-2012] T. Lejarraga, V. Dutt, and C. Gonzalez (2012) Instance-based learning: A general model of repeated binary choice. J Behavioral Decision Making 25(2):143–153.

[Letchford-2013] J. Letchford and Y. Vorobeychik. 2013. “Optimal Interdiction of Attack Plans.” In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems, 199–206. AAMAS ’13. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems. http://dl.acm.org/citation.cfm?id=2484920.2484955.

[Liljenstam-2006] M. Liljenstam and J. Liu (2006). “Rinse: the real-time immersive network simulation environment for network security exercises (extended version),” SIMULATION, 82(1), pp. 43-59.

[Lincoln Laboratory-2019] Lincoln Laboratory. https://www.ll.mit.edu/r-d/cyber-security-and-information-sciences. As of 21 January 2019.

[Lisý-2016] V. Lisý, T. Davis, and M. Bowling. 2016. “Counterfactual Regret Minimization in Sequential Security Games.” In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 544–550. AAAI’16. Phoenix, Arizona: AAAI Press. http://dl.acm.org/citation.cfm?id=3015812.3015894.

[Littman-1994] M.L. Littman, “Markov games as a framework for multi-agent reinforcement learning,” In: Proceedings of the eleventh international conference on machine learning, New Brunswick, NJ, 10–13 July 1994.

[Littman-2001] M. Littman, “Friend-or-foe Q-learning in general-sum Markov games,” In: Proceedings of the eighteenth international conference on machine learning, Williamstown, MA, 28 June–1 July 2001.

[Maimon-2005] O. Maimon and L. Rokach. Data mining and knowledge discovery handbook. Vol. 2. Springer, 2005.

[Maria-1997] A. Maria, “Introduction to Modeling and Simulation,” Proceedings of the 29th conference on Winter simulation, pp. 7-13, Atlanta, Georgia, USA, December 7-10, 1997. http://acqnotes.com/Attachments/White%20Paper%20Introduction%20to%20Modeling%20and%20Simulation%20by%20Anu%20Maria.pdf

[Marshall-2012] H. Marshall, C. Gaughan, T. Good, and J.S. McDonnell, “Development of a Distributed Cyber Operations Modelling and Simulation Framework,” Fall 2012. https://www.sisostds.org/DesktopModules/Bring2mind/DMX/API/Entries/Download?Command=Core_Download&EntryId=36209&PortalId=0&TabId=105 or the paper “12f-siw-021” in https://www.sisostds.org/Default.aspx?tabid=105&EntryId=41329

[Marshall-2014] H. Marshall, R. Wells, J. Truong, and L. Elliott, “Development of a Cyber Warfare Training Prototype for Current Simulations,” Fall 2014. https://www.sisostds.org/DesktopModules/Bring2mind/DMX/API/Entries/Download?Command=Core_Download&EntryId=42362&PortalId=0&TabId=105 or the paper “14f-siw-17” in https://www.sisostds.org/Default.aspx?tabid=105&EntryId=42341

[Marshall-2015] H. Marshall, J.R. Mize, M. Hooper, and R. Wells, “Cyber Operations Battlefield Web Services (COBWebS) – Concept for a Tactical Cyber Warfare Effect Training Prototype,” Simulation Interoperability Standards Organization (SISO) Simulation Interoperability Workshop (SIW) Fall 2015. https://www.sisostds.org/DesktopModules/Bring2mind/DMX/Download.aspx?

[Matlab-2014] Matlab (version 2014), http://www.mathworks.com/products/matlab/ (2014, accessed 1 August 2016).

[MATREX] “MATREX Program Overview, Papers and Briefings,” Available via https://www.matrex.rdecom.army.mil/

[MATREX-2012a] “MATREX: BCMS Software User's Manual,” Version 4.6.0a, February 2012.

[MATREX-2012b] “MATREX: Advanced Testing Capability (ATC) User's Guide,” Version 6.1.0, March 2012.

[McCray-2008] P. McCray and K. Snively, “08S-SIW-063 Functional Component Testing for Distributed Simulations,” Simulation Interoperability Workshop, April 2008.

[McGregor-2002] I. McGregor, “The relationship between simulation and emulation,” Proceedings of the Winter Simulation Conference, IEEE (2002), pp. 1683-1688.

[MCR-2017] The Michigan Cyber Range. https://www.merit.edu/cyberrange/.


[Mell-2006] P. Mell, K. Scarfone, and S. Romanosky. “Common vulnerability scoring system,” Security & Privacy, pages 85–89, 2006.

[MGEN-2016] Multi-Generator (MGEN). (2016). Retrieved from http://www.nrl.navy.mil/itd/ncs/products/mgen (Accessed May 30, 2019)

[Miller-2018] D. Miller, R. Alford, A. Applebaum, H. Foster, C. Little, and B. E. Strom. 2018. “Automated Adversary Emulation: A Case for Planning and Acting with Unknowns.” The MITRE Corporation, June. https://www.mitre.org/publications/technical-papers/automated-adversary-emulation-a-case-for-planning-and-acting-with.

[Mirkovic-2010] J. Mirkovic, T.V. Benzel, T. Faber, R. Braden, J.T. Wroclawski, and S. Schwab (2010). “The DETER Project,” In 2010 IEEE International Conference on Technologies for Homeland Security (HST ’10).

[MLM] Mission-Level Modeling, The MITRE Corporation, Systems Engineering Guide – Collected Wisdom from MITRE’s Systems Engineering Experts, technical paper, 2014.

[Moore-2005] A.W. Moore and D. Zuev, “Internet traffic classification using Bayesian analysis techniques,” ACM SIGMETRICS Performance Evaluation Review 2005; 33: 50–60.

[Moore-2017] K.L. Moore, T.J. Bihl, K.W. Bauer Jr, and T.E. Dube, “Feature extraction and feature selection for classifying cyber traffic threats,” Journal of Defense Modeling and Simulation: Applications, Methodology, Technology, 2017, vol. 14, no. 3, pp. 217-231.

[Morton-2018] G.D. Morton, M. Mihelic, M. Moniz, P.R. Thornton, R. Pressley, and L. Lee, “Mission-based, game-implemented cyber training system and method,” U.S. Patent No. 10,056,005, Granted on August 21, 2018, Assignee: Circadence Corporation (Boulder, CO) US. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=3&p=1&f=G&l=50&d=PTXT&S1=circadence.ASNM.&OS=an/circadence&RS=AN/circadence

[Moskal-2013] S. Moskal, D. Kreider, L. Hays, B. Wheeler, S. J. Yang, and M. Kuhl. 2013. “Simulating Attack Behaviors in Enterprise Networks.” In 2013 IEEE Conference on Communications and Network Security (CNS), 359–60. https://doi.org/10.1109/CNS.2013.6682726.

[Moskal-2014] S. Moskal, B. Wheeler, D. Kreider, M.E. Kuhl, and S.J. Yang, “Context Model Fusion for Multistage Network Attack Simulation,” 2014 IEEE Military Communications Conference, pp. 158-163. https://doi.org/10.1109/MILCOM.2014.32.

[Moskal-2016] S. F. Moskal, “Knowledge-based Decision Making for Simulating Cyber Attack Behaviors,” MSc. Thesis, Rochester Institute of Technology, July 2016.


[Moskal-2018] S. Moskal, S.J. Yang, and M.E. Kuhl, “Cyber threat assessment via attack scenario simulation using an integrated adversary and network modeling approach,” Journal of Defense Modeling & Simulation, vol. 15, no. 1, pp. 13-29, Jan. 2018. https://doi.org/10.1177/1548512917725408. http://journals.sagepub.com/doi/pdf/10.1177/1548512917725408

[Mudge-2008] R.S. Mudge and S. Lingley (2008). “Cyber and air joint effects demonstration (CAALED),” Unpublished manuscript, Air Force Research Laboratory, Information Directorate, Rome Research Site, Rome, NY. Retrieved from https://apps.dtic.mil/dtic/tr/fulltext/u2/a481288.pdf

[Mullin-2007] B.E. Mullins, T.H. Lacey, R.F. Mills, et al. “How the cyber defense exercise shaped an information-assurance curriculum,” IEEE Security and Privacy 2007; 5: 40–49.

[Nandi-2016] A. K. Nandi, H. R. Medal, and S. Vadlamani. 2016. “Interdicting Attack Graphs to Protect Organizations from Cyber Attacks: A Bi-Level Defender–Attacker Model.” Computers & Operations Research 75 (November): 118–31. https://doi.org/10.1016/j.cor.2016.05.005.

[Naudon-2010] M. Naudon (2010, June 25). “Exercice piranet 2010,” Retrieved from http://www.ssi.gouv.fr/IMG/pdf/2010-06-25_Communique_de_presse_Piranet_2010.pdf

[NCR-2019] The National Cyber Range. https://www.acq.osd.mil/dte-trmc/ncr.html. Last accessed on Jan. 21, 2019.

[Netlogo-2016] Netlogo (version 5.1.0), http://ccl.northwestern.edu/netlogo/ (2015, accessed 1 August 2016).

[Ng-2000] A. Y. Ng and S. J. Russell, “Algorithms for Inverse Reinforcement Learning,” ICML ’00 Proceedings of the Seventeenth International Conference on Machine Learning, June 2000, pp. 663-670. https://ai.stanford.edu/~ang/papers/icml00-irl.pdf

[Nguyen-2017] T. H. Nguyen, M. Wright, M. P. Wellman, and S. Baveja. 2017. “Multi-Stage Attack Graph Security Games: Heuristic Strategies, with Empirical Game-Theoretic Analysis.” In Proceedings of the 2017 Workshop on Moving Target Defense, 87–97. MTD ’17. New York, NY, USA: ACM. https://doi.org/10.1145/3140549.3140562.

[Niu-2015] H. Niu and S. Jagannathan, “Optimal defense and control of dynamic systems modeled as cyber-physical systems,” Journal of Defense Modeling and Simulation: Applications, Methodology, Technology, 2015, vol. 12, no. 4, pp. 423–438.

[Noel-2015a] S. Noel, J. Ludwig, P. Jain, D. Johnson, R. K. Thomas, J. McFarland, B. King, S. Webster, and B. Tello. 2015. “Analyzing Mission Impacts of Cyber Actions (AMICA).” In Proceedings of the NATO IST-128 Workshop: Assessing Mission Impact of Cyberattacks. Istanbul, Turkey.

[Noel-2015b] S. Noel, E. Harley, K. H. Tam, and G. Gyor, “Big-Data Architecture for Cyber Attack Graphs: Representing Security Relationships in NoSQL Graph Databases,” IEEE Symposium on Technologies for Homeland Security (HST), 2015.

[Nopiah-2008] Z. Nopiah et al. “Peak-valley segmentation algorithm for fatigue time series data”. In: WSEAS Transactions on Mathematics 7.12 (2008), pp. 698–707.

[NS-3-2019] https://www.nsnam.org. As of 21 January 2019.

[Ou-2006] X. Ou, W. F. Boyer, and M. A. McQueen. “A scalable approach to attack graph generation,” In CCS, pages 336–345, 2006.

[Park-2001] J.S. Park, J.-S. Lee, H.K. Kim, J.-R. Jeong, D.-B. Yeom, and S.-D. Chi (2001). “Secusim: a tool for the cyber-attack simulation,” Information and Communications Security (pp. 471-475). Heidelberg: Springer Berlin.

[Pendergast-2014] A. Pendergast, “The Diamond Model for Intrusion Analysis: A Primer,” ThreatConnect, 2014. https://digital-forensics.sans.org/summit-archives/cti_summit2014/The_Diamond_Model_for_Intrusion_Analysis_A_Primer_Andy_Pendergast.pdf

[Pizzonia-2014] M. Pizzonia and M. Rimondini, “Netkit: network emulation for education,” Software: Practice and Experience. Wiley (2014).

[Polleres-2001] A. F. Polleres, “The DLVK System for Planning with Incomplete Knowledge,” MS Thesis, Vienna University of Technology, Austria, 2001. https://vtechworks.lib.vt.edu/bitstream/handle/10919/71546/15_1.pdf?sequence=1&isAllowed=y

[Probst-2015] C. W. Probst, J. Willemson, and W. Pieters. 2015. “The Attack Navigator.” In Graphical Models for Security, 1–17. Lecture Notes in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-29968-6_1.

[Puterman-1994] M.L. Puterman, “Markov decision processes: discrete stochastic dynamic programming,” New York: John Wiley & Sons, 1994.

[Raghaven-1990] T.E.S. Raghavan, T.S. Ferguson, T. Parthasarathy, and O.J. Vrieze (eds.), Stochastic Games and Related Topics, Springer, 1990.

[Raj-2016] A.S. Raj, B. Alangot, S. Prabhu, and K. Achuthan, “Scalable and Lightweight CTF Infrastructures Using Application Containers,” In 2016 USENIX Workshop on Advances in Security Education (ASE 16), Austin, TX. USENIX Association.


[Ramaki-2015] A. A. Ramaki, M. Amini, and R. E. Atani. 2015. “RTECA: Real Time Episode Correlation Algorithm for Multi-Step Attack Scenarios Detection.” Computers & Security 49 (March): 206–19. https://doi.org/10.1016/j.cose.2014.10.006.

[Rass-2017] S. Rass, S. König, and S. Schauer. 2017. “Defending Against Advanced Persistent Threats Using Game-Theory.” PLOS ONE 12 (1): e0168675. https://doi.org/10.1371/journal.pone.0168675.

[Rege-2017] A. Rege, Z. Obradovic, N. Asadi, B. Singer, and N. Masceri. 2017. “A Temporal Assessment of Cyber Intrusion Chains Using Multidisciplinary Frameworks and Methodologies.” In 2017 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA), 1–7. https://doi.org/10.1109/CyberSA.2017.8073398.

[Rimondini-2007] M. Rimondini, “Emulation of computer networks with Netkit,” Dipartimento di Informatica e Automazione, Roma Tre University, 2007.

[Riverbed-2019] Riverbed, http://www.riverbed.com/about/news-articles/press-releases/riverbed-to-acquire-opnet-technologies-inc.html. As of 21 January 2019.

[Ross-2016] D.J. Ross, “Influencing Gameplay in Support of Early Synthetic Prototyping,” Thesis, Naval Postgraduate School, 2016. Retrieved from http://www.dtic.mil/get-tr-doc/pdf?AD=AD1026809

[Rossey-2002] L.M. Rossey, R.K. Cunningham, D.J. Fried, J.C. Rabek, R.P. Lippmann, J.W. Haines, and M. Zissman, “Lariat: Lincoln adaptable real-time information assurance testbed,” Aerospace Conference Proceedings, IEEE (2002).

[Rossey-2015] L. Rossey, “SimSpace Cyber Range,” ACSAC 2015 Panel: Cyber Experimentation of the Future (CEF): Catalyzing a New Generation of Experimental Cybersecurity Research. In https://www.acsac.org/2015/program/ACSAC%202015%20CEF%20Panel%20-%20Balenson.pdf

[SafeBreach-2017] SafeBreach, “Breach and Attack Simulation Primer,” SafeBreach e-Book.

[Sakhardande-2008] R.R. Sakhardande (2008). “The use of modeling and simulation to examine network performance under denial of service attacks,” Unpublished manuscript, Department of Telecommunications, SUNY Institute of Technology, Utica, NY.

[Sarraute-2012] C. Sarraute, O. Buffet, and J. Hoffmann. 2012. “POMDPs Make Better Hackers: Accounting for Uncertainty in Penetration Testing.” In Twenty-Sixth AAAI Conference on Artificial Intelligence. https://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/4996.


[Sarraute-2013] C. Sarraute, O. Buffet, and J. Hoffmann. 2013. “Penetration Testing == POMDP Solving?” ArXiv:1306.4714 [Cs], June. http://arxiv.org/abs/1306.4714.

[Schmidt-2010] S. Schmidt, R. Bye, J. Chinnow, K. Bsufka, A. Camtepe, and S. Albayrak. “Application-level simulation for network security,” SIMULATION, 86:311–330, 2010.

[Schneider-2011] R. Schneider. “Survey of Peaks/Valleys identification in Time Series”. In: Department of Informatics, University of Zurich, Switzerland (2011).

[Security Competence-2019] Security Competence, Hacking-Lab. http://www.hacking-lab-ctf.com/technical.html. Last accessed on Jan. 21, 2019.

[Shah-2006] A.A. Shah, “Experimenting with prototype system DLVK via different approach,” Knowledge-Based Systems, vol. 19, issue 8, December 2006, pp. 681-686.

[Shameli-2012] A. Shameli-Sendi, N. Ezzati-Jivan, M. Jabbarifar, and M. Dagenais. 2012. “Intrusion Response Systems: Survey and Taxonomy.” International Journal of Computer Science and Network Security 12 (1): 1–14.

[Shameli-2014] A. Shameli-Sendi and M. Dagenais. 2014. “ARITO: Cyber-Attack Response System Using Accurate Risk Impact Tolerance.” International Journal of Information Security 13 (4): 367–90. https://doi.org/10.1007/s10207-013-0222-9.

[Shannon-2003] P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski, and T. Ideker, “Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks,” Genome Res. 2003 Nov; 13(11): 2498–2504. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC403769/pdf/0132498.pdf

[Shmaryahu-2016] D. Shmaryahu. 2016. “Constructing Plan Trees for Simulated Penetration Testing.” In The 26th International Conference on Automated Planning and Scheduling. Vol. 121.

[Singh-2009] R. Singh (2009, March 26). “Divine matrix: Indian army fears china attack by 2017,” Retrieved from https://www.hindustantimes.com/delhi-news/indian-army-fears-attack-from-china-by-2017/story-olQBpRdaSIKp2rDUkfbSGN.html

[Smith-2017] R.E. Smith and B.D. Vogt, “Early Synthetic Prototyping, Digital Warfighting for Systems Engineering,” CSIAC Journal of Cyber Security and Information Systems, pp. 34-41, December 2017. https://www.csiac.org/wp-content/uploads/2017/11/CSIAC_Journal_V5N4_WEB.pdf

[Snively-2006] K. Snively and P. Grim, “06F-SIW-093 ProtoCore: A Transport Independent Solution for Simulation Interoperability,” Simulation Interoperability Workshop, September 2006.


[Sommestad T. Sommestad and F. Sanstrom, “An empirical test of the accuracy of an attack graph -2015a] analysis tool,” Journal of Information and Computer Security, vol. 23, no. 5, pp. 516-531, https://doi.org/10.1108/ICS-06-2014-0036

[Sommestad T. Sommestad. 2015. “Experimentation on Operational Cyber Security in CRATE.” In . -2015b] NATO STO-MP-IST-133 Specialist Meeting, Copenhagen, Denmark.

[Song-2015] J. Song and J. Alves-Foss. 2015. “The DARPA Cyber Grand Challenge: A Competitor’s Perspective.” IEEE Security & Privacy 13 (November): 72–76. https://doi.org/10.1109/MSP.2015.132.

[Steppe- J.M. Steppe and K.W. Bauer Jr, “Improved feature screening in feedforward neural 1996] networks,” Neurocomputing 1996; 13: 47–58.

[Stytz-2010] M.R. Stytz and S.B. Banks, “Addressing Simulation Issues Posed by Cyber Warfare Technologies,” 2010, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.465.7023&rep=rep1&type=p df

[Sudit-2005] M. Sudit, A. Stotz, and M. Holender. 2005. Situational awareness of coordinated cyber attack. In Proceedings of the International Society for Optical Engineering Conference, Orlando, FL.

[TCPDump] TCPDump. http://www.tcpdump.org (accessed 30 May 2019).

[Templeton- S. J. Templeton and K. Levitt. “A requires/provides model for computer attacks,” In 2000] Proceedings of the 2000 Workshop on New Security Paradigms, NSPW ’00, pages 31–38, New York, NY, USA, 2000. ACM.

[Thompson- B. Thompson and J. Morris-King, “An agent-based modeling framework for cybersecurity 2018] in mobile tactical networks,” Journal of Defense Modeling and Simulation: Applications, Methodology, Technology, 2018, vol. 15, no. 2, pp. 205–218.

[TSI-2018] TSI (Technical Systems Integrators), “Cyber Range as a Service (CRaaS),” June 2018. https://docs.wixstatic.com/ugd/b18c0e_d79d1fac157a44c7919079a2fab2a187.pdf

[Varga-2001] A. Varga, “The OMNeT++ discrete event simulation system,” Proceedings of the European simulation multiconference, 2001.

[Veksler- V.D. Veksler, N. Buchler, B.E. Hoffman, D.N. Cassenti, C. Sample, and S. Sugrim, 2018] “Simulations in Cyber-Security: A Review of Cognitive Modeling of Network Attackers, Defenders, and Users,” Frontiers in Psychology, May 2018, vol. 9, article 691. https://doi.org/10.3389/fpsyg.2018.00691.

[Vigna-2014] G. Vigna, K. Borgolte, J. Corbetta, A. Doupe, Y. Fratantonio, L. Invernizzi, D. Kirat, and Y. Shoshitaishvili. 2014. “Ten Years of iCTF: The Good, The Bad, and The Ugly.” In 2014 USENIX Summit on Gaming, Games, and Gamification in Security Education (3GSE 14).

General Dynamics Mission Systems–Canada “Use, duplication or disclosure of this document or any of the information contained herein is subject to the restriction on the title page of this document.”

89 28 March 2019

[Vorobeychik-2013] Y. Vorobeychik and J.R. Wallrabenstein. 2013. “Using Machine Learning for Operational Decisions in Adversarial Environments.” Sandia National Laboratories (SNL-CA), Livermore, CA (United States).

[Vykopal-2017] J. Vykopal, R. Oslejsek, P. Celeda, M. Vizvary, and D. Tovarnak. 2017. “KYPO Cyber Range: Design and Use Cases.” https://is.muni.cz/repo/1386573/2017-ICSOFT-kypo-cyber-range-design-paper.pdf

[Wabiszewski-2009] M.G. Wabiszewski Jr, T.R. Andel, B.E. Mullins, and R.W. Thomas. 2009. “Enhancing realistic hands-on network training in a virtual environment.” In Proceedings of the 2009 Spring Simulation Multiconference. Society for Computer Simulation International.

[Wagner-2017a] N. Wagner, C.S. Sahin, M. Winterrose, J. Riordan, D. Hanson, J. Pena, and W.W. Streilein. 2017. “Quantifying the mission impact of network-level cyber defensive mitigations.” Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 14 (3): 201–216.

[Wagner-2017b] D. Wagner. 2017. “Security and Machine Learning.” In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, 1–1. New York, NY, USA: ACM. https://doi.org/10.1145/3133956.3134108.

[Wang-2013] S. Wang, Z. Zhang, and Y. Kadobayashi. 2013. “Exploring Attack Graph for Cost-Benefit Security Hardening: A Probabilistic Approach.” Computers & Security 32 (February): 158–169. https://doi.org/10.1016/j.cose.2012.09.013.

[Wang-2016] C. Wang, C. Shi, C. Wang, and Y. Fu. 2016. “An Analyzing Method for Computer Network Security Based on Markov Game Model.” In 2016 IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), 454–458. https://doi.org/10.1109/IMCEC.2016.7867253.

[Weiss-2017] R. Weiss, F. Turbak, J. Mache, and M.E. Locasto. 2017. “Cybersecurity Education and Assessment in EDURange.” IEEE Security & Privacy 15 (3): 90–95, June 2017.

[Wellman-2016] M.P. Wellman. 2016. “Putting the agent in agent-based modeling.” Autonomous Agents and Multi-Agent Systems 30: 1175–1189.

[Wheeler-2014] B.F. Wheeler. 2014. “A Computer Network Model for the Evaluation of Moving Target Network Defense Mechanisms.” MSc thesis, Rochester Institute of Technology, December 2014. https://scholarworks.rit.edu/cgi/viewcontent.cgi?article=9690&context=theses

[Wilensky-1999] U. Wilensky. 1999. NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston. http://ccl.northwestern.edu/netlogo/

[Wireshark] Wireshark. http://www.wireshark.org (accessed 30 May 2019).


[Xu-2012] H. Xu, S. Jagannathan and F.L. Lewis. “Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses,” Automatica 2012; 48: 1017–1030.

[Xue-2014] M. Xue, W. Wang, and S. Roy, “Security concepts for the dynamics of autonomous vehicle networks,” Automatica (Journal of IFAC), vol. 50, no. 3, March 2014, pp. 852-857.

[Yang-2017] S. Yang and X. Wei. 2017. “Research on Optimization Model of Network Attack-Defense Game.” In 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), 426–29. https://doi.org/10.1109/ICSESS.2017.8342947.

[Yoginath-2015] S.B. Yoginath, K.S. Perumalla, and B.J. Henz. 2015. “Virtual machine-based simulation platform for mobile ad-hoc network-based cyber infrastructure.” Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 12 (4): 439–456.

[Zhang-2014] S. Zhang, X. Zhang, and X. Ou. 2014. “After We Knew It: Empirical Study and Modeling of Cost-Effectiveness of Exploiting Prevalent Known Vulnerabilities Across IaaS Cloud.” In Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, ASIA CCS ’14, 317–328. New York, NY, USA: ACM. https://doi.org/10.1145/2590296.2590300.

[Zhou-2003] M. Zhou and S. Lang. 2003. “A Frequency-based approach to intrusion detection.” Systemics, Cybernetics and Informatics 2 (3): 52–56.

[Zhuang-2012] R. Zhuang, S. Zhang, S.A. DeLoach, X. Ou, and A. Singhal. 2012. “Simulation-Based Approaches to Studying Effectiveness of Moving-Target Network Defense.” In National Symposium on Moving Target Research, 1–12.

[Zhuang-2014] R. Zhuang, S.A. DeLoach, and X. Ou. 2014. “Towards a theory of moving target defense.” In Proceedings of the First ACM Workshop on Moving Target Defense, 31–40.

[Zhuang-2015] R. Zhuang, A.G. Bardas, S.A. DeLoach, and X. Ou. 2015. “A Theory of Cyber Attacks: A Step Towards Analyzing MTD Systems.” In Proceedings of the Second ACM Workshop on Moving Target Defense, MTD ’15, 11–20. New York, NY, USA: ACM. https://doi.org/10.1145/2808475.2808478.

[Zonouz-2014] S.A. Zonouz, H. Khurana, W.H. Sanders, and T.M. Yardley. 2014. “RRE: A Game-Theoretic Intrusion Response and Recovery Engine.” IEEE Transactions on Parallel and Distributed Systems 25 (2): 395–406. https://doi.org/10.1109/TPDS.2013.211.

[Zoto-2018] E. Zoto, S. Kowalski, C. Frantz, B. Katt, and E. Lopez-Rojas. 2018. “CYBERAIMS: A Tool for Teaching Adversarial and Systems Thinking.” https://www.researchgate.net/publication/327745126_CYBERAIMS_A_TOOL_FOR_TEACHING_ADVERSARIAL_AND_SYSTEMS_THINKING_abcde


Appendix A: Survey of Existing Cyber Simulation Solutions and Environments

In this section, we survey existing cyber simulation environments. These environments are described in each subsection according to their main objectives, except for the last subsection, which covers miscellaneous topics. The topics of the subsections are as follows:

• To understand the consequences of adversarial machine learning (Appendix A.1)
• To study algorithm implementation to improve accuracy and lower computational requirements (Appendix A.2)
• To study attack planning and adversary emulation (Appendix A.3)
• To study attack tree and graph enhancements (Appendix A.4)
• To survey existing breach and attack simulation solutions for assessing security posture in general (Appendix A.5)
• To understand cloud security issues (Appendix A.6)
• To generate cyber test data for use in cyber simulation (Appendix A.7)
• To assess specific aspects of cybersecurity defense (Appendix A.8)
• To explore confusion, deception, evasion, and concealment techniques for cybersecurity defense (Appendix A.9)
• To explore reinforcement learning for cybersecurity defense (Appendix A.10)
• To educate, train, and develop cybersecurity personnel (Appendix A.11)
• To investigate the role of human factors (Appendix A.12)
• To understand intrusion detection, response, and monitoring solutions (Appendix A.13)
• To investigate MDP and POMDP-based approaches (Appendix A.14)
• To understand payoffs quantification (Appendix A.15)
• To explore securing defense solutions (Appendix A.16)
• To list existing simulation systems (Appendix A.17)
• To understand how to deal with software vulnerabilities (Appendix A.18)
• To know how to validate changes and selection of optimal mitigation strategies (Appendix A.19)
• To list some other cyber security issues (Appendix A.20)


A.1. Adversarial Machine Learning

This section describes articles dealing with adversarial machine learning. They are:

o [Vorobeychik-2013] provides a model of adversarial machine learning
o [Wagner-2017b] raises concerns about adversarial machine learning

A.1.1 [Vorobeychik-2013] Operational Decisions in Adversarial Environments

In [Vorobeychik-2013], the authors “model the adversarial machine learning problem by considering an unconstrained, but utility-maximizing, adversary” to account for an intelligent adversary that actively tries to deceive the learner by generating false instances. They use a two-stage process: first, a machine learning tool is trained on historical data; second, a linear program is solved to compute an optimal operational policy, using the predictions produced by the learning algorithm. To avoid being deceived, the operational decisions are made in a randomized fashion, simulating a moving target defense strategy.
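As a rough illustration of this two-stage idea (the payoff numbers and the grid-search optimizer below are hypothetical stand-ins, not the linear program of [Vorobeychik-2013]), a defender can randomize the inspect/allow decision so that a deceptive adversary cannot fully anticipate it, choosing the inspection probability that maximizes worst-case expected utility:

```python
# Stage 2 sketch: pick a randomized operational policy against a
# utility-maximizing adversary.  Payoff values are invented for illustration.

def worst_case_utility(p_inspect, payoffs):
    """Defender's expected utility against the worst-case attacker choice."""
    attack = (p_inspect * payoffs["attack_inspected"]
              + (1 - p_inspect) * payoffs["attack_missed"])
    no_attack = p_inspect * payoffs["inspect_cost"] + (1 - p_inspect) * 0.0
    return min(attack, no_attack)

def randomized_policy(payoffs, steps=1000):
    """Grid-search the inspection probability maximizing worst-case utility."""
    best_p, best_u = 0.0, float("-inf")
    for i in range(steps + 1):
        candidate = i / steps
        u = worst_case_utility(candidate, payoffs)
        if u > best_u:
            best_p, best_u = candidate, u
    return best_p, best_u

payoffs = {"attack_inspected": 1.0, "attack_missed": -5.0, "inspect_cost": -0.5}
p, u = randomized_policy(payoffs)
print(f"inspect with probability {p:.3f} (worst-case utility {u:.3f})")
```

The optimum sits where the two attacker options yield the same defender utility, which is exactly why a deterministic (always-inspect or never-inspect) policy is exploitable.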

A.1.2 [Wagner-2017b] Security and Machine Learning

In [Wagner-2017b], the author raises concerns about adversarial machine learning. As the article is only a one-paragraph abstract, not much more is presented there. However, a summary of the presentation is given at https://medium.com/syncedreview/david-wagner-on-adversarial-machine-learning-at-acm-ccs17-f75e729a96d8, where the following are briefly discussed:

o model evasion, e.g., generating “images that looked the same but were predicted with incorrect labels”
o model extraction, e.g., stealing “the model by simply querying it”
o model inversion, e.g., “using features synthesized from the model to generate an input that maximizes the likelihood of being predicted with a certain label”
o differential privacy, which “is an area of research which seeks to provide rigorous, statistical guarantees against what an adversary can infer from learning the results of some randomized algorithm”54
o data pollution attacks, e.g., a polluted training set

Note that the discussed topics are not directly related to network security.

54 From https://people.eecs.berkeley.edu/~stephentu/writeups/6885-lec20-b.pdf
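To make the model-extraction concern above concrete, here is a deliberately simple toy (entirely hypothetical, not from [Wagner-2017b]): if a black-box service exposes the raw score of a linear model, querying it at the zero vector and at each basis vector recovers the bias and all weights with n + 1 queries:

```python
# Toy model extraction: recover a linear scorer w.x + b purely from queries.

def make_black_box(w, b):
    """Stand-in for a remote scoring API; internals are unknown to the attacker."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

def extract_linear(black_box, n_features):
    """Query the zero vector for the bias, then each basis vector for a weight."""
    zero = [0.0] * n_features
    b = black_box(zero)
    w = []
    for i in range(n_features):
        e_i = zero.copy()
        e_i[i] = 1.0
        w.append(black_box(e_i) - b)  # n + 1 queries in total
    return w, b

secret_w, secret_b = [0.7, -1.2, 3.0], 0.25
w_hat, b_hat = extract_linear(make_black_box(secret_w, secret_b), 3)
print(w_hat, b_hat)  # numerically matches the secret parameters
```

Real extraction attacks face label-only outputs and nonlinear models, so they need far more queries and only approximate the target, but the query-and-reconstruct principle is the same.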


A.2. Algorithm Implementation

Both the authors in [Durkota-2015a, Durkota-2015b, and Durkota-2016] and those in [Hamilton-2008] consider improvements in algorithm design, basically limiting the needed solution search space, to mitigate implementation complexity in cyber simulation solutions. In [Moore-2017], the authors consider reducing the feature space needed to accurately distinguish cyberattacks from normal operating conditions. In [Yoginath-2015], the authors solve timing issues in cyber simulation using a virtual time-synchronized virtual machine (VM)-based parallel execution framework.

A.2.1 [Durkota-2015a, Durkota-2015b, and Durkota-2016] Finding Approximate Algorithms for Honeypot Allocation

In [Durkota-2016], the authors exploit the attacker’s lack of information in the problem of optimally allocating honeypots in a network. They found that

• “the number of possible defender actions grows exponentially with the number of honeypots and host-types”
• “computing the attacker’s optimal attack plan is NP-hard”
• “computing the defender’s strategy in Stackelberg equilibrium is also NP-hard for imperfect-information games”

Therefore they propose a number of approximation algorithms in [Durkota-2015a], where the search space is pruned [Durkota-2015b] and bounded [Durkota-2015a].
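The exponential growth of the defender’s action space noted above is easy to see with a small count (the network sizes here are invented for illustration): placing k indistinguishable honeypots on n candidate hosts already gives C(n, k) pure defender actions, before host types are even considered. Using `math.comb` (Python 3.8+):

```python
import math

def defender_actions(n_hosts, n_honeypots):
    """Number of ways to place k indistinguishable honeypots on n hosts."""
    return math.comb(n_hosts, n_honeypots)

# Doubling the network size blows up the action space combinatorially.
for n in (10, 20, 40):
    print(n, defender_actions(n, 5))
```

If each honeypot can additionally emulate one of t host types, every placement is further multiplied by t^k, which is why the cited papers resort to pruning and bounding the search rather than enumerating it.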

A.2.2 [Hamilton-2008] Managing Search Space in Adversary Modelling and Simulation

In [Hamilton-2008], the authors “present a technique for avoiding the prohibitive cost associated with requiring live red team participation during each use of a simulation environment while still providing the advantages dynamic adversary modeling provides. Our approach utilizes game theoretic techniques, using a new probability based search technique to curtail the search-space explosion issues that previous attempts in this area have encountered. This technique, entitled Partially-Serialized Probability Cutoff Search, also includes a new approach to modeling time, allowing modeling of anticipatory strategies and time-dependent attack techniques.”

A.2.3 [Moore-2017] Feature Extraction and Feature Selection for Classifying Threats

In [Moore-2017], the authors examine the use of artificial neural networks (ANNs) for cyber network threat detection. Their combined classification and feature selection ANN was able “to remove non-salient features and determine an appropriate level of dimensionality for classifying cyberattack and normal operating conditions.” Using the Department of Defense Cyber Defense Exercises network traffic data from 2003–2007 and 2009, they were able to reduce the set of 248 features to 18 features and yet consistently classify with an accuracy of 83–97% (testing/training sets) and 63–88% (validation sets).


“Common components of the salient features included segment size (maximum and minimum), number of segments or bytes sent in the initial window, the minimum window seen, and a count of the number of bytes in Ethernet, IP, or control packet sections (maximum, minimum, quartiles, total, and variance).”
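A minimal sketch of this kind of saliency-based feature screening (the weight matrix below is invented; [Moore-2017] trains on real exercise traffic, and [Steppe-1996] discusses screening criteria in more depth): after training, features whose total absolute input-layer weight is small contribute little to the network output and are candidates for removal:

```python
# Screen features of a trained feedforward net by input-weight saliency.
# weights[j][i] = weight from input feature i to hidden unit j (made-up values).

def feature_saliency(weights):
    """Total absolute input-layer weight per feature."""
    n_features = len(weights[0])
    return [sum(abs(row[i]) for row in weights) for i in range(n_features)]

def screen_features(weights, keep_ratio=0.5):
    """Keep the indices of the most salient fraction of features."""
    saliency = feature_saliency(weights)
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return sorted(ranked[:n_keep])

hidden_weights = [
    [0.9, 0.01, -1.2, 0.02],
    [-0.8, 0.03, 0.7, -0.01],
]
kept = screen_features(hidden_weights, keep_ratio=0.5)
print(kept)  # keeps features 0 and 2; features 1 and 3 carry near-zero weight
```

In practice the network would then be retrained on the reduced feature set and the accuracy compared, which is essentially the 248-to-18 reduction loop described above.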

A.2.4 [Yoginath-2015] Addressing Timing Issues in Modeling and Simulating Complex Systems

To deal with the challenges of various timing issues arising in modeling and simulating complex systems such as mobile ad-hoc networks (MANETs), the authors in [Yoginath-2015] “present an approach in successfully reconciling them using a virtual time-synchronized virtual machine (VM)-based parallel execution framework that accurately lifts both the devices, as well as the network communications, to a virtual time plane while retaining full fidelity. At the core of our framework is a scheduling engine that operates at the level of a scheduler, offering a unique ability to execute multi-core guest nodes over multi-core host nodes in an accurate, virtual time-synchronized manner.”


A.3. Attack Planning and Adversary Emulation

This section describes solutions that deal with attack planning and adversary emulation. They are:

o [Applebaum-2016] describes a red teaming emulator
o [Applebaum-2017] describes a simulation testbed to model attackers
o [Futoransky-2010a] presents a formal model for simulating cyber computer attacks
o [Futoransky-2010b] presents a simulation platform for designing and simulating attack scenarios
o [Grant-2018] focuses on attack planning
o [Holm-2016] describes tools for the design of attack plans
o [Miller-2018] deals with adversary emulation

A.3.1 [Applebaum-2016] Emulating Red Team Behavior

The authors in [Applebaum-2016] consider emulating intelligent red team behavior (techniques, tactics, procedures, and goals) automatically after the perimeter has been breached. They “consider red teaming only in the context of testing enterprise computer networks” with a system called CALDERA (Cyber Adversary Language and Decision Engine for Red Team Automation).

The key component of CALDERA is its attack planning in the face of a vast amount of uncertainty. The authors use classical and conformant planning55, Markov decision processes, and Monte Carlo simulations on top of their logically encoded cyber environment and adversary profiles to guide the automated decision-making process.

The agent behavior is described by the 3-tuple ⟨S_C, S_K, A⟩, where

• S_C is the state of compromise of the agent, encoded as a set of logical predicates
• S_K is the state of knowledge, representing what the agent knows about the network, encoded as a set of logical predicates
• A is the set of agent actions, each of which is associated with a set of pre-conditions that must be true for it to be executable and a set of post-conditions that will be true when it is successfully executed

The 3-tuple template is used to describe both the attacker and the defender; that is, x ∈ {attacker, defender}. As the authors’ focus is on the attacker, they construct only a very “simple, passive defender that is completely unaware of the adversary’s presence, leaving the implementation of a full defender to future work.” [Applebaum-2016] The defender does not execute any defensive action; it only goes about near-regular activities such as A_defender = {rebooting a workstation, logging in and out of a workstation}. The defender has full knowledge of the world but is not aware of the presence of the attacker. That is, S_C,defender = ∅ and S_K,defender = {knowledge of the world}.

55 ‘Conformant planning means "synthesizing a strong plan under the assumption of null run-time observability". A conformant plan consists of a sequence of actions that is guaranteed to achieve the goal regardless of the uncertainty in the initial condition and in the nondeterministic effects of actions.’ [http://mbp.fbk.eu/conformant.html]


For the attacker behavior,

S_C,attacker = {Escalated footholds, Known credentials, Exfiltrated data, Remotely probed hosts, Host enumerated}
S_K,attacker = {Known connections, Known authorized remote logins, Known local admins}
A_attacker = {Remote desktop protocol, Dumping credentials with Mimikatz56, Probe accounts, Enumerate local host, Exfiltration, Privilege escalation}

The authors “want actions that are commonly executed, such that the automated red team system accurately emulates an adversary. We thus base our attack model on the MITRE developed framework Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK). ATT&CK provides a common syntax to describe TTPs that an adversary can execute during the post-compromise attack phase. Notably, the entries in the ATT&CK taxonomy were largely informed by reports on real world activities engaged by APTs, in some cases corresponding directly to them.” Here TTPs stands for “Tactics, Techniques and Procedures” and APTs for “Advanced Persistent Threats”.

The key feature of CALDERA is its decision planning for the red team based on this set behavior. The planning is conducted while treating “uncertainty as only in the environment, labeling actions instead as deterministic and using a simple pre- and post-condition model similar to the requires/provides architecture as put forth in [Templeton-2000].” Their planning algorithm is basically as follows:

1. Generate a list of plans P, given the actions A, the environment ℰ, and a set of goals G taken from some logical language ℒ.
2. Assign each plan p ∈ P a score.
3. Select p* ∈ P as the one with the best score.
4. Execute the first action, a, specified by p*.
5. If the terminating condition is true, stop.
6. Otherwise, return to step 1, updating ℰ based on the post-conditions of a.

As each plan P is considered a sequence of actions {a_1, a_2, …, a_n}, the score of a plan is then defined as follows:

score(P) = Σ_{i=1}^{|P|} γ^i r(a_i)

where r(·) is a user-defined reward function for the actions a_i and γ is a discount factor.

56 Mimikatz is an open-source utility that enables the viewing of credential information from the Windows lsass (Local Security Authority Subsystem Service) through its sekurlsa module which includes plaintext passwords and Kerberos tickets which could then be used for attacks such as pass-the-hash and pass-the-ticket. [https://www.itprotoday.com/mobile-management-and-security/what-mimikatz]


Moreover, the attacker has to build up its initially limited view of the environment, ℰ_attacker ⊆ ℰ, using its 3-tuple ⟨S_C,attacker, S_K,attacker, A_attacker⟩ behavior. It is basically a look-ahead strategy that gathers as many post-condition predicates e ∈ ℰ with e ∉ ℰ_attacker as possible, using pre-conditions of attacker actions which are already known to be true. Doing look-ahead steps for a few actions gives a number of partial candidate plan sequences. Selecting the most promising sequence among those allows the authors to plan under uncertainty and to determine the best known action to take.
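The generate/score/execute loop just described can be illustrated with a toy pre-/post-condition planner (the action names, predicates, rewards, and the 0.9 discount factor below are invented stand-ins, not CALDERA’s actual model):

```python
from itertools import permutations

# Hypothetical actions: name -> (pre-conditions, post-conditions, reward).
ACTIONS = {
    "probe_host":   (set(),           {"host_known"},       1.0),
    "dump_creds":   ({"host_known"},  {"creds_known"},      2.0),
    "lateral_move": ({"creds_known"}, {"foothold"},         3.0),
    "exfiltrate":   ({"foothold"},    {"data_exfiltrated"}, 10.0),
}
GAMMA = 0.9  # discount factor

def legal(seq, state):
    """Check that every action's pre-conditions hold when it is executed."""
    state = set(state)
    for name in seq:
        pre, post, _ = ACTIONS[name]
        if not pre <= state:
            return False
        state |= post
    return True

def score(seq):
    """Discounted reward sum, as in score(P) = sum_i gamma^i * r(a_i)."""
    return sum(GAMMA ** i * ACTIONS[name][2] for i, name in enumerate(seq, 1))

def best_plan(state, depth=4):
    """Enumerate legal action sequences up to `depth` and keep the best one."""
    plans = [p for n in range(1, depth + 1)
             for p in permutations(ACTIONS, n) if legal(p, state)]
    return max(plans, key=score)

plan = best_plan(set())
print(plan[0], plan)  # first action of the best-scoring legal plan
```

Replanning after each executed action, as in steps 4 to 6 of the algorithm, would simply re-run `best_plan` on the updated state; CALDERA's real planner additionally handles the attacker's partial view of the environment instead of this fully known toy state.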

Comparing their decision engine with the following three decision engine types:

• Random: executing an action selected randomly from a list of legal options.
• Greedy: executing a goal action if available; otherwise follows the random strategy.
• Finite-state Machine (FSM): executing the first legal action in the fixed order as described in a FSM while skipping invalid ones.

the authors found the following in terms of the average number of footholds, accounts, and exfiltrations:

• Their decision engine outperformed the other ones.
• The random engine performed worst, while the FSM and greedy engines had similar performance.
• A high number of stolen credentials is not a determining factor in achieving the goal of exfiltration.

A.3.2 [Applebaum-2017] Analyzing Emulated Attacker Behavior

The authors in [Applebaum-2017] “describe a simulation testbed designed to model attackers operating within a Windows enterprise network.” Their work highlights the importance of detection as a defense mechanism, and the challenge it faces when attackers take extra precautions to mitigate the risk of detection. As usual, the presence of vulnerabilities remains a problem.

A.3.3 [Futoransky-2010a] Building Computer Network Attacks

The authors in [Futoransky-2010a] present a formal model consisting of various model components “to evaluate the costs of an attack, to describe the theater of operations, targets, missions, actions, plans and assets involved in cyberwarfare attacks.” They apply it to:

o the “autonomous planning leading to automated penetration tests”
o the “attack simulations, allowing a system administrator to evaluate the vulnerabilities of his network”

This article will be useful in understanding the different model components that need to be developed to simulate cyber computer attacks.


A.3.4 [Futoransky-2010b] Simulating Cyber-Attacks

The authors in [Futoransky-2010b] present Insight, a new simulation platform for designing and simulating various cyber-attack scenarios involving “a crowd of simulated actors, network devices, hardware devices, software applications, protocols, [and] users.” They claim the platform can be used in various cyber security research fields “such as pentesting training, study of the impact of 0-day[s] vulnerabilities, evaluation of security countermeasures, and risk assessment tool.”

This article will be useful in understanding how to implement the different model components needed to simulate cyber computer attacks.

A.3.5 [Grant-2018] Speeding Up Cyber Attack Planning

In [Grant-2018], the author “focuses on attack planning, assessing the state of the art in applying Artificial Intelligence (AI) techniques to automate this process.” The survey of the state of the art only considers vulnerability assessment, penetration testing, and red teaming. The author, however, fails to consider more recent developments such as those in Breach and Attack Simulation, kill chain concepts, and the DLV planning system used in CALDERA.

A.3.6 [Holm-2016] Enabling Reproducible Cyber Security Experiments

“To simulate attacker actions in a reproducible and reliable manner”, the authors in [Holm-2016] develop “the Scanning, Vulnerabilities, Exploits and Detection tool (SVED). SVED enables automated or manual low-effort design of attack plans. It also enables distributed automatic execution of sequences of attack steps as well as recording all the activity carried out by actions. Finally, SVED has interfaces for automatically interacting with popular intrusion detection tools such as Snort in order to get their alerts in real-time.” For future work, the authors indicate the need to enhance SVED’s currently crude planning capabilities; “to extend SVED with capability to interact with more security tools that execute and detect malicious actions”; and “to produce data that can be used by cyber security researchers and practitioners when evaluating the effectiveness of different metrics, methods and tools that are thought to enhance cyber security by some means.”

This article will be useful in understanding how to implement the different cyber components needed to perform cyber security experiments.

A.3.7 [Miller-2018] Automating Adversary Emulation

In [Miller-2018], the authors emphasize that automated adversary emulation consists of solving the joint problem of planning an attack and acting on the plan. In planning, they consider exploiting vulnerabilities, discovering remote systems, dumping credentials, and managing uncertainty. In acting, they consider evaluating plans, constructing plans, and selecting a plan to execute. Moreover, they describe their experience in developing solutions to this joint problem, including their custom problem representation and solution algorithms.


This article will be useful for understanding the different issues involved in the design and implementation of an automated adversary emulation system. This paper is also closely related to [Applebaum-2016] which is described in Appendix A.3.1.

A.4. Attack Tree/Graph Enhancements

This section describes solutions that deal with attack tree/graph enhancements. They are:

o [Gadyaskaya-2016] extends the traditional attack trees
o [Gjøl-2017] covers the same work as [Gadyaskaya-2016]
o [Wang-2013] combines a modified attack graph with a Hidden Markov Model

A.4.1 [Gadyaskaya-2016] Using Timed Automata in Attack-Defense Trees

The authors in [Gadyaskaya-2016] extend the traditional attack trees into attack-defense trees that incorporate both attacker and defender actions. Enhancing the attacker and defender behaviors in the attack-defense trees, they give “a fully stochastic timed semantics for the behavior of the attacker by introducing attacker profiles that choose actions probabilistically and execute these according to a probability density.” In this new formulation, the authors try to answer questions such as the following:

o Does a set of defense measures exist such that an attack can never occur? According to the authors, this can be answered by a pure reachability check.
o Does a set of defense measures exist such that an attack will take at least t time units to succeed? The attacker, on the other hand, will be interested in the existence of success below t time units.
o If there is no time bound, assuming the attack will succeed, how soon will it happen?
o Considering a fixed “distribution over execution times and the success probabilities of execution attacks, but let[ting] γ range freely among all possible probability mass functions” and having a range of possible parameterized stochastic attackers, they “are interested in finding the attacker that minimizes the cost.”

To be practical, we would need to know how to set the time unit associated with each atomic attack/defense action and how to select the right probability distributions.
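As a toy illustration of such timed stochastic semantics (the tree shape, rates, and the 10-time-unit deadline are all invented here, not taken from [Gadyaskaya-2016]), one can Monte-Carlo-estimate the probability that an attack with exponentially distributed step durations succeeds within a given time bound:

```python
import random

random.seed(42)  # fixed seed so the estimate is reproducible

def attack_duration(rates):
    """One sampled attack: sequential steps with exponential durations."""
    return sum(random.expovariate(rate) for rate in rates)

def p_success_within(rates, deadline, trials=100_000):
    """Monte Carlo estimate of P(total attack time <= deadline)."""
    hits = sum(attack_duration(rates) <= deadline for _ in range(trials))
    return hits / trials

# Three sequential attack steps with mean durations of 2, 3, and 4 time units
# (random.expovariate takes the rate, i.e., 1 / mean).
prob = p_success_within([1 / 2, 1 / 3, 1 / 4], deadline=10.0)
print(prob)
```

Swapping in different distributions per atomic action, or adding defender delays, only changes the sampling inside `attack_duration`, which is precisely why the choice of distributions and time units matters for the approach to be practical.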

A.4.2 [Gjøl-2017] Evaluating Timed Automata in Attack-Defense Trees

[Gjøl-2017] treats the same topic as [Gadyaskaya-2016].

A.4.3 [Wang-2013] Exploring Attack Graph for Security Hardening

In [Wang-2013], the authors combine a modified attack graph with a Hidden Markov Model (HMM) to investigate the cost-benefit of security hardening. The attack graph is modified “to represent network assets and vulnerabilities (observations)” such as physical components, software and services, and vulnerabilities and security events. The HMM takes the attack graph as input and estimates attack states “driven by a set of predefined cost factors associated with potential attacks and countermeasures.” They also provide a heuristic search algorithm “to automatically infer the optimal security hardening through cost-benefit analysis.” Through simulation-based experiments, they claim “that AG-HMM can significantly reduce the number of observations by associating them with relevant states, it meanwhile avoids state explosion suffered by many state-based security evaluation techniques.”
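To illustrate the HMM side of this approach (the two attack states, observation symbols, and all probabilities below are invented, not taken from [Wang-2013]), a standard forward pass turns a stream of security observations into a belief over hidden attack states:

```python
# Forward algorithm: belief over hidden attack states given observations.

STATES = ["benign", "compromised"]
INIT = {"benign": 0.9, "compromised": 0.1}
TRANS = {  # P(next state | current state), hypothetical values
    "benign":      {"benign": 0.95, "compromised": 0.05},
    "compromised": {"benign": 0.02, "compromised": 0.98},
}
EMIT = {  # P(observation | state), a toy IDS alert model
    "benign":      {"quiet": 0.8, "alert": 0.2},
    "compromised": {"quiet": 0.3, "alert": 0.7},
}

def forward(observations):
    """Normalized forward pass returning P(state | observations so far)."""
    belief = dict(INIT)
    for obs in observations:
        unnorm = {
            s: EMIT[s][obs] * sum(belief[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
        total = sum(unnorm.values())
        belief = {s: v / total for s, v in unnorm.items()}
    return belief

belief = forward(["quiet", "alert", "alert", "alert"])
print(belief)  # repeated alerts shift probability mass to "compromised"
```

In the AG-HMM setting, the hidden states would come from the modified attack graph and the emission model from the vulnerability/security-event observations, with the inferred beliefs feeding the cost-benefit hardening analysis.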

A.5. Breach and Attack Simulation (BAS)

This group of simulation solutions finds vulnerabilities before the adversaries do. According to Harvey in [Harvey-2018a], their “security testing technology continually monitors networks and systems to help organizations determine how secure their environment is.” BAS is similar to penetration testing (Pen Test) but not exactly the same. Some differences are:

• Pen Test is more target-specific, while BAS is broader in scope
• Pen Test is usually conducted by a security expert, while BAS is meant to be usable, in an automated manner, by security staff with less experience
• Pen Test provides a snapshot of the security posture at a particular time and is thus usually periodic, while BAS provides continuous security posture updates
• Pen Test tries to find whether some security measure is broken, while BAS tries to show how well the security measures are holding up
• Pen Test ends with a compromised system, while BAS usually does not compromise the system

A list of BAS solutions compiled from [Barros-2018, Harvey-2018b, and Kumar-2018] is shown in Table 17.

Before concluding this subsection, we review in more detail SafeBreach [SafeBreach-2017] to provide a sample in-depth view of a BAS solution.

Table 17 Breach and Attack Simulation Solutions Extracted from [Barros-2018, Harvey-2018b, and Kumar-2018] Name (Headquarters, Product Description Founded, Funding) AttacklQ (San Diego, Calif.; "AttackIQ delivers continuous validation of your 2013, enterprise security program so you can find the $14.3 M) gaps, strengthen your security posture and exercise your incident response capabilities." [Harvey- 2018b]

“AttackIQ is one of the popular security validation scalable platforms to strengthen your data center security. It is an offensive-defensive system to help security operation engineer exercise, red team capabilities.

The platform is integrated with a vital framework –

General Dynamics Mission Systems–Canada “Use, duplication or disclosure of this document or any of the information contained herein is subject to the restriction on the title page of this document.”

102 28 March 2019

MITRE ATT&CK. Some of the other features are:
• Powered by the AttackIQ research team and industry security leaders
• Customize the attack scenario to mimic real-world threats
• Automate the attacks and receive continuous security status reports
• Lightweight agents
• Works on primary operating systems and integrates well with existing infrastructure” [Kumar-2018]

Caldera (Massachusetts; 2017(?); a MITRE research project)
“An adversary emulation tool. CALDERA supports only Windows Domain networks. It leverages the ATT&CK model to test and replicate the behavior.” [Kumar-2018]

Circumventive (headquarters, founding, and funding unknown)
“Circumventive has its DNA in DLP 57 "leak testing," but has recently expanded to a framework for mostly network-based threat simulation.” [Barros-2018]

Cronus Cyber Technologies (Haifa, Israel; 2014; $5.7 M)
"CyBot is a next-generation vulnerability management tool as well as the world’s first automated pen testing solution, that continuously showcases validated, global, multi-vector, Attack Path Scenarios (APS), so you can focus your time and resources on those vulnerabilities that threaten your critical assets and business processes." [Harvey-2018b]

CyCognito (Palo Alto, Calif.; 2017; funding unknown)
"CyCognito’s SaaS platform continuously simulates sophisticated attackers' actual reconnaissance and examination processes across live infrastructure and network assets to provide comprehensive attack surface analysis in real-time." [Harvey-2018b]

Cymulate (Rishon Le Zion, Israel; 2016; $3 M)
"Cymulate comprehensively identifies the security gaps in your infrastructure and provides actionable insights for proper remediation." [Harvey-2018b]

Foreseeti (Stockholm, Sweden; 2015(?); funding unknown)
“securiCAD by foreseeti lets you virtually attack your infrastructure to assess and manage the risk exposure. It works in three simple concepts.
• Create a model – add what all (server, router, firewall, services, etc.) you want to test
• Simulate an attack – to find out if and when

57 DLP (Data Loss Prevention)


your system breaks
• Risk report – based on simulation data, the actionable report will be generated which you can implement to lower the overall risk” [Kumar-2018]

Guardicore, Infection Monkey (Tel Aviv, Israel; 2013; $46 M)
“Infection Monkey is an open source tool which can be installed on Windows, Debian, and . You can run an automatic attack simulation for credential theft, misconfiguration, compromised assets, etc. Some of the worth mentioning features:
• Non-intrusive attack simulation, so it doesn’t impact your network operations
• Comprehensive audit report with actionable recommendations to harden the web servers or other infrastructure
• Low CPU and memory footprint
• Visualize network and attacker map” [Kumar-2018]

NeSSi2 (Distributed Artificial Intelligence Laboratory, part of Deutsche Telekom Innovation Laboratories (T-Labs); founding and funding unknown)
“NeSSi2 is open-source, powered by the JIAC framework. NeSSi stands for Network Security Simulator, so you can guess what it does. It focuses mainly on testing intrusion detection algorithms, network analysis, profile-based automated attacks, etc.

It requires Java SE 7 and MySQL to set up and run.” [Kumar-2018]

Pcysys (Petah Tikva, Israel; founding and funding unknown)
“Pcysys is another option that performs real exploitation as part of its attack simulation. Pcysys is an agentless solution and its tests will automatically identify new targets for exploitation and leverage obtained credentials and privileges as part of the attack.” [Barros-2018]

Picus Security (San Francisco, Calif.; 2013; $200,000)
"Independent from any vendor or technology, the unparalleled Picus Platform is designed to continuously measure the effectiveness of security defenses by using emerging threat samples in production environments." [Harvey-2018b]

SafeBreach (Sunnyvale, Calif.; 2014; $34 M)
"Our unique software platform simulates adversary breach methods across the entire kill chain, without impacting users or your infrastructure." [Harvey-2018b]

Scythe (Arlington, VA; 2016(?); $3 M)
“The Scythe platform got a powerful and easy to use workflow to create and launch a real-world cyber


threat campaign. With the help of data, you can analyze your security endpoints in real-time.

Scythe is offered as a SaaS model or on-premises. Whether you are a red, blue or purple team – it fits all.” [Kumar-2018]

Threatcare (Austin, Texas; 2014; $2.1 M)
"We help build, measure, and maintain your cybersecurity through powerful breach and attack simulations so you can scale your business." [Harvey-2018b]

“Threatcare is an efficient tool to validate controls, identify potential threats and provide insights in minutes.

It is a cross-platform application which can be installed on Windows or Linux, in a cloud or on-premises environment. Threatcare mimics how a hacker attacks your infrastructure, so you know how security controls are positioned. Some of the features you might be interested in:
• MITRE ATT&CK playbooks with all 11 tactics
• You can create a custom playbook
• Actionable reporting
• Scalable to meet business growth demands” [Kumar-2018]

Verodin (McLean, VA; 2014; $33.1 M)
"Verodin is a business platform that provides organizations with the evidence needed to measure, manage and improve their cybersecurity effectiveness." [Harvey-2018b]

WhiteHaX (Silicon Valley; founding and funding unknown)
"WhiteHaX is a unique multi-appliance, pre & post-infiltration, security-breach simulation platform to allow IT security teams to test the effectiveness of their already deployed security infrastructure." [Harvey-2018b]

XM Cyber (Herzliya, Israel; 2016; $32 M)
"XM Cyber provides the first fully automated APT Simulation Platform to continuously expose attack vectors, above and below the surface, from breach point to any organizational critical asset." [Harvey-2018b]

“XM Cyber offers an automated advanced persistent threat (APT) simulation solution. Stay ahead of the attacker.


You can select the target, set up ongoing attacks, and receive a prioritized remediation report. Some highlights about the tool:
• Customize the attack scenario based on needs
• Visualize the attack path
• Up-to-date attack methods
• Best practices and policy recommendations” [Kumar-2018]

A.5.1 [SafeBreach-2017] A Breach and Attack Simulation Solution

SafeBreach is described by Barros in [Barros-2018] as “one of the oldest BAS vendors, with a sizable client base and a mature, well-balanced set of functions.” To get ahead of the breach, SafeBreach suggests adopting a three-pronged, proactive, and predictive security strategy:

• Think like a hacker along the lines of the cyber kill chain
• Challenge your defenses, which include people, process, and technology
• Continuously test to ensure any environment changes do not adversely affect the security posture

As with all BAS solutions, SafeBreach is not exactly the same as penetration testing, security red teaming, or vulnerability scans. They suggest having a BAS platform with the following features [SafeBreach-2017]:
“
• Uses a real hacker playbook 58
  o Your automated breach simulation solution should use a black-box approach (no prior knowledge of the environment is required) and incorporate a comprehensive “hacker’s playbook” of threat types including brute force, exploits, malware, and remote access tools.
• Provides continuous simulations … without impacting your environment or creating false positives.
• Runs simulations in a real production environment
• Simulates across the entire kill chain
  o By validating your defenses across the entire kill chain, you can determine your organization’s strengths and weaknesses and decide on the most effective way to stop a breach. For comprehensive visibility into risks across the entire organization, the simulation platform must support endpoints, network, and cloud.
• Works with other tools
  o Your platform must work with the other required capabilities for preventing, detecting, and responding to threats. For example, when a breach is simulated, your platform must integrate with ticketing systems to create a task for the blue

58 The 2nd Edition of SafeBreach Hacker’s Playbook can be downloaded from https://safebreach.com/Hackers-Playbook-2nd-Edition?aliId=3331038


team, or trigger automated remediation policies in automation and orchestration systems.”
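The loop these features describe, replaying a playbook of breach methods and recording which ones the deployed controls stop, can be sketched as follows; the method names and data layout are hypothetical illustrations, not SafeBreach's actual playbook or API:

```python
# Minimal sketch of a breach-and-attack simulation loop (hypothetical names,
# not any vendor's API): each simulated method is exercised against the
# controls and the outcome recorded, without compromising the target.
PLAYBOOK = [
    {"method": "brute-force", "stage": "infiltration"},
    {"method": "malware-drop", "stage": "infiltration"},
    {"method": "lateral-move", "stage": "lateral-movement"},
    {"method": "exfil-dns", "stage": "exfiltration"},
]

def run_simulation(playbook, blocked_methods):
    """Simulate each method; a real BAS tool would exercise actual controls."""
    results = []
    for step in playbook:
        outcome = "blocked" if step["method"] in blocked_methods else "missed"
        results.append({**step, "outcome": outcome})
    return results

results = run_simulation(PLAYBOOK, blocked_methods={"brute-force", "exfil-dns"})
missed = [r["method"] for r in results if r["outcome"] == "missed"]
print(missed)  # ['malware-drop', 'lateral-move']
```

The "missed" entries are the actionable output: the kill-chain stages where defenses failed to stop a simulated method.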

On such a platform, SafeBreach “simulates hacker breach methods to find possible breach scenarios in your environment and then recommends fixes based on what is found. With SafeBreach, you can quantify actual risks of breach, validate whether your security controls are effective, and empower your security operations teams with actionable insight.”

SafeBreach [SafeBreach-2017] also improves security by doing the following:
“
• Validates the efficacy of one’s security defenses
• Shortens exposure time
• Proactively understands the impact of a breach
• Ensures compliance with regulatory mandates
• Empowers red team members
“


A.6. Cloud Security

This section describes the cost-effectiveness of various strategies in addressing cloud security.

A.6.1 [Zhang-2014] Exploiting and Resolving Cloud Vulnerability

In [Zhang-2014], the authors “first identify the threat of exploiting prevalent vulnerabilities over public IaaS (Infrastructure as a Service) cloud with an empirical study in Amazon EC2.” From their qualitative cost-effectiveness analysis of this threat, they found that “exploiting prevalent vulnerabilities is much more cost-effective than traditional in-house computing environment,” giving attackers a stronger incentive to do so. However, “cloud defenders (cloud providers and customers) also have much lower cost-loss ratio than in traditional environment,” giving them a potentially more effective defensive position. Their risk-gain analysis with a game-theoretic model “indicates that under cloud environment, both attack and defense become less cost-effective as time goes by, and the earlier actioner can be more rewarding.” They then “propose countermeasures against such threat in order to bridge the gap between current security situation and defending mechanisms.”
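The qualitative argument can be made concrete with a toy cost-effectiveness ratio; the numbers below are illustrative assumptions, not figures from [Zhang-2014]:

```python
# Toy illustration of the cost-effectiveness argument (illustrative numbers,
# not from [Zhang-2014]): effectiveness is expected gain per unit cost.
def cost_effectiveness(gain, cost):
    return gain / cost

# One exploit can be replayed across many identical cloud VM images, so the
# attacker's per-target cost drops sharply compared with in-house targets...
in_house_attack = cost_effectiveness(gain=100.0, cost=80.0)
cloud_attack = cost_effectiveness(gain=100.0, cost=20.0)

# ...but a single provider-side patch also protects every tenant at once,
# lowering the defender's cost-loss ratio as well.
cloud_defense_cost_loss = 5.0 / 100.0  # patch cost vs. averted loss

assert cloud_attack > in_house_attack
assert cloud_defense_cost_loss < 1.0
```

Both asserts hold: the same gain at a quarter of the cost makes the cloud attack more cost-effective, while the defender's cost-loss ratio stays well below one.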


A.7. Data Generation

In cyber simulation, various kinds of data are needed. Here, we describe the generation of attack sequence data [Krieder-2015] and intrusion data [Kuhl-2007].

A.7.1 [Krieder-2015] Specifying Attack Sequence

In [Krieder-2015], the author introduces an Attack Guidance Template (AGT) for defining simulated cyberattack sequences and for guiding the attacker according to the goal of the attack plan. The template allows for exploring the likelihood of a simulated attack sequence and comparing it to a real attack sequence. The simulator, the Cyber Attack Scenario and network Defense Simulator (CASCADES), is based on MASS (Multistage Attack Scenario Simulator [Moskal-2014]).
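An AGT-like structure might look like the following sketch; the field names and stage/action values are assumptions for illustration, not the actual AGT schema from [Krieder-2015]:

```python
# Sketch of an attack-guidance-template-like structure (field names are
# illustrative, not the AGT schema): each stage constrains which actions the
# simulated attacker may take on the way to the goal.
template = {
    "goal": "exfiltrate-database",
    "stages": [
        {"name": "reconnaissance", "allowed": ["port-scan", "dns-enum"]},
        {"name": "exploitation",   "allowed": ["sql-injection"]},
        {"name": "exfiltration",   "allowed": ["https-tunnel"]},
    ],
}

def guided_sequence(template, choose):
    """Build one attack sequence by choosing an allowed action per stage."""
    return [choose(stage["allowed"]) for stage in template["stages"]]

seq = guided_sequence(template, choose=lambda actions: actions[0])
print(seq)  # ['port-scan', 'sql-injection', 'https-tunnel']
```

Swapping in a randomized or likelihood-weighted `choose` function would generate a population of guided sequences for comparison against real attack traces.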

A.7.2 [Kuhl-2007] Modeling and Simulating Cyber Attack

The authors in [Kuhl-2007] present a computer and intrusion detection system (IDS) modeling solution that simulates cyberattack scenarios (with cyberattack modeling based on [Sudit-2005]) causing alerts to be raised. These alerts, the outcome of the simulation, are the IDS response to both malicious cyberattacks and user-specifiable non-malicious network activity, and they are to be used to test and evaluate cyber security systems. The simulation model is implemented using the ARENA simulation software [Kelton-2004].
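The core idea, an alert stream mixing true detections of attack steps with false alarms on benign traffic, can be sketched as follows; the detection and false-alarm rates are illustrative assumptions, not parameters from [Kuhl-2007]:

```python
import random

# Sketch of Kuhl-style alert generation (rates are illustrative): both attack
# steps and benign traffic can raise IDS alerts, so the generated stream
# mixes true and false positives for evaluating downstream tools.
def generate_alerts(events, p_detect=0.9, p_false_alarm=0.05, seed=1):
    rng = random.Random(seed)
    alerts = []
    for event in events:
        if event["malicious"] and rng.random() < p_detect:
            alerts.append({"event": event["name"], "true_positive": True})
        elif not event["malicious"] and rng.random() < p_false_alarm:
            alerts.append({"event": event["name"], "true_positive": False})
    return alerts

# 100 network events, every tenth one malicious.
events = [{"name": f"conn-{i}", "malicious": i % 10 == 0} for i in range(100)]
alerts = generate_alerts(events)
```

With a perfect detector (`p_detect=1.0`, `p_false_alarm=0.0`) the output reduces to exactly the ten malicious events, which is a useful sanity check.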


A.8. Defense Assessment

In this section, we describe the assessment of security posture using simulation in various cyber security contexts. [Acosta-2017] presents a method for developing models for analyzing risk, while [Baiardi-2016] presents a suite of tools for assessing and managing risk. The authors in [Bergin-2015] and [Xue-2014] both consider the secure operation of autonomous vehicles: denial of service is the key issue in [Bergin-2015], while protecting the network dynamics from being discovered is that of [Xue-2014]. [Hwang-2015] summarizes the state of the art in cyber assessment technologies and methodologies. In [Marshall-2012], the authors propose a capability to assess potential mission command applications. For the authors in [Moskal-2018], the concern is how well different network configurations are secured against various attack behaviors. [Probst-2016] presents a graph-based approach to risk assessment. In [Smith-2017], the authors describe a rapid digital assessment framework for measuring “progress towards mission accomplishment through test and evaluation in the mission context.” In [Thompson-2018], the authors consider the spread of malware in mobile tactical networks. In [Wagner-2017a], the authors examine and quantify network segmentation and communication isolation as cyber defensive measures.

A.8.1 [Acosta-2017] Risk Analysis with Execution-Based Model Generation

The authors in [Acosta-2017] describe “a method for developing models that can be used to analyze risk in mixed tactical and strategic networks, which are common in the military domain.” The models are created from “data collected from several black-box emulation executions” and appear as “decision trees or other complex algorithm and formulas.” The authors claim to be able to generalize the developed models across scenarios and to use them in assessing risk in attack graphs. The paper describes in detail “the process for creating models to predict traffic hijacking,” which “consists of three steps: collecting a dataset, designing a network description scheme, and building the prediction models (or classifiers).”
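The third step can be illustrated with a single decision stump trained on synthetic execution data; the feature name and values are assumptions for illustration, not data or models from [Acosta-2017]:

```python
# Minimal stand-in for "building the prediction models": a one-feature
# decision stump fit to synthetic "black-box emulation" results (the feature
# and values are illustrative, not from [Acosta-2017]).
def train_stump(samples, feature):
    """Pick the threshold on one feature that best separates hijacked runs."""
    best = (None, -1)
    for s in samples:
        t = s[feature]
        correct = sum((x[feature] >= t) == x["hijacked"] for x in samples)
        if correct > best[1]:
            best = (t, correct)
    return best[0]

# Synthetic runs: those with more rogue route announcements were hijacked.
data = [
    {"rogue_announcements": 0, "hijacked": False},
    {"rogue_announcements": 1, "hijacked": False},
    {"rogue_announcements": 5, "hijacked": True},
    {"rogue_announcements": 7, "hijacked": True},
]
threshold = train_stump(data, "rogue_announcements")
predict = lambda x: x >= threshold
print(threshold)  # 5
```

A full decision tree would recurse on such stumps over many features, but the fit-then-predict shape is the same.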

A.8.2 [Baiardi-2016] Assessing and Managing Risk

The authors in [Baiardi-2016] describe Haruspex, a suite of tools for assessing and managing the risk posed by information and communication technology systems. The tools allow for the following:
o Description of the target system, including vulnerabilities, and the attacking agents, including different attack attributes, with a builder that “considers how elementary attacks can be composed into chains”
o Application of the Monte Carlo method
o Selection of countermeasures determined by the planner and manager of the suite

The builder, planner, and manager are similar to those of the CALDERA attack planning. However, in Haruspex, the builder is for the attacking agent, while the planner and manager are for the defending agent. The suite also has a simulation engine to execute “an experiment with several independent runs.”

“To validate the suite and verify it truthfully models attackers, it has been adopted in Locked Shield 2014, a network defense exercise with participants from 17 nations. The results of this exercise validate the designs of the tools.”
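The Monte Carlo step over independent runs can be sketched as follows; the per-step success probabilities are illustrative assumptions, not Haruspex's attacker models:

```python
import random

# Sketch of the Monte Carlo step (success probabilities are illustrative, not
# Haruspex's models): estimate the chance an attack chain succeeds by running
# many independent experiments over its elementary attacks.
def chain_success_prob(step_probs, runs=10_000, seed=0):
    rng = random.Random(seed)
    wins = sum(all(rng.random() < p for p in step_probs) for _ in range(runs))
    return wins / runs

# A three-step chain: every elementary attack must succeed in sequence,
# so the true probability is 0.9 * 0.5 * 0.8 = 0.36.
est = chain_success_prob([0.9, 0.5, 0.8])
print(round(est, 2))  # close to 0.36
```

With 10,000 runs the estimate is typically within about a percentage point of the analytic value; real attack chains with conditional dependencies are exactly where the sampling approach pays off over closed-form computation.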


This article will be useful in understanding the different cyber simulation tool components needed in simulating attack chains.

A.8.3 [Bergin-2015] Cyber-Attack and Defense Simulation Framework

The author in [Bergin-2015] describes a notional framework to support modeling and simulation of cyber-attack and defense for training and assessment. It is mainly for the cybersecurity of military autonomous vehicles (MAVs) that must communicate securely over wireless networks. Some of the framework requirements are:
• wireless propagation models
• support for both simulation and emulation components
• packet-level simulation
• the ability to plug in various network, cyber-attack and cyber defense models
• flexible means for parameter configuration and modification

The author also describes how the data model, data storage, data transfer, and data import/conversion are to be implemented. The author then illustrates how a DDoS (Distributed Denial of Service) attack would be simulated and what data would be available for analysis, such as the numbers of bytes and packets sent and received, the number of dropped packets, delay, jitter, and propagation times.
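The listed analysis data can be reduced from per-packet timestamps as sketched below; the timing values are illustrative, not from [Bergin-2015]:

```python
# Sketch of post-simulation analysis: reduce per-packet send/receive times
# from a simulated flood to the delay, jitter, and drop statistics the
# framework exposes (values are illustrative).
def summarize(send_times, recv_times):
    """recv_times[i] is None when packet i was dropped."""
    delays = [r - s for s, r in zip(send_times, recv_times) if r is not None]
    jitter = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return {
        "sent": len(send_times),
        "received": len(delays),
        "dropped": len(send_times) - len(delays),
        "mean_delay": sum(delays) / len(delays),
        "mean_jitter": sum(jitter) / len(jitter) if jitter else 0.0,
    }

stats = summarize(
    send_times=[0.0, 1.0, 2.0, 3.0],
    recv_times=[0.2, 1.5, None, 3.3],  # third packet dropped by the flood
)
print(stats["dropped"], round(stats["mean_jitter"], 2))  # 1 0.25
```

Jitter here is taken as the mean absolute difference between consecutive delays, one common simplification; standards-grade definitions (e.g. the RTP interarrival jitter) use a smoothed estimator instead.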

A.8.4 [Hwang-2015] Operational Exercise Integration Recommendations

In [Hwang-2015], the authors summarize “the state of the art in cyber assessment technologies and methodologies” and prescribe “an approach to the employment of cyber range operational exercises (OPEXs).” They emphasize that an assessment must have defined objectives within context and a set of metrics “to characterize the capabilities, performance, risk, or security …” Next, they consider threat-based cyber assessment, which must address “both malicious and unintentional perturbations from operational objectives” with security goals, threat models, defense capabilities, and metrics.

For cyber ranges, the authors provide “background on the definition, purpose, use cases, and description of cyber ranges” and describe how to design a cyber range, detailing the following aspects:
o Design mantras
o Support tools
o Data generation
o Instrumentation
o Configuration management
o Orchestration
o Analytics

This article will be useful for understanding the different cyber components that are needed to design a cyber range.


A.8.5 [Marshall-2012] Distributed Cyber Operations Development

In [Marshall-2012], the authors describe the integration of a Cyber Ops modeling and simulation capability into an HLA (High-Level Architecture) to perform distributed simulation events and experimentation, providing “a model for Computer Network Attack (CNA), Computer Network Defense (CND), and Computer Network Exploitation (CNE) traffic and its effects on military operations.” “This capability could be used to assess possible Mission Command applications, new Tactics, Techniques, and Procedures (TTPs), network management technology, and help highlight the critical nature of cyber operations for our military.”

A.8.6 [Moskal-2018] Cyber Threat Assessment

The authors in [Moskal-2018] propose a method of analyzing a network’s security under cyberattack. To this end, the authors “present a method to measure the overall network security against cyber attackers under different scenarios.” The method allows for modeling the attacker behavior and setting the network configurations. The behavior consists of the attacker’s capability with respect to exposures, opportunity in the different stages of the kill chain, and intent regarding which exposures to breach (together, COI), along with knowledge of the network and the probabilities of attack actions. Together with the attacker’s preference, an attack action is selected.

“To conduct these experiments, the Cyber Attack Scenario and Network Defense Simulator (CASCADES) is used to model and simulate a variety of cyberattack scenarios. The network, the attacker, and the intent context models … are represented in CASCADES as the VT (Virtual Terrain), ABM (Attacker Behavior Model), and Adversary Intent Module (AIM), respectively.”

The authors are able to model attackers of different abilities (expert, amateur, comprehensive, and random) and show the differences in their attack behaviors, in terms of the time required, the number of steps needed, and the choice of steps taken, against two configurations of a network. A more in-depth description of the solution in [Moskal-2018] is available in Appendix D.
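Behavior-driven action selection of this kind can be sketched as follows; the action names, difficulty scale, and weights are illustrative assumptions, not the CASCADES attacker behavior model:

```python
import random

# Sketch of capability/opportunity/intent-driven action selection (weights
# are illustrative, not the CASCADES ABM): the attacker filters actions by
# capability, then picks among the feasible ones weighted by intent and
# opportunity.
def select_action(actions, skill, rng):
    feasible = [a for a in actions if a["difficulty"] <= skill]
    if not feasible:
        return None
    weights = [a["intent_weight"] * a["opportunity"] for a in feasible]
    return rng.choices(feasible, weights=weights, k=1)[0]

actions = [
    {"name": "phish",       "difficulty": 1, "opportunity": 0.9, "intent_weight": 0.5},
    {"name": "zero-day",    "difficulty": 5, "opportunity": 0.2, "intent_weight": 1.0},
    {"name": "brute-force", "difficulty": 2, "opportunity": 0.6, "intent_weight": 0.3},
]

rng = random.Random(0)
amateur = select_action(actions, skill=1, rng=rng)  # only 'phish' is feasible
expert = select_action(actions, skill=5, rng=rng)   # may pick any action
print(amateur["name"])  # phish
```

Varying the skill cutoff alone already reproduces the qualitative expert/amateur distinction: lower-skill attackers have fewer feasible steps, so their attack sequences are longer and more constrained.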

A.8.7 [Probst-2016] The Attack Navigator

In [Probst-2016], the authors propose “a graph-based approach to security risk assessment inspired by navigation systems,” called the Attack Navigator (AN), which is being developed under the TRESPASS project. Based on the interdependencies of the operating infrastructure, which has both social and technical aspects, the attack navigator identifies goal-terminating attacker paths based on attacker properties such as skill or resources and on attacker profiles. This allows “defenders to explore attack scenarios and the effectiveness of defense alternatives under different threat conditions.”

The approach is very similar to listing goal-terminating paths in an attack graph. However, in this article, only attack trees are considered; using attack-defence trees is left for future work.
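Navigator-style path enumeration filtered by an attacker profile can be sketched as follows; the graph and skill values are illustrative assumptions, not the TRESPASS models:

```python
# Sketch of navigator-style path search (graph and skill values illustrative):
# enumerate goal-terminating attacker paths, keeping only those every edge of
# which is within the attacker profile's skill.
def attack_paths(graph, start, goal, skill, path=None):
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt, required_skill in graph.get(start, []):
        if required_skill <= skill and nxt not in path:
            yield from attack_paths(graph, nxt, goal, skill, path)

# Edges: (next node, skill required to traverse).
graph = {
    "internet": [("webserver", 2), ("vpn", 4)],
    "webserver": [("database", 3)],
    "vpn": [("database", 1)],
}

novice = list(attack_paths(graph, "internet", "database", skill=3))
expert = list(attack_paths(graph, "internet", "database", skill=5))
print(len(novice), len(expert))  # 1 2
```

Raising the skill parameter opens the VPN route, which is exactly the kind of what-if comparison across attacker profiles that the navigator metaphor describes.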


A.8.8 [Smith-2017] Digital Warfighting for Systems Engineering

In [Smith-2017], the authors argue that “The hardest thing for U.S. adversaries to duplicate is the integration of advanced technologies with skilled soldiers and well-trained teams.” Therefore, developing Early Synthetic Prototyping (ESP) is going “to help the DoD achieve an enduring time-domain overmatch even if U.S. adversaries achieve technical parity.” ESP is a rapid digital assessment framework for measuring “progress towards mission accomplishment through test and evaluation in the mission context.” Feedback and recommendations from soldiers using ESP will help refine force employment, force design, and equipment capability deployment.

A.8.9 [Thompson-2018] Agent-Based Cybersecurity Modeling Framework

In [Thompson-2018], the authors “present an agent-based modeling framework to study malware spread in mobile tactical networks.” Their framework is particularly suited to modeling “hierarchical command structure, unit movement, communication over short-range radio, self-propagating malware, and cyber defense mechanisms.” The authors were able to evaluate how different cyber defense and compliance policies affect the spread of malware in a network with complex mobile hierarchical and spatial structures.

A.8.10 [Wagner-2017a] Quantifying Mission Impact

In [Wagner-2017a], the authors propose “to examine network-level cyber defensive mitigations and quantify their impact on network security and mission performance.” As “testing such mitigations in a live network environment is generally not possible due to the expense,” they use a modeling and simulation approach that “employs a modularized hierarchical simulation framework to model a complete cyber system and its relevant dynamics at multiple scales.”

The defensive mitigations used in their investigation, designed to protect the network at large from cyberattack, consist of:
• Segregation of Networks and Functions (SNF), to partition a network to control network reachability; and
• Limiting Workstation-to-Workstation Communication (LWC), to control reachability through communicability.

“The purpose is to provide a quantitative assessment of these network-level mitigations in the context of a complete network system and to consider mitigation effectiveness with respect to two fundamental network concerns, security and mission impact.”

A.8.11 [Xue-2014] Security Concepts for Autonomous Vehicle Networks

In [Xue-2014], the authors examine the secure operation of autonomous vehicle networks (modeled with the canonical double-integrator network) in the presence of adversarial observation. “Starting from an observability analysis of the closed-loop dynamics, several spectral and graphical characterizations of these security notions are obtained (for both the noise-free and noisy cases), that indicate the role of the network’s communication and control architecture in protecting the network dynamics from discovery.”

A.9. Defense by Confusion, Deception, Evasion, and Concealment


In cyber security, confusion, deception, evasion, and concealment techniques are a form of defense. In this section, the deception technique described in [Durkota-2015a, Durkota-2015b, and Durkota-2016] considers optimally allocating honeypots, [Ferguson-2018] uses control of attacker perception, and [Jajodia-2017] uses probabilistic-logic-based deception techniques. Appendix A.9.4 covers articles related to moving target defense.

A.9.1 [Durkota-2015a, Durkota-2015b, Durkota-2016] Deceiving with Honeypots

In [Durkota-2016], the authors use game theory to provide a methodology to deceive attackers that adapt their behavior to changing security measures. Specifically, it addresses how to optimally place honeypots in a network in terms of allocation cost and expected loss due to an attacker compromising the network. Note that the solution in [Durkota-2016] is described in depth in Appendix C.

An attack graph is used to represent a network hardened with honeypots. To circumvent the computational challenge of solving the problem in its compact representation, the attack graph is translated to a Markov Decision Process (MDP) and then solved using policy search with a set of pruning techniques [Durkota-2015b]. Moreover, the authors in [Durkota-2015a] propose “a set of approximate solution methods and analyze the trade-off between the computation time and the quality of the strategies calculated.”

Their “results show strengths and weaknesses of both the theoretical and human solutions: humans were effective at defending against human attackers, but fared poorly against worst-case opponents. The game-theoretic strategies are robust, but there are opportunities to further exploit weaknesses in human opponents.”
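The cost-versus-loss trade-off at the heart of the allocation problem can be shown with a deliberately simplified model; the values below are illustrative and this is not the game-theoretic formulation of [Durkota-2016]:

```python
# Toy version of the honeypot allocation trade-off (numbers illustrative, not
# [Durkota-2016]'s game model): decoys mimicking a valuable host dilute the
# attacker's chance of hitting the one real instance, at a per-decoy cost.
def expected_cost(host_value, n_decoys, decoy_cost):
    p_hit_real = 1.0 / (1 + n_decoys)  # attacker picks a copy uniformly
    return p_hit_real * host_value + decoy_cost * n_decoys

def optimal_decoys(host_value, decoy_cost, max_decoys=10):
    return min(range(max_decoys + 1),
               key=lambda k: expected_cost(host_value, k, decoy_cost))

k = optimal_decoys(host_value=10.0, decoy_cost=1.0)
print(k)  # 2: a third decoy costs more than the extra loss it prevents
```

Even this toy model exhibits the paper's central tension: more honeypots lower expected loss but raise allocation cost, so there is an interior optimum rather than "more is always better."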

A.9.2 [Ferguson-2018] Deception for Cyber Defense The authors in [Ferguson-2018] propose a “model of defensive cyber deception that incorporates defender control of attacker perception of the cyber environment”. They take a step beyond just optimally placing decoys to deceive attackers as described in [Durkota-2016]. Assuming the defender is now aware of the attacker’s presence, they model how to manipulate the true payoffs of the attackers and control the attackers’ moves through active deceptive responses.

A.9.3 [Jajodia-2017] Cyber Deception with Probabilistic Logic

In [Jajodia-2017], the authors propose a probabilistic-logic-based deception technique that works by mixing true and false responses to scan requests. Having modeled the attacker’s behavior and the effects of faked scan results, they “show how the defender can generate fake scan results in different states that minimize the damage the attacker can produce.” They then develop two algorithms to generate such results. One of them is shown to be more efficient at run time than the other because it pre-stores generated results.
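The defender's choice can be sketched as a minimization over candidate scan responses; the probabilities and damage values are illustrative assumptions, not the probabilistic-logic machinery of [Jajodia-2017]:

```python
# Sketch of the defender's choice in a [Jajodia-2017]-style deception scheme
# (probabilities and damages illustrative): among candidate scan responses,
# pick the one minimizing the expected damage of the attack it induces on
# the real system.
def best_response(candidates):
    """candidates: scan response -> {attack: (probability, damage)}."""
    def expected_damage(response):
        return sum(p * d for p, d in candidates[response].values())
    return min(candidates, key=expected_damage)

candidates = {
    # A truthful banner leads the attacker to the working apache exploit.
    "truthful":    {"exploit-apache": (0.9, 3.0), "give-up": (0.1, 0.0)},
    # A faked IIS banner induces an IIS exploit that fails on the apache host.
    "fake-as-iis": {"exploit-iis": (0.7, 0.0), "give-up": (0.3, 0.0)},
}
print(best_response(candidates))  # fake-as-iis
```

The pre-computation the paper uses for efficiency corresponds to evaluating and caching `best_response` for each reachable state ahead of time instead of at scan time.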


A.9.4 Moving Target Defense

This section consists of articles related to moving target defense solutions. They are:

o [Lei-2018] describes the concepts of a moving attack surface and a moving exploration surface
o [Vorobeychik-2013] describes making operational decisions in a generally randomized fashion
o [Wheeler-2014] changes the network configuration over time to fool intruders
o [Zhuang-2012] investigates the effectiveness of proactively changing network configuration
o [Zhuang-2015] presents the theory and concepts related to moving target defense systems

A.9.4.1 [Lei-2018] Incomplete Information Markov Game Theoretic Approach

In [Lei-2018], the authors propose an “incomplete information Markov game theoretic approach to strategy generation for moving target defense … called IIMG-MTD.” They introduce the concept of a moving attack surface, which the defender needs to protect, and the concept of a moving exploration surface, which the attacker needs to explore to infiltrate. “Markov decision process is used to describe the transition among network states in the process of MTD (Moving Target Defense) implementation.” Their proposed optimal defense strategy takes into account “hopping elements, hopping period, and hopping method …” in addition to defense cost.

A.9.4.2 [Vorobeychik-2013] Operational Decisions in Adversarial Environments

See Appendix A.1.1 for more information.

A.9.4.3 [Wheeler-2014] Evaluating Moving Target Network Defense Mechanisms

In [Wheeler-2014], an MSc thesis, the author notes that “moving target network defense mechanisms aim to protect networks by modifying their attributes in order to confuse would-be attackers.” To this end, the author develops the Dynamic Virtual Terrain (DVT) to capture different moving target network defense mechanisms under identical conditions so that they can be comparatively analyzed. These mechanisms include “IP address hopping, port hopping, and dynamic firewall mechanisms in a cyberattack simulation environment.” The simulation of DVT is performed in CASCADES, a cyberattack simulator.

A time-driven and an event-driven mechanism are used to update the DVT.

A.9.4.4 [Zhuang-2012] Effectiveness Study of Moving Target Network Defense

In [Zhuang-2012], the authors investigate the effectiveness of proactively changing a network’s configuration in thwarting attackers’ attempts and thus improving the resiliency of the attacked system. They first present “a basic design schema of a moving-target network defense system.” With adaptations “limited to VM 59 refreshing, and all VMs assigned to a given role have the

59 Virtual Machine


same configuration except for its ID 60 and IP address”, they are able to show that the success ratio of attacks decreases with increasing frequency of adaptation. Other adaptation techniques and corresponding control parameters are yet to be investigated.
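The reported trend, attack success falling as adaptation becomes more frequent, can be reproduced qualitatively with a toy simulation; the parameters are illustrative assumptions, not the experimental setup of [Zhuang-2012]:

```python
import random

# Toy reproduction of the [Zhuang-2012] trend (parameters illustrative): an
# attack needs `steps` consecutive probes against the same VM; a refresh
# resets its progress, so frequent adaptation lowers the success ratio.
def success_ratio(refresh_every, steps=5, trials=2000, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        progress, t = 0, rng.randrange(refresh_every)  # random attack phase
        while progress < steps and t < 100:
            t += 1
            progress = 0 if t % refresh_every == 0 else progress + 1
        wins += progress >= steps
    return wins / trials

slow = success_ratio(refresh_every=50)  # leisurely adaptation
fast = success_ratio(refresh_every=4)   # refresh before the attack completes
print(fast < slow)  # True
```

At a refresh interval shorter than the attack itself the success ratio collapses to zero, which is the limiting case of the paper's observed decrease.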

A.9.4.5 [Zhuang-2015] Analyzing Moving Target Defense Systems

In [Zhuang-2015], the authors present a theory of cyberattacks to help analyze moving target defense (MTD) systems. They summarize the theory of moving target defense, which is fully described in [Zhuang-2014].

They then “introduced a new concept called information parameters that supported the information-based definitions of target systems and attacker's knowledge.” The information parameters are then used “to define the concepts of attack type, attack instance, and exploration space.” These information parameters are basically what the attacker needs to launch a successful attack and what the defender needs to obfuscate to thwart the attacker’s attempts. “By reasoning over the relationship between an MTD system's configuration parameters and an attack's information parameters,” the authors believe “we can formally describe and analyze interactions between attackers and MTD systems.”

60 VM ID


A.10. Defense through Reinforcement Learning
This section describes various cyber simulation solutions that employ learning strategies. In [Ben-Asher-2015], instance-based learning theory is used to study how much security infrastructure investment should be made to adequately protect one’s assets. In [Chung-2016], Naïve Q-Learning, a strategy requiring only limited opponent information, is shown to be promising. In [Elderman-2017], Monte Carlo learning with Softmax exploration is shown to be the best among a number of competing learning solutions. Finally, in [Niu-2015], Q-learning is deployed in a cyber-physical simulation scenario. Note that these solutions are described in depth in Appendix B.

A.10.1 [Ben-Asher-2015] Understanding Challenges of Cyber War
In [Ben-Asher-2015], the authors “propose to study the decision making processes that drive the dynamics of cyber-war using multi-agent model comprising of cognitive agents that learn to make decisions according to Instance-Based Learning Theory (IBLT).” “IBLT [Gonzalez-2003] was originally developed to explain and predict learning and decision making in real-time, dynamic decision making environments. The theory has also been used as the basis for developing computational models that capture human decision making behavior in a wider variety of DFE (Decision from experience [Hertwig-2004]) tasks.”

Their CyberWar game works at the level of power and assets, where power represents the ability to defend one’s cyber security infrastructure and assets represent confidential information that a player is trying to protect or to obtain from another player. This game is inspired by the “n-Player model of social conflict over limited resources” as described in [Hazon-2011] and has been scaled up from the simple security game described in [Alpcan-2010] “to accommodate multiple players that can simultaneously decide to attack or not attack each other.”

The outcome of an attack is probabilistic and determined by the powers of the players involved.

Specifically, player i wins against player j with probability p_ij = pow_i / (pow_i + pow_j), where pow_i is the power of player i. When player i wins, it gains a portion of player j’s assets, a_ij. “Assets not only represent the motivation for a cyber-attack. Assets are also required to initiate an attack and to protect from incoming attacks, whereas losing all assets forces an agent to be suspended from the game for a specific duration.” The expected gain for player i attacking player j is g_ij = p_ij × a_ij − c_ij, where c_ij is the cost of player i attacking player j. For player j, the expected loss when attacked is −p_ij × a_ij.

“Since a player can attack another player and at the same time be attacked by several other players in any given round of the game, the calculations of gains and losses for each individual player are done sequentially and separately. First, losses from being attacked are calculated, then gains from winning an attack are calculated, and finally gains and losses are summed and the costs of attack and defense are deducted from the remaining assets of the agent.”
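The gain calculation above can be sketched as follows. This is a minimal illustration only; the asset portion and cost values in the usage example are placeholders, not parameters from [Ben-Asher-2015].

```python
def win_probability(pow_i: float, pow_j: float) -> float:
    """Probability that player i defeats player j: pow_i / (pow_i + pow_j)."""
    return pow_i / (pow_i + pow_j)

def expected_gain(pow_i: float, pow_j: float, asset_portion: float, cost: float) -> float:
    """Expected gain g_ij for player i attacking player j: win probability
    times the portion of j's assets at stake, minus the cost of attacking."""
    return win_probability(pow_i, pow_j) * asset_portion - cost

# Equal-power players: a 50% chance to win a 40-unit stake at a cost of 5.
g = expected_gain(500, 500, asset_portion=40.0, cost=5.0)
```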

“The CyberWar game and the cognitive agents were implemented using Netlogo [Wilensky-1999], a multi-agent simulation environment where each agent was an IBL model. Here, we used


three levels of power and three levels of assets. The power of an agent was determined by a random draw from a uniform distribution. Low power had the minimum value of 100 and a maximum value of 350. Similarly, medium power ranged between 351 and 700, and high power ranged between 701 and 999. The three levels of initial assets were low, medium, and high assets also corresponded to a random draw from a uniform distribution from the following intervals: 100–350, 351–700, and 701–999. Thus, our environment included 9 different agents, each with a unique combination of power and assets.”
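The agent population described in the quote above can be sketched directly: every combination of a power level and an asset level, each drawn uniformly from its quoted interval. The seed value is an arbitrary choice for reproducibility.

```python
import random
from itertools import product

# Power and asset levels as quoted: each value is a uniform draw from its interval.
LEVELS = {"low": (100, 350), "medium": (351, 700), "high": (701, 999)}

def make_agents(seed: int = 42) -> list:
    """Create the 9 agents: one per combination of power level and asset level."""
    rng = random.Random(seed)
    return [{"power_level": p, "asset_level": a,
             "power": rng.randint(*LEVELS[p]), "assets": rng.randint(*LEVELS[a])}
            for p, a in product(LEVELS, LEVELS)]

agents = make_agents()
```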

“When learning only from experience, instances are created by making a decision and observing its outcome. However, when making the first decision regarding other agents (i.e., a combination of power and assets), an agent had no previous experiences. Thus, to generate expectations from a decision and to elicit exploration of different opponents during the early stages of the interaction, we used prepopulated instances in memory which represents the agent’s expectations before having any actual experiences [Ben-Asher-2013a, Ben-Asher-2013b, and Lejarraga-2012]. The prepopulated instances correspond to the possible outcomes from attacking and not attacking other types of agents and contained a fixed value of 500. As the agent accumulated experiences, the activation of these prepopulated instances decayed overtime. Their decay leads to a decrease in the retrieval probability, which in turn lead to a decrease in the contribution of these values to the calculation of the blended value.”
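The decay mechanism in the quote can be illustrated with the ACT-R style base-level activation that IBL models conventionally use: an instance that is never rehearsed loses activation, and hence retrieval probability, as time passes. This is a sketch under that assumption; the decay parameter d = 0.5 is the conventional default, not necessarily the value used in [Ben-Asher-2015].

```python
import math

def base_level_activation(retrieval_times, now, d=0.5):
    """Base-level activation A = ln(sum over past retrievals of (now - t)^-d).
    Without rehearsal, activation decays as `now` grows, lowering the
    instance's retrieval probability and its weight in the blended value."""
    return math.log(sum((now - t) ** -d for t in retrieval_times))

# A prepopulated instance created at t = 0 and never rehearsed again:
early = base_level_activation([0], now=1)   # activation shortly after creation
late = base_level_activation([0], now=16)   # activation much later
```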

“Through a simulated war, we show that our approach can be applied to efficiently answer a wide range of questions regarding investment in cyber security infrastructure. In particular, we demonstrate how power (i.e., cyber security infrastructure) is the main determinant of the state of a unit during cyber-warfare. We also show how different levels of power lead to different dynamics in the ability to accumulate assets.”

In practice, then, knowing how much power one should have is crucial; but the required power is relative to that of the other players, which may not be practically knowable.

A.10.2 [Chung-2016] Game Theory with Learning for Monitoring
In [Chung-2016], the authors propose the use of Naïve Q-Learning, which uses limited opponent information, to respond to adversarial incursions into a network and so secure the system. While a Markov game assumes full information about the opponent, minimax Q-Learning does not; it instead uses the history, the actual gain of the current iteration, and the expected future reward to learn about the opponent’s behavior. The minimax Q-Learning algorithm uses a learning rate and an exploration rate, in addition to the discount rate used in the Markov game. Naïve Q-Learning is similar to minimax Q-Learning but does not take the opponent’s behavior explicitly into consideration.
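The distinction can be made concrete with a one-step update rule: in Naïve Q-Learning the defender updates its value estimate from its own observed reward alone, with no minimization over opponent actions. This is a schematic sketch; the state and action names are purely illustrative.

```python
def naive_q_update(Q, state, action, reward, next_state, actions,
                   alpha=0.1, gamma=0.9):
    """One Naive Q-Learning step.  Unlike minimax Q-Learning, the opponent is
    not modeled explicitly: its effect is folded into the observed reward and
    next state, and the update maximizes only over the defender's own actions."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q

# Hypothetical defender step: blocking after seeing a probe yields reward 1.
Q = naive_q_update({}, "probe-seen", "block", reward=1.0,
                   next_state="quiet", actions=["block", "monitor"])
```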

“From the experiments based on simulation, it was shown that Naive Q-Learning performs well against irrational (non-Markov) attackers, i.e., random decision-makers or attackers based on probing and learning. When played against a Markov attacker, the Naive Q-Learning approach was able to perform at least as well as a Minmax Q-Learning defender. … The simulation results show that despite the limited information on which decisions are based, our approach is promising compared to the traditional Markov game approach and Minmax Q-Learning.”


A.10.3 [Elderman-2017] Learning in Cyber Security Simulation
In [Elderman-2017], autonomous cyber operations in networks are modeled as a Markov game with incomplete information and stochastic elements. The game is played by an attacker and a defender in an adversarial sequential decision-making setting in which each side can be assisted by different reinforcement learning techniques. Monte Carlo learning with the Softmax exploration strategy is shown to be the most effective of the compared techniques (which include neural networks and Q-learning), both at performing the defender role and at learning attacking strategies.

To simplify the following discussion, we assume the network is represented as a graph with only one starting node, many intermediate nodes, and only one end node. For each node, there is an attacker view and a defender view. For the attacker, the view is a list of the different attacks that can be used on the node, (a_1, a_2, …, a_10), where 0 ≤ a_k ≤ 10 is the value of attack k. For the defender, it is the list of the different defenses that can be used, (d_1, d_2, …, d_10), where 0 ≤ d_k ≤ 10, together with a detection value det, where 0 ≤ det ≤ 10. These values represent the strength of attack, defense, and detection. Note that real attack, defense, and detection methods would have to be mapped onto these values.

Assuming the network topology is set, we simulate the game as follows:
• Set all values of the START node for both attacker and defender to zero.
• Set the other defender nodes, of INTERMEDIATE and END type, with security flaws, i.e., with some zero values and some non-zero values to reflect the defender’s strength.
• Set the other attacker nodes to certain values to reflect the attacker’s strength, which could initially be all zero.
• The attacker starts at the START node.
• The attacker and defender take turns playing the game.
• When it is the attacker’s turn, the attacker increments one of the attack values a_k on the current node being visited.
o If a_k > d_k, then /* attack is successful */
. If the current node is the END node, the attacker has won and the game is over.
. Else the attacker selects one of the neighboring nodes to be the current node and waits for the next simulation step. Only neighboring and already-traversed nodes are available because the attacker has limited knowledge of the network.
o Else /* attack fails */
. The attack is blocked.
. If det × 0.10 is greater than some preset threshold, the attack is considered detected, the defender has won, and the game ends.
. Else the game continues.
• When the defender’s turn comes, the defender can choose any node and either increment one of its defense values or its detection value. Any node can be chosen because the defender has full knowledge of the network.
• When the game is over, the winner gets a reward of 100 points and the loser gets a reward of -100 points.
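A single attacker turn under these rules can be sketched as follows. The node data structures and the detection threshold are illustrative choices, not values taken from [Elderman-2017].

```python
import random

def attacker_turn(node, attack, defence, detection, is_end,
                  threshold=0.6, rng=random):
    """One attacker turn: increment a random attack value on the current node,
    then resolve success, detection, or continuation per the rules above."""
    k = rng.randrange(10)
    attack[node][k] = min(10, attack[node][k] + 1)
    if attack[node][k] > defence[node][k]:
        # Successful attack: win at the END node, otherwise advance.
        return "attacker wins" if is_end(node) else "move to neighbour"
    if detection[node] * 0.10 > threshold:
        return "defender wins"      # attack blocked and detected
    return "continue"               # attack blocked but undetected

# Undefended start node: any incremented attack value exceeds the defense of 0.
attack = {"start": [0] * 10}
defence = {"start": [0] * 10}
detection = {"start": 0}
result = attacker_turn("start", attack, defence, detection, is_end=lambda n: False)
```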

A.10.4 [Niu-2015] Defense and Control in Cyber-Physical Systems


In [Niu-2015], the authors propose a cyber-physical representation scheme for examining the interaction between cyber system states and physical system stability. Within this representation, the authors develop an optimal controller using Q-learning for a physical system with uncertain dynamics in the presence of cyberattacks. An example of such a cyberattack could be increased network delay and packet losses, causing difficulty in controlling, say, an unmanned aerial vehicle.


A.11. Education, Training, and Development
In this section, we describe various cyber simulation tools used for education, training, and development. Notable among them are the cyber ranges described in [Franklin-2018, Zoto-2018, Vykopal-2017, and Weiss-2017]. In [Fite-2014], the author argues for the need for cyber simulation standards. In [Marshall-2014], the authors investigate how best to incorporate cyber warfare effects in cyber simulation tools for training purposes. In [Stytz-2010], the authors focus on giving decision makers the opportunity to learn from the results of various attack scenarios.

In its own category is the gamified learning environment which is described in more detail in Appendix A.11.2.

A.11.1 Cyber Ranges [Franklin-2018, Zoto-2018, and Vykopal-2017]

• CyberAIMs (Cyber Agents’ Interactive Modeling and Simulation) [Zoto-2018]
o A simulation tool developed in Netlogo [Wilensky-1999] “to teach Master students in a Cyber security course.” In [Zoto-2018], the authors “describe the model and explain the design choices behind CyberAIMs in terms of associating them with the emerging concepts within cyber security curriculum, namely adversarial and systems thinking.”

• KYPO Cyber Range [Vykopal-2017]
o “It provides a virtualized environment for performing complex cyberattacks against simulated cyber environments.”
o “KYPO can be used for various different applications. During its design and development, we focused on these three main use cases: i) cyber research, development, and testing, ii) digital forensic analysis, and iii) cyber security education and training.”
o Other cyber ranges listed in [Vykopal-2017] are:
. DETER/DeterLab [Mirkovic-2010]
. National Cyber Range (NCR) [NCR-2019]
. Michigan Cyber Range (MCR) [MCR-2017]
. SimSpace Cyber Range [Rossey-2015]
. EDURange [EDURange-2019]
o Other lightweight training platforms listed in [Vykopal-2017] are:
. Avatao [Buttyán-2016, Avatao-2019]
. CTF365 [CTF365-2019]
. Hacking-Lab [Security Competence-2019]
. iCTF framework [Vigna-2014]
. InCTF [Raj-2016]

• Seven university-connected cyber ranges [Franklin-2018], which are in the building stage, open but not yet at full capacity, or completed and fully functional, are listed below.
o Florida Cyber Range (Web Page)


o Georgia Cyber Innovation and Training Center (Web Page)
o Regent University Cyber Range (Web Page)
o Wayne State University’s Advanced Technology Training Center (extension of Michigan Cyber Range)
o Western Washington University (Cyber Range Poulsbo)
o Grand Canyon University (West Valley Range partnered with Arizona Cyber Warfare Range)
o University of South Florida (Florida CyberHub)

A.11.2 Gamified Learning Environments [Gallagher-2017]
Unless indicated otherwise, this subsection is based mainly on the material in [Gallagher-2017], which describes the role of games in cybersecurity training and a number of gamified learning environments. According to the authors of [Gallagher-2017], gamification is the application of “game mechanics to typically non-game activities including training to drive desired behaviors.” Here we are interested in the user’s learning behavior, not in attacker/defender cybersecurity behaviors. Thus, this is unlike the learning behavior of the algorithms described in Appendix B for [Ben-Asher-2015, Chung-2016, Elderman-2017, and Niu-2015] and the adversarial user behaviors described in Appendix D for [Moskal-2018]. According to [Bunchball-2016], the typical game mechanics used to motivate and engage players to play, and thus to learn, are the following:

• Fast Feedback
• Transparency
• Goals
• Badges
• Leveling Up
• Onboarding
• Competition
• Community
• Points

Although Capture the Flag (CTF) is considered a game, it may not be considered a full-fledged gamified learning environment due to its lack of explicit support for game mechanics. It does give participants a learning experience, but the explicit reward or penalty only comes at the end of the game when success or failure is achieved. CTFs typically seem to fall into the following two types [Gallagher-2017]: “
• Attack/defense CTFs that require each team to defend a given machine or small network.
• Jeopardy-style CTFs that involve multiple categories of problems, each of which contains a variety of questions of different point values and difficulties. “

The benefit of gamifying cybersecurity training is to be able to elicit autodidactic behavior and make training a fun learning process. In the following, we describe briefly four such gamified


learning environments: ESCALATE, Project Ares, the Capture The Packet Training (CPT) tool, and the Cyber Attack Academy (CAA).

A.11.2.1 ESCALATE
ESCALATE, which was developed by Point3, was commercially launched in December 2016. It is designed to support the acquisition of skills in the cyber domain in a relaxed and engaging gamified learning environment. Some of its key features are the following:

• The ability to form and compete in teams
• Team and global leaderboards
• Awards points for challenge completions
• Profile badges for accomplishing specific achievements
• Problem-based approach
• “Replayability” that generates a solution each time a challenge is attempted
• Support for scaffolding and implicit feedback
• Track learner behaviors using xAPI 61 (Experience API from Advanced Distributed Learning (ADL))

A.11.2.2 Circadence Project Ares
“Project Ares is a gamified, artificial intelligence (AI) powered cyber training environment by Circadence.” [Gallagher-2017]

Some of the key features of Project Ares are covered by the U.S. Patent 10,056,005, entitled “Mission-Based, Game-Implemented Cyber Training System and Method,” issued on August 21, 2018 [Morton-2018]. They are as follows:

• Access to an evolving library of mission scenarios and educational resources
• Work alone or in teams
• Hone skills inside realistic environments or mirrors of the organizational environments
• “An AI component powered by IBM WatsonTM and SparkCognitionTM that acts as a coach, umpire, or even opponent” [Gallagher-2017]
• “Real-time cyber-attack data that continually evolves and perpetually learns how threats appear, develop, and expose network systems” [Gallagher-2017]
• AI-based monitoring and scoring
• “AI is also used for offensive and defensive opponents to increase challenge factors.” [Gallagher-2017]
• “AI provides monitoring and scoring as an umpire, and enables the trainees to play the missions against an adaptive, dynamic opponent that continually challenges the players to improve their tradecraft and hone their offensive and defensive skills.” [Circadence-2018]
• Skill badges and certifications can be earned

61 The Experience API (or xAPI) is a new specification for learning technology that makes it possible to collect data about the wide range of experiences a person has (online and offline). [https://xapi.com/overview/]


• Allows for massively multiplayer online games (MMOG)
• Instructor portal with dashboards for monitoring progress and performance assessment
• It is fully xAPI enabled.
• “Learners can also prepare in the Project Ares Battle School, which enables asynchronous practice and review of cyber skills and knowledge. Key features include: cyber games for technical topics (i.e. Cylitaire 62, PortFlow 63), battle room for non-mission specific tactical practice, and a media center for videos, documents, and other key resources/websites in cybersecurity.” [Gallagher-2017]

Although there is the concept of an AI player in Project Ares, there are no details available on how the player is designed and programmed. From [Morton-2018], this much is described: “The AI engine 30 also implements an AI opponent. The AI opponent preferably provides actions/responses to the game engine 22 for use in implementing a mission against a student. The AI opponent preferably also has a learning component which allows the AI opponent to change actions and responses over time, such as based upon student actions.”64

From [Morton-2018] again, “… AI opponent may comprise a defensive opponent to one or more offensive live students or an offensive opponent to one or more live defensive students).” For Project Ares, “… the AI opponent comprises a set of applications and processes focused on parsing all aspects of the system in real time, such as logs, network messages, databases and database states, and the like, to determine if something of operational importance has changed within the particular training scenario. The AI opponent interacts with the Orchestration Agents 65 to obtain information and make operational changes. For example, when the AI component of the system detects a data change and a set of unexpected messages in a cyber threat scenario, it attempts to deduce from a knowledge database the implications of such a scenario and determine all possible root causes. As the AI component gathers additional data to narrow in on the cause, it may provide messages to trainers and students (such as hints, tips or warnings, such as by presenting messages through the in-game advisor feature), it may make changes automatically to the virtual environment within the training scenario in an attempt to remedy a potential breach, it may parse additional aspects of the virtual environment to gather more information, or it may do nothing and continue to monitor. In this way, the game play between one or more human students against an AI opponent emulates real-life scenarios wherein the AI opponent takes actions that a typical administrator would take given the detection of one or more possible cyber threats or system anomalies.”

62 For more info, go to https://cylitaire.circadence.com/. 63 For more info, go to https://portflow.circadence.com/. 64 Note that the numbers in “AI engine 30” and “game engine 22” are part of the patent documentation notation indicating the components in the document figures. 65 “The Mission Orchestration Agent is an agent service (e.g. a specialized software component developed to handle necessary requests and responses to configured and monitor each system dynamically) that runs on all machines within the mission environment and the controller interfaces with these agents to configure local networking and services. It installs packages, copies files from the master and allows arbitrary commands to be run from the master service. It also provides an in-game interface to monitor user progress, enable AI based opponent responses, and verify system health.” [Morton-2018]


From the above, we observe that the AI opponent interacts continually with the Orchestration Agents to obtain state and environmental information and uses its knowledge base to make operational changes, thus responding to any actions caused by its opponents. How the knowledge base is initially populated is not described; most probably it is populated manually. How the in-game results may update the knowledge base is also not explicitly described.

There is no description of any attack planning scheme used or of whether reinforcement learning is involved at all.

A.11.2.3 Capture The Packet Training Tool
Aries Security supports Capture The Packet, which was “originally based on network package analysis techniques and leveraging the current Capture The Packet training simulator framework” [Gallagher-2017].

Some of CPT’s key features are:
• Test against live threats
• Evaluate offensive and defensive abilities
• Actionable growth strategy
• Real-time records of student performance
• Skill evaluation of users when operating under time constraints
• A multiplayer environment in which 10 teams of 5 participants each can compete concurrently
• “Points are awarded for steals as well as deducted for any down time of any service.” [Gallagher-2017]
• Does not need an Internet connection

A.11.2.4 Cyber Attack Academy “Although not a game per se, CAA provides a problem based approach within a role-playing scenario” [Gallagher-2017].

• AI-based tutoring providing scaffolds
• Story-centered curriculum, where students do professionals’ work to produce the same professional-quality results
• Access to knowledgeable human mentors whose answers are stored in a cyber knowledge database for future automated retrieval, updating existing solutions as needed
• Learning environment augmented with an AI-based automated mentor
• Uses natural language processing to extract key semantic features from student questions to retrieve high-quality expert answers stored in the cyber knowledge database

Repeating some of CAA’s capabilities, functionality and/or attributes in [Gallagher-2017], we have the following: “ • Cyber operators with a wide array of technical capabilities, inclusive of basic cyber operations skills in offense, defense, forensics, scouting, and hunting, cyber red team, penetration testing, acquisition and analysis of publicly available information, the Dark


and Deep Web and other cyber operations topics achievable within the four to six-month duration of the course.
• Reinforcement of the roles of anthropology, sociology and ethnography to better understand potential adversaries and themselves
• Both remote individual and team learning and a culminating ‘All Against All’ team-based Capture the Flag (offense/defense) exercise to conclude the course. “

A.11.3 [Fite-2014] Simulating Cyber Operations
The author in [Fite-2014] argues for the need for “a publicly adopted Cyber Operations simulation standard to support training, assessment and technique development of operators within the electromagnetic communications spectrum.” To this end, the author proposes to use the Scenario Definition Language (SDL) to model Cyber Operations: the core simulation elements are represented as objects and their interactions are described in SDL. Using the proposed approach, the author describes how to create purpose-built simulations.

A.11.4 [Marshall-2014] Cyber Warfare Training Prototype Development
In [Marshall-2014], the authors propose to conduct an analysis into the lack of cyber warfare effects in the training domain and to generate initial prototypes to refine user requirements, as well as to develop the architecture for implementation. Their analysis indicates the need “to develop a loosely coupled software service that provides the capability to stimulate the effects of cyber-attacks on command and control communication among the blue mission command systems.” Their paper provides “the front-end analysis as well as the conceptual design plans for the initial prototype in order to solicit feedback from the Simulation Interoperability Standards Organization community.”

A.11.5 [Sommestad-2015b] Experimenting with Operational Cyber Security in CRATE
In [Sommestad-2015b], the author “describes how CRATE, the cyber range of the Swedish Defence Research Agency (FOI), has been used and will be used to test hypotheses and tools related to security assessments and situational awareness in the cyber security domain.” The author also provides examples of previous experimental setups and discusses the challenges of research using cyber ranges. Specifically, the examples consist of vulnerability assessment, intrusion detection prediction, synthetic environment creation, and synthetic event generation. The issues studied are the accuracy of vulnerability scanners, host vulnerability metrics, attack graph assessment, the filtering capability of intrusion detection mechanisms, and the filtering of IDS (Intrusion Detection System) alerts with situational data.


A.11.6 [Stytz-2010] Addressing Simulation Cyber Warfare Technologies Issues
In [Stytz-2010], the authors focus on creating the results of various attack scenarios to allow the decision-maker to experience “the uncertainty, confusion, and information overload that accompany effective cyberattacks and lead to counter-productive behaviors.” To do so, one or more cyber simulation systems compute a sequence of cyber events, each based on “the simulated cyber warfare environment, the type of cyberattack, and the associated probabilities that the offense or defense will succeed at each stage of the cyberattack.”

A.11.7 [Weiss-2017] Cybersecurity Education and Assessment in EDURange
In [Weiss-2017], the authors describe a platform called EDURange “for teaching security analysis skills to undergraduate students.” It has the following features:

• It allows students to “learn practical security concepts, tools, and skills in puzzle-like scenarios involving realistic challenges.” • It provides students scaffolding and assessment to achieve learning objectives. • It is user friendly. • It allows instructors to “tailor exercises to their classes and modify exercises to minimize the risk of students finding the answers online.”


A.12. Human Factors
This section describes studies of human factors in cyber security. They are:

o [Buchler-2018] emphasizes the importance of human dynamics in cyber security operations
o [Rosque-2019] argues for the need to model cognitive complexity
o [Veskler-2018] deals with the human cognitive aspects

A.12.1 [Buchler-2018] Human Dynamics in Cybersecurity Performance
The authors in [Buchler-2018] emphasize the importance of understanding the role of human dynamics in the performance of cyber security operations. From data collected at the Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC), they found that the leadership dimension positively affects performance, while face-to-face interaction is a “strong negative predictor of success.” Success is measured in terms of the following three independent scoring dimensions: (a) Maintaining Services, (b) Incidence Response, and (c) Time-constrained Task Accomplishments.

To better understand the human dynamics issues, the authors propose “to extend our OAT (Observations Assessment of Teamwork) scaled assessment approach to include items that address team development and maturation level to include the overall amount of experience working as a team as well team composition to include technical skills, proficiency levels, and functional role assignments within each cyber team.”

A.12.2 [Rosque-2019] Cognitive Complexity in Cyber Range Event
In [Rosque-2019], the authors argue for the need to model the cognitive complexity of cyber range event operating environments to better understand cognitive workloads and improve human functions in such environments. After defining key concepts “such as cognitive complexity, task difficulty, and Blue Team cognitive workload”, they assess the cognitive complexity experienced by a Blue Team and validate their assessment method.

A.12.3 [Veskler-2018] Human Cognition in Cybersecurity Operations
While [Buchler-2018] deals with the role of human dynamics in the performance of cyber security operations, [Veskler-2018] deals with the role of human cognition. The authors describe how human “cognitive modeling can address multi-disciplinary cyber-security challenges requiring cross-cutting approaches over the human and computational sciences such as the following: (a) adversarial reasoning and behavioral game theory to predict attacker subjective utilities and decision likelihood distributions, (b) human factors of cyber tools to address human system integration challenges, estimation of defender cognitive states, and opportunities for automation, (c) dynamic simulations involving attacker, defender, and user models to enhance studies of cyber epidemiology and cyber hygiene, and (d) training effectiveness research and training scenarios to address human cyber-security performance, maturation of cyber-security skill sets, and effective decision-making.”

General Dynamics Mission Systems–Canada “Use, duplication or disclosure of this document or any of the information contained herein is subject to the restriction on the title page of this document.”

128 28 March 2019

This article will be useful in understanding how to improve personnel's security savviness by making them more cognizant of various security issues.

A.13. Intrusion Detection, Response, and Monitoring
This section describes articles that deal with intrusion detection, response, and monitoring. They are:

o [Jajodia-2016] uses annotated probabilistic temporal logic programs
o [Krall-2016] uses an importance sampling method
o [Ramaki-2015] proposes a real-time episode correlation algorithm
o [Rege-2017] identifies how adversaries carry out, and adapt during, cyberattacks
o [Shameli-2012] classifies intrusion response systems
o [Shameli-2014] proposes a risk-impact-based intrusion response system
o [Zonouz-2014] proposes a game-theoretic intrusion response engine

A.13.1 [Jajodia-2016] Temporal Probabilistic Logic for Security Events Monitoring
In [Jajodia-2016], the authors apply annotated probabilistic temporal (APT) logic programs to capture the probabilities of different network nodes being attacked, in addition to the value of the assets stored at the nodes. Together with a Stackelberg game theoretic formalization, the authors are able to compute the attacker's maximal probability of success and ability to maximize damage. Using the computed results, “the defender can come up with optimal allocations of tasks to cybersecurity analysts, taking both network information as well as a behavioral model of the attacker into account.” After showing the correctness of their solutions and the complexity of the problem, the authors describe three algorithms (Pure, Greedy, and Mixed) for the defender to optimize its objectives and “show that these algorithms work well on realistic network sizes.”
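The paper's Pure, Greedy, and Mixed algorithms are not reproduced in this report; the following fragment is only an invented sketch of the flavor of a greedy allocation heuristic, ranking nodes by expected loss (attack probability times asset value) and assigning a limited number of analysts to the top-ranked nodes. All node names and numbers are hypothetical.

```python
# Hypothetical sketch of a greedy defender-allocation heuristic in the spirit
# of [Jajodia-2016]; node names, probabilities, and values are illustrative.

def greedy_allocation(nodes, budget):
    """Assign analysts to the nodes with the highest expected loss
    (attack probability x asset value) until the budget is exhausted."""
    ranked = sorted(nodes, key=lambda n: n["p_attack"] * n["value"], reverse=True)
    return [n["name"] for n in ranked[:budget]]

nodes = [
    {"name": "web-server",  "p_attack": 0.6, "value": 40},   # expected loss 24
    {"name": "database",    "p_attack": 0.3, "value": 100},  # expected loss 30
    {"name": "workstation", "p_attack": 0.8, "value": 10},   # expected loss 8
]
covered = greedy_allocation(nodes, budget=2)
```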

A.13.2 [Krall-2016] Rare-Event Simulation for Infiltration Monitoring
In [Krall-2016], the authors use an “importance sampling method to estimate the likelihood of a successful network infiltration”, given that such an event is rare and an inordinate number of non-events would thus need to be simulated before one is encountered. Importance sampling allows for expeditious comparison of candidate defenses whose benefit, a network that is rarely breached, would otherwise require prohibitively long simulations to measure.
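Importance sampling itself can be illustrated on a generic rare event. The toy example below, which estimates the small Gaussian tail probability P(Z &gt; 4) by sampling from a distribution shifted toward the rare region and reweighting by the likelihood ratio, is ours and not taken from [Krall-2016]:

```python
import math, random

def is_estimate(threshold, n, shift, seed=1):
    """Importance-sampling estimate of P(Z > threshold) for Z ~ N(0, 1),
    drawing from the shifted proposal N(shift, 1) and reweighting each
    qualifying sample by the likelihood ratio phi(x) / phi_shift(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            # phi(x) / phi_shift(x) = exp(shift^2 / 2 - shift * x)
            total += math.exp(shift * shift / 2.0 - shift * x)
    return total / n

# Naive sampling would see roughly 3 hits per 100,000 draws; with the
# proposal centered on the threshold, about half of all draws contribute.
p_hat = is_estimate(threshold=4.0, n=100_000, shift=4.0)
```

The true value is about 3.17e-5; the shifted proposal recovers it with a modest sample size.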

A.13.3 [Ramaki-2015] Algorithm for Multi-Step Attack Scenario Detection
In [Ramaki-2015], the authors propose a real-time episode correlation algorithm (RTECA) for multi-step attack scenario detection. Attack scenarios are stored in the form of attack trees. Critical episodes extracted from sequences of alerts are correlated with the attack paths in the attack trees, and “A Causal Correlation Matrix (CCM) is used for encoding correlation strength between the alert types in attack scenarios.” A database of off-line attack trees is used to assist in the prediction of the next steps in an on-line attack tree that is being used for monitoring.
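The details of RTECA's matrix construction are not reproduced here; as an invented stand-in, a correlation matrix over alert types can be built by counting, within a sliding window, how often one alert type follows another and row-normalizing the counts. The alert names below are hypothetical.

```python
from collections import defaultdict

def build_ccm(alert_stream, window=2):
    """Count how often one alert type follows another within `window`
    positions, then row-normalize counts into correlation strengths.
    A toy stand-in for the CCM of [Ramaki-2015]; alert names are invented."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, a in enumerate(alert_stream):
        for b in alert_stream[i + 1 : i + 1 + window]:
            counts[a][b] += 1
    ccm = {}
    for a, row in counts.items():
        total = sum(row.values())
        ccm[a] = {b: c / total for b, c in row.items()}
    return ccm

stream = ["scan", "exploit", "privilege", "scan", "exploit", "exfil"]
ccm = build_ccm(stream)
```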


A.13.4 [Rege-2017] Temporal Assessment of Cyber Intrusion Chains
In [Rege-2017], the authors “identify how adversaries carry out, and adapt during” the course of cyberattacks. They compile this information from a single case of observations made “at the US Industrial Control Systems Computer Emergency Response Team's (ICS-CERT) Red Team-Blue Team cybersecurity training exercise held at Idaho National Laboratory (INL).” They first characterize the intrusion chain stages temporally and then cluster stages that are temporally close together. The approach allows them to assess how long each attack stage takes and how soon the subsequent stage occurs. Temporal clustering allows for grouping of attack stages. The time series of attack stages allows for identification of peaks where more attention should be focused.

A.13.5 [Shameli-2012] Survey and Taxonomy of Intrusion Response Systems
In [Shameli-2012], the authors present a framework for classifying the intrusion response systems published in the past decade, to elicit insights into the field of Internet security. This framework includes various concepts such as the level or degree of automation, the types of inputs expected, alert description, a response cost model associated with attack categories, a cost computation model, response system adaptability, response selection, response execution, and linkage to intrusion prediction and risk assessment. The authors believe that using the right “response cost” model will determine the use of the right security measures and responses that help ensure the quality of service (QoS) of critical resources.

A.13.6 [Shameli-2014] Cyber-Attack Response System Using Risk Impact Tolerance
In [Shameli-2014], the authors propose a risk-impact-based intrusion response system, called ARITO (Accurate Risk Impact Tolerance). The risk assessment component “assesses the value of the loss that could be suffered by a compromised resource.” ARITO can produce a multi-level response based on the computed risk level and readjust the response level to a newly computed risk level that becomes available over time. Their simulation results on a multi-step attack to penetrate web servers “illustrate the efficiency of the proposed model and confirm the feasibility of the approach in real time.” However, if the attacker is aware of such a response system, the authors describe situations in which the attacker may be able “to weaken the system by launching false attacks, to impact to its advantage the value of some parameters.”
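As a hypothetical sketch (not ARITO's actual algorithm), a multi-level response that readjusts as the computed risk changes over time might look like the following; the thresholds and response names are invented.

```python
def response_level(risk, thresholds=(0.3, 0.6, 0.85)):
    """Map a computed risk value in [0, 1] to an escalating response level,
    in the spirit of ARITO's multi-level responses; the thresholds and
    level names here are invented for illustration."""
    low, medium, high = thresholds
    if risk < low:
        return "monitor"
    if risk < medium:
        return "alert"
    if risk < high:
        return "throttle"
    return "isolate"

# Risk is recomputed over time, and the response level readjusts with it.
history = [response_level(r) for r in (0.1, 0.5, 0.9, 0.4)]
```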

A.13.7 [Zonouz-2014] Game Theoretic Intrusion Response and Recovery Engine
In [Zonouz-2014], the authors present a game-theoretic intrusion response engine, called the response and recovery engine (RRE). They model the maintenance of computer network security “as a Stackelberg stochastic two-player game in which the attacker and response engine try to maximize their own benefits by taking optimal adversary and response actions, respectively.” They use attack-response trees (ART) “to analyze undesired system-level security events within host computers and their countermeasures using Boolean logic to combine


lower level attack consequences.” Accounting for uncertainties in the intrusion detection alert notifications, RRE “chooses optimal response actions by solving a partially observable competitive Markov decision process that is automatically derived from attack-response trees. … To support network-level multi-objective response selection and consider possibly conflicting network security properties, we employ fuzzy logic theory to calculate the network-level security metric values, i.e., security levels of the system’s current and potentially future states in each stage of the game.” At the end, “the optimal network-level response actions are chosen through a game-theoretic optimization process.”
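The attack-response-tree idea of combining lower-level events with Boolean logic can be sketched as follows. The tree shape and event names are invented, and real ART nodes also carry response actions and probabilities, which this toy evaluation omits.

```python
def evaluate(node, leaf_states):
    """Evaluate a toy attack-response tree: leaves are named security events,
    internal nodes combine their children with Boolean AND / OR gates.
    An invented stand-in for the ART of [Zonouz-2014]."""
    if isinstance(node, str):
        return leaf_states[node]
    gate, children = node
    results = (evaluate(c, leaf_states) for c in children)
    return all(results) if gate == "AND" else any(results)

# Root event occurs if the web server is hit AND (creds stolen OR db dumped).
tree = ("AND", ["web_compromised", ("OR", ["creds_stolen", "db_dumped"])])
state = {"web_compromised": True, "creds_stolen": False, "db_dumped": True}
breached = evaluate(tree, state)
```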


A.14. MDP-based and POMDP-based Approaches
This section describes Markov Decision Process (MDP)-based solutions ([Lisý-2016] and [Wang-2016]) and Partially Observable MDP (POMDP)-based solutions ([Sarraute-2012], [Sarraute-2013], and [Shmaryahu-2016]). They are:
• MDP-based approaches
o [Lisý-2016] proposes a modified counterfactual regret minimization algorithm
o [Wang-2016] combines game theory with the MDP with discounted reward
• POMDP-based approaches
o [Hoffmann-2015] describes modeling of the attackers in the security problem space in terms of uncertainty and interaction
o [Sarraute-2012] describes a POMDP-based method to find good attacks
o [Sarraute-2013] describes a preliminary version of the POMDP model used in [Sarraute-2012]
o [Shmaryahu-2016] claims to offer a more realistic model of the problem

A.14.1 [Hoffmann-2015] Accurate Modeling of Real Attackers
The author in [Hoffmann-2015] argues the need for simulated penetration testing to be conducted in the same way a real attacker would operate. The focus is on generating attack plans that will not actually be executed. To model the problem space, Hoffmann suggests partitioning it along two dimensions, with the uncertainty model of actions on one axis and the interaction model between individual attack components on the other. On the uncertainty axis, we have

• None, which is amenable to classical planning: there is no uncertainty, as if the attacker possessed the attack graph of its target
• Action outcomes, which is describable as a Markov Decision Process (MDP), where each action has a probability of success
• States, which is describable as a partially observable Markov Decision Process (POMDP), where the attacker has incomplete initial knowledge of the state and updates that knowledge with information gained during attack execution

On the interaction axis, we have

• Explicit network graph, where each action corresponds to a traversal from one host to another with the ultimate goal of reaching a single target host
• Monotonic actions, where the attacker can only gain attack assets; once obtained, these assets cannot be lost
• General actions, where monotonicity is not assumed and thus actions may negatively impact one another

Letting uncertainty be along the y-axis and interaction along the x-axis, we obtain nine problem subspaces, the simplest one being at the lower left-hand corner66 and the most difficult one being

66 This problem subspace is called Graph Distance in [Hoffmann-2015]


at the upper right-hand corner 67. One question asked by Hoffmann in [Hoffmann-2015] is “Can we exploit the restrictions for designing effective algorithms and, with that, practically feasible yet more accurate simulated pentesting tools?” Another related question is “are there feasible ways of making the MDP approximation of the POMDP model more accurate?” Basically, the challenge in simulated penetration testing lies in designing algorithms that exploit the special-case problem structure, and in formulating the corresponding problem subspace models with practical and accurate probability distributions.
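The 3×3 grid can be represented as a simple lookup table. Only the two corner names (Graph Distance and Factored POMDP, from the footnotes above) come from [Hoffmann-2015]; the composite labels for the other seven cells are our own placeholder naming.

```python
# The taxonomy of [Hoffmann-2015] as a lookup over the two axes; the two
# named corners come from the paper's footnotes, the rest are placeholders.
UNCERTAINTY = ["none", "action outcomes", "states"]
INTERACTION = ["explicit graph", "monotonic actions", "general actions"]

subspace = {(u, i): f"{u} / {i}" for u in UNCERTAINTY for i in INTERACTION}
subspace[("none", "explicit graph")] = "Graph Distance"      # simplest corner
subspace[("states", "general actions")] = "Factored POMDP"   # hardest corner
```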

A.14.2 [Lisý-2016] Counterfactual Regret Minimization in Security Games
In [Lisý-2016], the authors show that all normal-form games with sequential strategies (NFGSS) can be modelled as well-formed imperfect-recall extensive-form games. NFGSS is a class of two-player zero-sum sequential games in which each player i's strategy space has the structure of a finite directed acyclic Markov decision process (MDP) with a set of states S_i, a set of actions A_i, and a stochastic transition function T_i : S_i × A_i → Δ(S_i). As such, NFGSS can be solved with counterfactual regret minimization algorithms (CFR 68) that iteratively learn strategies in self-play to converge to an equilibrium. The authors propose a modified CFR+ that includes pre-storing some special form of regret for each action-value pair. They then show “that with a negligible loss in precision, CFR+ can compute Nash equilibrium with five times less computation than its competitors.”
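The regret-matching core of CFR can be illustrated on rock-paper-scissors, a single-decision game far simpler than NFGSS. This self-play trainer (our own example, not from [Lisý-2016]) accumulates counterfactual regrets for both players and shows the time-averaged strategy approaching the uniform Nash equilibrium.

```python
import random

ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff matrix

def strategy_from(regrets):
    """Regret matching: play each action with probability proportional to its
    accumulated positive regret; fall back to uniform when none is positive."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def train(iterations, seed=7):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS, [0.0] * ACTIONS]  # one regret vector per player
    strat_sum = [0.0] * ACTIONS                   # running sum of player 0's strategy
    for _ in range(iterations):
        s0, s1 = strategy_from(regrets[0]), strategy_from(regrets[1])
        strat_sum = [t + p for t, p in zip(strat_sum, s0)]
        a0 = rng.choices(range(ACTIONS), weights=s0)[0]
        a1 = rng.choices(range(ACTIONS), weights=s1)[0]
        for a in range(ACTIONS):
            # regret = counterfactual payoff of a - payoff actually obtained
            regrets[0][a] += PAYOFF[a][a1] - PAYOFF[a0][a1]
            regrets[1][a] += -PAYOFF[a0][a] - (-PAYOFF[a0][a1])
    return [t / iterations for t in strat_sum]  # time-averaged strategy

avg = train(50_000)
```

The per-iteration strategies oscillate; it is the average strategy that converges toward (1/3, 1/3, 1/3).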

A.14.3 [Sarraute-2012] POMDPs Model Hackers More Accurately
In [Sarraute-2012], the authors argue that POMDP (Partially Observable Markov Decision Process) models are more appropriate for generating cyberattacks. Their ability to reason under incomplete knowledge about the network configuration, and to take advantage of newly discovered information such as from scanning actions, helps generate better attack vectors. They “devise a method that relies on POMDPs to find good attacks on individual machines, which are then composed into an attack on the network as a whole. This decomposition exploits network structure to the extent possible, making targeted approximations (only) where needed.” They then evaluate the method on a suitably adapted industrial test suite to demonstrate its effectiveness in both runtime and solution quality. Proving the effectiveness of their solution against the classical planning solution currently employed by Core Security, and generalizing their currently domain-specific decomposition algorithm, are some of the work left for the future.

A.14.4 [Sarraute-2013] Suitability of POMDP Model for Penetration Testing

67 This problem subspace is called Factored POMDP in [Hoffmann-2015]
68 “CFR is a self-play algorithm: it learns to play a game by repeatedly playing against itself. The program starts off with a strategy that is uniformly random, where it will play every action at every decision point with an equal probability. It then simulates playing games against itself. After every game, it revisits its decisions, and finds ways to improve its strategy. It repeats this process for billions of games, improving its strategy each time. As it plays, it gets closer and closer towards an optimal strategy for the game: a strategy that can do no worse than tie against any opponent.” [https://www.quora.com/What-is-an-intuitive-explanation-of-counterfactual-regret-minimization]


This article seems to be the same as that of “Sarraute, C.; Buffet, O.; and Hoffmann, J. 2011. Penetration testing == POMDP solving? In Proceedings of the 3rd Workshop on Intelligent Security (SecArt’11).” The date discrepancy is most probably due to the submission date to the arXiv.org website. This 2011 article describes a preliminary version of the POMDP model used in [Sarraute-2012] and presents similar material arguing the suitability of the POMDP model for penetration testing.

A.14.5 [Shmaryahu-2016] Constructing Plan Trees for Penetration Testing
In [Shmaryahu-2016], the author also argues, as in [Sarraute-2012], that POMDP-based approaches better model the real-world problem of penetration testing planning. The author claims to “offer a more realistic model of the problem, allowing for partial observability and non-deterministic action effects, by modeling pentesting as a partially observable contingent problem”, which uses “a qualitative model where probability distributions are replaced with sets of possible configurations or action effects”. Their “Contingent planners attempt to find a plan tree (or graph), where nodes are labeled by actions, and edges are labeled by observations.” For these plan trees, the author suggests a number of optimization criteria for ranking how best, worst, budget-constrained, and fault-tolerant they are.

A.14.6 [Wang-2016] Markov Game Model-Based Analysis of Network Security
In [Wang-2016], the authors propose combining game theory with Markov decision processes with discounted reward to study the attacker's and defender's equilibrium strategies. In their empirical study, they provide no guidance on how to set up the state transition probabilities, but indicate that they should be “a function of attacker and defense action, and the network state transition probabilities usually can be given by case studies, statistics, simulations, and knowledge engineering …” “However, in our example, we assign state transition probabilities based on the intuition and experience of our network expert.”
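The discounted-reward backup underlying such a model can be sketched with value iteration on a toy zero-sum game, restricted to pure strategies for brevity (the full model of [Wang-2016] computes mixed equilibrium strategies, which this sketch does not). All states, actions, rewards, and transition probabilities below are invented.

```python
def minimax_value_iteration(reward, transition, gamma=0.9, iters=200):
    """Discounted value iteration for a toy zero-sum Markov game: the
    defender (outer max) picks an action, the attacker (inner min) responds,
    restricted to pure strategies for brevity. An invented illustration of
    the discounted-reward backup, not the algorithm of [Wang-2016]."""
    V = {s: 0.0 for s in reward}
    for _ in range(iters):
        V = {
            s: max(
                min(
                    reward[s][d][a]
                    + gamma * sum(p * V[s2] for s2, p in transition[s][d][a])
                    for a in reward[s][d]
                )
                for d in reward[s]
            )
            for s in reward
        }
    return V

# Toy two-state example: in "secure", the defender patches or ignores while
# the attacker attacks or waits; a breach costs the defender until restored.
reward = {
    "secure": {"patch": {"attack": 1, "wait": 0}, "ignore": {"attack": -5, "wait": 0}},
    "breached": {"restore": {"persist": -2}},
}
transition = {
    "secure": {
        "patch": {"attack": [("secure", 1.0)], "wait": [("secure", 1.0)]},
        "ignore": {"attack": [("breached", 1.0)], "wait": [("secure", 1.0)]},
    },
    "breached": {"restore": {"persist": [("secure", 0.5), ("breached", 0.5)]}},
}
V = minimax_value_iteration(reward, transition)
```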


A.15. Payoffs Quantification
This section describes articles that deal with payoffs quantification. They are:

o [Chatterjee-2015] proposes a conceptual framework
o [Chatterjee-2016] focuses on operationalizing a probabilistic framework described in [Chatterjee-2015]
o [Rass-2017] studies how to define games over qualitatively assessed outcomes that may be random

A.15.1 [Chatterjee-2015] Quantifying Uncertainties in Cyber Attacker Payoffs
In [Chatterjee-2015], the authors present “a conceptual framework for quantifying cyber attacker payoff uncertainties …” The sources of uncertainty considered are those due to the following: “(1) attacker types and their attack plans, and (2) the state of the system.” First, the interdependencies among attacker types, target protective conditions, and system states are represented as conditional probabilities. Next, a probabilistic framework is developed to reason about attacker payoff uncertainties. Finally, some “uncertainty quantification methods to address sources of aleatory and epistemic uncertainties in the attacker payoffs” are identified.

A.15.2 [Chatterjee-2016] Propagating Uncertainties in Cyber Attacker Payoffs
The authors in [Chatterjee-2016] focus on operationalizing a probabilistic framework described in [Chatterjee-2015] “for quantifying attacker payoffs, within a notional case study, through: (1) representation of uncertain attacker and system-related modeling variables as probability distributions and mathematical intervals, and (2) exploration of uncertainty propagation techniques including two-phase Monte Carlo sampling and probability bounds analysis.”

The authors consider “representation and propagation of uncertainty in the conditional probability of attacker payoff utility given an initial system state and attacker type.” Using Monte Carlo methods, they are able to approximate attacker payoff utility distributions, and using the probability bounds analysis, they are able to develop bounds for these distributions.
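A minimal two-phase (nested) Monte Carlo loop, with the epistemic parameter drawn from an interval in the outer phase and aleatory variability sampled in the inner phase, might look like the following; the distributions and numbers are invented, not taken from the paper's case study.

```python
import random

def two_phase_mc(n_outer=200, n_inner=500, seed=0):
    """Two-phase Monte Carlo in the spirit of [Chatterjee-2016]: the outer
    loop samples an epistemically uncertain parameter (here, a mean payoff
    known only to lie in an interval), and the inner loop samples aleatory
    variability around it. Each outer draw yields one conditional exceedance
    probability; the spread across draws gives rough bounds."""
    rng = random.Random(seed)
    exceed_probs = []
    for _ in range(n_outer):
        mu = rng.uniform(2.0, 5.0)           # epistemic: interval-valued mean
        hits = sum(rng.gauss(mu, 1.0) > 4.0 for _ in range(n_inner))
        exceed_probs.append(hits / n_inner)  # aleatory: P(payoff > 4 | mu)
    return min(exceed_probs), max(exceed_probs)

lo, hi = two_phase_mc()
```

The wide gap between `lo` and `hi` reflects the epistemic uncertainty that probability bounds analysis is meant to capture.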

A framework is provided, but in practice the actual probability distributions need to be determined or estimated, which may not be straightforward, and such distributions are not guaranteed to exist.

A.15.3 [Rass-2017] Quantifying Losses and Payoffs in Advanced Persistent Threats
In [Rass-2017], the authors study “the problem of how to define games over qualitatively assessed outcomes that may be random” in defending against advanced persistent threats (APT). To address the difficulty of specifying losses and payoffs in a game, they suggest “letting the game being defined by any outcome that can be ordered (as in conventional game theory)”, and also “allowing an action to have many different random outcomes (this is usually not possible in standard games)”. To address the issue of game asymmetry in the players' knowledge and model, they suggest “letting the effects of action be nondeterministic and random even if both, the attacker's and defender's action were both known.” To be useful, “the game theoretic model's


input and output must be useful with what the risk management process can deliver and requires.” To avoid the difficulties of specifying game structures in the classical way, they suggest “working with empirical data directly”. To illustrate these points, the authors use “the protection of intellectual property rights (IPR), whose theft can be a reason to mount an APT” as an example.


A.16. Securing Defense
This section considers articles related to defense optimization (Appendix A.16.1) and interdiction techniques (Appendix A.16.2).

A.16.1 Defense Optimization
This section consists of articles related to defense optimization. They are:
o [Hota-2016] presents optimal solutions for a multiple-defenders scenario
o [Nguyen-2017] presents defense solutions with limited defense resources
o [Yang-2017] presents how to determine an optimal defense solution

A.16.1.1 [Hota-2016] Addressing Security Investments with Interdependent Assets
In [Hota-2016], the authors present a solution to compute the optimal defense strategies for multiple defenders, each managing its own set of assets, which are interdependent with the assets managed by others, in delivering some overall service. An interdependency graph is used to capture the interdependencies of the assets. While the attackers exploit the interdependencies of assets to compromise the network, “Each defender minimizes her own expected loss, where the attack probabilities of her assets are a function of her own defense strategies, strategies of other defenders, and the interdependency graph.”

In this framework, the authors prove the existence of a pure Nash equilibrium in the game between self-interested defenders. Moreover, “For a general class of defense strategies, we show that the problem of computing an optimal defense allocation for a defender (i.e., her best response) is equivalent to solving a convex optimization problem.”

To evaluate the framework, they apply it to two case studies: the first is “a SCADA system with multiple control networks managed by independent entities, and the second is a distributed energy resource failure scenario identified by the US National Electric Sector Cybersecurity Organization Resource (NESCOR).” They have the following two findings:
o Being selfish is costlier than cooperating
o Helping defend other defenders' assets serves one's own interest

A.16.1.2 [Nguyen-2017] Multi-Stage Attack Graph Security Games
In [Nguyen-2017], the authors “study the problem of allocating limited security countermeasures to protect network data from cyber-attacks, for scenarios modeled by Bayesian attack graphs.” The countermeasures discussed in the paper consist only of disabling nodes. They “propose parameterized heuristic strategies for both players”, where the players interact in multiple stages, and use “a simulation-based methodology, called empirical game-theoretic analysis (EGTA)”, to construct and analyze game models over the heuristic strategies. For “a variety of game settings with different attack graph topologies”, they show that their “defense strategies provide high solution quality compared to multiple baselines.” They also “examine the robustness of defense strategies to uncertainty about game actions and to the attacker's strategy selection.”


A.16.1.3 [Yang-2017] Optimization Model of Network Attack-Defense Game
In [Yang-2017], the authors study an attack-defense scenario in which “failure rate and mean time to repair” are used to determine the optimal defense solution. The paper does not describe how the time-to-repair concept is used; the concept of cost seems to have replaced it.

A.16.2 Interdiction Techniques
This section consists of articles deploying interdiction as a means of defense. They are:
o [Letchford-2013] models the problem of interdicting an optimal workaround attack plan
o [Nandi-2016] describes an optimal interdiction plan to minimize loss

A.16.2.1 [Letchford-2013] Optimal Interdiction of Attack Plans
In [Letchford-2013], the authors model, as a Stackelberg game, the problem of a defender choosing a mitigation strategy to interdict potential attack actions, whereby the attacker responds by computing an optimal workaround attack plan. For a general formulation of the deterministic interdiction plan, they provide two solutions: one based on a mixed-integer program that leverages partial satisfaction planning techniques to compute optimal solutions, and the other based on greedy heuristics. They also formulate an MDP (Markov Decision Process) interdiction problem to handle “uncertainty about attacker’s capabilities, costs, goals, and action execution uncertainty, and show that these extensions retain the basic structure of the deterministic plan interdiction problem.” Allowing the defender to randomize its commitment strategy is left for future work.

A.16.2.2 [Nandi-2016] Protecting via Interdicting Attack Graphs
In [Nandi-2016], the authors investigate the problem of finding an optimal interdiction plan on an attack graph to minimize the loss due to security breaches. They formulate the problem “as a bi-level mixed integer linear program and develop an exact algorithm to solve it.” Their experiments on synthetic scenarios “reveal that the quality of an interdiction plan is relatively insensitive with respect to the estimated error in the attacker’s budget and that the breach loss drops in response to increase in the security defense budget.” Making interdiction effects probabilistic, minimizing the probability of large losses, and performing experiments on real attack graphs are left for future work.
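The bi-level formulation itself requires a MILP solver; as an invented toy stand-in, the same decision (which edges of an attack graph to cut, under a budget, to minimize breach loss) can be brute-forced on a small graph:

```python
from itertools import combinations

def reachable(edges, start, goal):
    """Depth-first check of whether `goal` is reachable from `start`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        if u == goal:
            return True
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return False

def best_interdiction(edges, budget, start, goal, loss):
    """Brute-force stand-in for the bi-level program of [Nandi-2016]: try
    every set of at most `budget` edges to cut and keep the one minimizing
    breach loss. Only viable on toy graphs; real instances need the MILP."""
    best = (loss, frozenset())
    for k in range(budget + 1):
        for cut in combinations(edges, k):
            remaining = [e for e in edges if e not in cut]
            cost = loss if reachable(remaining, start, goal) else 0
            if cost < best[0]:
                best = (cost, frozenset(cut))
    return best

# Two attack paths from the entry point to the database (names invented).
edges = [("entry", "web"), ("web", "db"), ("entry", "vpn"), ("vpn", "db")]
cost, cut = best_interdiction(edges, budget=2, start="entry", goal="db", loss=100)
```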


A.17. Simulation Systems
There are many ways of classifying simulation systems: by purpose, by type, and by ownership. In [Kavak-2016], the authors list the components of a system and its type, such as general purpose, emulator, and live, virtual, and constructive (LVC) systems. In [Leblanc-2011], the authors list simulation systems based on their owners, such as the private sector, academia, and government. [Moskal-2013] reviews four early cyberattack simulators and [Moskal-2014] presents CyberSim. [Noel-2015a] describes AMICA, an integrated approach for understanding mission impacts of cyberattacks.

A.17.1 [Kavak-2016] Characterization of Cybersecurity Simulation Scenarios
In [Kavak-2016], the authors consider the types of actors having distinct characteristics in a simulation scenario and the different types of simulation systems. The types of actors described are attackers, system security personnel, users, and insiders. Each of them can have different levels of social, cognitive, and technical abilities.

For simulators to be used in cyber simulations, the authors list the following:
• General-purpose simulation software, including those for discrete-event or agent-based simulation
• NS-3, an open source discrete-event network simulation tool with a programming environment [NS-3-2019]
• OMNET++, an open source discrete-event simulation tool for simulating computer networks [Varga-2001]
• SteelCentral (previously OPNET), a commercial, general-purpose, simulation-based network development and testing tool [Riverbed-2019]

The list of emulators given is as follows:
• Emulab, a network testbed facility and software system developed at the University of Utah [Emulab-2019]
• DeterLab [Benzel-2011], built on Emulab, which provides additional tools and capabilities to conduct cybersecurity experiments
• GENI [Berman-2014], a facility, similar to DeterLab, allowing researchers to explore new networking technologies
• Minimega [McGregor-2002], a tool from Sandia National Labs that can launch and manage a massive number of virtual machines connected through a virtual network
• NetKit [Pizzonia-2014, Rimondini-2007], an emulation-based network environment with the capability to set up high-fidelity computer networks
• Mininet [Lantz-2010], which is similar to NetKit

The list of live virtual constructive systems given is as follows:
• Emulytics [Leeuwen-2015], a platform from Sandia National Lab providing cybersecurity training and testing capabilities
• SIMTEX (Air Force Simulator Training and Exercises), a network environment with capabilities such as simulation of the Internet [Harwell-2013]


• SAST, a cyber-attack training environment with capabilities such as background traffic generation [Wabiszewski-2009]
• LARIAT (the Lincoln Adaptable Real-time Information Assurance Testbed), a network emulation/simulation testbed with the capability of mimicking user behavior and evaluation of attack behavior [Lincoln Laboratory-2019, Rossey-2002]

A.17.2 [Leblanc-2011] Overview of Cyber Attacks and Networks Operations Simulation
In [Leblanc-2011], the authors list simulation systems found in the private sector, academia, and government.

From the private sector and academic research, we have:
• ARENA, a constructive simulation developed by researchers at the Rochester Institute of Technology (RIT) to simulate cyberattacks against a computer network, such as from the Internet [Kuhl-2007, Constantini-2007]. It is primarily used to analyze IDS sensors.
• RINSE (Real-Time Immersive Network Simulation Environment), a live simulation developed by researchers at the University of Illinois at Urbana-Champaign in 2006 [Liljenstam-2006]. It is for simulating attacks on complex networks, helping the training and education of network defenders, but not for the general understanding of the implications of Computer Network Operations (CNO) by senior leaders.
• Cohen [Cohen-1999] describes a constructive simulation system that models various attacks on a simulated network.
• SECUSIM, constructive simulation software developed at the Department of Computer Engineering at Hangkong University in Korea in 2001 [Park-2001] to specify attack mechanisms, verify defence mechanisms, and evaluate their consequences.
• OPNET-based solutions:
o "The use of modeling and simulation to examine network performance under Denial of Service attacks” is a master’s thesis written by Rahul R. Sakhardande of the State University of New York in 2008 [Sakhardande-2008].
o “A Frequency-Based Approach to Intrusion Detection” is a research paper written by Mian Zhou and Sheau-Dong Lang of the University of Central Florida in 2003 [Zhou-2003]. The simulation is used to test the effectiveness of an intrusion detection algorithm.
• NetEngine, a cyberattack virtual simulation tool developed at the Institute of Security Technology Studies at Dartmouth College [Brown-2003], able to represent very large IP networks and to train IT staff in combating cyberattacks.

From the public sector research, we have:
• Under US Cyber Command and the Air Force Cyber Operations Division:
o SIMTEX (Simulator Training Exercise Network), a simulation infrastructure to automatically simulate various computer network attacks for training purposes. Bulwark Defender, previously known as Black Demon, is a training exercise that uses the SIMTEX infrastructure.
o CAAJED ‘06 [Mudge-2008], “a manual integration of CNO and cyberattacks with the US Air Force war simulator Modern Air Power (MAP). CAAJED (Cyber


and Air Joint Effects Demonstration) consists of all the features of MAP such as the ability to play the war game as a human versus human, human versus computer opponent, or computer versus computer contest.”
• IWAR (Information Warfare Analysis and Research), developed in the IWAR “laboratory at the US Military Academy (USMA –West Point, NY) is a network attack and defence simulator used to train cadets and faculty in information warfare [Lathrop-2002]. It is capable of simulating defenses such as cryptography, encryption and access control methods. IWAR is also able to simulate attacks such as Trojan horses, vulnerability scanners, viruses, worms, DoS, DDoS, and password hacking.”
• Cyber Storm I, II, and III are simulation exercises developed by the US Department of Homeland Security National Cyber Security Division. They were conducted with over 100 participants from industry, military, and government, mostly from the US. The goal is to examine the “preparedness, response, coordination, and recovery mechanisms to a simulated cyber event within international, Federal, and State Governments in conjunction with the private sector” [DHS-2006].
• The DARPA National Cyber Range “aims to simulate cyberattacks on computer networks and help develop strategies to defend against them.”
• France’s Piranet, designed “as the response to a major cyberattack on France's telecommunications and information systems infrastructure which impacts the military, public and private sectors.” It was used “to train government teams and” exercised “to validate the emergency measures taken in order to decide if Piranet defences are still valid.” [Naudon-2010]
• India’s Divine Matrix, a war game that “simulated a notional attack by China on India in 2017.” [Singh-2009]

A.17.3 [Moskal-2013] Simulating Attack Behaviors
In [Moskal-2013], the authors review four early cyberattack simulators, developed between the 1990s and 2007, that focused on attack behavior. They then present “CyberSim, a modular system for cyber attack behavior simulation.” The core of this system interacts “with four context models: Virtual Terrain v2 (VT.2), Vulnerability Hierarchy (VH), Scenario Guiding Template (SGT), and Attack Behavior Models (ABM).”

A.17.4 [Moskal-2014] Fusing Context Models in Network Attack Simulation
This paper is an expanded version of [Moskal-2013]. It describes “a simulation system that fuses four context models: the networks (VT.2), the system vulnerabilities (VH), the attack behaviors (ABM), and the attack scenarios (SGT), so as to synthesize multistage attack sequences.” “After describing the design of the context models,” they provide “an example use of the simulator and sample outputs, including the ground truth actions and sensor observables”.

A.17.5 [Noel-2015a] Analyzing Mission Impacts of Cyber Actions
In [Noel-2015a], the authors describe “an integrated approach for understanding mission impacts of cyber-attacks” called AMICA (Analyzing Mission Impacts of Cyber Attacks). “AMICA combines process modeling, discrete-event simulation, graph-based dependency modeling, and dynamic visualizations.” It “captures process flows for mission tasks as well as cyber attacker and defender tactics, techniques, and procedures (TTPs).” Vulnerability dependency graphs are


used to generate network attack paths, and mission-dependency graphs are used to relate the different levels of cyber assets needed for the mission. By simulating the model, they are able to “quantify impacts in terms of mission-based measures, for various mission and threat scenarios.” Their dynamic visualizations of results give users a “deeper understanding of cyber warfare dynamics, for situational awareness in the context of simulated conflicts.” Moreover, a prototype tool is demonstrated.

This article will be useful for understanding the different issues involved in the simulation of mission impacts due to the presence of cyberattacks.


A.18. Software Vulnerability
This section provides an article on handling software vulnerability.

A.18.1 [Song-2015] Automating Software Vulnerability Analysis
In [Song-2015], the authors describe the problem of automating software vulnerability analysis in the context of “The DARPA Cyber Grand Challenge”. The first challenge involves fixing a vulnerable application given in binary form. This requires binary reverse engineering, which is “disassembling the binary into an abstract representation that can be analyzed, manipulated, and converted back to a functional binary.” The next challenge involves vulnerability analysis. That is, given a challenge set (CS), consisting of one or more challenge binaries (CBs), to analyze each CB for vulnerabilities and provide an executable POV (Proof of Vulnerability).


A.19. Validating Changes, Mitigation Strategies
This section consists of literature related to validating changes and assessing mitigation strategies. They are:

o [Atighetchi-2017] describes an approach for validating prototype systems
o [Backes-2017] describes simulated penetration testing to assess mitigation strategies

A.19.1 [Atighetchi-2017] An Approach for Security Validation
The authors in [Atighetchi-2017] describe an approach for validating security properties of prototype systems. It consists of three progressive steps:

o Attack surface reasoning for specific security component implementation as a unit o Emulation for the envisioned security component in deployment scenarios o Logical argumentation for the security components in combination with others

The concepts and approach of [Atighetchi-2017] may be applicable in ensuring changes in a network environment do not have any associated security risks.

A.19.2 [Backes-2017] Simulated Penetration Testing and Mitigation Analysis
Given a set of mitigation strategies, the authors in [Backes-2017] propose to use simulated penetration testing to determine which subset of the strategies to apply. The authors conclude that more work is needed in finding more effective algorithms – “effective search guidance towards good solutions” – and a practical means of acquiring more refined models – trading off “between the accuracy of the model, and the degree of automation vs. manual effort with which the model is created”.


A.20. Others
This section consists of articles that do not belong exactly to any of the above categories. They are:
o [Dowse-2018] discusses trusted autonomy in military cyber security
o [Gundersen-2017] studies the state of the art of reproducibility in the existing Artificial Intelligence literature
o [Holm-2017] studies the effectiveness of a typical script kiddie

A.20.1 [Dowse-2018] Need for Trusted Autonomy in Military Cyber Security
The author in [Dowse-2018] discusses the need for trusted autonomy in military cyber security within the current cyber environment, under the following four fundamental principles of cyber security:
o “The preservation of confidentiality, integrity and availability of information in support of the organization’s missions and interests … is an integrated part of how an organization manages its ICT (Information Communication Technology) environment.”
o “A secure foundation is fundamental to cyber security.”
o “Everyone in the organization contributes to cyber security.”
o “An organization must balance security imperatives with business requirements for functionality and access.”

Dowse defines autonomy as “the interaction with the environment within a goal-directed behavior” and trust as being able to “act in a predictable and reliable manner” and “to determine how they [autonomous agents] behave with such predictability and certainty”. The five challenges to establishing trusted autonomy, according to Dowse, are:
o The size of the cyber environment
o The extraction of relevant and accurate information from the cyber environment for a better understanding of its state
o The real-time and expeditious requirement for timely decision making
o The variety of threats in the cyber environment
o The non-standardization of the cyber environment

A.20.2 [Gundersen-2017] State of the Art of Reproducibility in Artificial Intelligence
In [Gundersen-2017], the authors study the state of the art of reproducibility in the published Artificial Intelligence literature. They consider reproducibility in the following aspects:
o Experiment
o Data
o Method

The six metrics of reproducibility are three Boolean combinations and three weighted sums of the above three aspects.


A.20.3 [Holm-2017] Effectiveness of a Typical Script Kiddie
In [Holm-2017], the authors studied how effective a typical script kiddie actually is at compromising presumably vulnerable network services. They ran “1223 exploits corresponding to 45 different exploit modules against 204 machines in a cyber range.” To their surprise, only eight attacks were successful, primarily “due to the poor” association “between vulnerabilities (CVE) and exploit modules.” They indicated “that even fewer would be successful if the employed vulnerability scanners would not be allowed to authenticate to victims during scans.”


Appendix B: Problem Abstraction in Cyber Simulation with Learning Strategies
In this section, we describe the problem abstraction issues in [Ben-Asher-2015], [Chung-2016], [Elderman-2017], and [Niu-2015]. In Table 18, we summarize the problem considered, the number of players involved, and the technique used; in Table 19, we summarize the advantages and disadvantages of each of the proposed solutions.

Intrinsic to all game-theoretic models, we would need the following: • Have expert knowledge to populate the attack model • Have complete and granular models as they affect performance and accuracy • Define impact metrics of selected attack actions • Be robust against adversarial learning attacks.

In the following subsections, we describe for each of them the problem, results, model, technique, and assumptions, including a summary of each.

Table 18 Learning Strategies in Selected Literature

Literature | Problem Considered | # of Players | Learning Techniques
[Ben-Asher-2015] | Investigate the cyber posture of various levels of cyber infrastructure investment | Multi-player | Instance-Based Learning Theory (IBLT)
[Chung-2016] | Investigate variants of Q-Learning for detecting execution of a multistage attack | 2 players | Q-Learning, Markov Game
[Elderman-2017] | Investigate various game learning strategies | 2 players | Monte-Carlo Learning, Q-Learning, Neural Network
[Niu-2015] | How to defend a cyber system that controls a physical system | 2 players | Q-Learning


Table 19 Pros and Cons of Selected Learning Strategies

[Ben-Asher-2015]
Pros:
• Provide consequential trend of cyber security investment decision in terms of the assets accumulated from other players
• Explore and learn new instances 69
• Able to reduce dependency on prepopulated instances in the knowledge base over time
• Multi-player
Cons:
• No detailed guidance on how to set a player’s asset and power values based on real scenarios
• Attacks driven only by the power level of the player and the severity parameter of the attack; no other motives considered
• No grouping of players
• Need to build up a knowledge base
• Only two actions available for the defender {respond or not respond}, no evasion
• No dynamic variation of power level when under attack

[Chung-2016]
Pros:
• Provide the security administrator the most likely beneficial defensive actions to take
• Consider both false positives and negatives
• Show that naïve Q-learning is promising without requiring an opponent model
Cons:
• Consider only rewards of attack and not the cost of attacks
• No network model, only attack and defensive actions and state transition strategy
• Depend on the accuracy of monitoring

[Elderman-2017]
Pros:
• Consider a simple network model
• Incorporate the concepts of dynamic defense fortification and attack escalation
• Show advantage of the Monte Carlo learning strategy over other strategies
Cons:
• Not easily scalable
• Does not provide guidance on mapping real attack mechanisms to attack values, and similarly for the defense mechanisms
• Only the end node has an asset value
• Provide no guidance to map a realistic network to the network model used

[Niu-2015]
Pros:
• Once the bounds of critical parameters are known, pre-training is always possible.
• After pre-training, on-line training is always possible.
Cons:
• The cyber system needs pre-training before it can be safely deployed.
• Packet delay and loss may not be the only critical parameters.
• Assume mechanisms exist in the cyber system to mitigate critical parameters that must be bounded.

69 Instances are defined in Section 8.1.3


B.1. [Ben-Asher-2015] Understanding Challenges of Cyber War

B.1.1 Problem Definition
The authors in [Ben-Asher-2015] apply Instance-Based Learning Theory (IBLT) to study the dynamics of cyber-war involving multiple agents that learn to make decisions according to IBLT. Specifically, they investigate the dynamics of cyber-warfare resulting from investment in cyber security, with agents trying to maximize their assets through IBLT-based learning, exploiting their cyber security prowess (in terms of cyber infrastructure investment) to accumulate opponents’ assets.

B.1.2 Results
The results seem obvious, as described below.
• High power agents managed to accumulate assets faster and in larger amounts compared to all other types of agents.
• Even though agents with medium power managed to accumulate assets, their rate of accumulation was relatively low compared to the rate with which high power agents accumulated assets.
• Finally, low power agents did not manage to accumulate assets beyond their initial assets.
• The assets that the low power agents lost were distributed unequally between the high and medium power agents, with a noticeable advantage going to the high power agents.
• Agents learn to be proportionally aggressive in time and degree according to their power levels.
• When examining the probability of attacking other agents in the last 500 trials, they found the high power agents were more likely (p=0.83) to attack other agents compared to medium (p=0.59) or low (p=0.21) power agents.
• Low power players are basically sitting ducks; the only way out of the dilemma for them is to increase their power, which the simulated scenarios did not consider.

B.1.3 Instance Based Learning Theory (IBLT)
IBLT was developed to explain and predict learning and decision making in real-time, dynamic decision making environments. It is basically decision making from experience. From [Gonzales-2003], an instance is a triple consisting of (situation, decision, utility) called an SDU element, where
• situation – a set of environmental cues
• decision – a set of actions applicable to a situation
• utility – the evaluation of the goodness of a decision in a particular situation

Again from [Gonzales-2003], the mechanisms involved in IBLT are as follows:
• Instance-based knowledge – stores SDUs
• Recognition-based retrieval – retrieves SDUs according to the similarity between the situation being evaluated and instances stored in the instance-based knowledge
• Adaptive strategies – transitions from heuristic-based to instance-based decisions according to the amount of interactive practice in the dynamic task
• Necessity – controls the need for additional alternative search


• Feedback updates – updates the utility of SDUs and maintains the causal attribution of results to actions
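The storage, retrieval, and blending mechanisms above can be sketched in a few lines of Python. This is an illustrative toy, not the model of [Ben-Asher-2015]: the class names, the cue-matching similarity measure, and the sample instances are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    situation: tuple   # environmental cues
    decision: str      # action taken in that situation
    utility: float     # observed goodness of the decision

class InstanceMemory:
    """Toy instance-based knowledge store with similarity-weighted retrieval."""

    def __init__(self):
        self.instances = []

    def store(self, situation, decision, utility):
        # Feedback update: record a new SDU element.
        self.instances.append(Instance(situation, decision, utility))

    def similarity(self, s1, s2):
        # Toy similarity: fraction of matching cues.
        matches = sum(1 for a, b in zip(s1, s2) if a == b)
        return matches / max(len(s1), 1)

    def blended_utility(self, situation, decision):
        # Recognition-based retrieval: blend the utilities of stored
        # instances for this decision, weighted by situation similarity.
        weighted, total = 0.0, 0.0
        for inst in self.instances:
            if inst.decision != decision:
                continue
            w = self.similarity(situation, inst.situation)
            weighted += w * inst.utility
            total += w
        return weighted / total if total else 0.0

mem = InstanceMemory()
mem.store(("high_power", "exposed"), "attack", 8.0)
mem.store(("high_power", "guarded"), "attack", 2.0)
mem.store(("high_power", "exposed"), "defend", 1.0)
print(mem.blended_utility(("high_power", "exposed"), "attack"))  # → 6.0
```

A real IBL model would also weight instances by activation (recency and frequency, with noise σ and decay d as quoted below); the similarity-only weighting here is a deliberate simplification.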

In applying IBLT to investigate the dynamics of cyber-war, the authors in [Ben-Asher-2015] consider the following steps:
“• Initially, a single model is constructed to study the cognitive processes that a single decision maker (i.e., the defender) applies when interacting with a controlled environment. At this level, the actions of the attacker are predefined through a set of strategies.
• In the next level, the attacker is also represented by a cognitive model. Here, both the defender and the attacker have learning mechanisms that allow them to learn from experience and adjust their decisions. When scaling up from an individual decision maker to a pair of decision makers that interact repeatedly, we borrow many concepts from Game Theory and especially from Behavioral Game Theory (BGT). This allows us to observe and examine how behaviors at the pair level evolve and whether stable patterns emerge (i.e., equilibrium) with experience.
• To capture the complex dynamics of a cyber-war, multiple models interact through a network. In this network, models are not assigned to a specific role, rather they learn from experience regarding their abilities and the abilities of the others. In this setting, models learn through the repeated interaction to play the role of an attacker or a defender depending on their ability and the actions of their opponents.
• Next, we elaborate and demonstrate each of the levels in the scaling process of cognitive models for cyber-war.”

B.1.4 Problem Modeling

B.1.4.1 Player Definition
• Let there be n players, each of which can either attack any other player or defend itself
• Let each player have two key attributes:
o Asset (a) – a value representing the key resources that an agent has to protect
o Power (pw) – a value representing the technical prowess of an agent in cyber security. It can be in the form of cyber security infrastructure and personnel training investment.
• Agents interact repeatedly to learn to maximize their respective assets by winning and thus gaining a percentage of the opponent’s assets (s × a_j), where the value of s is directly proportional to the severity of the attack and a_j is the assets of opponent j

B.1.4.2 Game Rules
• The outcome of player i with power pw_i attacking player j with power pw_j is given as follows:
w(i, j) = pw_i / (pw_i + pw_j), l(i, j) = 1 − w(i, j)
• The expected gain for player i attacking player j is as follows:
G(i, j) = w(i, j) × (s × a_j) − c(i, j),


where
o c(i, j) is the cost of player i attacking player j
o L(i, j) = l(i, j) × (s × a_i), which is the expected loss of player i when attacked by player j
• “In each trial, an agent in the game makes a decision of whether or not to attack each of the other agents based on their past experience. Once the agent has identified a likely opponent, it evaluates whether to Attack or Not attack that specific opponent based on the expected payoff for each of the two possible decisions, which is based on the past interactions and outcomes that are stored in memory for that particular type of agent.”
• “Since a player can attack another player and at the same time be attacked by several other players in any given round of the game, the calculations of gains and losses for each individual player are done sequentially and separately. First, losses from being attacked are calculated, then gains from winning an attack are calculated, and finally gains and losses are summed and the costs of attack and defense are deducted from the remaining assets of the agent.”
• “Therefore, assets change dynamically during the game as a result of each player’s actions. A player may also run out of assets and when assets drop below a certain threshold, this player is suspended for a fixed number of trials. This resembles downtime for a network node or service in the cyber-world. During a downtime period, the player cannot engage with other players. After recovering from downtime, a player regains assets which are equal to the initial assets that the player started with in the first trial of the game.”

B.1.4.3 Learning Strategy
• Learn only from experience – instances are created by making a decision and observing its outcome
• During the early stages of the interaction, when there is no historical data yet to consult, use prepopulated instances in memory
• “The prepopulated instances correspond to the possible outcomes from attacking and not attacking other types of agents and contained a fixed value of 500. As the agent accumulated experiences, the activation of these prepopulated instances decayed overtime. Their decay leads to a decrease in the retrieval probability, which in turn lead to a decrease in the contribution of these values to the calculation of the blended value.”
• “For all of the simulations, all the agents shared the same values for the free parameters in the IBL (Instance Based Learning) model. The values of σ (noise) and d (decay) were set at 0.25 and 5, respectively (obtained from fitting to a different data set as described in [Lejarraga-2012])”
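The payoff arithmetic of the game rules above can be illustrated with a short sketch. The function names and the sample power, asset, severity, and cost values are hypothetical; only the win-probability and expected-gain formulas follow the rules as reconstructed here.

```python
# Win probability and expected gain for the CyberWar game rules.

def win_probability(pw_i, pw_j):
    # w(i, j) = pw_i / (pw_i + pw_j): attacker i beats defender j with
    # probability proportional to its relative power.
    return pw_i / (pw_i + pw_j)

def expected_gain(pw_i, pw_j, assets_j, severity, cost):
    # G(i, j) = w(i, j) * (s * a_j) - c(i, j): the winning chance times the
    # severity-scaled share of j's assets, minus the cost of attacking.
    return win_probability(pw_i, pw_j) * severity * assets_j - cost

# A high-power agent (pw=8) attacking a low-power one (pw=2):
print(win_probability(8, 2))                        # → 0.8
print(round(expected_gain(8, 2, 1000, 0.1, 10), 6)) # → 70.0
```

The example also shows why low-power players are "sitting ducks" in the authors' results: with the power ratio reversed, the same attack has a win probability of only 0.2 and a negative expected gain.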

B.1.5 Assumptions
The authors in [Ben-Asher-2015] make the following implicit assumptions in applying IBLT to the cyber security decision making problem.
• There are prepopulated instances based on historical data or expert experience for the initial simulation stage
• Learning from experience means storing new instances (newly learned experiences) in the knowledge base, updating existing instances in the knowledge base with new instances,


and retrieving from the knowledge base instances similar to the currently experienced situation to determine what action to take
• There is enough time to interact repeatedly to learn
• The power values of all players are fixed throughout the simulation

B.1.6 Summary
The authors in [Ben-Asher-2015] scale up a two-person behavioral game to a multi-agent behavioral game where each agent can be both attacker and defender. Using concepts from multi-agent models, BGT (Behavioral Game Theory), and IBLT, they constructed an n-player cyber security game (the CyberWar game) played by cognitive agents. In the game, each player is given power and asset values, where power signifies the attack capability to amass assets from the other players. The game allows for investigating the consequential trend of various levels of investment in cyber security in terms of the level of accumulated assets.


B.2. [Chung-2016] Game Theory with Learning for Monitoring

B.2.1 Problem Definition
The authors in [Chung-2016] propose a more realistic learning strategy, which they call Naïve Q-Learning, to react automatically to the adversarial behavior of a suspicious user and secure the system. They compare it to a Markov game solution, minimax Q-Learning, non-Markov players (irrational attackers, random decision-makers), and against itself. They illustrate their results on how a defender reacts “to the adversarial behavior of a suspicious user to secure the system.”

B.2.2 Results
The authors found that

• Naïve Q-Learning performs well against irrational (non-Markov) attackers.
• Naïve Q-Learning was able to perform at least as well as a Minimax Q-Learning defender.
• Naïve Q-Learning does not perform as well against a Markovian attacker, which is the worst a defender could face and is an unrealistic adversary having all necessary information about its opponents.

B.2.3 Problem Modeling
“The main interest of the defender is to secure the cyber infrastructure from malicious activities. The attacker, on the other hand, is a malicious opponent who attempts to compromise the target system. We model the interaction between the attacker and the defender based on data on actual security incidents.”

B.2.3.1 Game States, Actions, Transitions, Rewards, and Rules
Here, we define various parameters and the rules of the game.

• s ∈ S, an attack state, represents the state of the attack associated with a numeric value (reward)
• A_s, an activity, is the set of actions available to the attacker. At a given attack state s, A_s ⊆ A is the set of available actions
• T(s, a, s′), the transition matrix, is the probability that an action a in state s will transition to the next state s′. An attack succeeds if the defender does not make a proper response. It fails, otherwise.
• R(s, a, s′), the immediate reward, is the reward of the attacker in taking an action a in state s and transitioning to the next state s′.
• s_d, the defender state, is the state from the defender’s perspective
• A_d, the defender action set, is the set of defender actions available to the defender in a given defender state, a_d ∈ {“Response”, “No Response”}, where “No Response” is for monitored events that are either correctly or wrongly considered not harmful.


• The rules of the game are as follows. “Once an attacker has taken an action, the defender chooses his or her action based on the information from the monitoring system. An attacker’s action results in a transition to the intended state only if the defender does not make a proper response. Once the defender has responded to the observed action, the attacker is forced to transit to the default state. Assuming a zero-sum game, a successful attack will result in an immediate reward, and the defender will have a symmetric loss. As a result of the execution of the attack, the attack state will change accordingly. Otherwise, if the defender detects the attack and makes a proper response, the attack state will be reset to the default for the identified attacker. In that case, a reward will be assigned to the defender, with an equivalent loss to the attacker.”
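One round of the zero-sum game described by these rules can be sketched as follows. The state names, reward values, and function name are illustrative, not from [Chung-2016]; the sketch only encodes the quoted rules: an attack transitions to its intended state only if the defender does not respond, and a proper response resets the attack state with a symmetric reward.

```python
DEFAULT_STATE = "default"

def play_round(state, intended_next, reward, defender_responds):
    """Resolve one attacker action; return (next_state, attacker_reward,
    defender_reward). Zero-sum: the rewards are always symmetric."""
    if defender_responds:
        # Detected and answered: the attack state resets to the default
        # and the defender gains what the attacker loses.
        return DEFAULT_STATE, -reward, reward
    # Unanswered: the attack succeeds and transitions to its intended state.
    return intended_next, reward, -reward

print(play_round("default", "foothold", 5, defender_responds=False))
# → ('foothold', 5, -5)
print(play_round("foothold", "escalated", 8, defender_responds=True))
# → ('default', -8, 8)
```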

B.2.4 Learning Models
In this section, we briefly describe the three learning models used in [Chung-2016]. First, we list below the common notations used by the Markov game, minimax Q-Learning, and Naïve Q-Learning.

• Let R(s, a, o) be the immediate reward based on the attack state s, the player’s action a ∈ A, and the opponent’s action o ∈ O in a game iteration, where A and O are the sets of available actions for the player and the opponent, respectively
• Let V(s), the value of state s, be the expected reward when the player, starting from state s, follows the optimal policy, assuming the opponent’s action will be the action that minimizes the expected reward.
• Let Q(s, a, o), the quality of state, be the expected reward each player can gain by taking actions a and o from state s and then following the optimal policy from then on

B.2.4.1 Markov Game
In a Markov Game, where complete information and rationality of game processes are available, the value of state and the quality of state are defined as follows:

o V(s) = max_{π(s,·)} min_{o′∈O_s} Σ_{a′∈A_s} π(s, a′) Q(s, a′, o′), where
. π(s,·), the optimal policy, is the probability distribution over the actions available in state s:
π(s,·) = argmax_{π(s,·)} min_{o′∈O_s} Σ_{a′∈A_s} π(s, a′) Q(s, a′, o′)
. π(s, a′) indicates the likelihood of taking action a′ in state s, where π(s,·) is the overall distribution that maximizes the value of state, V(s)
o Q_{t+1}(s, a, o) = R_{t+1}(s, a, o) + γ V_t(s′), where
. R_{t+1}(s, a, o) is the immediate reward at iteration t+1 for the player taking action a and the opponent taking action o at state s.
. V_t(s′) is the expected reward, which was determined from the previous iterations, as a result of transitioning to state s′.
. γ, the discount factor, is for balancing between future and current rewards
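The value-of-state computation can be illustrated with a simplified sketch that enumerates only pure strategies; the full formulation optimizes over mixed policies π(s,·), typically via linear programming. The toy quality table and its action names are assumptions.

```python
def pure_minimax_value(Q, actions, opp_actions):
    # V(s) restricted to pure strategies: the best worst-case quality,
    # i.e. max over player actions of the min over opponent actions.
    return max(min(Q[(a, o)] for o in opp_actions) for a in actions)

# Toy 2x2 quality table Q[(player_action, opponent_action)] for one state:
Q = {("patch", "exploit"): 3, ("patch", "scan"): 5,
     ("wait", "exploit"): -4, ("wait", "scan"): 6}
print(pure_minimax_value(Q, ["patch", "wait"], ["exploit", "scan"]))  # → 3
```

Here "patch" guarantees at least 3 whatever the opponent does, while "wait" risks −4; a mixed policy over both actions can do no worse and sometimes strictly better, which is why the paper's formulation ranges over probability distributions.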


B.2.4.2 Minimax Q-Learning With minimax Q-Learning, we assume limited information is available and use learning to compensate for the lack thereof. This is done by allocating “partial weight on its earlier results to combine knowledge of history, the actual earnings on the current iteration, and the future expected reward.” Its quality of state is defined as follows:

o Q_{t+1}(s, a, o) = α Q_t(s, a, o) + (1 − α)(R_{t+1}(s, a, o) + γ V_t(s′)), where
. α, the learning rate, leverages the ability of the player and is assigned a real value between 0 and 1
. γ, the discount factor, is for balancing between future and current rewards

Unlike the Markov game, which possesses the optimal solution from the outset, minimax Q-Learning needs to explore its environment to learn the optimal policy. It uses an exploration rate, ε, to determine the degree of variation from the current policy, allowing the optimal policy to be learned by trial and error.

B.2.4.3 Naïve Q-Learning
In Naïve Q-Learning, no information about the opponent is available. Thus, its quality of state, value of state, and optimal policy are defined without any opponent information component as follows:

• Q_{t+1}(s, a) = α Q_t(s, a) + (1 − α)(R_{t+1}(s, a) + γ V_t(s′))
• V(s) = max_{π(s,·)} Σ_{a′∈A_s} π(s, a′) Q(s, a′)
• π(s,·) = argmax_{π(s,·)} Σ_{a′∈A_s} π(s, a′) Q(s, a′)

B.2.5 Assumptions
In formulating the game, the authors in [Chung-2016] make the following assumptions:
• A knowledge expert will be able to come up with the attack model.
• Reward was abstracted as the relative severity under the assumption of a zero-sum game; the winner’s gain equals the loser’s loss.
• Monitoring and detection do not directly map to the attacker’s definite actions.
• The monitor may miss (false negative) or misidentify (false positive) an attack. Modeling of this aspect needs further study.
• One needs to rely on expert knowledge to enumerate the potential damage or overhead of taking a certain action.
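The Naïve Q-Learning update of Section B.2.4.3 can be sketched as follows, assuming the reconstructed form Q_{t+1}(s, a) = α Q_t(s, a) + (1 − α)(R + γ V(s′)), where V(s) reduces to the best action value. The α and γ values, the state names, and the action names are illustrative.

```python
from collections import defaultdict

class NaiveQLearner:
    """Q-learner that keeps no model of the opponent: Q is indexed by
    (state, action) only, unlike the minimax variant's (state, a, o)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.Q = defaultdict(float)   # Q[(state, action)], defaults to 0.0
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma

    def value(self, state):
        # V(s): the best achievable quality over the player's own actions.
        return max(self.Q[(state, a)] for a in self.actions)

    def update(self, state, action, reward, next_state):
        # Q <- alpha * Q_old + (1 - alpha) * (R + gamma * V(s')).
        target = reward + self.gamma * self.value(next_state)
        old = self.Q[(state, action)]
        self.Q[(state, action)] = self.alpha * old + (1 - self.alpha) * target

learner = NaiveQLearner(["respond", "no_response"])
learner.update("probe", "respond", reward=4.0, next_state="default")
print(learner.Q[("probe", "respond")])  # 0.5*0 + 0.5*(4 + 0.9*0) → 2.0
```

Note that in this formulation a smaller α weights the new target more heavily; some texts write the symmetric update Q ← (1 − α)Q + α(R + γV), which swaps the roles of the two coefficients.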

B.2.6 Summary
The authors show that Naïve Q-Learning is a promising learning strategy for improving the decision-making process of cyber security monitoring when facing irrational attackers. It also performs at least as well as a minimax Q-Learning defender, although it does not perform well against a Markovian attacker. As their purpose is to investigate realistic learning strategies, the authors consider cyber simulation scenarios typical of intrusion/attack detection.


B.3. [Elderman-2017] Learning in Cyber Security Simulation

B.3.1 Problem Definition
The authors in [Elderman-2017] investigate the effectiveness of various learning strategies used by agents playing against each other on a cyber network in a simulation modelled as a Markov game with incomplete information. The learning strategies considered are Monte Carlo learning with various exploration options, backward Q-Learning, a Neural Network, and a Linear Network. A simple network model is used to illustrate their findings. Each player plays according to its view of the network nodes’ status, which consists of defensive and detection levels for the defender, and attack levels for the attacker.

B.3.2 Results
They found that “Monte Carlo learning with the Softmax exploration strategy is most effective in performing the defender role and also for learning attacking strategies.”

B.3.3 Problem Modeling
Here we briefly describe the network model, player model, and the game rules in [Elderman-2017].

B.3.3.1 Network Model
The network model consists of three types of nodes and their connectivity. The details are as follows:
• Node model
o Start node – the node that the attacker starts from each time the game begins. It has no asset value.
o Intermediate nodes – the nodes that the attacker must successfully hack to reach the end node from its start node. They also have no asset values.
o End node – the data node in the network containing the asset.
• Connectivity model
o The start node connects to one or more intermediate nodes and may directly connect to the end node.
o Intermediate nodes may connect to other intermediate nodes. At least one intermediate node connects to the end node.

B.3.3.2 Player Models
The attacker and the defender are described next, together with their views of the network.

• Attacker’s view of the network
o The attacker has access to the node it is currently in and to its adjacent nodes.
o For each node the attacker has accessed, the attacker keeps a list of attack values, (a_1, a_2, …, a_10), where 0 ≤ a_i ≤ 10, a_i ∈ Z₀⁺, 1 ≤ i ≤ 10, and a_i is the strength of attack type i.
o The attacker does not know any of the defender’s lists of defence and detection levels.


o At each game step, the attacker tries to attack one of the adjacent nodes in an attempt to reach the end node. An attack on a node is performed by incrementing by one a specific attack value of the node’s attack list, one whose value has not yet reached its maximum. If successful, the attacker moves to the adjacent node. Otherwise, the attacker remains in its current node, being either blocked or, if detected, losing the game.
o An attack i on node n is successful only if a_i > d_i, where d_i is the defence value for attack type i at node n.
o An attack i on node n is not successful if a_i ≤ d_i, resulting in one of the following:
▪ The attack is not detected but blocked, and the attacker remains at its current node.
▪ The attack is detected, and the attacker loses the game.
o A blocked attack at node n is detected based on the chance (δ_n × 10)% of being detected, where 0 ≤ δ_n ≤ 10 is the detection level at node n.
• Defender’s view of the network
o The defender has access to every node in the network.
o The defender does not know any of the attacker’s lists of attack values.
o For each node, the defender keeps a list of defence values and an attack detection level, (d_1, d_2, …, d_10, δ), where 0 ≤ d_i, δ ≤ 10; d_i, δ ∈ Z₀⁺; 1 ≤ i ≤ 10; d_i is the strength of defence type i; and δ is the detection level at that node. Once an attack is successfully blocked, the hacker’s chance of being caught is determined stochastically by the detection level.
o At each game step, the defender is allowed to increment by one either the detection value or the defence value of one defence type in one node.

• The network state is reflected by the combination of the attacker’s and defender’s views of the network.
• “The game is a stochastic game due to the detection-chance variable.”

B.3.3.3 Game Rules

The game is played as described below. It has an initialization phase and a player interaction phase.

• Initialization
o Set all values in the attacker’s per-node attack lists to zero.
o Set the values in the defender’s per-node lists appropriately: zero where there is a security flaw, and non-zero values between 1 and 10 according to the strength of the defence mechanism.

• Game Procedure
o The attacker and defender take turns at each game step, with the attacker starting at the start node.


o After an attacker’s turn as described above in Appendix B.3.3.2, the move is evaluated to be one of the following:
▪ successful, and the attacker moves to the attacked node
▪ blocked and not detected, and the attacker remains at its current (attacking) node
▪ detected, and the game is over with the attacker losing and the defender winning
o Next, the defender takes its turn as described above in Appendix B.3.3.2.
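The per-turn attack-resolution rule above can be sketched in a few lines of Python (a minimal illustration; the function names and return labels are ours, not from [Elderman-2017]):

```python
import random

def attack_succeeds(attack_value, defence_value):
    # An attack of type i succeeds only if its value exceeds the defence value.
    return attack_value > defence_value

def resolve_attack(attack_value, defence_value, detection_level, rng=random):
    """Resolve one attacker turn: 'success', 'blocked', or 'detected'.

    A blocked attack is caught with probability (detection_level * 10)%,
    with 0 <= detection_level <= 10, per the game rules above.
    """
    if attack_succeeds(attack_value, defence_value):
        return "success"      # attacker moves to the attacked node
    if rng.random() < detection_level * 10 / 100:
        return "detected"     # game over, defender wins
    return "blocked"          # attacker remains at its current node
```

With detection level 0, a failed attack is always merely blocked; with detection level 10, it is always caught.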

B.3.4 Learning Models

The four learning models are described next.

• Monte Carlo learning
o The estimated reward values of the state-action pairs selected during the game are updated at the end of the game with the same reward value as follows:

Q_{t+1}(s, a) = Q_t(s, a) + α (r − Q_t(s, a)),

where
▪ r is the reward obtained at the end of each game
▪ α, ranging from 0 to 1, is the learning rate
▪ s is the current state. For the attacker, it is the node it is currently in. For the defender, s is always 0; its state is determined by the environmental state consisting of the attack values, defence values, and detection value of each node.
▪ a is a possible action at state s
o The four exploration algorithms used are described below.
▪ ε-greedy strategy – selects the best action with probability 1 − ε, and in the other cases selects randomly an action out of the set of possible actions, where 0 ≤ ε ≤ 1.
▪ Softmax – selects every possible action in proportion to its estimated reward according to the probability

π_t(a) = e^{Q_t(s,a)/τ} / Σ_{b=1}^{K} e^{Q_t(s,b)/τ},

where
• t is the game iteration index
• K is the total number of possible actions
• τ is the temperature parameter, indicating the amount of exploration
▪ UCB-1 (Upper Confidence Bound 1) – selects an action based on its estimated reward value and on the number of previous tries of that action. The algorithm is as follows:
• At the start of the simulation, all actions that have not been tried will be chosen.
• When no such action remains, the action that maximizes f_t(a) will be chosen, where

f_t(a) = Q_t(s, a) + C √(ln N_t / N_t(a)),

where
▪ N_t(a) is the number of previous tries of action a over all previously played games in the simulation
▪ N_t is the total number of previous tries of all actions currently available over all previously played games in the simulation
▪ C is the exploration rate
▪ Discounted UCB, a modification of UCB-1 – gives the more recently tried actions more influence on the value f_t(a) than the earlier ones. The modified algorithm is as follows:
• At the start of the simulation, all actions that have not been tried will be chosen, as in UCB-1.
• When no such action remains, the action that maximizes f_t(a) will be chosen, where

f_t(a) = Q_t(s, a) + 2B √(ξ log n_γ(t) / N_γ(t, a)),

where
▪ B is the maximal reward that can be obtained in a game
▪ ξ is the exploration rate
▪ n_γ(t) = Σ_{a=1}^{K} N_γ(t, a), where K is the number of possible actions in time step t
▪ N_γ(t, a) = Σ_{s=1}^{t} γ^{t−s} 1{a_s = a}, where 1{a_s = a} is the condition that the value γ^{t−s} for time step s is only added to the sum if action a was performed in time step s
▪ γ is the discount factor that determines the influence of previous tries on the exploration term

• Q-Learning
o A backward Q-learning algorithm combined with the ε-greedy exploration strategy is used here. The reward is updated starting with the last visited state until the first visited state is reached. It is supposed to have better speed and performance than the normal Q-learning algorithm. The update is done as follows:
▪ Q_{t+1}(s, a) = Q_t(s, a) + α (r − Q_t(s, a)) for the last state-action pair
▪ Q_{t+1}(s, a) = Q_t(s, a) + α (γ max_{a′} Q_t(s′, a′) − Q_t(s, a)) for the other state-action pairs,
where
• 0 ≤ α ≤ 1 is the learning rate
• 0 ≤ γ ≤ 1 is the discount rate
• s′ is the next state after taking action a at state s


• Neural Network
o Used for the defender agent only
o Uses stochastic gradient descent back-propagation to train its weights
o Uses a sigmoid activation function in one hidden layer with 6 neurons
o Has one input neuron for each defence or detection value in the network
o Has one output neuron for each possible action

• Linear Network
o The same as the neural network above but with no hidden layer
o Has weighted connections from inputs to outputs
o The authors were not able to use experience replay to train it, and thus it was not used in the experiments
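The Monte Carlo update and two of the exploration strategies above can be sketched as follows (a minimal illustration; the dictionary-based Q-table and the default parameter values are our own choices):

```python
import math
import random

def softmax_select(q_values, tau=1.0, rng=random):
    """Softmax exploration: pick action a with probability
    proportional to exp(Q(s, a) / tau)."""
    weights = [math.exp(q / tau) for q in q_values]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for action, w in enumerate(weights):
        acc += w
        if r <= acc:
            return action
    return len(q_values) - 1  # guard against floating-point round-off

def ucb1_select(q_values, counts, c=1.0):
    """UCB-1: try every untried action first, then maximize
    Q(s, a) + c * sqrt(ln N / N(a))."""
    for action, n in enumerate(counts):
        if n == 0:
            return action
    n_total = sum(counts)
    scores = [q + c * math.sqrt(math.log(n_total) / n)
              for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def mc_update(q, s, a, reward, alpha=0.1):
    """Monte Carlo update applied at the end of a game to each visited
    state-action pair: Q(s, a) <- Q(s, a) + alpha * (r - Q(s, a))."""
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (reward - old)
```

A low temperature τ makes softmax nearly greedy; a high τ makes it nearly uniform.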

B.3.5 Assumptions

The assumptions made in [Elderman-2017] are:
• The concept of privilege escalation is based on a successful attack, that is, having incremented an attack level above a defence level at a given node.
• Dynamic defence fortification is provided by incrementing a defence or detection level.
• Only the end node has an asset value.
• A simple network is sufficient to demonstrate the effectiveness of different learning strategies.
• Mapping a real network setup to the abstracted models is not a consideration of their investigation.

B.3.6 Summary

The authors in [Elderman-2017] investigate the effectiveness of various learning strategies used by agents when playing against each other on a cyber network in a simulation modelled as a Markov game with incomplete information. The learning strategies considered are Monte Carlo with various exploration options, backward Q-learning, Neural Network, and Linear Network. “Monte Carlo learning with the Softmax exploration strategy” is found to be the “most effective in performing the defender role and also for learning attacking strategies.” Here, the authors are interested mainly in analyzing the effectiveness of different learning strategies, so the cyber simulation scenarios are concerned mostly with intrusion/attack detection. However, the network model they provide is very simple.


B.4. [Niu-2015] Defense and Control in Cyber-Physical Systems

B.4.1 Problem Definition

In [Niu-2015], the authors propose a framework for defending a cyber system that needs to keep its controlled physical system stable. The cyber system uses Q-learning to derive an optimal strategy in the presence of attacks, while the physical system uses Q-learning to derive an optimal controller in the presence of uncertain dynamics.

B.4.2 Results

Using their framework on a small-scale UAV (unmanned aerial vehicle) helicopter with a remote controller, they found that “the cyber defender is able to make thorough decisions by selecting appropriate cyber state vector and output and customizing the payoff function that is of interest.”

By formulating the interaction between the cyber system and the physical system, they are also able to derive the condition needed to maintain control stability. In their example, the cyber system must defend its system to keep the expected packet delay and loss values within certain bounds in order to control the physical system within its stability bounds.

B.4.3 Solution Strategy

The authors first formulate the interaction of the cyber-physical system and then formulate each system individually.

B.4.3.1 Formulation of the Cyber-Physical System Interaction

Here, the authors formulate a mathematical representation of the cyber-physical system as follows:

Cyber System Modeling

• The cyber system defends its state x_c of controlling the physical system, which is affected by packet delay and loss, and monitors its defensive posture y_c(t).
• The physical system outputs its state information x_p(t) back to the cyber system, which computes its control output u(t).
• The controlling state of the cyber system is described by a non-linear discrete-time system given by x_c(t + 1) = f_c(x_c(t), a(t), d(t)), where x_c ∈ R^{n_c}, n_c is the dimension of the state vector of the cyber system, a ∈ R^N is a malicious action taken by the attacker, d ∈ R^M is a defense strategy taken by the cyber system, and the function f_c describes the new cyber system state as the result of it being attacked by a and defended by d at state x_c at discrete time t.
• The health of the cyber system is described by y_c(t) = h_c(x_c(t), x_p(t)), where x_p ∈ R^{n_p} is the state of the physical system, n_p is the dimension of the state vector of the physical system, and the function h_c determines a quantized value based on the values of x_c and x_p. As an example, h_c could be defined as follows:
o y_c(t) = 1 if x_c ∈ X_cd and x_p ∈ X_pd, and y_c(t) = 0 otherwise,
where X_cd and X_pd are the sets of desirable values of the cyber state x_c and the physical state x_p.
• More specifically, the cyber state is represented as

x_c(t + 1) = f_c(x_c(t), a(t), d(t)) = Σ_{i=0}^{N} Σ_{j=0}^{M} a_i(t) d_j(t) φ_ij(x_c(t)),

where
o a = [a_0, a_1, …, a_N] is an attack vector of N types, where a_0 = 1 indicates that there is no ongoing attack and, for i > 0, a_i = 1 indicates an ongoing attack of type i (a_i = 0 otherwise)
o d = [d_0, d_1, …, d_M] is a defense vector of M types, where d_0 = 1 indicates that there is no defense deployed and, for j > 0, d_j = 1 indicates a deployed defense of type j (d_j = 0 otherwise)
o Φ = [φ_00, φ_01, …, φ_0M; …; φ_N0, φ_N1, …, φ_NM] is a matrix of functions, each of which, φ_ij: R^{n_c} → R^{n_c}, describes the effect of the on-going attack/defense pair (a_i, d_j) on the cyber state x_c
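The double-sum cyber-state update can be sketched as follows (a toy illustration with a one-dimensional state and made-up effect functions φ_ij; the real φ_ij would be identified for the system at hand):

```python
# Sketch of the bilinear cyber-state update
# x_c(t+1) = sum_i sum_j a_i(t) * d_j(t) * phi_ij(x_c(t)).

def cyber_state_update(x_c, a, d, phi):
    """a and d are 0/1 indicator vectors; phi[i][j] maps state -> state."""
    new_state = 0.0
    for i, a_i in enumerate(a):
        for j, d_j in enumerate(d):
            if a_i and d_j:  # only active attack/defense pairs contribute
                new_state += phi[i][j](x_c)
    return new_state

# Made-up example: phi entries scale a scalar "delay" state up or down.
phi = [[lambda x: x,       lambda x: 0.5 * x],
       [lambda x: 2.0 * x, lambda x: x]]
```

For instance, with attack type 1 active and defense type 0 active, only φ_10 contributes to the next state.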

Physical System Modeling

• The physical system is described as a linear discrete-time system given by
o x_p(k + 1) = A x_p(k) + B u(k) + E w(k),
o y_p(k) = C x_p(k),
where
▪ x_p ∈ R^{n_p} is the state of the physical system,
▪ u ∈ R^{m} is the control input,
▪ w ∈ R^{q} is the disturbance input,
▪ y_p ∈ R^{r} is the output of the physical system,


▪ A ∈ R^{n_p × n_p}, B ∈ R^{n_p × m}, E ∈ R^{n_p × q}, and C ∈ R^{r × n_p} are the system matrices

B.4.3.2 Formulation of the Cyber System Defence

Here, the authors cast the attack and defense problem of the cyber-physical system as a two-player zero-sum game to derive optimal strategies.

Let the control state of the cyber system be described by

x_c(t + 1) = f_c(x_c(t), a(t), d(t)) = Σ_{i=0}^{N} Σ_{j=0}^{M} a_i(t) d_j(t) φ_ij(x_c(t)),

where x_c(t + 1) describes the updated packet delay and loss profile based on the current values of x_c(t), a(t), and d(t).

Let its output function, in the quadratic form of the state vectors, be

y_c(t) = x_c^T(t) Λ_c x_c(t) + x_p^T(t) Λ_p x_p(t),

where
• y_c ∈ R represents the current health state of the cyber system
• Λ_c ∈ R^{n_c × n_c} and Λ_p ∈ R^{n_p × n_p} represent the weighting matrices for each state
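The quadratic health output can be computed directly; the following sketch uses plain Python lists (the example matrices in the test are illustrative only):

```python
# Sketch of the quadratic output y_c = x_c' * L_c * x_c + x_p' * L_p * x_p.

def quad_form(x, m):
    """Compute x^T M x for a vector x and a square matrix m (as nested lists)."""
    return sum(x[i] * m[i][j] * x[j]
               for i in range(len(x)) for j in range(len(x)))

def cyber_health(x_c, x_p, lam_c, lam_p):
    """Health state y_c from the weighted quadratic forms of both states."""
    return quad_form(x_c, lam_c) + quad_form(x_p, lam_p)
```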

Instead of finding the optimal strategy for each possible health state of the cyber system, the authors propose partitioning Y, the set of all possible values of y_c, into L disjoint subsets such that Y = Y^1 ∪ Y^2 ∪ … ∪ Y^L, so that only the optimal strategies for each of the subsets need to be computed, reducing computational complexity.

Next, define the instant reward as follows:

r(x_c(t), a(t), d(t)) = y_c(t) + C_a a(t) − C_d d(t),

where
• y_c(t) = x_c^T(t) Λ_c x_c(t) + x_p^T(t) Λ_p x_p(t) is the health state of the cyber system,
• C_d d(t) is the defense cost, where C_d = [c_{d,1}, c_{d,2}, …, c_{d,M}] and each element c_{d,j} ∈ R⁺ is the cost of launching defense d_j
• C_a a(t) is the attack cost, where C_a = [c_{a,1}, c_{a,2}, …, c_{a,N}] and each element c_{a,i} ∈ R⁺ is the cost of launching attack a_i


Next, define the expected discounted payoff function for each subset Y^l as

Q^l(Ξ_a, Ξ_d, y) = E[ Σ_{t=0}^{∞} β^t r(t) | Ξ_a, Ξ_d, y ∈ Y^l ],

where
• Ξ_a = {a(1), a(2), …, a(t), …} and Ξ_d = {d(1), d(2), …, d(t), …} are the policies for the attack and defense, respectively, where a(t) and d(t) are the actions taken at time instant t
• Y^l is one of the partitioned subsets of Y
• β ∈ [0, 1) is the discount factor
• r(t) = r(x_c(t), a(t), d(t)) is the instant reward defined above

Next, the authors prove the existence of the solution of the game based on [Raghaven-1990] that shows “The discounted zero-sum game always possesses a unique solution yielding the optimal game value.”

They also prove the existence of an optimal policy based on [Puterman-1994], which shows that the policy (Ξ_a*, Ξ_d*) is guaranteed to be optimal if Q^l(Ξ_a*, Ξ_d*, y) satisfies the following fixed-point Bellman equation:

Q^l(Ξ_a*, Ξ_d*, y) = max_{Ξ_a} min_{Ξ_d} [ r(y, a, d) + β Σ_{y′ ∈ Y} p(y′ | y, a, d) Q^l(Ξ_a*, Ξ_d*, y′) ],

where p(y′ | y, a, d) is the probability of transitioning from the current state y to the next state y′ after taking the action pair (a, d).

They propose a Q-function using the Minimax-Q algorithm [Littman-1994] for each region Y^l as

Q^l(y, a, d) = r(y, a, d) + β Σ_{y′ ∈ Y} p(y′ | y, a, d) Q^l(Ξ_a, Ξ_d, y′).

Then the optimal action-dependent value function Q^{l*} is defined as follows:

Q^{l*}(y, a, d) = r(y, a, d) + β Σ_{y′ ∈ Y} p(y′ | y, a, d) Q^{l*}(Ξ_a*, Ξ_d*, y′).

They then implement the Q-function as follows:

Q_{t+1}(y, a, d) = (1 − α(t)) Q_t(y, a, d) + α(t) (r(t) + β Θ_t(y′)),

where
• α(t) ∈ R⁺ is the learning rate, which satisfies Σ_{t=1}^{∞} α(t) = ∞ and Σ_{t=1}^{∞} α²(t) < ∞


• Θ_t(y) is the state value function, given in [Littman-1994] as

Θ_t(y) = max_π min_d Σ_a π(y, a) Q_t(y, a, d),

where π(y, a) denotes the probability for the attacker to take action a given y and is updated based on [Littman-2001] as

π(y, ·) := arg max_π { min_d Σ_a π(y, a) Q_t(y, a, d) }.

B.4.3.3 Formulation of the Physical System Control Dynamics

Next, the formulation of the controlled system in the presence of uncertainties induced by the cyber system to obtain an optimal controller is based on [Xu-2012], the details of which are omitted here.
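Before moving on, the minimax-Q update of Appendix B.4.3.2 can be sketched as follows. Note that [Littman-1994] computes the state value with a mixed attacker strategy obtained by linear programming; purely for illustration, the sketch below approximates Θ(y) with a pure-strategy max-min, and all Q-values in the usage example are hypothetical:

```python
# Minimal minimax-Q sketch over a tabular Q indexed by (state, attack, defense).

def theta(Q, y, attacks, defenses):
    """Theta(y) ~= max_a min_d Q[(y, a, d)] (pure-strategy approximation)."""
    return max(min(Q[(y, a, d)] for d in defenses) for a in attacks)

def minimax_q_update(Q, y, a, d, reward, y_next, attacks, defenses,
                     alpha=0.1, beta=0.9):
    """Q(y,a,d) <- (1 - alpha) * Q(y,a,d) + alpha * (r + beta * Theta(y'))."""
    target = reward + beta * theta(Q, y_next, attacks, defenses)
    Q[(y, a, d)] = (1 - alpha) * Q[(y, a, d)] + alpha * target
```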

The expected values of the delays and packet losses needed to achieve physical system stability (and which the cyber system should therefore maintain) are determined as follows:

min{Ψ_1, Ψ_2} > 1 − [1 − min{Ψ̄_1, Ψ̄_2}] / λ(M),

where
• Ψ_1 is a function of the packet delay τ
• Ψ_2 is a function of the packet losses
• δ_{k−τ} = 1 if the control packet u_{k−τ} was received during [kT, (k + 1)T), and δ_{k−τ} = 0 if it was lost during that interval
• τ is the delay in discrete time
• λ(M) denotes the eigenvalue of the matrix M
• A and B are the system matrices defined earlier in the physical system modeling
The remaining quantities, including the parameters of the discretized physical system and the integrals of the matrix exponential that define Ψ_1 and Ψ_2, are given in [Niu-2015] and omitted here.

B.4.4 Assumptions

Some of the assumptions made by the authors are:

• The satisfactory range of continuous cyber system parameters can be partitioned into a finite number of manageable categories.
• For each attack, there exists a corresponding mitigating solution.
• There is a need to prove the existence of bounds on the cyber system parameters for physical system control stability.


B.4.5 Summary

The authors are able to show that their framework, in which a cyber system controls a physical system, works on a small-scale UAV helicopter with a remote controller. Q-learning is used in both the cyber and the physical systems, to manage cyberattacks in the cyber system and to maintain stability in the physical system. An optimal strategy based on Q-learning is first derived for the cyber system and then used to control the physical system. Here, the authors deal with cyber-physical simulation. Their problem definition starts with the following assumptions:
• that there exists a causal relationship between network transmission parameters and the degree of cyberattack
• that there is a mechanism to mitigate cyberattacks, specifically Smurf 70 and slow read 71 attacks

70 For more information, go to https://usa.kaspersky.com/resource-center/definitions/smurf-attack. 71 For more information, go to https://www.cloudflare.com/learning/ddos/ddos-low-and-slow-attack/.


Appendix C: Problem Abstraction in Cyber Simulation with Deception Strategies

In this section, we describe the problem abstraction issues in [Durkota-2016]. In Table 20, we summarize the problem considered, the number of players involved, and the technique used, and in Table 21, the advantages and disadvantages of the proposed solution in [Durkota-2016].

In the following subsections, we describe the problem, results, model, technique, and assumptions of [Durkota-2016] and then provide a summary.

Table 20 Deception Strategies in Selected Literature

Literature: [Durkota-2016]
Problem Considered: Optimize the placement of honeypots
# of Players: 2 players
Deception Techniques: Exploit the attacker’s belief in light of its limited network information

Table 21 Pros and Cons of Selected Deception Strategies

[Durkota-2016] Pros:
• Based on real network configurations
• Based on the attack graph of a network to represent the attack plan library
• Converts the attack graph to a Markov Decision Process (MDP) with pruning to further reduce computational complexity
• Includes the costs of exploits, of allocating honeypots of different types, and of being detected

[Durkota-2016] Cons:
• Assumes a worst-case, well-informed attacker, having knowledge of the number and type of honeypots though not which ones
• Does not consider the attacker’s attempts to detect the honeypots
• Does not reallocate honeypots automatically when a network is modified
• Does not consider insider attacks
• Provides examples but limited guidance on setting the values of cost, probability of network configurations, and penalty

C.1. [Durkota-2016] Deceiving with Honeypots

C.1.1 Problem Definition

In [Durkota-2016], the authors study how to optimally place honeypots in a network by analyzing the attacker’s belief about the network configuration. With limited information, the attacker may view a specific network as appearing to be one of its many variants. Using game theory, the


authors model deception and information uncertainty to predict and exploit opponents’ belief in optimally allocating honeypots.

C.1.2 Results

The proposed solution uses an automatically constructed attack graph as input. The initial validation includes human decision makers. The authors found

• “… humans were effective at defending against human attackers, but fared poorly against worst-case opponents.” • “The game-theoretic strategies are robust, but there are opportunities to further exploit weaknesses in human opponents.”

C.1.3 Problem Modeling

Here, we briefly describe the players’ attributes and how the optimal strategy is computed.

C.1.3.1 Player Definition

The two players are as follows:

• the administrator (defender), who chooses a honeypot allocation that minimizes cost and the expected loss caused by an attacker compromising the network, and
• the attacker, who chooses a strategy that maximizes gain and minimizes cost in attacking the network while avoiding honeypots.

C.1.3.2 Computing Optimal Hardening Strategy [Durkota-2015a]

Here we describe the general steps to obtain the optimal allocation strategy. We assume that

• Exploit actions have costs. • Defensive actions have costs. • The attacker can be detected and penalized. • Different types of honeypots have different costs.

In a network with a set of different host-types, two hosts are considered of the same type if they run the same services, have the same network connectivity (same network segment), and the same value (same asset value). All hosts of the same type present the same attack opportunities.
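This host-type equivalence can be sketched as a simple grouping by (services, network segment, asset value); the host attributes in the example are made up:

```python
from collections import defaultdict

def host_types(hosts):
    """Group hosts into host-types.

    hosts: dict mapping host name -> (frozenset_of_services, segment, value).
    Hosts sharing all three attributes present the same attack opportunities.
    """
    groups = defaultdict(list)
    for name, key in hosts.items():
        groups[key].append(name)
    return groups
```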

The real defender network appears to the attacker as one of many variants due to the attacker’s lack of information about the defender’s network. Analyzing each of these variant networks with different allocations of honeypots, the defender chooses the one that optimizes its allocation strategy against an attacker who maximizes its attack strategy.

Basically, the strategy profile (δ_d, π_a) for the defender and the attacker in Stackelberg equilibrium is as follows:


(δ_d*, π_a*) = arg max_{δ_d ∈ Σ_d} u_d(δ_d, BR_a(δ_d)); π_a* = BR_a(δ_d*),

where
• π_p is a pure strategy for player p ∈ {d, a}
• Π_p is the set of all pure strategies for player p
• δ_p ∈ Σ_p, a mixed strategy for player p ∈ {d, a}, is a probability distribution over the pure strategies Π_p
• Σ_p is the set of all mixed strategies for player p
• u_d(δ_d, π_a) denotes the utility for the defender when following strategy δ_d while the attacker follows strategy π_a
• BR_a(δ_d) ∈ Π_a is the best-response pure strategy for the attacker against the defender strategy δ_d

C.1.4 Deception Strategy

There is no specific learning technique used here. However, game theory is used to analyze how a defender could optimally allocate honeypots against an attacker’s strategy developed with limited network knowledge. Deception is possible and successful if the following are true:
• The defender has more information about the game state than the attacker.
• The attacker has to form beliefs, which may not be completely true, prior to the attack.
• The attacker’s beliefs about the likelihood of the network state affect its choice of moves.
• The defender can exploit the attacker’s uncertainty about the network to compute optimal honeypot allocations for the network.
• The uncertainty appears in the form of the large number of network configurations that look the same to the attacker.
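The Stackelberg selection in Appendix C.1.3.2 can be sketched by enumeration. For illustration only, the sketch restricts the defender to pure strategies (the paper optimizes over mixed strategies), and the honeypot placements and utilities below are made-up examples:

```python
# Leader-follower (Stackelberg) reasoning over enumerated pure strategies.

def best_response(u_attacker, d):
    """Attacker's best pure response BR_a(d) to defender strategy d."""
    return max(u_attacker[d], key=u_attacker[d].get)

def stackelberg(u_defender, u_attacker, defender_strategies):
    """Pick the defender strategy whose attacker best response is least harmful."""
    best = None
    for d in defender_strategies:
        a = best_response(u_attacker, d)   # follower reacts optimally
        value = u_defender[d][a]
        if best is None or value > best[2]:
            best = (d, a, value)
    return best

# Hypothetical 2x2 utilities: placements 'hp1'/'hp2', attack plans 'x'/'y'.
u_d = {"hp1": {"x": -5, "y": -1}, "hp2": {"x": -2, "y": -4}}
u_a = {"hp1": {"x": 5, "y": 1}, "hp2": {"x": 2, "y": 4}}
```

Here the defender prefers "hp2" even though "hp1" has a better best case, because the attacker’s rational response to "hp1" is more damaging.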

C.1.5 Assumptions

In coming up with the solution, the authors make the following assumptions:
• The attack graph is always correct, which may not always be true, as described in [Sommestad-2015a].
• Different network configurations will require different honeypot allocation strategies.
• The attacker’s interaction with a honeypot is assumed to be easily detectable, and thus immediately alerted and stopped, but not its interaction with a non-honeypot device. The authors do not explain why interaction with non-honeypot devices cannot easily be detected.
• The attacker knows the number of deployed honeypots and their types, but not which hosts they are.
• The defender can find the best strategy against each strategy that the attacker formulates.

C.1.6 Summary

In [Durkota-2016], the authors study how to optimally place honeypots in a network by analyzing the attacker’s belief about the network configuration. Though the proposed solution looks promising, the authors suggest that “there are opportunities to further exploit weaknesses in human opponents.” The authors base their investigation on realistic network configurations, as they use attack graphs as their starting point. However, their solution is only applicable to the allocation of honeypots.


Appendix D: Problem Abstraction in Cyber Simulation for Threat Assessment

In this section, we describe the problem abstraction issues in [Moskal-2018]. In Table 22, we summarize the problem considered, the number of players involved, and the technique used, and in Table 23, the advantages and disadvantages of the proposed solution in [Moskal-2018].

In the following subsections, we describe the problem, results, model, technique, and assumptions of [Moskal-2018] and then provide a summary.

Table 22 Assessment Strategies in Selected Literature

Literature: [Moskal-2018]
Problem Considered: Assess cyber threats by simulating attack scenarios with adversary and network models
# of Players: 2 players
Assessment Techniques: Simulation of attack scenarios with user-specifiable adversarial behavior and network models

Table 23 Pros and Cons of Selected Assessment Strategies

[Moskal-2018] Pros:
• Pre-emptive solution
• Models different attacker behaviors
• Models the network configuration
• Models attacker knowledge of the network uncovered during the attack stages
• Uses the concept of the kill chain in modeling attacker behavior
• Various attack scenarios based on different attacker behaviors and network configurations can be assessed
• Can generate synthetic cyberattack data for research purposes
• Can analyze the impact of vulnerabilities or exposures

[Moskal-2018] Cons:
• Need to specify the attacker’s preference parameters for target node, source node, and exposure for each attack action
• Accurately modeling the attacker behavior observed in real cyberattacks may not be easy
• No active defense strategy considered


D.1. [Moskal-2018] Cyber Threat Assessment

D.1.1 Problem Definition

The authors in [Moskal-2018] “present a method to measure the overall network security against cyber attackers under different scenarios.” Each attack scenario incorporates the attacker’s behavior and knowledge of the network, which can be configured with different kinds of vulnerabilities or exposures. See [Moskal-2016] for a comprehensive treatment “of tools and information used by cyber attackers.”

D.1.2 Results

Performing simulations of four attacker types (expert, amateur, comprehensive, and random) on two network configurations (with and without a misconfiguration), they found that “different attacker behaviors may lead to different ways to penetrate a network, and how a single misconfiguration may impact network security.”

More specifically, some of the results they found are the following:

• On a non-misconfigured network,
o “Amateur” and “Expert” achieve the same goal within a similar time frame, using different strategies.
o “Amateur” uses simple exploits and actions that are more detectable.
o “Expert” is more methodical and performs the minimum required actions to achieve its objective.
o “Comprehensive” takes the longest time to ultimately achieve its goal.
• On a misconfigured network, such as one having a backdoor,
o “Expert” achieves its goal in about 26% less time.
o “Random” begins to be as successful as “Amateur” on average.

D.1.3 Problem Modeling

D.1.3.1 Network Model

The network model is represented by a set of network nodes N and a set of vulnerabilities or exposures E. The children set of a node n ∈ N is denoted by C_n, and the set of nodes that could serve as potential attack source nodes is denoted by S = {n | C_n ≠ ∅, ∀n ∈ N}. Basically, any node that has children can be an attack source node.

D.1.3.2 Attacker Behavior Model (ABM)

An attack action is a triple α = (e, t, s), where e ∈ E, t ∈ N, and s ∈ N. In other words, taking action α, we exploit exposure e at node t from node s. The attack action set is A = {α = (e, t, s) | s ∈ P_t and e ∈ E_t, ∀t ∈ N}, where P_t is the set of parents of node t and E_t is the set of exposures or vulnerabilities of node t.

General Dynamics Mission Systems–Canada “Use, duplication or disclosure of this document or any of the information contained herein is subject to the restriction on the title page of this document.”

173 28 March 2019

At each step in a simulated attack scenario, the attacker follows a decision-making process based on its attack behavior model, consisting of capability, opportunity, and intent, together with its accumulated knowledge, before deciding on its preferred attack action. The attacker’s behavior and knowledge are described next, together with how an attack action is selected.

• The attacker’s intent is a set of attack-specific exposures, E_int.
• The attacker’s opportunity is represented by the minimum viable kill chain consisting of the following stages, {Reconnaissance, Breach, Exfiltration}, each of which maps onto the action set A. “The selection process of the kill chain stages employs fuzzy logic to capture a balance between a rule-based behavior model and a probabilistic model. The parameters used by the fuzzy logic depend on the attacker’s accumulated knowledge of the network and the outcomes of past actions.”
• The attacker’s capability is the set of exposures, E_cap, the attacker is capable of exploiting. E_cap ∩ E(t) is then the set of exposures the attacker can exploit at node t, and E_int ⊆ E_cap; that is, the attacker may not intend to use all of its capability.
• The attacker’s knowledge, K, of the network is a set of variables describing the virtual terrain (VT 72) that includes known sources, targets, open ports, services, vulnerabilities, etc.
• The probability P(a) of an attack action a ∈ A being selected is based on the attacker’s preference parameters specified for the target, source, and exposure within an attack action a. The details of the parameter specification are omitted here but are given in [Moskal-2018]. These parameters are used to define the following probabilities:
  o P(t) is the probability of choosing target node t
  o P(s|t) is the probability the source node is s given the target node is t
  o P(e|t) is the probability the exposure is e given the target node is t
Given the above probabilities, P(a) = P(t) * P(s|t) * P(e|t), ∀a ∈ A.
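The factored selection probability can be illustrated with a small sketch. The preference tables passed in below are invented placeholders; in [Moskal-2018] they are derived from the attacker’s preference parameters.

```python
# Illustrative computation of P(a) = P(t) * P(s|t) * P(e|t) and
# weighted sampling of an attack action. The probability tables are
# hypothetical inputs, not values from [Moskal-2018].
import random

def action_probability(a, p_target, p_source_given_t, p_exposure_given_t):
    s, t, e = a
    return (p_target.get(t, 0.0)
            * p_source_given_t.get(t, {}).get(s, 0.0)
            * p_exposure_given_t.get(t, {}).get(e, 0.0))

def sample_action(actions, p_target, p_source_given_t, p_exposure_given_t,
                  rng=random):
    """Draw one action a from A with probability proportional to P(a)."""
    weights = [action_probability(a, p_target, p_source_given_t,
                                  p_exposure_given_t) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Missing table entries default to zero, so actions the attacker has no preference for are never sampled.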

D.1.4 Assessment Strategy
Attack scenarios with an adversary model and a network model are simulated to assess cyber threat. After the adversary and the network are initialized, an attack action a ∈ A is selected at each simulation step at random based on the generated probabilities P(a). This attack action is used by the simulator to apply its impacts to the network and to update the ABM’s knowledge depending on the result of the attack action.

“This ABM process is repeated until one of three conditions is met: 1. the intent module reports the intent has been satisfied; 2. the ABM reports the attacker has chosen to stop the attack based on some user-defined condition; or 3. the simulation core logic times out the attack (fail safe).”
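The repeated ABM step and the three stopping conditions quoted above can be sketched as follows. All of the callables here (select_action, apply_action, intent_satisfied, attacker_stops) are hypothetical stand-ins for the simulator’s modules.

```python
# Minimal sketch of the ABM simulation loop with its three
# termination conditions; all callables are hypothetical stand-ins.
def run_attack(select_action, apply_action, intent_satisfied, attacker_stops,
               max_steps=1000):
    history = []
    for step in range(max_steps):          # condition 3: fail-safe timeout
        if intent_satisfied(history):      # condition 1: intent satisfied
            return "intent_satisfied", history
        if attacker_stops(history):        # condition 2: user-defined stop
            return "attacker_stopped", history
        action = select_action(history)    # e.g., sampled from P(a)
        apply_action(action)               # apply impacts; update knowledge
        history.append(action)
    return "timed_out", history
```

Returning a labelled outcome makes it easy to tally, across many simulated scenarios, how often each termination condition fired.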

72 Virtual Terrain (VT) is a network model used in Cyber Attack Simulator and consists of hosts, routers, and users. Virtual Terrain v2 (VT.2) is a network model based on VT and used in MASS (Multistage Attack Scenario Simulator). It represents all network members generically as nodes. [Wheeler-2014]


D.1.5 Assumptions
Some of the assumptions the authors made are as follows:
• The preference parameters for target nodes, source nodes, and exposures within an attack action a can easily be specified.
• The minimum set of kill chain stages {Reconnaissance, Breach, Exfiltration} is sufficient to simulate various attack scenarios.

D.1.6 Summary
Using attack scenarios that incorporate the attacker’s intent, opportunity, capability, knowledge, and preferences, together with networks of interconnected nodes carrying vulnerabilities or exposures, the authors use simulation to assess how different attack scenarios may transpire. The authors provide a rich cyber simulation environment for assessing various attack scenarios. However, they do not consider any active defence strategy.


DOCUMENT CONTROL DATA
*Security markings for the title, authors, abstract and keywords must be entered when the document is sensitive

1. ORIGINATOR (Name and address of the organization preparing the document. A DRDC Centre sponsoring a contractor's report, or tasking agency, is entered in Section 8.)

Delfin Montano
Solana Networks Inc.
via General Dynamics Mission Systems–Canada
1941 Robertson Road
Ottawa, Ontario K2H 5B7

2a. SECURITY MARKING (Overall security marking of the document including special supplemental markings if applicable.)

CAN UNCLASSIFIED

2b. CONTROLLED GOODS

NON-CONTROLLED GOODS
DMC A

3. TITLE (The document title and sub-title as indicated on the title page.)

Analysis of Autonomous Cyber Operation Simulation Environments

4. AUTHORS (Last name, followed by initials – ranks, titles, etc., not to be used)

Montano, D.

5. DATE OF PUBLICATION (Month and year of publication of document.)

September 2019

6a. NO. OF PAGES (Total pages, including Annexes, excluding DCD, covering and verso pages.)

189

6b. NO. OF REFS (Total references cited.)

216

7. DOCUMENT CATEGORY (e.g., Scientific Report, Contract Report, Scientific Letter.)

Contract Report

8. SPONSORING CENTRE (The name and address of the department project office or laboratory sponsoring the research and development.)

DRDC – Ottawa Research Centre Defence Research and Development Canada, Shirley's Bay 3701 Carling Avenue Ottawa, Ontario K1A 0Z4 Canada

9a. PROJECT OR GRANT NO. (If appropriate, the applicable research and development project or grant number under which the document was written. Please specify whether project or grant.)

05ac - Cyber Decision Making and Response (CDMR)

9b. CONTRACT NO. (If appropriate, the applicable number under which the document was written.)

W7714-115274/001/SV

10a. DRDC PUBLICATION NUMBER (The official document number by which the document is identified by the originating activity. This number must be unique to this document.)

DRDC-RDDC-2019-C257

10b. OTHER DOCUMENT NO(s). (Any other numbers which may be assigned this document either by the originator or by the sponsor.)

CAD00010262

11a. FUTURE DISTRIBUTION WITHIN CANADA (Approval for further dissemination of the document. Security classification must also be considered.)

Public release

11b. FUTURE DISTRIBUTION OUTSIDE CANADA (Approval for further dissemination of the document. Security classification must also be considered.)

12. KEYWORDS, DESCRIPTORS or IDENTIFIERS (Use semi-colon as a delimiter.)

Cyber; Cyber Defence; Cyber Operations; Agent-Based Simulations

13. ABSTRACT/RÉSUMÉ (When available in the document, the French version of the abstract must be included here.)

Autonomous Cyber Operations is an important area of research and so is having a simulation and modeling tool that can optimize, assess, and enable exploration of strategic autonomic algorithms, especially against adversarial artificial intelligent agents in the network supporting mission critical operations. To assist in informing decisions about procuring or developing cyber simulation environments for the purpose of automated cyber operation experimentation, we investigate existing literature for cyber simulation solution theories, methods, tools, and results, and categorize them according to their simulation objectives by analyzing their key characteristics, including problem abstraction issues. Based on the investigation results, we formulate an attributes enumeration template for analyzing simulation environments in generic terms, and illustrate the application of the attribute template to selected solutions. Next we tailor the attributes enumeration template for analyzing solutions that deal with assessing security posture with intelligent actors, both benign and adversarial, and illustrate the application of the goal-specific attributes enumeration template to selected solutions. Finally we discuss sample building block solutions and propose a simple buy or build decision making process for obtaining cyber simulation environments.

L’émergence des opérations autonomes en ligne en tant que domaine de recherche important a entraîné le besoin d’outils de simulation et de modélisation qui peuvent optimiser, évaluer et permettre l’exploration d’algorithmes stratégiques autonomes. Ces algorithmes permettraient aux systèmes de protéger les opérations essentielles à la mission contre les agents intelligents artificiels adverses déployés sur le réseau. Pour éclairer les décisions concernant l’acquisition ou le développement d’environnements de cybersimulation à des fins d’expérimentation de cyberopérations automatisées, nous examinons la documentation existante sur les théories, méthodes, outils et résultats des solutions de cybersimulation. Nous classons les travaux existants en fonction de leurs objectifs de simulation en analysant leurs caractéristiques clés, y compris les problèmes d’abstraction. À partir des résultats de l’enquête, nous formulons un modèle d’énumération des attributs pour analyser les environnements de simulation en termes génériques et illustrons l’application du modèle d’attributs à des solutions choisies. Ensuite, nous adaptons le modèle d’énumération des attributs pour analyser les solutions qui traitent de l’évaluation de la posture de sécurité pour les scénarios impliquant des acteurs intelligents, tant bénins que conflictuels. Nous illustrons l’application du gabarit d’énumération des attributs propres aux objectifs à des solutions choisies. Enfin, nous discutons d’exemples de solutions de base et proposons un processus de prise de décision simple d’achat ou de construction pour l’obtention d’environnements de cybersimulation.