A Knowledge-Based Approach to Network Security: Applying Cyc in the Domain of Network Risk Assessment

A Knowledge-Based Approach to Network Security: Applying Cyc in the Domain of Network Risk Assessment Blake Shepard, Cynthia Matuszek, C. Bruce Fraser, William Wechtenhiser, David Crabbe, Zelal Güngördü, John Jantos, Todd Hughes, Larry Lefkowitz, Michael Witbrock, Doug Lenat, Erik Larson Cycorp, Inc., 3721 Executive Center Drive, Suite 100, Austin, TX 78731 {blake, cynthia, dcrabbe, zelal, jantos, larry, witbrock, lenat}@cyc.com [email protected], [email protected], [email protected], [email protected] Abstract about what computers are on the network, what programs are installed or running on those computers, what privileges CycSecureTM is a network risk assessment and network monitoring the running programs have, what users are logged into the application that relies on knowledge-based artificial intelligence computers, how the parameters of critical operating system technologies to improve on traditional network vulnerability as- files are set, etc. The formal model is stored directly in the sessment. CycSecure integrates public reports of software faults KB, where it is accessible for inference and planning. This from online databases, data gathered automatically from computers on a network and hand-ontologized information about computers formal representation also allows users to interact directly and computer networks. This information is stored in the Cyc® with the model of the network using ontology editing tools, knowledge base (KB) and reasoned about by the Cyc inference allowing testing of proposed changes or corrections to the engine and planner to provide detailed analyses of the security (and network before implementation. vulnerability) of networks. CycSecure Server CycSecure Outputs Sentinels 1 Introduction Dynamic HTML User Interfaces Natural language (NL) descriptions of Network Model Network Attack Query network attack plans Statistics Plan In maintaining secure computer networks, system adminis- Editor Viewer Tool Tool Analyzer trators face an increasingly time-consuming task. Much of NL descriptions of network remedy the difficulty derives from the burden of information man- steps agement: the amount of information required is enormous, Cyc Inference Engine & HTN Planner much of it changes rapidly, and the relevant information can Detailed, editable and web-browsable be difficult to identify. Existing tools are difficult to keep network model Cyc’s Computing completely updated, and the format in which they provide Common Domain Network Sense NL answers to nearly Model Knowledge information can be unwieldy. In this paper we describe Cyc- Knowledge arbitrary NL queries Stored in In Cyc KB Secure , an emerging AI application in the domain of net- Cyc KB CERT about the model work security. This work endeavors to address some short- Bug- comings of existing security software by exploiting Traq strengths in the Cyc® technology [Lenat, D. B. and Guha, R. V. 1990], [Lenat, D. 1995]: a large knowledge base (KB), Figure 1. CycSecure Architecture natural language input and output modules, a well- The planner has access to knowledge about software developed inference engine and an integrated planner. faults that have been reported on standard security tracking CycSecure is a combination of information gathering sites (primarily Bugtraq and CERT1). These faults are tax- and AI technologies, integrated into the much larger and onomized according to a finely articulated ontology that more general Cyc system. CycSecure operates by scanning a includes knowledge about how to exploit each type of computer network to aggregate information that describes weakness. Users can direct the Cyc planner to assemble the network’s state. It then uses this information to build a specific network attack plans that capitalize on one or more formal representation or model of the network, based on weaknesses in the modeled network. Cyc’s inference engine Cyc’s pre-existing ontology of networking, security, and reasons about plan structures to identify critical nodes in computing concepts. The model incorporates information 1 Bugtraq is available at: http://www.securityfocus.com. It is a Copyright © 2005, American Association for Artificial Intelli- public forum for reporting security faults. Fixes and patches, as gence (www.aaai.org). All rights reserved. “Cyc” is registered well as exploits, are posted. The CERT website provides detailed trademark, and “CycSecure” is a trademark, of Cycorp, Inc. assessments of software faults: http://www.cert.org. attack plans to propose single-step remedies (e.g., recom- CycSecure server. The server in turn rests on a Cyc image, mending downloading software patches to fix critical faults, which includes the components described above, as well as or recommending specific alternative programs with the a set of interfaces that allow users to interact with network same functionality). models. The interfaces include an editor, a network statistics reporter, a viewer (for browsing information about different 2 Application Description parts of the network) and an interface to the planner. The CycSecure server is designed to run on a dedicated machine. 2.1 Cyc Technology Overview It handles all user interaction and KB interaction tasks. Each client machine on the network runs a daemon (or Cyc is composed of a large KB, an inference engine, a plan- Sentinel) that gathers information about the software, hard- ner and natural language components. ware and status of the machine it is running on. The server • The KB: common-sense knowledge encoded in the CycL polls the Sentinels, gathers network information from them, language. Approximately (as of March 2005) three mil- and then represents that information in the KB. lion assertions (facts and rules) interrelating over a quar- ter of a million concepts. 2.3 CycSecure Software Components • The inference engine: a resolution-based reasoning system, consisting of a general theorem-prover and a grow- CycSecure Sentinels and Server ing regiment of over eight hundred special-purpose mod- The Sentinels are small software daemons that run on each ules. Each module is an implementation of a pattern of machine on the target network. They are designed to con- reasoning for a common type of problem. sume very few system resources; such remote agents can be • The planner: an extension of SHOP [Nau et al. 1999], installed on even slow machines on a network without in- which is an efficient hierarchical task network planner. curring a serious performance cost [Humphries, J. W. and • NL parsing and generation components: a general lexi- Carver, C. A. Jr. 2000]. Each Sentinel is dormant until con, parsing, generation and discourse modules. NL gen- polled by the server. When polled, a Sentinel gathers infor- eration underpins all CycSecure interfaces. mation about the machine it is running on and relays it to The common-sense knowledge in the KB consists, in part, the server. Sentinels are extremely platform dependent, but of categorizations of objects and general truths about types they share a common XML protocol understood by the of objects. Having this pre-existing framework simplifies server; as a result, they can be written for multiple plat- forms, potentially including dedicated firewall hardware or encoding domain-specific knowledge, such as that required 2 for CycSecure. CycSecure’s knowledge focuses on topics smart hubs. The server translates the information returned such as computer programs and their functionalities, com- by the Sentinels into CycL and adds it to the KB. puter network vulnerabilities and network attack plan ac- This approach is potentially vulnerable to an attack on tions. The representations of these classes was facilitated by the Sentinels themselves, or to spoofing of their communi- the existing representations of more “common-sense” con- cation. The latter concern can be addressed by encrypting cepts: types of information-bearing things, such as prose the communication stream. However, since each Sentinel is text, emails, computer viruses, applications and operating sending unique information about the machine on which it systems; concepts of information flow and information is installed, the former concern is very real. Installing each transfer; and even the basic concept of a computer, as a Sentinel with a communications wrapper that contains a temporal, tangible thing. It was also useful to draw on the rolling key helps somewhat, but is still subject to some pre-existing representations of human agents as having mo- kinds of attacks and increases the complexity of upgrading tives, drives and behavior patterns, many of which are rele- the Sentinels over time. vant to the way networks are used and thus affect security. CycSecure’s architecture is not dependent on using Sen- Given the broad range of existing ontological distinc- tinels. It could also utilize third-party daemons that may tions in the KB, it was clear how to incorporate the requisite have already solved some of these problems. new classes of security domain knowledge. A team of just 3-4 trained knowledge enterers was able to add all relevant CycSecure User Interfaces knowledge to the Cyc KB in under 18 month’s time. This Users access the CycSecure server via several interfaces: the rapidity was facilitated by the ability to forgo spending time Attack Planner, the Model Editor, the Network Viewer, the stating relevant general facts, such as the fact that an elec- Query Tool and the Network Statistics tool. Each of these tronic device that is turned on will stay

A Knowledge-Based Approach to Network Security: Applying Cyc in the Domain of Network Risk Assessment

A Survey of Top-Level Ontologies to Inform the Ontological Choices for a Foundation Data Model

Wikipedia Knowledge Graph with Deepdive

Artificial Intelligence

Knowledge Graphs on the Web – an Overview Arxiv:2003.00719V3 [Cs

Using Linked Data for Semi-Automatic Guesstimation

Why Has AI Failed? and How Can It Succeed?

Expert Systems: an Introduction -46

Logic-Based Technologies for Intelligent Systems: State of the Art and Perspectives

Knowledge on the Web: Towards Robust and Scalable Harvesting of Entity-Relationship Facts

Racer - an Inference Engine for the Semantic Web

Talking to Computers in Natural Language

Wikitology Wikipedia As an Ontology