THE UNIVERSITY OF NEW SOUTH WALES

Incremental Knowledge Acquisition for Search Control Heuristics

Ghassan Beydoun, BE (University of NSW)

Department of Artificial Intelligence
School of Computer Science and Engineering
University of New South Wales
Sydney 2052, Australia

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering)
March 2000

Abstract

In recent years of expert systems development, it has become evident that knowledge provided by an expert is always context dependent. This realisation led to modern incremental knowledge acquisition methodologies which aim to incrementally reconstruct the expert's knowledge in the context of its use. This thesis presents such a framework for modelling expert search processes. This by-passes the difficult, costly and time-consuming task of developing effective domain specific heuristics. Adopting incremental knowledge acquisition is also justified because experts do not normally have complete introspective access to their search knowledge: they tend to give justifications of their search considerations rather than a complete explanation. We target the implicit representation of the less clearly definable quality criteria by allowing the expert to limit his/her expressed knowledge to explanations of the steps in the expert search process.

As the basis of our knowledge acquisition (KA) approach, we substantially extend the work done on Ripple Down Rules (RDR), which allows knowledge acquisition and maintenance without analysis by a knowledge engineer. Our extension allows the expert to introduce his/her own domain terms during the KA process; thus the expert provides a knowledge level model of his/her search process. We call this framework Nested Ripple Down Rules (NRDR). Our NRDR formalism also addresses some shortcomings of RDR which have been the concern of substantial research efforts in the past few years: repetition, lack of explicit modelling and poor readability. The way our approach deals with these shortcomings is evaluated against recent work.

Another significant contribution of this thesis is a formal framework for analysing the knowledge acquisition process with RDR in general and NRDR in particular. Using this framework, we show that maintaining an NRDR knowledge base (KB) requires similar effort to maintaining an RDR KB for most of the KB development cycle. We show that when an NRDR KB does show increased maintenance requirements compared with an RDR KB during its development, this added requirement can be handled automatically using past seen cases, without the need for further interactions with the expert. Hence, because the NRDR framework avoids repetition, the number of expert interactions required for an NRDR knowledge base to converge is smaller than that for an RDR knowledge base in the same domain.

Finally, we employ our approach to incrementally acquire expert chess knowledge for performing a highly pruned tree search. The experimental results in the chess domain are clear evidence for the practicality of our approach.

Acknowledgments

As in all long journeys, moments of uncertainty and anxiety arise. I am grateful to my supervisor Dr Achim Hoffmann for his mentoring and generous support in such moments. I also thank him for his technical support, his frequent insights and his continuous oversight. He made the journey towards this thesis not only possible, but also often stimulating and enjoyable. Without him, this thesis would not be.

I must also thank Professor Paul Compton, who generously gave me time for discussions and insights, and provided me with feedback on some of the material published during the journey towards this thesis. I also thank all the anonymous reviewers of my publications, whose extremely useful and stimulating feedback helped greatly in preparing this thesis. I would like to thank the following people for reviewing parts of this thesis during its development: Achim Hoffmann, Rodrigo Martinez-Bejar, Jane Brennan, Donia Beydoun, Ravi Venkatesan and Hendra Suryanto.

I would like to thank all members of the Artificial Intelligence Department for the time we spent together and for their interesting feedback on my research. In particular, I thank Mark Peters, Debbie Richards, Stefan Kaspar, Jane Brennan, Seung-Gwon Kim, Nasser Esmaili, Philip Preston, Hossein Shiraz, Joe Thurbon, Rex Kwok and Dongmo Zhang.

I would like to thank all my friends who unknowingly give me strength and help me put the whole endeavour into perspective. I thank my family members for their continuous love and belief in me. Finally, I thank God for all the above people and all other parameters of my earthly existence.

With love to my father and mother: he inspired me to seek knowledge, and she taught me to smile at all challenges.

Table of contents

Chapter 1. Introduction
1.1. Human searching and the field of Artificial Intelligence
1.2. Thesis contributions
1.3. Thesis structure and layout
1.4. Chapter summary

Chapter 2. Overview of the knowledge acquisition field
2.1. KA efforts in the 1970's
2.2. KA efforts in the 1980's
2.3. KA efforts in the 1990's
2.3.1. Domain independent knowledge analysis
2.3.2. Domain dependent knowledge modelling
2.4. The situated cognition view of knowledge
2.5. Chapter summary

Chapter 3. The state of the art of Ripple Down Rules
3.1. Incremental KA with RDR from a system's development perspective
3.2. RDR basics
3.3. From RDR to Multiple Classification RDR (MCRDR)
3.4. Nested Ripple Down Rules (NRDR)
3.5. NRDR and limitations of RDR and MCRDR
3.5.1. Lack of modularity
3.5.2. Repetition in (MC)RDR knowledge bases
3.5.3. Lack of explicit modelling
3.5.4. Concluding remarks on modelling and NRDR
3.6. Other recent RDR research
3.6.1. Fuzzy RDR
3.6.2. RDR theoretical work
3.7. Chapter summary and conclusion

Chapter 4. A formal framework of Ripple Down Rules
4.1. Formal framework for RDR structures
4.1.1. RDR semantics
4.1.2. The knowledge acquisition process
4.1.3. Structural definitions
4.2. Correctness of rules and development of RDR structures
4.3. Accuracy of RDR with respect to past seen cases
4.4. RDR as a non-monotonic reasoning approach
4.4.1. Default logic
4.4.2. RDR and default logic
4.4.3. Mapping RDR to a default theory
4.4.4. Mapping a default theory to an RDR tree
4.5. Chapter summary

Chapter 5. Acquiring search knowledge incrementally
5.1. Knowledge on search heuristics
5.2. Acquiring search knowledge
5.3. Requirements for a KA environment
5.4. Smart Searcher 1.3 (SmS)
5.4.1. System architecture
5.4.2. NRDR user interface
5.5. SmS reuse in search domains
5.6. Chapter summary and conclusion

Chapter 6. Nested Ripple Down Rules
6.1. Why Nested Ripple Down Rules?
6.2. Overview of Nested Ripple Down Rules
6.3. NRDR technicalities
6.4. NRDR development policies
6.5. NRDR and default logic
6.5.1. Mapping NRDR to default theories
6.5.2. Mapping default theories to NRDR
6.6. NRDR example: The eccentric film producer
6.7. A philosophical perspective on NRDR
6.7.1. Roots of confirmation holism in philosophy
6.7.2. NRDR and confirmation holism
6.7.3. Implicit assumptions in knowledge acquisition (KA) with NRDR
6.8. Chapter summary

Chapter 7. Formal analysis of NRDR
7.1. Probabilistic analysis of inconsistencies in NRDR
7.2. Statistical analysis of NRDR concepts
7.3. Knowledge base convergence and inconsistencies
7.4. Order dependence and maintenance of NRDR concepts
7.4.1. Order independence of RDR development
7.4.2. RDR and order dependence
7.4.3. Towards order independent RDR development
7.5. NRDR maintenance
7.5.1. Nested Ripple Down Rules and decision lists
7.5.2. Handling inconsistencies in NRDR
7.5.2.1. Automatic fix of inconsistencies
7.5.2.2. Reviewing the expert fixing the inconsistencies
7.6. Chapter summary and conclusion

Chapter 8. A case study: NRDR and SmS in chess
8.1. Tuning SmS to the domain of chess
8.1.1. Search Knowledge Interaction Language (SKIL) overview
8.1.2. Adapting SKIL to chess
8.2. Incremental acquisition of chess knowledge
8.2.1. Multi-point modification
8.2.2. Use of working memory during knowledge acquisition
8.3. Using knowledge to prune chess search
8.4. SmS in chess endgames
8.5. Contributions of chess adapted SmS
8.6. Chapter summary

Chapter 9. Discussion and critique
9.1. The Knowledge Level, SmS and NRDR
9.2. SmS's limitations
9.2.1. Extending the explanatory primitives incrementally
9.2.2. Can we extend the search operators incrementally?
9.2.3. Can we change the search state representation in SmS?
9.2.4. A final word on the generality of SmS
9.3. Chapter summary and conclusion

Chapter 10. Summary and conclusion
10.1. Thesis summary
10.2. Contributions
10.3. Future research

Appendix A. SmS's interface manual
A.1. High level knowledge base functionalities
A.2. Functionalities for high level browsing of an opened knowledge base
A.3. Chess functions
A.4. Analysing concepts
A.5. Testing a case
A.6. Testing for inconsistencies

Bibliography

List of figures

3.1. A single classification Ripple Down Rule tree
3.2. A Multiple Classification Ripple Down Rule knowledge base
3.3. A Nested Ripple Down Rule knowledge base
4.1. A typical representation of an RDR tree
4.2. A set theory view of the RDR tree shown in 4.1
4.3. Conversion steps from a prioritised default theory D to an RDR tree T
5.1. SmS1.3 architecture
5.2. SmS knowledge base overview
5.3. Graphical representation of a concept definition in SmS
5.4. Graphical view of the knowledge base in SmS
5.5. The domain specific interactions within SmS
6.1. A simple example of Nested Ripple Down Rules
6.2. Conceptual hierarchy interactions
6.3. Venn diagram representations of NRDR interactions
6.4. Conversion steps from a prioritised default theory D to an NRDR knowledge base K
6.5. An example of an NRDR knowledge base
7.1. Inconsistencies in NRDR
7.2. Domain shift
7.3. Theorem 2 proof
7.4. Coverage of newly added rules as knowledge base size increases
7.5. Knowledge base size and accuracy for KBs built for chess and medical diagnosis domains
7.6. Knowledge base size and accuracy for KBs built for the Tictactoe domain
7.7. Inconsistency frequencies versus knowledge base correctness
7.8. Exception rules and order of case presentation
7.9. Possible relations between hypothetical rules in different orders of presentation
7.10. Inconsistencies from a decision lists perspective
8.1. Chess position 1
8.2. Chess position 2
8.3. Chess position 3
8.4. Chess position 4
8.5. Chess position 5
8.6. Chess position 6
8.7. Chess position 7
8.8. Chess position 8
8.9. Chess position 9
8.10. Chess position 10

List of tables

6.1. Cases presented to the expert in the eccentric film producer example
8.1. Global and visual explanatory primitives for chess
8.2. Variable declaration and tactical explanatory primitives for chess
8.3. The first three entered rules in a chess KB
8.4. The pruning effect of two knowledge bases of different maturity

"If your mind is empty, it is always ready for anything. In the beginner's mind there are many possibilities, in the expert's mind there are few."

Zen master Suzuki-roshi (from The Wish-Fulfilling Jewel)

Chapter 1

Introduction

In its early days, the field of Artificial Intelligence (AI) had a strong emphasis on search methods. The perception was that clever search techniques would account for most intelligent activities. Early in the seventies, researchers realised that in many cases it is not search which warrants the solution but rather knowledge about tasks and problems (Hayes-Roth, Waterman et al. 1983). As a consequence, the era of expert systems and knowledge-based systems began.¹ Expert systems had a number of early successes, e.g. Mycin in the medical field (Shortliffe 1976), Dendral in biochemistry (Buchanan 1978) and R1 for hardware systems configuration (McDermott 1982). As the era of expert systems evolved, it became clear that capturing human knowledge - domain specific knowledge - into knowledge bases is not an easy task. This complex task is known in AI as knowledge acquisition. In its early days, the difficulty of capturing expert knowledge became known as the knowledge acquisition bottleneck (Charniak and McDermott 1985).

¹ Expert systems by definition address a particular domain. Arguably, this focus on individual domains is a lesson from the sixties, when the early enthusiasm for creating artificial intelligence (e.g. natural language understanding) was hampered by the realisation that the context of understanding is extremely varying and intractable.

Since the early 1970's, knowledge acquisition researchers have proposed a number of approaches for rectifying the 'knowledge acquisition bottleneck' in building knowledge based systems. The traditional approaches of the 1970's and the 1980's focus on the structure of the domain of expertise and the expert knowledge, and formalise them before knowledge is 'transferred' from the expert to a knowledge base in a system. In the 1990's, a modern approach (Richards and Compton 1998; Tecuci 1998) emerged, strongly influenced by the situated cognition view which was largely popularised by Winograd and Flores (Winograd and Flores 1987). This approach emphasises the dependency between knowledge and context (situation), and it espouses incremental construction - rather than transfer - of knowledge in the context of its use. In the next chapter, we will overview mainstream knowledge acquisition approaches as they evolved over the past three decades. We first overview the goal of this thesis.

More than forty years have passed since the birth of Artificial Intelligence (AI), and many features of intelligence cannot yet be described for machines to simulate. However, search remains an important component of problem solving in AI and, more generally, in computer science and engineering. In particular, in closed well-defined domains like combinatorial optimisation, decision analysis, game playing, design processes and planning, search continues to play a key role.

In this thesis, we tackle the problem of acquiring human search knowledge for computers in closed well-defined search domains. We subscribe to the "modern" view of knowledge acquisition in which expertise is reconstructed in a situated context (rather than transferred). The work in this thesis builds on and extends the work presented in (Beydoun and Hoffmann 1997). It also substantially extends the knowledge acquisition framework of Ripple Down Rules discussed in (Compton and Jansen 1990; Compton, Edwards et al. 1991; Compton, Edwards et al. 1992; Compton, Kang et al. 1993). This thesis presents and analyses a framework for incremental acquisition of expert search knowledge in the context of its use.

In the next section, we give a brief discussion of human searching and search methods in the field of Artificial Intelligence. This preludes an overview of the key contributions of this thesis.

1.1. Human searching and the field of Artificial Intelligence

It is well known that human experts are often surprisingly good at finding better solutions to a given optimisation problem than existing heuristic programs. For example, in the area of circuit design, many of the combinatorial problems involved in finding optimised designs are too hard to be solved optimally by automatic design programs. However, human engineers can often improve the designs generated automatically by a program with just moderate effort. In practice, human engineers intervene in the automatic design steps performed by programs in order to optimise the overall result. The efficient and effective solution of these problems - many of which are known to be NP-hard optimisation problems - has a major impact on today's economy and the environment, as well as on technological advances.

The successful simulation of human problem solving, in regard to optimisation, would be of primary importance for significant improvements of computer use in most areas of today's and tomorrow's computer applications. If the way skilled humans search for solutions can be simulated on computers, then computers will probably be able to perform the search process much faster than humans do. This is indeed the goal of the research presented in this thesis.

Many forward pruning search algorithms are inspired by human search methods; examples are B*, SSS* and Conspiracy number searching (Pearl and Korf 1987). These search algorithms use an evaluation function to predict the best search paths to be taken, and they are successful algorithms. However, the knowledge used in the evaluation function does not prune the search to the degree a human expert seems to be able to when zeroing in on critical search paths during his/her search. More importantly, this ability to zero in on critical features of these paths relies on a sort of global awareness (Dreyfus 1994) which takes into account the expert's current progress in the search and all his/her past experiences. Obviously, the knowledge behind such awareness cannot be expressed using an evaluation function which only takes into account the current search state. Thus far, the work done in developing forward pruning algorithms targets their heuristic adequacy, i.e. the effectiveness of the search, and leaves them epistemologically inadequate because the evaluation function used does not and cannot accommodate human search knowledge.
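To make the role of the evaluation function concrete, evaluation-driven forward pruning can be sketched as a depth-limited search that expands only the few most promising successors at each node. This is an illustrative sketch only (a single-agent maximising search, not B*, SSS* or any of the cited algorithms); the state interface and all names are our own assumptions, not from the thesis.

```python
def forward_pruned_search(state, successors, evaluate, depth, beam_width=3):
    """Depth-limited search that, at every node, expands only the
    beam_width successors ranked best by the static evaluation function.
    All other branches are forward pruned, i.e. discarded unexplored."""
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    # Rank successors by the static evaluation and keep only the best few.
    ranked = sorted(children, key=evaluate, reverse=True)[:beam_width]
    return max(forward_pruned_search(s, successors, evaluate, depth - 1, beam_width)
               for s in ranked)

# Toy example: states are integers, each state n < 8 has successors 2n and 2n+1,
# and the evaluation is the state value itself.
def successors(n):
    return [2 * n, 2 * n + 1] if n < 8 else []

print(forward_pruned_search(1, successors, lambda n: n, depth=3))  # 15
```

The pruning here depends entirely on what `evaluate` sees, which is exactly the epistemological limitation discussed above: the function judges a single static state, with no access to the searcher's progress or past experience.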

1.2. Thesis contributions

In this thesis, we present a framework to incrementally acquire expert search control knowledge. This search knowledge is used to forward prune a heuristic search. Our focus is the development of search heuristics through incremental knowledge acquisition. We believe that experts are usually able to explain their reasoning process on a particular problem instance in rather general terms that cover at least the given concrete next step in their search process. However, their explanation may be quite inaccurate, in the sense that for other search states their explanation would not deliver the search step they would actually take: either their explanation would not cover the step they would take, or it would suggest search steps they would actually not consider. Thus, we pursue an approach similar to (Hoffmann and Thakar 1991), which allows complex concept definitions to be acquired incrementally without demanding an operational definition from the expert. Rather, the expert is merely required to judge whether the concept applies to particular instances. This is a much more natural task for an expert than articulating general rules on how to judge any particular instance.

To capture and operationalise the expert's conceptualisation of his/her knowledge incrementally, we conceive a new incremental acquisition framework which we call Nested Ripple Down Rules (NRDR). This is a key contribution of this thesis. NRDR allows the expert to give his/her explanations using his/her own terms. These terms are operational while still incomplete. To represent every term in NRDR, we use Ripple Down Rules (Compton and Jansen 1990) as our rule-based concept representation. This allows experts to deal easily with exceptions, and to refine the definitions of their terms readily. With NRDR, experts can describe salient features of search states using their own terms.
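The exception handling that makes RDR suitable as a concept representation can be sketched in a few lines. This is a minimal illustration of standard single classification RDR inference, not code from SmS; the class, attribute and case names are hypothetical. Each rule has an "except" branch tried when the rule fires and an "if-not" branch tried when it fails, and the conclusion of the last satisfied rule on the ripple path is returned.

```python
class RDRNode:
    """One rule in a single classification Ripple Down Rule tree."""
    def __init__(self, condition, conclusion, if_true=None, if_false=None):
        self.condition = condition    # predicate: case -> bool
        self.conclusion = conclusion  # conclusion given when this rule fires
        self.if_true = if_true        # "except" branch: refines a firing rule
        self.if_false = if_false      # "if-not" branch: tried when rule fails

def classify(node, case, default=None):
    """Return the conclusion of the last satisfied rule on the ripple path."""
    while node is not None:
        if node.condition(case):
            default = node.conclusion  # tentatively accept, then seek exceptions
            node = node.if_true
        else:
            node = node.if_false
    return default

# Toy knowledge base: birds fly, except penguins.
kb = RDRNode(lambda c: c["is_bird"], "flies",
             if_true=RDRNode(lambda c: c["is_penguin"], "does not fly"))

print(classify(kb, {"is_bird": True, "is_penguin": False}))  # flies
print(classify(kb, {"is_bird": True, "is_penguin": True}))   # does not fly
```

When the expert disagrees with a conclusion, a new rule is attached at the point where the ripple path ended, so existing rules are never edited and past correct behaviour is preserved; NRDR additionally lets a rule condition refer to another concept that is itself defined by such a tree.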

In this thesis, our new Nested Ripple Down Rules incremental knowledge acquisition framework is applied to search problems. However, it is usable in any incremental knowledge acquisition application. Indeed, some domains require the key strength of NRDR: capturing the domain model during the knowledge acquisition process. For example, in building intelligent web browsers, it is often difficult to work out the classes of documents before actually doing a substantial amount of browsing and indexing. Further, the web is always changing in size and content, so it is not possible to have a fixed set of classes. Simultaneous modelling and knowledge acquisition, as allowed by NRDR and advocated in (Beydoun and Hoffmann 1998), is critical in such areas of application.

Our Nested RDR substantially extends the relatively recent incremental knowledge acquisition framework of RDR (Compton and Jansen 1990; Compton, Edwards et al. 1991; Compton, Edwards et al. 1992; Compton, Kang et al. 1993). This significant development of the RDR technology is intended to ease the incremental knowledge acquisition process by allowing the expert to use her/his own terms, and by operationalising these terms while they are incomplete. However, as we will discuss later, NRDR also helps alleviate shortcomings of incremental knowledge acquisition with Ripple Down Rules. Furthermore, NRDR offers a modular incremental knowledge acquisition framework. This greatly enhances the readability and reusability of a knowledge base, which is yet another significant contribution to the incremental knowledge acquisition community.

In search applications, we conjecture that a search pruned by an incrementally developed NRDR knowledge base will resemble the human search process. We expect that our approach will also yield heuristically adequate algorithms: The human expert is able to make use of critical static features in a given search state. Because these features persist over much of the search process, recognising them gives the expert a quick foresight without having to follow useless search paths that change the critical static features. Moreover, such knowledge is easily acquired as this is what experts are normally eager to express in their thought traces, see examples in (Groot 1965; Groot 1966; Newell and Simon 1972).

Our framework adapts the incremental knowledge acquisition process to match the expert's natural tendencies in giving her/his explanations, by introducing his/her intermediate terms. We believe this enables the expert to express his/her knowledge more easily, and to build an operational knowledge base more effectively. Our approach is strongly influenced by the notion of the knowledge level suggested by Newell (Newell 1982). This is the idea of specifying all the necessary knowledge a system needs at a level which, roughly speaking, corresponds to the level at which humans communicate their knowledge to their fellows.

We developed a workbench - SmS (Smart Searcher) - to capture and operationalise expert search knowledge. SmS consults the expert on search problems, and it takes advice on which search steps should be taken in which search state. The results that we obtained are very encouraging. They provide evidence that our approach is suitable as a general framework for building high-performance problem-specific search heuristics for combinatorial problems.

We do not claim that our workbench simulates the human search process, nor do we claim that we capture the human sub-cognitive mechanisms that zero in on promising search paths. Rather, our incremental knowledge acquisition approach is able to capture major parts of an expert's search knowledge which are normally inaccessible by introspective means. That is, besides the explicit knowledge of what abstractions apply to a given case (fact), the knowledge acquisition process as a whole also implicitly captures the expert's judgment (meta knowledge) of when these abstractions are of relevance and when they should apply. This meta knowledge is often termed non-factual meta knowledge (Hoffmann 1998). We believe our approach is a promising methodology for developing efficient and effective heuristics with little engineering effort.
As search problems usually require individually tailored heuristics, and these are expensive to develop, the cost benefits of our approach are considerable, and therefore the significance of this work is obvious.

1.3. Thesis structure and layout

In this chapter, we outlined two key contributions of this thesis. The first is a general contribution of a framework for using incremental knowledge acquisition to acquire search heuristic knowledge directly from experts (Beydoun and Hoffmann 1997; Beydoun and Hoffmann 1998). The second, more specific, contribution is our new Nested Ripple Down Rules (Beydoun and Hoffmann 1998) incremental knowledge acquisition framework, which we use to capture and represent expert search knowledge. However, this framework is usable outside its intended application of acquiring search heuristic knowledge. It is a significant contribution in that it widens the scope of incremental knowledge acquisition to applications where domain models may need to evolve during the knowledge acquisition process, e.g. intelligent web browsing. Further, it alleviates modelling difficulties in domains which are difficult to model, e.g. search and design. In such domains, it may be more economical to model during the actual knowledge acquisition process, as facilitated by our Nested RDR. On a technical note, the Nested Ripple Down Rules framework also addresses shortcomings of incremental knowledge acquisition with Ripple Down Rules. These have been extensively analysed and discussed in the literature, e.g. (Compton, Edwards et al. 1991; Richards, Chellen et al. 1996; Richards and Compton 1997; Beydoun and Hoffmann 1998; Suryanto, Richards et al. 1999).

To put the general contribution of this thesis in context, chapter 2 provides a quick review of the knowledge acquisition (KA) literature and discusses the importance of incremental knowledge acquisition in the KA field in general. To put our NRDR framework contribution in context, chapter 3 reviews the literature concerned with Ripple Down Rules applications, e.g. (Compton, Ramadan et al. 1998), their shortcomings, e.g. (Compton, Edwards et al. 1991; Compton, Edwards et al. 1992), and tools to enhance their applicability, e.g. (Richards and Compton 1997). This foreshadows the advantages of our Nested Ripple Down Rules framework over existing RDR based frameworks, and highlights its specific contributions to the incremental knowledge acquisition community, in particular the way in which it rectifies some RDR shortcomings.

Chapter 4 introduces our formal framework for dealing with Ripple Down Rule structures in general and NRDR in particular. This framework extends the work done in (Kivinen, Mannila et al. 1992; Kivinen, Mannila et al. 1993; Scheffer 1995; Scheffer 1996; Beydoun and Hoffmann 1998). It formalises the semantics of RDR. It also provides a framework for describing the structure of an RDR knowledge base and the efficiency of the actual knowledge acquisition process. Based on this framework, we explain empirical results in (Compton, Edwards et al. 1992; Compton, Preston et al. 1995). This theoretical framework is later used to analyse the development of Nested Ripple Down Rules, showing that the incremental development of a knowledge base in our workbench SmS can be carried out efficiently.

Chapter 5 presents a general analysis of search knowledge. This is followed by a presentation of a set of requirements for a knowledge acquisition environment to capture this knowledge. This provides the background for understanding the architecture of our workbench SmS, which is then described in detail. We also discuss the reusability of this workbench, and describe the steps involved in adapting SmS to different domains. This description is later used to give a rough characterisation of domains for which SmS is effective. The reusability of our workbench depends largely on whether or not a knowledge base can be developed incrementally in an economical way through direct interactions with an expert. Chapters 6 and 7 of this thesis argue to this effect.

Chapter 6 presents our knowledge acquisition framework of NRDR and compares it with the existing framework of Ripple Down Rules. Development policies of an NRDR knowledge base are introduced to ensure that this development remains economical. The use of Nested RDR is demonstrated by an example knowledge acquisition session. We also discuss its validity from an epistemological and philosophical perspective.

In chapter 7, based on the theoretical framework of chapter 4, we show that the cost of developing Nested Ripple Down Rules knowledge bases, in terms of the number of interactions with an expert, is at most similar to that of developing RDR knowledge bases. This development is also shown to converge. That is, we show that a Nested RDR knowledge base converges during the incremental knowledge acquisition process, and that this convergence is at least as fast as that of an RDR knowledge base. Moreover, this clearly shows that the advantages of Nested RDR over RDR do not impose any extra effort on an expert interacting with the knowledge base.

In chapter 8, our interaction language SKIL (Search Knowledge Interaction Language), which is used in SmS, is discussed. A case study of adapting SmS to acquire expert search knowledge in chess is presented. We describe the adaptation of SmS to this domain, including the use of NRDR to develop chess knowledge bases. This development is illustrated: it highlights features of incremental knowledge acquisition specific to search and NRDR, and it shows the effectiveness of our approach in building domain specific search heuristics. This is followed by a discussion of the case study in the chess domain.

In chapter 9, we further discuss the case study of chapter 8 in light of the knowledge level idea. We also present a general discussion of our approach and its limitations. This will provide a characterisation of domains for which our approach is most effective.

Finally, in chapter 10 we summarise the contributions of this thesis. We conclude by outlining possible directions for future development of the research presented in this thesis.

1.4. Chapter summary

This chapter highlighted the goal of this thesis, which is the incremental acquisition of human search heuristic knowledge. We also highlighted the contributions of this thesis towards this goal. Our general contribution is a methodology for building effective search heuristics more economically than existing engineering oriented approaches. A substantial aspect of this contribution is due to our new incremental knowledge acquisition framework of Nested Ripple Down Rules (NRDR). This allows direct interactions with an expert who is responsible for building and editing a knowledge base, much in the same way an author is responsible for writing and editing a book. NRDR facilitates the expert's interactions greatly because it allows the expert to use her/his own vocabulary during the knowledge acquisition process.

To highlight the significance of incremental knowledge acquisition in general and Ripple Down Rules in particular, we now turn our attention to a quick review of the knowledge acquisition literature. This will be followed by a review of the Ripple Down Rules literature. This will later highlight the significance of our knowledge representation formalism, Nested Ripple Down Rules (NRDR), specifically to the incremental knowledge acquisition community.

Chapter 2

Overview of the knowledge acquisition field

Expert systems and knowledge based systems are among the most successful and significant developments in artificial intelligence. They are powerful computer programs capable of representing and applying knowledge about a specific area of expertise. They replicate expensive and rare human expertise. A major impediment to their development has been the capture of this human expertise. The field of Knowledge Acquisition (KA) has emerged as a response to this impediment.

In this chapter, we overview the development of the knowledge acquisition field over the past three decades. Our focus in this thesis is on knowledge acquisition issues that involve human experts directly. Contributions of this thesis are not closely relevant to automated knowledge acquisition, often termed machine learning. Thus, for the sake of succinctness, we do not overview recent machine learning¹ advances. In this chapter, we also outline our reasons for choosing incremental knowledge acquisition with Ripple Down Rules as a starting point for our knowledge acquisition framework.

The major streams in knowledge acquisition date from the mid 70's to the late 90's. Taxonomies of knowledge acquisition approaches vary and are inconsistent, e.g. see (Boose 1991). For our quick overview, we find a chronological presentation of knowledge acquisition (KA) approaches more illustrative.

2.1. KA efforts in 1970's

The knowledge acquisition research efforts of the 1970's were based on human interviewing. In these early efforts, knowledge was elicited from experts by a knowledge engineer, a person knowledgeable of the programming environment. S/he would encode the expert's knowledge following a series of interviews. Examples of successful systems in this era are: DENDRAL, to find chemical structures in unknown compounds (Buchanan 1978); R1, to configure VAX computers (McDermott 1980); and MYCIN, in the medical domain (Shortliffe 1976).

This early period of building expert systems highlighted communication difficulties between knowledge engineers and experts. These difficulties were partly due to the nature of expertise. An expert's introspection does not give full access to expertise. It only allows the expert to express the factual part of his/her knowledge (Boose 1991; Hoffmann 1998). Moreover, the interviewing process itself was not effective. The following are the most commonly identified problems in the interviewing process between a knowledge engineer and an expert:

- Novice knowledge is often obtained instead of expert knowledge (Boose and Gaines 1989). This should not be surprising, as the knowledge engineer is not in a position to distinguish between the two. S/he is not supposed to be an expert in the domain for which a knowledge base is sought.

¹ In machine learning, expertise is captured by the machine without human expert intervention. This approach is data driven: a program attempts to build a knowledge base from a given set of examples. For recent machine learning advances, the interested reader can refer to (Hoffmann, 1998) for a good review. From this point onward, when we use the term "Knowledge Acquisition" we will implicitly mean that a human expert is involved.

- Analysing the results of the interview is labour intensive. Errors can be introduced during the analysis following the interview. Consequently, programmers (knowledge engineers) may incorrectly implement the expert's advice.

- Knowledge engineers were not necessarily trained in interviewing techniques. They were essentially computer programmers.

- Experts can neglect to express rules to cover all of the special cases that arise.

- Experts may be insecure and reluctant to give away their domain knowledge.

Later discoveries of undesirable behaviour by the expert system, due to any one (or a combination) of the above limitations, may motivate changes to the expert system. Unfortunately, these changes require programmer intervention and may take significant time and effort (Klahr and Waterman 1986). Hence, avoiding errors during the interview stage is very important to keep the task of developing an expert system economically feasible. Towards this, knowledge acquisition research in the 80's focussed on making the interviewing process more effective.

2.2. KA efforts in 1980's

The interviewing process of the 70's was largely unstructured. Researchers of the 80's saw this lack of structure as an important reason why knowledge acquisition had been perceived as a 'bottleneck' (Hayes-Roth, Waterman et al. 1983; Hoffman 1990). Consequently, in the 1980's, a lot of work was done to improve the effectiveness of the interviewing process. Techniques based on experimental psychological theories were developed to facilitate knowledge elicitation from the experts. The assumption was that the more the knowledge acquisition environment is adjusted to the expert's way of thinking and tendencies, the more effective it becomes in capturing the expert's knowledge. The personal construct theory of Kelly (Kelly 1970) and the assimilation theory of Ausubel (Ausubel, Novak et al. 1978) led to the development of a number of tools to acquire expert knowledge.

The personal construct theory is a formal model of the organisation of human cognitive processes. The basic units of analysis are called 'constructs'. These units are attributes whose different values can distinguish one sub-group of objects from another. Each expert possesses his/her own constructs with respect to his/her domain of expertise (Kelly 1970). Constructs are difficult for the expert to present systematically and explicitly. In systems using Kelly's personal construct theory, a knowledge engineer elicits these constructs (Shaw 1980). Examples of such systems are: Expertise Transfer System (ETS) (Boose 1984), KRITON (Linster 1988) and KITTEN (Shaw and Gaines 1988). Like the personal construct theory, the assimilation theory of Ausubel (Ausubel, Novak et al. 1978) attempts to construct a model of the representation underlying human cognitive processes. Examples of systems based on this theory are: NICOD (Ford 1987), ICONKAT (Hoffman 1990) and NANOKLAUS (Haas and Hendrix 1983). Structured query techniques to facilitate interactions between knowledge engineers and experts were also developed, e.g. (LaFrance 1988).

In pursuit of improving the effectiveness of the interviewing process, researchers in the 80's came to the important realisation that the difference between experts and novices is not only in the amount of what they know. Research showed that there are also qualitative differences between experts and novices (LaFrance 1990). Experts not only have more stored information, they also have it organised into more structurally and hierarchically meaningful patterns than novices. The psychological aspects of knowledge acquisition in the 80's emphasised this human element and accounted for those differences. This eventually translated into a realisation that improved expert system explanation, knowledge acquisition and maintenance depend primarily on abstracted descriptions of the content of knowledge bases and only secondarily on the actual representation used (e.g. using frames versus rules) (Clancey 1989). This view was explicitly articulated and popularised by the notion of the knowledge level introduced by Newell in (Newell 1982). This notion describes a new layer of abstraction in building computer systems: at this level, knowledge is expressed functionally, independently of the underlying symbol level implementation.

This idea of the knowledge level received considerable attention within the AI community, and it has been extensively discussed, e.g. (Clancey 1989; Newell 1993; Beydoun and Hoffmann 1998).

With this paradigm shift, the view that knowledge acquisition is a bottleneck changed in the 80's. Knowledge acquisition came to be seen as a modelling activity. Clancey explicitly attacks the bottleneck metaphor in (Clancey 1989) (page 39): "The "knowledge acquisition bottleneck" is a wrong and misleading metaphor, suggesting that the problem is to squeeze a large amount of already-formed concepts and relations through a narrow channel; the metaphor seriously misconstrues the theory formation process of computer modelling…". Knowledge acquisition research of the 90's is underpinned by the view that knowledge acquisition aims to model both the world and the problem solving, and that these two are intertwined.

2.3. KA efforts in 1990's

As confidence in expert systems grew, researchers wanted to use them for increasingly complex domains. In complicated domains, the interviewing and modelling tools of the 80's are not enough to resolve the intricate concepts and relations of the relevant domains. Researchers started pursuing software engineering approaches to knowledge acquisition. This research has two directions: a domain independent and a domain dependent one. We overview both streams briefly. We then discuss the view of situated cognition, in which the environment (situations) of usage of expert systems must play a key role in their design, and the impact of this view on modern knowledge acquisition approaches. From this perspective, we outline the context of the incremental Ripple Down Rules (RDR) approach within the knowledge acquisition community.

2.3.1. Domain independent knowledge analysis

In domain independent knowledge analysis there are two important streams: ontological analysis and the structured knowledge base methodology known as KADS (Knowledge Analysis and Documentation System).

2.3.1.1. Ontological analysis

In philosophy, 'ontology' relates to existence and the nature of being. In knowledge acquisition, an ontology is often defined as an explicit knowledge level specification of a conceptualisation (Heijst, Schreiber et al. 1997). Ontological analysis research aims to sort out the primitive knowledge entities and the relationships among them, e.g. (Oussalah and Messaadia 1999). Ontologies allow reusable knowledge based components to be obtained. For example, different problem solving methods (PSMs) can be used on the same ontology, e.g. (Gomez-Perez and Rojas 1999). When an ontology is defined, building a knowledge based system (KBS) may be as simple as choosing the effective PSM from an existing library (see section 2.4.2 for a discussion of PSMs). Conceptual graphs (Sowa 1984) are a popular way to represent domain ontologies. Associated graphical tools are often used to develop them, e.g. see (Shahar and Cheng 1998; Thanitsukkam and Finkelstein 1998).

2.3.1.2. KADS (Knowledge Analysis and Documentation System)

KADS is a structured methodology to develop knowledge based systems (expert systems). It aims to model systems at the knowledge level. It is now in practical use in Europe and elsewhere. The methodology arose as a response to the ineffective rapid prototyping approach of the early 80's. KADS is essentially a top down software engineering approach towards building a knowledge based system. It divides the process into layers of analysis which eventually yield an appropriate problem solving method and an effective domain model. There are five layers of analysis and development (Schreiber, Wielinga et al. 1993):

I. Knowledge Identification: At this stage everything that the expert tells about the domain is recorded in a linguistic form.

II. Knowledge Conceptualisation: Knowledge recorded during stage I is formalised in conceptual models. If more than one expert is used in stage I, then their knowledge is unified within a single conceptual framework.

III. Epistemological Analysis: Different concepts from stage II are given structural properties, e.g. types, relations and hierarchies.

IV. Logical Analysis: The structured knowledge of stage III is abstracted further to allow logical inference over it.

V. Implementational Analysis: A symbolic representation is sought at this stage.

The above five stages are a coarse description of the methodology. For example the knowledge conceptualisation stage is a complicated process which involves four layers of abstractions (Schreiber, Wielinga et al. 1993). For a detailed description of these stages (Wielinga, Schreiber et al. 1992) is a good source.

2.3.2. Domain dependent knowledge modelling

The domain dependent knowledge acquisition paradigm requires an explicit model for every domain to guide the expert in expressing his/her knowledge. Hence, for a general knowledge acquisition environment, this approach requires a library of models to be constructed before knowledge acquisition can take place. In order to obtain more models and to enable indexing and retrieval of these models, meta models are pursued in many research efforts. Since any given model is itself a kind of knowledge, classification of types of knowledge is often pursued to obtain these meta models. This approach has been pursued in CommonKADS (Schreiber, Wielinga et al. 1994) and in Generic Tasks (Chandrasekaran, Johnson et al. 1992). It has also been pursued by many researchers in the European knowledge acquisition community under the umbrella of Problem Solving Methods (PSM) research, e.g. (Benjamins 1995; Fensel 1997; Fensel, Benjamins et al. 1999).

Knowledge analysis and knowledge modelling do not replace each other; they are often complementary. Patel offers a way to connect the two (Patel 1991). Using our knowledge representation formalism of Nested Ripple Down Rules, to be outlined in the next chapter, we suggest that it is possible to do both simultaneously (Beydoun and Hoffmann 1998). Fensel (Fensel, Motta et al. 1997) provides a way of adapting problem solving methods for closely related domains using a brief knowledge analysis stage.

2.4. The situated cognition view of knowledge

In the previous sections, we provided a brief overview of the evolution of Knowledge Acquisition (KA) research. The largest conceptual leap from the 70's into the 80's was the view that KA should be seen as a modelling activity rather than a transfer of knowledge from the expert's mind to the computer system. This conceptual leap has made an impact on the direction of KA research in the late 80's and throughout the 90's. In our view, the next conceptual leap in knowledge acquisition is the "situated cognition" view of knowledge. This advocates that the model captured by the Knowledge Based System (KBS) should also incorporate the environment in which the system operates. In (Clancey 1989; Clancey 1993), William Clancey discusses the influence of the environment on expert decisions. He reinterprets the knowledge level idea to include the expert's interactions with his/her environment, that is, the environment where the knowledge based system is to be used. From this perspective he explicates three guidelines to reinterpret the knowledge level (Clancey 1989):

1. Knowledge engineering is modelling systems in the world (not how people think).

2. Knowledge level analysis is how observers describe and explain recurrent behaviours of a system embedded in a given environment.

3. The knowledge level interactions are a combination of the expert's points of view and his/her interactions with the physical world, together with the knowledge engineer (with his/her own interpretations) and the representation environment.

Compton also shares the above view; in (Compton and Jansen 1990) (p241) he states: "…The knowledge engineer finds that the expert's knowledge is not so much recalled, but to a greater or lesser degree 'made up' as the occasion demands".

The view that the environment (and history, for that matter) is intertwined with knowledge is extensively discussed in philosophy. Philosophers like Putnam (Putnam 1988) adopt this view as the basis for attacking mentalism (the view that knowledge resides in the mind). Quine (Quine 1951) expresses a similar view in attacking logical positivism (the view that all knowledge can be expressed in terms of atomic facts that can be verified empirically). Only fairly recently has this view, that the environment in which knowledge is being used is intertwined with knowledge, explicitly influenced the development of actual knowledge based systems. In some developments the adoption of this situated cognition view is implicit within the paradigm used. Examples of this are approaches which come under the umbrella of apprentice learning, where the knowledge base is seen as an apprentice learning from an expert interacting with a given environment, e.g. (Donoho and Wilkins 1994; Tecuci 1998). In other developments, the adoption of the situated cognition view is made explicit and drives the actual update methodology of the knowledge base, e.g. (Compton and Jansen 1990; Compton, Edwards et al. 1991; Richards and Compton 1998).

Ripple Down Rules is an incremental knowledge acquisition methodology motivated explicitly by the situated cognition view of knowledge (Richards and Compton 1998). It is the starting point for our new knowledge acquisition formalism of Nested Ripple Down Rules (NRDR). In the next section, we give an overview of the motivations behind Ripple Down Rules and discuss our reasons for using them as a starting point for developing our knowledge acquisition formalism of NRDR which we employ to capture human search knowledge interactively.

Ripple Down Rules

The Ripple Down Rules (RDR) approach is a recent development which focuses on easing the use of the expert system shell to a degree where a knowledge engineer becomes redundant. The RDR framework originates in the medical system GARVAN-ES1 (Compton, Horn et al. 1989). From this work, it was evident that experts provided justifications for their judgments rather than actual explanations. These justifications were strongly influenced by the context in which the experts provided them (Compton and Jansen 1990). Experts can justify a situated decision, but they cannot explicate all the underlying assumptions that led them to make the decision in that situation. We believe that the collection of these justifications captured by the knowledge base captures the domain model implicitly during its incremental development. Furthermore, this captured model includes a combination of the expert and the domain, as interactions between these two drive the knowledge acquisition process.

As we outlined in chapter 1, the goal of the research presented in this thesis is to provide a knowledge acquisition framework to capture and operationalise human search expertise. For developing this framework, we choose Ripple Down Rules as a starting point, because we do not expect the expert to give a full explanation for his/her decisions in a given search domain. As we discussed in chapter 1, part of this search knowledge is not available for introspection. This knowledge surfaces into consciousness in the form of justifications that are valid for the context on hand. Certainly these justifications will generalise to some situations outside the search context on hand. However, the most likely scenario is that some unseen contexts will not be covered by the expert's justifications. Therefore it is mandatory to see the search knowledge base as always incomplete and open to future amendments as new situations arise. Furthermore, this incremental requirement mandates that maintenance of the knowledge base be affordable. This is the case for the RDR approach. In RDR the knowledge acquisition is economical. The expert is responsible for the construction of the knowledge base in the same way as the author of a textbook is responsible for its writing and editing. An expert adds rules; s/he never deletes rules. Moreover, adding a new rule is a simple process which the expert can carry out without the need for a knowledge engineer (Compton, Edwards et al. 1992; Compton, Kang et al. 1993). In chapter 5, we will expand the above discussion to outline a comprehensive set of requirements that we take into account in our framework for acquiring search knowledge. Those requirements will form the basis for our knowledge acquisition framework, and they will highlight the need for our substantial extension of the RDR framework to deal with capturing search heuristic knowledge. Our substantial extension results in our new knowledge acquisition framework of Nested Ripple Down Rules (NRDR).

2.5. Chapter summary

This chapter presented an overview of the development of the knowledge acquisition field. This field is concerned with facilitating the building of knowledge bases for expert systems. We discussed developments from the early 70's, where the focus was on interviewing experts and then encoding their knowledge by a knowledge engineer, to the 90's, where knowledge acquisition is seen as a modelling activity. The now widely accepted situated cognition view suggests that an expert's domain model and his/her interactions with the environment are both included in the model captured by a knowledge based system. Ripple Down Rules (RDR) is a modern knowledge acquisition framework based on this view. We adopt RDR as a starting point to develop our own knowledge acquisition framework. Our reasons for choosing RDR as this starting point have been outlined in this chapter. In brief, they are three: first, the RDR framework provides inexpensive development of a knowledge base; secondly, the framework allows for direct interactivity with an expert, which is important as we view the search knowledge base as always incomplete and therefore in need of continuous maintenance; thirdly, the framework accounts for knowledge not available for introspection.

The Ripple Down Rules (RDR) framework received considerable attention within the knowledge acquisition community. Numerous knowledge acquisition researchers in Europe (Kivinen, Mannila et al. 1993; Scheffer 1995; Martinez-Bejar, Benjamins et al. 1997), North America e.g. (Gaines 1991), Australia e.g. (Richards and Compton 1997; Compton, Ramadan et al. 1998; Beydoun and Hoffmann 1999) and Asia e.g. (Catlett 1992; Wada, Horiuchi et al. 1998) contributed to their analysis and further development throughout the 90's. RDR is the starting point for our knowledge acquisition framework. Our Nested Ripple Down Rules (NRDR) framework extends the existing RDR framework and addresses some of its shortcomings. Further, we do not sacrifice any RDR advantages; in particular, we preserve their ease of maintenance. Clearly, research work done on RDR is relevant as background for understanding the research context of this thesis. We dedicate the next chapter of this thesis to an overview of this research. We also give an overview of our Nested Ripple Down Rules (NRDR) framework, which we utilise to capture human search knowledge.

Chapter 3

The state of the art of Ripple Down Rules

This chapter presents the knowledge acquisition framework of Ripple Down Rules. It discusses and analyses its key strengths. A comparison to a few other incremental knowledge acquisition approaches is also made. We also survey and analyse some of its applications and extensions.

In extending the Ripple Down Rules (RDR) approach to deal with search domains, we addressed some key concerns of the RDR community. In particular we addressed issues of lack of explicit modelling and readability, lack of modularity, and repetition. In this chapter, we present and analyse research efforts which dealt with some of these concerns in the past few years. We present our Nested Ripple Down Rules (NRDR) incremental knowledge acquisition framework and we contrast it with those efforts. This will highlight the advancement of the state of the art made by our Nested RDR framework.

Lastly, there has been little analysis of the theoretical aspects of Ripple Down Rules. We overview the most notable research papers in this regard. We also report on other state-of-the-art RDR related research.

This chapter is organised as follows: Section 3.1 introduces the RDR methodology. Section 3.2 describes their technical details. Section 3.3 discusses a recent development of RDR: Multiple Classification RDR (MCRDR). Section 3.4 introduces our Nested RDR (NRDR). NRDR's contribution to the RDR research is discussed in section 3.5. We complete the survey of RDR related research in section 3.6 by reviewing two smaller RDR research areas: Fuzzy RDR and theoretical efforts in analysing RDR.

We first highlight the strength of incremental knowledge acquisition with Ripple Down Rules from a knowledge based system design perspective.

3.1. Incremental KA with RDR from a system's development perspective

The development of knowledge bases (KB) is normally divided into several phases. Commonly distinguished phases are: requirement definition, analysis, design and implementation (Parpola 1998). A maintenance phase is also needed to ensure the continuous applicability of the knowledge base. In many domains, the design stage may cause difficulties, as a concrete model may not be available or modelling may be too expensive. Incremental construction of knowledge bases with RDR omits the design stage and merges the implementation and maintenance stages seamlessly. This is particularly important, as experience has shown that the maintenance phase is critical for the continuous deployment of an expert system. Many systems have become obsolete when maintenance became too difficult (Arinze 1989; Brown 1989; Stumptner 1997). Incremental knowledge acquisition (KA) also minimises problems which arise from jumping between phases. Two common examples of such problems are: first, duplication of effort, when the developer loses track of continuation points in the stage s/he is in, or when some members of a development team redo the work of other members; second, loss of effort, when previous stages have to be redone (Parpola 1998). Incremental construction and refinement of knowledge bases using RDR have proved successful in building many useful applications (Compton and Jansen 1988; Compton, Edwards et al. 1992; Shiraz and Sammut 1997; Compton, Ramadan et al. 1998). Some of these applications will be closely reviewed later in this chapter.

3.2. RDR basics

"If your mind is empty, it is always ready for anything. In the beginner's mind there are many possibilities, in the expert's mind there are few" Zen master Suzuki-roshi (from The Wish-Fulfilling Jewel)

RDR is the result of work to simplify knowledge engineering. With RDR, knowledge maintenance is a simple process which can be done by a domain expert without a knowledge engineer. Developing RDR knowledge bases relies on incremental refinement of the acquired domain expert's knowledge. RDR is founded on the realisation that experts do not offer explanations of why they made a decision; rather, they offer a justification based on the situation (context) (Compton and Jansen 1990). RDR incorporates the idea, introduced by Davis (Davis 1979) in TEIRESIAS, of knowledge acquisition within the context of a shortcoming of the knowledge base. In TEIRESIAS, the task of localising the failure was left to the expert. In RDR, this task is automated, as it is in (Craw and Boswell 1999). This can be seen by looking at the structure of an RDR knowledge base and its update mechanisms.

An RDR knowledge base is a collection of simple rules organised in a binary tree structure. Every rule can have branches to two other rules: a false branch and a true branch (an exception branch). An example RDR tree is shown in figure 3.1. When a rule applies, its true branch is taken; otherwise its false branch is taken. The root node of an RDR tree contains the default rule, whose condition is always satisfied. The root node is of the form "If true then default conclusion". The default rule has only a true-branch.

In an RDR knowledge base, if a 'true-branch' leads to a terminal node t, and the condition of t is not fulfilled, then the conclusion of the rule in the parent node of t is taken. In other words, if the condition of an exception rule ('true-branch' child rule) is satisfied, its conclusion overrides the conclusion of its parent rule. If a 'false-branch' leads to a terminal node t, and the condition of t is not fulfilled, then the conclusion of the last rule satisfied while 'rippling down' to t is returned by the knowledge base. Any classification starts at the root node. The described conditional branching is repeated until a leaf node is reached. The knowledge base is guaranteed to return a conclusion, as at least the default rule is satisfied when the leaf node t is reached. When the expert disagrees with the conclusion returned by the knowledge base, the knowledge base is said to fail and requires modification.
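The inference process described above can be sketched as follows. This is a minimal illustrative sketch, not an actual RDR implementation: the `Rule` record, the dictionary representation of cases and all attribute names are our own assumptions.

```python
# A minimal sketch of single classification RDR inference.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    condition: Callable[[dict], bool]      # test applied to a case
    conclusion: str
    true_branch: Optional["Rule"] = None   # exception rule
    false_branch: Optional["Rule"] = None  # alternative rule

def classify(root: Rule, case: dict) -> str:
    """Ripple the case down from the default rule; the conclusion of the
    last satisfied rule on the path to a leaf is returned."""
    last_conclusion = root.conclusion      # default rule: always satisfied
    node = root.true_branch
    while node is not None:
        if node.condition(case):
            last_conclusion = node.conclusion
            node = node.true_branch        # follow the exception branch
        else:
            node = node.false_branch       # follow the false branch
    return last_conclusion

# Tiny example tree: default conclusion "c0"; a rule on attribute "a"
# concludes "c1", with an exception "a and b" concluding "c2".
r2 = Rule(lambda c: bool(c.get("a") and c.get("b")), "c2")
r1 = Rule(lambda c: bool(c.get("a")), "c1", true_branch=r2)
root = Rule(lambda c: True, "c0", true_branch=r1)

print(classify(root, {"a": True, "b": True}))   # exception fires: "c2"
print(classify(root, {"a": True}))              # parent rule: "c1"
print(classify(root, {}))                       # default rule: "c0"
```

Note how the sketch mirrors the text: an unfulfilled exception rule falls back to its parent's conclusion, and the default rule guarantees that some conclusion is always returned.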

The key strength of the RDR knowledge acquisition framework is the fact that the knowledge base can be easily modified. This strength has two reasons. Firstly, the cause of failure of the knowledge base is automatically determined due to the tree-like structure of the knowledge base. A new rule is always added as a leaf node, attached to the last rule visited before the knowledge base failed (see above for the conditions of failure). As we indicated earlier, this supersedes early approaches to incremental knowledge acquisition, where an expert was required to locate the cause of failure of the knowledge base before suggesting a repair (Davis 1979; Davis and Lenat 1982). The second reason for the ease of maintenance of an RDR knowledge

Figure 3.1. A single classification Ripple Down Rule tree. A case to be classified starts at the root default node and ripples its way down to a leaf node. The conclusion returned by the knowledge base is the conclusion of the last satisfied rule in the path to a leaf node.

base is the following: The framework ensures newly added rules make the knowledge base consistent with a new case, without becoming inconsistent with previously

classified cases 1• This is because every time a rule r is added to a parent rule p, r

1 This depends on whether all past seen cases or only the so-called 'corner stone' cases are used to

maintain the knowledge base. In case, where only corner stone cases are used, a modification may have

26 classifies the case which triggered its addition (the so-called corner stone case) correctly, and excludes all cases which are correctly classified by p. In their simple form, RDRs use simple attribute-value combinations as conditions for the rules (Compton and Jansen. 1990; Gaines 1991; Compton, Kang et al. 1993; Shiraz and Sammut 1997). When the expert enters a new rule r, s/he chooses conditions for r from the so-called 'difference list' (Compton and Jansen. 1990). This list contains attributes satisfied by the case which triggered addition or r, and it excludes all attributes satisfied by any of the cases covered by the parent of r.

As we discussed earlier, RDR is founded on the realisation that experts do not offer explanations of why they made a decision, rather they offer a justification based on the situation (context) (Compton and Jansen. 1990). In reference to the above, every rule added to the knowledge base is a justification for the corner stone case classification given by the expert. RDR update policies follow the idea that when a knowledge based system makes an incorrect conclusion, a new rule r that is added to correct that conclusion, should only be used in the same context in which the mistake was made (Compton and Jansen. 1990). In RDR, this context is represented by the sequence of rules that were evaluated leading to a wrong conclusion which caused the addition of r. Rules are attached to such sequences of rules. Hence, rules are only added in the context of their application. An added rule r satisfies the case for which the original sequence failed, and it excludes all cases covered by its predecessor rule. The strength of the approach is that rules are never corrected or changed because corrections are contained in rules added on to the end (Compton and Jansen 1988). Corrections entered by the expert are always guaranteed to be valid. That is, corrections classify new cases correctly without making the knowledge base inconsistent with respect to past seen cases. This is because of the way conditions of new rules are chosen (see above).

... a very small effect - very much negligible - on past seen cases. Why this is the case is discussed in detail once the theoretical framework of RDR is presented in chapter 4.

Ripple Down Rules knowledge bases were successfully used in a medical expert system known as PEIRS (Compton, Edwards et al. 1992). This system was in routine use at St Vincent's Hospital in Sydney until the mid-1990s, where it provided clinical interpretations of pathology reports. The system went into routine use with about 200 rules and grew to 2000 rules during use; that is, maintenance and use overlapped. It was reported in (Kang, Compton et al. 1998) that extending the system took 15 minutes per day. This task was undertaken by a resident pathologist without the help of a knowledge engineer, and it resulted in an average of only 2 to 3 rules per day (Edwards 1996). The resulting average of 10 rules per hour is extremely economical. Most importantly, the time required to add a rule was independent of the size of the knowledge base.

Ripple Down Rules were also successfully used by Shiraz to acquire complex control knowledge (Shiraz 1998). Shiraz combined machine learning techniques and Ripple Down Rules to capture human pilot skills for flying and landing a single-engine plane. The high-level control was captured explicitly by a Ripple Down Rule knowledge base, and the sub-cognitive skills were captured by a machine learner component in the system.

3.3. From RDR to Multiple Classification RDR (MCRDR)
In the medical domain, as in PEIRS, a patient may have more than one disease. In PEIRS this was handled by treating the situation as a compounded disease. This obviously leads to a combinatorial explosion in the number of classes in the domain, which can increase the size of the knowledge acquisition task exponentially (Kang, Compton et al. 1998). Alternatively, multiple RDR knowledge bases can be used and developed simultaneously. This obviously greatly increases the burden of the knowledge acquisition task. Indeed, this approach was used by Shiraz (Shiraz 1998): he used four different knowledge bases, one for each control action in flying a plane (Elevator, Flaps, Roll and Throttle controls).

In response to the above limitation of RDR, Multiple Classification RDR (MCRDR) were developed by Kang and Compton (Kang, Compton et al. 1998). The essence of MCRDR is that it allows multiple refinements of a rule, rather than the single refinement of single classification RDR. This allows the possibility of multiple conclusions for a single case.

Unlike an RDR knowledge base, an MCRDR knowledge base is not a binary tree; instead, an n-ary structure is developed (see figure 3.2). Every rule can only have exception branches. When a rule is satisfied, all its children are evaluated. A case does not necessarily reach a leaf node. The conclusions returned by the knowledge base are those of the last satisfied rules on every path taken through the tree; that is, the children's conclusions replace their parents' conclusions.
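This n-ary inference can be sketched as follows (again an illustration of ours, with names invented for the sketch; the toy knowledge base mirrors the example case of figure 3.2):

```python
class MCRule:
    def __init__(self, conditions, conclusion, children=()):
        self.conditions = set(conditions)
        self.conclusion = conclusion
        self.children = list(children)      # exception (if-true) branches

def mc_classify(rule, case):
    """Return the conclusions of the last satisfied rule on every
    path: a child's conclusion replaces the parent's on that path."""
    if not rule.conditions <= case:
        return []
    refinements = []
    for child in rule.children:             # evaluate ALL children
        refinements.extend(mc_classify(child, case))
    return refinements if refinements else [rule.conclusion]

# The root always fires; two independent branches can both contribute.
kb = MCRule([], "default", children=[
    MCRule({"d"}, "conc5"),
    MCRule({"z", "y"}, "conc4"),
])

print(mc_classify(kb, {"e", "f", "d", "z", "y"}))  # ['conc5', 'conc4']
```

A case satisfying no exception simply receives the default conclusion, while a case satisfying several branches receives one conclusion per path.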

Unlike single classification RDR, in MCRDR there may be multiple cornerstone cases that must be distinguished from the case that is causing a modification to the knowledge base. This is achieved by presenting the cornerstone cases to the expert one at a time. S/he adds extra conditions to a new rule until all cornerstone cases are excluded. Kang et al (Kang, Compton et al. 1998) report that a new rule is usually made precise after 2 or 3 cornerstone cases. In other words, the expert is able to make effective generalisations quickly by looking at a few examples.

[Figure 3.2 is garbled in the source. Recoverable rules include Rule 0 "if true then default", a rule "if a,b then conc1", Rule 6 "if d then conc5", Rule 7 "if z,y then conc4" and a rule "if a,c then conc7".]

Figure 3.2: An MCRDR knowledge base. All branches in an MCRDR KB are exception branches (if-true branches). For a case {e,f,d,z,y} the above knowledge base would return two conclusions: conc5 and conc4.

Studies reported in (Kang 1996; Kang, Compton et al. 1998) have shown that an expert using MCRDR can produce more compact knowledge bases faster than with RDR. This is probably because the expert is made to work harder than in RDR. Multiple Classification RDR (MCRDR) were also successfully used in configuration tasks (Ramadan, Mulholland et al. 1997; Compton, Ramadan et al. 1998). This configuration task was to develop an expert system adviser for Ion Chromatography methods. The actual task is to determine the chemical components, and the amounts of each, needed for a specific ion separation task. MCRDR is used to select components for the task: an MCRDR knowledge base is used to choose values for the different components. If more than one value for a component is chosen, then this component is left untouched (we think this is a weak point of the approach, as no search or backtracking is used to explore more than one possibility). Further interactions with the expert are initiated to resolve situations where a component is left unconfigured, or where the configuration obtained using the knowledge base is incorrect. Search operators are implicit in the inference of MCRDR - some conclusions have implicit search operators. 'Action rules' are used to represent such conclusions (Compton and Richards 1999).

As we will discuss in chapter 6, we have single conclusions in our system SmS. The diversity of conclusions is simulated by the diversity of search operators. SmS has a search engine which allows jumping to different parts of the search space. MCRDR configuration (Compton, Ramadan et al. 1998) does not allow this. We believe this ability to re-visit parts of the search space is critical for solving more complex problems, e.g. search heuristic problems. Research similar to (Compton, Ramadan et al. 1998), applying Multiple Classification Ripple Down Rules to another configuration problem (Richards and Compton 1999), is currently under way. The task in (Richards and Compton 1999) is a room allocation problem known in the knowledge acquisition community as Sisyphus I. This research is looking at incorporating backtracking while searching for a solution. A limitation of the work presented in (Richards and Compton 1999) is that it does not allow the expert to express relations between cases. As we will see later in this thesis, our framework allows this by giving the expert access to the ongoing search process for him/her to comment on.

In the rest of this chapter, we will introduce our Nested Ripple Down Rules (NRDR). This will highlight the contributions of Nested RDR to the incremental knowledge acquisition community, which will be described in detail later in this chapter. In particular, the way our Nested RDR addresses some of the limitations of RDR will be highlighted. Nested RDR are a substantial contribution of this thesis. They are only briefly overviewed in this chapter; we dedicate chapter 6 to describing their technical details. Chapter 6 will also discuss their philosophical underpinnings.

3.4. Nested Ripple Down Rules (NRDR)
Single classification RDR and MCRDR fail to provide end-users with both the structure and the vocabulary associated with the domain knowledge on hand. This has been perceived as a strong impediment to re-using and sharing domain knowledge in RDR-based systems (Richards 1998). Further, given a search domain, e.g. chess, the attributes used by the expert are not known a priori. In (Beydoun and Hoffmann 1997; Beydoun and Hoffmann 1997), we extended RDR into Nested RDR to allow the expert to use abstract attributes which s/he can explain using simpler attributes. That is, NRDR captures the vocabulary along with the expert's domain knowledge. This extension was intended to facilitate incremental acquisition of search knowledge where attributes are not known a priori (Beydoun and Hoffmann 1997; Beydoun and Hoffmann 1997). The general utility of NRDR was first discussed in (Beydoun and Hoffmann 1998). NRDR can be used to develop models through knowledge level interactions. This has been advocated in the knowledge acquisition literature for building domain independent knowledge acquisition tools, e.g. (Eriksson 1993). In (Beydoun and Hoffmann 1998), NRDR was described as a tool allowing simultaneous incremental knowledge acquisition and explicit modelling. In reference to the general knowledge acquisition approaches presented in chapter 2, NRDR can be regarded as a tool which simultaneously allows knowledge analysis (coming up with the right ontological units, i.e. the expert vocabulary) and modelling2.

In our framework for acquiring search knowledge, NRDR allows the expert to introduce his/her own vocabulary to express his/her search knowledge. S/he has more freedom to express him/herself naturally than with normal RDRs. Ripple Down Rule structures allow him/her to define a conceptual hierarchy during the knowledge acquisition process. Every concept is defined as a separate single classification RDR tree. Conclusions of rules within a concept definition have a boolean value indicating

2 By modelling we mean the implicit domain model captured by the knowledge base.

whether or not the concept is satisfied by a given case. Defined concepts can in turn be used as higher order attributes by the expert to define other concepts. When the condition of a rule contains expert-defined concepts, the boolean value of the condition is calculated in a backward chaining manner. That is, for any defined concept, its boolean value is calculated first. This value is then propagated up the conceptual hierarchy (see figure 3.3). When more than one expert-defined concept is used in the condition of a rule, lazy evaluation is used for efficiency (that is, when a concept is evaluated as false, the value of the condition is returned as false without proceeding to evaluate the next concept). In our workbench for acquiring search knowledge, SmS (Smart Searcher), the elementary level is the level of domain explanatory primitives. These will be discussed in detail in chapters 5 and 8.
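The evaluation scheme just described (backward chaining with lazy evaluation) might be sketched as follows. All names here are illustrative, not from SmS; note how Python's `all()` short-circuits, giving the lazy evaluation for free:

```python
concepts = {}  # concept name -> root Node of its defining RDR tree

class Node:
    def __init__(self, conditions, positive, if_true=None, if_false=None):
        self.conditions, self.positive = conditions, positive
        self.if_true, self.if_false = if_true, if_false

def holds(attribute, case):
    """An attribute in a condition is either a raw primitive present in
    the case, or a defined concept whose RDR tree is evaluated first
    (backward chaining)."""
    if attribute in concepts:
        return classify_concept(concepts[attribute], case)
    return attribute in case

def classify_concept(node, case):
    """Binary-conclusion RDR: the value (+ or -) of the last satisfied
    rule is the concept's value for the case."""
    value = False
    while node is not None:
        # all() stops at the first false conjunct: lazy evaluation
        if all(holds(a, case) for a in node.conditions):
            value = node.positive
            node = node.if_true
        else:
            node = node.if_false
    return value

# A2 is defined from primitives; C1 reuses A2 as a higher-order attribute.
concepts["A2"] = Node(["p1", "p2"], True)
concepts["C1"] = Node(["A2", "p3"], True)

print(classify_concept(concepts["C1"], {"p1", "p2", "p3"}))  # True
```

Evaluating C1 triggers the evaluation of A2 first, and A2's boolean value is then propagated up, exactly as in the hierarchy of figure 3.3.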

[Figure 3.3 is garbled in the source. Recoverable rules include Rule C1.1 "A1,B1 -> +C1", Rule C1.2 "A2 -> -C1", Rule A2.1 "p1,p2 -> +A2" and Rule B2.1 "p1,p3 -> +B2".]

Figure 3.3: A Nested Ripple Down Rule knowledge base. In C1, rule C1.2 invokes RDR tree A1; this in turn, from within rule A1.1, invokes RDR tree A2.

Clearly, the evolving concept hierarchy depends not only on the given domain but also on the expert. It will reflect his/her individual way of conceptualising his/her own thought process. This point will be further discussed in chapter 6.

Simple Ripple Down Rules as proposed by Compton and Jansen (Compton and Jansen 1988) discriminate input objects into a set of mutually exclusive classes. In the rest of this thesis, when referring to simple RDR trees embedded within Nested RDR, we refer to binary conclusion RDR trees which classify input into two sets: a set of positive objects belonging to a given class, and a set of negative objects falling outside the class definition.

When some terms in an NRDR knowledge base are modified, many terms directly (or indirectly) connected to those terms must be checked for consistency with respect to past seen cases. Consequently, a single update may cause a chain reaction of updates in the knowledge base. To deal with such chain reactions, a holistic account of the knowledge base's consistency with respect to past seen cases is required. As NRDR is a substantial contribution of this thesis, we dedicate chapters 6 and 7 to describing the technicalities and the added complexity (in comparison with RDR) associated with their maintenance. We will show that this added complexity has a minimal impact on the cost of developing an NRDR knowledge base. Therefore, the strengths of NRDR to be outlined in the coming sections of this chapter do not impede the ease of maintenance of the knowledge base.
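The chain reaction can be illustrated with a toy dependency graph (our own sketch, not the SmS implementation): every concept that uses a modified concept, directly or indirectly, must be re-checked against past seen cases.

```python
# concept -> concepts whose rule conditions mention it (illustrative data)
used_by = {
    "A2": ["A1", "C1"],
    "A1": ["C1"],
    "C1": [],
}

def affected(concept):
    """Transitive closure of concepts to re-check after `concept` changes."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        for parent in used_by.get(c, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(affected("A2")))  # ['A1', 'C1']
```

Modifying a low-level concept such as A2 above forces consistency checks on everything defined in terms of it; a top-level concept affects nothing else.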

We now turn our attention to the limitations of Ripple Down Rules. We discuss the main research done to alleviate these limitations. This will highlight the advancement of the state of the art made by NRDR in the incremental knowledge acquisition community.

3.5. NRDR and limitations of RDR and MCRDR
Knowledge acquisition with (MC)RDR has the following limitations:

1. Lack of modularity of the knowledge base development process, or of the resultant knowledge base.
2. Repetition within the knowledge base.
3. Lack of explicit modelling.

We will later see that the first and the last limitations are essentially two different ways of looking at the same structural problem. The first limitation, however, is seen from the point of view of the development process of the knowledge base, while the third is more concerned with the reusability and readability of the knowledge base. This last point of view will later lead us to question the applicability of the RDR approach to certain classes of applications: applications that require explicit models during the actual knowledge acquisition process. It must also be noted that the second problem, that of repetition, is a minor problem in comparison with the third, and it is less serious than the first.

In what follows, we give an overview of the major efforts that went into studying and finding solutions for those limitations.

3.5.1. Lack of modularity

As the knowledge base grows large (e.g. hundreds of rules), having the knowledge base split into separate modules would ease its development and enable reasoning about its contents. This may also facilitate using multiple sources of expertise simultaneously. Incremental knowledge acquisition with RDR (or MCRDR) does not offer this modularity. In NRDR, new domain terms (concepts) are constantly added by an expert during the knowledge acquisition process, and each term can be thought of as a distinct module in the knowledge base. To our knowledge, Nested Ripple Down Rules is the first and only approach which allows modularising the incremental knowledge acquisition process during the incremental process itself. In other hierarchical incremental knowledge acquisition approaches, such as Disciple (Tecuci 1998), modules of the knowledge base are decided during a knowledge engineering stage, before the start of the knowledge acquisition process. In Disciple (Tecuci 1998), the interface design depends on the domain used and needs to be re-implemented for every domain. As we will see in chapter 5, our NRDR interface is actually domain independent.

The hierarchical modularity of NRDR produces a conceptual structure that emerges as a result of the knowledge acquisition process. This conceptual structure constitutes an

explicit domain model (Beydoun and Hoffmann 1998). This makes an NRDR knowledge base very readable and easy to reason about, and avails it for reuse, e.g. for tutoring purposes. Further, the emergence of this hierarchical conceptual structure has two advantages for the actual knowledge acquisition process: firstly, it offers a good interface for the expert to keep track of his/her introduced terms; secondly, because human language also has a hierarchical structure (Chomsky 1957; Sowa 1984)3, NRDR gives the expert a natural way, which is easy to follow, of introducing his/her new concepts.

3.5.2. Repetition in (MC) RDR knowledge bases

RDR knowledge bases show repetition. The cause of this can be clearly understood by considering the following quote from page 15 of (Compton, Edwards et al. 1991): ".. The expert is able to look at the data and identify the appropriate classification for a report. However for the rule to apply to as many cases as possible, it must abstract from the individual features of the data in this case, and the rule of abstraction must be known, so it can be applied appropriately to other cases". In RDR, such feature extraction is not available. The binary structure means that knowledge in one particular pathway is not accessible to a search down another pathway. Similar rules are thus introduced in different parts of an RDR knowledge base. That is, newly added rules have limited reusability, and repetition arises. In contrast, feature extraction is an inherent part of the development process of NRDR. A lower order concept RDR tree can be used as a condition, so rules entered in a given concept are reusable in any part of the knowledge base by simply reusing this concept as a condition. Richards and Compton proposed an approach for removing RDR repetition using machine learning

3 Language is also believed to reflect the structure of the mind; for example, Fodor states in (Fodor, 1998) (page 25): ".. mental representation is a lot like language, according to my version of the Representation Theory of Mind. Quite so; how language expresses thought if that were not the case". The relation between language and the intentionality of the mind has also been explicitly stated by Searle in (Searle, 1983) (page vii): ".. The capacity of speech acts to represent objects and states of affairs in the world is an extension of the more fundamental capacities of the mind to relate the organism to the world through [intentional states] ..".

(Richards, Chellen et al. 1996) after a knowledge base is developed. With NRDR, some of this repetition is avoided during the actual knowledge acquisition process. NRDR can condense the size of RDRs, as the same concept defined by a lower order RDR tree may be used multiple times in higher order trees.

It must be said, though, that this repetition problem is not a serious impediment to using RDR, as shown in (Richards, Chellen et al. 1996; Suryanto, Richards et al. 1999). Simulation studies done with MCRDR showed that the knowledge bases produced are as compact and accurate as those produced by induction (Compton, Preston et al. 1994; Compton, Preston et al. 1995). However, we argue next that NRDR knowledge bases, due to their complex structure, are a lot more compact than RDR knowledge bases.

Comparing the number of rippling paths available in an RDR/MCRDR knowledge base with the number of rippling paths in an NRDR knowledge base of the same size (in terms of number of rules), we argue that the number of rippling paths is considerably larger in an NRDR knowledge base. In an RDR/MCRDR knowledge base, the total number of paths is clearly equal to the number of rules. In an NRDR knowledge base, a given condition C in a given rule can have the same evaluation over a number of rippling paths; this number depends on the size of the RDR tree which defines C. The total number of possible ripple paths in the whole knowledge base is the total number of combinations of different ripple paths over all conditions. So the total number of ripple paths is actually exponential in the number of concepts (terms) in the knowledge base. This is clearly larger than a linear combination of the sizes of the concepts (which is the size of an equivalent RDR/MCRDR knowledge base)4.
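A toy calculation illustrates the argument (the numbers are ours, purely for illustration): the flat structure yields a path count linear in the number of rules, while nesting yields a product over the concept trees.

```python
# 30 rules as one flat RDR chain: one ripple path per rule.
flat_paths = 30

# The same 30 rules split into 3 NRDR concepts of 10 rules each, where
# rules reference the other concepts: path combinations multiply.
nested_paths = 10 * 10 * 10

print(flat_paths, nested_paths)  # 30 1000
```

This is the sense in which the number of ripple paths, and hence the coverage per rule, grows exponentially with the number of concepts rather than linearly with the number of rules.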

A ripple path determines the context of a rule. Clearly, the larger the number of ripple paths is, the larger the number of cases that can be covered by the knowledge base

4 A mathematical simile for this argument is the following: divide a set of objects into a number of subsets. The total sum of the numbers of elements of all subsets is equal to the number of elements in the original set. However, multiplying the sizes of the subsets will yield a number much larger than the size of the original set.

becomes. So for a given domain, we expect an NRDR knowledge base to be a lot more compact than an equivalent RDR/MCRDR knowledge base. In fact, we expect faster convergence of an NRDR knowledge base than of an RDR/MCRDR knowledge base, as long as the cost of adding a single rule in an NRDR knowledge base is equivalent to the cost of adding a single rule in an RDR/MCRDR knowledge base; this will be proved in chapters 6 and 7 of this thesis.

A more serious limitation of RDR/MCRDR systems is their lack of explicit modelling capability. This is discussed in the next section.

3.5.3. Lack of explicit modelling

In addressing the limitations of RDRs in modelling the domain terms used by the expert, their relationships and abstraction hierarchies, Richards and Compton (Richards and Compton 1997) developed a technique to uncover conceptual structures from MCRDR knowledge bases. They used formal techniques to extract ontological vocabulary from MCRDR systems. Their method relies on treating each rule as a primitive concept, and deriving higher level concepts by finding the intersections of rule conditions. The primitive and abstracted concepts are then ordered to form a complete lattice, or abstraction hierarchy. This method is limited to domains where not all attributes are always used with every case. For instance, the method cannot be used when conditions are based on relational attributes between elements of feature vector representations of classified cases. That is, it can only be used with grounded relations, i.e. when conditions of rules are expressed in propositional logic. As opposed to uncovering models after the knowledge acquisition process, NRDR is a solid framework for capturing relationships between terms used by an expert and acquiring abstraction hierarchies during the actual knowledge acquisition process. Further, as we will show in chapters 6 and 7, this is possible while preserving the easy maintenance of RDRs.

Another major work which tackles the lack of explicit modelling in RDR knowledge bases is ROCH (RDR-Oriented Conceptual Hierarchies) (Martinez-Bejar, Benjamins et al. 1998). This is reviewed next.

ROCH (RDR-Oriented Conceptual Hierarchies)
In (Martinez-Bejar, Benjamins et al. 1997), Martinez et al proposed a set of ontological operators which enable the extraction of domain knowledge from text fragments. The knowledge engineer applies these ontological operators to text, and s/he validates the results with domain expert(s) (Martinez-Bejar, Benjamins et al. 1997). One underlying assumption in this approach is that, once validated, the result will not need later modification. Later, in (Martinez-Bejar, Benjamins et al. 1998), these operators were adapted to allow a knowledge engineer to use conceptual hierarchies from Ripple Down Rules knowledge systems. The motivation of the work in ROCH (Martinez-Bejar, Benjamins et al. 1998) is similar to ours in NRDR, in that the expert is assumed to be able to establish the conceptual hierarchy underlying a particular case which they want the knowledge base to be able to classify. We agree with their argument that experts (and, in general, human beings) are better placed to accomplish the mental processes (for instance, abstraction) necessary to construct ontologies. In ROCH, the conceptual hierarchy constructed is separate from the knowledge base, so a consistency check is required to ensure that the domain ontology and the domain knowledge base remain mutually consistent.

In ROCH (Martinez-Bejar, Benjamins et al. 1998), the number of modifications, and the way in which they must be carried out in the associated hierarchies to ensure consistency between the knowledge base and the associated hierarchy, is restricted. In NRDR, the conceptual hierarchy and the knowledge base are one inseparable entity, hence this restriction on updates is quite simple (to be detailed with respect to Nested RDR in chapter 6). A similar policy applies in NRDR: removing unused concepts from conceptual hierarchies may be feasible. However, we agree with the view presented in (Martinez-Bejar, Benjamins et al. 1998) that further expert considerations can change in a way that makes those concepts relevant, and it is best to leave unused concepts in place for possible future use.

3.5.4. Concluding remarks on modelling and NRDR

In the works addressing the limitation of (MC)RDR in capturing an explicit domain model (Richards and Compton 1997; Richards and Compton 1997; Martinez-Bejar, Benjamins et al. 1998; Richards 1998), the models developed are decoupled from the knowledge base and hence are not directly affected by the knowledge acquisition process. A knowledge base refinement requires a new generation of the model in (Richards and Compton 1997; Richards and Compton 1997), and a separate refinement of the model is required in the case of (Martinez-Bejar, Benjamins et al. 1998). In NRDR this is not the case: the model is developed during the actual knowledge acquisition process. This is a critical difference. This simultaneous modelling and knowledge acquisition (Beydoun and Hoffmann 1998) can be critical in some domains. For example, in building intelligent web browsers, it is often difficult to figure out the classes of documents before actually doing a substantial amount of browsing and indexing.

Furthermore, reusable ontologies are increasingly being recognised as an important research area (see chapter 2) in the development of knowledge based systems. They allow mechanisms for constructing domain knowledge from reusable components. Unfortunately, there are no standard methodologies to build them. Their construction is difficult and time consuming. From this perspective, Nested RDR offers an economical way to build a domain ontology during the incremental knowledge acquisition process. That is, our NRDR framework allows knowledge analysis simultaneously with knowledge acquisition.

Finally, it must be noted that the strengths of NRDR do not impede the ease of the knowledge acquisition process (when this is measured in terms of the number of interactions with the expert). It is true that the NRDR structure creates extra complexity in maintenance, because of the interactions between RDR trees within an NRDR knowledge base. In chapter 7, we will show that this extra maintenance complexity does not put any major extra burden on the expert. Further, we will show that dealing with these extra interactions can be handled automatically. In the rest of this chapter, we report on other state of the art research in the incremental knowledge acquisition community, and we give a quick

overview of theoretical research on RDR. Our work in chapters 4 and 7 extends some of this theoretical research.

3.6. Other recent RDR research
In this section, we look at two important areas of RDR research: incorporating uncertainty in the knowledge representation, and formalising the RDR framework.

3.6.1. Fuzzy RDR

Fuzzy terms are widely used in real-life engineering applications, where fuzzy logic (Zadeh 1983) is normally used. Martinez et al (Martinez-Bejar, Shiraz et al. 1998) proposed a theoretical model based on fuzzy logic to capture and represent fuzzy domain knowledge in RDR-based systems. The model provides a methodology for propagating uncertainty values represented with RDR rules and cases. This fuzzy model is intended to be used with the ROCH system (see above) (Martinez-Bejar, Benjamins et al. 1998). The work allows continuous attributes to be used, and it also incorporates error intervals in their values. It is clear that experts are inconsistent in doing this, and that they are subjective in their certainty judgements (for example, this subjectivity is most obvious in game playing, where expertise can be manipulated with psychological stress). In this respect, the key strength of this work is that it does not rely on the expert assigning uncertainty measures to his/her rules. In (Martinez-Bejar, Benjamins et al. 1998), uncertainty values are derived using the distribution of correct classifications among MCRDR rules. This work is still under development, and an actual system based on the model is being implemented and tested (Martinez, personal communication).

3.6.2. RDR theoretical work

The most notable theoretical papers concerned with RDR structures have focussed on analysing the performance of automatic RDR induction algorithms. The earliest such algorithm was probably Induct, presented in (Gaines 1991). The basic algorithm in Induct searches for the premise, for a given conclusion, that is least likely to predict that conclusion by chance. In (Gaines and Compton 1992), Induct was used to

generate RDR knowledge bases in the medical domain, which were later modified manually. In (Kivinen, Mannila et al. 1992; Kivinen, Mannila et al. 1993), Kivinen et al presented another learning algorithm, RDRS, based on a greedy approximation method. They showed that their algorithm is a PAC (Probably Approximately Correct) learning algorithm for fixed depth Ripple Down Rule trees. In (Scheffer 1995; Scheffer 1996), Scheffer presented another learning algorithm, called Cut95, to induce Ripple Down Rules automatically. He used an information gain criterion to select rule conditions. The main empirical results reported for Cut95 are that it produces RDR classification programs which are as accurate as the C4.5 decision tree generation algorithm (Quinlan 1993) for most problems, and that the Cut95 information gain criterion outperforms Gaines's criterion in Induct.

The only effort concerned with modelling and analysing the actual knowledge acquisition process driven by an expert with Ripple Down Rules is our own, presented in (Beydoun and Hoffmann 1998; Beydoun and Hoffmann 1999; Beydoun and Hoffmann 1999). In this work, we presented a mathematical model of the knowledge acquisition process and used it to analyse the knowledge acquisition process as a function of the quality of expertise. This model is intended to analyse the behaviour of Nested Ripple Down Rules; however, it is generally applicable to all RDR structures. As we will later see, our model predicts and explains the empirical results obtained with MCRDR in (Compton, Preston et al. 1995; Kang, Compton et al. 1998). Our work in (Beydoun and Hoffmann 1998; Beydoun and Hoffmann 1999) will be expanded and presented in chapters 4 and 7 of this thesis.

3.7. Chapter summary and conclusion
In this chapter, we highlighted the strengths of Ripple Down Rules from a knowledge based system perspective. We surveyed recent developments of this technology. In this context, we introduced our Nested Ripple Down Rules framework. This framework addresses shortcomings of RDR that have been of concern to a number of researchers over the past few years; in particular: repetition, lack of explicit modelling, and lack of readability and modularity.

Our Nested RDR framework preserves the key strength of RDR: ease of maintenance through direct interactions with the expert. This will be formally shown in chapters 6 and 7 of this thesis. Towards this, in the next chapter we introduce a formal framework for dealing with RDR tree structures in general. On its basis, important results about the efficiency of the knowledge acquisition process with NRDR will be derived in chapter 7.

Chapter 4

A Formal Framework of Ripple Down Rules

In an NRDR knowledge base, expert concepts are defined as separate RDR trees. In this chapter, we present a theoretical framework for dealing with RDR in general. This is a prelude to analysing the knowledge acquisition process with NRDR, and to proving that NRDR knowledge bases can be efficiently developed without too many inconsistencies occurring. This will be done in chapter 7, and it will follow a background discussion of why NRDR was conceived, which we present in the next chapter.

Section 4.1 of this chapter presents our formal framework; it describes three aspects of the RDR knowledge acquisition framework: RDR semantics, RDR tree structure, and knowledge acquisition parameters. In section 4.2, we apply notions from this framework

to analyse the convergence of an RDR tree. In section 4.3, we use the framework to explain why most past cases classified by a Ripple Down Rule tree remain consistent with future modifications to the knowledge base. Further, we outline improvements to the existing update methodology to guarantee that all seen cases remain consistent with future modifications. Indeed, we use this methodology in developing RDR trees within an NRDR knowledge base. This discussion will be used as a starting point to relate Ripple Down Rules to default logic (Reiter 1980) in section 4.4. There, we characterise the default theories that can be converted to RDR trees.

4.1. Formal Framework for RDR structures

The framework presented in this section is divided into three parts: the first part formalises the semantics of Ripple Down Rule trees (this includes the semantics of their basic components: conditions, rules, etc.). The second part formalises useful notions to describe the behaviour of the knowledge acquisition process with RDR. The final part formalises notions that will later be used to describe the structure of RDR trees.

4.1.1. RDR semantics

In this section, we formalise semantics of Ripple Down Rules. We first formalise semantics of individual rules within an RDR tree. We then formalise semantics of a whole Ripple Down Rule tree.

Let C denote a finite set of concept representations and V a set of class values. In a single RDR tree used within NRDR, V contains two class values: the concept being defined and its negation. However, looking at a whole Nested Ripple Down Rule knowledge base, the number of concepts is limited by the expert's conceptualisation of the domain.

We use an extra symbol → outside both V and C. Further, let the set X comprise all strings which are used for representing instances. In the following, we assume every instance x ∈ X is an attribute vector ⟨a1, ..., an⟩. Furthermore, we allow all possible combinations of attribute values, i.e. X = A1 × A2 × ... × An, where Ai is the set of all possible values an attribute ai can take.

Propositional representations are frequently used in knowledge acquisition. They are used in all RDR applications discussed in the previous chapter. Thus, we define the set C of concept representations as a set of propositions.

To form every c ∈ C, we use the following:

A finite set of primitives F (propositional variables).
The connectives: ∧, ¬.
One truth-constant: 'true'.

Then the set C of concept definitions is given as follows:

All primitives in F and the constant true are in C.
If p ∈ C, then ¬p ∈ C.
If p ∈ C and q ∈ C, then (p ∧ q) is in C.
These are all the concept definitions in C.

The semantics of all c in C are defined by the interpretation function I: C → 2^X. I associates to every concept representation c ∈ C a subset I(c) ⊆ X, i.e. I(c) denotes the set of objects which are subsumed under c. This is also called the extension of the concept represented by c.

The connectives above have the following semantics: ∀p ∈ C, I(¬p) = X − I(p), that is, the interpretation of ¬p is the set-theoretic complement of the interpretation of p. Further, ∀p ∈ C, ∀q ∈ C, I(p ∧ q) = I(p) ∩ I(q). Finally, I(true) = X.

Definition 1: A rule r is given by r = c → v, where c is a condition, i.e. a concept representation from C, and v is the conclusion, i.e. a class value from V.

Given an instance x ∈ X and a rule r = c → v, the rule r assigns the label v to x if and only if x ∈ I(c). In words, c → v means "if the instance is in I(c), then this instance is assigned the class label v".

When an RDR based knowledge base is being used, the following definition is useful:

Definition 2: Given an instance x ∈ X and a rule r = c → v, r is said to apply to x, if x ∈ I(c).

Ripple Down Rules have been formalised as binary trees in (Scheffer 1995). So an RDR tree T can be recursively defined as T = <r, E, S>, where r is the rule in the root node, and E and S are the exception RDR sub-tree (true link sub-tree) and the succeeding RDR sub-tree (false link sub-tree) respectively. An empty sub-tree (the empty rule) is denoted by the symbol λ, i.e. given T = <r, λ, λ>, r is a leaf node.
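The recursive structure T = <r, E, S> can be sketched in code. This is an illustrative sketch only, not part of the thesis formalism; the names RDRNode, condition, conclusion, exception and successor are our own.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative sketch of T = <r, E, S>: each node holds a rule
# (condition c and conclusion v); 'exception' is the true-link subtree E,
# 'successor' the false-link subtree S. None plays the role of the
# empty tree λ, so a node with two None children is a leaf rule.
@dataclass
class RDRNode:
    condition: Callable[[dict], bool]        # c: does the rule apply to x?
    conclusion: str                          # v: class value from V
    exception: Optional["RDRNode"] = None    # E (true-link subtree)
    successor: Optional["RDRNode"] = None    # S (false-link subtree)

# A proper RDR tree starts with a default root <true -> v_d, E, S>:
default_root = RDRNode(condition=lambda x: True, conclusion="v_d")
```
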

A proper RDR tree is of the form <rd = true → vd, E, S>, where E and S are possibly empty RDR trees. rd = true → vd is the default root node, where I(true) = X, i.e. I(true) covers the whole domain a knowledge base is being built for, and vd is its default conclusion.

Definition 3: Eval(T, x) is an evaluation function which determines the class value which T assigns to an instance x ∈ X:

Eval(λ, x) = ⊥
Eval(<c → v, E, S>, x) = v, if x ∈ I(c) ∧ Eval(E, x) = ⊥
Eval(<c → v, E, S>, x) = Eval(E, x), if x ∈ I(c) ∧ Eval(E, x) ≠ ⊥
Eval(<c → v, E, S>, x) = Eval(S, x), if x ∉ I(c)

where ⊥ denotes the empty conclusion.
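Definition 3 translates directly into a recursive evaluation function. The following is a hypothetical sketch (the RDRNode structure and names are ours, not the thesis's notation); None stands in for the empty conclusion ⊥.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RDRNode:
    condition: Callable[[dict], bool]        # c
    conclusion: str                          # v
    exception: Optional["RDRNode"] = None    # E (true link)
    successor: Optional["RDRNode"] = None    # S (false link)

def eval_rdr(tree: Optional[RDRNode], x: dict) -> Optional[str]:
    """Definition 3: Eval(λ, x) = ⊥ (None here). If the rule applies,
    the exception subtree's conclusion overrides v when it is not ⊥;
    otherwise evaluation ripples down the false link."""
    if tree is None:
        return None                          # Eval(λ, x) = ⊥
    if tree.condition(x):                    # x ∈ I(c)
        overriding = eval_rdr(tree.exception, x)
        return overriding if overriding is not None else tree.conclusion
    return eval_rdr(tree.successor, x)       # x ∉ I(c)

# A small tree: default "no"; exception a>0 -> "yes"; its exception a>10 -> "no".
tree = RDRNode(lambda x: True, "no",
               exception=RDRNode(lambda x: x["a"] > 0, "yes",
                                 exception=RDRNode(lambda x: x["a"] > 10, "no")))
```

Here eval_rdr(tree, {"a": 5}) yields "yes", while {"a": 20} ripples down to the deepest exception and yields "no".
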

Clearly, not all rules in a given tree T are required to evaluate Eval(T, x). We introduce the notion of checking path to describe whether or not a rule r is required to evaluate Eval(T, x). This is recursively described as follows:

A rule r' of a subtree T' = <r', E', S'> is on the checking path of T for x, if T' is on the checking path of T for x. Let T' = <r', E', S'> be a subtree which is on the checking path of a tree T for x: if r' applies to x then E' is on the checking path of T for x; otherwise, S' is on the checking path of T for x. T = <r, E, S> itself is on the checking path for all x ∈ X.
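The recursion above can be traced in code. A minimal sketch under the same illustrative RDRNode structure as before (names are ours):

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class RDRNode:
    condition: Callable[[dict], bool]
    conclusion: str
    exception: Optional["RDRNode"] = None
    successor: Optional["RDRNode"] = None

def checking_path(tree: Optional[RDRNode], x: dict) -> List[RDRNode]:
    """Rules consulted when classifying x: the root is always on the
    path; if a rule applies we descend into its exception subtree E,
    otherwise into its succeeding subtree S."""
    if tree is None:
        return []
    tail = tree.exception if tree.condition(x) else tree.successor
    return [tree] + checking_path(tail, x)

tree = RDRNode(lambda x: True, "no",
               exception=RDRNode(lambda x: x["a"] > 0, "yes",
                                 exception=RDRNode(lambda x: x["a"] > 10, "no")))
```

For {"a": 5} all three rules are consulted; for {"a": -1} the exception of the a>0 rule is never reached.
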


Figure 4.1: A typical representation of an RDR tree. A horizontal link is a true (exception) branch, a vertical link is a false branch.

Definition 4: Given an RDR tree T and a subtree T' = <r' = c' → v', E', S'> of T. The rule r' is said to fire for an object x, if T' is on the checking path of T for x, r' applies to x and no rule of E' fires for x.

Definition 5: In an RDR tree T, a rule r = c → v is said to misfire for an instance x ∈ X during the knowledge acquisition process, if r fires for x, but the expert responsible for developing T declares v not a correct label for x.

The following three definitions follow (Scheffer 1995; Scheffer 1996):

Definition 6: Given an RDR tree T and a subtree T' = <r' = c' → v', E', S'> of T. The context of the rule r', context(r'), is the set of objects for which T' is on the checking path of T.

For example, in figure 4.1, context(r3) is the set of instances flowing down arrow 2.

Definition 7: Given an RDR tree T and a subtree T' = <r' = c' → v', E', S'> of T. The domain of a rule r', domain(r'), is a subset of objects from context(r'), such that for every object x in domain(r'), r' applies.

Definition 8: Given an RDR tree T and a subtree T' = <r' = c' → v', E', S'> of T. The scope of a rule r', scope(r'), is a subset of objects from domain(r') such that for every object x in scope(r'), r' fires.

The domain of an exception of a rule is clearly a subset of the rule's own domain. That is, in a Ripple Down Rule tree T = <r, E, S>, for any rule e in E we have dom(e) ⊆ dom(r). For example, in figure 4.1, dom(r3) is the set difference between the set of instances flowing down arrow 2 and the set of instances flowing down arrow 3. Figure 4.2 represents the same RDR tree shown in figure 4.1 from a set theory point of view. Domains of rules in a false chain are mutually exclusive. The domain of an exception rule is a subset of its parent rule's domain. For the default rule in a Ripple Down Rule tree, the context and the domain are the whole domain the knowledge base is being built for. In binary domains where there are only two classes, the use of a default rule is akin to the closed world assumption in non-monotonic reasoning (Etherington 1988).
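Definitions 6 to 8 can be computed by enumeration over a small finite X. The sketch below is ours (rule names, integer instances and the analyse helper are illustrative assumptions, not the thesis's notation): it traces each instance through the tree and records which rules it reaches (context), applies to (domain), and which rule fires for it (scope).

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    name: str
    condition: Callable[[int], bool]

# T = <r, E, S> over integer instances; None is the empty tree λ.
@dataclass
class Node:
    rule: Rule
    exception: Optional["Node"] = None
    successor: Optional["Node"] = None

def analyse(node, x, context, domain):
    """Trace one instance, filling context/domain per Definitions 6-7,
    and return the rule that fires for x (Definition 4)."""
    if node is None:
        return None
    context.setdefault(node.rule.name, set()).add(x)
    if node.rule.condition(x):
        domain.setdefault(node.rule.name, set()).add(x)
        fired = analyse(node.exception, x, context, domain)
        return fired if fired is not None else node.rule
    return analyse(node.successor, x, context, domain)

root = Node(Rule("default", lambda x: True),
            exception=Node(Rule("r1", lambda x: x > 0),
                           exception=Node(Rule("e1", lambda x: x > 10))))

context, domain, scope = {}, {}, {}
for x in range(-5, 21):                      # a small finite X
    fired = analyse(root, x, context, domain)
    scope.setdefault(fired.name, set()).add(x)
```

The traced sets confirm the relations stated above: dom(e1) ⊆ dom(r1), and scope(r1) is dom(r1) minus the cases captured by its exception.
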

In Ripple Down Rules trees, rules are self-contained. That is, rules have their own exceptions, and these are meaningful only for instances that could be classified by the rule. In a false link chain of rules, exceptions of a rule do not get handled by exceptions of earlier appearing rules. This is clearly represented in figure 4.2, where the domains of exception rules are subsets of the domains of their parent rules. This is the spirit of locality of RDR trees that makes RDR knowledge bases easy to maintain and extend. Using our current terminology, this translates into stability of the context and the domain of rules, i.e. these do not change during maintenance and extension of the knowledge base. As we will see later in chapter 7, this stability will no longer hold for NRDR concepts defined as RDR trees. The domain and context of a rule in NRDR may change. This flexibility in context allows simultaneous modelling and knowledge acquisition as advocated in (Beydoun and Hoffmann 1998). The model itself is also flexible in the sense that concepts may be created dynamically. Further, when an old concept for some reason needs to be fixed, a new one may be created as a copy of the old one, with slight changes.

Figure 4.2. A set theory view of the RDR tree shown in figure 4.1.

4.1.2. The knowledge acquisition process

The part of our framework presented in this subsection formalises notions needed to give qualitative and quantitative descriptions of the actual knowledge acquisition process. We also discuss the choice of default knowledge in individual expert defined concepts that are the building blocks of an NRDR knowledge base.

Definition 9: A rule r is complete when all cases in its domain are correctly classified by r or one of its exception rules.

Definition 10: The correctness of a rule r is the ratio of objects in its domain that are correctly classified with respect to the size of the domain itself. That is, correctness(r) = |scope(r)| / |dom(r)|.

As the expert enters new rules in an RDR tree, s/he aims for rules that fire correctly for most instances. When s/he patches a rule with an exception, s/he attempts to cover the largest possible number of unseen instances correctly. In other words, s/he aims to give her/his rules maximum generalisation capability. This generalisation capability as perceived by the expert not only depends on the correctness of the rules; it also depends on the distribution of instances in the given domain. A rule with high correctness may misfire frequently, simply because instances for which it would fire correctly are of rare occurrence. To describe this behaviour, we introduce the notion of predictivity. We consider a probability distribution function P on X, P: X → [0, 1], where the sum of P(x) over X is equal to 1. Thus, we define this predictivity notion as follows:

Definition 11: The predictivity measure pred(r) of a rule r is the probability that objects in its domain are classified correctly. That is:

pred(r) = (Σ_{x ∈ scope(r)} P(x)) / (Σ_{x ∈ dom(r)} P(x))

For domains with a uniform distribution function, correctness and predictivity of a rule are identical. From a practical point of view, because of its generality, this predictivity measure is much more useful in deriving results about the knowledge acquisition process. In particular, we are mostly concerned with instances which the expert actually meets during knowledge acquisition. These instances are also expected to be met by future users of the system during its lifetime. That is, performance of the knowledge base with respect to these instances determines the overall performance of the knowledge-based system. Clearly, instances with frequent occurrence (high probability weight) are far more relevant in modelling the knowledge acquisition process than instances of rare occurrence. Therefore, our main results in chapter 7 will use this predictivity notion instead of the correctness notion.
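The contrast between Definitions 10 and 11 can be shown with a small numeric sketch (the instances and distribution values below are made up for illustration):

```python
# correctness(r) = |scope(r)| / |dom(r)|  (Definition 10)
def correctness(scope, domain):
    return len(scope) / len(domain)

# pred(r): the same ratio weighted by a probability distribution P on X
# (Definition 11); P maps each instance to its probability of occurrence.
def predictivity(scope, domain, P):
    return sum(P[x] for x in scope) / sum(P[x] for x in domain)

domain = ["x1", "x2", "x3", "x4"]
scope = ["x1", "x2", "x3"]          # the rule fires correctly on 3 of 4

uniform = {x: 0.25 for x in domain}
skewed = {"x1": 0.05, "x2": 0.05, "x3": 0.05, "x4": 0.85}
```

Under the uniform distribution both measures give 0.75; under the skewed one the rule's predictivity drops to 0.15, because the one instance it misclassifies carries almost all the probability mass.
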

The following definition is useful to express the effectiveness of the default rule:

Definition 12: The coverage ratio R for an RDR tree T is the probability that a case does not belong to the scope of the default rule.

A low coverage ratio implies that expert entered rules classify a small proportion of the domain for which the knowledge base is being developed, as most cases are correctly classified by the default rule.

4.1.3. Structural definitions

The part of the formal framework presented in this subsection formalises definitions required to describe the structure of an RDR tree.

Definition 13: The granularity g of a domain D ⊆ X is the number of rules with disjoint domains which the expert requires, such that the union of the domains of these rules is D, i.e. in an RDR tree, this is the length of a false link chain covering D.

For example, given a rule r with correctness(r) < 1, the granularity of [dom(r) − scope(r)] is the number of exception rules required to cover all of the exception cases in dom(r).

Definition 14: The initial granularity G of a concept definition is the target number of rules in the first exception depth (This constitutes the outer false chain in an RDR tree).

The outermost false link chain in an RDR tree is normally the longest link chain in the tree. The union of the domains of rules in this chain is a super-set of the domains of all rules within the tree (excluding the default rule domain).

For example, if rule r1 in figure 4.1 is complete, then the initial granularity of the concept defined by that RDR tree is 3. Initial granularity depends on both the quality of expertise and the size (and complexity) of the domain. Given a fixed domain, the poorer the expertise, the weaker the generalisation the expert can offer from a given instance, and hence the larger the initial granularity will be.

In a given RDR tree T, the following definitions allow reference to individual rules within T:

Definition 15: The depth of a rule r within T, depth(r) is the number of true links taken to reach r from the root node of T.

For example, in figure 4.1, depth(r5) is 2.

Definition 16: The rank of a rule r within T, rank(r) is the number of false links taken after the last true link to reach r from the root node of T.

For example, in figure 4.1, rank(r4) is 2.

In the coming section, the above framework will be used to analyse the development of Ripple Down Rule trees during the knowledge acquisition process.

4.2. Correctness of rules and development of RDR structures

In this section, we use the framework of the last section to analyse the depth of exceptions in an RDR tree during the knowledge acquisition process.

For a given domain D, the objective of a knowledge acquisition process developing RDRs is to come up with a set of complete rules S, such that ∪_{ri ∈ S} dom(ri) = D. The more correct those rules are, the faster the construction of the knowledge base is completed. We can assume that under normal circumstances of building an RDR concept definition (a binary conclusion RDR tree), a competent expert chooses rules of correctness of at least ½. In any RDR tree, when an exception e for a rule r is added, the context of r and its domain remain unchanged. A subset of domain(r) becomes the domain of e. When the correctness of r is larger than ½, we have: |scope(r)| > |scope(e)|. A rule r may have more than one exception. That is, a chain of rules connected by false links may be required to deal with the exceptions of r. For example, in figure 4.1, r2, r3 and r4 are all exceptions of r1. Actually, when the correctness of r is larger than ½, we expect |scope(r)| to be larger than the sum of the sizes of the scopes of all its exceptions.

We will shortly prove that if the correctness of a new rule r has a fixed lower bound, and if this bound is independent of any exception rule domain throughout the knowledge acquisition process, then we have an upper bound on the depth of exceptions of r which is much smaller than the size of the domain of r.

We call the above constraint on the correctness values of expert rules the correctness principle. As we will later show, this condition is enough for RDR trees to have an economical convergence. Actually, this condition is only a mathematical way of representing the generalisation of the expert's comments. When an expert cannot generalise from a given case, then the rule will only apply to the case for which the expert is giving his/her explanations, and therefore the correctness of his/her exception rule will be 1/|dom(parent rule)|. Hence, the correctness is bounded by a function dependent on the domain of the parent rule. This violates our stated correctness principle. We expect the correctness principle to be satisfied in most domains of expertise. That is, when the expert enters some new exception rule e, it is safe to expect that e applies to a part of the domain of its parent rule which is proportional to the size of this domain. This leads to the following theorem, which holds whenever the correctness principle holds:

Theorem 1: Given the correctness principle, if the size of the scope of any exception rule is > 1, then the depth of exceptions for a rule r is bounded by:

i_max = C ln|dom(r)|

where ln is the natural logarithm and C = −1 / ln(1 − correctness(r)).

Proof: For a rule r of depth i−1, we denote its scope and domain by S_{i−1} and D_{i−1} respectively. The union of the domains of all exceptions of r is denoted by D_i. That is, D_i is the set of objects that belonged to D_{i−1} but did not belong to S_{i−1}. Thus, by definitions 7 and 8 we have:

|D_i| = |D_{i−1}| − |S_{i−1}| (1)

Using definition 10, we can rewrite (1) as:

|D_i| = (1 − correctness(r)) |D_{i−1}| (2)

We define a function Size: N → N which takes a depth in an RDR tree and gives the number of objects in rule domains at this depth. So, we can rewrite (2) as:

Size(i) = (1 − correctness(r)) Size(i−1) (3)

This can be rewritten as:

Size(i) = (1 − correctness(r))^i Size(0)

The initial condition Size(0) is |dom(r)|. The decay of the size of the scope must stop when the scope of the exception rule is just a single instance (in practice it stops much earlier than this). So, to find an upper bound on the depth of exceptions i_max, we set the size at that depth to 1, i.e.:

Size(i_max) = 1
(1 − correctness(r))^{i_max} |dom(r)| = 1
(1 − correctness(r))^{i_max} = 1 / |dom(r)|
i_max = −log_{(1 − correctness(r))} |dom(r)|
i_max = C ln|dom(r)|, where C = −1 / ln(1 − correctness(r))

For a fixed domain, the above constant C depends primarily on the expertise. Note that correctness(r) does not have to be uniform. When it varies, the above upper bound on exception depth can be expressed in terms of the lower bound of the correctness. By the correctness principle, such a lower bound exists. QED
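The bound of Theorem 1 can be checked numerically. A minimal sketch (the function name and the example figures are ours):

```python
import math

# i_max = C ln|dom(r)| with C = -1 / ln(1 - correctness(r)): the depth
# at which the decayed exception mass (1-k)^i |dom(r)| reaches one instance.
def max_exception_depth(domain_size: int, k: float) -> float:
    C = -1.0 / math.log(1.0 - k)
    return C * math.log(domain_size)

# A competent expert (k = 1/2) over a domain of 10,000 instances:
depth = max_exception_depth(10_000, 0.5)
```

At this depth the residual mass (1 − k)^{i_max} |dom(r)| has shrunk to a single instance, so exception chains stay logarithmic in the domain size (here around 13 levels for 10,000 instances).
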

From theorem 1, we can see that for a fixed domain the maximum exception depth depends in part on the quality of the expertise. The poorer the expertise (the smaller the correctness of r), the deeper the exception hierarchy becomes. The impact of the quality of the expertise employed on the development of a Multiple Classification RDR knowledge base will be analysed in chapter 7 in reference to some past empirical results.

While for a fixed domain the correctness is determined by the quality of the expertise available, its absolute value depends on the actual domain itself. In extreme cases of domain complexity, where the scope of every rule covers only a single instance and the domain of every rule is the rest of the instances (i.e. the granularity g is unity), the correctness value is at its theoretical minimum: # correct cases / domain size of every new rule = 1 / domain size of every new rule = G/|D| (where G is the initial granularity and |D| is the size of the given domain).

The correctness principle fails to hold in such an extreme case, as the correctness of new rules depends entirely on the domain of their parents (independent of the expertise used). In this extreme case, where the correctness principle is clearly violated (see earlier), the number of rules is equal to the number of instances in the domain. Note that the depth of the tree will then depend only on the granularity g. The expression for the maximum depth becomes:

i_max = log_g |dom(r)|

An example of this extreme case is learning the parity function of binary strings. In this domain, every instance is a binary string. The change of any single attribute (a bit in the binary string) in any instance changes the class of the instance (i.e. changes its parity). The scope of any rule can cover exactly one instance. The opposite extreme is when the whole of the domain is covered by one rule (or a single chain of rules), i.e. there are no exceptions and the correctness is 1.
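The parity example can be verified directly: flipping any one bit of an instance changes its class, so no rule condition can cover more than a single instance correctly.

```python
# Parity of a binary string: the class of an instance.
def parity(bits):
    return sum(bits) % 2

x = (1, 0, 1, 1)
for i in range(len(x)):
    neighbour = tuple(b ^ (1 if j == i else 0) for j, b in enumerate(x))
    # every instance differing in a single attribute has the other class
    assert parity(neighbour) != parity(x)
```
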

Experience of using Ripple Down Rules in the medical domain (Compton, Edwards et al. 1992) showed that experts make reasonably good rules. The knowledge base reported for the PEIRS system in (Compton, Edwards et al. 1992) showed a maximum exception depth of 6. The accuracy reported for PEIRS was larger than 95%. Assume that the correctness was uniform, and that the PEIRS system covered the whole domain, i.e. the outer false link chain covers the whole domain and only exception rules are needed to make the knowledge base 100% accurate. Further, assume that the error rate is only due to missing exceptions on the last level. The maximum depth reported for PEIRS was 6. Assuming that 5% of the cases were exceptions at depth 7, we get a lower bound of k = 2/3 (see footnote 1). Note that this bound is very loose (i.e. the correctness k would in practice be substantially larger than 2/3), because the latter two assumptions are too strong. In fact, the maximum depth of 6 was not the usual exception depth in PEIRS. In (Compton, Preston et al. 1995), Compton et al. describe the PEIRS knowledge base as closer to a decision list, with each rule having a decision list refinement, where the depth of the decision list corrections is only two to three but the length of the decision list is about 50 (that is, the initial granularity (see definition 14) is about 50).

The behaviour of RDR with respect to past seen cases was analysed empirically in (Kang, Gambetta et al. 1996). The next section analyses this same issue based on our formal framework. This issue is, however, stultified in our NRDR framework, because all past seen cases are used to validate newly added knowledge. In previous RDR work, e.g. (Kang, Gambetta et al. 1996), only corner stone cases were used to validate new knowledge.

1 Log7(error rate) = 0.66, where 7 stands for the assumed missing 7th exception level.

4.3. Accuracy of RDR with respect to past seen cases

For any newly added rule rn in RDR/MCRDR, its domain is always a subset of the domain of the default rule. Therefore, cases classified correctly by the default rule in the past may become incorrectly classified as a result of adding rn (in particular if the default conclusion differs from the conclusion of the new rule, which is always the case outside our NRDR framework). Even cases in the domain of the parent rule may become affected, as we explain below. However, because the domain of the default rule is the largest (it is in fact the whole domain), we expect cases classified correctly by this rule to be the most susceptible to becoming incorrectly classified as a result of a future knowledge base amendment. Indeed, when the coverage ratio is very small (close to 0), we see that checking past seen cases is necessary. In NRDR, this issue is stultified for two reasons: firstly, we already check past seen cases after every knowledge base amendment to detect inconsistencies; secondly, in NRDR we expect the coverage ratio to be very low. That is, most cases in the domain belong to the scope of the default rule (see the discussion of default knowledge in NRDR in the previous section).

In RDR/MCRDR, when a new rule is added, its condition is chosen from a difference list (see chapter 3). Both choosing the condition from a difference list and the context of the rule ensure that the domain of the new rule excludes the corner stone case of its parent rule. Its predictivity ensures that its domain does not cover many of the cases correctly classified by its parent. An empirical study in (Kang, Gambetta et al. 1996) has shown that the error rate on correctly classified past cases was never larger than 5% during the whole development cycle of the RDR knowledge base. In the latter development stage, when the accuracy on unseen cases was close to 99%, the error on past seen cases was less than 1%. This clearly illustrates that the scope of newly added rules did not encroach on the scope of their parents. The study in (Kang, Gambetta et al. 1996) did not mention how many cases were classified correctly by the default rule, so we cannot make any definite comments regarding the impact of the coverage ratio on that low error rate.

Together, the coverage ratio of the RDR tree and the predictivity of rules in Ripple Down Rules trees ensure that their default classifications (including classifications by non-leaf nodes) are reliable. This will be mathematically represented in chapter 7. This default behaviour led us to describe Ripple Down Rules within the framework of default logic, which is an intuitive and simple formalism for non-monotonic reasoning. This relation between Ripple Down Rules and default logic is the focus of the next section.

4.4. RDR as a non-monotonic reasoning approach

Incompleteness of knowledge does not paralyse human reasoning and decision making. Our human way of reasoning with incomplete information inspires the study of non-monotonic reasoning in Artificial Intelligence. This is the study of the process of drawing conclusions from incomplete information which may be invalidated by new information (Lukaszewicz 1990). There are many formal approaches to non-monotonic reasoning, e.g. (Etherington and Reiter 1985; Etherington 1988). Default logic is one such formalism, which has received much attention in the literature (Reiter 1980; Brewka 1994; Antoniou, MacNish et al. 1996; Courtney, Antoniou et al. 1996) for its conceptual simplicity and intuitive appeal, as it parallels human reasoning when dealing with rules that are subject to exceptions.

In this section, we present a summary of default logic. This will be followed by exploring the strong relationship between Ripple Down Rules and default logic. This will expose RDR as an expert driven method to build a prioritised default theory, where default rules and their priorities are captured during the knowledge acquisition process.

4.4.1. Default logic

The main difference between the various formalisms of non-monotonic reasoning is the representation of non-monotonic rules. In default logic, they are represented by special expressions called defaults. A default d looks as follows:

d = α : β / γ

α is the prerequisite for the default d to be considered. γ is the consequent, which is believed if believing the justification β is consistent. In default logic, commonsense knowledge about the world is represented as a default theory (D, W), where D is a set of named defaults and W is a set of axioms of the theory. D and W are expressed as first order sentences. D provides extensions for the theory not derivable from W. Normal defaults, with β = γ, are popular because they reduce the complexity of the representation, and they are sufficient for knowledge representation in many naturally occurring contexts.

A widely used example of a normal default is:

Bird(x) : Can_Fly(x) / Can_Fly(x)

This is interpreted as "For every x, if x is a bird and it is consistent to believe that x can fly, then it is believed that x can fly". So, if all we know about Tweety is that it is a bird then we are permitted to believe that it flies. However, if we learn that Tweety is a penguin, and we know penguins don't fly, it is inconsistent to believe that Tweety flies, and the application of the default is blocked.

Ripple Down Rules do not map straightforwardly to a set of defaults. Normal defaults have conflicts when their prerequisites are not mutually exclusive, since their consequents can be contradictory. Exception rules are never mutually exclusive with their parent rules and always have, by definition, contradictory conclusions (consequents). To overcome this, we attach priorities to the defaults representing Ripple Down Rules trees. This is detailed in the next two subsections.

4.4.2. RDR and Default logic

The Ripple Down Rule methodology overlaps maintenance and use of the knowledge base. A Ripple Down Rule knowledge base is usable, and gets used, while incomplete. A Ripple Down Rule knowledge base k grows non-monotonically. That is, some of the premises derivable from the knowledge base may get overridden by future rules. After an expert enters a new rule r, we denote the knowledge base by k'. The addition of r follows one of two possible scenarios:

• r is added as an exception rule. Hence, some conclusions of its parent are no longer possible.

• r is added as a new rule - i.e. r is attached to the outer false link. Hence, some conclusions of the default rule no longer apply.

In both cases, the conclusions which are no longer possible apply to cases in the domain of r. So, adding a new rule r retracts some premises in k. That is, the premises of k' do not include all the premises of k (i.e. the growth of k is non-monotonic).

Every Ripple Down Rule knowledge base has a default rule, which has "True" as a condition. Being the root node, the default conclusion of the default rule is taken when the knowledge base fails to give a conclusion. Therefore, the reasoning in an RDR knowledge base shows default reasoning in two respects:

1. The default rule is taken when all rules on the outer false link chain fail.

2. When a rule fires, its conclusion is taken only if it has no exceptions. Otherwise its conclusion is taken by default if none of its exceptions fire.

The second aspect of the default behaviour in an RDR knowledge base is equivalent to saying that exception rules have a higher priority than their parents. When they fire, their conclusions supersede those of their parents. This is enforced by the tree-like structure of an RDR knowledge base. When a case is being classified by an RDR knowledge base, the case filters down (ripples down) to a leaf node l; if the condition of l is satisfied, the conclusion of l is taken; otherwise, the conclusion of the last satisfied rule s on the path to l is taken. So, the structure of the RDR tree gives l a higher priority than s. A rule of depth n has a lower priority than a rule of depth n+1. When two rules have the same depth, the rule with the lower rank has the higher priority (i.e. the rule higher up in the false link chain has the higher priority). Further, the default rule (i.e. the root node of an RDR tree) has the lowest priority. Note that the priorities of rules are implicit within the structure of a Ripple Down Rule tree. They are captured during the knowledge acquisition process.

In what follows, we sketch an algorithm which maps an RDR knowledge base to a default theory. The mapping preserves the default behaviour of RDR rules. Moreover, we have to resort to prioritising the defaults to stop them from interacting beyond the semantics of an RDR tree. This approach is similar to (Brewka 1994) in dealing with priorities in default logic.

4.4.3. Mapping RDR to a default theory

In a Ripple Down Rule tree T, a rule r can be rewritten as l → u (see definition 1). If r is the lowest priority rule in T (i.e. the default rule), then it can be rewritten as the following default d:

d = l(x) : u(x) / u(x)

Where x is a case being classified. Rules in RDR are applied in context. To convert a rule r at depth n to a default, we must consider the path from the root node to r. The default d corresponding to r would then become:

d = α : β / β

Where α is the conjunction of all the conditions of the rules which fired on the way to r (the true links) with the negations of the conditions of all the rules which did not fire (the false links). β is the conclusion of r. The priority of this default is n, where n is the depth of r.

Every rule in the RDR tree is converted to a default according to the above. The default rule will have the lowest priority (its depth is 0). The set of axioms with which the defaults must remain consistent is given by the database of all past seen cases. In implementations of RDR (Compton and Jansen 1988; Compton, Horn et al. 1989), this was only the set of corner stone cases. RDR trees as used within NRDR are maintained consistent with respect to all past seen cases (see the discussion in the previous section).

For every rule r in an RDR tree, the corresponding default is constructed by considering the path from the root node to r. Its priority is given by the depth of the added rule. The added default is unique because the path to every leaf node in an RDR tree is unique. Moreover, when the priority of a number of defaults is the same, these defaults will have mutually exclusive prerequisites. This is consistent with the behaviour of an RDR tree (see definition 5), where the conclusion of a single rule is taken2.

In this section, we outlined how an RDR tree can be mapped to a set of prioritised defaults which apply in mutual exclusion. That is, for a given instance, exactly one default applies in the reasoning process. The mutual exclusion of defaults during reasoning is guaranteed by both the way prerequisites of defaults are derived and their priorities.
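The mapping can be sketched in code. The following is our own illustrative Python, not the thesis implementation: `Node`, `Default` and the string conditions are assumptions. Each rule contributes one default whose prerequisite conjoins the conditions fired along the true links with the negated conditions of the false links, and whose priority is the rule's depth.

```python
# Illustrative sketch only: Node, Default and the string conditions are our
# own assumptions, not the thesis implementation.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    condition: str                         # rule condition l
    conclusion: str                        # rule conclusion u
    true_child: Optional["Node"] = None    # exception branch (true link)
    false_child: Optional["Node"] = None   # alternative rule (false link)

@dataclass
class Default:
    prerequisite: List[str]   # the conjunction alpha
    consequent: str           # beta (normal default: also the justification)
    priority: int             # depth of the originating rule

def rdr_to_defaults(root: Node) -> List[Default]:
    """Convert every rule on every path of the RDR tree to a default."""
    defaults: List[Default] = []

    def walk(node: Optional[Node], context: List[str], depth: int) -> None:
        if node is None:
            return
        alpha = context + [node.condition]
        defaults.append(Default(alpha, node.conclusion, depth))
        # true link: the rule fired, so its condition joins the context
        walk(node.true_child, alpha, depth + 1)
        # false link: the rule did not fire, so its negation joins the context
        walk(node.false_child, context + [f"not({node.condition})"], depth)

    walk(root, [], 0)
    return defaults
```

For a tree with default rule "If True then A" and two depth-1 rules with conditions p and q on a false chain, the two priority-1 defaults carry the mutually exclusive prerequisites p and not(p) ∧ q, as the text requires.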

In the next section, we characterise default theories that can be converted to RDR trees. We also outline how those default theories are mapped to RDR trees.

4.4.4. Mapping a default theory to an RDR tree

In an RDR tree, paths in the tree apply in mutual exclusion. In the previous section, we mapped every path to a default. Hence, to apply the corresponding reverse process to a default theory (D, W), whereby every default corresponds to a path in the corresponding RDR tree, defaults in D must apply in separation. Further, because we defined rules in an RDR tree (see definition 1 in section 4.1.1) to have propositional value conclusions, we also restrict the default theories that can be converted to RDR trees to have propositional value consequents. This translates to the following set of constraints which must apply to the default theory (D, W) for it to be convertible to an RDR tree:

1. Every default must be a normal default (see above for the definition of normal defaults).

² In Multiple Classification RDR (MCRDR), conversion of the knowledge base to a default theory is possible and follows the same steps outlined in this section. The difference in MCRDR is that some defaults will have the same priorities and may apply simultaneously. This is acceptable as their consequents are guaranteed not to conflict. This is consistent with the semantics of an MCRDR knowledge base (see chapter 3).

2. For every default d = \frac{\alpha : \beta}{\beta}, \beta \in V, where V is the set of class values (conclusions), which are propositional values (see section 4.1.1 for a description of V).

3. Every default d \in D must have a priority n. During inference, defaults with higher priorities are chosen before defaults with lower priorities, which are chosen only if none of the higher priority defaults are applicable.

4. Any two defaults in D with equal priorities must have mutually exclusive prerequisites. That is, given two defaults d_1 = \frac{\alpha_1 : \beta_1}{\beta_1} \in D and d_2 = \frac{\alpha_2 : \beta_2}{\beta_2} \in D with equal priorities, we have mutual exclusion between \alpha_1 and \alpha_2; that is, \alpha_1 \wedge \alpha_2 = False.

5. For any two defaults d_1 = \frac{\alpha_1 : \beta_1}{\beta_1} and d_2 = \frac{\alpha_2 : \beta_2}{\beta_2}, we have: \beta_1 is not required to compute \alpha_2. That is, if W \vdash \alpha_2 then W \setminus \{\beta_1\} \vdash \alpha_2.

6. Exactly one default, d_{default}, has the lowest priority. d_{default} applies only when all other defaults in D do not apply, and it has an empty prerequisite. That is:

d_{default} = \frac{: \beta_{default}}{\beta_{default}}
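Constraints 3, 4 and 6 together make inference over such a theory straightforward. As a hedged illustration (the `Default` representation and predicate prerequisites are our own, not the thesis code), the single applicable default is the highest-priority one whose prerequisite holds for the case:

```python
# Illustrative only: Default and the predicate representation are assumptions.
from collections import namedtuple

Default = namedtuple("Default", "prerequisite consequent priority")

def classify(case, defaults):
    """Return the consequent of the highest-priority applicable default.
    Constraint 4 ensures at most one default fires per priority level, and
    constraint 6 ensures the empty-prerequisite default always applies."""
    applicable = [d for d in defaults
                  if all(pred(case) for pred in d.prerequisite)]
    return max(applicable, key=lambda d: d.priority).consequent
```

E.g. with a lowest-priority default `Default((), "normal", 0)` and a priority-1 default whose prerequisite tests `case["x"] > 5`, a case with x = 7 takes the higher-priority conclusion, while x = 2 falls through to the default.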

For a given default theory (D, W) which satisfies the above six constraints, we now outline guidelines to convert (D, W) to an RDR tree T. These are:

1. The default d with the lowest priority min is converted to the following default rule which forms the root node of T:

If True then \beta_{default}

2. Defaults with priority min + 1 form a first false-link chain on the first exception level in T. The order of rules in this chain is irrelevant because the defaults have mutually exclusive prerequisites. Defaults with the same priority > min + 1 are also converted to rules in T, and they form separate false-link chains of rules within subsequent exception levels of T (see step 1 in figure 4.3).

[Figure: a suitably prioritised set of defaults D, grouped by priority (min, min + 1, min + 2, ...), and the equivalent RDR tree T.]

Figure 4.3. Conversion steps from a prioritised set of defaults D to an RDR tree T. Step 1: Rules from equal-priority defaults are grouped together; rule conditions in a group are mutually exclusive due to the fourth constraint on the source default theory. Step 2: Subsumption relations between rules from these groups are determined to form the RDR tree T.

3. To get the RDR tree T equivalent to D, subsumption relations between rules from defaults of different priorities are determined (see step 2 in figure 4.3). Rules converted from defaults with priorities min + n are represented as exceptions for equivalent rules of defaults with priorities min + n - 1. The resultant tree T reflects relationships between defaults with differing priorities (figure 4.3). In determining these subsumption relationships, the following must be observed: For two defaults

d_1 = \frac{\alpha_1 : \beta_1}{\beta_1} and d_2 = \frac{\alpha_2 : \beta_2}{\beta_2}, where priority(d_1) < priority(d_2): the rule converted from d_2 is an exception of the rule converted from d_1 if \alpha_1 subsumes \alpha_2, that is, if W \cup \alpha_2 \vdash \alpha_1. Because defaults with the same priority are guaranteed to be mutually exclusive (see constraint 4 earlier), \alpha_2 can only be subsumed by the prerequisite of exactly one default in a collection of defaults of equal priorities. In other words, a rule in T can be an exception for at most one other rule. However, the subsumption restriction may lead to more than one exception rule of the rule corresponding to a default d_1: a rule can have more than one exception rule. When this occurs, all generated exceptions are linked in a false chain in which the order is irrelevant because of the mutual exclusion constraint (constraint 4 earlier).

Finally, the incremental knowledge acquisition process with RDR corresponds to an incremental development of the corresponding default theory D. In D, if an expert disagrees with the returned consequence of the chosen default with priority n, s/he will need to add to the theory a new default with a priority larger than n. This corresponds to adding an exception rule to a corresponding RDR tree.
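The conversion guidelines above can be sketched as follows. This is our own minimal illustration, under the simplifying assumptions that prerequisites are sets of literals (so the entailment test W \cup \alpha_2 \vdash \alpha_1 reduces to set inclusion) and that priorities are consecutive integers; `Rule` and `defaults_to_rdr` stand in for the thesis machinery.

```python
# Our own sketch of steps 1-3; the set-inclusion subsumption test is a
# stand-in for the entailment test in the text.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    condition: frozenset
    conclusion: str
    true_child: Optional["Rule"] = None    # first exception rule
    false_child: Optional["Rule"] = None   # next sibling in the false chain

def defaults_to_rdr(defaults):
    """defaults: iterable of (prerequisite_set, consequent, priority)."""
    by_prio = {}
    for alpha, beta, n in defaults:
        by_prio.setdefault(n, []).append(Rule(frozenset(alpha), beta))
    prios = sorted(by_prio)
    root = by_prio[prios[0]][0]   # constraint 6: unique lowest-priority default
    for lower, higher in zip(prios, prios[1:]):
        for rule in by_prio[higher]:
            # constraint 4 guarantees exactly one subsuming parent per level
            parent = next(p for p in by_prio[lower]
                          if p.condition <= rule.condition)
            # chain siblings on false links; their order is irrelevant
            rule.false_child = parent.true_child
            parent.true_child = rule
    return root
```

A priority-2 default with prerequisite {p, r} attaches as an exception of the priority-1 rule whose condition {p} subsumes it, while the mutually exclusive priority-1 rules share a false chain under the root.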

This section completes a two-way mapping between RDR trees and default theories. The significance of this mapping is that it shows that the RDR framework can be seen to solve the problem of controlling the interactions and conflicts between normal defaults. This is done by implicitly assigning priorities to defaults during the knowledge acquisition process. This implicit priority is captured through the context of rules in the knowledge base. The mapping from an RDR tree to a default theory makes these priorities explicit. In characterising constraints on default theories that can be mapped to RDR trees, we highlight the expressiveness limits of RDR trees. As we will later discuss in chapter 6, NRDR expressive power is beyond those limits, and the range of default theories that can be mapped to NRDR is wider.

4.5. Chapter summary

In this chapter, we presented a framework to formalise the semantics and the structure of RDR trees. This framework also gives a discourse language to describe the knowledge acquisition process developing RDR trees. We introduced the notion of predictivity to describe the quality of interactions with an expert. We showed that the depth of rule exceptions is primarily determined by this predictivity, as long as this predictivity is bounded by a fixed constant. We called this condition the correctness principle. Whether this condition holds or not depends primarily on the nature of the domain.

Based on our framework, we also analysed the following two aspects of the knowledge acquisition process: the choice of default knowledge, and the accuracy of an RDR knowledge base with respect to past seen cases. These two issues apply equally to Nested Ripple Down Rules, as expert concepts in a Nested RDR knowledge base are defined as RDR trees. We concluded the chapter by highlighting the strong relationship between RDR and default theories. We also provided a two-way mapping between the two formalisms.

This chapter has been largely expository of our formal framework. In chapter 7, this framework will be used to analyse the convergence of RDR in general and the cost of maintaining an NRDR knowledge base. Based on this framework, we will explain some of the empirical results available in the literature. Most importantly for our own framework of NRDR, we extend this framework to cover NRDR-specific notions. We then use the extended framework to analyse interactions between RDR trees within an NRDR knowledge base. This will show that these interactions place very little burden on the expert during the knowledge acquisition process. We will show that the added advantages of NRDR of modelling, and of removing some RDR repetition (discussed in the previous chapter), do not incur any extra burden on the expert during the knowledge acquisition process. Hence, we show that the search knowledge base in our workbench SmS need not be complete, and its incremental refinement is economically acceptable.

Before we analyse the knowledge acquisition process with NRDR, we analyse the background behind its conception in the next chapter. Our knowledge representation formalism NRDR plays a key role in satisfying the requirements of the knowledge acquisition framework needed to capture human expert search knowledge. These requirements will be discussed in the next chapter. We will then introduce the architecture of our workbench SmS (Smart Searcher), which uses our new NRDR framework to capture expert search knowledge.

Chapter 5

Acquiring Search Knowledge Incrementally

In our research towards acquiring human search knowledge incrementally, we developed a workbench, Smart Searcher (SmS), to test our ideas and our NRDR knowledge acquisition framework. In this chapter, we describe SmS. Before we introduce its design and architecture, we discuss the nature of search knowledge, and we extend this discussion to highlight the requirements of the knowledge acquisition framework needed to capture, refine and reuse this knowledge. Some of these requirements are satisfied by our NRDR framework, which we use in SmS to capture and reuse human search knowledge. The rest of those requirements are accommodated by the architecture of SmS.

This chapter will also present an analysis of the knowledge engineering steps required to adapt SmS across different domains. This adaptation takes place mostly at the knowledge level during the actual knowledge acquisition. This results in a domain dependent search knowledge base. In chapter 8, we will present a case study which illustrates a knowledge acquisition process using SmS. Following that presentation, we will revisit the knowledge engineering phase analysed at the end of this chapter. We will characterise the domains for which our approach is applicable. This is an important discussion, as SmS should not be mistaken for a general problem solving architecture. It was not intended to be one.

This chapter is organised as follows: In section 5.1, we discuss the nature of search knowledge. In section 5.2, we describe abstract guidelines for acquiring it. In section 5.3, we develop the discussion of sections 5.1 and 5.2 into a concrete set of requirements on which we base our SmS architecture. The architecture is presented in section 5.4. Finally, in section 5.5, we discuss the steps required to adapt SmS to different search domains.

In the following section, we discuss the nature of search knowledge and the challenges in capturing expertise in search problems. This will be followed by an abstract description of our approach to meet those challenges.

5.1. Knowledge on search heuristics

In our work, the effective performance of a search process itself is the domain of expertise. In many activities of human intellectual endeavour, some sort of search within a set of potential solutions can be said to take place. E.g. in design processes, engineers will evaluate partial designs on their fitness as a vantage point from which to complete the design successfully. If a partial design seems unfit, an alternative partial design is selected, etc. Capturing the expertise being employed in such skilled search processes is a difficult task. This is reflected in the great efforts being spent on building tools for semi-automatic or even automatic design for many technical problems, e.g. in circuit design (Lengauer 1990; Gu 1996), mechanical design (Gero and Sudweeks 1996) or architecture (Gero and Tyugu 1994). For example, in circuit design, the design produced by current design tools is usually optimised by human intervention. The process of such human intervention is very difficult to formalise and thus does not usually become automated.

Formalising the knowledge employed in the human search process seems more difficult than in many other domains of human expertise, since what is employed is mostly a skill rather than (declarative) knowledge. And even the inspection of such skills seems more difficult than, say, the inspection of motor skills. Motor skills can be observed from the outside. The search for a solution to a problem, however, is a process which can - to a significant extent - only be observed by introspection.

Such introspection will be limited to a certain part of the actual thoughts which produce the search process. In particular, in such introspection, observed by thinking-aloud protocols, the expert talks of static features of search states that have logical implications for his/her decisions. The cognitive processes that go into the recognition of these features remain outside his/her awareness and are often perceived to lie at the fringe of conscious thought. They drive the thoughts that the expert becomes aware of.

Even for other types of domains of expertise it is well known that experts usually can neither explain coherently why nor under exactly which circumstances they decide as they do. Their explanations are normally justifications for the decisions they make rather than a recipe for how they arrive at such decisions. Thus, it is unlikely that experts will be able to completely specify heuristic rules which could generate a search process reasonably similar to the search process the expert is adopting at the conscious level.

In fact, the expert skill used in searching for solutions seems inaccessible by introspection. Plenty of evidence can be found in philosophical considerations on human thought processes that introspectively inaccessible knowledge exists, see e.g. (Winograd and Flores 1987; Dreyfus 1994), that it plays a major role in many important areas of human intelligence (Clancey 1993), and that it is of considerable complexity (Simon 1974; Hoffmann 1992). Due to its non-introspective nature and its complexity, the acquisition of such knowledge is particularly difficult.

Although there are plenty of applications of more relevance, in the tradition of using chess as the Drosophila of AI (Coles 1994), let us consider the expert search processes in chess playing for illustration purposes. De Groot (Groot 1965; Groot 1966) conducted systematic psychological studies into the thought processes involved in master chess play. A master chess player, thinking aloud, may report the following:

... Let me try to attack the pawn on f2. Ok, I can move my bishop to c5 attacking this pawn. Possibly, my opponent will move his knight to e4 defending the pawn and simultaneously attacking my bishop on c5, such that I am forced to move the bishop again. This is unpleasant - so let me see whether this problem can be fixed. Maybe, I can avoid the knight move to e4.

Oh well, first I can attack the knight by moving my pawn to b4 forcing it to move to another square. If it does not move to e4, then moving my bishop to c5 is much better ....

Obviously, such an expert search process involves more complex reasoning than just the association of move sequences which were useful in other chess positions. For example, it involves some causal reasoning on a rather abstract level. However, it seems difficult to devise a general inference mechanism which could accommodate such expert reasoning. This is particularly the case since much of such reasoning will not be at a conscious level to the expert.

5.2. Acquiring Search Knowledge

As we discussed in chapter 2, Knowledge Acquisition in the 90's is widely considered a modelling activity (Schreiber, Wielinga et al. 1993; Shaw and Gaines 1993; Schreiber, Wielinga et al. 1994). What is modelled is the domain intertwined with the expert's problem solving. Such a model is described, as far as possible, at the knowledge level (Newell 1982).

Most of the Knowledge Acquisition approaches to build knowledge-based systems, e.g. (Chandrasekaran 1986; Aussenac, Frontin et al. 1989; Shadbolt and Wielinga 1990; Dieng, Giboin et al. 1992; Schreiber, Wielinga et al. 1994), support a problem/knowledge analysis by the expert, the knowledge engineer, general system analysts, or some combination of them. This may involve steps like developing a conceptual model of the knowledge being used by an expert, distinguishing different subtasks to be solved, distinguishing different types of knowledge used during reasoning processes, etc. This approach can be facilitated by a formalised analysis method such as the KADS methodology (Schreiber, Wielinga et al. 1993), which is outlined in chapter 2. Eventually, such a knowledge engineering approach either results in a model of the expertise which is easy to turn into an operational program manually, or an automatically generated operational program comes out of the process (see chapter 2).

Opposed to that is our approach, which aims to skip the time-consuming process of analysing the expertise and the problem domain by a knowledge engineer, as has been advocated in Ripple Down Rules, e.g. in (Compton, Kang et al. 1993). Our approach allows the experts themselves to communicate their expertise to the system. Since experts usually need to structure their knowledge themselves in order to articulate it, the system should allow them to specify knowledge 'on the fly' and to restructure, rephrase or refine it later on. We view the expert knowledge from a purely declarative perspective. This is a big advantage over generic techniques as in (Chandrasekaran 1986) and (Yost 1993), where knowledge is encoded by programmers from a procedural perspective. Thus, we allow experts to develop the structure of the knowledge they want to communicate to the system during the actual knowledge acquisition and maintenance process.

We aim at acquiring expert search knowledge mainly by other than purely introspective means. We aim to (re-)construct that knowledge rather than to acquire it directly. We envisage a spiral process of knowledge acquisition, similar to (Linster 1993), of coming stepwise closer and closer to an operationalisation of the knowledge acquired. Our approach follows work on knowledge acquisition using Ripple Down Rules (RDR), which allows incremental knowledge acquisition and maintenance without a knowledge engineer (Compton and Jansen 1990; Compton, Edwards et al. 1992; Compton, Preston et al. 1994; Kang, Compton et al. 1998).

As we outlined in chapter 1, we believe that experts are usually able to explain their reasoning process on a particular problem instance in rather general terms that cover at least the given concrete next step in their search process. Their explanation may be quite inaccurate, in that they may overlook parts of the definitions of the concepts they use, in particular those parts which do not apply to the problem instance under consideration. Our approach allows incremental acquisition of complex concept definitions without demanding an operational definition from the expert. The expert is merely required to judge whether a concept applies to particular instances. This is a much more natural task for an expert than articulating general rules on how to judge any particular instance.

In the next section, we give a concrete set of requirements for the knowledge acquisition environment needed to implement our approach of capturing search knowledge incrementally.

5.3. Requirements for a KA environment

As a result of our aim of providing a framework to reconstruct expert search knowledge by directly interacting with the expert, a suitable knowledge acquisition tool has to support the following steps for an expert building a knowledge base of search control knowledge:

1. The criteria being used to select search steps worthwhile to consider for deepening the tree search must be very flexible. I.e. the following options should be available:

• The expert can freely define concepts to characterise search states as well as search operators applied to these search states.

• The definition of concepts should allow the use of other expert-defined 'sub-concepts'.

• Revision, modification or amendment of initially defined concepts must be possible. E.g. an expert's vocabulary in chess may include terms like 'King side attack', 'attacking move', etc. Such terms are not easily defined comprehensively. As a consequence, such terms are subject to incremental refinement throughout the Knowledge Acquisition process. Furthermore, the side effects being introduced by an amendment of a (sub)concept must be limited or controllable in some way.

• In articulating their knowledge, experts refine their model in a bidirectional manner: bottom up and top down (Sowa 1984). This must be allowed during the knowledge acquisition process.

The above criteria are handled by our NRDR knowledge acquisition framework. This will be detailed in the next chapter.
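As a toy illustration of this first requirement (entirely our own: the chess terms, the dictionary representation and the function names are invented, and chapter 6 gives the real NRDR mechanism), concepts can be defined in terms of other expert-defined sub-concepts and amended in place, with amendments immediately visible to every concept that uses them:

```python
# Invented illustration of expert-defined concepts built from sub-concepts;
# not the NRDR mechanism, which is detailed in the next chapter.
concepts = {}

def define(name, test):
    concepts[name] = test              # define or amend a concept in place

def holds(name, case):
    return concepts[name](case)        # callers see amendments immediately

# bottom up: a sub-concept over raw case features ...
define("attacking_move", lambda case: case.get("targets", 0) > 0)
# ... reused top down inside a higher-level concept
define("king_side_attack",
       lambda case: holds("attacking_move", case) and case.get("side") == "king")
```

Because `king_side_attack` looks up `attacking_move` at evaluation time, amending the sub-concept changes the behaviour of the higher-level concept without touching its definition, which is the kind of controlled revision the requirement asks for.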

2. As indicated in the example of a chess expert search process, how the search proceeds may depend on the findings of search states encountered earlier in the search. To accommodate this sort of reasoning, the system should have a search engine which logs the encountered search states along with potentially interesting findings about them. I.e. certain characteristics of search states should be stored as well. This reduces computational requirements, as many characteristics persist over sequences of search operators. Furthermore, the expert-definable selection criteria for search operators must allow conditions which involve such findings of earlier encountered search states.

3. There is evidence that domain expert knowledge is fairly complex (Sowa 1984); for instance, Simon (Simon 1974) estimated that a chess master has about 25000-100000 schemata (mental concepts). Hence, any knowledge representation used must be dense, structured and modular to allow easy maintenance, access and development.

4. In many search domains - e.g. chess, circuit design, ... - search states (i.e. cases presented to the expert) may have components (in chess these are squares on the board, in circuit design these may be smaller circuit components), and the expert's explanation will often refer to certain relations between these components. Thus, we will require a representation that is more powerful than propositional logic to represent expert comments about individual search states.

5. The Knowledge Acquisition workbench must accommodate an expert's suggestions (not only explanations). That is, the expert will not only comment on solutions found by the system; the expert must suggest and justify a solution to the system when it fails to find one. When a suggestion is made, new search operators may be introduced to the system. That is, expert suggestions indirectly check that the actual search operators used by the system are effective. This speeds up the knowledge acquisition process. It will also verify - as we will later discuss in chapter 9 - whether or not the system is usable in a given domain.

Our Nested Ripple Down Rules (NRDR) address issues related to concept definitions (criterion 1). We will describe their technical details and development policies in detail in the next chapter. Nested Ripple Down Rules also satisfy the modularity and structure requirements (criterion 3), and they offer a dense knowledge representation. The expressiveness of the knowledge representation is required by criterion 4. How this and the rest of the criteria are fulfilled within our framework is discussed in the next section, where we present the architecture of our knowledge acquisition workbench SmS.

5.4. Smart Searcher 1.3 (SmS)

In this section, we discuss our work towards acquiring search knowledge. We first present our system's architecture. This will highlight how some of the requirements discussed in the previous section are fulfilled. The satisfaction of the remaining requirements will be discussed in the context of a detailed description of our knowledge representation scheme, Nested Ripple Down Rules, in the next chapter.

5.4.1. System architecture

In this sub-section, we first give an overview of the system's architecture. We then follow this with a detailed description of every module.

In designing our system SmS1.3, we aimed at providing a workbench to allow efficient development of search heuristics and their easy refinement. SmS aims to provide the means to acquire human expert knowledge as discussed earlier in this chapter. For the acquisition of this knowledge, and to assist a human expert in modifying the knowledge base as required, SmS models the structure of expert search processes as seen in figure 5.1.

The architecture shown in figure 5.1 provides two modes of functionality: firstly, it allows the expert to incrementally construct the system's knowledge during knowledge acquisition; secondly, when the system conducts a search, it operationalises the constructed knowledge.

During incremental knowledge acquisition, the Knowledge Acquisition Module, together with the Case Data Base and the Knowledge Acquisition Assistant, is responsible for the incremental development of a Nested Ripple Down Rule knowledge base which is always consistent with the past seen cases. The progress of the search is stored in the working memory. This is often used by the expert to explain her/his decisions during incremental knowledge acquisition. The search progress is also used later when the Search Engine conducts the actual search. The Domain Specific Search Operators Module and the Search Control Knowledge Base, which functions as a filter on all applicable search operators, determine which search states are actually visited.

[Figure: SmS1.3 architecture, showing the Search Engine connected to the Search Control Knowledge Base, the Domain Specific Primitives and the Working Memory, and the Knowledge Acquisition Module connected to the Case Data Base, the KA Assistant and the User Interface.]

Figure 5.1. SmS1.3 Architecture

We now give a more detailed - largely functional - description of every module in the system:

The domain specific search operators module

The domain specific search operators module contains a set of search operators forming an instance generator. Given a particular search state, this generator is capable of generating all immediate next possible search states. This module also contains a set of explanatory primitives. The expert uses these primitives to construct the lowest abstraction layer of the hierarchical search knowledge base (see next section). These domain specific primitives are designed in consultation with a domain expert. They allow a natural description of the domain, and they satisfy criterion 4 in the previous section. For example, in the chess domain, they describe spatial and tactical relations between pieces and squares.

The search control knowledge base

The search control knowledge base stores what the expert expresses as search control knowledge. Using this knowledge, given the possible next states from the previous module, this module passes through only those states seen as worth pursuing deeper during a search. This module contains the larger part of the domain knowledge. This knowledge base is built during the incremental knowledge acquisition process. Our NRDR framework, to be detailed in the next chapter, is used to allow the expert to incrementally construct this knowledge base.

The working memory

The working memory stores the progress of the search, which is often used by the expert to explain his/her decisions during incremental knowledge acquisition. For example, solving a component placement problem, a circuit designer chooses his/her next step based on a rough plan; this plan prevails in the progress towards finding a problem solution. During the search, the contents of the working memory change dynamically every time a new search path is visited. That stored search path is used by the knowledge base to evaluate expert rules which refer to the ongoing search process. When the system is used to conduct a search, the working memory may need to be consulted by the knowledge base to evaluate some rules. From a Ripple Down Rule perspective, this working memory allows the expert to give relational descriptions between cases in the search path, as s/he can refer to any of the cases in the working memory while s/he explains a current case description. In (Richards and Compton 1999), the authors saw the inability to describe relations between cases as detrimental to their room allocation task (the so-called Sisyphus I problem in the knowledge acquisition community).

The working memory also stores higher order features of the steps of an evolving search plan. This reduces computational requirements, as these features can be used again at a later stage of the search, minimising the number of interactions between the search engine and the knowledge base. This decreases inferencing run-time.
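The role of the working memory can be illustrated with a minimal sketch (class and method names are our own, not SmS code): the search path is logged, state features are computed once and cached, and rule conditions can refer back to earlier states on the path:

```python
# Illustrative sketch of the working-memory idea; names are our own.
class WorkingMemory:
    def __init__(self):
        self.path = []          # visited states, in search order
        self.features = {}      # state -> cached feature dict

    def visit(self, state, compute_features):
        """Log a state; compute its features only on first encounter."""
        if state not in self.features:
            self.features[state] = compute_features(state)
        self.path.append(state)
        return self.features[state]

    def earlier(self, steps_back):
        """Let a rule condition refer to a state earlier in the path."""
        return self.path[-1 - steps_back]
```

Caching the features is what saves the repeated interactions between the search engine and the knowledge base that the text describes, since many features persist over sequences of search operators.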

The working memory module satisfies the second requirement of the knowledge acquisition environment discussed earlier in this chapter. The way the expert accesses the contents of this working memory and uses it in expressing the conditions of his/her rules, will be detailed when we later introduce our Search Knowledge Interactive Language (SKIL) in chapter 8.

The search engine

The search engine controls the generation of the search tree through interactions with the knowledge base. For a given search state it evaluates all operators to find those operators which should be applied according to the knowledge base. The evaluation of the rules in the knowledge base may also involve the examination of the generated search tree. This evaluation is conducted in a depth (best) first order. We have two methods for determining the best search state to be followed by the search engine.

In the first method, during search control knowledge acquisition, the expert assigns a weight to every rule which he/she enters. These weights do not get adjusted during maintenance. This is a new extension to the RDR framework. During searching, the search engine applies the search operators chosen as a result of applying the rule with the highest weight. This weight is assigned to the chosen operators. If the highest weight of an operator falls below a given threshold, the search engine backtracks, and in another state of the search tree an operator is applied with a higher weight. For example, in the domain of chess the expert may play a move sacrificing his/her queen; in such a case s/he would only look for moves to win her/him back at least 9 points.¹ The knowledge base would then lead the search engine to consider search paths that lead to a total gain of at least 9 points. Thus, SmS uses an implicit evaluation function constructed during the knowledge acquisition process. Note, if the threshold cannot be satisfied then the search path yielding the highest value is chosen.²

¹ The relative value of the queen in chess is 9 points; e.g. the value of a pawn is 1 point.
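The first method can be sketched as follows. This is a hedged illustration rather than the SmS implementation: the knowledge base is stubbed as a function returning (operator, weight) pairs, and the engine pursues operators best-weight-first, backtracking as soon as the best remaining weight drops below the threshold:

```python
# Our own sketch of weight-threshold search; kb, apply_op and is_goal are
# stand-ins for the SmS knowledge base and domain hooks.
def weighted_search(state, kb, apply_op, is_goal, threshold,
                    depth=0, max_depth=10):
    if is_goal(state):
        return [state]
    if depth >= max_depth:
        return None
    # operators the knowledge base considers worth pursuing, best first
    for op, weight in sorted(kb(state), key=lambda ow: -ow[1]):
        if weight < threshold:
            break                      # below threshold: backtrack
        path = weighted_search(apply_op(state, op), kb, apply_op,
                               is_goal, threshold, depth + 1, max_depth)
        if path is not None:
            return [state] + path
    return None
```

The rule weights thus act as the implicit evaluation function the text describes: no numeric state evaluation is ever written down, yet the engine still orders and prunes its options.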

In the second method for choosing the best search state, the knowledge engineer provides a simple evaluation function and a pruned Alpha-Beta search (Pearl and Korf 1987) is conducted based on the values of the provided function. This simplifies the knowledge acquisition process at the expense of a slightly more elaborate knowledge engineering process. In domains where an evaluation function is well known, the saving of the effort on the expert during the knowledge acquisition process warrants using this known evaluation function.
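For reference, the pruned search of the second method is standard alpha-beta (cf. Pearl and Korf 1987). A generic sketch, with placeholder `children` and `evaluate` hooks standing in for the domain's operators and the knowledge engineer's evaluation function, might look like:

```python
# Textbook alpha-beta pruning; children and evaluate are placeholder hooks,
# not SmS code.
def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                  # beta cut-off: prune remaining children
        return value
    value = float("inf")
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:
            break                      # alpha cut-off
    return value
```

The trade-off the text notes is visible here: the expert contributes nothing per rule, but the knowledge engineer must supply `evaluate` for the domain.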

The knowledge acquisition module The knowledge acquisition module gets the expert input through the user interface. It maintains the knowledge base as well as the case database. The inference algorithms implemented by this module will be discussed in detail in the next chapter.

The case database The case database stores all cases classified by the expert. It allows retrieval of these cases according to their classifications, which are time-stamped. Thus, this database contains a complete history of interactions with the expert. Although not all of the interactions affect the knowledge base development, they are essential for the

² We have considered extending the threshold used in the above search method to accommodate uncertainty measures in the outcome of the search. This raises the research question of how such uncertainty can be incrementally captured from an expert during the actual knowledge acquisition process.

The difficulty in this is twofold: Firstly, how can we be sure that the expert remains consistent in expressing his/her measures (feelings) of uncertainty with respect to some given scale? Secondly, how can the knowledge base maintain a proper use of these uncertainties while it is kept consistent during the incremental knowledge acquisition process? The second problem is a technical problem and solving it is likely a matter of time. The first is a fundamental problem which

may be impossible to resolve, as the question is really how we actually operationalise the expert's feelings of uncertainty, which fluctuate depending on his/her psychological state.

functionality of the knowledge acquisition assistant while maintaining the knowledge base. This is described below.

The knowledge acquisition assistant The knowledge acquisition assistant provides hints to the expert as to which parts of the knowledge base may need to be modified, while ensuring the consistency of the knowledge base with the case database. It relies on past interactions with the expert, stored in the case database, to give these hints. The role of this module in connection with the case database will become clear in the next chapter as we discuss knowledge base maintenance in our knowledge representation scheme of Nested Ripple Down Rules.

The user interface The user interface reads the expert input and displays the system's answer to a search request. Further, it provides graphical representation of the knowledge base and graphical output of the automatic assistant to the expert. The user interface will be described in the next section in reference to the interface to a Nested Ripple Down Rule knowledge base. This interface will be comprehensively described in Appendix A of this thesis.

The contents of the knowledge base and the domain specific operators module are the only parts of the system which are domain dependent. Developing the contents of the knowledge base is primarily a knowledge acquisition task. The domain specific search operators are cheap and easy to develop, particularly in well defined search domains. Their development will be analysed and discussed later in section 5.6.

The ease of adapting SmS to different search domains generalises its utility as a workbench for designing search heuristics inexpensively. The architecture of SmS is intended to be reusable across different search domains. The success of this reusability clearly depends on the successful construction of the two domain dependent modules: the domain specific primitives module and the search control knowledge base. Building the first is a knowledge engineering task, while the second is built during the actual knowledge acquisition process. We will discuss details of these two modules in section 5.6.

The graphical user interface used in SmS to support the interactions with an expert is described in the next section.

5.4.2. NRDR user interface A visual display is important for the human expert to deal with the complexity of the knowledge representation. The expert is given access to the knowledge base at three levels: firstly, s/he has an overview of the whole knowledge base; secondly, s/he can browse and access individual concept definitions; and lastly, s/he can view individual rules. Figure 5.2 shows the list of all available concepts defined within a knowledge

[Figure 5.2 screenshot: overview window of the NRDR interface for chess, listing the defined concepts and their sizes (CentreMove: 1, Good: 5, LooseKnight: 1, MaterialLoss: 1, Opening: 1, Reinforce: 2, ReinforceNew: 1, SaveKnight: 1, SavePiece: 1, Threat: 1), together with concept properties and testing options; the knowledge base contains 10 concepts and 15 rules.]

Figure 5.2: Knowledge Base Overview

base. The interface provides a list of all available concepts. A double click on a concept gives a graphical representation of the actual Ripple Down Rule tree defining the concept. In this representation a horizontal link is a true branch and a vertical link is a false branch (see figure 5.3). A double click on a node will give a view of the contents of the rule. A double click on a higher order concept used as a condition will give a graphical view of the RDR representing this concept (see figure 5.3). It is possible to see the full knowledge base by having a cascade of windows representing the hierarchy of the knowledge base structure (see figure 5.4). For an update, all points of refinement are available to the expert via this cascaded representation. This allows top-down or bottom-up development of the knowledge base, satisfying both tendencies of human experts. The conceptual structure which emerges as a result of the knowledge acquisition can easily be inspected using the interface. A Nested RDR knowledge base is very readable and can be easily reused, e.g. as an expertise reference source or for tutoring purposes.
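The inference over a single RDR tree just described - horizontal (true) links as exceptions, vertical (false) links as alternatives - can be sketched as follows. The rule contents and class labels are illustrative only, not taken from the SmS implementation.

```python
class Rule:
    """One RDR node: a condition, a conclusion, an exception branch
    (true/horizontal link) and an alternative branch (false/vertical link)."""
    def __init__(self, cond, conclusion, if_true=None, if_false=None):
        self.cond, self.conclusion = cond, conclusion
        self.if_true, self.if_false = if_true, if_false

def rdr_classify(rule, case, default=None):
    """Return the conclusion of the last satisfied rule on the path,
    or `default` if no rule fires."""
    conclusion = default
    while rule is not None:
        if rule.cond(case):
            conclusion = rule.conclusion  # tentatively accept this conclusion
            rule = rule.if_true           # follow the exception (true) branch
        else:
            rule = rule.if_false          # try the next sibling (false) branch
    return conclusion
```

For example, a root rule concluding +C with an exception concluding -C classifies a case as -C only when both conditions hold, matching the refinement-by-exception behaviour of RDR.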

[Figure 5.3 screenshot: graphical view of the RDR tree defining the concept Good, showing Rule 4 ("If +MaterialLoss!(0) Then -Good", weight 999) and its concept properties.]

Figure 5.3: Graphical view of a concept definition

[Figure 5.4 screenshot: cascaded windows showing the knowledge base hierarchy, including Rule 2 of Reinforce and a chessboard display of its associated case.]

Figure 5.4. Graphical view of the knowledge base. Concept definitions are given in separate windows

The use of this interface will be demonstrated with example knowledge acquisition sessions in chapter 8. In the next section, we discuss the steps required to adapt the architecture of SmS to a given search domain.

5.5. SmS reuse in search domains

The description in this section of the steps required to adapt SmS to a given search domain is adapted from (Beydoun and Hoffmann 1999). Later, following the presentation of a case study in chapter 8, these steps will be analysed to characterise domains for which SmS is effective.

The effectiveness of SmS in a given domain depends largely on whether or not a search knowledge base can be successfully built and effectively used. The search knowledge base and the domain primitives are tightly coupled (see figure 5.5). Interpretation and successful incremental construction of a knowledge base depend

[Figure 5.5 diagram: the Search Control Knowledge Base module (SKIL Interpreter, Knowledge Base) interacting with the Domain Specific Primitives module (Explanatory Primitives, Search Operators).]

Figure 5.5. The Search Control Knowledge Base module has two sub-modules: a SKIL Interpreter and the actual knowledge base. The Domain Specific Primitives module also has two sub-modules. The figure shows the domain specific interactions between these sub-modules within SmS.

on the choice of the domain specific operators. The effective use of the knowledge base depends on having a set of search operators that covers the regions of the search space required to find search solutions. In other words, the knowledge base filters through a search solution only if such a solution is first generated by the search space generator.

In what follows in this section, we will describe the knowledge engineering/acquisition stages involved in reusing SmS across different domains. This includes the domain dependent task of designing domain specific operators. Details and pitfalls of every stage will be discussed.

Stages in adapting SmS to search domains Four steps are involved in using SmS in a given search domain: • Deciding on a search state representation. • Building a state generator, that is, deciding on a set of search operators. • Building a set of explanatory primitives. • Building a knowledge base incrementally.

In what follows, we describe details of every stage and its role.

Deciding on a search state representation In this stage, a knowledge engineer in consultation with the domain expert decides on the right ontological units to characterise search states, i.e. s/he decides on the features in the feature-vector representation of cases (search states) presented to the expert. Such a modelling task is also normally needed when building classifiers with tools such as C4.5 (Quinlan 1993), RDR (Compton, Edwards et al. 1991), or Induct (Gaines and Compton 1992). Later in chapter 9, we will discuss difficulties in this task and its relation to the task of designing a state generator. That discussion will form a guide to characterising domains for which SmS is effective.

Building a set of explanatory primitives In this stage the knowledge engineer designs a set of explanatory primitives. S/he does this in consultation with a domain expert. These primitives are atomic relations between elements (components) of the search state. This satisfies the fourth requirement discussed in section 5.3. These primitives allow the expert to give his/her description of a domain in his/her natural way. They are an interface between the search state representation and the search operators on one hand, and the expert on the other hand. Later as will be shown in our chess example in chapter 8, explanatory primitives allow the expert to express spatial and tactical relations between pieces and squares.

The explanatory domain primitives form expressions in our search knowledge interactive language (SKIL). They are implemented as C-procedures that are called by a SKIL interpreter when the knowledge base is used. This interpreter is part of the knowledge base module and interfaces to the explanatory primitives sub-module (see figure 5.5). The interpreter is a reusable domain-independent component of the system. All expert explanations are given in SKIL. A detailed description of SKIL and an example of explanatory primitives for the domain of chess will be described in chapter 8.

It should be noted that the role of the explanatory primitives is to facilitate the knowledge acquisition process. In theory, it is possible to define Nested Ripple Down Rules concepts to replace them. Whether an NRDR concept is defined or a new primitive is added depends on a number of factors: the simplicity of its definition as a C-function versus an NRDR concept, and its frequency of use in the knowledge base. Later in section 8.1, we will discuss in more detail how those factors impact the decision of whether a concept should be implemented as a C-function or an NRDR concept.

Building a search state generator In this stage the knowledge engineer decides on a set of search operators. Given a particular search state, a complete state generator is capable of generating all successor states. If the knowledge engineer has some basic knowledge of the domain then s/he can incorporate it in this stage. For example, in the domain of chess, if the knowledge engineer knows the rules of the game, s/he can create a state generator which, given a starting position, produces only legal positions. Otherwise s/he can simply create a search state generator, which allows all syntactically correct search states, and s/he leaves more details to a domain expert during the knowledge acquisition process.
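As a minimal sketch only - the integer "positions" and the legality test below are invented for illustration and bear no relation to SmS's actual chess operators - the distinction between a purely syntactic state generator and one that incorporates basic domain knowledge can be written as:

```python
def complete_generator(state):
    """All syntactically possible successors of a toy integer 'position'.
    A complete generator returns every successor state."""
    return [state + 1, state - 1, state * 2]

def legal_generator(state, is_legal=lambda s: s >= 0):
    """A knowledge engineer who knows the domain rules can filter the
    syntactic successors down to legal states, as in the chess example."""
    return [s for s in complete_generator(state) if is_legal(s)]
```

With no domain knowledge, the engineer ships `complete_generator` and leaves legality to the expert's rules during knowledge acquisition; with domain knowledge, `legal_generator` prunes illegal states up front.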

Later in chapter 9, we will discuss conditions under which an incomplete state generator - one that does not generate all successor states - is usable. That discussion will be important because the usability of the state generator underpins the success of the search engine in SmS in finding a solution.

Building a knowledge base incrementally As discussed earlier, the knowledge base is built incrementally. Using the knowledge stored in the search knowledge base, only those states deemed worth pursuing are accepted from the possible successor states generated by the state generator. When the expert is satisfied with its performance, the knowledge base sub-module contains the larger part of the domain knowledge. In our knowledge representation scheme Nested Ripple Down Rules, the expert can enter his/her own domain terms. These terms can in turn be used later to explain more of his/her domain

terms. This gives rise to a hierarchy of terms. The lowest level of the hierarchy consists of basic explanatory domain primitives. Details of constructing a knowledge base will be discussed in the next chapter, dedicated to the presentation of technical aspects of our framework of Nested Ripple Down Rules.

A search knowledge base is never assumed to be complete. With our Nested Ripple Down Rules formalism, further additions of new knowledge (in the form of symbolic rules) are anticipated. NRDR extends some of the work done in RDR research (e.g. see (Compton and Jansen 1990; Compton, Edwards et al. 1991; Compton, Edwards et al. 1992; Compton, Kang et al. 1993)) and preserves RDR's main strength: ease of maintenance. In NRDR, the addition of new rules ensures that past cases remain correctly classified. An incomplete knowledge base is expected, and does not pose problems. Indeed, as we will show later in our analysis of the knowledge engineering phase in chapter 9, problematic adaptation of SmS only arises out of an incomplete knowledge engineering phase, never from the knowledge acquisition stage. The knowledge engineering phase includes the first three steps outlined in this section. Further, an incomplete knowledge engineering phase causes problems only if revisiting it affects the incremental process of building the knowledge base.

Our discussion of the process of building a domain dependent search knowledge base incrementally is kept brief in this section. This process is the topic of the next two chapters, and will then be illustrated with an actual case study in chapter 8.

5.6. Chapter summary and conclusion

This chapter presented our framework to capture search knowledge incrementally. We discussed the nature of expert search knowledge. In particular, we highlighted that it is not completely available for introspection. We gave an abstract description of our approach, which captures search knowledge incrementally based on the expert's explanations. Our approach aims to reconstruct the expert's knowledge in the context of its use. As we discussed earlier in chapters 1 and 2, an expert's complete introspective access to his/her knowledge is impossible, but we believe this is not necessary to effectively capture a sufficiently operational part of his/her domain knowledge. Based

on this view, we presented a set of requirements for a knowledge acquisition environment that enables the reconstruction of this expertise. Our workbench SmS, using our NRDR knowledge acquisition framework, fulfils these requirements. In their light, we gave a detailed description of SmS's architecture.

This chapter also discussed how the architecture of SmS is adapted to different domains. This adaptation includes a short knowledge engineering phase. We outlined three steps in this phase: deciding on a search state representation, building a set of search operators and building a set of explanatory primitives. In chapter 9, we will discuss how these steps are related. We will outline conditions and assumptions under which it is possible to revisit one or more of the steps throughout the incremental knowledge acquisition process without losing past effort in the process. This will give a characterisation of domains for which incremental search knowledge acquisition is feasible and effective.

The bulk of the effort in adapting SmS to different domains takes place at the knowledge level during the actual knowledge acquisition process. Details of this process will be the centre of our discussion throughout the rest of this thesis. In the next chapter, we discuss technical aspects of our knowledge acquisition framework in SmS. This is based on our Nested Ripple Down Rule knowledge representation formalism outlined in chapter 3 and briefly discussed in chapter 4. Our formalism allows an expert to define his/her own terms and abstractions. Interactions between terms introduced by the expert, due to their interdependence, add extra complexity to our formalism. This mandates further effort by the expert beyond what his/her initial explanations intended.

The hierarchical structure of the knowledge base and the complexity of interactions between the expert and the knowledge base require that the knowledge base interface be specially designed to facilitate the role of the expert. Further, the update policies of an NRDR knowledge base are tailored to control the complexity of interactions between the knowledge base and the expert. The policies adopted in the knowledge acquisition process will be the focus of the next chapter. The cost of the extra effort by the expert in the incremental knowledge acquisition process in NRDR (as compared to an

incremental KA process with RDR) due to interactions between expert terms, will be later analysed in chapter 7.

Chapter 6

Nested Ripple Down Rules

In this chapter, we discuss how incremental addition of knowledge to the actual knowledge base is possible within our Nested Ripple Down Rules (NRDR) formalism, which is the central knowledge representation scheme in our system SmS 1.3, as we discussed in the previous chapter. This chapter will be concerned with NRDR's technical details and their philosophical underpinning.

This chapter is organised as follows: In section 6.1, we justify the need for NRDR from an epistemological perspective. In section 6.2, we overview the technical details of NRDR. Maintenance of an NRDR knowledge base is, as expected, more complicated than that of a simple RDR knowledge base. This is partly due to interactions between the terms which the expert defines, and which s/he may reuse in different contexts. In section

6.3, we analyse this extra complexity of maintaining an NRDR knowledge base. To control this complexity, we adopt a fixed set of maintenance policies during the knowledge acquisition process. We present and discuss these policies in section 6.4. In section 6.5, we discuss the relationship between default theories and our NRDR formalism. We illustrate NRDR maintenance features and policies with an example in section 6.6. In section 6.7, we end this chapter by drawing a parallel between knowledge base maintenance in our NRDR formalism and the maintenance of scientific theories as discussed in philosophy. We argue that NRDR maintenance has certain parallels to confirmation holism of scientific theories as advocated by Quine (Quine 1951).

We first give an overview of why we believe our NRDR framework actually facilitates interactions with an expert, which in turn facilitates the incremental construction of a knowledge base.

6.1. Why Nested Ripple Down Rules?

The discussion in this section is largely philosophical. It argues for NRDR from an epistemological perspective. It will be expanded into purely philosophical lines later in section 6.7.

In daily use of language, people use very complex processes. When a simple sentence: "No smoking on planes" is understood by a person, s/he has to know what smoking means, what planes are, ... Moreover, for the person to deduce that s/he must alter her/his behaviour, s/he must know that s/he must be boarding a plane to do so (which of course implies that s/he must know what boarding a plane is). To explain such a sentence to an extraterrestrial being of comparable intelligence, a human will have to introduce new terms. For example in explaining what a plane is, s/he may proceed: a plane is a vehicle used to transport people via air. Then s/he will have to define what s/he means by "vehicle", "transport", "air" ... As s/he explains these terms s/he will have to consider the context of each term s/he chooses to use. For example, the alien may have previously heard the sentence "he has certain air around

him", and s/he may wonder if this is the same "air" where a plane travels. In both sentences, the meaning of "air" is a function of the sentence, its context of use and the epistemological links between all its constituent terms. Thus to represent the meaning of "air", all its links in every possible sentence in the English language where it may be used must be taken into account. That is, the meaning of "air" (the concept) depends on the whole of the language.

This suggests that, when an entire body of beliefs fails to explain a current experience (like the alien encountering a new meaning of "air"), revision can strike anywhere. In (Putnam 1988), Putnam gives a lucid example of how maintenance of physical theories can strike anywhere: Newton defined the physical quantity momentum to be "mass times velocity", and he noted that this quantity is always conserved. Relativistic physics highlighted problems with the concept of momentum. A number of changes in the concept of momentum could have taken place; e.g. one possibility is to relax the law of conservation of momentum when velocity becomes great. However, Einstein's ingenuity led to actually relating mass to velocity, and this left Newtonian laws of momentum unchanged (including the law of conservation of momentum). This holistic view of meaning mandates that, because the meaning of any term can be changed, term definitions cannot be fixed once and for all. This holistic view of knowledge refinement is advocated by many contemporary philosophers like Quine (Quine 1951), Putnam (Putnam 1988) and Davidson (Davidson 1984).

Even in restricted domains, human expertise is believed to be holistic in nature, in that the meaning of experts' domain concepts cannot be taken in complete separation from the rest of their domain knowledge. This causes experts to struggle to express themselves as they explain (justify) their expertise. This holism of expertise makes domain conceptualisation difficult. To ease this difficulty, experts use intermediate abstractions that they (re)use in further explanations. For example in chess, experts introduce notions like "centre development" to justify some of their opening moves. When asked to explain such intermediate concepts, experts may overlook a definition of the concept in some contexts outside the current situation which they are explaining. They fail to provide a complete explanation that always covers their use,

instead they provide an operational definition sufficient for the purpose of explaining the context at hand. Moreover, an expert's articulation of intermediate concepts may depend on his/her articulation of other concepts, which may not yet be made explicit or completely defined. So, as suggested earlier by holism of meaning, the incompleteness of these intermediate concepts is likely (if not often unavoidable).

Adapting the incremental knowledge acquisition process to match the expert's natural tendencies in giving his/her explanations (by introducing intermediate terms) enables the expert to more easily express his/her knowledge, and to build an operational knowledge base more effectively. Towards this, our knowledge representation

[Figure 6.1 diagram: concept C1 is defined by rule C1.1, with lower-order concepts defined by rule A1.1 (A2, B2 → +A1), rule A2.1 (p1, p2 → +A2) and rule B2.1 (p1, p3 → +B2).]

Figure 6.1: A simple example of Nested Ripple Down Rules. Note that, while the syntax of a conclusion in an RDR tree defining concept X is +X or -X, this corresponds semantically to True and False respectively when the concept X is being evaluated. In the above, an update in concept A2 can cause changes in the meaning of rules C1.1, C1.2 and A1.1 of the knowledge base.

formalism Nested Ripple Down Rules (NRDR) allows an expert to give his/her explanations using his/her own terms. These terms are operational while they are still incompletely defined. Because every term in NRDR is defined as an RDR tree, NRDR allows experts to deal easily with exceptions, and to refine definitions of their terms readily.

6.2. Overview of Nested Ripple Down Rules

As discussed in the previous chapter, an essential requirement of our workbench to capture search expertise is the ease of acquisition and maintenance of the search knowledge. For this purpose, we use Ripple Down Rules as a starting point for the implementation of the knowledge base and the learning module. In SmS, incorporating Nested Ripple Down Rules, the expert introduces his/her own domain abstractions. As we discussed in chapter 5, s/he uses a Ripple Down Rule structure to define a conceptual hierarchy during the knowledge acquisition process. Every concept is defined as a separate simple RDR tree. These defined concepts can in turn be used as higher order attributes (conditions) by the expert to define other concepts. When more than one expert-defined concept is used in the condition of a rule, lazy evaluation is used for efficiency (that is, when a concept is evaluated as false, the value of the condition is returned as false without evaluating the next concept). The elementary level is the level of explanatory domain primitives. In SmS, these primitives are provided for every given domain by respective C-procedures (along with the instances generator, forming the domain primitives module as shown in figure 5.5), as we discussed in the previous chapter.
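The nested evaluation just described can be sketched as follows. For brevity this sketch uses simple first-hit rule lists in place of full RDR exception trees, and the concept names and primitives are invented; only the recursion through higher-order concepts and the lazy (short-circuit) conjunction reflect the mechanism in the text.

```python
def holds(term, case, concepts):
    """A condition term is either an explanatory domain primitive
    (a callable on the case) or the name of another defined concept."""
    if callable(term):
        return term(case)
    return nrdr_classify(term, case, concepts)  # recurse into lower concept

def nrdr_classify(name, case, concepts):
    """Evaluate concept `name` on `case`. Each concept is an ordered list of
    (condition, conclusion) rules; a condition is a conjunction of terms.
    all() short-circuits, giving the lazy evaluation described above."""
    for condition, conclusion in concepts[name]:
        if all(holds(t, case, concepts) for t in condition):
            return conclusion
    return False  # no rule fired: the case falls outside the concept
```

For example, a concept C1 whose rule conjoins the defined concept A1 with a primitive stops evaluating as soon as A1 comes out false on the case.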

Simple RDR as proposed by Compton and Jansen (Compton and Jansen 1988) discriminate input objects into a set of mutually exclusive classes. When referring to a simple RDR embedded within an NRDR knowledge base, we mean a binary conclusion RDR which classifies input into two classes: Positive objects belonging to the class indicated by the expert concept, and negative objects falling outside it. This enables a set theory view of such RDR trees. Moreover, boolean algebra can be used

to analyse relationships between concepts. This will be done in chapter 7 to prove that inconsistencies can be dealt with automatically. We can also use Venn diagrams to represent RDR concepts. Intersections of concepts define higher order concepts (see figure 6.3). The use of Venn diagrams allows intuitive analysis of concept interactions. Later in this chapter, Venn diagrams will be used to illustrate the need for adopting update policies during the knowledge acquisition process to control the occurrence of inconsistencies.

In the next section, we present technical details of our Nested RDR. We discuss issues and policies for the development process. We also give an example of using the NRDR framework. This will highlight the extra effort in keeping an NRDR knowledge base consistent compared with a standard RDR knowledge base. This extra effort arises as NRDR concepts interact during the knowledge acquisition process.

6.3. NRDR Technicalities

The hierarchical structure of NRDRs causes problems for keeping the entire knowledge base consistent when a single concept definition needs to be altered (see figure 6.1). These problems can be minimised by using suitable development policies during the actual knowledge acquisition process. These policies will be discussed in the coming section. We first describe a typical update process.

An NRDR knowledge base is said to fail and requires modification if the expert disagrees with the conclusion returned by the entry point RDR tree (see chapter 3 for how a conclusion is reached in an RDR tree). The entry point in NRDR as used in SmS is the highest level RDR tree. However, this does not have to be the case, as an NRDR knowledge base can be used to classify a case by any of the concepts defined within the knowledge base.

During knowledge acquisition, given a case x that requires an NRDR knowledge base to be modified, the modification can occur in a number of places. For example -

referring to figure 6.1, say case x satisfies conditions A1 and B1 in rule C1.1 but the expert thinks that case x is not C1. Hence, the knowledge base needs to be modified to reflect this. A rule can be added as an exception in the RDR tree describing C1, or alternatively, the meaning of attribute A1 can be changed by updating the definition of A1, or of A2 in rule A1.1; and so forth. The number of possibilities depends on the depth of the concept hierarchy in the knowledge base. This is where the user interface must provide assistance to the expert. An example of an NRDR interface was shown in the last chapter. This interface makes all possible locations for update available to the expert via a cascaded representation (see figure 5.4). (A detailed description of our SmS interface is included in appendix A).

The development of the concept hierarchy depends not only on a given domain but also on the expert. It reflects his/her individual way of conceptualising his/her own thought process. In choosing a suitable point of refinement, the expert views the knowledge base holistically as an interconnected system of concept definitions. The choice of refinement depends on his/her judgment. It involves aspects such as estimating simplicity, weighing up simplicity against successful prediction, and also whether the expert wants to adapt existing concept definitions or introduce new ones. NRDR structures can be developed in a top-down or bottom-up fashion, satisfying both tendencies of human experts, of which Sowa in (Sowa 1984) states: "..The two approaches may be combined in bidirectional reasoning, which is originally triggered by stimulus in the data (bottom up), but which then invokes a high-level goal that controls the rest of the process (top down)".

A more serious maintenance issue is dealing with inconsistencies due to localised

updates in the hierarchical knowledge base.¹ For instance, if the expert updates the meaning of A1 by changing the meaning of attribute A2 in rule A1.1, s/he may inadvertently cause a change in the meaning of rule C1.2 that contains A2. Generally, when a condition X is defined in terms of a lower order RDR, and X is repeatedly used in different rules, an update of X - by adding an extra rule or an exception to an

¹ Throughout this thesis, given a case x classified by the knowledge base, if this classification is inconsistent with respect to a correct past classification, then x is called an inconsistency.

existing rule - has an effect everywhere X is used. This may cause inadvertent inconsistencies with respect to past classifications. Such a side effect is detected by the system after every update: the system checks the Case Database for any inconsistencies. It must be noted that the local impact of refinement in simple RDR, where the effect of a change has no impact on the rest of the knowledge base, no longer holds.

In simple RDRs, a cornerstone case is associated with every rule. The rule is the justification for the classification of this case (Richards, Chellen et al. 1996). In NRDR, every rule has a set of cornerstone cases. This set contains all cases that the rule classified correctly under verification by an expert. These classified cases are stored in the Case Database module of SmS (see figure 5.1). Their classifications must always hold as the knowledge base evolves. Cases can travel between sets of cornerstone cases because of interactions within an NRDR knowledge base. Checking for inconsistencies when a concept C is modified requires access to all cases previously classified by C. Following every knowledge base update, these cases are classified again. A case x is inconsistent if the new classification differs from the old classification. During the discovery of x, because of the nested structure of the RDRs, some lower order concept descriptions of x are also found. To repair the inconsistency of the knowledge base with respect to the case, some of the concepts describing it may also need to be updated. This may in turn cause more inconsistencies to occur. Hence, the process of checking inconsistencies is also recursive. SmS has an automatic knowledge acquisition assistant module. This tracks these inconsistencies as they arise. It guides the expert towards eliminating them by highlighting concept descriptions that may need to be changed. In (Beydoun and Hoffmann 1998), we discussed why the number of inconsistencies is too small to have a major impact on the cost of the total knowledge acquisition process. These discussions will be expanded in chapter 7. To ensure that, following a knowledge base update, the process of dealing with a single occurrence of inconsistencies terminates, we introduce development policies for the actual knowledge acquisition process.
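The consistency check described above can be sketched as follows. The record layout of the case database and the classifier interface are hypothetical simplifications; in particular, the recursive repair of lower-order concepts is not shown.

```python
def find_inconsistencies(case_db, classify):
    """Re-classify every stored cornerstone case after a knowledge base
    update, and return the records whose new classification no longer
    matches the expert-verified one.

    case_db:  list of (case, concept_name, verified_label) records.
    classify: function (concept_name, case) -> label, applying the
              updated knowledge base."""
    return [(case, concept, old)
            for case, concept, old in case_db
            if classify(concept, case) != old]
```

Each record returned corresponds to an inconsistency in the sense of the footnote above; fixing one may itself change concept meanings, so in the full system this check is repeated recursively until no inconsistencies remain.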

6.4. NRDR development policies

Inconsistencies do not occur at the lowest layer of the knowledge base hierarchy, because concepts at this layer are normal Ripple Down Rules, which use explanatory domain primitives as their conditions. Updates of the highest level RDR tree also do not create inconsistencies: a concept defined by the highest level RDR tree is never used as a condition, so the impact of updates in the highest concept is localised to itself. Maintaining the consistency of the middle abstraction layers of an NRDR knowledge base with respect to past seen cases is thus the extra maintenance effort of NRDR compared to simple RDR; it is in these layers that interactions between concepts have an impact on the knowledge acquisition process. It should be noted that in controlling the search, the values of intermediate concepts are not relevant for SmS's performance. The selection by the search engine is dictated by the highest order concept defined by the entry point RDR tree. Thus an NRDR knowledge base in SmS gets updated only when the search engine fails to make the required selection. That is, for some cases, intermediate classifications may be incorrect whilst the top classification (for the select concept) is correct. These wrong classifications can go undetected until a consistency check is later carried out for the concepts for which those cases are misclassified.

Consider figure 6.2, where both C1 and C2 are used in the definition of G. If C1 gets modified, then G may need to be modified as discussed earlier. We call the modifications required to fix any inconsistencies secondary refinements. Such refinements are initiated following a consistency check with respect to past seen cases; any inconsistencies found alert the expert to the need for them. Another way is to let the expert check all concepts on a case and follow his/her updates through with any required secondary refinements. Clearly, that would add extra effort for the expert.

A primary refinement is an expert initiated refinement as a result of a misclassification by the knowledge base, rather than a consistency check. In SmS, a primary refinement is most likely a result of a failure of the search engine to find a correct solution.

The semantic requirement in a secondary refinement of G (figure 6.2) may be fulfilled by modifying C2. That is, the modification of C1 can have an indirect impact on C2. So, interactions between concepts during the knowledge acquisition process may take place within the same layer of abstraction, not only upward. Moreover, if C2 is defined in terms of lower order concepts, the semantic requirement may be fulfilled by modifying the lower concepts used in C2. Hence, this semantic requirement of secondary refinement may be fulfilled by refining lower order concepts, higher order concepts or concepts within the same layer of the abstraction hierarchy. Indeed, refinement can strike anywhere in the knowledge base. However, to avert circularity of chains of secondary refinements within the abstraction hierarchy, we introduce the following refinement policies to ensure that the chain reaction of secondary refinements terminates:

Figure 6.2. In the above simple conceptual hierarchy, there is a direct link between G on one hand and C1 and C2 on the other hand. Moreover, there is an indirect link between C1 and C2 which is taken into consideration in designing the update policy.

Policy 1: No circular or recursive definition of concepts is allowed.

Policy 2: When an inconsistency occurs as a result of a primary refinement, one of the following two scenarios can take place: the expert may undo the primary refinement and enter a substitute refinement; or s/he can introduce a secondary refinement in a concept, but only in a higher abstraction layer.

Together, the above two policies ensure that a chain reaction of secondary updates terminates with the knowledge base being consistent with respect to past seen cases. The second policy ensures that any chain reaction of secondary refinements terminates at the highest level concept, where no inconsistencies can occur (as outlined earlier). To enforce the first policy, every concept has a dependency list2. A dependency list for a concept C contains all concepts that have a direct link with C. The expert is warned if a circularity is detected when s/he enters a rule. The rule is then retracted, and s/he must enter a substitute. A substitute is always possible, because in the worst case scenario the expert may invent a new concept to avoid the circularity. Introducing a new concept at any time has no influence on the rest of the knowledge base, since the knowledge base would not have any reference to it yet. The dependency list is also used to conduct the consistency check more efficiently. The consistency check is carried out after any refinement of a concept C. It is a recheck of all past case classifications, which were previously verified by the expert, against all concepts in the dependency list of C.
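The circularity warning can be realised with a simple reachability test over the dependency lists. The sketch below is illustrative only; the function name and the dictionary encoding of dependency lists are assumptions, not the SmS interface.

```python
def creates_cycle(depends_on, concept, new_refs):
    """Return True if letting `concept` refer to the concepts in `new_refs`
    (e.g. by accepting a new rule mentioning them) would make the concept
    hierarchy circular."""
    # trial dependency lists with the proposed rule added
    trial = {c: set(refs) for c, refs in depends_on.items()}
    trial.setdefault(concept, set()).update(new_refs)

    def reaches(start, target, seen):
        if start == target:
            return True
        seen.add(start)
        return any(reaches(n, target, seen)
                   for n in trial.get(start, ()) if n not in seen)

    # a cycle exists iff some newly referenced concept can reach `concept`
    return any(reaches(r, concept, set()) for r in new_refs)
```

For the hierarchy of figure 6.2 (G depends on C1 and C2), a rule making C1 depend on G would be rejected, while a same-layer link from C2 to C1 would be accepted.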

The above discussion of update policies is clearly illustrated using a Venn diagram to represent concepts as sets. The Venn diagram in figure 6.3 is the set representation of the simple conceptual hierarchy of figure 6.2. Intersections between RDR concepts - shown as circles in a Venn diagram - are used to define higher order concepts. These intersections are dynamic during the knowledge acquisition process. Changes of concepts - i.e. shifts in the boundaries of the circles on a Venn diagram - cause changes in the concepts they define, which may then cover undesired cases

2 The use of dependency lists is also common in some implementations of spreadsheet programs to avoid circularity between cells.

(inconsistencies) (see figure 6.3). Considering figure 6.3, the change of concept C1 may correspond to the extension of its boundaries (as shown by the dotted boundary of C1), which clearly changes its intersection with C2. This intersection defines concept G.


Figure 6.3: Let concept G = C1 ∩ C2. When C1 is extended, past negative cases of G (near the edge of C1) become positive. Hence the new knowledge base is inconsistent with respect to those cases. An exception can be added to rectify this (e.g. the small dotted circle).

This change in G covers two cases (the two cases denoted by small '-' in figure 6.3) which are near its previous boundary. These cases are inconsistencies, and a secondary refinement is needed. Using the above policies, the expert has two options: s/he can rethink the initial extension to C1 and ensure that the new boundaries stay clear of the two discovered inconsistencies. Alternatively, s/he can enter an exception rule to concept G (the small dotted circle within C1 ∩ C2) to exclude the two discovered inconsistencies.

An interesting question to ask is: what happens if the expert does not wish to fully adhere to the presented policies? For example, what happens if the expert wants to introduce circular or recursive definitions? The conservative answer to this is that experts do not use circular definitions, and they seldom choose to use recursive ones. Further, any recursive definition can be given in another, non-recursive form. On the other hand, a more constructive answer is that more complexity can be introduced to the NRDR formalism to allow recursive and circular definitions at the concept level, while still forbidding circularity and recursion at the rule level. That is, in evaluating a particular rule, only the rule itself is to be avoided.

6.5. NRDR and default logic

In section 4.4 (chapter 4), we discussed the strong relationship between RDR and default theories. We also characterised default theories that can be mapped to RDR trees. In this section, we discuss the relationship between default theories and NRDR. We discuss how the policies presented in the previous section translate to constraints on default theories that can be mapped to NRDR. We will show that default theories which can be mapped to NRDR are less constrained than theories which can be mapped to RDR. That is, the set of default theories that can be mapped to NRDR is larger than the set of those that can be mapped to RDR.

6.5.1. Mapping NRDR to default theories

An NRDR knowledge base is a collection of interacting RDR trees. To convert an NRDR knowledge base K to a default theory (D, W), every single RDR tree in K is converted to a default theory according to the mapping presented in section 4.4. In doing this, every NRDR concept c is mapped to a set of prioritised defaults that have the same consequence β ∈ {c, ¬c}. Note that the conclusions of all RDR trees within an NRDR knowledge base are propositional values, therefore any consequence β has a propositional value (see definition 1 in section 4.1.1 for the semantics of rules in an RDR tree). The complete set of defaults resulting from K can be partitioned into a collection of subsets of defaults, where defaults in a given subset have the same consequence. The number of different consequences (or subsets) corresponds to the number of different concepts in K.

Priorities of defaults, generated during the mapping of every NRDR concept (see section 4.4), ensure mutual exclusion when applying defaults which have the same consequence. However, during inference, defaults with different consequences can apply simultaneously. Given a subset of defaults in D which have the same consequence, the default with the highest priority in this subset is applied first. Analysis of what is permitted in the inference process over D will be expanded in the next sub-section, where we analyse how the update policies and the hierarchical structure of an NRDR knowledge base translate into a number of constraints on the resultant default theory (D, W). This analysis will also allow a characterisation of a default theory which can actually be mapped to NRDR. We also sketch how such a default theory, which follows those constraints, can be mapped to an NRDR knowledge base.

6.5.2. Mapping default theories to NRDR

To make our analysis of which default theories can be mapped to NRDR succinct, we introduce the following definition:

Definition: Given a set of defaults D, a stratum of D is a proper subset of defaults S ⊂ D such that any two defaults in S have the same absolute consequence |β| but different prerequisites. Further, given that S is a stratum of D where defaults in S have absolute consequence |β|, then for all d ∈ D with absolute consequence |β|, we have d ∈ S.

Here two defaults d1 = α1:β1 / β1 and d2 = α2:β2 / β2 are said to have the same absolute consequence, denoted by |β1| or |β2|, if β1 = β2 or β1 = ¬β2.
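The stratum definition amounts to grouping defaults by absolute consequence. A small sketch follows, with defaults encoded as (prerequisite, consequence) pairs and negation written as a "~" prefix; both encodings are our own, for illustration.

```python
def absolute(beta):
    """|β|: identify a consequence with its negation (written "~β" here)."""
    return beta[1:] if beta.startswith("~") else beta

def strata(defaults):
    """Partition (prerequisite, consequence) defaults into strata keyed by
    absolute consequence; each stratum corresponds to one NRDR concept."""
    out = {}
    for prereq, beta in defaults:
        out.setdefault(absolute(beta), []).append((prereq, beta))
    return out
```

Defaults concluding c and ¬c land in the same stratum, matching the definition above.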

A default theory (D, W) mapped from an NRDR knowledge base K is a collection of strata where every stratum corresponds to a concept definition. Every stratum is a prioritised set of defaults which follow the constraints listed in section 4.4.3. Briefly, these are:

1. Every default must be a normal default (see section 4.4.1).
2. Consequences of all defaults are propositional values.
3. Every default d ∈ D must have a priority n.
4. Any two defaults in D with equal priorities must have mutually exclusive prerequisites.
5. For any two defaults d1 = α1:β1 / β1 and d2 = α2:β2 / β2: β1 is not required to compute α2. That is, if W ⊢ α2 then W \ {β1} ⊢ α2.
6. Exactly one default, d_default, has the lowest priority.

For defaults in D from different strata, constraint 4 does not apply; that is, defaults with the same priority may apply. This is completely consistent with the semantics of an NRDR knowledge base: an NRDR knowledge base can give more than one conclusion at any one time. Moreover, constraint 5 does not apply for defaults from different strata of D. In other words, given two defaults with different consequences d1 = α1:β1 / β1 and d2 = α2:β2 / β2, then β2 can be used to compute α1. However, a constraint, which corresponds to the prohibition of recursion and circularity of concepts in an NRDR knowledge base, applies to dependencies between defaults from different strata. To express this constraint, we introduce the notion of a dependency graph for a set of defaults:

Definition: A dependency graph G of a set of defaults D is a directed graph, where every default in D corresponds to a node in G, and if the application of d1 = α1:β1 / β1 is essential to apply d2 = α2:β2 / β2, that is, only W ∪ {β1} ⊢ α2, then there is a direct link from the node corresponding to d2 to the node corresponding to d1 in G.

[Figure 6.4 depicts an NRDR knowledge base as a collection of prioritised strata of defaults, where each prioritised set of defaults corresponds to one RDR tree (RDR tree 1, RDR tree 2, RDR tree 3).]

Figure 6.4. Conversion steps from a prioritised default theory D to an NRDR knowledge base K. Step 1 is grouping the defaults into a set of prioritised strata of defaults. Step 2 is converting each stratum to an RDR tree (as shown in figure 4.3 in chapter 4). The result is a collection of RDR trees forming the NRDR knowledge base K.

The constraint on defaults, which corresponds to the prohibition of recursion and circularity of concepts in an NRDR knowledge base, can then be expressed as follows:

For a default theory (D, W) to be convertible to an NRDR representation, the dependency graph G of D must be acyclic.

In summary, given a default theory (D, W) which follows conditions 1 to 6 for all its strata, and the above constraint on defaults from different strata, then (D, W) can be converted to an NRDR knowledge base. The conversion is as follows:

1. Group defaults into strata.
2. Apply the conversion of defaults to RDR trees, outlined in chapter 4, to each stratum.

The above translation is illustrated in figure 6.4.
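The two conversion steps, together with the acyclicity constraint, can be sketched as follows. The encoding of defaults as (prerequisites, consequence) pairs with "~" for negation is our own; the per-stratum conversion to an RDR tree (chapter 4) is left abstract.

```python
def absolute(beta):
    # |β|: identify a consequence with its negation (written "~β" here)
    return beta[1:] if beta.startswith("~") else beta

def to_nrdr_strata(defaults):
    """Step 1: group defaults into strata. Then check that the dependency
    graph between strata is acyclic, returning the strata in a bottom-up
    order; each returned stratum would be converted to one RDR tree."""
    groups = {}
    for prereqs, beta in defaults:
        groups.setdefault(absolute(beta), []).append((prereqs, beta))
    # a stratum depends on another if that stratum's consequence appears
    # among its prerequisites
    deps = {c: {absolute(p) for prereqs, _ in rules for p in prereqs
                if absolute(p) in groups}
            for c, rules in groups.items()}
    order, state = [], {}                    # state: 1 = visiting, 2 = done
    def visit(c):
        if state.get(c) == 1:
            raise ValueError("cyclic concept definitions: not convertible")
        if state.get(c) != 2:
            state[c] = 1
            for d in deps[c]:
                visit(d)
            state[c] = 2
            order.append(c)
    for c in groups:
        visit(c)
    return [(c, groups[c]) for c in order]
```

A cyclic theory, such as one where a depends on b and b on a, is rejected, which corresponds to the acyclicity constraint on the dependency graph G.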

In conclusion: the translation of RDR to a default theory discussed in chapter 4 does not allow using more than one default during inference, which in contrast is generally possible in a knowledge-based system implementing a default theory. RDR imposes the maximum possible restriction on interactions between defaults: no interactions between defaults are actually possible. While this eases maintenance, it decreases the inference power of the corresponding default theory. In this section, we discussed how a default theory can be mapped to an NRDR knowledge base. We highlighted restrictions on the corresponding defaults. Inference restrictions still applied, but these were much weaker than those imposed by the RDR formalism. For example, the total mutual exclusion imposed on defaults in RDR-equivalent default theories no longer applied. That is, much more powerful default theories can be mapped from (and to) an NRDR knowledge base. In other words, NRDR is a more powerful knowledge representation scheme than RDR. This last comment does not apply to the later developed multiple classification version of RDR inference, MCRDR, in (Compton, Ramadan et al. 1998; Richards and Compton 1999).

In the next section, we give an example of knowledge acquisition with NRDR. This illustrates the NRDR maintenance features and the associated maintenance policies.

6.6. NRDR Example: The eccentric film producer

We now give an example of a small knowledge acquisition session which results in inconsistencies. The knowledge acquisition task is to build a knowledge base to choose male models for an army scene in a Hollywood movie. The expert is an eccentric film producer who enjoys using expert systems in his work. The cases (candidates) are presented in table 6.1.

Case  Weight (kg)  Height (m)  Body fat (%)  Age (years)  Expert comment          Decision
1     90           1.75        40            21           Too heavy               Reject
2     60           1.90        3             26           Too lean                Reject
3     81           1.70        6             27           Too heavy and too lean  Accept
4     79           1.81        9.5           39           Too lean                Reject
5     80           1.60        9.8           25           Too heavy               Reject

Table 6.1: The cases presented to the expert.

The knowledge base starts with the default rule "If True then Accept". After meeting the first case, the expert enters the rule "If Too_heavy then Reject". He explains the term "Too_heavy" with the rule "If weight > 80 then Too_heavy". The expert also rejects case 2. He finds the candidate too skinny. He enters a new rule to the highest level concept "Accept". This rule is "If Too_lean then Reject". It gets attached to the false link of rule Accept.2. He explains the new concept "Too_lean": "If body fat < 7% then Too_lean". The expert accepts the third candidate although he is "Too_heavy" and "Too_lean", so he enters the exception rule "If Too_heavy and Too_lean then Accept". This last rule is attached to the true link of rule Accept.2. See figure 6.5.

[Figure 6.5 shows the three RDR trees of the knowledge base: the "Accept" concept RDR (default rule Accept.1 "If True then Accept", Accept.2 "If Too_heavy then Reject", Accept.3 "If Too_lean then Reject" and the exception Accept.4 "If Too_heavy and Too_lean then Accept"), the "Too_heavy" concept RDR ("if weight > 80 then Too_heavy") and the "Too_lean" concept RDR ("if body_fat < 7% then Too_lean", "if height > 1.8 and weight < 80 then Too_lean").]

Figure 6.5: An example of an NRDR knowledge base. Note that the addition of rule "If body fat > 9% then Too_heavy", to account for case 5 in table 6.1, causes case 4 to become an inconsistency.

The expert rejects case 4. However, the knowledge base accepts this case on the basis of the default rule, as the case is not Too_heavy (79 < 80) and it is not Too_lean (9.5 > 7). Because of this disagreement with the knowledge base, the expert must modify the knowledge to reject this case. He rejects this case on the basis that it is too lean (see the comments in table 6.1). To cover this case, the expert modifies the concept "Too_lean". He enters a new rule "If height > 1.8 and weight < 80 then Too_lean". With this new modification, the knowledge base now agrees with the expert, and it rejects this case on the basis of rule Accept.3 (rule Accept.3 = "if Too_lean then Reject") in the Accept RDR concept definition (note Reject = not Accept).

Finally, case 5 is rejected by the expert on the basis that it is too heavy. However, this case is accepted by the knowledge base. Therefore, the expert must modify the knowledge. He updates the concept "Too_heavy" to cover this case. He enters "If body fat > 9% then Too_heavy". Case 5 is now rejected by the knowledge base on the basis of rule Accept.2 (in agreement with the expert). This final rule "If body fat > 9% then Too_heavy" also covers case 4. Hence, case 4 now becomes accepted by the knowledge base (on the basis of rule Accept.4). Note that this case was earlier rejected by the expert, hence it becomes an inconsistency. This is detected by using all past seen cases, that is, cases 1 to 4. To overcome this inconsistency, the expert needs to apply a secondary refinement. So, using our policies from the previous section, he can do one of two things: he may rethink his change to concept "Too_heavy" which caused this inconsistency (e.g. using a limit of 9.7% for body fat instead of 9%). Alternatively, he can refine the higher order concept "Reject" by entering a new exception to rule Accept.4 (e.g. "If age > 35 then Reject").
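The final state of this example knowledge base, and the inconsistency of case 4, can be reproduced in a few lines. The RDR link structure is flattened into plain conditionals for illustration, and the attribute names are ours; the rule conditions are taken from table 6.1 and figure 6.5.

```python
def too_heavy(c):
    # "If weight > 80 then Too_heavy"; final refinement "If body fat > 9%"
    return c["weight"] > 80 or c["fat"] > 9

def too_lean(c):
    # "If body fat < 7% then Too_lean"; refinement added for case 4
    return c["fat"] < 7 or (c["height"] > 1.8 and c["weight"] < 80)

def accept(c):
    if too_heavy(c):
        return too_lean(c)   # exception Accept.4: Too_heavy and Too_lean
    if too_lean(c):
        return False         # Accept.3: "If Too_lean then Reject"
    return True              # default rule Accept.1: "If True then Accept"

cases = [
    {"weight": 90, "height": 1.75, "fat": 40.0},  # expert: Reject
    {"weight": 60, "height": 1.90, "fat": 3.0},   # expert: Reject
    {"weight": 81, "height": 1.70, "fat": 6.0},   # expert: Accept
    {"weight": 79, "height": 1.81, "fat": 9.5},   # expert: Reject
    {"weight": 80, "height": 1.60, "fat": 9.8},   # expert: Reject
]
expert = [False, False, True, False, False]
inconsistent = [i + 1 for i, c in enumerate(cases)
                if accept(c) != expert[i]]
# after the final "body fat > 9%" rule, case 4 is the only inconsistency
```

Replaying all past seen cases in this way is exactly the consistency check that flags case 4 for a secondary refinement.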

In summary, the knowledge acquisition process with Nested Ripple Down Rules as the underlying representation has three distinct features: firstly, it allows the expert to introduce new domain terms during the knowledge acquisition process. Secondly, these terms are operational while still incomplete, i.e. they are always open for amendments. Thirdly, the knowledge base is viewed as an interconnected whole, and multiple points of change are available during maintenance. These three features will be discussed from a philosophical perspective in the next section. We will argue that, because of the combination of these features, the knowledge acquisition process with NRDR as the underlying formalism has certain parallels to confirmation holism in the philosophy of science. This discussion will provide a philosophical background for our work. We will discuss the implicit assumptions adopted by the expert during the knowledge acquisition process. This discussion is an extension of philosophical observations which we expressed in (Beydoun and Hoffmann 1999; Beydoun and Hoffmann 1999).

6.7. A philosophical perspective on NRDR

Before we argue that the maintenance of NRDR has certain parallels to confirmation holism in philosophy of science, we first introduce the philosophical background that led to the idea of confirmation holism (Quine 1951).

6.7.1. Roots of confirmation holism in philosophy

Logical positivism emerged early this century as an important current in the philosophy of science, strongly promoted by the Vienna Circle. It maintains the view that we can precisely describe our (scientific) observations in a formalised language using atomic predicates for describing observations that can be stated without any doubt. From such basic propositions, we would be able to systematically derive scientific theories. The advantage of that approach would be that we could systematically analyse the formal derivation processes of generalisations and other

types of reasoning, so that we can get a better grip on the validity and invalidity of our scientific theories.

Later, severe problems with logical positivism were found which led eventually to its abandonment. A major problem was presented by dispositional predicates, i.e. predicates describing a disposition to behave somehow under certain circumstances. For example, the proposition My glasses are breakable is very problematic. It cannot be observed that these particular glasses are indeed breakable, unless I actually break them. In other words, for each pair of glasses I can never observe that they are breakable, unless I destroy them. Consequently, the predicate breakable can only be ascribed to a pair of glasses by means of projection. That is, by a generalisation from the observation of other pairs of glasses which have been broken. This problem left dispositional predicates inapplicable in a 'pure language' consisting only of undoubtedly observable predicates. Furthermore, it was demonstrated that not only those obviously dispositional predicates are problematic, but that essentially the same problem carries over to almost any predicate that we can think of. For whatever predicates we choose to describe our observations, there is already a projection involved. For example, if we describe cars by their age, colour, weight, brand, etc. we implicitly assume that all cars sharing the same description in our language are the same. So, we may not be able to differentiate those cars where the steering wheel was mounted on Monday morning between 9:30am and 10:00am from those where the steering wheel was mounted on a Tuesday, etc. As long as we can be certain that these differences are irrelevant, things are fine. In scientific endeavours, however, we can never be certain about that. Even more importantly, if we cannot get a grip on the speculations which we put into the design of our formal language (i.e. the collection and operationalisation of the basic predicates), then the very purpose of logical positivism, to give us a better grip on the speculative part of our scientific theories, must fail!

The problem of verification of atomic facts in logical atomism has been analysed by Quine (Quine 1951). He proposes that verification conditions cannot in general be taken seriously in isolation from the rest of one's beliefs about the world. He observes

that methods for testing scientific facts depend upon complex interactions between explanatory systems. That is, the whole body of scientific knowledge must be taken into account to execute reliable confirmation tests. This idea of holistic verification is known in philosophy as "confirmation holism". This is also advocated by other contemporary philosophers like Davidson (Davidson 1984) and Putnam (Putnam 1988). In the next section, we discuss how this idea of confirmation holism has certain parallels in the development of an NRDR knowledge base.

6.7.2. NRDR and confirmation holism

Every knowledge representation formalism used in Artificial Intelligence (AI) assumes a set of primitives which are used to describe domain instances. This atomism is inherent in the use of symbolic representation in AI. However, we need to highlight an important distinction between the knowledge representation scheme used and the actual knowledge acquisition process. The former is a static entity while the latter is a dynamic process that involves interactions between human experts and the abstraction of knowledge, and interactions among the abstract entities themselves. Indeed, it is between the latter in NRDR and the confirmation holism of developing scientific theories that we highlight parallels.

In our NRDR framework, we observe that maintaining the consistency of a knowledge base with respect to past seen cases during the actual knowledge acquisition process is a holistic process similar to that of confirmation holism. Our observation applies only to the middle abstraction layers of the knowledge base. Recall from section 6.4 that only in these middle layers may updates cause inconsistencies.

Confirmation holism has two important consequences with respect to the development and maintenance of scientific theories (Putnam 1988). These two consequences have two parallels in the maintenance process of NRDR. Firstly, for any change, the whole body of knowledge needs to be considered. Similarly in NRDR, all concepts that use a concept that is being refined have to be rechecked for consistency against past seen cases and may themselves undergo refinement. Secondly, the maintenance of scientific

theories offers multiple points of change during their development, or as Quine (Quine 1951) expresses it: "... Revision can strike anywhere". Part of scientific ingenuity is choosing the right knowledge change (e.g. Einstein taking the bold step of relating mass to velocity, rather than changing the initial definition of momentum defined as the product of mass and velocity (Putnam 1988)). Similarly, the NRDR knowledge base has a hierarchical structure which offers the expert multiple points of refinement.

The expert has to decide on the most suitable refinement point, and part of his/her expertise is indeed captured through such decisions. In particular, such decisions determine the relations between the different concepts that the expert introduces, and hence they partly reflect the expert's explicit domain model. In this respect, Putnam remarks that meaning depends on the individual person (Putnam 1988) (this notion is extensively discussed by Gadamer in (Gadamer 1993)). Capturing the expert's own conceptualisation is all that we require in NRDR. That is, we acknowledge that different experts have different views which translate into different domain conceptualisations (as expected from the discussions in (Putnam 1988; Gadamer 1993)). But we do not seek absolute meaning, and expert dependence is expected and perfectly acceptable.

In (Putnam 1988), Putnam develops the idea of holism to include the impact of environment and history on meaning. While this is convincing to us on the large scale of scientific theories which span generations of scientists, we believe that for small domains of expertise the impact of history and environment can be ignored. That is, we believe that boundaries of concepts are sufficiently stable throughout the development and (re)-use of the knowledge base.

So far, we have considered the philosophy of NRDR by looking at the development of the knowledge base. In the next section, we analyse the interactions at the interface between the expert and the knowledge base. We analyse implicit assumptions that s/he may make during the knowledge acquisition process. We argue that the way in which the expert makes these assumptions points to the need to adopt a holistic

verification process during KA, parallel to confirmation holism in developing scientific theory.

6.7.3. Implicit assumptions in knowledge acquisition (KA) with NRDR

As the expert developing an NRDR knowledge base invents new concepts, s/he makes implicit assumptions about any concepts that s/he introduces, in that s/he has a certain disposition about what these concepts mean and when they actually apply. Such assumptions are extended in terms of new rules added to the RDR trees representing these concepts. Alternatively, old assumptions are modified by adding exceptions to rules in the old representation. The expert expects his/her assumptions to hold for future unseen cases as much as they do for the already seen cases. This expectation translates into a predictive capability of the rules that s/he enters. Such an expectation that concepts hold for unseen cases - simply because those cases have features that apply to seen cases - implies a subtle inductive hypothesis used by the expert. This is described as projection by Goodman in (Goodman 1954). This raises the following question: while concepts are introduced in a particular context, how are they used outside the context of their introduction, i.e. how do we project these new predicates (NRDR concepts) beyond their initial context of use? This is answered in two parts: firstly, the projection of concepts is itself assumed incomplete and open for modifications to capture any new contexts. Secondly, an NRDR knowledge base is holistically maintained, i.e. other concept definitions are checked and may be modified when a single concept is being updated. We first expand the first part of the answer.

With respect to the incompleteness of concepts in NRDR: they are represented as RDR trees whose structure anticipates refinement. Thus NRDR concepts are considered fallible. They are operational while they are incomplete. This accommodates our human inability to define concepts absolutely, without reference to their context. This inability was discussed by early philosophers like Aristotle and more recent philosophers like Berkeley (Berkeley 1952)3. More recently, Wittgenstein

3 In (Berkeley, 1952), Berkeley uses the expression "abstract ideas" to express our contemporary notion of "concepts". In paragraph 13 (p.408), he argues for their imperfection and uses an example of

remarks in (Wittgenstein 1953) (paragraph 69): "... Is it only other people whom we cannot tell exactly what a game is? - But this is not ignorance. We do not know the boundaries because none have been drawn. To repeat, we can draw a boundary - for a special purpose. Does it take that to make the concept usable? Not at all!"

A last comment with respect to its holistic nature: an NRDR knowledge base is - as we discussed earlier - viewed as an interconnected set of concepts. NRDR concepts are defined as functions of the overlap of other concepts. This overlap is dynamic during the knowledge acquisition process; it changes as knowledge acquisition proceeds (e.g. see figure 6.3). So, the meaning of any dependent concept changes during the knowledge acquisition process. It is always a function of other concepts that in turn may depend on other concepts. Consequently, the meaning of a concept is a function of the state of the whole knowledge base.

It should be noted that our policies of maintenance and development (discussed in section 6.4) impose restrictions on the concepts on which a given concept may depend. These cannot be all other concepts. For example, the highest level concept may not be called by lower order concepts in the hierarchy. To be more accurate in drawing parallels between NRDR maintenance and confirmation holism in scientific theories, it is important to underline the molecular aspect of holism in NRDR. That is, at any single point of update, every concept will likely depend only on a number of other concepts, rather than on the whole knowledge base. For a characterisation of the different views of holism, the interested reader can refer to (Fodor and Lepore 1993).

6.8. Chapter summary

In this chapter, we presented an epistemological argument for our Nested RDR. We aligned this framework with observed human behaviour in general and experts in

"triangles" in a similar way to Wittgenstein's use of the example of "games" in (Wittgenstein, 1953), to show that we cannot be precise about the meaning of a word. In paragraph 15 (p.409), he puts this notion more explicitly by arguing for the impossibility of universality of "abstract ideas". This is his version of context dependency. Finally, in paragraph 25, he discusses how words not only communicate "abstract ideas", they also prepare the mind for certain dispositions, i.e. words also communicate the context of the "idea".

particular. We argued that, left to their own nature, human experts introduce intermediate concepts when articulating their knowledge. Further, we discussed that interactions between such intermediate concepts are natural to an expert. We then analysed how interactions between NRDR terms can create inconsistencies, and showed how an expert can efficiently handle any inconsistency by adopting development policies during the knowledge acquisition process. We then discussed the relation between NRDR and default theories, and compared it to the corresponding relation between RDR and default theories. We then illustrated the features and the development policies of NRDR with an example. Finally, we argued that the secondary refinements required to maintain an NRDR knowledge base are also necessary in the development of scientific theories, and in this respect NRDR is in concordance with confirmation holism.

In the next chapter, we continue to show that incremental refinement of the search knowledge base in SmS is possible without much effort on the part of the expert. The policies presented in this chapter ensure that a single inconsistency can be handled efficiently by the expert. In the next chapter, based on our formal framework of chapter 4, we show that inconsistencies are in fact infrequent during the knowledge acquisition process. Furthermore, we show that they can be handled automatically. However, as we will discuss, we prefer to keep the expert involved to oversee the semantics of any secondary refinements, by following the maintenance policies presented in this chapter.

Chapter 7

Formal Analysis of NRDR

In this chapter, on the basis of our formal framework presented in chapter 4, we analyse the relationship between the convergence of NRDR knowledge bases and the frequency of inconsistencies occurring. We show that as an NRDR knowledge base converges, the probability of inconsistencies occurring will diminish. This explains why in our application in search domains (in particular chess, to be described in the next chapter), we did not detect many inconsistencies. We also show in this chapter that secondary updates (see section 6.4) required to deal with inconsistencies can be automatically dealt with. This automatic method will be compared with Nested RDR development policies discussed in chapter 6.

This chapter is organised as follows: In section 7.1, we use notions from the formal framework presented in chapter 4 to describe conditions under which a case becomes inconsistent during an NRDR KB update. In section 7.2, based on the performance of the knowledge base on seen cases, we develop a statistical analysis to estimate the likelihood

of these conditions occurring. In section 7.3, we analyse these conditions using our formal framework, and we explain some past empirical results based on our analysis. In section 7.4, we discuss the impact of the order of presentation of cases to the expert on the convergence of the knowledge acquisition process. We also propose a new RDR maintenance algorithm (for individual NRDR concepts) to minimise any impact of that order. In section 7.5, we finally show that any inconsistencies can be dealt with automatically, without any extra burden on the expert.

7.1. Probabilistic analysis of inconsistencies in NRDR

Figure 7.1. C2 (RDR tree in smaller box on the right) is a concept of a lower order in the conceptual hierarchy formed by an NRDR KB. When a new rule is added to C2, and C2 is used in the definition of another concept C1 (RDR tree in the large box on the left), then a case can become inconsistent only if it belongs simultaneously to the domain of the newly added rule within C2 and the context of the rule using C2 within the definition of C1.

In this section, we extend the formal framework of chapter 4 to cover NRDR specific notions. We then discuss conditions under which a case x may become an inconsistency.

This will be later used to derive a relationship between the probability of inconsistencies occurring and the convergence of the knowledge base towards satisfactory accuracy.

We introduce the following NRDR specific definitions:

Definition 17: A case x is said to travel from rule r1 to rule r2, if a concept used in the RDR tree containing both r1 and r2 is updated, and we have: x ∈ scope(r1) and x ∉ scope(r2) before the update, and x ∈ scope(r2) and x ∉ scope(r1) after the update.

Definition 18: A case x is said to offend, if it travels from a rule r1 to a rule r2 and the conclusion of r2 differs from the conclusion of r1. That is, an offending case is an inconsistency.

Given P(x): D → [0,1], the distribution function over the domain of expertise D, we extend the above notions with the following:

Definition 19: The coverage of a rule r, coverage(r), is the probability that an arbitrary case from the total domain is in dom(r). That is:

coverage(r) = Σ_{x ∈ dom(r)} P(x)

The coverage of a rule is a key measure of the impact of the rule on the rest of the knowledge base.
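For a finite domain with an explicit case distribution, definition 19 can be computed directly. The following Python fragment is purely illustrative: the toy domain, the distribution P and the rule condition are invented for the example and are not part of the thesis framework.

```python
# Illustrative sketch of definition 19: coverage(r) is the probability mass
# of a rule's domain under a discrete case distribution P.

def coverage(rule_condition, distribution):
    """Sum P(x) over all cases x that satisfy the rule's condition."""
    return sum(p for x, p in distribution.items() if rule_condition(x))

# Toy domain: cases are integers 0..3 with an explicit distribution P.
P = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}

# A hypothetical rule whose domain is the even cases.
even_rule = lambda x: x % 2 == 0

print(coverage(even_rule, P))  # 0.4 + 0.2
```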

Observation 1: Given a rule r2 in a concept definition c2: if r2 uses a concept definition c1 which is being modified by adding a new rule r1, then a case x drawn randomly according to the distribution function P(x) can travel (see definition 17) only if it falls in the intersection of the scope of the new rule r1 and the context of r2.

For example, in figure 7.1, a new rule is added to the concept C2, defined as an RDR tree.

C2 is used in the condition of rule ru (a child of rule r1.2) in the RDR definition of C1. Only

cases in the scope of the new rule in C2 and in context(ru) can travel (see definition 17 earlier).

Clearly, not all objects in this intersection will offend. Some cases may travel to new rules where they are still correctly classified. Furthermore, not all cases within that intersection will actually travel. Hence, the upper bound can be further tightened by considering the probability of a travelling case simultaneously offending. To determine an upper bound estimate on this probability, we consider the distribution of the two classes in a binary classification RDR tree. In NRDR, this is the only relevant type of RDR trees1.

Observation 2: In an RDR tree for binary classification, rules within an exception level n have the same conclusion. Furthermore, these conclusions alternate within the exception hierarchy, i.e. the conclusion of a rule at a level n is the opposite of the conclusion at level n+1 (or n-1).
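Observation 2 can be made concrete with a minimal sketch of a binary-conclusion RDR tree. The class below is a hypothetical illustration, not the SmS implementation: each exception (true-link) child sits one exception level deeper, and its conclusion is the opposite of its parent's.

```python
# Minimal sketch of a binary-conclusion RDR tree (illustrative only).
# Conclusions alternate with exception depth, as in observation 2.

class RDRNode:
    def __init__(self, condition, conclusion, true_child=None, false_child=None):
        self.condition = condition        # predicate over a case
        self.conclusion = conclusion      # True/False for a binary concept
        self.true_child = true_child      # exception rule (one level deeper)
        self.false_child = false_child    # alternative at the same level

    def classify(self, case, last=None):
        if self.condition(case):
            last = self.conclusion
            if self.true_child:
                return self.true_child.classify(case, last)
            return last
        if self.false_child:
            return self.false_child.classify(case, last)
        return last

# Depth 0 concludes True; its exception at depth 1 concludes the opposite.
tree = RDRNode(lambda c: True, True,
               true_child=RDRNode(lambda c: c < 0, False))

print(tree.classify(5))   # True: the depth-1 exception does not fire
print(tree.classify(-1))  # False: the exception overrides its parent
```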

Figure 7.2. The change in the condition of a rule changes its domain. This change is denoted by the shaded region. By theorem 2, only half the cases in this shaded region can become inconsistent if the predictivity is > 4/9.

From observations 1 and 2, the following observation follows:

1 We can envisage a top level RDR tree with more than two conclusions. However, in our implementation of SmS 1.3, and in outlining the relationship between NRDR and default logic in the previous chapter, we talk of many top level RDRs, each defining a distinct concept (conclusion).

Observation 3: Given a rule r2 in a concept definition c2 with depth(r2) = n, where r2 uses a concept definition c1 that gets modified by the addition of a rule r1, we observe the following: A case x can offend only if it falls simultaneously in the domain of the new rule r1 and the context of r2, and in addition, x travels (see definition 17) to a rule at depth n + 2d + 1 within c1 (where d is an integer and d ≥ 0).

Observation 3 leads us to the following theorem:

Theorem 2: The probability of a travelling case x offending is < ½ as long as the predictivity p throughout the knowledge base has a lower bound of 4/9 (see figure 7.2).

Proof:

Figure 7.3: p + (1-p)² becomes larger than (1-p) + (1-p)³ when p exceeds 4/9.

There are two possible destinations for a travelling case c: Firstly, c can travel to a direct parent connected by a true link, i.e. at an exception depth d - 1; in this case c becomes an inconsistency (by observation 2). Secondly, c can travel (by definition 17) within the same exception level or to a deeper exception level. In this second scenario, c may or may not become an inconsistency, depending on the exception depth to which it travels.

By observation 2, a travelling case (by definition 17) does not become an inconsistency if it moves from the scope of a rule at depth d to the scope of a rule at an exception level n where n - d is even. We first find an expression for the probability of c not becoming an inconsistency.

The probability that c travels within the same exception level is < p (where p is the lower bound on the predictivity). The probability that c travels to an exception level n where n - d = 2 is (1-p)². The probability that c moves to an exception level n where n - d = 4 is (1-p)⁴, and so on. Thus the total probability P1 = P(a travelling case c does not become an inconsistency), i.e. it travels to a rule of the same conclusion, is:

P1 < p + (1-p)² + (1-p)⁴ + ... (1)

Similarly, the total probability P2 = P(a travelling case becomes an inconsistency), i.e. it offends by definition 18, is:

P2 < (1-p) + (1-p)³ + (1-p)⁵ + ... (2)

Every term in (1) is larger than the corresponding term in (2), except possibly the first. For the first terms, the result of the comparison depends on the value of p: p > 1-p iff p > ½. However, including the second terms in the comparison gives 4/9 (approximately 0.44) instead of ½. That is:

p + (1-p)² > (1-p) + (1-p)³ if p > 4/9 (see figure 7.3)

and finally, because P1 + P2 = 1 and P1 > P2 iff p > 4/9, theorem 2 follows. QED

We expect the lower bound condition in theorem 2, i.e. that the predictivity p > 4/9, to hold when we have an expert on hand. That is, we anticipate that an expert enters rules which classify correctly at least 4 out of every 9 cases in their observed domains. Thus theorem 2 tells us that, even when a case travels, it will most likely not become an inconsistency.
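The comparison underlying theorem 2 is easy to check numerically. The short script below evaluates the two truncated two-term bounds used in the proof; note that their exact crossing point is approximately 0.43, which the condition p > 4/9 (approximately 0.444) safely exceeds.

```python
# Numeric check of the truncated bounds compared in the proof of theorem 2:
# p + (1-p)**2  (travelling case does not offend)  versus
# (1-p) + (1-p)**3  (travelling case offends).

def no_offend_bound(p):
    return p + (1 - p) ** 2

def offend_bound(p):
    return (1 - p) + (1 - p) ** 3

# At the thesis threshold p = 4/9 the first bound already dominates ...
print(no_offend_bound(4 / 9) > offend_bound(4 / 9))   # True
# ... while below the true crossover (about 0.43) it does not.
print(no_offend_bound(0.42) > offend_bound(0.42))     # False
```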

By theorem 2 and observation 1, the probability that a case x, drawn randomly according to the distribution function P(x), becomes an inconsistency as a result of a single update r1 is bounded by:

½ P(x ∈ dom(r1)) P(x ∈ context(r2)) < ½ coverage(r1) (see definition 19)

In the next sub-section, we analyse how the above bound varies as concept definitions converge. We focus on the average of coverage(r1) as an RDR tree concept definition grows.

Expected coverage of rules as the KB grows

In a given RDR tree T, we denote the expected value of coverage(r) by Ψ (where r is an arbitrary rule). The upper bound on Ψ, i.e. the average of coverage(r) as the knowledge base develops, can be calculated by observing the exception hierarchy on a level by level basis. The probability for a case to belong to the first level false chain in the knowledge base is ≤ 1, and to belong to the second is ≤ 1-p (where p is the lower bound on the predictivity). Generally, the coverage of a rule on the nth exception level is ≤ (1-p)^(n-1). For a complete knowledge base of size St (where all exceptions are handled) and a coverage ratio R (see definition 12 in chapter 4), we have:

Ψ = R [1 + (1-p) + (1-p)² + ... + (1-p)^(n-1)] / St

In a growing knowledge base, as new rules are added, the above denominator increases much faster than the corresponding numerator. So, substituting the current size S of the knowledge base in the above denominator, the following upper bound on the average probability that a case belongs to the domain of a rule is obtained (excluding the default rule)²:

R [1 + (1-p) + (1-p)² + ... + (1-p)^(n-1)] / S < R [1 + (1-p)/p] / S

The above result is independent of the shape or the depth of the RDR tree. For a typical RDR tree with varying values of p, the lowest value of p can be taken to derive the upper bound. For most domains of expertise where the correctness principle holds and there are only binary conclusions, we can safely assume that p > ½. That is, rules entered by the expert classify more than half the seen cases correctly. This is possible for experts in most binary domains (this assumption fails in domains discussed earlier in chapter 4, e.g. domains similar to the parity function). In the above expression, as S increases, the numerator hardly increases. Hence, the above upper bound decreases as the RDR tree matures. More importantly, the probability of inconsistencies occurring in an NRDR knowledge base decreases as the knowledge base matures. The relationship between the occurrence of inconsistencies and RDR tree convergence will be further analysed in section 7.3, by looking at how the coverage of newly added rules changes as an RDR tree converges.
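The geometric-sum inequality used in this bound, namely that (1-p) + (1-p)² + ... + (1-p)^(n-1) never exceeds (1-p)/p, can be confirmed with a short numeric check; the values of p and n below are arbitrary illustrative choices.

```python
# Sanity check: the finite sum 1 + (1-p) + ... + (1-p)**(n-1) is bounded by
# 1 + (1-p)/p for any depth n, so R * [1 + (1-p)/p] / S bounds the average.

def finite_sum(p, n):
    return sum((1 - p) ** k for k in range(n))

def closed_bound(p):
    return 1 + (1 - p) / p

for p in (0.5, 0.6, 0.9):
    for n in (1, 5, 50):
        assert finite_sum(p, n) <= closed_bound(p) + 1e-12

print(closed_bound(0.5))  # 2.0: at p = 1/2 the sum can never exceed 2
```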

In the next section, we show how the predictivity and coverage of a given rule can be estimated using both statistical methods and the performance of the knowledge base on past seen cases.

7.2. Statistical analysis of NRDR concepts

In this section, we discuss how we can estimate the incremental accuracy due to a new rule being added, based on the knowledge base performance on past seen cases. This is presented in the context of developing a statistical estimate of the knowledge base error, which in turn can be used as a criterion to stop the incremental knowledge acquisition process. The statistical analysis in this section appears in (Beydoun & Hoffmann 2000).

2 (1-p) + (1-p)² + ... + (1-p)^(n-1) < (1-p)/p (because 1-p < 1)

When do we stop the KA process with RDR?

Clearly, the most expensive resource in developing an RDR-based knowledge base is keeping the expert on-line during its use. Hence, it is desirable to free up this resource by terminating the knowledge acquisition process as soon as this is feasible. In what follows, we give guidelines for the decision to terminate the KA process based on a trade-off between the cost of keeping the expert on-line and an error tolerance in the knowledge base performance. For example, a knowledge base with 99.9% accuracy would be assumed effective enough in most domains, and the cost of keeping the expert would be hard to justify from an economic perspective.

In what follows, we develop a statistical estimate of the knowledge base error, which in turn can be used as a criterion to stop the knowledge acquisition process.

Theorem 3: Given an error tolerance e for an RDR tree T, after classifying m cases correctly, the probability that the error of T is less than e is given by: P(Error of T < e) > (1-e)^m

Proof: The probability that the error of T is less than e is greater than (1-e) for every correctly classified case. T's performance is independent of the order of these m cases. Thus, theorem 3 follows. QED

Theorem 3 gives a confidence measure for a test that classifies m cases correctly. When a classification error by the knowledge base occurs, the expert modifies T by adding a new rule r.

From theorem 3, the following corollary immediately follows:

Corollary 1: Given an upper bound δ on the probability that a given error estimate e of the knowledge base is exceeded, after the knowledge base classifies m cases correctly we have: e ≤ 1 - δ^(1/m)
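As a worked instance of corollary 1, the bound 1 - δ^(1/m) can be computed directly; the confidence level and case count below are illustrative values, not thesis data.

```python
# Worked instance of corollary 1: with confidence 1 - delta, after m correct
# classifications, the knowledge base error does not exceed 1 - delta**(1/m).

def error_bound(delta, m):
    return 1 - delta ** (1 / m)

# After 1000 correctly classified cases, with 95% confidence (delta = 0.05),
# the error bound is roughly 0.3%.
print(error_bound(0.05, 1000))
```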

Following the correct classification of n cases, we use corollary 1 to calculate an upper bound on the error of the knowledge base Ek after the addition of a new rule r. Towards this, we make the following observation:

Observation 4: Following the addition of r, the error of the knowledge base Ek is given by: Ek = Error due to the new rule r + Error due to the rest of the knowledge base without r

Given n correctly classified test cases, m1 cases will be correctly classified by the new rule r, and m2 cases by the rest of the knowledge base without r. That is, m2 = n - m1. The component of the knowledge base error due to a new rule r depends on whether or not an arbitrary case c is actually classified by r, that is, on coverage(r). Thus, the contribution of the new rule r to the error rate of the knowledge base is:

coverage(r) * Er (where Er is the error rate of r)

Thus, observation 4 can be rewritten as follows:

Observation 5: Following the addition of r, the error of the knowledge base Ek is given by: Ek = coverage(r) * Er + (1 - coverage(r)) * Error due to the rest of the KB

To estimate coverage(r), we use the ratio Q of cases classified by r to all seen cases. That is, an estimate of coverage(r) is Q = m1/n.

In expressing Ek in terms of the performance of the knowledge base on the n correctly classified seen cases, two questions must be considered: firstly, how well do the n correctly classified seen cases reflect the real accuracy of the knowledge base; secondly, how well does the ratio Q estimate coverage(r). These two questions are answered using statistical theory, as normally applied in machine learning.

Statistical theory (e.g. (Walpole and Myers 1989; Mitchell 1997)) tells us that with N% confidence, the true probability that a case c belongs to the domain of r lies in the interval:

Q - ZN σ < coverage(r) < Q + ZN σ

An N% confidence interval for some parameter p is an interval that is expected with probability N% to contain p (Mitchell 1997). Our parameter of concern is coverage(r), estimated by Q = m1/n. ZN defines the width of the smallest interval about the mean that includes N% of the total probability mass under the bell-shaped Normal distribution. σ is the standard deviation over the sample of n cases. However, this is not available to us. In the above, the Normal distribution approximates the binomial distribution. This is a common technique in machine learning when the sample size exceeds 30 (Mitchell 1997). Thus, we can use the standard deviation expression for the binomial distribution, that is:

σ = sqrt(Q(1-Q)/n)

The above interval gives the limits for two-sided N% confidence intervals. However, as we are interested in the upper bound of coverage(r), we prefer instead the single-sided limit. The symmetry of the Normal distribution bell curve is used, and we assert with M% (where M = N + (100-N)/2) confidence that:

Q + ZM sqrt(Q(1-Q)/n) > coverage(r) (1)

Similarly, we develop an upper bound for [1 - coverage(r)], the probability that a case belongs to the domain of a rule other than the new rule r, based on the lower bound of coverage(r). That is, the upper bound of [1 - coverage(r)] is given by:

1 - (Q - ZM sqrt(Q(1-Q)/n)) > [1 - coverage(r)] (2)

3 For a discussion of how these limits are derived, refer e.g., to chapter 5 in (Mitchell, 1997).

Thus, by observation 5 and substitution of expressions (1) and (2), an upper bound for the error of the knowledge base Ek following the correct classification of n cases is given by:

Ek < [Q + ZM sqrt(Q(1-Q)/n)] Er + [1 - (Q - ZM sqrt(Q(1-Q)/n))] * Error due to rest of KB
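This combined bound can be sketched as a single function. All numeric inputs below (Q, n, ZM and the two component error rates) are invented for illustration and do not come from the thesis experiments.

```python
# Sketch of the combined bound on the knowledge base error Ek after adding a
# rule r: a one-sided confidence bound on coverage(r) weights the error of r,
# and its complement weights the error of the rest of the KB.
from math import sqrt

def kb_error_bound(Q, n, z, err_rule, err_rest):
    margin = z * sqrt(Q * (1 - Q) / n)
    return (Q + margin) * err_rule + (1 - (Q - margin)) * err_rest

# Hypothetical numbers: 50 of 500 seen cases classified by r (Q = 0.1),
# z = 1.64 (about 95% one-sided confidence), Er = 5%, rest-of-KB error = 1%.
bound = kb_error_bound(0.1, 500, 1.64, 0.05, 0.01)
print(bound)
```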

Given n cases correctly classified, to estimate the knowledge base error Ek to a degree of certainty M, we use corollary 1 to bound the error due to the rest of the knowledge base.

In the statistical analysis of this section, we showed how to approximate the error of an RDR tree by considering a sequence of correctly classified cases. In doing this, we estimated an upper bound on the coverage and the error of a new rule r. This error can easily be used to obtain predictivity(r) using the following: predictivity(r) = 1 - error(r)

This predictivity can in turn be used to derive an expression for the upper bound on the probability that an arbitrary case c ∈ scope(r):

P(c ∈ scope(r)) = predictivity(r) * coverage(r) (see definition 11 in chapter 4)

With respect to the knowledge acquisition process, all these statistical estimations can be calculated in the background, and the results can be reported as the knowledge acquisition proceeds.

As an example of how the statistical analysis discussed can be used, let's consider a sequence of n instances, which dictated the progress of the knowledge acquisition process:

I1 I2 I3 ... IM1 ... IM2 ... IM3 ... IMs ... In

Highlighted instances in the given sequence are misclassified instances which cause a knowledge base modification. In the above sequence, s modifications are shown. That is, the expert added s rules to the RDR tree, after observing and correcting its performance on that sequence. In our above analysis, cases correctly classified following each modification are used to measure the effectiveness of the knowledge base and of the last modification undertaken by the expert. For example, the last part of the sequence IMs ... In is used to evaluate both the effectiveness of the last (s-th) update and the error in the knowledge base at that point. However, following every modification, adjacent subsequences of test cases can be merged to evaluate the effectiveness of the corresponding knowledge base modifications combined. This provides a more accurate estimate of the whole knowledge base error with an improved confidence level.

In section 7.1, we discussed how the average of coverage(r), over all rules in an NRDR concept, is expected to decrease as the knowledge base develops. In the next section, we discuss how the actual probability coverage(r) for every newly added rule r is expected to change as the knowledge base develops. We use this analysis to develop a quantitative relation between knowledge base convergence and the frequency of inconsistencies. Some recent empirical studies in (Compton, Preston et al. 1995) will be examined in light of our current analysis.

7.3. Knowledge base convergence and inconsistencies

During its development process, an RDR tree shows two types of inaccuracies: false positives, which include cases incorrectly classified by the default rule; and false negatives, which include cases incorrectly classified by expert entered rules because of their imperfect predictivity (these include errors by exceptions to existing exception rules). False positives cause the addition of new false links on the outer chain. False negatives cause the addition of new exception rules to the existing expert entered rules. Given the coverage ratio, the target initial granularity and the current structure of the knowledge base, we can obtain a probabilistic estimate of both types of inaccuracies as a function of rule predictivity.

Figure 7.4. Coverage of newly added rules shrinks as KB size increases (y-axis: coverage of newly added rules; x-axis: size of knowledge base)

Completion (see definition 9 in chapter 4) of rules on the first false link increases the accuracy of the knowledge base by the predictivity p. Completion of rules at exception level n increases knowledge base accuracy by p^n. The addition of a rule at a level n-1 is 1/(1-p) times more likely than the addition of a rule at level n. This is simply because a case x, drawn according to the distribution function P(x) and belonging to the domain of a target rule at level n-1, is 1/(1-p) times more likely to be encountered during a KA session than a case belonging to the domain of a rule on level n (see definition 11 of predictivity in chapter 4). So RDR tree development occurs in a breadth first manner, where false branches develop faster than true branches. As interactive knowledge acquisition proceeds, an expert will be adding more exception rules than false links on the outer exception chains. Consequently, most errors will be false negatives as the knowledge base develops. Hence, the coverage of newly added rules decreases rapidly as their depth increases (as described in the previous section). So, as the knowledge base develops, we expect the coverage of newly added rules to decrease rapidly, as seen in figure 7.4.
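The breadth-first growth argument can be illustrated numerically: with predictivity lower bound p, the chance of encountering a case for a missing rule at exception level n scales like (1-p)^(n-1), so each extra exception level is 1/(1-p) times rarer. The value p = 0.5 below is an arbitrary illustrative choice.

```python
# Relative likelihood of meeting a case for a rule at exception level n,
# proportional to (1 - p)**(n - 1); consecutive levels differ by 1/(1 - p).

def level_weight(p, n):
    return (1 - p) ** (n - 1)

p = 0.5
ratios = [level_weight(p, n) / level_weight(p, n + 1) for n in range(1, 4)]
print(ratios)  # [2.0, 2.0, 2.0]: each level is 1/(1 - p) = 2 times rarer
```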

As a Ripple Down Rule tree develops, the accuracy increments of newly added rules to the knowledge base performance decrease, because the coverage of these new rules becomes smaller (see figure 7.4). The number of added rules per seen case also diminishes, because the KB classifies most cases correctly, and more cases are required to trigger the event of a rule addition. Plotting the number of rules added per seen case would yield a graph with a sharper rise and flatter top, as shown in figure 7.7. In (Compton and Jansen 1990; Compton, Preston et al. 1995; Kang, Gambetta et al. 1996; Kang, Compton et al. 1998), Compton et al. empirically observed this relationship. Results in (Compton, Preston et al. 1995) are reproduced4 in figures 7.5 and 7.6. These figures show the convergence of RDR knowledge bases in three domains: chess endgames, TicTacToe and medical diagnosis. All three domains of expertise show a similar convergence behaviour. That is, the shapes of their convergence functions are similar. Hence, we can see that the convergence relationship is domain independent. This same convergence was derived in our theoretical analysis. Most importantly, our analysis predicts what these results show. That is, in a given domain of expertise, greater expertise yields a sharper convergence (greater expertise is correlated with a higher predictivity).

As we discussed previously, as the knowledge base develops, the coverage of newly added rules decreases exponentially with depth. So, with respect to Nested Ripple Down Rules, we expect the probability of inconsistencies occurring to decrease exponentially, as shown in figure 7.7. This is also observed from the upper bound derived in the previous section tightening as the knowledge base grows.

4 I am grateful to Professor Compton for granting his permission to reproduce the graphs shown in figures 7.5 and 7.6.

Figure 7.5. Knowledge base size and accuracy for KBS built for the Chess (above) and Medical diagnosis (below) domains (x-axes: percentage of total cases; y-axes: number of rules and % error; curves shown for most, medium and least expertise)

We end this section by noting the following: the depth of a conceptual hierarchy expressed by a human expert is limited by cognitive strain (Sowa 1984). Typical values for this depth are 4 to 5 (Beydoun and Hoffmann 1997). Indirect dependencies between concepts in the conceptual hierarchy have minimal effect on the knowledge base consistency and are ignored in our current analysis. However, we note that the probability of an arbitrary case becoming inconsistent as a result of such an indirect dependency is bounded by the probability of a case belonging simultaneously to the domains of a chain of rules. This probability is inversely exponential in the length of the chain and, for our current purposes, too small to be considered.

Figure 7.6. Knowledge base size and accuracy for KBS built for the TicTacToe domain (x-axis: percentage of total cases; y-axes: number of rules and % error; curves shown for most, medium and least expertise)

The analysis in this section and the empirical results discussed support the validity of our notion of the correctness principle (see chapter 4). This is the notion that the predictivity has a lower bound which, for a given domain, is mainly dependent on the level of expertise. By its definition, this notion is independent of the order of presentation of cases to the expert during the knowledge acquisition process, because the lower bound is a global value which is independent of any particular rule. That is, if the correctness principle holds for a given order of presentation of cases during the knowledge acquisition process, it will then hold for any other order of presentation. However, as we will discuss in the

next section, the order of presentation of cases may impact the rules entered by the expert. This may cause the predictivity and the granularity of rules to fluctuate and slow down the convergence of the knowledge acquisition process. To avoid this, we propose an improvement to the algorithm for updating NRDR concepts (binary conclusion RDR trees). This algorithm is less influenced by the order of presentation of cases, and it also leads to a faster convergence of the knowledge acquisition process.

Figure 7.7. Inconsistency frequency and knowledge base performance: as the KB grows, the accuracy increments of newly added rules to the KB decrease. As the KB size increases, the coverage of the rules decreases rapidly. The probability of past cases becoming inconsistent becomes close to 0. Indeed, most of the inconsistencies occur in the early stages of developing an RDR concept. (Axes: KB accuracy and probability of inconsistencies against KB size in number of rules.)

7.4. Order dependence and maintenance of NRDR concepts

Different orders of presentation of cases to the expert during the knowledge acquisition process may cause him/her to enter different rules. In this section, we analyse when and how this occurs. Based on this analysis, we propose a new update process for maintaining NRDR concepts, which are defined as binary conclusion RDR trees. This new update maintenance process is more sophisticated than the known RDR maintenance process; however, we argue that it is less dependent on the order of presentation of cases, and moreover it leads to a faster convergence.

The RDR development methodology has features which are clearly independent of the order of presentation of cases. These features are highlighted in sub-section 7.4.1. The methodology has other aspects which are order dependent. These are highlighted in sub-section 7.4.2, and later analysed in sub-section 7.4.3. There, we also discuss the impact of order dependence, and we develop an RDR development methodology which preserves the order independent aspects of the existing RDR development methodology and, as shown by our analysis, is less order dependent.

7.4.1. RDR development order-independent features

As discussed earlier, an RDR tree starts with a default rule of the form "if true then default conclusion". The first case c1 misclassified by this default rule causes the first interaction with the expert, and this initiates the interactive knowledge acquisition process. The first rule r1 entered by the expert classifies c1 correctly, and if r1 misclassifies any of the previously seen cases then the expert enters exception rules to r1. Conditions of r1 are based on the difference list between c1 and the corner stone case of the default rule. As the default rule and its corner stone case are fixed prior to the start of the knowledge acquisition process, during the KA process the conditions of r1 depend solely on c1. Similarly, during the KA process, the conditions of all rules on the first exception level depend solely on their respective corner stone cases. Hence, the following observation immediately follows:

Observation 6: The order of presentation of corner stone cases of rules on the first exception level does not impact the expert's choice of conditions for these rules.

Indeed, observation 6 can be generalised as follows:

Observation 7: In an RDR tree, given a fixed parent rule rp with an exception link leading to a false link chain of exception rules to rp: expert chosen conditions of rules in the false link chain are independent of the order in which the expert enters the rules in this false link chain.

[Figure 7.8 diagram: a chain of exception rules: the fixed rule P with exception X, which has exception Y, which has exception Z, connected by if-true links; diagram not reproduced.]

Figure 7.8. Exception rules and order of case presentation: the order of presentation of the corner stone cases of rules X and Z affects the KA process; however, the order of presentation of the corner stone cases of rules Y and X does not.

In figure 7.8, rule P is fixed and rules X, Y and Z are entered by the expert. Rules P and Y have the same conclusion, and rules X and Z have the same conclusion (see observation 2). Before rule X is added, cases in the scope of rule Y would be correctly classified by the parent of X, that is rule P. Those cases (in the scope of Y) can only be misclassified after a child for P (e.g. X) is added. That is, the expert has a reason to add Y only after a child for P (e.g. X) is added. Hence, changing the order of the corner stone cases of X and Y does not change the conditions chosen by the expert for rules X or Y. This can be generalised to the following observation:

Observation 8: Given two corner stone cases c1 and c2 of two rules at exception levels n and n+1 respectively, in a binary conclusion RDR tree: swapping the order of presentation of these two cases to the expert does not impact his/her chosen conditions for their respective rules.
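To make the rippling behaviour assumed by observations 6 to 8 concrete, the following minimal sketch shows a binary conclusion RDR tree with a default rule and a chain of exceptions. The code, the `Rule` class and the attribute names are ours, purely for illustration; they are not the thesis's implementation:

```python
# Minimal sketch of a binary-conclusion RDR tree (illustrative names only).

class Rule:
    def __init__(self, cond, conclusion):
        self.cond = cond              # predicate over a case
        self.conclusion = conclusion
        self.true_child = None        # exception (if-true) link
        self.false_child = None       # alternative (if-false) link

def classify(rule, case):
    """Return the conclusion of the last satisfied rule on the rippling path."""
    result = None
    while rule is not None:
        if rule.cond(case):
            result = rule.conclusion
            rule = rule.true_child    # look for a refining exception
        else:
            rule = rule.false_child
    return result

# Default rule "if true then not-C", with a first exception r1 whose
# condition depends only on its own corner stone case.
root = Rule(lambda c: True, "not-C")
r1 = Rule(lambda c: c["a"], "C")
root.true_child = r1
```

A case satisfying r1's condition ripples from the default rule to r1 and receives r1's conclusion; all other cases keep the default conclusion.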

7.4.2. RDR and order-dependence

We have so far looked at the resilience of the RDR development methodology against some particular changes in the order of the presentation of cases to the expert. We now examine how other changes of order of presentation can impact the development of binary conclusion RDR trees, e.g. NRDR concepts. For the example shown in figure 7.8, we denote the corner stone cases of P, X, Y and Z by cp, cx, cy and cz respectively. These cases appeared to the expert in this order: cp, cx, cy then cz. The expert enters conditions for X based on the difference list between case cx and cp. If cz, the corner stone case of Z, is seen by the expert before cx, then the conditions of the exception (child) of P would be based on the difference list between cp and cz. Hence, the conditions of the child of P depend on the order in which cases are presented.

[Figure 7.9 diagram: four Venn-style panels (A, B, C and D) depicting possible relations between domain(Pcz) and domain(Pcx); diagram not reproduced.]

Figure 7.9: Possible relations between hypothetical rules resulting from two different orders of presentation (this figure follows figure 7.8). Pcz is a hypothetical rule based on cz and Pcx is a rule based on cx (that is, X in figure 7.8). Note that domain(Pcz) is depicted by the dotted ellipse. Four possibilities exist: A: domain(Pcz) ⊂ domain(Pcx); in B, C and D, domain(Pcz) covers parts of the domain not covered by domain(Pcx).

Observation 9: Given two corner stone cases c1 and c2 of two distinct rules, at exception levels n and n+2 respectively in an exception chain in a binary conclusion RDR tree: the order of presentation of these two cases to the expert determines the conditions of their respective rules.

In reference to figure 7.8, the interesting question is: how would the condition of the child rule of P vary if cz (the corner stone case of Z) is presented to the expert before cx (the corner stone case of X)? Indeed, it is this variation that underpins the impact of the order of presentation of cases to the expert on the conditions that s/he chooses for her/his rules. In the next sub-section, we address that question and we propose an RDR development methodology which minimises some of the negative aspects of order dependence, whilst preserving the order-independent features of the existing methodology.

7.4.3. Towards order independent RDR development

It must be noted that the possibilities shown in figure 7.9 are exhaustive. For instance, it is not possible to have domain(Pcz) ∩ domain(Pcx) = ∅, as we know that cz ∈ domain(Pcz) ∩ domain(Pcx). In what follows, we discuss the impact of adding Pcx instead of Pcz on the knowledge base in each of those four possibilities. We then propose a new update algorithm which minimises any impact that the difference in the order of presenting cz and cx may have on the accuracy of the knowledge base. Moreover, our new algorithm will guarantee a quicker convergence of the knowledge acquisition process.

Clearly, the current discussion (illustrated in figure 7.9) applies only to exception branches of depth at least 2. We analyse specific features of each of the four possibilities shown in figure 7.9. For every possibility, we propose a new expert update action which decreases the order dependence of the development of a binary RDR tree (an NRDR concept) when a rule of depth at least 2 misfires:

In the first possibility, where domain(Pcz) ⊂ domain(Pcx) (figure 7.9A), no classification power is lost by adding Pcx instead of Pcz. That is, all cases classified by Pcz are also classified by Pcx in the same way. Moreover, by adding Pcz (in addition to Pcx) to the RDR tree, no classification power is gained: Pcx covers at least all cases covered by Pcz. In this scenario, we propose to use cz as a corner stone case only for an exception of Y. That is, the normal RDR update algorithm applies in this case.

In the second, third and fourth possibilities (figure 7.9 B, C and D), domain(Pcz) covers parts of the expertise domain which are not covered by domain(Pcx). That is, some extra classification power is available in Pcz which is not available in Pcx alone. Therefore, we propose to attach Pcz to the false link immediately following Pcx. This has the effect of classifying cases in [domain(Pcz) \ domain(Pcx)] by Pcz, which captures the extra classification power of Pcz. Adding Pcz to the false link immediately following Pcx does not lead to correct classification of cz, as cz ∈ domain(Pcx). cz is an exception of Y and this still needs to be dealt with. This can be done by using cz as a corner stone case only for an exception rule of Y, as in the normal RDR update algorithm.

Two notes should be made in reference to modifying the RDR update algorithm as outlined above: firstly, adding an exception for rule Y based on cz requires applying the comparison test shown in figure 7.9 recursively if the chain of exceptions that includes the misfired rule is of depth larger than 2. This is when Y has exceptions for its exceptions. Secondly, for any case misclassified by a rule of exception depth 1, the normal process of adding rules in RDR remains the same. Only when a case is misclassified by a rule of depth 2 or larger does the possibility of the scenario described in observation 9 arise.
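The decision step of the proposed update can be sketched as follows. This is an illustrative sketch, not the thesis's code: the domains of the hypothetical rules Pcz and Pcx are approximated here by finite sets of already-seen cases, and the returned action labels are invented names.

```python
# Sketch of the proposed update when a rule at exception depth >= 2 misfires.

def place_new_rule(dom_cz, dom_cx):
    """Return the additions to make for the new corner stone case c_z.

    dom_cz / dom_cx: sets of cases covered by the hypothetical rules
    P_cz and P_cx respectively.
    """
    additions = ["exception_of_Y"]          # c_z always yields an exception of Y
    if not dom_cz.issubset(dom_cz & dom_cx):
        # Possibilities B, C and D of figure 7.9: P_cz covers cases that
        # P_cx does not, so P_cz is also attached to the false link
        # immediately following P_cx to capture the extra coverage.
        additions.append("false_link_after_Pcx")
    return additions
```

In possibility A (domain(Pcz) contained in domain(Pcx)) only the normal update applies; in the other three possibilities, a second rule is attached to the false link following Pcx.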

An exception rule at depth n+2 in the same exception branch has a smaller domain than its grandparent at depth n. At the deeper exception level, expert conditions are more restrictive: the context at depth n+2 narrows down his/her options. In our modification of the RDR update algorithm, outlined above, we account for the extra domain coverage which the classical RDR update algorithm overlooks. Regardless of which rule misfires when a case is misclassified by an NRDR concept, we allow a rule to be added at the first applicable exception depth. This is the first exception depth which corresponds to its class (depth 1 or 2, depending on the class of the misclassified case). If the depth of the misfiring rule is at least 2, then the possibility of adding an exception of an exception arises. The domain of the added rule is then compared to the domains of its antecedent rules in the false link chain. Thus, given a single RDR tree being updated, every update may lead to the addition of more than one rule due to the same misclassification. In most cases, this approach will yield faster convergence of NRDR concepts. Looking at the whole NRDR knowledge base, following the addition of every single rule, a consistency check is still required.

The most important aspect of our proposed update algorithm is that the order dependence, as outlined in observation 9, will be minimised. In our proposal, each misclassified case may lead to the addition of a rule at the outermost corresponding level (1 or 2 depending on the class of the case). This guarantees that the first two exception levels are at least order independent (see observation 7).

[Figure 7.10 diagram: the rule "if b and C1 then C2" appears within the decision list of a higher order concept; C1 is defined by its own decision list, and possibly other concepts are used to define C1. Diagram not reproduced.]

Figure 7.10. The rule "if b and C1 then C2" can be rewritten as "if b and (y and C1) then C2", where (y and C1) is equivalent to C1 before C1 was updated and caused an inconsistency in the domain of "if b and C1 then C2".

In the next section, we will prove that fixing inconsistencies can be done automatically during the actual knowledge acquisition process without asking the expert for further interventions (this was termed secondary refinements in chapter 6). We will discuss how this compares to the NRDR expert-driven development policies discussed in chapter 6.

7.5. NRDR maintenance

In this section, we show that given an update in a concept c1 which creates inconsistencies because c1 is being used in conditions of a higher order concept, the undesired impact of the update of c1 can be automatically eliminated. This is done by adding a correction term y to the condition of the rule using c1 (see figure 7.10). We show that this y can be automatically derived without involving the human expert.

Ripple Down Rule trees defining concepts in an NRDR knowledge base can easily be converted to decision lists. An RDR tree without exceptions is actually a decision list. Decision lists are simpler structures than RDR; they are easier to reason about and manipulate. We use decision lists (DL) to demonstrate the derivation of the y term (see figure 7.10). Methods or results obtained using DLs can easily be mapped back to NRDR. Before we discuss the derivation of y, we discuss the relationship between RDR concepts in an NRDR knowledge base and decision lists. This discussion is largely based on (Scheffer 1996).

7.5.1. Nested Ripple Down Rules and Decision Lists

A decision list (Rivest 1987) is a list of rules. In a decision list, a rule applies if none of its predecessors apply, i.e. the first applicable rule classifies the input. Scheffer (Scheffer 1996) discussed techniques for converting RDR into decision lists. Exception rules (true branches) overshadowing their parent rules are removed and placed in front of their parents. The condition of a removed rule becomes the conjunction of its previous condition with its parent's condition. Obviously, an RDR not containing any exceptions is a decision list. In the following, we discuss interacting decision lists, because NRDR is a collection of interacting RDRs. Note that for a rule r within a decision list, definitions 6 to 8 in chapter 4 still apply. Further, because rules within a DL do not have any exceptions, scope = domain for all rules.
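The conversion just described can be sketched as follows. This is our illustrative rendering of the idea from (Scheffer 1996), not code from the thesis; the class and function names are invented:

```python
# Sketch of the RDR-to-decision-list conversion: exceptions are placed in
# front of their parents, with conditions conjoined with the parent's.

class Rule:
    def __init__(self, cond, conclusion, true_child=None, false_child=None):
        self.cond = cond                    # predicate over a case
        self.conclusion = conclusion
        self.true_child = true_child        # exception (overshadows this rule)
        self.false_child = false_child      # next alternative rule

def to_decision_list(rule, inherited=()):
    if rule is None:
        return []
    conds = inherited + (rule.cond,)
    dl = to_decision_list(rule.true_child, conds)        # exceptions first
    dl.append((conds, rule.conclusion))                  # then the parent rule
    dl += to_decision_list(rule.false_child, inherited)  # then false-link rules
    return dl

def dl_classify(dl, case):
    for conds, conclusion in dl:
        if all(c(case) for c in conds):     # first applicable rule classifies
            return conclusion
    return None

# Default rule with one exception, which itself has an exception.
tree = Rule(lambda c: True, "not-C",
            true_child=Rule(lambda c: c["a"], "C",
                            true_child=Rule(lambda c: c["b"], "not-C")))
```

Classifying with the flattened list gives the same conclusions as rippling through the original tree.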

To convert NRDRs to decision lists, we introduce Nested Decision Lists (NDL). In an NDL, corresponding to NRDR, a condition in a given list can be defined and calculated as a lower order decision list. A Nested RDR knowledge base representing a high level concept c can be represented as:

[cond1 ➔ conc1, cond2 ➔ conc2, cond3 ➔ conc3, ..., condn ➔ concn, default rule]

In the NDL representation, conditions are conjunctions of lower order concepts, which are in turn defined in the form of decision lists.
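A minimal sketch of such a nested evaluation follows. The concept names, attributes and encoding (a concept as a list of condition/conclusion pairs ending in a default rule) are ours, for illustration only:

```python
# Sketch of a Nested Decision List: a condition of a higher order concept
# may call a lower order concept, which is itself a decision list.

concepts = {
    "safe":      [(lambda c: c["guarded"], True),
                  (lambda c: True, False)],                      # default rule
    "good_move": [(lambda c: eval_concept("safe", c) and c["active"], True),
                  (lambda c: True, False)],                      # default rule
}

def eval_concept(name, case):
    """The first applicable rule in the concept's decision list classifies."""
    for cond, conclusion in concepts[name]:
        if cond(case):
            return conclusion
```

Evaluating "good_move" recursively evaluates the lower order concept "safe" on the same case.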

7.5.2. Handling inconsistencies in NRDR

In section 4.1 of chapter 4, we noted that the context and domain of rules remain stable during maintenance of RDRs. In fact, this stability eases their maintenance. In Nested RDR, this stability no longer exists. With respect to the decision list representation, consider a concept c represented by the following decision list:

[cond1 ➔ conc1, cond2 ➔ conc2, cond3 ➔ conc3, ..., condn ➔ concn]

If the expert updates a concept definition within the conjunction expression of a rule ri = condi ➔ conci for some i, then scope(ri) changes. This change corresponds to the set of cases that were initially correctly classified by ri but ceased to be correctly classified because of the new update. These cases travel to the domain of the first applicable rule rj for some j where j > i. Another source of change of the context may come from intercepting5 cases from the context of antecedent rules. This change in rule contexts may create inconsistencies. The set of corner stone cases constituting an encountered subset of the scope of a suspect rule - one which has in its conditions a concept which is being modified - is retested to detect inconsistencies. By observation 1

5 A rule r is said to intercept a case x following an update if: before the update x ∈ domain(r2), where r2 follows r in the list, and following the update we have x ∈ domain(r), so the case never reaches r2.

of section 7.1, only those cases in common between the context of the suspect rule and the domain of the new rule need to be rechecked.
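This restricted recheck can be sketched as follows (an illustrative sketch under our own encoding: the context of the suspect rule and the domain of the new rule are finite collections of cases, and `classify` is the knowledge base after the update):

```python
# Sketch of the consistency check of observation 1 (section 7.1): only
# corner stone cases lying both in the context of a suspect rule and in
# the domain of the newly added rule are re-tested; a case is inconsistent
# if its classification changed.

def find_inconsistencies(suspect_context, new_rule_domain, old_labels, classify):
    to_recheck = [case for case in suspect_context if case in new_rule_domain]
    return [case for case in to_recheck if classify(case) != old_labels[case]]
```

Cases outside the new rule's domain cannot have changed classification and are skipped.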

In what follows, we prove that these inconsistencies can be dealt with automatically: new rules are automatically added to stop potentially offending cases from travelling. This is opposed to the expert adding extra rules to correctly classify offending cases, where s/he creates new rules whose conclusion is the same as the class of the offending cases. This was discussed in chapter 6 and demonstrated in the example of "The eccentric film producer" (section 6.5). The expert-driven fix of inconsistencies must follow the policies discussed in chapter 6 (section 6.4). This expert intervention will be formalised in the context of representing RDR concepts as decision lists, and it will be compared to the automatic method.

7.5.2.1. Automatic fix of inconsistencies

We first discuss the formal framework for the automatic method. In this method, if a concept c1 is used to define another concept c2, and if a change in c1 causes inconsistencies within c2, then to undo the side effects of the update on c2, new terms are automatically added to c2. These new terms stop any cases from travelling (see figure 7.10). This automatic method is based on the following theorem:

Theorem 4: Let a concept c1 get updated by adding a rule ru = cond1,i ➔ conc1,i, to become:

c1 = [cond1,1 ➔ conc1,1, ..., cond1,i ➔ conc1,i, ..., cond1,n ➔ conc1,n]

If c1 is in the condition of a rule r2,x = cond2,x ➔ conc2,x within a concept c2 given by:

c2 = [cond2,1 ➔ conc2,1, ..., cond2,x ➔ conc2,x, ..., cond2,m ➔ conc2,m]

Then, to shield c2 from any impact caused by the update of c1, we can rewrite c2 as (bold font indicates new additions):

c2 = [cond2,1 ➔ conc2,1, ..., cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., cond1,i ∧ cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x), ¬cond1,i ∧ cond2,x ➔ conc2,x, ..., cond2,m ➔ conc2,m], where

f(conc1,i, conc2,x) = c2 if (conc1,i = c1 and conc2,x = c2) ∨ (conc1,i = ¬c1 and conc2,x = ¬c2)

= ¬c2, otherwise

We prove the above theorem using the following observations:

Observation 10: If within a decision list L the condition of a rule r is always false, r will never be used to classify any cases, and hence such an r can be deleted from L without changing its semantics.

Observation 11: If within a decision list L two consecutive rules r1 and r2 have mutually exclusive conditions, their positions can be swapped without altering the semantics of L.

Observation 12: In a decision list L, making replicates of a rule r does not change the semantics of the decision list, as long as the replicates follow r in order of their appearance in L.

Observation 13: A rule r = cond ➔ conc can be rewritten as:

(X ∧ cond ∨ ¬X ∧ cond) ➔ conc, where X is an arbitrary boolean expression. That expression can be rewritten within a decision list as two mutually exclusive consecutive rules:

[..., X ∧ cond ➔ conc, ¬X ∧ cond ➔ conc, ...]

Observation 14: Given a concept c1:

c1 = [cond1,1 ➔ conc1,1, cond1,2 ➔ conc1,2, ..., cond1,n ➔ conc1,n]

And given a rule r = c1 ➔ conc2,x within the list defining a concept c2, then within c2, r can be rewritten as:

[..., cond1,1 ➔ f(conc1,1, conc2,x), cond1,2 ➔ f(conc1,2, conc2,x), ..., cond1,n ➔ f(conc1,n, conc2,x), ...]

The conclusion of r is reached if c1 is true. The sign of this conclusion of r depends on both the rule conclusion conc2,x and the conclusion of the satisfied rule in the definition of c1.

Hence, f(conc1,i, conc2,x) = c2 if (conc1,i = c1 and conc2,x = c2)

∨ (conc1,i = ¬c1 and conc2,x = ¬c2)

= ¬c2, otherwise
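The sign-combination function f can be transcribed directly. In the sketch below (ours, purely illustrative), conclusions are encoded as the strings "c1"/"not-c1" and "c2"/"not-c2":

```python
# The function f of observation 14 / theorem 4: the conclusion for c2 is
# positive exactly when the signs of the satisfied rule in c1 and of the
# rule r_2x agree, and negative otherwise.

def f(conc_1, conc_2):
    positive_1 = (conc_1 == "c1")      # satisfied rule in c1 concluded c1
    positive_2 = (conc_2 == "c2")      # rule r_2x concluded c2
    return "c2" if positive_1 == positive_2 else "not-c2"
```

The two positive-yielding combinations are exactly the two disjuncts in the definition above.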

To prove theorem 4, we need to show that the initial rule cond2,x ➔ conc2,x in the definition of c2, which uses c1 before ru = cond1,i ➔ conc1,i was added to c1, is equivalent to the following sub-list of rules substituted within the adjusted c2 (as stated in theorem 4):

cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., cond1,i ∧ cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x), ¬cond1,i ∧ cond2,x ➔ conc2,x (1)

Using observation 14, the initial cond2,x ➔ conc2,x can be rewritten as:

cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), cond1,2 ∧ cond2,x \ {c1} ➔ f(conc1,2, conc2,x), ..., cond1,n ∧ cond2,x \ {c1} ➔ f(conc1,n, conc2,x) (2)

We now show that (2) is equivalent to (1). Using observation 14, and noting that the last term in (1), ¬cond1,i ∧ cond2,x ➔ conc2,x, uses the modified c1, we rewrite (1) as follows (i.e. substituting cond2,x with its actual decision list definition):

cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., cond1,i ∧ cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x), ¬cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., ¬cond1,i ∧ cond1,i ∧ cond2,x \ {c1} ➔ f(conc1,i, conc2,x), ..., ¬cond1,i ∧ cond1,n ∧ cond2,x \ {c1} ➔ f(conc1,n, conc2,x)

By the definition of a decision list, ¬cond1,i can be deleted from all terms after the point of insertion of cond1,i, to get:

cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., cond1,i ∧ cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x), ¬cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., False ➔ f(conc1,i, conc2,x), ..., cond1,n ∧ cond2,x \ {c1} ➔ f(conc1,n, conc2,x)

By observation 10, we delete the rule with the False condition and the above becomes:

cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., cond1,i ∧ cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x), ¬cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., ¬cond1,i ∧ cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x), cond1,i+1 ∧ cond2,x \ {c1} ➔ f(conc1,i+1, conc2,x), ..., cond1,n ∧ cond2,x \ {c1} ➔ f(conc1,n, conc2,x)

Note that any two rules containing complementary conditions are mutually exclusive, so the first i-1 rules above are mutually exclusive from the second i-1 rules. Using observation 11, we can therefore rewrite the above in pairs as follows:

cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ¬cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., cond1,i ∧ cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x), ¬cond1,i ∧ cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x), cond1,i+1 ∧ cond2,x \ {c1} ➔ f(conc1,i+1, conc2,x), ..., cond1,n ∧ cond2,x \ {c1} ➔ f(conc1,n, conc2,x)

Using observation 13, the above can be rewritten as:

cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x), cond1,i+1 ∧ cond2,x \ {c1} ➔ f(conc1,i+1, conc2,x), ..., cond1,n ∧ cond2,x \ {c1} ➔ f(conc1,n, conc2,x)

The above is exactly equivalent to (2); and hence (1) is equivalent to (2) and theorem 4 is proved. QED

During maintenance of an NRDR knowledge base, any conversion done due to theorem 4 is automatic. That is, it is possible to automatically - without involving the expert - generate the adjustment expression:

cond1,i ∧ cond1,1 ∧ cond2,x \ {c1} ➔ f(conc1,1, conc2,x), ..., cond1,i ∧ cond1,i-1 ∧ cond2,x \ {c1} ➔ f(conc1,i-1, conc2,x)

As we discussed in section 7.1, the expected number of inconsistencies is extremely low. We would expect most of the rules in the adjustment expression given by theorem 4 to have an empty scope and to be automatically deleted (see observation 10). Note also that if a concept c1 occurs in more than one rule within c2, the above conversion is only needed where inconsistencies are detected. However, unless every occurrence of c1 is converted according to theorem 4, the check for inconsistencies needs to be propagated up the conceptual hierarchy in an NRDR knowledge base.
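Generating the adjustment expression can be sketched symbolically as follows. The encoding of rules as (condition, conclusion) string pairs and all names are ours, for illustration only:

```python
# Sketch of generating the shield terms of theorem 4: when a rule is
# inserted into c1 at position i, one adjustment term is produced per
# rule preceding the insertion point.

def shield_terms(c1_rules, i, cond_2x="cond2x"):
    new_cond = c1_rules[i][0]            # cond_{1,i} of the newly added rule
    terms = []
    for cond_k, conc_k in c1_rules[:i]:
        # cond_{1,i} AND cond_{1,k} AND cond_{2,x}\{c1} -> f(conc_{1,k}, conc_{2,x})
        terms.append((f"{new_cond} and {cond_k} and {cond_2x}",
                      f"f({conc_k}, conc2x)"))
    return terms
```

Most of these terms would be expected to have an empty scope in practice and be pruned by observation 10.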

7.5.2.2. Reviewing the Expert Fixing the Inconsistencies

When involving the expert in dealing with inconsistencies, detecting them first (as in the automatic method) is required. As discussed in chapter 6, the process of checking for inconsistencies when a concept c is modified requires access to all cases previously classified by the RDR defining c. These cases are classified again. A case x is inconsistent if its new classification differs from the old classification.

Once inconsistencies are detected, the expert is asked to enter rules to classify them correctly, as in the example shown in chapter 6. During the discovery of x, because of the nested structure of the RDRs, some lower order concept descriptions are found about x. To repair the inconsistency of the knowledge base with respect to x, some of those lower order descriptions may need to be updated. This may in turn cause more inconsistencies to occur. Hence, the process of checking inconsistencies may also be recursive. However, this process is guaranteed to terminate because of our policies for developing an NRDR knowledge base. The policies ensure that the expert finally reaches the highest order concept in the NRDR hierarchy when dealing with inconsistencies. These policies were discussed in detail in chapter 6.

Case classifications within every level in the conceptual hierarchy are preserved. The expert enters rules whose conditions are satisfied by the inconsistent cases and by none of the cases in the context of their precedent rules. In the context of the decision list representation of NRDR concepts, consider an RDR represented by the following decision list: [a ➔ c, b ∧ d ➔ ¬c, e ∧ f ➔ c, default rule]. Assume that a change in a concept a caused some cases from scope(a ➔ c) to travel down to the second rule, so they became ¬c instead of c. Assume also that some cases from scope(b ∧ d ➔ ¬c) were intercepted by the first rule, and they became part of scope(a ➔ c). For the intercepted cases, the expert can add an exception rule r1 = a ∧ extra_cond ➔ ¬c before a ➔ c. For the first group of inconsistencies, an exception rule r2 = b ∧ d ∧ extra_cond2 ➔ c can be added before b ∧ d ➔ ¬c. So the new concept definition of c becomes:

[a ∧ extra_cond ➔ ¬c, a ➔ c, b ∧ d ∧ extra_cond2 ➔ c, b ∧ d ➔ ¬c, e ∧ f ➔ c, default rule]

Note that r1 and r2 may each be a sequence of distinct rules. That is, more than a single rule may be needed to deal with inconsistencies caused by modifying concept a. The important fact about r1 and r2 is that their conditions are satisfied by all inconsistent cases, but they are not satisfied by any cases in the context of the rules which follow (this condition is also part of the RDR development process, e.g. see (Compton, Edwards et al. 1991; Gaines 1991)).
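The repaired decision list above can be made concrete as follows. The attribute names and the encoding of rules as (condition, conclusion) pairs are ours, purely for illustration:

```python
# The repaired decision list of the example: r1 placed before "a -> c" and
# r2 placed before "b and d -> not-c"; the first applicable rule classifies.

def classify(dl, case):
    for cond, conclusion in dl:
        if cond(case):
            return conclusion

repaired = [
    (lambda c: c["a"] and c["extra"],             "not-c"),  # r1
    (lambda c: c["a"],                            "c"),
    (lambda c: c["b"] and c["d"] and c["extra2"], "c"),      # r2
    (lambda c: c["b"] and c["d"],                 "not-c"),
    (lambda c: c["e"] and c["f"],                 "c"),
    (lambda c: True,                              "default conclusion"),
]
```

An intercepted case (satisfying a and extra_cond) is now caught by r1 before it reaches a ➔ c, and a travelled case (satisfying b, d and extra_cond2) is caught by r2 before b ∧ d ➔ ¬c.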

In fixing any inconsistencies, the expert must start from the lowest order concepts and work her/his way to the top of the conceptual hierarchy. When c is modified, any rules using c in higher order concepts need a similar consistency check. However, this propagation of updates up the concept hierarchy is guaranteed to stop, as the depth of the hierarchy is limited by the cognitive process of humans when verbalising their thought processes (Sowa 1984). In experiments in the domain of chess (Beydoun and Hoffmann 1997), the conceptual hierarchy was 4 to 5 levels deep. In those experiments, the expert was responsible for fixing any inconsistencies while he maintained medium size knowledge bases (Beydoun and Hoffmann 1997).

As opposed to the expert fixing the inconsistencies, the automatic method does not require the context check to propagate all the way to the entry point concept of the knowledge base. This is because in the automatic method, the impact of any update causing inconsistencies is limited to the concept being updated. We recognise that concepts are not used in the same way to define other concepts, i.e. they do not contribute equally to the semantics of other concepts. This is ignored by the automatic update method. In this method, it is possible to preserve the semantics of the knowledge base by having an expert oversee each automatic update. S/he can ensure that the correction term y does not violate any semantics, and s/he enters her/his own rules if s/he sees any semantic violation.

It is also possible to use a hybrid approach to increase efficiency without losing semantics, whereby at lower layers of the hierarchy, where the reuse of small concepts is more likely, we use the automatic shielding method to limit the upward propagation of updates; at higher layers, where concepts cover a larger number of cases, we insist on the expert intervening. Moreover, where efficiency is required, the depth of propagation of the inconsistency check can be artificially limited.

7.6. Chapter summary and conclusion

In this chapter, we analysed the interactions between concepts in an NRDR knowledge base. We have shown that the probability of these interactions causing inconsistencies is indeed small. Further, this probability diminishes as the knowledge base becomes more accurate. The convergence of an NRDR knowledge base presumes convergence of individual RDR trees. We have shown that RDR trees converge as long as theorem 1 of chapter 4 applies. This is so as long as the predictivity of individual rules in the knowledge base is bounded by some fixed constant (we called this condition the correctness principle in chapter 4).

The convergence relationships derived in this chapter are independent of the order of the presentation of cases to the expert, as the assumptions embodied in the correctness principle are order independent. However, to ensure that the order of presentation of cases (to the expert) does not slow the convergence of RDR trees, we have proposed in this chapter a new update mechanism which is less order dependent and which also guarantees faster convergence.

In this chapter, we also showed that secondary updates, required to deal with inconsistencies, can be handled automatically. These secondary updates can be seen as the extra cost of developing NRDR knowledge bases instead of RDR knowledge bases. As these do not necessarily require expert intervention, the cost of developing an NRDR knowledge base, in terms of the number of interactions with an expert, is at most equal to that of developing an RDR knowledge base. However, let us consider the number of rippling paths available in an RDR/MCRDR knowledge base and an NRDR knowledge base of the same size. We argued in chapter 3 (section 3.5) that the number of rippling paths is considerably larger in an NRDR knowledge base (in NRDR this is exponential in the number of concepts). A rippling path determines the context of a rule. Clearly, the larger the number of rippling paths, the larger the number of cases that can be covered by the knowledge base. In other words, given two distinct knowledge bases of the same size, an NRDR knowledge base and an MCRDR knowledge base, the NRDR knowledge base will cover many more contexts. As we showed in this chapter, the cost of adding a rule in an NRDR knowledge base can be equivalent to that of adding a rule in an RDR/MCRDR knowledge base. Therefore, because NRDR rules cover many more contexts and cases than standard RDR, we expect faster convergence of an NRDR knowledge base than of an RDR/MCRDR knowledge base6.

In the next chapter, we employ NRDR to capture human search knowledge in the domain of chess. This will show that NRDR is easy for the expert to use. To be able to employ NRDR in SmS, our workbench requires a short knowledge engineering phase. This will also be discussed in Chapter 8.

6 This discussion does not take into account inference MCRDR, which allows repeated inference over the same case. This is a very recent development, e.g. see (Compton 1998; Richards 1999).

Chapter 8

A case study: NRDR and SmS in chess

In this chapter, we employ our system SmS 1.3 to capture and operationalise expert chess knowledge. This will show that our Nested Ripple Down Rules framework is an economical and effective framework for building an expert search knowledge base. The ease of the knowledge acquisition is an empirical complement to the theoretical analysis of the past chapters. As we will see, inconsistencies do not pose major problems for the expert. The multi-point modification aspect of updating a Nested Ripple Down Rules knowledge base will also prove natural for the expert to deal with. This chapter gives details of how the knowledge acquisition environment criteria discussed in chapter 5 are accommodated. This provides empirical evidence for the validity and reusability of SmS.

Section 8.1 of this chapter discusses the knowledge engineering steps required to apply SmS to the domain of chess. Sections 8.2 to 8.4 give a demonstration of actual knowledge acquisition sessions conducted with SmS. They highlight specific features of Nested RDR. We also show the effectiveness of these knowledge acquisition sessions by demonstrating the strongly pruned search conducted by the system in the domain of chess. In section 8.5, we conclude by outlining the contributions of SmS in chess to the chess computing community.

8.1. Tuning SmS to the domain of chess

In chapter 5, we discussed that the expert, in expressing his/her knowledge, may use explanatory primitives, and that these are embedded in a more general interaction language SKIL (Search Knowledge Interaction Language). SKIL is first described in this section; a discussion and a description of the explanatory primitives specific to the domain of chess follow. These primitives adapt our interaction language SKIL for use in the chess domain.

8.1.1 Search Knowledge Interaction Language (SKIL) overview

From a system design (software engineering) perspective, SKIL allows an easy knowledge engineering (KE) stage. It is a reusable component of the system. Adding explanatory domain primitives tunes it to a given domain.

SKIL adds a new abstraction layer between the expert and the cases that s/he explains. This plays two roles: firstly, it condenses the knowledge base by allowing expressive statements. Secondly, it gives an interface to the working memory. SKIL provides an extra parameter with every primitive indicating whether the primitive describes the current case (the current search state) or a search state within the working memory. During the knowledge acquisition process, SKIL statements can refer to both the current case and the ongoing search process stored in the working memory (e.g. in chess this would be the calculated chess game in progress). During search, these statements refer to possible search paths of an ongoing search being examined by the search engine. In this way, the human reasoning explained by the expert in the knowledge acquisition process will be regenerated and examined by the knowledge base during the search process.

SKIL allows existential and universal quantification of generic variables used in the explanatory primitives1. Conjunctions and negations of these primitives are also possible. SKIL constructs are perfectly transferable across domains. The actual explanatory primitives are non-transferable because they are domain dependent. A domain expert is required for their construction during the initial knowledge engineering stage. An example of a set of such primitives in the domain of chess is discussed in the next section.

An important strength of NRDR, as discussed earlier, is that it allows the expert to define his/her own concepts in terms of explanatory domain primitives or other previously defined concepts. In defining a concept X (as an RDR tree) in SKIL, the expert can give a parameter list to the concept. The use of this parameter list is flexible. Initially, the expert may give a general rule condition leaving some parameters unbound. In later refinements of the rule, s/he may enter new, more specific rules with bound parameters. Also, in SKIL, there is a special time stamp parameter to indicate the point of application of the concept in the search path. For example, if the expert states X(-1), then s/he is referring to the previous search state in the search path. Concepts evaluated while visiting a particular search path are stored in the working memory. These may be reused later when a reference to visited search states, such as X(-1), is made. This bypasses the use of the knowledge base. Clearly, this decreases the number of times the search engine queries the knowledge base and allows a more efficient search. This latter functionality of the working memory, storing recent past inferences, is similar to the function of the short term memory of human experts during search. During the knowledge acquisition process, the expert is given complete access to the actual search progress stored in the working memory as s/he gives his/her explanations.
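The caching role of the working memory described above can be sketched as follows. This is an illustrative sketch only; the class, method and attribute names are ours, not SmS's implementation:

```python
# Sketch of working-memory caching: a concept evaluated on a visited search
# state is stored, so a later reference such as X(-1) is answered without
# querying the knowledge base again.

class WorkingMemory:
    def __init__(self):
        self.path = []      # search states on the current search path
        self.cache = {}     # (concept name, state index) -> stored evaluation

    def push(self, state):
        self.path.append(state)

    def eval_concept(self, name, offset, kb_evaluate):
        """offset 0 is the current state; -1 the previous one, and so on."""
        index = len(self.path) - 1 + offset
        key = (name, index)
        if key not in self.cache:                 # query the KB only once
            self.cache[key] = kb_evaluate(self.path[index])
        return self.cache[key]
```

A second reference to X(-1) on the same path is served from the cache, mirroring the short term memory role described above.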

In normal RDR, the expert chooses a list of relevant differences from the so-called difference list. This difference list is the finite set of differences between the cornerstone case of the parent rule and the cornerstone case of the newly added rule (see

¹ In chess, the search operators constitute the move generator. Chess is a well-defined domain: given a starting position, the move generator generates all possible legal chess positions. Thus there is no discussion of the search operators in chess.

chapter 3). In normal RDR, this difference list is easily generated because every case is represented as a vector of propositional valued attributes, and these same attributes are used as conditions in new rules. In using Nested RDR for search domains, cases are also represented as propositional valued attribute vectors. However, conditions of NRDR rules in search domains are relations between the vector features, expressed in SKIL. So, a finite difference list is not feasible in our framework. Instead, the expert is presented with a visual representation of the cornerstone case of the parent rule and the current case causing the addition of a rule. Thus, s/he can ensure that his/her chosen conditions apply to the current case without applying to the cornerstone case of the parent rule. If s/he makes a mistake in this regard, the interface will warn him/her.
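Since conditions are relations rather than attribute values, the interface can only verify a chosen condition against the two cases instead of enumerating a difference list. A hedged sketch of that check follows; the case encoding and all names are invented for illustration.

```python
# Hypothetical sketch: a valid refinement condition must apply to the
# current case but not to the cornerstone case of the parent rule.
def separates(condition, current_case, cornerstone_case):
    return condition(current_case) and not condition(cornerstone_case)

# Toy cases; the condition plays the role of a SKIL relation.
cornerstone = {"captured": None}
current = {"captured": "P"}
cond = lambda case: case["captured"] == "P"   # plays the role of "Captured = P"

if separates(cond, current, cornerstone):
    print("condition accepted")
else:
    print("warning: condition also applies to the cornerstone case")
```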

The explanatory primitives themselves can be declared in reference to the current search state, or to any past search state in the search progress stored in the working memory. As an illustrative example, we discuss the implementation of explanatory domain primitives in the domain of chess.

8.1.2. Adapting SKIL to chess

Explanatory primitives define basic relations between elements (components) of a given search state. Any such relation can also be defined incrementally by an expert as an NRDR concept during the knowledge acquisition process. For some relations, it is clear that their C implementation is simpler than their corresponding SKIL implementation in NRDR. These are defined as explanatory primitives at the outset of the knowledge engineering stage.

For other relations, it is less clear whether they should be implemented as C procedures or NRDR concepts. In deciding, two factors are considered: first, how simple it would be to program the relation as a primitive (any implementation that requires more than a single C loop is not considered simple); second, how frequently a relation (concept) is used. Only concepts (relations) used in a large percentage of expert rules are considered for a C implementation (as explanatory primitives). Of these, only simple concepts are defined as primitives. All others are defined as NRDR concepts.

In the domain of chess, we classify explanatory primitives into four categories:

Variable declaration and comparison primitives: There are a number of types as which a variable can be declared. A variable X can range over pieces, ranks, files, colours, or particular squares. Note that in chess a rank is a horizontal row of squares on the chess board; there are 8 ranks, numbered 1 to 8. A file is a vertical row of squares on the chess board; there are 8 files, ranging from a to h. See examples in table 8.2.

These types are very weak types in that they do not impose any constraints on what can be said by the expert. They simply provide an easy interface for the expert. On a symbolic level, there is only one generic type managed by the stack used by the SKIL language interpreter. The weak types correspond to groupings of generic objects of this generic type. Clearly, specification of these types is simple. Hence, there are no constraints on the number of types that can be specified. For example, in chess a single object on the SKIL variables stack corresponds to a single square on the board. For a variable of type rank, the eight corresponding squares are pushed on the stack. For a variable P of type piece, all squares holding a piece P are pushed on the stack.
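The stack behaviour described above might look as follows. This sketch assumes squares are the generic objects, and all names are invented for the example.

```python
# Hypothetical sketch: every SKIL variable is a group of generic objects
# (squares) pushed on one stack; a weak "type" is just the grouping rule
# used to select the squares.
FILES = "abcdefgh"

def squares_for(var_type, value, board):
    if var_type == "square":
        return [value]
    if var_type == "rank":       # e.g. rank 5 -> squares a5..h5
        return [f + str(value) for f in FILES]
    if var_type == "piece":      # all squares holding the given piece
        return [sq for sq, p in board.items() if p == value]
    raise ValueError("unknown weak type: " + var_type)

board = {"a1": "R", "e3": "P", "h7": "P"}
stack = []
stack.append(squares_for("rank", 5, board))     # a variable of type rank
stack.append(squares_for("piece", "P", board))  # a variable P of type piece
print(stack[-1])   # ['e3', 'h7']
```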

Tactical primitives: These essentially describe how many legal moves are needed for a piece on a given square to reach another square. See example in table 8.2.

Visual primitives: In chess, the expert often searches for visual patterns on the board when making a decision. This is very common throughout a game of chess. In particular in endgames, most of the expert chess knowledge is in recognising such visual patterns. To express her/his visual reasoning, the expert is provided with explanatory primitives which relate variables in terms of their relative visual orientation. See example in table 8.1.

In earlier knowledge acquisition experiments, these visual primitives were defined as NRDR concepts. The expert was defining visual concepts such as "Piece_On_Hfile". However, as the C function defining such a concept is very simple to implement (a single for-loop), we decided to revert to a C implementation. This eased the knowledge acquisition task for the expert. Further, compared with NRDR concepts, C primitives are executed faster.
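For illustration, a primitive such as "Piece_On_Hfile" reduces to a single loop. The sketch below is in Python rather than the C of the actual system, and the board encoding is an assumption.

```python
# Hypothetical sketch of the "Piece_On_Hfile" primitive: one loop over the
# h-file, mirroring the single-for-loop C implementation described above.
def piece_on_h_file(board):
    for rank in range(1, 9):
        if board.get("h" + str(rank)) is not None:
            return True
    return False

print(piece_on_h_file({"h7": "P"}))   # True
print(piece_on_h_file({"e3": "P"}))   # False
```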

Global primitives: In chess, the expert often refers to global features of the board when justifying his/her move. E.g. s/he may say "I exchange my queen because I am two pawns ahead, and therefore my two pawns advantage will become more crucial when there are less pieces on the board". For example, bvalue(0) << bvalue(-2) compares the board value of the current position to the value of the board position two moves ago (see table 8.1 for more examples).

Board value comparisons (global primitive): describes the changes in total piece value on the board as the game progresses. Example: Bvalue(0) >> bvalue(-2), meaning the board value in the current position is greater than the value of the board position two moves ago.

Missing pieces (global primitive): states which pieces are not on the board. Example: 2 P missing, meaning the position in question has two missing white pawns.

Count ranks (visual primitive): counts the number of ranks between two variables. Example: X R> Y in 1, meaning there is one rank separating X and Y. X (or Y) can be a particular piece or square, or any square existentially declared on a given file, rank or colour.

Count files (visual primitive): counts the number of files between two variables. Example: X F> Y in 3, meaning there are three files separating X and Y. X (or Y) can be a particular piece or square, or any square existentially declared on a given file, rank or colour.

Shortest distance measure (visual primitive): counts the shortest separation between two variables; this separation is the minimal number of ranks or files. Example: X ➔ Y in 5, meaning the minimum separation between X and Y is 5. X (or Y) can be a particular piece or square, or any square existentially declared on a given file, rank or colour.

Table 8.1: Global and visual explanatory primitives

The use of explanatory visual and global primitives is common for expressing advanced competence in a given domain. For example, the expert can express foresight in a position (say in chess) without having to go through the sequence of moves in his/her mind. He does this by recognising a certain pattern on the board; e.g. the prospect of promoting a pawn is foreseen by the expert through simple geometric (visual) observations².

The use of such global foresight reduces the search computation greatly for two reasons. Firstly, the computation of the involved primitives is very quick; no resort to the rules of the game or the move generator is needed. Secondly, such static features persist over sequences of moves, and a simple reference to the working memory, as discussed earlier, will suffice for any future reuse.

At the symbolic level, explanatory domain primitives express relations between (or descriptions of) features in the feature-vector representation of the cases. These features also include properties specific to search operators (rather than search states). For example, in chess a case (a search state together with the operator leading to it) is represented as a 68-element vector. The vector contains the following features: 64 features which describe squares on an 8x8 chess board, 3 features which describe the move (search operator) which led to this position, and one feature to describe the possibility of castling. This representation resembles the pseudo-standard PGN chess notation, with the addition of the three features describing the move (the source and destination squares, and any captured piece as a result of the move).
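A sketch of the 68-element case encoding described above. The field layout follows the text (64 squares + 3 move features + 1 castling flag), but the function and argument names are illustrative, not the thesis's code.

```python
# Hypothetical sketch: 64 square features + 3 move features (source square,
# destination square, captured piece) + 1 castling feature = 68 elements.
def encode_case(board, move, castling_possible):
    squares = [board.get(f + str(r)) for r in range(1, 9) for f in "abcdefgh"]
    src, dst, captured = move
    return squares + [src, dst, captured, castling_possible]

case = encode_case({"e3": "P", "d5": "N"}, ("f6", "d5", "P"), True)
print(len(case))   # 68
```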

In the system implementation, we ensured flexibility of the explanatory domain primitives module (see figure 5.5). We used an automatic parser generator system based on Lex and Yacc³. So it is easy to extend the syntax of our language SKIL to accommodate new explanatory primitives during the actual knowledge acquisition stage.

² Promoting a pawn in chess means advancing a pawn to the final rank, where it can be transformed to a higher value piece.

³ These are standard UNIX (and lately ) parser generator tools.

In the next section, we will demonstrate how an expert uses the explanatory primitives in tables 8.1 and 8.2 within SKIL to define the lowest layer in the hierarchy of an NRDR knowledge base. SmS will use this knowledge base to simulate an expert chess search process. We will emphasise interactions between SmS and the expert. This will show how the expert/system interactions are at the knowledge level.

Declaring variables (variable declaration): declares a variable to be a square, a piece, a file, a rank, or the whole board. Example: X = a5, meaning X is declared to be square a5.

Variable comparison (variable comparison): compares two variables. Example: X = Y, meaning X is constrained to values equal to Y; X and Y must be declared.

Value comparison (variable comparison): compares the value of a variable to a piece value (e.g. in chess a rook is 5 and a knight is 3). Example: X >> P, meaning X is constrained to values larger than a pawn (a pawn is 1 point).

Captured piece statements (comparison primitive): the captured piece ceases to be on the board, and is therefore designated by its own token "captured". Example: Captured = P, meaning the captured piece is a pawn.

Table 8.2. Variable declaration and tactical explanatory primitives

8.2. Incremental acquisition of chess knowledge

We now demonstrate our approach by showing examples of a typical knowledge acquisition cycle (or rather spiral) in the domain of chess. The objective is to develop a search knowledge base which produces a search process that resembles a human expert search process as much as possible. To do this, an expert defines concepts that approve moves to be considered in a minmax tree search. The search conducted by the search engine determines the actual move to be taken. In order to keep the searched tree small, the knowledge base approves only those moves that are really worth considering further. On the other hand, no dangerous move by the opponent should be excluded from the tree search, in order to ensure high quality play. The knowledge acquisition process attempts to develop a concept 'good move'. This is

the highest order concept in the hierarchy of the Nested RDR search knowledge base. This concept 'good move' applies to exactly those moves which should be considered for the tree search. Initially, no search operator in any search state will qualify, i.e. no move in any chess position will be classified as 'good move' and, hence, no tree search takes place. Initially, the NRDR knowledge base contains only the default rule "If True then not Good".

In what follows, we show snapshots of the knowledge acquisition session. We finally show the effectiveness of the knowledge base in pruning the search conducted by the search engine. We first show a specific feature of the NRDR framework: how it presents the expert with more than one possible update option (as discussed in chapters 3 and 6).

8.2.1 Multi-point modification

Initially, the knowledge base is empty. The default rule is "If True then not Good", which means no move is considered good. Hence, no move will be developed further in the search tree. That is, none of the search operators produced by the move generator are approved. The system develops a search tree of depth 1.

Later, when the knowledge base matures, it approves some search operators (chess moves). The search engine then chooses the best move using a simple evaluation function, as used in the minmax search algorithm. This evaluation function can also be developed during knowledge acquisition, by asking the expert to assign weights to the NRDR terms he uses as rule conditions. Every conclusion of the knowledge base will then have a weight based on these weights. Because chess is a well defined domain, and many evaluation functions are available in the literature, we provide a simple built-in evaluation function with the system. This simplifies the effort required from an expert during the knowledge acquisition process.
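How such an expert-weighted evaluation might be assembled is sketched below. This is a speculative reconstruction of the mechanism the text outlines; the weights and all names are invented for the example.

```python
# Hypothetical sketch: expert-assigned weights for low-level terms; a
# concept with an undefined weight (e.g. WinPiece) resolves dynamically
# from the sub-concepts that made it fire in the current context.
term_weights = {"WinPawn": 1, "WinKnight": 3}   # expert-entered weights

def concept_weight(concept, fired_subconcepts):
    if concept in term_weights:
        return term_weights[concept]
    # undefined weight: sum the weights of the fired sub-concepts
    return sum(concept_weight(c, []) for c in fired_subconcepts)

# "If WinPiece then Good", where WinPiece fired because WinPawn fired:
print(concept_weight("WinPiece", ["WinPawn"]))   # 1
```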

In figure 8.1, there is a pawn to be captured on e3. The human expert plays Nxd5 for black. Here, the system's ability to allow the expert to give suggestions is being used (see criteria of the KA environment in chapter 5). The expert sees this as a good move because he wins a piece. Thus, at the highest level of the knowledge hierarchy, he introduces a new rule "If WinPiece then Good". Subsequently, he needs to explain

the meaning of the newly introduced attribute "WinPiece". The system creates a new Ripple Down Rule tree, prompting the expert to explain the concept "WinPiece". The expert enters the root node "If WinPawn then WinPiece". Similarly, the system prompts the expert for an explanation of the attribute "WinPawn". The expert explains the concept in terms of primitives: "If X = +dt and white -to X in 1 and Captured = P then WinPawn"⁴ (see table 8.2). The chain of interactions between the system and the expert terminates at this point, as the meaning of the primitives is clear to the system.

Note that if the task of constructing an evaluation function is also required from the expert, then he may enter a weight for the "WinPawn" term. In this case, he would enter +1 as a suitable weight. However, it is not possible for the expert to enter a definite weight for the concept "WinPiece": this depends on lower concept definitions (such as WinPawn). In this case, he enters a special undefined weight symbol. The weight of the conclusion of the rule "If WinPiece then +Good" would be calculated dynamically, based on the context-dependent weight evaluation of the concept WinPiece.

Figure 8.1. After ...Nxd5 and white plays Ba3, black to play and capture a "free" pawn on e3 with ...Nxe3.

Rule Good.1: If WinPiece then Good

Rule WinPiece.1: If WinPawn then WinPiece

Rule WinPawn.1: If X = +dt and white -to X in 1 and Captured = P then WinPawn

Table 8.3. The first three rules entered in the chess KB.

⁴ "X = +dt" means the variable X is instantiated as the destination square of the last made move.

Assume that it was white's turn, white plays Re1, and the system's knowledge consists of the three rules entered in the above interactions with the expert (see table 8.3). When asked to play, the system will respond with Nxe3. To the system, this is a good move because it wins black a piece (it wins a piece because it wins a pawn). The expert disagrees, because Nxe3 is not a safe move: the black knight on e3 can be captured by the rook on e1. So, he must modify the knowledge base. There are three modification points for the knowledge base. At the highest level, the expert can add "If Not SafeMove then Not Good" on the true link of "If WinPiece then Good". Or, he can change the meaning of "WinPiece" by adding "If Not SafeMove then Not WinPiece". Finally, he can also change the meaning of "WinPawn" in the WinPiece Ripple Down Rule tree by adding "If Not SafeMove then Not WinPawn". The choice of the point of modification is part of the knowledge acquisition process. In this particular example, any of the three mentioned modifications is valid.
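The three candidate modifications follow one pattern: an exception added somewhere along the inference chain Good, WinPiece, WinPawn. A small illustrative sketch (the representation is invented, not the system's):

```python
# Hypothetical sketch: every concept on the inference chain that approved
# Nxe3 is a candidate location for the new exception rule.
inference_chain = ["Good", "WinPiece", "WinPawn"]

def modification_points(chain, exception_condition):
    return ["If Not %s then Not %s" % (exception_condition, c) for c in chain]

for option in modification_points(inference_chain, "SafeMove"):
    print(option)
```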

Figure 8.2. Pawn on e3: can it be captured or not? Left: after 1. Re1 Nxe3 (bad, unsafe play by black). Right: pawn on e3 is worth capturing.

Inserting a new rule at a higher level in the concept hierarchy can specialise the knowledge base more than inserting the new rule in a more specialised concept. That is, the knowledge base amendment would apply to a smaller number of cases. The higher the number of occurrences of the amended concept in the knowledge base, the more likely inconsistencies are introduced into the knowledge base. It is the responsibility of the knowledge acquisition assistant module to detect these effects and guide the expert in dealing with them, as discussed in chapter 6. The deeper the hierarchical structure of the concepts, the more condensed the knowledge base may become. A typical depth of the hierarchy is 4 to 5.

In our ongoing example, the expert chooses to modify the meaning of "WinPiece" by adding "If Not SafeMove then Not WinPiece" to the true link of rule 1 in the "WinPiece" concept definition. Of course, the meaning of "SafeMove" needs to be explained. The expert introduces the rule "If X = +dt and white +to X in 1 then Not SafeMove" (see the tactical primitives description in table 8.2). This is understood by the system, because it is defined solely in terms of the available primitives.

Now, we show how reference to the working memory may be needed during the search process and hence also the knowledge acquisition process.

8.2.2 Use of working memory during knowledge acquisition

In figure 8.2 (left), the pawn on e3 is protected by the rook on e1. Hence, Nxe3 is not considered a good move by the system. However, capturing this protected pawn on e3 exposes the white knight on d4 to the black bishop on g7. This is overlooked by the current knowledge base. So, the expert enters new knowledge into the system. He adds the rule "If Expose then Good" to the highest level "Good" concept. The new concept Expose(X) is explained as capturing an opponent piece Y, such that Y has been protecting the piece X for the last two positions. That is, before the opponent made his move, Y was protecting X; i.e. the move made by the opponent exposes X to an attack. Reference to the last two positions is possible through the working memory being accessed by the explanatory primitives. Note that the parameter X is a variable used in primitives explaining the concept "Expose", and X refers to the piece which is no longer protected as a result of the "Expose" move.

Figure 8.3. Pawn on e3 is no longer worth taking after white plays Bb2.

In figure 8.3, white played the bishop to b2 to protect the knight on d4. Hence, capturing the pawn on e3 no longer exposes the knight. Hence, the knowledge base needs to be altered again, in particular inside the Ripple Down Rule tree describing the Expose concept. According to our chess expert, a black move exposes a white piece if the move removes all of the white piece's protection which it had in the previous two positions.

8.3. Using knowledge to prune chess search

Chess is a two-player game. The search tree nodes alternate between the two players. To look ahead further than one half move, the knowledge base must be large enough to account for responses of the opponent. In this section, we develop the knowledge base further, and we demonstrate the capability of the system to generate an intelligibly pruned search tree.
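The pruning mechanism can be sketched as a minmax search whose move list is filtered by the knowledge base. The toy one-number "game" below is purely illustrative; only the filtering structure reflects the text.

```python
# Hypothetical sketch: only moves the knowledge base classifies as 'Good'
# are expanded; the rejected moves are what prunes the tree.
def minmax(state, depth, maximising, generate, approves, evaluate, apply_move):
    moves = [m for m in generate(state) if approves(state, m)]
    if depth == 0 or not moves:
        return evaluate(state)
    vals = [minmax(apply_move(state, m), depth - 1, not maximising,
                   generate, approves, evaluate, apply_move) for m in moves]
    return max(vals) if maximising else min(vals)

# Toy game: a state is a number, a move adds its value to the state.
generate = lambda s: [1, 2, 3]
approves = lambda s, m: m != 2          # the "knowledge base" rejects move 2
evaluate = lambda s: s
apply_mv = lambda s, m: s + m

print(minmax(0, 2, True, generate, approves, evaluate, apply_mv))   # 4
```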

The expert plays for white, explaining the options he takes. Thus, for figure 8.1 (left) the expert plays Re1 to protect the pawn on e3, and for figure 8.2 (right) he plays Bb2 to protect the knight on d4. The knowledge base is developed by the expert's input: he introduces the concept "DefendingPiece". Black is required to find good moves in the absence of obvious tactical moves, i.e. capturing and/or exposing white's pieces. So, the expert develops the knowledge base to cover more strategic moves, such as strengthening one's defences or building potential attacks. He introduces the concept of a "Solid Move" (increasing attack on a piece) and the concept of a "Safe Move" (increasing defence of a piece).

The knowledge base matures to 47 rules and 19 concepts (KB1 in table 8.4). The hierarchy of concepts is up to depth four. The position on the left in figure 8.4 is taken as a starting position, with white to play. Table 8.4 shows the effectiveness of the knowledge base in pruning the tree. The pruned tree was tested for intelligibility. Computer play with the pruned tree is also shown below; the play used a pruned tree of depth three. The less mature knowledge base (KB2 in table 8.4) did not contain concepts describing strategic moves. Table 8.4 shows the tree pruning effect of both knowledge bases. The less mature knowledge base leads to a smaller search, because it overlooks sensible search paths. Note that the default root node rule in

the NRDR chess knowledge base is the conservative rule "If True then not Good". This disregards all moves unless otherwise classified as good.

Depth   Search tree (number of nodes)         Average breadth (branching factor)
        KB1      KB2     Without KB           KB1      KB2     Without KB
1       4        3       26                   4        3       26
2       14       10      925                  3.74     3.16    30.4
3       50       17      25880                3.68     2.57    29.6
4       110      28      757935               3.23     2.3     29.5
5       334      47      22.5 million         3.19     2.16    29.5

Table 8.4. The pruning effect of two knowledge bases of different maturity. Note that the deeper the tree, the thinner it becomes, as more possibilities fail to be worth pursuing.
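The "average breadth" figures in table 8.4 are consistent with taking the d-th root of the tree size as the effective branching factor; that formula is my inference, not something the text states explicitly.

```python
# Assumed relation: an average branching factor b with b**depth = nodes,
# i.e. b = nodes ** (1/depth); checked against two entries of table 8.4.
def avg_branching(nodes, depth):
    return round(nodes ** (1.0 / depth), 2)

print(avg_branching(925, 2))   # without KB, depth 2: 30.41 (table: 30.4)
print(avg_branching(50, 3))    # with KB1, depth 3: 3.68 (table: 3.68)
```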

The following shows the moves of a game between SmS 1.3 and an average human player. The starting position is shown in figure 8.4 (left); the resulting position is shown in figure 8.4 (right).

White vs Black (computer plays black with KB1):
1. Re1 Rd8  2. h3 Bf6  3. Nd4 Nxe3  4. Rxe3 Bxd4  5. Kf2 BxRe3+  6. KxBe3 Re8+  7. Kf3

The above play resulted in the computer gaining the upper hand against an average human player. Clearly, the knowledge base used would not suffice for playing sufficiently well over an entire chess game, as many more problems would occur to which the knowledge base has no solution yet. However, it is clearly possible to take advantage of human knowledge to prune the tree significantly without losing quality of play.


Figure 8.4. Average human player against our system SmS 1.3. Left: the starting position; white to play against SmS 1.3. Right: the resulting position; SmS's performance (black) against an average human player (white).

8.4. SmS in chess endgames

We also employed SmS to capture chess knowledge in chess endgame problems. Our NRDR methodology, which allows the expert to enter his own terms and to enter exceptions to their RDR tree definitions, again proved a powerful way to capture the expert's knowledge interactively. We show snapshots of the knowledge acquisition process, highlighting these NRDR features.

Figure 8.5. A typical starting position for the KA session concerned with teaching SmS checkmating in endgames. For checkmating with a rook instead of a queen, the queen is swapped for a rook in this same starting position.

A checkmate with a queen or a rook can only be executed if the opponent's king is on the edge of the board. The KA process starts with the initial position shown in figure 8.5. The expert aims to squeeze the king to the edge of the board; when the king is completely squeezed by the rook or the queen, he then checkmates him. To express this strategy to the system, the expert introduces the concept "squeezing", which he

uses to describe a move which pushes the opponent's king towards the edge. The expert gives a rough definition of the "squeezing" concept using the visual primitives as follows: "If the attacking piece is moved so that it forms an L pattern with the opposite king then this is a squeezing move" (see figure 8.6). Further, he enters the rule "if the opposite king was not 'squeezed' and the move is 'squeezing' then the move is 'Good'" (note that reference to the working memory is required to say that the king was not squeezed). However, exceptions to these rules exist, and these must be handled during the KA process. For example, this last rule can cause a position known in chess as a 'stalemate'. This is when the king is not being threatened in its current position, but any move would cause him to be captured (as shown in figure 8.7). Such a position is considered a draw, and it must be avoided at any cost by the player with superior attacking power on the board. Therefore, the expert enters an exception rule for the previous rule: "If Stalemate then not Good". He defines the concept 'Stalemate' as a new RDR tree.

Towards completing the KA task, the expert enters rules that complete the checkmating process. These include moving his own king when the opponent's king is "squeezed" (figure 8.8). He also deals with other exceptions to avoid a stalemate (e.g. figure 8.8, left).

Figure 8.6. Left: the white king is not squeezed. The expert plays Qe6 and explains: "If Squeezing then Good". He starts the definition of "Squeezing" as: "If the move puts the attacking piece and the opponent's king in an L-shape formation then Squeezing". The L-shape condition is defined using the visual primitives describing rank and file distances (see rows 3 and 4 in table 8.1).

The completed knowledge base that we developed is capable of guiding the search engine towards finishing the game starting from a rook and a king or a queen and a

king against a lone king. The knowledge base contained 60 rules and 10 concepts. In the next section, we discuss observations made from this KA experience, and we give more KA snapshots showing some features specific to using a rook to checkmate instead of a queen.

Figure 8.7. Left: the rule "If Squeezing then Good" applies; however, it results in a stalemate (right). Therefore, an exception is required: "If Stalemate then Not Good". Instead, a "K-closing" move is taught to the system, which involves moving the opposing king closer to the trapped king. This will eventually lead to a checkmate.

Figure 8.8. In both figures, the black king should be moved. Left: the black king should be moved to avoid a stalemate. Right: the black king should be moved because the black queen is squeezing the white king and there is nothing more for her to do; moving the black king would eventually force the white king to move further towards the edge.

Figure 8.9: A squeezed king on the edge of the board. The expert enters a rule "If K is on h-file then Squeezed". Of course, this is later refined to include all four edges of the board. This concept is independent of the description of moves (search operators); therefore, it is a descriptive concept referring to static properties of the board position.

Endgames Discussion

In endgames, the use of the working memory was less frequent than in the middle game. There were fewer choices available, and the expert was quicker at articulating his knowledge than in the middle game. Endgame decisions depended more on knowledge than on search. That is, consequences of the "best move" can be easily identified by the expert without having to develop a search tree mentally.

During the KA process in endgames, the expert made frequent use of the visual primitives. Concepts that he defined fell into two categories: action concepts and descriptive concepts. Action concepts defined a sequence of moves. Descriptive concepts defined conditions that make such actions desirable. For example, in checkmating the opponent's king, the expert defined two concepts, "Squeezing" (the king) and "Squeezed". The "Squeezing" concept referred to a forcing sequence of moves which forced the opponent's king to the edge of the board (see figures 8.10 and 8.6). The "Squeezed" concept was articulated by the expert to describe conditions which are satisfied when the "Squeezing" action is no longer needed. These conditions were: the queen (or a rook) forms an L shape with the opposing king, or the king is right on the edge of the board (figure 8.9). A second example of a descriptive concept is the concept of "Stalemate", which refers to the static properties of the position.

In a single NRDR concept, the conclusion of a single rule is taken (note that NRDR concepts are defined as single classification Ripple Down Rule trees). In action concepts, individual rules filter a sequence of search operators one by one through to the search engine, thus implementing a sequence of moves. Each rule in the "Squeezing" concept stores one move (see figure 8.5). The collection of rules in every action concept implements a macro search operator. This semantic distinction between NRDR concepts (action concepts versus descriptive concepts) also applies in the middle game. However, because endgame NRDR concepts are easier for the expert to complete, this distinction is more prevalent in endgames.
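An action concept behaving as a macro search operator can be sketched as rules that each admit one move of a forcing sequence. The move names below (apart from Ra3+, which appears in figure 8.10) are illustrative, not from the thesis.

```python
# Hypothetical sketch: each "rule" of the action concept approves exactly
# one move at its step, so the rule collection implements a move sequence.
squeezing_sequence = ["Ra3+", "Kb4", "Qb2#"]   # illustrative moves only

def squeezing(state, move):
    step = state["step"]
    return step < len(squeezing_sequence) and move == squeezing_sequence[step]

print(squeezing({"step": 0}, "Ra3+"))   # True: first rule of the macro fires
print(squeezing({"step": 0}, "Qe6"))    # False: not this step's move
```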

Figure 8.10. Black plays ...Ra3+. This forces the white king to move towards the edge of the board. The expert explains this as a "Squeezing" move. He explains "Squeezing" by entering a rule "If King_Opposes and Check then Squeezing". He explains the concept "King_Opposes" by entering a rule "If both kings are on the same file and are two ranks apart then King_Opposes". This rule can easily be expressed with the available visual primitives. The "Check" concept describes a move which threatens the king; this is explained in terms of the available tactical primitives (see table 8.2). The concept "Squeezing" refers explicitly to the move and is therefore an action concept.

8.5. Contributions of chess adapted SmS

We end this chapter by noting that researchers in the field of computer chess have not yet used knowledge acquisition from experts directly in building strong chess programs. Furthermore, where knowledge has been used, it was only in chess endgames and openings (Hsu, Marsland et al. 1991; Barth 1995). In using SmS in chess, all chess knowledge is treated equally. In this regard, SmS's contribution to the field of computer chess is two-fold: firstly, we offer a framework whereby

knowledge acquisition can be used. Secondly, we offer a knowledge representation scheme which allows the representation of tactical concepts in the chess middle game.

Another contribution of SmS in chess is towards a new emerging area of research in AI known as Metagame (Pell 1993; Pell 1996). The purpose of Metagame research is to develop and compare programs which can analyse any game from some general class, rather than just a single game. A Metagame program takes as its input the rules of a game to be learned, performs some analysis on the game, and learns by playing against other players. From a Metagame perspective, SmS can be seen as a Metagame player which can play any game (which requires search); for every game, a knowledge base can be redeveloped. In our current implementation, any chess-like game can easily be learned separately. As we discussed in chapter 5, the rules of the game can be acquired during the knowledge acquisition session if so desired, as long as the move generator is made general enough. SmS has the facility to load different knowledge bases according to their individual use. This would accommodate different knowledge bases for different games and allow SmS to be used as a Metagame player.

8.6. Chapter summary

In this chapter, we presented a case study of SmS in the domain of chess. We then sketched knowledge acquisition sessions following our knowledge engineering stage. The experience of the expert entering his rules into the system clearly showed that SmS's adaptation to the domain of chess can be made mostly during the incremental knowledge acquisition stage. In particular, a good level of play was achieved in the middle of a chess game using a strongly pruned search tree. In endgames, search knowledge was successfully captured from an expert to solve checkmating problems. In endgames, the expert made less use of the working memory; however, he found it easier to develop his concepts. This is probably because endgame decisions are mainly based on similar past experiences, rather than on search and extensive analysis of the current board position, as is the case in the middle game. Further, in endgames the distinction between 'action concepts' implementing a macro search and 'descriptive concepts' used as conditions for action concepts was prevalent. Generally, a similar distinction can also be made at the rule level: every rule which explicitly refers to the actual move (search operator) can be called an 'action rule'.

This distinction in the functionality of rules was also evident in using Multiple Classification RDR for configuration (Compton, Ramadan et al. 1998). Rules which describe actual changes in the configuration are 'action rules'. In 'action concepts' which define macro search operators, every rule is actually an action rule.

In the next chapter, we will discuss further observations made from this knowledge acquisition experience from a knowledge level perspective. We will highlight the critical role of our Nested RDR for the interactions between the expert and the system to be conducted at the knowledge level. The NRDR framework simplifies the knowledge acquisition process, and it allows the expert to construct the knowledge base directly using his own vocabulary.

Chapter 9

Discussion and critique

In this chapter, we discuss our framework for acquiring search control knowledge and highlight its limitations. In section 9.1, we detail how the notion of the knowledge level as introduced by Newell (Newell 1982) permeates our work. In section 9.2, we discuss limitations of our approach, and we finally conclude with a characterisation of domains for which our approach is most effective.

We first overview the knowledge level (KL) idea (discussed briefly in section 2.2.2) and its significance. We then consider important features of the KL as first discussed by Newell in (Newell 1982). We discuss how our framework accommodates these features. This highlights how the knowledge acquisition requirements for implementing search heuristics at the KL are incorporated within our framework. This discussion also highlights important implicit assumptions within our work.

9.1. The Knowledge Level, SmS and NRDR

The idea of the KL is about a system description level, at which the system is viewed as a rational agent guided by knowledge specified in terms in which humans communicate their knowledge to their fellows. The knowledge level is in some sense an AI version of the philosophical intentional stance, where we view computers as though they were humans, and as though they had human intentions5. The key contribution of the KL idea is that it offers a perspective that simplifies the design of intelligent systems. At this level, knowledge is functionally expressed independently of the underlying implementation level, the symbolic level6. For example, when the expert is expressing his/her search knowledge, s/he may recommend a certain action to be taken. At the knowledge level, the action description and the conditions under which this action should be taken are described in the same way (for example, see section 8.4). Moreover, if the expert foresees future conditions which may impact the choice of action that s/he is about to recommend, then s/he is able to include this in his/her current KL description.

In what follows, we will highlight how our system SmS accommodates the KL description as originally expressed by Newell in (Newell 1982). We will show how three important features of the KL are pertinent to our framework. These KL features are: firstly, the KL description remains incomplete, because the intelligent agent (expert or system) is incapable of drawing all conclusions from his/her knowledge; secondly, the KL does not have a structure; thirdly, some of the knowledge at the KL is generated dynamically. This implies that the outcome of using the knowledge - the action taken by an agent - cannot always be statically predicted. It may depend on the task at hand.

5 In (Dennett 1996), Dennett describes the intentional stance as follows: "The intentional stance is the strategy of interpreting the behaviour of an entity as if it were a rational agent who governed its 'choice' of 'action' by a 'consideration' of its 'beliefs' and 'desires'." In (Newell 1993), Newell discusses the relation between his knowledge level and Dennett's intentional stance (Dennett, 1978). Viewing the system as a rational agent that chooses its actions to achieve its goals according to what it "knows" is the key resemblance with the intentional stance.

6 To many other authors, knowledge level analysis is seen as abstracting away from the particular domain, and analysing how the knowledge is structured and used independently of the actual domain content.

The first feature of the knowledge level is its incompleteness (Newell 1982). Knowledge does not completely describe behaviour, and the symbolic level may need to be used to completely describe observed behaviour. This feature of the knowledge level should not be confused with humans' inability to describe their concepts completely (as discussed in chapter 6). The knowledge maintenance problems that arise from that inability are managed by using Ripple Down Rules, as their structure anticipates future changes, and changes are easily accommodated within RDRs. However, the incompleteness of the KL is a fundamental limitation due to humans' inability to explain all their knowledge with language. For example, it is not possible for experts to articulate all their sub-cognitive judgements. In the next two subsections, we describe how the other two features of the KL are accommodated in the interactions between SmS and the expert.

KL interactions in SmS

At the KL, Newell (Newell 1982) states that no distinction is required between a goal and a body of knowledge, so at the KL we talk of knowledge about goals rather than actual goals. From a system development perspective, the procedural/declarative distinction is made redundant. In SmS, only knowledge about search goals is relevant. Actual search goals are implicitly captured by storing knowledge about search operators that modify a search state. Hence, operators leading to a search state are themselves treated as features of that state. For example, in design problems, the current state of the design, along with the last action (search operator) taken to get to this state, would be presented to the expert.

The expert comments on any action and a search state in the same way, ignoring the semantic difference that would be clear at the symbolic level. A complex goal can be reached via a sequence of less complicated goals. That is, the knowledge that the expert expresses about complex goals includes intermediate state descriptions. Since s/he describes actions and search states in the same way at the KL, s/he can deal with complex goals in the same way that s/he may deal with a simple goal. For example, in the domain of chess, the goal that the expert wants to express may range from something as simple as the completion of a sequence of moves to a more complex, subtle goal, such

as going through search states in the game to achieve a specific strategic goal (target state).

In our framework, the expert expresses all his/her knowledge in a declarative manner. Features of every search state, along with features of the search operator that led to it, are presented to the expert for commenting (in retrospect). NRDR concepts can be viewed as labels that organise the knowledge - both the procedural and the declarative knowledge. These concepts can be used to store sequences of actions - macro search operators. In (Iba 1989; Laird, Newell et al. 1993), the machine tries to discover macro search operators. In contrast, our approach allows acquisition of these macro operators directly from the expert through KL interactions. These interactions also capture the conditions under which applying such search macros is desirable. Although search macros and the conditions of their applicability are semantically different, they are captured at the KL in the same way (see section 8.4 for examples). This is made possible by our common perspective on features and operators. This is a crucial step required to adopt the generic view of knowledge at the KL. It enables the metamorphosis between declarative and procedural knowledge transparently from a KA perspective.
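The idea of a macro search operator as a stored sequence of primitive operators can be sketched in C. This is a minimal illustration only, not SmS's implementation; the state encoding, the operator names and the `MacroOperator` structure are all hypothetical.

```c
#include <assert.h>

/* Illustrative sketch: a search state reduced to a single integer,
   and primitive search operators as state -> state functions.
   All names here are hypothetical. */
typedef int State;
typedef State (*Operator)(State);

static State inc(State s) { return s + 1; }   /* primitive operator */
static State dbl(State s) { return s * 2; }   /* primitive operator */

/* A macro search operator, as an 'action concept' might store it:
   a named sequence of primitive operators applied in order. */
typedef struct {
    const char *name;
    const Operator *steps;
    int n_steps;
} MacroOperator;

static State apply_macro(const MacroOperator *m, State s) {
    for (int i = 0; i < m->n_steps; i++)
        s = m->steps[i](s);
    return s;
}
```

The point of the sketch is that the macro is itself data: the same KA machinery that attaches conditions to a single operator can attach conditions to the whole named sequence.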

The third important feature of knowledge at the knowledge level is its dependence on the task at hand; that is, knowledge is generated dynamically. This is expressed by Newell (Newell 1982) as follows: "... this knowledge can only be created dynamically in time. If generated by some simple procedure, only relatively uninteresting knowledge can be found. Interesting knowledge requires generating only what is relevant to the task at hand ...". In acquiring search knowledge, we note that the expert often refers to the ongoing search process while explaining his/her steps. This dynamic reference is accommodated in SmS by storing this search process in a working memory, and by making it available at the knowledge level. Consequently, while a search is being executed by SmS, some of its decisions are based on the execution trace of the search and can only be generated dynamically. For example, in chess, moves approved by the knowledge base during the search by SmS are a product of both the knowledge base contents and the chess game being played, which is stored in the working memory.
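The role of the working memory described above can be sketched as follows. This is a hedged illustration under assumed encodings (operator names as strings, a fixed-size trace); SmS's actual working memory is richer than this.

```c
#include <string.h>
#include <assert.h>

/* Illustrative sketch (names hypothetical): the working memory keeps
   the trace of search operators applied so far, so that rule conditions
   can refer to the search process itself, not just the current state. */
enum { MAX_TRACE = 128 };

typedef struct {
    const char *ops[MAX_TRACE];  /* names of applied search operators */
    int len;
} WorkingMemory;

static void record(WorkingMemory *wm, const char *op) {
    if (wm->len < MAX_TRACE)
        wm->ops[wm->len++] = op;
}

/* A dynamically evaluated condition: has a given operator already been
   applied during this search?  Its truth value depends on the execution
   trace, so it cannot be decided statically from the current state alone. */
static int already_applied(const WorkingMemory *wm, const char *op) {
    for (int i = 0; i < wm->len; i++)
        if (strcmp(wm->ops[i], op) == 0)
            return 1;
    return 0;
}
```

A condition such as `already_applied` is exactly the kind of knowledge that, in Newell's phrase, "can only be created dynamically in time": its outcome differs from search to search even for identical board positions.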

Our approach allows most of the domain knowledge to be acquired during the knowledge acquisition process. However, a short knowledge engineering stage is required to adapt SmS to any given domain. The success of the subsequent incremental knowledge acquisition process in acquiring effective search knowledge is underpinned by a successful knowledge engineering stage. The limiting role of this knowledge engineering stage is discussed next.

9.2. SmS's limitations

Limitations of the knowledge engineering stage impose generality constraints on SmS. That is, these limitations narrow the scope of domains for which SmS may be suitable. In this section, we discuss how an inaccurate (incomplete) knowledge engineering stage can impact the incremental knowledge acquisition process.

As we discussed in chapter 5, there are three steps in the knowledge engineering stage: the first is deciding on the search state representation, the second is deciding on the search operators, and the third is designing the explanatory primitives. We examine how revisiting each of these steps during the knowledge acquisition process affects the validity of the knowledge base. This discussion leads us to a rough characterisation of domains for which incremental search knowledge acquisition is possible using SmS.

In what follows, two assumptions are taken for granted: firstly, the expert does not make mistakes with respect to the culling decision of search paths7; secondly, the domain does not change during the actual knowledge acquisition.

9.2.1. Extending the Explanatory Primitives incrementally

When the expert fails to express his/her knowledge about a search state using the current set of explanatory primitives, this set can be extended during the actual knowledge acquisition process. Our SmS implementation anticipates this, and it allows easy extension of any existing set of explanatory primitives. These primitives are presented to the expert as regular expressions. An automatic parser generator (Yacc

7 This assumption is not critical. If the expert makes a mistake, s/he can fix it by amending the knowledge base.

and Lex8) is used to generate the C-code corresponding to these expressions. In turn, this C-code interfaces to a C-procedure corresponding to each explanatory primitive. So, to extend a set of primitives, only the input grammar file to Yacc is extended, and a C-procedure corresponding to the new primitive is provided. See sections 5.6.1 and 8.1.2 for how to decide whether a primitive should be made available as a C-procedure or as an NRDR concept.
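To make the mechanism concrete, the following is a sketch of the kind of C-procedure that might sit behind one explanatory primitive. The primitive, its name and the 0..63 square encoding (a1 = 0, h8 = 63) are all assumptions for illustration; the actual primitives and their grammar in SmS may differ.

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical C-procedure behind one explanatory primitive.  In the
   scheme described above, the primitive would appear in rule conditions
   as a regular expression, and the Yacc-generated parser would route it
   to a procedure like this one.  Squares are encoded 0..63 (a1 = 0,
   h8 = 63) - an assumed encoding, for illustration only. */
static int file_of(int sq) { return sq % 8; }
static int rank_of(int sq) { return sq / 8; }

/* Primitive: are two squares on a common diagonal?  The kind of board
   predicate a bishop- or pin-related rule condition might test. */
static int same_diagonal(int sq1, int sq2) {
    return abs(file_of(sq1) - file_of(sq2)) ==
           abs(rank_of(sq1) - rank_of(sq2));
}
```

Adding such a primitive then amounts to extending the Yacc grammar file with its pattern and linking it to this procedure, without touching any existing rule.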

New explanatory primitives give the expert extra expressive power for his/her future rules. The process of adding rules with new explanatory primitives remains the same. Adding new explanatory primitives only increases the expressive power of the interaction language, and it does not have any impact on the incremental knowledge acquisition process. The rest of the discussion will focus on describing difficulties encountered when the set of the search operators needs to be extended, or when the state representation is inadequate.

We assume that there are no mistakes made in defining search operators or representing search states. That is, all defined search operators are usable. The problem of concern is the incompleteness of the set of search operators, or the inadequacy of the representation to capture all relevant features in the search states. In other words, not all features required by the expert to express his/her search knowledge about a given search state are present, and/or there are regions of the search space required to conduct a successful search that cannot be reached by the search engine. This may be because the state representation is inadequate to represent such regions of the search space, or because there are no search operators that generate states in those regions.

9.2.2. Can we extend the Search Operators incrementally?

A search operator can be modelled as a function that takes a search state as input and outputs a new search state. Clearly, a given search operator does not change all features of a given search state. For example, a chess move (a search operator) will only change two squares on the board, which possibly corresponds to two features in the search state representation. The state representation needs to be modified only when features of a new operator are not captured by the representation. For example,

8 Lex and Yacc are standard UNIX tools.

in the domain of chess, say the chess board is mistakenly represented as a 7x8 board (rather than 8x8), and, following expert advice, search operators to allow piece promotion9 are introduced; then the search state representation needs to be modified to allow references to the 8th rank missing from the current incomplete representation. In this section, we consider the following question: if the search state representation includes all the relevant features in every possible state in the search space, what would happen to the knowledge base if we incrementally build the set of search operators?
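The functional view of a search operator introduced above can be sketched directly. This is a minimal illustration under an assumed board encoding (64 chars, '.' for an empty square); it is not SmS's actual move generator.

```c
#include <string.h>
#include <assert.h>

/* Sketch of a search operator as a function from state to state.  The
   board encoding (64 chars, '.' = empty, 'N' = knight, etc.) is an
   assumption for illustration; SmS's representation may differ. */
typedef struct { char sq[64]; } Board;

/* A chess move changes only two features of the state: the source
   square becomes empty and the target square receives the moved piece.
   The input state is passed by value, so it is left untouched and a
   new search state is returned. */
static Board apply_move(Board b, int from, int to) {
    b.sq[to] = b.sq[from];
    b.sq[from] = '.';
    return b;
}
```

Because the operator is a pure function, adding a new operator is adding a new function; the question examined next is what that addition does to the rules acquired before it existed.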

Given a search state, the expert can determine what search operators are desirable without considering whether these operators are already available (in the system). The expert's knowledge is expressed independently of what search operators are represented. That is, we have a mentalistic10 view about the intermediate mental decisions made by the expert during the search process itself. This is acceptable, as what goes on inside the expert's head - his/her internal search process - is independent of whether or not the search operators that s/he wants to use to express it are present. However, as we will discuss shortly, we take a situated cognition perspective on the expert's justifications during the incremental knowledge acquisition process. Therefore, at first glance, it seems that new search operators can be added without affecting past individual rules. However, new operators may actually change the progress of a search conducted by the system. This progress is ultimately what the expert monitors and comments on. So, the set of search operators can dictate the order of presentation of cases to the expert. Hence, updating the set of search operators can indirectly impact the development process of a knowledge base. In particular, new search operators may lead an expert to discuss regions of the search space which have been visited earlier during the knowledge acquisition session (through previous search paths not using the new operators). In this case, past acquired knowledge may be incomplete, because the expert did not comment on possibilities that were overlooked by the system. For example, if not all knight moves in chess were initially produced by the existing chess move generator, and all seen cases did not have knights on the board, then completing the move generator does not have any impact

9 Promotion moves in chess move pawns to the 8th rank, where they are promoted to any piece of the player's choosing.

on the development of the knowledge base. However, if some past cases had knights on the board, then the performance of SmS may have differed with respect to those cases using the new state generator. So, incompleteness of the search operators may have an indirect impact on the incremental knowledge acquisition process. Fortunately, in either case, the knowledge base can be further developed incrementally after the addition of search operators, without losing past knowledge acquisition efforts.

To avoid incremental addition of search operators, the initial state generator can be made rather general. The set of search operators may be made as large as possible, and any redundant search operators can later be filtered out by the knowledge base. For example, in chess, we can have a most general state generator which allows moves between any two squares on the board, with total disregard for the rules of the game. Clearly, the knowledge acquisition process would then be somewhat more expensive. However, this would obviate the issue of the impact of adding new search operators on the validity of the existing knowledge base, because we can be sure that there will not be any new search operators added in the future.

9.2.3. Can we change the search state representation in SmS?

The search state representation needs to be changed when the available features cannot represent all the states in the search space which the expert wants to refer to in his/her explanations. This may mandate the addition of new search operators to be able to generate such search states. Again, the arguments of the previous section with respect to adding new search operators apply. However, the absence of relevant features in the representation has a much more serious impact on the progress of the incremental construction of the knowledge base than the absence of search operators. It may stop the expert from considering some desirable search operators or conditions in the rules which s/he enters (see the example below). In other words, the representation of the cases which the expert is describing is intertwined with his/her description of these cases (this is clearly a situated cognition perspective).

An incomplete state representation will affect the expert's expression of rules. For example, in architectural decisions, colour may or may not be an important feature.

10 In philosophy, mentalism is the view that all knowledge resides in the head.

However, if the colour issue is not made visible to the expert in the state representation, then s/he may totally miss its significance. If colour is added to the state representation, then all past rules need to be rechecked for the significance of this colour feature. This check requires human expertise. The whole knowledge base may need to be discarded, and the knowledge acquisition process may need to be restarted. This is a worst case scenario. Occasionally, a change in the representation may be tolerable. For example, in chess, if the representation is changed to accommodate a new piece, then we can safely assume that up to this point in the knowledge acquisition process, seen cases did not include this piece. The only change required to past knowledge would be to explicitly restrict past rules to cases where this new piece is not present. Taking another example, say the representation of the chess board changes, e.g. the board was mistakenly represented as 6x6 instead of 8x8. Such a change would require restarting the whole knowledge acquisition process, because conditions in past rules would no longer hold. In both scenarios, a change in the state representation requires expert intervention to ensure that the change does not have any side effects on the previously acquired knowledge.

From a syntactic perspective, whether the KB is usable after a modification depends on the chosen representation. For example, if chess search states are represented as a vector of piece locations (instead of the whole board), and a new piece is introduced, then it would be hard to keep using the same knowledge base, regardless of any semantic similarities. However, adding a rank to the board would change the whole knowledge base, regardless of the syntactic representation. That is, whether or not the KB is usable depends partly on syntactic considerations of the search state representation.
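The syntactic point above can be illustrated in code, under assumed encodings chosen purely for illustration. With a whole-board representation, a new piece kind is just a new symbol value: every existing square index, and hence every old rule condition, is untouched. Re-dimensioning the board, by contrast, shifts every square index and breaks old conditions whatever the representation.

```c
#include <string.h>
#include <assert.h>

/* Hypothetical whole-board encoding: one char per square, '.' = empty.
   Introducing a new piece kind leaves all square indices, and so all
   previously written conditions, syntactically intact. */
enum { RANKS = 8, FILES = 8 };

typedef struct { char sq[RANKS * FILES]; } Board;

static int square_index(int rank, int file) {
    /* Any change to FILES (or RANKS) re-indexes every square, which is
       why re-dimensioning the board invalidates old conditions. */
    return rank * FILES + file;
}

/* An old rule condition, written before any new piece kind existed:
   it inspects a fixed square and still evaluates correctly after a new
   symbol value is introduced elsewhere on the board. */
static int square_occupied(const Board *b, int rank, int file) {
    return b->sq[square_index(rank, file)] != '.';
}
```

A fixed-length vector of piece locations lacks this property: a new piece kind changes the vector's shape, so conditions indexing into it no longer refer to the same pieces.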

9.2.4. A final word on the generality of SmS

When the search state representation is not stable, incremental acquisition of search knowledge may not be a feasible option. This depends on the number of changes the state representation undergoes. This indeed applies to instance representation for incremental knowledge acquisition in any domain - not only search. If the instance representation (search state representation) is not guaranteed to be stable enough, knowledge engineering disciplines such as KADS (Wielinga, Schreiber et al. 1992) may be more effective. However, if the search state representation does not change, then we do not have a problem. We can then be sure that all relevant features are

within the expert's perception at all times. For example, showing the expert the actual physical case (e.g. driving an architect around and asking him to give his knowledge in relation to actual buildings) would eventually eliminate any inaccuracies in the search state representation. In this last example, the expert's comments repair any inadequacy in the representation, without altering the validity of the already acquired rules. S/he would be commenting on 'reality'11 rather than on an intermediate symbolic representation.

In conclusion, incremental search knowledge acquisition with SmS is feasible in domains which allow successful completion of the three knowledge engineering steps, or in domains which allow revisiting these steps without losing past efforts. As we discussed earlier, extending the explanatory primitives does not cause any side effects to the existing knowledge base. Modifying the search state representation or extending the set of search operators may invalidate parts of the existing knowledge base. Frequent modifications of the search state representation make incremental knowledge acquisition with SmS difficult, and most likely impossible: each time the search state representation is modified, the knowledge base may need to be discarded. More positively, it is not necessary, for incremental KA to succeed, that every possible search state in the search space be reachable during the whole KA process. Search operators can be incrementally added, as long as the source and target search states can be represented. The drawback of this is that SmS would have to revisit some search states, and the knowledge base may need to be refined as new search paths become available to SmS and its old search solutions are no longer correct.

Fortunately, there are many important well-defined domains where a good search state representation is well known, and incremental search knowledge acquisition with SmS is feasible. Examples are: network routing, configuration problems (e.g. Sysiphus II, ion chromatography, ...), path finding problems (e.g. the travelling salesperson problem), games (chess, Go, ...), ...

11 Whether a symbolic representation can ever represent 'real' objects is beyond the scope of this thesis.

9.3. Chapter summary and conclusion

In this chapter, we analysed how and why interactions between SmS and an expert are actually conducted at the knowledge level. Two critical features of KL interactions are accommodated within our framework: the interactions are generic, and the subsequent use of the knowledge base is dynamic. This last feature implies that different usages of the knowledge base may yield different actions.

For smooth and effective interactions between SmS and an expert, a successful knowledge engineering phase is required. As we discussed in chapter 5, this includes the following three steps: designing explanatory primitives, designing search operators and deciding on a search state representation. In this chapter, we discussed whether or not errors/omissions in any of these steps could be rectified during the knowledge acquisition process without making the knowledge base obsolete. This discussion gave us a rough characterisation of domains for which our approach is effective. Briefly, these are domains where a relatively stable state representation is possible.

The next chapter concludes this thesis. It reviews its contributions, and it discusses possible future directions of the presented research.

Chapter 10

Summary and conclusion

In this chapter, we summarise the contributions of this thesis. We conclude with a discussion of possible future directions of the research presented.

10.1. Thesis summary

In this thesis, we presented a new approach to the development of effective search heuristics. We viewed the problem as a knowledge acquisition task, which resulted in a modelling process of experts' search knowledge at the knowledge level. This modelling process is based on experts' explanations of their search process. Our case study in the domain of chess (chapter 8) suggests that, when forced to explain the outcome of the knowledge on given instances in sufficient detail, it is possible for an expert to articulate incrementally his/her knowledge, which can be inaccessible through introspection. An expert was asked to explain and to refine his concepts until their application resulted in the same search steps that he was proposing.

The incremental KA framework of this thesis was put in the context of AI in general in chapter 1, and of knowledge acquisition in particular in chapters 2 and 3. This framework includes two components:

• A domain independent architecture: This is our SmS architecture. It is inspired by observed human introspection and behaviour. This architecture can be seen as modelling the domain independent search process. The architecture operationalises the acquired human search knowledge. This architecture was presented in chapter 5 and employed in the domain of chess in chapter 8. Its limitations were discussed in chapter 9.

• Domain dependent search knowledge: This is represented and communicated at the knowledge level. This knowledge is captured and maintained using our NRDR incremental knowledge acquisition framework. The NRDR framework was overviewed in chapter 3, presented in detail in chapter 6 and later formally analysed in chapter 7.

In the next section, we overview our contributions within the above two components as they appeared in this thesis.

10.2. Contributions

This thesis makes three key contributions. Firstly, it substantially extends Ripple Down Rules into Nested RDR, which allows simultaneous modelling and knowledge acquisition. Nested RDR is generally applicable where domain classes are determined only during the actual KA process. Secondly, this thesis presents a knowledge acquisition framework which allows incremental acquisition of expert search knowledge. This is implemented in SmS, and it employs NRDR to capture experts' knowledge incrementally. Thirdly, it presents a general formal framework which gives a discourse language to analyse the incremental knowledge acquisition process with RDR. In particular, it was used to show that the added benefits of using NRDR do not incur any extra burden on the expert - in comparison with RDR - during the incremental KA process.

The NRDR contribution of this thesis is most influential in the incremental knowledge acquisition research area, which was reviewed in chapter 3. The NRDR framework alleviates some of the shortcomings of incremental knowledge acquisition with Ripple Down Rules (RDR): repetition and lack of explicit modelling. Furthermore, unlike (MC) RDR, NRDR offers a modular incremental knowledge acquisition framework. Chapter 3 mainly focussed on the Ripple Down Rules methodology. Chapter 4 provided a formal framework to describe and analyse the knowledge acquisition process with RDR. This framework was used to describe the convergence behaviour and conditions of an RDR knowledge base. On its basis, we later analysed our NRDR knowledge acquisition framework in chapter 7.

NRDR is critical in satisfying the requirements needed to capture human search knowledge. These requirements were discussed in chapter 5, and they led us to our SmS architecture. NRDR facilitates knowledge level interactions between SmS and an expert. It accommodates experts' natural tendencies in expressing their knowledge. NRDR is based on the idea that a hierarchical incremental knowledge acquisition process, which captures and operationalises the expert's terms while they are still incompletely defined, makes the knowledge acquisition task more effective1 (see section 6.1 in chapter 6).

1 An NRDR knowledge base has a hierarchical structure. The human mind is widely believed to be hierarchical, and language is believed to reflect this structure (Husserl, 1977; Fodor, 1975; Chomsky, 1957; Devlin, 1997; Sowa, 1984). The popularity of this notion has created the community of conceptual structures researchers. From their perspective, we believe NRDR offers an easily extensible methodology for acquiring conceptual structures directly from experts. It is a natural representation of the way humans give their explanations in many domains, such as circuit analysis (Kieras, 1993) and chess (Beydoun, 1997).

NRDR is a core contribution. It is a novel approach which allows modelling and knowledge acquisition concurrently. It is applicable more generally than to the acquisition of search knowledge. It brings incremental knowledge acquisition to a whole new set of domains where domain classes (domain terms) are expert dependent and are only determined during the actual knowledge acquisition process, e.g. intelligent World Wide Web browsing, some design problems, ..

This NRDR knowledge representation scheme allows the expert to view the knowledge base holistically during the actual knowledge acquisition process. Any given concept in an NRDR knowledge base may depend on a number of other concepts. A change in any concept should ensure that the remaining concepts stay consistent with respect to this change. That is, developing any concept in NRDR must take into account its impact on other concepts where it may be used. This impact is monitored by storing past seen cases in a database, and by keeping the knowledge base consistent with respect to the cases in this database. These issues of development and maintenance of consistency of the knowledge base were first introduced in chapter 3. A mechanism to support this consistency was developed and presented in chapter 6. A thorough formal analysis of this mechanism was later provided in chapter 7.

Some of the requirements for acquiring search knowledge discussed in chapter 5 are due to the nature of expertise and the human tendencies in expressing it. In their light, chapter 6 focussed on the philosophy of the Nested Ripple Down Rules framework. There, we aligned this framework with the observed human behaviour in general and of course experts in particular. We argued that its maintenance, characterised by the need for secondary refinements, shows certain analogies with confirmation holism of scientific theories as advocated by Quine (Quine 1951).

In chapter 7, we used the formal framework of chapter 4 to analyse the added effort in KA due to the interactions between concepts throughout the knowledge acquisition process. In chapter 4, we formally showed that, for a given domain, the size of a knowledge base is dependent on the quality of expertise. To formalise the quality of interactions between the system and the expert, we introduced the notion of predictivity. This notion describes the generalisation capability of individual rules.

We showed that an RDR KB converges as long as the predictivity of newly added rules is bounded by a fixed constant. We called this condition the correctness principle. In other words, as long as some generalisation of added individual rules exists, the KB will converge. Chapter 7 extended this important result to the NRDR framework. We showed that maintaining an NRDR knowledge base throughout most of its development is as simple as maintaining an RDR knowledge base, in terms of the complexity of interactions with the expert. We analysed the extra maintenance issues of incremental knowledge acquisition with NRDR - that is, the extra maintenance due to the interactions between concepts. We proved that any extra effort of NRDR maintenance can be handled automatically, without an expert. This showed that interactions between NRDR concepts add little burden on the expert. Thus, it was shown that developing an NRDR knowledge base is as economical as developing a simple RDR knowledge base. That is, the epistemological benefits of NRDR are economically viable in terms of the load on the expert during the incremental knowledge acquisition process.

In chapter 8, we utilised SmS and the NRDR incremental knowledge acquisition framework to capture and represent expert search knowledge in the domain of chess. The obtained results were evidence for the effectiveness of the SmS architecture. It was shown that the knowledge acquisition process with NRDR allows the economical development of search heuristics that permit a strongly pruned search of the search space. This is possible, as we argued in chapter 9, because the knowledge acquisition process is actually conducted at the knowledge level as originally described by Newell (Newell 1982; Newell 1993).

In summary, this thesis extended the work done on Ripple Down Rules to make it applicable to a whole new range of search domains. That is, an economical incremental knowledge acquisition methodology for building search heuristics was developed. This is the thesis's most significant contribution. However, it should be noted that our approach is most effective when the search space can be adequately represented. If the search space representation requires modification during the incremental knowledge acquisition process, then most likely the knowledge acquisition process needs to be restarted from scratch. This is clearly a limitation which should be considered before our approach is recommended (Beydoun and Hoffmann 2000). In this regard, chapter 9 characterised domains for which our approach is most effective. In brief, these were domains where the search state representation is well defined.

10.3. Future research

Future extensions to the work presented in this thesis can go in two directions: first, extending the formal framework analysing the NRDR incremental KA framework; second, bringing the knowledge acquisition interface closer to humans' natural way of expressing their knowledge.

In the first direction, we note that our notion of predictivity and the associated correctness principle introduced in chapter 4 can probably play an important role in software engineering in general. Future extension of this work will attempt to generalise them in that direction. Towards this, an intermediate research step may be to use those notions to outline guidelines for conservative extensions of logical theories, similar to (Antoniou, MacNish et al. 1996). Further extensions of the formal part of the thesis, in particular chapter 7, can deal with the behaviour of the knowledge base as a conceptual structure. For example, easily defined concepts (which have a smaller number of rules) show low granularity, and in defining them the expert enters rules which have higher predictivity than those of concepts which are difficult to define. Such easy-to-define concepts occur at the lower layers of the hierarchy, while concepts on the higher layers seem to be harder for an expert to define. Analysis along these lines will enable assessment of the different properties (size, frequency of use, etc.) which concepts at different layers of the knowledge base exhibit. Such analysis will yield a more accurate model of the Nested Ripple Down Rules knowledge base development process.

Towards bringing our incremental knowledge acquisition framework closer to human nature, we note that human reasoning undoubtedly accommodates fuzzy conditions. Only recently has some work started to accommodate uncertain reasoning in the Ripple Down Rules methodology; see (Martinez-Bejar, Shiraz et al. 1998). In our current knowledge representation scheme, NRDR, we do not accommodate uncertain conclusions or conditions. Future work will follow (Martinez-Bejar, Shiraz et al. 1998) to accommodate uncertainty estimates during the knowledge acquisition process.

Appendix A

SmS's interface manual

This manual provides a description of the functionality of the interface for incremental knowledge acquisition with SmS. The manual focuses on describing how the interface can be used. It does not provide technical background on NRDR; this is assumed (see chapters 3, 6 and 7). However, the interface description highlights features of our NRDR incremental knowledge acquisition framework which have been extensively discussed throughout this thesis.

Functionalities of the interface can be classified into the following categories:

1. The first category is concerned with opening and managing a knowledge base from an existing set of knowledge bases. This is required as SmS allows accessing more than one knowledge base. These basic high level functionalities include opening an existing KB, deleting an existing one, etc.

2. The second category of functionalities is concerned with providing basic high level access to a chosen opened knowledge base. For example, this includes providing a list of all concepts within an NRDR knowledge base, the number of rules in a KB, the number of concepts in a KB, etc.

3. The third category is concerned with accepting domain specific input actions from an expert. For example, in chess this includes an interface to a chess board.

4. The fourth category provides interfacing functionalities to browse/modify/create NRDR concepts.

5. The fifth category of interfacing functionalities allows reuse and testing of an opened knowledge base against existing data.

6. The sixth category includes the knowledge acquisition assistant functionalities. It includes warning the expert about any inconsistencies. It also provides graphical hints to assist the expert in dealing with any inconsistencies.

In the rest of this appendix, each group of these functionalities is described in detail. The graphical user interface input/output windows associated with every functionality are also shown.

A.1 High level knowledge base functionalities

Figure A.1 is the window which the user first sees when SmS is launched. SmS allows accessing more than one knowledge base, depending on the task at hand. In this window (figure A.1), no knowledge base is opened yet; therefore, the list of concepts is shown empty. The 'File' item in the menu bar is for creating a new knowledge base or opening an existing one (figure A.1b).

Figure A.1a. The main window. Figure A.1b. The File menu

A knowledge base is stored as a collection of files. Whenever a knowledge base is opened or created, all its files are backed up to files with an extension of '.temp'. This prevents accidental corruption of a knowledge base. All subsequent changes by the expert will be made on the temporary files rather than the actual knowledge base files.

A "Save KB" command permanently saves any changes to the knowledge base.

Initially, some items of the menu shown in figure A.1 are disabled. The functionalities of the disabled items do not apply without having a knowledge base opened, e.g. testing cases, saving a knowledge base, etc.

Most menu items have accelerator keys. Underlined characters (see figure A.1b) indicate the keystrokes for the accelerator key; e.g. the 'New Knowledge Base' item has Alternate-N as its accelerator key. (Note: accelerator keys may change under different platforms; on some platforms both the Control and Alternate keys are used.)

A.1.1. New knowledge base

This item opens a dialog prompting the user (expert) for a name for the new knowledge base (figure A.2). If an existing knowledge base is opened, there will be a warning to close and save it (see the message box in figure A.3).

Figure A.2. New knowledge base. Figure A.3. The 'Not Saved' message box ('The KB is not saved yet. Save changes to the KB?')

A.1.2. Opening/Deleting a knowledge base

This item allows opening an existing knowledge base. The default directory assumed to contain all knowledge bases is set to the one from which the Tcl/Tk interface is being executed. A dialog box (figure A.4) prompts the user to choose a knowledge base. All knowledge bases must have an extension of '.kb'; this mandatory extension is enforced by the interface. The user chooses a knowledge base by double clicking the first button of the mouse, or by hitting a 'carriage return'. A double click on the second mouse button changes the directory to the chosen item. If there is already an opened knowledge base, a message box asks the user to save any changes made to that knowledge base (figure A.3). Deleting an existing knowledge base is similar to opening a knowledge base (figure A.5), except that a double click on its name deletes the knowledge base.

Figure A.4. Opening a knowledge base. Figure A.5. Deleting a knowledge base

A.2 Functionalities for high level browsing of an opened knowledge base

Once a knowledge base is opened, a list of the concepts that it contains is shown in a new window (figure A.6). The number of rules in each concept is shown next to its name. Highlighting a concept with a mouse click shows more detailed information about it. This information includes: a formal parameter list (if any), the weight associated with the concept (if any), and descriptive comments for the concept. The expert enters these comments when s/he creates the concept. A window scroller containing 5 buttons is also shown below an opened knowledge base. When the mouse passes over any of the buttons, the title of the corresponding window is shown at the bottom. If a user wants to maximise/minimise a window, then s/he presses its corresponding button on the scroller. This can also be used to uncover any window hidden by other windows. The main window is always shown. Whenever a new window pops up, it is entered in a stack starting from "Win0". When the number of stacked windows exceeds 5, the oldest window is minimised. This leaves a maximum of 5 open windows at any one time and reduces the clutter of windows on the screen. This feature is useful as an NRDR knowledge base is browsed in a number of windows, where every NRDR concept is shown in a separate window (see section A.4 later, or section 5.5 in chapter 5).

Figure A.6. An opened knowledge base (one click on a concept displays its properties; a double click displays its rules as a tree; the window scroll bar raises minimised or hidden windows; the total counts show the number of concepts and rules in the KB)
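The window-scroller policy just described (windows stacked as "Win0", "Win1", ..., with the oldest minimised once a sixth window opens) can be sketched as follows; the class and method names are hypothetical, not part of SmS:

```python
from collections import deque

class WindowScroller:
    """Keep at most `limit` windows open; minimise the oldest on overflow."""

    def __init__(self, limit=5):
        self.limit = limit
        self.open_windows = deque()  # oldest window at the left end
        self.minimised = []

    def push(self, title):
        """Open a new window, minimising the oldest if over the limit."""
        self.open_windows.append(title)
        if len(self.open_windows) > self.limit:
            self.minimised.append(self.open_windows.popleft())

    def raise_window(self, title):
        """Re-open a minimised window, possibly minimising another one."""
        if title in self.minimised:
            self.minimised.remove(title)
            self.push(title)
```

Pushing "Win0" through "Win5" leaves "Win0" minimised and the remaining five open, matching the five-button scroller described above.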

A.3 Chess functions

Figure A.7 shows the main window after making a few chess moves on the board. A chess piece is moved by a click-and-drag mouse action. The entry boxes just below the messages shown in figure A.7 are extra parameters specific to the conducted search.

The "Send" button confirms a move made by the user. This prompts the system to reply.

Figure A.7. Playing chess (the case can be set up on the board or taken out of the KB; pieces are moved by dragging; system messages indicate whose turn it is)

The options bar on top of the window has three menus. First is the "Game Menu", which offers options specific to the chess game. Second is the "Board Menu", which allows artificial manipulation of the board. Third is the "Utility Menu", which offers functionalities to manipulate the interface between the chess board and the opened knowledge base. In what follows, the functionalities under each of these three menus are described.

A.3.1 The Game Menu

Each command in this menu is described below (figure A.8):

TakeBack: This takes back a move made on the chessboard.

Reverse: This reverses sides in the chess game, i.e. it gives the expert the opponent's colour.

Skip: This skips a move and allows a player to have two consecutive moves. This and the above two options are occasionally useful during knowledge acquisition.

Use KB: This interfaces the opened knowledge base to the search engine.

Use Weights: This uses the defined weights for concepts while playing against the system, instead of a built-in evaluation function (see section 5.4.1.4 in chapter 5 for a detailed discussion of both options).

Figure A.8. The Game menu

A.3.2 The Board Menu

Each command in this menu is described below (see figure A.9):

Define Basics: This allows defining basic concepts in a quick textual form, by-passing the graphical interface. This is useful to define some basic visual concepts, rather than using explanatory primitives (see chapter 8 for the experimental discussion of chess endgames).

Custom Set: This provides a textual way of customising the position on the chess board.

Figure A.9. The Board menu

Reset: This resets the board to the initial starting position.

Save Position: This saves the position displayed on the chess board to a ".data" file for later testing. A dialog box prompts the user to enter a concept name under which this case should be saved (figure A.10). The default sign for the saved case is "+". This can be overridden; for example, the user can type +Good or -Good to describe the conclusion of the saved case (figure A.10). This functionality is useful for generating test data for existing knowledge bases.

Figure A.10. Save Position

Load Position: This loads a position from a file to the play board. The file extension must be ".data". This file can belong to a knowledge base different from the opened one, which is useful for transferring concepts from other knowledge bases (figure A.11). A ".data" file contains cases which have been tested against a given concept. In order to test a particular case, another dialog box prompts the user for a file name. The "Case Database" module of SmS is actually implemented as a collection of data files.

Figure A.11. Loading a data file

Figure A.12. Choosing a case

A.3.3 The Utility Menu

This menu shows commands to test the current case displayed on the board. There is also a "Batch Test" command to test more than one case, with results stored in a file. Superseded text-based facilities are also there: "Classify" and "Discuss" (as shown below the separator in figure A.13). These have textual outputs in the shell from which the chess system is running.

Change Test: This item changes the concept against which a case is being tested. The default concept against which a case is tested is +Good (the highest level concept in an NRDR chess KB). The default sign of any concept is +.

Figure A.13. The Utility menu

Accept: This item actually tests the displayed case against the chosen concept. More details are described later.

Batch Test: This item is for testing a particular case against more than one concept. The information is too large to be displayed graphically. It therefore gets saved into a file with the concept's name and an extension of ".attrib". The ".attrib" file contains attributes which show up in traces of tested cases. Attributes of such traces will also contain other concepts which have been visited while generating the trace.

A.4 Analysing concepts

The way NRDR concepts are displayed reflects the hierarchical structure of an NRDR knowledge base. Single RDRs are viewed as binary trees, with the true and false links as branches to children nodes. Each rule in the tree may have references to other concepts. These in turn are described by other trees.
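The binary-tree view described above can be sketched as a small data structure. This is an illustrative model of a single RDR tree (the names and the modelling of conditions as predicates over a case are our own assumptions), not the system's implementation:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RDRNode:
    """One rule in a single RDR tree: if `condition` holds, conclude
    `conclusion` unless a refining exception (true-link child) fires."""
    condition: Callable[[dict], bool]
    conclusion: str
    true_child: Optional["RDRNode"] = None   # exception branch
    false_child: Optional["RDRNode"] = None  # alternative branch

def evaluate(node, case, default="-"):
    """Ripple down the tree: follow true-links on satisfied conditions,
    false-links otherwise; the last satisfied rule gives the conclusion."""
    conclusion = default
    while node is not None:
        if node.condition(case):
            conclusion = node.conclusion
            node = node.true_child
        else:
            node = node.false_child
    return conclusion
```

The horizontal branches in the tree display correspond to `true_child` links and the vertical branches to `false_child` links; a trace of a test case is exactly the path this loop visits.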

A.4.1 Viewing Trees

Double clicking on a concept in the list box of the main window (figure A.6) displays the RDR tree which defines it. If a tree is too big to be displayed completely in its designated window, the scrollbars on the sides (figure A.14) navigate through the tree. A horizontal branch describes a true-link, and a vertical branch describes a false-link. Comments in the text box can be edited at any time by the user. Any changes to the comments are automatically saved when the "exit" button is pressed.

Figure A.14. Viewing the tree

A.4.2 Viewing rules

To view each rule in a displayed tree, the user can place the mouse cursor on the desired node. This displays the information of the rule in a text box on the right (figure A.14). The cursor is represented by a small empty circle and a cross to indicate its exact position. As the cursor is moved across the tree, the information (i.e. rule name, conditions, and conclusion) for any node below the cursor is displayed in the text box.

For analysis purposes, the information of a rule can be "frozen" for viewing (figure A.15) by an extra click on the node. The frozen information is displayed in another window called the "Analyser". While the information is frozen in one window, the mouse can still be used to scan through other nodes in the tree.

Figure A.15. The Analyser window

Figure A.16. CornerStone case

If the conditions of a displayed rule contain expert-entered concepts, these conditions are highlighted. RDR trees defining such conditions are displayed by a double click on them. The format and functionality of windows displaying trees are the same as in figures A.14-A.15. A button for viewing the cornerstone case of the rule is also available to the user (figure A.16) (the cornerstone case of a rule is the case which caused its addition).

A.5 Testing a case

Figure A.17. Testing a case

In our current adaptation of SmS, a case includes a chess position and the last immediate move leading to this position. If any piece was captured in reaching this position, this is also shown. This current case, i.e. the current chess position, can be tested against any existing concept in an NRDR knowledge base. It can also be used as a cornerstone case for the first rule in a new concept. The test concept is chosen by the Change Test item in the Utility menu. A dialog box prompts the expert for the concept name (figure A.18). If the user is satisfied with the concept to be tested against the case, then the user can use the "Accept" command button. This is also in the Utility menu (see A.3.3).

Figure A.18. Changing the test concept

There are three possible scenarios after a case is tested: First, the concept being tested may not exist; then the concept has to be created by the expert. Second, the concept already exists and the system agrees with what the user has entered. Third, the concept exists but the system disagrees with what the user entered. In the last two scenarios, the user can choose to do further analysis. The user has the option of viewing the test result in the form of a knowledge base trace (figures A.19 and A.20).

A.5.1 Case Verified

Figure A.19. Case verified ('The concept Good is verified against the case. Do you want to see the trace?')

A trace is represented using red lines and nodes. The red path indicates the nodes that have been visited (the rippling path). A text box is also shown on the side to show the trace and the conclusion in textual form. The trace window of a test case has the same functionality as the window for viewing the tree. In addition, there are the following:

The Addrule button: This allows adding a rule to a concept. Usually, it is only needed when adding a new concept, or when the expert disagrees with the system. This button becomes red when there is a disagreement with the expert (figure A.21).

The Deleterule button: This allows deleting the last rule added to the tree. This may be useful in dealing with inconsistencies (see the discussion of NRDR update policies in chapter 6).

Figure A.20. Trace of an agreed test

Figure A.21. When the system disagrees with the user

A.5.2 Misclassified Cases

If the system disagrees with the expert, a trace of the tree immediately pops up, with the "Addrule" button highlighted (figure A.21). This informs the expert that s/he must add a rule to correct the knowledge base so that it agrees with him/her. The expert is not forced to add a rule, but the button remains highlighted if s/he chooses not to add one.

A.5.3 Addrule

Figure A.22. Adding a rule

When an expert decides to add a rule to change the definition of a concept, a specific dialog box for this task pops up (figure A.22). Once all conditions are added for the new rule, the expert clicks the "then" button to conclude the rule. S/he can use the "+/-" button to change the sign of both the conditions of the rule and its conclusion. The "Redo" and "Undo" commands in the menu assist the expert in dealing with any mistakes or syntax errors that s/he may make. The "Clear" button clears the contents of the entry box.

Figure A.23. The condition menu

Condition menu: The left-hand text box displays the currently available primitives and the existing concepts in the knowledge base, which can be used as conditions (figure A.23). Each class of conditions is displayed under a separate submenu. Whenever a condition is selected, it is transferred to the entry box. The user can use the submenu selection mechanism, or simply type in the conditions manually. A "Carriage Return" in the entry box is required to finalise entering the condition.

Help comments for each of the primitive conditions are available by moving the mouse over the condition the expert wants to use (figure A.24). The help for primitive conditions is read by the system from a file called "primitives.list", which should reside in the directory with the Tcl/Tk code. This file should be edited as new primitives are made available. The help for concepts consists of the comments entered earlier by the expert when the concept was created.

Figure A.24. Help for the conditions

Syntax check

When a condition is typed in the addrule window entry box, a syntax check is made. If the condition entered is a concept, then it should be in the format: sign conceptname(parameters), e.g. +WeakSquare(e4). The sign can be a "+" or a "-", the conceptname can be any alphanumeric name, and the parameters must be bracketed.

Figure A.25. Adding new concept information

The syntax of the parameter list will be described later in detail. If the condition is a primitive (detected whenever the first letter of the entered string is not a sign (+/-)), then its syntax is checked by the system. In case of a syntax error, a dialog box pops up displaying the error. The expert can then re-enter the condition. If a concept entered in a rule is not listed in the list of concepts, then it is considered a new concept. Hence, its details must be added as explained next.
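The concept/primitive distinction and the concept-syntax check described above might be sketched as follows. The regular expression and the function are illustrative assumptions, not the interface's actual code:

```python
import re

# A concept condition starts with a sign, e.g. "+WeakSquare(e4)";
# anything not starting with a sign is treated as a primitive.
CONCEPT_PATTERN = re.compile(r"^[+-]\s*([A-Za-z][A-Za-z0-9]*)\s*(\([^()]*\))?$")

def classify_condition(text, known_concepts):
    """Return 'concept', 'new-concept', 'primitive' or 'error'."""
    text = text.strip()
    if text[:1] in "+-":                  # sign => concept condition
        match = CONCEPT_PATTERN.match(text)
        if match is None:
            return "error"                # malformed concept syntax
        name = match.group(1)
        return "concept" if name in known_concepts else "new-concept"
    return "primitive"                    # primitives are checked elsewhere
```

A "new-concept" result corresponds to the case where the interface requires the concept's details to be added, as described in the next section.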

A.5.4 New Concept

When a new concept is added, a default rule in its RDR tree definition must also be added. The information of the concept is added first. This includes: its weight, the formal parameter list, and descriptive comments (see figure A.25).

All information entered is checked against these criteria:

1. Weights have to be between -10 and 10.
2. If the weight is undefined, it can be represented by 999.
3. If the conditions of the new concept consist of all primitives, then its weight cannot be undefined.
4. The formal parameter list is checked for syntax errors.
5. All tabs and returns are stripped from the comments.

If no syntax error occurs, the same procedure of adding a rule applies (as shown in section A.5.3). The newly added concept has a further restriction: its conditions can only be described in terms of either all primitive conditions or all expert-defined concept conditions. This restriction is enforced by the interface.
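The checks on new-concept information listed above can be sketched as a validation function. The use of 999 as the 'undefined' weight marker follows the text; the function itself is an illustrative assumption:

```python
UNDEFINED_WEIGHT = 999.0

def validate_new_concept(weight, comments, all_primitive_conditions):
    """Check new-concept information against the manual's criteria.

    Weights must lie in [-10, 10] or be the 'undefined' marker 999,
    which is not allowed when the concept is defined purely by
    primitive conditions; tabs and returns are stripped from the
    comments.  Raises ValueError on a violation.
    """
    if weight != UNDEFINED_WEIGHT and not -10 <= weight <= 10:
        raise ValueError("weight must be between -10 and 10")
    if weight == UNDEFINED_WEIGHT and all_primitive_conditions:
        raise ValueError("a concept built only from primitives "
                         "must have a defined weight")
    cleaned_comments = comments.replace("\t", " ").replace("\n", " ")
    return weight, cleaned_comments
```

The formal-parameter-list syntax check (criterion 4) is omitted here, since its syntax is described separately.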

A.6 Testing for Inconsistencies

Updating an NRDR knowledge base may generate unwanted inconsistencies (see chapters 6 and 7). A tool is added to detect such inconsistencies. This tool is usable from within the addrule window.

The only visible difference added to the addrule window is a small box to enter the level of inconsistencies and a checkbox to mark "All" levels of inconsistencies. The level of inconsistencies indicates how many levels up the hierarchy of the knowledge base, from the added rule in the current concept, are to be traced. The "All" button means all levels of inconsistencies; hence all the ".data" files are to be traced. Since there are a lot of cases to be tested, the inconsistency test is done by the C code of the system rather than at the interface level. Multiple traces are returned to the interface. To detect any inconsistencies, the traces of the cases in question before and after the new rule has been added are compared. A window appears with the appropriate information about any found inconsistencies (figure A.26). No interactions between the user and the interface are possible while traces are being computed by the system (this may take 10-20 seconds).

Figure A.26. Inconsistencies found
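The before/after comparison described above can be sketched as follows; `classify_old` and `classify_new` are hypothetical stand-ins for the knowledge base's classification function before and after the rule addition:

```python
def find_inconsistencies(cases, classify_old, classify_new):
    """Compare each stored case's conclusion before and after a rule
    addition; report the cases whose conclusion changed.

    Returns a mapping from case number to the (old, new) conclusions,
    e.g. {2: ('+Good', '-Good')}.
    """
    changed = {}
    for number, case in enumerate(cases, start=1):
        old, new = classify_old(case), classify_new(case)
        if old != new:
            changed[number] = (old, new)
    return changed
```

An empty result means the new rule left all stored cases unaffected, i.e. no inconsistencies were introduced at the checked level.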

The Concepts listbox: The listbox at the left-hand side in figure A.26 displays all concepts which might be affected by a change in a particular concept. If inconsistencies are only being checked at level 1, only the concept for which the new rule is added appears. Each selection of a concept leads to the display of a list of numbers (on the right-hand side of figure A.26). Each number indicates a case found to be inconsistent with respect to the selected concept. The total number of inconsistent cases found for a selected concept is displayed on the label at the top right-hand side.

Buttons of the knowledge acquisition assistant window (figure A.26): There are three buttons in total. The "Case" button displays the selected case. The concept and the case number must be selected before any of the buttons are pressed. The "Trace" button displays the old and new traces associated with the selected case. The display format of the trace is the same as in the trace window described earlier in figure A.20. Finally, the "Attribute Differences" button displays the differences, in terms of expert-defined concepts, between the old and the new traces.

Bibliography

Antoniou, G., MacNish, C. K. and Foo, N. (1996). Conservative expansion concepts for default theories. Fourth Pacific Rim International Conference on Artificial Intelligence (PRICAI96), Australia, Springer Verlag, 1, 522-534.

Arinze, B. (1989). "A natural language front-end for knowledge acquisition." SIGART Newsletter 108: 106-114.

Ausubel, D., Novak, J. and Hanesian, H. (1978). Educational Psychology: A Cognitive View. New York, Holt, Rinehart & Winston.

Aussenac, N., Frontin, J., Riviere, M. and Soubie, J. (1989). A mediating representation to assist knowledge acquisition with MACAO. European Knowledge Acquisition Workshop, Springer-Verlag, 516-529.

Barth, W. (1995). "Combining Knowledge and Search to Yield Infallible Endgame Programs: A study of passed Pawns in the KPKP endgame." International Computer Chess Association (ICCA) Journal 18(3): 149-159.

Benjamins, R. (1995). "Problem solving methods for diagnosis and their role in knowledge acquisition." International Journal of Expert Systems: Research and Applications 2(8): 93-120.

Berkeley, G. (1952). The Principles of Human Knowledge. Great Books of the Western World, Encyclopedia Britanica. 35.

Beydoun, G. and Hoffmann, A. (1997). Acquisition of Search Knowledge. The 10th European Knowledge Acquisition Workshop (EKAW97), Spain, Springer, 1, 1-16.

Beydoun, G. and Hoffmann, A. (1997). NRDR for the Acquisition of Search Knowledge. 10th Australian Conference on Artificial Intelligence, Australia, Springer, 1, 175-186.

Beydoun, G. and Hoffmann, A. (1998). Building Problem Solvers Based on Search Control Knowledge. 11th Banff Knowledge Acquisition for Knowledge Base System Workshop, Canada, SRDG Publications, 2, SHARE3.1-SHARE3.18.

Beydoun, G. and Hoffmann, A. (1998). Building search heuristics at the knowledge level. Pacific Rim Knowledge Acquisition Workshop (PKAW98), Singapore, National University of Singapore, 1, 140-156.

Beydoun, G. and Hoffmann, A. (1998). Simultaneous Modelling and Knowledge Acquisition using NRDR. 5th Pacific Rim Conference on Artificial Intelligence (PRICAI98), Singapore, Springer-Verlag, 1, 83-95.

Beydoun, G. and Hoffmann, A. (1999). A Formal Framework of Ripple Down Rules. The Fourth Australian Workshop on Knowledge Acquisition (AKAW1999), Sydney, University of New South Wales, 1, 57-71.

Beydoun, G. and Hoffmann, A. (1999). Hierarchical Incremental Knowledge Acquisition. 12th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW99), Canada, SRDG publications, 2, 7-2.1 - 7-2.20.

Beydoun, G. and Hoffmann, A. (1999). A Holistic Approach for Knowledge Acquisition. 11th European Workshop on Knowledge Acquisition and Management (EKAW99), Germany, Springer, 1, 309-315.

Beydoun, G. and Hoffmann, A. (2000). "Incremental Acquisition of Search Knowledge." International Journal of Human-Computer Studies 52(3): 493-530.

Beydoun, G. and Hoffmann, A. (2000). Monitoring Knowledge Acquisition, Instead of Evaluating Knowledge Bases. 12th European Workshop on Knowledge Acquisition and Management (EKAW2000), France (to appear).

Boose, J. (1984). Personal Construct Theory and the Transfer of Human Expertise. American Association Artificial Intelligence Conference (AAAI84), Los Altos, Kaufman, 1, 27-83.

Boose, J. (1991). Knowledge Acquisition Tools, Methods, and Mediating Representation. Knowledge Acquisition for Knowledge-Based Systems. H. Motoda, Mizoguchi, R., Boose, J. and Gaines, B. Amsterdam, Ohmsha. 1.

Boose, J. and Gaines, B., Eds. (1989). Foundations of Knowledge Acquisition. London, Academic Press.

Brewka, G. (1994). Reasoning About Priorities in Default Logic. American Association Artificial Intelligence Conference (AAAI94), Washington, MIT press, 2, 940-945.

Brown, B. (1989). "The taming of an expert: an anecdotal report." SIGART Newsletter 108: 133-135.

Buchanan, B. G. (1978). "Dendral and Meta-Dendral: Their application dimension." Artificial Intelligence 11(11): 5-24.

Chandrasekaran, B. (1986). "Generic tasks in knowledge-based reasoning: High level building blocks for expert system design." IEEE Expert 3(1): 23-30.

Chandrasekaran, B., Johnson, T. and Smith, J. (1992). "Task Structure Analysis for Knowledge Modelling." Communications of ACM 35(9): 124-137.

Charniak, E. and McDermott, D. (1985). Introduction to Artificial Intelligence, Addison & Wesley.

Chomsky, N. (1957). Syntactic Structures, Mouton.

Clancey, W. (1993). "Situated Action: A neuropsychological interpretation." Cognitive Science 17: 87-116.

Clancey, W. J. (1989). "The Knowledge Level Reinterpreted: Modelling How Systems Interact." Machine Learning 4: 285-291.

Coles, L. S. (1994). "Computer Chess: The Drosophila of AI." AI Expert 9(April): 25-31.

Compton, P., Edwards, G., Kang, B., Lazarus, L., Malor, R., Menzies, T., Preston, P., Srinivasan, A. and Sammut, C. (1991). Ripple Down Rules: Possibilities and Limitations. 6th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW91), Canada, SRDG publications, 1, 6.1-6.18.

Compton, P., Edwards, G., Kang, B., Lazarus, L., Malor, R., Preston, P. and Srinivasan, A. (1992). "Ripple down rules: Turning knowledge acquisition into knowledge maintenance." Artificial Intelligence in Medicine 4: 463-475.

Compton, P., Horn, R., Quinlan, R. and Lazarus, L. (1989). Maintaining an expert system. Applications of Expert Systems. J. R. Quinlan. London, Addison Wesley: 366-385.

Compton, P. and Jansen, R. (1988). Knowledge in Context: a strategy for expert system maintenance. Second Australian Joint Artificial Intelligence Conference (Al88), 1, 292-306.

Compton, P. and Jansen, R. (1990). "A philosophical basis for knowledge acquisition." Knowledge Acquisition 2: 241-257.

Compton, P., Kang, B., Preston, P. and Mulholland, M. (1993). Knowledge Acquisition Without Knowledge Analysis. European Knowledge Acquisition Workshop (EKAW93), Springer, 1, 277-299.

Compton, P., Preston, P. and Kang, B. (1995). The Use of Simulated Experts in Evaluating Knowledge Acquisition. 9th AAAI-sponsored Banff Knowledge Acquisition for Knowledge Base System Workshop (KAW95), Canada, SRDG publications, 1, 12.1-12.18.

Compton, P., Preston, P. and Yip, T. (1994). Local patching produces compact knowledge bases. The European Knowledge Acquisition Workshop (EKAW94), Springer-Verlag, 1, 104-119.

Compton, P., Ramadan, Z., Preston, P., Le-Gia, T., Chellen, V., Mulholland, M., Hibbert, D., Haddad, P. and Kang, B. (1998). A trade-off between domain knowledge and problem-solving method. 11th Banff Knowledge Acquisition for Knowledge Base System Workshop (KAW98), Canada, SRDG Publications, 2, SHARE7.1-SHARE7.18.

Compton, P. and Richards, D. (1999). Extending Ripple Down Rules. Fourth Australian Knowledge Acquisition Workshop (AKAW99), Sydney, University of New South Wales, 87-101.

Craw, S. and Boswell, R. (1999). Representing Problem-Solving for Knowledge Refinement. Sixteenth National Conference on Artificial Intelligence (AAAI99), Orlando, FL, AAAI Press/MIT Press, 227-234.

Courtney, A., Antoniou, G. and Foo, N. (1996). Exten: A System for Computing Default Logic Extensions. Fourth Pacific Rim International Conference on Artificial Intelligence (PRICAI96), Australia, Springer, 1, 411-423.

Davidson, D. (1984). Inquiries into Truth and Interpretation. Oxford, Oxford University Press.

Davis, R. (1979). "Interactive Transfer of Expertise: Acquisition of New Inference Rules." Artificial Intelligence 12: 121-157.

Davis, R. and Lenat, D. B. (1982). Knowledge-Based Systems in Artificial Intelligence. New York, McGraw-Hill.

Dennett, D. (1978). Brainstorms: Philosophical Essays on Mind and Psychology. Montgomery, Bradford.

Dennett, D. C. (1996). Kinds of Minds: Towards an Understanding of Consciousness, Science masters.

Devlin, K. (1997). Goodbye, Descartes: The end of logic and the search for a new cosmology of the mind, John Wiley & Sons, Inc.

Dieng, R., Giboin, A., Tourtier, P.-A. and Corby, O. (1992). Knowledge acquisition for explainable, multi-expert, knowledge-based system design. The European Knowledge Acquisition Workshop (EKAW92), Springer-Verlag, 1, 298-317.

Donoho, S. K. and Wilkins, D. C. (1994). ODYSSEUS2: Addressing the Challenges of Apprenticeship. Knowledge Acquisition for Knowledge-Based Systems Workshop, Canada.

Dreyfus, H. L. (1994). What Computers Still Can't Do: A Critique of Artificial Reason. Massachusetts, The MIT Press.

Edwards, G. (1996). Reflective Expert Systems in Clinical Pathology (MD thesis), University of New South Wales.

Eriksson, H. (1993). Specification and Generation of Custom-Tailored Knowledge-Acquisition Tools. 13th International Joint Conference on Artificial Intelligence, France, Kaufmann, 2, 510-515.

Etherington, D. W. (1988). Reasoning with Incomplete Information. California, Morgan Kaufmann.

Etherington, D. W. and Reiter, R. (1985). On Inheritance Hierarchies With Exceptions. Readings in Knowledge Representation. R. Brachman and Levesque, H., Morgan Kaufmann. 1.

Fensel, D. (1997). The tower-of-adaptor method for developing and reusing problem-solving methods. European Knowledge Acquisition Workshop, Spain, Springer-Verlag, 1, 97-112.

Fensel, D., Benjamins, V. R., Motta, E. and Wielinga, B. (1999). UPML: A framework for knowledge system reuse. Sixteenth International Joint Conference on Artificial Intelligence (IJCAI99), Sweden, Morgan Kaufmann Publishers, 1, 16-21.

Fensel, D., Motta, E., Decker, S. and Zdrahal, Z. (1997). Using Ontologies for Defining Tasks, Problem-Solving Methods and Their Mappings. 10th European Knowledge Acquisition Workshop (EKAW97), Spain, Springer, 1, 113-128.

Fodor, J. (1975). Language of Thought. Cambridge, Harvard University Press.

Fodor, J. and Lepore, E. (1993). Holism: A Shopping Guide. Oxford, Blackwell.

Fodor, J. A. (1998). Concepts: where cognitive science went wrong, Oxford.

Ford, K. M. (1987). An approach to the automated acquisition of production rules from repertory grid data, Tulane.

Gadamer, H.-G. (1993). Truth and Method. New York, Continuum.

Gaines, B. R. (1991). Induction and Visualisation of Rules with Exceptions. 6th Banff Knowledge Acquisition for Knowledge Base System Workshop, SRDG, 1, 7.1- 7.18.

Gaines, B. R. and Compton, P. J. (1992). Induction of Ripple Down Rules. Fifth Australian Conference on Artificial Intelligence (Al92), Hobart, World Scientific, 1, 349-354.

Gero, J. and Sudweeks, F. (1996). Artificial Intelligence in Design, Kluwer Academic Press.

217 Gero, J. and Tyugu, E. (1994). Formal Design Methods for CAD, North-Holland.

Gomez-Perez, A. and Rojas-Amaya, D. (1999). Ontological Reengineering for Reuse. 11th European Workshop on Knowledge Acquisition, Modelling and Management (EKAW99), Germany, Springer, 1, 139-156.

Goodman, N. (1954). Fact, Fiction & Forecast. London, Athlone press.

Groot, A. d. (1965). Thought and choice in chess. Paris, Mouton.

Groot, A. d. (1966). Perception and memory versus thought: some old ideas and recent findings. New York, John Wiley and Sons.

Gu, R. (1996). High-performance digital VLSI circuit design, Kluwer Academic Publishers.

Hass, N. and Hendrix, G. (1983). Learning by being told: Acquiring knowledge for information management. Machine Learning: An Artificial Intelligence Approach. R. Michalski, Carbonell, J. and Mitchell, T. Palo Alto, Tioga Press. 1.

Hayes-Roth, F., Waterman, D. A. and Lenat, D. B., Eds. (1983). Building Expert Systems. Massachusetts, Addison & Wesley.

Heijst, G. V., Schreiber, A. and Wielinga, B. (1997). "Using explicit ontologies in KBS development." International Journal of Human-Computer Studies 45: 183-292.

Hoffman, R. (1990). A survey of methods for eliciting the knowledge of experts. Readings in knowledge acquisition. K. McGraw and Westphal, C. New York, Ellis Horwood. 1: 7-14.

Hoffmann, A. (1992). Phenomenology, representations and complexity. 10th European Conference on Artificial Intelligence, Vienna, Wiley & Sons, 610-614.

Hoffmann, A. (1998). Paradigms of Artificial Intelligence: A Methodological and Computational Analysis. Sydney, Springer.

Hoffmann, A. and Thakar, S. (1991). Acquiring Knowledge by Efficient Query Learning. 12th International Conference on Artificial Intelligence (IJCAI91), Sydney, Morgan Kaufman, 1, 783-788.

Hsu, F.-h., Marsland, T. A., Schaeffer, J. and Wilkins, D. (1991). The Role of Chess in AI. 12th International Conference on Artificial Intelligence, Sydney, Morgan Kaufman, 1, 547-552.

Husserl, E. (1977). Phenomenological psychology. The Hague, Nijhoff.

Iba, G. A. (1989). "A Heuristic Approach to the Discovery of Macro-operators." Machine Learning 3: 285-317.

Kang, B. (1996). Validating Knowledge Acquisition: Multiple Classification Ripple Down Rules (PhD thesis). School of Computer Science and Engineering. Sydney, New South Wales University.

Kang, B., Compton, P. and Preston, P. (1998). Multiple classification ripple down rules: Evaluation and possibilities. 9th AAAI-sponsored Banff Knowledge Acquisition for Knowledge Based Systems Workshop, Canada, 1, 17.1-17.20.

Kang, B. H., Gambetta, W. and Compton, P. (1996). "Verification and validation with ripple-down rules." International Journal of Human-Computer Studies 44: 257-269.

Kelly, G. (1970). A brief introduction to personal construct theory. Perspectives in Personal Construct Theory. D. Bannister. London, Academic Press: 1-29.

Kieras, D. E. (1993). Learning Schemas from Explanations in Practical Electronics. Foundation of Knowledge Acquisition. S. Chipman and Meyrowitz, A. L., Kluwer Academic Publishers. 1: 83-118.

Kivinen, J., Mannila, H. and Ukkonen, E. (1992). Learning Hierarchical Rule Sets. Fifth annual ACM workshop on Computational Learning Theory, New York, The Association of Computing Machinery, 1, 37-44.

Kivinen, J., Mannila, H. and Ukkonen, E. (1993). Learning Rules with Local Exceptions. European Conference on Computational Theory.

Klahr, P. and Waterman, D. (1986). Expert Systems. Techniques. Tools and Applications. California, Addison-Wesley.

Lafrance, M. (1988). The knowledge acquisition grid: a method for training knowledge engineers. Knowledge acquisition for knowledge-based systems. B. Gaines, Academic Press: 81-92.

Lafrance, M. (1990). The special structure of expertise. Readings in knowledge acquisition. K. McGraw and Westphal, C. New York, Ellis Horwood. 1: 55-70.

Laird, J. E., Newell, A. and Rosenbloom, P. S. (1993). Soar: An Architecture for General Intelligence. The Soar Papers. P. S. Rosenbloom, J. E. Laird and A. Newell, Eds., MIT Press. 1: 459-462.

Lengauer, T. (1990). Combinatorial Algorithms for Integrated Circuit Layout, John Wiley and Sons.

Linster, M. (1988). KRITON: A knowledge elicitation tool for expert systems. 2nd European Knowledge Acquisition Workshop for Knowledge-Based Systems, Germany, Springer.

Linster, M. (1993). Explicit and operational models as a basis for second generation knowledge-acquisition tools. Second Generation Expert Systems. J.-M. David, Krivine, J.-P. and Simmons, R., Springer-Verlag: 477-506.

Lukaszewicz, W. (1990). Non-Monotonic Reasoning: Formalization of Commonsense Reasoning. West Sussex, Ellis Horwood.

Martinez-Bejar, R., Benjamins, V. R., Compton, P., Preston, P. and Martin-Rubio, F. (1998). A formal framework to build knowledge ontologies for ripple-down rules-based systems. 11th Banff Knowledge Acquisition for Knowledge Base System Workshop, Canada, SRDG, 2, SHARE13.1-SHARE13.16.

Martinez-Bejar, R., Benjamins, V. R. and Martin-Rubio, F. (1997). Designing Operators for Constructing Domain Knowledge Ontologies. European Workshop on Knowledge Acquisition, Modelling and Management, Spain, Springer, 1, 159-173.

Martinez-Bejar, R., Shiraz, H. and Compton, P. (1998). Using Ripple Down Rules-based Systems for Acquiring Fuzzy Domain Knowledge. 11th Banff Knowledge Acquisition For Knowledge-Based Systems Workshop, Canada, SRDG publications, 1, KAT2.1-KAT2.18.

McDermott, J. (1980). R1: An Expert in Computer Systems Domain. American Association Artificial Intelligence Conference (AAAI80), William Kaufmann, 1, 269-271.

McDermott, J. (1982). "R1: a rule-based configurer of computer systems." Artificial Intelligence 19(1): 39-88.

Mitchell, T. M. (1997). Machine Learning. Singapore, McGraw-Hill.

Newell, A. (1982). "The knowledge level." Artificial Intelligence 18: 87-127.

Newell, A. (1993). "Reflections on the Knowledge Level." Artificial Intelligence 59(1): 31-38.

Newell, A. and Simon, H. A. (1972). Human problem solving. Englewood Cliffs. N. J., Prentice-Hall.

Oussalah, M. and Messaadia, K. (1999). The Ontologies of Semantic and Transfer Links. 11th European Workshop on Knowledge Acquisition, Modelling and Management (EKAW99), Germany, Springer, 1, 225-242.

Parpola, P. (1998). Seamless Development of Structured Knowledge Bases. 11th Banff Knowledge Acquisition for Knowledge Bases Systems (KAW98), Canada, SRDG, 1, VKM9.1-VKM9.18.

Patel, J. (1991). On the Road to Automatic Knowledge Engineering. 12th International Conference on Artificial Intelligence (IJCAI91), Sydney, Morgan Kaufman, 1, 628-632.

Pearl, J. and Korf, R. E. (1987). "Search Techniques." Annual Review of Computer Science 2: 451-467.

Pell, B. (1993). Strategy Generation and Evaluation for Metagame Playing. Computer Laboratory. Cambridge, University of Cambridge.

Pell, B. (1996). "A Strategic Metagame Player for general chess-like games." Computational Intelligence 12: 177-198.

Putnam, H. (1988). Representation and Reality. London, MIT press.

Quine, W. (1951). "Two Dogmas of Empiricism." Philosophical Review 60(1): 20-43.

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.

Ramadan, Z., Mulholland, M., Hibbert, D. B., Compton, P., Preston, P. and Haddad, P. R. (1997). Towards an Expert System in Ion Chromatography Using Multiple Classification Ripple Down Rules (MCRDR). International Ion Chromatography Symposium (IICS'97), Canada.

Reiter, R. (1980). "A Logic for Default Reasoning." Artificial Intelligence 13: 81-132.

Rey, G. (1997). Contemporary Philosophy of Mind. Cambridge, Blackwell.

Richards, D. (1998). The Reuse of Knowledge in Ripple Down Rule Knowledge Based Systems (PhD thesis). Artificial Intelligence Department. Sydney, New South Wales: 335.

Richards, D., Chellen, V. and Compton, P. (1996). The Reuse of RDR Knowledge Bases: Using Machine Learning to Remove Repetition. Pacific Knowledge Acquisition Workshop (PKAW96), Sydney, 1, 293-312.

Richards, D. and Compton, P. (1997). Knowledge Acquisition First, Modelling Later. 10th European Knowledge Acquisition Workshop (EKAW97), Spain, Springer, 1, 237-252.

Richards, D. and Compton, P. (1997). Uncovering the conceptual models in RDR KBS. International Conference on Conceptual Structures (ICCS'97), Seattle, Springer Verlag.

Richards, D. and Compton, P. (1998). "Taking up the Situated Cognition Challenge With Ripple Down Rules." International Journal of Human-Computer Studies 49: 895-926.

Richards, D. and Compton, P. (1999). Revisiting Sisyphus I - An Incremental Approach to Resource Allocations using Ripple Down Rules. 12th Banff Knowledge Acquisition for Knowledge Base System Workshop (KAW99), Canada, SRDG, 1.

Rivest, R. L. (1987). "Learning Decision Lists." Machine Learning 2: 229-246.

Scheffer, T. (1995). Learning Rules with Nested Exceptions. International Workshop on Artificial Intelligence Techniques, Brno, Czech Republic.

Scheffer, T. (1996). Algebraic foundations and improved methods of induction of ripple-down rules. 2nd Pacific Rim Knowledge Acquisition Workshop, Sydney, Australia.

Schreiber, G., Wielinga, B., Akkermans, J., Velde, W. V. D. and Hoog, R. D. (1994). "CommonKADS: A comprehensive methodology for KBS." IEEE Expert 9(6): 28-37.

Schreiber, G., Wielinga, B. and Breuker, J. (1993). KADS: A Principled Approach to Knowledge-Based System Development, Academic Press.

Searle, J. R. (1983). Intentionality. New York, Cambridge University Press.

Shadbolt, N. R. and Wielinga, B. (1990). Knowledge-based knowledge acquisition. European Knowledge Acquisition Workshop (EKAW90), IOS Press, 1, 98-117.

Shahar, Y. and Cheng, C. (1998). Ontology Driven Visualisation of Temporal Abstractions. 11th Banff Knowledge Acquisition for Knowledge Base System Workshop (KAW98), Canada, SRDG, 1, VKM10.1-VKM10.18.

Shaw, M. L. (1980). On Becoming a Personal Scientist: Interactive Computer Elicitation of Personal Models of the World. London, Academic Press.

Shaw, M. and Gaines, B. (1988). A methodology for recognizing consensus, correspondence, conflict, and contrast in a knowledge acquisition system. 3rd Knowledge Acquisition for Knowledge Based Systems Workshop, Canada.

Shaw, M. L. and Gaines, B. R. (1993). Personal construct psychology foundations of knowledge acquisition and representation. European Knowledge Acquisition Workshop (EKAW93), Springer-Verlag, 1, 256-276.

Shiraz, G. and Sammut, C. (1997). Combining knowledge acquisition and machine learning to control dynamic systems. 15th International Joint Conference on Artificial Intelligence (IJCAI97), Japan, Morgan Kaufman.

Shiraz, G. M. (1998). Building Controller for Dynamic Systems (PhD Thesis). School of Computer Science and Engineering. Sydney, New South Wales: 260.

Shortliffe, E. (1976). Computer-Based Medical Consultants: MYCIN. New York, Elsevier.

Simon, H. A. (1974). "How big is a chunk?" Science 183: 482-488.

Sowa, J. F. (1984). Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley.

Stumptner, M. (1997). "An overview of knowledge-based configuration." AI Com 10(2): 111-126.

Suryanto, H., Richards, D. and Compton, P. (1999). The Automatic Compression of Multiple Classification Ripple Down Rule Knowledge Based Systems: Preliminary Experiments. Third International Conference on Knowledge-Based Intelligent Information Engineering Systems, Australia, IEEE inc., 1, 203-206.

Tecuci, G. (1998). Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory, Methodology, Tool and Case Studies. Sydney, Academic Press.

Thanitsukkam, T. and Finkelstein, A. (1998). A Conceptual Graph Approach to Support Multiperspective Development Environment. 11th Banff Knowledge Acquisition for Knowledge Base System Workshop (KAW98), Canada, SRDG, 1, VKM 11.1-VKM 11.18.

Wada, T., Horiuchi, T., Motoda, H. and Washio, T. (1998). A New Look at Default Knowledge in Ripple Down Rules Method. Pacific Rim Knowledge Acquisition Workshop, Singapore, National University of Singapore, 1, 171-186.

Walpole, R. E. and Myers, R. H. (1989). Probability and Statistics for Engineers and Scientists. New York, Macmillan Publishing Company.

Wielinga, B., Schreiber, G. and Breuker, J. (1992). "KADS: a modelling approach to knowledge engineering." Knowledge Acquisition 4: 5-53.

Winograd, T. and Flores, F. (1987). Understanding computers and cognition, Addison Wesley.

Wittgenstein, L. (1953). Philosophical Investigations. London, Blackwell.

Yost, G. R. (1993). "Acquiring Knowledge in Soar." IEEE Expert (6): 26-34.

Zadeh, L. (1983). "The role of fuzzy logic in the management of uncertainty in expert systems." Fuzzy Sets and Systems 8(3): 199-227.
