THE EFFECT OF DATA STRUCTURES MODIFICATIONS ON ALGORITHMS

FOR REASONING OPERATIONS USING A CONCEPTUAL GRAPHS

KNOWLEDGE BASE

BY

HEATHER DAY PFEIFFER, B.S., M.S.

A dissertation submitted to the Graduate School

in partial fulfillment of the requirements

for the degree

Doctor of Philosophy

Subject:

New Mexico State University

Las Cruces, New Mexico

December 2007

Copyright © 2007 by Heather Day Pfeiffer, B.S., M.S.

"The Effect of Data Structures Modifications on Algorithms for Reasoning Operations Using a Conceptual Graphs Knowledge Base," a dissertation prepared by Heather Day Pfeiffer, B.S., M.S. in partial fulfillment of the requirements for the degree, Doctor of Philosophy, has been approved and accepted by the following:

Linda Lacey Dean of the Graduate School

Roger T. Hartley Chair of the Examining Committee

Date

Committee in charge:

Dr. Roger T. Hartley, Chair

Dr. Desh Ranjan

Dr. Clinton Jeffery

Dr. Jeanine Cook

DEDICATION

This dissertation is dedicated to my husband, Dr. Joseph J. Pfeiffer, Jr., who has supported me through "thick and thin"; my children, Joseph "Joel" III and Rebecca "Becca," who have seen "Mom" work on a degree all their lives; my parents, Lloyd and Barbara Day, who have always believed in education and instilled that belief in their children; and my in-laws (may they rest in peace), Joe and Mary Elizabeth "Betty" Pfeiffer.

ACKNOWLEDGMENTS

David J. Benn, from the University of South Australia at Adelaide, for working to help integrate his 'pCG' system with the CPE "Operations" module, and for help in testing and debugging comparison tests with CPE and pCG.

Dr. John F. Sowa, who gave me some very lively discussions on growing ideas of Conceptual Structures and especially Conceptual Graphs, and who allowed me to work with and expand on his original CGIF format.

Dr. Jean-François Baget and Dr. Madalina Croitoru, who taught me much about Simple Conceptual Graphs (SCGs) and how relation hierarchies make great Supports for SCGs, and who evaluated and discussed some of the theoretical findings of this dissertation.

All the past and current AI graduate students at New Mexico State University, in particular Dr. Melanie Martin, Nemecio "Chito" Chavez, Jr., Dr. Dan Tappan and Dr. Tom O'Hara.

The hard work of my committee, in particular Dr. Clinton Jeffery, who carefully looked at both the content and formatting of all the chapters and traveled all the way back from Idaho, and Dr. Jeanine Cook, who kept me "on track" and over the bumps in the road.

VITA

February 11, 1955    Born in Dallas, Texas, USA

June 1977    B.S. in Microbiology/Biology from University of Washington

1980-1984    Systems Analyst at The Boeing Company in Seattle, Washington

May 1988    M.S. in Computer Science from New Mexico State University

1987-2007    Computer Consultant based in Las Cruces, New Mexico

2005-2006    Senior Computer Scientist at Horton Technical Associates, Inc. in Las Cruces, New Mexico

Professional Societies

Association for Computing Machinery (ACM)

IEEE Computer Society

The American Society for Information Science and Technology (ASIS&T)

New Mexico Network for Women in Science and Engineering (NMNWSE)

Publications

H.D. Pfeiffer and R.T. Hartley. Semantic additions to conceptual programming. In Proc. of the Fourth Annual Workshop on Conceptual Structures, Detroit, MI, 1989.

M.J. Coombs, R.T. Hartley, H.D. Pfeiffer, and B. Kilgore. How to become immune to facts. In Proc. of the Rocky Mountain Conference on Artificial Intelligence, Las Cruces, NM, June 1990.

H.D. Pfeiffer and R.T. Hartley. Additions for set representation and processing to conceptual programming. In Proc. of the Fifth Annual Workshop on Conceptual Structures, pages 131-140, Boston & Stockholm, 1990.

H.D. Pfeiffer and R.T. Hartley. The Conceptual Programming Environment, CP: Reasoning representation using graph structures and operations. In Proc. of the IEEE Workshop on Visual Languages, Kobe, Japan, 1991.

M.J. Coombs, H.D. Pfeiffer, and R.T. Hartley. e-MGR: an Architecture for Symbolic Plasticity. In the special issue of the International Journal of Man-Machine Studies on Symbolic Problem Solving in Noisy, Novel, and Uncertain Task Environments, 36:1-17, 1992.

C.A. Fields, H.D. Pfeiffer, and T.C. Eskridge. Knowledge representation and control in gm1, an automated DNA sequence analysis system based on the MGR architecture. International Journal of Man-Machine Studies, 34:549-573, 1992.

R.T. Hartley, H.D. Pfeiffer, and D. Qui. Representation for Viewgen: Structures and Reasoning. In Workshop on Propositional Knowledge Representation, Stanford, CA, 1992.

H.D. Pfeiffer and R.T. Hartley. The Conceptual Programming Environment, CP. In T.E. Nagle, J.A. Nagle, L.L. Gerholz, and P.W. Ekland, editors, Conceptual Structures: Current Research and Practice, Ellis Horwood Workshops. Ellis Horwood, 1992.

H.D. Pfeiffer and R.T. Hartley. Temporal, spatial, and constraint handling in the Conceptual Programming Environment, CP. Journal of Experimental and Theoretical AI, 4(2):167-182, 1992.

H.D. Pfeiffer and T.E. Nagle, editors. Conceptual Structures: Theory and Implementation, volume 754 of LNAI. Springer-Verlag, Heidelberg, W. Germany, 1993.

H.D. Pfeiffer and B.J. Waltar. Automated message analysis using the Conceptual Programming Environment, CP. In G. Ellis and P. Ekland, editors, Supp. Proc. of the 3rd International Conference on Conceptual Structures, Santa Cruz, CA, 1995.

H.D. Pfeiffer and R.T. Hartley. Visual CP representation of knowledge. In G. Stumme, editor, Working with Conceptual Structures - Contributions to ICCS 2000, pages 175-188. Shaker-Verlag, 2000.

H.D. Pfeiffer and R.T. Hartley. ARCEdit - CG editor. In CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online access: July 2001] URL: http://www.cs.nmsu.edu/~hdp/CGTOOLS/proceedings/index.html.

H.D. Pfeiffer and R.T. Hartley, editors. CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online access: July 2001] URL: http://www.cs.nmsu.edu/~hdp/CGTOOLS/proceedings/index.html.

R.T. Hartley and H.D. Pfeiffer. Data models for Conceptual Structures. In Foundations and Applications of Conceptual Structures - Contributions to ICCS 2002. ICCS2002, 2002.

K.E. Wolff, H.D. Pfeiffer, and H.S. Delugach, editors. Conceptual Structures at Work, volume 3127 of LNAI. ICCS2004, Springer, July 2004.

H.D. Pfeiffer, K.E. Wolff, and H.S. Delugach, editors. Conceptual Structures at Work, Contributions to ICCS 2004, Aachen, July 2004. ICCS2004, Shaker Verlag.

H.D. Pfeiffer. An exportable CGIF module from the CP environment: A pragmatic approach. In K.E. Wolff, H.D. Pfeiffer, and H.S. Delugach, editors, Conceptual Structures at Work, volume 3127 of LNAI, pages 319-332. ICCS2004, Springer, July 2004.

M.A. Keeler and H.D. Pfeiffer. Collaboratory testbed partnerships as a knowledge capture challenge. In P. Clark and G. Schreiber, editors, Proceedings of the Third International Conference on Knowledge Capture, pages 203-204. KCAP'05, ACM Press, October 2005.

M.A. Keeler and H.D. Pfeiffer. Games of inquiry for collaborative concept structuring. In F. Dau, M.-L. Mugnier, and G. Stumme, editors, Conceptual Structures: Common Semantics for Sharing Knowledge, ICCS2005, pages 396-410, Berlin, Springer-Verlag, LNAI 3596, July 2005.

H.D. Pfeiffer. Games for co-evolution of digital resources and knowledge tools. In Information Realities: Shaping the Digital Future for All, ASIS&T 2006, Austin, TX, November 2006.

M.A. Keeler and H.D. Pfeiffer. Building a pragmatic methodology for KR tool research and development. In H. Scharfe, P. Hitzler, and P. Ohrstrom, editors, Conceptual Structures: Inspiration and Application, ICCS2006, pages 314-330, Berlin, Springer-Verlag, LNAI 4068, July 2006.

H.D. Pfeiffer and R.T. Hartley. A comparison of different conceptual structures projection algorithms. In U. Priss, S. Polovina, and R. Hill, editors, Conceptual Structures: Knowledge Architectures for Smart Applications, ICCS'07, pages 165-178, Berlin Heidelberg, Springer-Verlag, LNAI 4604, July 2007.

H.D. Pfeiffer and J.J. Pfeiffer, Jr. Representation levels within knowledge representation. In U. Priss, S. Polovina, and R. Hill, editors, Conceptual Structures: Knowledge Architectures for Smart Applications, ICCS'07, pages 484-487, Berlin Heidelberg, Springer-Verlag, LNAI 4604, July 2007.

H.D. Pfeiffer, N.R. Chavez, Jr., and J.J. Pfeiffer, Jr. CPE design considering interoperability. In H.D. Pfeiffer, A. Kabbaj, and D.J. Benn, editors, CS-TIW 2007: Second Conceptual Structures Tool Interoperability Workshop, pages 71-75, 2007.

H.D. Pfeiffer, A. Kabbaj, and D.J. Benn, editors. CS-TIW 2007: Second Conceptual Structures Tool Interoperability Workshop. Research Press International, 2007.

Field of Study

Major field: Artificial Intelligence

Conceptual Structures

ABSTRACT

THE EFFECT OF DATA STRUCTURES MODIFICATIONS ON ALGORITHMS

FOR REASONING OPERATIONS USING A CONCEPTUAL GRAPHS

KNOWLEDGE BASE

BY

HEATHER DAY PFEIFFER, B.S., M.S.

Doctor of Philosophy

New Mexico State University

Las Cruces, New Mexico, 2007

Dr. Roger T. Hartley, Chair

Knowledge representation (KR) is used to store and retrieve meaningful information. Meaning cannot be directly stored in the computer; therefore, a series of levels of representation transforms knowledge to a format that a computer can process. This transformed knowledge is saved using dynamic data structures that are suitable for the style of KR being implemented, and through the KR the system manipulates the knowledge in the data using reasoning operations. The data structure, together with the contents of the transformed knowledge, is called the knowledge base (KB). An algorithm and the associated data structures make up a reasoning operation, and the performance of this operation is dependent on the KB it uses.

In this work, the basic reasoning operations for knowledge management will be explored using a particular style of KR called Conceptual Graphs (CGs). These operations, projection and maximal join, are the foundation for query/answer and hypothesis generation (abduction) systems, respectively. It is believed that changing a reasoning operation's algorithm and providing adequate data structures for it can improve the implementation of the operation for use in intelligent systems, thereby making them faster and more efficient. The execution times of different algorithms and data structures are analyzed over the most general form of CGs knowledge base, showing that flexible, fast and efficient operations can improve higher-level systems.

TABLE OF CONTENTS

LIST OF ALGORITHMS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 Knowledge and Knowledge Representation
1.1.1 Representation Levels
1.1.2 Speed and Efficiency in Processing
1.2 Foundational Information
1.2.1 Basis of Subgraph Isomorphism
1.2.2 Overview of Unification/Matching
1.2.3 Database vs Knowledge Base
1.3 Organization of Dissertation
2 ONTOLOGY, KNOWLEDGE AND REPRESENTATION
2.1 Ontology
2.1.1 Abstract Hierarchies
2.1.2 Relationships
2.1.2.1 Compositional
2.1.2.2 Quantification
2.1.2.3 Qualitative
2.2 Knowledge
2.2.1 Types
2.2.1.1 Declarative Knowledge
2.2.1.2 Procedural Knowledge
2.2.2 Operations
2.2.2.1 Terminological
2.2.2.2 Assertional
2.2.2.3 Generalization
2.2.2.4 Specialization
2.3 Representation
2.3.1 Knowledge
2.3.1.1 Logic
2.3.1.2 Rule-Bases
2.3.1.3 Semantic Network
2.3.2 Internal Representation
2.3.2.1 Predicate Calculus
2.3.2.2 IF..THEN
2.3.2.3 Conceptual Structures
3 DEFINITIONS
3.1 Graph Theory
3.1.1 Digraph and Bigraph
3.1.2 Walk, Path and Connected
3.2 Types and Hierarchies
3.2.1 Concept Type Hierarchy
3.2.2 Support
3.3 FOL
3.4 Conceptual Graphs
3.4.1 Graph Theory Relationships
3.4.2 Formation Rules
3.4.3 Simple Conceptual Graphs (SCGs)
3.4.4 Conceptual Graphs Interchange Format (CGIF)
3.5 Data Structures
3.5.1 Arrays
3.5.2 Hash Tables
3.5.2.1 Perfect Hashing
3.5.2.2 Hash Table/Hash Tables
4 REASONING OPERATIONS
4.1 Operators
4.1.1 Project
4.1.2 Join
4.2 Graph and Subgraph Isomorphism
4.2.1 Graph Isomorphism
4.2.2 Subgraph Isomorphism
4.2.2.1 Non-labeled nodes and undirected edges
4.2.2.2 Labeled nodes and undirected edges
4.2.3 Subtree Isomorphism
4.2.3.1 Hamiltonian Path
4.2.3.2 Subforest Isomorphism
4.2.4 Subbipartite Isomorphism
4.2.5 Projection
4.2.5.1 Historical Algorithms
4.2.5.2 Proposed Algorithm
4.2.6 Maximal Join
4.2.6.1 Historical Algorithms
4.2.6.2 Proposed Algorithm
4.3 Operations
4.3.1 Projection
4.3.2 Maximal Join
4.3.3 Over Knowledge Bases
5 ALGORITHMS AND ANALYSIS
5.1 Foundational Algorithms
5.1.1 SCG Projection
5.1.2 SCG Relation Projection
5.1.3 Polyprojection
5.1.4 Notio Projection
5.2 New Algorithms
5.2.1 Supporting Information
5.2.1.1 Variables and Given Values
5.2.1.2 Actual Supporting Routines
5.2.1.3 Worst Case Analysis for Support Routines
5.2.2 New Projection
5.2.2.1 Actual Algorithm
5.2.2.2 Execution Time
5.2.2.3 Worst Case Analysis for Projection
5.2.3 New Maximal Join
5.3 Typical Scenario Analysis for Projection Algorithms
5.3.1 Projection Algorithms using SCG
5.3.1.1 SCG Projection
5.3.1.2 SCG Relation Projection
5.3.2 Notio Projection
5.3.3 New Projection
5.3.3.1 Typical Case for Support Routines
5.3.3.2 Typical Case for New Projection Algorithm
6 SYSTEMS/ENVIRONMENTS AND IMPLEMENTATIONS
6.1 Semantic Network Systems
6.1.1 KL-ONE
6.1.2 SNePS
6.1.3 SNAP
6.1.4 CS Initial Project - PEIRCE
6.2 Conceptual Graphs Environments
6.2.1 CoGITaNT
6.2.2 Amine
6.2.3 pCG
6.2.4 CPE
6.2.4.1 Basic Architecture for the Environment
6.2.4.2 Data Flow within the Environment
6.2.4.3 Data Structures used by the Environment
6.3 ADT Implementations
6.3.1 Logical
6.3.2 Basic Data Structures
6.3.3 Object
6.4 Experiment Systems Implementation
6.4.1 pCG - Original Notio
6.4.2 CP Environment (CPE)
6.4.2.1 Array (Vectors)
6.4.2.2 Hash Tables
7 PROJECTION EXPERIMENTS, RESULTS AND ANALYSIS
7.1 Domain Problem - 'Blocks World'
7.2 Tests
7.2.1 Single Appearance of Relation within Graph
7.2.1.1 Increase # of Graphs in KB
7.2.1.2 Increase # of Nodes in Graphs in KB
7.2.1.3 Increase # of Nodes in Query Graph
7.2.2 Multiple Appearance of Relation within a Graph
7.2.2.1 Increase # of Nodes in Graphs in KB
7.2.2.2 Increase # of Nodes in Query Graph
7.3 Results of Each Experiment System
7.3.1 pCG - Original Notio
7.3.2 CP Environment
7.3.2.1 Array (Vector)
7.3.2.2 Hash Tables
7.4 Results of Each # of Nodes in KB
7.4.1 5 nodes in KB graphs
7.4.2 11 nodes in KB graphs
7.4.3 21 nodes in KB graphs
7.4.4 31 nodes in KB graphs
7.4.5 53 nodes in KB graphs
7.4.6 73 nodes in KB graphs
7.5 Analysis of Results
7.5.1 Change # of Graphs in KB
7.5.2 Change # of Nodes in KB Graphs
7.5.3 Change # of Nodes in Query Graph
7.5.4 Change # of Identical Relations in Graph
8 CONCLUSIONS AND FUTURE WORK
8.1 Evaluation of Four Projection Algorithms
8.1.1 Strengths
8.1.2 Weaknesses
8.2 Data Structures and Algorithms Effectiveness Comparison for Implemented Algorithms
8.2.1 Strengths
8.2.2 Weaknesses
8.3 Significance of Work
8.3.1 Full Conceptual Graphs
8.3.2 Finds All Valid Projections
8.3.3 Data Structure Integration in Algorithm over Large KB and Graphs
8.4 Future Work
8.4.1 Experiments and Analysis of Maximal Join Algorithm
8.4.2 KB Stored From and To Standard Relational DB
8.4.3 Time and Space Constraints
8.4.3.1 Heuristics
8.4.3.2 Time
8.4.3.3 Space
8.4.4 Different Domain Problems and Interoperability
APPENDICES
A PROGRAMMING LANGUAGE CRITERIA
A.1 Language Evaluation
A.1.1 Visual Basic .Net
A.1.2 Java™
A.1.3 C
A.1.4 C++
A.2 Language Comparison
A.2.1 C++ to C
A.2.2 C++ to Java™
A.2.3 C++ to Prolog
A.2.4 C++ to Visual Basic 6.0
B DOCUMENTATION OF CGIF - VERSION 2001
B.1 Added Definitions For CGIF Categories
B.2 Lexical Categories
B.3 Syntactic Categories
C DOCUMENTATION OF SYSTEMS
C.1 pCG (CGP Programs)
C.2 CP Environment, CPE
C.2.1 CPE Module Documentation
C.2.1.1 CP_Graph Reasoning Operations
C.2.1.2 CP_Graph Reasoning Internal Operations
C.2.1.3 CGHash_Graph and CG_Graph Public Functions
C.2.2 CPE Class Documentation
C.2.2.1 cp_graph Class Reference
C.2.2.2 cghash_graph Class Reference
C.2.2.3 cg_graph Class Reference
D DATA COLLECTED FROM SAMPLE TESTS
D.1 Data Collected for Computing Each Experimental Results Test Set - 53 nodes in KB Graphs
D.2 Error Bar Data - 53 nodes in KB Graphs
D.3 Validation of Correct Projection
D.3.1 11 nodes in KB graphs - Unique Relation Results
D.3.2 13 nodes in KB graphs - Multi-Instances Relation Results
REFERENCES

LIST OF ALGORITHMS

5.1 Π is a General Projection from T to G
5.2 Π Modified as an Injective Projection from T to G
5.3 Notio Projection
5.4 Supporting Projection Routines
5.5 Supporting Projection Routines (Cont. 1)
5.6 Supporting Projection Routines (Cont. 2)
5.7 New Projection
5.8 New Maximal Join

LIST OF TABLES

1.1 Brachman and Guarino Classification Levels and Main Features (Adapted from [45, Figure 6])
3.1 Execution Times for Single Element with Set of Size n
4.1 Related Problem Classes
7.1 KB Single Relation Graph Files
7.2 Single Relation: Query Graph Size Run vs Number of Nodes in KB Graphs
7.3 Multi-Relation: Query Graph Size Run vs Number of Nodes in KB Graphs
7.4 Number of Projections Found: Query Graph Size vs KB Graph Size
8.1 Comparison of Four Algorithms
C.1 CGP Program Files
D.1 Average Data Values for 53 nodes KB with 1000 Graphs
D.2 Average Data Values for 53 nodes KB with 2500 Graphs
D.3 Average Data Values for 53 nodes KB with 5000 Graphs
D.4 Fast/Slow Values for 53 nodes KB with 1000 Graphs
D.5 Fast/Slow Values for 53 nodes KB with 2500 Graphs
D.6 Fast/Slow Values for 53 nodes KB with 5000 Graphs
D.7 Error Bar Data Values for 53 nodes KB with 1000 Graphs
D.8 Error Bar Data Values for 53 nodes KB with 2500 Graphs
D.9 Error Bar Data Values for 53 nodes KB with 5000 Graphs

LIST OF FIGURES

1.1 Levels of Representations
1.2 Abstract Data Type (ADT)
1.3 Unifier U, Projs U → G1 and U → G2, Unification G is Found (Adapted from [136, Figure 5])
2.1 Time Chart
2.2 Logic Example
2.3 Meaning Triangle for Symbols, Concepts, and Referents (Based on [129, Figure 1])
2.4 Peirce's Triadic Relation
3.1 A Graph to Illustrate Concepts (Adapted from [46, Figure 2.9])
3.2 A Digraph that is a Bipartite Graph
3.3 A Type Hierarchy
3.4 An Animal Concept Hierarchy
3.5 Support Using a Relation Hierarchy (Based on [5, Figure 1])
3.6 Basic Abstract Conceptual Graph
3.7 Basic Abstract Conceptual Graph in Digraph Format that is Bipartite
3.8 Basic Conceptual Graph with Actor
3.9 Action Function for Basic Actor Graph
3.10 Basic Detached Conceptual Graph
3.11 Simple Basic Conceptual Graph
3.12 Second Concept Type Hierarchy
3.13 Simple Restricted Basic Conceptual Graph
3.14 Simple Conceptual Graph (SCG)
4.1 Project (Mp(Q, H) = P) (Adapted from [92, Figure 3])
4.2 Join (MJ(Q, H) = J) (Adapted from [92, Figure 2])
4.3 Query Graph
4.4 KB Graph with Type Hierarchy
4.5 Projection Results
4.6 Join of P1 and P2 Graphs
4.7 Common Graph of Basic Graphs
4.8 Join of Detached Basic and Simple Basic Graphs
6.1 A KL-ONE of a Simple 'Blocks-World' Arch (Based on [141, Figure 1])
6.2 A SNePS Representation of "A on B on a Table" (Based on [110, Figure 12])
6.3 SNAP of "USC in LA, CA" (Based on [72, Figure 2])
6.4 PEIRCE Schema for Age (Based on [119, Figure 6.5])
6.5 Current CP Environment (From [87, Figure 1, page 322])
7.1 Part 1: Example of Blocks World Benchmark File
7.2 Part 2: Example of Blocks World Benchmark File
7.3 Part 3: Example of Blocks World Benchmark File
7.4 Part 4: Example of Blocks World Benchmark File
7.5 A Picture of the Benchmark File
7.6 5 nodes in KB of 1000 Graphs
7.7 5 nodes in KB of 2500 Graphs
7.8 5 nodes in KB of 5000 Graphs
7.9 11 nodes in KB of 1000 Graphs
7.10 11 nodes in KB of 2500 Graphs
7.11 11 nodes in KB of 5000 Graphs
7.12 21 nodes in KB of 1000 Graphs
7.13 21 nodes in KB of 2500 Graphs
7.14 21 nodes in KB of 5000 Graphs
7.15 31 nodes in KB of 1000 Graphs
7.16 31 nodes in KB of 2500 Graphs
7.17 31 nodes in KB of 5000 Graphs
7.18 53 nodes in KB of 1000 Graphs
7.19 53 nodes in KB of 2500 Graphs
7.20 53 nodes in KB of 5000 Graphs
7.21 73 nodes in KB of 1000 Graphs
7.22 73 nodes in KB of 2500 Graphs
7.23 73 nodes in KB of 5000 Graphs
8.1 Interval Time Relationships
8.2 A Simple Time Map
8.3 Time Chart for a Bouncing Ball
8.4 Conceptual Space Diagram for a Bouncing Ball
B.1 The Display Format for 'A person is between a rock and a hard place.'
C.1 Part 1: Example of CGP Program from pCG
C.2 Part 2: Example of CGP Program from pCG
C.3 Part 3: Example of CGP Program from pCG
C.4 Part 4: Example of CGP Program from pCG
C.5 Inheritance Diagram for Class 'cp_graph'
D.1 KB for Verifying 3 nodes Query onto 11 nodes KB
D.2 Query Graph for Verifying 3 nodes Query onto 11 nodes KB
D.3 Projection Verifying 3 nodes Query onto 11 nodes KB
D.4 Query Graph for Verifying 5 nodes Query onto 13 nodes KB
D.5 KB for Verifying 5 nodes Query onto 13 nodes KB
D.6 Projections Verifying 5 nodes Query onto 13 nodes KB

CHAPTER 1

INTRODUCTION

Knowledge representation (KR) is used to store and retrieve meaningful information that cannot be stored directly in a computer. This work therefore develops a series of levels of representation that transforms knowledge into a format a computer can use to process this information. This transformed knowledge is saved using dynamic data structures that are suitable for the style of KR being implemented, and through the KR the system manipulates the knowledge in the data using reasoning operations.

The data structure used, together with the contents of the transformed knowledge, is called the knowledge base (KB). An algorithm and its associated data structure make up a reasoning operation, and the performance of this operation is dependent on the associated KB. In this work, the basic reasoning operations for knowledge management will be explored using a particular style of KR called Conceptual Graphs (CGs). These operations, projection and maximal join, are the foundation for query/answer and hypothesis generation (abduction) systems, respectively. It will be shown that changing a reasoning operation's algorithm and providing adequate data structures for it can improve the implementation of the operation for use in an intelligent system, making it faster and more efficient. The execution times of different algorithms and data structures are analyzed over the most general form of CGs knowledge base, showing that flexible, fast and efficient operations can improve a higher-level system.
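To make the projection operation concrete, here is a minimal sketch; it is not the CPE implementation, and the names (`type_le`, `project`, the toy type hierarchy) are all hypothetical. A conceptual graph is held as a set of relation edges over typed concepts, and a query graph projects into a KB graph when every query edge can be mapped onto a KB edge whose concept types are the same or more specialized.

```python
# Minimal, illustrative sketch of projection over conceptual graphs.
# All names here are hypothetical, not the CPE API.

# A concept type hierarchy: child type -> parent type.
HIERARCHY = {"Cat": "Animal", "Mat": "Object",
             "Animal": "Entity", "Object": "Entity"}

def type_le(sub, sup):
    """True if `sub` equals `sup` or is a specialization of it."""
    while sub is not None:
        if sub == sup:
            return True
        sub = HIERARCHY.get(sub)
    return False

# A graph is a set of (relation, source type, target type) edges.
def project(query, kb_graph):
    """True if every query edge maps onto a KB edge whose concept
    types specialize the query's concept types."""
    return all(
        any(kr == r and type_le(ks, s) and type_le(kt, t)
            for (kr, ks, kt) in kb_graph)
        for (r, s, t) in query)

query = {("on", "Animal", "Object")}   # [Animal]->(on)->[Object]
kb = {("on", "Cat", "Mat")}            # [Cat]->(on)->[Mat]
assert project(query, kb)              # the specialized fact answers the query
assert not project({("under", "Animal", "Object")}, kb)
```

A real CG projection must also map coreferent concept nodes consistently across edges; this sketch checks edges independently only to keep the core idea of type specialization visible.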

1.1 Knowledge and Knowledge Representation

Artificial Intelligence (AI) emerged in the 1960s and can be characterized as the process of describing a problem in such a way that a machine can find a solution.

AI uses general reasoning techniques that develop along the lines believed to be used by an intelligent human [12, 65, 106]. AI systems therefore need to represent knowledge in the computer so that these reasoning techniques can be applied to the problem. First, consider what knowledge is, and then how to represent it to the computer. According to the on-line dictionaries, knowledge is "the range of one's information or understanding; the circumstance or condition of apprehending truth or fact through reasoning; the fact or state of knowing; the perception of fact or truth; clear and certain mental apprehension" [60, 59]. However, there are two types of knowledge that human beings deal with every day: 1) knowledge that defines an idea or concept and their relationships [120], and 2) knowledge that gives understanding to time, space, or constraints in connection with these definitions [3, 26, 4]. So knowledge allows us to have a definition or understanding of the events and acts around us; knowledge allows us to describe our world. Second, for the computer, the description of the problem that it is to solve has become known as knowledge representation. The representation consists of a set of syntactic and semantic rules to describe a problem domain [1]. Given that syntax studies the grammar rules for expressing the arrangement of symbols [119], and semantics "is the scientific study of the relations between signs or symbols and what they denote or mean" [139, page 41], knowledge representation, when abstractly described, may appear very informal and without concrete structure. It seems informal because the syntactic rules perform symbol manipulation, while the semantic rules define a mapping that gives an interpretation of the representation in terms of another representation.

The term "semantics" (meaning) has come to be associated with many different types of processing of relationships. Two key relationship types (discussed as links in [139]) are 1) structural links, which set up parts of propositions and are definitional relationships within a network of concepts, and 2) assertion links, which assert something about the world and are basic relations that hold between concepts (e.g., part-of, a-kind-of, etc.). Structural links give definitions of knowledge, whereas assertion links state facts. However, the processing of each of these links does not by itself imply semantic meaning. Meaning can be defined in terms of axioms of basic propositions, or truth maintenance with correctness of assertions [82, 71, 123]. Semantic interpretations and procedural semantics are used in determining these meanings [139]. One misuse of the term semantics is in the area of semantic inferences [139]. Semantic inferences refer to inferences that cross the boundary between symbol and referent; however, not all steps of the process are semantic. If a step of the process involves parsing or processing a structural link, then it is a syntactic operation.

Many knowledge representations used by computers have been developed, including semantic networks, logic, frames, and rule-based representations. Within AI, knowledge representations have been built into different working applications, some of which are referred to as software information systems [134].

Knowledge representation systems are built to help find solutions to problems. Often the knowledge representation, KR, is broken into both a processing language and a knowledge base, KB (see Section 1.2.3 for discussion), that has special data structures and operations that process the data. Some systems address only particular problem domains, e.g. neural networks for pattern matching, while other systems attempt to process large amounts of diverse data, e.g. the CYC KB from Cycorp [62]. Historically, many KRs and KBs work as standalone systems, while newer systems are being constructed as a group of modules, each handling a specific aspect of the problem-solving process [124]. Sometimes these are actually different modules within a single system [87]; others are designed as agents in a multi-agent environment [31].

1.1.1 Representation Levels

For Newell, intelligent systems (AI systems) need both a symbol and a knowledge level to perform reasoning [80]. The symbol level is where representations of knowledge are processed; this is the level where the data structures are defined and acted upon. The knowledge level has no physical structure, only a general functional equation for knowledge. The symbol, or program, level is where physical structure or environment is defined for the knowledge level. Within the symbol level, computational mechanisms are defined for the environment of knowledge.

Some of the confusion in the field of knowledge representations, and in particular semantic networks, is what rules, syntactic or semantic, are defined at each of these levels of representation. In many readings, it is not made clear what knowledge can be processed directly by the computer as machine code representation, and what must be transformed (mapped) into another representation level. It should be noted that, in general, abstract representations are too informal for machine processing. Therefore, most knowledge representations must be translated to a more concrete representation in order to be coded for the machine, and for execution and analysis to be performed by the computer.

Back in 1971, Shapiro [109] attempted to divide all representations defined by semantic networks into the following two levels:

• item - conceptual level of a semantic network.

• system - structural level of interconnection that ties the structured assertions of facts represented in the network to items participating in those facts.

Levelization only looks at the actual semantic network represented on the page. It does not consider the semantics defined by the network, or how this knowledge representation would be coded for machine processing. The item level is concerned with the nodes that appear in the network. These nodes are both concepts and relations, and have some definition represented within the semantic network. The system level, according to Shapiro, attempts to define the links that are present between the nodes in the network.
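Shapiro's two levels can be pictured with a toy data structure; the names below (`items`, `system`, `participants`) are purely illustrative, not Shapiro's notation. The item level holds the nodes themselves, while the system level holds the interconnections that tie a structured assertion to the items participating in it.

```python
# Illustrative sketch of Shapiro's item and system levels.

# Item level: the nodes of the network -- concepts, a relation, and an
# assertion node standing for the fact "a block is on the table".
items = {"Block", "Table", "on", "fact1"}

# System level: links tying the structured assertion to its items,
# labeled by the role each item plays in the fact.
system = {("fact1", "relation", "on"),
          ("fact1", "arg1", "Block"),
          ("fact1", "arg2", "Table")}

def participants(fact):
    """Return the items participating in a structured assertion."""
    return {item for (f, role, item) in system
            if f == fact and role != "relation"}

assert participants("fact1") == {"Block", "Table"}
```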

In 1979, Brachman [11] tried to address the confusion about representations of knowledge by defining levels for different types of semantic network representations. In this way, Brachman was describing one representation in terms of another. When levels are defined by other levels and representations are defined by other representations, a confusion [12] is produced in the field of knowledge representations. When knowledge representations have this interpretation, one can see it as a "levelization" of representations. Historically, this levelization of representations has been examined mainly when discussing the specific knowledge representation scheme known as semantic networks [123], but it could be applied to most representation schemes. In his paper, Brachman defines a "level" as a distinctive type of node or link. These are conceptual levels, and a network's notation can be analyzed in terms of any of these levels. The levels are the following:

1. implementation level - a network is only a data structure.

2. logical level - in a network, links represent logical relationships such as:

• ∀ (for all)

• ∃ (there exists)

• ¬ (not)

• ∨ (or)

• ∧ (and)

• → (implication)

• ≡ (if and only if)

3. epistemological level (Brachman’s missing level) - in a network, links give formal structure to conceptual units and create a set of their interrelationships as conceptual units.

4. conceptual level - in a network, links represent semantic or conceptual relationships.

5. linguistic level - in a network, primitive elements are language-specific and links stand for arbitrary relationships that exist in the world.

Brachman’s levels are defined types of network nodes and links. He states:

“It should be clear, then, that one of the main problems with many of the older formalisms was their lack of a clear notion of what level they were designed for” [[11] page 32].

For the five levels given above, Brachman saw the implementation as the lowest level; that is, the most basic type of network. This level only has data structures associated with it; there are really no semantics related to the network. The logical level is seen as needing the semantics of the basic logical operators. The conceptual level is similar to Shapiro’s item level discussed above. However, this level defines the semantics of the concepts being included within the level. The linguistic level is very abstract and is used to define, for the network, a level with an open-ended notion of concepts.

The epistemological level is seen by Brachman as a missing level, located between the logical level and the conceptual level. Brachman then uses all these levels to define semantic networks in terms of cases (or roles) with slots (or sets of fillers), by looking at the types of links needed when processing the network. Currently, this type of representation is known as “frames” and in some circles is a knowledge representation in its own right (see Section 6.1.1 for a discussion of a Frame system).

On evaluation of the main feature of the epistemological level, this author would place it between the conceptual and linguistic levels. The reasoning behind the move is that it is similar to the system level discussed by Shapiro, and is very much concerned with the interrelationships of concepts and conceptual units. Guarino, like Brachman, also saw missing information in the levels, but argues, instead of moving a level, for adding an ontological level to Brachman’s classification levels. The ontological level gives a foundation for the knowledge engineering process and depicts a set of features for the computational properties of each level (see Table 1.1) [45]. The ontological level, in Guarino’s eyes, should be introduced between the epistemological and conceptual levels, being neutral with respect to the epistemological level, though not every epistemological formalism is necessarily adequate for it. For Brachman and Guarino, all the levels are processed as part of the knowledge representation.

Table 1.1: Brachman and Guarino Classification Levels and Main Features (Adapted from [[45], Figure 6]).

Level            Primitive concepts           Main feature        Interpretation
---------------  ---------------------------  ------------------  --------------
Implementation   are pointers                 Concrete            Objective
Logical          are predicates               Formalization       Arbitrary
Epistemological  are structuring primitives   Structure           Arbitrary
Ontological      satisfy meaning postulates   Meaning             Constrained
Conceptual       are cognitive primitives     Conceptualization   Subjective
Linguistic       are linguistic primitives    Language            Subjective

Brachman did not try to actually look at processing representation from a computer processing point of view. Then in 1982, Newell [80] began the redefinition of a “level” from this new point of view. He defined a level in the following way:

“a level consists of a medium that is to be processed, components that provide primitive processing, laws of composition that permit components to be assembled into systems, and laws of behavior that determine how system behavior depends on the component behavior and the structure of the system” [[80] page 92].

Newell referred to computer systems levels as going through the following bottom (lowest) to top (highest) sequence:

• device level

• circuit level

• logic level (sub-levels - combinatorial and sequential circuits)

• register-transfer level and symbol (program) level

• configuration level

• knowledge level (new level)

As a third sibling just below the configuration level, Newell added a new level known as the knowledge level. For each of the levels, the following aspects need to be defined: the medium, the components, the assembly of the components into a system, the composition laws and the behavior laws. In looking at knowledge and representation, Newell’s symbol level and knowledge level are the most important. Each of these levels has been defined according to the above aspects. The medium for the symbol level is symbols and/or expressions. The components include memories and operations. The components are assembled into systems known as computer systems. In the aspect of laws, composition is built on designation and/or association, while behavior is sequential interpretation. When looking at the knowledge level, the medium is knowledge. The components are goals, actions and bodies (physical code). The composition laws are a set of actions, a set of goals and a body (code) for the system that is referred to as the agent. Lastly, the behavior law is the principle of rationality: “Actions are selected to attain the agent’s goals”. This principle provides a general functional equation for the knowledge medium to act on. However, the agent is very abstract and has no real physical structure. The medium definition shows that knowledge is very open and has a potential for generating an action. The knowledge level is an approximation and there are no guarantees on the system’s behavior.

In 2002, the Object Management Group (OMG), dealing with relating legacy systems to business modeling, defined a Model Driven Architecture (MDA) [114, 112]. Within this architecture, the modeling space transforms into the code space, where the representation of the business process/rules transforms all the way to the code to be deployed. MDA-enabled tools do this transformation using a set of levels to move from the business model, through an intermediate level that represents the aspects of the model that need to be coded, on to the actual generation of the code. Even though this is discussed in terms of an architecture instead of a knowledge representation, the transitioning of representation from an abstract level to a concrete code level still applies.

This work expands on Newell’s computer processing level idea, in particular investigating what could be the possible computational mechanisms or physical structures of the symbol level (representations), while seeing level relationships more from Brachman’s definition [11] point of view. This work defines level as:

“There is a level of processing of representations that sees the lowest level to be a very abstract representation and then, as levels increase, the representation becomes more concrete or machine like.”

The highest level of representation would then be processed directly by a computer (see Figure 1.1) because it is the actual implementation that is compiled or interpreted as machine code.

When one discusses semantic networks, it is not clear what rules, syntactic or semantic, are defined at each of these processing levels of representation. In many readings, it is not indicated clearly what rules can or should be processed directly by the computer at each knowledge representation level. In general, abstract representations are too informal for machine processing, and these need to be translated to another, more concrete representation. Therefore, when looking at all forms of knowledge representation, translation to a more concrete representation allows coding, and later allows execution and analysis to be performed with the computer.

Therefore, now consider representation in an AI system to be a series of these processing levels. Encapsulating the knowledge representation (KR) is the level of ontological information [81], level 0. This level would be considered the knowledge level under Newell’s levels, part of the linguistic level for Brachman, and would be a relocation of Guarino’s ontological level. The information represented is not actually part of the structure of the domain knowledge and is the most abstract of all the levels of representation and implementation. In fact, it is more of a hierarchy of conceptual information than knowledge, so will be called “ontology” (see Section 2.1).

[Figure 1.1 depicts five nested levels of representation, from outermost (most abstract) to innermost (most concrete): Level 0 - Ontology; Level 1 - Knowledge Representation; Level 2 - Internal Representation (Declaring ADT); Level 3 - Defining Representation (Defining ADT); Level 4 - Storing Representation (Implementing ADT).]

Figure 1.1: Levels of Representations.

This ontology level contains more general information than what is found within the KR [61]; it might also contain any meta-data that needs to be stored for the knowledge representation. Within the ontology level, any particular system may use an abstract hierarchy. These hierarchies define relationships between the conceptual units within the knowledge representation and information outside the KR, such as group membership. Therefore, defined hierarchies are considered part of level 0 in our representation levels.

KR will start processing at level 1. It should be noted that the semantics at this level are declarative and/or procedural in terms of their interpretation into a second representation, and therefore are not concrete. For Newell this level would be part of the symbol level, very close to the knowledge level. This is where the representation of the knowledge medium would begin. In Brachman’s levels this would encompass part of the conceptual level and all of the epistemological level. The epistemological level as defined clearly should be placed between the linguistic level and conceptual level, as opposed to where Brachman placed it in his work [11].

The second level of representation, level 2, is an internal representation that could be viewed as a virtual machine. When comparing this level to the MDA architecture, this would be the platform-independent modeling level. Within the representation of KR, this is where the declaration of an abstract data type (ADT) is performed (see Section 1.1.2). This syntactic representation is more formal and can be used in the definition and implementation of the declared ADT. The syntactic rules are concrete and define a mapping of symbols to operators. However, in order to implement the ADT declared by this level of representation, there must be a third level of definitions giving more structure to the representation.

This level, level 3, consists of the actual semantic definition of the ADT declared in level 2. The semantic rules are also concrete, and define a mapping of operations to functions. This representation level can be used to implement code for the computer to store and retrieve knowledge. It defines the algorithms to be performed, and theoretical time/space analysis can be performed on these algorithms. There is a strong connection between level 2 and level 3 because the concrete rules of the representation in level 2 will work over the algorithms of level 3 during the implementation of the data structures at the next level.

The innermost level of representation, level 4, is the actual implementation of the ADT definition, and the implementation level within the MDA architecture. This level is where all the data structures come together. It is at this level that a computer programming language (see Appendix A), such as C, Prolog, Lisp, or a newly defined language, is chosen [134]. This is also the level at which the coding of data structures and algorithms will be performed, and any empirical time/space analysis is done. This representation is the most concrete. Level 4 is the representation where the domain knowledge being worked on will tie into the computer language used for the implementation.

1.1.2 Speed and Efficiency in Processing

An abstract data type (ADT) (see Figure 1.2) can be broken down into two parts: 1) specification and 2) implementation. The specification, which is abstract, includes the definition of data types, including their structure and values, and supporting operations for those data types; this half of an ADT will be referred to as a data model.

[Figure 1.2 depicts an ADT split into two halves: the SPECIFICATION (abstract), consisting of a set of values and a set of operations; and the IMPLEMENTATION (concrete), consisting of a data representation and algorithm code bodies.]

Figure 1.2: Abstract Data Type (ADT).

The data model provides a mapping from general knowledge to the abstract element of the ADT. The implementation, which is concrete, contains the data representation used by the algorithms and the algorithmic code bodies of the operations. This is the second half of the ADT, and connects the knowledge to the algorithms being used for implementation.
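As a small, hypothetical illustration of this split (the `Stack` example and its names are not from this work), the specification half of an ADT can be written as an abstract interface that names the value set and operations, while the implementation half supplies a concrete data representation and the algorithm code bodies:

```python
from abc import ABC, abstractmethod

class StackSpec(ABC):
    """Specification (abstract half): declares the operations over the
    value set, saying nothing about the data representation."""

    @abstractmethod
    def push(self, item): ...

    @abstractmethod
    def pop(self): ...

    @abstractmethod
    def is_empty(self) -> bool: ...

class ListStack(StackSpec):
    """Implementation (concrete half): chooses a data representation
    (a Python list) and supplies the algorithm code bodies."""

    def __init__(self):
        self._items = []          # data representation

    def push(self, item):
        self._items.append(item)  # O(1) amortized

    def pop(self):
        return self._items.pop()  # O(1)

    def is_empty(self) -> bool:
        return not self._items

s = ListStack()
s.push(1)
s.push(2)
print(s.pop())        # 2
print(s.is_empty())   # False
```

A second implementation (say, over a linked list) could satisfy the same specification with different time/space trade-offs, which is exactly the freedom the two-part ADT definition provides.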

Many issues come into play when defining an ADT for a processor. Probably one of the most important is the efficiency of the data structures when implementing the operations. When examining the data in terms of their time and space requirements, some data structures are better than others. Well-designed and well-defined data structures can certainly help in these respects, whereas poorly defined structures lead to inefficiencies. The data structures directly affect the efficiency of both aspects of the ADT because the data model deals with the data types, while the data representation is part of the implementation. Modification of the data structures to give faster access times to the data types and representations can help in the efficiency of the algorithms being implemented for the operations to be performed. In this way the algorithms are being implemented to work towards their best possible execution time, whereas, if the data structures are not optimized, there is a higher probability that the worst possible execution time will be seen.

One important aspect in processing the underlying knowledge represented is how to communicate that knowledge to other systems and applications. This may affect the speed and efficiency of the implemented data storage.

1.2 Foundational Information

The following sections on subgraph isomorphism, unification, and databases versus knowledge bases give some foundational information as building blocks for working with the reasoning operations that are the basis of this work.

1.2.1 Basis of Subgraph Isomorphism

When dealing with basic graph operations (see Section 3.1 for definitions), the efficiency of some of the algorithms has been investigated by many researchers. In looking at these algorithms and their efficiency, it is important to understand relationships between the different complexity classes that they may fall into:

P ⇒ NP ⇒ NP-Complete ⇒ NP-Hard

P: problems that can be solved in polynomial time; NP: problems that are in NP, but not known to be either in P or NP-Complete; NP-Complete: decision problems in NP to which every problem in NP can be reduced; and NP-Hard: problems that are at least as hard as an NP-Complete problem, but are not decision questions, so cannot be reduced to a known NP-Complete problem.

At the core of graph isomorphism (see Section 4.2.1 for full definition and example), the problem is to find a mapping, f, of graph G to graph H, such that G and H are identical. Discovering whether two graphs are isomorphic is not known to be an NP-Complete or P problem [42]. It is defined to be in the complexity class between P and NP-Complete, given that P ≠ NP. For this discussion, it will be called the class ‘NP’.

However, in most cases involving reasoning operations, given graphs G and H, a more important question than whether they are identical is whether a smaller pattern graph H is isomorphic to a subgraph of G. This is known as subgraph isomorphism. Because restricting this question to instances in which H is a complete graph yields the well-known NP-Complete problem CLIQUE (see page 64 of Garey and Johnson [42]), subgraph isomorphism is itself an NP-Complete problem [42].
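To make the question concrete, here is a hypothetical brute-force sketch (exponential, as expected for an NP-Complete problem; the graphs are illustrative only): it searches for an injective mapping of the pattern graph’s vertices into G that preserves every edge. Making the pattern a complete graph turns the same search into a CLIQUE test.

```python
from itertools import permutations

def subgraph_isomorphic(g_vertices, g_edges, h_vertices, h_edges):
    """Is there an injective map of H's vertices into G's vertices
    that carries every edge of H onto an edge of G?"""
    g_edge_set = {frozenset(e) for e in g_edges}
    for image in permutations(g_vertices, len(h_vertices)):
        f = dict(zip(h_vertices, image))        # candidate mapping
        if all(frozenset((f[u], f[v])) in g_edge_set for u, v in h_edges):
            return True
    return False

# G: a 4-cycle with one chord; H = K3 (a triangle), so this asks CLIQUE(G, 3).
G_V = [0, 1, 2, 3]
G_E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
H_V = ['a', 'b', 'c']
H_E = [('a', 'b'), ('b', 'c'), ('a', 'c')]

print(subgraph_isomorphic(G_V, G_E, H_V, H_E))  # True: triangle on 0, 1, 2
```

The loop over permutations is what blows up: for |G| = n and |H| = k there are n!/(n-k)! candidate mappings, which is why the polynomial-time special cases discussed next matter.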

When sub-problems (“special cases”) of the subgraph isomorphism question are analyzed, some are found to be solvable in polynomial time. One of these sub-problems is subtree isomorphism [42]; this is when both G and H are trees (a graph G1 is a tree if and only if every two distinct vertices of G1 are connected by a unique path of G1 [Theorem 3.4 on page 69 of [13]]). A polynomial time algorithm for this sub-problem was shown by Reyner [103].

When labels are added to graphs, such as in bipartite graphs, these can be factored into an isomorphism algorithm. The two label types can significantly speed up subgraph matching by allowing some possibilities to be pruned through separating the vertices into two groups.

Another sub-problem that produces a polynomial time algorithm, besides just labeling the vertices, is to require that a class or group of the vertices may only have a specific number of edges [2], as in feature term graphs (see Section 3.4.3). This process constrains the problem enough to bring it into polynomial time.

It should be mentioned that all of these sub-problems concern two graphs and consider the running time as a function of the number of vertices, n, in the graphs [68].

1.2.2 Overview of Unification/Matching

As was discussed in Martelli [67], “unification was first introduced by Robinson [104] as the central step of the inference rule called resolution.” Resolution became a single rule that could replace all the axioms and inference rules of first-order predicate calculus and be used in designing mechanical theorem provers. Unification can be expressed in the following way: given two terms containing some variables, find, if one exists, the simplest substitution (assignment of some term to every variable) which makes the two terms equal. This substitution becomes a matching of the terms based on variable binding assignments, and is therefore a unifier. There may be many ways to unify a pair of terms, but there will be at most one most general unifier, MGU; the other unifiers add extra bindings for sub-terms which are variables in the original terms. If a unifier, U, is the MGU of a set of expressions, then any other unifier, V, can be expressed as V = UW, where W is another substitution.

As discussed in Myaeng and Lopez-Lopez [77], graph matching has been recognized as a central problem across many application areas. Many researchers have attempted to reduce the computational complexity by developing application-specific matchers [78, 79]. As discussed above, while the general subgraph isomorphism problem is known to be NP-Complete, matching graphs containing conceptual information appears to be computationally tractable [77]. This is because conceptual graphs are connected (whether acyclic or containing cycles), bipartite (their vertices can be separated into two distinct groups) and directed (and finite), which, for the reasons given in Section 1.2.1, improves on the general subgraph isomorphism problem. However, adding labels to the conceptual information [77] is essential in extending plain graphs to be tractable.

If graph matching is reduced to a unification problem, then one should be able to check a set U of finite terms over a set of function symbols and a countable set of variables, where there is defined a finite set of pairs of terms, {< ui, vi > | i ∈ I}. The question is now to determine if there exists a substitution σ = {(xj → tj) | tj ∈ U, j > 0} such that σ(ui) = σ(vi) for i ∈ I [84]. However, most unification algorithms that can be done in linear time require that the graph is acyclic [84]. The reason the graphs cannot contain cycles is the occurs check. This is a feature of implementations of unification which causes substitution to fail if the structure S being unified against contains the variable, V, being substituted [133]. If the occurs check is not evaluated, then unsound inference could occur. Some implementations could go into an infinite loop if a cycle appears in the structure; therefore, it is disallowed [84].
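The substitution and occurs check described above can be sketched in a minimal, hypothetical term unifier (the encoding is illustrative, not the dissertation’s): terms are tuples `(functor, arg1, ...)`, strings beginning with `?` are variables, and the MGU is returned as a substitution dictionary. The occurs check makes unification fail cleanly instead of looping on a cyclic binding.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def walk(t, subst):
    """Follow variable bindings to their current value."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v appear inside term t?"""
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def unify(x, y, subst=None):
    """Return an MGU of x and y as a dict, or None on failure."""
    if subst is None:
        subst = {}
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return None if occurs(x, y, subst) else {**subst, x: y}
    if is_var(y):
        return unify(y, x, subst)
    if (isinstance(x, tuple) and isinstance(y, tuple)
            and x[0] == y[0] and len(x) == len(y)):
        for a, b in zip(x[1:], y[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash of constants or functors

# f(?x, g(?x)) unified with f(a, ?y) binds ?x to a and ?y to g(?x),
# which under the substitution is equivalent to g(a).
print(unify(('f', '?x', ('g', '?x')), ('f', 'a', '?y')))
# The occurs check rejects ?x = f(?x) instead of looping.
print(unify('?x', ('f', '?x')))  # None
```

Any other unifier of the same pair only adds extra bindings on top of this result, which is the V = UW relationship between a unifier and the MGU stated above.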

However, if the relationship is functional, as in feature term graphs (see Section 3.4.3), then the unification can be performed even with cycles. Figure 1.3, from Willems’ paper on projection and unification with conceptual graphs [136], shows the unification of two projections into a single graph even when one graph contains a cycle. Figure 1.3 will be explained more fully in Section 4.3.2.

U:  [Person:*y] -> (Name) -> [Word:*x].

G1: [Man] -> (Name) -> [Word:*x].
    (Child) -> [Girl:*y] -> (Name) -> [Word:*x].

G2: [Person:*y] -> (Name) -> [Word:*x `Smith'].

G:  [Man] -> (Name) -> [Word:*x `Smith'].
    (Child) -> [Girl:*y] -> (Name) -> [Word:*x `Smith'].

Figure 1.3: Unifier U, Projections U −→ G1 and U −→ G2, Unification G is Found (Adapted from [[136], Figure 5]).

1.2.3 Database vs Knowledge Base

As defined by Wikipedia, “a database is a collection of records stored in a computer.” These records are fields of data that contain information that is queried to answer questions and make decisions. This data is stored in files by records. A “database management system” is used to access and query the fields and records of the data. However, a database can only retrieve data that is explicitly stored in its structures. No information beyond these explicitly stored facts can be retrieved.

A knowledge base is like a database, but it contains more than fields and records of data. Most knowledge bases also contain some kind of inference engine that uses reasoning operations over the structure of the data stored in the records to infer more information. The knowledge base, as described by Tappan, “operates over a framework of objects, properties, and relations towards the goal of supporting reasoning” [128]. The framework will be different depending on the “goal” of the domain of knowledge. Knowledge bases, like databases, use a management system, but this system adds structural information to the data (often called meta-data) to help in the discovery of additional information. This meta-data gives organization to the data and allows the deduction of contextual knowledge from the implied semantics of the inference engine [74, 128]. The new contextual knowledge deduced by implicit semantics may or may not be factual, but can be presented to the user of the knowledge base to see if it should be added to the stored data. This new data can then be used to answer queries and help in making decisions.

In the future, it is hoped that more and more users will use knowledge bases over databases; however, the speed of retrieval from a knowledge base is slower than from a database because of the added structural information and because of the built-in inference engine. In this work, some of the advances that have been made in database algorithms and data structures will be applied to knowledge bases, in the hope of improving some of those retrieval speed problems.

1.3 Organization of Dissertation

This work will begin by looking at ontology, knowledge and representation in Chapter 2. This will include looking at how ontology can be processed, knowledge types and operations, and moving knowledge through different types of representations. Above the outermost level of representation as seen in Figure 1.1 (one could describe this level as being the top) would be a zero level of macro information. The information represented is not actually part of the domain knowledge and is even more abstract than the first level of representation. In fact, it is more of a hierarchy of conceptual information than knowledge, so it will be referred to as abstract hierarchies; this zero level is discussed in Section 2.1.1. Different ADT representations of knowledge are used for implementing a semantic network KR. These internal representations (see Section 2.3.2) use different formal approaches for syntactic processing, such as: 1) propositional logic, 2) predicate calculus, and 3) graph grammar (with set theory). Some of the representations of knowledge used by semantic networks will be discussed in more detail in Section 2.3.1.3. Within a propositional logic approach, propositions and logical operators use arbitrary conceptual units and expression links to define nodes and arcs with semantic descriptions and context. Predicate calculus is built on top of a propositional approach and also incorporates the use of predicates with quantification over variables. A graph grammar or set-theoretic approach is not only built above the predicate calculus and quantified variables approach, but also uses primitive objects and actions with procedural operators to help define the semantics of the network. Graph grammars built on top of graph theory instead of set theory also give a visual representation which can be more expressive (see Section 3.1). Several different types of knowledge representations were originally investigated, but a detailed example (see Section 2.3.1.3) will be given for semantic networks.

Chapter 3 gives definitions for several elements that will be used throughout the thesis, so that the reader has a frame of reference. Data structures that are relevant to the implementation of the problem are defined and their basic running times are examined. The reasoning operations, projection and maximal join, are then explained in Chapter 4. Chapter 5 presents a new projection algorithm after explaining and analyzing the foundational projection algorithms, continuing on to theoretically analyze the new algorithm and show how it compares in a “typical case” with the other algorithms.

Some example environments/systems will be discussed in Chapter 6. KL-ONE, SNePS, SNAP, PEIRCE, CoGITaNT, Amine, pCG and CPE are all semantic network knowledge representation systems. In Chapter 6 each of these systems will be discussed, evaluating the different ADT representations that are used in each case. Chapter 6 also gives an evaluation of possible data structures to use in the implementation of the new algorithm given in Chapter 5. While implementing the different ADTs, different data structures will be explored, and their efficiency in storage and speed of algorithmic execution will be analyzed. This leads into the practical element of this dissertation.

An important aspect of the dissertation (the practical element) is presented in Chapter 7, where changes in the data structures and algorithms are shown to affect speed, efficiency, flexibility and space needs. One can see how the change in the data structures can affect the system’s speed and efficiency, creating a system fast enough to retrieve and process thousands of graphs in a reasonable amount of time for simple query processing, and therefore making it a usable system. As well as tying the algorithms to the data structures to improve the system’s functionality, the new system was designed with flexibility in mind, such that even a sub-part of the system, a module, can connect to and be used by another standalone application. Also, the new algorithm can find results that the baseline system was not able to process. These last two features are important contributions of this dissertation. Chapter 8 draws conclusions and describes future work. Different implementation languages were examined to find the fastest system; they are discussed in Appendix A. Next, Appendix B gives the actual CGIF format for the 2001 version. Appendix C gives documentation for how the pCG programs work and for the implementation of the CPE system. Appendix D gives sample data of the averages and error spreads that are shown in the experimental results. It also shows verification that the CPE algorithm for projection gives correct results for both single and multiple projections.

CHAPTER 2

ONTOLOGY, KNOWLEDGE AND REPRESENTATION

For use later in declaring an ADT for an internal representation, this chapter begins by evaluating the interplay between hierarchies, relationships and operations. These elements have an impact on how the higher levels of representation will be designed, defined and implemented, and an understanding of each of these elements is necessary to clarify the different representation level interactions.

2.1 Ontology

Unlike the definition of a knowledge base given in the Tappan thesis [128] (discussed in Section 1.2.3), the knowledge base here is more than just its ontology; it also includes the higher levels of representation. Hierarchies and relationships are the abstract elements of ontology, and are more informal and open in their presentation of the representation. One can see them as the building blocks of the ontology; therefore, they will be discussed in this section. Operations are more processing oriented and give a more concrete representation. These depict how information blocks are put together, so they will be discussed later (see Section 2.2.2). Below are defined some of the elements of the knowledge that are needed to process the representation levels. As discussed in [80], the knowledge level gives the general functional expression that builds the notation at the symbol or programming level. Evaluating the function of an ontology will reveal some basic elements of this expression.

In order to look at the ontological elements of knowledge representation, let us define some basic entities: object and act. An object is a thing, for example the subject of a sentence, and is commonly considered to be a physical object such as a ball, a book, a person, etc. An object has size, shape, mass, color, temperature, speed, etc. Basically, an object exists as a physical thing. An act is something performed, i.e. a verb in a sentence. An act has properties such as rate, acceleration, direction, orientation, etc. Each of these entities is important in understanding overall concepts about representations, and both not only have related term information, but also exist in time and space.

As defined in [120, 123], ontology comes from the Greek words onto, being, and logos, study; that is, the study of being, or of the basic categories of existence. An ontology is a synonym for the arrangement of a generalization hierarchy that classifies the categories or concept types of the hierarchy. The ontology also looks at the relationships, operations and constraints that are essential to help define the nature (knowledge) of our world or reality [106]. This general knowledge defines an informal list of concepts that are part of the domain. These concepts will be seen as terms (see Section 2.1.2 for more of a discussion on terms) within the ontology, and they may be defined by the categories [106] of which they are members. The next section will begin by looking at different abstract hierarchies, and later tie the hierarchies to the categories of objects within a domain.

2.1.1 Abstract Hierarchies

Abstract hierarchies can be used to define membership in various categories, or to give macro definitions about the categories. The actual structure of hierarchies will be discussed in Section 3.2, within the definitions of Chapter 3.

2.1.2 Relationships

Relationships can be divided into different categories: compositional, quantitative and/or qualitative [43]. Each of these relationships is involved in the construction and propagation of information within sentences or expressions. Quantification deals with fuzzy quantifiers like most, as they relate to the classical universal and existential quantifiers; qualification looks at fuzzy probabilities [43]. Sentences from a logic point of view may be simple predicates with an arity of n term arguments that return either a true or false value. Sentences may also be more complex and return more complex information within structures. First, we look at simple sentences and how they are built using compositional operations.

2.1.2.1 Compositional

Within simple sentences there are term arguments [65]. Terms may be of three different types: constant symbols, variable symbols and function expressions. The constant symbols are symbols that do not change; two well-known constant symbols are the truth symbols, true and false. These symbols may also be things such as numbers, 1, 2, etc. Each of these symbols has a known interpretation as specific objects or acts in the world. They are also members of a specific category within the defined world. Variable symbols are used to designate general classes of objects or properties in the world [65]. Variables are not constant, and as seen later, they may be substituted. Function symbols have an attached arity indicating the number of elements of the domain mapped onto each element of the range. A function expression consists of a function symbol followed by the number of terms indicated in the function symbol’s arity.

The terms are built into sentences using connectives. There are different types of Boolean connectives that are used when working mathematically with sets or equations, for example: conjunction, disjunction, negation, implication and equivalence. These Boolean connectives can be used to create sentences or composite sentences by treating the connectives as compositional relationships (or functions). Each of these connectives operates as follows: the Conjunction (“and”) operator forms a ‘collective’¹ set, where each member of the set is “anded” with the other members of the set; the Disjunction (“or”) operator forms a ‘distributive’² set, where each member is “ored” with the other members of the set; the Negation (“not”) operator forms an ‘opposite’ set, where each member is the opposite of what it is in the set; the Implication (“if A then B”) operator is used in an equation, where the truth of A causes the truth value assignment to be concluded from the truth value of B; otherwise, the assignment of the implication is always true; and the Equivalence (“equals”) operator forms an ‘identity’ set, where all members within the set satisfy the following properties: 1) Reflexivity: a ≡ a; 2) Symmetry: if a ≡ b then b ≡ a; and 3) Transitivity: if a ≡ b and b ≡ c then a ≡ c.

¹ Refers to a generic assemblage of items.

² Refers to a generic bag of items.
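These connectives can be sketched as simple Boolean functions; the example below is illustrative only (not from the source), and verifies the three equivalence properties over the truth values:

```python
# A minimal sketch of the Boolean connectives as Python functions.
def conj(a, b): return a and b          # conjunction ("and")
def disj(a, b): return a or b           # disjunction ("or")
def neg(a):     return not a            # negation ("not")
def implies(a, b): return (not a) or b  # "if A then B": true unless A true, B false
def equiv(a, b): return a == b          # equivalence ("equals")

# The equivalence properties hold over the truth values:
vals = [True, False]
reflexive  = all(equiv(a, a) for a in vals)
symmetric  = all(equiv(b, a) for a in vals for b in vals if equiv(a, b))
transitive = all(equiv(a, c) for a in vals for b in vals for c in vals
                 if equiv(a, b) and equiv(b, c))
```

Note that implication is true whenever its antecedent is false, matching the "otherwise, the assignment of the implication is always true" rule above.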

These simple and complex sentences or expressions can be expanded to use variable symbols and function expressions. This introduces more complex relationships where properties are applied to a whole set of terms or a collection. These relationships may be either quantitative or qualitative in nature. A quantitative relationship propagates information by performing quantification of values and variables to provide an interpretation or meaning for a symbol or expression [65]; a qualitative relationship is based on qualitative physics and propagates through both moments in time when acts occur, and locations of objects in space [47]. Each of these types of relationship will be used in the next section when discussing constraints. Next, these relationships will be examined more closely.

2.1.2.2 Quantification

Quantification allows the substitution of variables with numeric values, so that arithmetic operations over those values can be performed within a reasoning process [92]. When there is a fixed number of constant symbols with only a finite number of substitution possibilities, a truth value assignment can be determined as either true or false for each substitution of the quantification. These truth value assignments are then collected into a truth table giving an interpretation for expressions or sentences over a domain. A truth table can be used to exhaustively test all possible assignments of member values [12].
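The exhaustive testing just described can be sketched for a small illustrative sentence; the sentence (P ∧ Q) ⊃ R and its variable names are hypothetical:

```python
from itertools import product

# Exhaustively build a truth table for the sentence (P and Q) -> R,
# testing every assignment of member values.
def implies(a, b):
    return (not a) or b

variables = ["P", "Q", "R"]
table = []
for values in product([True, False], repeat=len(variables)):
    env = dict(zip(variables, values))
    result = implies(env["P"] and env["Q"], env["R"])
    table.append((env, result))

# 2^3 = 8 rows; the sentence fails only when P and Q are true and R is false.
false_rows = [env for env, result in table if not result]
```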

However, a more common use of a quantitative relationship is with variables.

Variables may be quantified in two ways: 1) universally or 2) existentially. A variable is universally quantified when, in a sentence, it is true that all constants intended in the interpretation can be substituted for the variable. The symbol indicating this universal quantifier is ∀. Universal quantification introduces problems in computing truth value assignments for a complete sentence: there is now an infinite number of possible substitutions, making the creation of a complete truth table impossible. Computationally, this exhaustive testing of all substitutions is an undecidable problem [65]. At the same time, the quantitative relationships (or functions) allow a larger mapping of information in a knowledge base, which can be more powerful as seen later.

The second way a variable may be quantified is existentially. In this case at least one substitution is true for the variable across the interpretation of the domain. The symbol for an existential quantifier is ∃. Existential quantification is no easier to compute than universal quantification; this is because of the infinite number of possibilities.

For quantification of variables, the scope of the quantified variable is indicated by enclosing the quantified occurrences of the variable in parentheses. Quantification allows one to look at infinite possibilities within one instance in time and space; when a time or space continuum is introduced then qualitative relationships (or functional mappings) are needed.
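By contrast with the infinite case just described, over a finite domain both quantifiers are computable by exhaustive substitution; the domain and predicates below are illustrative:

```python
# Over a *finite* domain the quantifiers are decidable: universal
# quantification is an exhaustive conjunction of substitutions, and
# existential quantification an exhaustive disjunction.
domain = [1, 2, 3, 4]

def is_positive(x):
    return x > 0

def is_even(x):
    return x % 2 == 0

forall_positive = all(is_positive(x) for x in domain)  # ∀x P(x)
exists_even = any(is_even(x) for x in domain)          # ∃x Q(x)
```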

2.1.2.3 Qualitative

Qualitative relationships are used in qualitative physics. This area of knowledge representation is concerned with constructing a logical, non-numeric theory of objects and acts [106]. This theory defines relationships that process operators of time and space. In order to define these relationships, one must define what entities and relationships are relevant to time and space. An entity in the domain of time is called a moment or instant [47], while an entity in a spatial domain is a location. A relationship over time is an interval and over space is a region. When these aspects of entities are related to objects and acts within the world and to each other, interesting things start to occur.

When one looks at the properties of an object at the current point in time and space, the object is said to have a state [47]. If that object is in relationship with other objects, a partial ordering of states can be produced. When the properties of an act at the current point in time and space are examined, there is a process; that act in relationship with other acts gives a partial ordering of processes.

When one now starts to look at the interconnection between objects and acts, if a set of objects is in a spatial relationship for a single moment in time they are said to have a schematic. As these relationships are looked at over a set of moments in time, one gets a partial ordering of schematics. Also, when there is a set of acts in a temporal relationship for a single region in space, it has a chronicle. Extending this to a set of regions, one gets a partial ordering of chronicles.

When qualitative relationships are executed as functions, the above defined partial orderings are processed. For example, a ball in one state may be at the top of a bounce, and in the next state at the bottom. However, the interesting part is that when moving between the two states the ball also moved from point A to point B, which was a forward direction. Here a time and space progression is seen being performed within an operation. There are many more temporal and spatial qualitative functions (see Hartley’s work [47] for a much more complete list).

Now, if one looks at a set of objects that are participating in a single act, this is said to be an event. If one looks at a set of acts with a single object, this can be described as an experience. Both an event and an experience are atomic units to the single act or object, respectively. It should be noted that events are time-independent and experiences are space-independent [47]. Unlike the functions defined above, these do not produce a partial ordering across entities. This is because there is only one act or object present. However, these events and experiences can be linked together in a set to form related knowledge structures. These structures are similar to standard case relations already available within knowledge representations [47].

Temporal and spatial operators allow one to discuss time and space relationships. However, the drawback is how to represent the knowledge, and the time and space it takes to process the partial orderings.

2.2 Knowledge

First, consider more closely the actual definition of knowledge and learning, from Piaget [98]:

“in each act of understanding, some degree of invention is involved; in development, the passage from one stage to the next is always characterized by the formation of new structures which did not exist before, either in the external world or in the subject’s mind” [[98] page 70]

and the types of knowledge that make up this definition, as discussed previously in Section 1.1. If one learns information and then keeps it in their mind so that they can understand it, obviously that information, or knowledge, must be stored. However, what representation it is actually stored in is still a mystery. Whatever the representation, the mind is able to recall the information at will.

Second, ontological information can add additional structure to the representation by placing a macro level of knowledge for the defined world, outside of the types of conceptual knowledge that have been defined. This additional structure can work through knowledge operations to use the representation levels previously defined (see Section 1.1.1) to store information in a knowledge base.

2.2.1 Types

Therefore, to examine more closely what representation the mind might use to store knowledge, the types of knowledge will be discussed. Knowledge can be thought of as Declarative or Procedural; the following sections will define what is entailed in each type.

2.2.1.1 Declarative Knowledge

The first type of knowledge is known as declarative knowledge, describing a collection of definitions about the world. Throughout history, language has been used to describe knowledge and conceptual relationships. In many instances it is easier to describe definitions of concepts and their relationships in words, for example: a cat is an animal with four legs and a long tail. In this example, a definition gives attributes and characteristics to a cat; that is, an attribute of four legs and a characteristic of a long tail. Domain information could also be given, that is, a cat is an animal, but this will be discussed further under ontologies. It can be noted that in the definition of a cat, it has been declared that this animal has four legs and a tail. In some other world of declarative knowledge, a cat may have only three legs and no tail.

2.2.1.2 Procedural Knowledge

The second type of knowledge is procedural knowledge, describing the temporal, spatial, and constraint aspects for the above definitions. It is believed that there is a duality between these two types of knowledge [107], but one type is inadequate without the other. If the simple example given above is expanded to include a location for the cat, it can now be defined that: a cat is an animal with four legs and a tail, and the cat is located on a mat. Within this expanded example both types of knowledge are being used: 1) cats have attributes and characteristics, and 2) spatially, the cat is located on a mat. If this statement is then slightly changed to say that a cat with four legs and a tail sat on a mat, not only is definitional information about the cat declared (four legs and a tail), but also spatial information about the location (a mat) and temporal information (sat, at this moment in time). Sometimes, written language is not an easy tool to use to describe all knowledge information. Consider changing the example to add one more temporal wrinkle: a rat sat on the mat before a cat sat on the mat. Assuming that the cat is the one already defined in our world knowledge, there can now be two interpretations of this idea: 1) the rat is sitting in front of the cat on the mat at the same time, or 2) the rat sat on the mat prior to the time the cat sat on the mat. Here a picture or a time diagram (see Figure 2.1 from [95] on page 176) can help display the correct interpretation. Again, both types of knowledge are being used: 1) cats located on mats; rats located on mats; 2) spatially, the cat on the mat and the rat on the mat; and 3) temporally, the rat on the mat before the cat on the mat, or the rat and cat on the mat at the same time.

Figure 2.1: Time Chart.

2.2.2 Operations

Besides using the relationships just defined above, an example system uses different types of operations to process the internal knowledge being stored in the knowledge base and the hierarchies of ontology information being applied to the data structures.

2.2.2.1 Terminological

Terminological operations work over terms or concepts and are designed to facilitate the expression of definitions [66]. Some common operations are: subsumption, inheritance, completion and coherence. Let us look at each operation briefly [141]. Subsumption, as defined in Section 3.2, is when a term is subsumed by another term. When all appropriate subsumption relations are identified for a given set of terms, then the terms are said to be classified. Inheritance is the operation of identifying the appropriate subsumption relations, and completion is the process of identifying and recording all conditions that should be applied to a term so it can be classified. Lastly, the coherence operation is finding a model in which the term’s denotation is not empty. These terminological operations work above the knowledge base when trying to actually process rules or predicates.
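A minimal sketch of subsumption and classification over a toy type hierarchy (the types and parent links below are hypothetical, not taken from this dissertation's hierarchies):

```python
# A hypothetical toy type hierarchy: each type maps to its direct parent.
# Subsumption (t1 subsumes t2) holds when t2 lies at or below t1.
PARENT = {
    "Cat": "Animal",
    "Dog": "Animal",
    "Animal": "PhysicalObject",
    "PhysicalObject": "Top",
}

def subsumes(general, specific):
    """Walk upward from `specific`; `general` subsumes it if reached."""
    t = specific
    while t is not None:
        if t == general:
            return True
        t = PARENT.get(t)
    return False

def classify(term):
    """Classification: identify all subsumption relations for a term."""
    return [t for t in list(PARENT) + ["Top"] if subsumes(t, term)]
```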

2.2.2.2 Assertional

Assertional operations try to state constraints or facts that apply to a particular domain or world [66]. The most common assertional operation is realization. Realization is the process of identifying all concepts that have been instantiated [141]. Once a concept has been instantiated, it can be entered into the domain as a fact. One very important aspect of this operation is whether or not the closed world assumption (where only definitions or facts defined within the world can be operated on) is being made [141]. Most systems no longer make this assumption.

2.2.2.3 Generalization

The simplification operator generalizes an entity by taking it to a more general form [119]. This generalization sometimes removes part of a conceptual idea that carries more specific information, in order to take the idea to a more general form. When generalization is performed on hierarchies, the concepts are moved upward in the hierarchy from the bottom to the top. The top (⊤) is the most generalized.

2.2.2.4 Specialization

The join operator (see Section 4.1.2) allows the specialization of entities by performing unification (see Section 1.2.2) between two entities. When unification is performed, a substitution is made in one entity by another [106]. If it is a concept that is being unified, then the concept may go from a general form to a more specific one.

Specialization on hierarchies moves the concepts from the top toward the bottom. The bottom (⊥) is the most specialized.
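The upward and downward movement described in these two sections can be sketched over a hypothetical concept chain:

```python
# A hypothetical concept chain from Bottom (most specialized) to Top
# (most general). Generalization moves a concept upward toward Top;
# specialization moves it downward toward Bottom.
SUPERTYPE = {
    "Bottom": "Siamese",
    "Siamese": "Cat",
    "Cat": "Animal",
    "Animal": "Top",
}

def generalize(concept):
    """One step upward in the hierarchy, toward Top."""
    return SUPERTYPE.get(concept, "Top")

def path_to_top(concept):
    """Repeated generalization always terminates at Top."""
    chain = [concept]
    while chain[-1] != "Top":
        chain.append(generalize(chain[-1]))
    return chain

path = path_to_top("Siamese")  # ["Siamese", "Cat", "Animal", "Top"]
```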

2.3 Representation

As was discussed in Section 1.1.1 of the introduction, conceptual ideas can be transformed through representation levels to a form that can be processed by a computer. In this section, examples will be discussed from level 1 and level 2 of those representation levels.

2.3.1 Knowledge

Level 1 from Figure 1.1 discussed the concept of knowledge representation (KR) at a beginning syntactic level. This section will present three KRs: Logic, Rule-Base, and Semantic Network, and their basic representation of knowledge.

2.3.1.1 Logic

Logic as a knowledge representation looks at representation of knowledge in two parts: implicit and explicit [63]. The implicit part allows knowledge to be represented within a closed world assumption; that is, it contains a set of sentences of the form (s ≠ t) for any two terms in the universe that have not already been explicitly defined. This allows the user of the world to know what is “not true” for the universe. This part of the knowledge, when speaking about the processing level of knowledge representation, relates to the ontology (as discussed in Section 2.1) of the Logic KR.

The explicit part is a collection of first-order sentences (a subset of which are called Horn clauses) of the form:

∀x1 ··· xn [P1 ∧ ··· ∧ Pm ⊃ Pm+1], where m ≥ 0 and each Pi is atomic.

If m = 0 and the arguments to the predicates, P, are all constants, then there is nothing more than a relational database of facts. However, this may be a first order logic, FOL (see Section 3.3), sentence. These first-order sentences define what is “known” about the universe, and give the syntax of the Logic KR. Logic must be mapped to the next machine processing level using some ADT (see Section 2.3.2).
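A minimal sketch of how such sentences might be processed when all arguments are constants: the propositional Horn clauses below (with hypothetical atom names) are closed under simple forward chaining:

```python
# A minimal propositional Horn-clause engine: facts are clauses with an
# empty body (m = 0); a rule concludes its head once every body atom holds.
rules = [
    (["block", "clear"], "stackable"),   # block ∧ clear ⊃ stackable
    (["stackable"], "usable"),           # stackable ⊃ usable
]
facts = {"block", "clear"}               # m = 0 clauses: a database of facts

changed = True
while changed:
    changed = False
    for body, head in rules:
        if head not in facts and all(p in facts for p in body):
            facts.add(head)
            changed = True
```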

The computational part of Logic KR is the execution, or inference, of the logic system. This can be seen as a form of semantics for logic. The inference engine also uses an ADT declaration to interface to a machine representation. See Figure 2.2 for a simple example of a knowledge representation (KR) and internal representation (IR) that uses logic. Within this example, at the KR level one can see a FOL (see Section 3.3 for definition) sentence where there is a red block on top of a yellow block which is on the table. When translating this to the IR level, the KR single sentence translates into 13 triples of relationship information.

KR level

( Table( table-1 ) ∧ (( Block( block-1 ) ∧ Color( Yellow ) ) ∧ Supported-by( block-1, table-1 )) ∧ (( Block( block-2 ) ∧ Color( Red ) ) ∧ Supported-by( block-2, block-1 )) )

IR level

(inst table-1 table)

(inst block-1 block) (color yellow block-1) (and and1 block-1 yellow) (supported-by sup1 block-1 table-1) (and and2 and1 sup1)

(inst block-2 block) (color red block-2) (and and3 block-2 red) (supported-by sup2 block-2 block-1) (and and4 and3 sup2)

(and and5 and2 and4) (and and6 and5 table-1)

Figure 2.2: Logic Example.
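One possible ADT for the IR level is a simple tuple store; the sketch below uses only a subset of the figure's triples, and the query helper is an illustrative assumption rather than part of the original system:

```python
# A sketch of the IR level from Figure 2.2: each triple/quad is a tuple,
# and queries filter the tuple store.
ir = [
    ("inst", "table-1", "table"),
    ("inst", "block-1", "block"),
    ("color", "yellow", "block-1"),
    ("supported-by", "sup1", "block-1", "table-1"),
    ("inst", "block-2", "block"),
    ("color", "red", "block-2"),
    ("supported-by", "sup2", "block-2", "block-1"),
]

def supported_by(obj):
    """Hypothetical query: what is `obj` resting on?"""
    return [t[3] for t in ir if t[0] == "supported-by" and t[2] == obj]
```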

2.3.1.2 Rule-Bases

Rule-Base knowledge representations are procedural schemes that represent knowledge as a set of instructions for solving a problem. The instructions are in the form of an if ... then ... rule and may be interpreted as a procedure for solving a goal in a problem domain. At the heart of the system is a knowledge base that holds the instructions. An inference engine takes the rules (knowledge) from the knowledge base and applies them in the correct order to produce a solution (goal) to an actual problem. This is a recognize-act control cycle, and the procedures that implement the control cycle are separate from the rules in the knowledge base. The procedures can be seen as the semantics of the system, and they produce a very simple ADT for operation by the inference engine for processing the rules. Rule-Base systems are the basis of expert systems, and an expert provides the rules for the system. These systems focus on a narrow set of problems in which knowledge is extracted from a specialist in the area.
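A minimal sketch of the recognize-act cycle, with hypothetical rules; note that the control loop (the inference engine) stays separate from the knowledge base of rules:

```python
# A sketch of the recognize-act control cycle: the control procedure is
# separate from the rules held in the knowledge base.
knowledge_base = [
    ({"goal": "make_tea"}, {"action": "boil_water"}),
    ({"action": "boil_water"}, {"action": "steep_tea"}),
]
working_memory = {"goal": "make_tea"}
trace = []

while True:
    # Recognize: find the first unfired rule whose condition matches.
    fired = None
    for condition, result in knowledge_base:
        if all(working_memory.get(k) == v for k, v in condition.items()):
            if result["action"] not in trace:
                fired = result
                break
    if fired is None:
        break
    # Act: apply the rule's result to working memory and record it.
    working_memory.update(fired)
    trace.append(fired["action"])
```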

2.3.1.3 Semantic Network

A semantic network is an example of a knowledge representation that is displayed as a discrete graphical structure of vertices and arcs [61]. Within the graphical structure, the vertices are called nodes and may be displayed as circles or boxes. The arcs are called links and are displayed as lines with arrows between the nodes. The nodes are related to each other through their links, where the links are assigned a one-to-one correspondence with a conceptual meaning defining the relationship [108].

The nodes are sometimes called conceptual units and may be seen as objects within the network. These objects may be of many different types, including entities, attributes, events or even states. Syntactically, each object is just a symbol (normally text within a box or circle) in the graphical structure. On top of the semantic network, abstract hierarchies are organized according to levels of generalization for the conceptual units. These hierarchies were discussed in Section 2.1.1. The links of the network form relational connections between the conceptual units, such that the valence (or parity) of the relational connection is the number of units that are connected to a particular unit with a link. In a semantic network links are usually dyadic (binary), connecting two conceptual units together.
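Such a network of dyadic links can be sketched as a list of labeled node pairs; the nodes and link names below are illustrative:

```python
# A sketch of a semantic network as a set of dyadic (binary) links:
# each link carries a conceptual meaning relating two nodes.
links = [
    ("cat-1", "is-a", "Cat"),
    ("Cat", "subtype-of", "Animal"),
    ("cat-1", "located-on", "mat-1"),
    ("mat-1", "is-a", "Mat"),
]

def neighbors(node):
    """Links leaving `node`; the valence is the count of such links."""
    return [(rel, dst) for src, rel, dst in links if src == node]

valence = len(neighbors("cat-1"))  # number of links leaving cat-1
```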

The syntax of the semantic network is the set of grammatical rules that express how the symbols of the network can be combined within the graphical structure. In this way, the syntax of the network is very abstract. The semantics of the network is the abstract meaning of the links and their nodes. Because the semantic network’s representation, in the abstract, appears informal, its semantics is an interpretation of the objects displayed within the graphical structure. This creates a transformation from one representation level to the next. Therefore, the interpretation of the network defines a modeling of the relational connections between conceptual units using an abstract and generative form of semantics. It has the characteristic notion of a set of links which connect individual conceptual units, referred to as facts, into a total basic network structure. In this way, the representation of knowledge, or implementation of the knowledge representation, is at a different level of representation than the semantic network.

Elements of semantic networks appeared as early as the late nineteenth century, in works by Alfred Kempe in 1886 and Charles Peirce in 1897 [86, 61, 37, 121]. Both gentlemen used a graphical structure of conceptual units to diagram meaning [86].

However, semantic networks were not introduced for use with computers until 1956, by R.H. Richens in a system called ‘NUDE’ [53]. This system was used for machine translation of Russian to English by going through a neutral conceptual language. This procedure actually operates over the innermost level of representation produced by the translation of the semantic network to the storage representation of knowledge. The actual natural language, Russian, is mapped onto a semantic network knowledge representation for natural language processing. This KR is then mapped onto an internal representation, which is really a new language declaration for a new conceptual language. It uses the nodes and arcs within the semantic network to map to the new language.

The virtual machine internal representation, at level 2 (as seen in Figure 1.1), is not the semantics of the network, but the representation produced by applying the semantics of the network through a mapping to the new representation language. After the semantic network has been translated to an internal representation, the new language is mapped through the definition level with the new data structures onto the implementation of the algorithm for processing that data structure, so the innermost (highest) level can be executed and perform reasoning operations and analysis.

Examples of applications where semantic networks have been used are natural language understanding, planning, machine translation, deductive databases, and expert systems [61]. However, in order for a semantic network to be a good knowledge representation for an application, the network must be interpreted in terms of a representation that algorithmically or procedurally can process the network’s meaning and perform reasoning. Interpretation requires that the representation be translated from the abstract to a more concrete representation. For any semantic network, different representations of knowledge, levels 2 - 4 (as seen in Figure 1.1), may be used for implementing the storage representation. These representations use different formal approaches for syntactic processing, such as: 1) propositional logic, 2) predicate calculus, and 3) graph grammar (set theory). Some of the representations of knowledge used by semantic networks will be discussed in more detail in Section 2.3.2. However, within a propositional logic approach, propositions, logical operators, and abstract hierarchies use arbitrary conceptual units and expression links to define nodes and arcs with semantic descriptions and context. Predicate calculus is built on top of a propositional approach and also incorporates the use of predicates. A graph grammar or set theoretic approach not only is built above the predicate calculus approach, but also uses primitive entities and actions with procedural operators to help define the semantics of the network.

A specific type of semantic network, or a knowledge representation in its own right, is a frame representation. The frame is a named data object with an unbounded collection of named slots (attributes or fields) which can have values [61]. The value of a slot in a frame can be a pointer to another frame, thereby producing a network of frames (hence the representation’s name). A frame is an object represented by a node with a set of slots; a slot is information about the object and may be represented by a pointer to another node, restrictions on attribute values, a pointer to an attached procedure for calculating a value, an actual simple value, or a set of values [63]. Frames collect explicit information about an individual object at a node level.
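A frame with its slots can be sketched as follows; the Animal and Cat frames and their slot names are hypothetical:

```python
# A sketch of a frame: a named object with an unbounded collection of
# named slots. A slot value may be a pointer to another frame, a simple
# value, or an attached procedure for calculating a value.
class Frame:
    def __init__(self, name, **slots):
        self.name = name
        self.slots = dict(slots)

    def get(self, slot):
        value = self.slots.get(slot)
        return value() if callable(value) else value

animal = Frame("Animal", legs=4)
cat = Frame("Cat",
            is_a=animal,               # pointer to another frame
            tail="long",               # an actual simple value
            sound=lambda: "meow")      # an attached procedure

legs = cat.get("is_a").get("legs")     # follow the network of frames
```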

2.3.2 Internal Representation

Each of these internal representations is at the next higher level than the KR used (see Level 2 from Figure 1.1). Even though each internal representation will be discussed as being the ADT for a “best fit” with a particular knowledge representation, any of these ADTs could be used with any knowledge representation, as stated in Brachman’s work [11] when he was discussing semantic networks.

2.3.2.1 Predicate Calculus

Elements within the syntactic representation of the semantic network knowledge representation can be grouped into structures. These structures have predefined reductions (meaning) to an ADT. The structures are propositions, predicates, logical operators and procedural operators. Let us look at each of these structures and how they affect building an ADT.

Propositions will be discussed in Section 2.3.2.2. From that discussion it can be seen that predicates not only generalize propositions, but also define relationships. They can be separated into their intensional and extensional characteristics [139]. The extension of the predicate refers to the set of things that the concept denotes, while the intension of the predicate defines the meaning of the concept. Both characteristics define the semantics of the concept. However, the intension of a predicate gives an abstract function which can be assigned to the extension of the predicate, the concept itself.

Logical operators use model-theoretic semantics with the basic operators being conjunction, disjunction, negation, and existential and universal quantifiers. How each of the operators are used in relationships was looked at more closely in Section 2.1.2.

So, let us consider the question: “what is model-theoretic semantics?” The word model has multiple meanings, three of them being [119]:

• Simulation - simplified system that simulates some significant characteristics of

some other system.

• Realization - a set of axioms as a data structure in which these axioms are true.

• Prototype - an ideal or standard for a system.

A theory, on the other hand, is a proved hypothesis. Therefore, logical operators are modeling a proved hypothesis by the conjunction of true propositions containing existing objects and conjoined predicates and relations [61]. Systems that exclusively use logical operators answer reasoning questions by using theorem proving in FOL (see Section 3.3).

Procedural operators define procedures that actively interpret the semantic network and operate over it [119]. Use of these operators in defining the semantics of the network sets up a controversy: procedural vs. declarative.

The procedural semantics assume that knowledge of the world, or meaning, can be represented by knowing how a concept operates; declarative semantics assumes that knowledge can be represented by knowing that a concept is defined by a collection of facts [119]. This controversy will appear throughout the discussion of the systems in Section 6.1.

Each of these structures will need to be examined in building an ADT for the internal representation. Figure 2.2 shows an example of a logic knowledge representation that is mapped to an internal predicate calculus representation. This internal representation can then be used to help define an ADT for processing the structures. As one can see, the connectives get turned into predicates and are called to instantiate the objects from the knowledge representation level.

2.3.2.2 IF..THEN

For some knowledge representations, in particular the rule-base representation, a data structure that consists of just the propositional query can be used. A proposition comes from mathematical logic and is a simple statement which may have a truth value, TRUE or FALSE, associated with it. These simple statements can generate, manipulate, and/or relate concepts through logical functions. Propositions are always intensional, define concepts, and do not consider relationships or dependencies between concepts. They also only use quantitative relationships, which allow the application of heuristics to reduce the search space, but do not function in the areas of time or space.

The IF..THEN rule construct can be defined directly in most programming languages and makes for a very simple ADT for defining the inference engine. However, because of the simplicity of the data structure, only simple questions can be answered. It is for this reason that this internal representation is not used for knowledge representations such as logic or semantic networks.

2.3.2.3 Conceptual Structures

Even though there are multiple semantic network representations available, the representation that has flexibility in its use of the above approaches is conceptual structures. Conceptual Structures, CS, are a logic based representation of C.S. Peirce’s existential graphs [86] developed by John Sowa [119]. Graphical structures that are built out of the logic building blocks of conceptual structures are conceptual graphs, CG (see Section 3.4).

Semantic networks play a very important role in the use of conceptual graphs.

Sowa claims that “a conceptual graph has no meaning in isolation. Only through the semantic network are its concepts and relations linked to context, language, emotion, and perception”. Such concepts as TOMATO or DOG are easier to understand and define than abstract concepts such as PEACE or JUSTICE. In order to capture the meaning of abstract concepts, these concepts must be hooked up through a vast network of relationships which will eventually link them to concrete concepts. The philosopher A. R. White [135] defined the meaning of a concept as follows:

“To discover the logical relations of a concept is to discover the nature of that concept. For concepts are, in this respect, like points; they have no quality except position. Just as the identity of a point is given by its coordinates, that is, its position relative to other points and ultimately to a set of axes, so the identity of a concept is given by its position relative to other concepts and ultimately to the kind of material to which it is intensively applicable. A concept is that which is logically related to others just as a point is that which is spatially related to others” [135].

In Tepfenhart’s paper [129], he stated that the conceptual grounding for conceptual structures is based on the meaning triangle for the relationships between symbols, concepts, and referents (see Figure 2.3).

Peirce [86] actually had a different relationship triangle (see Figure 2.4); it aligns its sign relation with Tepfenhart’s symbol, while the concept stayed the same.

For Tepfenhart, a referent was the instantiation of the concept in the triangle meaning,

Figure 2.3: Meaning Triangle for Symbols, Concepts, and Referents (Based on [[129], Figure 1]).

while Peirce saw the object as the instantiation of the concept. This makes Peirce’s triangle more general, applying to all conceptual logics, not just conceptual structures. Conceptual Structures (CS) are the development of human “concepts” in such a way that they can be processed by machines. The structures give meaning in the computer for the conceptual ideas [119].

Figure 2.4: Peirce’s Triadic Relation.

Going back to language as a mechanism for communicating human concepts: over time, the foundation of conceptual structures within knowledge representation has changed. Chomsky maintained that traditional grammars, which are syntactic, carried the structure needed to process sentences in computers, and that each sentence was a single structure [16]. However, he clarified in 1965 that these structures were an abstract theory of competence, which is an idealized knowledge of language, as opposed to a performance structure, which is the actual use of natural language [17]. Jackendoff maintains that the meaning of a sentence, which is semantic, in natural human language actually has separate semantic structures for each element of the sentence [52].

John Sowa took both of these ideas and blended them together to develop a diagrammatic graph representation for the structure called Conceptual Graphs, CG [119, 121]. Section 3.4 defines conceptual graphs. Later, Bernhard Ganter and Rudolf Wille realized that they had developed a similar, but simpler, lattice representation for conceptual structures with a mathematical foundation, called Formal Concept Analysis, FCA [41]. FCA is a mathematical formalism [41] that handles concepts with attributes in a lattice format. These mathematical structures can be traversed as in a type hierarchy to discover super- and sub-type relationships between concepts. They can also be easily stored in a relational database [6, 7]. Their latest research has been in the area of adding temporal attributes to the lattices to handle time relationships [138].

CHAPTER 3

DEFINITIONS

This chapter gives definitions for several concepts that will be developed throughout this work. These definitions cover the knowledge needed for this work, but are not a complete treatment of these areas of study.

3.1 Graph Theory

Graph theory, unlike logic, is not built on sentences of predicates that evaluate to TRUE and FALSE, but is based on the visual elements of drawings. A graph G = {V, E}, where V is a finite nonempty set of points (or vertices), and E is the set of all the links (or edges) between adjacent points [46]. An edge x = {u, v} is said to join vertices u and v [46]. The example in Figure 3.1 is a graph G where V = {v1, v2, v3, v4, v5} and E = {{v1,v2}, {v1,v3}, {v2,v3}, {v2,v4}, {v3,v4}, {v3,v5}, {v4,v5}}. However, even though graphs must have at least one vertex, they do not have to have any edges. Graphs are very useful for discovering whether a finite number of objects (vertices) are in relationship (edges) with each other. The next sections give graph theory definitions that are important to the details discussed later in this work.

[Figure: graph G with vertices v1 through v5 and seven undirected edges]

Figure 3.1: A Graph to Illustrate Graph Theory Concepts (Adapted from [[46], Figure 2.9]).
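As an illustration, the vertex and edge sets of graph G in Figure 3.1 can be encoded directly (a Python sketch; the encoding choice of edges as unordered pairs is ours):

```python
# Graph G from Figure 3.1: edges are frozensets because the graph is
# undirected, so {u, v} and {v, u} denote the same edge.

V = {"v1", "v2", "v3", "v4", "v5"}
E = {frozenset(e) for e in [("v1", "v2"), ("v1", "v3"), ("v2", "v3"),
                            ("v2", "v4"), ("v3", "v4"), ("v3", "v5"),
                            ("v4", "v5")]}

def adjacent(u, v):
    """Two vertices are adjacent when an edge joins them."""
    return frozenset((u, v)) in E
```

A graph with V nonempty and E empty would still satisfy the definition above, since edges are optional.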

3.1.1 Digraph and Bigraph

A directed graph (or digraph) H = {V, E}, where V is a finite nonempty set of vertices, and E is the set of ordered pairs of all directed edges between adjacent vertices [13, 46]. For these edges, an ordered pair x = (u, v) joins u and v in an irreflexive binary relation, and the direction is from u to v. When drawing the edges in a graph, an arrow indicates the direction [13, 46]. As can be seen in Figure 3.2, graph H contains V = {v1, v2, v3, u1, u2} and E = {(v1,u1), (u1,v2), (v2,u2), (u2,v3)}. For each of the pairs in E, the arrow points from the first vertex toward the second vertex, i.e., from v1 to u1 with the arrowhead touching u1.

A bipartite graph B = {V, E} is a graph with the distinction that all vertices in V can be divided into two subsets, V1 and V2 (or colors), such that every edge of graph B connects an element of V1 to an element of V2, and there are no edges between the vertices within a subset (for example, between vertices of the same color) [46]. If Figure 3.2 is examined again, it is seen that, besides being a digraph, it is also a bigraph (bipartite graph) where V1 = {v1, v2, v3} and V2 = {u1, u2}.

[Figure: digraph H with arcs v1 to u1, u1 to v2, v2 to u2, and u2 to v3]

Figure 3.2: A Digraph that is a Bipartite Graph.
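The bipartite property can be verified by attempting a two-coloring of the vertices; the following sketch (Python, for illustration only) applies this check to graph H of Figure 3.2, ignoring edge direction:

```python
from collections import deque

def is_bipartite(vertices, edges):
    """Attempt to 2-color the graph; return the coloring, or None
    if an odd cycle forces two adjacent vertices to share a color."""
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for a, b in edges:
                if u in (a, b):
                    v = b if u == a else a
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return None  # not bipartite
    return color

# Graph H from Figure 3.2 (direction ignored for the coloring).
H_edges = [("v1", "u1"), ("u1", "v2"), ("v2", "u2"), ("u2", "v3")]
coloring = is_bipartite({"v1", "v2", "v3", "u1", "u2"}, H_edges)
```

The two color classes recovered by the check correspond exactly to the subsets V1 and V2 given above.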

3.1.2 Walk, Path and Connected

A walk of a graph is an alternating sequence of vertices and edges where the beginning and ending of the walk is at a vertex, and the edges are incident on two vertices [46]. In the previous example, Figure 3.1, a simple walk has vertices v1 v2 v3 v4 v5 with the edges in the following order: {v1,v2} {v2,v3} {v3,v4} {v4,v5}. This walk does not include all the edges, but does include all the vertices. In the example just stated, since all edges are distinct, the walk is called a trail. There are other kinds of walks; for example, when the first and last nodes are the same, the walk is a cycle [46]. Using Figure 3.1 again, a cycle would be vertices v1 v2 v4 v5 v3 v1 with edge ordering {v1,v2} {v2,v4} {v4,v5} {v5,v3} {v3,v1}. This second example is also a nontrivial trail that is closed; this kind of trail is referred to as a circuit [13]. Another example of a trail that is a cycle would be v2 v4 v2 with edges {v2,v4} {v4,v2} (because the edges are not directed). However, this is not a circuit because it is a trivial trail.

A path for a graph, in graph theory terms [46], is a walk in which all the vertices on the walk are distinct, except in one special case: if the path creates a cycle, then the path will come back to the starting vertex. Recall that the first example encountered in this section, from Figure 3.1, was a path, but not a cycle.

For a graph to be connected, every pair of vertices must be joined by a path [46]. Both Figures 3.1 and 3.2 are connected graphs.
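Connectivity can be tested with a single traversal from any start vertex, since a graph is connected exactly when that traversal reaches every vertex. A small sketch (Python, illustrative only), using the edge set of Figure 3.1:

```python
from collections import deque

def is_connected(vertices, edges):
    """Breadth-first search from one vertex; the graph is connected
    iff every vertex is reached."""
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for a, b in edges:
            if u in (a, b):
                for v in (a, b):
                    if v not in seen:
                        seen.add(v)
                        queue.append(v)
    return seen == set(vertices)

# Figure 3.1: connected; dropping the later edges isolates v5.
V = {"v1", "v2", "v3", "v4", "v5"}
E = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v2", "v4"),
     ("v3", "v4"), ("v3", "v5"), ("v4", "v5")]
```
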

3.2 Types and Hierarchies

A type is a label that represents an idea with an underlying perceived object or entity; these labels are called type labels. The entities within the world are in relationship with each other. These entities can be axiomatic, that is, primitive and not made up of any other defining entities, or the entities can be defined, meaning that they are built from more than one axiom [125, 90]. The relationships can be seen as a hierarchy and can be broken down into two functions:

1. The function ctype maps a finite set of vertices called concept nodes onto a set, TC, of type labels. Each type label in TC is specified as axiomatic or defined. Examples of type labels from Figure 3.4 are Bird, Cat, Dog, etc.

2. The function rtype maps a finite set of vertices called conceptual relation nodes onto a set, TR, of type labels. Each type label in TR is specified as axiomatic or defined. Examples of these type labels from Figure 3.5 are member, works-with, etc.

A type hierarchy is a partially ordered set of type labels, TH. Type hierarchies can be used to define membership in various categories of entities. An example of a four-level hierarchy can be seen in Figure 3.3.

[Figure: labeled nodes arranged in four levels between TOP and BOTTOM]

Figure 3.3: A Type Hierarchy.

The levels are counted from the TOP of the hierarchy to the BOTTOM. In each level of the hierarchy, there are entities that are members of the hierarchy. These members are organized into a partial ordering, with the symbol ≥ used to designate the ordering from top to bottom of the hierarchy. An example of a partial ordering from Figure 3.3 would be TOP ≥ C ≥ D ≥ A ≥ L ≥ BOTTOM.

Members at the top of the hierarchy are considered to be more general; the members at the bottom are more specific. More general members in the partial ordering are said to subsume the more specific members, and the more specific members inherit information from the more general ones. As stated in MacGregor:

“a concept C subsumes a concept D if any individual satisfying the definition for D necessarily satisfies the definition of C” [[66] page 388].

Through this process of moving down the hierarchy to gain more specific information, concepts are classified based on a relationship known as subsumption [140].

As seen in these hierarchies, there is a partial ordering between the members. However, when this ordering is extended such that, for any two elements x and y of L, the set {x, y} has both a least upper bound and a greatest lower bound, then a lattice exists [44]. When the elements are types, these are referred to as type lattices.

60 3.2.1 Concept Type Hierarchy

With these type lattices, the concept type hierarchies are organized into partially ordered hierarchies according to the level of generality of the types. Using the more concrete example in Figure 3.4, there is a set of labels {Animal, Mammal, Bird, Cat, Dog, Human}. If Mammal ≤ Animal, then Mammal is called a subtype of Animal and Animal is called a supertype of Mammal, written Animal ≥ Mammal. If Cat ≤ Animal and Cat ≤ Mammal, then Cat is called a common subtype, ∩, of Mammal and Animal. If Animal ≥ Mammal and Animal ≥ Cat, then Animal is called a common supertype, ∪, of Mammal and Cat.

[Figure: type hierarchy with T at top, Animal below it, Mammal and Bird below Animal, and Cat, Dog, Human at the bottom]

Figure 3.4: An Animal Concept Hierarchy.

Extending the type lattice definition as a type hierarchy plus the operators ∪ and ∩, it can be seen that the minimal common supertype of a and b, written a ∪ b, has the property that for any type t, if t ≥ a and t ≥ b, then t ≥ a ∪ b. The maximal common subtype of a and b, written a ∩ b, has the property that for any type t, if t ≤ a and t ≤ b, then t ≤ a ∩ b. In order to make the lattice complete, the labels ⊥ and ⊤ are introduced such that for any type t, ⊥ ≤ t ≤ ⊤. The levels from ⊤ to ⊥ in the hierarchy go from general to specialized types (e.g., Animal to Cat). Relationships that hold for all objects of a given type are inherited through the hierarchy by all subtypes.
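The minimal common supertype operation on the hierarchy of Figure 3.4 can be sketched by intersecting ancestor sets (a Python illustration; the parent-link encoding is ours, not part of the formalism):

```python
# Figure 3.4 with parent links pointing toward the more general type;
# "T" stands in for the top element.

parents = {
    "Animal": ["T"], "Mammal": ["Animal"], "Bird": ["Animal"],
    "Cat": ["Mammal"], "Dog": ["Mammal"], "Human": ["Mammal"], "T": [],
}

def supertypes(t):
    """All types s with s >= t, including t itself."""
    result, stack = set(), [t]
    while stack:
        x = stack.pop()
        if x not in result:
            result.add(x)
            stack.extend(parents[x])
    return result

def min_common_supertype(a, b):
    """a ∪ b: the common supertype below all other common supertypes."""
    common = supertypes(a) & supertypes(b)
    for t in common:
        if common <= supertypes(t):
            return t
```

For example, Cat ∪ Bird yields Animal, and Cat ∪ Dog yields Mammal, matching the partial ordering described above.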

3.2.2 Support

Per the definition given in Baget and Mugnier [5], a support is defined as a 4-tuple S = (TC, TR, I, τ) (see Figure 3.5 for an example). TC and TR are two partially ordered finite sets of concept types and relation types, respectively. TR is partitioned into a set of hierarchies, TR^1 ... TR^k, of relation types of arity 1 ... k, where k ≥ 1. Both orders on TC and TR are denoted by x ≤ y, which means that x is a subtype (or specialization (see Section 2.2.2.4)) of y. I is the set of individual markers (or referents), and τ is a mapping from I to TC.

As can be seen in Figure 3.5, TC and TR are written in a shorthand for type hierarchies; they do not include the ⊥ and ⊤ labels, even though they are implied. Also, as discussed above, the relation hierarchy is broken down into a set of hierarchies using the rtype function (as defined earlier in this section), noted by the arity of the relations within the hierarchy. The individual markers, J. and K., are mapped to the concept Researcher through the mapping function τ. The ‘...’ in each membership list indicates that there are more elements in each of these lists.

[Figure: concept type hierarchy TC (Project, Person, Office, Researcher, Manager, HeadOfProject) and binary relation hierarchy TR^2 (member, works-with, geographical-relation, in, near, adjoin), with I = {J., K., ...} and τ = {(J., Researcher), (K., Researcher), ...}]

Figure 3.5: Support Using a Relation Hierarchy (Based on [[5], Figure 1]).

3.3 FOL

First Order Logic, FOL, is a well-understood form of symbolic reasoning pioneered by Boole, Frege, and C.S. Peirce [51]. Each sentence that appears in FOL contains a predicate and a subject in variable form. The predicate can either define or modify the subject, but the resolution of the predicate is defined only for the logical truth values, TRUE and FALSE. When these sentences are combined, they must adhere to the rules of Boolean algebra. These sentences “only have variables for first-order objects (and these expressions such as “∀x” and “∃x” apply only to the elements of a structure), so will be call a first-order language” [[35] page 9].

FOL is considered part of first order predicate calculus, FOPC, but for FOPC there is also a finite number of axioms. As restrictions are relaxed, and one looks at the full area of logic considered FOPC, predicates can extend beyond just TRUE and FALSE [12]; there are predicates that can be proven neither TRUE nor FALSE [50]. With FOPC, λ-expressions using the predicate axioms with an infinite sequence can be expressed. This allows the use of predicates such as exists, forall, iff, etc. Building up these axioms to represent sentence descriptions leads to set theory.

3.4 Conceptual Graphs

In his book [119], John Sowa states: “Conceptual graphs form a knowledge representation language based on linguistics, psychology, and philosophy” [[119] page 69]. The representation contains a graph, per the definition stated in Section 3.1, and operates according to graph theory rules, using graph diagrams that are built out of the logic building blocks of conceptual structures (see Section 2.3.2.3). The definitions for some of the blocks are presented, beginning with the type block:

Definition 3.4.1 A type is a labeling for an abstract idea which is either a conceptual unit or a relationship. These types are members of a set, T, that may form several structures, including hierarchy trees, lattices, and other related structures. When the structure is a type hierarchy lattice, the set is labeled TC, and the function ctype maps a conceptual unit to the type label in the structure. When the structure is a relation hierarchy tree, the set is labeled TR, and the function rtype maps a relationship to the type label in the structure.

A referent block would have the following definition:

Definition 3.4.2 A referent is an abstract conceptual unit that has been instantiated with a factual value.

Therefore, a conceptual graph, CG, applies the following definition:

Definition 3.4.3 A conceptual graph is a bipartite, connected, directed graph G = (V, E), such that V, all vertices in G, is partitioned into two disjoint sets VC and VR. The vertices are labeled; the set VC is called the concept nodes and the set VR is called the conceptual relation nodes. Thus, e ∈ E is an ordered pair that connects an element of VC to an element of VR using a directed edge, which will be called an arc.

The label of a concept node is a pair, c = <type, referent>. The type is an element of the set TC, which may be defined in a type lattice (see Section 2.1.1). The referent (if present) contains the individual instantiation for the type field; if it is not present, then c = <type, empty>, or just written c = <type>.

The label of a conceptual relation node is a pair, cr = <type, signature>, where type is an element of the set TR, and the signature is a pair, s = <I, O>, where I is the set of arcs directed into the conceptual relation and O is the set of arcs directed out from the conceptual relation. The signature is further defined by its subset category of either relation or actor. The relation is a tuple, r = <type, c1, c2, ..., cn>, where type is defined above and in the signature I ⊆ VC and O ∈ VC. The number of concepts in the tuple is the valence of the relation. A conceptual relation of valence n is said to be n-adic, and all signatures must be at least 1-adic. The actor is a slightly different tuple, a = <type, c1, c2, ..., {..., cn−1, cn}>, where type is defined above and in the signature I ⊆ VC and O ⊆ VC.

Figure 3.6 shows a basic conceptual graph in traditional format with nine nodes.

[Figure: concepts C1 through C4 drawn as rectangles, linked through relations R1 through R5 drawn as ovals]

Figure 3.6: Basic Abstract Conceptual Graph.

Figure 3.7 shows a conceptual structure with nine nodes in the mathematical digraph and bigraph format. Within the CS community, it is felt that the typical display format of Figure 3.6 is easier to read, making the conceptual relationships easier to follow.

[Figure: the same nine nodes drawn as a bipartite digraph, with concepts C1 through C4 on one side and relations R1 through R5 on the other]

Figure 3.7: Basic Abstract Conceptual Graph in Digraph Format that is Bipartite.

In Figure 3.6, four nodes are concepts (seen in display mode as rectangles) and five nodes are relations (seen in display mode as ovals). In this example, VC = {c1, c2, c3, c4} and VR = {r1, r2, r3, r4, r5}. There is no type hierarchy, but the four concepts are c1 = <C1>, c2 = <C2>, c3 = <C3>, c4 = <C4>, and the relations are r1 = <R1, <c1, c2>>, r2 = <R2, <c1, c3>>, r3 = <R3, <c1, c4>>, r4 = <R4, <c2, c4>>, r5 = <R5, <c3, c4>>. As can be seen, the “R1” relation has the signature <c1, c2>, which indicates that r1 is a 2-adic (or binary) relationship where c1 is the input concept, or the argument to the relationship, and c2 is the output concept, or the output of the relationship.
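Following Definition 3.4.3, the example graph of Figure 3.6 can be sketched as two node sets plus relation signatures (a Python illustration; the data layout is our own choice, not a prescribed representation):

```python
# Figure 3.6: concept nodes, and relation nodes mapped to their
# (input concept, output concept) signatures.

concepts = {"c1", "c2", "c3", "c4"}
relations = {
    "r1": ("c1", "c2"),
    "r2": ("c1", "c3"),
    "r3": ("c1", "c4"),
    "r4": ("c2", "c4"),
    "r5": ("c3", "c4"),
}

def valence(r):
    """Number of concepts in a relation's signature (all 2-adic here)."""
    return len(relations[r])

# Every arc joins a concept node to a relation node, so the graph is
# bipartite: no arc connects two concepts or two relations.
arcs = [(c, r) for r, (cin, cout) in relations.items() for c in (cin, cout)]
```
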

Sowa has shown how unknown objects (nodes with no referent field) can be computed by an actor node [119]. Actor nodes (displayed as diamond-shaped boxes) are connected to concept nodes with dashed lines, because an actor can best be thought of as a “functional relation”, where there is a semantics (performed by the procedure) being represented graphically between two objects. Figure 3.8 shows a functional relationship, displayed with the diamond shape, between CATCHING and PERSON, CATCH, and BALL.

A functional relationship has directionality from one node to another, but both optional inputs and multiple outputs of conceptual information are possible. In order to know how to handle the data being processed through this kind of relationship, an action function (see Figure 3.9 for an example) is attached to the relation. Therefore, each functional relationship is called an actor in the conceptual graph representation because it can perform actions on its data.

[Figure: concepts PERSON, CATCH, and BALL linked by relations AGT, PTNT, and POSS, with the actor CATCHING drawn as a diamond]

Figure 3.8: Basic Conceptual Graph with Actor.

void Catching(const String &person, const String &caught, String &ball) {
    // process in knowledge base so this person now has
    // possession of a specific ball
}

Figure 3.9: Action Function For Basic Actor Graph.

If this internal representation were to be used at a higher level for a logic knowledge representation, one would need to map the intension of the predicate to a type, and the extension of the predicate to a referent, in the internal conceptual structure representation.

3.4.1 Graph Theory Relationships

This conceptual structure encodes knowledge using concepts and conceptual relations. Concepts are “blocks” of typed information, and conceptual relations are the “linkages” between the blocks. This knowledge is then transformed into a graphical structure such that a conceptual graph contains two kinds of nodes: concepts and relations. The lines are the arcs between these two kinds of nodes. It is this duality that makes the graph a bigraph, or bipartite graph. It should be noted that conceptual relations can also be of two kinds: direct relationships and functional relationships.

Also, unlike a general graph, the pairs of nodes defining the arcs are ordered, or directed. The arrows on the arcs show the directionality of information movement from one node to another. As can be seen in Figure 3.6, relation R2 is in a direct relationship from concept C1 to concept C3. R2 receives conceptual input data from concept C1, and produces output data that it sends to concept C3 (instantiating C3). Therefore, these nodes are connected in a triple relationship.

The walk for a conceptual graph must not only alternate between nodes and arcs, but the kinds of the nodes must alternate between concepts and relations [136]. When a walk is just a trail, such as in Figure 3.6, an example trail would be C1 → R2 → C3 → R5 → C4. In a walk, since the arcs are incident to the nodes, each concept node has a relationship number to each relation node (i.e., C1 has the relationship number 1 to R2, and C3 has the relationship number 2 to R2). From now on, this relationship number will be referred to as the ith edge of the relation with respect to each linked concept.

For there to be a path in a graph, all the vertices must be unique. However, for a conceptual graph only the concept nodes must be unique [136]. If the path is closed, that is, a cycle, then the first concept c1 would be equal to cn. Since a conceptual graph is directed, one can follow the arcs through the graph to create a path. A conceptual graph without a cycle, and that does not contain a functional relation, is called a tree [136], and the path followed will lead to a leaf node. Examining Figure 3.6, a path reaching all the concept nodes (but not all the relations) would be C1 → R2 → C3 → R5 → C4 → R4 → C2 → R1 → C1. Note that R3 is not reached and C1 is repeated; therefore, this is a path, but not a tree.

3.4.2 Formation Rules

Not every combination of concepts and conceptual relations makes sense in a meaningful way; therefore, conceptual graphs that do represent meaning will be considered well-formed, and other combinations with no meaning will be called ill-formed [118]. When working with well-formed CGs, three formation rules can be applied repetitively [118]:

1. Copy - An exact copy of a well-formed CG is well formed.

2. Detach - All CGs that remain when any conceptual relation is removed from a well-formed CG are also well-formed.

3. Restrict - If a is a concept in a well-formed CG G, then for any concept c ≤ a from the concept type hierarchy (see Section 3.2) of G, the graph obtained by substituting c for a is well-formed.

Examples can be presented to show how each of these formation rules can be applied. The ‘copy’ formation rule is fairly straightforward: the graph G in Figure 3.7 can be copied to graph H in Figure 3.6, where both graphs are equivalent and well-formed, just displayed in a different way.

If one starts with the graph H in Figure 3.6 and performs two ‘detach’ formation rules (first removing conceptual relation R3, then removing conceptual relation R5), then graph H′, shown in Figure 3.10, will be produced. The graph H′ is still well-formed even though two of the conceptual relations have been detached.

[Figure: graph H with R3 and R5 removed, leaving C1, C2, C3, C4 linked by R1, R2, and R4]

Figure 3.10: Basic Detached Conceptual Graph.

In order to explain the restrict rule clearly, graph H′, created with the detach formation rule applications in the paragraph above, and graph G, shown in Figure 3.11, will be used in connection with a new concept type hierarchy shown in Figure 3.12.

[Figure: concept C5 linked by relation R1 to C2 and by relation R6 to C6]

Figure 3.11: Simple Basic Conceptual Graph.

[Figure: hierarchy with T at top, C5 below it, C2 and C1 below C5, and C3 and C4 below C1]

Figure 3.12: Second Concept Type Hierarchy.

Graph G contains the nodes C5, R1, C2, R6, and C6, in which the C5 node can be restricted, using the second concept type hierarchy (see Figure 3.12), to C1, because C1 is a subtype of node C5 (note: other restrictions could also be performed). This restriction produces the well-formed graph in Figure 3.13.

[Figure: concept C1 linked by relation R1 to C2 and by relation R6 to C6]

Figure 3.13: Simple Restricted Basic Conceptual Graph.
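The restrict rule applied above can be sketched as a guarded substitution over the concept labels, using the hierarchy of Figure 3.12 (Python, illustrative only; C6 is left untouched because it does not appear in the hierarchy):

```python
# Concept type hierarchy of Figure 3.12, as parent links toward T.
parents = {"C5": ["T"], "C2": ["C5"], "C1": ["C5"],
           "C3": ["C1"], "C4": ["C1"], "T": []}

def is_subtype(c, a):
    """True when c <= a: c equals a or lies below it in the hierarchy."""
    stack = [c]
    while stack:
        x = stack.pop()
        if x == a:
            return True
        stack.extend(parents[x])
    return False

def restrict(graph_concepts, old, new):
    """Restrict formation rule: substitute new for old only when
    new <= old, so the result stays well-formed."""
    if not is_subtype(new, old):
        raise ValueError("restriction must move down the hierarchy")
    return [new if c == old else c for c in graph_concepts]

G = ["C5", "C2", "C6"]  # concept nodes of the graph in Figure 3.11
```

Applying `restrict(G, "C5", "C1")` reproduces the substitution shown in Figure 3.13.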

3.4.3 Simple Conceptual Graphs (SCGs)

Researchers M. Chein and M.-L. Mugnier [15] from the LIRMM group at the Université Montpellier, and others [5, 22], have done research on a subset of conceptual graphs known as simple conceptual graphs, SCGs (see Sowa 3.1.2 [119]). As explained in Baget and Mugnier [5], these SCGs are connected, bipartite graphs where the arcs are labeled and finite but not directed, SG = ((Vc, Vr), U, λ). Figure 3.14 is an example of an SCG. Vc and Vr are the concept and relation nodes, respectively. U is the set of edges, where edges incident on a relation node are totally ordered (that is, they are numbered from 1 to the degree of the node). λ is a labeling function of the nodes and edges [75].

[Figure: relation R2 with numbered edges 1, 2, and 3 to concepts C3, C1, and C4]

Figure 3.14: Simple Conceptual Graph (SCG).

Examining U further, an edge numbered i between a relation node r and a concept node c can be labeled (r, i, c) and is unique in U; all edges within U will be stored in this triplet format. As an example, from Figure 3.14, (r2, 2, c1) would be an element of U.
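The triplet storage for U can be sketched directly, here populated with the three edges of Figure 3.14 (a Python illustration of the storage format only):

```python
# Edge set U of Figure 3.14 in (r, i, c) triplet form: relation node,
# edge number, concept node.

U = {("r2", 1, "c3"), ("r2", 2, "c1"), ("r2", 3, "c4")}

def incident_concepts(relation):
    """Concepts incident on a relation, in edge-number order."""
    return [c for r, i, c in sorted(U) if r == relation]
```

Because each triplet is unique in U, the total ordering of edges around a relation node falls out of the edge numbers themselves.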

Every node also has a label defined by the mapping λ. A relation node’s label is (type(r), arity(r)) (defined in Section 3.2.2), and a concept node’s label is (type(c), marker(c)) (defined in Section 3.2.2). The directionality is removed to simplify the reasoning processing of the graphs. Because there is no directionality, there are no conceptual relations that are functional (this excludes actors).

However, an extension from SCGs that does allow directionality and cycles (this will be discussed more later with the actual algorithms) is feature term graphs, ω-term, introduced by Aït-Kaci [2]. A conceptual graph G is a feature term graph if it obeys the following conditions [136]:

• the relations are all binary: for any relation r, only arg1(r) and arg2(r) are defined,

• the relations are functional: for any relations r and r′ ∈ A, arg1(r) = arg1(r′) and type(r) = type(r′) implies that r = r′, and

• there is a head concept h ∈ C such that for all c ∈ C there is a path (c1, r1, ..., rn−1, cn) with arg1(ri) = ci and arg2(ri) = ci+1 such that c1 = h and cn = c. Note that when n = 1 this includes the case c = h.

3.4.4 Conceptual Graphs Interchange Format (CGIF)

The conceptual graph interchange format (CGIF1) is a representation for conceptual graphs intended for transmitting CGs across networks and between IT systems that use different internal representations. The CGIF syntax ensures that all necessary syntactic and semantic information about a symbol is available before the symbol is used; therefore, all translations can be performed during a single pass through the input stream. Part of this information is reproduced in the appendices (see Appendix B) to give a concrete definition of a conceptual graph and to indicate how CGs were transmitted between the systems during the testing discussed later in this work.

1The current archived copy of CGIF from the ICCS2001 workshop is located at: http://www.cs.nmsu.edu/~hdp/CGTools/cgstand/cgstandnmsu.html#Header_44

The CGIF format was originally developed by John Sowa for a possible International Standard [126]. It was then modified to the format seen at the CGTools Workshop held at the International Conference on Conceptual Structures in 2001 [97]. Since that time, it has been completely changed and incorporated, as Annex B, into a larger international standardization effort [33].

3.5 Data Structures

In order to evaluate the array and hash table data structures against the graph structure, one looks at how long it takes to store and retrieve a single relationship (see Definition 3.4.3 for a CG) within the graph, given the specified data structure. Note: this does not examine or account for any support (see Section 3.2.2) or hierarchy (see Section 3.2) processing. This reflects the expectation that more retrievals will be done on the knowledge base than stores, so it is more important to optimize the retrieval of relationship elements of the graph than to minimize the time and space needed to store that information. Table 3.1 indicates the time to store and retrieve a relationship from a set of n relationships within a graph for certain data structures. The following sections define how these values were reached and any related constraints or constants.

Table 3.1: Execution Times For Single Element with Set of Size n.

Data Structure          Storage   Retrieve
Array (sorted)          O(n)      O(log(n))
Array (unsorted)        O(1)      O(n)
Hash Table              O(1)      O(1 + α)
Perfect Hash (single)   O(n)      O(1)
Perfect Hash (double)   O(n^2)    O(1)

3.5.1 Arrays

When arrays are used as data structures, the time an array takes to store elements depends on whether the array of values is sorted. When the data is not sorted, but just appended to the end of the array, storage is very quick, O(1), but retrieving the data back can take as long as O(n) because the whole array may have to be scanned. When the array is kept sorted at storage time, it can take O(n) time to place the data, but a binary search then takes only O(log(n)) time to retrieve it back.
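The sorted-array trade-off can be sketched with a standard-library binary search (a Python illustration; the stored keys are arbitrary):

```python
import bisect

# Sorted array: O(n) insertion (elements shift to keep order) but
# O(log n) retrieval via binary search.

def store_sorted(arr, key):
    bisect.insort(arr, key)  # O(n) worst case due to shifting

def retrieve_sorted(arr, key):
    i = bisect.bisect_left(arr, key)  # O(log n) probes
    return i < len(arr) and arr[i] == key

arr = []
for k in [42, 7, 19, 3]:
    store_sorted(arr, k)
```

An unsorted array would instead append in O(1) and scan the whole list on retrieval, matching the first row pair of Table 3.1.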

However, if the sorted data comes from a directed cyclic graph stored into a knowledge base structure, storage is equal in execution time to the time needed to retrieve it back. This can be shown as follows: for building the array, the execution time for a single graph is O(n), where C * n = #vertices + #edges and #edges = 1/2 #vertices = 1/2 n, so C = 3/2 (see Cormen90 [20]). For retrieving an element back from the graph (for example, doing a direct match), the whole array may again need to be checked, giving an execution time of O(n).

3.5.2 Hash Tables

Storage of an element in a hash table data structure has an expected storage time of O(1), plus the time it takes to compute the hash value for the key, h(k), and, in the case of a collision, the time to store into the secondary data structure. If the secondary structure is an unsorted linked list, then the element can just be placed at the head of the list (the most common data structure [20]). On retrieval, even when there are collisions between key hash values, the collisions are still far fewer than n, where n is the number of nodes in the graph; the hashing function will produce more than one value, so not all hash keys will collide. The expected time for retrieval with a hash table is O(1 + α), where α accounts for the time to retrieve the element if there was a collision at storage, plus the time to compute the hash value for the key, h(k).
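A chained hash table of this kind can be sketched as follows (Python, illustrative only; the stored relationship and its key are invented for the example):

```python
# Chained hash table: O(1) expected storage by prepending to the
# bucket's chain, O(1 + α) expected retrieval where α reflects the
# chain length at the hashed slot.

class ChainedHash:
    def __init__(self, buckets=8):
        self.table = [[] for _ in range(buckets)]

    def store(self, key, value):
        # Place at the head of the chain, the common O(1) choice.
        self.table[hash(key) % len(self.table)].insert(0, (key, value))

    def retrieve(self, key):
        # Scan only the one chain selected by h(k).
        for k, v in self.table[hash(key) % len(self.table)]:
            if k == key:
                return v
        return None

h = ChainedHash()
h.store("r2", ("c1", "c3"))  # e.g. a relation mapped to its signature
```
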

3.5.2.1 Perfect Hashing

In true perfect hashing, there are no collisions on key values, so retrieval time becomes O(1) (note: there is a constant factor from executing the hashing function to find the key) [24]. However, creating the perfect hashing function for a set of dynamic input data can be costly at storage time. There has been research on finding perfect hashing functions: a hash function description (“program”) for a set of size n occupies O(n) words and can be constructed in expected O(n) time [83]. Work has also been done on finding a universal hash function [131] or a quasi-perfect hash function [23] (as opposed to a perfect hash function) that can be constructed in time O(1 + α), where α is again not close to n.

3.5.2.2 Hash Table/Hash Tables

When a hash table is the value element of another hash table data structure, extra storage space is needed to hold the overhead of the second hash table. Since this hash table is embedded in another hash table with its own overhead, double the amount of overhead space is being used. However, if both hash tables are perfect hash tables, then the retrieval time for finding the sub-value becomes O(1) * O(1), or O(1) (constant). After the constant overhead retrieval time is accounted for, the retrieval time is O(1).
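A two-level lookup of this kind can be sketched as follows (a hypothetical illustration using nested dictionaries, which behave as hash tables); with perfect hashing at both levels, each of the two lookups is O(1), so the combined retrieval is O(1) * O(1) = O(1):

```python
# Sketch: a hash table whose values are themselves hash tables.
# Python dicts serve as the hash tables here; the two lookups compose to O(1).

outer = {
    "graph1": {"nodeA": "Object", "nodeB": "Ball"},   # inner hash table
    "graph2": {"nodeC": "Cube"},
}

def retrieve_sub_value(outer_key, inner_key):
    inner = outer[outer_key]      # first O(1) lookup
    return inner[inner_key]       # second O(1) lookup

print(retrieve_sub_value("graph1", "nodeB"))  # -> Ball
```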

Besides the extra overhead space required, the time to store the double hash tables would be at most O(n^2) for the two hash tables. This assessment is reached by looking at two hash tables in which one table holds n elements and the other holds m elements; since m ≤ n, the evaluation can be performed using the size n. When using Pagh's algorithm [83] discussed above, it was shown that storing a perfect hash table for one set of hashes takes O(n) time; therefore, storing a second perfect hash table at each element of the first hash table takes O(n) time per element, or O(n^2) for both tables.

CHAPTER 4

REASONING OPERATIONS

This chapter first describes the operator 'project' and then how it relates to the operator 'join'. The second section describes graph isomorphism relationships, and the last part of the chapter describes how all these elements are connected within reasoning operations.

4.1 Operators

Using the knowledge representation described in Section 3.4, two operators, project and join, manipulate conceptual graphs using rules that incorporate type hierarchy subsumption [48]. These operators are duals (analogous to intersection and union); therefore, the description of project is, in some sense, the dual of the description of join.

The following set of correspondences are sufficient to indicate how project and join compare:

Project ←→ Join

Min. Supertype ←→ Max. Subtype

Intersection ←→ Union

4.1.1 Project

The project operator is defined through a mapping π : u → v, where πu is a sub-element of v. When u and v are defined to be conceptual graphs, for graph u to be a subgraph of graph v, all of the nodes and arcs of u must be in v [46], and the project operator π obeys the following rules [119, 136]:

• Type preserving: For each concept c in u, πc is a concept in πu where type(πc) ≤ type(c), and ≤ is the subtype relation. If c is an individual, that is, an actual instance of an object, then referent(c) = referent(πc).

• Structure preserving: For each conceptual relation r in u, πr is a conceptual relation in πu where type(πr) = type(r). If the ith edge of r is linked to a concept c in u, the ith edge of πr must be linked to πc in πu.
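These two rules can be checked mechanically. The sketch below is a hypothetical illustration (graphs represented as dictionaries of typed concepts and relations plus ordered edge lists, with the hierarchy as a child-to-parent map); it verifies that a candidate mapping π is type preserving and structure preserving:

```python
# Sketch: verify the two project-operator rules for a candidate mapping pi.
# The representation is illustrative, not the dissertation's data structures.

def is_subtype(hierarchy, sub, sup):
    # hierarchy maps a type to its parent type; the top type maps to None.
    while sub is not None:
        if sub == sup:
            return True
        sub = hierarchy.get(sub)
    return False

def valid_projection(u, v, pi, hierarchy):
    # u, v: {"concepts": {id: type}, "relations": {id: type},
    #        "edges": {rel_id: [concept ids in arc order]}}
    for c, ctype in u["concepts"].items():
        # Type preserving: type(pi(c)) <= type(c).
        if not is_subtype(hierarchy, v["concepts"][pi[c]], ctype):
            return False
    for r, rtype in u["relations"].items():
        # Relation types must match exactly.
        if v["relations"][pi[r]] != rtype:
            return False
        # Structure preserving: the i-th edge of pi(r) links to pi(c).
        for i, c in enumerate(u["edges"][r]):
            if v["edges"][pi[r]][i] != pi[c]:
                return False
    return True

hierarchy = {"Ball": "Object", "Cube": "Object", "Object": None}
u = {"concepts": {"c1": "Object"}, "relations": {"r1": "prop"},
     "edges": {"r1": ["c1"]}}
v = {"concepts": {"d1": "Ball"}, "relations": {"s1": "prop"},
     "edges": {"s1": ["d1"]}}
pi = {"c1": "d1", "r1": "s1"}
print(valid_projection(u, v, pi, hierarchy))  # -> True
```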

The example in Figure 4.1 shows project with the general forms of graphs.

Figure 4.1: Project (Mp(Q, H) = P) (Adapted from [92], Figure 3).

This example uses the hierarchy example (see Figure 3.3) from Section 3.2. One can see that the node A from the first graph, which will be called Q, is projected onto the second graph, which will be called H, with a match at its node A. This is the only exact match in the project. Then, using the hierarchy, F is the supertype of I, so when Q is projected onto graph H, I, the common subtype of I and F, forms a new node in the projection graph P, and this node is linked to A. Lastly, nodes G from graph Q and J from graph H have the common subtype J, so J is formed as a new node in the projection graph P, giving the resulting projection graph P = {V,E} where V = {A,I,J} and E = {{A,I},{A,J}}. Note that, using this hierarchy, more than one projection is possible.

If join (see Section 4.1.2) is likened to set union, in that all nodes not joined are left alone and come along for the ride, then project is like set intersection: all nodes that are not projected are simply dropped from the resultant graph, and their associated relation nodes are detached.

4.1.2 Join

With an elementary join between two graphs, U1 and U2, that are not necessarily distinct: let c1, c2 be two concept vertices belonging respectively to U1 and U2, and having the same type or a common subtype; then the result of the join of U1 and U2 is U3, with the restriction (see Section 3.4.2) of concept c1 with c2, and with all the edges that had been linked to c1 now linked to c2 in U3 [15, 119].

In join MJ (see Figure 4.2), the labels may be restricted by replacement with a label of any subtype, and graphs will be merged on the maximum number of nodes.

Figure 4.2 again uses the hierarchy example (see Figure 3.3) from Section 3.2. One can see that the node A from the first graph, which will be called Q, is joined with the second graph, which will be called H, with a match at its node A. This is again the only exact match in the join. Then, using the hierarchy, I is a subtype of D, so when Q is joined with graph H, D is restricted to I, and I forms a new node in the join graph J, and this node is linked to A. Lastly, node K from graph Q is linked into the new join graph J, giving the resulting join graph J = {V,E} where V = {A,I,K,B,F} and E = {{A,I},{A,K},{A,B},{A,F},{I,F}}.
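The restrict-then-merge step in this example can be sketched as follows. This is a minimal, hypothetical representation (node sets with undirected edge pairs and a child-to-parent type map), and it assumes the exact-match node carries the same name, A, in both graphs:

```python
# Sketch of an elementary join: restrict a concept to a common subtype,
# then merge the two graphs on the matched node pair. Illustrative only.

def maximal_common_subtype(parent, t1, t2):
    # In a tree-shaped hierarchy, the common subtype of comparable types
    # is the more specific one; incomparable types are not join-able.
    def ancestors(t):
        seen = []
        while t is not None:
            seen.append(t)
            t = parent.get(t)
        return seen
    if t1 in ancestors(t2):
        return t2   # t2 is a subtype of t1
    if t2 in ancestors(t1):
        return t1
    return None

def elementary_join(g1, g2, c1, c2, parent):
    # g = {"nodes": {name: type}, "edges": set of frozenset node pairs}
    t = maximal_common_subtype(parent, g1["nodes"][c1], g2["nodes"][c2])
    if t is None:
        return None  # not join-able
    joined = {"nodes": dict(g1["nodes"]), "edges": set(g1["edges"])}
    joined["nodes"][c1] = t  # restrict to the common subtype
    for name, typ in g2["nodes"].items():
        if name != c2:
            joined["nodes"].setdefault(name, typ)
    for e in g2["edges"]:
        # Redirect edges incident on c2 to the merged node c1.
        joined["edges"].add(frozenset(c1 if n == c2 else n for n in e))
    return joined

parent = {"I": "D", "D": None, "A": None, "K": None, "B": None, "F": None}
Q = {"nodes": {"A": "A", "I": "I", "K": "K"},
     "edges": {frozenset(("A", "I")), frozenset(("A", "K"))}}
H = {"nodes": {"A": "A", "D": "D", "B": "B", "F": "F"},
     "edges": {frozenset(("A", "D")), frozenset(("A", "B")),
               frozenset(("A", "F")), frozenset(("D", "F"))}}
J = elementary_join(Q, H, "I", "D", parent)
```

Joining I of Q with D of H restricts D to I and merges the graphs, reproducing the join graph J described above.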

Figure 4.2: Join (MJ(Q, H) = J) (Adapted from [92], Figure 2).

4.2 Graph and Subgraph Isomorphism

Table 4.1 shows how each of these problems and sub-problems falls within the problem classes discussed in basic subgraph isomorphism reasoning (see Section 1.2.1).

4.2.1 Graph Isomorphism

For two graphs to be identical, the vertices in G must map onto the vertices in H such that (x,y) is an edge of G iff (f(x), f(y)) is an edge in H, giving isomorphic graphs. However, if the graphs are labeled, that is, the vertices have actual labels as opposed to variables, then graph G = (Vg,Eg) and graph H = (Vh,Eh) are identical when (x,y) ∈ Eg iff (x,y) ∈ Eh, and they can be defined to be isomorphic. As already stated in Section 1.2.1, graph isomorphism is in the problem class NP (see the first row of Table 4.1), even though there are known polynomial time algorithms when the graphs are labeled.
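The labeled case can be made concrete with a short sketch (assuming set-based representations of the labeled vertex and edge sets): because the labels pin each vertex down, no mapping has to be searched, and the identity test collapses to set comparison.

```python
# Sketch: labeled-graph isomorphism collapses to set equality,
# since labels identify each vertex directly.

def labeled_isomorphic(g, h):
    vg, eg = g
    vh, eh = h
    return vg == vh and eg == eh   # O(|V| + |E|) with hashed sets

G = ({"A", "B", "C"}, {frozenset(("A", "B")), frozenset(("B", "C"))})
H = ({"A", "B", "C"}, {frozenset(("B", "C")), frozenset(("A", "B"))})
print(labeled_isomorphic(G, H))  # -> True
```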

4.2.2 Subgraph Isomorphism

As discussed in Ullman's paper of 1976 [132] and used in basic subgraph isomorphism reasoning (see Section 1.2.1), looking for all the isomorphisms between a given graph G = (Vg,Eg) and subgraphs of a further graph H = (Vh,Eh) allows the detection of related objects within the two graphs. This subgraph isomorphism helps to find whether two structural patterns within the graphs are related.

Table 4.1: Related Problem Classes.

Problem | Graph Description | Graph to Graph Class (worst case time) | References
Graph Isomorphism | nodes: non-labeled; edges: undirected | NP | [42]
Subgraph Isomorphism | nodes: non-labeled; edges: undirected | NP-Complete | [42, 132, 77, 68]
Subgraph Isomorphism | nodes: labeled; edges: undirected, labeled | P (n^2) | [132, 77, 68]
Isomorphism | nodes: non-labeled; edges: undirected; graphs are both trees | P (n^2.5) | [103, 42, 111]
Subtree Isomorphism | nodes: non-labeled; edges: undirected; query graph is a tree | NP-Complete | [42]
Subforest Isomorphism | nodes: non-labeled; edges: undirected; query graph is a forest; search graph is a tree | NP-Complete | [42]
Subbipartite Isomorphism | nodes: bipartite, non-labeled except type; edges: undirected | NP-Complete | [38, 39, 40]
Projection | nodes: bipartite, non-labeled except type; edges: labeled, undirected | NP-Hard | [119, 48, 74, 22]
Proposed Projection | nodes: bipartite, labeled; edges: non-labeled, directed | NP-Hard | dissertation defined algorithm
Maximal Join | nodes: bipartite, non-labeled except type; edges: labeled, undirected | NP-Hard | [84, 119, 48, 74, 77]
Proposed Maximal Join | nodes: bipartite, labeled; edges: non-labeled, directed | NP-Hard | dissertation defined algorithm

4.2.2.1 Non-labeled nodes and undirected edges

This is where one wishes to discover whether G1 contains a subgraph isomorphic to G2. For this class of problem, the vertices are general non-labeled nodes and the edges are non-labeled, undirected links. According to Garey and Johnson [42], and other references, this problem contains the known NP-Complete problem CLIQUE as a special case and therefore has the complexity of NP-Complete. By the reasoning given above for graph isomorphism, the complexity of matching a graph G2 against all the graphs in a knowledge base is also NP-Complete.

4.2.2.2 Labeled nodes and undirected edges

This sub-problem of the subgraph isomorphism problem discussed above is shown by Ullman [132], and others, to be solvable in P (polynomial time). In their 2000 paper, Messmer and Bunke [68] show that dividing the subgraph question into two parts, 1) decomposing the graph, and 2) querying the subgraph isomorphism question on the smaller graphs, can improve the time complexity. In fact, producing unique labels for the decomposed graph parts (down to the single nodes) allows both parts to run in polynomial time.

4.2.3 Subtree Isomorphism

One of the sub-problems of subgraph isomorphism is subtree isomorphism. This is when both G and H are trees ("a tree is a connected acyclic graph" [46, page 32]). A polynomial time algorithm for this sub-problem was shown by Reyner [103], where the running time was O(n1 × n2^1.5), with n1 the number of vertices in the input graph and n2 the number of vertices in the knowledge base graph. This polynomial time algorithm extends to m × O(n^2.5), where m is the number of graphs in the knowledge base and O(n^2.5) is the polynomial running time for the input graph against a knowledge base graph; in this P algorithm, n is the maximum number of nodes in the largest graph. It should be noted that Reyner's algorithm used maximal matching in a bipartite graph and therefore considered the trees to be bipartite [103].

4.2.3.1 Hamiltonian Path

When the query graph, H, is a tree and the knowledge base graph, G, is unknown, then H contains a HAMILTONIAN PATH as a sub-problem and hence is NP-Complete (Garey and Johnson [42], page 104).

4.2.3.2 Subforest Isomorphism

When the knowledge base graph, G, is a tree, then the query graph, H, must be acyclic; if it is not a tree, it may be a forest. Garey and Johnson [42, page 105] show that this sub-problem is also NP-Complete.

4.2.4 Subbipartite Isomorphism

This sub-problem can be defined as a subgraph isomorphism search using bipartite graphs. This is the most closely related class of problem to the reasoning operations projection and maximal join as defined in Sowa's 1984 book [119]. This sub-problem of subgraph isomorphism answers the decision question: is there a subgraph of the knowledge base graph, G, that is isomorphic to the query graph, H, where G and H are bipartite graphs? According to Eppstein's 1994 work [38], this subgraph isomorphism question on bipartite graphs can be answered in the best case in polynomial time. This comes about because the number of edges is reduced through the relationship between the nodes, and a natural set of labels is added because of the types on the nodes. However, these labels are not totally unique; therefore, Ettinger [40] clearly states that, for the worst case running time over a whole knowledge base, where the labels turn out to be duplicated across nodes, the execution time is still NP-Complete. The labels may not be totally unique because, even though the nodes are separated into two groups, the labels on the nodes are of only two different types; within a type, the label on all nodes may be the same. Therefore, this sub-problem can be reduced to the Maximal CLIQUE problem, which is known to be NP-Complete.

4.2.5 Projection

Projection involves both the subclass of problems defined above as subbipartite isomorphism and a new subclass of problem that looks at defining rules for type lattice (often called 'tree') subsumption.

4.2.5.1 Historical Algorithms

The projection sub-problem can be considered constructive as well as isomorphic because of the way the rules are applied. The construction comes from the generalization that can be applied when a node, through the application of subsumption with the type lattice, is built into a new node of the output graph as part of the projection of an isomorphic subgraph. Therefore, the output of projection is not simply a logical true or false, but a newly constructed graph containing the subgraph structure from the knowledge base graph, with possibly new nodes constructed through the application of the subsumption rules. The nodes in the graph are only labeled to the same extent as the bipartite graphs described in the class above, so, when evaluating the running time of the algorithm, if no rules are applied from a type lattice, the running times are the same as for a subgraph isomorphism using bipartite graphs. The output graph in this case is the subgraph of the knowledge base graph that was being projected onto.

However, the worst case running time when rules for the type lattice are applied must take into account that projection is a problem known to be in NP (Sowa [119], Hartley and Coombs [48], Mugnier and Chein [74], and Croitoru and Compatangelo [22]), and is constructive, so it is NP-Hard.

4.2.5.2 Proposed Algorithm

This sub-problem of the projection given above will be newly defined in this dissertation. It makes the following two modifications to the maximal projection problem: 1) all nodes are uniquely labeled, and 2) the edges are non-uniquely labeled, but have some implicit labeling because they are directed. Tests were also performed using different data structures at implementation time. It is believed that, through the use of different data structures, the execution time will reflect the running time of the 'labeled' subgraph isomorphism problem as opposed to subgraph isomorphism on bipartite graphs. The change in data structures also allows a change in how concepts versus conceptual relations are searched for within the graph structure, through the use of the 'labels'. Through this shift in the sub-problem of subgraph isomorphism, there is an improvement in the running time for the first part of the projection problem (without the application of rules from the type lattice).

However, because the overall problem is still constructive as opposed to a decision problem, and the application of type rules has a worst case running time in NP (Mugnier and Chein [74], and Croitoru and Compatangelo [22]), the worst case running time for this sub-problem is still NP-Hard.

4.2.6 Maximal Join

Maximal join is a sub-problem of projection, in that the maximal join algorithm is a join on compatible projections. These projections are maximally extended from the common generalization of two graphs, which are bipartite graphs [119].

4.2.6.1 Historical Algorithms

In performing a join, the time complexity includes the time to find the subbipartite isomorphism(s) of the two graphs, and then the matching (or joining) of these projections, again in a constructive manner, to produce the largest extended constructed graph from the subgraph of the knowledge base graph with the query graph.

Graph matching can be reduced to a unification problem, and by doing so, in many cases where the graphs are acyclic, can be performed in linear time (Myaeng and Lopez-Lopez [77] and Paterson and Wegman [84]). Therefore, the overall complexity of a maximal join in the best case (when no type rules are applied in the projection) is still polynomial, O(n^4); however, in the worst case (when the projection does apply type rules) it is an NP-Hard problem.

4.2.6.2 Proposed Algorithm

Like the proposed new projection algorithm, this new algorithm is a sub-problem of maximal join with modifications. The modifications are unique labels on all the nodes of the graphs and non-labeled, directed edges. Here, the data structures being modified at implementation time again drive the projection to the time complexity of a labeled subgraph isomorphism, without type lattice rules. These changes also drive the matching in the join to be linear, even when the graphs are cyclic. Therefore, the best case time complexity is O(n^3). However, the worst case, with type rules being applied, is still NP-Hard. During experimentation, it is hoped it can be shown that the worst case is not reached very often.

4.3 Operations

There are two basic operations necessary to process CG reasoning: 1) projection and 2) maximal join. These operations use the project and join operators, respectively, and apply the CG KB algorithms over them. These algorithms are based on the subgraph isomorphism class of problems defined in the section above.

4.3.1 Projection

A projection operation uses the project operator, which is a matching on a graph morphism; graph data structures with either the support information (for SCGs) or hierarchies (for full CGs); and the actual projection algorithm. As stated by Baget and Mugnier, "the elementary reasoning operation, projection, is a kind of graph homomorphism that preserves the partial order defined on labels" [5, page 428]. Not only does projection use a project operator (see its definition in Section 4.1.1), but also either the support S of the graph (for a SCG) or the defined type hierarchy (for a CG), and it produces a generalization subgraph. During the projection of the query graph onto the match graph, the match graph is generalized, and structure is removed by conceptual relations being detached [37].

For the rest of this work, the projection operation evaluation and comparison will be restricted to injective projection. The projection mapping is still not necessarily unique; that is, a concept or relation in u may have more than one concept or relation in v to which it can validly be mapped. In this respect, there can be more than one valid projection from u to v.

When the projection operation is performed using the query graph from Figure 4.3¹ onto the KB graph and hierarchy of Figure 4.4, the two projections discovered, P1 and P2, are displayed in Figure 4.5. Using the type hierarchy, both Object and Ball are matches; note that if no hierarchy were present, there would be only one projection. This is a simple injective projection because the graphs are small; however, projection can become complex very quickly.

Figure 4.3: Query Graph.

¹ The figures in this section were generated by CharGer [32].

Figure 4.4: KB Graph with Type Hierarchy (the hierarchy has Object below T, with Cube and Ball as subtypes of Object).

Figure 4.5: Projection Results (P1: Object - prop - Color: blue; P2: Ball - prop - Color: blue).

4.3.2 Maximal Join

For the join operation with conceptual graphs, the join is always maximal. Maximal join is therefore defined as: "a join on compatible (common) projections that are maximally extended from the common generalization, L, of two conceptual graphs, Q and G" (see Sowa 3.5.8 [119], page 102). The join is locally maximal because there may be more than one group of compatible projections from two graphs that are maximally extended (see Figure 4.2). In this way, structure is added or concepts are made more specific [37]. Since restrictions are allowed, two nodes are join-able as part of a maximal join operation if they contain types that have a maximal common subtype using the support S (in the case of SCGs) or the type hierarchy (for CGs).

Several papers ([15, 92, 91, 48]) contain many examples, but to clarify the maximal join operation three examples will be shown here. First, the two projections found in the previous example for the projection operation could be joined into a single graph, because Object is a generalization of Ball and Cube (see Figure 4.6). Basically, the Object concept from graph P1 would be restricted to the concept Ball, and then relation R1 would be detached; this produces a graph that is just a copy of graph P2. Because these two graphs could be fully joined into a completely compatible (common) graph, with no nodes that were not join-able, they are considered compatible projections. When graphs are specialized, they are maximally joined on compatible projections of a more general graph [119]; therefore, the joined graph from Figure 4.6 could then be joined back to the original graph seen in Figure 4.4 to produce parcel models. Within these models, the second ball that is part of the 'between' relation will also be shown to be colored blue.

Figure 4.6: Join of P1 and P2 Graphs (Ball - prop - Color: blue).

The second example relates back to Figure 1.3, given in Section 1.2.2 of the Introduction (see Chapter 1). From that example it can be seen that the graph U is the common projection graph between graphs G1 and G2. When graphs G1 and G2 are maximally joined, this common graph becomes the merged nodes within the resulting graph G. In order for graph U to be the merging 'piece' between graphs G1 and G2, it is assumed that a hierarchy indicating that Girl ≤ Person is available. It is this subtype that allows the restrict rule to produce the join.

The last example clarifying the maximal join operation comes about when the graphs in Figures 3.10 and 3.11 (see Chapter 3, Section 3.4.2) are maximally joined. It has already been seen, within that section, that the graph in Figure 3.11 can be restricted and detached to produce the graph in Figure 3.13. Using the common graph seen in Figure 4.7, the graph J in Figure 4.8 is produced in just one step after restriction.

Figure 4.7: Common Graph of Basic Graphs (C1 - R1 - C2).

Figure 4.8: Join of Detached Basic and Simple Basic Graphs.

4.3.3 Over Knowledge Bases

As discussed in Section 1.2.1, all the subgraph isomorphism problems discussed so far are from a two-graph perspective. However, for knowledge bases there may be more than one graph within the KB that will match the input (query) graph [68]. Looking at the operations above, when they are performed over a knowledge base of graphs G, even though the two-graph operation in the typical situation can be solved in P, the functionality of the operation over the whole database gives the following results.

Projection’s functionality over a set of graphs G is:

projection: G × G → 2^G

As described above, there can be more than one valid projection between two graphs, hence the powerset notation on the set of all graphs G.

The functionality of maximal join over a set of graphs G is:

maximal join: G × G → 2^G

There can be more than one maximal join, hence the powerset notation on the set of all graphs G. Join is a binary operation, but multiple graphs can be joined by composing it with itself. Unfortunately, there is good reason to believe that join is not commutative when semantic considerations come into play [91], but for now it will be assumed that this poses no problem.
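This powerset functionality can be sketched as a loop that unions the per-graph results. The sketch below is hypothetical: `toy_project` is a placeholder standing in for the real two-graph projection and simply reports a node-subset match.

```python
# Sketch: lifting a two-graph operation over a knowledge base of graphs.
# project_two(q, g) is assumed to return the (possibly empty) set of
# projection results for one KB graph; the union gives the 2^G result set.

def project_over_kb(query, kb, project_two):
    results = set()
    for g in kb:
        results |= project_two(query, g)
    return results

# Toy stand-in: a "projection" succeeds when the query's node set is a
# subset of the KB graph's node set, returning the matched node set.
def toy_project(q, g):
    return {frozenset(q)} if q <= g else set()

kb = [{"A", "I", "J"}, {"A", "B"}, {"A", "I", "K"}]
print(project_over_kb({"A", "I"}, kb, toy_project))
```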

CHAPTER 5

ALGORITHMS AND ANALYSIS

As discussed in Section 4.3.2, the maximal join operation is an algorithm that involves the joining of compatible projections that are maximally extended; however, not much analysis and implementation has been performed on the join operation. Therefore, in the first section on foundational algorithms, only projection algorithms will be explored. Later, when the newly developed algorithms are discussed, any variations on maximal extension of graphs and joining will be addressed.

5.1 Foundational Algorithms

In general, the matching part of both the projection and join algorithms is unification (discussed previously in Section 1.2.2) [19], and there are known linear unification algorithms for acyclic (tree) graphs [84]. Also, SCGs have been evaluated as both graph homomorphism and graph isomorphism. In their original paper from 1992 [74], Mugnier and Chein looked at general projection running times and injective projection. However, CGs and SCGs are not necessarily trees, and only part of the algorithms presented next apply to injective projection, so these linear algorithms give guidance but do not always directly apply.

As discussed in the Messmer and Bunke paper [68], a naive strategy with forward-checking for establishing a subgraph isomorphism is Ullman's backtracking-in-search-tree algorithm [132]. Since Messmer and Bunke consider it a common technique and a good baseline subgraph isomorphism algorithm, the Ullman algorithm and its known complexity (from [132, 68]) will be reiterated here to define a basis for investigating projection algorithms. The basic idea of Ullman's algorithm is to take one vertex of the input vertices (query graph) at a time and map it onto a model (a graph from the KB) such that the resulting mapping represents a subgraph isomorphism for a subgraph of the model (KB graph) projected from the input graph (query graph) (see pages 307 and 322 of Messmer and Bunke [68]). If at some point the mapping being built does not represent a subgraph isomorphism, then the algorithm backtracks and tries a different mapping. This process is continued until all vertices v1,...,vM in VI of the input graph are successfully mapped onto V of the model. This either produces a subgraph isomorphism from G to GI or stops when a vertex in VI cannot be mapped to at least one vertex in V. In the second case, the algorithm backtracks to a new v1 in V or vn−1 in V and tries to remap the subgraph isomorphism.

Even though this basic algorithm works well for small model and input graphs, it performs poorly as the graphs become larger, because all checks are done locally. Ullman added a forward-checking procedure to detect when it is not possible for vn to be mapped onto an available vertex in VI (see page 322 in Messmer and Bunke [68]), so that the algorithm can backtrack immediately and save computational steps. In the best case, Ullman's algorithm is bounded by O(N·I·M), where N is the number of model graphs, I is the number of labeled vertices in the input graph (which come from the set of M labels), and M is the number of uniquely labeled vertices in the model graph. In the worst case, the algorithm is bounded by O(N·I^M·M^2), where N is the number of model graphs, I is the number of unlabeled vertices in the input graph, and M is the number of unlabeled vertices in the model graph. With this general algorithm, labeling of vertices greatly improves its efficiency. However, it should be noted that this algorithm does not take into account any support or hierarchy knowledge information.
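As an illustration of the scheme just described, the following is a simplified sketch of Ullman-style backtracking with a label-and-adjacency check standing in for forward-checking; it is not Ullmann's full refinement-matrix procedure, and the graph representation is hypothetical:

```python
# Simplified sketch of Ullman-style backtracking subgraph isomorphism.
# Graphs: (labels: {v: label}, adj: {v: set of neighbours}). The candidate
# filter plays the role of forward-checking; the real algorithm refines a
# compatibility matrix, which this sketch omits.

def subgraph_isomorphism(input_g, model_g):
    ilab, iadj = input_g
    mlab, madj = model_g
    order = list(ilab)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        u = order[len(mapping)]
        for v in mlab:
            if v in mapping.values():
                continue
            if ilab[u] != mlab[v]:
                continue  # label check prunes most candidates
            # Every already-mapped neighbour of u must be adjacent to v
            # in the model graph, or this branch cannot succeed.
            if all(mapping[w] in madj[v] for w in iadj[u] if w in mapping):
                mapping[u] = v
                result = extend(mapping)
                if result:
                    return result
                del mapping[u]  # backtrack
        return None

    return extend({})

model = ({"1": "A", "2": "B", "3": "C"},
         {"1": {"2", "3"}, "2": {"1"}, "3": {"1"}})
query = ({"x": "A", "y": "B"}, {"x": {"y"}, "y": {"x"}})
print(subgraph_isomorphism(query, model))  # -> {'x': '1', 'y': '2'}
```

As the text notes, labels do the heavy lifting here: with unique labels the candidate loop rarely branches, while unlabeled vertices force the exponential search.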

5.1.1 SCG Projection

This section explains the projection algorithm found in Marie-Laure Mugnier and Michel Chein's 1992 work [74]. Note that the base-level polynomial algorithm discussed is for SCGs without loops (cycles) in the graph being projected, i.e., trees, and this is the foundation for improving projection between two SCGs with a support (see Sections 3.4.3 and 3.2.2).

Before discussing the general and injective projection algorithms, some basic definitions are given to help the reader understand each algorithm. 1) Using the projection operation provided in Section 4.3.1, the following additional rules on labels are added to the graph morphism (from [74], page 240):

Definition 5.1.1 Given two simple conceptual graphs G and G′, a projection Π from G to G′ is an ordered pair of mappings from (RG, CG) to (RG′, CG′), such that:

(i) For all edges rc of G with label i, Π(r)Π(c) is an edge of G′ with label i.

(ii) ∀r ∈ RG, type(Π(r)) = type(r); ∀c ∈ CG, type(Π(c)) = type(c).

There is a general projection from G to G′ if and only if G′ can be derived from G by the elementary specialization rules [119, 15].

2) The set of the numbers on edges between r and c (refer to Section 3.4.3 on SCGs) satisfies the following definition:

Definition 5.1.2 For c a neighbour of r, let Pr[c] be the class of the partition of Pr which corresponds to c.

3) Injective projection definition:

Definition 5.1.3 Injective projection is a restricted form of projection where the image of G in G′ is a subgraph of G′ isomorphic to G.

The projection from a tree to a graph in the general case, as defined on pages 245-246 of the Mugnier and Chein work [74], where there is a concept vertex a in T and a concept vertex c in G, is given in Algorithm 5.1.

104 Algorithm 5.1 Π is a General Projection from T to G

1: function PROJ-ROOT(a, E)                    ⊲ a ∈ CT
2:   E ← {c ∈ E | label(a) ≥ label(c)}
3:   If E = ∅ or a is a leaf, return E
4:   for all r successors of a do              ⊲ Move through the neighbours
5:     for all c ∈ E do
6:       Wc,r ← {r′ neighbour of c | type(r) = type(r′) and Pr[a] ⊆ Pr′[c]}
7:     end for
8:     Er ← ∪{Wc,r}c∈E
9:     Er ← PROJ-r(r, Er)
10:    for all c ∈ E do
11:      Vc,r ← Wc,r ∩ Er
12:    end for
13:    Er ← {c ∈ E | Vc,r ≠ ∅}
14:  end for
15:  return E                                  ⊲ Project of graph
16: end function
17: function PROJ-r(r, E)                      ⊲ r ∈ R
18:  E ← {r′ ∈ E | Pr is thinner than Pr′}
19:  If E = ∅ or |Pr| = 1, return E
20:  for all ai successors of r do             ⊲ Move through the hierarchy
21:    Ei ← ∪{cr′ | Pr[ai] ⊆ Pr′[cr′]}r′∈E
22:    Ei ← PROJ-ROOT(ai, Ei)
23:    E ← {r′ ∈ E | cr′ ∈ Ei}
24:  end for
25:  return E                                  ⊲ Projection up relation hierarchy
26: end function

To compute this projection from T to G, the general algorithm is broken into two parts. The first function determines the PROJ-ROOT part of the definition. As seen in line 4, the function looks for the projection from T to G by comparing the relation vertices connected to concept vertex a in T to the relation vertices connected to concept c in G.

The second function, at line 17, determines the PROJ-r part of the definition. This function looks for possible mappings at each concept vertex by examining sub-trees. The complexity of this general algorithm, as proved on page 247 of Mugnier and Chein [74], is O(mT × mG), where m denotes the number of edges. The problem class related to this algorithm is NP.

This should be recognized as a single graph-to-graph project operator; with a projection operation (see Section 4.3.1), an injective projection is necessary in order to produce the projection graph. If each graph is a tree, then one has a tree-to-tree projection, which is known to have a polynomial time algorithm [42], but conceptual graphs are not necessarily trees.

Therefore, Mugnier and Chein [74] modify their algorithm into Algorithm 5.2, which actually returns the image of the new projected graph. Within this algorithm they use the function PROJ-r to continue to look for possible mappings at each concept vertex, but they modify the PROJ-ROOT routine to return the projection image. Even though this is a locally injective projection, on page 249 they prove that if T is a conceptual tree and G is a cyclic conceptual graph, then the decision question being solved by this algorithm is still NP-complete.

106 Algorithm 5.2 Π Modified as an Injective Projection from T to G

1: function PROJ-ROOT(a, E)                    ⊲ a ∈ CT
2:   E ← {c ∈ E | label(a) ≥ label(c)}
3:   If E = ∅ or a is a leaf, return E
4:   for all r successors of a do              ⊲ Move through the neighbours
5:     for all c ∈ E do
6:       Wc,r ← {r′ neighbour of c | type(r) = type(r′) and Pr[a] = Pr′[c]}
7:     end for
8:     Er ← ∪{Wc,r}c∈E
9:     Er ← PROJ-r(r, Er)
10:    for all c ∈ E do
11:      Vc,r ← Wc,r ∩ Er
12:    end for
13:    Er ← {c ∈ E | Vc,r ≠ ∅}
14:  end for
15:  for all c ∈ E do
16:    Build the bipartite graph (A, B, U) such that:
17:      A = {sons of a}, B = {neighbors of c}
18:      (B can also be defined as ∪{Vc,ai, ai ∈ A})
19:      U = {ai v | v ∈ Vc,ai}
20:    If this graph admits a matching with cardinality |A|,
21:      c is a solution
22:  end for
23:  return all c-vertices which are solutions of line 21  ⊲ Projection of the subgraph
24: end function

5.1.2 SCG Relation Projection

Madalina Croitoru's new projection algorithm is based on SCGs, as described in her two 2004 papers [22, 21]. This algorithm starts from the foundational algorithm of Mugnier and Chein [74] given in Section 5.1.1. The decision question associated with this new algorithm is the same as stated in the Mugnier and Chein 1992 work [74] and is in the class of problems that are NP-complete. The significant change applied to Algorithm 5.2 is splitting the algorithm into two parts and adding a preprocessing algorithm to each graph pair, looking for a matching graph as defined by Definition 4.1 (in [22], page 8). Before defining the matching graph, some added definitions are needed:

Definition 5.1.4 1) λ is a labeling of the nodes of a SCG graph G with elements from the support S (see Section 3.2.2).

2) d is the degree (or arity) of each node in the SCG graph G.

3) N denotes the neighbour sets for the relation node (see Section 5.1.1).

Now for the actual definition:

Definition 5.1.5 Let S_G = (G, λ_G) and S_H = (H, λ_H) be two SCGs without isolated concept vertices defined on the same support S. The matching graph of S_G and S_H is the graph M_{G→H} = (V, E) where:

- V ⊆ V_R(G) × V_R(H) is the set of all pairs (r, s) such that r ∈ V_R(G), s ∈ V_R(H), λ_G(r) ≥ λ_H(s), and for each i ∈ {1, ..., d_G(r)}, λ_G(N^i_G(r)) ≥ λ_H(N^i_H(s)).

- E is the set of all 2-sets {(r, s), (r′, s′)}, where r ≠ r′, (r, s), (r′, s′) ∈ V, and for each i ∈ {1, ..., d_G(r)} and j ∈ {1, ..., d_G(r′)} such that N^i_G(r) = N^j_G(r′) we have N^i_H(s) = N^j_H(s′).

These matching graphs indicate which relation vertices should be used as potential candidates for projection, thereby reducing the search space for the related search problem. By using this preprocessing with the matching graphs, the projection of G → H in its reduced form belongs to a class of problems in which finding the maximum clique can be solved in polynomial time [22]. The execution of the algorithm therefore gives a polynomial time solution to the NP-hard search problem.
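To make Definition 5.1.5 concrete, the following sketch builds a matching graph for two tiny SCGs. The tuple encoding of relation vertices, the label maps, and the `geq` order are illustrative assumptions of this sketch, not Croitoru's implementation:

```python
from itertools import product, combinations

def matching_graph(rels_G, rels_H, lam_G, lam_H, geq):
    """Build the matching graph M_{G->H} = (V, E) of Definition 5.1.5.

    rels_X : list of (relation_label, neighbour_concept_ids) pairs
    lam_X  : concept_id -> concept label
    geq    : the support's partial order, geq(a, b) meaning a >= b
    """
    # V: pairs (r, s) whose relation labels and i-th neighbour labels
    # are compatible, so r could project onto s.
    V = [(r, s) for r, s in product(rels_G, rels_H)
         if len(r[1]) == len(s[1]) and geq(r[0], s[0])
         and all(geq(lam_G[a], lam_H[b]) for a, b in zip(r[1], s[1]))]
    # E: vertex pairs that agree on shared neighbours -- wherever r and
    # r' touch the same concept of G, s and s' must touch the same
    # concept of H (and r != r', as the definition requires).
    E = [((r, s), (r2, s2))
         for (r, s), (r2, s2) in combinations(V, 2)
         if r != r2
         and all(s[1][i] == s2[1][j]
                 for i, a in enumerate(r[1])
                 for j, b in enumerate(r2[1]) if a == b)]
    return V, E
```

A maximum clique in (V, E) of size |V_R(G)| then marks a consistent set of relation mappings, which is where the polynomial clique result cited above enters.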

5.1.3 Polyprojection

This section describes Mark Willems's polyprojection and how it relates to a CG projection algorithm, from his 1995 paper [136]. A polyprojection (from Definition 5 in [136], page 282) is:

Definition 5.1.6 Consider two (conceptual) graphs G = (C, R, type, referent, arg_1, ..., arg_m) and G′ = (C′, R′, type′, referent′, arg′_1, ..., arg′_m). A polyprojection µ from G to G′ is a pair of Cartesian product subsets µ_C ⊆ C × C′ and µ_R ⊆ R × R′ that are:

1. Type preserving: for all concepts c ∈ C and c′ ∈ C′, c µ_C c′ only if type(c) ≥ type′(c′), and referent(c) = ∗ or referent(c) ≥ referent′(c′);

2. Type preserving: for all relations r ∈ R and r′ ∈ R′, r µ_R r′ only if type(r) ≥ type′(r′);

3. Structure preserving: µ_R ◦ arg_i = arg′_i ◦ µ_C;

4. Non-empty: for all concepts c ∈ C there is a concept c′ ∈ C′ such that c µ_C c′.

It is said that G′ is structurally similar to G if there is a polyprojection µ between G′ and G, written G′ µ G.

Given this definition, Willems goes on to show that a polyprojection can be found by a polynomial algorithm. The algorithm is divided into two parts: the first part computes steps 1 and 2 from Definition 5 and finds the structure Type-preserving(G, G′) (see Algorithm 1 in [136], page 283); the second part computes step 3 from Definition 5 and finds a polyprojection through the use of Structure-preserving(M) (see Algorithm 2 in [136], page 284), where M0 ⊆ Type-preserving(G, G′), and determines a pair of sets M ⊆ M0 that is structure-preserving; that is, M = ({(c_1, c′_1), ..., (c_n, c′_m)}, {(r_1, r′_1), ..., (r_o, r′_p)}) where n = # of concept vertices in G, m = # of concept vertices in G′, o = # of relation vertices in G, and p = # of relation vertices in G′. The actual execution time of the algorithm is not given; the only statement is that it is a polynomial result.

The algorithm described above is reminiscent of the one given in Reyner's work [103] (see page 284 in [136]). Therefore, if both G and G′ are trees, the polyprojection G µ G′ is a projection of G onto G′ by Corollary 8 (see page 283 in [136]). Willems goes on to state in Theorem 10 (see page 285 in [136]) that if there is a polyprojection T µ G′ where T is a tree, then there is a projection T → G′. This is significant because Garey and Johnson, on pages 104-106 of [42], indicate that the sub-problem of sub-tree isomorphism called sub-forest isomorphism is NP-complete. The sub-forest isomorphism problem is: given two graphs G and H, determine if H is isomorphic to a subgraph in G, where G is required to be a tree and H is a forest. However, in this case H may be a cyclic graph, and given that G is a tree, a polynomial time algorithm can be determined. Willems's demonstration that a polynomial time algorithm can be found for detecting the structure of a projection graph helped in the design of the new algorithm seen in section 5.2.2.

5.1.4 Notio Projection

The Notio project is a conceptual graph implementation with a well defined API [117]. It is currently being used by several projects [30, 10, 99] for performing basic reasoning operations with a CG KB. Algorithm 5.3 is this author's theoretical algorithm, derived from the Notio implementation code [117, 115], for Southey's injective projection algorithm (note: Southey never wrote any analysis papers or documentation on the actual implemented algorithm).

It should be noted that for Algorithm 5.3 the vertices are all labeled and the edges are directed. Also, for the analysis of the execution times given below, the following definitions of variables hold:

Definition 5.1.7 Variable definitions:

|Mc| = # of concepts in the KB graph
|Mr| = # of relations in the KB graph
|Qc| = # of concepts in the query graph
|Qr| = # of relations in the query graph
|Qe| = # of edges in the query graph
|N| = # of graphs in the KB
|KBc| = # of concepts in the whole KB

As can be seen in the stated algorithm, in step 1 Notio collects all the concept and relation vertices from both the KB graph and query graph. This takes O(|Mc| + |Mr| + |Qc| + |Qr|). In step 2, Notio attempts to see if any of the concept vertices from the KB graph maps to a concept vertex in the query graph, in this way attempting to determine whether there is any possible subgraph isomorphism of the KB graph onto the query graph. In the best case this step is bounded by O(|Mc||Qc|); in the worst case by O(|Mc||Qc||KBc|); and in the expected case by O(|Mc||Qc| log(|KBc|)). In step 13, Notio (if a possible mapping was indicated from step 2) will attempt to match all the relation vertices from the KB graph (along with their neighboring concepts along their edges) onto query graph vertices with the same edge relationships. As a match is found for relation vertices in the query graph, only those relation vertices are then examined. At the end of this step, it is checked that all relation vertices for the query graph were mapped. In the best case this step is bounded by O(|Mr||Qr||Mc||Qc| + |Qe|), with the arity being binary (so it is just a constant).

Algorithm 5.3 Notio Projection

1: Get all concept and relation vertices from the KB and Query graphs
2: for i ← 0, numfirstconcepts do ⊲ all concepts in KB graph
3:   for j ← 0, numsecondconcepts do ⊲ all concepts in Query graph
4:     foundmatch ← false
5:     if (type(ci) == type(cj)) || (supertype(ci) == type(cj)) then
6:       if (individ(ci) == individ(cj)) || (individ(cj) == ∅) then
7:         foundmatch ← true ⊲ match all concepts in query graph
8:       end if
9:     end if
10:
11:   end for
12: end for
13: if foundmatch == true then
14:   for i ← 0, numfirstrelations do ⊲ all relations in KB graph
15:     for j ← 0, numsecondrelations do ⊲ all relations in Query graph
16:       if (!relation[j].mapped) && (type(ri) == type(rj)) then
17:         if match from rj to each of its concepts then
18:           relation[j].mapped ← true ⊲ repeat line 2 for all
19:         end if
20:       end if
21:
22:     end for
23:   end for
24:   foundmatch ← true
25:   for j ← 0, numsecondrelations do
26:     if !relation[j].mapped then
27:       foundmatch ← false
28:     end if
29:   end for
30: end if
31: if foundmatch == true then
32:   P ← build new subgraph projection
33:   return P ⊲ return new projection
34: else
35:   return ∅ ⊲ no projection returned
36: end if

In the worst case, step 13 is bounded by O(|Mr||Qr||Mc||Qc||KBc| + |Qe|^|Qr|) when the query graph is fully connected; and in the expected case by O(|Mr||Qr||Mc||Qc| log(|KBc|) + |Qe|), again with the arity being binary and only having to go up the hierarchy the height number of times. In step 31, if a projection is found, it is returned.

Therefore step 13 is the leading step in the overall running time, so the best case for finding a projection for all |N| graphs in the KB would have a lower bound of O((|Mr||Qr||Mc||Qc| + |Qe|) · |N|). When the number of graphs in the KB is small, the number of vertices in the KB graphs is small, and the number of vertices in the query graph is small, the execution time moves towards O(n^3), where n = avg # of nodes in the KB graphs. As the KB grows in size and as the number of vertices in the KB graph and query graph increase, the expected run-time becomes explosive even though it does not leave P. However, the worst case bound for the whole KB is very close to the worst case bound given for Ullman's algorithm above: O((|Mr||Qr||Mc||Qc||KBc| + |Qe|^|Qr|) · |N|).

5.2 New Algorithms

After examining the above algorithms it was discovered that even though the running times were acceptable with small graphs and few graphs, the actual algorithms were either not truly general, as with SCG, or had very poor execution times with large data sets. With a SCG set of graphs, the user was confined by what parts of a valid conceptual graph could be present in the data. The desire to allow the user to give a directed, connected, bipartite conceptual graph (see Definition 3.4.3) that was cyclic and contained actors prompted new projection and maximal join algorithms to be designed.

5.2.1 Supporting Information

In order to produce new algorithms, new data structures and supporting routines

were needed. Because the author believes that the connection between the algorithm

and data structures in the KB is critical, the new data structures and variables need to

be designed around the actual supporting routines.

5.2.1.1 Variables and Given values

Evaluating all the past projection algorithms, and looking at the data structures used for each knowledge base, the author has discovered that handling conceptual graphs as triples, as opposed to vectors or linked lists, makes the projection operation much easier and cleaner to process. This author is not the first researcher to think about using triples. Kabbaj and Moulin in 2001 [58] looked at CG operations using a bootstrapping step. It was at this time that they also looked at defining the join operation using triples as part of the matching data structure. Even as recently as 2006, Skipper and Delugach [113] looked at using triples again in the data structure for the storage of graphs. However, in both cases, they did not look at exploiting the triples in the actual algorithm of the operation.

All conceptual graphs in the KB and the query graph are stored not only with the general conceptual graph information, but also with a C-R-C list and C-A-C list in a cs-triple format. Their definitions are given below:

• cs-triple is a 3-tuple, T = <ci, b, cj>, where ci, cj are concept nodes, and i and j are not equal. b is a conceptual relation (either a relation or actor node), (ci, b) ∈ E and (b, cj) ∈ E, and ci and cj are members in the signature of b.

• defining labels are all elements in a data structure that hold a unique label; that includes concepts, relations, actors, and cs-triples.

• c-r-c list is a concept-relation-concept list that holds cs-triple information in which the ‘b’ in the 3-tuple is a relation node.

• c-a-c list is a concept-actor-concept list that holds cs-triple information in which the ‘b’ in the 3-tuple is an actor node.

During the performance of the projection operation, two added data structures are used. One data structure, called the match list, holds the matching possibilities of the query concepts with the KB graph concepts; the second, called the anchor list, holds the matching triples from the KB graph for each concept in the query graph. These data structures improve performance by making preprocessed information available at the time of creating and building the actual projection graphs. These data structures' implementations will be defined when describing the experimentation systems (see Section 6.4).
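A minimal sketch of this storage scheme follows; the class and field names are hypothetical stand-ins, not the implemented structures of Section 6.4:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CSTriple:
    ci: str  # first concept node's defining label
    b: str   # conceptual relation: a relation or an actor node
    cj: str  # second concept node's defining label, cj != ci

@dataclass
class CGTripleStore:
    crc: list = field(default_factory=list)  # triples whose b is a relation
    cac: list = field(default_factory=list)  # triples whose b is an actor

    def add(self, t: CSTriple, is_actor: bool = False) -> None:
        (self.cac if is_actor else self.crc).append(t)

    def crc_for(self, concept: str) -> list:
        """All c-r-c triples touching `concept` (feeds the anchor list)."""
        return [t for t in self.crc if concept in (t.ci, t.cj)]

# The two temporary projection structures described above:
match_list = {}   # query concept label -> set of matching KB concept labels
anchor_list = {}  # query concept label -> list of matching KB CSTriples
```

Storing the c-r-c and c-a-c lists alongside the general graph information is what lets the projection algorithms below work triple-by-triple instead of walking vectors or linked lists.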

5.2.1.2 Actual Supporting Routines

Because the conceptual information is the structural foundation of a conceptual graph, and because the relationships between the concepts define the meaning of the graph, the new supporting routines defined in Algorithms 5.4, 5.5 and 5.6 have been designed around the cs-triple relationship of C-R-C. The main supporting routines are: MatchHierarchy, MatchConcept, MatchConcepts, MatchTriple, and Projection. They are the foundation behind the projection operation, and these routines will help in determining the projection operation's worst case and typical case execution time.

5.2.1.3 Worst Case Analysis for Support Routines

Using the support routines defined in Algorithms 5.4, 5.5 and 5.6, the worst case execution time will be evaluated.

MatchHierarchy:

The type hierarchy is depicted as a tree of relationships such that the maximum depth of the tree spans all concepts from the top, ⊤, to the bottom, ⊥. Therefore, in the worst case the time to match the given input concept type is the time to traverse the whole tree, which is linear, O(n).

Algorithm 5.4 Supporting Projection Routines

1: function MATCHHIERARCHY(qi, nj) ⊲ q ∈ Q and n ∈ G
2:   foundmatch ← false
3:   if check flag for supertype then
4:     check to see if qi is a supertype of nj ⊲ check up hierarchy
5:     if qi is supertype of nj then
6:       foundmatch ← true
7:     end if
8:   else
9:     check to see if qi is a subtype of nj ⊲ check down hierarchy
10:    if qi is subtype of nj then
11:      foundmatch ← true
12:    end if
13:  end if
14:  if foundmatch == true then
15:    add to match list
16:    return nj ⊲ return nj as a match
17:  else
18:    return NULL ⊲ return NULL as no match
19:  end if
20: end function ⊲ Check if concept match in hierarchy

21: function MATCHCONCEPT(qi, nj) ⊲ q ∈ Q and n ∈ G
22:  if check match list for qi, nj match then
23:    return nj ⊲ return nj as a match
24:  else
25:    if type(qi) == type(nj) then
26:      M ← {qi, nj} as match
27:      return nj ⊲ return nj as a match
28:    else
29:      return MatchHierarchy(qi, nj) ⊲ Check if match in hierarchy
30:    end if
31:  end if
32: end function ⊲ Check if concepts match

33: function MATCHCONCEPTS(qi, G) ⊲ q ∈ Q and G ∈ KB
34:  for each nj ∈ L, where j = 1 to c(G) do ⊲ L is a list in G
35:    C ← MatchConcept(qi, nj)
36:  end for
37:  return C ⊲ All matching concepts from KB graph to Query graph concept
38: end function

Algorithm 5.5 Supporting Projection Routines (Cont1)

1: function MATCHTRIPLE(ta, sb, directionp) ⊲ t ∈ Q, s ∈ G and
2:   ⊲ directionp is a BOOLEAN
3:   if (directionp == true) && ((direction from ta) == -1) then
4:     match ← false
5:   end if
6:   match ← Compare relation type of ta to relation type of sb
7:   if match == true then
8:     match ← Compare MatchConcept(ta.cb, sb.cb) != NULL
9:   else
10:    match ← false
11:  end if
12:  if (match == true) && (directionp == false) then
13:    match ← Compare (direction from ta == direction from sb)
14:  else
15:    match ← false
16:  end if
17:  if match == true then
18:    return true ⊲ Indicate two triples are a match
19:  else
20:    return false ⊲ No triple match
21:  end if
22: end function

MatchConcept:

This routine must first check to see if the query concept, qi, together with the match concept, nj, is found in the match list; in the worst case this takes time O(c ∗ m), where c is the number of concepts in the query graph, Q, and m is the number of concepts in the match graph, G. If this check fails, the next step is to compare qi and nj for a match in both concept type and referent, which is a constant time operation. If this succeeds, then adding to the match list is in the worst case O(c ∗ m); if not, then the worst case running time will be O(n), which is the height of the type hierarchy tree. Overall, the total worst case running time for this routine would be O(c ∗ m + n).
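The interplay of the match-list check and the hierarchy fallback can be sketched as follows; the parent-pointer hierarchy and the set-based match list are assumptions of this sketch, not the implemented data structures:

```python
def is_supertype(parents, a, b):
    """True if type a equals b or is an ancestor of b in the hierarchy.

    `parents` maps each type to its direct supertypes; the walk visits
    at most the height of the hierarchy, which is O(n) for a chain.
    """
    frontier = {b}
    while frontier:
        if a in frontier:
            return True
        frontier = {p for t in frontier for p in parents.get(t, ())}
    return False

def match_concept(q, n, parents, match_list):
    """Return n if query concept q maps to KB concept n, else None.

    Concepts are (type, referent) pairs; a previously recorded match is
    returned from the match list without re-walking the hierarchy.
    """
    if (q, n) in match_list:            # memoized earlier comparison
        return n
    if q[0] == n[0] or is_supertype(parents, q[0], n[0]):
        match_list.add((q, n))          # remember for later calls
        return n
    return None
```

The memoization is what keeps repeated MatchConcept calls from paying the O(n) hierarchy walk more than once per concept pair.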

Algorithm 5.6 Supporting Projection Routines (Cont2)

1: function PROJECTION(i, W, G, Pset) ⊲ i, W ∈ Q
2:   t ← Number of elements in the qi list ∈ W
3:   z ← Size of Pset
4:   if (i == 1) then
5:     for each sa ∈ qi, where a = 1 to t do
6:       Pset ← AddNewProjection(sa, G, Pset) ⊲ Starts Projection Graph
7:     end for
8:   else if (t == 1) then
9:     s1 ← only element of qi list
10:    AddToExistingProjection(s1, G, Pset) ⊲ Add to existing Projection Graph
11:  else
12:    Pset′ ← ∅
13:    for each sa ∈ qi, where a = 1 to t do
14:      Pset′ ← ProcessProjection(sa, G, Pset, Pset′) ⊲ Process Proj Triple
15:    end for
16:    Pset ← Pset ∪ Pset′
17:  end if
18:  return Pset ⊲ Return created and modified Projection Graphs
19: end function

MatchConcepts:

This routine will process all the concepts in the match graph, G, where m is the number of concepts in G. Since processing the concepts calls the routine MatchConcept, whose worst case running time is known to be O(c ∗ m + n), the total worst case time for this routine would be O(m ∗ (c ∗ m + n)).

MatchTriple:

Within this routine, the driving step would be step 11. This step of the algorithm calls the routine MatchConcept, whose worst case running time is known to be O(c ∗ m + n). Therefore, in the worst case this routine would also be O(c ∗ m + n).

Projection:

This is the routine for creating and building the new projection graphs where there is a structural match after finding the matching cs-triples between the two graphs. Within this routine are three major steps that depend on the processing of the anchor list: 1) when processing the first concept in the anchor list; 2) when there is only one related triple match for the concept in the anchor list; and 3) when neither of the first two conditions holds. The driving section of the algorithm in this routine is this third type of processing. As can be seen starting at step 11 of the algorithm, this branch calls the routine ProcessProjection. ProcessProjection checks to see if a new projection graph has to be started by copying an existing projection, or if an existing projection graph can simply add the current cs-triple being processed in the For Loop. The easier of the two functions is to add to an existing projection, but time must be taken to find which projection graph to add to, so from the algorithm it can be seen that this is time z, which is the size of Pset, or the # of projections.

The more complex modification is to copy an existing projection graph in order to add the new cs-triple being processed. It was just seen that adding a cs-triple takes time z, but each time step within this processing also incurs the time needed to copy a projection graph, which will be called d, times the number of projection graphs that must be copied, which is t. Therefore the worst case time for this step would be O(z ∗ t ∗ d). It should be noted that the size of Pset, z, grows much faster than the time needed for copying, d; therefore d can be dropped from the running time, leaving O(z ∗ t).

There is a relationship between t, the number of triple matches for this concept in the query graph, and z, the size of Pset; that is, in the worst case z = t^(i−1). During the processing of this routine, if all triple matches lead to a new projection graph, then the number of projection graphs currently in Pset will be the number of all triple matches currently processed from the anchor list, or t^(i−1). On replacement of z, one gets a new worst case running time of O(t^(i−1) ∗ t), or just O(t^i).
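The exponential growth in the worst case can be checked with a toy count, assuming every one of the i anchor-list entries offers t independent triple matches and every match forks a copy of each partial projection:

```python
def count_projections(anchor_list):
    """Projection graphs produced when every triple match forks a copy.

    `anchor_list` has one entry per query concept; each entry is the
    collection of KB triple matches for that concept.
    """
    projections = [()]                  # one empty projection to start
    for matches in anchor_list:         # one entry per query concept
        # every existing partial projection is extended by every match
        projections = [p + (m,) for p in projections for m in matches]
    return len(projections)
```

With t = 3 matches for each of i = 4 concepts, the count just before the last entry is processed is t^(i−1) = 27, and afterwards t^i = 81, matching the O(t^i) bound above.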

5.2.2 New Projection

As seen in Algorithm 5.7, the new projection of the query graph onto the KB is based on looking at all triples that are in the query graph and checking for a complete subgraph match of the query graph onto the KB graph. Because each triple in the query graph is unique, even if the node type is not, all projections can be found in the KB graph.

Algorithm 5.7 New Projection

1: function NEWPROJECTION(Q, KB) ⊲ Query and KB graphs
2:   P ← ∅
3:   for each G ∈ KB do ⊲ All graphs in KB
4:     W ← A list from Q ⊲ Preprocessing
5:     for each qi ∈ W, where i = 1 to c(W) do
6:       if ((M ← MatchConcepts(qi, G)) > ∅) then
7:         for each nj ∈ M, where j = 1 to |M| do
8:           match ← false
9:           for each ta ∈ Q do
10:            ⊲ where a = 1 to the # of cs-triples in crc list for qi
11:            for each sb ∈ G do
12:              ⊲ where b = 1 to the # of cs-triples in crc list for nj
13:              if MatchTriple(ta, sb, true) == true then
14:                add (nj, (sb, ta)) to qi ∈ W
15:                match ← true
16:              end if
17:            end for
18:          end for
19:          if match == false then
20:            break out of loop and start next graph in KB
21:          end if
22:        end for
23:      else
24:        break out of loop and start next graph in KB
25:      end if
26:    end for
27:    Pset ← ∅ ⊲ Projection processing
28:    for each qi ∈ W, where i = 1 to c(W) do
29:      Pset ← Projection(i, W, G, Pset)
30:    end for
31:    P ← P ∪ Pset
32:  end for
33:  return P ⊲ Set of projections from query onto KB
34: end function

5.2.2.1 Actual Algorithm

The overall algorithm (see Algorithm 5.7) for the projection of the query graph onto the KB checks for a complete subgraph match of the query graph onto the KB graph during preprocessing. Because each triple in the query graph is unique, even if the node type is not, all projections can be found in the KB graph. Then, after all matches of conceptual units and triples are found, the actual projection graphs are built. However, because the temporary data structures are saved from the preprocessing, matching does not have to happen again at build time. The actual projection just uses the match list and anchor list already created to build up or create the new projection graphs. Because the anchor list contains all available projections, both injective and non-injective (or homomorphic) projections are found.
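The two-phase shape just described can be sketched as follows; the graph encoding as lists of triples and the match predicate are simplifying assumptions, not the implemented system:

```python
def new_projection(query_triples_by_concept, kb, match_triple):
    """Two-phase projection: match first (with early exit), build second.

    query_triples_by_concept : one list of query cs-triples per concept
    kb                       : list of KB graphs, each a list of cs-triples
    match_triple             : predicate on (query triple, KB triple)
    """
    all_projections = []
    for graph in kb:
        # Phase 1 (preprocessing): fill an anchor list; abandon this
        # graph as soon as some query concept has no matching triple.
        anchor = []
        for q_triples in query_triples_by_concept:
            hits = [s for t in q_triples for s in graph if match_triple(t, s)]
            if not hits:
                break
            anchor.append(hits)
        else:
            # Phase 2 (build): combine the saved matches; no re-matching.
            projections = [()]
            for hits in anchor:
                projections = [p + (s,) for p in projections for s in hits]
            all_projections.extend(projections)
    return all_projections
```

The early `break` in phase 1 is what lets the triples stop the matching sooner when no projection is possible, while phase 2 only consumes the saved anchor list.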

5.2.2.2 Execution Time

Now that the algorithm is split into two sections, there is a running time for answering the decision question of whether or not there is a projection, which will be called the matching algorithm, and a running time for the actual projection. For the new algorithms, three modifications have been made that affect the execution time of the projection operation: 1) all nodes and triples are uniquely labeled; 2) the edges are not labeled, but do have implied labeling through their directionality within the triples; and 3) the triples are not only part of the data structure of the KB, but also directly affect the actual projection algorithm. The labeling drives the execution time of the matching algorithm, when doing an injective projection, toward the running time for a 'labeled' subgraph isomorphism problem, which can be solved in polynomial time, as opposed to a straight subgraph isomorphism problem, which is known to be NP-complete. The triples allow the matching algorithm to stop sooner when no projection is possible.

For the actual projection creation, the number of triples in the query graph drives the amount of time needed. The size of the graphs in the KB affects the base of the execution time, but the number of times the Projection function is executed is based on the number of triples in the query graph.

5.2.2.3 Worst Case Analysis for Projection

The actual projection operation algorithm is broken down into two steps: Preprocessing (mapping of concepts) and Projection (structural build of new projection graphs). Within the preprocessing step, the 'forward' concepts from the query graph, H, that are in the anchor list, W, are unified (or matched) to concepts in the match graph, G (see steps 9 and 11). Because in the worst case the number of 'forward' concepts in H is equal to the total number of concepts in C minus 1, from now on in this analysis the number of elements in W will be treated as the number of concepts in H. Since in the worst case the number of concepts in H is equal to the number of concepts in G, the number of concepts in H will be called m. For the rest of the preprocessing step, note that there are four nested For Loops, each connected to the value of m. Two of the four loops will be executed m times with constant time internal processing. The second For Loop, at step 26, involves a call to MatchConcepts, which has already been seen to have the worst case running time O(m ∗ (c ∗ m + n)). Assuming in the worst case that c = m, on expansion this time is found to be O(m^3 + mn), or just O(m^3) because n can never be greater than m. The fourth For Loop calls the routine MatchTriple, which has the worst case running time O(c ∗ m + n), or O(m^2) by the previous reasoning. This gives a worst case running time for the matching processing of O(m^8).

The actual projection part loops around the support routine Projection. This routine was discussed as having the worst case running time O(t^i), where i = m when called from NewProjection. Given that the actual projection will loop through all m concepts, in the worst case the actual projection is O(m ∗ t^m). Therefore, within the overall NewProjection algorithm, the worst case is driven by the building of the actual projection, with the exponential factor on the number of concepts in the query graph.

5.2.3 New Maximal Join

As described in the Maximal Join operation section (see Section 4.2.6), more than one node (or group of nodes) can be joined between two graphs. When these joins happen, the two graphs are composed into a new graph with possibly more information than the original input graph. However, the joining of the input graph across the KB to produce maximal join graphs is not commutative [90] when semantic considerations come into play. As with the projection algorithm, the overall algorithm (see Algorithm 5.8) is split into two parts.

Algorithm 5.8 New Maximal Join

1: function NEWMAXIMALJOIN(I, KB) ⊲ Input and KB graphs
2:   J ← ∅
3:   for each G ∈ KB do ⊲ All graphs in KB
4:     foundmatch ← false
5:     W ← A list from I ⊲ Preprocessing
6:     for each qi ∈ W, where i = 1 to c(W) do
7:       if (qi != null) && ((X ← MatchConcepts(qi, G)) > ∅) then
8:         foundmatch ← true
9:         for each nj ∈ X, where j = 1 to |X| do
10:          for each ta ∈ I do
11:            ⊲ where a = 1 to the # of cs-triples in crc list for qi
12:            for each sb ∈ G do
13:              ⊲ where b = 1 to the # of cs-triples in crc list for nj
14:              if MatchTriple(ta, sb, false) == true then
15:                add (nj, (sb, ta)) to qi ∈ W
16:              end if
17:            end for
18:          end for
19:        end for
20:      end if
21:    end for
22:    Jset ← ∅ ⊲ Join processing
23:    if foundmatch == true then
24:      for each nj ∈ M, where j = 1 to |M| do
25:        Jset ← MaximalJoin(j, W, I, G, Jset)
26:      end for
27:    end if
28:    J ← J ∪ Jset
29:  end for
30:  return J ⊲ Set of joins from input onto KB
31: end function

This new algorithm has the matching algorithm (checking for possible joins) happening first, and then the actual joining of the two graphs to build the new maximal join graph performed second. This work will proceed as future work using this algorithm as the starting point.

5.3 Typical Scenario Analysis for Projection Algorithms

Unlike the worst case analysis just evaluated for the projection algorithms, with a typical query sent to a query-answer system the query graph is much smaller than the graphs in the knowledge base [100]. Basically, this comes about because the user is trying to find a specific piece of data. Looking at the "blocks world" domain area (later to be tested on implemented systems as seen in Chapter 7), one has a knowledge base of graphs that represent blocks on a table. The user wishes to know information like "Is there a red block in the graph?" or "Is there a blue block above a red block?". These are very small graphs compared to the graphs in the knowledge base, which describe all the blocks on a table, their relationships to each other, and all characteristics of and relationships among the blocks on the table. Blocks world is a well known planning problem [100].
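In cs-triple terms the disparity looks like the following; the labels are illustrative, not taken from the Chapter 7 test data:

```python
# A toy blocks-world KB graph versus a typical query, both rendered as
# c-r-c style triples (concept, relation, concept).
kb_graph = [
    ("Block:b1", "attr", "Color:red"),
    ("Block:b2", "attr", "Color:blue"),
    ("Block:b2", "above", "Block:b1"),
    ("Block:b1", "on", "Table:t1"),
    ("Block:b2", "attr", "Size:small"),
]

# "Is there a blue block above a red block?" -- only three triples,
# while a realistic KB graph carries many more.
query = [
    ("Block:x", "attr", "Color:blue"),
    ("Block:x", "above", "Block:y"),
    ("Block:y", "attr", "Color:red"),
]

print(f"{len(query)} query triples vs {len(kb_graph)} KB triples")
```

This size gap between query and KB graphs is the assumption behind the typical-case analyses that follow.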

5.3.1 Projection Algorithms using SCG

Both of the SCG injective projection algorithms, by Mugnier and Chein and by Croitoru, have a direct tie between the matching part of the algorithm and the building part of the algorithm. Also, both these algorithms are built from the relation perspective, and there are typically fewer relation nodes than concept nodes.

5.3.1.1 SCG Projection

On evaluation of the injective projection algorithm by Mugnier and Chein, given a typical scenario of a much smaller query graph on few graphs in the KB, the execution time is still bound by the fact that the matching of relations and their related concepts and the building of the image structure are not separated. Therefore, the searching and building of the projection in this typical scenario has to match every relation from the query graph onto all relations in the match graph at every iteration. However, if there is no match, the structure of the subgraph does not have to be checked any further from that root evaluation. When the typical scenario is very small and the support depth is shallow, this algorithm performs well, but it quickly degrades as the number of valid projections and the support depth increase, because the match is re-evaluated each time.

5.3.1.2 SCG Relation Projection

Croitoru adds a preprocessing phase to her algorithm to look for matches, and then executes the build phase separately based on the number of relations in the query graph. By doing the preprocessing phase with the matching through the search space, the number of relations from G that are candidates for projection is pruned. Therefore the execution time for the building of the projection graph in this typical scenario is O(q_r × g′_r), where q_r = # of relations in Q and g′_r = # of relations from G that were viable candidates.

5.3.2 Notio Projection

This typical case matches the analysis of the lower bound, O(n^3), for Notio as discussed in section 5.1.4. Notio does the matching and building of the projection in the same step without pruning the tree. However, Notio only finds a single projection because all relations within a graph must be unique. Even though Notio can work over full CGs, this constraint does reduce the search space during the Notio algorithm's execution.

5.3.3 New Projection

With this typical case, the new projection algorithm moves towards the best case results possible for the algorithm. To evaluate the typical case, first the support routines will be evaluated and then the new projection algorithm will be examined.

5.3.3.1 Typical Case for Support Routines

Using the support routines defined in Algorithms 5.4, 5.5 and 5.6, the typical case can be given a foundation by first examining these routines:

MatchHierarchy:

The type hierarchy is depicted as a tree of relationships such that the maximum depth of the tree spans all concepts from the top, ⊤, to the bottom, ⊥; but in a typical case the tree is a broad tree, and its depth is normally log(n), where n is the number of concepts in the type hierarchy. Therefore, in the typical case the time to match the given input concept type is O(log(n)).
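As a quick illustration of the broad-tree assumption (the branching factor here is hypothetical):

```python
def depth_of_complete_tree(n, b):
    """Depth of a complete b-ary tree holding n nodes (root at depth 0).

    A broad hierarchy (b > 1) has depth about log_b(n); a chain (b = 1)
    degenerates to depth n - 1, the worst case for MatchHierarchy.
    """
    depth, capacity, level = 0, 1, 1
    while capacity < n:
        level *= b            # nodes on the next level down
        capacity += level     # total nodes the tree can hold so far
        depth += 1
    return depth
```

For 1000 concept types, a 4-ary hierarchy is only 5 levels deep while a chain is 999 deep, which is the gap between the O(log(n)) typical case and the O(n) worst case.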

MatchConcept:

In the typical case, the only step that would not be constant time is matching against the hierarchy. Since it was just shown that this takes O(log(n)), the running time for this routine is the same.

MatchConcepts:

Since processing the concepts calls the routine MatchConcept, whose typical case running time is known to be O(log(n)), the total time for this routine would be O(m ∗ log(n)).

MatchTriple:

Again within this routine, the driving step would be step 11. This step of the algorithm calls the routine MatchConcept, whose running time was shown to be O(log(n)); this is also this routine's typical running time.

Projection:

Since it was seen in the worst-case analysis that this routine's running time is tied to the number of triples in each element of the anchor list, if only one match is available for a query graph concept then only one projection would be produced, and the running time for this routine becomes linear in the number of concepts in the query graph.

5.3.3.2 Typical Case for New Projection Algorithm

In a typical query-answer scenario, the query graph would normally contain one to four triples, compared to possibly a thousand in the KB graph; this algorithm takes into account that the query graph is small. Because of that, the time to process thousands of graphs in a KB is only multiplied by a constant based on the maximum number of triples in a KB graph onto which the small query graph is projected.

The preprocessing part is again based on the number of concepts in the query graph. However, for a typical scenario these would be few; probably not more than eight concepts. Now if the four For Loops are evaluated, two of the loops become constant time. The second For Loop, at step 26, involves a call to MatchConcepts, which has a running time of O(m ∗ log(n)). The fourth For Loop calls the routine MatchTriple, with a running time of O(log(n)). Since, as stated before, n would never be greater than m, this gives a typical-case running time for the matching processing of O(m² ∗ log²(m)).

The actual projection part of the algorithm is multiplicative in the number of projections available for the query graph. Since in the most common case there is only one projection, the actual projection creation algorithm becomes polynomial (in fact linear, as seen in the Projection routine analysis).

The preprocessing part now becomes the driving step in the algorithm and shifts the execution of the problem to one that is polynomial. Through this shift in search problem performance, the running time for the projection operation in a typical scenario within a query-answer application shows improvement.

CHAPTER 6

SYSTEMS/ENVIRONMENTS AND IMPLEMENTATIONS

This chapter discusses each example system’s basic features, as well as how it is used in one or more of the previously defined knowledge representations, ontology elements and ADTs.

6.1 Semantic Network Systems

For each semantic network system, the good points/features will be brought forward, as well as the drawbacks. Each of these good and bad features will be described as factually as possible.

6.1.1 KL-ONE

The KL-ONE language was originally formulated in Ron Brachman's Ph.D. dissertation at Harvard [66]. It was built into a system at Bolt Beranek and Newman (BBN) by Woods and Schmolze [141].

The KL-ONE system was designed originally around the classic frame system.

As stated earlier, “frames” could be defined as a knowledge representation type all to itself, but for this work they are classified as a sub-type of semantic networks. Typically a frame will include an “isa” or “ako” pointer to a more general frame from which additional slots can inherit [141]. KL-ONE forms a taxonomy hierarchy out of multiple links of this type, thereby forming a partial ordering of concepts for inheritance. Taxonomies were discussed in Section 3.2.

Taxonomies were discussed in Section 3.2.

KL-ONE is made up of concepts, roles, and fillers. Structured concepts are elements standing in specific relationships to each other [141]; roles are the entity names for the relationships; and fillers are the structural conditions of the roles. Concepts are represented in the semantic network by ovals, roles by circled squares, and structural conditions by double ovals attached to a diamond-shaped lozenge (see Figure 6.1).

Figure 6.1: A KL-ONE Diagram of a Simple ‘Blocks-World’ Arch (Based on [[141], Figure 1]).

Concepts can be generalized from other concepts. These super-concepts specify a class of which the defined concept is a subclass. In this way, KL-ONE structures depict a mapping of inheritance. An example from Woods's work [141] is a concept [appreciable debt obligation] which has super-concept links to two parents, [debt obligation] and [appreciable asset]. This example illustrates the utility of multiple parents; it is also directly represented within the semantic network.

Concepts may also be primitive in definition. That means that the collection of super-concepts, roles and structural conditions are necessary, but not sufficient, to define the concept. These concepts are indicated in the semantic network representation by putting an asterisk by the oval [141]. Concepts may also be individual, that is, they are a member of a set and not the set itself. Many times they are the instantiation of a generic concept and are represented in the semantic network by diagonal shading inside the concept.

Roles also have different forms of structure. Value restrictions on roles are concepts that characterize constraints on possible role fillers [141]; they are shown by an arrow coming from a role to the concept that applies the constraint. Number restrictions may also be applied for the maximum and minimum number of allowed fillers. These are seen in Figure 6.1 by the use of “# = ”; they may also be given as a range using the “#” notation. Roles may also be “chained” together to produce an access path from the concept being defined to the intended filler [141]; this would be depicted by using a small triangle between the structural condition diamond and the role. The chain is necessary because it constrains the filler of the specific role. If roles are just linked together, a square is used for the intermediate roles.

As discussed previously in this section, taxonomic structures are built into the semantic structure of KL-ONE. This means that at the internal representation level, subsumption and other terminological operations must be considered, and at the ADT level these operations must be available. Putting the taxonomy hierarchy inside the semantic network was a deliberate act [141], but as discussed earlier, the taxonomy is part of level 0 and is not supposed to be part of the semantics of the actual network. Therefore, the classification operation is used to place new descriptions into the taxonomy at their correct position [141], and the internal representation must be able to interact with the semantic network representation when editing the network.

Within the internal representation level, KL-ONE makes a distinction between terminological components and assertional components. The terminological components are called “T-Boxes” and assertions are called “A-Boxes”. The t-box is responsible for specialized types of reasoning that follow from the structure of the terms, that is, definitional information, whereas the a-box is responsible for general reasoning and provides factual information to the system. Later systems based on KL-ONE allowed “hybrids” between these components [141]. Given the three types of ADT defined in Section 6.3, the logical ADT would best represent this system.

An important goal of KL-ONE was to make useful KR services available, and as the system was developed its expressive power increased [141]. With the development of the roles and fillers, the quantitative relationships were fully implemented; however, this system did not provide for qualitative relationships. In fact, frame-based systems are severely limited when dealing with procedural (qualitative relationship) knowledge [137].

6.1.2 SNePS

SNePS is a system designed for representing the beliefs of a natural-language-using intelligent system [110]. At the semantic network knowledge representation level, it consists of nodes and labeled, directed arcs. The nodes are the terms or concepts of the network and the arcs are like grammatical punctuation. All entities in all versions of SNePS are nodes [110]; the nodes are of four basic types: base nodes, variable nodes, molecular nodes and pattern nodes. Base nodes represent some particular entity within the network, while variable nodes represent arbitrary individuals, propositions, etc. that are distinct from the rest of the network. Neither base nor variable nodes have output arcs. Molecular nodes represent propositions, rules and “structured individuals”, while pattern nodes are like open sentences or functional terms with free variables. Both molecular and pattern nodes have input and output arcs and are structurally defined by the arcs. Every node has an identifier, and base nodes may be identified by the user (all others have system-generated identifications).

The arcs were defined differently for different versions of SNePS. Within the current version, there are two types of arcs: descending and ascending. The arcs represent relationships. The current system also has a belief revision system as a standard feature. As part of this system, assertion tags ‘!’ are appended onto asserted nodes. For an example of how the semantic network representation looks, see Figure 6.2.

Figure 6.2: A SNePS Representation of “A on B on a Table” (Based on [[110], Figure 12]).

Internal to the SNePS system are incorporated some theoretical decisions [110]:

• the system will not build a new node where there is already such a node in the structure.

• two variables in one rule cannot be instantiated to the same term.

• the universal quantifier is only supported on a proposition whose main connective is one of the following: and, or, min/max, or thresh.

Given these restrictions, SNePS is not much more than an intensional propositional representation; however, the inference package, SNIP, is a direct part of SNePS and adds to the capabilities of the system.

SNIP must be able to interpret rules properly because it is a separate system and because operator-based formulations may be added on top of SNIP. The belief revision system is also built above SNIP. Therefore, when looking at a possible internal representation for SNePS, one would need the functionality of predicate calculus. This would also mean that the logical ADT would need to be chosen.

SNePS is a very straightforward representation. It has only nodes and arcs, and puts everything into the semantic network structure. There is no hierarchy being applied to the network or even structurally incorporated into it, thereby keeping it very simple. Belief processing is available through assertion tags, and operator-based formulations may be added on top of the system through procedures. However, only universal quantification is available, therefore limiting the knowledge that can be represented. Also, no qualitative relationships are possible.

6.1.3 SNAP

SNAP stands for Semantic Network Array Processor and was implemented at the University of Southern California. It is a parallel computer architecture with a semantic network representation of the permanent knowledge being stored [72]. The actual model is one of marker-passing, and the knowledge-base does not do much more than general production rule processing (see Figure 6.3 for an example).

Figure 6.3: SNAP Semantic Network of “USC in LA, CA” (Based on [[72], Figure 2]).

The permanent knowledge for the knowledge-base is stored at start-up time. Nodes are terms or concepts and the arcs are the labeled relations between the nodes. For each new relationship within the knowledge-base, an instruction is created by the controller of the machine, transformations are performed and node assignments are done, and then commands are broadcast to specific array processors for storage of the knowledge [72].

The temporary knowledge is where the markers are processed. Markers are flags that travel around a distributed intelligent network. Marking nodes indicates that they are relevant to the current action. Markers may also have attributes associated with them.

The inference engine controls the two knowledge areas, but the job of the inference engine is controlled by the controller on the machine and the intelligent network. The markers are controlled by the inference engine and spread the searches and queries.

Because of the simplicity of the actual semantic network knowledge representation, the internal representation can just be basic data structures, and the basic ADT for the IF .. THEN structure can be used. This does not give much expressive power to the semantic network, but it does allow parallel processing across an intelligent network, which provides much potential for the future.

6.1.4 CS Initial Project - PEIRCE

The PEIRCE project is named after the American philosopher and logician Charles Sanders Peirce [37]. In 1883, Peirce developed the first linear notation for first-order logic [86]; however, he felt that the predicate notation for logic was unduly complex [121]. Then in 1897, Peirce invented existential graphs [86, 25] with the simple mechanism of graphs within a context that were parts of larger graphical notations [121, 37]. John Sowa then used these existential graphs as the foundation for his Conceptual Graph theory [119].

The PEIRCE project is designed to be built out of conceptual graphs [37]. It originated as a joint effort for different systems being built out of conceptual graphs across the world to work together [37]. Over time, it became a project at the PEIRCE Foundation, built by its director Gerard Ellis in Australia. An example of a graph within the PEIRCE system is given in Figure 6.4.

Figure 6.4: PEIRCE Schema for Age (Based on [[119], Figure 6.5]).

These graphs are made out of conceptual structures, which were discussed in Section 2.3.2.3. The PEIRCE system is divided up into the following modules [37]:

• Programming standards

• Database storage and retrieval

• Linear notation input and output

• Massively parallel hardware

• Graphical editor and display

• Conceptual catalogs (ontologies)

• Programming in conceptual graphs with constraints

• Inference/theorem-proving mechanism

• Learning mechanism

• Natural language parsers and generators

• Information systems engineering

• Vision system

The following modules are the only ones within the scope of this work: Database storage and retrieval; Graphical editor and display; and Programming in conceptual graphs with constraints. Because of the difficulty of collaboration among many people, none of these original modules made it past the design phase, but this work was very important, as these original designs were used within other tools that have been developed for the conceptual structures community.

The database storage and retrieval module was responsible for storing conceptual graphs. It was to use a C++ ADT for graph operations and generalization hierarchy operations. These were to incorporate the fundamental operations of graph matching and unification (maximal join [119]). They were also to perform generalization and specialization operations on the hierarchy. As stated in the Ellis work [37], large knowledge bases were being created for processing, but it has taken some time to deliver these to the community.

A graphical editor and display, constructed in X-Windows and executed on all available versions of Unix (including Linux for PCs), was one of the foundational modules. This same module runs under Windows. Growing out of this effort is the very complete graphical editor CharGer developed by Harry Delugach [29, 30]. To go along with the editor would be a compiled language that would allow programming in conceptual graphs with constraints.

This actual system would be available once the ADT has been coded and bootstrapped into a compiler for conceptual graphs. Two systems, Amine and FMF, have grown out of this effort and provided Prolog compilers that include CGs as part of the language [55, 56, 124]. The system is mainly a set of concepts and tools; however, when functioning, it will address all quantitative and qualitative relationships and generalization and specialization operations.

6.2 Conceptual Graphs Environments

6.2.1 CoGITaNT

CoGITaNT has several useful utilities: a set of library routines in C++ for conceptual modeling, some knowledge bases in conceptual graphs, and an XML specification for CGXML [64]. All documentation is in French and none is available in English (including the installation instructions). In the future, documentation should be available in English, which will allow this author to test and evaluate this very complete system.

6.2.2 Amine

Amine is actually a “platform” as opposed to an environment [55]. Its main processing is a multilingual system for ontologies [54]. It was originally built on a conceptual structures internal representation, with a storage representation compiled through Prolog [57]. Now that it has been converted to a platform, it is written in Java. At the present time, only the ontological hierarchies have been converted, but all the storage representation will soon be made available. Amine is using CGs as an internal representation for machine translation from French to English.

6.2.3 pCG

pCG is “a process operating upon a CG”. It was developed at the University of South Australia by David Benn under the direction of Dan Corbett [10, 9, 8]. It is based on the work of Guy Mineau at the Université Laval [69, 70]. This system implements its process mechanism by using the Java library routines of Notio, developed by Finnegan Southey [117].

pCG had several design goals: 1) making concepts, graphs, actors and processes first-class citizens within the pCG language; 2) easy extensibility; 3) rapid development; 4) portability; and 5) minimality [10]. Of these goals, the first, fourth and fifth were the most interesting to this author. By making all value types first-class types in this language, every type can be passed as a parameter to functions for execution. Portability was achieved by using Java as the language and the ANTLR1 construction tool for parsing. This system was constructed and designed with as few constraints and built-in keywords as possible. Therefore, many functions that are already available within the Notio system are directly possible from pCG.

“The pCG language is multi-paradigm, since apart from its object-based characteristic, pCG supports imperative (variables, assignment, operators, selection, iteration), functional (higher order functions, value, recursion), and declarative styles of programming” [10]. This created the opportunity for interoperability between pCG and other systems.

1http://www.antlr.org

6.2.4 CPE

The Conceptual Programming system (CP) was originally developed as a single, standalone application [92, 93] that handles temporal, spatial and constraint information [47, 94] using a knowledge base of Conceptual Graphs (CGs) [119]. CP was a knowledge representation development environment with a graphical framework that had a set of tools that used graph structures and operations over those structures to do knowledge reasoning.

All knowledge within the system is stored and operated on as a graph. These graphs are implementations of Sowa’s Conceptual Graphs [119], but also retain many of the features of graph theory [46]. Although there exists a mapping from CGs to formulae in first-order predicate calculus (FOPC), the operations used in the CP system take advantage of the graphical representation; therefore, the data structures and operations over the graphs use graph theory [46] instead of FOPC.

The original system was a single application written in Lisp and ran only on a Symbolics machine. The data structures were CGs defined using linked lists of structure elements, where the structures held the node information and the links were the edges of the graphs. All graphs had to be entered directly into the environment’s editor, and each graph was stored into the environment’s knowledge base. The CP inference engine would then operate over these data structures, sometimes creating new graphs or partial models of conceptual graphs and storing them into the environment’s knowledge base.

In the old environment, there was no way to import or export any of the graphs or models. This prompted investigation into alternative data structures and models to allow other applications and systems to communicate with the CP application [95, 96, 49]. Harry Delugach’s invited talk at ICCS2003 [31] outlined a framework for building active knowledge systems. By 2004 the Conceptual Programming Environment (CPE) had been introduced with its new modular, multi-component design to increase the flexibility of the environment and to allow modules to be used outside of the environment by other systems [87]. The viewpoint of the redesign was to make the CP Environment be the “heaven” displayed in Delugach’s framework. At that time, the main form of interoperability was the CGIF interchange format (see Section 3.4.4). John Sowa, in a paper published in 2002 as part of a “Special Issue on Artificial Intelligence” of the IBM Systems Journal [124], proposed a modular framework as an architecture for intelligent systems because of the flexibility in communication and interoperability it provides. This flexible modular framework (FMF) allows different applications in different memory spaces to communicate using a blackboard architecture of message passing between applications. FMF would be very useful in implementing the reference framework discussed in Aldo de Moor’s RENISYS specification methodology [34], because FMF handles interprocess communication across computers as well as processes, and it would also be useful in developing the intelligent agent operations from Delugach’s framework [31]. However, the modularization of CP is at a module component level, rather than the FMF process communication level, so that a module can be directly “tied-in” to another application. The modular design at the component level also allows modules to be interchanged as units, as in modular furniture, to get the most flexibility from the environment.

The modularization of the CP Environment allows parts of the environment, the actual modules, to be both interfaced and interacted with by outside systems or applications. It also has a specific module, CGIF, that creates a mechanism to import and export CGs created from execution of the environment’s inference engine modules and storage in the environment’s knowledge base. CPE included simple wrapper modules to allow other languages, besides C and C++, to use the CGIF module.

6.2.4.1 Basic Architecture for the Environment

Figure 6.5 depicts the new directionality of the CP Environment. The very light gray background area indicates what is actually part of the environment. The light gray oval depicts applications, i.e. the pCG reasoning and language system. The medium gray rounded-corner-square represents editors that are available for CGs, i.e. ARCEdit; these editors should be able to import/export CGIF formatted files. The light gray trapezoid and drum shapes indicate data that is not necessarily graphical in nature, but may be part of a domain of information that a user wishes to process (note: the data in the database need not necessarily be textual and may be graphical or any visual form).

The very dark gray shapes are modules that are part of the CP Environment and use the environment’s internal data structures. All solid arrowed lines in the figure indicate data or processing that is currently available; dashed arrowed lines indicate where an interface, connection, interaction, and/or translation should be available between these elements, but is not currently present.

Figure 6.5: Current CP Environment (From [[87], Figure 1, page 322]).

6.2.4.2 Data Flow within the Environment

Because the architecture is set up as a set of modules, each module is set up as a DLL (under Windows) or a library (under Unix or Linux), depending on the operating system. The CGIF module’s import/export mechanism can be “plugged-in” to other applications by using the module’s API specification to call its implementation code level [117]. All the modules have available APIs to allow their library routines to be called by other applications. Also, because all data structures can be stored to a CGIF formatted file, graphs can be transferred to other applications through the graphs in the CGIF file.

6.2.4.3 Data Structures used by the Environment

When originally conceived, the environment was just an implementation of conceptual graphs algorithms, without consideration of how the data structures affected implementation. In 2000, this system began to change to allow it to be more of a foundational environment that could be used as the underpinning of multiple reasoning systems. When this environment was first conceived, it used a doubly linked list data structure. On redesign, new data structures were investigated; these have been and will be discussed in other chapters.

6.3 ADT Implementations

What follows is a discussion of three implementations of the internal representation ADT definitions discussed in Section 2.3.2. These are just basic ideas of how each of the ADTs might be implemented. Each of the definitions has been given in pseudo-code that looks like C++, but that has nothing to do with the programming language in which it might be implemented.

6.3.1 Logical

This ADT could best be implemented in either Prolog or Lisp. The basic structure of the ADT is one of predicates. If the predicate and tree structure of Prolog is used, then implementation is straightforward. The syntax and semantics as seen in the example in Figure 2.2 could be directly mapped onto this ADT. Within the implementation of the ‘query’ procedure, unification and resolution would be performed over the knowledge-base records. This would be performed by using the ‘SupportClauses’ that are saved during processing. If there is a network present, then the routines that are needed to perform terminological operations would also be executed. The theorem prover would need to use not only the ‘SupportClauses’, but also the stored knowledge-base from the Calculus class. Note: the ‘Logical’ class is where reasoning is performed through its functions (this is the inference engine), while the ‘Calculus’ class is for storing the knowledge-base.

6.3.2 Basic Data Structures

When implementing the basic data structures that are often needed for simple rule-based systems, languages such as Lisp or C come to mind. This implementation needs to store records of information (the knowledge-base) that can be separated out as “conditional” information and “rule” information. The rule would be used for processing, or ‘fired’, when the conditional is found to be true. If implemented in Lisp, a list representation could be used where the “car” and the “cdr” give back the conditional or rule from the record. If one used C, then a structure holding the elements of the IF .. THEN record would be used, and functions would need to be defined to retrieve the conditional and rule parts of the structure.

The inference engine would be implemented in the ‘query’ function. It would apply the actual knowledge that had been stored in the knowledge-base in order to interpret the conditional [65]. This function is also where the actual reasoning is performed. If there is any network or hierarchy processing to be performed, it would be implemented in the inference engine; i.e., marker-passing operations are implemented in this module.

6.3.3 Object

For object manipulation, and in particular graph manipulation, more information is needed. To work with graphs there are not only record types of information about the objects; the structure of the graph has bearing on both the syntax and the semantics, or meaning, of the graph and must be stored as part of the representation ADT. As can be seen from the ADT definition, more basic information needs to be stored. Because Java and C++ are object-oriented, these languages work well for the implementation of Conceptual Structures. The implementation must not only know which conceptual units are linked to which relationship, but must also have directionality.

By the use of the ‘3WayTable’ data structure, knowing which is the starting conceptual unit and which is the ending concept of the relationship is possible. Also, by evaluation of the fetched links, the structure of the physical graph can be known. Through this knowledge, the syntax and semantics of the internal representation can be mapped and stored.

When this basic ADT is built upon, qualitative functions can be performed by using the ‘query’ procedure and adding information to the knowledge base about time and space. The ‘Graph’ class would also work with any hierarchy that is used with the knowledge base. In order to do reasoning, several other graph manipulation procedures and functions have been added. Given a specific system, it is possible that this is not a complete ADT and more functions will need to be defined.

6.4 Experiment Systems Implementation

The experimental systems were chosen because they were able to handle full Conceptual Graphs and did not have the restrictions of SCGs described in 3.4.3. Even though the SCG algorithm by Mugnier and Chein has been implemented in the CoGITaNT system, there is no English documentation with which to work with the system, and the Croitoru algorithm has not been implemented. Lastly, the author of the pCG system, David J. Benn, addressed any errors or problems that arose within the pCG code.

6.4.1 pCG - Original Notio

The pCG system, as discussed previously, is built on top of the Notio library. It, like Notio, was written in Java for portability and used the ANTLR parsing system to read and process the CGIF format. It was mainly designed to use Notio for the actual matching, projection and join algorithms, while developing a language for inputting and outputting simple programs to do analysis. After examining several currently available systems (see above), pCG was chosen as the most general of them. After working with this system, it was discovered that it, like several of the other systems, had the following limitations:

• Only a single copy of a relation could be present in a graph. For example, if a person had two characteristics of brown hair and blue eyes, creating a graph with both characteristics was not valid:

[Person]->(CHRC)->[Hair:brown] ->(CHRC)->[Eyes:blue]

• It also only found a single projection of a query into a graph, even if others were present.

However, it was possible to work with the pCG programs (see Section C.1) to directly use many of the same test sets of CG graphs that would test CPE’s data structure variations.

The data structures used within pCG/Notio are vector arrays for processing within an operation. More specifically, they are used for matching nodes, structural analysis of graphs, and evaluating the search space during the projection operation. Throughout all of these processes, array data structures are used. However, at the very end of the projection operation, the graphs in the KB are translated to be stored in a hash table, even though, if the projection operation is performed on the KB again, they will again be loaded into an array data structure for processing. The hash table is only used when doing a direct retrieval of a graph from the KB, not during operations.
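The two access paths can be sketched as follows; the list stands in for the vector arrays used during operations, and the dict for the hash table used for direct retrieval (the names and sample data are illustrative, not pCG/Notio internals):

```python
# KB as a list for operation-time processing, plus a hash table for direct lookup.
kb_list = [("g1", "graph one"), ("g2", "graph two"), ("g3", "graph three")]
kb_hash = dict(kb_list)

def retrieve_by_scan(name):
    # O(n) linear scan: how graphs are reached while they live in arrays.
    for gname, graph in kb_list:
        if gname == name:
            return graph
    return None

def retrieve_by_hash(name):
    # O(1) expected: the direct-retrieval path through the hash table.
    return kb_hash.get(name)

print(retrieve_by_scan("g2"), retrieve_by_hash("g2"))  # graph two graph two
```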

An interesting design feature of pCG can be related to the ‘typical case’ analysis from Chapter 5 (see Section 5.3). The pCG implementation computes the projection by using a two-part algorithm; however, these parts are not the same as the SCG Relation algorithm of Croitoru. The first part is actually more a part of the storage of the graphs. In preparation for the projection operation, this algorithm performs an Assertion phase which computes the structure of the graph and re-aligns the labels on the elements of the graphs to improve the matching later during the projection. Therefore, as the number of graphs in the KB increases, this Assertion takes time proportional to the size of the KB. Since in the typical case the query graph has few nodes and the KB is small, the Assertion does not have a significant effect on the results, but as the size and number of graphs in the KB increase, this Assertion part should have more of an effect.

6.4.2 CP Environment (CPE)

This system has been developed since 1988 at New Mexico State University

[93, 94] and has never had the two limitations listed for the pCG system. However,

besides this difference the two systems are very comparable.

As discussed in Section 6.2.4, CP was originally developed to use doubly linked lists. When linked lists are sorted but singly linked, the execution time for retrieval from a linked list is the same as that of an array data structure. While re-designing CP to use new algorithms for the projection and maximal join operations, an investigation was done into what would be good data structures to use with these new algorithms. An array data structure was originally chosen to test with the projection algorithm because, when the array is non-sorted, storage is just an append at the end of the list, and one does not have to use a sorted, or doubly linked, list. After carefully looking at other data structures, hash tables were also chosen to be investigated.

There are four variables that hold a direct link between the algorithms and data structures for the system. By changing their underlying data structure, it is believed that the projection operation execution time will be altered. These variables are c-r-c and c-a-c, which are part of the CG graph data structure, and match list and anchor list, which hold internal data used to move information from the matching pre-processing part of the algorithm to the actual projection building of the query graph onto the KB graph. These variables were defined in Section 5.2.1; their actual implementations are defined here for each test representation.

6.4.2.1 Array (Vectors)

The array implementation for these critical variables is discussed first (c-a-c will not be shown in this implementation or the next because the blocks world benchmark did not use actors, but its implementation is very close to the c-r-c data structure). In the following descriptive examples, ‘[]’ indicates indices and ‘()’ indicates structures.

c-r-c

[1] -> (GC1, ([1] -> GT1, (GT1, GC1, R1, GC2, 1)
              [2] -> GT2, (GT2, GC1, R2, GC3, -1)))
[2] -> (GC2, ([1] -> GT1, (GT1, GC2, R1, GC1, -1)))
[3] -> (GC3, ([1] -> GT2, (GT2, GC3, R2, GC1, 1)))

This data structure would be an array that is part of the cg graph class in which the first part of the structure is the unique concept identifier; for example, at index 3 the key would be “GC3”. Also, at every index in the array, there is an array of cstriple unique identifiers (for example, “GT2” at index 1) that will retrieve a node structure. This node structure contains the cstriple, forward concept, relation, backward concept and direction. The direction is either a ‘1’ or ‘-1’, indicating whether the cstriple, in display format, proceeds from forward concept to backward concept along the directed arrow or vice versa. The node structure in the previously built example would be cstriple “GT2”, forward concept “GC3”, relation “R2”, backward concept “GC1”, direction ‘1’.

match-list

[1] -> (GC1, ([1] -> QC1 [2] -> QC2))
[2] -> (GC2, ([1] -> QC2 [2] -> QC1))
[3] -> (GC3, ([1] -> QC3))
[4] -> (GC4, NULL)

This list holds the matching concepts between the KB graph and the query graph. In this example structure, an array holds all the concepts found in the KB graph, each with a link to an array of all the matching concepts in the query graph. Until a matching concept is found, the second array is NULL.

anchor-list

[1] -> (QC1 -> ([1] -> (GC1, ([1] -> GT1,QT1))
                [2] -> (GC2, ([1] -> GT2,QT2))))
[2] -> (QC2 -> ([1] -> (GC3, ([1] -> GT1,QT1))))
[3] -> (QC3 -> ([1] -> (GC4, ([1] -> GT2,QT2))))

The anchor list holds the matching KB concepts that also structurally have the cstriple relationships found in the query graph. By holding both the matching concepts for each query graph concept and the related triples, at build time the anchor list can simply be traversed to create the new projection graphs. This example finds two projections: one projection includes concepts GC1 and GC3 using the GT1 cstriple node, and the second includes concepts GC2 and GC4 using the GT2 cstriple node. All data structures here are arrays.
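The c-r-c layout above can be transcribed into executable form. The sketch below mirrors it with nested Python lists; the identifiers come from the descriptive example, and the nested linear scans reflect how retrieval behaves in the array implementation (this is an illustration of the layout, not the CPE code itself):

```python
# c-r-c as nested lists: per concept, a list of
# (cstriple id, (cstriple, forward concept, relation, backward concept, direction))
crc = [
    ("GC1", [("GT1", ("GT1", "GC1", "R1", "GC2",  1)),
             ("GT2", ("GT2", "GC1", "R2", "GC3", -1))]),
    ("GC2", [("GT1", ("GT1", "GC2", "R1", "GC1", -1))]),
    ("GC3", [("GT2", ("GT2", "GC3", "R2", "GC1",  1))]),
]

def lookup(concept, cstriple):
    # Linear scans at both levels, as in the array implementation: O(n) per level.
    for cid, triples in crc:
        if cid == concept:
            for tid, node in triples:
                if tid == cstriple:
                    return node
    return None

print(lookup("GC3", "GT2"))  # ('GT2', 'GC3', 'R2', 'GC1', 1)
```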

6.4.2.2 Hash Tables

The important change in the data structures for the critical variables given above came when it was seen that perfect hash tables (as discussed in Section 3.5.2.1) can improve the overall projection time by greatly improving the actual projection step, the building of the projection graph, in the second step of processing. In the following descriptive examples, ‘<>’ indicates hash tables and ‘()’ indicates structures.

c-r-c

<GC1, <GT1, <(GT1, 1), (GT1, GC1, R1, GC2, 1)>
       GT2, <(GT2, -1), (GT2, GC1, R2, GC3, -1)>>
 GC2, <GT1, <(GT1, -1), (GT1, GC2, R1, GC1, -1)>>
 GC3, <GT2, <(GT2, 1), (GT2, GC3, R2, GC1, 1)>>>

This data structure would be a perfect hash table that is part of the cghash graph class. The first part of the structure is the unique concept identifier; for example, the key for “GC3” would be a perfect hash value for “GC3”. Also, at every key in the hash table, there is another perfect hash table of cstriple unique identifiers; for example, the key for “GT2” would be a perfect hash value for “GT2”. This second hash table has a value that is a node structure. The node structure is also stored in a perfect hash table using the cstriple unique identifier and direction as the key, in which the two values create a perfect indexing value for the hash table. The value of the last hash table is the same as the value in the above array implementation.

match-list

<GC1, <QC1, QC2>
 GC2, <QC2, QC1>
 GC3, <QC3>
 GC4, NULL>

This hash table holds the matching concepts between the KB graph and the query graph.

In this example structure, a perfect hash table would hold all the concepts found in the

KB graph with a link to a perfect hash table of all the matching concepts in the query graph. As before, the KB graph concept that did not have a match would have a NULL in its value parameter.

anchor-list

<QC1, <GC1, <GT1, QT1>
       GC2, <GT2, QT2>>
 QC2, <GC3, <GT1, QT1>>
 QC3, <GC4, <GT2, QT2>>>

This example finds the same two projections discovered with the array implementation; however, all data structures here are perfect hash tables, and each unique label produces its own unique index in order to have constant-time retrieval.
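The same data can be sketched with hash tables; Python dicts stand in for the perfect hash tables (a real perfect hash additionally guarantees no collisions), turning the nested linear scans of the array version into constant-time lookups:

```python
# c-r-c keyed by hash tables; dicts here stand in for perfect hash tables.
crc = {
    "GC1": {"GT1": ("GT1", "GC1", "R1", "GC2",  1),
            "GT2": ("GT2", "GC1", "R2", "GC3", -1)},
    "GC2": {"GT1": ("GT1", "GC2", "R1", "GC1", -1)},
    "GC3": {"GT2": ("GT2", "GC3", "R2", "GC1",  1)},
}

# match-list keyed by KB concept; None marks a concept with no query match.
match_list = {"GC1": ["QC1", "QC2"], "GC2": ["QC2", "QC1"],
              "GC3": ["QC3"], "GC4": None}

# Constant-time (expected) retrieval replaces the nested linear scans:
print(crc["GC3"]["GT2"])  # ('GT2', 'GC3', 'R2', 'GC1', 1)
print(match_list["GC4"])  # None
```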

CHAPTER 7

PROJECTION EXPERIMENTS, RESULTS AND ANALYSIS

This chapter includes a discussion of the ‘blocks world’ domain problem and the actual experiments tested with it. Each of these experiments uses a set of reasoning graphs in the KB for projecting queries against a solution to the ‘blocks world’ problem. These are extended graphs from the benchmark set of conceptual graphs from the CGTools workshop of ICCS2001.

It also gives all the timing results for the cross matrix of test runs discussed in Section 7.2.1, including a simple analysis of the overall execution times of each test set, comparing and contrasting each test set in terms of execution time, overhead time, and space requirements.

7.1 Domain Problem - ‘Blocks World’

Back in 2001, a group of tool developers began the process of truly making conceptual graph systems interoperable. A set of benchmark files of conceptual graphs that can be used by reasoning systems to work with the blocks world domain was developed in CGIF format (see Section 3.4.4). During the 2001 Conceptual Graphs Tools Workshop1, a set of benchmark graphs was defined and placed in files of increasing difficulty to process in the CGIF format [96]. Figures 7.1, 7.2, 7.3, and 7.4 are the contents of file ‘final_graphs_level2.cgf’, which was able to be processed by all tools submitted to the workshop.

1The web location http://www.cs.nmsu.edu/~hdp/CGTools/ holds the resources for the workshop and the proceedings.

(GT [TypeLabel: "Entity"] [TypeLabel: "Block"] ;A block is an entity; ) .
(GT [TypeLabel: "Entity"] [TypeLabel: "Hand"] ;A hand is an entity; ) .
(GT [TypeLabel: "Entity"] [TypeLabel: "Location"] ;A location is an entity; ) .
(GT [TypeLabel: "Act"] [TypeLabel: "Pickup"] ;Pickup is an action; ) .
(GT [TypeLabel: "Act"] [TypeLabel: "Putdown"] ;Putdown is an action; ) .
(GT [TypeLabel: "Act"] [TypeLabel: "MoveHand"] ;MoveHand is an action; ) .
(GT [TypeLabel: "Act"] [TypeLabel: "MoveBlock"] ;MoveBlock is an action; )

Figure 7.1: Part 1: Example of Blocks World Benchmark File.

The Part 1 example (see Figure 7.1) is the type hierarchy definition for the concepts used in the benchmark. Entity and Act are directly below the top, ⊤, concept of the hierarchy, and the other seven concepts, Block, Hand, Location, Pickup, Putdown, MoveHand and MoveBlock, are directly above the bottom, ⊥, concept. It is not a very deep hierarchy.

In Part 2 (see Figure 7.2), one can find the definition graphs for the Entity individualized as a Block, and for the Acts individualized as Pickup, Putdown, MoveHand

[Entity:'Block'
 (ATTR [Block*b] [Color]) (CHRC ?b [Shape])
 ;Each block has a color and shape; ] .
[Act:'Pickup'
 (PTNT [Pickup*p] [Block*b]) (INST ?p [Hand*h])
 (RSLT ?p [Situation: (GRASP ?h ?b)])
 ;Each block is picked up using a hand; ] .
[Act:'Putdown'
 (PTNT [Putdown*p] [Block*b]) (DEST ?p [Location*l])
 (INST ?p [Hand]) (RSLT ?p [Situation: (Top ?b ?l)])
 ;Each block is put down at a location from the hand; ] .
[Act:'MoveHand'
 (DEST [MoveHand*m] [Location*l]) (PTNT ?m [Hand*h])
 (RSLT ?m [Situation: (At ?h ?l)])
 ;This action moves the hand to a location; ] .
[Act:'MoveBlock'
 (DEST [MoveBlock*m] [Location*l]) (PTNT ?m [Block*b])
 (INST ?m [Hand]) (RSLT ?m [Situation: (At ?b ?l)])
 ;This action moves the block to a location; ]

Figure 7.2: Part 2: Example of Blocks World Benchmark File.

and MoveBlock. The concepts Entity and Act are dominant concepts (see Subsection B.1 for definition) with internal structure. The ⟨type, referent⟩ pair is the external scoped concept definition for the instantiation of the dominant concept. During model processing these individualized definitions can be joined with a reference to the subtype from the hierarchy. Concepts Hand and Location have a concept type and a location in the type hierarchy; however, they do not have any internal structure to be considered.

Part 3 (see Figure 7.3) contains the relation hierarchy definition for the relations At, Above, OnTable, Top and EmptyHand used in the benchmark. It also gives the dominant concept Relation internal structure for each relationship, because these relations are not axioms of CGs. When these referenced relations appear in other CGs, the definitional graphs can be joined to them.

The last part of the file, Part 4 (see Figure 7.4), gives the factual graphs contained in the knowledge base. In this section of the file can be seen three cubical blocks with colors ‘Red’, ‘Blue’ and ‘Green’ that are on a table at two locations. Block #1 is above Block #3, which is located directly on the table. Both of these blocks are located at Location #5, and Block #2 is at Location #6. The hand is empty and is holding no blocks. Either Block #1 or Block #2 must be Blue in color, but both can be. The file contents can be seen in picture form in Figure 7.5.

(GT [RelationLabel: "Relation"] [RelationLabel: "At"] ; Relation At ;) .
(GT [RelationLabel: "Relation"] [RelationLabel: "Above"] ; Relation Above ;) .
(GT [RelationLabel: "Relation"] [RelationLabel: "OnTable"] ; Relation OnTable ;) .
(GT [RelationLabel: "Relation"] [RelationLabel: "Top"] ; Relation Top ;) .
(GT [RelationLabel: "Relation"] [RelationLabel: "EmptyHand"] ; Relation EmptyHand ;) .
[Relation:'At' (POS [Entity] [Location])
 ;An entity is positioned at a location; ] .
[Relation:'Top' (OnTable [Block*b1] [Location]) ~[(Above [Block*b2] ?b1)]
 ;A block on top is at a location and has no blocks above it; ] .
[Relation:'EmptyHand' ~[(GRASP [Hand] [Block])]
 ;A hand is empty when no blocks are in it; ] .
[Relation:'OnTable' (At [Block*b] [Location]) ~[(GRASP [Hand] ?b)]
 ;A block on the table is at a location and not in the hand; ] .
[Relation:'Above' (OnTable [Block*b1] [Location*l]) (OnTable [Block*b2] ?l)
 ;The first block is above the second block at the same location; ]

Figure 7.3: Part 3: Example of Blocks World Benchmark File.

[Block:#1] . [Block:#2] . [Block:#3] . [Hand:#4] .
[Location:#5] . [Location:#6] . [Block:@3] .
;Block #1 is red;
(ATTR [Block:#1] [Color:'Red']) .
;Block #2 is blue;
(ATTR [Block:#2] [Color:'Blue']) .
;Block #3 is green;
(ATTR [Block:#3] [Color:'Green']) .
(OnTable [Block:#1] [Location:#5]) .
(OnTable [Block:#2] [Location:#6]) .
(OnTable [Block:#3] [Location:#5]) .
;Block #1 is above block #3, and block #2 is at a different location;
(Above [Block:#1] [Block:#3]) .
;All the blocks are on the table and not in the hand;
(Emptyhand [Hand:#4]) .
[Either: [Or: (ATTR [Block:#1] [Color:'Blue'])]
         [Or: (ATTR [Block:#2] [Color:'Blue'])]] .
;All blocks are cubical;
(CHRC [Block:@every] [Shape:'Cubical'])

Figure 7.4: Part 4: Example of Blocks World Benchmark File.


Figure 7.5: A Picture of the Benchmark File.

7.2 Tests

The tests performed were intended not only to validate that the new projection algorithm produced the correct projection of the query onto the knowledge base graphs, but also to evaluate how different parameters affect the running of that algorithm given the data structures used. The benchmark data file described in Section 7.1 was modified to create larger knowledge bases and larger graphs in terms of the number of nodes (concepts and relations) in the graphs.

Each of the tests was run on a single computer running the Windows XP operating system. The machine had 2 gigabytes of memory, and all systems were set up to use all virtual memory. No other applications were executed while the tests were being performed. There were also 80 gigabytes of disk space, so no space limitations were imposed.

7.2.1 Single Appearance of Relation within Graph

Because it turned out that pCG was not able to process more than one instance of a relation type within a graph, two sets of tests were performed. Table 7.1 gives all the files that were tested by all three systems. Knowledge bases with 1, 1000, 2500, and 5000 graphs were each stored in a file; those numbers are across the top of the table. Then graphs of size 5, 11, 21, 31, 53, and 73 nodes were each placed in these KB files; those numbers are down the first column. Within each graph in these knowledge bases, all relation types were unique.

Table 7.1: KB Single Relation Graph Files.

     1                1000                 2500                 5000
5    graphs_5_1.cgf   graphs_5_1000.cgf    graphs_5_2500.cgf    graph_5_5000.cgf
11   graphs_11_1.cgf  graphs_11_1000.cgf   graphs_11_2500.cgf   graphs_11_5000.cgf
21   graphs_21_1.cgf  graphs_21_1000.cgf   graphs_21_2500.cgf   graphs_21_5000.cgf
31   graphs_31_1.cgf  graphs_31_1000.cgf   graphs_31_2500.cgf   graphs_31_5000.cgf
53   graphs_53_1.cgf  graphs_53_1000.cgf   graphs_53_2500.cgf   graphs_53_5000.cgf
73   graphs_73_1.cgf  graphs_73_1000.cgf   graphs_73_2500.cgf   graphs_73_5000.cgf

7.2.1.1 Increase # of Graphs in KB

One parameter being examined was how the number of graphs within the knowledge base affected the running time. Therefore, 1, 5, 100, 1000, 2500 and 5000 graphs were stored in the knowledge base for each graph size. However, as will be seen in the Results section below (see Section 7.4), the times for 1, 5 and 100 graphs in a knowledge base were so low that there was no significant difference between the systems for evaluation. Therefore, only the 1000, 2500 and 5000 graph KBs will be analyzed.

7.2.1.2 Increase # of Nodes in Graphs in KB

Another parameter believed to affect the actual execution time of the projection of the query into the knowledge base was how many nodes were present in each graph of the knowledge base. This was somewhat arbitrary, because a real world knowledge base would not have a fixed number of nodes in every graph. In fact, the sizes of the graphs would be small for factual data, medium for definitional data, but larger for partial and complete model data. As seen in Table 7.1 above, the number of nodes in the graphs of the KBs was increased in the following way: 5, 11, 21, 31, 53, and 73.

A sample graph from the single graph KB for each node size is:

5-nodes:

(OnTable [Block*b] [Table])(NAME ?b [Number])

11-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number]) (CHRC ?b [Shape])(LOC ?b [Place])(OnTable ?b [Table])

21-nodes:

(Above [Block*b2] [Block*b1])(OnTable ?b1 [Table]) (ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape]) (LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number]) (CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])

31-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2) (OnTable ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape]) (LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number]) (CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])

53-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2) (OnTable1 ?b1 [Table*t1])(OnTable2 [Block*b4] ?t1) (Above3 [Block*b5] ?b4)(NAMET ?t1 [Number])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])(ATTR4 ?b4 [Color])(NAME4 ?b4 [Number])(CHRC4 ?b4 [Shape])(LOC4 ?b4 [Place])(ATTR5 ?b5 [Color])(NAME5 ?b5 [Number]) (CHRC5 ?b5 [Shape])(LOC5 ?b5 [Place])

73-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2) (OnTable1 ?b1 [Table*t1])(OnTable2 [Block*b4] ?t1) (Above3 [Block*b5] ?b4)(Above4 [Block*b6] ?b5)(Above5 [Block*b7] ?b6)(NAMET ?t1 [Number])(ATTR1 ?b1 [Color]) (NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place]) (ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape]) (LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number]) (CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])(ATTR4 ?b4 [Color]) (NAME4 ?b4 [Number])(CHRC4 ?b4 [Shape])(LOC4 ?b4 [Place]) (ATTR5 ?b5 [Color])(NAME5 ?b5 [Number])(CHRC5 ?b5 [Shape]) (LOC5 ?b5 [Place])(ATTR6 ?b6 [Color])(NAME6 ?b6 [Number]) (CHRC6 ?b6 [Shape])(LOC6 ?b6 [Place])(ATTR7 ?b7 [Color]) (NAME7 ?b7 [Number])(CHRC7 ?b7 [Shape])(LOC7 ?b7 [Place])

It should be noted that as of the 21-node KB graphs, a relation type needed to be repeated. Therefore, a number was added to the relation type in order to make it unique.
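A small, hypothetical helper illustrates how such numbered relation types can be generated mechanically (the function name and the fixed choice of relations are illustrative, not the tooling actually used to build the benchmark files):

```python
def block_attrs(n):
    """Emit numbered-relation CGIF for n blocks.

    Appending an index (ATTR1, ATTR2, ...) keeps every relation type unique,
    which was required by systems that allow only one relation of a given
    type per graph.
    """
    parts = []
    for i in range(1, n + 1):
        for rel, target in (("ATTR", "Color"), ("NAME", "Number")):
            parts.append(f"({rel}{i} ?b{i} [{target}])")
    return "".join(parts)

print(block_attrs(2))
# (ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])
```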

7.2.1.3 Increase # of Nodes in Query Graph

Returning to the ‘typical case’ discussed in Section 5.3, it was proposed that smaller query graphs would take less time to project onto a KB with larger graphs (more nodes). Therefore, query graphs ranging in size from 3 nodes all the way to 73 nodes were tested, with the constraint that no query graph was larger than the KB graph size. This is so the projection was always an injective projection, as explained in Section 5.1.1.

Examples of several of the query graphs for the projection are given below in

CGIF:

3-nodes:

(ATTR [Block] [Color])

5-nodes:

(ATTR1 [Block*b] [Color])(NAME1 ?b [Number])

7-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])(OnTable ?b [Table])

9-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])(CHRC ?b [Shape])(LOC ?b [Place])

15-nodes:

(Above [Block*b2] [Block*b1])(OnTable ?b1 [Table]) (ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])

27-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2) (OnTable ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])

43-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2) (OnTable1 ?b1 [Table*t1])(OnTable2 [Block*b4] ?t1)(Above3 [Block*b5] ?b4)(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number]) (CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color]) (NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place]) (ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])(ATTR4 ?b4 [Color])(NAME4 ?b4 [Number])(ATTR5 ?b5 [Color])(NAME5 ?b5 [Number])

Not all of the query graphs are given as examples, especially where the graph structure is defined in the subsection above for the KB. It should be noted that some of the 3-node query graphs were not exactly the one given, because several of the query graphs had to be slightly modified to make the relations match. This happened not only with 3-node queries, but with several of the others as well. At this time, pCG did not have a relation hierarchy to account for relation types that were actually specializations of other relations, so CPE was not given one either.

The set of queries evaluated with each KB were attempting to test ‘typical’ queries that would possibly be asked by a user of a query-answer system. This is why not all queries that met the injective projection requirement were tested against all KBs. A place where this is very obvious is in the data in Table 7.2. Here it is seen that the query graph with 7 nodes is only used when testing the 11-node KB graph. Looking at the structure of the 11-node KB graph, one can see that the “OnTable” relation without an “Above” relation is only used in this graph. Therefore, looking for a block with a name and color that is directly on the table without a block above

Table 7.2: Single Relation: Query Graph Size Run vs Number of Nodes in KB Graphs.

      3  5  7  9 11 15 21 27 31 43 53 63 73
 5    X  X
11    X  X  X  X  X
21    X  X     X  X  X  X
31    X  X     X  X  X  X  X
53    X  X     X  X  X  X  X  X  X  X
73    X  X     X  X  X  X  X  X  X  X  X  X

it would only appear in these graphs. That is why this query graph was tested only with this KB graph structure.

7.2.2 Multiple Appearance of Relation within a Graph

Because some systems are not able to process multiple relations of the same type within a single graph, and it is perceived that this capability would be necessary for any system working as a general query-answer system, tests were performed on only the two data structure versions of CPE to validate that this projection algorithm is in fact able to handle this type of data. As in the section above, multiple sizes of graphs within the knowledge base were tested, as well as multiple sizes of query graphs. Because these tests were for validation and not for execution time purposes, multiple KB sizes were not tested.

7.2.2.1 Increase # of Nodes in Graphs in KB

Each of these graphs in the KB is designed to test that a query graph that is contained in more than one subgraph will produce all valid projections. In order to test several different query graphs with several different node sizes, KB graphs with 13, 23, 33 and 55 nodes were tested.

A sample graph from the single graph KB for each node size is:

13-nodes:

[Block*b1][Block*b2](Above ?b1 ?b2)(OnTable ?b2 [Table]) (ATTR ?b1 [Color])(NAME ?b1 [Number])(ATTR ?b2 [Color]) (NAME ?b2 [Number])

23-nodes:

(Above [Block*b1] [Block*b2])(OnTable ?b2 [Table*t1]) (NAME ?t1 [Number])(ATTR ?b1 [Color])(NAME ?b1 [Number]) (CHRC ?b1 [Shape])(LOC ?b1 [Place])(ATTR ?b2 [Color]) (NAME ?b2 [Number])(CHRC ?b2 [Shape])(LOC ?b2 [Place])

33-nodes:

(Above [Block*b2] [Block*b1])(Above [Block*b3] ?b2) (OnTable ?b1 [Table*t1])(NAME ?t1 [Number])(ATTR ?b1 [Color])(NAME ?b1 [Number])(CHRC ?b1 [Shape])(LOC ?b1 [Place])(ATTR ?b2 [Color])(NAME ?b2 [Number])(CHRC ?b2 [Shape])(LOC ?b2 [Place])(ATTR ?b3 [Color])(NAME ?b3 [Number])(CHRC ?b3 [Shape])(LOC ?b3 [Place])

55-nodes:

(Above [Block*b2] [Block*b1])(Above [Block*b3] ?b2) (OnTable ?b1 [Table*t1])(OnTable [Block*b4] ?t1) (Above [Block*b5] ?b4)(NAME ?t1 [Number])(ATTR ?t1 [Legs])(ATTR ?b1 [Color])(NAME ?b1 [Number])(CHRC ?b1 [Shape])(LOC ?b1 [Place])(ATTR ?b2 [Color])(NAME ?b2 [Number])(CHRC ?b2 [Shape])(LOC ?b2 [Place])(ATTR ?b3 [Color])(NAME ?b3 [Number])(CHRC ?b3 [Shape])(LOC ?b3 [Place])(ATTR ?b4 [Color])(NAME ?b4 [Number])(CHRC ?b4 [Shape])(LOC ?b4 [Place])(ATTR ?b5 [Color])(NAME ?b5 [Number])(CHRC ?b5 [Shape])(LOC ?b5 [Place])

7.2.2.2 Increase # of Nodes in Query Graph

When examining Table 7.3, it is seen that not as many variations of query graphs were examined. This is because the interest here was in validating that multiple projection graphs could be found within the KB graphs. There was only a limited number of nodes that did in fact appear in some form of multiple projection; after that, as the number of nodes in the query graph grew, the projection operation could only find a single subgraph projection from the query graph onto the KB graph.

Table 7.3: Multi-Relation: Query Graph Size Run vs Number of Nodes in KB Graphs.

      3  5  9  11
13    X  X
23    X  X  X
33    X  X  X  X
55    X  X  X  X

The query graph node structure was designed to produce multiple projections given the KB graph. The actual query graphs for the projection are given below in CGIF:

3-nodes:

(ATTR [Block] [Color])

5-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])

9-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])(CHRC ?b [Shape])(LOC ?b [Place])

11-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])(CHRC ?b [Shape])(LOC ?b [Place])(OnTable ?b [Table])

Each of these query graphs will produce multiple projection graphs when used with the KBs discussed in Section 7.2.2.1. As an example of how this gives multiple projection graphs, if the 5-node query graph is projected onto the 13-node KB graph, it results in the following two projections:

1. (ATTR [Block*b1] [Color])(NAME ?b1 [Number])

2. (ATTR [Block*b2] [Color])(NAME ?b2 [Number])
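A minimal sketch of this multi-projection behavior reduces graphs to (relation, source, target) triples and treats ?-prefixed names as variables; it finds both bindings for the 5-node query against the 13-node KB graph. This representation is a deliberate simplification for illustration, not the CPE data structures:

```python
# KB graph and query as (relation, source, target) triples.
kb = [("ATTR", "b1", "Color"), ("NAME", "b1", "Number"),
      ("ATTR", "b2", "Color"), ("NAME", "b2", "Number"),
      ("Above", "b1", "b2"), ("OnTable", "b2", "Table")]
query = [("ATTR", "?x", "Color"), ("NAME", "?x", "Number")]

def projections(query, kb, binding=None):
    """Return every consistent binding of query variables into the KB graph."""
    binding = binding or {}
    if not query:
        return [dict(binding)]
    rel, src, dst = query[0]
    results = []
    for krel, ksrc, kdst in kb:
        if krel != rel or kdst != dst:
            continue
        if binding.get(src, ksrc) != ksrc:
            continue  # inconsistent with an earlier binding of the variable
        results += projections(query[1:], kb, {**binding, src: ksrc})
    return results

print(projections(query, kb))  # [{'?x': 'b1'}, {'?x': 'b2'}]
```

Both projections are found because two distinct blocks carry the same ATTR/NAME pattern, which is exactly the case a single-relation-per-type system cannot represent.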

7.3 Results of Each Experiment System

To begin, an overview is given of the results seen for each of the experiment systems.

7.3.1 pCG - Original Notio

As will be seen and discussed in the sections below, the pCG system is very stable when the query graphs are small and when there are few graphs in the knowledge base. As the size of the query graphs grows towards the size of the graphs within the knowledge base, and as the number of graphs within the knowledge base grows larger, the error span increases (see the error bar data in Section D.2 of Appendix D) and the system becomes very unstable.

7.3.2 CP Environment

The new projection algorithm presented in Chapter 5 and Section 5.2.2 gave interesting results with both forms of the tested data structures. The array implementation did very well on the typical case (as it was designed to do); the hash table implementation did not come on strong until the size of the graphs within the KBs was increased. Below, more information is given on why it is believed that these results were seen.

7.3.2.1 Array (Vector)

As laid out in Chapter 6, the data structures here were all arrays. The storage of these data structures was unsorted, except that each concept, relation, and cstriple had a unique label and was stored according to its appearance within the CGIF formatted graphs in the file. This often caused the most basic concept node in the graph to be stored first in the list, allowing it to be quickly retrieved during the projection operation. However, as the arrays became longer and some of the concept nodes had an equal number of links, it can be seen that the time needed to check the structure and build the projection increased. But the increase, and the shape of the resulting polynomial, never went outside of the predicted analysis of the algorithm given in Chapter 5 and Section 5.3.3.

7.3.2.2 Hash Tables

This data structure implementation behaved as expected. In using a perfect hash: 1) extra time was needed for storage; 2) more space was needed in order for the KB to be resident in memory; and 3) there was extra overhead in processing the hash tables. However, even though the projection of small query graphs onto small KB graphs did not give excellent results, as the size and number of graphs within the KBs increased, simple linear regression [73] showed that the execution time was linear in the size of the query graphs. It is believed that the reason these results were not seen in the more ‘typical case’ is because the hash tables were designed and implemented so that no collisions would happen within the tables. This added to both the amount of space needed for the KB to be resident in memory and the overhead during processing. However, when the execution time needed for the actual projection in the other implementation reached the overhead of this implementation, this implementation became more efficient.
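The linearity check can be sketched with an ordinary least-squares fit; the timing values below are made up for illustration, not measured results:

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical timings: query-graph sizes vs. execution time in milliseconds.
sizes = [3, 5, 9, 11]
times = [40.0, 60.0, 100.0, 120.0]
slope, intercept = linear_fit(sizes, times)
print(round(slope, 2), round(intercept, 2))  # 10.0 10.0
```

A near-constant slope across increasing query sizes is what supports the claim that the hash table implementation's execution time grows linearly.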

7.4 Results of Each # of Nodes in KB

As discussed previously in this chapter, each of the graph sizes was placed in a knowledge base of size 1, 5, 100, 1000, 2500 and 5000 graphs. Timings were collected on runs against part of the graph files, but all of the relevant query graphs. However, until there were at least 1000 graphs in the KB, there was really no separation in the acquired execution times; therefore, for each graph size below, only the timings for 1000, 2500 and 5000 graphs will be given and discussed.

7.4.1 5 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases containing graphs with 5 nodes in them. First is Figure 7.6 containing the results for the projection of query graphs against a KB containing 1000 graphs all with 5 nodes.

As discussed in Section 7.2.1.2, the 5-node KB graph holds the name of the block that is on the table in the ‘blocks world’ domain. Second is Figure 7.7 containing the results for the projection of query graphs against a KB containing

[Chart: projection time in milliseconds (0–250) versus number of nodes in the query graph (2–6) for pCG, CPE, and CPEHash, with polynomial trend lines for pCG and CPE and a linear trend line for CPEHash.]

Figure 7.6: 5 nodes in KB of 1000 Graphs.

2500 graphs all with 5 nodes.

Third is Figure 7.8 containing the results for the projection of query graphs against a KB containing 5000 graphs all with 5 nodes. Looking at this set of 3 charts, there really is not enough information to indicate what the real growth curve is for the projection of the query graphs onto the KB graphs. Therefore, the tests were expanded to include more nodes and more cstriples.

[Chart: projection time in milliseconds (0–600) versus number of nodes in the query graph (2–6) for pCG, CPE, and CPEHash, with polynomial trend lines for pCG and CPE and a linear trend line for CPEHash.]

Figure 7.7: 5 nodes in KB of 2500 Graphs.


Figure 7.8: 5 nodes in KB of 5000 Graphs.

7.4.2 11 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases containing graphs with 11 nodes in them. First is Figure 7.9 containing the results for the projection of query graphs against a KB containing 1000 graphs all with 11 nodes.

These 11 node graphs from the KB, as seen in Section 7.2.1.2, contain the block on the table as well as the name, color, shape and location of the block. Second is Figure 7.10 containing the results for the projection of query graphs against a KB containing 2500 graphs all with 11 nodes.


Figure 7.9: 11 nodes in KB of 1000 Graphs.


Figure 7.10: 11 nodes in KB of 2500 Graphs.

Third is Figure 7.11, containing the results for the projection of query graphs against a KB containing 5000 graphs all with 11 nodes. Because there are more nodes and cstriples in the KB graphs, more query graphs can be projected onto these graphs to see more of a gradation in the set of 3 charts. The slope and shape of the curves become more distinct as the number of nodes in the query graphs increases. Even with this smaller number of query graphs being tested, CPEHash is showing a linear slope, and CPE (array format) performs faster than the other two systems.


Figure 7.11: 11 nodes in KB of 5000 Graphs.

7.4.3 21 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases containing graphs with 21 nodes in them. First is Figure 7.12, containing the results for the projection of query graphs against a KB containing 1000 graphs all with 21 nodes. These 21-node graphs not only have the information for the block on the table, including the name, color, shape and location of the block (see Section 7.2.1.2), but also the same information defined for a second block located above the first block. Second is Figure 7.13, containing the results for the projection of query graphs against a KB


Figure 7.12: 21 nodes in KB of 1000 Graphs.

containing 2500 graphs all with 21 nodes.

Third is Figure 7.14, containing the results for the projection of query graphs against a KB containing 5000 graphs all with 21 nodes. In all three of these charts, the slopes of the execution time for projecting the query graph onto the KB graphs are the same for each system. However, the execution time for projecting the query graph onto the KB is definitely a function of the number of graphs in the KB. For the actual graph isomorphism, that is, when the query graph is the same size as the KB graph, the execution times are actually coming together.


Figure 7.13: 21 nodes in KB of 2500 Graphs.


Figure 7.14: 21 nodes in KB of 5000 Graphs.

7.4.4 31 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases containing graphs with 31 nodes in them. First is Figure 7.15, containing the results for the projection of query graphs against a KB containing 1000 graphs all with 31 nodes. These 31-node graphs include the two blocks with their information, including the name, color, shape and location of each block (see Section 7.2.1.2). They also indicate that the first block is on the table, and that a third block, with all of its information, is on top of the second block.


Figure 7.15: 31 nodes in KB of 1000 Graphs.

Second is Figure 7.16, containing the results for the projection of query graphs against a KB containing 2500 graphs all with 31 nodes, and third is Figure 7.17, containing the results for the projection of query graphs against a KB containing 5000 graphs all with 31 nodes. These charts now show quite clearly that as the number of nodes in both the query and KB graphs increases, the shapes of the curves become clearer. These curves are coming very close to crossing, indicating that with larger graphs some algorithms perform better than with small graphs. In fact, the curves are very close together when looking at a large-size KB.


Figure 7.16: 31 nodes in KB of 2500 Graphs.


Figure 7.17: 31 nodes in KB of 5000 Graphs.

7.4.5 53 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases containing graphs with 53 nodes in them. First is Figure 7.18 containing the results for the projection of query graphs against a KB containing 1000 graphs all with 53 nodes.

These 53-node graphs (see Section 7.2.1.2) include all three of the blocks in one stack on the table, with their information including the name, color, shape and location of each block. There is also a second stack on the same table with two more blocks, including their information.


Figure 7.18: 53 nodes in KB of 1000 Graphs.

Second is Figure 7.19, containing the results for the projection of query graphs against a KB containing 2500 graphs, all with the 53-node information. Third is Figure 7.20, containing the results for the projection of query graphs against a KB containing 5000 graphs all with 53 nodes. Because the number of nodes in the KB graphs has gotten large enough, the curves have now crossed, indicating that the overhead from the hash tables no longer has as much effect on the overall execution time. The CPEHash system continues to show a linear curve, with the cross-over points being the same in all three charts.


Figure 7.19: 53 nodes in KB of 2500 Graphs.


Figure 7.20: 53 nodes in KB of 5000 Graphs.

7.4.6 73 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases containing graphs with 73 nodes in them. First is Figure 7.21 containing the results for the projection of query graphs against a KB containing 1000 graphs all with 73 nodes.

These 73-node graphs (see Section 7.2.1.2) include six blocks in two stacks on the table, with the information for each block including the name, color, shape and location of the block. The name of the table is also part of each graph in the KB.


Figure 7.21: 73 nodes in KB of 1000 Graphs.

Second is Figure 7.22, containing the results for the projection of query graphs against a KB containing 2500 graphs all with 73 nodes. Third is Figure 7.23, containing the results for the projection of query graphs against a KB containing 5000 graphs all with 73 nodes. With these three system tests, it is seen that 73 nodes shows the clearest result differences between pCG, CPE and CPEHash. As before, CPE does the best with the smallest (fewest number of nodes) query graphs, but when graph isomorphism is reached, that is, complete coverage of the full graph, the array vector implementation causes a real slowdown. As with the 53-node charts, the cross-over of the curves happens when testing the same query graph projection in each chart.


Figure 7.22: 73 nodes in KB of 2500 Graphs.


Figure 7.23: 73 nodes in KB of 5000 Graphs.

7.5 Analysis of Results

Each of the results given in the section above is laid out by the number of nodes in the KB graphs. This appears to be the most direct way of evaluating the results received from the tests.

7.5.1 Change # of Graphs in KB

Looking at the results above for the 1000, 2500 and 5000 graphs in each KB, the curves in each chart are the same and simply increase in milliseconds in proportion to the number of graphs added to the KB. Adding large numbers of graphs to the KB puts stress on the amount of memory space needed for processing because the KB needs to stay resident in memory. However, on evaluation of each of the KBs by node size, the shape of the curves in relationship to the three system implementations shows no change.

7.5.2 Change # of Nodes in KB Graphs

As the number of nodes in the graphs found in the KB increased, the shape of the curves in the result charts became more pronounced. That is, as the graph sizes increased and the problems moved closer to “real life”, the effects of the algorithm changes and data structures were more prominent. When looking at the results from the 5-node and 11-node KBs, nearly all the solutions looked the same, except that the hash tables, because of their added overhead, took longer than both of the other solutions. However, as the size of the graphs increased, the curves generated by the results took on either a polynomial or a linear shape. By 53 nodes in the KB graphs, the solutions had started crossing and taking on distinct shapes. In the 73-node KB results, the same crossings seen at 53 nodes were present, and the CPE hash table implementation was definitely a linear result when tested with a simple linear regression [73].
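The regression check described above can be illustrated with a short sketch. The timing values below are invented for the example (they are not the measured CPEHash numbers); only the technique, ordinary least-squares on query-graph size versus execution time, follows the text.

```python
# Ordinary least-squares fit of time-vs-query-size data, as a sketch of
# the simple linear regression test mentioned in the text. The data
# points here are hypothetical, not the dissertation's measurements.

def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

nodes = [5, 15, 25, 35, 45, 53]          # hypothetical query-graph sizes
times = [110, 205, 300, 410, 500, 585]   # hypothetical milliseconds
slope, intercept = linear_fit(nodes, times)
# A near-constant slope (here about 10 ms per node) supports a linear claim.
```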

7.5.3 Change # of Nodes in Query Graph

The number of nodes in the query graphs projected onto the KB started small and was then increased until the subgraph isomorphism was in fact a graph isomorphism of the KB graph. Because it was desired to only do an injective projection operation, the number of nodes in the query graph never exceeded the number of nodes in the KB graph. As the number of nodes in the query graph increased, the number of concepts in the anchor list also increased. This created more matches of query concepts to KB graph concepts and therefore more processing of cstriples during the building-of-projections phase. Therefore, when the number of nodes in the KB graphs increased, the execution time of the implementation that was faster with small graphs started to increase.

By the time the KB size was at 53 nodes, even the smaller query graphs were showing similar execution times. That is, for larger graphs in the KB, the execution times for smaller query graphs were much closer together for all implementations than when the KB graphs were small. Also, as the KB graphs increased in size, more variations in query graph sizes could be tested, thereby allowing the visualization of the cross-over for the CPE hash implementation. This implementation showed better results than the pCG system at about 27 nodes in the query graphs, and better results than the CPE array implementation at about 41 nodes in the query graphs. These results were seen for both the 53- and 73-node KB graphs.

7.5.4 Change # of Identical Relations in Graph

As discussed previously in this chapter, the pCG system cannot process identical relations within a single graph. Therefore, the tests run with multiple instances of the same relation were confined to validating that the CPE system, with both of its data structures, was correctly finding all projections (see Section D.3.2 in Appendix D for actual output). Table 7.4 shows how many projections were found when running the validation tests. These tests gave the same results for both data structures implemented in the CPE system.

Table 7.4: Number of Projections Found: Query Graph Size vs KB Graph Size.

                        KB graph size
Query graph size     3     5     9     11    13
       2             2     2     3     2     2
       3             3     3     3     3     1
       5             5     5     5     5     2

CHAPTER 8

CONCLUSIONS AND FUTURE WORK

8.1 Evaluation of Four Projection Algorithms

Four different, yet related, projection algorithms that use either full Conceptual Graphs (CGs) or Simple Conceptual Graphs (SCGs) have been described (see Chapter 5). Using Table 8.1, comparisons will be made between basic units, type of graphs, number of possible projections found, the analysis of the projection question, the overall analysis of the projection operation algorithm's execution time, and the actual projection creation execution time.

Table 8.1: Comparison of Four Algorithms.

               M&C           Croitoru      Notio         New Proj
basic unit     relations     relations     relations     concepts
works over     SCGs          SCGs          CGs           CGs
projs found    all           # relations   1             all
proj question  NP-Complete   NP-Complete   NP-Complete   NP-Complete
problem        NP-Hard       NP-Hard       NP-Hard       NP-Hard
proj alg       non-impl      non-impl      n^3           n^3/n

The Mugnier and Chein and Croitoru algorithms use SCGs, while Notio and the new algorithm work over full CGs. Looking back at the example shown when discussing the projection operation, Notio would only find one projection because it was only designed to look for a single projection graph. Croitoru's algorithm includes a stop mechanism such that the total number of relations in the query graph equals the number of possible projections; therefore, at times it may not find all projections, even though the actual algorithm should find all projections.

It is not clear from the Mugnier and Chein 1992 work [74] whether they can handle two concept pairs with the same relationship between them in a projection operation. However, later work [75] indicates that the same relationship between different concepts can be found and multiple projections are possible between two CGs; but the algorithm is based on SCGs, which do not use actors and are not directed graphs.

The Mugnier and Chein algorithm is also based on the relations found within the graph and must traverse all of their signatures to discover whether there is a subgraph morphism. The new algorithm is based on the conceptual units, or concepts, within the graph and can stop searching as soon as there is no match in the KB graph for one of the query graph's concepts or concept triples.

Mugnier and Chein's algorithm does the whole projection operation as a single injective projection algorithm, whereas Croitoru, Notio and the new algorithm all use some form of preprocessing. Notio and the new algorithm have a complete separation between the preprocessing algorithm and projection, while Croitoru uses the preprocessing algorithm inside the actual projection, therefore giving the same running time for both the overall algorithm and the actual projection. Notio does preprocessing at storage time that helps in constructing the projection. However, the actual projection search problem after the preprocessing is still NP-Hard.

The new algorithm splits the overall projection algorithm into two parts, matching and projection construction. Data structures are then used between these two algorithms to exploit the structure of the graphs in the projection process. In the most common case the matching algorithm is the longest-running part of the overall algorithm, because the actual projection construction is polynomial.
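A minimal sketch of this two-phase split is shown below. The function names, the toy type-conformance test, and the data shapes are invented for illustration; they are not taken from the CPE implementation.

```python
# Two-phase projection sketch: phase 1 (matching) collects candidate KB
# concepts for each query concept and stops early on any empty match list;
# phase 2 (construction) combines candidates into injective projections,
# which is the polynomial part once the candidate lists exist.
from itertools import product

def match_phase(query_concepts, kb_concepts, conforms):
    """Map each query concept to the KB concepts whose type conforms."""
    candidates = {}
    for q in query_concepts:
        candidates[q] = [k for k in kb_concepts if conforms(q, k)]
        if not candidates[q]:      # no possible match: stop searching at once
            return None
    return candidates

def construct_phase(candidates):
    """Yield every injective choice of one candidate per query concept."""
    keys = list(candidates)
    for combo in product(*(candidates[k] for k in keys)):
        if len(set(combo)) == len(combo):      # keep the mapping injective
            yield dict(zip(keys, combo))

conforms = lambda q, k: k.startswith(q)        # toy conformance relation
cands = match_phase(["Block"], ["Block:A", "Block:B", "Table:T"], conforms)
projections = list(construct_phase(cands))     # finds both Block projections
```

Finding all projections rather than one costs only the remaining iterations of the construction loop, which echoes the point made later about the cost of finding all valid projections.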

8.1.1 Strengths

All of these algorithms address decision problems that are in the class of NP-Complete and have search problems that are in the class of NP-Hard; therefore, where the strength of the algorithms comes into play is in how they handle ‘typical case’ situations where they would be used.

Since the number of database records and Semantic Web pages, and thus the amount of information available, increases yearly, algorithms that can work with knowledge bases containing large amounts of data will be at the forefront.

8.1.2 Weaknesses

Notio has a definite weakness in that it is actually designed to find only one possible projection graph even when others are available. Also, the SCG relation algorithm of Croitoru has a weakness in its stop mechanism, which should be modified to not exclude any possibilities.

The CPE algorithm could potentially take a large amount of time to process the matching, or preprocessing, algorithm. However, in most cases discussed within this work it does not operate at this end of the spectrum.

8.2 Data Structures and Algorithms Effectiveness Comparison for Implemented Algorithms

Since the Mugnier and Chein algorithm and the Croitoru algorithm were not implemented, the comparison will only be between the Notio and CPE algorithms. The pCG system that implemented the Notio algorithm used an added phase between storage and projection to impose internal structure on the stored graphs. Even though this added structural information helped the projection process when the graphs were small and the KBs held few graphs, as sizes and KBs increased this added phase became very costly in time.

The CPE algorithm, with the data structure change added, also showed some differences between when the graphs and KBs were small and when these elements increased.

8.2.1 Strengths

The pCG algorithm was efficient in its size and memory usage. The graph was stored in a very tight array and hash table structure. Many graphs could be processed before the available memory had to be increased in order to process the projections.

The CPE array data structure was also efficient in its size and memory usage. In fact, in all tests it never ran out of memory, even when the 5000-graph KB was resident.

The CPE hash table data structure had the advantage that it executed in linear time as the number of nodes in the query graph increased.

8.2.2 Weaknesses

With the pCG algorithm, the Assertion phase became a real ‘bottleneck’ as the number of graphs in the KB increased. Because it compared all the structures of all the graphs within the KB when asserting them, it took over an hour of actual execution time to assert the 73-node KB with both 2500 and 5000 graphs.

The CPE hash table implementation required a lot of resident memory for processing the large KBs. This was because it used 10000-element hash table indices to ensure that the labels for all elements (concepts, relations and triples) were unique. The implementation could have been changed to add an extra processing step to redo the unique identifiers after the graphs were stored, but it was not known how much execution time this would add to the process.

8.3 Significance of Work

Both in a typical scenario where the query graph is small in size (number of relationships between concepts) compared to the graphs in the knowledge base, and in actual execution tests, the new injective projection algorithm: 1) performs projections on full conceptual graphs, 2) finds all projections even when conceptual relations' rtypes are not unique, 3) performs the projections faster over a complete KB than the comparative system, and 4) gives good results when executing against a large KB (5000 graphs).

Data structure modifications, when directly integrated into the projection algorithm, produced significant improvements when executed over larger KBs with larger graphs within the KB.

8.3.1 Full Conceptual Graphs

Even though much work has been done with SCGs, full conceptual graphs with all their functionality are desirable. This new algorithm does not have the added restrictions of SCGs and can even process functional relations. Because there are cases of queries over time and space that require full CGs, this new algorithm is significant.

8.3.2 Finds All Valid Projections

This new algorithm finds all valid projections given a query. Because it is not known which projection from the KB graph may answer the needed information, it is necessary to produce all valid projections. Because of the data structure implementation, finding all valid projections does not really cost any more time than finding one.

8.3.3 Data Structure Integration in Algorithm over Large KB and Graphs

The perfect hash table implementation is more efficient with large KBs and large graphs within a KB, even though it requires much more storage space and memory allocation. Because the information within many standard databases is increasing with record information, and because one may desire to store semantic information from the Semantic Web, being able to handle large amounts of data and knowledge is critical. Being able to retrieve a projection onto this large KB is a significant improvement.

8.4 Future Work

As an extension to this dissertation's work, several lines of research can be continued. The work on the maximal join algorithm can be improved by using the information found in this work, and this algorithm analysis can be continued in collaboration with other researchers.

By evaluating the information about the use of the data structures, this work can help to develop new ideas for storing knowledge base meta-structures in relational databases, creating the ability to move factual information to a knowledge base and then return more information back to the original database.

As new benchmark graphs become available within the research community, more domains can be tested with this new algorithm and data structures. Time and space constraints can also be tested with the new algorithm, while adding heuristics to improve the constraint processing.

8.4.1 Experiments and Analysis of Maximal Join Algorithm

In Section 5.2.3, an algorithm was presented to describe the maximal join operation in the same terms as the new projection algorithm. Now that the author of the Amine Platform, Adil Kabbaj, wishes to make his system interoperate with other systems [56] and has implemented the full CG maximal join operation, the same modifications made to the Projection support routine (see Section 5.2.1.3) will be implemented, tested and analyzed. Amine can be used for comparison and to help validate the new algorithm.

8.4.2 KB Stored From and To Standard Relational DB

Investigation into storing SCGs from relational database records has begun [130]. Given that the new data structure used in storing the CG in this work is in a hash table format, this structure could be translated into a relational database record structure. This meta-data could then be used to store the full structure of the CG. Once the CG is stored in the database, retrieving it back to a knowledge base would be easy to construct.
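The idea can be sketched with an in-memory SQLite database. The table layout, column names, and example triples below are invented for illustration; the point is only that the (concept, relation, concept) triples behind the hash-table format map directly onto relational rows.

```python
# Speculative sketch: storing a CG's concept-relation-concept triples as
# rows of one relational table, then retrieving the graph with a query.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE cg_triple (
                  graph_id INTEGER, source TEXT,
                  relation TEXT, target TEXT)""")

# A small 'blocks world' CG: [Block: A] is on [Table: T1] and is red.
triples = [(1, "Block:A", "on", "Table:T1"),
           (1, "Block:A", "colour", "Red")]
db.executemany("INSERT INTO cg_triple VALUES (?, ?, ?, ?)", triples)

# Bringing the graph back into a knowledge base is a single SELECT.
rows = db.execute("SELECT source, relation, target FROM cg_triple "
                  "WHERE graph_id = ?", (1,)).fetchall()
```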

8.4.3 Time and Space Constraints

Constraints are divided into several groups. Some constraints work by modifying and/or evaluating the elements of the domain that are actually processed. Some constraints apply heuristics to decide what information seen during processing should continue to be considered. These heuristics may be very simple, such as whether the current domain element is TRUE or FALSE, maintaining basic truth, or they may be very complex. In constraint-satisfaction problems, quantification operations are used in a Prolog-like fashion to assign values and variables subject to a set of constraints [28]. Constraint specifications give a convenient form for expressing known knowledge while allowing the system designer to focus on local relationships among entities within the domain. The next sub-section will discuss heuristic constraints.

Other constraints are concerned with time and space relationships between the domain elements and between the actual conceptual units within the semantic network. These constraints use qualitative relationships to propagate over time and space. As discussed in the Qualitative Section (see 2.1.2.3), these are interval relationships that are set up “point to point”. In Figure 8.1, adapted from Allen's 1991 paper (p. 346) [4], seven of the basic interval relationships originally discussed in the 1983 Allen paper [3] are shown. There are six other relationships, the inverses of some of these, that are not depicted. In the sections below, these relationships will be discussed as they relate to time and space.

Figure 8.1: Interval Time Relationships.
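The seven depicted relationships can be written down as a small classifier over interval endpoints. This is a sketch based on the standard definitions from Allen's 1983 paper, with intervals represented as hypothetical (start, end) pairs; the six inverse relations mentioned above are obtained by swapping the two arguments.

```python
# Classify two intervals into one of Allen's seven basic relationships;
# anything else falls into one of the six inverse relations.
def allen_relation(a, b):
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2 and e1 < e2:
        return "starts"
    if s1 > s2 and e1 == e2:
        return "finishes"
    if s1 > s2 and e1 < e2:
        return "during"
    if s1 < s2 and s2 < e1 < e2:
        return "overlaps"
    return "inverse"               # one of the six inverse relations

assert allen_relation((1, 2), (3, 4)) == "before"
assert allen_relation((1, 3), (2, 4)) == "overlaps"
assert allen_relation((2, 3), (1, 4)) == "during"
```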

8.4.3.1 Heuristics

Heuristics are criteria, methods, or principles for deciding which among several alternative courses of action promises to be the most effective in order to achieve some goal [85]. The idea here is to define a simple criterion that discriminates between good and bad selections. One may choose a heuristic that is just a rule of thumb that guides a selection, or one may look to see if the outcome of applying a heuristic appears to lead to a “stronger” position. When one has good heuristics, they provide a simple means of indicating which course of action will lead to the preferred goal along a quick path, even if it is not the most effective [85]. In general, most reasoning problems are very complex and have large numbers of cases to evaluate to find an answer. Heuristics allow the number of cases to be reduced and a shorter, though maybe not the most direct, solution to be discovered within reasonable time constraints.

Heuristics use quantification operations to prune and shape the evaluation of information. The information may be checked with heuristics either for its feasibility to lead to a valuable solution or for its correctness in the world that the system knows as reality.

8.4.3.2 Time

Time intervals over moments in time are processed using qualitative relationships. Temporal reasoning requires that the knowledge representation be able to define and process a snapshot of time. A snapshot is a constraint in the time interval where only one moment in time, of zero duration, is processed. Dean and McDermott [27] saw time in terms of duration constraints. Figure 8.2 gives an example from the Dean and McDermott 1987 paper (p. 41) [27] of how a time map can be designed, seeing snapshots as point to point with the duration constraints encoded between the snapshots. It is like a time slice across the current states and schematics, which will be called a situation, of the objects being considered. The situations may be processed in a forward or backward direction of snapshots.

Figure 8.2: A Simple Time Map.

As can be seen in Figure 8.3, an interval can be set up with a start and end time and can be assigned as a property of an act. This property is the time duration of the interval and can be viewed on either a fixed time scale or a relative one. In Figure 8.3, relative time is used because the time line does not have actual time values. Time intervals, when used with a relative scale, require a way of knowing where to start investigating for information. Following the time map given, the ball is suspended, dropped, then falling; a choice is now made to continue this time stretch either with a bounce, rising, and stop, or with a roll and rolling. If the choice is the bounce, then as the stop interval finishes, the falling event will return. If the roll event occurs, there will be no circling back to the falling event. In many temporal reasoning systems, this is done by time indexing of time tokens at insertion of events; however, as discussed in the Mukerjee review [76], sometimes the “neat” durations are not available.

Now, as one looks at the full time interval, each time slice for an object can be seen as a constraint. If each of these constraints is viewed as its own act property, then the full time interval picture will be modified depending on which time slices are current and/or which constraints are satisfied.

Figure 8.3: Time Chart for a Bouncing Ball.

8.4.3.3 Space

Regions within space at a location are also processed using qualitative functionality. Unlike temporal reasoning, with its starting and ending times, space does not have a time line flow, but can be a multi-dimensional patchwork [76]. However, one can still look at regions that are space-sliced according to locations across processes and chronicles, but one does not get a concept of input and output to the spatial relationships [47].

In Figure 8.4, it can be seen that over time a ball that is bouncing appears in different locations (bounce heights) in space (left axis). When working with spatial constraints [12], the key is to find objects that fall into some spatially organized category, such as a region, and then sort according to the category. Because there is only one object in this example it is harder to see, but if two balls were bouncing, being dropped at different times, the constraints could be categorized by whether there are zero, one or two balls in the same space slice.

Figure 8.4: Conceptual Space Diagram for a Bouncing Ball.

8.4.4 Different Domain Problems and Interoperability

The architecture of the CPE system was specifically designed to address the need to communicate and interoperate with other systems [87, 89]. The next step is to work with multiple domains of information and start to connect the modules to as many applications as possible. Work has already been proceeding on using the CPE knowledge base as the “back-end” for a Story Understanding System [14, 88, 89].

APPENDIX A

PROGRAMMING LANGUAGE CRITERIA

A.1 Language Evaluation

Each of the applications/systems defined in Chapter 6 may or may not share the same implementation language. So an added complication, besides different internal data structures, is that an application may not be able to communicate at a function call level with another application because they are not written in the same implementation language.

When looking to design an application with flexible modules, the question arises of what implementation language should be used. Since it was desired that the system work with conceptual structures as the internal representation, existing CS systems were examined. First, the CS editors that were currently available were evaluated.

These editors, for editing CGs and FCAs, turned out to have different implementation languages. CharGer [29, 30] is based on the API/Implementation code of Notio, which is written in Java [117, 115]. ARCEdit is a plug-in to PowerPoint [96] and is written in Visual Basic 6.0. ToscanaJ [6, 7] has an editor as part of its suite of tools written in Java. While Docco is actually based on a Conceptual Email Manager [18], written in C++/QT, the commercial version of the manager [36] is a plug-in for Microsoft Outlook.

Since Notio, written in Java, is already defined at an API/Implementation code level and is available with extensible class definitions, the author considered using the Notio interface for the CGIF (see Section 3.4.4) module. However, there are some drawbacks in communicating with Java (see Section A.2.2), and Notio is in hiatus and is not currently being enhanced or developed [116].

All of the applications in the conceptual structures community employ different implementation languages and formats, such as Prolog, XML Schema, RDF, etc., which makes it difficult to pass even simple syntactic representation data by linking languages in modules. Files, streams, pipes, blackboards, etc. can be used to pass data without passing the actual data structures, but these mechanisms can be slow if there are a large number of graphs or the graphs are extremely complex. Every time one application process needs to talk to another, these mechanisms require multiple file descriptors to be opened. If the applications or systems execute on different machines, the Flexible Module Framework, FMF, architecture designed by John Sowa [124] is a flexible way of passing the syntactic data representation; but if the applications and systems can be executed on the same machine configuration, a good API/Implementation design is more advisable because the module can be linked directly into the existing application. Communication by files and other stream devices may require a locking mechanism to be set up, so that one application can know when it is safe to read the input graphs from another application. The locking of records can cause a problem when two applications communicate by way of shared databases or message-passing systems, such as MPI. If, on the other hand, an application/system can call another application/system directly (or can link to it), processing can go more quickly.

However, connecting systems when the implementation languages are not the same is more difficult, because a straightforward “call” to the other system’s functions is not always possible. Each language implementation has its own calling specifications.

In order for data to be transferred between working tools, either all the tools must be implemented within the same environment as a single application or there must be an interchange format. When tools are developed through a single system, the same data structure (or model) can be shared among all the tools so that data can be stored and retrieved. However, when tools are not part of the same system, they do not necessarily share the same internal data structure (or design model). To support interoperability for applications [101], an interchange format must be defined. This interchange format must be agreed upon by the whole working community. When this standard format is used to move data between applications, standard benchmark tests can be developed.

One module concentrated on, with respect to data structures for the processors, is the use of the Conceptual Graph Interchange Format, CGIF, for communication [122, 126]. This is not to say that a processor is constrained in its internal structures or implementation by CGIF, but it makes sense to examine the correspondence between CGIF and the appropriate data types with a view to minimizing the difficulties of parsing and generating CGIF syntax [87]. A definition of the actual CGIF syntax and semantic interpretation used for the Conceptual Graphs data structure can be seen in Section 3.4.4.

A.1.1 Visual Basic .Net

If the component modules needed to be available only for execution under the Microsoft Windows OS, Visual Basic .Net would not have any of the connection or interface problems discussed above. However, this would tie the component implementation to the Microsoft Windows OS; if the modules are not implemented in Visual Basic .Net, then they can be made more widely available under Linux operating systems and eventually under other operating systems such as Unix.

A.1.2 JavaTM

Java has a very nice visual interface and can be transferred to many different operating systems. However, it is an interpreted language and takes more time to execute than a language that is native to the machine.

A.1.3 C

C was originally designed to be an operating system base language. Like Visual Basic .Net, it is tied to the OS that it is running under. Because of this, it is much faster than interpreted languages like Java, but is not as portable. It is also designed to be coded “bottom up”, such that routines are built into libraries and then called from an overall application.

A.1.4 C++

C++ is an Object-Oriented Programming (OOP) language, an evolutionary extension of the C language, developed by Bjarne Stroustrup [127]. Even though it accepts the C syntax, it improves on many features of the language. In particular, programs written in C++ can be coded “top down” by designing what objects are needed within the program and then how they relate. The actual code comes directly from the design and specification of the program instead of linking existing routines together.

A.2 Language Comparison

In order to know which language would be best for implementing the new environment’s modular components, so that they could be used directly by other applications, an evaluation was performed of how the C++ language interacts, interfaces, and communicates with other languages.

A.2.1 C++ to C

Interfacing implementation code between languages that are somewhat similar, for example C++ and C, is not as difficult as other communications between languages. However, this connection may not be bidirectional. The calling sequence for the C language is simpler than for the C++ language, because C++ does name mangling using the name of the function, the types of the arguments, and the return type of the function. C does not do the same name mangling and uses only a modified form of the actual name of the function.

Therefore, when designing an API in C++, the interface routines should be exported as “C” functions as opposed to being methods of a class in C++; this prevents the routines from being name mangled by the C++ compiler. Wrapping C++ with standard C routines allows the internal implementation of the module to remain C++ and use the classes and methods functionality of C++, while at the same time using the simpler formulation of the calling routine’s name provided by C.
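The wrapping pattern just described can be sketched as follows; the `KB` class and the `kb_*` function names are illustrative stand-ins, not the actual CPE API:

```cpp
#include <cstddef>

// The internal implementation stays idiomatic C++ (a class with methods).
class KB {
public:
    void addGraph() { ++count_; }
    std::size_t size() const { return count_; }
private:
    std::size_t count_ = 0;
};

// The exported interface is declared extern "C", so these symbols are not
// name-mangled and can be called from C (or any language with a C FFI).
// A pure C header would declare KB as an opaque struct pointer.
extern "C" {
    KB* kb_create() { return new KB; }
    void kb_add_graph(KB* kb) { kb->addGraph(); }
    std::size_t kb_size(const KB* kb) { return kb->size(); }
    void kb_destroy(KB* kb) { delete kb; }
}
```

A C caller never sees the class; it only links against the unmangled `kb_*` symbols and manipulates the knowledge base through the opaque pointer.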

A.2.2 C++ to JavaTM

Connecting C++ to Java is also possible, but is more difficult than communicating with C. This connection is also not bidirectional, but for different reasons. Java is a simpler language than C++ [102], but it is an interpreted language. This means that Java can be byte-compiled, creating a smaller file to be moved across the web, but it is not compiled to machine code. This allows Java to be platform independent; C++ is a compiled language and is both platform dependent and operating system (OS) dependent. However, because Java is interpreted instead of compiled, it cannot be linked and called directly by a system (application) that is not written in Java. Java must start the process, and then can call compiled code in some of the other compiled languages (for example, C/C++). Therefore, Java can call interface functions written in C or C++, but C++ cannot call Java directly.

A.2.3 C++ to Prolog

Connecting C++ to Prolog is very similar to connecting C++ to Java. Prolog has foreign function routines for calling C++/C functions. Also, when using a particular operating system and version of Prolog, communication may be provided by the Prolog system (for example, Amzi! Prolog Logic Server and Microsoft C++) for integrating C++ and Prolog routines. However, in general, like Java and Lisp (not discussed in this paper), Prolog must start the process of executing the system and then call to the C++ routines, but C++ cannot call directly to Prolog.

A.2.4 C++ to Visual Basic 6.0

The connection or interface from C++ to Visual Basic 6.0 is the most difficult connection among the four languages discussed in this paper. One reason is that Visual Basic 6.0 is a two-part language: an event-driven module part and a class module part. The class module part is very similar to C++ and holds the characteristics that are available in object-oriented languages. Class modules can also be compiled, just like C++, to native code for the machine. However, the event-driven part executes Basic code in response to an event. These event-driven procedures (routines) are triggered by a form or control which is hooked into the visual part of the language. The triggering of a routine by an event is similar to the interpretation of a function call in languages like Java. Because of the event-driven part of the language, Visual Basic 6.0 can call C++ or C API/Interface routines, but C++/C cannot trigger an event within the Visual Basic code, so the event procedures (routines) are not executed outside of Visual Basic code.

A second reason Visual Basic 6.0 is difficult to connect is that it has different encoding of some of its data types than C, C++, or Java [105]. Character data is stored in more bits by Visual Basic than by C. Therefore, to pass a character string as a parameter from Visual Basic to C or vice versa, the character string must be converted to Unicode first, passed as a parameter, and then decoded from Unicode at the other end. This makes passing character data much more cumbersome. Also, Visual Basic defines different Boolean values than C; the “false” value is 0 in both languages, but the “true” value for Visual Basic is -1 (negative) where in C it is 1 (positive). Therefore, in passing Boolean values, the user must be careful when working with conditionals.
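A defensive way to handle the Boolean mismatch on the C/C++ side is to treat any nonzero value as true instead of comparing against 1. A minimal sketch — the `VB_TRUE` constant and the helper names are invented for illustration, not part of any VB interface library:

```cpp
// Visual Basic 6.0 encodes True as -1 and False as 0; C conventionally
// uses 1 and 0. Comparing a VB Boolean against 1 would misclassify True,
// so normalizing with "nonzero means true" is safe in both directions.
const int VB_TRUE  = -1;   // value VB 6.0 passes for True (assumption: raw int)
const int VB_FALSE = 0;

bool from_vb_bool(int vb) {
    return vb != 0;        // -1 (VB True) and 1 (C true) both map to true
}

int to_vb_bool(bool b) {
    return b ? VB_TRUE : VB_FALSE;
}
```

With this normalization, a conditional such as `if (from_vb_bool(flag))` behaves identically whether the flag came from Visual Basic or from C code.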

APPENDIX B

DOCUMENTATION OF CGIF - VERSION 2001

B.1 Added Definitions For CGIF Categories

Context.

A context C is a concept whose designator is a nonblank conceptual graph g.

• The graph g is said to be immediately nested in C, and any concept c of g is said to be immediately nested in C.

• A concept c is said to be nested in C, if either c is immediately nested in C or c is immediately nested in some context D that is nested in C.

• Two concepts c and d are said to be co-nested if either c = d or there is some context C in which c and d are immediately nested.

• If a concept x is co-nested with a context C, then any concept nested in C is said to be more deeply nested than x.

• A concept d is said to be within the scope of a concept c if either d is co-nested with c or d is more deeply nested than c.

Coreference Set.

A coreference set C in a conceptual graph g is a set of one or more concepts selected from g or from graphs nested in contexts of g.

• For any coreference set C, there must be one or more concepts in C, called the dominant concepts of C, which include all concepts of C within their scope. All dominant concepts of C must be co-nested.

• If a concept c is a dominant concept of a coreference set C, it may not be a member of any other coreference set.

• A concept c may be a member of more than one coreference set C1, C2, ... provided that c is not a dominant concept of any Ci.

• A coreference set C may consist of a single concept c, which is then the dominant concept of C.

Referent.

Adding to the definition already seen in Definition 3.4.2, a referent of a concept is specified by a quantifier, a designator, and a descriptor.

• Quantifier. A quantifier is one of two kinds: existential or defined.

• Designator. A designator is one of three kinds:

1. literal, which may be a number, a string, or an encoded literal;

2. locator, which may be an individual marker, an indexical, or a name;

3. undetermined.

• Descriptor. A descriptor is a conceptual graph, possibly blank, which is said to describe the referent.

B.2 Lexical Categories

The CGIF lexical categories can be recognized by a finite-state tokenizer or preprocessor. No characters of white space (blanks or other nonprinting characters) are permitted inside any lexical item other than delimited strings (names, comments, or quoted strings). Zero or more characters of white space may be inserted or deleted between any lexical categories without causing an ambiguity or changing the syntactic structure of CGIF. The only white space that should not be deleted is inside delimited strings.

Comment.

A comment is a delimited string with a semicolon ";" as the delimiter.

Comment ::= DelimitedStr(";")

DelimitedStr(D).

A delimited string is a sequence of two or more characters that begin and end with a single character D called the delimiter. Any occurrence of D other than the first or last character must be doubled.

DelimitedStr(D) ::= D (AnyCharacterExcept(D) | D D)* D
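A hand-written scanner clause for this production might look like the following sketch; the function name `scanDelimited` and the use of `std::optional` are illustrative, not part of any CGIF tool:

```cpp
#include <optional>
#include <string>

// Scan a delimited string starting at position `pos` in `src`.
// The delimiter `d` opens and closes the string; any interior
// occurrence of `d` must be doubled, and a doubled delimiter is
// undoubled in the returned contents.
std::optional<std::string> scanDelimited(const std::string& src,
                                         std::size_t pos, char d) {
    if (pos >= src.size() || src[pos] != d) return std::nullopt;
    std::string out;
    for (std::size_t i = pos + 1; i < src.size(); ++i) {
        if (src[i] != d) { out += src[i]; continue; }
        if (i + 1 < src.size() && src[i + 1] == d) {  // doubled delimiter
            out += d;
            ++i;
            continue;
        }
        return out;                 // single delimiter closes the string
    }
    return std::nullopt;            // ran off the end: unterminated
}
```

For example, scanning the name `'Don''t'` with delimiter `'` yields the contents `Don't`, and the same routine handles comments (delimiter `;`) and quoted strings (delimiter `"`).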

Exponent.

An exponent is the letter E in upper or lower case, an optional sign ("+" or "-"), and an unsigned integer.

Exponent ::= ("e" | "E") ("+" | "-")? UnsignedInt

Floating.

A floating-point number is a sign ("+" or "-") followed by one of three options: (1) a decimal point ".", an unsigned integer, and an optional exponent; (2) an unsigned integer, a decimal point ".", an optional unsigned integer, and an optional exponent; or (3) an unsigned integer and an exponent.

Floating ::= ("+" | "-") ("." UnsignedInt Exponent? | UnsignedInt ("." UnsignedInt? Exponent? | Exponent ) )
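As a cross-check of the rule above, the production can be transliterated into a regular expression; this sketch is illustrative only, and the grammar itself remains normative. Note that the sign is mandatory, so a bare signed integer such as "+2" does not match: it must carry a decimal point or an exponent.

```cpp
#include <regex>
#include <string>

// Floating ::= ("+" | "-") ("." UnsignedInt Exponent?
//                          | UnsignedInt ("." UnsignedInt? Exponent? | Exponent))
bool isFloating(const std::string& s) {
    static const std::regex re(
        R"([+-](?:\.\d+(?:[eE][+-]?\d+)?|\d+(?:\.\d*(?:[eE][+-]?\d+)?|[eE][+-]?\d+)))");
    return std::regex_match(s, re);
}
```

Strings such as "+3.14e2", "-.5", and "+2E10" match; "3.14" (no sign) and "+2" (no point or exponent) do not.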

Identifier.

An identifier is a string beginning with a letter or underscore "_" and continuing with zero or more letters, digits, or underscores.

Identifier ::= (Letter | "_") (Letter | Digit | "_")*

Integer.

An integer is a sign ("+" or "-") followed by an unsigned integer.

Integer ::= ("+" | "-") UnsignedInt

Name.

A name is a delimited string with a single quote "’" as the delimiter.

Name ::= DelimitedStr("'")

Number.

A number is an integer or a floating-point number.

Number ::= Floating | Integer

QuotedStr.

A quoted string is a delimited string with a double quote ’"’ as the delimiter.

QuotedStr ::= DelimitedStr('"')

UnsignedInt.

An unsigned integer is a string of one or more digits.

UnsignedInt ::= Digit+

B.3 Syntactic Categories

The CGIF syntactic categories are defined by a context-free grammar that can be processed by a recursive-descent parser. Zero or more characters of white space (blanks or other nonprinting characters) are permitted between any two successive constituents of any grammar rule that defines a syntactic category.

Actor.

An actor begins with "<" followed by a type. It continues with zero or more input arcs, a separator "|", zero or more output arcs, and an optional comment. It ends with ">".

Actor ::= "<" Type(N) Arc* "|" Arc* Comment? ">"

The arcs that precede the vertical bar are called input arcs, and the arcs that follow the vertical bar are called output arcs. The valence N of the actor type must be equal to the sum of the number of input arcs and the number of output arcs.

Arc.

An arc is a concept or a bound label.

Arc ::= Concept | BoundLabel

BoundLabel.

A bound label is a question mark “?” followed by an identifier.

BoundLabel ::= "?" Identifier

CG.

A conceptual graph is a list of zero or more concepts, relations, actors, special contexts, or comments.

CG ::= (Concept | Relation | Actor | SpecialContext | Comment)*

The alternatives may occur in any order, provided that any bound coreference label occurs later in the CGIF stream than, and within the scope of, the defining label that has an identical identifier. The definition permits an empty CG, which contains nothing.

An empty CG, which says nothing, is always true.
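As noted at the start of this section, the syntactic categories are designed to be processed by a recursive-descent parser. The sketch below recognizes only a drastically simplified fragment — concepts with an optional coreference label and relations over concept or bound-label arcs — and omits types with valence checking, referents, contexts, actors, and comments. It illustrates the parsing style; it is not a conformant CGIF parser:

```cpp
#include <cctype>
#include <string>

// Minimal recursive-descent recognizer for a simplified CGIF subset:
//   CG       ::= (Concept | Relation)*
//   Concept  ::= "[" Identifier (("*" | "?") Identifier)? "]"
//   Relation ::= "(" Identifier Arc* ")"
//   Arc      ::= Concept | "?" Identifier
class MiniCGParser {
public:
    explicit MiniCGParser(std::string s) : src(std::move(s)) {}

    bool parseCG() {                       // an empty CG is accepted (true)
        skipWS();
        while (pos < src.size()) {
            if (src[pos] == '[')      { if (!parseConcept())  return false; }
            else if (src[pos] == '(') { if (!parseRelation()) return false; }
            else return false;
            skipWS();
        }
        return true;
    }

private:
    std::string src;
    std::size_t pos = 0;

    void skipWS() {
        while (pos < src.size() && std::isspace((unsigned char)src[pos])) ++pos;
    }

    bool parseIdentifier() {               // letter or "_", then alnum or "_"
        if (pos >= src.size() ||
            !(std::isalpha((unsigned char)src[pos]) || src[pos] == '_'))
            return false;
        while (pos < src.size() &&
               (std::isalnum((unsigned char)src[pos]) || src[pos] == '_')) ++pos;
        return true;
    }

    bool parseConcept() {
        ++pos;                             // consume '['
        skipWS();
        if (!parseIdentifier()) return false;   // type label
        skipWS();
        if (pos < src.size() && (src[pos] == '*' || src[pos] == '?')) {
            ++pos;
            if (!parseIdentifier()) return false;  // coreference label
            skipWS();
        }
        if (pos >= src.size() || src[pos] != ']') return false;
        ++pos;
        return true;
    }

    bool parseRelation() {
        ++pos;                             // consume '('
        skipWS();
        if (!parseIdentifier()) return false;   // relation type
        skipWS();
        while (pos < src.size() && src[pos] != ')') {
            if (src[pos] == '[')      { if (!parseConcept()) return false; }
            else if (src[pos] == '?') { ++pos; if (!parseIdentifier()) return false; }
            else return false;
            skipWS();
        }
        if (pos >= src.size()) return false;
        ++pos;                             // consume ')'
        return true;
    }
};
```

Each grammar rule becomes one member function, and the one-character lookahead ("[" versus "(") selects the alternative, which is exactly the property that makes the full grammar suitable for recursive descent.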

CGStream.

A conceptual graph stream is defined as a sequence of one or more CGs, each separated by a period.

CGStream ::= CG ("." CG)*

Since a CG may itself be empty, the string "....." would also qualify as a CG stream, as would an empty file.

Concept.

A concept begins with a left bracket "[" and an optional monadic type followed by optional coreference links and an optional referent in either order. It ends with an optional comment and a required "]".

Concept ::= "[" Type(1)? {CorefLinks?, Referent?} Comment? "]"

If the type is omitted, the default type is Entity. This rule permits the coreference labels to come before or after the referent. If the referent is a CG that contains bound labels that match a defining label on the current concept, the defining label must precede the referent.

Conjuncts.

A conjunction list consists of one or more type terms separated by "&".

Conjuncts(N) ::= TypeTerm(N) ("&" TypeTerm(N))*

The conjunction list must have the same valence N as every type term.

CorefLinks.

Coreference links are either a single defining coreference label or a sequence of zero or more bound labels.

CorefLinks ::= DefLabel | BoundLabel*

If a dominant concept node (as defined in subsection B.1) has any coreference label, it must be either a defining label or a single bound label that has the same identifier as the defining label of some co-nested concept.

DefLabel.

A defining label is an asterisk “*” followed by an identifier.

DefLabel ::= "*" Identifier

The concept in which a defining label appears is called the defining concept for that label; a defining concept may contain at most one defining label and no bound coreference labels. Any defining concept must be a dominant concept as defined in subsection B.1.

Every bound label must be resolvable to a unique defining coreference label within the same context or some containing context. When conceptual graphs are imported from one context into another, however, three kinds of conflicts may arise:

1. A defining concept is being imported into a context that is within the scope of another defining concept with the same identifier.

2. A defining concept is being imported into a context that contains some nested context that has a defining concept with the same identifier.

3. Somewhere in the same module there exists a defining concept whose identifier is the same as the identifier of the defining concept that is being imported, but neither concept is within the scope of the other.

In cases (1) and (2), any possible conflict can be detected by scanning no further than the right bracket "]" that encloses the context into which the graph is being imported.

Therefore, in those two cases, the newly imported defining coreference label and all its bound labels must be replaced with an identifier that is guaranteed to be distinct. In case (3), there is no conflict that could affect the semantics of the conceptual graphs or any correctly designed CG tool; but since a human reader might be confused by the similar labels, a CG tool may replace the identifier of one of the defining coreference labels and all its bound labels.

Descriptor.

A descriptor is a structure or a nonempty CG.

Descriptor ::= Structure | CG

A context-free rule, such as this, cannot express the condition that a CG is only called a descriptor when it is nested inside some concept.

Designator.

A designator is a literal, a locator, or a quantifier.

Designator ::= Literal | Locator | Quantifier

Disjuncts.

A disjunction list consists of one or more conjunction lists separated by "|".

Disjuncts(N) ::= Conjuncts(N) ("|" Conjuncts(N))*

The disjunction list must have the same valence N as every conjunction list.

FormalParameter.

A formal parameter is a monadic type followed by an optional defining label.

FormalParameter ::= Type(1) DefLabel?

The defining label is required if the body of the lambda expression contains any matching bound labels.

Indexical.

An indexical is the character “#” followed by an optional identifier.

Indexical ::= "#" Identifier?

The identifier specifies some implementation-dependent method that may be used to replace the indexical with a bound label.

IndividualMarker.

An individual marker is the character “#” followed by an integer.

IndividualMarker ::= "#" UnsignedInt

The integer specifies an index to some entry in a catalog of individuals.

LambdaExpression(N).

A lambda expression begins with "(" and the keyword "lambda"; it continues with a signature and a conceptual graph, and it ends with ")".

LambdaExpression(N) ::= "(" "lambda" Signature(N) CG ")"

A lambda expression with N formal parameters is called an N-adic lambda expression.

The simplest example, represented "(lambda ())", is a 0-adic lambda expression with a blank CG.

Literal.

A literal is a number or a quoted string.

Literal ::= Number | QuotedStr

Locator.

A locator is a name, an individual marker, or an indexical.

Locator ::= Name | IndividualMarker | Indexical

Negation.

A negation begins with a tilde “~” and a left bracket “[”, followed by a conceptual graph and a right bracket “]”.

Negation ::= "~[" CG "]"

A negation is an abbreviation for a concept of type Proposition with an attached relation of type Neg. It has a simpler syntax, which does not permit coreference labels or attached conceptual relations. If such options are required, the negation can be expressed by the unabbreviated form with an explicit Neg relation.

Quantifier.

A quantifier consists of an at sign “@” followed by an unsigned integer or an identifier and an optional list of zero or more quoted strings enclosed in braces.

Quantifier ::= "@" (UnsignedInt | Identifier ("{" QuotedStr ("," QuotedStr)* "}")?)

The symbol @some is called the existential quantifier, and the symbol @every is called the universal quantifier. If the quantifier is omitted, the default is @some.

Referent.

A referent (see subsection B.1 for added definitions) consists of a colon “:” followed by an optional designator and an optional descriptor in either order.

Referent ::= ":" {Designator?, Descriptor?}

Relation.

A conceptual relation begins with a left parenthesis “(” followed by an N-adic type, N arcs, and an optional comment. It ends with a right parenthesis “)”.

Relation ::= "(" Type(N) Arc* Comment? ")"

The valence N of the relation type must be equal to the number of arcs.

Signature.

A signature is a parenthesized list of zero or more formal parameters separated by commas.

Signature ::= "(" (FormalParameter ("," FormalParameter)*)? ")"

SpecialConLabel.

A special context label is one of five identifiers: “if”, “then”, “either”, “or”, and “sc”, in either upper or lower case.

SpecialConLabel ::= "if" | "then" | "either" | "or" | "sc"

The five special context labels and the two identifiers "else" and "lambda" are reserved words that may not be used as type labels.

SpecialContext.

A special context is either a negation or a left bracket, a special context label, a colon, a CG, and a right bracket.

SpecialContext ::= Negation | "[" SpecialConLabel ":" CG "]"

Structure.

A structure consists of an optional percent sign “%” and identifier followed by a list of zero or more arcs enclosed in braces.

Structure ::= ("%" Identifier)? "{" Arc* "}"

Type.

A type is a type expression or an identifier other than the reserved labels: "if", "then", "either", "or", "sc", "else", "lambda".

Type(N) ::= TypeLabel(N) | TypeExpression(N)

A concept type must have valence N=1. A relation type must have valence N equal to the number of arcs of any relation or actor of that type. The type label or the type expression must have the same valence as the type.

TypeExpression.

A type expression is either a lambda expression or a disjunction list enclosed in parentheses.

TypeExpression(N) ::= LambdaExpression(N) | "(" Disjuncts(N) ")"

The type expression must have the same valence N as the lambda expression or the disjunction list.

TypeLabel(N).

A type label is an identifier.

TypeLabel(N) ::= Identifier

The type label must have an associated valence N.

TypeTerm.

A type term is an optional tilde “~” followed by a type.

TypeTerm(N) ::= "~"? Type(N)

The type term must have the same valence N as the type.

Example.

When transforming the English phrase A person is between a rock and a hard place, the display format, DF, of the phrase in CG notation can be seen in Figure B.1. Following is a translation of Figure B.1 from DF to CGIF:

(Betw [Rock] [Place *x1] [Person]) (Attr ?x1 [Hard])

For more compact storage and transmission, all white space not contained in comments or enclosed in quotes may be eliminated:

(Betw[Rock][Place*x1][Person])(Attr?x1[Hard])
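The white-space elimination rule can be sketched as a small filter that copies everything inside delimited strings verbatim and drops white space elsewhere. The function name is invented, and the sketch assumes well-formed input:

```cpp
#include <cctype>
#include <string>

// Remove all white space from a CGIF string except inside delimited
// strings (names '...', quoted strings "...", comments ;...;).
// A doubled interior delimiter is seen as close-then-reopen, which
// still leaves the "inside" state correct afterward.
std::string compactCGIF(const std::string& src) {
    std::string out;
    char delim = '\0';                 // currently open delimiter, if any
    for (char c : src) {
        if (delim != '\0') {           // inside a delimited string: copy all
            out += c;
            if (c == delim) delim = '\0';
        } else if (c == '\'' || c == '"' || c == ';') {
            delim = c;                 // opening a delimited string
            out += c;
        } else if (!std::isspace(static_cast<unsigned char>(c))) {
            out += c;                  // ordinary character
        }
    }
    return out;
}
```

Applied to the first translation above, this produces exactly the compact form shown.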


Figure B.1: The Display Format for ‘A person is between a rock and a hard place.’

This translation takes the option of nesting all concept nodes inside the conceptual relation nodes. A logically equivalent translation, which uses more coreference labels, moves the concepts outside the relations:

[Rock *x1] [Place *x2] [Person *x3] (Betw ?x1 ?x2 ?x3)

[Hard *x4] (Attr ?x2 ?x4)

The concept and relation nodes may be listed in any order provided that every bound label follows the defining node for that label.

APPENDIX C

DOCUMENTATION OF SYSTEMS

C.1 pCG (CGP Programs)

In order to use pCG to test the projection operation that was found at the Notio level, a ‘cgp’ program file had to be generated. Table C.1 shows the cgp programs that correspond to each of the tests given in section 7.2.1 of Chapter 7.

Table C.1: CGP Program Files.

# of Graphs   CGP Program Filename   # of KB files   # of Queries
5             graphs_5.cgp           6               2
11            graphs_11.cgp          6               5
21            graphs_21.cgp          6               6
31            graphs_31.cgp          6               8
53            graphs_53.cgp          6               10
73            graphs_73.cgp          6               12

The program file contained instructions to the pCG processor on what functions are to be executed and what information is to be retrieved. Figures C.1, C.2, C.3, and C.4 contain an example of one of the cgp program files.

# A test of the final graphs. Reads a final graph file, asserts
# each non type hierarchy related graph into pCG's top-level
# knowledge base, and projects a filter over all these graphs,
# returning matches and sending them to standard output.

# Use a June 2001 CG Standard conformant CGIF parser and
# generator. The current (0.2.2) Notio defaults are based upon
# an older version of the standard.
option cgifparser = "cgp.translators.CGIFParser";
option cgifgen = "cgp.translators.CGIFGenerator";

# Get the file path separator for the current operating system.
sep = (_ENV.member("file.separator"))[2];

# Final file names.
graphFileNames = {"graphs_53_1.cgf", "graphs_53_5.cgf",
    "graphs_53_100.cgf", "graphs_53_1000.cgf",
    "graphs_53_2500.cgf", "graphs_53_5000.cgf"};

Figure C.1: Part 1: Example of CGP Program from pCG.

The cgp program is broken into several parts so that it can be displayed in several figures. The first part indicates the parser and translator format being used by the CGP program; this may be either the Notio original format or the CGIF format from 2001 [126]. The next section in this piece of the program indicates the KBs that should be tested. Part 2 indicates the query graphs that will be tested against the KB.

The third part examines the parameters that the CGP program will be using to select the correct KB, the query graph to be tested, and the number of times to run the test. It then reads in the KB from the indicated file and runs the “Assertion” phase of the CGP program file to set up the knowledge base for the pCG system. The fourth part is the actual running of the projection algorithm and the printing of the time results.

# filter graphs
graphFilters = {
`(ATTR1 [Block] [Color])`,
`(ATTR1 [Block*b] [Color])(NAME1 ?b [Number])`,
`(Above1 [Block*b2] [Block*b1])(OnTable1 ?b1 [Table]) (ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])`,
`(Above1 [Block*b2] [Block*b1])(OnTable1 ?b1 [Table]) (ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(ATTR2 ?b2 [Color])`,
`(Above1 [Block*b2] [Block*b1])(OnTable1 ?b1 [Table]) (ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape]) (LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])`,
`(Above1 [Block*b2] [Block*b1])(OnTable1 ?b1 [Table]) (ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape]) (LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number]) (CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])`,
`(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2) (OnTable1 ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number]) (CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color]) (NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place]) (ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])`,
`(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2) (OnTable1 ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])`,
`(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2) (OnTable1 ?b1 [Table*t1])(OnTable2 [Block*b4] ?t1) (Above3 [Block*b5] ?b4)(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number]) (CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color]) (NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place]) (ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape]) (LOC3 ?b3 [Place])(ATTR4 ?b4 [Color])(NAME4 ?b4 [Number]) (ATTR5 ?b5 [Color])(NAME5 ?b5 [Number])`,
`(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2)(OnTable1 ?b1 [Table*t1])(OnTable2 [Block*b4] ?t1)(Above3 [Block*b5] ?b4) (NAMET ?t1 [Number])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number]) (CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])(ATTR4 ?b4 [Color])(NAME4 ?b4 [Number])(CHRC4 ?b4 [Shape])(LOC4 ?b4 [Place])(ATTR5 ?b5 [Color])(NAME5 ?b5 [Number])(CHRC5 ?b5 [Shape])(LOC5 ?b5 [Place])`};

Figure C.2: Part 2: Example of CGP Program from pCG.

# Get optional graph file number from command-line.
# Defaults to 1.
gFileNum = 1;
if _ARGS.length > 0 and _ARGS.length <= 3 then
  gFileNum = (_ARGS[1]).toNumber();
end
if gFileNum > 6 or gFileNum < 1 then
  exit "Invalid file number.";
end
graphFileName = graphFileNames[gFileNum];
gFNum = 1;
if _ARGS.length > 1 and _ARGS.length <= 3 then
  gFNum = (_ARGS[2]).toNumber();
end
if gFNum > 10 or gFNum < 1 then
  exit "Invalid filter number.";
end
tNum = 1;
if _ARGS.length > 2 and _ARGS.length <= 3 then
  tNum = (_ARGS[3]).toNumber();
end
# open the CGF file
newF = file ("examples" + sep + "projection" + sep + graphFileName);
# open to get timings
u = new Util;
# Read and assert the graphs.
println "*** Asserting graphs into KB...";
println "";
startfull = u.getCurrentTimeInMillis();
graphs = newF.readGraphStream();
newF.close();
endtime1 = u.getCurrentTimeInMillis();
println "Storage time is " + (endtime1 - startfull) + " ";
startassert = u.getCurrentTimeInMillis();
t = 0;
foreach g in graphs do
  rels = g.relations;
  t.inc();
  if rels.member("GT") is undefined then
    assert g;
  end
end
endtime2 = u.getCurrentTimeInMillis();
print "Assert time is " + (endtime2 - startassert);
println " for " + t + " graphs.";

Figure C.3: Part 3: Example of CGP Program from pCG.

n = 0;
while n < tNum do
  projections = {};
  # Retrieve all graphs in the outer context containing an
  # OnTable relation.
  filter = graphFilters[gFNum];
  # print "Result of projecting " + filter
  # println " onto asserted graphs...";
  # println "";
  t = 0;
  startpart = u.getCurrentTimeInMillis();
  foreach g in _KB.graphs do
    h = g.project(filter);
    if not (h is undefined) then
      if (tNum <= 1) then
        endtimefull = u.getCurrentTimeInMillis();
        println h;
      end
      t.inc();
      projections.append(h);
    end
    if h is undefined then
      t.inc();
      println "Not found graph is number " + t + ".";
    end
  end
  if (tNum > 1) then
    endtimefull = u.getCurrentTimeInMillis();
  end
  endpart = (endtimefull - startpart);
  print "Actual Projection time is " + endpart + " for ";
  println t + " graphs";
  n.inc();
end

Figure C.4: Part 4: Example of CGP Program from pCG.

C.2 CP Environment, CPE

The CP Environment has both module documentation showing how the DDLs are designed and class documentation for some of the data structures and functions found in the classes of the systems.

C.2.1 CPE Module Documentation

The module documentation gives the top-level system API functions and the internal general functions for both the CPE and CG modules. The CPE module contains the most general routines available in the CPE system, and the CG module holds the basic data structure for the conceptual graphs that store the knowledge base.

C.2.1.1 CP_Graph Reasoning Operations

These functions perform the basic reasoning operations from the API:

• CPE_API CPLPGraphs STDCALL CPE_projectionUnique (void)

– Note: returns only one projection, even if more than one is available.

• CPE_API CPLPGraphs STDCALL CPE_projection (void)

– Projects the current query graph onto the current knowledge base.

C.2.1.2 CP_Graph Reasoning Internal Operations

These functions perform the actual internal reasoning operations:

• CPLPGraphs cp_ops::CProjection (CPLPGraph, CPLPGraph)

– Actual graph to graph projection.

• CPLPGraphs cp_ops::CProject (CPLPKB, CPLPGraph)

– Knowledge base to query graph projection which processes all the graphs in the KB.

• BOOLEAN cp_ops::get_onlyOne (void)

– Check to see if only one projection needs to be found per KB graph.

• void cp_ops::set_onlyOne (BOOLEAN)

– Set if only one projection needs to be found per KB graph.

• CPLPGraph cp_ops::add_toornewprojections (CGLPChar, CGLPChar, CGLPChar, CGLPChar, CGLPChar, CGLPNElement, CPLPGraph, CPLPGraph, CPLPGraphs)

– Check new matching concept or new projection line from the current matching concept.

• BOOLEAN cp_ops::add_toexistprojections (CGLPChar, CGLPChar, CGLPNElement, CPLPGraph, CPLPGraphs)

– Add the new query triple match to all related projection graphs.

• BOOLEAN cp_ops::add_copyprojections (CGLPChar, CGLPChar, CGLPNElement, CPLPGraph, CPLPGraph, CPLPGraphs)

– Make a copy of a projection graph and add in the new triple for next-round processing.

• BOOLEAN cp_ops::process_querytriple (CGLPChar, CGLPChar, CGLPChar, CGLPNElement, CGLPCStr, CPLPGraph, CPLPGraphs, CPLPGraphs)

– Process multiple elements into the anchorlist when not on the first concept in the query graph.
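The split between CProject (which drives over every graph in the KB) and CProjection (which projects onto a single graph), together with the onlyOne flag behind CPE_projectionUnique, can be sketched as follows. This is an illustrative Python sketch, not the CPE C++ source; the Graph class and its find_matches method are hypothetical stand-ins for the actual matcher.

```python
class Graph:
    """Stand-in for a CPLPGraph; matches are precomputed for illustration."""
    def __init__(self, matches):
        self._matches = matches

    def find_matches(self, query_graph):
        # Hypothetical matcher: yields each projection of query_graph
        # found in this graph.
        return list(self._matches)


def c_projection(kb_graph, query_graph, only_one=False):
    """Sketch of CProjection: project query_graph onto one KB graph."""
    found = []
    for match in kb_graph.find_matches(query_graph):
        found.append(match)
        if only_one:          # get_onlyOne() true: stop after the first hit
            break
    return found


def c_project(kb, query_graph, only_one=False):
    """Sketch of CProject: process all the graphs in the KB."""
    projections = []
    for g in kb:
        projections.extend(c_projection(g, query_graph, only_one))
    return projections


kb = [Graph(["p1", "p2"]), Graph(["p3"])]
all_projections = c_project(kb, query_graph=None)
unique_projections = c_project(kb, query_graph=None, only_one=True)
```

With the flag set, each KB graph contributes at most one projection, mirroring the behavior of CPE_projectionUnique versus CPE_projection.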

C.2.1.3 CGHash_Graph and CG_Graph Public Functions

These functions perform the basic graph operations from the API:

• bool addChild (CGLPChar)

– Adds a new child graph (nested graph) to the CG graph.

• bool addConcept (CGLPChar)

– Adds a new concept to the CG graph concepts list.

• bool addRelation (CGLPChar)

– Adds a new relation to the CG graph relations list.

• bool addTriple (CGLPChar)

– Adds a new triple name to the CG graph triples list.

• bool isChild (CGLPChar)

– Checks whether the given element is in the children list.

• bool isConcept (CGLPChar)

– Checks whether the given element is in the concepts list.

• bool isRelation (CGLPChar)

– Checks whether the given element is in the relations list.

• bool isTriple (CGLPChar)

– Checks whether the given element is in the triples list.

• CGLPNodes getNodes (short)

– Returns the node list for the type of node being searched for.
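The add/is/getNodes trio above follows one pattern: a bookkeeping list per element kind. A minimal Python sketch of that pattern (the class name and kind constants are hypothetical, not CPE identifiers):

```python
CONCEPT, RELATION, TRIPLE, CHILD = range(4)   # hypothetical node kinds

class CGGraphSketch:
    def __init__(self):
        # one list per element kind, as in the CG graph classes
        self._lists = {CONCEPT: [], RELATION: [], TRIPLE: [], CHILD: []}

    def add_concept(self, label):
        """Mirrors addConcept: append to the concepts list."""
        self._lists[CONCEPT].append(label)
        return True

    def is_concept(self, label):
        """Mirrors isConcept: membership test on the concepts list."""
        return label in self._lists[CONCEPT]

    def get_nodes(self, kind):
        """Mirrors getNodes(short): the list for the requested node kind."""
        return self._lists[kind]


g = CGGraphSketch()
g.add_concept("C2062")
```

The addRelation/isRelation and addTriple/isTriple pairs would follow the same shape over their own lists.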

C.2.2 CPE Class Documentation

The class documentation indicates how the hierarchy of class references is set up in C++ for the modules.

C.2.2.1 cp_graph Class Reference

• Inheritance diagram for cp_graph is seen in Figure C.5.

Figure C.5: Inheritance Diagram for Class ‘cp_graph’.

C.2.2.2 cghash_graph Class Reference

This class is the perfect hash implementation of a CG graph. The inheritance diagram for cghash_graph is the left side of Figure C.5. Base graph: cghash_graph is the data structure that changes between graph implementations; when CGHASH is defined, it is implemented as two hashtables, and all lists are hashtables whose keys are unique numbers. Class-specific functions are:

• cghash_graph (void)

– Constructor function that makes sure most internal lists are built.

• cghash_graph (UINT)

– Constructor function that makes sure the internal lists, including the triples lists, are built.

• ∼cghash_graph (void)

– The destructor, which cleans up at the end.

C.2.2.3 cg_graph Class Reference

This class is the array implementation of a CG graph. The inheritance diagram for cg_graph is the right side of Figure C.5. Base graph: cg_graph holds the Conceptual Graph elements of the graph data structure that changes between graph implementations; when CG2DARR is defined, it is implemented as two 2-dimensional arrays, and all lists are lists of strings. Class-specific functions are:

• cg_graph (void)

– Constructor function that makes sure all internal lists are built.

• cg_graph (int)

– Constructor function that makes sure the internal lists, including the triples lists, are built.

• ∼cg_graph (void)

– The destructor, which cleans up at the end.
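The practical difference between the two implementations lies in how element membership is tested. A language-neutral sketch in Python (illustrative only; the real classes are C++, and the numeric keys below are made-up stand-ins for the unique hash keys):

```python
# cg_graph-style storage: lists of strings, membership is a linear scan.
array_concepts = ["C2062", "C5665", "C0493"]
found_array = "C5665" in array_concepts        # O(n) scan

# cghash_graph-style storage: a hashtable keyed by a unique number per
# element, membership is an expected O(1) lookup.
hash_concepts = {2062: "C2062", 5665: "C5665", 493: "C0493"}
found_hash = 5665 in hash_concepts             # O(1) expected
```

The trade-off is the usual one: the hashtable variant pays setup and hashing overhead per element, but membership tests stop depending on list length.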

APPENDIX D

DATA COLLECTED FROM SAMPLE TESTS

This appendix gives an example of some of the data collected to produce the results found in Chapter 7. It also gives output from the tested systems to verify that the correct results were produced by the projections, both for the unique relation instance case and for the multiple-relations-within-graph instance case.

D.1 Data Collected for Computing Each Experimental Results Test Set - 53 nodes in KB Graphs

The three Tables D.1, D.2 and D.3 are the average data values used to produce the graphs found in subsection 7.4.5 of Chapter 7.

Table D.1: Average Data Values for 53 nodes KB with 1000 Graphs.

# of nodes in Query    pCG       CPE (array)   CPE (hash table)
         3              82.1        25.85          174.15
         5             105.45       49.05          181.35
         9             143.8        82.95          214.1
        11             161.75       90.85          228.8
        15             206.3       140.5           261.5
        21             279         177.15          305.45
        27             346.15      280.25          362
        31             424.25      320.4           392.9
        43             575.8       506.9           503.15
        53             700.85      662.15          595.35

These averages came from computing the average value over 48 runs for each test case. A test case consisted of selecting the number of nodes in the KB graphs file, and then selecting the query graph to be projected onto that KB of graphs. Before computing the average value, the four lowest (fastest) times and the four highest (slowest) times were dropped. The average values seen in the tables are for runs with the pCG system, CPE with the array data structures, and CPE with the hash table data structures.
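The averaging procedure just described, 48 runs with the four fastest and four slowest dropped, amounts to a trimmed mean. A small Python sketch (the function name is ours, not from the test harness):

```python
def trimmed_average(timings, drop=4):
    """Average after dropping the `drop` fastest and `drop` slowest values."""
    ordered = sorted(timings)
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)

# 48 synthetic timings: 40 at 100 ms plus eight outliers that get dropped.
runs = [100.0] * 40 + [0.0, 1.0, 2.0, 3.0, 500.0, 600.0, 700.0, 800.0]
average = trimmed_average(runs)   # the eight extremes do not affect it
```

Trimming both tails this way makes the reported averages robust against the occasional scheduling hiccup or unusually fast cached run.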

Table D.2: Average Data Values for 53 nodes KB with 2500 Graphs.

# of nodes in Query    pCG       CPE (array)   CPE (hash table)
         3             225.1        55.3           457.35
         5             291.35      122.4           466.1
         9             396.9       190.1           552.9
        11             465.8       251.05          600.8
        15             575.8       344.2           685.25
        21             742.25      510.75          786.85
        27             945.4       713.5           939.8
        31            1037.55      857.15         1052.45
        43            1494.6      1351.55         1303.65
        53            1814.95     1875            1514.8

The reason some of the collected timings were not used in computing the averages is that timing on the machine used for all testing was only accurate to within 16 milliseconds, so some “spreading” of the timings was seen. How much spreading is given in the error bar data below. It should be explained that the 16 millisecond accuracy arose because the tests were executed on a 64-bit processing architecture, but the clock values could only be retrieved with 32-bit accuracy. Therefore, the clock timing values jumped by 16 milliseconds on each time change.
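This granularity also explains entries such as the 0 ms fastest time in Table D.4: if the clock only advances in 16 ms ticks, a short interval can be quantized away entirely. A small illustrative Python sketch (not the actual test harness):

```python
TICK_MS = 16   # clock only advances in 16 ms steps

def tick(t_ms):
    """Clock value actually retrievable: the most recent 16 ms tick."""
    return (t_ms // TICK_MS) * TICK_MS

def observed_duration(true_start_ms, true_end_ms):
    """Duration as reported by a clock with 16 ms granularity."""
    return tick(true_end_ms) - tick(true_start_ms)
```

A true 15 ms run can report 0 ms, while a true 2 ms run that straddles a tick boundary can report 16 ms, which is exactly the spreading described above.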

Table D.3: Average Data Values for 53 nodes KB with 5000 Graphs.

# of nodes in Query    pCG       CPE (array)   CPE (hash table)
         3             559.35      131.05          838.6
         5             660.95      226.9           944.9
         9             862.55      431.25         1144.65
        11            1010.15      536.8          1174.5
        15            1217.2       804.9          1340.4
        21            1578.1      1142.6          1589.7
        27            1976.6      1529.75         1774.1
        31            2220.3      1881.95         2005.35
        43            2991.45     2933.55         2429.5
        53            3786.7      3924.75         3039.35

The justification for dropping part of the collected data was that the dropping was applied consistently over all test runs for all systems being tested. For every 12 runs, the highest and lowest values collected were always dropped.

D.2 Error Bar Data - 53 nodes in KB Graphs

Discussed above was the fact that there was some “spreading” of data values; that is, not all the data fell cleanly in a small range of time values.

Tables D.4, D.5 and D.6 indicate the actual fastest and slowest values collected for the 53 nodes in KB graph test set.

Table D.4: Fast/Slow Values for 53 nodes KB with 1000 Graphs.

# nodes in Q   pCG (f)   pCG (s)   CPE (af)   CPE (as)   CPE (hf)   CPE (hs)
 3               78        94         0          48        125        217
 5               94       110        16          78        156        218
 9              140       172        31         127        141        298
11              156       188        31         142        217        279
15              203       219        63         219        202        341
21              265       359       140         234        250        362
27              328       407       188         407        312        422
31              406       641       251         438        343        452
43              532       797       424         607        403        598
53              656       906       576         751        468        736

The columns are laid out by system, giving the best (fastest) time for each set of runs followed by the worst (slowest) time for that set. Therefore, first come the fastest and slowest times for the pCG runs; second, the CPE system using the array data structure, fastest execution time followed by slowest; and lastly, the CPE system using the hash table data structures, fastest times followed by slowest.

The rows in each table are the number of nodes within the query graph being projected. The query graph is smaller than or equal in size to the graphs found in the KB. In fact, the query graphs are built from the abstract (most general) version of the graphs in the KB.

Tables D.7, D.8 and D.9 then display the ranges of data (or how far away from the average value), which will be referred to as Error Bar Data, for all of the 53 nodes

Table D.5: Fast/Slow Values for 53 nodes KB with 2500 Graphs.

# nodes in Q   pCG (f)   pCG (s)   CPE (af)   CPE (as)   CPE (hf)   CPE (hs)
 3              218       235        16          79        391        517
 5              281       312        78         188        359        531
 9              390       407       141         232        468        627
11              453       484       203         298        515        673
15              562       656       249         438        548        763
21              719       843       390         583        667        849
27              937      1032       639         808        858       1052
31              985      1125       720         969        970       1124
43             1422      1969      1199        1475       1146       1441
53             1703      2344      1725        2008       1314       1673

Table D.6: Fast/Slow Values for 53 nodes KB with 5000 Graphs.

# nodes in Q   pCG (f)   pCG (s)   CPE (af)   CPE (as)   CPE (hf)   CPE (hs)
 3              546       578        48         171        732        940
 5              656       672       155         314        782       1077
 9              844       875       328         533       1017       1296
11             1000      1032       433         641       1033       1281
15             1203      1235       655         908       1221       1441
21             1562      1656       980        1345       1437       1709
27             1953      2031      1418        1712       1607       2018
31             2141      3172      1682        2103       1836       2158
43             2859      4015      2714        3121       2368       2934
53             3593      4813      3699        4064       2839       3248

in the KB test set. Again this is laid out in columns by system, with the distance from the average down to the lowest (fastest) value followed by the distance up to the highest (slowest) value for each system. The rows again are just the number of nodes in the query graph being projected. It can be seen that as the number of graphs in the KB is increased, the systems (especially pCG) become unstable when projecting a query graph that is close to, or actually at, the size of the KB graphs (see rows 43 and 53 in Table D.9).
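The error-bar values are simply the distances from the trimmed average to the fastest and slowest retained times. For instance, the 3-node pCG row combines the average 82.1 from Table D.1 with the extremes 78 and 94 from Table D.4 to give the 4.1 and 11.9 entries in Table D.7. A Python sketch of that relationship (the function name is ours):

```python
def error_bars(average, fastest, slowest):
    """Low bar: average minus fastest value.
    High bar: slowest value minus average."""
    return round(average - fastest, 2), round(slowest - average, 2)

# 3-node pCG row: average from Table D.1, extremes from Table D.4.
low, high = error_bars(average=82.1, fastest=78, slowest=94)
```

The same arithmetic reproduces the other rows, e.g. the 5-node pCG pair 11.45 and 4.55 from the average 105.45 and extremes 94 and 110.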

Table D.7: Error Bar Data Values for 53 nodes KB with 1000 Graphs.

# nodes in Q   pCG (l)   pCG (h)   CPE (al)   CPE (ah)   CPE (hl)   CPE (hh)
 3               4.1      11.9       25.85      22.15      49.15      42.85
 5              11.45      4.55      33.05      28.95      25.35      36.65
 9               3.8      28.2       51.95      44.05      73.1       83.9
11               5.75     26.25      59.85      51.15      11.8       50.2
15               3.3      12.7       77.5       78.5       59.5       79.5
21              14        80         37.15      56.85      55.45      56.55
27              18.15     60.85      92.25     126.75      50         60
31              18.25    216.75      69.4      117.6       49.9       59.1
43              43.8     221.2       82.9      100.1      100.15      94.85
53              44.85    205.15      86.15      88.85     127.35     140.65

D.3 Validation of Correct Projection

Shown within the next two subsections is the actual output data verifying that the projections were correct. The output data consists of three figures: the first figure is the KB graph, the second is the query graph that was projected, and the third is the projection results found. Each of the output graphs gives several parts:

Table D.8: Error Bar Data Values for 53 nodes KB with 2500 Graphs.

# nodes in Q   pCG (l)   pCG (h)   CPE (al)   CPE (ah)   CPE (hl)   CPE (hh)
 3               7.1       9.9       39.3       23.7       66.35      59.65
 5              10.35     20.65      44.4       65.6      107.1       64.9
 9               6.9      10.1       49.1       41.9       84.9       74.1
11              12.8      18.2       48.05      46.95      85.8       72.2
15              13.8      80.2       95.2       93.8      137.25      77.75
21              23.25    100.75     120.75      72.25     119.85      62.15
27               8.4      86.6       74.5       94.5       81.8      112.2
31              52.55     87.45     137.15     111.85      82.45      71.55
43              72.6     474.4      152.55     123.45     157.65     137.35
53             111.95    529.05     150        133        200.8      158.2

Table D.9: Error Bar Data Values for 53 nodes KB with 5000 Graphs.

# nodes in Q   pCG (l)   pCG (h)   CPE (al)   CPE (ah)   CPE (hl)   CPE (hh)
 3              13.35     18.65      83.05      39.95     106.6      101.4
 5               4.95     11.05      71.9       87.1      162.9      132.1
 9              18.55     12.45     103.25     101.75     127.65     151.35
11              10.15     21.85     103.8      104.2      141.5      106.5
15              14.2      17.8      149.9      103.1      119.4      100.6
21              16.1      77.9      162.6      202.4      152.7      119.3
27              23.6      54.4      111.75     182.25     167.1      243.9
31              79.3     951.7      199.95     221.05     169.35     152.65
43             132.45   1023.55     219.55     187.45      61.5      504.5
53             193.7    1026.3      225.75     139.25     200.35     208.65

1. Basenode - this is the unique identifier for the single concept node that is considered the basic node of the graph.

2. Concept nodes - these are the concepts in the graph, giving the unique identifier as well as the type, referent (if any) and co-reference links (if any).

3. Relation nodes - these are the relations in the graph, giving the unique identifier and the type value.

4. CRC list - displays the concept-relation-concept list by giving the unique identifier for the node followed by the direction of the linkage into that node. If the direction is indented on the next line, then that linkage is scoped within the unique identifier displayed above.
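The scoping rule in item 4 can be illustrated by regenerating a fragment of a CRC listing from triples. The identifiers below are taken from Figure D.1; the formatting function itself is a hypothetical sketch, not the CPE printer.

```python
triples = [                       # three outgoing links of C2062 (Figure D.1)
    ("C2062", "R9474", "C5665"),
    ("C2062", "R0897", "C0493"),
    ("C2062", "R9634", "C8990"),
]

def crc_lines(triples):
    """Render the outgoing half then the incoming half of a CRC listing."""
    lines, last_src = [], None
    for src, rel, dst in triples:
        if src == last_src:
            # indented continuation: scoped within the identifier above
            lines.append("      -> %s -> %s" % (rel, dst))
        else:
            lines.append("%s -> %s -> %s" % (src, rel, dst))
            last_src = src
    for src, rel, dst in triples:
        lines.append("%s <- %s <- %s" % (dst, rel, src))
    return lines

listing = crc_lines(triples)
```

Repeated source concepts print once, with their further links indented beneath them, which is the scoping convention the figures below use.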

D.3.1 11 nodes in KB graphs - Unique Relation Results

Figures D.1, D.2 and D.3 show the three elements of the test for the 11 nodes graph in the KB being projected onto by a 3 nodes query graph. It should be noted that the graph seen in the projection (see Figure D.3) has the unique identifying nodes from the KB graph, but the structure of the query graph. Therefore, a subgraph was, in fact, found within the KB that is isomorphic to the query graph.

Graph Graph*G1 built:
Basenode - C2062
CG Base graph
- Concepts in this graph:
  Concept unique label is C2062 with co-ref link *b and as Block
  Concept unique label is C5665 as Color
  Concept unique label is C0493 as Number
  Concept unique label is C8990 as Place
  Concept unique label is C3008 as Shape
  Concept unique label is C6346 as Table
- Relations in this graph:
  Relation unique label is R9474 as ATTR
  Relation unique label is R0897 as NAME
  Relation unique label is R9634 as LOC
  Relation unique label is R1126 as CHRC
  Relation unique label is R4954 as OnTable
- crc:
  C2062 -> R9474 -> C5665
        -> R0897 -> C0493
        -> R9634 -> C8990
        -> R1126 -> C3008
        -> R4954 -> C6346
  C5665 <- R9474 <- C2062
  C0493 <- R0897 <- C2062
  C8990 <- R9634 <- C2062
  C3008 <- R1126 <- C2062
  C6346 <- R4954 <- C2062

Figure D.1: KB for Verifying 3 nodes Query onto 11 nodes KB.

D.3.2 13 nodes in KB graphs - Multi-Instances Relation Results

Figures D.5, D.4 and D.6 show the three elements of the test for the 13 nodes graph in the KB being projected onto by a 5 nodes query graph. It should be noted that the graphs seen in the projection output (see Figure D.6) show that two subgraphs are

query graphs - 1 graph/s read
Graph Graph*G2 built:
Basenode - C6067
CG Base graph
- Concepts in this graph:
  Concept unique label is C6067 as Block
  Concept unique label is C1389 as Color
- Relations in this graph:
  Relation unique label is R9447 as ATTR
- crc:
  C6067 -> R9447 -> C1389
  C1389 <- R9447 <- C6067

Figure D.2: Query Graph for Verifying 3 nodes Query onto 11 nodes KB.

projection graphs
Graph P30001 built:
Basenode - C2062
CG Base graph
- Concepts in this graph:
  Concept unique label is C2062 with co-ref link *b and as Block
  Concept unique label is C5665 as Color
- Relations in this graph:
  Relation unique label is R9474 as ATTR
- crc:
  C2062 -> R9474 -> C5665
  C5665 <- R9474 <- C2062

Figure D.3: Projection Verifying 3 nodes Query onto 11 nodes KB.

found within the KB that are isomorphic to the query graph. Again the image of the query graph is projected onto the KB graph, but the projection graphs have the nodes from inside of the KB graph.
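The isomorphism claim can be checked mechanically: a projection's triples must exist in the KB graph, and replacing each projection identifier with its type must reproduce the query graph exactly. A Python sketch using labels from Figures D.1 through D.3 (a simplified illustration with relations reduced to their type labels, not the CPE verifier):

```python
# A few triples of the Figure D.1 KB graph, relation kept as its type label.
kb_triples = {
    ("C2062", "ATTR", "C5665"),
    ("C2062", "NAME", "C0493"),
    ("C2062", "LOC", "C8990"),
}
types = {"C2062": "Block", "C5665": "Color",
         "C0493": "Number", "C8990": "Place"}

# Type-level query graph (Figure D.2) and projection result (Figure D.3).
query_triples = {("Block", "ATTR", "Color")}
projection_triples = {("C2062", "ATTR", "C5665")}

# The projection uses identifiers that exist in the KB graph...
identifiers_ok = projection_triples <= kb_triples
# ...and, lifted to types, it has exactly the structure of the query.
lifted = {(types[a], r, types[b]) for (a, r, b) in projection_triples}
structure_ok = lifted == query_triples
```

Both checks succeeding is precisely what "KB identifiers with query structure" means in the figures above.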

query graphs - 1 graph/s read
Graph Graph*G3 built:
Basenode - C4124
CG Base graph
- Concepts in this graph:
  Concept unique label is C4124 with co-ref link *b and as Block
  Concept unique label is C1918 as Color
  Concept unique label is C5682 as Number
- Relations in this graph:
  Relation unique label is R7152 as ATTR
  Relation unique label is R2455 as NAME
- crc:
  C4124 -> R7152 -> C1918
        -> R2455 -> C5682
  C1918 <- R7152 <- C4124
  C5682 <- R2455 <- C4124

Figure D.4: Query Graph for Verifying 5 nodes Query onto 13 nodes KB.

Graph Graph*G1 built:
Basenode - C9474
CG Base graph
- Concepts in this graph:
  Concept unique label is C2062 with co-ref link *b1 and as Block
  Concept unique label is C9474 with co-ref link *b2 and as Block
  Concept unique label is C0493 as Table
  Concept unique label is C8990 as Color
  Concept unique label is C3008 as Number
  Concept unique label is C6346 as Color
  Concept unique label is C3285 as Number
- Relations in this graph:
  Relation unique label is R5665 as Above
  Relation unique label is R0897 as OnTable
  Relation unique label is R9634 as ATTR
  Relation unique label is R1126 as NAME
  Relation unique label is R4954 as ATTR
  Relation unique label is R5963 as NAME
- crc:
  C2062 -> R5665 -> C9474
        -> R9634 -> C8990
        -> R1126 -> C3008
  C9474 -> R0897 -> C0493
        -> R4954 -> C6346
        -> R5963 -> C3285
        <- R5665 <- C2062
  C0493 <- R0897 <- C9474
  C8990 <- R9634 <- C2062
  C3008 <- R1126 <- C2062
  C6346 <- R4954 <- C9474
  C3285 <- R5963 <- C9474

Figure D.5: KB for Verifying 5 nodes Query onto 13 nodes KB.

projection graphs
Graph P30001 built:
Basenode - C2062
CG Base graph
- Concepts in this graph:
  Concept unique label is C2062 with co-ref link *b1 and as Block
  Concept unique label is C8990 as Color
  Concept unique label is C3008 as Number
- Relations in this graph:
  Relation unique label is R9634 as ATTR
  Relation unique label is R1126 as NAME
- crc:
  C2062 -> R9634 -> C8990
        -> R1126 -> C3008
  C8990 <- R9634 <- C2062
  C3008 <- R1126 <- C2062

Graph P30002 built:
Basenode - C9474
CG Base graph
- Concepts in this graph:
  Concept unique label is C9474 with co-ref link *b2 and as Block
  Concept unique label is C6346 as Color
  Concept unique label is C3285 as Number
- Relations in this graph:
  Relation unique label is R4954 as ATTR
  Relation unique label is R5963 as NAME
- crc:
  C9474 -> R4954 -> C6346
        -> R5963 -> C3285
  C6346 <- R4954 <- C9474
  C3285 <- R5963 <- C9474

Figure D.6: Projections Verifying 5 nodes Query onto 13 nodes KB.

REFERENCES

[1] H. Aidinejad. Semantic networks as a unified model of knowledge representation. MCCS-88-117, 1988.

[2] H. Ait-Kaci. An algebraic semantics approach to the effective resolution of type equations. Theor. Comp. Sc., 45:293–351, 1986.

[3] J.F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):pp. 832–843, 1983.

[4] J.F. Allen. Time and time again: The many ways to represent time. International Journal of Intelligent Systems, 6(4):pp. 341–355, July 1991.

[5] J.-F. Baget and M.-L. Mugnier. Extensions of simple conceptual graphs: the complexity of rules and constraints. Journal of Artificial Intelligence Research (JAIR), 16:425–465, 2002.

[6] P. Becker. ToscanaJ. Technical University of Darmstadt, Germany, 2004. http://toscanaj.sourceforge.net/.

[7] P. Becker and J.H. Correia. The ToscanaJ suite for implementing Conceptual Information Systems. In G. Stumme, editor, Formal Concept Analysis – State of the Art, Berlin – Heidelberg – New York, 2004. Springer. To appear.

[8] D.J. Benn. Implementing conceptual graph processes. Master’s thesis, University of South Australia, School of Computer and Information Science, April 2001. http://members.ozemail.com.au/ djbenn/Masters/thesis/Thesis.pdf.

[9] D.J. Benn and D. Corbett. An application of the process mechanism to a room allocation problem using the pCG language. In H.S. Delugach and G. Stumme, editors, Conceptual Structures: Broadening the Base, Springer-Verlag Lecture Notes in Computer Science 2120, pages 360–374, 2001.

[10] D.J. Benn and D. Corbett. pCG: An implementation of the process mechanism and an extensible CG programming language. In CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL:http://www.cs.nmsu.edu/ hdp/CGTOOLS/proceedings/index.html.

[11] R.J. Brachman. On the epistemological status of semantic networks. In N.V. Findler, editor, Associative Networks: Representation and Use of Knowledge by Computers, pages 3–50. Academic Press, New York, 1979.

[12] E. Charniak and D. McDermott. Introduction To Artificial Intelligence. Addison-Wesley, Reading, MA, 1985.

[13] G. Chartrand and L. Lesniak. Graphs & Digraphs. Mathematics Series. Wadsworth & Brooks/Cole, Pacific Grove, CA, second edition, 1986.

[14] N.R. Chavez, Jr. and R.T. Hartley. The Role of Object-Oriented Techniques and Multi-Agents in Story Understanding. In Proceedings of the International Conference on Integration of Knowledge Intensive Multi-Agent Systems, Waltham, Mass, 2005. KIMAS 2005.

[15] M. Chein and M.-L. Mugnier. Conceptual graphs: Fundamental notions. Revue d’Intelligence Artificielle, 6-4:365–406, 1992.

[16] N. Chomsky. Syntactic Structures. The Hague, Mouton, 1957.

[17] N. Chomsky. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA, 1965.

[18] R.J. Cole, P. Eklund, and G. Stumme. Document retrieval for email search and discovery using Formal Concept Analysis. Applied Artificial Intelligence, 17(3), 2003.

[19] D. Corbett. Reasoning and Unification over Conceptual Graphs. Kluwer Academic/Plenum Publishers, New York, 2003.

[20] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. The MIT Press, 1990.

[21] M. Croitoru and E. Compatangelo. A combinatorial approach to conceptual graph projection checking. In Proc. of the 24th Int’l Conf. of the British Computer Society’s Specialist Group on Art’l Intell. AI’2004, Springer-Verlag, 2004.

[22] M. Croitoru and E. Compatangelo. On conceptual graph projection. Technical Report AUCS/TR0403, University of Aberdeen, UK, Department of Computing Science, 2004.

[23] Z.J. Czech. Quasi-perfect hashing. The Computer Journal, 41(6):416–421, 1998.

[24] Z.J. Czech, G. Havas, and B.S. Majewski. Perfect hashing. Theoretical Computer Science, 182(1-2):1–143, 15 August 1997. Fundamental Study.

[25] F. Dau. Types and tokens for logic with diagrams. In K.E. Wolff, H.D. Pfeiffer, and H.S. Delugach, editors, Conceptual Structures at Work, 12th International Conference on Conceptual Structures, volume 3127 of LNAI, pages 62–93, Heidelberg, July 2004. ICCS 2004, Springer.

[26] E. Davis. Representations of Commonsense Knowledge. Morgan Kaufmann, San Mateo, CA, 1990.

[27] T. Dean and D. McDermott. Temporal data base management. Artificial Intelligence, 32:pp. 1–55, 1987.

[28] R. Dechter and J. Pearl. Network-based heuristics for constraint-satisfaction problems. Artificial Intelligence, 34:1–38, 1988.

[29] H.S. Delugach. CharGer: Some lessons learned and new directions. In G. Stumme, editor, Working with Conceptual Structures - Contributions to ICCS 2000, pages 306–309, 2000. Shaker-Verlag.

[30] H.S. Delugach. CharGer: A graphical Conceptual Graph editor. In CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL:http://www.cs.nmsu.edu/ hdp/CGTOOLS/proceedings/index.html.

[31] H.S. Delugach. Towards building active knowledge systems with conceptual graphs. In A. de Moor, Wilfried Lex, and Bernhard Ganter, editors, Conceptual Structures for Knowledge Creation and Communications, volume 2745 of LNAI, pages 296–308, Heidelberg, 2003. Springer-Verlag.

[32] H.S. Delugach. CharGer 3.3 - A Conceptual Graph Editor. University of Alabama in Huntsville, Alabama, USA, 2004. http://www.cs.uah.edu/ delugach/CharGer.

[33] H.S. Delugach. Common logic standard. Located at http://cl.tamu.edu, November 2006.

[34] A. deMoor. Applying conceptual graph theory to the user-driven specification of network information systems. In D. Lukose, H.S. Delugach, M. Keeler, L. Searle, and J.F. Sowa, editors, Conceptual Structures: Fulfilling Peirce’s Dream, Springer-Verlag Lecture Notes in Artificial Intelligence 1257, pages 536–550. ICCS, Springer, August 1997.

[35] H.-D. Ebbinghaus, J. Flum, and W. Thomas. Mathematical Logic. Springer-Verlag, Berlin, second edition, 1994.

[36] P. Ekland. Mail-Sleuth. Email Analysis Pty Ltd, Australia, 2004. http://www.mail-sleuth.com/.

[37] G. Ellis and R. Levinson. The birth of PEIRCE: A conceptual graphs workbench. In G. Ellis and R. Levinson, editors, Proceedings of the 1st International Workshop on PEIRCE, 1992.

[38] D. Eppstein. Arboricity and bipartite subgraph listing algorithms. Tech. Report 94-11, University of California, Irvine, CA 92717, February 24 1994. Department of Information and Computer Science.

[39] D. Eppstein. Subgraph isomorphism in planar graphs and related problems. Journal of Graph Algorithms and Applications, 3(3):1–27, 1999.

[40] J.M. Ettinger. The complexity of comparing reaction systems. Technical report, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA, November 2001.

[41] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin Heidelberg New York, 1999.

[42] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, New York, 1979.

[43] J.C. Giarratano and G. Riley. Expert Systems: Principles and Programming. PWS-KENT Publishing Company, Boston, 1989.

[44] G. Gratzer. Lattice Theory: First concepts and distributive lattices. W.H. Freeman, 1971.

[45] N. Guarino. Philosophy and the Cognitive Sciences, chapter The Ontological Level, pages 443–456. Hölder-Pichler-Tempsky, Vienna, 1994.

[46] F. Harary. Graph Theory. Addison-Wesley, Reading, MA, 1969.

[47] R.T. Hartley. A uniform representation for time and space and their mutual constraints. In F. Lehmann, editor, Semantics Networks, Oxford, ENGLAND, 1992.

[48] R.T. Hartley and M. Coombs. Reasoning with graph operations. In J.F. Sowa, editor, Principles of Semantic Networks: Explorations in the Representation of Knowledge, San Mateo, CA, 1991. Morgan Kaufmann.

[49] R.T. Hartley and H.D. Pfeiffer. Data models for Conceptual Structures. In Foundations and Applications of Conceptual Structures, Contributions to ICCS 2002. ICCS2002, 2002.

[50] L. Henkin. The Completeness of the First-Order Functional Calculus. The Journal of Symbolic Logic, 14, 1949.

[51] W. Hodges. The Blackwell Guide to Philosophical Logic, chapter 1, pages 9–32. Blackwell Publishing, 2001.

[52] R. Jackendoff. Semantic Structures. MIT Press, Cambridge, MA, 1990.

[53] K.S. Jones. Early years in machine translation: memoirs and biographies of pioneers. John Benjamins, Amsterdam, 2000.

[54] A. Kabbaj. Un systeme multi-paradigme pour la manipulation des connaissances utilisant la theorie des Graphes Conceptuels. PhD thesis, Universite de Montreal, Departement d’Informatique et de Recherche Operationnelle, Canada, 1996.

[55] A. Kabbaj. The Amine Platform, 2004. http://amine-platform.sourceforge.net.

[56] A. Kabbaj. CS-TIW 2007 Second Conceptual Structures Tool Interoperability Workshop, chapter Interoperability: The next steps for Amine Platform, pages 65–70. Research Press International, 2007.

[57] A. Kabbaj and M. Janta-Polcynzki. From Prolog++ to Prolog+CG: A CG object-oriented logic programming language. In B. Ganter and G. Mineau, editors, Conceptual Structures: Logical, Linguistic, and Computational Issues, pages 540–554, Berlin, 2000. Lecture Notes in Artificial Intelligence, vol. 1867, Springer-Verlag.

[58] A. Kabbaj and B. Moulin. An algorithmic definition of CG operations based on a bootstrap step. In Proceedings of ICCS’01, 2001.

[59] knowledge. Dictionary.com unabridged (v 1.0.1). Available at Dictionary.com website: http://dictionary.reference.com/browse/knowledge, November 2006.

[60] knowledge. Merriam-Webster online dictionary. Available at web-site: http://www.m-w.com/dictionary/knowledge, November 2006.

[61] F. Lehmann, editor. Semantics Networks. Pergamon Press, Oxford, ENGLAND, 1992.

[62] D. Lenat and R. Guha. Building Large Knowledge-Based Systems - Representation and Inference in the Cyc Project. Addison-Wesley, Reading, MA, 1990.

[63] H. Levesque. A fundamental tradeoff in knowledge representation and reasoning. In Proceedings of CSCSI-84, pages 141–152, London, 1984.

[64] LIRMM. CoGITaNT. Montpellier, France, 2004. http://cogitant.sourceforge.net/index.html.

[65] G. Luger and W. Stubblefield. Artificial Intelligence - Structures and Strategies for Complex Problem Solving. The Benjamin/Cummings Publishing Company, Inc., Redwood City, CA, 1993.

[66] R. MacGregor. The evolving technology of classification-based knowledge representation systems. In J.F. Sowa, editor, Principles of Semantic Networks: Explorations in the Representation of Knowledge, San Mateo, CA, 1991. Morgan Kaufmann.

[67] A. Martelli and U. Montanari. An efficient unification algorithm. ACM Transactions on Programming Languages and Systems, 4(2):258–282, April 1982.

[68] B.T. Messmer and H. Bunke. Efficient subgraph isomorphism detection: A decomposition approach. IEEE Transactions on Knowledge and Data Engineering, 12(2):307–323, March/April 2000.

[69] G.W. Mineau. From actors to process: The representation of dynamic knowledge using conceptual graphs. In Marie-Laure Mugnier and Michel Chein, editors, Conceptual Structures: Theory, Tools, and Applications, volume 1453 of Springer-Verlag Lecture Notes in Artificial Intelligence, pages 65–79, Heidelberg, August 1998. ICCS 1998, Springer.

[70] G.W. Mineau. Constraints on processes: Essential elements for the validation and execution of processes. In William Tepfenhart and Walling Cyre, editors, Conceptual Structures: Standards and Practices, volume 1640 of LNAI, pages 66–82, Heidelberg, July 1999. ICCS 1999, Springer.

[71] G.W. Mineau and Q. Gerbe. Contexts: A formal definition of worlds of assertions. In D. Lukose, H.S. Delugach, M. Keeler, L. Searle, and J.F. Sowa, editors, Conceptual Structures: Fulfilling Peirce’s Dream, volume 1257 of LNAI, pages 80–94. ICCS 1997, Springer, August 1997.

[72] D. Moldovan, W. Lee, C. Lin, and M. Chung. SNAP parallel processing applied to AI. Computer, 25(5):39–49, May 1992.

[73] H. Motulsky. Intuitive Biostatistics. Oxford University Press, New York, 1995.

[74] M.-L. Mugnier and M. Chein. Polynomial algorithms for projection and matching. In H.D. Pfeiffer and T.E. Nagle, editors, Conceptual Structures: Theory and Implementation, volume 754 of LNAI, pages 239–251. ICCS, Springer-Verlag, July 1992.

[75] M.-L. Mugnier and M. Leclere. On querying simple conceptual graphs with negation. In Data and Knowledge Engineering. DKE, Elsevier, 2006. Revised version of R.R. LIRMM 05-051.

[76] A. Mukerjee. Computational Representation and Processing of Spatial Expressions, chapter Neat vs Scruffy: A review of Computational Models for Spatial Expressions, pages 1–37. Lawrence Erlbaum Associates, Mahwah, NJ, 1998.

[77] S.H. Myaeng and A. Lopez-Lopez. Conceptual graph matching: A flexible algorithm and experiments. Journal for Experimental and Theoretical AI, 4(2):107–126, 1992.

[78] T.E. Nagle, J.W. Esch, and G. Mineau. A notation for conceptual structures graph matchers. In Proceedings of the 5th Conceptual Structures Workshop, Boston, MA, 1990. Held in conjunction with AAAI-90.

[79] T.E. Nagle, J.A. Nagle, L.L. Gerholz, and P.W. Ekland, editors. Conceptual Structures: Current Research and Practice. Ellis Horwood Workshops. Ellis Horwood, 1992.

[80] A. Newell. The knowledge level. Artificial Intelligence, 18(1):87–127, 1982.

[81] P. Oehrstoem, J. Andersen, and H. Scharfe. What has happened to ontology. In F. Dau, M-L Mugnier, and G. Stumme, editors, Conceptual Structures: Common Semantics for Sharing Knowledge, volume 3596 of LNAI, pages 425–438. ICCS2005, Springer, July 2005.

[82] C.K. Ogden and I.A. Richards. The Meaning Of Meaning. Harcourt, Brace, and World, New York, NY, 1946.

[83] R. Pagh. Hash and displace: Efficient evaluation of minimal perfect hash functions. In Algorithms and Data Structures: 6th International Workshop. WADS’99, LNCS, May 1999.

[84] M.S. Paterson and M.N. Wegman. Linear unification. J. Comput. Syst. Sci., 16(2):158–167, April 1978.

[85] J. Pearl. Heuristics. Addison-Wesley, Reading, MA, 1984.

[86] C.S. Peirce. Manuscripts on existential graphs. Peirce, 4:320–410, 1960.

[87] H.D. Pfeiffer. An exportable CGIF module from the CP environment: A pragmatic approach. In K.E. Wolff, H.D. Pfeiffer, and H.S. Delugach, editors, Conceptual Structures at Work, volume 3127 of LNAI, pages 319–332. ICCS 2004, Springer, July 2004.

[88] H.D. Pfeiffer, N.R. Chavez, Jr., and R.T. Hartley. A generic interface for communication between story understanding systems and knowledge bases. In Richard Tapia Celebration of Diversity in Computing Conference, Albuquerque, NM, 2005.

[89] H.D. Pfeiffer, N.R. Chavez Jr., and J.J. Pfeiffer Jr. CPE design considering interoperability. In H.D. Pfeiffer, A. Kabbaj, and D.J. Benn, editors, CS-TIW 2007: Second Conceptual Structures Tool Interoperability Workshop, pages 71–75. Research Press International, 2007.

[90] H.D. Pfeiffer and R.T. Hartley. Semantic additions to conceptual programming. In Proc. of the Fourth Annual Workshop on Conceptual Structures, Detroit, MI, 1989.

[91] H.D. Pfeiffer and R.T. Hartley. Additions for set representation and processing to conceptual programming. In Proc. of the Fifth Annual Workshop on Conceptual Structures, pages 131–140, Boston & Stockholm, 1990.

[92] H.D. Pfeiffer and R.T. Hartley. The Conceptual Programming Environment, CP: Reasoning representation using graph structures and operations. In Proc. of IEEE Workshop on Visual Languages, Kobe, Japan, 1991.

[93] H.D. Pfeiffer and R.T. Hartley. The Conceptual Programming Environment, CP. In T.E. Nagle, J.A. Nagle, L.L. Gerholz, and P.W. Ekland, editors, Conceptual Structures: Current Research and Practice, Ellis Horwood Workshops. Ellis Horwood, 1992.

[94] H.D. Pfeiffer and R.T. Hartley. Temporal, spatial, and constraint handling in the Conceptual Programming Environment, CP. Journal of Experimental and Theoretical AI, 4(2):167–182, 1992.

[95] H.D. Pfeiffer and R.T. Hartley. Visual CP representation of knowledge. In G. Stumme, editor, Working with Conceptual Structures - Contributions to ICCS 2000, pages 175–188. Shaker-Verlag, 2000.

[96] H.D. Pfeiffer and R.T. Hartley. ARCEdit - CG editor. In CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL: http://www.cs.nmsu.edu/~hdp/CGTOOLS/proceedings/index.html.

[97] H.D. Pfeiffer and R.T. Hartley, editors. CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL: http://www.cs.nmsu.edu/~hdp/CGTOOLS/proceedings/index.html.

[98] J. Piaget. Genetic Epistemology. Columbia University Press, New York, 1970. Trans. E. Duckworth.

[99] S. Polovina and R. Hill. Enhancing the initial requirements capture of multi-agent systems through conceptual graphs. In F. Dau, M-L Mugnier, and G. Stumme, editors, Conceptual Structures: Common Semantics for Sharing Knowledge, volume 3596 of LNAI, pages 439–452. ICCS 2005, Springer, July 2005.

[100] B. Prasad. A planning system for blocks-world domain. In AICCSA ’01: Proceedings of the ACS/IEEE International Conference on Computer Systems and Applications, page 59, Washington, DC, USA, 2001. IEEE Computer Society.

[101] A. Puder. Mapping of CGIF to operational interfaces. In Marie-Laure Mugnier and Michel Chein, editors, Conceptual Structures: Theory, Tools, and Applications, Springer-Verlag Lecture Notes in Computer Science 1453, pages 119–126, 1998.

[102] K. Radeck. C# and Java: Comparing Programming Languages. MSDN, Octo- ber 2003. http://www.windowsfordevices.com/articles/AT2128742838.html.

[103] S.W. Reyner. An analysis of a good algorithm for the subtree problem. SIAM J. Comput., 6:730–732, 1977.

[104] J.A. Robinson. Machine Intelligence, volume 6, chapter Computational logic: The unification computation, pages 63–72. Edinburgh University Press, Edinburgh, Scotland, 1971.

[105] S. Roman. Win32 API Programming with Visual Basic. O’Reilly, first edition, 1999.

[106] S. Russell and P. Norvig. Artificial Intelligence - A Modern Approach. Prentice Hall, Upper Saddle River, NJ, 1995.

[107] G. Ryle. The Concept of Mind. Penguin Books, Harmondsworth, UK, 1949.

[108] L. Schubert. Extending the expressive power of semantic networks. Artificial Intelligence, 7:163–198, 1976.

[109] S.C. Shapiro. A net structure for semantic information storage, deduction, and retrieval. In Proceedings of the 2nd International Conference on Artificial Intelligence, pages 512–523, 1971.

[110] S.C. Shapiro and W.J. Rapaport. The SNePS family. Computers Math. Applic., 23(2-5):243–275, 1992.

[111] A. Shokoufandeh and S. Dickerson. Graph-Theoretical Methods in Computer Vision. Number 2292 in LNCS. Springer-Verlag, Berlin Heidelberg, 2002.

[112] J. Siegel. Making the case: OMG’s Model Driven Architecture (MDA). San Diego Times, 2002.

[113] D. Skipper and H. Delugach. OpenCG: An open source graph representation. In A. de Moor, S. Polovina, and H. Delugach, editors, First Conceptual Structures Tool Interoperability Workshop, pages 48–57. CS-TIW 2006, Aalborg Universitetsforlag, 2006.

[114] R. Soley. Model Driven Architecture. OMG document, November 2000.

[115] F. Southey. Notio and Ossa. In CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL: http://www.cs.nmsu.edu/~hdp/CGTOOLS/proceedings/index.html.

[116] F. Southey. NOTIO, 2003. http://notio.lucubratio.org/index.html.

[117] F. Southey and J.G. Linders. NOTIO - a Java API for developing CG tools. In W. Tepfenhart and W. Cyre, editors, Conceptual Structures: Standards and Practices, pages 262–271, Berlin, 1999. Springer-Verlag. Lecture Notes in Artificial Intelligence, LNAI 1640.

[118] J.F. Sowa. Conceptual graphs for a data base interface. IBM Journal of Research and Development, 20(4):336–357, 1976.

[119] J.F. Sowa. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, MA, 1984.

[120] J.F. Sowa, editor. Principles of Semantic Networks: Explorations in the Repre- sentation of Knowledge. Morgan Kaufmann, San Mateo, CA, 1991.

[121] J.F. Sowa. Conceptual graphs as a universal knowledge representation. In F. Lehmann, editor, Semantic Networks, Oxford, England, 1992.

[122] J.F. Sowa. Conceptual graphs: Draft proposed American national standard. In W. Tepfenhart and W. Cyre, editors, Conceptual Structures: Standards and Practices, pages 1–65, Berlin, 1999. Springer-Verlag. Lecture Notes in Artificial Intelligence, LNAI 1640.

[123] J.F. Sowa. Knowledge Representation: Logical, Philosophical, and Computa- tional Foundations. Brooks/Cole, 2000.

[124] J.F. Sowa. Architectures for intelligent systems. IBM Systems Journal, 41(3):331–349, 2002.

[125] J.F. Sowa, N.Y. Foo, and A. Rao, editors. Conceptual Graphs for Knowledge Systems. Addison Wesley, New York, NY, 1989.

[126] J.F. Sowa et al. Conceptual Graph Standard, American National Standard NCITS. T2/ISO/JTC1/SC32 WG2 M 00 edition, 2001. [Online Access: April 2001] URL: http://www.bestweb.net/~sowa/cg/cgstand.htm.

[127] B. Stroustrup. The C++ Programming Language. Addison-Wesley, 3rd edition, 2000.

[128] D.A. Tappan. Knowledge-Based Spatial Reasoning For Automated Scene Generation From Text Descriptions. PhD thesis, New Mexico State University, May 2004.

[129] W.M. Tepfenhart. Ontologies and conceptual structures. In Marie-Laure Mugnier and Michel Chein, editors, Conceptual Structures: Theory, Tools, and Applications, volume 1453 of LNAI, pages 334–348, Heidelberg, August 1998. ICCS 1998, Springer.

[130] R. Thomopoulos, J.-F. Baget, and O. Haemmerle. Conceptual graphs as cooperative formalism to build and validate a domain expertise. In U. Priss, S. Polovina, and R. Hill, editors, Conceptual Structures: Knowledge Architectures for Smart Applications, pages 112–125. ICCS 2007, Springer, 2007.

[131] M. Thorup. Even strongly universal hashing is pretty fast. In SODA ’00: Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, pages 496–497, Philadelphia, PA, USA, 2000. Society for Industrial and Applied Mathematics.

[132] J.R. Ullman. An algorithm for subgraph isomorphism. J. of the Assoc. for Computing Machinery, 23(1):31–42, 1976.

[133] W.P. Weijland. Semantics for logic programs without occur check. Theoretical Computer Science, 71:155–174, 1990.

[134] C.A. Welty. An Integrated Representation for Software Development and Discovery. PhD thesis, Vassar College, 1995.

[135] A.R. White. Conceptual analysis. In C.J. Bontempo and S.J. Odell, editors, The Owl of Minerva, pages 103–117. McGraw-Hill, 1975.

[136] M. Willems. Projection and unification for conceptual graphs. In G. Ellis, R. Levinson, W. Rich, and J. Sowa, editors, Conceptual Structures: Applications, Implementation and Theory, volume 954 of LNAI, pages 278–282. ICCS 1995, Springer, August 1995.

[137] T. Winograd. Frame representations and the declarative/procedural controversy. In Readings in Knowledge Representation, pages 185–210. Morgan Kaufmann, 1975.

[138] K.E. Wolff. ’Particles’ and ’waves’ as understood by temporal concept analysis. In K.E. Wolff, H.D. Pfeiffer, and H.S. Delugach, editors, Conceptual Structures at Work, volume 3127 of LNAI, pages 126–141. ICCS 2004, Springer, July 2004.

[139] W.A. Woods. What’s in a link: Foundations for semantic networks. In D.G. Bobrow and A.M. Collins, editors, Representation and Understanding: Studies in Cognitive Science, pages 35–82. Academic Press, 1975.

[140] W.A. Woods. Understanding subsumption and taxonomy. In J.F. Sowa, editor, Principles of Semantic Networks: Explorations in the Representation of Knowledge. Morgan Kaufmann, 1991.

[141] W.A. Woods and J.G. Schmolze. The KL-ONE family. Computers Math. Applic., 23(2-5):133–177, 1992.
