University Microfilms International, 300 N. Zeeb Road, Ann Arbor, MI 48106

INFORMATION TO USERS

This was produced from a copy of a document sent to us for microfilming. While the most advanced technological means to photograph and reproduce this document have been used, the quality is heavily dependent upon the quality of the material submitted.

The following explanation of techniques is provided to help you understand markings or notations which may appear on this reproduction.

1. The sign or “target” for pages apparently lacking from the document photographed is “Missing Page(s)”. If it was possible to obtain the missing page(s) or section, they are spliced into the film along with adjacent pages. This may have necessitated cutting through an image and duplicating adjacent pages to assure you of complete continuity.

2. When an image on the film is obliterated with a round black mark it is an indication that the film inspector noticed either blurred copy because of movement during exposure, or duplicate copy. Unless we meant to delete copyrighted materials that should not have been filmed, you will find a good image of the page in the adjacent frame.

3. When a map, drawing or chart, etc., is part of the material being photographed, the photographer has followed a definite method in “sectioning” the material. It is customary to begin filming at the upper left hand corner of a large sheet and to continue from left to right in equal sections with small overlaps. If necessary, sectioning is continued again, beginning below the first row and continuing on until complete.

4. For any illustrations that cannot be reproduced satisfactorily by xerography, photographic prints can be purchased at additional cost and tipped into your xerographic copy. Requests can be made to our Dissertations Customer Services Department.

5. Some pages in any document may have indistinct print. In all cases we have filmed the best available copy.

University Microfilms International, 300 N. Zeeb Road, Ann Arbor, MI 48106; 18 Bedford Row, London WC1R 4EJ, England. 8107373

Nyerges, Timothy Lee

MODELING THE STRUCTURE OF CARTOGRAPHIC INFORMATION FOR QUERY PROCESSING

The Ohio State University PH.D. 1980

University Microfilms International 300 N. Zeeb Road, Ann Arbor, MI 48106

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Timothy Lee Nyerges, B.A., M.A.

* * * * *

The Ohio State University 1980

Reading Committee:
Professor Harold Moellering, Chairman
Professor Steven I. Gordon
Professor John N. Rayner
Professor Edward J. Taaffe

Approved By

Adviser
Department of Geography

ACKNOWLEDGMENTS

I wish to thank my adviser, Hal Moellering, for initiating my interest in cartographic data structures and for his academic support throughout my cartographic studies. Special thanks go to Anna Graham for lending psychological and technical support during all phases of this research. I wish to thank Fred Day for providing the data concerning family planning in Thailand and for his assistance in developing the geographic base files. Next, I would like to acknowledge Stan Radzio for his assistance with Assembler programming and the Instruction and Research Computer Center for the provision of computer services.

VITA

April 5, 1951 Born - Lakewood, Ohio

1975 B.A., The Ohio State University, Columbus, Ohio

1976 M.A., The Ohio State University, Columbus, Ohio

1975-1979 Graduate Teaching Associate, Department of Geography, The Ohio State University, Columbus, Ohio

1979-1980 Graduate Research Associate, Department of Geography, The Ohio State University, Columbus, Ohio

PUBLICATIONS

"Representing Spatial Properties in Cartographic Data Bases," Proceedings of the 40th Annual Meeting of the American Congress on Surveying and Mapping, St. Louis, MO, March 9-14, 1980, pp. 29-41.

"A Formal Model of a Cartographic Information Base," Proceedings of the Fourth International Symposium on Computer-Assisted Cartography, AUTO-CARTO IV, Reston, VA, November 4-8, 1979.

"Cartographic Query Processing in Geographic Information Systems," Proceedings of the Tenth Annual Pittsburgh Conference on Modeling and Simulation, Pittsburgh, PA, April 24-26, 1979, pp. 805-810.

FIELDS OF STUDY

Major Field: Cartography Minor Field: Urban Geography

TABLE OF CONTENTS

Page
ACKNOWLEDGMENTS ...... ii
VITA ...... iii
LIST OF FIGURES ...... vii

CHAPTER 1 Justification for a Formal Approach to Modeling Cartographic Information for Query Processing
1.1 Introduction ...... 1
1.2 Justification for a Formal Approach ...... 6

CHAPTER 2 Research Related to Cartographic Information Modeling
2.1 Introduction ...... 14
2.2 Semiotics and Structural Linguistics ...... 16
2.3 Linguistic Approaches to Picture Processing ...... 22
2.4 Data Processing of Spatial Data ...... 28
2.4.1 Levels of Structuring Data and Information ...... 30
2.4.2 Data Structuring Formats and a Conception of Space ...... 37
2.4.3 A Review of Spatial Data Structures ...... 42
2.4.3.1 Vector Format Data Structures ...... 44
2.4.3.2 Raster Format Data Structures ...... 52
2.4.3.3 Data Structures with Combined Formats ...... 54
2.4.4 Data Models and Spatial Data Bases ...... 56
2.4.5 A Discussion of Data Structures and Data Models ...... 64
2.4.6 Linguistic Models and Data Base Design ...... 66
2.4.7 Information Systems and Data Management ...... 70

CHAPTER 3 Modeling Cartographic Information with an Information Structure
3.1 Introduction ...... 77
3.2 Entities, Objects, Attributes and Relationships ...... 78
3.3 Structural Aspects of Data Bases and Virtual Maps ...... 83
3.4 Information Structures ...... 85
3.5 A Grammar for Information Structures ...... 94

CHAPTER 4 Cartographic Query Processing in a Geographic Information System
4.1 Introduction ...... 114
4.2 CART-QUERY, a Prototype System for Cartographic Query Processing ...... 116
4.2.1 Information Base (INBASE) ...... 120
4.2.2 Query Processor (QUEP) ...... 132
4.2.3 Cartographic Query Language (CARTQUEL) ...... 137
4.2.4 Query Decoder (QUED) ...... 149

CHAPTER 5 An Evaluation of CART-QUERY and Information Structuring
5.1 Introduction ...... 154
5.2 A Discussion of CARTQUEL ...... 157
5.3 A Discussion of QUED ...... 162
5.4 A Discussion of QUEP ...... 163
5.5 A Discussion of INBASE ...... 166

CHAPTER 6 Summary and Conclusions
6.1 Summary ...... 170
6.2 Conclusions and Implications for Further Research ...... 179

LIST OF REFERENCES ...... 186

LIST OF FIGURES

Figure Page
1. Venn Diagram Portraying Related Research ...... 15
2. Production Rules for a Phrase Structure Analysis of 'Roger walks home' ...... 19
3. A Phrase Marker for a Sentence ...... 21
4. A Subset of the Production Rules for a Map Grammar ...... 26
5. A Subset of the Production Rules for a Cartographic Grammar ...... 29
6. Production Rules and a Hierarchical Data Base Structure ...... 67
7. Network Data Base Structure ...... 69
8. Multiple Hierarchical Data Base Structures Transformed from the Network Data Base Structure of Figure 7 ...... 69
9. Primitive Object Types ...... 79
10. Compound Object Types ...... 81
11. Attributes and Relationships for Cartographic Objects ...... 82
12. Major Components of the Skeletal Structure in a Hypergraph ...... 86
13. An Information Structure for Virtual Maps ...... 89
14. A Base Web W and Its Underlying Graph G_W ...... 91
15. Four Subwebs and Their Subgraphs ...... 93
16. Formation of Compound Objects Through Deep Structure Linkages ...... 95
17. A Grammar for Information Structures ...... 96
18. Elements of the Nonterminal Vocabulary ...... 97
19. Elements of the Terminal Vocabulary ...... 98
20. Subweb Production Rules ...... 100
21. A Nonterminal Base Web after Application of Rpj j ...... 101
22. A Preterminal Base Web after Application of Rp^ ...... 102
23. Terminal Web (Skeletal Thematic Map) ...... 106
24. Web Marker at the Preterminal Stage ...... 108
25. Surface Structure Representation: A Map ...... 112
26. Alternatives When Utilizing Cartographic Displays in a Problem Solving Environment ...... 117
27. Major Components in a Cartographic Query System ...... 118
28. Expanded INBASE Diagram ...... 123
29. INBASE Global Information Structure for Type III Virtual Map ...... 126
30. Attributes Included in Canonical Structure ...... 127
31. Data Structure of the INBASE ...... 129
32. Dictionary of Keywords for CARTQUEL ...... 131
33. Flow Diagram of File Retrieval from the INBASE ...... 133
34. Flow Diagram of Clustering Algorithm for Analytical Processing ...... 135
35. System Response Documenting Analytical Processing ...... 139
36. Skeletal Map ...... 141
37. Choropleth Map Using Tambon Base File ...... 144
38. Health Facility Locations ...... 145
39. Keyword Commands in the LANG-PAK Command Language ...... 146
40. Cartographic Query Language (CARTQUEL) Grammar ...... 148
41. Flow Diagram of Decoding Process in the QUED Component ...... 150

CHAPTER 1

Justification for a Formal Approach to Modeling Cartographic Information for Query Processing

1.1 Introduction

Maps, as cartographic products, are the result of a combination of three components: science, art, and technology. In the early periods of cartography, maps were predominantly artistic products backed by little science and little technology (Eckert 1908). Later, maps were the result of a scientific art (Wright 1942). More recently, maps have been the result of the combination of an artistic science with a high level of technology (Robinson, Morrison and Muehrcke 1977).

A map can function in one or more of four general ways (Robinson 1977): (1) as a graphical storage device, (2) as a symbolic/iconic representation with which to examine reality, (3) as a communication tool, and/or (4) as an analytical tool. Functions (1) and (2) are well established, i.e. these two functions represent the commonly recognized ways in which maps have been put to use. Functions (3) and (4) are less common because they are of more recent vintage. As a graphical storage device, a map (e.g. a 7.5-minute topographic quadrangle sheet) has been estimated to contain, on average, about 100-200 million bits of information (Roberts 1962). As a symbolic/iconic representation with which to examine reality, a map has always aided travelers in spatial orientation (Board 1967). As a communication tool, a map is often constructed to display relative spatial distributions of phenomena in a selective and abstract manner (Morrison 1978). As an analytical tool, a map is employed as an aid in solving geographic problems (Moellering 1975, Tobler 1976) through an investigation of spatial relationships.

The goal of the research reported in this study is to support the development of cartographic data base theory as it concerns the 'nature of maps', whether maps function as a storage device, a symbolic representation of reality, a communication medium or an analytical tool. That goal is motivated by the need to understand the 'nature of maps' better, in a broader theoretical context than that of any one of the four functions alone. Because the realm of cartographic products has been expanding due to new theoretical and technological innovations, new terms to describe these products have emerged.

Riffe (1970) recognized that maps displayed on a CRT differ in a fundamental way from conventional sheet maps. The CRT image has a transient nature, whereas a conventional sheet map has a permanent nature. Maps characterized by a transient nature he called temporary maps. Maps that are not viewable, e.g. data bank files, he called non-maps.

Through the 1970's many different cartographic products were developed, necessitating a new way of defining them. Moellering (1976, 1977, 1980) proposes the concept of real and virtual maps as a solution to the problem of characterizing cartographic products. All cartographic products can be characterized by two major attributes: (a) whether or not they are directly viewable, and (b) whether or not they possess a permanent tangible reality. Any cartographic product that is directly viewable and has a permanent tangible reality is called a real map. Thus, sheet maps are real maps, as are hard copies of images from CRT's. Any cartographic product lacking either a directly viewable nature or a permanent tangible reality is called a virtual map. Thus, CRT images, cartographic film animation, and digital data bases are virtual maps. However, these virtual maps are of different types. Moellering (1980) distinguishes virtual maps of Type I, II, and III according to the mixture of the attributes discussed above. CRT images are Type I; they are directly viewable but do not have a permanent tangible reality. Film animations are Type II; they have a permanent tangible reality but are not directly viewable. Data bases are Type III; they are neither directly viewable nor permanently tangible, but they do possess a logical organization. The term non-map, introduced by Riffe (1970), no longer accurately describes products of Type III, because it does not allow for notions such as the human cognitive map or its machine counterpart, an information base/data base.

The richness of interactive cartography and numerical cartography lies in the ability to undertake transformations on data between domains of Type I and Type III. Such transformations are facilitated by the manipulability inherent in the technology utilized. Thus, data manipulation, information retrieval and display of virtual maps are enhanced considerably.

Data are defined as factual observations which may or may not be correct representations of reality; correct representations of reality are established by intersubjective agreement. Information is data that have been abstracted for a particular reason; hence the data have been given an interpretation. An information structure is a logical organization of the data content of a virtual map. The concept of an information structure is synonymous with the concept of a cognitive map. However, the term information structure more accurately describes the knowledge structure for which the term cognitive map was intended. Information structures are thus models of reality which are data abstractions, the abstractions having been undertaken using deductive or inductive processes. Information structures are relational structures of knowledge, containing both spatial and nonspatial information components.
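Because Moellering's classification rests on exactly two yes/no attributes (direct viewability and permanent tangible reality), it can be read as a lookup over four combinations. The minimal Python sketch below encodes that reading; the names `MapClass` and `classify` are illustrative assumptions, not part of any system described in this study.

```python
from enum import Enum


class MapClass(Enum):
    REAL = "real map"
    VIRTUAL_TYPE_I = "virtual map, Type I"
    VIRTUAL_TYPE_II = "virtual map, Type II"
    VIRTUAL_TYPE_III = "virtual map, Type III"


def classify(directly_viewable: bool, permanent_tangible: bool) -> MapClass:
    """Classify a cartographic product by Moellering's two attributes."""
    if directly_viewable and permanent_tangible:
        return MapClass.REAL              # e.g. sheet map, hard copy of a CRT image
    if directly_viewable:
        return MapClass.VIRTUAL_TYPE_I    # e.g. CRT image
    if permanent_tangible:
        return MapClass.VIRTUAL_TYPE_II   # e.g. film animation
    return MapClass.VIRTUAL_TYPE_III      # e.g. digital data base


# The examples named in the text, as (directly_viewable, permanent_tangible).
examples = {
    "sheet map": (True, True),
    "CRT image": (True, False),
    "film animation": (False, True),
    "digital data base": (False, False),
}
for name, attrs in examples.items():
    print(f"{name}: {classify(*attrs).value}")
```

The lookup makes plain that "virtual" is the complement of a single cell (both attributes present), while the three virtual types exhaust the remaining combinations.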

Data structures are relational structures set up for the purpose of processing data by a computer. Data structures relate parcels of data and thus provide a means of access from one parcel to another. Such structures may be built from an aggregation of information structures. Redundant information components are minimized in a data structure. If only one information structure is being used, then that information structure is translated directly into a data structure.
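To make "access from one parcel to another" and minimized redundancy concrete, the following minimal Python sketch assumes a simple arc-based (topological) vector layout, one common kind of spatial data structure; it is not the INBASE structure described later in this study. Each shared boundary arc is stored once with the polygons on its left and right, and polygon adjacency is derived from the stored arcs rather than recorded explicitly. The arc table, the `OUTSIDE` label and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical arc table: each boundary arc is stored once, with the
# polygons on its left and right. "OUTSIDE" marks the map border.
ARCS = {
    "a1": {"left": "A", "right": "B"},
    "a2": {"left": "B", "right": "C"},
    "a3": {"left": "A", "right": "OUTSIDE"},
    "a4": {"left": "C", "right": "OUTSIDE"},
    "a5": {"left": "A", "right": "C"},
}


def polygon_index(arcs):
    """Build an access path from each polygon to the arcs that bound it."""
    index = defaultdict(set)
    for arc_id, sides in arcs.items():
        index[sides["left"]].add(arc_id)
        index[sides["right"]].add(arc_id)
    return index


def neighbors(polygon, arcs, index):
    """Derive adjacent polygons by following shared arcs (adjacency is not stored)."""
    result = set()
    for arc_id in index[polygon]:
        sides = arcs[arc_id]
        other = sides["right"] if sides["left"] == polygon else sides["left"]
        if other != "OUTSIDE":
            result.add(other)
    return result


idx = polygon_index(ARCS)
print(sorted(neighbors("A", ARCS, idx)))  # ['B', 'C']
```

Storing each boundary once avoids duplicating it in both adjacent polygons; the price is that adjacency must be computed, which illustrates the general choice between stored and derived relationships.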

Those structures are the structural components of a data base used for query processing with an interactive cartographic information system. Query processing is a term used for describing data analysis, information retrieval and display with a nonprocedural orientation. A nonprocedural orientation provides flexible interaction by not constraining this interaction to an algorithmic flow. Interaction is facilitated by an interface consisting of an 'English-like' language. Statements in the language are constructed to query the data base and are input to the system through a CRT.

Information structure, data structure, virtual map, data base, information base, and query processing are terms used throughout this paper that best characterize the subject matter of the research reported here. Those terms play a central role in the objectives formulated to help reach the goal stated on page 2. The objectives of this research are:

1) to develop a linguistic approach for modeling an information structure for virtual maps,
2) to characterize an information structure in terms of a cartographic data structure, and
3) to utilize an information structure and a data structure to operationalize query processing with the assistance of a simple data base management system and a nonprocedural query language.
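The contrast between a nonprocedural query and an algorithmic flow can be sketched briefly. In the minimal Python example below, which is assumed for illustration only (the record layout, field names, sample values and the `run_query` helper are hypothetical and are not CARTQUEL), the user states what is wanted as a declarative specification, and a generic evaluator decides how to retrieve it.

```python
# Hypothetical attribute table for areal units; values are invented for illustration.
UNITS = [
    {"name": "U1", "clinics": 3, "population": 12000},
    {"name": "U2", "clinics": 0, "population": 8000},
    {"name": "U3", "clinics": 5, "population": 20000},
]

# Comparison operators the evaluator understands.
OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b, "=": lambda a, b: a == b}


def run_query(records, select, where=None):
    """Evaluate a declarative query: which fields to show and which conditions to meet."""
    rows = records
    for field, op, value in (where or []):
        rows = [r for r in rows if OPS[op](r[field], value)]
    return [{f: r[f] for f in select} for r in rows]


# The user specifies *what* is wanted, not the loop that finds it.
print(run_query(UNITS, select=["name"], where=[("clinics", ">", 2)]))
# [{'name': 'U1'}, {'name': 'U3'}]
```

Because the query is data rather than a program, the evaluator remains free to change how records are accessed without changing how queries are written.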

The objectives in this research stem from an increasing awareness that: (1) the modeling, manipulation and display capabilities of spatial information systems are interdependent, and (2) 'intelligent and usable' cartographic systems are needed to support a capability for solving complex geographical problems (Rhind 1976, Bennet 1976).

Reaching those objectives to produce an 'intelligent and usable' cartographic system in a programming environment necessitates a systematic approach to various fields of study. These include a study of linguistics as it pertains to picture processing in cartography, a discussion of syntactic and semantic characteristics of cartographic information, plus a study of data base design involving data models and data base management for cartographic/geographic query processing.

1.2 Justification for a Formal Approach

Pictures and photographs depict scenes which are, at times, more informative and more universal than words, whether spoken or written. For some purposes, maps are even more informative than either pictures or photographs because of the consistency of selection and organization of the information (Robinson 1977, Robinson and Bartz-Petchenik 1976).

A recognition of this consistency has led many authors to suggest general notions about a 'language for maps', i.e. a cartographic language (Toulmin 1953, Ackerman 1957, Bunge 1968, Harvey 1969, Dacey 1970a, Betak 1972, 1975, Morrison 1974a, 1974b, 1976). Because map elements are used consistently, maps can be said to possess a syntax (Peucker 1972, Morrison 1976, Youngman 1978). Utilizing this notion of syntax should make it possible to define a set of rules, a grammar, for the arrangement of map elements in various forms.

The feasibility of deriving cartographic grammars has been addressed in Petchenik (1974) and again in Robinson and Bartz-Petchenik (1976). They discuss a deficiency of grammars when describing the overall graphic quality called the 'look of maps' (Petchenik 1974, p. 63). These authors further suggest that, "... there may well be unsurmountable obstacles to the employment in cartographic theory of such word-language concepts as grammar and syntax," (Robinson and Bartz-Petchenik 1976, p. 43). This point of view is drawn from the fact that map perception deals with at least a two-dimensional representation, whereas discourse deals with a unidimensional progression of spoken or written words. That approach and interpretation of Robinson and Bartz-Petchenik diverges from the linguistic approach exemplified in the research on picture processing discussed below (see e.g., Nake and Rosenfeld 1972, Narasimhan 1974, Rosenfeld and Kak 1976).

Picture processing from the viewpoint of generating descriptions of input pictures involves three distinct levels of processing. These are: (1) preprocessing in terms of noise-cleaning, figure-ground separation and image idealization and/or refinement, (2) object articulation in terms of single-object feature analysis and feature abstraction, and (3) scene analysis in terms of multiple-object relational structure analysis (Narasimhan 1974). Since the mid 1960's a considerable volume of research has been reported concerning syntactic models in picture processing, such as syntactic analysis of pictures and generating descriptions of pictures; however, we are still very far from being able to automate even the simplest of activities. According to Narasimhan (1974), reading contour maps, region maps and weather maps is a complex pictorial problem.

To analyze and describe efficiently complex pictorial data, two kinds of facilities are essential. The first is an appropriate data-structure. The second is powerful problem-solving heuristics to make the relevant decisions at various levels of processing. . . The 2-dimensional picture-plane data structure, incorporated in parallel processing languages . . . offers significant advantages in the preprocessing and object-articulation levels. But other kinds of data structures may be needed for higher level processing where the decisions, in general, have to be based on global relational considerations. (Narasimhan 1974, p. 744)

The value of relational structure analysis using picture grammars has been demonstrated using pictures in closed-ended classes, i.e. classes composed of synthetic pictures which are generated by the grammars (simple figures in Shaw (1969b) and chromosomes in Ledley (1964)). However, the value of picture grammars for the analysis of real-life pictures, i.e. pictures in open-ended classes, is somewhat more restricted due to their complexity and range of subject matter. Consequently, as Narasimhan (1974) states, a syntactic model of an open-ended class of pictures may, at best, serve as a working hypothesis, because the open-ended class will require 'heterarchic' processing rather than the strictly hierarchical processing used for closed-ended classes of pictures. Heterarchic processing involves the possibility of cycling back through a hierarchy to aid further discrimination of objects. In addition, Narasimhan states:

The distinction between synthetic and real-life pictures is quite analogous, in detail, to the distinction between formal language texts (such as programs written in a computational language) and natural language texts (or speech utterances). (Narasimhan 1974, p. 745)

It is the assumption here that maps do not fit either extreme of closed- or open-ended classes accurately. They lie somewhere between the extremes. They are not simple synthetic pictures, and they have not yet been generated by a closed set of rules, a grammar. For example, Dacey (1970a, 1971) has attempted to describe a two-dimensional syntax for applications in cartography, but the result falls far short of a syntax for characterizing maps. In fact, the syntax is more applicable to a single cartographic symbol that may appear on a map than to a complete map. But neither do maps belong to the class of open-ended pictures, because cartographers generate them using computer programs and because cartographic grammars are beginning to be developed. For example, the Geographic Information Management and Mapping System called GIMMS (Waugh and Taylor 1976) is a set of computer software that can generate, perhaps, thousands of different maps through the application of various types of commands; and Taketa (1979) offers a grammar that describes a topographic sheet. However, general rules have not yet been fully formalized to describe all maps. Thus, maps seem to have characteristics of both open-ended and closed-ended classes of pictures.

As further support for the possibility of developing grammars for maps, Youngman (1978) offers a rebuttal to the narrow definition which Robinson and Bartz-Petchenik (1976) use for the term 'language':

Formal languages, such as mathematical algebra and computer programming languages, are defined by the statement of a lexicon of language elements and a system of rules for arranging and interpreting these elements. It is in the sense of a formal language that grammars are proposed for maps and not as derivatives or even corollaries to natural languages such as English. (Youngman 1978, p. 8)

Furthermore, the connotations of the term 'language' implicit in the Robinson and Bartz-Petchenik discussion of grammar are ascribed to what linguists call the 'surface structure' of 'natural' language. A surface structure representation is simply a string of terminal elements (words) forming a sentence, as in English or French. The notion of language, however, concerns much more than just surface structure; it also concerns the notion of 'deep structure' (Allen and Van Buren 1971). The deep structure of a statement in a language is the underlying relationship among nonterminal elements, i.e. structural concepts of the language such as noun phrase and article. Thus, the major importance of syntactic models lies in the structural relations among the elements of the language, i.e. in the deep structure relations among nonterminal elements as well as the structural relations among terminal elements.
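The distinction between deep and surface structure can be illustrated with a minimal Python sketch of a phrase-structure grammar for the sentence 'Roger walks home'. The rule set, the category labels (S, NP, VP, N, V, ADV) and the function names are assumptions made for illustration and are not the production rules given in Figure 2. The nested tree returned by `derive` plays the role of the deep structure (a phrase marker over nonterminal elements), while `surface` recovers only the left-to-right string of terminal elements.

```python
# A tiny phrase-structure grammar (illustrative; not the rules of Figure 2).
# Each nonterminal has a single expansion, so the derivation is deterministic.
RULES = {
    "S": ["NP", "VP"],
    "NP": ["N"],
    "VP": ["V", "ADV"],
}
# Preterminal categories rewrite directly to terminal words.
LEXICON = {"N": "Roger", "V": "walks", "ADV": "home"}


def derive(symbol):
    """Return the phrase marker (deep structure tree) rooted at `symbol`."""
    if symbol in LEXICON:
        return (symbol, LEXICON[symbol])      # preterminal -> terminal word
    return (symbol, [derive(child) for child in RULES[symbol]])


def surface(tree):
    """Read off the surface structure: the left-to-right string of terminals."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in surface(child)]


tree = derive("S")
print(tree)
print(" ".join(surface(tree)))  # Roger walks home
```

The surface string discards the nonterminal labels and their nesting, which is exactly the structural information the passage identifies as the main value of a syntactic model.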

The advantage of using a linguistic approach is not dispelled because sentence surface structure is different from picture surface structure. Admittedly, maps differ greatly from sentences in surface structure, since their physical representations differ. The same applies to the difference in surface structure between the elements of abstract algebra and maps; but, as Morrison has shown, functional processes concerned with maps, i.e. structural processes, can be expressed in algebraic terms (Morrison 1978). The function $f_i$ represents a cartographer's algebraic mapping from raw data to sensed data, and $g_j$ represents an algebraic mapping from sensed data to the map. Thus, the elements of raw data that appear on a map are part of a compound functional mapping. Such an algebraic mapping is a structural relation of the cartographic process. Morrison further states that:

. . . the efficient use of the cartographic language requires a syntactical structure and grammar for symbolization based on the thresholds of a map reader's ability to perform detection, discrimination, recognition and estimation tasks in a spatial framework . . . and once the map reading tasks are thoroughly researched, the grammar of the cartographic language should become apparent. (Morrison 1976, p. 93)

Morrison is suggesting that an examination of map reading performance will lead us to map competence, i.e. a grammar for maps. It is true that:

Performance provides evidence for the investigation of competence, (but) . . . it is difficult to see how performance can be seriously studied except on the basis of an explicit theory of the competence that underlies it, and, in fact, contributions to the understanding of performance have largely been by-products of the study of grammars that represent competence. (Allen and Van Buren 1971, p. 7)

The above remarks by Morrison are directed entirely at a grammar describing human map reading in a context of map communication.
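Morrison's compound functional mapping can be stated compactly. In the expression below, the set symbols $R$ (raw data), $S$ (sensed data) and $M$ (the map) are introduced here only for illustration, and $f$ and $g$ stand for the cartographer's two mappings described above, with their subscripts omitted:

$$
f : R \to S, \qquad g : S \to M, \qquad (g \circ f) : R \to M, \qquad (g \circ f)(r) = g\big(f(r)\big) \ \text{ for } r \in R.
$$

Read this way, a raw-data element reaches the map only through the composite $g \circ f$, which is the structural relation of the cartographic process that the passage refers to.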

However, one can view the need for a grammar in a broader context: one which involves systematic investigations of transformations not only for map reading by humans but also for machine map analysis, generation, and interpretation. Focusing on the latter of these processes is a step toward a systematic investigation of the nature of spatial relationships, both in surface structure and in deep structure. An understanding of the types of relationships involved in spatial analysis is of paramount importance to the design of both spatial data bases and the information systems used for querying them. According to Rhind (1976, pp. 516-17), 'intelligent and usable cartographic systems' for querying spatial data bases should have the following characteristics:

1) Easy to use
2) Data independent
3) Device independent
4) Easily extendable by system designers and users
5) Highly reliable and easily repairable
6) Have decision making capabilities
7) Have internal arrangements to preserve privacy when necessary

Those characteristics can be accommodated most efficiently in a data base environment. Date (1977, p. 4) defines a data base as a collection of stored operational data used by the application systems of some particular enterprise, e.g. any large-scale commercial, scientific, technical or other operation. According to Linders (1975, p. 162), a cartographic data base represents a repository of spatial information of both a thematic and a topographic type, as well as a functional capability for access, retrieval, and depiction of this information. Date (1977, p. 7) lists the advantages of processing data in a data base environment:

1) The amount of redundancy in the stored data can be reduced
2) Problems of inconsistency in the stored data can be avoided (to a certain extent)
3) The stored data can be shared
4) Standards can be enforced
5) Security restrictions can be applied
6) Data integrity can be maintained
7) Conflicting requirements can be balanced

8) Data independence

Support for utilizing a formal approach to designing data bases for query processing has been presented by Tomlinson (1979, p. 191) in a discussion about the problems inherent in querying spatial data bases. These problems are:

1) There is no widely accepted and clearly defined set of spatial relationships between entities
2) There are no clearly identified categories of spatial query which can be specified in terms of the operations that they require to be performed on spatial data
3) It is not clear whether the use of modern data base management systems is inhibited because of the present imprecisions outlined in 1) and 2) above, or whether the relationships and queries are adequately defined from a user's standpoint and it is the technology of data base management systems that is inadequate
4) There is no clear understanding of the relative applicability of the various data structures inherent in existing data base management systems to the task of recording spatial relationships

5) There is little understanding of the relationship between the need for explicit definition of spatial relationships in digital data base management systems, and the use of display to permit human observation and recognition of relationships
6) There is little understanding of the relationship between the need to specify spatial relationships explicitly for data base management systems and the calculative capacities of present and future computers
7) There appears to be no competent source of advice within the profession of geography or elsewhere which can provide answers to these questions at this time

As indicated by the above list of problems, spatial associations between geographic features in a data base context are not well understood. Therefore, one is not yet sure which associations are to be stored and which are to be derived from stored data. At present, the role of explicit information is to provide the basis for a conceptual model from which implicit spatial relations can be observed. Among the many reasons for the problems concerning complex spatial data bases is undoubtedly our lack of understanding of the overall problem. This lack of understanding is partially due to the lack of a formal treatment. Linders (1975) concludes that proper handling of the various levels of data and information involved in a data base necessitates data base models which provide an extensive capability for data base generalization and abstraction. However, of equal if not greater importance is a better understanding of the notion of association in the context of data base design, i.e. what an association is and how associations can be discussed in a formal manner. Before the objectives and problems of this chapter can be considered in depth, a review of the research related to those objectives and problems is presented in Chapter 2.

CHAPTER 2

Research Related to Cartographic Information Modeling

2.1 Introduction

Overlapping efforts in the fields of computer graphics, cartography, and geography have defined the existence of interactive geographic cartography (Moellering 1975). For the research reported in this paper it is instructive to view the overlap of research as a four-part relationship: 1) computer science as a study of formal (programming) languages, grammars, graph theory, data structures, and data base design; 2) picture processing as a study of computer graphics, scene analysis, and image processing; 3) cartography as the science, art, and technology of making maps with a focus on numerical data processing and data structure; and 4) geography as a study of spatial relationships in terms of spatial analysis (see Figure 1). The intent is to show a relationship by intersection only and not by magnitude, since common interests are only beginning to be investigated. Picture processing is separated from computer science as a matter of convenience, to identify realms of computer science which are not usually thought of as being part of picture processing (e.g. data base design). Research in related fields having a direct influence on developing a formal cartographic model involves structural linguistics, a linguistic approach to picture processing, and logical data base design with data structures and information structures. This linguistic approach utilizes formal grammars and graph theory to describe logical structure in pictures and maps. The implementation of a formal model involves research topics in data base organization and data base manipulation in information systems and, of course, numerical interactive cartography as cartographic query processing.

[Figure 1. Venn Diagram Portraying Related Research. Overlapping regions: Computer Science and Discrete Mathematics (formal languages, grammars, data structures, graph theory, data base design); Geography (spatial analysis, spatial relationships); Cartography (numerical processing, spatial data structures); Picture Processing (computer graphics, scene analysis, image processing); Interactive Cartography in the overlap.]

2.2 Semiotics and Structural Linguistics

Semiotics is a study of the language elements (signals, signs, and symbols) and their role in syntactic, semantic, and pragmatic processes. Syntax is the a priori structural aspect of a language; that is, a set of de jure rules is assumed to exist that guides the production and interpretation of elements in the language. Semantics is the meaning of the signals, signs, or symbols in terms of the phenomena they represent in reality. The meaning may not be the same in all contexts. Pragmatics is defined as the interpretation of the semantics of phenomena in a given context; this context may include space and time. The interpretation is highly dependent, first, upon primitive and compound structural elements and, second, upon the meaning of the structural elements of the message being communicated to a receiver.

The difference between linguistics and semiotics is that linguistics deals only with symbols in human language, whereas semiotics considers the signals of machine language and the signs of animal language as well. Consequently, semiotics is a general field of study encompassing structural linguistics and has been described ". . . as a generalization of linguistics attached to a generalized analytical philosophy" (Nauta 1972, p. 46).

In order that a system of symbols can function as a language, it has to meet two minimum criteria: (a) every symbol must be producible and receivable by every member of the language community; (b) the symbols must be combinable in a certain way to form compound symbols. Condition (b) means that the symbol system must have a SYNTACTIC STRUCTURE. (Nauta 1972, p. 47)

Not all systems of symbols meet the minimum criteria. For example, traffic signs are symbols; however, not all individuals can produce them (at least, not legally), and these signs are not combinable into compound symbols. Thus, the rules accompanying criteria (a) and (b) are not sufficiently systematized to make the set of traffic symbols a language: traffic symbols lack a de jure grammatical structure, although they may possess a de facto one. Only the rules of artificial languages (occurring in logic and science) are de jure rules, i.e. they state something which ought to happen regardless of whether it does or not (Nauta 1972). Cartographic rules are a mixture of de jure and de facto rules. Some design considerations, for example figure-ground relationships, are well-recognized practices, whereas others, for example gray-scale shading or the use of texture-tone patterns, are not. Whether these are de jure or de facto is still a matter of discussion.

The conflict between syntax and semantics is a major consideration impinging on the identification of rules, whether the rules are de jure or de facto. The major reason for using artificial languages is that they reduce pragmatics and semantics to the level of syntactics. However, in less artificial languages this cannot be accomplished completely. Thus, with languages other than deductive logic a controversy remains as to whether syntax presupposes semantics or vice versa. The conflict resides in basic assumptions about structure and meaning: is meaning derived completely from structure, or is structure realized after a meaning is given? No attempt is made to pursue this topic, for such an attempt would involve a treatise concerning the ontological aspects of reality, which eventually leads to a discussion of the ontological aspects of metaphysics (if this is possible).

The theoretical approach to linguistics introduced by Chomsky (1957, 1963, 1965) revolutionized structural linguistics. A structural approach assumes that the meaning of a sentence is derived from an 'underlying structural relation' of the elements in the sentence. The structural analysis of a natural language (e.g. English) as introduced by Chomsky (1957) involves a specification of sentence structure in terms of grammars, phrase structures, and phrase markers. A grammar consists of three major components: a vocabulary (lexicon), production rules, and an initial symbol. A vocabulary consists of nonterminal (phrase) and terminal (word) elements. Production rules generate descriptions of sentences. Any particular set or list of production rules grouped according to application is called a phrase structure. For example, in the sentence 'Roger walks home' the production rules involved in a phrase structure analysis would be given as in Figure 2.

Sentence ::= Noun Phrase + Verb Phrase
Noun Phrase ::= Article + Noun
Noun ::= Roger
Article ::= Null
Verb Phrase ::= Verb + Noun Phrase
Verb ::= Walks
Noun Phrase ::= Article + Noun
Noun ::= Home
Article ::= Null

where: 'Sentence' is the initial symbol; '::=' indicates 'is rewritten as'; '+' indicates 'is concatenated to'.

Figure 2. Production Rules for a Phrase Structure Analysis of 'Roger walks home'.

The production rules in the phrase structure of Figure 2 are derived from a context-free phrase structure grammar. A context-free grammar is defined as a list of production rules wherein each rule is applied in its order in the list, regardless of its context of application. Furthermore, the left side of a production rule always has exactly one nonterminal symbol, and this symbol is rewritten as a concatenation of the symbols on the right side of the rule.
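The rewriting process of Figure 2 can be traced mechanically. The sketch below is a minimal illustration, not a general parser: it follows the chapter's description literally, applying each rule once, in list order, to the leftmost occurrence of its left-side nonterminal. Representing 'Null' as an empty right side and the Python names used here are assumptions made for the example.

```python
# Production rules of Figure 2, kept in list order. Each left side is a single
# nonterminal; the right side is the sequence of symbols that replaces it.
# An empty list stands for 'Null'; 'walks' and 'home' are written in lower case.
RULES = [
    ("Sentence", ["NounPhrase", "VerbPhrase"]),
    ("NounPhrase", ["Article", "Noun"]),
    ("Noun", ["Roger"]),
    ("Article", []),                      # Article ::= Null
    ("VerbPhrase", ["Verb", "NounPhrase"]),
    ("Verb", ["walks"]),
    ("NounPhrase", ["Article", "Noun"]),
    ("Noun", ["home"]),
    ("Article", []),                      # Article ::= Null
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def derive(rules, start="Sentence"):
    """Apply each rule once, in list order, to the leftmost occurrence of its
    left-side symbol, returning every intermediate sentential form."""
    form = [start]
    steps = [form]
    for lhs, rhs in rules:
        i = form.index(lhs)               # ValueError if the rule cannot apply
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps


steps = derive(RULES)
for form in steps:
    print(" ".join(form))
final = steps[-1]
assert not NONTERMINALS & set(final)      # only terminal symbols remain
print(" ".join(final))                    # Roger walks home
```

Each printed line is one sentential form; the derivation ends when no nonterminal remains.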

Thus, in Figure 2, the initial symbol 'sentence' is the starting point of the description and is rewritten as 'noun phrase + verb phrase'. The nonterminal symbol 'noun phrase' is rewritten as 'article + noun', and the nonterminal symbol 'noun' is rewritten as the terminal symbol 'Roger', etc. The phrase structure of a sentence can be diagrammed as a phrase marker (see Figure 3). A phrase marker takes the general form of a tree. The root of the tree is the initial symbol. The root node sprouts branches connecting it to nonterminal nodes, and these in turn branch to other nonterminal nodes or to terminal nodes. The terminal nodes become the leaves of the tree.

SENTENCE (initial symbol)
    NOUN PHRASE
        ARTICLE: NULL
        NOUN: ROGER
    VERB PHRASE
        VERB: WALKS
        NOUN PHRASE
            ARTICLE: NULL
            NOUN: HOME
(Nonterminal symbols occupy the inner levels of the tree; the terminal symbols NULL, ROGER, WALKS, NULL, HOME form its leaves.)

Figure 3. A Phrase Marker for a Sentence

Transformational rules (Chomsky 1965) are rules applied to a phrase structure, eventually providing for a transformation of the surface structure of a sentence while retaining a similar meaning. Nonterminal symbols and their associated production rule ordering are called the deep structure of a sentence. Terminal symbols form the surface structure of a sentence. Transformational rules are applied to the deep structure of a sentence, rearranging the nonterminal elements while retaining their internal relationships, but ultimately transforming the terminal elements (perhaps deleting an unnecessary article) into an alternative surface structure. Thus, the deep structure of a sentence stays the same but the surface structure changes. For example, the surface structure of the sentence 'Roger walks home' can be transformed into the alternative form 'To home Roger walks' if we simply change the position of the first 'noun phrase + verb phrase' to 'verb phrase + noun phrase'. For this reason the potential meaning of a sentence resides in the deep structure component of a sentence, i.e. the relationship among underlying conceptual elements.

A similar type of conceptual process is posited for maps. Some components of a map are always related to other components of a map in a structural way, regardless of the surface structure. For example, a geographic domain, the mapped geographic area, is linked in some way to the legend appearing on the same map. In another example, a geographic domain can be 'rewritten' as a grouping of parts of the domain. The sum of the parts of the domain, however, does not carry the same conceptual message as the concept 'domain'. That is, the concept 'domain' carries a structural relation in addition to the parts; and if one understands the concept of geographic domain, then additional information is transmitted. These notions are elaborated in the next section.
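A phrase marker is naturally represented as a nested tree. The sketch below encodes the phrase marker of Figure 3 as nested Python tuples (an assumed encoding), reads the surface structure off the leaves, and applies a toy reordering loosely modeled on the fronting in the 'To home Roger walks' example (the preposition is not generated). The reordering changes the terminal string while the moved noun phrase keeps its internal structure. It is only an illustration, not an implementation of Chomsky's transformational rules; the function names are hypothetical.

```python
def node(label, *children):
    """Nonterminal node as (label, children); terminals are plain strings."""
    return (label, children)


# Phrase marker of Figure 3, with the 'Null' articles kept as explicit leaves.
SENTENCE = node("Sentence",
    node("NounPhrase", node("Article", "Null"), node("Noun", "Roger")),
    node("VerbPhrase",
         node("Verb", "walks"),
         node("NounPhrase", node("Article", "Null"), node("Noun", "home"))))


def surface(tree):
    """Surface structure: the leaves read left to right, with Null omitted."""
    if isinstance(tree, str):
        return [] if tree == "Null" else [tree]
    _, children = tree
    return [word for child in children for word in surface(child)]


def front_object(tree):
    """Toy transformation: move the noun phrase inside the verb phrase to the
    front of the sentence. The moved subtree itself is not altered."""
    label, (subject, verb_phrase) = tree
    vp_label, (verb, obj) = verb_phrase
    return (label, (obj, subject, (vp_label, (verb,))))


fronted = front_object(SENTENCE)
print(" ".join(surface(SENTENCE)))   # Roger walks home
print(" ".join(surface(fronted)))    # home Roger walks
```

Because the tuples are immutable and reused, the noun phrase placed at the front is the very same subtree that sat inside the verb phrase; only its position in the tree, and hence the surface string, changes.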

2.3 Linguistic Approaches to Picture Processing

A picture can be defined as a cohesive group of graphic objects. An object is built of constituent parts called primitives, which are points, lines, areas, or faces of the object. Graphic objects may be combined to form compound graphic objects. Those compound objects, taken together, constitute the picture.

An analogy between pictures and sentences, which characterizes a linguistic approach in picture processing, can be discussed under three main topics. The first concerns the notion of competence: an implicit general knowledge of sentences and pictures. Competence in a formal linguistic sense concerns the explication of this knowledge in terms of a system of rules which describe the structure of a sentence/picture. Thus, a sentence/picture is said to have a syntax (structure) which can be described by a set of rules called the grammar. The second topic concerns the notion of performance: the task of working with a sentence/picture. That is, performance involves the actual process of construction or analysis of the sentence/picture by a person or by machine. The third topic is that of meaning, whereby a given construction or analysis of a sentence/picture is given some relevant interpretation. The meaning of the sentence/picture results from the combination of syntactic and semantic rules employed during performance.

Applications of a linguistic approach to picture processing with computers were first made in studies dealing with the recognition of handwriting (Eden 1961, 1962). Shortly thereafter, the first general linguistic model was used for analyzing bubble chamber photographs (Narasimhan 1962, 1964), for describing chromosome patterns (Ledley 1963, 1964), and for describing 'hand printed' English characters (Narasimhan 1966). A linguistic approach was also utilized for describing simple pictures in several dimensions (Kirsch 1964). The approach was extended further to consider more complex pictures (Anderson 1968; Shaw 1968, 1969a, 1969b, 1970, 1972; Feder 1971; George 1971, 1972). Surveys of the field can be found in Feder (1966), Miller and Shaw (1968), Rosenfeld (1969, 1975), and Rosenfeld and Kak (1976).
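The hierarchy of primitives, graphic objects, and compound objects described above maps onto a small recursive data structure. In this sketch the class names, the restriction of primitives to "point", "line", and "area", and the sample coordinates are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union

Coord = Tuple[float, float]


@dataclass
class Primitive:
    kind: str                       # "point", "line" or "area"
    coords: List[Coord]


@dataclass
class GraphicObject:
    name: str
    parts: List[Union[Primitive, "GraphicObject"]] = field(default_factory=list)

    def primitives(self) -> Iterator[Primitive]:
        """Flatten a (possibly compound) object into its primitives."""
        for part in self.parts:
            if isinstance(part, Primitive):
                yield part
            else:
                yield from part.primitives()


# Hypothetical picture: two simple objects combined into a compound object.
building = GraphicObject("building", [
    Primitive("area", [(0, 0), (4, 0), (4, 3), (0, 3)]),    # footprint
    Primitive("line", [(0, 3), (2, 5), (4, 3)]),             # roof outline
])
well = GraphicObject("well", [Primitive("point", [(6, 1)])])
picture = GraphicObject("farmstead", [building, well])

print([p.kind for p in picture.primitives()])   # ['area', 'line', 'point']
```

A grammar for such pictures would add rules stating how parts may be combined; the structure here only records what the parts are.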
The main emphasis of a linguistic approach in picture processing has been on the analysis and recognition of pictures rather than their generation. Consequently, most linguistic models are asymmetrical in design because of a lack of concern for generation. Kulsrud (1968), Shaw (1968), Miller and Shaw (1968), and George (1971, 1972) share the opinion that analysis and generation are symmetrical processes and that both should be incorporated into a model. A linguistic model such as this would then describe pictures in a systematic manner with syntactic and semantic rules.

Many grammars have been proposed as a single mechanism for describing the structure and general properties of pictures. These grammars can be subdivided into five general categories: string, plex, array, web (graph), and map grammars. String grammars generate expressions in a language much like the sentences of natural English (Shaw 1968, Westman 1977). These expressions are linear, read left to right, but are often composed of embedded parentheses specifying the order of the primitive concatenations which build the picture. Plex grammars generate a language form much like mathematical expressions, but they have the peculiarity that a separate list of nodes is used for concatenations of subparts of a picture (Feder 1971). Array grammars use small square blocks to generate right triangles (Kirsch 1964; Dacey 1970b, 1971). A special formulation of an array grammar using mosaic tiles is a mosaic grammar (Ota 1975). A fourth category is that of a web or graph grammar (Pfaltz and Rosenfeld 1969; Montanari 1970; Rosenfeld and Strong 1971; Pavlidis 1972a, 1972b; Rosenfeld and Milgram 1972). A web or graph grammar generates a language which is a set of labeled graphs (webs). Although all of the grammars mentioned can describe pictures, web grammars (i.e. graph grammars) appear to be the most promising from a cartographic point of view because of their explicit topological considerations.

Rosenfeld and Strong (1971) discuss a derivative of web grammars, which they call map grammars, for the description of a hypothetical map. A node in the graph represents a polygon, and an ordering of edges around a labeled node of the graph specifies the ordering of boundaries around a polygon of the map. The class of hypothetical maps to which the grammar applies indicates a limitation of the grammar: the maps are figures whose polygons have closed boundaries, and each polygon may have inliers but no nested zones.

A grammar developed by Westman (1977) creates a map-like schematic description of landscapes. The expressions created by the grammar are a set of directions for building a natural environment for animal studies. Although the grammar is not rigorously defined, it does represent a phrase structure grammar which generates a structural description of a schematic similar to a large-scale topographic map.

Youngman (1978) discusses maps in the context of picture processing and utilizes a linguistic approach for map description. A transformational, phrase structure grammar is used for describing a map in a way similar to Chomsky's analysis of a sentence. A subset of the rules is presented in Figure 4. Youngman does not provide a definition for the signs '►' and '+'. Thus, if we assume that the two signs indicate 'is rewritten as' and 'is concatenated to', respectively, a problem with interpretation may arise. The rewrite sign can be given an appropriate interpretation, but the concatenation symbol is problematic. If the concatenation symbol represents concatenation of nonterminal elements, in what way are they concatenated? Perhaps the symbols are concatenated through logical association of conceptual symbols; but are the elements 'points + lines + areas' concatenated in the manner indicated? This concatenation problem needs clarification.

Map ► Reference + Information
Reference ► Base + Grid
Information ► Title + Symbols
Base ► Place + Scale
Grid ► Framelines + Gridlines
Symbols ► Legend + Data
Data ► Points + Lines + Areas
Points ► 'x', 'y'

Figure 4. A Subset of the Production Rules for a Map Grammar (Youngman 1978, p. 16)

Taketa (1979) reviewed the potential contribution of communication theory, semiotics, and linguistics to cartographic communication theory. He recognizes that classical communication theory (Shannon and Weaver 1949) does not approach the problem of cartographic communication in a manner that provides an understanding of its basic process. However, Taketa does recognize that semiotics (Nauta 1972), a study of signals, signs, and symbols in syntactic, semantic, and pragmatic processes, provides an excellent theoretical basis for studying cartographic communication because of its inherent logic in considering all types of language systems. Semiotics, however, lacks an analytical mechanism to study cartographic communication. Taketa argues that linguistics can and does provide such a mechanism. In a general linguistic context, structural linguistics is related to the syntactic and semantic processes of semiotics, and is therefore an excellent approach for characterizing a logical construction of maps.

Taketa examined a number of theoretical proposals (Chomsky 1965, Lakoff 1971, Bartsch and Vennemann 1972) for the linguistic structuring of language and found none to provide a final answer. The linguistic dilemma concerning the precedence of structure or meaning has not been resolved and continues to plague linguists. The grammar for map description proposed by Taketa (1979) is an extension of Youngman's notions of cartographic grammars. Taketa's grammar includes not only a method for representing hierarchical, deep structure relationships in map displays, but also a method for incorporating spatial relationships in a grammar. Spatial relationships between map objects are incorporated by using predicate logic phrases, similar to functional expressions. Thus, an object or compound symbol is rewritten as two or more primitive objects inside a predicate expression representing the spatial relationship between them. A subset of the production rules is presented in Figure 5. Taketa employs the symbols '►' and '+' in the same manner as Youngman. Without further definition, the implicit concatenation of elements signified by the '+' symbol is easily confused with functional elements such as 'CONT', so there may be a conflict in the symbolization employed. The application of the grammar is directed at cartographic communication in the context of cartographic generalization.
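The difference between the two notations can be made concrete by expanding a subset of the rules of Figures 4 and 5. The encoding below is hypothetical (it is neither Youngman's nor Taketa's own notation): every Youngman rule is read, by assumption, as '+' meaning ordered concatenation, while Taketa's predicate rules are kept distinct from '+'. The recursive Points rule and the place names of Figure 5 are omitted for brevity.

```python
# Figure 4 (Youngman 1978): every rule read, by assumption, as '+'
# (ordered concatenation). "Points ► 'x', 'y'" is treated as the pair x + y.
YOUNGMAN = {
    "Map": ("+", ["Reference", "Information"]),
    "Reference": ("+", ["Base", "Grid"]),
    "Information": ("+", ["Title", "Symbols"]),
    "Base": ("+", ["Place", "Scale"]),
    "Grid": ("+", ["Framelines", "Gridlines"]),
    "Symbols": ("+", ["Legend", "Data"]),
    "Data": ("+", ["Points", "Lines", "Areas"]),
    "Points": ("+", ["x", "y"]),
}

# Figure 5 (Taketa 1979), first three rules: a right side may instead be a
# spatial predicate (CONT = contiguity, OVER = overlay) over its constituents.
TAKETA = {
    "Map": ("CONT", ["Land", "Water"]),
    "Land": ("OVER", ["Road", "Terrain"]),
    "Terrain": ("+", ["Hills", "Meadows", "Points"]),
}


def render(rules, symbol):
    """Write out the full expansion of a symbol in the figures' notation."""
    if symbol not in rules:
        return symbol                      # no rule: treat as a leaf
    op, parts = rules[symbol]
    inner = [render(rules, p) for p in parts]
    return " + ".join(inner) if op == "+" else f"{op}({', '.join(inner)})"


def relations(rules, symbol):
    """Collect the (predicate, a, b) facts stated by predicate rules."""
    if symbol not in rules:
        return []
    op, parts = rules[symbol]
    facts = [(op, *parts)] if op != "+" else []
    for p in parts:
        facts.extend(relations(rules, p))
    return facts


print(render(YOUNGMAN, "Map"))
# Place + Scale + Framelines + Gridlines + Title + Legend + x + y + Lines + Areas
print(relations(YOUNGMAN, "Map"))  # []
print(render(TAKETA, "Map"))       # CONT(OVER(Road, Hills + Meadows + Points), Water)
print(relations(TAKETA, "Map"))    # [('CONT', 'Land', 'Water'), ('OVER', 'Road', 'Terrain')]
```

Read purely as concatenation, the Youngman rules flatten the map into a sequence of leaves and record no spatial relation; the predicate rules leave explicit relation facts behind, which is the ambiguity of '+' discussed above.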

2.4 Data Processing of Spatial Data

A general review covering the last twenty-five years of computer-assisted cartography (CAC) has been presented by Rhind (1977), and prospects for the next twenty-five years are presented by Robinson, Morrison and Muehrcke (1977). Progress with data processing in CAC has followed on the heels of progress in the field of computer science through a borrowing of theoretical and operational notions, with exceptions in the area of cartographic theory development. Three general frontiers of development in data processing can be identified: data organization, data manipulation, and hardware. Data organization concerns the definition, structure, and abstraction of data, which can be made at various levels. Data manipulation involves algorithms and their implementation in software programming languages, whether this be a single program or a system of programs, not necessarily written in the same language. Hardware is a term used to describe any physical device for storing, manipulating, or displaying data. Although the mixture of these three frontiers plus theoretical developments in cartography has contributed to the evolution of data processing in CAC, the main thrust of this review concerns data organization and data manipulation.

Data organization is discussed first, focusing initially on levels of data organization, followed by a discussion of the philosophies behind data structuring formats. A review of spatial data structures precedes a discussion of information structures, their role in data models, and their relationship to data structures. Linguistic models are then considered in terms of data base design. Lastly, the section on data manipulation is presented, focusing on data management procedures for information systems.

P: Map ► CONT(Land, Water)
   Land ► OVER(Road, Terrain)
   Terrain ► Hills + Meadows + Points
   Points ► Points + Np; Np
   Np ► Carmel Point; Pinnacle Point; Point Lobos; South Point

where CONT signifies contiguity of elements and OVER signifies overlay of elements.

Figure 5. A Subset of the Production Rules for a Cartographic Grammar (from Taketa 1979, p. 142)

2.4.1 Levels of Structuring Data and Information

The topic of data representation has received considerable attention in two fields of research: data base management and programming languages. Although many believe that the fields are two separate areas for research, the topic of structuring data, or data representation, is common to both. Structuring data and data representation for data base management are usually discussed in a broader context than in programming languages. A particular organization of data in a data base management context is usually referred to as a data model, whereas in the programming language context one refers to data structures. Data models are therefore more general or abstract than data structures; however, every data model must ultimately be implemented using some data structure. Consequently, there is a tendency to identify data models with the data structures in which these models are logically operationalized.

This is not a difficulty as long as one recognizes that this is the case. Unfortunately, outside the realm of the computer science discipline there is a tendency to confuse the two and believe that they are the same. Wiederhold (1977) uses the term data model to refer to a single user's view of data and data base model to refer to a model of an entire data base. However, in Date (1977) and in accordance with ANSI/SPARC recommendations, the user view of the data is termed an external model.

Date (1977) discusses the difference between data models and conceptual models as introduced by ANSI/SPARC. He argues that existing approaches to the design of data bases consist of no more than aggregating all user views of data (hence, a data model), rather than attempting a general overview of the data needs of an organization (hence, a conceptual model). However, he does recognize, as does ANSI/SPARC, a growing need for the development of conceptual models, because these models will help formalize our understanding of data base design.
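The ANSI/SPARC distinction between user (external) views and a single conceptual description can be illustrated with SQLite, chosen here only because it ships with Python's standard library. The tables stand in for the conceptual level and the SQL views for two external models; SQLite's own file and B-tree layout plays the part of the internal level, which neither view exposes. The table names, view names, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: one description of the data, independent of any one user.
CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT NOT NULL, area REAL);
CREATE TABLE adjacent (a INTEGER REFERENCES region(id),
                       b INTEGER REFERENCES region(id));
INSERT INTO region VALUES (1, 'North', 10.0), (2, 'East', 20.0), (3, 'South', 30.0);
INSERT INTO adjacent VALUES (1, 2), (2, 3);

-- External level: two user views derived from the same conceptual schema.
CREATE VIEW region_sizes AS SELECT name, area FROM region;
CREATE VIEW neighbours AS
    SELECT r1.name AS place, r2.name AS neighbour
    FROM adjacent
    JOIN region AS r1 ON r1.id = adjacent.a
    JOIN region AS r2 ON r2.id = adjacent.b;
""")

sizes = conn.execute("SELECT * FROM region_sizes ORDER BY area").fetchall()
pairs = conn.execute("SELECT * FROM neighbours ORDER BY place").fetchall()
print(sizes)   # [('North', 10.0), ('East', 20.0), ('South', 30.0)]
print(pairs)   # [('East', 'South'), ('North', 'East')]
```

Designing only the two views and then aggregating them would correspond to the approach Date criticizes; here both views are derived from a shared conceptual schema instead.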

Data models can be referred to as generalized data structures.

Data models are more likely to take on characteristics of conceptual models if they are derived from the information structure and canonical structure levels of data organization (defined below). The term data base takes on many meanings, especially outside of the computer science literature. At one extreme the term indicates a large volume of data that does not necessarily have any logical structure. At the other extreme, the data may be a well-integrated set of files having a very sophisticated logical structure underlying the data. In the first case, the physical structure mirrors the logical structure; that is, they are the same. In the latter case, the logical structure is completely different from the physical structure. A data base having a simple logical structure can, at most, provide simple, straightforward data retrieval. A data base having a complex logical structure can be utilized for complex data retrieval or analysis, whereby data elements can be retrieved or analyzed based on their relationship to other data elements.

Many researchers have generally agreed that the problem of data base design can most effectively be discussed using different levels of abstraction (Senko et al. 1973, Senko 1976, Palmer 1974, Date 1977, Martin 1977, Tompa 1977, Bubenko et al. 1976, Wiederhold 1977). However, these levels are not generally agreed upon, mainly because they are discussed in different contexts, e.g. levels of data organization, levels of data description and transformation, levels of models, and levels of structure and representation. Senko et al. (1973) and Senko (1976) distinguish between an infological level and a datalogical level. The purpose of developing infological and datalogical levels of data base design is to distinguish between information and data. Information is an interpretation given to data when data are selectively retrieved and analyzed in a data base. The infological level concerns the logical structuring of information in the data base, whereas the datalogical level concerns the representation of data, i.e. the data structures that support the information structuring. Palmer (1974) discusses the transformation of one data representation into another level of abstraction. He points out that although one may approach data bases from different 'user views', these views may not be characteristic of different data representations. According to Date, a user's view of data is called an external model, the data base administrator's view of data is called the data model or conceptual model, and the computer's view of the data is termed the internal model. Martin (1977) distinguishes logical data organization from physical data organization. The logical level includes canonical structures and data models as general data structures, whereas the physical level includes low-level data structures and data representation at the device level. Tompa (1977) has identified five different levels of data structuring: data reality, data abstraction, information structure, storage structure, and machine encoding. Each of these levels must be considered in the process of solving a problem with a computer programming language. According to Bubenko et al. (1976), researchers in the field of data base design have identified three levels of structural design for data bases: information structure, data structure, and storage structure.

An information structure corresponds to the end-user level, where information is referred to in problem-oriented and implementation-independent terms. The data structure level corresponds roughly to a level where one has decided upon the data representation of information. For instance, one is concerned with a data structure when utilizing a particular data model that is based on a set of computer software in a particular data base management system. The storage structure level is concerned with how records, sets, etc. are implemented and how these are allocated to physical devices on computer hardware. The levels of data organization presented below are not described in terms of data base views or descriptions, because there is confusion in the literature as to the difference between the two terms and what they should include. However, the levels of organization as presented here are not totally divorced from views or descriptions; in fact, the confusion resides in the overlap or similarity between various organizational levels. Six levels of organization have been identified, each having a particular advantage for isolating it as a separate level.

The levels are:
1) Data Reality - The data existing as ideas about geographical entities and their relationships, which knowledgeable persons would communicate with each other using any medium for communication.
2) Information Structure - A formal model that specifies the information organization of a particular phenomenon. This structure acts as a skeleton for the canonical structure and includes entity sets plus the types of relationships which exist between those entity sets.
3) Canonical Structure - A model of data which represents the inherent structure of that data and hence is independent of individual applications of the data and also of the software or hardware mechanisms which are employed in representing and using the data.
4) Data Structure - A description elucidating the logical structure of data accessibility in the canonical structure. There are access paths which are dependent on explicit links, i.e. resolved through pointers, and others which are independent of links, i.e. resolved through other forms of reference. Those access paths dependent on links would be based on tree or plex structures, as in network models. Those access paths independent of links would be based on tables, as in relational models.
5) Storage Structure - An explicit statement of the nature of links expressed in terms of diagrams which represent cells, linked and contiguous lists, levels of storage medium, etc. It includes indexing, how stored fields are represented, and in what physical sequence the stored records are stored.
6) Machine Encoding - A machine representation of data, including the specification of addressing (absolute, relative, or symbolic), data compression, and machine code.

Data reality could be subcategorized to include two stages: raw data and sensed data. Raw data involve the portion of reality that is potentially useful to an organization. Sensed data are those data which are deemed useful and actually collected by an organization for storage. An information structure represents an abstract structuring of classes of data, in some organized fashion, by individuals who use these data. Thus, the relationships between information classes (called entity sets) become information to those using the data. A canonical structure is a minimal structure of data developed from all information structures. This minimal structure includes an entity only once, this entity being shared by perhaps multiple information structures.
The canonical structure is a design structure used to reduce the data redundancy in overlapping information structures. If there is a single information structure for an organization, the canonical structure is synonymous with the information structure.

A data structure is the logical organization of the canonical structure stated in terms of a given set of data management software. Consequently, the accessibility of the data items is specified as either access path dependent or access path independent. Network data bases employ plex structures, which are access path dependent because of the type of symbolic pointer addressing. Relational structures are, supposedly, access path independent because they employ functions which resolve the similarity or association between data entities. Data structures are often combined to produce more flexible ways of representing and processing data.

A storage structure is the actual manner in which any given data structure is implemented. This manner of implementation must take into consideration the physical size of records in a file and the number of files. Many different storage structures could be developed for the same data structure, and many data structures could potentially use the same type of storage structure.

The machine encoding of data is the most basic level. Different computers have different lengths of computer words, a basic unit of storage; e.g. IBM employs 32-bit words, CDC employs 60-bit words. The addressing of these words is installation specific; thus many low-level programs, e.g. in assembly language, are not portable from one type of computer to another. Each level of data organization is at a different level of abstraction. A data base designer's view of this topic is assumed; thus the greatest abstraction lies with data reality and the least abstraction with machine encoding. One should keep in mind that a non-programming user of a data base would probably view this continuum of abstraction in the opposite order, because such a person is concerned with data reality and information structure only.
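Two of the levels above can be sketched together. The first part merges two hypothetical information structures into a canonical structure in which each entity set and relationship appears only once; it assumes the views already agree on names, which real design work must first establish. The second part holds the same association in two data structures: one access path dependent (records linked to one another, with Python object references standing in for pointers, loosely analogous to a network set) and one access path independent (a table searched by value, loosely analogous to a relation). All names and data are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# --- Canonical structure: each entity set and relationship held once.
planning_view = {"entities": {"Parcel", "Zone"},
                 "relationships": {("Parcel", "lies_in", "Zone")}}
transport_view = {"entities": {"Road", "Parcel"},
                  "relationships": {("Road", "fronts", "Parcel")}}


def canonical(*views):
    """Union of the views' entity sets and relationships (names assumed to agree)."""
    entities, relationships = set(), set()
    for view in views:
        entities |= view["entities"]
        relationships |= view["relationships"]
    return {"entities": entities, "relationships": relationships}


merged = canonical(planning_view, transport_view)


# --- Access path dependent: each record links to the next parcel on a road.
@dataclass
class ParcelRecord:
    parcel_id: int
    next_on_road: Optional["ParcelRecord"] = None


p3 = ParcelRecord(3)
p2 = ParcelRecord(2, p3)
p1 = ParcelRecord(1, p2)
road_owner = {"High St": p1}      # the road record points at its first member


def parcels_by_links(road):
    ids, rec = [], road_owner[road]
    while rec is not None:        # navigate link by link
        ids.append(rec.parcel_id)
        rec = rec.next_on_road
    return ids


# --- Access path independent: the association held only as values in a table.
fronts_table = [("High St", 1), ("High St", 2), ("High St", 3), ("Main St", 4)]


def parcels_by_value(road):
    return [parcel for r, parcel in fronts_table if r == road]


print(sorted(merged["entities"]))     # ['Parcel', 'Road', 'Zone']
print(parcels_by_links("High St"))    # [1, 2, 3]
print(parcels_by_value("High St"))    # [1, 2, 3]
```

Both retrievals return the same parcels; what differs is whether the answer is reached by navigating stored links or by selecting on content. With a single view, `canonical` returns that view unchanged, matching the remark that a single information structure is its own canonical structure.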

Not all of the organizational levels have been utilized in cartography. In fact, until recently only some of them had been used in any integrated information system. In cartography, Simpson (1954) and Tobler (1959) were among the first to apply computer technology to the map-making process, whereas Bunge (1962), Schmidt-Falkenburg (1962), de Dainville (1964), Moles (1964), and Board (1967) were among the first to discuss a map in terms of a conceptual model. In computer-assisted cartography, as in computer science, "Historically, the formal study of data structures (as a formal investigation of a logical model) has been preceded by their use in computing environments" (Haseman and Whinston 1977, p. 56). In addition, the content and structure of information derived from data processing with data structures has tended to precede formal studies about the nature of information itself.

2.4.2 Data Structuring Formats and a Conception of Space

In the field of computer graphics in general, and recently in cartography, a recognizable discord is developing between groups of individuals who concern themselves with two fundamentally different data formats: vector and raster. The controversy over 'which is the best' is still raging. Raster might be faster, but it is less flexible in terms of modeling the information relationships with which we might be concerned. This controversy is fueled partly by basic differences in hardware processing. The discord underlying the basic dilemma is not new to cartographic data structuring, owing to the field's close ties with geographic epistemology.

The different data structure formats for vector and raster graphics currently in use in numerical cartography demonstrate different epistemological approaches to spatial conception (much of this discussion originates in Chrisman 1978a). The choice of a conceptual model for structuring data indirectly implies a spatial theory, which is in turn based on a spatial epistemology. Information structures, and their implementation as data structures, are the basic models of spatial knowledge in numerical cartography and determine our ability to examine geographic/cartographic questions in an electronic data-processing environment. The choice of basic units in the data structure format, and the way these units interact in terms of relationships, underlie the basic difference between raster and vector approaches. Any given choice ultimately operationalizes a certain conception of space. A discussion about basic units and their relationships is presented here as a philosophical background for a discussion about data structures.

Every data structure in numerical cartography is organized around a basic unit of spatial information. The choice of a unit, and hence of an organization, is of paramount importance to the structure and eventual meaning of a cartographic data structure. Two fundamentally different organizations exist. In one, space is viewed as an empty field filled in by building blocks, e.g. the pixels of a raster data structure. In the other, it is viewed as a field of discrete objects, e.g. the objects of a vector data structure. The conception of space as either space-filling or object-related has played a crucial role in the debate between absolute and relative theories of space: Aristotelian and Newtonian space are absolute, whereas Einsteinian space is relative.

The Aristotelian concept of absolute space treats space as a positional quality attached to each object. The Newtonian concept of space arose from a modification of Aristotle's, but Newton cast space in a more important light, as "a container of all material objects" (Einstein 1953, p. xiii). Newton's concept of space emphasized the continuity of Aristotle's space but placed a more fundamental truth value on space (Jammer 1956). It then became possible to conceive of space empty of objects: the void. Most cartographers favor an Aristotelian concept of space because of its place orientation. The Aristotelian view asserts that 'Nature' abhors a void; voids on maps are due mostly to poor design. Since the early 1960's a Newtonian concept of absolute space has attracted theoretical geographers who are interested in space-filling notions of abstract space, but it had not attracted cartographers until recently. Cartographers who are interested in satellite imagery, image processing techniques, and thus raster technology must deal with space-filling concepts because of the technology. Thus, a priori theories of space (those with space-filling notions) are of recent vintage. The concept of place is an absolutist notion, particularly when implemented in the form of a uniform grid system (Chrisman 1978a, p. 10). A grid structure is located with respect to a coordinate system and remains without reference to objects or spatial process. It is possible to conceive of an abstract space with an empty grid, void of objects. There is no natural reason to impose a particular grid over human activity and to treat the cells as distinctly derived. It is possible to distinguish structure and process well below the level of any practical grid system. Sampling theory as a basic study of grid accuracy (Tobler 1969) assumes that variation exists below the limits of grid resolution in an essentially continuous space.

In the Einsteinian notion of space, space is characterized as a quality derived from the relationships between objects. This relationship is relativistic in character. Only the most basic of data structures stop at the definition of basic units; data structures should include a way of relating basic units (Wirth 1975). Still other relationships can be derived indirectly by the examination of other information. The relationships that are incorporated into a data structure are important for its conceptual utility. An implicit relationship exists when the structure of the data, e.g. a grid structure as in a matrix or a pixel as in a raster data structure, contains the relationship as a matter of position due to storage. The neighborhood of a cell in the grid is implied by positional notation: (i+1, j+1), (i, j+1), (i-1, j+1), etc. This type of structure can do no more than relate one place to other places. Consequently, natural objects can only be retrieved by referring to places (i.e. grid cells), severely restricting access to relationships between objects, as well as between attributes and objects. In contrast, it is not difficult to assemble places from an object-oriented data structure (i.e. a polygonal structure): independent object encoding contains coordinates from which one can derive the discrete indices of a grid.

Concerning explicit relationships, the recent adoption of graph-theoretic notions for cartographic data structures has provided a conceptual framework which allows for the introduction of relativistic space in cartographic models. Whereas 'objects' and 'place' are ancient notions, 'topology' is a relatively recent product of mathematical abstraction which emphasizes the explicit 'relationship' between objects. Thus, a basic topological fact is a relationship between objects. An object in a topological data structure serves as a link to two other objects of either higher or lower dimensionality.

Consequently, there is no direct adjacency of objects at the same dimension, but an adjacency through an intermediary--two regions juxtaposed at a boundary line, two lines at a topological node.
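Adjacency through an intermediary can be sketched in a few lines. The map below is hypothetical: three regions A, B and C side by side, with "OUT" standing for the area outside the map. Each boundary line is stored once with its two end nodes and the regions on its left and right; regions are never linked to one another directly, so their adjacency is derived from the lines, and line adjacency is derived from shared nodes.

```python
from collections import defaultdict

OUTSIDE = "OUT"  # hypothetical id for the area beyond the map edge

# line_id: (from_node, to_node, left_region, right_region) -- hypothetical data
LINES = {
    "ab":    ("n2", "n1", "A", "B"),
    "bc":    ("n4", "n3", "B", "C"),
    "wA":    ("n1", "n2", "A", OUTSIDE),
    "top_B": ("n3", "n1", "B", OUTSIDE),
    "bot_B": ("n2", "n4", "B", OUTSIDE),
    "eC":    ("n4", "n3", "C", OUTSIDE),
}


def region_adjacency(lines):
    """Two regions are adjacent only through a boundary line that separates them."""
    adjacent = defaultdict(set)
    for _from, _to, left, right in lines.values():
        if left != right:
            adjacent[left].add(right)
            adjacent[right].add(left)
    return adjacent


def lines_at_node(lines, node):
    """Two lines are adjacent only through a topological node they share."""
    return sorted(lid for lid, (f, t, _l, _r) in lines.items() if node in (f, t))


adj = region_adjacency(LINES)
print(sorted(adj["B"]))            # ['A', 'C', 'OUT']
print("C" in adj["A"])             # False
print(lines_at_node(LINES, "n1"))  # ['ab', 'top_B', 'wA']
```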

Although topology may not seem important for map display, the complete nature of topological relationships is important for algorithmic, analytical, and information-theoretic reasons (Corbett 1975, White 1979).

2.4.3 A Review of Spatial Data Structures

Spatial data structures include data structures developed in the fields of computer assisted cartography (CAC) and picture processing. Picture processing is a generic term which includes computer graphics, visual pattern recognition and image processing. As Guptill (1978) points out, the field of CAC is not a proper subset of computer graphics but does have many things in common with it. CAC is characterized by a larger set of problems, especially in terms of data volume. Consequently, some approaches which are applicable in computer graphics (especially in data structures) will not suffice in cartography. The same argument may be applied to visual pattern recognition and image processing. However, as computer scientists become more interested in macro-geographic-scale problems in pattern recognition and image processing, we will see more contributions applicable to both CAC and picture processing. Reviews of cartographic data structures have appeared in Chrisman (1974) and Peucker and Chrisman (1975). The Laboratory for Computer Graphics and Spatial Analysis recently sponsored and published an eight-volume set of papers presented at a symposium on cartographic data structures (Dutton 1978). The eight volumes represent a survey of the field of cartographic data structures. The controversy between vector and grid/raster formats was clearly evident at the symposium.

Those who favored vector format argued that explicit storage of object-oriented relationships is logically essential because of the way in which humans conceptualize geographic entities and spatial relationships. Those who favored raster format argued that raster processing is faster, more flexible and more natural in the computer processing environment.

Surveys of data structures for computer graphics have been undertaken by Gray (1967) and Williams (1971). These surveys focus on data structures for graphic display. A Conference on Computer Graphics, Pattern Recognition and Data Structures included four sessions on data structures (IEEE Computer Society 1975). Many of the papers concerning data structures were extended and later included in a book by Klinger et al. (1977). A Workshop on Picture Data Description and Management (IEEE Computer Society 1977) included a number of papers on data structures and data base management procedures for pictorial data bases. Shapiro (1979) surveyed data structures currently in use in visual pattern recognition and image processing. She observed that data structures in these fields of research have many characteristics in common with data structures utilized in computer graphics. Her survey includes a four-part taxonomy: 1) linear list structures, 2) hierarchic or tree structures, 3) general graph or network structures, and 4) complex recursive structures. Interestingly, the taxonomy is presented in the basic chronological order in which the types of structures were developed. Furthermore, the chronological development and taxonomic subdivision can be ascribed to cartographic data structures as well, although this particular classification has not appeared in the cartographic literature to date.

All data structures are either vector-oriented, raster-oriented, or a combination of both. Consequently, this review of data structures is subdivided into three major sections in a similar fashion.
Classification by format is not the only way in which data structures can be classified, as Shapiro (1979) showed. In practice, data structures do not usually fit into mutually exclusive classes, because they are often combined to provide greater flexibility in data representation. In addition, lists, trees, and networks can be associated with vector formats as well as raster formats when one considers the logical organization of basic building blocks.

2.4.3.1 Vector Format Data Structures

In the 1950's and through the early 1960's, a data structure was a simple linear list in which the ordering of points on the cards represented its overall structure; the data structure organization mirrored the storage structure organization. This data structure has since become known as 'cartographic spaghetti' (Schmidt 1969, Chrisman 1974). For some time, cartographic models, written as ad hoc one-time programs, and their data structures were characterized by this type of organization. In fact, some of these models still exist, e.g. the World Data Bank I and II distributed by the CIA (Anderson, Angel and Gorny 1978). Little data manipulation other than the production of maps for display can be accomplished with this structure. Displaying any single entity, e.g. the boundary of a single country or group of countries, necessitates reorganization of the data. This stems from the fact that entities or cartographic objects are not conceptually defined except in terms of an entire list of points. There is one major advantage to this type of organization, however: the total amount of data stored on files is in its most compact form.

In the mid 1960's SYMAP, a system originally developed by H. Fisher, was redeveloped and released as one of the first products of the Laboratory for Computer Graphics and Spatial Analysis at Harvard University (Schmidt and Zafft 1975). With the introduction of SYMAP came the widespread application of a new level of sophistication for numerical cartographic models: an object-based, location list data structure for choropleth and isopleth mapping, and a grid-based data structure for isarithmic mapping. Although these types of cartographic products and the techniques used in their construction had been known for some time (Robinson 1952, 1960), the introduction of computer assistance (requiring a rigorous specification of the surface) contributed to an expanded awareness of the nature of surface representation.
For choropleth map models, objects (polygons) are conceptually defined as well as physically defined in terms of a list of (x,y) coordinate pairs representing the boundary of each polygon to be mapped. For isopleth map models, nearest neighbors of data points are located by a search of the data for nearest objects of a similar value, and isopleths are then determined from distance measurements associated with those values. For isarithmic mapping, irregularly spaced data are set to a regular grid (grid values being estimated from the six closest neighbors) to densify the surface with data points for easier, more efficient computation of the isarithms.
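Setting irregularly spaced data to a regular grid from the six closest neighbors can be sketched as follows. This is an illustration under stated assumptions, not SYMAP's exact procedure: the weighting scheme (inverse distance raised to the power 2), the function name `grid_from_points` and the sample values are all hypothetical, and at least one data point is assumed.

```python
import math


def grid_from_points(points, nx, ny, x0, y0, cell, k=6, power=2.0):
    """Estimate a regular grid from irregular (x, y, z) data points.

    Each grid node takes an inverse-distance-weighted mean of its k nearest
    data points (k=6 follows the text; the weighting power is an assumption).
    Returns ny rows of nx values; row r lies at y0 + r * cell.
    """
    grid = []
    for row in range(ny):
        gy = y0 + row * cell
        values = []
        for col in range(nx):
            gx = x0 + col * cell
            # Distances from this grid node to all data points, nearest first.
            nearest = sorted(
                (math.hypot(x - gx, y - gy), z) for x, y, z in points
            )[:k]
            if nearest[0][0] == 0.0:
                values.append(nearest[0][1])  # node coincides with a data point
                continue
            weights = [1.0 / d ** power for d, _ in nearest]
            total = sum(w * z for w, (_, z) in zip(weights, nearest))
            values.append(total / sum(weights))
        grid.append(values)
    return grid


samples = [(0, 0, 10.0), (4, 0, 20.0), (0, 4, 30.0), (4, 4, 40.0),
           (2, 1, 15.0), (1, 3, 25.0), (3, 3, 35.0)]  # hypothetical data
g = grid_from_points(samples, nx=3, ny=3, x0=0, y0=0, cell=2)
print(g[0][0])  # 10.0 (the grid node coincides with a sample)
```

Because each estimate is a weighted mean of nearby values, it always lies within the range of those values; the densified grid then serves as the input for threading isarithms.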

Because each polygon in an object-based data structure is defined by the encoding of distinct points in a location list, underlaps and overlaps of adjacent polygon boundaries, called slivers, tended to occur. This was especially a problem when further work with location list and grid-based models extended to applications producing line plotter output with higher resolution than the traditional line printer output common to SYMAP (Arms 1970, Merrill 1973, Holmes et al. 1974).

In addition to the problem of slivers, double storage of a single boundary line also tends to roughly double the size of the files, although this applies to external storage only (Peucker and Chrisman 1975). In the late 1960's the recognized ability to process array subscripts (because of increased core size in computers) brought about a refinement of the location list, later called the point dictionary (Chrisman 1974, Peucker and Chrisman 1975). Point dictionary structures are designed to improve location lists by establishing unique identifiers (subscripted arrays) for points in the entire file. This structure has two or more files: one for the list of all points, and one or more feature files containing pointers which uniquely identify a given coordinate pair. The result is a 'sliver free' boundary list for mapping systems (e.g. CALFORM, 1969, in Schmidt and Zafft (1975) and INTURMAP in Peucker (1974)). The disadvantage of a point dictionary lies in its core storage requirements, not in the size of the file on external storage. The main memory storage problem arises because entities are not independent and self-contained; thus the entire dictionary must reside in memory during processing. This problem has since been ameliorated by the introduction of direct-access disk storage, which allows data stored externally on disk to be accessed directly.

During the late 1960's and early 1970's the development of the Dual Independent Map Encoding (DIME) structure by the Census Use Study (Cooke and Maxfield 1967) signaled a major change in the concept of data structure. A DIME file does not merely store information about a single entity; it stores relationships between entities in terms of a topological network. The DIME structure was developed in response to the Address Coding Guide (ACG), a computer-based procedure for organizing mailing lists for self-enumeration in 1970. Since the connected graph representation of the ACG contained only streets and not areas, it was difficult to edit, verify and update. Consequently, the ACG file was enhanced into a DIME file by adding the block number on either side of a street segment, while retaining both the topological nodes defined by the endpoints of the segment and a range of street addresses for that segment. Since the basic unit of the data structure is a street segment, the file is very large and, consequently, almost unmanageable for large cities.

At about the same time as the development of the DIME method, a similar method was being utilized in the Canadian Geographic Information System (Tomlinson 1968, Switzer 1975) for representing the boundaries of polygons. However, in the data structure of the Canadian system polygons are the basic unit, and they are constructed by taking the first right or left turn upon leaving a boundary to find the next boundary to be plotted.
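The point dictionary described earlier in this section can be sketched with two hypothetical polygons that share one boundary edge. The point file holds every coordinate pair once; each feature file entry is a closed ring of indices into it. Because both polygons refer to the same points for their shared edge, the edge cannot be digitized twice with slightly different coordinates, which is how the structure avoids slivers. All names and data are illustrative.

```python
# Point file: one entry per unique coordinate pair (hypothetical data).
POINTS = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]

# Feature file: each polygon is a closed ring of indices into POINTS.
POLYGONS = {
    "west": [0, 1, 2, 3, 0],
    "east": [1, 4, 5, 2, 1],
}


def coordinates(name):
    """Resolve a polygon's index ring into coordinate pairs for plotting."""
    return [POINTS[i] for i in POLYGONS[name]]


def shared_boundary(a, b):
    """Edges (as unordered index pairs) that the two polygons have in common."""
    def edges(ring):
        return {frozenset(pair) for pair in zip(ring, ring[1:])}
    return edges(POLYGONS[a]) & edges(POLYGONS[b])


# A location list would store each ring's points separately (closing point excluded).
location_list_pairs = sum(len(ring) - 1 for ring in POLYGONS.values())
print(location_list_pairs, len(POINTS))                            # 8 6
print([sorted(e) for e in shared_boundary("west", "east")])        # [[1, 2]]
```

The two shared points are stored once rather than twice. The sketch also shows why the whole dictionary had to stay in memory: resolving either polygon requires random access into the complete point file.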

The topological principles applied in the DIME structure were used to develop a new structure, the POLYVRT chain structure (Corbett 1975, Peucker and Chrisman 1975). Whereas the DIME structure consists of distinct records for each line segment defined by two endpoints, the chain structure consists of records for each uncrossed boundary line, which may contain a number of segments. The endpoints of the uncrossed boundary are the topological nodes, while the intermediate points are for cartographic definition. The chain structure facilitates more accurate cartographic description because the large number of small segments needed to represent curved lines is not a burdening constraint. The chains also have right and left polygon neighbor identifiers to provide neighborhood definition. Using the topological nodes and the chains with right and left polygon neighbor identifiers, a complete topological model can be constructed which is cartographically accurate to the level of digitizing error and reasonably compact in its data storage.
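A chain record of this kind, and one use of its left and right polygon identifiers, can be sketched as follows. The `Chain` class, the three-polygon map and the closure test are hypothetical illustrations, not the POLYVRT file layout. The boundary of a polygon is gathered from the chains that name it on either side, each chain is oriented so that the polygon lies on its left, and a simple necessary check of topological completeness is applied: every node at which the boundary arrives must also be a node from which it departs.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chain:
    chain_id: str
    from_node: str
    to_node: str
    left: str
    right: str
    # Intermediate points give cartographic shape only; they carry no topology.
    shape: list = field(default_factory=list)


CHAINS = [  # hypothetical map: polygons A | B | C side by side, OUT = outside
    Chain("ab", "n2", "n1", "A", "B", [(2.0, 0.5), (2.1, 1.5)]),
    Chain("bc", "n4", "n3", "B", "C"),
    Chain("wA", "n1", "n2", "A", "OUT"),
    Chain("top_B", "n3", "n1", "B", "OUT"),
    Chain("bot_B", "n2", "n4", "B", "OUT"),
    Chain("eC", "n4", "n3", "C", "OUT"),
]


def boundary_of(polygon, chains):
    """(chain_id, start, end) for each chain bounding the polygon, oriented so
    the polygon lies on the left (chains with it on the right are reversed)."""
    oriented = []
    for c in chains:
        if c.left == polygon:
            oriented.append((c.chain_id, c.from_node, c.to_node))
        elif c.right == polygon:
            oriented.append((c.chain_id, c.to_node, c.from_node))
    return oriented


def is_closed(oriented):
    """Necessary condition for a closed boundary: each node is left as often as entered."""
    starts = Counter(start for _, start, _ in oriented)
    ends = Counter(end for _, _, end in oriented)
    return bool(oriented) and starts == ends


print(boundary_of("B", CHAINS))
print(is_closed(boundary_of("B", CHAINS)))  # True
```

Removing any one of B's chains (for example `top_B`) makes the check fail, which is the kind of gap a topologically complete structure lets a program detect automatically.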

Throughout the 1970's, other refinements of the data structures previously mentioned took the form of hierarchical structures offering different levels of resolution. These hierarchical structures are called trees. The basic units of a tree (leaves) are attached exclusively to branches, which meet at intermediate units, which in turn meet at other more basic intermediate units and are finally connected to the base of the tree, called the root node. Trees function as data compression and generalization structures because of unit aggregation. For object-based trees, polygons have been aggregated into large spatial units such as rectangular map sheets or sections (Cook and Johnson 1973, Tomlinson 1972, Feagas 1978) or irregular patches (Edson 1975, Peucker and Chrisman 1975, Basoglu and Morrison 1978, Baxter 1978, Edwards, Durfee and Coleman 1978).

The evolution of cartographic data structures has reached a point where considerable conceptual and formal discussion takes place regarding the advantages and disadvantages of various types of data structures and processing (Shamos and Bentley 1978). At the Harvard symposium on topological data structures it was generally agreed that topological completeness is both necessary and sufficient for a well-formed cartographic data structure (Chrisman 1978b). There are two general types of surfaces to be represented in cartographic data structures: those which assume the value of data to be constant within each zone of a set of zones, e.g. census tracts, and those which assume continuously varying data, e.g. topography. Little (1978) discusses the complexity involved in the interaction of the two spatial forms and outlines a technique for interfacing the two, employing a natural extension of topological relations in data structures.

The advantages and disadvantages of global versus local topological data structures were also discussed. Many cartographic data files contain large volumes of data. Many researchers contend that global data structures are impractical because large volumes of data must be held in main memory while topological relationships are checked. In contrast, local processing of data, concerned only with sections of the model at any given time, proves to be beneficial because of reduced processing time and storage needs. Unfortunately, a local network data structure is not amenable to complex analytical operations at a global level, which require global relationship specification. Thus, data structures based on local networks, which are advantageous for batch-oriented display, perform poorly in an interactive problem-solving environment. Global network data structures are better suited for data base operations, especially operations undertaken in an interactive problem-solving environment.
Brassel (1978) has proposed a global network data structure for multi-element map processing in which the 'node' is given the highest importance. These nodes (not necessarily topological) signify locations of phenomenologically important points, whether an isolated point, a line intersection, a polygon centroid and/or a topological node. In addition, nodes serve for the hierarchical identification of Thiessen polygons, for proximity relationships as well as neighborhood relationships.
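The proximity relationship carried by Thiessen polygons can be illustrated directly: a location lies in the Thiessen polygon of whichever node is nearest to it. The sketch below computes that membership on the fly with a brute-force nearest-node search; it is a swapped-in simplification for illustration, not Brassel's structure, which identifies the Thiessen polygons hierarchically. Node names and coordinates are hypothetical.

```python
import math

# Hypothetical nodes of different kinds, keyed by name: (x, y)
NODES = {
    "well": (1.0, 1.0),       # an isolated point
    "junction": (5.0, 1.0),   # a line intersection
    "centroid": (3.0, 4.0),   # a polygon centroid
}


def thiessen_owner(x, y, nodes=NODES):
    """Return the node whose Thiessen polygon contains (x, y), i.e. the
    nearest node (ties go to the first node listed)."""
    return min(nodes, key=lambda n: math.hypot(nodes[n][0] - x, nodes[n][1] - y))


print(thiessen_owner(1.5, 0.5))  # well
print(thiessen_owner(3.2, 3.5))  # centroid
```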

Brassel's data structure was never implemented beyond an early experimental stage. Similar data structures have been proposed by Cooke (1978) and Edson and Lee (1977).

Peucker (1978) classifies global data structures for topographic surfaces according to geometric and distributional characteristics. Types of data structures based on geometric elements are point-network structures, line structures and patch structures. The distributional characteristics concern irregular versus regular distributions of elements. The point-network structures involve (x,y,z) coordinate triples for data points, with topological relations identified implicitly in regular networks and explicitly in irregular networks. The line structures are also usually defined by points, but the order of points has a geometric as well as a topological significance. Patch structures are determined by mathematical, usually polynomial, functions which are valid for limited portions of the surface. Over the past few years Peucker (1978) has expended considerable effort and time investigating triangulation as a means of representing topographic surfaces. He concludes that a triangulated irregular network is the best representation of a surface because it best retains the 'natural' characteristics of the terrain; these characteristics are called critical points, e.g. peaks, pits and passes. Gold (1978) also reports a triangular element data structure as the best data structure for surface modeling because of its ability to retain all information in data points. Males (1978) discusses a similar approach to surface modeling, and reports that five years of experience employing irregular networks in terrain modeling have shown its practical utility. Thomas (1978) combines a data structure for three-dimensional volumes with a data structure for topographic surfaces to describe the built form of urban areas.
The data structure for volumes employs facets, edges and points, whereas the data structure for topographic surfaces employs an irregular network of points. He reports that although volume data structures can be used to represent topographic surfaces, their use in such a capacity is too complex to justify the effort if large volumes of explicit pointers must be maintained. As an alternative to explicit pointers, he suggests a graph-matrix formulation of implicit pointers to solve the topology problem.
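Retrieving an elevation from a triangulated irregular network, as discussed above for Peucker, Gold and Males, can be sketched with planar (barycentric) interpolation inside the triangle that contains the query location. The vertex list, triangle index list and function names are hypothetical, and the linear search over triangles stands in for the neighbor-based search a real TIN would use.

```python
# Hypothetical TIN: vertices are (x, y, z); triangles index into VERTICES.
VERTICES = [(0, 0, 100.0), (10, 0, 200.0), (0, 10, 300.0), (10, 10, 250.0)]
TRIANGLES = [(0, 1, 2), (1, 3, 2)]


def tin_elevation(x, y, triangle):
    """Planar interpolation of z inside one triangle of three (x, y, z) vertices.
    Returns None if (x, y) lies outside the triangle."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights of (x, y) with respect to the three vertices.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


def surface_elevation(x, y):
    """Find a triangle containing (x, y) and interpolate within it."""
    for tri in TRIANGLES:
        z = tin_elevation(x, y, [VERTICES[i] for i in tri])
        if z is not None:
            return z
    return None  # outside the triangulated area


print(surface_elevation(5, 0))  # 150.0, halfway between the 100 and 200 vertices
```

Because every data point is a triangle vertex, the surface reproduces the measured elevations exactly at critical points such as peaks and pits, which is the property the authors cited above emphasize.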

2.4.3.2 Raster Format Data Structures

The simplest type of raster-like data organization is a regular grid cell structure. Grid data structures have been utilized for a number of years because of the ease with which surface characteristics could be encoded manually and stored in a computer. The encoding of land use characteristics depicted on land-use zone maps for urban planning is an area where this approach has been practiced widely. Another form of raster data organization, and the one representing current technology, is scan line format. Each scan line, i.e. raster, consists of a row of cells, each cell being called a pixel, i.e. picture element. Grid data structures employ large grid cells, whereas raster data structures employ a fine mesh of pixels.

Raster-like approaches were used widely before specialized graphics equipment was introduced. The row, column encoding used for C-MAP base files is a form of raster-like encoding, a pixel being the size of a line printer character (Scripter 1969). The output from SYMAP takes the form of raster output; however, the input structure is vector-oriented. Raster scan technology has made the practice of manual encoding almost obsolete, but the underlying purpose for utilizing such an approach is still the same: it is easier to collect digital data in this format, e.g. satellite imagery. Since individual raster cells (pixels) are listed in sequence, all row and column locations are implicit and are determined by position in the list. In this format, spatial and topological relationships of spatial entities are retained as an integral part of the data format and do not need to be explicitly stored. Using a run-length encoding process, the data can be stored in a compact form.

Peuquet (1979) reports on the state of the art in raster data handling in geographic information systems. She concludes that the potential currently exists for implementing raster processing in geographic information systems using algorithms that have been developed for image processing purposes. Data structures for raster data sometimes take the form of hierarchies which are built from squares rather than scan lines (Durfee 1974, Tanimoto and Pavlidis 1975, Donelson 1978). These hierarchies involve a grid cell which is subdivided into four smaller cells. A successful merging of LANDSAT imagery, depicting land use or land cover, with topological network base files of census tracts has been reported by Bryant and Zobrist (1978). The topological network is converted to image form by computing the latitude/longitude datum value and storing it as if it were a gray-tone intensity value of the pixel in which it is located. The datum values become part of the image matrix which is used for display. This system is being developed at the Jet Propulsion Laboratory in Pasadena, California for the U.S. Census Bureau.
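Two properties of the scan line format described above, implicit location and compact run-length storage, can be sketched briefly. The land-cover codes and function names are hypothetical. Position in the pixel list alone gives a pixel's row and column, and the neighborhood of a pixel follows from that position, so neither needs to be stored; run-length encoding replaces each run of identical pixels with a single (value, length) pair.

```python
def run_length_encode(scan_line):
    """Collapse a scan line into (value, run length) pairs."""
    runs = []
    for value in scan_line:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def run_length_decode(runs):
    """Expand (value, run length) pairs back into the scan line."""
    scan_line = []
    for value, length in runs:
        scan_line.extend([value] * length)
    return scan_line


def row_col(index, ncols):
    """Location is implicit: pixel n of the list lies at row n // ncols, column n % ncols."""
    return divmod(index, ncols)


def neighbors(row, col, nrows, ncols):
    """The 8-neighborhood follows from position alone: (i+1, j+1), (i, j+1), ..."""
    return [(row + dr, col + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= row + dr < nrows and 0 <= col + dc < ncols]


SCAN_LINE = ["W", "W", "W", "F", "F", "U", "U", "U", "U"]  # hypothetical: water, forest, urban
print(run_length_encode(SCAN_LINE))  # [('W', 3), ('F', 2), ('U', 4)]
print(row_col(7, 3))                 # (2, 1)
```

Nine pixels shrink to three pairs here; the saving grows with the length of uniform runs, which is why run-length encoding suits thematic rasters such as land-use maps.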

2.4.3.3 Data Structures with Combined Formats

There are two types of combined approaches to structuring data with raster and vector formats. One combines grid and vector organization in a hierarchical structure, while the other involves a flexible structure which can represent either raster or vector data. The data structure of the Canadian Geographic Information System (Tomlinson 1968) is an early approach to combining the advantages of grid structures with those of network structures. A map sheet is divided into squares for easy reference, but the data structure for polygonal representation is a network. This approach to network paging was improved upon and implemented by Cook (1978) in an experimental system for representing land cover in Australia.

In another attempt at combining data structures similar to those mentioned above, Weber (1978) investigated three types of cartographic data structures: polygonal, intersection graph, and gridded. The complementary advantages and disadvantages of vector and raster orientation led him to propose a fourth data structure which is a hybrid of the three listed above. The hybrid structure takes the form of a map sheet covered by a network of nested squares of systematically shrinking sizes. The leaves of the hierarchic structure contain the pool addresses for points, lines and areas which lie in or pass through the respective leaf, i.e. grid square. The fixed size of the lowest level of the hierarchy should be larger than a pixel or grid square; otherwise nothing is gained with this type of representation.
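A hierarchy of nested squares whose leaves record the features passing through them can be sketched as a simple quadtree. This is an illustration in the spirit of the hybrid structure just described, not Weber's implementation: features are approximated by hypothetical bounding boxes, a square is split into four only where some feature touches it, and splitting stops at a fixed smallest size (`min_size`).

```python
from dataclasses import dataclass, field


@dataclass
class Square:
    x: float          # lower-left corner
    y: float
    size: float
    children: list = field(default_factory=list)  # four sub-squares, or [] for a leaf
    features: list = field(default_factory=list)  # feature ids held by a leaf


def overlaps(sq, box):
    """True if the bounding box (xmin, ymin, xmax, ymax) touches the square."""
    xmin, ymin, xmax, ymax = box
    return (xmin <= sq.x + sq.size and xmax >= sq.x
            and ymin <= sq.y + sq.size and ymax >= sq.y)


def register_feature(sq, fid, box, min_size):
    """Record a feature id in every smallest-size square its box touches,
    subdividing squares into four only where a feature needs them."""
    if not overlaps(sq, box):
        return
    if sq.size <= min_size:
        sq.features.append(fid)
        return
    if not sq.children:
        half = sq.size / 2
        sq.children = [Square(sq.x + dx, sq.y + dy, half)
                       for dx in (0, half) for dy in (0, half)]
    for child in sq.children:
        register_feature(child, fid, box, min_size)


def features_at(sq, x, y):
    """Descend to the leaf containing (x, y) and return its feature ids."""
    while sq.children:
        sq = next((c for c in sq.children
                   if c.x <= x < c.x + c.size and c.y <= y < c.y + c.size), None)
        if sq is None:
            return []
    return list(sq.features)


sheet = Square(0.0, 0.0, 8.0)  # a hypothetical map sheet
register_feature(sheet, "road", (0.0, 1.0, 8.0, 1.0), min_size=2.0)
register_feature(sheet, "lake", (5.0, 5.0, 7.0, 7.0), min_size=2.0)
print(features_at(sheet, 6.0, 6.0))  # ['lake']
print(features_at(sheet, 3.0, 1.5))  # ['road']
```

Empty parts of the sheet stay as large undivided squares, so a location query touches only a few levels; the choice of `min_size` is exactly the trade-off between leaf size and pixel size that the text raises.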

The optimal smallest size for a grid cell depends on the density of the map and is reported as being a 'couple of millimeters' for large-scale topographic maps.

A recursive data structure introduced by Haralick and Shapiro (1978) and reformulated in Shapiro and Haralick (1979) is a general structure which can be expressed in either a vector or a raster format. A spatial data structure D is a set D = (R1, ..., Rk), where R1, ..., Rk are mathematical relations (Codd 1970). A relation is a set of domains, thus Ri = (S1, ..., Sn). Each domain consists of a set of values, each value representing an attribute of a tuple. Each tuple is similar to a record in a file, the tuple being a set of attributes associated with an entity. Since the structure D is recursive, it can be defined in terms of itself: a value for an attribute in R may be another D. Thus, a map and all objects which comprise a map can be represented by structures such as D. The same structure can be used in raster format, where D is a map or map area and the set of relations R1, ..., Rk represents the list of pixels in the map area.

Many approaches to the utilization of both vector and raster techniques have concerned vector-to-raster or raster-to-vector conversion. These mainly involve input and output procedures. Input procedures are those where digital data are captured in raster format, e.g. by scanning digitizers, but transformed to vector format for processing. Output procedures are those which involve the conversion of vectors to rasters. For example, Versatec raster plotter software processes data with vector format algorithms; an intermediate step in plotting involves the sorting of vectors, which are eventually output in a raster format by the plotter hardware.
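The recursive structure D = (R1, ..., Rk) described above can be sketched with a single Python class. The class name, the representation of each relation as a named list of tuples, and the sample data are assumptions made for illustration, not the notation of Haralick and Shapiro. The recursion comes from allowing an attribute value inside a tuple to be another structure of the same kind; the same class then serves for a vector map (regions whose boundaries are nested structures) and for a raster map (a relation listing pixels).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SpatialStructure:
    """D = (R1, ..., Rk): named relations, each a list of tuples.
    An attribute value inside a tuple may itself be a SpatialStructure."""
    relations: dict[str, list[tuple[Any, ...]]] = field(default_factory=dict)


def depth(d):
    """Nesting depth: 1 for a structure whose values contain no structures."""
    inner = [depth(v) for rows in d.relations.values() for row in rows
             for v in row if isinstance(v, SpatialStructure)]
    return 1 + max(inner, default=0)


# Vector use (hypothetical data): a map whose region tuples contain a structure
# describing each region's boundary points.
region = SpatialStructure({"boundary": [(0, 0), (4, 0), (4, 3), (0, 3)]})
vector_map = SpatialStructure({"regions": [("R1", "forest", region)]})

# Raster use: the same kind of structure, with a relation listing pixels.
raster_map = SpatialStructure({"pixels": [(0, 0, "forest"), (0, 1, "water")]})

print(depth(vector_map))  # 2
print(depth(raster_map))  # 1
```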

2.4.4 Data Models and Spatial Data Bases

As mentioned previously, data models can be considered abstractions and generalizations of data structures. In this section we define and discuss four data models that have been applied to spatial data bases, and two that offer interesting constructs but have not been applied in a cartographic context. The four models that have been applied in a cartographic context (hierarchical, network, relational and hypergraph) differ mainly in terms of their basic unit of construction, data redundancy, and method of association among objects and attributes. All models emphasize the data structure level of data organization; however, only the hypergraph-based approach alludes to the integration of an information structure level of development with the data structure level.

The basic unit of a hierarchical data model consists of record types which contain attributes. Each record type may be a superior or dependent record type. These are organized in a tree data structure.

The initial record type is called the root record type and may have any number of dependents. Each superior, whether it is the root or lower in the hierarchy, is connected to its dependents by a link which is either implicit or explicit depending on the storage structure. Since dependent records cannot be shared among superiors in a tree, the model forces considerable data redundancy if a given dependent is associated with more than one superior record.
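The redundancy forced by a tree can be sketched in a few lines. The counties, water features and function names are hypothetical: because a dependent record cannot be shared, a river crossing two counties is stored once under each, and any change to it must be found and repeated in every copy.

```python
def build_tree(links):
    """links: (superior, dependent) pairs. A tree cannot share a dependent,
    so every link receives its own copy of the dependent record."""
    tree = {}
    for superior, dependent in links:
        tree.setdefault(superior, []).append({"name": dependent})
    return tree


def rename(tree, old, new):
    """Every copy must be found and changed; returns how many were changed."""
    changed = 0
    for dependents in tree.values():
        for record in dependents:
            if record["name"] == old:
                record["name"] = new
                changed += 1
    return changed


links = [("Adams", "Salt Creek"), ("Brown", "Salt Creek"), ("Brown", "Ute Lake")]
tree = build_tree(links)
stored = sum(len(deps) for deps in tree.values())
distinct = len({dependent for _, dependent in links})
print(stored, distinct)  # 3 2
changed = rename(tree, "Salt Creek", "Clear Creek")
print(changed)           # 2
```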

The U.S. Geological Survey uses a data base management system with a hierarchical data model for storage and retrieval of data from cartographic files (Elassal 1978). The single factor that most influenced the implementation of such a system was data volume. The data file attributes are classified into a three-level hierarchy. The first level of the hierarchy is concerned with the structure of a data file. The second level describes the characteristics of the tape on which the data file resides. The third level concerns the information content of the data file, such as the geographic name of the file, the types of information on the file (e.g. hydrography), the reference system and the accuracy. There are seventy-two attributes, which are kept off-line on tapes, and data files can be retrieved by specifying the proper attributes. Since the data base involves only retrieval, insert and update operations, there is no need for a more flexible model. As of this time there are no reports in the literature of any applications of the hierarchical approach to cartographic/geographic information systems that have a high degree of on-line interactivity. Date (1977, p. 57) argues that the hierarchical approach is not flexible enough to support complex data base situations and that the user is forced to devote time and effort to solving problems that are introduced by the model rather than those intrinsic to the questions being asked.

The fundamental building block of the network data model is the record type (CODASYL Committee 1971). A record type consists of one or more data items. Data items name actual data in a record occurrence of a given record type. A one-to-many relationship defined between two record types is called a set. One of the record types is declared the owner of the set and the other the member. Each record occurrence of the owner record type 'owns' zero, one or more record occurrences of the member record type.
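A minimal sketch of a one-to-many set follows. The class `SetType`, the set name and the record keys are hypothetical, and the set is held in memory as two dictionaries rather than as the pointer chains a CODASYL implementation would use; the point is only the owner/member rule, under which each member occurrence has at most one owner within a given set type.

```python
from collections import defaultdict


class SetType:
    """A one-to-many set between an owner record type and a member record type.
    Each owner occurrence owns zero or more member occurrences; each member
    occurrence has at most one owner within this set type."""

    def __init__(self, name: str, owner_type: str, member_type: str):
        self.name = name
        self.owner_type = owner_type
        self.member_type = member_type
        self._members = defaultdict(list)  # owner key -> member keys, in order
        self._owner_of = {}                # member key -> owner key

    def connect(self, owner: str, member: str) -> None:
        if member in self._owner_of:
            raise ValueError(f"{member} already has an owner in set {self.name}")
        self._owner_of[member] = owner
        self._members[owner].append(member)

    def members(self, owner: str) -> list:
        # .get does not insert a key, so unknown owners simply own nothing
        return list(self._members.get(owner, []))

    def owner(self, member: str):
        return self._owner_of.get(member)


lease_wells = SetType("LEASE-WELLS", owner_type="Lease", member_type="Well")
lease_wells.connect("L1", "W1")
lease_wells.connect("L1", "W2")

print(lease_wells.members("L1"))  # ['W1', 'W2']
print(lease_wells.members("L2"))  # []
print(lease_wells.owner("W2"))    # L1
```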
A schema in the network model is a description of a data base in terms of a data description language. The schema consists of descriptions of record types, set relationships and data items. A graphical depiction of a schema is a general characterization of the associations among record types in the schema, and this depiction is often confused with the schema itself. The graph-like depiction of a schema is similar to an information structure in specifying logical associations among record types; however, it is more closely allied with data structure diagrams (Bachman 1969), which are utilized to display access-path dependent data structures, i.e. data structures which rely on explicit links between data elements. Consequently, there is some question about the capability of the network model to achieve data independence, particularly at the information structure level, which is supposed to be free from the shackles of access path dependency (Chen 1976).

Phillips (1977) utilized a network approach to model entities and their associations in an oil lease data base. In the schema diagram displayed in Phillips (1977) it is difficult to distinguish between spatial relationships and entities, since both are depicted as record types. Also, one cannot distinguish between logical relationships, which are part of cartographic syntax, and spatial or phenomenological relationships, which are semantic, since both are depicted by singly-linked set relationships. The access path dependency in the logical schema is effectively camouflaged from the user, but only at the cost of internally defining every link among record types in a record type adjacency matrix (or, more accurately, a dependency matrix). Such a practice remains questionable if a large data base is to be developed, especially if such a data base passes through many phases of restructuring.

The fundamental building block of the relational model is a relation (Codd 1970). A relation is a mathematical relation consisting of an unordered (possibly ordered in a cartographic application) set of domains. A domain is a pool of values, each value being an instance of an attribute. Attributes are names given to the domains. Relations are expressed in a tabular form, a very simple form of representation.

Each relation (table) consists of tuples (the rows of the table) and attributes (the columns of the table). Each tuple (similar to a record occurrence in a file organization) describes an entity (object). Because the fundamental building block of the relational model is the relation, consisting of named domains, the model achieves a high degree of data independence. Consequently, in tabular form it is an access path independent data structure representation. Since a relation is the representation for both data and information, there is no representation analogous to a schema diagram in the network model. That is, a representation which provides a general depiction of entities and their relationships does not exist in the relational model.

It has been suggested that the simple relational form of representation may lose some important semantic information about the world (Hainaut and Lecharlier 1974), because that information must be constructed by the user during each use of the data base. Although that criticism might be true to some degree, it does not hold in general. Relations have been used in relational hierarchies to represent structural segmentations of images, scenes and pictures (Kunii et al. 1974, McKeown and Reddy 1977, Chang et al. 1977). Relations have also been used to represent semantic relationships in the form of data base abstractions called aggregation and generalization (Smith and Smith 1977):

Aggregation refers to an abstraction in which a relationship between objects is regarded as a higher level object . . . A generalization is an abstraction which enables a class of individual objects to be thought of generically as a single named object. (Smith and Smith 1977, pp. 106-7)

Smith and Smith (1977) have constructed schematic diagrams to depict an aggregation and generalization of objects. However, these diagrams do not represent the complete array of relationships which could potentially be included in data bases for geo-cartographic querying.

Carlson et al. (1974), Williams (1974) and Go, Stonebraker and Williams (1975) report implementations of a relational model for geographic/cartographic data bases. A relation defines a single cartographic object such as a polygon, a group of objects such as the geographic domain of the map (Williams 1974), or the map in its entirety (Go, Stonebraker and Williams 1975). In a relation that defines a polygon, the domains and tuples can be ordered to achieve a savings in storage space; otherwise each coordinate pair would have to be duplicated in the relation.
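One way to read the storage argument: in an ordered relation, a tuple's position already says which vertex follows it, so each coordinate pair is stored once; in an unordered relation, adjacency has to be carried inside each tuple, for example as edge tuples (x1, y1, x2, y2), so every vertex is stored twice. The Python sketch below illustrates that doubling under this assumption; the function names are hypothetical and it is not the schema used in the cited systems.

```python
def ordered_to_edge_relation(vertices):
    """Turn an ordered vertex list (where order itself encodes adjacency) into an
    unordered set of edge tuples (x1, y1, x2, y2) that state adjacency explicitly.
    The polygon is closed by joining the last vertex back to the first."""
    n = len(vertices)
    return {(*vertices[k], *vertices[(k + 1) % n]) for k in range(n)}


def coordinate_occurrences(relation):
    """Count how many times each coordinate pair is stored in an edge relation."""
    counts = {}
    for x1, y1, x2, y2 in relation:
        for point in ((x1, y1), (x2, y2)):
            counts[point] = counts.get(point, 0) + 1
    return counts


square = [(0, 0), (4, 0), (4, 3), (0, 3)]   # ordered relation: 4 pairs stored
edges = ordered_to_edge_relation(square)    # unordered relation: 4 tuples, 8 pairs

print(len(edges))                             # 4
print(coordinate_occurrences(edges)[(4, 0)])  # 2
```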

In a relation defining a geographic area, domains and tuples can be unordered. There are two problems with this approach. First, inserting or deleting tuples in an unordered relation is cumbersome because it requires a sequential search. The second problem concerns the data redundancy of coordinates in relations defining a polygon. Duplication or redundancy in this type of data can lead to a significant waste of storage space. This latter problem is the reason why cartographic data structures currently employ topological, segment-definition files rather than polygon point-definition files.

Like the tree, network and relational models, the hypergraph-based data structure (HBDS) model is a general structure. However, these latter models can be considered subsets of the HBDS (F. Bouille, personal communication 1980). The HBDS model is based on the theory of hypergraphs (Berge 1976), an extension of graph theory which conveniently includes the theory of sets for a very flexible representation of a phenomenon. As Bouille states:

According to the set theory a set is composed of elements which have properties and may present relations or not. Though a property is nothing but a particular relation, we keep this old distinction. Using the abstract data type concept (Liskov and Zilles, 1974) we consider four abstract data types respectively named: class, object, attribute and relation. They must be associated with distinctive graphical concepts. Graph theory is generally used, but cannot correctly represent the difference between a set and its elements. We include here the hypergraph concept as the main component of the (data) structure. The skeleton of structure is a partial subgraph of the data structure composed of the arborescence Ac (classes in the hypergraph), the edges of the hypergraph (hierarchical links between classes) H, and the multigraph (links between classes carrying relations). (Bouille 1978, p. 1)

Thus, the HBDS is based on two fundamental concepts: hypergraphs and multigraphs. Hypergraphs are similar to graphs in the sense that both are formed by nodes and edges; however, the hypergraph concept extends the definition of edges to include a bounding edge around a collection of elements. The collection of elements forms a set of objects identified by a principal node that acts as an abstract representation of the set. These principal nodes form the arborescence of classes; the links which represent spatial or graphical relationships form the multigraph.

The two models that have not yet been applied in a cartographic context, entity set and entity relationship, offer some interesting constructs for the design process. The entity set model (Senko et al. 1973, Senko 1976), which is part of the Data Independent Accessing Method (DIAM, now called DIAM II), was formulated in an attempt to operationalize a notion of information structuring.
Information structuring is a method used for representing information about entities as conceived in 'data reality'. This information structuring approach uses non-redundant binary associations to model information in given contextual situations. DIAM was originally developed by researchers at IBM as a set of models against which data base ideas could be tested.

To date, no cartographic data base applications have reported an implementation of DIAM or the entity set model, and their potential use has not been discussed in the cartographic literature. The model makes a valuable contribution to data base design when that design is chiefly concerned with the nature of relationships between entities.

The entity relationship model discussed by Chen (1976) was developed in an attempt to unify the network, relational and entity set models. Having borrowed the most beneficial concepts from all three models, the entity relationship model can be viewed as a generalization or extension of those models. In the entity relationship model, entities are classified into different entity sets. An entity may belong to one or more entity sets depending on its context. Entity sets are represented as relations, as in the relational model of Codd (1970). Relationships can be classified into different relationship sets. A relationship set is a relation defined on members of an entity set or sets. The 'role' of an entity in a relationship is the function that the entity performs in that relationship.

The information about an entity or a relationship is expressed by a set of attribute/value pairs. An attribute is a function which maps from an entity set or relationship set into a value set or a Cartesian product of value sets. A value set is the total set of values over a given domain (domains in relational terminology); each value is the actual datum representing an attribute. Attributes and value sets are different concepts, although they are often treated as the same in some models. This distinction is not made in the network model and is seldom made clear in the relational model. Since an attribute is a function, it maps a given entity to a single value, or to a single tuple of values in the case of a Cartesian product.

The entity set model introduces the notion of information structuring at an infological level of data base design. Since this level is close to data reality, it makes it easy to communicate the design of a data base. The entity relationship model utilizes an information structure level. Such a level makes it easy to design data bases using a top-down approach, because one can talk about data reality before one considers data representation. In fact, Chen (1976) discusses the entity relationship model in terms of translations to other data models at lower levels of data representation.
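To make the vocabulary concrete, the sketch below models entities that may belong to several entity sets, a relationship set whose positions are named by roles, and attributes as functions (plain dictionaries) from entities into value sets, one of which maps into a Cartesian product. The class names, the `FLOWS-THROUGH` relationship set and the entity keys are hypothetical; this is an illustration of the concepts, not Chen's notation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    key: str
    entity_sets: frozenset  # an entity may belong to more than one entity set


@dataclass
class RelationshipSet:
    """A relation defined on entity sets; each position is named by a role."""
    name: str
    roles: dict                                # role name -> required entity set
    members: set = field(default_factory=set)

    def add(self, **by_role: Entity) -> None:
        if set(by_role) != set(self.roles):
            raise ValueError("every role must be filled exactly once")
        for role, entity_set in self.roles.items():
            if entity_set not in by_role[role].entity_sets:
                raise ValueError(f"{by_role[role].key} cannot play role {role!r}")
        # Store (role, key) pairs so that the order of the roles does not matter.
        self.members.add(frozenset((r, e.key) for r, e in by_role.items()))


# Attributes as functions (dicts) from an entity set into value sets: each entity
# maps to a single value; centroid_of maps into the product of two value sets (x, y).
name_of = {}
centroid_of = {}

r1 = Entity("R1", frozenset({"River"}))
c1 = Entity("C1", frozenset({"County", "TaxDistrict"}))

flows_through = RelationshipSet(
    "FLOWS-THROUGH", {"river": "River", "county": "County"}
)
flows_through.add(river=r1, county=c1)
name_of[c1] = "County C1"
centroid_of[c1] = (1.0, 2.0)  # hypothetical coordinates

print(len(flows_through.members))  # 1
```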

2.4.5 A Discussion of Data Structures and Data Models

The preceding review implies that bigger and more complex data structures are better. However, any realistic discussion and evaluation should be couched in terms of tradeoffs (costs and benefits): what does one get for what one must pay? In recent years there has been considerable discussion about the design of different data bases in terms of their complexity, i.e. whether complex data base models are more advantageous than simple file structures (Chrisman 1978b). A final answer can only be given in terms of purpose. For the less sophisticated data structures, e.g. a string of points, a location list, a point dictionary and even simple chain structures, as long as line drawings are the primary purpose for their construction, those structures are not easily challenged. Thus, if one puts little effort into the structure design, one cannot expect much in return. As for the more sophisticated structures, they should be developed only if one is going to make considerable use of data bases. If one is not going to undertake query processing, then a data base model is not necessary; in fact, it would be wasteful. Substantial software and hardware interfaces are needed for data base models. The tradeoffs are numerous and can only be resolved through proper design of the system, which includes the design of the data base model and the associated data structure.

The choice of elements in a data structure often influences a computer algorithm. For example, the USGS Geographic Information Retrieval and Analysis System (GIRAS) network data structure (Guptill 1978) utilizes arc files rather than polygon files to convert from a polygon to a grid structure, thus providing order-of-magnitude improvements in computation time over conventional polygon-to-grid methods for the polygon-to-grid overlay problem. As for other data structures, the spatial regularity of pixels and grid squares makes it easy to resolve spatial boundaries and thus facilitates processing during an overlay of structures. However, grid structures are prone to lower levels of resolution than vector structures. Raster format data can be displayed faster than vector format data, but developments in the algorithmic manipulation of raster data for analytical purposes lag behind those for vector data. Optimal techniques for processing rasters by different types of computers, e.g. parallel or array processors, have existed for ten years. However, these processors are still new technologies in the sense that they cannot handle large volumes of data, and therefore they have not been introduced on the commercial computer market in large numbers. They are still somewhat experimental. The processing undertaken in numerical cartography is still very much vector-oriented, partly due to the notion that human perception is object-oriented rather than raster-oriented. Since the focus of the research reported here is on cartographic information modeling and data base design, the data models offer advantages in logical organization that simple data structures do not.
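The overlay advantage of grids comes from cell registration: when two grids share origin, cell size and extent, overlaying them is a cell-by-cell combination with no boundary intersection to compute. A minimal Python sketch, assuming two co-registered grids stored as lists of rows; the layers and the combining rule are hypothetical.

```python
def overlay(grid_a, grid_b, combine):
    """Overlay two co-registered grids (same rows, columns, origin and cell
    size) by applying `combine` to each pair of corresponding cells."""
    if len(grid_a) != len(grid_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(grid_a, grid_b)
    ):
        raise ValueError("grids must have identical dimensions")
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(grid_a, grid_b)
    ]


# Hypothetical layers: 1 = forest in land_use, 1 = wet soil in wet_soil.
land_use = [[1, 1, 0],
            [0, 1, 0]]
wet_soil = [[1, 0, 0],
            [1, 1, 0]]

wet_forest = overlay(land_use, wet_soil, lambda a, b: int(a == 1 and b == 1))
print(wet_forest)  # [[1, 0, 0], [0, 1, 0]]
```

A vector overlay of the same two layers would instead require computing intersections of polygon boundaries, which is the processing cost the grid structure avoids, at the price of the coarser resolution noted above.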

Furthermore, even among the data models there are those which are, perhaps, better suited to modeling information, e.g. the HBDS model and the entity relationship model. The concepts and structures underlying those two are the most flexible. Unfortunately, implementing any of the data models in the manner discussed requires an enormous overhead in computer software, on the order of fifteen man-years of programming effort. That is a critical drawback if the software is not readily available at a low cost.

2.4.6 Linguistic Models and Data Base Design

Researchers in picture processing have been attempting to utilize linguistic notions for specifying the relationships between objects in pictures and translating these specifications into data structures (Pfaltz and Rosenfeld 1969, Youngman 1978). In addition, there has been an attempt at utilizing linguistic models in a data base context. According to Bonczek (1976) and Bonczek and Whinston (1976), a linguistic model can characterize two major properties of a data base: (a) the structure of the data base and (b) the storage and retrieval capabilities of the data base. They emphasize that a grammar derived for a data base does not necessarily describe the data base in a unique manner; what is important is that it is a formal characterization.

A linguistic model for a hierarchical data base is a context-free grammar. Consequently, there is a similarity between a hierarchical data structure diagram and a phrase structure diagram: both are tree diagrams. The grammar for describing a hierarchical data base is a triple $G = (V, I, R)$, where $V$ is a set of vocabulary elements, $I = 1$ is an initial symbol, and $R$ is a set of production rules. The set $V$ is composed of two non-empty, finite, disjoint subsets, $V_N$ and $V_T$. The set $V_N = \{1, 2, 3, 4, 5, 6\}$ is the set of nonterminal labels denoting record types, and $V_T$ is the set of terminal elements represented by data. The symbol $I$ initiates a set of production rules which generate the hierarchical data base structure (see Figure 6) and the capabilities of storage and retrieval from this structure.

A grammar for directly modeling a network data base is context sensitive, because two different record types may own the same member record. This situation produces a particular context of ownership; hence a production of record types occurs only in this context. The formulation of such a context-sensitive grammar would require a different rule for every set relationship. However, an alternative way of viewing a network data base is as the intersection of one or more hierarchical data bases. This viewpoint has a historical basis: since sequential file processing is a form of hierarchical storage processing, a collection of interrelated sequential files constitutes a network data base. Therefore, given a data base, it is sufficient to find a group of hierarchical structures of which the network data base is the intersection (see Figure 7). If a hierarchical data base is described by a context-free grammar, then an interrelationship of hierarchies (which is a network) can be described by an interrelationship of context-free grammars. Thus, a transformation is required to represent the network structure as a hierarchical structure. This transformation is simply the computation of all maximal hierarchical paths through the network, as is performed in Figure 8. It should be emphasized that the hierarchical form produced in Figure 8 is obtained through a process of path determination, and not by storing the data in this manner.

Figure 6. Production Rules and a Hierarchical Data Base Structure

Figure 7. Network Data Base Structure

Figure 8. Multiple Hierarchical Data Base Structures Transformed from the Network Data Base Structure of Figure 7

Linguistic concepts have also been utilized in a data base context for discussing relations among information objects and their constituent parts for information structuring (Bubenko et al. 1976). These relations are called constituent relations and describe the manner in which an information object can be subdivided into its constituent components, in the same way that a sentence is broken into its constituent parts (Section 2.1). An information object class is an abstract name that represents a similar class of objects; it is similar to the abstract data type 'class' mentioned by Bouille. According to Bubenko et al. (1976), an information object class $B_i$ is a constituent of an information object class $B_j$ iff $B_j \subset B_i$, and this is written $B_i \,(<)\, B_j$. Thus, a constituent is the information associated with an object class $B_i$ which is a 'finer specification' than $B_j$, i.e. $B_i$ is a set of objects which are needed to produce the more general object $B_j$. This notion of constituency is a linguistic notion, and rules are utilized to formalize the production of constituent objects from other information objects.
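The path-determination step can be sketched as a depth-first enumeration of every maximal owner-to-member path. The record-type names and the example schema below are hypothetical (they are not the schema of Figure 7), and the sketch assumes the set relationships contain no cycles.

```python
def maximal_paths(owns):
    """Enumerate every maximal owner-to-member path through a network schema
    given as {owner record type: [member record types]}.
    Assumes the set relationships form a directed acyclic graph."""
    members = {m for ms in owns.values() for m in ms}
    nodes = set(owns) | members
    roots = sorted(n for n in nodes if n not in members)  # owned by nothing

    paths = []

    def walk(node, path):
        children = owns.get(node, [])
        if not children:          # a leaf: the path cannot be extended further
            paths.append(path)
        for child in children:
            walk(child, path + [child])

    for root in roots:
        walk(root, [root])
    return paths


# C is owned by both A and B, and E by both C and D: a network, not a tree.
owns = {"A": ["C"], "B": ["C", "D"], "C": ["E"], "D": ["E", "F"]}

for path in maximal_paths(owns):
    print(" -> ".join(path))
# A -> C -> E
# B -> C -> E
# B -> D -> E
# B -> D -> F
```

Record types C and E reappear in more than one of the derived hierarchies; that repetition exists only in the enumerated paths, not in the stored data.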

2.4.7 Information Systems and Data Management

Senko et al. (1973) discuss the evolution of business-oriented integrated data base information systems as a changing conceptual structure facilitated by developments in software and hardware technology. Processing has evolved from card storage to tape storage and presently to direct access disk storage. Information systems can be characterized by two extreme types of processing, and thus two general categories of information systems arise. One type is an operational system, which processes a question and retrieves a single entity on the basis of an exact match. The number of possible types of transactions is rather limited in operational systems. This allows the systems designer to preprogram and precompile transaction programs, thereby trading system flexibility for processing speed. The second type is the executive system, which generally performs more complex analyses of information for long-range planning and therefore need not be updated as often as an operational system. A typical executive system retrieval is based on a query, the answer to which is the content of one or more information elements in a stored representation. Such an answer to a general query may involve access to hundreds or thousands of representations. Consequently, executive systems are distinguished from operational systems by the generality of their queries, which cannot be preprogrammed, and by the volumes of data which must be accessed to answer any given query. Since transactions in executive systems take many forms and normally cannot be preprogrammed, the user works with a simplified query language to specify informational requirements. Most information systems lie somewhere between these two extremes. Consequently, most systems must find a balance between executive-type transactions and operational-type transactions.
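The contrast can be sketched in a few lines: an operational transaction is anticipated and prepared in advance (here, as a dictionary index on a key), whereas an executive query is an arbitrary condition evaluated over the whole collection. The `Parcel` records and class names are hypothetical, and a Python predicate stands in for a query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parcel:
    parcel_id: str
    district: str
    area_ha: float


class ParcelStore:
    def __init__(self, parcels):
        self.parcels = list(parcels)
        # Operational style: the anticipated transaction (lookup by id) is
        # prepared in advance as an index, so answering it touches one record.
        self._by_id = {p.parcel_id: p for p in self.parcels}

    def get(self, parcel_id: str) -> Parcel:
        """Exact-match retrieval of a single entity."""
        return self._by_id[parcel_id]

    def query(self, predicate) -> list:
        """Executive style: an arbitrary, unanticipated condition, answered by
        examining every stored record."""
        return [p for p in self.parcels if predicate(p)]


store = ParcelStore([
    Parcel("P1", "North", 12.0),
    Parcel("P2", "South", 40.5),
    Parcel("P3", "North", 7.25),
])

print(store.get("P2").area_ha)  # 40.5
north_area = sum(p.area_ha for p in store.query(lambda p: p.district == "North"))
print(north_area)               # 19.25
```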
Throughout the 1970s the development of complex structures for organizing data was partially motivated by, and contributed to, the sophistication of data management procedures in executive information systems. Data management is defined here as the process of utilizing primitive routines written in a programming language to manipulate various types of data structures. Thus, flexibility in data management has evolved in a similar manner to flexibility in data structures. Data management systems have evolved from single-purpose programs to integrated data base information systems. According to Haseman and Whinston (1977, p. 78), the evolution of systems and their structures can be divided into seven stages:

1. Programming language structures
2. Report generator structures
3. Systems with hierarchical structures
4. Systems with inverted structures
5. Systems with network structures
6. Systems with relational structures
7. Planning systems with multiple structures

Many of the recent developments are a result of work performed by groups such as the Conference on Data Systems Languages (CODASYL) committee and ANSI/SPARC. A Data Base Task Group (DBTG) was organized in 1969 to examine a general structure for data base management systems.

The DBTG report (CODASYL Committee 1971) outlined a data management system having many of the features of an 'intelligent, automated, usable' system. One of the most important features is that different classes of users should possess different levels of access capability to a common data base. The four classes of users identified are: (1) the data base administrator, (2) the programming user, (3) the nonprogramming user (query user), and (4) the parametric user. The data base administrator sets up the data and manages the files for those who use the system. The programming user can access the data base through procedural programming language (e.g. FORTRAN) primitives which retrieve, store and modify records. The nonprogramming query user accesses the data base through a generalized query language composed of English-like commands for retrieving data and displays and for undertaking certain types of analysis without having to specify the particular steps to do so. The parametric user interacts with the system through a simple question-answer process in which a specific set of questions has been prepared in advance. The CODASYL approach has recently been extended and generalized further into general planning systems (Bennet 1976, Bonczek 1976, Bonczek, Holsapple and Whinston 1976). General planning systems are the most recent of the 'intelligent, automated and usable' systems
developed to aid decision support. One of the most important extensions involves query processing with an information base rather than a data base (Bonczek, Holsapple and Whinston 1977a, 1977d). Such an information base contains not only data but also programs which transform a data structure from one logical structure to another when needed; e.g. one structure may be incompatible for retrieval or analysis if the data management process is supported in geographically distributed locations (Bonczek, Holsapple and Whinston 1977c). Another extension involves an information base that has stored within it simple models which automatically process certain data members for analysis.

As with business-oriented information systems in computer science, computer-assisted cartography has experienced a development of the data management process. That development is characterized by an evolution from one-off, single-application programming toward spatial information systems, commonly called geographic and/or cartographic information systems (Phillips 1974, Tomlinson et al. 1976). A geographic/cartographic information system consists of all the data that are maintained by an organization, e.g. a metropolitan planning organization, and the integrated procedures by which those data are manipulated for analysis and display in whatever form is necessary
or permitted.

Along with the greater sophistication in programming systems, and as a result of considerable developments in technology, information processing has tended to move from a high dependency on batch processing systems to interactive processing. The development of cathode ray tube (CRT) technology, both refresh and storage tube, and its introduction (Riffe 1970) into CAC for analysis and display have made it possible to interact more conveniently with a map model, a process called man-machine interaction. Interactive cartography (Moellering 1975) offers a much richer channel for cartographic communication and problem solving than does batch processing (Carlson et al. 1974, Moellering 1977). A virtual map (Moellering 1976, 1980) as a model can be manipulated easily (changing scale, adding and deleting symbols) while it is still in a virtual state in memory, and then redisplayed in
virtual state on the CRT screen. Then, after a cartographer is satisfied, he can commit his work to hard copy in the form of a real map with the assistance of a copy device interfaced with the CRT screen. A cartographer can also use the map to display analytical results in problem solving, e.g. in transportation planning (Moellering 1977). In this case all or part of a virtual model can be retrieved and examined at the display scale, or windowed for a more detailed examination, which can aid in the solution of the problem.

In summary, structural linguistics as developed by Chomsky (1957, 1963, 1965) can provide an analytical mechanism for characterizing the underlying structural relationships in a map. In a general linguistic context, structural linguistics is assumed to be closely related to the syntactic and semantic realms of semiotics (Nauta 1972); however, structural linguistics is different because it provides an analytical mechanism for describing structure. Picture processing, an umbrella
covering the subfields of computer graphics, image processing and visual pattern recognition (Shapiro 1979), deals with syntactic models and data structures for picture description, analysis, generation and interpretation (Narasimhan 1974). Contributions to cartographic grammars by Dacey (1970a, 1970b, 1971), Betak (1972, 1973, 1975), Youngman (1978) and Taketa (1979) are based largely on notions in picture processing. Dacey developed a grammar for describing simple geometric figures. Betak elaborated on
Dacey's notions by developing measures of syntactic complexity for geometric figures and evaluated these measures by examining subjects' responses to the syntactic complexity of the same figures. Youngman developed a map grammar for describing the hierarchical deep structure of map displays and utilized these descriptions in a computer mapping system. Elaborating on Youngman's notions, Taketa generalized grammars for map description to include not only the hierarchical deep structure relationships in map displays but also a way of incorporating spatial relationships in the grammar. The contributions by
Dacey and Betak focus on the syntax of a single graphic symbol that may appear on maps. Youngman focused on the syntax of the entire map, and Taketa discussed the syntax of both.

Data base design involves a logical approach to the design of cartographic data bases utilized in the computer processing of virtual maps. A logical approach to the design of data bases is characterized by six levels of data organization: data reality, canonical structure, information structure, data structure, storage structure and machine encoding. This review focused on data structures and information structuring. Data structures were considered because of their role in processing data and their close ties to information structuring. Although the stage of data structure design involves logical organization rather than physical organization, as in storage structures or machine encoding, one must realize that any data structure is ultimately constrained by the programming language and data base software in which it is implemented.

Information structuring is a new topic relative to data structuring. When one views the process of data base design at this level there is a better chance of benefiting those who are interested in developing a data base. These information structuring notions can then be translated into available software.

Considerable advances in user orientation have occurred during the past few years. Systems are being designed with the nonprogramming user in mind. No longer does the user need to be a programmer and worry about the correct card column in which to specify a parameter for batch-oriented systems. The user should not be bothered with the tedium of a procedural, structured approach in the retrieval and analysis of maps and other pertinent information, but should be able to use English-like statements to interact in a nonprocedural, unstructured manner to retrieve and analyze information
in various forms (Phillips 1977).

CHAPTER 3

Modeling Cartographic Information with an Information Structure

3.1 Introduction

In this chapter concepts from linguistics, picture processing, data base design and numerical cartography are integrated to formalize a logical approach for characterizing virtual maps (Moellering 1980) in a cartographic data base design context. The discussion focuses on the development of a syntactic-semantic model for structuring cartographic information, called a cartographic information structure. Although the notion of an information structure is developed in a cartographic context, this notion is not limited strictly to cartographic interpretation. An information structure is a multi-dimensional mechanism for describing syntactic and semantic aspects of knowledge in general. These notions can be utilized in the design of any data base, or of what might more appropriately be called an information base (Bonczek, Holsapple and Whinston 1977d). In a cartographic context, a similar notion for describing the main structural information of spatial phenomena has been introduced by Bouille (1978) and is called the 'skeletal structure' of a data structure. The two notions are completely compatible. In fact, the notion of information structure presented in this chapter is a formalization of the concept of information structuring, and therefore a formalization of the notion of 'skeletal structure'.

A grammar is the basis of a linguistic formalism which generates an information structure. The development of the grammar draws on the theoretical contributions on transformational grammars developed by Chomsky (1965), on the contributions on two-dimensional web grammars (also called graph grammars) introduced by Pfaltz and Rosenfeld (1969) and further discussed in Montanari (1970), Rosenfeld and Strong (1971), Rosenfeld and Milgram (1972), Abe et al. (1973), and Della Vigna and Ghezzi (1978), and on the contributions on cartographic grammars presented by Youngman (1978) and Taketa (1979).

3.2 Entities, Objects, Attributes and Relationships

An entity is a 'thing' or 'event' of which the mind is conscious. A cartographic object is defined as the representation of an entity employed in the modeling of a geographic distribution for cartographic purposes. Cartographic objects are representations of both geographical entities, e.g. features on the surface of the earth, and cartographic entities, e.g. legends. In numerical cartography, cartographic objects need not be tangible, but at some time they must be virtual (Moellering 1980), i.e. exist in computer storage or on the screen of a CRT. There are two types of cartographic objects: primitive and compound (Youngman 1978). A primitive object type is a general classification for basic objects of a map, e.g. point, line or polygon (see Figure 9). A compound object type is a general classification of a collection of primitives which takes on a higher level of meaning due to a synergistic effect of logical association, e.g. a geographic domain as in counties aggregated into a state (see Figure 10). A primitive cartographic object is not necessarily graphically primitive; that is, other primitives may be needed in its construction. For example, a polygon is defined by a closed line, which is in turn defined by a number of points or chain codes.

Label   Object Type   Attributes
Pt      Point         x, y
Nd      Node          Pt, L
Sym     Symbol        Pt, T, V, L
Ve      Vector        Pt, Rho, D, V, L
Ar      Arc           Pt, Pt, Rad, V, L
Sec     Sector        Pt, Rad, Rho, Theta, V, L
Str     String        Pt + Pt + Pt + . . .
Ch      Chain         Nd + Str + Nd
Li      Line          Ch + Ch + Ch + . . .
Po      Polygon       Pt, V, L, Li
Px      Pixel         i, j
Sl      Scan Line     i

where the attributes are defined as:
x       an x coordinate value or Easting
y       a y coordinate value or Northing
i       an ith scan line
j       a jth entry
V       a numeric value
L       a text label
T       a symbol prototype
Rho     an angle of rotation
Theta   an angle of opening
D       a distance
Rad     a radius

Figure 9. Primitive Object Types (After Bracchi and Ferrari 1971, and Youngman 1978)

Label   Object Type          Constituent Parts
GD      Geographic Domain    Points, Lines, Polygons
MA      Map Area             Scan Lines
RI      Road Intersection    Chains
META    Metropolitan Area    Census Tracts
CT      Census Tracts        Polygons

Figure 10. Compound Object Types

Cartographic objects are characterized by two associations: attributes and relationships. An attribute is a general descriptive identifier of an entity, i.e. something which, when given a value, will specifically describe an entity. A relationship names an association which one entity may have with another; this can be syntactic or semantic, i.e. concerned with structure or meaning. Although a relationship can technically be considered an object in itself, two or more objects must always participate in the association. When one object is the focus of attention in a given association, we view the other object(s) as its related counterpart. Consequently, a relationship between two objects is said to be characteristic of both objects.

Attributes and relationships may be of three kinds: graphical, spatial and phenomenological (see Figure 11). Graphical attributes concern the geometric form of single symbols. Graphical relationships concern the similarity of geometric form among symbols. Spatial attributes are those which concern location, position, orientation, extension or length. Spatial relationships are those which concern the pattern of distribution and relative or absolute proximity, such as topological adjacency or distance among objects, and hence among symbols. Phenomenological attributes are those which concern the description of a given symbol, whether this be a label or a rank. Phenomenological relationships involve the similarity or difference among entities or objects according to nonspatial characteristics.

ATTRIBUTES
  a) GRAPHICAL: geometric form of symbolization such as: n-sided, square or circle
  b) SPATIAL: locational characteristics such as: position, orientation or azimuth
  c) PHENOMENOLOGICAL: nonspatial descriptor such as: name or class ranking

RELATIONSHIPS
  a) GRAPHICAL: similarity or difference between geometric forms of symbols
  b) SPATIAL: relative or absolute proximity such as topological connections or distance among objects
  c) PHENOMENOLOGICAL: functional associations among objects in terms of nonspatial character

Figure 11. Attributes and Relationships for Cartographic Objects (After Youngman 1978 and Taketa 1979)
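To make the distinction between primitive and compound object types more concrete, the following minimal Python sketch models a subset of the primitives in Figure 9 (point, node, chain, line, polygon) and a compound object that aggregates them, with relationships tagged by the three kinds of Figure 11. The class and field names are hypothetical; the closure test illustrates the statement that a polygon is defined by a closed line.

```python
# Hypothetical class names; a sketch of a subset of Figure 9 (Pt, Nd, Ch, Li,
# Po) plus a compound object type, not a transcription of any actual system.
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    """The three kinds of attributes and relationships of Figure 11."""
    GRAPHICAL = "graphical"
    SPATIAL = "spatial"
    PHENOMENOLOGICAL = "phenomenological"


@dataclass(frozen=True)
class Point:            # Pt: x (Easting), y (Northing)
    x: float
    y: float


@dataclass(frozen=True)
class Node:             # Nd: Pt, L
    pt: Point
    label: str = ""


@dataclass(frozen=True)
class Chain:            # Ch: Nd + Str + Nd (Str = intermediate points)
    start: Node
    string: tuple
    end: Node


@dataclass(frozen=True)
class Line:             # Li: Ch + Ch + ...
    chains: tuple

    def is_closed(self) -> bool:
        """True if each chain ends where the next begins, wrapping around."""
        n = len(self.chains)
        return n > 0 and all(
            self.chains[k].end.pt == self.chains[(k + 1) % n].start.pt
            for k in range(n)
        )


@dataclass(frozen=True)
class Polygon:          # Po: Pt, V, L, Li
    pt: Point
    value: float
    label: str
    boundary: Line


@dataclass
class CompoundObject:
    """Primitives that take on a higher level of meaning when associated."""
    name: str
    parts: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)     # name -> value
    relationships: list = field(default_factory=list)  # (Kind, verb, other)


# Usage: a square county bounded by two chains, aggregated into a state.
a, b = Node(Point(0, 0)), Node(Point(1, 1))
boundary = Line((Chain(a, (Point(1, 0),), b), Chain(b, (Point(0, 1),), a)))
county = Polygon(Point(0.5, 0.5), 42.0, "County A", boundary)
state = CompoundObject(
    "State S",
    parts=[county],
    attributes={"name": "State S"},
    relationships=[(Kind.SPATIAL, "adjacent to", "State T")],
)
```

In the usage, the polygon is built from chains and nodes rather than directly from coordinates, which is the sense in which a primitive cartographic object need not be graphically primitive.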

3.3 Structural Aspects of Data Bases and Virtual Maps

Entity sets can be combined in a logical way to represent the information structure of a spatial theme of some geographically oriented phenomenon. An information structure can be thought of as a combination of an information hierarchy (Peucker 1972) and a semantic network (Woods 1975), i.e. it is a multi-dimensional representation for portraying cartographic entity sets and the logical relationships between these sets. An information structure serves as the skeletal conceptual basis of a canonical structure. Through a canonical process, canonical structures provide a minimal structure for logically organizing the data representation of entity sets, relationships and their attributes (Martin 1977). As defined in Section 2.4.1, a canonical structure is a conceptual data structure which emphasizes all inherent qualities of information about entities and their attributes, plus the relationships between entities. A canonical structure is used as a preliminary step in the design of an operational data structure: it can be translated into an operational data structure of a particular data model, provided that the data model is rich enough to support all of the relations described in the canonical data structure.

An operational data structure is a logical organization of all data objects in a data base in terms of a particular software implementation utilizing hierarchical, network, relational, recursive or hypergraph structures. A description of the logical structure of a virtual map has a direct counterpart in an information structure and, through a canonical structure, should have a counterpart in an operational data structure for computer processing. Narasimhan (1974) suggests that successful attempts at generating and analyzing pictures (used in a general sense which includes maps) rely on flexible operational data structures. The notion of information structure is directly compatible with any canonical structure; however, some data structures may not be compatible with all canonical structures. When a data structure is not compatible, this indicates that it is not flexible enough to represent the relationships defined in the information structure. This, in fact, is the greatest benefit of developing information structures and a canonical structure before attempting to represent data in a data structure: if the information structure does not ultimately translate into the intended data structure, then that data structure is of little utility for capturing the overall inherent structure of the information.

Among the most flexible data structures set forth in cartography to date are the hypergraph-based data structure (HBDS) model introduced by Bouillé (1978) and the recursive spatial data structure set forth by Haralick and Shapiro (1978). Both contain a mixture of notions from hierarchical, network and relational data models. A similar model, not yet applied in a cartographic context, is the entity-relationship model of Chen (1976). All three employ set notation and include links between data objects; however, the links are operationalized differently.
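The compatibility argument above can be made concrete with a deliberately small test. This hypothetical Python sketch asks only one narrow question: could a set of 'part of' associations between entity sets be carried by a strict hierarchical data model, in which each record type has at most one parent and no record type is its own ancestor? Counties aggregated into a state pass; census tracts that belong both to a metropolitan area (compare Figure 10) and to a county do not, although a network or relational model could store both links. The function name and input format are assumptions for illustration.

```python
# Hypothetical check: could a set of 'part of' associations between entity
# sets be held by a strict hierarchical data model, in which each record type
# has at most one parent and no record type is its own ancestor?


def fits_hierarchy(part_of):
    """part_of: iterable of (child_set, parent_set) pairs. Returns (ok, reason)."""
    parent = {}
    for child, par in part_of:
        if child in parent and parent[child] != par:
            # A second parent cannot be expressed in a strict hierarchy.
            return False, f"{child} has two parents: {parent[child]}, {par}"
        parent[child] = par
    for start in parent:
        # Walk up the parent chain; revisiting a node means a cycle.
        seen, node = {start}, start
        while node in parent:
            node = parent[node]
            if node in seen:
                return False, f"cycle through {node}"
            seen.add(node)
    return True, "ok"


print(fits_hierarchy([("County", "State")]))
print(fits_hierarchy([
    ("Census Tract", "Metropolitan Area"),
    ("Census Tract", "County"),
    ("County", "State"),
]))
```

A failed check does not say the information is wrong; it says the target data model is not flexible enough for the relationships the information structure requires.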

3.4 Information Structures

As was mentioned earlier, an information structure is a formalization of the notion of skeletal structure. Before information structures are discussed, a brief discussion of the cogent aspects of 'skeletal structure' in the HBDS model is necessary. Bouillé states that a:

. . . skeleton-structure sums up the main information of the data structure, and all other data are inserted in or around this skeleton, but in a second time. It is the part of the data structure which is deducted of the analysis of phenomenons [sic], before the concrete processing of data, and even before their capture. The skeleton may be updated, but it mainly represents the most important characteristics which determine a data structure. Adding or suppressing one-hundred objects in a class is a minor event compared with the suppression of a class, even if the class is empty. (Bouillé 1978, p. 4)

The two main components of the skeletal structure in the HBDS model are: 1) a hierarchical part, which is a tree-like directed graph embedded in the hypergraph, and 2) a non-hierarchical part consisting of one or more multigraphs (see Figure 12). Principal nodes in the hierarchical part represent classes of information, or entity sets, in the skeletal structure. Each set may contain one or more objects which are instances of entities, or the set may be empty. The edges of the directed graph connecting the class nodes that form the hierarchical component of the skeletal structure are undistinguished in meaning, except for the fact that they do represent linkages. In the formalization of the information structure presented here, these edges take on a general class of linkages called 'deep structure' in the information structure. In the non-hierarchical component of the model, multigraphs can be defined as links connecting two nodes. These links represent binary relationships between entity sets at any level of the directed graph. Multigraphs can represent relationships of any kind. However, since HBDS is a cartographic data structure, the relationships are mainly spatial or graphical, e.g. adjacency between polygons or segments connecting topological nodes.

[Legend: principal node representing a class of objects; edges of the directed graph in the hierarchical component; edges of the multigraphs in the nonhierarchical component.]

Figure 12. Major Components of the Skeletal Structure in a Hypergraph

Bouillé discusses a skeletal structure in terms of the 'main information of a data structure'. Consequently, the classes of objects which are identified, and the associations between these classes, are really information. This information is an abstraction of the data, a symbolization or representation of the data in a general sense. Distinguishing an information structure from a data structure is an important design step because data structures are usually associated with particular software. Designing the structure of a data base in terms of particular software may be a very important pragmatic consideration at some time; however, the limitations of such an exercise will eventually outweigh the advantages. Information structures should be translated into a canonical structure which provides for the integration of information classes and describes these classes in terms of a minimal structure. The canonical structure can then be translated into an operational data structure which is software dependent.
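The two parts of a skeletal structure can be sketched as a class tree plus a list of labeled links in which the same pair of classes may appear more than once (the multigraph). This is a hypothetical Python illustration, not Bouillé's HBDS implementation; the class `Skeleton` and its methods are invented for the example. Suppressing a class removes its subclasses and every link touching them, which mirrors the quoted remark that removing a class is a far larger event than adding or removing objects.

```python
# Hypothetical sketch of an HBDS-style skeletal structure: a class tree
# (hierarchical part) plus a multigraph of binary links between classes
# (non-hierarchical part). Names and methods are illustrative assumptions.


class Skeleton:
    def __init__(self):
        self.parent = {}    # class -> superclass (None for the root)
        self.objects = {}   # class -> set of object ids; may stay empty
        self.links = []     # (class_a, class_b, meaning); repeats allowed

    def add_class(self, name, under=None):
        if under is not None and under not in self.parent:
            raise KeyError(f"unknown superclass {under!r}")
        self.parent[name] = under
        self.objects[name] = set()

    def link(self, a, b, meaning):
        # A multigraph may hold several links between the same two classes.
        self.links.append((a, b, meaning))

    def links_between(self, a, b):
        return [m for x, y, m in self.links if {x, y} == {a, b}]

    def remove_class(self, name):
        """Suppress a class together with its subclasses and their links."""
        doomed = {name}
        changed = True
        while changed:                      # collect all descendants
            changed = False
            for c, p in self.parent.items():
                if p in doomed and c not in doomed:
                    doomed.add(c)
                    changed = True
        for c in doomed:
            del self.parent[c]
            del self.objects[c]
        self.links = [lk for lk in self.links
                      if lk[0] not in doomed and lk[1] not in doomed]
        return doomed


sk = Skeleton()
sk.add_class("Map")
for cls in ("Polygon", "Segment", "Node"):
    sk.add_class(cls, under="Map")
sk.link("Polygon", "Polygon", "adjacent to")
sk.link("Segment", "Node", "starts at")
sk.link("Segment", "Node", "ends at")
sk.objects["Polygon"].update({"p1", "p2"})
```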

Like the skeletal structure of the HBDS, an information structure consists of two major components: 1) a hierarchical component, and 2) a non-hierarchical component. An information structure is conceptually hierarchic because a top-down linguistic approach is used to deduce the deep structure of spatial phenomena to be portrayed cartographically. However, when the information structure is placed in an operational context it is heterarchic. Heterarchic means that it is possible to cycle back through the levels of a data structure to facilitate analysis as well as generation in a cartographic context (Narasimhan 1974). Consequently, an information structure is hierarchical in conceptual derivation, while the data structure is heterarchic in operation.

When the hierarchical component is combined with the non-hierarchical component into a multi-dimensional graph, and this graph is labeled to provide meaning, the labeled graph represents an information structure (see Figure 13). The information structure can be subdivided into various stages of generation, each stage being a representation of the non-hierarchical component of the information structure. Any given stage of generation can be represented by a base web, which is the labeled graph at the base of the information structure at that particular stage of production. Thus, a base web passes through a number of stages of development or specification.

[Levels shown: initial web, base web 1, base web 2. Legend: hierarchical component; web component; cartographic object; label; initial symbol.]

Figure 13. An Information Structure for Virtual Maps

Deductively, one initiates the construction of an information structure with a special base web called an initial web. The initial web conceptually represents all of the potential cartographic information that is derivable. For example, an initial web may represent a particular topic for which information is stored in a cartographic data base (information base). Hence an initial web represents a multitude of data and relationships. A base web 1 generated from the initial web consists of classes of cartographic information, called entity sets or information classes, that appear on virtual maps, such as reference information as in graticules or toponymy. Another general class of information may be the thematic data representing phenomenological characteristics of census tracts, etc. These general classes of information are the labeled nodes in an information structure, labeled for the information they represent. Here a base web will be referred to as W. Technically, a base web W is a labeled graph that is highly connected, and the underlying graph $G_W$ which graphically depicts the base web W is defined by the pair

$$G_W = (N_W, A_W)$$

(see Figure 14). Formally, the web W on a vocabulary V of labeled map elements is a triple:

$$W = (N_W, A_W, f_W)$$

where $N_W$ is a set of nodes in W, $A_W$ is a set of arcs that represent relationships between unordered pairs $(m, n)$ of nodes, and $f_W$ is a function from $N_W$ into a vocabulary of labels, labeling each node of $N_W$ as a cartographic entity set. The node set $N_W$ is the set of all information classes contained in the web, each node representing an information class, e.g. a class of reference information represented by node m and a class of thematic information represented by node n. These classes may change depending upon the level of information specification in the structure. The set of arcs $A_W$ represents the binary associations in the non-hierarchical component of the information structure. If the pair of nodes $(m, n)$ is in $A_W$, the nodes m and n, representing information classes or entity sets (which are not necessarily distinct), are said to be graphically, spatially or phenomenologically associated in some manner. These entity sets may be at the same or perhaps different levels of phenomenological existence.

[Panels: base web W; its underlying graph.]

Figure 14. A Base Web W and Its Underlying Graph $G_W$

A subweb α is a labeled subgraph of the web W (see Figure 15). Formally, a subweb is

$$\alpha = (N_\alpha, A_\alpha, f_\alpha)$$

where $N_\alpha \subseteq N_W$, $A_\alpha \subseteq A_W$, and $f_\alpha = f_W|_{N_\alpha} : N_\alpha \to V$ is the function $f_W$ restricted to $N_\alpha$. The members of W that are not members of α (the complement of α in W) are the nodes of the subweb $W - \alpha$, defined by restricting $A_W$ and $f_W$ to the subset $N_W - N_\alpha$ of $N_W$. Thus, a subweb α is only part of the web W on V.

An information structure is generated with the use of webs and subwebs. The hierarchical component records an information structure generation process from an initial web through base webs to a terminal web. An initial web is simply a labeled node for starting the process, and a terminal web is a web consisting of instances of primitive object types plus their logical relationships. The hierarchical component represents the overall deep structure of the information structure underlying a virtual map and the data base from which it was derived.

[Panels: subwebs; subgraphs.]

Figure 15. Four Subwebs and Their Subgraphs
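The web and subweb definitions translate almost directly into sets. In this minimal Python sketch (hypothetical names; node ids are arbitrary integers and labels are taken from Figure 18), `restrict` builds the subweb induced by a node subset, which is one admissible choice of $A_\alpha \subseteq A_W$, and `complement` builds $W - \alpha$ by restricting to $N_W - N_\alpha$. Unordered pairs are stored as frozensets, so a one-element frozenset stands for a pair $(m, m)$.

```python
from dataclasses import dataclass

# Hypothetical sketch: node ids are arbitrary, arcs are unordered pairs stored
# as frozensets (a one-element frozenset is a pair (m, n) with m = n).


@dataclass
class Web:
    nodes: frozenset   # N_W
    arcs: frozenset    # A_W
    labels: dict       # f_W: node -> label from the vocabulary V


def restrict(web, keep):
    """Subweb induced by a node subset: restrict A_W and f_W to it."""
    keep = frozenset(keep) & web.nodes
    arcs = frozenset(a for a in web.arcs if a <= keep)
    return Web(keep, arcs, {n: web.labels[n] for n in keep})


def complement(web, sub):
    """W - alpha: restrict W to the node subset N_W - N_alpha."""
    return restrict(web, web.nodes - sub.nodes)


W = Web(
    nodes=frozenset({1, 2, 3, 4}),
    arcs=frozenset({frozenset({1, 2}), frozenset({1, 4}), frozenset({3, 4})}),
    labels={1: "Gb", 2: "Rl", 3: "T", 4: "S"},
)
alpha = restrict(W, {1, 2})
rest = complement(W, alpha)
```

The arc {1, 4} joins a node of α to a node of W − α, so it belongs to neither subweb; links of this kind are what an embedding function has to re-establish when α is rewritten.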

A subcategorization of node labels starting from the initial web down through base webs to a terminal web represents the deep structure of any given element in the terminal web. Therefore, a single linkage in the deep structure is a single subcategorization of nodes. A subcategorization is represented by an edge connecting nodes of different hierarchical levels. The entity sets represented by the nodes are not totally distinct, i.e. the entity sets directly below (at the lower end of the linkage) serve as constituent parts of the compound object type at the higher end of the linkage (see Figure 16). In some sense these constituent parts act as attributes of the compound object type. The nature of the subcategorization and constituency is of central concern in the formalization of the information structure presented in the next section.
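Because each subcategorization links one higher node to the lower nodes that serve as its constituents, the deep structure of an element can be read as the chain of linkages from the initial symbol down to it. The Python sketch below assumes illustrative linkages that borrow labels from Figure 18; they are not a transcription of the rules in Figure 20, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative subcategorization linkages (higher label, lower label). These
# are assumptions that borrow labels from Figure 18, not rules of Figure 20.
LINKS = [
    ("I", "R"), ("I", "M"),
    ("R", "Gb"), ("R", "Rl"),
    ("Gb", "Fo"),
    ("Fo", "Po"),
    ("Po", "Po.17"),        # a terminal element attached to its preterminal
]


def build(links):
    """Index the linkages both upward (parent) and downward (constituents)."""
    parent, constituents = {}, defaultdict(list)
    for higher, lower in links:
        if lower in parent:
            raise ValueError(f"{lower} already has a parent; expected a tree")
        parent[lower] = higher
        constituents[higher].append(lower)
    return parent, constituents


def deep_structure(element, parent):
    """Chain of subcategorizations from the initial symbol down to element."""
    path = [element]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


parent, constituents = build(LINKS)
print(deep_structure("Po.17", parent))
```

The tree check in `build` reflects the assumption that each generated node comes from rewriting exactly one higher node.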

3.5 A Grammar for Information Structures

A cartographic grammar is composed of a finite vocabulary (map elements), a set of initial symbols and a set of rules which potentially allow an infinite set of webs, and hence of information structures, to be generated. Formally, G is a triple:

$$G = (V, I, R)$$

(see Figure 17). The set V is a vocabulary consisting of two finite, non-empty, disjoint subsets: (1) $V_N$ is a nonterminal vocabulary of labels for conceptual map elements, most of which are compound object types, e.g. legends, reference and geographic base (see Figure 18), and (2) $V_T$ is a terminal vocabulary of representational elements, e.g. instances of a point, line or polygon (see Figure 19). The compound object types are associated with nonterminal nodes in a web. The set of nonterminal elements also includes a special subset of preterminal variables, which are primitive object types, e.g. nodes, lines and polygons. The preterminal variables or primitive object types are associated with the representational graphic elements of the map.

The terminal vocabulary is the lexicon of the grammar. A lexicon consists of an unordered set of lexical entries and redundancy rules. Each lexical entry is a label plus a set of feature attributes representing an instance of a primitive object type. Lexical entries constitute the full set of idiosyncratic primitive objects on a map, whereas redundancy rules (containing a seed for location definition) add and specify primitive objects that can be generated by a general rule. The elements of the lexicon are representational data, representing the actual graphic elements that are perceived on a map.

[Two panels: entities of the same entity set collected to form a compound object type; entities of different entity sets collected to form a compound object type.]

Figure 16. Formation of Compound Objects Through Deep Structure Linkages

G = (V, I, R)
    V : Vocabulary
        V_N : Nonterminal
        V_T : Terminal
    I : Initial Element
    R : Rules of Grammar
        R_P : Production Rules = (α, β, C, E)
              α, β : Subwebs
              C : Contextual Condition
              E : Embedding Function
        R_L : Lexical Rules = (Q_i, P_ij, A_ij)
              Q_i : Label Representing Preterminal Variable
              P_ij : Label for Occurrence of Primitive Object
              A_ij : Attributes of Primitive Object
              (P_ij, A_ij) inserted into Q_i when P_ij matches Q_i
        R_T : Transformational Rules = (W, S_ij, RS_i, AT_i)
              W : Terminal Web
              S_ij : Structural Index
              RS_i : Rule Status; Elementary or Compound
              AT_i : Application Type; Single Object or Multiple Object

Figure 17. A Grammar for Information Structures

V_N =  I     Initial Element
       R     Reference Base
       M     Message Theme
       Gb    Geographic Base
       Rl    Reference Lines
       T     Title
       S     Symbolization
       Fo    Feature Objects
       Sc    Scale
       Mb    Map Boundary
       Gl    Grid Lines
       L     Legend
       D     Data Domain
       Ll    Lat., Long.
       Tm    Tic Marks
       Rule preterminal variables (primitive object types):
       Nd    Node
       Sym   Symbol
       Ch    Chain
       Li    Line
       Po    Polygon

Figure 18. Elements of the Nonterminal Vocabulary

       Label    Attributes
V_T =  Nd.1     Pt.1, 0
       Nd.2     Pt.2, 0
       . . .
       Nd.n     Pt.n, City.name
       Chn.1    Nd, String, Nd
       . . .
       Chn.n
       Li.1     Chn.5, Chn.6
       . . .
       Li.n     Chn.n-1, Chn.n
       Sym.1    Pt, T, V, L
       . . .
       Sym.n

Figure 19. Elements of the Terminal Vocabulary

The set I, a subset of $V_N$, is a set of special symbols which

'initiate' the information structure. These initial symbols are used as surrogates to represent a virtual map. The set R is a finite, non-empty set of rules consisting of three disjoint subsets: $R_P$ for production rules, $R_L$ for lexical rules, and

$R_T$ for transformational rules (see Figure 17). The set $R_P$ is a set of rules which 'rewrite' a node or nodes of a subweb α associated with the nonterminal vocabulary into a node or nodes of another subweb β (see Figure 20). This rewriting process expands a host web W and consequently produces another level of the information structure (see Figures 21 and 22). A rule $R_{Pi}$ is a quadruple:

$$R_{Pi} = (\alpha, \beta, C, E)$$

where i is a rule number; α and β are subwebs of W (β is to replace α in the rewrite operation); C is a condition of applicability for rewriting α as β in the host web W; and E is an embedding function which associates neighboring nodes of α with nodes in β.

[Fifteen production rules of the form "label ::= labels of the rewritten subweb", with left-hand sides (1) Initial Element, (2) Reference Base, (3) Message Theme, (4) Geographic Base, (5) Reference Lines, (6) Symbolization, (7) Grid Lines, (8) Feature Objects, (9) Scale, (10) Map Border, (11) Lat., Long., (12) Tic Marks, (13) Title, (14) Legend, (15) Data Domain. Key: Sym = Symbol, Nd = Node, Ch = Chain, Li = Line, Po = Polygon.]

Figure 20. Subweb Production Rules

Figure 21. A Nonterminal Base Web after Application of $R_{P1}$

[Node labels include: title, text rule; legend, symbol rule; data domain, shade rule; scale, symbol rule; border (Li); feature objects (Sym).]

Figure 22. A Preterminal Base Web after Application of $R_{Pi}$

The contextual condition C of $R_{Pi}$ provides an explicit statement governing the contextual conditions for applying the production rule to the web which is being generated. A value of 'C = true' indicates that the production rule is applicable, and thus the subweb α will be rewritten as the subweb β. The embedding function E specifies that there is a link between nodes of $N_\beta$ in the subweb β and nodes of $N_{W-\alpha}$ in the subweb $W - \alpha$, or specifies that nodes of $N_\beta$ are linked onto themselves. (Nodes of $N_{W-\alpha}$ are all those nodes of $N_W$ which have already been generated, except those of $N_\alpha$.) The process of linkage is accomplished through an examination of a set of labels $\{\ell_m, \ell_n, \ldots, \ell_p\}$ grouped into binary subsets. Thus a node $n \in N_\beta$ is joined to a node $p \in N_{W-\alpha}$ because their labels appear as a binary set $\{\ell_n, \ell_p\}$. The collection of these binary sets becomes analogous to the arc set $A_W$ in the web W discussed previously. Binary subsets represent spatial and graphical relationships.
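The rewriting step can be sketched under simplifying assumptions: α is a single labeled node (in line with the restriction, stated later, that production rules expand one node at a time), β is a list of labels with arcs among them, C is a predicate on the current web, and E is a set of label pairs, so that a new node is linked to a surviving node of W − α whenever their two labels form a pair in E. The record of each rewrite in `history` plays the role of a deep structure linkage. All names, and the two rules in the usage, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Web:
    labels: dict                                  # f_W: node id -> label
    arcs: set = field(default_factory=set)        # A_W: frozenset node pairs
    history: list = field(default_factory=list)   # (node, label, new ids)
    next_id: int = 0


@dataclass
class ProductionRule:
    lhs: str                                       # label of the node in alpha
    rhs: list                                      # labels of the nodes of beta
    rhs_arcs: list = field(default_factory=list)   # arcs within beta (index pairs)
    condition: Callable = lambda web: True         # contextual condition C
    embedding: set = field(default_factory=set)    # E: frozenset label pairs


def apply(rule, web, node):
    """Rewrite `node` as the subweb beta; return new ids ([] if not applicable)."""
    if web.labels.get(node) != rule.lhs or not rule.condition(web):
        return []
    del web.labels[node]                                # remove alpha ...
    web.arcs = {a for a in web.arcs if node not in a}   # ... and its arcs
    old = list(web.labels)                              # nodes of W - alpha
    new = list(range(web.next_id, web.next_id + len(rule.rhs)))
    web.next_id += len(rule.rhs)
    web.labels.update(zip(new, rule.rhs))
    for i, j in rule.rhs_arcs:                          # links inside beta
        web.arcs.add(frozenset((new[i], new[j])))
    for n in new:                                       # embedding E
        for p in old:
            if frozenset((web.labels[n], web.labels[p])) in rule.embedding:
                web.arcs.add(frozenset((n, p)))
    web.history.append((node, rule.lhs, tuple(new)))    # deep structure link
    return new


web = Web(labels={0: "I"}, next_id=1)
r1 = ProductionRule("I", ["R", "M"], [(0, 1)])
r2 = ProductionRule("R", ["Gb", "Rl"], [(0, 1)],
                    embedding={frozenset({"Gb", "M"})})
first = apply(r1, web, 0)
second = apply(r2, web, first[0])
```

After the second rewrite the new Gb node is linked to the older M node only because the pair {Gb, M} appears in E, while the Rl node receives no such link.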

The relationships expressed by these binary subsets are different from the deep structural relationships which are produced by the application of the rules in $R_P$ themselves. The spatial relationships between elements described by the embedding function may or may not underlie the same compound object type. According to the grammar described by Taketa (1979), it is possible only to describe a spatial relationship between elements which are rewritten by the same rule. In contrast, with this embedding function it is also possible to describe relationships between elements which do not necessarily take part in the formation of the same compound object type, or of any compound object type. There may be relationships between entity sets which are simply spatial associations, with no compound entity being formed, e.g. commercial flows or environmental pollution flows between states or countries. The flows of goods and currency, or of sulfur dioxide, are the associations between states, but no compound entities are created as a result.

A crucial distinction exists between a deep structural relationship and a spatial relationship. A deep structure relationship is not externally visible in the way that a spatial relationship between two objects is. The notion of a deep structure relation derives its interpretation from a logical, structural association between objects and is not limited to the spatial realm. The association may be graphical, spatial or phenomenological in nature, or it may be a cartographic relation, as in the logical association between a geographic domain and the legend. Thus, a graphical, spatial or phenomenological relationship can participate in the formation of a deep structure relation, but a deep structure relation is not limited to any one of them. A base web contains nonterminal elements only. The lowest level of the base web has preterminal variables associated with the nodes of the web.
Since a preterminal variable is a primitive object type representing a class of lexical entries, a web with preterminal variables is also called a preterminal web, in addition to being called a base web. The set $R_L$ is a set of lexical rules which assigns lexical entries (cartographic elements of the terminal vocabulary) to preterminal variables (primitive object types) that are the nodes of a preterminal web. This consequently generates a terminal web. A lexical rule $R_{Li}$ is a triple:

$$R_{Li} = (Q_i, P_{ij}, A_{ij})$$

where i is a rule ID, $Q_i$ is a label representing the preterminal variable (primitive object type), $P_{ij}$ is a label of an occurrence of a primitive object, and $A_{ij}$ is a set of attributes. In the lexical rule, the lexical entry $(P_{ij}, A_{ij})$ is inserted into $Q_i$ when the label $P_{ij}$ is successfully matched against the label $Q_i$. An insertion takes the actual form of the attributes $A_{ij}$ replacing $Q_i$ to produce a terminal web. At this stage of production it is still appropriate to refer to the two-dimensional representation of the terminal web as a graph. Although lexical entries have been assigned, final definition of location and description has not yet occurred. Thus, it is still possible to refer to the representation as a graph rather than a map, because the map is only in skeletal form (see Figure 23). Using the lexical rule $R_{Li} = (Q_i, P_{ij}, A_{ij})$, a preterminal variable $Q_i$, e.g. 'polygon' in production rule #8 in Figure 20, is matched against the variable label $P_{ij}$, e.g. 'polygon' for a census tract, of the lexical entries in the data set. Since the match is successful, the attributes $A_{ij}$ (the list of chains and the identifier for the polygon) replace $Q_i$, attaching the lexical entry to the proper place in the web marker. This process of lexical assignment produces a fully defined terminal web.

[Skeletal layout with areas for the title, legend, geographic domain, scale and map border.]

Figure 23. Terminal Web (Skeletal Thematic Map)

The base component of the grammar consists of the set of production rules plus the set of lexical rules. It is the base component that generates a deep structure. A deep structure is a set of internal syntactic relations among nonterminal web elements underlying the syntactic structure of the terminal elements. A description of the deep structure is taken to be a systematic ordering of the base rules and lexical rules needed to produce a terminal web from an initial web and base web. The description can be documented in two ways: (1) by means of an ordered list of the base rules and lexical rules which have been applied, and/or (2) by a multi-dimensional tree diagram which visually displays the subcategorization of nonterminal elements and the necessary lexical rule assignments (see Figure 24). Either documentation can be called a web marker, because both describe the information structure of a virtual map. An advantage of the listing of rules is that it specifically defines the order of application of base rules, while the advantage of the diagram is that it intuitively aids conceptualization of the web production process.

[Legend: initial symbol; compound object type (nonterminal element); primitive object type (preterminal variable).]

Figure 24. Web Marker at the Preterminal Stage

The production rules can replace a single node by one or more other nodes comprising a subweb, but the rules cannot delete elements, rearrange elements or expand more than one node at a time. These restrictions ensure that one can trace the derivational history of every web generated by the base rules. That in turn enables one to formulate transformational rules which can bring about changes in the terminal web if and only if the web has a certain derivational history recorded in the preterminal web marker associated with it. Strictly speaking, the transformations that delete, rearrange and expand elements are defined not on terminal webs or on the virtual maps which are produced, but on the web markers, which are more abstract. The task of the transformational component is to clearly define the nature of a virtual map, i.e. to produce a well-formed virtual map which is recognizable as a map. The transformational component consists of a set of transformational rules. Each transformational rule is defined by a structural index, a rule status, and an application-type parameter. A structural index defines the sequence of production rules in a web marker that are needed to generate a map expression of a primitive object type. A rule status parameter defines a transformation in terms of rule complexity. An application-type parameter defines a transformation in terms of singular or multiple applications. These parameters operate in concert on a terminal web to organize map elements for output as a surface structure representation, an actual map.

A crucial distinction between deep structure and surface structure is that deep structure specifies the structure of a cartographic image or a data base in such a way as to bring out underlying syntactic relationships (associations among conceptual elements), even though this may result in an abstract representation of those elements that is far removed from the final surface structure of the map. Much of the potential usefulness and flexibility of a data base, and much of what a map percipient can potentially understand about a map, cannot be represented in terms of surface structure but depends on an awareness of the underlying relations expressed in deep structure.

The set $R_T$ is a set of transformational rules which transforms the deep structure inherent in the derivational history of a web into a surface structure called a cartographic display. A rule $R_{Ti}$ is a quadruple:

$$R_{Ti} = (W, S_{ij}, RS_i, AT_i)$$

where i is a rule ID, W is a terminal web, $S_{ij}$ is a set of structural indexes, $RS_i$ is a rule status and $AT_i$ is an application type. Each member of the set of structural indexes $S_{ij}$ is composed of a stack of values indicating the base rules utilized to generate a preterminal variable and its associated lexical entries (terminal objects). Thus, each structural index specifically identifies the structural history of a terminal object, the terminal objects being appended to the top of the stack.
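Lexical insertion and the structural index can be sketched together. In this hypothetical Python illustration, each preterminal node carries its label $Q_i$ and the list of production rules that derived it; a lexical entry $(P_{ij}, A_{ij})$ is attached when the type part of $P_{ij}$ (e.g. 'Po' in 'Po.1') matches $Q_i$, and its structural index is the rule list with the terminal object pushed on top. Matching by label prefix and the rule numbers in the usage are assumptions made for the example; the entry labels only follow the style of Figure 19.

```python
# Hypothetical sketch of lexical insertion with R_Li = (Q_i, P_ij, A_ij).
# A preterminal node carries its label Q_i and the production rules that
# derived it; a lexical entry is (P_ij, A_ij) with labels such as "Po.1".


def lexical_insert(preterminal, lexicon):
    """Return (terminal, indexes).

    terminal: node -> list of (P_ij, A_ij) that replaced the node's Q_i
    indexes:  (node, P_ij) -> structural index, i.e. the stack of rule
              numbers with the terminal object pushed on top
    """
    terminal, indexes = {}, {}
    for node, (q, rules) in preterminal.items():
        terminal[node] = []
        for p, attrs in lexicon:
            if p.split(".")[0] == q:               # P_ij matched against Q_i
                terminal[node].append((p, attrs))  # A_ij replaces Q_i
                indexes[(node, p)] = [*rules, p]   # terminal object on top
    return terminal, indexes


# Illustrative data only: the rule numbers are invented for the example.
preterminal = {
    7: ("Po", [1, 2, 4, 8]),
    9: ("Li", [1, 2, 5, 10]),
}
lexicon = [
    ("Po.1", {"chains": ["Chn.1", "Chn.2"], "id": "tract 1"}),
    ("Li.1", {"chains": ["Chn.5", "Chn.6"]}),
    ("Nd.1", {"pt": (0.0, 0.0), "label": None}),
]
terminal, indexes = lexical_insert(preterminal, lexicon)
```

In the usage, Nd.1 stays unassigned because no preterminal node carries the label Nd, and the stack stored for Po.1 reads, from bottom to top, the derivation that led to it.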

The rule status $RS_i$ signifies that $R_{Ti}$ is either elementary or compound. An elementary transformation is one which involves a single rule, whereas a compound transformation is one which involves a group of rules or a complex rule. Elementary or compound transformations may be singular or general.

Accordingly, the application type $AT_i$ signifies whether the rule is applied to a single object or to multiple objects. A singular transformation involves a single terminal element, whereas a general transformation involves one or more preterminal variables, and hence multiple elements. A general transformation requires recursive application of a rule, once for each of the terminal objects involved.

For transformational rules, an RS value of '1' indicates that an elementary, or single, rule is being applied, and an RS value of '2' indicates that a compound rule is applied. An AT value of '1' indicates that a single terminal object is of concern, whereas an AT value of '2' indicates that multiple objects identified by the preterminal variable are of concern. The initial task of the transformational component is to define location, size, orientation, and shape for the surface structure of the virtual map. A structural index of each transformational rule identifies appropriate terminal elements for attribute definition.

That definition can be specified by the graphical, spatial and phenomenological attribute values as they exist in the data base, or it can be a transformation of these data. Thus, the rules are of two slightly different transformational types. The former involves attributes and transforms data from the virtual domain of the data base to the virtual domain of the CRT screen, while the latter is of the type represented by those in transformational geometry. In either case, a transformational rule is required to output the surface structure in the form of a cartographic display (see Figure 25).

The main focus of the grammar presented in this chapter is on the formalization of information structures for virtual maps. These information structures describe the major structural relations of all cartographic objects that compose virtual maps, and are not constrained to the structural aspects of objects in the geographic domain. In fact, the structural aspects of objects in a geographic domain may be considered a substructure of the overall information structure.
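To make the rule quadruple $R_{T_i} = (W, S_i, RS_i, AT_i)$ more concrete, the following Python sketch encodes the rule status and application-type codes ('1' elementary or singular, '2' compound or general) and applies a general rule recursively, once per terminal object. It is a hypothetical illustration, not the CART-QUERY implementation: the terminal web is assumed to be a dictionary keyed by structural index (a tuple of base-rule names), the `steps` callables are assumed stand-ins for attribute definition, and the web $W$ is passed when the rule is applied rather than stored in the rule.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, float]                       # attributes of one terminal object
Web = Dict[Tuple[str, ...], Dict[str, Attrs]]  # preterminal path -> terminal objects


class RuleStatus(IntEnum):
    ELEMENTARY = 1  # RS = '1': a single rule
    COMPOUND = 2    # RS = '2': a group of rules or a complex rule


class ApplicationType(IntEnum):
    SINGULAR = 1    # AT = '1': one terminal object
    GENERAL = 2     # AT = '2': every object under a preterminal variable


@dataclass(frozen=True)
class TransformationalRule:
    rule_id: int
    index: Tuple[str, ...]  # structural index; a singular rule ends with the terminal object
    status: RuleStatus
    app_type: ApplicationType
    steps: Tuple[Callable[[Attrs], Attrs], ...]  # hypothetical attribute transformations

    def _transform(self, attrs: Attrs) -> Attrs:
        if self.status is RuleStatus.ELEMENTARY and len(self.steps) != 1:
            raise ValueError("an elementary rule must consist of exactly one step")
        for step in self.steps:
            attrs = step(attrs)
        return attrs

    def apply(self, web: Web) -> None:
        if self.app_type is ApplicationType.SINGULAR:
            *path, name = self.index
            objects = web[tuple(path)]
            objects[name] = self._transform(objects[name])
        else:
            objects = web[self.index]
            self._apply_each(objects, list(objects))

    def _apply_each(self, objects: Dict[str, Attrs], names: List[str]) -> None:
        # General transformation: recursive application, once per terminal object.
        if not names:
            return
        objects[names[0]] = self._transform(objects[names[0]])
        self._apply_each(objects, names[1:])


def shift_x(a: Attrs) -> Attrs:
    return {**a, "x": a["x"] + 10.0}


def double(a: Attrs) -> Attrs:
    return {k: v * 2 for k, v in a.items()}


web: Web = {
    ("MAP", "MESSAGE THEME", "TITLE"): {"t1": {"x": 1.0, "y": 2.0}},
    ("MAP", "REFERENCE BASE", "NODE"): {"n1": {"x": 0.0, "y": 0.0},
                                        "n2": {"x": 1.0, "y": 1.0}},
}
# Elementary, singular rule: one step applied to one terminal object.
TransformationalRule(1, ("MAP", "MESSAGE THEME", "TITLE", "t1"),
                     RuleStatus.ELEMENTARY, ApplicationType.SINGULAR, (shift_x,)).apply(web)
# Compound, general rule: two steps applied to every node under the preterminal variable.
TransformationalRule(2, ("MAP", "REFERENCE BASE", "NODE"),
                     RuleStatus.COMPOUND, ApplicationType.GENERAL, (double, shift_x)).apply(web)
```

The recursion in `_apply_each` mirrors the statement that a general transformation is applied once for each terminal object; an iterative loop would behave identically.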

An implementation of information structuring requires the use of a cartographic/geographic information system. The potential use of virtual maps for geographic problem solving depends on a robust overall system design. System design includes both data base design and algorithmic design of data manipulation software. The design process of a data base involves consideration of the levels of data organization. The design process of data manipulation software involves an examination of structural processes that involve interfacing with geographic and cartographic data bases in general. Those notions of system design are considered in Chapter 4 in terms of a geographic/cartographic information system for query processing.

Figure 25. Surface Structure Representation: A Map (panel text: "POPULATION DENSITY", "SOME CITY", "SCALE")

CHAPTER 4

Cartographic Query Processing in a Geographic Information System

4.1 Introduction

A geographic information system (GIS) is an integrated set of computer software for manipulating both spatial and non-spatial data. The process of data manipulation is divided into two major categories: data base management (DBM) and analytical processing. DBM involves the input, storage, retrieval, and output of data. Data management procedures do not alter the form of the data before storage, retrieval and display. Those procedures are the means of shuttling data from an external medium to internal main memory storage, and vice versa, for analytical processing and/or virtual map display. Analytical processing involves computations which result in a summary or synthesis of data, e.g. simple statistical computations such as mean values and/or complex computations such as mathematical modeling. Analytical procedures operate on data in an attempt to produce a basis for information. Virtual map displays rely on a combination of data management and analytical processing procedures because data are retrieved and then transformed into a graphic display that summarizes the numerical form of the data.

Although DBM procedures are individually less complex than analytical processing procedures, DBM is an extremely important process in a GIS. The complexity and effectiveness of analytical processing depend on a well-managed data base because DBM routines are used to retrieve the necessary data for processing.

A recent advance in information processing systems concerns the ability to accomplish sophisticated analysis within a highly user-oriented communication mode. These highly user-oriented systems, called general planning systems or query systems (Bonczek, Holsapple and Whinston 1977a), are directed toward enhancing the decision process in planning. Query processing is a flexible and comprehensive method of communicating and processing user requests for data base management and analytical processing. Requests take the form of English-like phrases which are convenient for a non-programming user of an information system to use and understand. One of the foremost advantages of query systems is their nonprocedural orientation. That is, a user is not completely channeled by a sequence in program logic when communicating with the system.

Cartographic query processing is more general than common query processing in the sense that the former utilizes not only a textual mode of communication but also a graphical mode of communication for investigating spatial relationships. Since a map is an optimal channel for spatial communication, a virtual map can act as a valuable addition to an interactive, heuristic examination of spatial relationships. Thus, a virtual map in combination with a high-level, user-oriented command language can be used extensively to facilitate analytical and display procedures in a geographical problem-solving context.

Currently, three major alternatives exist for incorporating cartographic displays into the problem-solving process. These are: 1) produce line printer maps portraying spatial patterns for off-line examination, e.g. SYMAP; 2) provide interactive cartographic displays for on-line examination of spatial patterns, e.g. GIMMS; and 3) support an on-line query system which can be used for both on-line interactive analysis and display (see Figure 26). Cartographic query processing which utilizes textual and graphical modes of interaction increases the potential for spatial analysis because it not only combines modeling with interactive cartographic displays but also provides for graphic queries to support analyses.

4.2 CART-QUERY, A Prototype System for Cartographic Query Processing

This system for cartographic query processing consists of four major components: 1) A cartographic query language (CARTQUEL) is a high-level command language of statement-like expressions used for communicating with the system in a 'natural English-like manner'. 2) A query language decoder (QUED) accepts the command expression and translates it into a decoded deep structure expression (DSE) using a phrase structure grammar and then applying inverse transformational rules. 3) A query processor (QUEP) uses the DSE of the query to specify the manipulation to be undertaken utilizing a data manipulation language (DML). 4) An information base (INBASE) comprises conceptual information, a logical data structure, and empirical data. These components together facilitate the process of human-machine interaction (see Figure 27).

Logical flow of processing in the system begins when the user specifies a query as a surface structure expression consisting of a command, a clause and/or parameters as part of the CARTQUEL. The surface structure expression (SSE) is translated into a deep structure expression (DSE) (Bonczek, Holsapple and Whinston 1977b) by the QUED; the expression is then either recognized as allowable, or the user is signaled that an incorrect term has been specified. After message recognition, processing takes place in the QUEP using the data and conceptual information in the INBASE. An INBASE is an extended notion of a data base: it contains conceptual information in the form of relationships, in addition to empirical data stored as instances of attributes.

Figure 26. Alternatives When Utilizing Cartographic Displays in a Problem Solving Environment (paths from PROBLEM to ANALYSIS: interaction via maps in batch processing, interaction via virtual maps, and interaction via virtual maps in a query system; legend: on-line interaction, off-line interaction)

Figure 27. Major Components in a Cartographic Query System (User, CARTQUEL query language, QUED query decoder, QUEP query processor, INBASE information base, analysis/display of results)

The CART-QUERY system consists of thirty-eight subroutines and a main program (totaling 3,250 lines of program code), not including the subroutines residing in the graphics subroutine libraries. Thirty-four of the subroutines are written in FORTRAN IV; eight of these (627 lines of code) were extracted from the LANG-PAK Interactive Design System (Heindel and Roberto 1975). The remaining four subroutines are written in IBM 370 Assembler. The system is programmed to run on the AMDAHL 470/V6 under the Time Sharing Option using Tektronix 4012 and 4014 terminals. It should be added that the system is not a full-scale GIS with a cartographic query processing component.

Rather, the emphasis in this implementation is on cartographic query processing, developed to demonstrate a flexible, interactive system that incorporates the theoretical notions described in Chapter 3. A description of the system is presented in the following sections of this chapter. Since the major focus of this research concerns information and data structuring, the INBASE component is described first. The QUEP component, which governs how data are manipulated in the system, is described next. That is followed by a discussion of CARTQUEL, the language used for communicating requests for data manipulation to the system. Finally, the QUED component, which decodes requests, is discussed.
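The logical flow described above (a surface structure expression decoded by the QUED into a deep structure expression, which is either accepted or rejected with a message naming the incorrect term, and then handed to the QUEP) can be sketched as follows. This is a simplified, hypothetical illustration: the actual QUED uses a phrase structure grammar and inverse transformational rules, whereas here the DSE is only a (command, clause, parameters) record, the small command/clause table is assembled from statements quoted in this chapter, and Python's standard `shlex` module is used to keep apostrophe-delimited literals such as 'SANPHURI, THAILAND' intact.

```python
import shlex
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical command/clause table built from statements quoted in this chapter.
GRAMMAR: Dict[str, set] = {
    "DESIGN": {"DOMAIN", "LEGEND", "TITLE"},
    "DISPLAY": {"MASTER"},
    "PERFORM": {"CLUSTER"},
}


class QueryError(ValueError):
    """Signals the user that an incorrect term has been specified."""


@dataclass
class DeepStructure:
    command: str
    clause: str
    parameters: List[str] = field(default_factory=list)


def decode(surface: str) -> DeepStructure:
    """QUED stand-in: translate a surface structure expression into a DSE."""
    words = shlex.split(surface)  # keeps 'SANPHURI, THAILAND' as one literal
    if not words or words[0] not in GRAMMAR:
        raise QueryError(f"incorrect term: {words[0] if words else '<empty>'}")
    if len(words) < 2 or words[1] not in GRAMMAR[words[0]]:
        raise QueryError(f"incorrect term: {words[1] if len(words) > 1 else '<missing clause>'}")
    return DeepStructure(words[0], words[1], words[2:])


Handler = Callable[..., str]


def process(dse: DeepStructure, handlers: Dict[Tuple[str, str], Handler]) -> str:
    """QUEP stand-in: dispatch the decoded query to a data manipulation routine."""
    return handlers[(dse.command, dse.clause)](*dse.parameters)


# Hypothetical handler table; a real QUEP would call DML routines here.
handlers: Dict[Tuple[str, str], Handler] = {
    ("DESIGN", "TITLE"): lambda text: f"title set to {text}",
}
result = process(decode("DESIGN TITLE 'SANPHURI, THAILAND'"), handlers)
```

Separating `decode` from `process` reflects the division of labour between QUED and QUEP: an unrecognized term is rejected before any data are touched.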

4.2.1 Information Base (INBASE)

A definition of data is given in Chapter 1 as factual observations about a particular topic. Information is defined as the interpretation of an abstraction of data. As is mentioned in Chapter 2, the realm of pragmatics in semiotics is interpretation. Interpretation is predicated on structure and meaning, that is, syntactics and semantics, respectively. Thus, at the basis of the interpretation of information is a multidimensional structural meaning of information (Nauta 1972). An information structure was defined in Chapter 1 as a logical organization of information about a topic or phenomenon. Information is recoverable from an INBASE through a simple retrieval process or is derivable through analytical processing. Since an INBASE is an extended notion of a data base, an INBASE contains both information and data. Information is considered to be (a) knowledge as abstractions of data, e.g. entity set names, and (b) models which are used to process data. Data are considered to be instances of the graphical, spatial, and phenomenological attributes of the entity sets.

Logical design of a cartographic INBASE is a direct consequence of the design of its cartographic information structure. In the CART-QUERY system two general types of information structures can be identified: a geographic information structure and a cartographic information structure. The geographic information structure consists of three entity sets: province, tambon and health facility. Those entity set names describe groups of data from a larger data set on family planning in Thailand. The entity sets province and tambon represent administrative districts, whereas a health facility represents a place where family planning and health services are offered. A cartographic information structure includes the cartographic representation of geographic entities in addition to cartographic objects such as titles and legends.
The INBASE in CART-QUERY is the combination of both information structures. Since the emphasis in this implementation is on a cartographic information structure, the geographic information structure has been limited to a rather simple structure. However, the limited number of geographic entity sets still allows a characterization of cartographic information structures as presented in Chapter 3.

Cartographic and geographic information structures may range from a local to a global extent. Labeling an information structure as either local or global is an attempt to distinguish the degree to which the information structure represents the totality of an INBASE. Such a label does not distinguish the areal extent covered by the data base, i.e. small-scale or large-scale mapping, although areal extent may be one of the components pertinent to a particular user or application. Global information structures involve a majority (if not the totality) of the INBASE, i.e. the information of the entire organization supporting the INBASE. In the CART-QUERY implementation, local and global information structures are distinguished only by calling the two applications local information structures and their combination a global information structure.

As is explained in Section 3.4, deep structure in cartographic information can be depicted by a cartographic information structure diagram and/or by a web marker; both diagrams serve a useful purpose in their visual depiction of multidimensional ideas (see Figures 13 and 24, respectively). However, a more suitable diagram for displaying the logical design of an INBASE is an INBASE diagram (see Figure 28). An INBASE diagram displays two basic types of relationship sets, also called information sets (INSETs): a) deep structural relationship sets, also called vertical information sets (VINSETs), and b) graphical, spatial and phenomenological relationship sets, also called horizontal information sets (HINSETs).

As explained in Section 3.5, graphical, spatial and/or phenomenological relationships can take part in defining a deep structural relationship. However, to clarify the nature of compound and primitive information, deep structural relationship sets are distinguished from graphical, spatial and phenomenological relationship sets. Thus, a deep structural relationship is a superstructure relationship because of its special role in generating a named compound object from primitive objects.

A deep structural relationship as a VINSET represents a relationship between one conceptual level and another in the cartographic information structure. The VINSETs are depicted in Figure 28 as nested boxes. The nesting of boxes follows the subweb production rules in the linguistic model presented in Figure 20. The largest box, labeled 'MAP', is an initial web that represents a type III virtual map, hence an INBASE. The box for 'MAP' encloses boxes for 'REFERENCE BASE' and 'MESSAGE THEME' plus boxes for 'COORDINATE DEFINITION' and 'THEMATIC IDENTIFICATION'. The enclosure indicates that the initial web 'MAP' is rewritten as the subwebs 'REFERENCE BASE' and 'MESSAGE THEME'. The boxes for 'COORDINATE DEFINITION' and 'THEMATIC IDENTIFICATION' are necessary for the description of both 'REFERENCE BASE' and 'MESSAGE THEME' and are shared indirectly by them. Furthermore, 'COORDINATE DEFINITION' and 'THEMATIC IDENTIFICATION' are not high-level conceptual entities like 'REFERENCE BASE' and 'MESSAGE THEME', but are instead nested conceptually at the lowest level of the compound object types 'REFERENCE BASE' and 'MESSAGE THEME'. The nesting of boxes shown in the INBASE diagram is an attempt to depict syntactic relationships among elements of cartographic information and thus to depict a global information structure. The global information structure depicted in Figure 28 can be called the preterminal web marker of the INBASE. However, as mentioned above, it is more descriptive to label it an INBASE diagram in this context.

Figure 28. Expanded INBASE Diagram (nested boxes for MAP, REFERENCE BASE, MESSAGE THEME and THEMATIC IDENTIFICATION, with the THEMATIC DATA BASE SCHEMA)

A graphical, spatial and/or phenomenological relationship as a HINSET represents a relationship between two object types when those types are on the same conceptual level and do not create a compound object type. The HINSETs are depicted in Figure 28 as dotted-line arrows. The arrows indicate that the relationship is directional: the tail of the arrow emanates from the member of the HINSET whose attribute is defined by the member at the head of the arrow. HINSETs represent the relationships created by application of an embedding function E, which is part of a production rule as defined in Figure 17. Those relationships are more clearly defined at the level of canonical structuring because attributes are specified at that level.

A subset of the INBASE diagram of Figure 28 has been operationalized in the CART-QUERY system (see Figure 29). The thematic data base has been incorporated in the diagram to show functional dependencies between data objects. Some INSETs are derived through analytical processing and others are part of the stored data. The purpose of the information structure is to contain all of these, signifying the existence or possible existence of the information. Information structures composed of INSETs are translated into a canonical structure through elimination of redundant entity sets and redundant INSETs. A canonical structure is defined in Section 2.4.1 as a minimal structure needed for representing all entity sets plus their attributes and all relationships in all information structures for a particular organization. In the example presented here, the canonical structure is the same as the global information structure (see Figure 29) with the addition of attributes for the entity sets (see Figure 30).

Figure 29. INBASE Global Information Structure for Type III Virtual Map (VIRTUAL MAP comprising the REFERENCE BASE, with reference lines, geographic base, map boundary and the graphic objects NODE, CHAIN and POLYGON; the COORDINATE DEFINITION, with POINTS and STRINGS; the THEMATIC DATA BASE, with HEALTH FACILITY, TAMBON and PROVINCE; and the MESSAGE THEME, with TITLE, SYMBOLIZATION, LEGEND and DATA DOMAIN)

Figure 30. Attributes Included in Canonical Structure
NODE: ID, Pointer
CHAIN: ID, Low Node, High Node, String Pointer
POLYGON: ID, Node Pointer List, Chain Pointer List
MAP BOUNDARY: XMIN, YMIN; XMAX, YMAX
POINTS: ID; X,Y
POINT STRINGS: ID; X,Y; X,Y; ...; X,Y
TITLE: Text Rule; X,Y; Literal
PROVINCE: ID, Graphic Pointer
TAMBON: ID, Graphic Pointer, Population Served
HEALTH FACILITY: ID, Graphic Pointer, Staff, Family Planning Acceptors, Outpatients, Service Level

The canonical structure is translated into a network data structure of vector format. A network data structure of vector format utilizes a logical pointer structure for operationalizing INSET relationships. When operationalized, an INSET consists of an owner and a member. The owner in an INSET is the entity set name or attribute from which the pointer emanates. The member is the entity set name or attribute to which the pointer is pointing. The pointer is an identifier and, in a sense, becomes a pseudo-representation of the member in the entity set of the owner. The pseudo-representation of the member is actually the logical address of its record in the appropriate subfile. Thus, an owner record has a characteristic which is defined in terms of the member record. Pointer structures eliminate data redundancy in the operationalization of INSETs. A data structure diagram graphically depicts the owner-member relationships and indicates the graphical pointer attributes employed in the CART-QUERY system (see Figure 31).

Logical record construction is also part of the logical data structure design. Logical records are the way in which single entities are referenced, that is, as a grouping of attributes for each entity. The logical records for the CART-QUERY system are designed in the same fashion as given in Figure 30 for the canonical structure.

A data structure is translated into a storage structure through 1) a specification of physical record construction, 2) a method of addressing for symbolic pointers, and 3) a type of indexing for entities in the INBASE. Physical records are aggregations of logical records. The operating system of the computer stores and accesses physical records rather than logical records for efficiency. Data exist in four large files: an entity set file, a graphic object file, a string file, and a work file. Each file is mounted on a separate direct-access disk storage unit at the time of system use. Operationalization of the symbolic pointers is by base and index relative addressing of records on direct-access disk storage volumes.
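The owner-member pointer scheme and the base-plus-index relative addressing just described can be illustrated with a small in-memory model. The sketch is hypothetical: a direct-access volume is modelled as a Python list of record slots, each subfile is stored as a header (base) record followed by its entity records, and a symbolic pointer is the pair (base address, index). An owner record (here a TAMBON) holds a pointer to its member record (a POLYGON) instead of a copy of it, which is how pointer structures avoid redundancy; the field names and values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Pointer = Tuple[int, int]  # (relative address of base/header record, index within subfile)


@dataclass
class Volume:
    """A direct-access volume modelled as a flat list of record slots."""
    slots: List[Dict[str, Any]] = field(default_factory=list)

    def add_subfile(self, name: str, records: List[Dict[str, Any]]) -> int:
        """Store a header (base) record followed by the entity records; return the base address."""
        base = len(self.slots)
        self.slots.append({"name": name, "count": len(records)})
        self.slots.extend(records)
        return base

    def fetch(self, pointer: Pointer) -> Dict[str, Any]:
        """Resolve a symbolic pointer by base-plus-index relative addressing."""
        base, index = pointer
        header = self.slots[base]
        if not 1 <= index <= header["count"]:
            raise IndexError(f"record {index} not in subfile {header['name']}")
        return self.slots[base + index]


# Two entity set subfiles sharing the same volume.
volume = Volume()
polygon_base = volume.add_subfile("POLYGON", [{"id": 1}, {"id": 2}])
# The owner record stores a pointer to its member record, not a copy of it.
tambon_base = volume.add_subfile(
    "TAMBON", [{"id": 10, "graphic_pointer": (polygon_base, 2), "population_served": 500}]
)
tambon = volume.fetch((tambon_base, 1))
polygon = volume.fetch(tambon["graphic_pointer"])
```

Because pointers are relative to each subfile's header, several entity set subfiles can live on one volume without their records colliding.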

Thus, different entity set subfiles can reside on the same direct-access volume. Entity sets are retrieved by specifying a relative address for the base record, i.e. the header record for the entity set, and then specifying an index number for the position of the particular record in the subfile.

Figure 31. Data Structure of the INBASE (owner-member pairs linking PROVINCE, TAMBON and HEALTH FACILITY through graphic-type pointers to POLYGON, CHAIN, NODE, STRING-POINTS and POINTS)

A subsidiary part of the INBASE is a dictionary of keywords (see Figure 32). Keywords are terms used for defining entity sets and denoting processing options to the system. The keywords are part of the vocabulary of CARTQUEL, to be explained in Section 4.2.3. Access to the keywords is by sequential search. Consequently, the keywords are listed in descending order of character string length to facilitate the development of a simple algorithm for searching the dictionary. Each dictionary keyword is defined by a string of nine attributes following the character representation of the keyword. The first attribute signifies the number of attributes for each keyword.

In this example each keyword has eight attributes. Those eight attributes uniquely identify each keyword and specify the nature of processing to be undertaken with it. For example, attributes two, three and four represent the type of keyword, an ID within that type, and the attribute status of that keyword, respectively. Attributes seven and eight represent the I/O unit number for an entity set and the base record; a zero indicates that the attribute is not applicable to the keyword.

KEYWORD           ATTRIBUTES 1-9          LENGTH   CHARACTER CODES
HEALTH.FACILITY   8 2 1 0 1 2 2 1 1       15       200 197 193 211 227 200 75 198 193 195 201 211 201
SER.LEVEL         8 2 1 7 5 1 2 1 1       9        226 197 217 75 211 197 229 197 211
PROXSIZE          8 3 3 18 0 2 0 0 0      8        215 217 214 231 226 201 233 197
ASTERISK          8 4 3 22 1 3 0 0 3      8        193 226 227 197 217 201 226 210
PROVINCE          8 2 5 0 2 1 2 200 3     8        215 217 214 229 201 213 195 197
VILL.POP          8 2 1 3 5 2 2 1 1       8        229 201 211 211 75 215 216 215
HF.NODE           8 2 1 2 2 1 8 1 1       7        200 198 65 213 214 196 197
POP.SER           8 2 3 3 2 2 2 100 3     7        215 214 215 75 226 197 217
FP.ACCP           8 2 1 5 5 2 2 1 1       7        198 215 65 193 195 195 215
OUT.PAT           8 2 1 6 6 2 2 1 1       7        216 228 227 75 215 193 227
TAMBON            8 2 3 0 1 3 ? 100 3     6        196 214 212 193 201 213
SQUARE            8 4 3 21 1 3 0 0 3      6        211 197 199 197 213 196
P.POLY            8 2 5 2 2 2 8 451 3     6        215 65 215 214 211 232
T.POLY            8 2 3 2 2 2 8 352 3     6        227 75 215 214 211 232
STAFF             8 2 1 4 5 2 2 1 1       5        226 227 193 198 198
SHADE             8 2 1 4 5 2 2 1 1       5        226 200 193 196 197
ASTER             8 4 3 22 1 3 0 0 3      5        193 226 227 197 217
PROX              8 3 3 6 1 0 0 1 4       4        215 217 214 231
BOX               8 4 3 1 1 3 0 0 3       3        194 214 231
HF                8 2 1 0 1 3 2 1 1       2        200 198

Figure 32. Dictionary of Keywords for CARTQUEL
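The value of ordering the dictionary by descending keyword length shows up where keywords share a prefix (HF, HF.NODE and HEALTH.FACILITY; PROX and PROXSIZE; ASTER and ASTERISK): a sequential scan that tries longer keywords first returns the longest match. The Python sketch below illustrates this with a few entries from Figure 32, keeping only attributes two and three (keyword type and ID within that type); the scanning routine is a hypothetical illustration, not the CART-QUERY code. It also reflects an observation not stated in the text: in many rows the trailing numbers agree with EBCDIC (code page 037) character codes, e.g. ASTER gives 193 226 227 197 217, which Python's standard `cp037` codec reproduces.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, Tuple[int, int]]

# Keyword -> (attribute 2: keyword type, attribute 3: ID within type), from Figure 32.
ENTRIES: Dict[str, Tuple[int, int]] = {
    "HF": (2, 1),
    "PROX": (3, 3),
    "ASTER": (4, 3),
    "HF.NODE": (2, 1),
    "PROVINCE": (2, 5),
    "ASTERISK": (4, 3),
    "PROXSIZE": (3, 3),
    "HEALTH.FACILITY": (2, 1),
}


def build_dictionary(entries: Dict[str, Tuple[int, int]]) -> List[Entry]:
    """List keywords in descending order of character string length."""
    return sorted(entries.items(), key=lambda kv: len(kv[0]), reverse=True)


def scan(statement: str, dictionary: List[Entry]) -> List[str]:
    """Sequentially match keywords; because longer keywords come first, the longest match wins."""
    tokens: List[str] = []
    pos = 0
    while pos < len(statement):
        if statement[pos].isspace():
            pos += 1
            continue
        for keyword, _attrs in dictionary:
            if statement.startswith(keyword, pos):
                tokens.append(keyword)
                pos += len(keyword)
                break
        else:
            raise ValueError(f"unrecognized term at position {pos}")
    return tokens


def ebcdic_codes(keyword: str) -> List[int]:
    """Character codes of a keyword in EBCDIC code page 037."""
    return list(keyword.encode("cp037"))


dictionary = build_dictionary(ENTRIES)
```

With an ascending ordering, the same scan would match ASTER inside ASTERISK and then fail on the remainder "ISK", which is the kind of error the descending order avoids.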

4.2.2 Query Processor (QUEP)

The QUEP module utilizes subroutines named in a data manipulation language (DML) that undertake data manipulation. Data manipulation for processing queries consists of two types: data management and analytical processing. Data management involves data storage and data retrieval, whereas analytical processing involves data transformations from one form of the data into another. Data management differs from analytical processing in that data management procedures do not alter the nature of raw data, while analytical procedures do alter the nature of raw data in generating the basis of information. As mentioned in Section 4.1, virtual maps displayed on a CRT screen are created from a combination of data management and analytical processing, but at a very primitive level in the programming system.

The subroutines developed for the purpose of data management are FDGRPH, GTHEAD, GTRECD and FLCOPY. Subroutine GTHEAD retrieves a header record for an entity subfile. The header record specifies the name of the subfile, the number of entities in the subfile and a boundary rectangle for all entities in the subfile. Subroutine GTRECD retrieves records from a subfile according to the parameters retrieved by subroutine GTHEAD (see Figure 33). A single call to subroutine GTRECD retrieves all entities of the subfile, plus an attribute if such an option has been specified. The routine retrieves the subfile for display and/or analytical processing.
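The division of labour between GTHEAD and GTRECD can be sketched as follows. In this hypothetical Python model (not the original FORTRAN and Assembler routines), a subfile is a header plus a list of records; `gthead` returns the header holding the subfile name, entity count and boundary rectangle, and `gtrecd` uses the header's count to return all entities in a single call, optionally reduced to coordinates plus one requested attribute. The helper `make_subfile`, which derives the boundary rectangle from point records, and the record fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Record = Dict[str, float]


@dataclass(frozen=True)
class Header:
    name: str
    count: int
    bbox: Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


@dataclass(frozen=True)
class Subfile:
    header: Header
    records: List[Record]


def make_subfile(name: str, records: List[Record]) -> Subfile:
    """Build a header whose boundary rectangle encloses all point records."""
    xs = [r["x"] for r in records]
    ys = [r["y"] for r in records]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    return Subfile(Header(name, len(records), bbox), list(records))


def gthead(subfile: Subfile) -> Header:
    """Return the header record of an entity subfile."""
    return subfile.header


def gtrecd(subfile: Subfile, header: Header, attribute: Optional[str] = None) -> List[Record]:
    """Retrieve all entities of the subfile in one call, optionally with one attribute."""
    records = subfile.records[: header.count]
    if attribute is None:
        return [dict(r) for r in records]
    return [{"x": r["x"], "y": r["y"], attribute: r[attribute]} for r in records]


facilities = make_subfile("HEALTH FACILITY", [
    {"x": 2.0, "y": 5.0, "staff": 4.0},
    {"x": 7.0, "y": 1.0, "staff": 9.0},
])
header = gthead(facilities)
rows = gtrecd(facilities, header, attribute="staff")
```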

Subroutine FDGRPH locates the appropriate graphic data type for an entity set that is to be displayed. The appropriate graphic type is sent to subroutine GTRECD, indicating the appropriate type of output. Subroutine FLCOPY copies from main storage to an external medium a subfile of entities that are derived through analytical processing. A header record is created with the appropriate parameters to retrieve the derived file. All derived records are then copied onto the external medium in accordance with the manner of processing.

Figure 33. Flow Diagram of File Retrieval from the INBASE (START → semantically correct query → FDGRPH: find the graphic attribute → GTHEAD: get header → GTRECD: get a file of records → perform analysis or display → RETURN)

The subroutines for analytical processing are grouped into two major categories: hierarchical clustering and graphic display computations. The first group involves spatial clustering using a hierarchical method. Spatial clustering is undertaken using an algorithm developed by Ward (1963) within a context of cartographic generalization and health facility reduction. The Ward algorithm is based on an analysis of variance technique. Group membership is determined through a procedure of minimizing within-group variation while maximizing between-group variation (see Figure 34).

The process of generalization implemented in the CART-QUERY system is a combination of spatial classification and simplification for point-based entities (Robinson, Sale and Morrison 1978). Entities are first classified into proximate locational groups, and then each group is simplified according to a location or size criterion as specified by the system user. The total number of groups which are to remain from the original number of entities is determined by a method employing Töpfer's Radical Law of Selection (Töpfer and Pillewizer 1966).

Figure 34. Flow Diagram of Clustering Algorithm for Analytical Processing (steps shown include: NOBJ = original number of objects - 1; DO 1, NOBJ; compute error; error minimum?; combine group; compute number of groups; centroid of groups; find representative objects; return)

The basic form of the law for small-scale maps is:

$$N_f = N_a \sqrt{\frac{M_a}{M_f}}$$

where $N_f$ = number of features to be included on the derived map, $N_a$ = number of features on the original map, $M_f$ = scale (denominator) of the derived map, and $M_a$ = scale (denominator) of the original map. The user chooses the derived scale; the original scale and number of objects are read from the subfile that is processed.

The classification process resulting in a derived number of features is accomplished in a hierarchical manner. All feature objects or entities of a subfile start out as members of their own group. In successive iterations, objects are grouped by combining the membership of groups that are closer to each other, in Euclidean distance, than to other groups. Consequently, objects that are close based on location are clustered first. If the option for simplification by proximity is chosen by a user, then the object that is closest to the group centroid is chosen as the representative object of that group. If the proximity and size options for simplification are chosen, then the object of largest size in the clustered group is chosen as the representative object of the group.

An additional application of the algorithm was developed to show the elimination of health facilities due to budget constraints, assuming that a certain number could be identified. The user chooses the number of health facilities to be eliminated. The algorithm eliminates health facilities based on spatial competition; that is, those facilities located closest to each other are grouped first. The generalization and health facility applications are implemented as a matter of convenience, and no theoretical justification of the results is presented.
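The generalization procedure described above (Töpfer's radical law to fix the number of groups, Ward's hierarchical clustering to form them, then one representative object per group) can be sketched in Python. This is a hedged illustration rather than the CART-QUERY code: it uses the standard Ward merge criterion, merging at each step the pair of groups whose union increases the within-group sum of squares least, computed as $\frac{n_a n_b}{n_a + n_b}\lVert c_a - c_b \rVert^2$ for group sizes $n$ and centroids $c$; scales are passed as denominators; and the coordinates and sizes in the example are invented.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def radical_law(n_original: int, scale_original: float, scale_derived: float) -> int:
    """Töpfer's radical law N_f = N_a * sqrt(M_a / M_f), with M given as scale denominators."""
    return max(1, round(n_original * math.sqrt(scale_original / scale_derived)))


def centroid(points: Sequence[Point], group: List[int]) -> Point:
    return (sum(points[i][0] for i in group) / len(group),
            sum(points[i][1] for i in group) / len(group))


def ward_cluster(points: Sequence[Point], n_groups: int) -> List[List[int]]:
    """Start with one group per object and merge until n_groups remain (Ward's criterion)."""
    n_groups = max(1, n_groups)
    groups = [[i] for i in range(len(points))]
    while len(groups) > n_groups:
        best: Optional[Tuple[float, int, int]] = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                (xa, ya), (xb, yb) = centroid(points, groups[a]), centroid(points, groups[b])
                na, nb = len(groups[a]), len(groups[b])
                # Increase in within-group sum of squares if groups a and b are merged.
                cost = na * nb / (na + nb) * ((xa - xb) ** 2 + (ya - yb) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]  # b > a, so index a is unaffected
    return groups


def representatives(points: Sequence[Point], groups: List[List[int]],
                    sizes: Optional[Sequence[float]] = None) -> List[int]:
    """Proximity option: object nearest the centroid; proximity and size: largest object."""
    chosen = []
    for g in groups:
        if sizes is None:
            cx, cy = centroid(points, g)
            chosen.append(min(g, key=lambda i: (points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2))
        else:
            chosen.append(max(g, key=lambda i: sizes[i]))
    return sorted(chosen)


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
groups = ward_cluster(points, 2)
```

For the health facility application, the number of groups would instead be the original count minus the number of facilities the user chooses to eliminate, e.g. `ward_cluster(points, len(points) - k)`. The brute-force pair search is cubic in the number of objects, which is acceptable for the small point sets considered here.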

The intention of such applications is to depict the flexibility of query processing systems, not to depict the particular topic. The second category of analytical routines concerns computations for the display of maps. Subroutine CHGSCL changes the scale of the cartographic objects depending upon a choice of the display area. Subroutine CHGSCL computes the scaling of the graphic work area in user coordinates when a display area for the domain and/or the legend is defined using the graphic cursor. Those definitions can be made anywhere inside the graphic work area. The coordinates of the domain or legend are scaled appropriately to fit the window that is defined. Display of the domain is based on the enclosing rectangle defined by the coordinates in the header record of each of the subfiles.
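The scaling performed by CHGSCL amounts to a window-to-viewport mapping from the enclosing rectangle in a subfile's header to the display window picked with the graphic cursor. The function below is a hypothetical Python sketch of that mapping. Whether CHGSCL preserved the aspect ratio is not stated, so a uniform scale (which avoids distorting the map) is offered as an assumed option, anchored at the window's lower-left corner.

```python
from typing import Callable, Tuple

Rect = Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


def fit_to_window(bbox: Rect, window: Rect,
                  keep_aspect: bool = True) -> Callable[[float, float], Tuple[float, float]]:
    """Return a function mapping data coordinates inside bbox into the display window."""
    xmin, ymin, xmax, ymax = bbox
    wx0, wy0, wx1, wy1 = window
    sx = (wx1 - wx0) / (xmax - xmin)
    sy = (wy1 - wy0) / (ymax - ymin)
    if keep_aspect:
        sx = sy = min(sx, sy)  # uniform scale so shapes are not distorted

    def transform(x: float, y: float) -> Tuple[float, float]:
        return (wx0 + (x - xmin) * sx, wy0 + (y - ymin) * sy)

    return transform


# Header rectangle of a subfile mapped into a window chosen with the cursor.
to_screen = fit_to_window((0.0, 0.0, 100.0, 100.0), (10.0, 10.0, 410.0, 210.0))
```

With `keep_aspect=False` the domain is stretched to fill the whole window, which matches the text's "scaled appropriately to fit the window" more literally but distorts shapes when the two rectangles differ in proportion.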

4.2.3 Cartographic Query Language (CARTQUEL)

The communication interface in CART-QUERY is a mixture of procedural and nonprocedural interaction. Communication with the system is initiated with a nonprocedural language through which certain procedural channels are opened. However, during procedural communication the user is required to interact at only one level below the nonprocedural level. The CARTQUEL contains statements which indicate requests for analysis, design and display. Statements for analysis indicate to the system that computational work is to be undertaken, i.e. in this example, generalization or health facility clustering. Statements for design include locating a title, defining a geographic domain and constructing choropleth class intervals. Statements for display indicate that an entity subfile is to appear as the geographic domain of the virtual map on the CRT screen. The nonprocedural language called CARTQUEL is of the general form:

Figure 35. System Response Documenting Analytical Processing

whether or not the derived file is to be saved as part of working storage for later display or purged from the system. When the derived file is saved in working storage, a temporary file number is issued that must be noted by the user in order to retrieve the file for display.

The statement: PERFORM CLUSTER PROXSIZE HF SER.LEVEL is basically the same operation as described with the SIMPLIFY option, except that the user is prompted for the number of entities to be deleted from the original file. Entities are grouped together, and only one member of each group is chosen as the representative entity of the group; the other members are eliminated.

The process of designing a virtual map on the CRT screen is a method of elucidating the link between the surface structure of a Type I virtual map and the deep structure of a Type III virtual map in the INBASE. The link can be characterized as the stage at which a terminal web is transformed into a virtual map on the CRT screen, i.e. the interface between virtual maps of Type III and Type I, respectively. A terminal web is defined in Section 3.5 as an information structure with abstract lexical entries assigned but not yet fully defined. Such a skeletal map is created in CART-QUERY through the design of the geographic domain, the title and the legend to appear on the map (see, for example, Figure 36). Those are the basic components that visually depict the deep structure of a cartographic image.
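The PERFORM CLUSTER option described above reduces a file by grouping entities and keeping one representative per group. The sketch below illustrates that idea with a simple greedy nearest-pair rule; it is not the clustering algorithm used in CART-QUERY. The function name, the (x, y, service level) record layout and the rule that the site with the higher service level survives are all assumptions made for illustration.

```python
import math


def cluster_reduce(sites: dict[str, tuple[float, float, int]],
                   n_delete: int) -> dict[str, tuple[float, float, int]]:
    """Delete up to n_delete sites by repeatedly grouping the closest pair.

    sites maps a site name to (x, y, service_level). Each step groups the two
    closest remaining sites and keeps only one of them, the one with the
    higher service level, as the representative of the group.
    """
    remaining = dict(sites)
    for _ in range(n_delete):
        if len(remaining) < 2:
            break
        names = sorted(remaining)
        best = None  # (distance, name_a, name_b)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                xa, ya = remaining[a][:2]
                xb, yb = remaining[b][:2]
                d = math.hypot(xa - xb, ya - yb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Ties in service level keep the alphabetically first site.
        eliminated = b if remaining[a][2] >= remaining[b][2] else a
        del remaining[eliminated]
    return remaining


# Hypothetical facilities: name -> (x, y, service level).
sites = {"A": (0, 0, 3), "B": (1, 0, 1), "C": (10, 10, 2), "D": (10, 12, 2)}
print(sorted(cluster_reduce(sites, 2)))  # ['A', 'C']
```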

Figure 36. Skeletal Map (title 'SANPHURI, THAILAND'; legend classes: below 10, 10-29, 30-49, 50-79, above 80)
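The five legend classes in the skeletal map amount to a simple choropleth classification. A minimal sketch of assigning an attribute value to a shading class follows. It assumes that a value equal to a class limit starts the next class (so 80 falls in the top class); that reading of the legend, and the names `shade_class`, `BREAKS` and `LABELS`, are assumptions.

```python
from bisect import bisect_right

# Lower limits of the second to fifth legend classes; anything below 10
# falls in the first class (boundary handling is an assumption).
BREAKS = [10, 30, 50, 80]
LABELS = ["BELOW 10", "10-29", "30-49", "50-79", "ABOVE 80"]


def shade_class(value: float) -> int:
    """Return the index (0-4) of the legend class that a value falls in."""
    return bisect_right(BREAKS, value)


# Hypothetical tambon values for an attribute such as POP.SER.
for tambon, value in {"T1": 4, "T2": 30, "T3": 80}.items():
    print(tambon, LABELS[shade_class(value)])
```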

The statement: DESIGN DOMAIN requests the system to provide display crosshairs on the screen. The crosshairs are used to define the window inside of which the domain will be plotted. The province outline can then be plotted to give some idea of domain definition (see Figure 36).

The statement: DESIGN LEGEND BOX KEY indicates that a legend of type 'box' will be plotted and that the literal 'KEY' will be written at the top of the boxes. This command also requests the system to prompt the user for the placement of the legend and the labeling of classification intervals in the legend (see Figure 36). The number of class intervals is already set to five.

The statement: DESIGN TITLE 'SANPHURI, THAILAND' indicates that a title is to be displayed somewhere in the graphic work area. The literal string that will be written is specified in apostrophes. The user is then prompted to input the location of the starting point of the literal string plus the height of its letters.

The statement: DISPLAY MASTER TAMBON POP.SER SHADE requests the system to produce a cartographic display of the tambon entity in the master data base file using the attribute POP.SER, population serviced, and to shade the polygons, assuming that a shading scheme had not previously been set up through the design of a legend (see Figure 37).

The statement: DISPLAY WORK 1 SQUARE requests the system to display subfile 1 from working storage and to symbolize the entities as squares. The user must remember the file number that was provided when the file was originally created. The domain window used for display is the one that is current when the command is issued. A title, descriptive text and the tambon outlines can be added to create a map with reference information (see Figure 38).

The statements in CARTQUEL described in this section have been developed with the aid of an interactive language translator called LANG-PAK (Heindel and Roberto 1975). The LANG-PAK system employs a command language to assist in the development of a grammar such as the one used in CARTQUEL. The command language consists of keywords that indicate different types of input to various option states (see Figure 39). The command language is defined in terms of itself so that the LANG-PAK system can be used in an interactive fashion.

The first phase of CARTQUEL development concerns the development of its grammar. A grammar is developed through the use of syntactic and semantic language specification types. A syntactic specification type is a method used for structurally defining the components of a production rule, and takes the form:

^element name^

Figure 37. Choropleth Map Using Tambon Base File (title 'SANPHURI, THAILAND'; legend classes: below 10, 10-29, 30-49, 50-79, above 80)

Figure 38. Health Facility Locations (title 'SANPHURI, THAILAND')

Figure 39. Keyword Commands in the LANG-PAK Command Language (After Heindel and Roberto, 1976)

State Name     Required Input   Abbreviation
Definition     (illegible)
Parse          <...>
Examination    EXAMINE          E
Undefine       UNDEFINE         U
Trace          TRACE            T
No-trace       NOTRACE          N
Completion     FINISHED         F
Save           SAVE             S
Termination    DISCONTINUE      D

A semantic specification type is a convention used to clarify the context for application of a production rule, and takes the form:

'semantic message'

Those language specification types are used as the component parts of production rules defined to the system in one or more interactive sessions. The LANG-PAK system allows one to test the language, and hence the grammar, during stages of development. This makes the system an extremely powerful and flexible tool.

CARTQUEL is based on a context-free, phrase structure grammar (see Figure 40). The grammar is context free because the left side of each production rule consists of exactly one language specification type; thus, no structural transformations are performed on the syntax of a language statement. The rules of the grammar in Figure 40 are not listed in order of application, but in the sequence in which LANG-PAK stores them internally according to their retrieval key.

Some elements in the keyword dictionary are specific to the application demonstrated here, while others are general and can be applied in all cartographic applications. The keywords in the dictionary are terminal vocabulary elements, so they function in the same manner as the terminal vocabulary elements specified in quotation marks in language specification types. Thus, there is no functional difference between the terminal vocabulary elements in the grammar specification and those in the keyword dictionary. A deliberate attempt was made to design a language with a portion of the terminal vocabulary included in the definition of the rules and another portion included as part of the keyword dictionary (see Figures 40 and 32, respectively). The purpose was to test both types of design representation according to their effect on the implementation and flexibility of the language. Terminal vocabulary elements near the left bottom of a phrase marker, i.e. in left-most positions in language statements, are included as part of the language specification types in the grammar. Vocabulary elements near the right bottom of a phrase marker, i.e. in the right-most positions of the language statements, are included in the keyword dictionary. The advantages and disadvantages of this design are discussed in Chapter 5.

Figure 40. Cartographic Query Language (CARTQUEL) Grammar
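The split between grammar-embedded terminals and dictionary terminals can be illustrated with a small recognizer. This is a plain recursive-descent sketch in Python, not LANG-PAK's parse machine, and it covers only a hypothetical subset of CARTQUEL (two DISPLAY forms and DESIGN DOMAIN) because the Figure 40 rules are not reproduced here. Left-most words are written into the parsing code, while right-most words are resolved through a dictionary whose entries are assumed for illustration.

```python
# Hypothetical keyword dictionary: right-most terminals and their class.
DICTIONARY = {
    "TAMBON": "ENTITY", "HF": "ENTITY",
    "POP.SER": "ATTRIBUTE", "SER.LEVEL": "ATTRIBUTE",
    "SQUARE": "SYMBOL", "SHADE": "SYMBOL",
}


class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent recognizer for a tiny CARTQUEL-like subset.

    Left-most terminals (DISPLAY, DESIGN, MASTER, WORK, DOMAIN) are written
    into the grammar; right-most terminals are looked up in DICTIONARY.
    """

    def __init__(self, text: str):
        self.tokens = text.split()
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def literal(self, word: str) -> bool:
        """Consume a grammar-embedded terminal if it is next."""
        if self.peek() == word:
            self.pos += 1
            return True
        return False

    def keyword(self, kind: str) -> str:
        """Consume a dictionary terminal of the given class or fail."""
        tok = self.peek()
        if tok is None or DICTIONARY.get(tok) != kind:
            raise ParseError(f"expected {kind}, found {tok!r}")
        self.pos += 1
        return tok

    def integer(self) -> int:
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise ParseError(f"expected integer, found {tok!r}")
        self.pos += 1
        return int(tok)

    def statement(self):
        # Alternatives are tried in order; once a leading terminal matches,
        # the parser commits to that alternative and never backtracks.
        if self.literal("DISPLAY"):
            if self.literal("MASTER"):
                return ("DISPLAY-MASTER", self.keyword("ENTITY"),
                        self.keyword("ATTRIBUTE"), self.keyword("SYMBOL"))
            if self.literal("WORK"):
                return ("DISPLAY-WORK", self.integer(), self.keyword("SYMBOL"))
            raise ParseError("expected MASTER or WORK")
        if self.literal("DESIGN") and self.literal("DOMAIN"):
            return ("DESIGN-DOMAIN",)
        raise ParseError(f"unrecognized statement at {self.peek()!r}")


def parse(text: str):
    parser = Parser(text)
    tree = parser.statement()
    if parser.peek() is not None:
        raise ParseError(f"unexpected {parser.peek()!r}")
    return tree


print(parse("DISPLAY MASTER TAMBON POP.SER SHADE"))
print(parse("DISPLAY WORK 1 SQUARE"))
```

Adding a new entity or symbol only touches `DICTIONARY`, while a new statement type requires a change to the parsing code, which mirrors the trade-off between the two vocabulary locations.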

4.2.4 Query Decoder (QUED)

The purpose of the query decoder component of the system is to accept a statement in the CARTQUEL and translate the surface structure expression of the statement into a deep structure expression. The elements of the deep structure expression are used by the QUEP component to undertake data manipulation. Query decoding is a three-step translation process (see Figure 41).

The first step involves retrieval of an input symbol sequence in ASCII code as it appears on the graphics terminal screen. An input symbol sequence is defined in the previous section as a statement in the CARTQUEL. The input symbol sequence (consisting of four alphanumeric characters per 32-bit word) is translated into its EBCDIC integer equivalent sequence (consisting of a single alphanumeric character per 32-bit word). Since that translation is dependent on the computer installation, in addition to being a process of byte manipulation, the subroutines that undertake the task are programmed in IBM 370 Assembler language. Those routines, as well as their complementary routines which reverse the translation process for output to the screen, are part of the system interaction management subcomponent of QUED.

Figure 41. Flow Diagram of Decoding Process in the QUED Component
(START -> input symbol sequence -> Assembler routines decode ASCII characters into EBCDIC integers -> LANG-PAK parse by decoding EBCDIC sequence into translation elements -> syntactically correct? -> yes: language application program decodes translation sequence for processing -> semantically correct for processing? -> yes: process query -> STOP)

In the second step of query decoding, the integer EBCDIC sequence is input to the LANG-PAK syntactic compiler for parsing. The result of a parse is a translation element consisting of an integer that represents one of five classes of terminal words from a terminal break set, plus a group of integer attributes that represent the characteristics of the terminal word in the query language. A concatenation of translation elements is called a translation sequence. A translation sequence represents the deep structure expression of the surface structure expression of the query language. The five classes of terminal words in the terminal break set are: integers, numbers, strings, variables, and literals.

The LANG-PAK syntactic compiler employs a top-down, fast-back parsing algorithm in conjunction with the CARTQUEL grammar to parse statements in the CARTQUEL. The term top-down means that parsing starts at the left-most symbol sequence and proceeds to the right. All components of the left-most phrases are resolved before advancing to the right, without backtracking. The term fast-back refers to the way in which the parse machine processes alternatives in a repeating specification, i.e. a specification that may be defined by a number of repeating alternatives. These alternatives are indicated by parentheses enclosing a pair of numbers, as in language specification type ^VLIST^ in Figure 40. The first member of the repeating specification is set as the current goal. If that member fails, the algorithm sets the next member as the goal and continues. A complete discussion of each subroutine in LANG-PAK and how it operates is documented in the LANG-PAK reference book by Heindel and Roberto (1975).

The syntactic compiler determines only whether the surface structure expression (SSE) is syntactically correct. To assist in the recognition of semantically correct statements, a semantic compiler and a semantic machine must be developed. The semantic compiler parses semantic expressions that are used for indicating failures in the match of words for certain language specification types. The language specification types ^AERR^ and ^OERR^ are examples of messages defined for such purposes (see Figure 40).

The semantic machine is used in testing semantically correct combinations of elements in the query language. The semantic machine is application dependent and thus becomes rather complex when the number of statements in the language is large. Tests for semantic correctness are undertaken by comparing attributes of the terminal break set translation elements that appear in the translation sequence. Comparison of attributes is specified in semantic test statements by position in the translation sequence. The position is given by a 'T' followed by an integer. The values of the integer attributes are compared according to the relational operator in the test statement. If the relational operation succeeds, then the semantic test fails and a flag is set to indicate failure. A PRINT message may be specified as part of the semantic test to indicate the nature of the failure. To avoid a complex grammar, the semantic machine is not fully developed in this demonstration project. Other advantages are discussed in the next chapter.
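The first decoding step, unpacking four ASCII characters per 32-bit word into one EBCDIC code per element, and its complementary output routine can be sketched in Python. The original routines are IBM 370 Assembler; this sketch assumes that characters were packed left to right into big-endian words (the IBM 370 is big-endian) and uses Python's `cp037` codec as a stand-in for the installation's EBCDIC table. The function names are hypothetical.

```python
def unpack_ascii_words(words: list[int], length: int) -> list[int]:
    """Four ASCII characters per 32-bit word in, one EBCDIC character code
    per list element out. length is the number of characters actually
    typed (the last word may be padded)."""
    raw = b"".join(w.to_bytes(4, "big") for w in words)[:length]
    return list(raw.decode("ascii").encode("cp037"))


def pack_ebcdic_codes(codes: list[int]) -> list[int]:
    """Complementary routine for output: EBCDIC codes back to packed
    ASCII words, padding the last word with blanks."""
    raw = bytes(codes).decode("cp037").encode("ascii")
    raw += b" " * (-len(raw) % 4)
    return [int.from_bytes(raw[i:i + 4], "big") for i in range(0, len(raw), 4)]


# Build packed words for a sample statement, as the terminal might deliver them.
stmt = "DESIGN DOMAIN"
padded = stmt.encode("ascii") + b" " * (-len(stmt) % 4)
words = [int.from_bytes(padded[i:i + 4], "big") for i in range(0, len(padded), 4)]

codes = unpack_ascii_words(words, len(stmt))
print(codes[:3])  # [196, 197, 226]  (EBCDIC 'D', 'E', 'S')
```

Keeping the installation-specific code table in one place is the same reason the original isolates this step in Assembler routines: the parser downstream only ever sees integer codes.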

The third step in decoding a query is a post-processing of the translation sequence. Post-processing concerns the identification of translation elements and what to do with them. Elements are identified by iterating through the translation sequence and resolving the attribute that identifies each translation element as a terminal break set member. The type of processing in the QUEP component is determined by the combination of elements that are resolved. Semantic processing is also undertaken at this time to ensure proper element combinations defining particular types of data processing. Semantic correctness is checked in the application program in lieu of constructing a rather complex semantic machine; this is discussed in the following chapter.

CHAPTER 5

An Evaluation of CART-QUERY and Information Structuring

5.1 Introduction

Three major types of systems, or stages of geographic information system development, exist (Tomlinson et al. 1976, p. 13):

1. Research or Experimental - to demonstrate that a particular data processing or manipulation technique is feasible
2. Demonstration - to show the system's actual or potential utility in one or more applications with a small test data set
3. Operational or Production - to process actual data regularly for specified problems and maintain the data through specified updating procedures

The CART-QUERY system falls into category 1, although the focus of the implementation is more general than Tomlinson et al. (1976) describe. The objectives of this research, as described in Chapter 1, are to characterize cartographic information structuring and to implement this notion in a cartographic information system. Before an evaluation of the results is presented, some comments on the nature of evaluation are necessary. Tomlinson et al. (1976) point out that any evaluation of a geographic information system has some evaluation perspective. An evaluation perspective is a background against which a system is measured. Three general perspectives are recognized (Tomlinson et al. 1976, p. 13):

1. Evaluation within context is aimed specifically at an evaluation that considers a system in terms of its stated objectives. Frequently, objectives are not stated, which makes this perspective difficult to adopt. Objectives of a system may change during the development and demonstration process. Finally, to obtain adequate information one must ask the system developers for a self-evaluation at the very time they are attempting to demonstrate feasibility. This perspective is used to point out a constraint on evaluation.
2. Evaluation outside context is aimed at the question of transferability. It focuses on generic data handling capabilities and the possibility of applying them to another set of problems. This perspective limits evaluation to specific functions that are currently in operation. The conditions under which the system operates are considered only to the extent that they constrain data handling capabilities. It is not concerned with whether or not the system is meeting a defined set of objectives, or whether objectives exist at all.
3. Evaluation by objective appraisal is aimed at whether methods and techniques, plus their future capabilities, can be utilized in other systems with a similar set of problems. The procedure is to establish a conceptual framework containing the elements considered appropriate to meet a perceived need, to which the system or some of its elements may be transferred. Evaluation by objective appraisal is also aimed at identifying the total range of capabilities of data acquisition and manipulation for a perceived set of problems, and then comparing the capabilities of existing systems to this set. In this framework the various elements of a system may be evaluated independently with respect to their application to new problems.

The evaluation reported by Tomlinson et al. (1976) involved comparing an 'ideal' system, i.e. a conceptual framework, with an 'imperfect' system, a GIS. Caulkins argues that a comparison of that nature is more valid than a comparison against other systems (although inevitably this will be the case) for the following reasons (Tomlinson et al. 1976, p. 15):

1) The conceptual framework represents a neutral framework and thus allows the system to be evaluated in an objective manner, although some notions of 'within context' considerations are inevitable.
2) It does not consider only the elements of a system that are evident in case studies; rather, it highlights some of the major issues which have arisen during development of the systems but which could not be exploited within the time and resources available.
3) It provides a perspective on the systems studied which directly suggests recommendations for similar efforts elsewhere.

Unfortunately for the evaluation of concern here, the studies reported in Tomlinson et al. (1976) are directed at GISs with rather limited interactive capability and with a general focus on data input procedures. Furthermore, it is most likely that a truly objective evaluation cannot be accomplished here for the reasons cited above. Nevertheless, an effort has been made to surmount these obstacles and undertake an evaluation of CART-QUERY in terms of its intended purpose. The following two questions are asked when evaluating each component of the CART-QUERY system:

1. How does the component facilitate information structuring, and why is this good or bad?
2. What can be done to improve the component in terms of information structuring?

Each question is addressed according to four design aspects:

1. System design - pertains to software and hardware plus the modularity of a component with regard to the system as a whole
2. User design - pertains to flexibility and friendliness of interaction channels
3. Communication design - pertains to the means used for communication

4. Cartographic design - pertains to logical design considerations for Type I and Type III virtual maps

Some design aspects and questions are more pertinent than others for a given system component. Furthermore, neither the four design aspects nor the questions are entirely distinct at all points of discussion. However, each design aspect has its major thrust; therefore, the combination of design aspects and questions facilitates a comprehensive evaluation of each component.

5.2 A Discussion of CARTQUEL

The main thrust of information structuring is to separate the logical development of data bases from their physical development. The development of CARTQUEL takes place outside of a system programming environment and brings language development to the user in an interactive fashion. That modularity enhances the high-level orientation of the system because it focuses on language development rather than program development. A language designer can test a language during the development phase, identifying improper syntax and semantics. In addition, during later stages of program development a designer can pass back and forth between language application and further language refinement. The grammar development phase can take place separately from the keyword dictionary development phase of the language. This allows a skeleton language to be developed that can be used for more than one application program. The keywords in the dictionary become variables inserted into that skeletal language.
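The idea of a skeleton language whose keyword slots are filled from an application-specific dictionary can be sketched as follows. The skeleton `PERFORM <PROCESS> <ENTITY> <ATTRIBUTE>`, both dictionaries and the mnemonics CLUS, FACILITY, SL and SIMP are hypothetical; the sketch also shows how synonyms and mnemonics can live in the dictionary rather than in the grammar.

```python
# Skeleton statement shared by applications: PERFORM <PROCESS> <ENTITY> <ATTRIBUTE>
SKELETON = ("PERFORM", "PROCESS", "ENTITY", "ATTRIBUTE")


def make_dictionary(entries: dict[str, tuple[str, list[str]]]) -> dict[str, tuple[str, str]]:
    """Map every accepted spelling (keyword, synonym or mnemonic) to
    (canonical keyword, keyword class)."""
    lookup = {}
    for canonical, (kind, aliases) in entries.items():
        for spelling in (canonical, *aliases):
            lookup[spelling] = (canonical, kind)
    return lookup


def match(statement: str, dictionary: dict[str, tuple[str, str]]):
    """Fill the skeleton's keyword slots from a dictionary; None if no match."""
    words = statement.split()
    if len(words) != len(SKELETON) or words[0] != SKELETON[0]:
        return None
    resolved = [SKELETON[0]]
    for word, slot in zip(words[1:], SKELETON[1:]):
        canonical, kind = dictionary.get(word, (None, None))
        if kind != slot:
            return None
        resolved.append(canonical)
    return tuple(resolved)


# Two hypothetical application dictionaries plugged into the same skeleton.
HEALTH = make_dictionary({
    "CLUSTER": ("PROCESS", ["CLUS"]),
    "HF": ("ENTITY", ["FACILITY"]),
    "SER.LEVEL": ("ATTRIBUTE", ["SL"]),
})
GENERALIZATION = make_dictionary({
    "SIMPLIFY": ("PROCESS", ["SIMP"]),
    "TAMBON": ("ENTITY", []),
    "POP.SER": ("ATTRIBUTE", []),
})

print(match("PERFORM CLUS FACILITY SL", HEALTH))
print(match("PERFORM SIMP TAMBON POP.SER", GENERALIZATION))
```

Switching applications here means swapping the dictionary, not editing `SKELETON` or `match`.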

Although LANG-PAK makes language development an easy process, the development phase could be improved through the use of an interactive program for inserting keywords and their attributes into the keyword dictionary. At present that can be done only by editing the data set outside of the LANG-PAK system. Such a facility is not included by the developers of LANG-PAK because the subroutine which operationalizes access to words in the dictionary must be written by the application system developer. That subroutine would then determine the access method to be used in keyword insertion.

The CARTQUEL is implemented as a mixture of procedural and nonprocedural user design that supports information flow. The procedural-nonprocedural combination is implemented to optimize user friendliness, especially when multiple parameters are needed as input, e.g. during legend construction. Systems that are based on nonprocedural user design offer more flexible interaction than systems based entirely on procedural user design. Nonprocedural user design permits the user to enter commands and parameter options to a system without being prompted for such input. In a procedural environment a user must be prompted for a particular input command or parameter. Consequently, a nonprocedural design does not constrain a user to interact with a system in terms of the logic of program flow, which is the case with a procedural design.

Unfortunately, the more flexibility one has in a nonprocedural user environment, the less direction one has as well. That lack of direction is undesirable, particularly for an inexperienced user of a system. Inexperienced users need more support than users who are familiar with the types of commands and parameters needed to extract information from the system. A number of alternatives can remedy the problem of dealing with two types of users. One alternative may be to provide informative support in the nonprocedural environment. Thus, an inexperienced user could ask the system for help, and the system would provide descriptions of language options at any stage of interaction. A second alternative may be to support two language interfaces, one for nonprocedural interaction and another for procedural interaction, so that a user could interact in the most comfortable way. A structured system design should be able to incorporate the two interfaces without extensive changes in system programming.

Many alternatives exist for communication design: command language keyboard input, menu input, command language voice input, touch input by electronic pencil, or touch input by finger. Hardware availability usually precludes all but the first two: command language by keyboard and menu input. Some systems rely heavily on one or the other and some mix the two. Menu input is most practical with a tablet that exists apart from the graphics terminal because numerous functions can be displayed all at one time. Command language input is usually more practical when such a tablet is unavailable. However, a combination of menu and command language communication may be an optimal method of communication if designed properly. Unfortunately, the criteria for such a communication design have not been reported in the literature.
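The procedural-nonprocedural mixture, together with the suggested help facility for inexperienced users, can be sketched in a few lines: parameters typed with the command are used directly, and the system prompts only for those that are missing, answering HELP with a description. The parameter names, help texts and function name are assumptions; the prompt function is passed in so the behavior can be scripted.

```python
from typing import Callable

# Parameters of a hypothetical DESIGN LEGEND command, in positional order.
LEGEND_PARAMS = ("type", "heading")
HELP = {"type": "legend style, e.g. BOX",
        "heading": "text written above the legend"}


def design_legend(args: list[str], ask: Callable[[str], str]) -> dict[str, str]:
    """Take parameters typed on the command line first (nonprocedural) and
    prompt only for those still missing (procedural); answering HELP to a
    prompt shows a short description of the parameter."""
    values = dict(zip(LEGEND_PARAMS, args))
    for name in LEGEND_PARAMS:
        if name not in values:
            answer = ask(f"Enter legend {name}: ")
            while answer.upper() == "HELP":
                answer = ask(f"{HELP[name]}. Enter legend {name}: ")
            values[name] = answer
    return values


# Full statement: no prompting. Partial statement: prompt for the rest.
print(design_legend(["BOX", "KEY"], ask=input))
print(design_legend(["BOX"], ask=lambda prompt: "KEY"))
```

An experienced user types the whole statement and is never prompted; a newcomer can type just the command and be led through the remaining parameters.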

The CARTQUEL is developed to support flexible communication within the constraints of user design. Communication that takes place in the CART-QUERY system is facilitated by the structure of CARTQUEL. The language is formulated to express a full thought for processing, i.e. to express a desire for information in a single input statement and thereby facilitate coherent communication. Input symbol order may be inverted in the sequence of specification. CARTQUEL permits mnemonics and synonyms to be developed, to shorten the time needed to input a statement or to allow different terms for the same type of processing, respectively. Those mnemonics and synonyms are developed as part of the keyword dictionary, not as part of the grammar. Such a procedure permits the user to develop terms that are meaningful for a particular application, which may help with information transferral.

As mentioned in Chapter 4, the terminal vocabulary of CARTQUEL is dispersed between the grammar and the keyword dictionary. Both vocabulary locations have their advantages and disadvantages. Vocabulary located in the grammar causes the grammar to be application dependent.

However, it is much easier to specify error messages because syntactic error messages take the same form as semantic error messages. This reduces the complexity of the semantic machine; in fact, it may eliminate the need for a semantic machine. As a language grows, so must its semantic complexity. When the terminal vocabulary is specified in the keyword dictionary, the grammar is less application dependent. However, the semantic tests specified in the grammar become very unwieldy because of the number of possible combinations in each statement. Specification of vocabulary in the dictionary makes it easier to change the application dependent portion of the language, i.e. to change keywords by addition, deletion or modification of entire words or of the attributes belonging to those words. The split design utilized in this implementation is believed to be the optimal design because terminal elements belonging to the structural part of the system, i.e. the somewhat application independent part, reside in the grammar, whereas the terminal elements associated with the application reside in the dictionary.

The CARTQUEL supports information structuring of Type I virtual maps through cartographic design statements that control the definition of geographic domain, legend and title. Those statements indicate to the system that a cartographic object of the kind denoted in the statement is to be defined. Once a statement is given, the graphic cursor is turned on and the crosshairs appear on the screen so that the size and location of those objects can be defined. Use of the graphic cursor permits a user to define abstract cartographic objects in a graphic manner. That method does not require the user to estimate the length and width of the domain to be displayed in the graphic work area; these are computed by the system.

Another aspect of cartographic design that also supports user convenience is the line count manager. The line count manager monitors the number of text lines displayed on the screen. When the text display reaches the bottom of the screen, the screen is automatically erased and whatever appears in the graphic work area is redisplayed. The user continues to input commands with the line counter reset to indicate a new text display.
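The line count manager's behavior can be sketched as a small class. The erase, redraw and output actions are passed in as functions because the terminal-specific calls are not described here; the class name and the three-line screen capacity used in the example are hypothetical.

```python
from typing import Callable


class LineCountManager:
    """Count text lines on a CRT screen shared by text and graphics.

    When the text area is full, the screen is erased, the graphic work area
    is redrawn and the counter starts again for a new text display.
    """

    def __init__(self, max_lines: int, erase: Callable[[], None],
                 redraw: Callable[[], None],
                 emit: Callable[[str], None] = print):
        self.max_lines = max_lines
        self.erase = erase
        self.redraw = redraw
        self.emit = emit
        self.count = 0

    def write(self, line: str) -> None:
        if self.count >= self.max_lines:
            self.erase()
            self.redraw()
            self.count = 0
        self.emit(line)
        self.count += 1


events = []
manager = LineCountManager(
    max_lines=3,  # hypothetical screen capacity in text lines
    erase=lambda: events.append("erase"),
    redraw=lambda: events.append("redraw graphic work area"),
    emit=lambda line: events.append(line),
)
for command in ["DESIGN DOMAIN", "DESIGN LEGEND BOX KEY",
                "DISPLAY WORK 1 SQUARE", "DESIGN TITLE"]:
    manager.write(command)
print(events)
```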

5.3 A Discussion of QUED

The QUED component processes statements in the CARTQUEL, and therefore supports information processing at a primitive level. The major processing unit of the query decoder for CART-QUERY is the LANG-PAK system. The purpose of the LANG-PAK system is to support language parsing through a modular approach. Consequently, all subroutines that facilitate language processing belong to the QUED component. In addition to LANG-PAK, those subroutines also include the installation dependent ones which are not supplied as part of the LANG-PAK system. Because the language processor is a necessary part of the system, the QUED component can be said to support information structuring.

The QUED component represents a substantial portion of the CART-QUERY software. The algorithms involved are not simple. If the CARTQUEL were made more user-oriented, the software to support it would make the QUED component even more complex. Semantic processing of statements in the language should occur at the time the translation sequence is being created. That involves semantic tests embedded in the grammar, which in turn means that a complex semantic machine must be developed. Ironically, when semantic machines grow in complexity, they increase in application orientation. Such application orientation is difficult to monitor when the keyword dictionary is changed to include other keywords, especially if translation elements are of different lengths. Thus, there is a tradeoff between semantic machine complexity and application orientation, and the payoff may not be worth the effort. One alternative would be to construct multiple semantic machines using a structured programming approach, rather than relying on a single machine to handle all semantic processing.

The grammar of the query language is constrained to a linear, phrase structure form because the parsing algorithm in LANG-PAK is a top-down, fast-back parser. An enhancement in communication design could be made if a transformational grammar were employed. Thus, an improvement to the system would be to employ a parsing algorithm that supports transformational grammars. A user would then be able to invert the sequence of entire phrases in command statements, rather than inverting a single keyword.
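The semantic test convention from Section 4.2.4 (positions written as 'T' plus an integer, integer attributes compared by a relational operator, and the test failing when the relation holds) can be combined with the suggested alternative of several small semantic machines, one per statement type. The sketch below is a hypothetical Python illustration, not LANG-PAK's semantic machine: translation elements are simplified to (word, integer attribute) pairs, the operator names are assumed, and the attribute is an invented dimension code (0 = point, 2 = area).

```python
import operator

# Relational operators a semantic test statement may use (names assumed).
OPS = {"EQ": operator.eq, "NE": operator.ne, "LT": operator.lt,
       "LE": operator.le, "GT": operator.gt, "GE": operator.ge}


def attribute(sequence, position: str) -> int:
    """Integer attribute of the element at a position written 'T1', 'T2', ..."""
    return sequence[int(position[1:]) - 1][1]


def run_tests(sequence, tests):
    """Return the messages of all failed tests; an empty list means success.

    Following the convention described for the semantic machine, a test
    FAILS when its relation holds.
    """
    return [message for left, op, right, message in tests
            if OPS[op](attribute(sequence, left), attribute(sequence, right))]


# One small test table per statement type instead of a single large machine.
SEMANTIC_MACHINES = {
    "DISPLAY": [("T3", "NE", "T5", "SYMBOL DOES NOT MATCH ENTITY TYPE")],
}

# Hypothetical attribute: the entity's dimension, or the dimension a symbol needs.
points = [("DISPLAY", 0), ("MASTER", 0), ("HF", 0), ("SER.LEVEL", 0), ("SHADE", 2)]
areas = [("DISPLAY", 0), ("MASTER", 0), ("TAMBON", 2), ("POP.SER", 0), ("SHADE", 2)]
print(run_tests(points, SEMANTIC_MACHINES["DISPLAY"]))  # ['SYMBOL DOES NOT MATCH ENTITY TYPE']
print(run_tests(areas, SEMANTIC_MACHINES["DISPLAY"]))   # []
```

Keeping one table per statement type limits how far a dictionary change can ripple, which is the motivation given above for multiple semantic machines.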

5.4 A Discussion of QUEP

A data management system is intended to support information structuring rather than data structuring. Unfortunately, the absence of a formal data base management software package constrains the flexibility of data manipulation in CART-QUERY. Consequently, the system is not as general as it could be given the availability of such software. Furthermore, CART-QUERY is not an elegant modular system because a structured programming methodology is not employed in all phases of software design. However, the system is modular in the sense of the components and subcomponents described in Chapter 4.

The data management subcomponent of the QUEP component facilitates information structuring by separating a logical manipulation of data from a physical manipulation of data. Logical manipulation concerns a transferral of records as they exist in a logical organization, whereas physical manipulation concerns the data as it exists in the system in terms of physical blocks. In terms of logical manipulation a lexical rule (read statement for data entry) assigns instances of cartographic objects to an abstract lexical item (a keyword in the dictionary) of the same phenomenological type. This process is repeated for all objects that have been designated for output from a subfile. Those objects are used for either analytical processing and/or virtual map display. The use of common blocks for sharing data among subroutines reduces modularity, whereas parameter lists in subroutines provide modularity. However, parameter lists can become unrulely when large numbers of variables are shared among subroutines. Furthermore, common blocks add to programming simplicity as long as such common blocks are monitored effectively. The analytical portion of the QUEP component supports information structuring through information derivation. An information structure interpretation can be given to the spatial clustering process that produces a spatial hierarchy for health facility sites. Each site that is symbolized by a square in Figure 38 represents a terminal element at the surface structure level. When two elements group together to form a spatial cluster a proximity relationship is constructed between them. The spatial cluster represents an abstract compound object when the 165

cluster is considered a conceptual whole based on the spatial relation ship. Consequently, each element is associated with the compound object through a deep structural relationship that is based on the spatial proximity relationship. Successive deep structural relationships are created as the hierarchical clustering algorithm proceeds to cluster elements and groups of elements. The sequence of clustering creates an information hierarchy which is a derived information structure tha*.

be depicted as a web marker described in Section 3.5. The geographic domain becomes the root node of the information structure in the same way a sentence is the root node of the phrase marker. Each of the spatial clusters becomes a conceptual nonterminal symbol of the web marker. Thus, rather than storing the information in the INBASE, it is

derived through analytical processing. The display capacity of CART-QUERY could be improved considerably

with an alphanumeric terminal added to the hardware configuration of the system. An alphanumeric terminal, possible with scrolling capability, could be used for corrmand language input and display. That would provide the system with a CRT screen dedicated to cartographic display only, rather than sharing with the command language. Such a configuration is an industry standard for graphic information systems,

and eliminates the need for a command language monitor. An improvement in cartographic design of CARTQUEL would be to include statements that permit windowing into a cartographic display. The graphic cursor could also be used to support locational identi fication of objects that must be identified for further analytical processing. Routines to support those processes would be part of the 166 data management system. Cartographic object placement is crucial to information struc turing . CART-QUERY contains some facility for placement of the geographic domain and legend. However, further enhancements should include routines that allow objects to be stored and recalled at any scale without redefining the parameters which originally defined these objects. A significant improvement in programming the entire system can be made by utilizing a programming language that supports abstract data types. Abstract data types are programming tools for data represen tation that provide a higher level concept definition as in the HBDS model than standard program data types, e.g. real, inten^r aiid character.

Although abstract data types may facilitate data management, they may not facilitate analytical processing of the kind implemented in CART-QUERY.

Further improvements to CART-QUERY would include legend and domain symbolization options. Providing more options enhances one's ability to perform correct cartographic design with regard to data dimensions.

5.5 A Discussion of INBASE

The design of an INBASE forces one to pass through stages of data base design in order to clarify data base organization. An information structure can be used as a design tool for specifying relationships prior to designing a data structure that operationalizes these relationships. The advantage of using information structures is that one is able to show what relationships should exist between entities without dictating the method for operationalizing them. In this way information structures can act as an interface in user design of data bases without requiring a user to know about data structures. Two local information structures are used in the INBASE design of CART-QUERY, one for cartographic generalization and another for health facility site analysis. Since the data in the two are treated in a very similar analytical manner, one may label the two as if derived from the same local information structure. However, that is a programmer's view and not a user's view. From a user's viewpoint, phenomenological attributes of health facilities do not take on the same importance in cartographic generalization as they do in health facility site analysis. Furthermore, the terminology of the two topics makes the process of data base design different for the two applications. Consequently, the health facility information structure is different because of those phenomenological characteristics particular to the subject matter. In the same manner, in cartographic generalization a health facility is not of primary interest; what is of primary interest is whether a cartographer is dealing with a point, line or area entity and the nature of the neighborhood relationships of these entities. Because the topic of cartographic generalization is of interest to cartographers and the topic of health facility site analysis is of interest to health planners, cartographers and health planners cannot be expected to communicate their interests using the same 'programming' terminology.
Therefore, local information structures particular to each topic have been used to develop a global information structure, which results in the CART-QUERY INBASE.
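Combining the two local information structures into a global one can be illustrated with a small sketch. It assumes, purely for illustration, that each local structure is a set of (entity, relationship, entity) triples and that a hypothetical synonym table maps topic-specific terms onto shared terms. The triples and terms are invented, and this is not the procedure used to build the CART-QUERY INBASE; each user group keeps its own vocabulary in its local structure, and only the global structure uses the shared terms.

```python
def merge_local_structures(local_structures, synonyms):
    """Combine local information structures into one global structure.

    local_structures: iterable of sets of (entity, relationship, entity) triples.
    synonyms: hypothetical mapping from a topic-specific term to the shared term
    chosen for the global structure; unmapped terms are kept as they are.
    """
    def canonical(term):
        return synonyms.get(term, term)

    global_structure = set()
    for local in local_structures:
        for subject, relation, obj in local:
            # Identical triples from different topics collapse into one,
            # which is where the reduction in redundancy comes from.
            global_structure.add((canonical(subject), relation, canonical(obj)))
    return global_structure


generalization = {("point feature", "neighbor of", "point feature"),
                  ("point feature", "located in", "tambon")}
site_analysis = {("health facility", "located in", "tambon"),
                 ("health facility", "serves", "tambon")}
merged = merge_local_structures([generalization, site_analysis],
                                {"point feature": "health facility"})
print(len(merged))  # 3
```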

Part of system design concerns the data structure that supports information structuring. A task can be trivial or complex depending upon the data structure utilized. As Brassel observes:

Cartographic base files as virtual maps should facilitate the same mental operations as paper maps or actual geographic reality. They should be able to provide information about the total neighborhood of a feature if we want to use them for such tasks as generalization or name placement. (Brassel 1979, p. 129)

A flexible data structure as a virtual map can facilitate more mental operations than can paper maps or geographic reality, because it can extend the number of entities considered at any single time.

The data structure in CART-QUERY is a global network data structure that uses symbolic pointers to operationalize relationships between information parcels. The symbolic pointers are relative addresses to logical records in the different subfiles of the INBASE. The keyword dictionary is the part of the INBASE that serves as the 'master key' to the subfiles, setting up the base index on which the relative addresses of the records are built. The base address in a subfile is a manually encoded integer. Manually encoding the base address is advantageous in that new entities can be stored in a subfile after the data base has been created, provided sufficient storage is allocated. The disadvantage is that manual input by the user is not an attractive alternative because it is data base specific.

The storage structure was transformed from an input-directed data base organization to a manipulation- and output-directed data base organization. That occurred at data base load time, when the structure went from separate object-related files to a combined object-related file organization. The change was precipitated by a change in goal direction, from a simplicity-oriented input goal to an efficiency-oriented manipulation and output goal, and it supports the data structure utilized for information structuring. A further enhancement to the storage structure is a record compaction format that uses disk storage in the most efficient manner.

A discussion of user design for the keyword dictionary appears in the CARTQUEL evaluation section and will not be repeated here. However, it is important to understand that the keyword dictionary is part of both the communication language and the INBASE, and is the primary interface between the two that supports information structuring.

Improvements can be made to the data structure and storage structure if an alternative programming language is available. Programming languages such as PASCAL or ALGOL provide the flexibility to define recursive structures, which conveniently model the organization of production rules presented in Chapter 3. A standard data base management software package and data base model would facilitate the information structuring process by making it easier to change storage structures. That would allow a more vivid depiction of the effect of information structures on data structures and storage structures.

CHAPTER 6

Summary and Conclusions

6.1 Summary

In the past few years cartography has enjoyed an enormous expansion due to theoretical developments in numerical and analytical cartography along with rapidly changing technology. Any cartographic product can now be categorized as one of two types, real or virtual, and virtual maps can be further subdivided as Type I, II or III. These map types may take on one or more of four basic functions: (1) as a graphical storage device, (2) as a symbolic/iconic representation with which to examine reality, (3) as a communication tool, and/or (4) as an analytical tool. Future developments in cartographic data base theory may affect most types and most functions, and a theoretical approach to cartographic data base design is necessary to guide an orderly development.

This research concerns the logical organization of interactive cartographic displays and data base structure, i.e. Type I and Type III virtual maps, respectively. The major focus is on information structures and how they can assist in the understanding of virtual map structure. Describing an information structure is a way of describing the interface between surface structure and deep structure in virtual maps. Surface structure is the physical structure observed or most readily apparent. Deep structure is the underlying conceptual structure, or the logical way in which terminal elements of the surface structure are tied together to offer spatial information and meaning.

The literature related to a linguistic interpretation of information structures of virtual maps comes from semiotics, picture processing, computer science and geographic cartography. Semiotics is the study of signals, signs and symbols and their role in syntactic, semantic and pragmatic processes. Linguistics is a subfield of semiotics that concerns symbols in human language rather than signals in machine language or signs in animal language.
Linguistics is particularly useful as a way of characterizing symbols in terms of structural relationships. For that reason, researchers in picture processing have been using linguistics and grammars as a method for logically describing pictures since the early 1960s. Although a grammar for a natural language such as English differs from one for a programming language such as FORTRAN, and both differ from a grammar for pictures, the concept of logical organization guided by production rules is similar. Syntactic models as grammars for closed-ended classes of pictures have been successful because grammars that generate a class of pictures can easily describe or parse those same pictures. Syntactic models as grammars for open-ended classes of pictures have not enjoyed the same level of success because of the variety of pictures involved. Virtual maps of Type I lie somewhere between closed-ended and open-ended classes, because these maps are generated by computer algorithms but at the same time have the potential of being very complex.

Grammars for generating picture data structures have been formulated but have produced few operational results. Therefore, grammars have been used more in a heuristic capacity, as guides to the design of data structures, than for generating actual data structures. For this reason, grammatical concepts are utilized here in an information structuring context rather than a data structuring context.

Six levels of structuring information and data have been identified as steps in a process of logical data base design. The first level, closest to the user of a data base, involves data reality, which is the everyday environment concerning communication within an organization. The second level involves formulating an information structure, which is a logical description of the information to be stored and accessed by users of the information system.
A canonical structure, the third level, involves combining information structures in order to reduce data redundancy so that a conceptual data model of the data base can be formulated. The fourth level is the data structure or data model that is supported and utilized by the data base management system, which is written in some programming language. The fifth level is the storage structure, the actual way in which records are stored in a given file structure on some computer hardware. The sixth level is machine encoding, the particular way that a particular computer stores programs and data. The levels most pertinent to the theoretical contributions in this research are information structure, canonical structure and data structure. Because most notions are described in terms of a single information structure, the canonical structure is effectively the same as the information structure. However, if multiple information structures, i.e. many local information structures, are of concern, then one must consider the canonical structure as a separate and very important step in data base design.

Initial concern in the literature review is with spatial data structures and data models. Data structures are sufficiently close to algorithmic design and programming that a significant amount of research has appeared in the cartographic, geographic and computer science literature. A history of the development of spatial data structures, from lists of points to data base structures, is presented. Data models have been a topic of concern in the computer science literature for the past ten years; however, only recently have data models been applied to spatial data structuring. Until recently, cartographic data bases had very little logical organization because they lacked a data model, and such data bases were therefore used for cartographic display only. Flexible data bases are those that are based on a data model. Data models for cartographic data bases are beginning to be investigated, and interesting problems are surfacing because of the difference between spatial and nonspatial data handling in information systems.

Linguistic models have been applied to data structures both outside and within the context of data base design. Linguistic-like rules for information structuring in an infological data base design context have been reported in the literature, but not in terms of a formal linguistic model. Thus, linguistic techniques have been utilized for both data and information structuring. At which level linguistic techniques are most suitably employed, for the overall benefit of data base design in particular and information systems in general, has yet to be determined.

The two major extremes of management information systems are executive and operational. Executive information systems support a wide variety of users and hence a wide variety of questions that may have no standard sequence of input. In contrast, operational information systems deal with a narrowly defined group of users who input questions in a standard format. Most information systems combine characteristics from both extremes. Geographic information systems are not much different from many management information systems in the sense that most GISs have characteristics from both extremes. The major difference between a management information system and a geographic information system is that the former utilizes a non-spatial data base whereas the latter utilizes a spatial data base, which may have both a non-spatial and a spatial component. Another significant difference has been that many GISs contain a cartographic subsystem. This cartographic subsystem has become so important in some systems that they are called cartographic information systems or mapping information systems.

The approach to information structuring in this research might be called a linguistic, infological approach. The main concern is with the logical organization of geographic entities and how they are conceptually represented as cartographic objects, attributes and relationships in virtual maps. The linguistic approach employed is a structural approach, i.e. the concern is with surface structure representations and deep structure representations. The information structure discussed here formalizes a description of the logical organization of objects, attributes and relationships in terms of a grammar. The grammar is the basis of a model of virtual map competence rather than virtual map performance, i.e. of the way maps are organized logically rather than exactly how they are constructed.

A grammar for information structures of virtual maps consists of a vocabulary and a set of rules. The vocabulary comprises two major categories: a nonterminal vocabulary and a terminal vocabulary. An initial symbol is a special member of the nonterminal category and is often represented as a category by itself. The nonterminal vocabulary consists of members that are conceptual or compound in nature, i.e. they are usually composed of a group of elements. The terminal vocabulary consists of members that are the classes of primitive elements that make up a map, i.e. they are usually single elements at the base of an information structure or single categories of a basic cartographic symbol. The set of rules in a grammar consists of three major kinds: production rules, lexical rules and transformational rules.
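The components of such a grammar can be collected into one structure and checked for basic well-formedness. The sketch below is a hypothetical Python container; the class, field and symbol names are assumptions, not part of CART-QUERY. Following the description above, the terminal vocabulary holds classes of primitive elements such as points and polygons, and the initial symbol must belong to the nonterminal vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMapGrammar:
    """A vocabulary plus the three kinds of rules; all symbol names are hypothetical."""
    initial_symbol: str
    nonterminals: set
    terminals: set
    # Production rules: nonterminal -> list of alternative right-hand sides.
    productions: dict = field(default_factory=dict)
    # Lexical rules: terminal class -> predicate that accepts matching data records.
    lexical_rules: dict = field(default_factory=dict)
    # Transformational rules: name -> function turning an object into another form.
    transformations: dict = field(default_factory=dict)

    def problems(self):
        """Return a list of well-formedness problems (empty if none were found)."""
        found = []
        if self.initial_symbol not in self.nonterminals:
            found.append("initial symbol is not a nonterminal")
        if self.nonterminals & self.terminals:
            found.append("nonterminal and terminal vocabularies overlap")
        vocabulary = self.nonterminals | self.terminals
        for lhs, alternatives in self.productions.items():
            if lhs not in self.nonterminals:
                found.append(f"left-hand side {lhs!r} is not a nonterminal")
            for rhs in alternatives:
                found.extend(f"unknown symbol {s!r}" for s in rhs if s not in vocabulary)
        for cls in self.lexical_rules:
            if cls not in self.terminals:
                found.append(f"lexical rule for unknown terminal class {cls!r}")
        return found


g = VirtualMapGrammar(
    initial_symbol="MAP",
    nonterminals={"MAP", "DOMAIN", "FACILITIES"},
    terminals={"POINT", "POLYGON"},
    productions={"MAP": [["DOMAIN", "FACILITIES"]],
                 "DOMAIN": [["POLYGON"]],
                 "FACILITIES": [["POINT"], ["POINT", "FACILITIES"]]},
    lexical_rules={"POINT": lambda record: record.get("geometry") == "point"},
)
print(g.problems())  # []
```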
Production rules generate a base web from an initial web. A web is a labeled graph that constitutes a level in an information structure. An initial web is the root node of the information structure, labeled by the initial symbol mentioned previously. The root node in a sense represents a virtual map, whether Type I or Type III, in its entirety. The base web generated from the initial web is a labeled graph representing a virtual map in abstract terms. That is, the virtual map exists only as a conceptual entity, with no cartographic objects depicted, as would be the case with a Type I virtual map, and no empirical data loaded in the data base, as would be the case with a Type III virtual map. Production rules rewrite nonterminal elements into other nonterminal elements, and in the later stages of generation nonterminal elements are rewritten into preterminal elements. Preterminal elements are the general classes of primitive objects that appear in or on virtual maps, e.g. points, lines and polygons. Lexical rules insert lexical entries into preterminal elements. Lexical entries are members of the terminal vocabulary, i.e. instances of actual objects, represented as each point, line or polygon element on Type I virtual maps or as the data that represent each point, line or polygon in the data base. Lexical entries are inserted into preterminal elements by matching attributes of the abstract object with an actual instance of that object. Operationally, lexical rules are statements that have been determined by matching attributes in the keyword dictionary with attributes in the data base. Transformational rules are functions that transform instances of one object into an alternative form. Programming read statements in CART-QUERY are simple transformational functions that select data from the INBASE and transform them (through a number of functions) into cartographic objects that appear on the CRT screen.
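The generation sequence just described can be sketched in a few lines. In this illustration, which assumes invented rules and records, production rules rewrite the initial symbol down to preterminals, producing a base web (simplified here to a tree of (label, children) pairs), and lexical rules then fill each preterminal with a data record whose geometry attribute matches, producing a terminal web. Transformational rules are not applied. All names are hypothetical.

```python
# Hypothetical production rules: each nonterminal rewrites to one symbol sequence.
PRODUCTIONS = {
    "MAP": ["DOMAIN", "FACILITIES"],
    "DOMAIN": ["POLYGON"],
    "FACILITIES": ["POINT", "POINT"],
}
PRETERMINALS = {"POINT", "LINE", "POLYGON"}


def derive(symbol):
    """Rewrite top-down from the initial symbol; returns a (label, children) tree.

    The result plays the role of the base web, simplified here to a tree.
    """
    if symbol in PRETERMINALS:
        return (symbol, [])
    return (symbol, [derive(child) for child in PRODUCTIONS[symbol]])


def insert_lexical_entries(tree, records):
    """Fill each preterminal with the next unused record whose geometry matches."""
    remaining = list(records)

    def visit(node):
        label, children = node
        if label in PRETERMINALS:
            for k, record in enumerate(remaining):
                if record["geometry"] == label.lower():  # attribute matching
                    return (label, [remaining.pop(k)["name"]])
            return (label, [])  # no matching entry in the data base
        return (label, [visit(child) for child in children])

    return visit(tree)


records = [{"name": "Health post 1", "geometry": "point"},
           {"name": "Tambon boundary", "geometry": "polygon"},
           {"name": "Hospital", "geometry": "point"}]
terminal_web = insert_lexical_entries(derive("MAP"), records)
print(terminal_web)
```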
Transformational rules may be elementary or compound, and singular or general. An elementary transformation involves one transformational step, whereas a compound transformation involves a number of concatenated steps. A singular transformation involves a single cartographic object, whereas a general transformation involves an entire class of objects.

A subweb of an information structure is a local information structure; it is part of the overall information structure. A terminal web is a global information structure in which all production rules and lexical rules have been applied but transformational rules have not. A web marker graphically depicts an information structure, either a local or a global one depending upon the extent of the data base utilized.

A prototype system for query processing, CART-QUERY, was developed to demonstrate the feasibility of an information structuring framework for virtual maps. Query processing is an interaction method that utilizes high-level, English-like statements as a user communication interface and is therefore user-oriented.

An information structure is operationalized in terms of an information base, INBASE. An INBASE is an extended notion of a data base, incorporating conceptual information as well as data. The keyword dictionary is the major source of conceptual information; entity names are stored in it along with attributes that are used to retrieve data from the data base. Conceptual information is also included in the INBASE at run-time, when the geographic domain, legend and title are given definition. Each of these compound objects is stored during run-time as part of the map; this allows a redisplay of the map when the text display is erased automatically from the screen. The data in the INBASE consist of instances of graphical and phenomenological attributes of tambons and health facilities.
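The role of the keyword dictionary, resolving entity names to records through the symbolic pointers described in Section 5.5 (relative addresses added to a manually encoded base address within a subfile), can be sketched as follows. The class, its methods and the record layout are hypothetical; the sketch only illustrates the addressing scheme and is not the CART-QUERY file format.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    subfile: str        # INBASE subfile holding the entity's records
    base_address: int   # manually encoded base address within that subfile
    attributes: tuple   # attribute names stored in each record


class InBase:
    """Toy INBASE: a keyword dictionary that resolves symbolic pointers.

    A symbolic pointer is (entity name, relative address); the dictionary turns it
    into the absolute slot base_address + relative_address in the entity's subfile.
    No range checking is done beyond Python's own IndexError, so overlapping base
    addresses are the encoder's responsibility, as with manual encoding.
    """

    def __init__(self):
        self.dictionary = {}  # entity name -> DictionaryEntry
        self.subfiles = {}    # subfile name -> list of record slots

    def define(self, entity, subfile, base_address, attributes, capacity):
        # Allocate empty slots up front so entities can be added after loading.
        slots = self.subfiles.setdefault(subfile, [])
        slots.extend([None] * max(0, base_address + capacity - len(slots)))
        self.dictionary[entity] = DictionaryEntry(subfile, base_address, tuple(attributes))

    def _slot(self, entity, relative_address):
        entry = self.dictionary[entity]
        return entry, entry.base_address + relative_address

    def store(self, entity, relative_address, values):
        entry, slot = self._slot(entity, relative_address)
        self.subfiles[entry.subfile][slot] = dict(zip(entry.attributes, values))

    def retrieve(self, entity, relative_address):
        entry, slot = self._slot(entity, relative_address)
        return self.subfiles[entry.subfile][slot]


db = InBase()
db.define("HEALTH FACILITY", "points", base_address=100,
          attributes=("name", "beds"), capacity=50)
db.store("HEALTH FACILITY", 3, ("Hospital", 30))
print(db.retrieve("HEALTH FACILITY", 3))  # {'name': 'Hospital', 'beds': 30}
```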

The query processing component, QUEP, of CART-QUERY has two major subcomponents, a data base management component and an analytical processing component. The data base management subcomponent handles data storage, retrieval and display, whereas the analytical component consists of routines that transform data into information. The analytical routines implemented in CART-QUERY perform conceptual object definition with the graphic cursor and spatial clustering with a hierarchical clustering algorithm.

The cartographic query language, CARTQUEL, is a command language of English-like statements that are used to specify requests to the system in user-oriented terms. The grammar of the language is of a linear, phrase structure design. Keywords (words in the terminal vocabulary) that are of a general cartographic nature are included in the definition of the grammar. Keywords that are application-oriented are included in the keyword dictionary. Separating keywords in this manner provides language generality where it is important, and it also makes possible a simple query decoder that could not otherwise be developed.

The query decoder component, QUED, consists of the LANG-PAK software system in addition to computer installation dependent subroutines.
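The keyword separation described for CARTQUEL above can be illustrated with a small token classifier. The grammar keywords below are invented stand-ins, not CARTQUEL's actual vocabulary; the point is only that general cartographic keywords live in the grammar while application-oriented names come from a keyword dictionary supplied with each data base.

```python
# Hypothetical general cartographic keywords, fixed in the grammar definition.
GRAMMAR_KEYWORDS = {"DISPLAY", "DEFINE", "LEGEND", "DOMAIN", "TITLE", "CLUSTER", "BY"}


def classify_tokens(query, keyword_dictionary):
    """Tag each token as a grammar keyword, a dictionary keyword, or unknown.

    keyword_dictionary: application-oriented names (e.g. entities, attributes),
    supplied per data base rather than built into the grammar.
    """
    tagged = []
    for token in query.upper().split():
        if token in GRAMMAR_KEYWORDS:
            tagged.append((token, "grammar"))
        elif token in keyword_dictionary:
            tagged.append((token, "dictionary"))
        else:
            tagged.append((token, "unknown"))
    return tagged


print(classify_tokens("display hospitals by tambon", {"HOSPITALS", "TAMBON"}))
```

Because only the dictionary changes between applications, the same classifier, and a decoder built on it, could be reused unchanged for a different set of entities.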

The parsing method, which is the major purpose of QUED, employs a top-down, fast-back algorithm. The parsing algorithm uses the CARTQUEL grammar as a guide in decoding CARTQUEL statements. The installation-dependent subroutines support input at the terminal by decoding an input symbol sequence from ASCII to EBCDIC representation, and vice versa for output at the terminal.
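The exact 'fast-back' strategy of LANG-PAK is not documented here, so the following is only a generic sketch of grammar-guided top-down parsing: a recursive-descent recognizer that tries alternatives in order and backtracks when one fails. The grammar and entity names are hypothetical. The last two functions illustrate the installation-dependent character conversion using Python's built-in cp037 codec, one common EBCDIC code page; which code page the original installation used is not stated.

```python
# Hypothetical CARTQUEL-like grammar. Angle-bracketed symbols are nonterminals;
# other symbols are literal keywords. Alternatives are tried in the order listed.
GRAMMAR = {
    "<query>": [["DISPLAY", "<object>", "BY", "<entity>"],
                ["DISPLAY", "<object>"]],
    "<object>": [["LEGEND"], ["DOMAIN"], ["<entity>"]],
}
ENTITIES = {"HOSPITALS", "TAMBONS"}  # stands in for keyword-dictionary lookups


def match(symbol, tokens, pos):
    """Yield every input position reachable after matching symbol at pos."""
    if symbol == "<entity>":
        if pos < len(tokens) and tokens[pos] in ENTITIES:
            yield pos + 1
    elif symbol in GRAMMAR:
        for alternative in GRAMMAR[symbol]:
            yield from match_sequence(alternative, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == symbol:
        yield pos + 1


def match_sequence(symbols, tokens, pos):
    """Match symbols left to right, backtracking into earlier choices on failure."""
    if not symbols:
        yield pos
        return
    for next_pos in match(symbols[0], tokens, pos):
        yield from match_sequence(symbols[1:], tokens, next_pos)


def parses(statement):
    tokens = statement.upper().split()
    return any(end == len(tokens) for end in match("<query>", tokens, 0))


# Installation-dependent step: text to EBCDIC bytes (code page 037) and back.
def to_ebcdic(text):
    return text.encode("cp037")


def from_ebcdic(data):
    return data.decode("cp037")


print(parses("display hospitals by tambons"))  # True
```

A recognizer of this kind loops forever on left-recursive rules, so a grammar used this way must be written without them.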

Evaluation perspectives are presented as a prelude to an evaluation of CART-QUERY components. The evaluation employs a within-context perspective because the system is predominantly a research project that is dissimilar to many systems which currently exist. Two major questions are asked: (1) how does the component facilitate information structuring, and why is this good or bad? and (2) what can be done to improve the component in terms of information structuring? Answers to those questions are discussed in terms of four design aspects of a system: system design, user design, communication design and cartographic design. Some design aspects are more pertinent to some components than to others, e.g. user and communication design for CARTQUEL, system design for QUEP and QUED, and cartographic design for INBASE. However, all components are addressed in terms of both questions and each design aspect where possible. Numerous improvements are suggested that would enhance the research stage of CART-QUERY. Some sense of an 'ideal' system can be gained by aggregating the improvements, as if all were implemented in some manner in the CART-QUERY system.

6.2 Conclusions and Implications for Future Research

Conclusions that have developed as a result of this research range from theoretical contributions in geographic cartography to suggestions for possible improvements in the program design of interactive cartographic/geographic information systems. The primary contributions of the research are:

1. Identifying six steps of structuring information and data for logical cartographic data base design
2. Characterizing virtual maps of Type I and Type III in a formal manner with a focus on information structures
3. Developing a link between discussions about the logical organization of Type I and Type III virtual maps that is consistent with a methodology for pursuing investigations of spatial cognition

The goal of this research is to advance cartographic data base theory by identifying logical steps in a data base design process, with a focus on information rather than data. This focus on information rather than data attempts to bridge the interface between the user and the systems designer of data bases.

The levels of structuring imply important design problems and decisions:

- how to translate 'reality' into an information structure
- how to translate information structures into a canonical structure
- how to translate a canonical structure into a data structure
- how to translate a data structure into a storage structure

On each level an almost infinite number of alternative mappings exist, i.e. data reality can be represented by many alternative information structures, an information structure by many alternative canonical structures, a canonical structure by many alternative data structures, and a data structure by many alternative storage structures. Additional problems arise when alternatives are difficult to evaluate because they may possess many properties which cannot be weighted and combined into one objective function.
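When alternatives cannot be combined into one objective function, one can still discard any alternative that another beats on every criterion. The sketch below applies this Pareto (non-dominance) filter, a standard multi-criteria technique that is not proposed in the text, to invented storage-structure scores. It narrows the choice but, as the text notes, does not settle it.

```python
def non_dominated(alternatives):
    """Return the names of alternatives that no other alternative dominates.

    alternatives: dict mapping a name to a tuple of criterion scores, where a lower
    score is better on every criterion. One alternative dominates another if it is
    no worse on every criterion and strictly better on at least one.
    """
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return sorted(name for name, score in alternatives.items()
                  if not any(dominates(other, score) for other in alternatives.values()))


# Invented scores: (storage in KB, access time in ms, update time in ms).
storage_structures = {
    "sequential": (100, 50, 5),
    "indexed": (140, 8, 12),
    "pointer network": (180, 4, 20),
    "unindexed copy": (200, 60, 25),
}
print(non_dominated(storage_structures))  # ['indexed', 'pointer network', 'sequential']
```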

The linguistic approach provides a theoretical framework for describing the multi-dimensionality of Type I and Type III virtual maps as related to information storage and display. A grammar cannot be totally deductive in the strictest sense, because identification of terminal elements requires inter-subjective agreement, and it is these terminal elements that are recognized most easily in day-to-day exposure to a given topic. Bouillé suggests that his method of developing a skeleton of a data structure is deductive; however, the deduction is itself an interpretation. Bouillé is interested in developing more than application-oriented data structures and attempts to use deductive methods to do so. No discussion is offered of whether deductive structures exist at all; a good discussion of this question is given by Mark (1978). Here it is assumed that a logical, deductive approach is utilized in information structuring prior to data structuring. The concept of an INBASE helps to identify unusual relationship situations before a data structure is implemented in a DBMS.

The process of creating information sets is a first step toward understanding the primitives that compose local information structures, which in turn create global information structures similar to knowledge structures. An optimal data structure for all applications does not exist, because problems of differing complexity require different levels of sophistication. However, if one deals with information and is given the tools to construct data structures from information structures, then one would be able to build a wide variety of structures. The objective, therefore, is to supply tools that can build a data structure based on information sets organized in a logical fashion by a user in cooperation with a data base designer.

A fundamental concern with many data bases is whether to store information or continually derive the same information. Information structuring can help identify those situations and clarify whether to store or derive. Information that changes rapidly should be derived, whereas information that is stable should be stored. Relationships, too, may be stored or derived depending upon their frequency of use. If relationships are queried frequently, perhaps more than ten times, then it is more efficient to store them than to derive them over and over again. Storage of relationships is constrained, of course, by the availability of physical storage space. If storage space is available, an appropriate alternative to storing complex pointer structures is to store relationships as binary information sets. Such an approach makes it easy to manage information. That approach will be employed more often as minicomputers are set up to lend back-end data base management support to mainframe host computers that undertake all analytical processing functions.

A major concern with systems that support information structuring is the communication interface. Many alternatives exist, e.g. command languages, screen menus, menu tablets and light pens. A comparison of those communication devices in a cartographic context has not yet appeared in the literature. Therefore, their relative efficiency has yet to be established, even though those techniques have existed for quite some time. However, it may be impossible to specify their relative advantages because of context and user peculiarities.

Another major concern in systems that support information structuring is user design of the communication interface. A flexible user design supported by nonprocedural interaction is more complex to program than a procedural approach. A procedural approach to user interaction eliminates the need for a separate language decoder and simplifies semantic and syntactic error checking. However, the decoding logic must then be written as part of the application software, which reduces the extendability and generality of a system because the software must be reprogrammed continually. In either case, speed of communication is of critical importance. The user should be allowed to proceed at whatever pace suits best; advanced users expect quick input and output. Speed may be improved by programming all input and output management routines in an assembly language, but at the added expense of lost software transferability.

A within-context evaluation perspective of the information structuring capabilities of the CART-QUERY system tends to beg the question of whether the system can handle information structuring. Of course it can, because that is the intended purpose of the programming project; how it does so is the major question. An out-of-context evaluation perspective would pose a different set of questions and thus yield a different set of answers. CART-QUERY could have been compared to an 'ideal' GIS as set forth in Tomlinson et al. (1976), but, as mentioned in the introduction to Chapter 5, the ideal GIS focuses more on data input than on information manipulation.
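The store-or-derive guidance discussed earlier in this section (derive rapidly changing information, store stable relationships that are queried often) can be expressed as a small caching policy. This is a hypothetical sketch: the class name and the derive function are invented, and the threshold of ten simply echoes the rough figure given in the text rather than a tuned value.

```python
class RelationshipStore:
    """Derive a relationship on demand; store it once it is queried often enough.

    derive: hypothetical function computing a relationship for a key
            (for example, the neighbors of a feature).
    threshold: store after more than this many queries.
    volatile: keys whose underlying data change rapidly; these are always derived.
    """

    def __init__(self, derive, threshold=10, volatile=()):
        self.derive = derive
        self.threshold = threshold
        self.volatile = set(volatile)
        self.query_counts = {}
        self.stored = {}
        self.derivations = 0  # how many times derive() has actually run

    def query(self, key):
        if key in self.stored:
            return self.stored[key]
        self.query_counts[key] = self.query_counts.get(key, 0) + 1
        self.derivations += 1
        result = self.derive(key)
        if key not in self.volatile and self.query_counts[key] > self.threshold:
            self.stored[key] = result  # frequent and stable: keep it
        return result


rs = RelationshipStore(lambda feature: {feature + 1}, threshold=10, volatile={2})
for _ in range(15):
    rs.query(1)
print(rs.derivations)  # 11: derived eleven times, then served from storage
```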
CART-QUERY does not have an integrated data input subsystem, and different questions and answers are involved when evaluating research-oriented and production-oriented systems.

A number of topics for future research on interactive cartographic information systems can be derived from the conclusions in this section. Advances are needed on all research frontiers, and each advance raises further questions. An empirical comparison of network, relational, HBDS and other standard data models applied to the same cartographic data context is urgently needed to clarify their relative advantages and disadvantages. The literature review attempts to discuss the relative merits of the models, but an operationalized investigation would go much further. In concert with a comparison of data models, empirical tests of the process leading from information structuring through canonical structuring to data structuring could be undertaken to show the true merits of the data models and of an information structuring methodology.

The research reported here is only one small step toward a complete investigation of relationships in the data base design process. Further research is needed to specify analytically the nature of relationships and how they can best be handled in a data base environment. Deriving spatial relationships is often a time-consuming process, partly because of the way in which space is characterized in data structures. Representation of relationships is still an active topic for data structures and storage structures, and the best methods may not have been discovered yet. Further investigation of abstract data types and data structure operations as they pertain to spatial data structures is needed.

Another topic in need of research is user interface design, especially with an interdisciplinary focus spanning cartography, computer graphics and psychology. The advantages and disadvantages of procedural and nonprocedural interaction have not been fully investigated because of a lack of formal experimentation on the topic. In addition to user design, more research is needed with side-by-side comparisons of communication techniques at the terminal. The relative advantages of cartographic command languages and menus have not been thoroughly investigated and are therefore still an interesting topic for experimentation. Continued research on all of these topics is critical to advance our knowledge and to solve the problems of query processing for spatial information systems.

LIST OF REFERENCES

Abe, Norihiro; Masaharu Mizumoto; Jun-Ichi Toyoda; and Kohkichi Tanaka 1973 "Web Grammars and Several Graphs." Journal of Computer and Systems Science 7 (1973): 37-65.

Ackerman, E. A. 1957 "Resources for the Future, Inc. and Resource Use Education." Journal of Geography 51 (1957): 103-109.

Allen, J. P. B. and P. Van Buren 1971 Chomsky: Selected Readings. London: Oxford University Press, 1971.

Anderson, D. E., J. L. Angel and A. J. Gorny 1978 "World Data Bank II: Content, Structure and Application." Harvard Papers on Geographic Information Systems Vol. 2, G. Dutton, ed. Cambridge, MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Anderson, R. H. 1968 "Syntax-Directed Recognition of Hand-Printed Two-Dimensional Mathematics." Interactive Systems for Experimental Applied Mathematics (1968): 436-59.

Arms, S. 1970 MAP/MODEL System: System Description and User's Guide. Eugene, OR: Bureau of Governmental Research and Service, University of Oregon.

Bachman, C. W. 1969 "Data Structure Diagrams." Data Base 1, 2 (Summer 1969): 4-10.

Bartsch, R. and T. Vennemann 1972 Semantic Structures. Frankfurt: Athenäum Verlag, 1972.

Basoglu, U. and J. L. Morrison 1978 "The Efficient Hierarchical Data Structure for the U.S. Historical County Boundary Data File." Harvard Papers on Geographic Information Systems Vol. 4, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Baxter, R. S. 1978 "Data Structure Problems in Interactive Manipulations and Graphical Display of Nested Zone Thematic Maps." Harvard Papers on Geographic Information Systems Vol. 2, G. Dutton, ed., Cambridge, MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Bennett, J. L. 1976 "User Oriented Graphics Systems for Decision Support in Unstructured Tasks." Computer Graphics 10 (1976): 3-11.

Berge, C. 1976 Graphs and Hypergraphs. Amsterdam: North-Holland Publishing, 1976.

Betak, J. F. 1972 "Measuring Two-Dimensional Complexity: A Conceptual Structure." Pattern Recognition 4 (1972): 235-42.

1973 "Two-Dimensional Complexity Measures: A Preliminary Evaluation." Geographical Analysis 5 (1973): 5-15.

1975 "The Conceptual Locus of a Two-Dimensional Language: Some Implications for Human Responses to Two-Dimensional Displays." Geographical Analysis 7 (1975): 1-17.

Board, C. 1967 "Maps as Models." in Models in Geography, R. J. Chorley and P. Haggett, eds. London: Methuen, pp. 671-719.

Bonczek, R. H. 1976 "Theoretical Description of an Access Language for a General, Decision Support System." Ph.D. dissertation, Purdue University, 1976.

Bonczek, R. H., C. W. Holsapple and A. B. Whinston 1976 "Extensions and Corrections for the CODASYL Approach to Data Base Management." International Journal of Information Systems 2 (1976): 71-77.

Bonczek, R. H., C. W. Holsapple and A. B. Whinston 1977a "Observations on a Generalized Intelligent Query Processor for Decision Support." Technical Paper No. 600, West Lafayette, IN: Krannert Graduate School of Management Science, Purdue University, 1977.

Bonczek, R. H., C. W. Holsapple and A. B. Whinston 1977b "Processing Deep Structure in a Generalized Intelligent Query Processor for Decision Support." Technical Paper No. 612, West Lafayette IN: Krannert Graduate School of Management Science, Purdue University, 1977.

Bonczek, R. H., C. W. Holsapple and A. B. Whinston 1977c "Information Transferral within a Distributed Data Base Via a Generalized Mapping Language." The Computer Journal 21 (1977): 110-16.

Bonczek, R. H., C. W. Holsapple and A. B. Whinston 1977d "Design and Implementation of an Information Base for Decision Makers." Proceedings of National Computer Conference Montvale NJ: AFIPS (1977): 855-63.

Bonczek, R. H. and A. B. Whinston 1976 "A Generalized Mapping Language for Network Data Structures." International Journal of Information Systems 2 (1976): 171-85.

Bouillé, F. 1978 "Structuring Cartographic Data and Spatial Processes with the Hypergraph-Based Data Structure." Harvard Papers on Geographic Information Systems Vol. 5, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Bracchi, G. and D. Ferrari 1971 "A Language for Treating Geometrical Patterns in Two-Dimensional Space." Communications of ACM 14 (January 1971): 26-32.

Brassel, K. E. 1978 "A Topological Data Structure for Multi-Element Map Processing." Harvard Papers on Geographic Information Systems Vol. 4, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

1979 "Future Tasks in Cartographic Software Development." Third International Symposium on Computer Assisted Cartography, AUTO-CARTO III, San Francisco (January 1978): 127-30.

Bryant, N. A. and A. Zobrist 1978 "An Image Based Information System: Architecture for Correlating Satellite and Topological Data Bases." Harvard Papers on Geographic Information Systems Vol. 4, G. Dutton, ed. Cambridge MA: Harvard University.

Bubenko, J. A., Jr., S. Berild, E. Lindencrona-Ohlin and S. Nachmens 1976 "From Information Requirements to DBTG-Data Structures." Proceedings of Conference on Data: Abstraction, Definition and Structure in SIGPLAN Notices 8 (1976): 73-85.

Bunge, W. 1962 Theoretical Geography. Lund Studies in Geography Series C, No. 1. Cited in C. Board, "Maps as Models," in Models in Geography, R. J. Chorley and P. Haggett, eds. London: Methuen, 1967.

1968 The Philosophy of Maps. Discussion Paper No. 12, Ann Arbor, MI: University of Michigan, Department of Geography, 1968.

Carlson, E. ., J. Bennett, G. Giddings and P. Mantey 1974 "The Design and Evaluation of an Interactive Geo-Data Analysis and Display System." Information Processing 74, Proceedings of IFIP Congress, Stockholm (1974): 1057-61.

Chang, S. K., N. Donato, B. H. McCormick, J. Reuss and R. Rocchetti 1977 "A Relational Database System for Pictures." Proceedings of Workshop on Picture Data Description and Management Chicago, IEEE Society (1977): 142-49.

Chen, P. P. 1976 "The Entity-Relationship Model - Toward a Unified View of Data." ACM Transactions on Database Systems 1 (1976): 9-36.

Chomsky, N. 1957 Syntactic Structures. The Hague: Mouton, 1957.

1963 "Formal Properties of Grammars." Handbook of Mathematical Psychology, R. D. Luce, R. Bush and E. Galanter, eds. New York: John Wiley and Sons, Vol. 2, pp. 323-418.

1965 Aspects of the Theory of Syntax. Cambridge, MA: MIT Press, 1965.

Chrisman, N. R. 1974 "The Impact of Data Structures on Geographic Information Processing." AUTO-CARTO I, International Conference on Automation in Cartography, Reston, VA.

Chrisman, N. R. 1978a "Concepts of Space as a Guide to Cartographic Data Structures." Harvard Papers on Geographic Information Systems Vol. 7, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

1978b "Comments on Data Structures to be Discussed at an Advanced Study Symposium on Topological Data Structures for Geographic Information Systems." Harvard Papers on Geographic Information Systems Vol. 4, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

CODASYL Committee 1971 Data Base Task Group Report. New York: Association for Computing Machinery, 1971.

Codd, E. F. 1970 "A Relational Model of Data for Large Shared Data Banks." Communications of ACM 13 (1970): 378-87.

Cook, B. G. 1978 "The Structural and Algorithmic Basis of a Geographic Data Base." Harvard Papers on Geographic Information Systems Vol. 4, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Cook, B. G. and B. V. Johnson 1973 "A Computer Data Base for Regional Planning." Proceedings of the First Australian Conference on Urban and Regional Planning Information Systems, Newcastle, Queensland: np.

Cooke, D. F. 1978 "DIME Variants for Curvilinear Networks." Harvard Papers on Geographic Information Systems Vol. 4, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Cooke, D. F. and W. F. Maxfield 1967 "The Development of a Geographic Base File and Its Use in Mapping." Proceedings URISA 5 (September 1967): 207-18.

Corbett, J. P. 1975 "Topological Principles in Cartography." AUTO-CARTO II, International Conference on Automation in Cartography, Reston VA (1975): 61-65.

Dacey, M. F. 1970a "Linguistic Aspects of Maps and Geographic Information." Ontario Geography 5 (1970): 71-80.

1970b "The Syntax of a Triangle and Some Other Figures." Pattern Recognition 2 (1970): 11-31.

1971 "Poly: A Two Dimensional Language for a Class of Polygons." Pattern Recognition 3 (1971): 197-208.

Date, C. J. 1977 An Introduction to Data Base Systems. Second edition, Reading MA: Addison-Wesley, 1977.

de Dainville, Fr. S. J. 1964 Le langage des géographes. Cited in C. Board, "Maps as Models," in Models in Geography, R. J. Chorley and P. Haggett, eds. London: Methuen, 1967.

Della Vigna, P. and C. Ghezzi 1978 "Context-Free Graph Grammars." Information and Control 37 (1978): 207-33.

Donelson, W. C. 1978 "Spatial Management of Information." Computer Graphics 12 (1978): 203-09.

Durfee, R. C. 1974 "ORMIS: Oak Ridge Regional Modeling Information System, Part I." ORNL-NSF-EP-73. Oak Ridge TN: Oak Ridge National Laboratory, 1974.

Dutton, G., ed. 1978 Harvard Papers on Geographic Information Systems. 8 vols. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Eckert, M. 1908 "On the Nature of Maps and Map Logic." Bulletin of American Geographical Society 40 (1908): 344-51.

Eden, M. 1961 "On the Formalization of Handwriting." Proceedings of the AMS, Applied Mathematics Symposia 12 (1961): 83-88.

1962 "Handwriting and Pattern Recognition." IRE Transactions of Information Theory IT-8, 2 (February 1962): 160-66.

Edson, D. T. 1975 "A Digital Cartographic Data Base." AUTO-CARTO II, International Conference on Computer Assisted Cartography, Reston, VA (1975): 523-38.

Edson, D. T. and G. Y. G. Lee 1977 "Ways of Structuring Data Within a Digital Cartographic Data Base." Computer Graphics 11 (1977): 148-57.

Edwards, R. G., R. C. Durfee and P. R. Coleman 1978 "Definition of a Hierarchical Polygonal Data Structure and the Associated Conversion of a Geographic Base File from Boundary Segment Format." Harvard Papers on Geographic Information Systems Vol. 4, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Einstein, A. 1953 Foreword to first edition of Max Jammer, Concepts of Space: The History of Theories of Space in Physics. 2nd edition, Cambridge MA: Harvard University Press, 1956.

Elassal, A. A. 1978 "U.S.G.S. Digital Cartographic File Management System." Proceedings of Digital Terrain Models Symposium St. Louis, ASP, ACSM (May 1978): 16-23.

Feagas, R. G. 1978 "The Graphic Input Procedure: An Operational Line Segment/Polygon Graphic to Digital Conversion." Harvard Papers on Geographic Information Systems Vol. 7, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Feder, J. 1966 "The Linguistics Approach to Pattern Analysis: A Literature Survey." Report #400-133, New York: Department of Electrical Engineering, New York University.

1971 "Plex Languages." Information Sciences 3 (1971): 225-41.

George, J. E. 1971 "GEMS - A Graphical Experimental Meta System." Ph.D. dissertation, Stanford University, Ann Arbor MI: University Microfilms.

1972 "A Graphical Meta System." Graphic Languages, F. Nake and A. Rosenfeld, eds. Amsterdam: North-Holland Publishing Co. (1972): 83-105.

Go, A., M. Stonebraker and C. Williams 1975 "An Approach to Implementing a GEO-DATA System." Proceedings of 1975 ACM SIGGRAPH-SIGMOD Workshop on Large Databases for Interactive Design Waterloo, Ontario (September 1975): 67-77.

Gold, C. M. 1978 "The Practical Generation and Use of Geographic Triangular Element Data Structures." Harvard Papers on Geographic Information Systems Vol. 5, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Gray, J. C. 1967 "Compound Data Structure for Computer-Aided Design - A Survey." Proceedings of ACM 22nd National Conference Wayne, PA (1967): 355-65.

Guptill, S. C. 1978 "The Impact on Computer Graphics Data Manipulation Software, and Computing Equipment on Spatial Data Structures." Harvard Papers on Geographic Information Systems Vol. 2, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Hainaut, J. and B. Lecharlier 1974 "An Extensible Semantic Model of Data Base and Its Data Language." Information Processing 74, Proceedings of IFIP Congress, Stockholm, Sweden (1974): 1026-30.

Haralick, R. M. and L. G. Shapiro 1978 "A Data Structure for Spatial Information Systems." Proceedings of the International Symposium on Computer-Assisted Cartography, AUTO-CARTO IV, ACSM, Washington D.C. (November 1979).

Harvey, D. 1969 Explanation in Geography. London: Edward Arnold, 1969.

Haseman, W. D. and A. B. Whinston 1977 Introduction to Data Management. Homewood IL: Richard D. Irwin, Inc., 1977.

Heindel, . E. and J. T. Roberto 1975 LANG-PAK - An Interactive Language Design System. New York: American Elsevier Publishing Company, Inc., 1975.

Holmes, H., D. Austin and W. Benson 1974 "The MAPEDIT System for Automatic Map Digitization." Technical Papers of the ACSM 40th Annual Meeting, St. Louis (March 1974).

IEEE Computer Society 1975 Proceedings, Conference on Computer Graphics, Pattern Recognition and Data Structures. Los Angeles (1975).

1977 Proceedings of Workshop on Picture Data Description and Management. Chicago (April 1977).

Jammer, M. 1956 Concepts of Space: The History of Theories of Space in Physics. Cambridge MA: Harvard University Press, 2nd edition, 1956.

Kirsch, R. 1964 "Computer Interpretation of English Text and Picture Patterns." IEEE Transactions EC-13 (August 1964): 363-76.

Klinger, A., K. S. Fu and T. L. Kunii 1977 Data Structures, Computer Graphics and Pattern Recognition. New York: Academic Press, 1977.

Kulsrud, H. E. 1968 "A General Purpose Graphic Language." Communications of ACM 11 (April 1968): 247-54.

Kunii, T. L., S. Weyl and J. M. Tenebaum 1974 "A Relational Data Base Schema for Describing Complex Pictures with Color and Texture." Proceedings of Second International Joint Conference on Pattern Recognition Copenhagen (August 1974): 310-16.

Lakoff, G. 1971 "On Generative Semantics." Semantics: An Interdisciplinary Reader in Philosophy, Linguistics and Psychology, Steinberg and Jakobovits, eds. Cambridge: Cambridge University Press (1971): 232-96.

Ledley, R. S. 1963 Programming and Utilizing Digital Computers. New York: McGraw-Hill, 1963.

1964 "High Speed Automatic Analysis of Biomedical Pictures." Science 146 (1964): 216-23.

Linders, J. G. 1975 "Computer Assisted Cartography and Geographic Data Bases." Data Bases for Interactive Design. W. van Cleemput and J. G. Linders, eds. Waterloo, Ontario: ACM (1975): 161-69.

Liskov, B. and S. Zilles 1974 "Programming with Abstract Data Types." SIGPLAN Notices 9 (April 1974).

Little, 0. J. 1978 "Strategies of Interfacing Geographic Information Systems." Harvard Papers on Geographic Information Systems Vol. 5, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

McKeown, D. M., Jr. and D. R. Reddy 1977 "A Hierarchical Symbolic Representation for an Image Database." Proceedings of Workshop on Picture Data Description and Management IEEE Computer Society, Chicago (April 1977): 40-44.

Males, R. 1978 "ADAPT - A Spatial Data Structure for Use with Planning and Design Models." Harvard Papers on Geographic Information Systems Vol. 3, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Mark, D. 1978 "Concepts of 'Data Structure' for Digital Terrain Models." Proceedings of Digital Terrain Models Symposium ASP, St. Louis MO (1978): 24-31.

Martin, J. 1977 Computer Data-Base Organization. Second edition, Englewood Cliffs NJ: Prentice-Hall, 1977.

Merrill, R. 1973 "Representation of Contours and Regions for Efficient Computer Search." Communications of the ACM 16 (1973): 69-82.

Miller, W. and A. Shaw 1968 "Linguistic Methods in Picture Processing: A Survey." Proceedings FJCC 33 pt. 1 (1968): 279-90.

Moellering, H. 1975 "Interactive Cartography." AUTO-CARTO II. Second International Conference on Automation in Cartography, Reston VA (1975): 415-21.

1976 "Real and Virtual Maps." Paper presented at the meetings of the Association of American Geographers, Salt Lake City, 1976.

Moellering, H. 1977 "Interactive Cartographic Design." Technical Papers of the ACSM 40th Annual Meeting Washington D.C. (1977): 516-30.

1980 "Strategies of Real Time Cartography." forthcoming in Cartographic Journal, 1980.

Moles, A. A. 1964 "Théorie de l'information et message cartographique." Sciences et Enseignement des Sciences 5 (1964): 11-16. Cited in C. Board, "Maps as Models," in Models in Geography R. J. Chorley and P. Haggett, eds. London: Methuen, 1967.

Montanari, U. G. 1970 "Separable Graphs, Planar Graphs and Web Grammars." Information and Control 16 (1970): 243-367.

Morrison, J. L. 1974a "Changing Philosophical-Technical Aspects of Thematic Cartography." The American Cartographer 1 (1974): 5-14.

1974b "A Theoretical Framework for Cartographic Generalization with Emphasis on the Process of Symbolization." International Yearbook of Cartography XIV (1974): 115-27.

1976 "The Science of Cartography and Its Essential Processes." International Yearbook of Cartography XVI (1976): 84-97.

1978 "Towards a Functional Definition of the Science of Cartography." The American Cartographer 5 (1978): 97-110.

Nake, F. and A. Rosenfeld, Eds. 1972 "Panel Discussion - Are Picture-Grammars of Any Use in Scene Analysis?" Graphic Languages. Amsterdam: North-Holland Publishing Co. (1972): 225-43.

Narasimhan, R. 1962 A Linguistic Approach to Pattern Recognition. Digital Computer Laboratory Report No. 121. Urbana IL: University of Illinois.

1964 "Labeling Schemata and Syntactic Descriptions of Pictures." Information and Control 1 (1964): 151-79.

Narasimhan, R. 1966 "Syntax-Directed Interpretation of Classes of Pictures." Communications of the ACM 9 (1966): 166-73.

1974 "The Role of Syntactic Models in Picture Processing." Information Processing 74, Proceedings of IFIP Congress, Stockholm (1974): 743-47.

Nauta, D. 1972 The Meaning of Information. The Hague: Mouton, 1972.

Ota, P. A. 1975 "Mosaic Grammars." Pattern Recognition 7 (1975): 61-65.

Palmer, I. R. 1974 "Levels of Data Base Description." Information Processing 74. Proceedings of IFIP Congress, Stockholm (1974): 1026-30.

Pavlidis, T. 1972a "Grammatical and Graph Theoretic Analysis of Pictures." in Graphic Languages, F. Nake and A. Rosenfeld, eds. Amsterdam: North-Holland Publishing Co. (1972): 210-24.

1972b "Linear and Context-Free Graph Grammars." Journal of ACM 19 (1972): 11-22.

Petchenik, B. B. 1974 "A Verbal Approach to Characterizing the Look of Maps." American Cartographer 1 (1974): 63-71.

Peucker, T. K. 1972 Computer Cartography. Commission on College Geography Resource Paper No. 17, Association of American Geographers.

1974 The Interactive Map in Urban Research: Report After Year One. Vancouver: University of British Columbia, 1974.

1978 "Data Structures for Digital Terrain Models: Discussion and Comparison." Harvard Papers on Geographic Information Systems Vol. 2, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Peucker, T. K. and N. Chrisman 1975 "Cartographic Data Structures." The American Cartographer 2 (1975): 55-69.

Peuquet, D. 1979 "Raster Data Handling in Geographic Information Systems." The American Cartographer 6 (1979): 129-39.

Pfaltz, J. L. and A. Rosenfeld 1969 "Web Grammars." Proceedings of First International Joint Conference on Artificial Intelligence Washington, D.C. (1969): 609-19.

Phillips, R. L. 1974 "Computer Graphics in Urban and Environmental Systems." Transactions of IEEE 62 (1974): 437-52.

1977 "A Query Language for a Network Data Base with Graphical Entities." Computer Graphics 11 (1977): 179-85.

Rhind, D. 1976 "Towards Universal, Intelligent and Usable Automated Cartographic Systems." ITC Journal (1976): 515-45.

1977 "Computer-Aided Cartography." Institute of British Geographers, Transactions New Series 2 (1977): 71-97.

Riffe, P. 1970 "Conventional Map, Temporary Map, or Nonmap?" International Yearbook of Cartography X (1970): 95-103.

Roberts, J. A. 1962 "The Topographic Map in a World of Computers." Professional Geographer 14 (1962): 12.

Robinson, A. H. 1952 The Look of Maps. Madison WI: University of Wisconsin Press, 1952.

1960 Elements of Cartography. Second edition. New York: John Wiley and Sons, 1960.

1977 "Research in Cartographic Design." The American Cartographer 4 (October 1977): 63-69.

Robinson, A. H. and B. Bartz-Petchenik 1976 The Nature of Maps. Chicago: The University of Chicago Press, 1976.

Robinson, A. H., J. L. Morrison and P. C. Muehrcke 1977 "Cartography 1950-2000." Institute of British Geographers, Transactions 2 (1977): 3-18.

Robinson, A. H., R. Sale and J. L. Morrison 1978 Elements of Cartography. Fourth edition, New York: John Wiley and Sons, 1978.

Rosenfeld, A. 1969 "Picture Processing by Computer." Computing Surveys 1 (1969): 147-74.

1975 "A Survey of Picture Processing: 1974." Computer Graphics and Image Processing 4 (1975): 133-35.

Rosenfeld, A. and A. C. Kak 1976 Digital Picture Processing. New York: Academic Press, 1976.

Rosenfeld, A. and D. L. Milgram 1972 "Web Automata and Web Grammars." Machine Intelligence 7 (1972): 307-24.

Rosenfeld, A. and J. P. Strong 1971 "A Grammar for Maps." Proceedings of Computer and Information Sciences, Software Engineering Miami Beach, 1969, J. Tou, ed. New York: Academic Press (1971): 227-39.

Schmidt, A. H. and W. A. Zafft 1975 "Programs of the Harvard University Laboratory for Computer Graphics and Spatial Analysis." Display and Analysis of Spatial Data J. C. Davis and M. J. McCullagh, eds. London: John Wiley (1975): 231-43.

Schmidt, W. 1969 "The Automap System." Survey and Mapping (March 1969): 101-06.

Schmidt-Falkenburg, H. 1962 "Grundlinien einer Theorie der Kartographie." Nachrichten aus dem Karten- und Vermessungswesen 1 (1962): 5-37. Cited in C. Board, "Maps as Models," in Models in Geography R. J. Chorley and P. Haggett, eds. London: Methuen, 1967.

Scripter, M. W. 1969 "Choropleth Maps on Small Digital Computers." Proceedings of the Association of American Geographers (1969): 133-36.

Senko, M. E. 1976 "DIAM II and Levels of Abstraction." Proceedings of Conference on Data: Abstraction, Definition and Structure in SIGPLAN Notices 8 (March 1976): 121-40.

Senko, M. E., E. B. Altman, M. M. Astrahan and P. L. Fehder 1973 "Data Structures and Accessing in Data Base Systems." IBM Systems Journal 12 (1973): 30-93.

Shamos, M. I. and J. L. Bentley 1978 "Optimal Algorithms for Structuring Geographic Data." Harvard Papers on Geographic Information Systems Vol. 6, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Shannon, C. E. and W. Weaver 1949 The Mathematical Theory of Communication. Urbana IL: University of Illinois Press, 1949.

Shapiro, L. G. 1979 "Data Structures for Picture Processing: A Survey." Computer Graphics and Image Processing 11 (1979): 162-84.

Shapiro, L. G. and R. M. Haralick 1979 "A Spatial Data Structure." Technical Report #C579005-R, Blacksburg VA: Computer Science Department, Virginia Polytechnic Institute and State University.

Shaw, A. C. 1968 "The Formal Description and Parsing of Pictures." Ph.D. dissertation, Stanford University, Ann Arbor MI: University Microfilms, 1968.

1969a "A Formal Picture Description as a Basis for Picture Processing Systems." Information and Control 14 (1969): 9-52.

1969b "On the Interactive Generation and Interpretation of Artificial Pictures." SLAC-PUB-664, Stanford Linear Accelerator Center, Stanford University. Presented at the 1969 ACM/SIAM/IEEE Conference on Mathematics and Computer Aids to Design, Anaheim CA.

Shaw, A. C. 1970 "Parsing of Graph-Representable Pictures." Journal of ACM 17 (July 1970): 453-81.

1972 "Picture Graphs, Grammars and Parsing." Frontiers of Pattern Recognition S. Watanabe, ed. New York: Academic Press (1972): 492-510.

Simpson, S. N., Jr. 1954 "Least Squares Polynomial Fitting to Gravitation Data and Density Plotting by Digital Computers." Geophysics 19 (1954): 250-57.

Smith, J. M. and D. C. P. Smith 1977 "Database Abstractions: Aggregation and Generalization." ACM Transactions on Database Systems 2 (1977): 105-33.

Switzer, W. A. 1975 "The Canadian Geographic Information System." Automation in Cartography, J. M. Wilford-Brickford, R. Bertrand and L. van Zuylen, eds. The Netherlands: International Cartographic Association (1975): 58-81.

Taketa, R. A. 1979 "Structure and Meaning in Map Generalization." Ph.D. dissertation, University of Washington, 1979.

Tanimoto, S. L. and T. Pavlidis 1975 "A Hierarchical Data Structure for Picture Processing." Computer Graphics and Image Processing 4 (1975): 104-19.

Thomas, A. L. 1978 "Data Structures for Modeling Polygonal and Polyhedral Objects." Harvard Papers on Geographic Information Systems Vol. 5, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Tobler, W. 1959 "Automation and Cartography." Geographical Review 49 (1959): 526-34.

1969 "Geographic Filters and Their Inverses." Geographical Analysis 1 (1969): 234-53.

1976 "Analytical Cartography." The American Cartographer 3 (1976): 21-31.

Tomlinson, R. F. 1968 "A Geographic Information System for Regional Planning." Land Evaluation. Papers of CSIRO, G. A. Steward, ed. South Melbourne, Australia: MacMillan of Australia (1968): 200-10.

1972 Geographical Data Handling. Proceedings of the International Geographical Union, vol. 2, Ottawa, Canada.

1979 "Difficulties Inherent in Organizing Earth Data in a Storage Form Suitable for Query." Third International Symposium on Computer Assisted Cartography, AUTO-CARTO III San Francisco (January 1978): 181-201.

Tomlinson, R. F., H. W. Caulkins and D. F. Marble 1976 Computer Handling of Geographical Data. Paris: UNESCO Press, 1976.

Tompa, F. I. 1977 "Data Structure Design." Data Structures, Computer Graphics and Pattern Recognition. A. Klinger, K. S. Fu and T. L. Kunii, eds. New York: Academic Press (1977): 3-30.

Töpfer, F. and W. Pillewizer 1966 "The Principles of Selection," (with Introduction by S. H. Maling) The Cartographic Journal 3 (1966): 10-16.

Toulmin, S. 1953 The Philosophy of Science: An Introduction. London: Hutchinson and Co., 1953.

Ward, J. H. 1963 "Hierarchical Grouping to Optimize an Objective Function." Journal of the American Statistical Association 58 (1963): 236-44.

Waugh, T. C. and D. R. F. Taylor 1976 "GIMMS/An Example of an Operational System for Computer Cartography." Canadian Cartographer 13 (1976): 158-66.

Weber, W. 1978 "Three Types of Map Data Structures, Their ANDs and NOTs and a Possible OR." Harvard Papers on Geographic Information Systems Vol. 4, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.

Westman, R. S. 1977 "Environmental Languages and the Functional Bases of Animal Behavior." Quantitative Methods in the Study of Animal Behavior B. A. Hazlett, ed. New York: Academic Press (1977): 145-201.

White, M. 1979 "A Survey of the Mathematics of Maps." Proceedings of the Fourth International Symposium on Computer Assisted Cartography, AUTO-CARTO IV Reston VA (November 1979).

Wiederhold, G. 1977 Database Design. New York: McGraw-Hill, 1977.

Williams, R. 1971 "A Survey of Data Structures for Computer Graphics Systems." Computing Surveys 3 (1971): 1-21.

1974 "On the Application of Relational Data Structures in Computer Graphics." Proceedings of IFIP Congress, Information Processing 74, Stockholm, Sweden: North-Holland (1974): 722-26.

Wirth, N. 1975 Algorithms + Data Structures = Programs. Englewood Cliffs NJ: Prentice-Hall, 1976.

Woods, W. A. 1975 "What's in a Link: Foundations for Semantic Network." Representation and Understanding D. G. Bobrow and A. Collins, eds. New York: Academic Press (1975): 35-82.

Wright, J. 1942 "Map Makers are Human: Comments on the Subjective In Maps." Geographical Review 32 (1942): 527-44.

Youngman, C. E. 1978 "A Linguistic Approach to Map Description." Harvard Papers on Geographic Information Systems Vol. 4, G. Dutton, ed. Cambridge MA: Harvard University Laboratory for Computer Graphics and Spatial Analysis.