
The Pennsylvania State University

The Graduate School

College of Earth and Mineral Sciences

AUGMENTING COLLABORATION THROUGH SITUATED

REPRESENTATIONS OF SCIENTIFIC KNOWLEDGE

A Thesis in

Geography

by

William A. Pike

© 2005 William A. Pike

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

May 2005

The thesis of William A. Pike was reviewed and approved* by the following:

Mark N. Gahegan Professor of Geography Thesis Adviser Chair of Committee

James Martin Associate Professor of Psychology, Emeritus

Brent M. Yarnal Professor of Geography

Roger M. Downs Professor of Geography Head of the Department of Geography

* Signatures are on file in the Graduate School

Abstract

Information systems that support scientific collaboration often facilitate the sharing of tangible resources, such as data files, as a proxy for sharing the knowledge embedded in or emerging from those resources. Current computational aids to science work thus do little to support a knowledge-based process; the human knowledge that creates meaning out of analyses is often only recorded when work reaches publication – or worse, left unrecorded altogether – for lack of an abstract model for scientific concepts that can capture knowledge as it is created and used. In this research, concepts rather than datasets are treated as the primitive elements of scientific inquiry. A model for scientific concepts is developed that incorporates representation of (1) the situated processes of science work, (2) the social construction of knowledge, and (3) the emergence and evolution of understanding over time. In this model, knowledge is the result of collaboration, negotiation, and manipulation by teams of researchers. Capturing the situations in which knowledge is created and used helps these collaborators discover areas of agreement and discord, while allowing individual inquirers to maintain different perspectives on the same information. The capture of provenance information allows historical trails of reasoning to be reconstructed, revealing the process by which knowledge is adopted, revised, and reused in a community. This work leverages advances in cyberinfrastructure and the Semantic Web to produce a proof-of-concept system, called Codex, based on this situated knowledge model. Codex supports visualization of knowledge structures and inference across those structures. The proof-of-concept is deployed in two collaborative application contexts, human-environment interaction and geoscience. These use cases demonstrate the viability of Codex to support distributed teams of learners and researchers by encouraging greater appreciation for shared understanding.

Table of Contents

List of Figures
Acknowledgements
Chapter 1 – Introduction
1.1 ‘As We May Think’
1.2 The Problem
1.3 Goals and Objectives
1.4 Application Cases
1.4.1 Application to Geographic and Geologic Cyberinfrastructure
1.5 Summary of impacts
1.6 Outline of the Thesis
Chapter 2 – The Construction of Knowledge
2.1 The Ingredients of Knowledge
2.1.1 Knowledge
2.1.2 Concepts and Contexts
2.1.3 Situations and Perspectives
2.2 The Nature of Scientific Knowledge
2.2.1 Aristotelian Categorization
2.2.2 Kantian Judgment
2.2.3 Peircean Inquiry
2.2.4 Hermeneutic Interpretation
2.3 Contemporary Motifs in Scientific Cognition
2.3.1 Positivism and Paradigms
2.4 Summary
Chapter 3 – Knowledge in Computing and Geography
3.1 Do Our Tools Do Justice to Our Ideas?
3.2 Trends in Knowledge-Based Computing
3.2.1 Computational Knowledge Representations
3.2.2 Computer-Supported Cooperative Work
3.2.3 Cyberinfrastructure
3.3 Handling Knowledge in Geographic Computing
3.4 Toward Situated Knowledge in Information Systems
3.4.1 Interlude: Tying a Brick to a Pencil
3.4.2 Computing with Perspectives
3.5 Summary
Chapter 4 – Untying the Brick: Situating Knowledge in a Distributed Computational Environment
4.1 Toward a Modern Memex
4.2 Infrastructure for Collaborative Understanding: From Memex to Codex
4.2.1 Codex: A Knowledge-Based Portal
4.2.2 Codex Architecture
4.3 Modeling Knowledge in Codex
4.3.1 Concepts and Contexts
4.3.2 Situations and Perspectives
4.4 Interacting with Situated Concept Networks
4.4.1 Describing Knowledge Structures with Concept Maps
4.4.2 Creating and Using Situations through Concept Mapping
4.5 Summary
Chapter 5 – Putting Perspectives into Practice
5.1 Assessing the Impact of New Tools
5.2 Rubrics for Evaluating Knowledge Management Systems
5.3 Use Cases
5.3.1 Aiding Pedagogy in Human-Environment Relations
5.3.2 An Idea Lab for Distributed Geoscience
5.4 Achieving the Objectives of Situated Science
5.5 Summary
Chapter 6 – Social Wisdom through Situated Computing
6.1 Memex Revisited
6.2 The Philosophy of Science and the Science of Computing
6.3 Practicing Situated Science
6.4 Making Social Knowledge Sharing Better
6.4.1 Richer Concept Definitions
6.4.2 Improved Perspective Comparison
6.4.3 The Knowledge-Driven Desktop
6.4.4 New Knowledge Sharing Modalities
6.5 Moving Knowledge across the Human-Computer Interface
References

List of Figures

Figure 2.1. Nexus of contextual elements.
Figure 2.2. Context surrounding the creation or use of an individual resource or concept.
Figure 2.3. The triadic relationship among elements of a …
Figure 2.4. The progression from Aristotelian classification to hermeneutic interpretation.
Figure 3.1. XML, RDF, and OWL.
Figure 3.2. OWL snippet from SWEET physical substances ontology.
Figure 3.3. Existential and Conceptual Graphs.
Figure 3.4. Traditional information system design.
Figure 3.5. Hermeneutic view of information systems.
Figure 4.1. Excerpt from Codex Leicester.
Figure 4.2. 1945 rendering of Bush’s Memex as a mechanical desktop.
Figure 4.3. The home page for a Codex user’s workspace.
Figure 4.4. Codex architecture.
Figure 4.5. Semiotic view of Concept structure in Codex.
Figure 4.6. Simplified OWL implementation of a representative concept in Codex.
Figure 4.7. Detecting inferential situations in Codex resource descriptions.
Figure 4.8. Perspectives filter a complex information space according to particular situations.
Figure 4.9. The Memex user interface.
Figure 4.10. Codex concept map client.
Figure 4.11. Flow of control for a sample “add new resource” action in Codex.
Figure 4.12. Sample Codex control panels.
Figure 4.13. Four perspectives on a “seismic velocity”.
Figure 5.1. Reflexive development of both human understanding and technological systems.
Figure 5.2. Electronic notebook based on a static directory structure.
Figure 5.3. Existing ontological structure used to extend a user-defined structure.
Figure 5.4. The union of multiple user perspectives.
Figure 5.5. The search for consilience between two knowledge structures.
Figure 5.6. Using Codex to reconstruct the creation of meaning.
Figure 5.7. Extending the sea level concept.
Figure 5.8. Using Codex to register a concept at the CoI level to different CoP ontologies.
Figure 5.9. Evolution of “Depositional environment” concept through use.
Figure 5.10. Trails of signification in Codex concept representation.
Figure 6.1. Codex users can select a group with which to share each resource.
Figure 6.2. Describing concept intension through dimensional and fuzzy approaches.

Acknowledgements

I am grateful to my committee members for their guidance, not just on this product, but on my intellectual and personal growth over the years I spent at Penn State. I owe particular thanks to my advisor, Mark Gahegan, for his unwavering support and even occasional like-mindedness. The GEON and HERO projects were full of valuable collaborators who, often indirectly but sometimes frankly, refocused my thinking and ensured that it remained relevant to real problems. Both Mark and Brent, in their capacity as my supervisors on those projects, extended me a great deal of autonomy in pursuing my interests, which was an extraordinary luxury indeed.

To my friend Isaac, who always stayed just one step ahead of me in his own graduate work, thank you for scouting the territory that I would soon cross. Ola Ahlqvist was an indispensable discussion partner with whom I spent many hours working through hard problems, even when we both knew there would be no solution. My Great Aunt Ruth, the first of the Dr. Pikes, provided sage advice and gustatory support throughout my graduate career. Finally, special gratitude is due my wife Lindsay, companion and resident grammarian, for teaching me when it’s acceptable to end a sentence with a participle, what the difference between were and was is, and most important, when it’s ok to just stop working altogether.

Chapter 1 – Introduction

Science has provided the swiftest communication between individuals; it has provided a record of ideas and has enabled man to manipulate and to make extracts from that record so that knowledge evolves and endures throughout the life of a race rather than that of an individual.

There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers - conclusions which he cannot find time to grasp, much less to remember, as they appear. Yet specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial. Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose. If the aggregate time spent in writing scholarly works and in reading them could be evaluated, the ratio between these amounts of time might well be startling.

…A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted. (Bush, 1945)

1.1 ‘As We May Think’ Vannevar Bush’s vision for the future of a scientific enterprise in postwar transition expresses a refrain that remains common today. The scientific community’s ability to generate new information – ever more detailed observations, about more diverse phenomena – outpaces its ability to turn these measurements into useful knowledge. What insight was discovered and then forgotten, or discovered but never communicated?

Bush is often credited with anticipating modern technological aids to science work, such as hypertext-linked online libraries. But the implications of his vision suggest an even more fundamental rethinking of how scientists and other information workers create, revise, and communicate understanding. The title Bush gave to his argument, As We May Think, suggests that it is the practice of science, not just the products, that reveals understanding. The technological advances that Bush foresaw need to be seen not just as ways to access stored knowledge but as aids to navigating and constructing representations of knowledge as we think. Recognizing that understanding originates from networks of ideas that transcend the boundaries of single information sources, disciplines, people, and perspectives, Bush looks toward a recordkeeping mechanism that can mimic, to a certain degree, the deep structure of scientific explanations created in the mind. The tool he proposes, called Memex (discussed further in Chapter 3), emphasizes an associationist perspective in which the act of making connections between information resources is central to scientific discovery.

Despite the speed and flexibility with which we can access contemporary digital resources, we realize that mere access to hyperlinked trails of information is not enough to render scientific problems more tractable. Hypertext links can provide threads through dense information spaces,

but they do little to express the understanding that goes into the creation or use of such a thread (although in fairness this is a limitation by design). For some of the purposes Bush envisioned, such as encyclopedic cross-referencing of terms, hypertext is an appropriate and perhaps even ideal solution, but for the wider problem of encoding deep knowledge structures – the how and why of science – the purely syntactic expression of current scientific records is insufficient.

1.2 The Problem The sciences produce volumes of information, from hyperspectral sensors taking hundreds of measurements each hour, to the billions of words in journal articles published every year. The problem is not that there is no wisdom contained in these resources, nor that contemporary science is at a standstill for its inability to make sense of increasingly complex descriptions of the world – quite the opposite, and that is the problem. How do we make efficient and effective use of that knowledge? Even within a single discipline, the variety of information types and analytical methods brought to bear on a problem can complicate assessing commensurability between researchers’ approaches. Geography is a classic example of this difficulty, since geographers approach problems from subdisciplines characterized by both methodological and philosophical differences. While these differences are central to geography’s ability to tackle integrative problems, differences can also hinder synthesis between collaborators who do not understand, or do not appreciate, each other’s perspectives. A typically geographic problem such as land use change might be explored from the points of view of policy, image analysis, social justice, and climate change, among others. But the clearest picture of the problem might only be painted when these points of view are integrated into an explanation broader than any one alone could provide. The communication of conceptual models between collaborators is crucial to accomplishing this integration, especially in environmental applications (Heemskerk et al., 2003).

The geographic information science literature has recently become bloated with efforts to represent geographic “concepts” computationally, but, as will be discussed later, the prevailing view of a concept in geographic research is as a category label useful for integrating heterogeneous data sources. Geographic data contains knowledge, to be sure, and it is used to create and apply knowledge, but that knowledge is not yet represented well. As a result, geographic information integration tasks are often data-centric; concepts are important to the extent that they support data interoperability, but the human knowledge and practices that guided the collection or use of that data remain implicit somewhere in the data’s syntax or schema. A land cover map, for instance, says something about the place it depicts, although what it says to an individual researcher is either locked in the data, locked in the researcher’s head, or described elsewhere in natural language text. In any case, it is not easily accessible to others who want to know how or why to use this information (say, to devise a new theory), or whether it went into any existing theories. For a field in which meaning depends, in part, on the subjective perspectives of its inquirers, a restricted view of what constitutes a concept does not do justice to the complexity of geographic knowledge structures.

This work approaches the problem of capturing, storing, and communicating scientific knowledge by treating science foremost as a process. Knowledge is constructed and applied during this process as observations are collected and manipulated, hypotheses generated and tested, and results transmitted and built upon. Here, concepts rather than datasets are the

primitive elements of geographic inquiry (as one example of scientific inquiry in general). This research emphasizes interoperability of ideas, not simply data; it recognizes that the knowledge these ideas embody is by turns a shared and contested conceptualization, the result of collaboration, negotiation, and manipulation by teams of researchers.

While the characterization of science as a knowledge-construction activity is widely held, current computational aids to science work do little to support a knowledge-based process. Where processes are considered, they are often workflows at the lowest levels of abstraction (such as protocols for moving data through a series of analysis tools). The human knowledge that creates meaning out of this analysis is often only recorded when work reaches publication – or worse, left unrecorded altogether – for lack of an abstract model of scientific concepts that can capture knowledge as it is created and used. This thesis examines how we can make the scientific record more useful by creating just such a model – a model unique in geographic information science for its explicit support of individual perspectives on a problem. Leveraging contemporary computational techniques, scientific concepts will be represented as cooperatively constructed, experientially grounded, and semantically interoperable resources capable of reflecting their evolution, in multiple contexts, over time.
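The kind of representation described above can be sketched in miniature. The snippet below models a concept as something more than a category label: it carries per-perspective definitions and a history of the situations in which it was revised. All class and field names here are hypothetical illustrations, not the schema actually developed later in this thesis.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Revision:
    """One situated act of creating or revising a concept."""
    author: str
    timestamp: datetime
    note: str

@dataclass
class Concept:
    """A concept treated as a primitive of inquiry, not a bare category label."""
    name: str
    definitions: dict = field(default_factory=dict)  # perspective -> wording
    history: list = field(default_factory=list)      # list of Revision records

    def revise(self, perspective, wording, author, note=""):
        """Record both the new definition and the situation of its revision."""
        self.definitions[perspective] = wording
        self.history.append(Revision(author, datetime.now(), note))

# The same concept holds different definitions under different perspectives,
# while its history preserves how understanding evolved over time.
c = Concept("vulnerability")
c.revise("policy", "exposure of populations to regulatory gaps", "alice")
c.revise("climate", "sensitivity of a place to environmental change", "bob")
```

The point of the sketch is the pairing: every change to a definition also deposits a provenance record, so the concept is never separated from the process that produced it.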

1.3 Goals and Objectives Recent approaches to knowledge sharing in computational environments fall into two broad categories. One devises tools and methods to describe the semantics of concepts in machine-readable formats, often modeling knowledge through hierarchical ontologies. This approach is characterized by projects such as Cyc (Guha and Lenat, 1991), a “general knowledge” base and associated text-based browser for encyclopedic information (a sort of top-down, authoritative knowledge model). Ontological tools such as this focus mainly on enabling sharable underlying representations of knowledge (ontologies in standard exchangeable notations),1 and less on interfaces and supporting infrastructure to let collaborators construct this knowledge together. The second category of knowledge-sharing methods emphasizes the bottom-up, discursive nature of knowledge. This approach acknowledges the perspectives of collaborating inquirers (rather than an imposed ontology) in defining concepts relevant to a community. The cooperative approach is evident in computer-mediated communication methods such as the Delphi method (Turoff and Hiltz, 1996), where the aim is to generate shared understanding (or areas of disagreement) over time. Cooperative tools focus on effective interfaces to collaborative work, but at the expense of underlying representations; the records of that work are often not interoperable and cannot be repurposed in other systems.

Between these two views of knowledge sharing lies an opportunity for combining machine-readable, standardized representations of knowledge with the ability for communities to elicit and refine them over time. The research described here leverages the benefits of both approaches to develop knowledge structures and associated infrastructure. While ontological representations of knowledge (at various levels of generality, from task to domain (Guarino, 1997)) have been described as no less than a “silver bullet” (Fensel, 2001) for information integration – and have been rapidly accepted in the geographic information science community – there has been relatively little reflection on how conceptual structures emerge from practice and how they can reflect the evolving nature of that practice. The top-down imposition of an ontologist’s domain model is an easy (and sometimes effective) method of describing the concepts relevant to a problem, but it can come at the expense of understanding how practitioners themselves construct meaning collaboratively, especially in ill-structured problems where there may not even be initial agreement about the nature of the domain itself. Usually, it is the experts in ontology who determine how to represent a domain, not communities of practitioners. Moreover, the semantics that are captured by common ontological approaches often fail to reflect the fluidity with which concepts and their relations change through cooperative work (here, semantics refers to the relationships between symbols in a representational notation and their meaning in the mind). Elusive concepts in geography, such as human vulnerability to environmental change, are not easily described using static ontologies, let alone ontologies that are intended to apply across an entire domain.

1 This is the meaning of ontology in information science, where ontology amounts to formally describing the salient entities in a domain, and their categories and relations. In philosophy, ontology refers more broadly to questions of being. Throughout this work the information science definition will be used, and it will be against this definition that Chapter 2 broadens the discussion of knowledge representation to accommodate philosophical concerns.
Methods from the cooperative work community, on the other hand, address the bottom-up nature of knowledge construction but generally lack semantic richness; the expressions of meaning they produce are not routinely grounded in knowledge representations that allow concepts to be efficiently shared, searched, and reused in other problems or by other tools. The present study brings cooperative construction and emergence to ontologies, and richer semantics to cooperative tools.

This work will create a formal model for the computational representation of concepts that incorporates elements of both ontological structure and situated evolution. The goals in furtherance of this research are threefold:

(1) Theory: Develop a theoretical model for concept representation that accounts for evolution and context. The model will be described in both natural and diagrammatic languages.

(2) Implementation: Translate the theoretical model into a schema for a standard notation for knowledge sharing in computational environments (e.g., OWL, the Web Ontology Language (McGuinness and van Harmelen, 2003)). The model will serve as a foundational structure to support the representation of evolving knowledge in collaborative systems.

(3) Evaluation: Demonstrate that (i) the implementation accurately reflects the theoretical model, (ii) it can be integrated into the practice of science through concept capture and exploration activities, and (iii) it meets the requirements of user communities. The implemented model will be the core of a Web portal for cooperative knowledge capture and sharing.

The model for scientific concepts that will be developed here is intended to occupy a level of abstraction above that of the category label or ontological hierarchy; a category, an ontology element, or an entire ontology are simply types of concepts.

The following objectives will serve as requirements for the model, in their capacity as characteristics of a successful concept representation, and hence measures to assess the achievement of this study’s goals:

(a) Concepts are grounded in the situated practice of science work and are never independent of the processes by which they are constructed and used. The representation will expose the manner by which concepts are created and revised.

(b) Concepts are socially constructed, and their form reflects the confluence of ideas at different scales, from individual practitioners to entire communities. The representation will explicitly encapsulate the perspectives of these inquirers.

(c) Concepts can emerge and evolve over time. The representation will support the capture of provenance information that allows historical trails of change to be reconstructed.

(d) Concepts are associated with people, places, and times. The representation will incorporate temporal and spatial versioning information that connects individual instances of resources to particular users and the geographic and temporal context of their work.
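Objectives (c) and (d) can be illustrated with a minimal sketch: each version of a concept is stamped with who made it, where, and when, and the historical trail is reconstructed by sorting on those stamps. The concept name echoes a later example in this thesis (“sea level”, Chapter 5), but the field names and entries below are invented for illustration.

```python
from datetime import date

# Each version of a concept is stamped with person, place, and time (objective d).
versions = [
    {"concept": "sea level", "author": "carol", "place": "Penn State",
     "date": date(2003, 5, 1), "change": "initial definition"},
    {"concept": "sea level", "author": "dan", "place": "San Diego",
     "date": date(2004, 2, 9), "change": "added eustatic/relative distinction"},
    {"concept": "sea level", "author": "carol", "place": "Penn State",
     "date": date(2004, 11, 3), "change": "linked to depositional environment"},
]

def trail(concept, records):
    """Reconstruct the historical trail of a concept, oldest first (objective c)."""
    return sorted((r for r in records if r["concept"] == concept),
                  key=lambda r: r["date"])

for v in trail("sea level", versions):
    print(v["date"], v["author"], "-", v["change"])
```

Because every record carries its own spatio-temporal stamp, the same data also answers narrower questions (everything carol changed, or every revision made at one site) without any extra bookkeeping.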

The motivation for developing a contextualized model of concepts for computational tools is not to imbue the tools with the ability to reason with knowledge representations (although this could be a follow-on effect at a later stage), but to leverage computational power as an aid to human reasoning – much as Bush envisioned sixty years ago. These representations should act as cues that enable humans to synthesize new information more effectively by communicating meaning at appropriate levels of generalization. Further, providing a mechanism to ground concept representations in the process of their construction allows meaning to be embedded in larger knowledge structures, such as problem-solving strategies or descriptions of places. For example, a geographer studying natural hazards might traditionally create a set of maps depicting levels of risk for a particular place, a few articles describing the analysis process and interpreting results, and perhaps a model for risk assessment. Each of these resources is currently a discrete object that references the others informally; we are left to assume that they are related because they were created by the same person, at about the same time, and cover similar topics. If, however, we can provide a means of making the relationships between and among concrete resources and abstract concepts explicit by embedding them in contextual networks, we can offer communities both in the present and the future the ability to interpret and reuse information efficiently. If given a platform for integrating the various resources used in studying vulnerability, the human-environment geographer can keep track of how a set of map categories was informed by a particular journal article, and of the procedures by which model parameters were created and tested. This audit trail can result in more robust explanations and lessen the likelihood of repeating dead ends.
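The audit trail described above amounts to a small graph of typed links among resources. As a sketch (the resource names and relation labels are hypothetical), making those relationships explicit lets a later user ask what a given resource transitively drew upon:

```python
# Explicit, typed links replace the informal assumption that resources are
# related merely because one person made them at about the same time.
links = [
    ("risk_map", "informed_by", "journal_article"),
    ("risk_map", "derived_from", "risk_model"),
    ("risk_model", "parameters_tested_by", "field_survey"),
]

def antecedents(resource, edges):
    """Follow the audit trail: everything a resource (transitively) drew upon."""
    found, frontier = set(), [resource]
    while frontier:
        node = frontier.pop()
        for src, _rel, dst in edges:
            if src == node and dst not in found:
                found.add(dst)
                frontier.append(dst)
    return found

print(antecedents("risk_map", links))
# the map traces back to the article, the model, and the field survey
```

The traversal is deliberately naive; the point is only that once the links are first-class data, provenance questions become simple graph queries rather than archaeology.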

The professional culture of many scientific domains, however, creates significant barriers to adopting a knowledge sharing system such as that proposed here. Because accolades, promotions, and other measures of status are often attained by ruthlessly protecting one’s own intellectual property, a sea change in scientific culture would be required to create one of open sharing. This work asks scientists to confront such a change. The change could have clear benefits – not least a greater ability to integrate observations and hypotheses across space and over time – but without institutional support (from academic departments, funding agencies, research laboratories, journal publishers, and so on) it is unlikely to be widely successful. There are early signs that the need to achieve community synthesis is changing the way science is performed; the National Science Foundation, for instance, is beginning to encourage greater knowledge sharing through community ownership of resources produced by its funding. NSF

requires, for instance, that software products under its National Middleware Initiative use open-source licenses. But as in the human-environment geography example above, the techniques that will be developed in this work will have benefits even if only adopted at the individual, not collaborative, level. Indeed, it is possible that the benefits individual researchers might realize from using these tools (combined with the weight of a funding agency like NSF) will encourage the cultural shifts necessary to make their collaborative use possible.

1.4 Application Cases The use cases for the knowledge representation framework to be developed herein come from two fields, human-environment relations and geoscience. Both of these domains are characterized by ill-structured problems (discussed further in Chapter 3) and are faced with different information-integration problems. Geoscientists, for example, commonly work from shared classification schemes (describing such characteristics as lithology, age, or rock provenance) that are intended to serve as a lingua franca for geologic mapping across a region of study, often at the scale of a national geologic survey. These classification schemes amount to authoritative ontologies, but there are often inconsistencies in classification between regions. Moreover, the imposition of a set of domain concepts without associated information on how the concepts were created or how they are applied results in differential application from place to place – negating, in part, the interoperability the classification schemes afford (Brodaric and Gahegan, 2001).
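The reconciliation problem can be sketched in miniature: two regional surveys use different local terms, and a crosswalk to a shared scheme makes some pairs of terms commensurable. The terms and mappings below are invented for illustration, not drawn from any actual survey’s classification, and the sketch deliberately shows the limitation noted above: a bare crosswalk records that categories align, but not how or why they were applied.

```python
# Hypothetical crosswalks from two regional vocabularies to one shared scheme.
region_a_to_shared = {"granite": "igneous_intrusive", "basalt": "igneous_extrusive"}
region_b_to_shared = {"granitoid": "igneous_intrusive", "flood basalt": "igneous_extrusive"}

def commensurable(term_a, term_b):
    """Do two local terms map to the same shared concept?"""
    shared_a = region_a_to_shared.get(term_a)
    shared_b = region_b_to_shared.get(term_b)
    return shared_a is not None and shared_a == shared_b

print(commensurable("granite", "granitoid"))  # both map to igneous_intrusive
```

Attaching provenance to each mapping (who applied the term, under what field conditions) is precisely the extra layer the situated model proposed here adds on top of such a crosswalk.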

In contrast to the common ontologies available to geoscientists, human-environment relations, long a research focus in geography, is typified by problems that lack such a widely accepted structure. One such problem is assessing the vulnerability of people and places to environmental change. Vulnerability differs from place to place based not only on variations in the physical and social attributes of places but on researchers’ views on the components of vulnerability, and on how those components are measured and related to each other.2 There is no single classification scheme to apply to observations made at different places, by different people, or at different times.

Each of these application domains is highly dependent on perspective; in one, it affects how common concepts are applied to local explanations; in the other, it dictates how local explanations coalesce into larger-scale explanations, requiring that common concepts remain compatible with the perspectives that created them.

In addition to providing a background in real-world problems that typify the need for collaborative, situated knowledge representations, the application areas used here will serve as testbeds for prototypes of the knowledge model and interfaces under development.

1.4.1 Application to Geographic and Geologic Cyberinfrastructure To demonstrate the ability of the conceptual model developed here to support real-world scientific exploration, the model is integrated into a Web portal called Codex (see Chapter 4) that is capable of capturing, reconstructing, and drawing inferences from situated knowledge representations. This portal will support the geographic and geologic communities by serving as a repository for organizational memory.

2 For example, vulnerability to technological hazards depends not just on the number of industrial facilities or their proximity to populated areas, which might be straightforward to measure consistently in different locations, but on parameters that vary with the perspective and location of the researcher. These parameters could include the choice of demographic parameters for at-risk populations, or the spatial resolution at which hazards are studied.

The nascent fields of cyberinfrastructure (NSF, 2003) and the Semantic Web (Berners-Lee et al., 2001) reflect recent attempts to enable sharing of distributed knowledge resources. Cyberinfrastructure includes networked data stores and analytical nodes, which, while distributed geographically and perhaps based on different specifications, are integrated through middleware that brokers between them. The development of cyberinfrastructure for distributed science is most visible in bioinformatics (e.g., myGrid, the “e-scientist’s workbench” (Stevens et al., 2003)), somewhat less so in geoscience (e.g., GEON), and nearly absent from geography. Composing services using this infrastructure (whether for protein analysis, geologic map reconciliation, or vulnerability comparison) depends on representations of semantic relationships between resources, at least at the level of data and method interoperability. The Semantic Web describes a group of related technologies, notably knowledge representation languages and inference engines, designed to communicate these relationships more explicitly than in the current, “syntactic” Web. The intersection of cyberinfrastructure and Semantic Web research hints at a model of science where recordkeeping is integrated with the practice of online scientific exploration. Current approaches to e-science, where they exist, tend to be ontologically driven. But in this new model, scientific records reflect trails through an emergent and epistemologically contextualized knowledge space.
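The kind of inference these technologies provide can be shown in miniature. The sketch below forward-chains a single rule (transitivity of a subclass relation) over a toy set of triples; real Semantic Web systems express such knowledge in languages like OWL and delegate the chaining to dedicated reasoners rather than a hand-rolled loop, and the rock terms here are illustrative only.

```python
# A toy triple store: (subject, relation, object).
triples = {
    ("Basalt", "subClassOf", "VolcanicRock"),
    ("VolcanicRock", "subClassOf", "IgneousRock"),
    ("IgneousRock", "subClassOf", "Rock"),
}

def close_subclass(facts):
    """Forward-chain subClassOf transitivity until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for a, r1, b in facts:
            if r1 != "subClassOf":
                continue
            for c, r2, d in facts:
                if r2 == "subClassOf" and b == c:
                    t = (a, "subClassOf", d)
                    if t not in facts:
                        new.add(t)
        if new:
            facts |= new
            changed = True
    return facts

inferred = close_subclass(triples)
print(("Basalt", "subClassOf", "Rock") in inferred)  # True
```

The value of the inferred fact is that no one ever asserted it: explicit knowledge plus a small rule yields conclusions the original records only implied.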

The present study promotes this model by providing a candidate formalization for describing the evolution of knowledge in online science. The Web portal developed here provides an architecture for describing the components of a scientific cyberinfrastructure at a level of abstraction above the semantics of data and method interoperation. This portal is grounded in the conceptual framework articulated in this thesis – one that describes the semantics of situated discovery and the epistemological contexts overlooked by most other infrastructure projects.

One of the distinguishing characteristics of this work is that it considers knowledge resources to be fundamentally collaborative; the form and content of a concept reflect the contributions and perspectives of the inquirers (perhaps distributed in both place and time) who constructed it. This study’s portal is organized around personal and group workspaces, which provide private and shared knowledge spaces, respectively.

1.5 Summary of impacts

The unique contributions of this work are threefold: (1) By providing a mechanism for describing and contextualizing scientific concepts, the knowledge model under development can underlie contemporary versions of traditional scientific notebooks. The knowledge model emphasizes the shareable, evolutionary nature of scientific concepts. (2) The knowledge model integrates both ontological (top-down) and discursive (bottom-up) approaches to knowledge elicitation and structure. By capturing knowledge as it is constructed and as it evolves, this work draws from the benefits of computer-supported cooperative work. By allowing groups to treat certain concept structures as representative of their community (and to share them as such), this work acknowledges the utility of top-down ontologies and the structure they can lend to a problem. (3) The knowledge model is developed and implemented using emerging Web standards for information sharing and collaborative science. The implementation is designed to be compatible with larger scientific cyberinfrastructure efforts, which will benefit from this work by having a foundation for knowledge sharing to complement their existing focus on data integration.

1.6 Outline of the Thesis

In working toward a model for knowledge-centered aids to science work, this thesis begins, in Chapter 2, by sketching the philosophical and cognitive roots of scientific inquiry. These central themes, including hermeneutics, motivate the present work by grounding it in the history of inquiry into the nature of scientific discovery. Chapter 3 looks at how these themes have been by turns reflected and neglected in both information science and geographic scholarship and illustrates how new ways of thinking about geographic computation are needed to continue the discipline’s progress. Chapter 4 introduces a new model for contextualized knowledge representation in online environments and presents an implementation of this model in a proof-of-concept application deployed for real-world use. Chapter 5 evaluates the proof-of-concept in terms of its support for the representational needs outlined in Chapters 2 and 3. Chapter 6 concludes by proposing future scenarios for the representation and use of scientific knowledge in computational environments, using the present work as a forerunner to a knowledge-driven geographic cyberinfrastructure.

Chapter 2 – The Construction of Knowledge

Find a scientific man who proposes to get along without any metaphysics – not by any means every man who holds the ordinary reasonings of metaphysicians in scorn – and you have found one whose doctrines are thoroughly vitiated by the crude and uncriticized metaphysics with which they are packed. We must philosophize, said the great naturalist Aristotle – if only to avoid philosophizing. (Peirce, 1931)

To represent formally the conceptual information that enters into the production and use of geographic information, we need, as Peirce hints, to first articulate the systems that underlie our reasoning. An examination of the history of inquiry into ways of knowing can help guide the transition toward thinking about geographic information as a knowledge resource and can illustrate how geographic scholarship can be at once informed by, and a contributor to, broader questions about how knowledge is constructed and manipulated.

Certainly many geographers and computer scientists – and likely most – embark on tool development tasks without first worrying about the nature of knowing. And for most applications, this is with good reason; the computational tools we tend to develop operate at levels of abstraction well removed from the inner workings of their users’ minds. A tool such as the word processor used to write this text, for instance, arguably supports the process of knowledge construction and its communication through natural language. However, it offers this support using a representational scheme that, while intuitive through practice, is not necessarily reflective of the structure of ideas in the author’s mind. (If it were, the process of writing would simply involve the transcription of fully formed ideas from a mental narrative onto the page). Like most other computer programs, from a game of chess to a digital map, it operates at a level of abstraction that necessarily avoids concerns about how the knowledge that is employed is represented, stored, retrieved, or applied by the user. At another extreme, one could try to devise tools that avoid abstraction entirely by concentrating on support for the biological processes of thought. Short of surgical intervention, however, this is outside the abilities of present technology.

But between abstract philosophizing and corporeal knowledge representation lies an area ripe for exploration. We are beginning to recognize the limitations of current computational techniques in achieving efficient construction and communication of knowledge. There are certain tasks, collaborative knowledge discovery among them, that can benefit from more explicit and computable representations of semantic structure than natural language or traditional metadata provide. To devise more effective representations, we should start by thinking about “cognitively appropriate” solutions – those that build on what we already know about the mental structure of scientific knowledge. This chapter provides such an examination, offering an overview of major themes in the history of inquiry related to the problem of scientific discovery. While not an exhaustive enumeration, these themes coalesce to suggest a set of key considerations for dealing with scientific knowledge computationally.

2.1 The Ingredients of Knowledge

2.1.1 Knowledge

Knowledge is the information that results from the accumulation of experience and reasoning, by human, machine, or both. More than just awareness of information, it involves aspects of understanding, or the ability to apply information – consciously or otherwise – to solve a problem. The distinction between awareness and understanding is a critical one, but in computational environments, awareness is sometimes conflated with understanding. A database of facts, for example, is sometimes called a “knowledge base.” But does this accumulation of facts reflect understanding (that is, are the experiences and reasoning facilities capable of making use of this knowledge present)? Or are the facts meant solely to facilitate the recollection or creation of knowledge by their user? Only in the former case could this computational information, as it is stored, properly be called knowledge. Central to this study’s approach is an effort to incorporate aspects of understanding into the representational medium itself. Thus, the representation explicitly preserves the sense of utility that creates knowledge out of information. The definition of knowledge used here is intentionally broader than the classical view of knowledge as “justified true belief.” This definition admits the evolving and contestable nature of scientific knowledge, where justification (and often, truth) is subjective.

2.1.2 Concepts and Contexts

The notions of concept and context are used throughout this work to denote two components of knowledge. The first component, the concept, expresses the existence of an abstract category. A concept encompasses everything in its extension (the synonymous terms instance or individual denote single extensional entities; “earthquake” is a concept, while “Northridge earthquake of 1994” is an individual). A given concept may have different names in different circumstances while preserving the same underlying meaning (its intension).

There are a number of basic views on the form of mentally held concepts. Classical Aristotelian theory holds that concept membership is defined through a set of singly necessary and jointly sufficient conditions, although it is widely recognized that this view deals poorly with “typicality effects” (the case that one conceptual member is a better example of the class than another) and that consistent definitions are scarce in practice and limit the capacity for conceptual change (Wittgenstein, 1953). In dealing with these shortcomings, the probabilistic approach to conceptual structure suggests that some concepts are better examples of their category than others. This approach finds support in empirical classification studies (e.g., Rosch, 1975; Rosch and Mervis, 1975) that indicate classification is not done on the basis of defining features so much as through proximity to prototypes. Finally, a “conceptual atomist” approach defines concepts through relationships between mental symbols and the objects they represent, a position in which concepts themselves are devoid of intensional definitions (Fodor, 1998).

A key problem for the present work is to implement a model for concepts that can be communicated efficiently across the human-computer interface. We could choose to use natural language terms, and indeed most studies of the structure of concepts focus on lexical concepts, but to effect rich knowledge representations it is desirable to move beyond simple syntactic labels. Although concepts may be difficult to define internally (that is, their intension may be vague), it is possible to describe them in terms of their relationships with other concepts. To this end, the present work bases its representation of concepts on a dimensional variant of the probabilistic model. In this approach, concepts are defined through the values (or ranges of values, as in Gardenfors (2000)) they occupy along continuous dimensions (Smith and Medin, 1981). Each dimension represents another concept, with an indicated value describing the nature of the relationship (this model is discussed further in Chapter 4). A further characteristic of the view of concepts taken here follows from the notion of perceptual-functional affordances (Tversky, 2005), initially developed to account for visual and spatial properties of an entity. If we use a dimensional approach to represent concepts, then these concepts come to occupy a multidimensional “concept space” within which we might look for some of the same functional affordances (the roles a concept plays or the capabilities it enables). Representing concepts’ functional roles in a larger knowledge structure is important to depicting “how” and “why” in scientific reasoning. As a result, we must introduce to our computational knowledge models philosophical theories beyond just metaphysics and philosophical ontology (the categories of “what” that, as Chapter 3 shows, limit much present work in computation).
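The dimensional model can be sketched computationally. In this minimal illustration (all names and values are hypothetical, not drawn from the Codex implementation), a concept is a point in a space whose dimensions are themselves other concepts, and similarity between concepts is proximity in that space under a simple Euclidean metric:

```python
import math

class Concept:
    """A concept defined by values along continuous dimensions, where
    each dimension is itself another concept. A hypothetical sketch of
    the dimensional/probabilistic model, not the Codex schema."""

    def __init__(self, name, dimensions):
        self.name = name
        # dimensions: mapping of dimension-concept name -> value in [0, 1]
        self.dimensions = dimensions

    def similarity(self, other):
        """Similarity as inverse distance over shared dimensions."""
        shared = set(self.dimensions) & set(other.dimensions)
        if not shared:
            return 0.0
        dist = math.sqrt(sum((self.dimensions[d] - other.dimensions[d]) ** 2
                             for d in shared))
        return 1.0 / (1.0 + dist)

# Two hazard concepts described along illustrative shared dimensions
quake = Concept("earthquake", {"suddenness": 0.9, "spatial_extent": 0.6})
flood = Concept("flood", {"suddenness": 0.4, "spatial_extent": 0.8})
print(quake.similarity(flood))
```

A range of values per dimension, as in Gardenfors (2000), could be represented by replacing each scalar with an interval; the proximity computation would then compare regions rather than points.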

The second component of knowledge – understanding leading to the ability for application – consists here of context. While arguing that knowledge is always contextualized, the cognitive science literature does not devote significant attention to describing the nature of context save to say that our perceptions are dependent upon it and our use of concepts varies with it (e.g., Barsalou and Medin, 1986). The identification of a concept’s salient features (or dimensions, depending on the conceptual model) has been shown to depend on context (Goldstone et al., 1997); for instance, that a basketball floats is not likely to be considered salient to a game of basketball, but may be a primary characteristic when considered as an item on the deck of a sinking ship. Generally, though, the term “context” is used to refer to an amorphous set of external conditions that include some subset of time, place, and circumstance. It becomes necessary to formalize the notion of context, however, when we wish to use it as a vehicle with which to facilitate knowledge-based interaction between humans and computers, and among humans via the computer.

Here, context describes a network of attributes surrounding a concept that comprise the social, scientific, and epistemological circumstances of its creation and use. Contextual information is recorded at the level of an individual concept or instance: Who created or applied it? When? Using what tools? For some resources, information producers occasionally make these contextual elements explicit (such as the time or place where an observation was made), but many others (such as the theories or methodological assumptions underlying an analysis) are kept tacit in the mind or only reported after the fact. The content of contextual elements is also partly determined by the approach a researcher’s community takes to a problem (thus arises the collaborative aspect of the present work). The representation of context should incorporate references to these communities, particularly since they are often nested within each other and can be evanescent, forming and reforming along different aspects of commonality between members.

Context is complicated by the highly interconnected relationships between its elements, such as between times and theories (a heliocentric model of the universe, for instance, is interpreted in light of its time). These interconnections are nexus-like (Figure 2.1), borrowing from Whitehead’s (1929) theory of nexus; context is an entity (an “actual occasion” in Whitehead’s terminology) that itself contains other entities and the relations between them.

Figure 2.1. Nexus of contextual elements concerning the development and application of scientific concepts. Some (ellipses) are often made explicit in scientific reports or metadata, while others (clouds) are not; the latter, however, are crucial to understanding, communicating, and reusing scientific knowledge.

The elements shown in Figure 2.1 subsume what we traditionally capture as “metadata”; by using the term “context” we avoid metadata’s pejorative connotations, which arise from its use in data-centric activities, and introduce new elements (such as motivations and hypotheses) that are not addressed in current metadata standards.3 Currently, we tend to (1) think of metadata as a resource separate from the information it describes; (2) create metadata documents after a resource has been created, not during the process of creation; (3) create metadata only for resources that represent final products, losing vital insight into the intermediate stages of their creation; and (4) neglect the need to collect metadata on how we use information, not just how we produce it (Lang and Burnett, 2000; Gazan, 2003; Halbert et al., 2003). Metadata production remains a significant burden on information producers because it is rarely captured in situ, where and when knowledge is created and applied. Refocusing resource-level descriptions on contexts, and bearing in mind that contexts are fundamental parts of the resources they describe, is a step toward addressing these shortcomings.
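As a minimal sketch of what capturing context in situ might mean in practice (the field names are hypothetical, chosen to mirror the elements of Figure 2.1 rather than any actual standard), a context record could be created at the moment a concept or resource is produced or used, holding tacit elements such as motivations and hypotheses alongside conventional metadata:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Context:
    """Contextual wrapper captured in situ with a concept or resource.
    Elements often made explicit in metadata (creator, tools) sit beside
    elements that are usually tacit (motivation, hypotheses)."""
    creator: str
    created: str
    tools: list = field(default_factory=list)       # often explicit
    motivation: str = ""                             # usually tacit
    hypotheses: list = field(default_factory=list)   # usually tacit
    community: str = ""                              # nested, evanescent

def capture(creator, **elements):
    """Record context at the moment of creation, not after the fact."""
    return Context(creator=creator,
                   created=datetime.now(timezone.utc).isoformat(),
                   **elements)

# Hypothetical usage: context recorded while an analysis is under way
ctx = capture("W. Pike",
              tools=["experimental petrology apparatus"],
              motivation="explain the magnetic field of Mars",
              hypotheses=["iron-sulfur compound present in the core"])
print(ctx.motivation)
```

Because the record is created by the same act that creates the resource, the burden of after-the-fact metadata authoring is reduced, and intermediate products acquire contexts as readily as final ones.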

2.1.3 Situations and Perspectives

Context has been defined as the circumstances of an individual resource’s creation or use. As has been noted, a given resource may exist in multiple contexts, and may play a different role in each. In the process of inquiry, then, concepts are selected based on relevant contexts and linked together into larger structures. These acts of conceptual manipulation have been described as situation (Solomon et al., 1999), the bringing together of background contexts and current observations and analyses toward some goal. Situation is typically treated as a different quality from context (Barsalou, 2002) because situation explicitly reproduces the enactment that is part of selecting and reasoning with a set of concepts and their associated contextual wrappers. Lemke (1997) calls situation an “ecology,” a term that evokes the dynamic interaction between concepts and thinkers in the process of knowledge construction.

3 In geographic information science, for instance, metadata usually consists in the Content Standard for Digital Geospatial Metadata (FGDC, 1998). A broader standard, the Dublin Core Metadata Initiative (DCMI Usage Board, 2005), attempts to provide a generic framework that can be extended to sound, image, narrative, and so on.

Situation, then, encompasses the coordinated activity that is directed toward some goal. A given concept – we might think of it as a node in a conceptual network – can be reused in different circumstances, but there will be some meta-information we want it to carry with it regardless of circumstance, and some that will be unique to the role it plays in a particular case. The former shall be called context, the latter situation. A given set of concepts may exist in the same context but be linked together in different ways depending on the goal toward which cognitive activity is directed. As a result, we could say that these concepts can occur in different situations and, importantly, that they have different roles in each situation. In Figure 2.2, individual concepts are depicted as nodes in a larger knowledge structure. Each concept has a context wrapper that provides ancillary information about the concept’s provenance. When linked together, these concepts and their relations describe a situation. If context can be seen as an evolutionary step in resource-level metadata, then situation is a step toward network-level metadata.
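The context/situation distinction can be sketched as a graph structure: each concept node carries one context wrapper regardless of use, while each situation supplies its own goal and its own links, assigning the same concepts different roles (the names, goals, and roles below are illustrative only, not the Codex representation):

```python
# Concepts keep one context wrapper across all uses...
concepts = {
    "engine": {"context": {"creator": "A", "created": "2004"}},
    "car":    {"context": {"creator": "A", "created": "2004"}},
}

# ...while each situation links the same concepts toward its own goal,
# assigning situation-specific roles to the links.
situations = {
    "driving": {
        "goal": "reach a destination",
        "links": [("engine", "car", "source of motive power")],
    },
    "repair": {
        "goal": "diagnose a fault",
        "links": [("engine", "car", "reason for repair")],
    },
}

def roles_of(concept, situation):
    """The role a concept plays depends on the situation, even though
    its context wrapper is unchanged."""
    return [role for a, b, role in situations[situation]["links"]
            if concept in (a, b)]

print(roles_of("engine", "driving"))
print(roles_of("engine", "repair"))
```

Here the context dictionaries are node-level metadata and the situation links are network-level metadata, mirroring the evolutionary step described above.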

Situated cognition has been shown to be important to learning, in terms of pedagogy and instructional design (e.g., Clancey, 1994) as well as in circumstances of informal learning (e.g., Lave and Wenger’s (1991) study of midwives and meatcutters, among others). Since the computational aids to knowledge construction considered in this work are basically learning aids, incorporating support for situated thinking should prove useful.

We rarely, if ever, consider concepts as isolated entities devoid of background. As empirical evidence suggests, we instead associate concepts with the situations in which we encounter them (Barsalou et al., 1993). A new concept might initially exist in the single situation in which it was first learned, and gradually be extended to other situations as facility or familiarity with it increases. But knowledge of a concept can be considered incomplete if we cannot apply it in relevant situations (Barsalou and Wiemer-Hastings, in press). Some concepts gradually come to have culturally paradigmatic situations. For instance, a commonly held situation for “map” might involve wrestling with a folded road map in the passenger’s seat of a car.4

Situation involves characteristics that can only be described by examining the collective roles that concepts play in a larger activity. Consider two examples of situation, the first derived from everyday life and the second extending this reasoning to scientific inquiry:

- The concept of “car” considered in the situation of driving might include those aspects of a car we interact with from the inside while driving. Perhaps the steering wheel, accelerator, dashboard, feel of movement and the road surface become salient. In the situation of car repair, we might think of mechanical characteristics, or the engine compartment or underside of the car (Barsalou and Wiemer-Hastings, in press). Different situations lead to different ways of linking concepts together (e.g., in the former case “engine” might be the source of motive power, while in the latter it might be the reason for a repair). Note that in both cases the relevant concepts (or their instances) may have the same context, as it has been defined here: this could be my car, which I purchased at a particular time, use for a particular purpose, and so on.

- In geoscience, the concept of a particular iron-sulfur compound may be the result of an experimental petrology task. In this situation, the compound plays the role of an end point, a molecule that was discovered as a result of temperature and pressure conditions in the experiment. In another situation, perhaps an explanation for the magnetic field of Mars, a theory might suggest that the presence of this compound in the planet’s core justifies a claim about the state of the core, and is therefore evidence for a particular theory of magnetism.

4 The existence of common situations across a culture is illustrated by the old game show “Password,” in which a contestant’s ability to deduce a concept from situational clues was rewarded with cash.

To denote the particular choice of concepts, contexts, and situations that a particular thinker (or community of thinkers) uses to describe a process, problem, or phenomenon, I use the term perspective. In Figure 2.2, situations A and B might represent different perspectives on a problem taken by two thinkers. Each might use concepts from the same body of shared understanding but see them as being directed toward different explanations (Chapter 4 contains a more detailed description of perspectives and their representation). For thinker A to appreciate B’s perspective, it is necessary to reproduce for A both the entities that are relevant (the concepts and contexts) as well as their surrounding situation (the directed aim of B’s reasoning). This work will create an infrastructure for achieving that reproduction, and later chapters introduce a visual mechanism for depicting concepts and situations.

The capacity for two concepts to play the same role (and even have the same name) in two researchers’ perspectives results from the use of “indexicals” in expressions of the concepts’ intensions (Putnam, 1988). Indexicals often refer to what are defined here as contextual elements – they can refer to particular people (e.g., the speaker or thinker of the concept), particular places and times, and particular instances (this latter indicating that, as concepts are treated here, intension and extension are interdependent). An indexical description of a concept amounts to asserting that a rock type is “stuff that behaves like this and has the same composition as this and is described by someone who is focusing on a particular sample of a substance.”

Figure 2.2. Contexts describe the circumstances surrounding the creation or use of an individual resource or concept (a node in a conceptual network); situations describe the circumstances of larger knowledge structures arising from the different ways these nodes can be connected.

The key ideas introduced in this section – concept, context, situation, and perspective – will become core elements of a knowledge model introduced in Chapter 4. The remainder of this chapter outlines the rationale for a contextualized and situated view of scientific knowledge.

2.2 The Nature of Scientific Knowledge

The history of the philosophy of science is rich with examples of inquiry into the nature of knowledge that can inform modern computational methods. The discussion that follows shows how the body of prior work reveals core themes from which a computational theory can evolve. These themes will be used as design criteria for a knowledge-sharing system.

2.2.1 Aristotelian Categorization

In the Posterior Analytics, Aristotle puts forth a structure of science in which different domains are based on different principles, rather than on shared tenets. This structure emphasizes what he calls equivocation in science, or differences in the meaning of a concept between scientific situations (e.g., an earthquake to geologists might be a mechanism for relieving stress in lithospheric plates, while to engineers it might describe a process that damages built infrastructure). Note, though, that to Aristotle these differences in perspective apply to communities, but do not differentiate their members. In the Aristotelian approach, there are no substantial differences in interpretation between two people’s view of a concept (e.g., if one says, “I am human” and the other says “I too am human,” they must mean the same thing by “human”).

By positing different patterns of predication in scientific domains, but patterns that remain consistent among members of a domain, Aristotelian reasoning aims for the codification of existing knowledge. (This codification is often where contemporary knowledge representations stop, as will be shown in Chapter 3.) Aristotle identifies ten categories into which all concepts must find an exclusive fit: substance, quantity, quality, relation, place, time, position, possession, action, and passion. Individually, these cannot consist of true or false statements (e.g., the truth of the substance “Bill,” the quality “blue,” the action “read,” or the time “yesterday” cannot be evaluated). Aristotle argues that substance is the highest level of concept and the only one that admits contrary attributes; in a land-use domain, the quality “urban” cannot also be the quality “agricultural,” but the qualified substance “this location’s land use” can have one value at one time and another at a later time. Thus, Aristotle provides an early formalization of the notion that concepts can evolve and can have different values in different situations. By formalizing the capacity for contradiction and change in these conceptual structures, Aristotle’s work also supports the differentiation of roles among scientific concepts; a data set may be at once a result and an input, depending on situation. Furthermore, some of the ten categories – such as place, time, and possession – can be modeled through context, as defined above. We might wish to extend Aristotle’s domain-level patterns of predication (which we might call the domain’s perspective) to include any community of thinkers, formal or informal, and even individual researchers themselves.

2.2.2 Kantian Judgment

In contrast with the Aristotelian focus on codifying terms, the Kantian view is one of observer-dependent understanding. In the Critique of Pure Reason (1781) (Kant, 1996), Kant raises the question of how we can acquire knowledge of things that we cannot experience directly (the transcendental critique). In the present case, such knowledge might include the relationship between lithospheric stress and fault slip. Kant is not satisfied with the claims of empiricists who argue that sensory impressions are discrete data that the mind unifies into conceptions. He suggests a separation between “phenomena” (what we sense of things) and “noumena” (the things themselves); of the latter we can have no direct knowledge, only perceptions of their phenomena. Organizing these experiences into beliefs about the world is the realm of the understanding. Unlike Aristotelian understanding-through-direct-experience, Kant suggests that “intuitions without concepts are blind” – we use mental concepts to mediate our experiences and our understanding. But if experiences are not immediately concepts, then we require some mechanism by which the two can be connected to form understanding. By his third critique, the 1790 Critique of Judgment (Kant, 1987), Kant argues that judgment plays this mediating role.

Kant separates judgment into two kinds, aesthetic and teleological. The former is largely concerned with notions such as good and beauty, but there are lessons to be taken from it. Primary among these is what he calls “sensus communis” – a shared sense, the feeling that others should agree with our judgments – that has parallels with this study’s examination of collective and collaborative understanding in science. Teleological judgment involves what Kant terms purposiveness, or action directed toward some end, and is explicitly concerned with understanding the natural world. Through purposiveness, teleological judgments bridge the sensible features of the natural world with their underlying supersensible causes or purposes. This unification is directly comparable to the mediating role of scientific measurements in connecting transcendent entities with our observations (Rothbart and Scherer, 1997).

To Kant, purposiveness arises from reflective judgments, those that are essentially inductive (using judgment to find the universal for a set of particulars). This contrasts with determinative judgments that amount to deduction (classifying particulars as universals). In the Kantian view, scientific research is predicated on the assumption that there is purposiveness in the world, an order that makes the world knowable. Our pre-judgments are implicated in the act of observing the world. The choice of concepts and data we use – our perspective – commits us to the worldview those resources embody. Despite its etymology as that which is given or granted, “data” only becomes “evidence” in light of a presupposed theory. Observations must thus be linked to concepts if we are to make sense of them, and these concepts organized into situations that reflect our judgments. To achieve “sensus communis” and to make our judgments known to others, we require a means of representing the process by which observations are turned into knowledge.

The use of reflective judgment, then, is to align our observations so that we may find purposiveness in nature (while presupposing that it exists). Kant says that the “harmony of nature with our judgment is there merely for the sake of systematizing experience, and so nature's formal purposiveness as regards this harmony can be established as necessary.” The Kantian notion of purposiveness thus aligns with the idea of directedness in situations, described earlier. Kant also usefully extends Aristotle’s domain-dependent understanding to include the judgment (perspective) of the individual inquirer.

2.2.3 Peircean Inquiry

C.S. Peirce recast understanding in terms of discovery rather than Aristotelian classification (deduction) or Kantian judgment (induction). He argues that scientific discovery is a privileged kind of cognition that supplants the traditional modes through which humans generate beliefs about the world. Unlike these other modes – tenacity (steadfastness in the face of contradiction), authority (e.g., belief imposed on a community by a state or church – or, in the present case, domain ontologies such as rock classifications), and a priori (e.g., those that seem reasonable, such as a round earth) – beliefs fixed through scientific inquiry, while fallible, are self-correcting (Peirce, 1877). The nature of science accommodates continuous revision.

Peircean scientific inquiry thus replaces faith in prior beliefs with human experiences in the world, but moves beyond how Aristotle and Kant conceived of those experiences. Peirce’s problem with Aristotelian, deductive categorization is that it takes as first premises immutable rules and is therefore incapable of generating new hypotheses (the observations to be explained must always be the true conclusion of the premises). Instead, Peirce’s Fixation of Belief proposes that scientific knowledge emerges from the interaction between three stages of inquiry: the first, retroduction (sometimes called abduction, although Peirce appears to favor the former term), uses experiences to stimulate possible hypotheses by what he would later call “an appeal to instinct.” Deduction follows, in which the consequences of those hypotheses are examined. Lastly, inductive hypothesis testing selects the most likely explanations. The self-correcting nature of science arises from this inductive step, as induction’s shortcomings (it is not truth-preserving, and future observations may alter or contradict a hypothesis) turn into advantages. As the body of observations we have made about the world grows, induction converges toward truth. Peirce calls this “qualitative induction,” aligning it with the quantitative induction we perform when inferring from a sample to a population in statistics. The problem, of course, is that convergence toward truth only happens in the long term, so it becomes even more important to preserve records of reasoning for future generations (Objective C in Chapter 1). The methods developed in the present work will help create and share this record through the capture of provenance information that reflects how observations were linked to concepts, and concepts organized into explanations.

Peirce’s three-stage model of inquiry supports a collaborative approach to creating scientific explanations. He suggests that any truth that there is to be found in science is simply that which a community of inquirers agrees upon over time (recognizing, of course, that there are always more observations to be made). In contrast to a Cartesian view of science in which the individual knower is omnicompetent, Peircean scientists are interlocutors in a process of inquiry that spans generations; a community’s emergent belief eclipses the individual’s, and the possibility of change with future explanations eclipses preservation of past beliefs. A scientist’s community, his or her “circle of society,” is a sort of “loosely compacted person, in some respects of higher rank than the person of an individual organism” (Peirce, 1931). Importantly, a community is not reducible to a collection of individuals; it includes the commitments those individuals have made as part of the process of identifying as a community. The representation of an individual’s knowledge thus needs to be linked to the communities of which that person is a member, and it must be possible to represent the shared beliefs that all members of a community hold (Objective B in Chapter 1).

In On a New List of Categories (Peirce, 1868), Peirce reduces Aristotle’s ten categories to three: quality (“firstness”), relation (“secondness”), and mediation (“thirdness”). These categories reflect the movement of thought from unary qualitative attributions (“this is”), to dyadic comparative judgments (“this is different from that”), to triadic mediative conditions (“this is different from that in a certain respect”). As in Kantian reasoning, the application of a quality to a substance is a hypothesis. But Peirce’s approach to science is pragmatic and derives from the increasing precision afforded by characterizing a concept as a first, second, or third. He identifies three grades of clarity in scientific explanations: that which is clear is a kind of first, implying simple familiarity with the world; that which is distinct is a second, defined in relation to another concept but divorced from experience; the third and highest grade, that which is pragmatic, can only be derived from experiential examples (Peirce, 1905). Pragmatism holds that science is in its practice, when inquirers replace faith in the a priori with their experiences in the world. In pragmatic retroduction, useful guesses (if not always correct ones) emerge out of the manipulation of observations. The present work’s focus on capturing the context and situation of a concept acknowledges the Peircean position that the researcher’s practices in the world are the key to generating robust understanding (Objective A in Chapter 1).

Pragmatism has some similarities to Husserl’s phenomenology (Husserl, 1970), developed at about the same time. Both ground understanding in experience, but phenomenology adds the aspect of intentionality. Every cognitive action is directed at something; there is always an object of intention. As with Kant’s purposiveness, the directedness of situation as it has been defined here draws from this notion; concepts should be represented in the circumstances that illustrate the purposes to which they are put. These purposes can be represented through situation (again meeting Objective A in Chapter 1).

There are also connections between Peirce’s view of scientific concepts and Whitehead’s notion of nexus. Nexus is akin to what Peirce called synechism, the intrinsic continuity and connectedness between concepts. Just as his communal view of science holds the collective of inquirers above the individual, the synechistic view of concepts holds their continuity above their discreteness. The nexus, in which elements are connected to each other by inherent mediating factors, is a kind of Peircean third (Sowa, 2000a); the elements of context shown in Figure 2.1 and the network of concepts in Figure 2.2 can each be viewed synechistically, as a kind of mesh.

The pragmatic manipulation of observations in retroduction has links to Peirce’s semiotics, which contends that all thinking is in terms of signs. If we take the Peircean view that reasoning is the act of manipulating signs, there are a few interesting consequences for creating situated representations of knowledge in computers, particularly in terms of the roles that concepts can play in an analysis task. (While it is possible to spend a lifetime formalizing semiotics for computation, here only the most obvious connections are articulated; further, Saussure’s contributions (Saussure, 1974) are left out of this discussion, in part because Saussure was largely concerned with signs in linguistics, while Peirce’s treatment of semiotics incorporates observation of the natural world, which is useful for the present application.) Peircean semiotics covers the triadic relationship between representamen (Peirce’s term; elsewhere called sign vehicle), interpretant, and object (Figure 2.3a, following the semiotic triangle of Ogden and Richards (1927)). Peirce’s obvious affinity for triads notwithstanding, together these are the three components of a sign. In Figure 2.3, directional arrows between sign components indicate that the relationship between each is not fixed (by convention the dashed line indicates that there may be no direct relationship between representamen and object; the relationship exists only through their interpretant). As an example of the relevance of signification to representing geographic concepts, consider the sign representing “urban land use.” The sign at the top of Figure 2.3b might represent a conventional way of thinking about this concept in terms of a map classification task. We start with the notion of the land cover category (as a map classification, say). This notion could be called the representamen for some actual feature (or set of features), perhaps an office building. The interpretant is how we derive meaning from the sign, in this case through the sense that the map category indicates an attribute or feature of a particular location. At the lower right of Figure 2.3b is the sign that might represent a map query; of what is the entity present at a particular map location the representamen? Here, the feature is given the meaning (interpretant) of “office building” through which we infer that it is an instance of the particular land cover category in question. At the lower left is yet another example, where the concept of office building signifies a particular concrete feature through the category that gives it meaning; this sign might be one we use on the ground.
Each of these semiotic triangles can be seen as a different situation in which the concepts might exist, or a different perspective onto the characterization of “urban land use.” The role each sign element plays will necessarily vary according to the problem to be solved – the very choice of what is representamen and what is object implicates a worldview. While MacEachren (1995) and Head (1991) have previously treated the issue of semiotics as a basis for examining the visual representation of geographic information, here it is used to illustrate the roles played by the concepts underlying geographic/scientific knowledge.

Figure 2.3. a) The triadic relationship among elements of a sign. b) The relationship between elements of sign and meaning is variable. Here, the sign “urban land use” could be instantiated through different semiotic relationships depending on perspective or problem.

The key features of Peirce’s philosophy that inform the present work are thus:

1. Retroduction underscores the importance of pragmatic manipulation of observations. It is only through this manipulation – looking at things one way, then another – that we create the circumstances in which the spark of hypothesis generation can light. The focus here on devising methods to represent situated practices in science follows from Peirce’s view that the highest degree of explanatory power derives from explanations grounded in pragmatic experience.

2. Semiotics provides a useful way to articulate the different roles that concepts and resources can play in scientific investigation. The “looking at things one way, then another” of retroduction is reflected in the transitivity between components of a sign. It is likely that encouraging thinkers to experiment with this transitivity, and to have access to the different perspectives on a problem that their colleagues might take, can have real impact on creating understanding.

2.2.4 Hermeneutic Interpretation

If the meaning of concepts is grounded in individual and collective experiences and perspectives, an attempt to represent this grounding computationally must address the problem of interpreting others’ experiences and perspectives. Such interpretation is the basis of hermeneutics, which places the individual in a historical context that colors his or her judgments (Gadamer, 1975). Initial articulations of hermeneutic principles arose from the problem of interpreting texts in the humanities and theology, but it is possible to extend hermeneutic propositions to scientific computing (Fonseca and Martin, 2005). The implications of hermeneutic context in the realm of scientific investigation support the Peircean view of community and experience. Because situations are constantly evolving (at any moment, historical context is changing, akin to Peirce’s synechism), hermeneutics stresses that understanding a concept comes from the ability to apply it in a current situation.

In a hermeneutic approach to scientific understanding, information comes to us bearing a message (a message that is not entirely objective – we are complicit in recognizing it), and the role of understanding is to interpret that message in light of our own experiences. Everything that is handed down to us over time requires its meaning to be unlocked. Questioning is the process by which this interpretation progresses. Summarizing Collingwood, Gadamer (1975) suggests that “we can understand a text only when we have understood the question to which it is an answer.” That is, what is the purpose to which this information was put? Capturing the provenance and evolution of concepts as they are manipulated by scientists can help reveal these purposes through depictions of the situations in which concepts are found. The act of questioning is not to be conducted insincerely – asking questions is at the core of having experiences. In Gadamer’s view, it is not the seeking of determinate solutions that is central to understanding, but rather the act of questioning in pursuit of a solution. The solutions developed here will further the researcher’s ability to question his or her own knowledge, as well as that of collaborators.

What Gadamer terms historically effected consciousness guides the choice of appropriate questions. Such consciousness is not merely an acknowledgment that a particular work has a history, but that this history influences our understanding. The acknowledgment of a history does not suffice to put the questioner in a text’s historical context, nor is it even possible to transcend one’s contemporary context to visit a historical text in its native context. Rather, asking questions of historical texts involves finding horizons – the horizon of the questioner and the horizon of the text. A horizon bounds what can be seen from a particular perspective, but it is not intended as an exclusive device so much as an inclusive one. Within a horizon exist all of the concepts important to a problem. Questioning is the process by which we come to understand the form of another’s horizon (Gadamer uses the example of a doctor questioning a patient, although this implies an expert-novice relationship between the questioner and object of study that confuses matters). When we ask questions that help define a text’s horizon, we are engaged in a process of fusing that horizon with our own. (We cannot directly visit a historical horizon and we cannot escape our own; reconstructing another’s horizon can only be done through fusion, producing an expanded understanding of a problem). Through visual depictions of another researcher’s processes of knowledge construction and use, as will be introduced in Chapter 4, and the ability to integrate portions of another’s knowledge into a new situation (Chapter 5), a researcher can interrogate and gradually come to appreciate the horizons of his or her colleagues.

The process of questioning, while it helps delimit horizons useful for understanding another position, does not provide determinate answers for the relationship between a historical stance, or someone else’s stance, and one’s present horizon. Instead, questioning exposes indeterminacy – it suggests possibilities. Gadamer identifies a collection of ingredients involved in the act of questioning to define and fuse horizons:

• Dialogue. Questioning involves dialogue, whether it is between two people attempting to achieve mutual understanding, between a reader and a text, between data user and data producer, and so on. And as Peirce noted, as our understanding develops we are also in dialogue with a future self (the one who understands) that is coming into being (Peirce, 1905). (Without knowing of hermeneutics, Vannevar Bush articulated a similar kind of implicit dialogue between record-keepers and record-readers in As We May Think.) In the case where the creator cannot be directly questioned, the horizon of the text is only expressed by the questioner. There is an implied “I-Thou” relationship between the questioner and the text, however, that makes the text less of an external object and more of an integral component of our own coming into understanding. The fundamentals of dialogue still hold; it must reflect a mutual respect and genuine attempt to understand each other. Disregarding the horizon of an interlocutor, real or imagined, is not an option, since it cannot lead to the fusion that represents understanding.

• Play. Gadamer uses play to describe the testing of horizons, the back-and-forth that occurs between questioner and text during the process of understanding. Play is important to hermeneutics because it reinforces the free-flowing nature of possibilities during the creation of meaning over the simple imposition of a singular actuality.

• Prejudice. Unlike Cartesian rationality, in which inquirers should strive for objectivity free from bias, hermeneutics acknowledges the unavoidable (and not necessarily undesirable) nature of our prejudices. These pre-judgments are part of our horizons, and they are the basis of our ability to have experience; pre-judgments guide our exploration of a text as we attempt to elicit its horizon (Bernstein, 1983).

• The hermeneutic circle. Related to the concept of play, the hermeneutic circle defines the process by which we experiment with possible interpretations. Making one interpretation may change the way we interpret another part of a text. Interpretation can thus be circular, or at least involve feedback with earlier interpretations. It is insufficient to model hermeneutic interpretation as a linear process.

• Breakdown. When our theories about some part of the world fail to explain our experiences, we enter into a condition of breakdown. The breaking down of a prior interpretation necessitates the construction of a new interpretation. Rather than being a condition of failure, however, breakdown creates an opportunity for ingenuity.

There are connections between Gadamer’s views and Peirce’s, particularly in the areas of dialogue (Peirce’s community inquiry), play and breakdown (retroductive generation and testing of possibilities), and the hermeneutic circle (differences in signification depending on perspective). But where Peircean inquiry might ask us to make our interpretants clear, Gadamer recognizes a difficulty in representing background situations: we are quite often unaware that they are being employed. Gadamer writes that “The very idea of a situation means that we are not standing outside it and hence are unable to have any objective knowledge of it.” The hermeneutic approach is also in contrast with claims made for the objectivity of some knowledge, particularly in artificial intelligence applications. Searle (2002), for instance, suggests that some arguments can be evaluated independently of perspective (these he terms “epistemically objective”), as can certain observations about the world (termed “ontologically objective”) such as the observation “this is a rock.” Phenomena that are purely experiential (e.g., pain), by contrast, are ontologically subjective, existing only as they are experienced. The problem with epistemic and ontological objectivity is that they tend to support only the most basic of claims, and once any qualification or evidence is applied in support of those claims (which is the basis of science), they become immediately dependent on perspective.5

The history presented here, from Aristotle, through Kant and Peirce to Gadamer, suggests that a new computational model for scientific knowledge can be an evolutionary step in epistemology and the human-computer interface (Figure 2.4): traditional knowledge representation languages generally support syllogistic reasoning and Aristotelian codification of terms; when integrated with problem-solving tools, the result affords Peircean scientific exploration; finally, representations of hermeneutic context support interpretation and subsequent reuse of scientific knowledge.

5 An incident from my own experience illustrates this point. At a planetary geology conference, an argument broke out between a speaker and a member of the audience over the concept of “soil.” The speaker, an expert in a field other than soil science, was making the ontologically objective claim “this substance is soil” in his discussion of the material upon which a Mars rover was traveling. This notion of soil was colloquial, as in “soil is the top layer of a planet’s surface.” The audience member, a soils expert, contended that the same substance was not soil but regolith (a generic kind of loose material), since soil implied some ability to harbor life, which the speaker would not claim existed on Mars. The two scientists’ models of the perceptual properties that constitute soil were situated in different perspectives and had only partial overlap. Thus each considered the other’s perspective subjective by comparison.

Figure 2.4. The progression from Aristotelian to Peircean inquiry, with the addition of hermeneutics, is analogous to the cumulative affordances of modern-day knowledge representation efforts, including this study.

2.3 Contemporary Motifs in Scientific Cognition

Cognitive science emerged, in part, as a response to positivism and behaviorism, and was spurred by the representational problems presented by computers. Early cognitive models, such as functionalism (which associates brain states with computational states) and mentalism (which holds that concepts are largely intensional, reflecting basic brain operations such as beliefs and desires), were demonstrated to be computationally tractable; functionalism is reflected in Parallel Distributed Processing (Garson, 1998), mentalism in belief-desire-intention frameworks that form the basis of a popular structure for agent programming (Rao and Georgeff, 1991). Functionalist models like neural networks consider the basic elements of cognition to be untyped nodes that reproduce meaning by learning differences in activation thresholds across a network. While the units of functional cognition are not explicit symbols as they might be in semiotics, functionalist perspectives correspond to a limited degree with what Vannevar Bush called associative indexing between elements of a research record; a problem, though, is how to offload associative connections in the mind to a worldly representational mechanism.
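The functionalist picture of untyped nodes and activation thresholds can be illustrated with a minimal sketch. The weights, inputs, and threshold values below are arbitrary illustrations, not taken from any model discussed in this chapter:

```python
# Minimal sketch of a functionalist-style unit: an untyped node whose
# "meaning" lies only in its weighted connections and activation threshold.
# Weights and thresholds here are arbitrary, for illustration only.

def unit(inputs, weights, threshold):
    """A single threshold unit: fires (1) when weighted input exceeds threshold."""
    activation = sum(i * w for i, w in zip(inputs, weights))
    return 1 if activation > threshold else 0

# Two input cues feeding one association node (an AND-like association):
print(unit([1, 1], [0.6, 0.6], threshold=1.0))  # both cues present -> 1
print(unit([1, 0], [0.6, 0.6], threshold=1.0))  # one cue absent -> 0
```

Nothing in the node itself names what it associates; the association exists only in the pattern of weights, which is the sense in which such models "reproduce meaning" without explicit symbols.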

2.3.1 Positivism and Paradigms

The early-twentieth-century doctrine of logical positivism sought to treat knowledge as only that which could be verified through empirical testing. Among other problems with verification, such as a tendency toward the fallacy of affirming the consequent, a key shortcoming is the problem of induction. If we “know” that all swans are white, verifying this hypothesis as indisputably true requires examination of every swan past, present, and future. Popper’s critique of positivism is that science is instead grounded in falsifiability, which puts the burden on negative counterexamples instead of affirmation (Popper, 1959). Barring evidence of counterexample, we can say only that a theory is corroborated, never that it is proven.

A problem with both verification and falsification, however, is that they suggest the kind of universal method that a model of knowledge grounded in situation and judgment does not support. Are there instances where individual researchers arrive at similar conclusions via different approaches to truth? Feyerabend (1988), for instance, prefers to cast science as a worldly, individual activity that is not dependent on universal method but that occurs within an idiosyncratic context.

Positivist and falsificationist claims in science are further rejected by the concept of holism, which treats hypotheses and evidence as a “corporate body” (Quine, 1990); individual experiences alone cannot constitute tests of scientific hypotheses. Experience is “non-monotonic” and does not lead deterministically toward verifying or falsifying a hypothesis; as Peirce suggests, there is always the chance that further experiences might amend or contradict a hypothesis.

The result of accumulated amendment and contradiction of old models by new information is described by Kuhn’s paradigm shifts (Kuhn, 1962). Kuhn suggests that there is no mechanism by which every member of a community will produce the same model of a problem (there is not a universal method that, in his terms, amounts to a “neutral algorithm for theory choice”). Observations about the world can be described with different models, and some such models may be ubiquitous enough to be called paradigms. (The notion of perspective outlined earlier is informed by Kuhnian paradigms to the extent that both delineate the worldview within which a researcher works.) But when evidence fails to fit a paradigmatic model, science’s capacity for belief revision enables new models to emerge. This emergence does not render old interpretations useless; it simply requires us to recognize that other models were in use at the time (and I suggest that this reconciliation of models aligns with hermeneutic approaches to fusing horizons). This is not to say that individuals and communities choose perspectives irrationally, but what constitutes rationality to them depends on characteristics that may vary from time to time or person to person, rather than being based on universal truths. Theory choice is thus a “judgmental activity requiring imagination, interpretation, the weighing of alternatives, and application of criteria that are essentially open” (Bernstein, 1983, p. 56). In the present work, Kantian judgment, Peircean retroductive imagination, and hermeneutic interpretation interweave in the delineation of situations that surround the creation and application of scientific knowledge.

2.4 Summary

This chapter has defined the key ideas of concept, context, situation, and perspective, and has introduced the need to formalize these elements of knowledge if we are to create effective computational aids to knowledge construction. Motivating theories in the philosophy of science and human cognition illustrate the importance of both individual and community perspectives in this process. The need for concepts to play evolving and potentially contradictory roles in different situations emerges from Aristotle’s articulation of the categories of being. But the restrictions against disagreements over meaning within these categories make simple taxonomies insufficient for reflecting the diversity of belief within a community. Support for judgment, interpretation, and pragmatic manipulation thus becomes a set of design criteria for a system to support the social process of knowledge building.

This background sets the stage for an examination, in Chapter 3, of the current state of knowledge-based activities in computation, especially in geography and geoscience. Geoscience, in particular, is beginning to recognize some of the philosophical and cognitive justification for accommodating situation, perspective, and experience; geoscientists have recently argued that their explanations of the world are fundamentally dialogical, experiential (Frodeman, 2003), and abductive (Voisard, 1999). Geoscientists are also coming to grips with the interpretive nature of their field, wherein observations must be situated in larger narratives about earth history (Martin, 1998; Baker, 1999). Chapter 4 will use the theories of understanding articulated here to develop a formal model for representing this type of scientific knowledge computationally.

Chapter 3 – Knowledge in Computing and Geography

Intertwingularity is not generally acknowledged – people keep pretending they can make things hierarchical, categorizable and sequential when they can’t. Everything is deeply intertwingled. (Nelson, 1974)

3.1 Do Our Tools Do Justice to Our Ideas?

Between theories of the nature of concepts formulated in the cognitive sciences and philosophy, and applications of those theories in scientific tools, lie computational representations of concepts. These representations form the mechanism by which knowledge is communicated between and among humans and computational systems. The previous chapter has shown how scientific explanations reflect the “intertwingling” of multiple perspectives, data sets, hypotheses, and researchers; the challenge for representation, then, is to balance this intertwingularity with enough structure to make information resources interoperable.6 Without imposing structural constraints on users, application of a knowledge representation technique can result in a Tower of Babel problem; users may be unable to find intersections between their individual perspectives. Too much structure, however, constrains the ability of inquirers to describe the world as they know it. A knowledge representation for scientific problem solving should permit a multiplicity of perspectives while preserving sufficient interoperability for meaning to be reliably communicated (Nelson, 1965). This chapter introduces the basics of computational knowledge representation and examines how representational trends have been reflected in geographic research. For their part, geographers have focused on promoting particularly formal kinds of knowledge representations, but the pertinence of these to everyday geography is limited as a result. How can we bridge the divide between highly formal knowledge representation languages, which are required for many kinds of computational work, and the informal, fluid nature of scientific knowledge creation? There will be no single representational format that is perfect for all applications, but the characteristics of scientific knowledge discovery suggest a technique that supports situated networks of ideas.
The chapter concludes by outlining a path toward such situated representations and the information systems that use them to enable interpretation and reuse of scientific knowledge.

3.2 Trends in Knowledge-Based Computing

Contemporary knowledge-based computing is defined around the interweaving elements of formal languages, computer-supported cooperative work (CSCW), and cyberinfrastructure. The creation and use of knowledge representation (KR) languages tends to be a top-down activity, where the primitives and structure of a notation are used as an a priori classification for domain knowledge. CSCW techniques that support communication between researchers are often used to synthesize meaning informally from the bottom up (that is, by encouraging users to search for similarities in perspective); post facto knowledge mining approaches such as inductive bibliometrics are similarly bottom-up. Cyberinfrastructure is the emerging fabric within which KR and CSCW techniques might be integrated to support collaborating scientists. This section reviews developments in each of these fields and suggests how recent trends point to the need for a combined solution.

6 Ted Nelson is one of the creators of the concept of hypertext, which he billed as a “file system for the complex, the changing, and the indeterminate” (Nelson, 1965). Hypertext is a representation that permits intertwingularity – in the form of arbitrary links between resources – but it remains an incomplete implementation. Hypertext links are purely syntactic, allowing the fact of a relationship to be expressed but none of the reasoning behind it (except in natural language, which carries its own set of difficulties).

3.2.1 Computational Knowledge Representations

A knowledge representation is a surrogate within a reasoning agent for information about the world and a medium for drawing inferences about that information. The representation is semantic in that it bears a degree of correspondence to the referents it describes (Davis et al., 1993). Treating KR as a surrogate avoids the existential concern of whether the objects to which a concept refers are in the mind (human or otherwise) or in the world. We can view representations as carriers of meaning without asserting that they are equivalent to meaning. Representations that lack complete fidelity to the concepts they reflect can still be a necessary part of communicating understanding.

There are four levels of abstraction at which knowledge representations can be conceived (Reichgelt, 1991). From the concrete to the abstract, these are:

(1) Implementational: the data structures required for computation.
(2) Logical: the rules of inference that operate on those structures.
(3) Epistemological: the actual primitives needed to express a structure.
(4) Conceptual: the general types of primitives needed for sufficient expressivity.

The approach of the present work is to use existing KR notations (which accomplish the first and second of these levels) to bring situated representations to the third and fourth levels. KR languages typically constrain what we can describe in the world and how we can describe it by providing a fixed vocabulary for a domain. In this way, they are a set of ontological-metaphysical commitments – their creators and users commit themselves to a worldview in their choice of KR. But as the foregoing chapter argued, the experience and consequences of making these commitments are critical to working with and sharing knowledge.

Approaches to knowledge representation range from the logical (those that are essentially deductive) to the psychological (those that are essentially abductive), two poles in a tension between decidability and expressiveness. Deductive approaches founded in first-order predicate calculus (FOPC) are attractive for their decidability. The truth-preserving nature of deduction means that inferences can be both drawn and verified on the basis of available information. Implementations of FOPC in KR languages such as Prolog (Colmerauer and Roussel, 1993) and Loom (MacGregor, 1994) enforce this decidability by restricting the variety and format of statements users can make about the world. Typical FOPC statements assert facts of the form ∀x (“for all x…”) or ∃y (“there exists a y that…”). The difficulty in deploying a FOPC-based KR tool for real-world use, though, is that the logic deals poorly with characteristics like possibility, contradiction, and interrelation. Extensions to FOPC, such as the Knowledge Interchange Format (KIF), have attempted to address certain shortcomings by adding the capacity for reification (which allows statements in the language to be treated as entities in themselves, about which further claims can be made), but the fact remains that the strictness of FOPC makes it most appropriate for tasks that can be achieved using deduction alone.
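The restricted, deduction-only character of such tools can be sketched in a few lines of Python: a set of ground facts and one universally quantified rule, forward-chained to a fixed point. The predicates and the rule here are invented for illustration and stand in for the far more general clauses a system like Prolog compiles.

```python
# Minimal forward-chaining deduction over ground facts, illustrating the
# restricted "for all x: P(x) -> Q(x)" statement forms typical of FOPC
# tools. The predicates and example rule are hypothetical.

facts = {("Researcher", "alice"), ("Geographer", "bob")}

# Rule: for all x, Geographer(x) -> Researcher(x)
rules = [("Geographer", "Researcher")]

def forward_chain(facts, rules):
    """Apply the rules until no new facts can be derived (a fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            for pred, arg in list(derived):
                if pred == antecedent and (consequent, arg) not in derived:
                    derived.add((consequent, arg))
                    changed = True
    return derived

closure = forward_chain(facts, rules)
assert ("Researcher", "bob") in closure  # deduced, never asserted directly
```

Every derived fact is guaranteed true given the premises – the truth preservation noted above – but nothing outside the fixed vocabulary of predicates can be expressed at all.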

Frame logics are the basis of object-oriented approaches to KR such as KL-ONE (Brachman and Schmolze, 1985). In frame logic, classes rather than predicates are the basic units of reasoning (Kifer et al., 1995). These classes are defined on the basis of features, represented as slot-value pairs. Frames thus bear a resemblance to the Aristotelian “defining features” approach to representing concepts (Section 2.1.2). Description logics (DL) take a frame-like view onto FOPC; like frames, the atomic element of a DL is the class or concept, but it is defined by the properties that relate it to other classes. DLs provide for instances of classes to inherit from and extend parent classes, and for classes to be defined through restrictions to particular domain and range values.

Contemporary information system ontologies are based on DL approaches to knowledge representation. These ontologies are intended to formalize domain knowledge by enumerating the kinds of things that exist and their properties (Gruber, 1995; Guarino, 1997), but they tend to be hierarchical (based on subsumptive or mereological relations). WordNet (Fellbaum, 1998) is a classic example of a hierarchical ontology of lexical terms; the Standard Upper Ontology (SUO) project is devising epistemological primitives that can support data interchange regardless of domain, while the NASA SWEET project is creating ontologies for physical processes and phenomena.7 (Section 3.3 describes ontological approaches specific to geography.)

Within the past five years, KR has come to be synonymous with XML-based languages, due in large part to the efforts of the World Wide Web Consortium (W3C). XML describes the structure of a document through hierarchical tags, but not its semantics; the Resource Description Framework (RDF) is a notation that uses XML syntax to provide further semantic markup (Decker et al., 2000). RDF is a format for which the RDF Schema (RDFS) provides a specific language of ontological terms (classes, individuals, properties, and relationships). However, RDFS is not quite a description logic and not quite a frame logic; it makes describing semantics simple, but inference across those descriptions is tricky. To remedy this, various languages have extended RDF in an attempt to make it a proper DL. Two of these, DAML and OIL, merged and were later subsumed under the W3C OWL (Web Ontology Language) project; OWL is now the de facto standard for expressing semantic information in computational environments (Figure 3.1). OWL adds specific support for expressions of disjointness, transitivity, cardinality, and collections, separated into three varieties that accommodate varying degrees of expressivity versus decidability (McGuinness and van Harmelen, 2003). OWL Lite is a small subset of the OWL language, best used for taxonomies and thesauri; it is rigidly decidable. OWL DL (so named because of its correspondence to description logics) allows greater expressiveness but restricts the style in which assertions can be made in order to preserve inferencing capabilities (Figure 3.2). OWL Full has maximum expressiveness and allows contradictions and simultaneity (e.g., a concept may be at once a class and an instance of another class), but statements made in Full are not guaranteed to be computable by any inference engine. (An inference engine is any piece of software capable of reasoning with the semantics that a language like OWL encodes.)
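One flavor of the inferencing these languages enable can be sketched in Python: propagating type membership up a transitive subclass hierarchy, among the simplest entailments an RDFS/OWL reasoner computes. The class names loosely echo the SWEET example of Figure 3.2; the dictionary encoding is an illustrative simplification, not an actual OWL serialization.

```python
# Sketch of one entailment an RDFS/OWL reasoner performs: propagating
# type membership through a transitive subclass hierarchy. Class names
# are illustrative, loosely following the SWEET example; a real reasoner
# operates over RDF triples, not a Python dictionary.

subclass_of = {
    "InfraredRadiation": "ElectromagneticWave",
    "ElectromagneticWave": "Wave",
}

def types_of(cls):
    """Return the class plus all of its superclasses (transitive closure)."""
    types = [cls]
    while types[-1] in subclass_of:  # assumes an acyclic hierarchy
        types.append(subclass_of[types[-1]])
    return types

# An individual typed as InfraredRadiation is inferred to be a Wave too.
assert types_of("InfraredRadiation") == [
    "InfraredRadiation", "ElectromagneticWave", "Wave"]
```

The three OWL varieties differ precisely in how much can be said while keeping closures like this one computable in all cases.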

7 SUO: http://suo.ieee.org; SWEET: http://sweet.jpl.nasa.gov

Figure 3.1. XML provides a foundational syntax for describing KR structure. RDF builds on XML’s document structure by encoding metadata primitives like classes and properties. OWL gives RDF rich semantics.

Figure 3.2. OWL snippet from the SWEET physical substances ontology. The overall structure is hierarchical (classes and subclasses), although the class “InfraredRadiation” illustrates how a class can be defined on the basis of individually necessary and jointly sufficient features (a restriction). Here, infrared radiation is a subclass of electromagnetic wave that has a wavelength range in the infrared portion of the spectrum.

When XML-based ontologies are concerned solely with classification and reasoning tasks (as is often the case, e.g., Bozsak et al., 2002; Sugumaran and Storey, 2002), what the end user is left with is precisely what Chapter 2 suggested was insufficient: neutral concept structures isolated from the collaborative experiences that informed their creation and from the fluid process of their change. Indeed, there have been attempts at “national knowledge infrastructures” built from static ontologies and designed to be adopted without modification (Cao et al., 2002). Some ontological frameworks have been extended to represent the process of constructing natural science knowledge through experimental procedures (Noy and Hafner, 2000), although these ontologies are intended to retrieve process descriptions from existing text.

There have also been some commendable recent efforts to examine the problem of ontology versioning and change as it relates to maintaining logical consistency (notably Klein et al., 2002), but there is room for a deeper exploration of how changes resulting from the manipulation of resources over time can be reflected in KR. A change in the way we express a concept (through its intension, extension, or relations) over time reflects something deeper than the straightforward relabeling supported by versioning capabilities in many KR tools; the change indicates a shift in ourselves that necessitates modification to the interpretive stance (Section 2.2.4) others must take to understand our knowledge (Buzaglo, 2002).

A further difficulty with structuring knowledge as ontologies in OWL is that their machine readability comes at the expense of human readability. Semantic networks provide a graphical alternative. Some such networks have been proven equivalent to first-order logics and could be used to represent ontological information. One of the earliest examples is the Existential Graph (EG) of C.S. Peirce (Figure 3.3a). Peirce was a proponent of what he called diagrammatic reasoning – the value of which anyone who has sketched on the back of a napkin can confirm. Sowa (2000a) has created a formalism called Conceptual Graphs, based on EG but affording greater readability (Figure 3.3b). CGs have nearly identical expressivity to RDF and OWL, but in avoiding linear notation they are easier for humans to read (Sowa, 2000b). Compared to the OWL snippet in Figure 3.2, the CG provides a clearer depiction of which entries are concepts and which are relations, and how the two are connected. CGs also incorporate a simple notion of situation by allowing portions of a graph to constitute a reference to a broader activity. In Section 4.4, a CG-like visual technique for describing knowledge will be introduced as part of the implementation of this study’s aims.

Figure 3.3. a) An example of Peirce’s visual notation for logic, the Existential Graph, from one of his manuscripts (source: Peirce Online Resource Testbeds). b) A statement in Conceptual Graph notation, asserting that a particular researcher studies earthquakes for the purpose of reducing their risk.
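The bipartite character of a CG – concept nodes connected through labeled relation nodes – can be suggested with a minimal Python sketch of the Figure 3.3b assertion. The researcher’s name and the tuple encoding are hypothetical; this is a simplification, not Sowa’s full formalism.

```python
# A conceptual graph as a bipartite structure: concept nodes connected
# through labeled relation nodes. Encodes the Figure 3.3b assertion that
# a researcher studies earthquakes for the purpose of reducing their risk.
# The researcher's name and this tuple encoding are invented illustrations.

concepts = {"Researcher: Ann", "Study", "Earthquake", "Risk-Reduction"}

# Relations are nodes in their own right, linking pairs of concepts.
relations = [
    ("agent", "Study", "Researcher: Ann"),
    ("theme", "Study", "Earthquake"),
    ("purpose", "Study", "Risk-Reduction"),
]

def relations_of(concept):
    """All (relation-label, other-concept) pairs touching a concept node."""
    out = []
    for label, src, dst in relations:
        if src == concept:
            out.append((label, dst))
        elif dst == concept:
            out.append((label, src))
    return out

assert ("theme", "Earthquake") in relations_of("Study")
```

The explicit separation of concept and relation nodes is exactly what makes the graph notation more legible than the equivalent nested OWL markup.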

Inductive approaches to KR forfeit the strict decidability and consistency that make for efficient deduction in logic-based approaches, but attempt to reflect some of the complexity involved in negotiating the semantics of the real world. (This is an acknowledgement of Peircean fallibilism, in which errors in reasoning are an unavoidable and informative part of investigation.) Typical inductive approaches are rooted in data mining and Parallel Distributed Processing techniques. Bibliometrics, for instance, attempts to detect connections between concepts and communities of researchers based on citations (e.g., Chen, 2004) or on natural language processing of journal articles (e.g., Landauer et al., 2004) and online discussion environments (Pike and Gahegan, 2003).
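A toy version of the bibliometric idea – inferring an association between two concepts when papers repeatedly cite both – might look like the following; the papers and concept labels are invented for illustration.

```python
# Toy bibliometric induction: infer a connection between two concepts when
# papers repeatedly cite both. Paper and concept labels are invented.
from collections import Counter
from itertools import combinations

papers = {
    "paper1": {"abduction", "diagrams"},
    "paper2": {"abduction", "diagrams", "ontology"},
    "paper3": {"ontology", "induction"},
}

cocitations = Counter()
for cited in papers.values():
    for a, b in combinations(sorted(cited), 2):
        cocitations[(a, b)] += 1

# "abduction" and "diagrams" co-occur twice, more than any other pair,
# so an association between them is induced from the citation pattern.
assert cocitations.most_common(1)[0] == (("abduction", "diagrams"), 2)
```

Unlike the deductive examples earlier, nothing here is guaranteed: the inferred association is a defeasible pattern, not a truth-preserving conclusion.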

Both deductive ontology and inductive pattern detection have merit in KR, but as Peirce argues (Section 2.2.3), neither reflects the creative process of science. Ideally, information systems that support both top-down and bottom-up reasoning will embed them in the overarching process of retroductive hypothesis generation. Indeed, even natural language processing, long a bastion of inductive approaches to semantic structure, has started to become concerned with extracting abductive beliefs from written text (Bunt and Black, 2000). Moreover, knowledge elicitation practitioners have long held that task analysis – understanding how subjects actually solve problems in a domain – is a key to understanding the domain’s structure (Zaff et al., 1993; Cooke, 1994). There is now growing recognition in the knowledge representation field that KR tools should reflect these situated work practices (Schultze and Boland, 2000) and accommodate the dialogical, interactive nature of exploration (Nake and Grabowski, 2001; Dustdar, 2004). Marcos and Marcos (2001) argue that ontologies in information science are often treated as unassailable schema for “external” knowledge rather than as representations of shared knowledge with their own context and schema. Magnani (2001) suggests that situatedness is precisely what makes abduction a useful model for computer-based hypothesis creation – even under conditions of hypothesis failure, it produces useful information. Creative abduction can furnish partial explanations (Magnani, 1999) that are revised and extended over time through cooperative belief revision (Thagard, 1992).

3.2.2 Computer-Supported Cooperative Work

Douglas Engelbart, an early pioneer in human-computer interaction, was one of the first to recognize Vannevar Bush’s insight into how computers could change the way we work as individuals and communities. Engelbart’s concern was how computational tools can augment human intellect – not by replacing it, but by amplifying its natural abilities through aids to reasoning, memory, and communication (Engelbart, 1962). He proposed a system called H-LAM/T (the cumbersome acronym for the even more cumbersome “Human using Language, Artifacts, and Methodology, in which he is Trained”) that, if automated by a computer, would permit users not only to manipulate complex concept and symbol structures, but also to capture how these structures are linked across multiple roles. These roles amount to nested situations – the act of collecting data might be nested within the process of creating a numerical model, which in turn might be part of an effort to predict the direction of a trend. One of the impacts of an ideal H-LAM/T system is efficient knowledge reuse in an organization, in that a library of reasoning processes is accumulated and made available to team members. Engelbart’s proposal for H-LAM/T foresaw the development of the field of Computer-Supported Cooperative Work (CSCW; Prakash et al., 1999; Borghoff, 2000), which aims to devise mechanisms that support shared understanding.

Traditionally, CSCW has been relevant to geographers to the extent that it provides a framework for supporting cooperative visual exploration (Brewer et al., 2000; MacEachren, 2001) and collaborative geographic information systems (Churcher and Churcher, 1999). Others, notably Carroll et al. (2001), have developed collaborative online communities that use spatial metaphors to organize a workgroup’s resources, including whiteboards, notebooks, representations of colleagues, and customizable user-created objects. CSCW applications for scientific collaboration often take the form of electronic notebooks, organized into hierarchies of chapters and pages (e.g., Lysakowski and Doyle, 1998; Myers et al., 2001), in which researchers can enter and search for free-form records (although these notebooks are still linear in structure).

The descriptive nature of CSCW relaxes many of the normative constraints of formal knowledge representations. Unfortunately, the problem of describing scientific discovery is, in CSCW terms, what Rittel and Webber (1973) term “wicked” and Newell and Simon (1976) “ill-structured”. In contrast to tame problems, which might be modeled deterministically and about which different researchers usually have similar perspectives, with wicked problems there is rarely even agreement about the definition of the problem itself or the conditions that constitute a successful solution. Problem-solving methods for ill-structured problems are generally flexible and recursive; at any point in the process the problem solver might return to collect more observations, jump ahead to an evaluation stage, or stop to recast the entire problem. It is clear that formal ontology strains to represent this fluidity. The interrogation of wicked problems amounts to a series of nested “selective abductions” (Magnani, 2001), whereby an inquirer (1) makes an abductive guess that introduces a set of plausible diagnostic hypotheses for a set of observations, (2) makes deductive inferences to explore the consequences of these hypotheses, and (3) tests the hypotheses with inductive inferences from data to increase the likelihood of finding an accurate explanation (Newell and Simon, 1972).
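The three-step cycle just described can be caricatured as a loop in which each inference mode is a placeholder function; the hypotheses, predictions, and scoring below are invented stand-ins for domain-specific reasoning, not a general algorithm.

```python
# Skeleton of the selective-abduction cycle: guess plausible hypotheses
# (abduction), derive their consequences (deduction), and score them
# against observations (induction). Every function body is a hypothetical
# stand-in for domain-specific reasoning.

def abduce(observations):
    """Propose candidate hypotheses that could explain the observations."""
    return [{"cause": c} for c in ("subsidence", "erosion")]

def deduce(hypothesis):
    """Predict what we would observe if the hypothesis were true."""
    return {"subsidence": {"cracking", "tilting"},
            "erosion": {"sediment"}}[hypothesis["cause"]]

def inductive_score(predicted, observations):
    """Fraction of predictions borne out by the data."""
    return len(predicted & observations) / len(predicted)

observations = {"cracking", "tilting", "vegetation_loss"}
best = max(abduce(observations),
           key=lambda h: inductive_score(deduce(h), observations))
assert best["cause"] == "subsidence"
```

In a real inquiry the loop is recursive and interruptible – at any point the inquirer may gather new observations or recast the problem – which is exactly the fluidity that a fixed control structure like this one cannot capture.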

Typically, CSCW tools support the exploration of wicked problems through highly structured forms of dialogical inquiry. Toulmin’s (1958) notation for representing the production of claims from observations is the basis for many such tools in planning, policymaking, and law, where it amounts to an aid to drawing deductive inferences. The Toulmin model is common in Issue-based Information Systems (IBIS) (e.g., Conklin and Begeman, 1989) that guide collaborative exploration of positions (which respond to issues) and arguments (which support or refute positions). While representing inquiry as a process of argumentation implicitly acknowledges the social aspects of meaning, IBIS tools tend not to support formal semantics beyond the deductive consequences implicit in their argument structure. The lack of well-defined semantics extends to natural-language-discussion CSCW tools, such as the Delphi method (e.g., Turoff and Hiltz, 1996; Turoff et al., 1999).

Recently, the CSCW community has begun to embrace ontologies as the basis for tools to support scholarly discourse. These ontologies describe the kinds of entities that a CSCW system is capable of expressing (van Bruggen et al., 2003) and are typically considered generic containers for scientific work (e.g., Suthers, 1999), not choices of perspective that are themselves contestable. Some CSCW tools, like ScholOnto (Buckingham Shum et al., 2000), develop discourse ontologies to express connections between researchers and published topics. However, the public record of science tells only part of the story, and not always faithfully; it does not reveal all of the analysis procedures, decisions, wrong turns, and intermediate results that underlie the work that merits publication. Publications are a high-level mechanism for knowledge transfer within a large community, but within science teams actively working on problems together, publications are not the primary means of communication (although they may provide background information). Much of the discourse relevant to science is thus inaccessible outside of the small groups in which it occurs. Practitioners in other places or times can have difficulty reconstructing the discursive process that led to a particular finding. Remedying this shortcoming is the aim of Objectives A and C in Section 1.3.

CSCW approaches tend to be limited both by the static nature of their representational notations and by the relative simplicity of the semantic relationships they encode. ClaiMaker (Li et al., 2002), for instance, provides promising insight into mechanisms for answering queries based on perspective (“why”) or provenance (“how”), but treats semantic similarity between perspectives as transitive. In practice, differences in perspective result in intransitivity of relations – I might consider myself a good friend of someone who turns out not to like me at all. Further, work on providing portal-like environments for collaborative concept exploration does not allow for user-defined semantics and therefore cannot accommodate change in a group’s understanding (e.g., InvestigationOrganizer: Keller, 2003).8 Thus adoption of ontologies would seem to undermine some of the benefits of a CSCW approach to collaborative science.
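The friendship example translates directly into a small demonstration of why computing a transitive closure over perspective relations manufactures assertions the data never licensed; the relation and the names here are of course hypothetical.

```python
# Why treating perspective relations as transitive fails, using the
# friendship example from the text. "considers_friend" is directed and
# not transitive, so a transitive closure manufactures relations that
# the original data never asserted. Names are hypothetical.

considers_friend = {("me", "colleague"), ("colleague", "rival")}

def transitive_closure(pairs):
    """Naive transitive closure: add (a, d) whenever (a, b) and (b, d) hold."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

closed = transitive_closure(considers_friend)
# The closure wrongly asserts a relation absent from the original data.
assert ("me", "rival") in closed
assert ("me", "rival") not in considers_friend
```

A system like ClaiMaker that chains similarity links in this way will therefore report connections between perspectives that no participant ever endorsed.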

Some of the most promising insight into the effectiveness of CSCW comes not from academic projects but from the adoption of “knowledge management” (KM) techniques in private industry. Frequently, KM tools are concerned with capturing, archiving, and mining documents and records (generally, text and numbers) produced in the course of business activity. Spurred by the practicality of the commercial enterprise, knowledge management embraces an almost Peircean view of pragmatism: knowledge (hence truth) is that which is used. The preservation of organizational memory and the creation of “collective intelligence” (Szuba, 2001) are only important to the extent that they improve the efficiency with which members of an organization execute tasks or the quality of their decisions (Fischer and Ostwald, 2001). Effective KM recognizes that making information widely accessible in digital form does not immediately make it useful; achieving the paperless office requires either that users change their work practices or that tools align more neatly with their users’ tasks (Sellen and Harper, 2002). An ideal KM solution likely requires adjustments to both tool and user (these adjustments will be discussed in Chapter 5 as evaluative mechanisms).

The present work extends CSCW and KM efforts in four ways: by (1) representing concepts as more than expressions in natural language arguments; (2) not imposing a priori restrictions on the nature of the relationships between concepts; (3) representing concepts as they are constructed, not after they are reported; and (4) creating a conceptual model and user interface informed by insight into the cognitive structure of concepts.

8 A portal is a model for interacting with distributed electronic resources, and is discussed further in Section 4.2.1.

3.2.3 Cyberinfrastructure

The technical aspects of improving communication and collaboration between scientists are the purview of cyberinfrastructure – essentially, CSCW over distributed networks. One of the goals of cyberinfrastructure is to build online scientific workbenches by connecting distributed high-performance computing and data storage resources. Such workbenches would be virtual toolkits where researchers can find, analyze, and share data. Knowledge-based, cooperative exploration is not a prerequisite of cyberinfrastructure, but this infrastructure can provide a framework for integrating knowledge representation and CSCW tools.

Two of the enabling technologies for cyberinfrastructure, Web Services and the Semantic Web, are steps toward knowledge-based computing. Web services incorporate open, XML-based standards for passing structured information between distributed applications and offer a possible mechanism for automating workflow processes. For instance, an analysis performed by one researcher might be stored as a service so that others can repeat it, perhaps using their own data. The Semantic Web describes both the general goal of augmenting the current syntactic Web with semantic markup and the particular project of the World Wide Web Consortium to implement languages (primarily OWL) to achieve this markup. In light of the deductive nature of OWL and its ilk, some have suggested a turn toward a “Pragmatic Web” instead, one that explicitly enables communities to test, refine, and implement emergent, rather than top-down, solutions (de Moor et al., 2002).

Cyberinfrastructure is largely synonymous with the concept of collaboratories (Kouzes et al., 1996; Finholt, 2001), although the choice of label a project uses to self-identify can be telling: the term “cyberinfrastructure” emphasizes the technical aspects of distributed science work, “collaboratory” the social and pragmatic aspects. In a 2003 report grandiosely titled Revolutionizing Science and Engineering through Cyberinfrastructure (NSF, 2003), a National Science Foundation panel argues that current technological capabilities (in terms of computational power, storage capacity, and data transfer speed) have the potential to revolutionize the way science is performed. This revolution will enable research centers to achieve better and faster results by making it possible to transcend geographic boundaries in the pursuit of knowledge. This position should by now be familiar, repeating the still-reverberating claims of Vannevar Bush and Douglas Engelbart, and in fact was expressed in similar form ten years earlier in a National Research Council report on the promise of collaboratories (Cerf et al., 1993). The NSF panel seeks nothing less than the “radical empowerment” of students and researchers through cyberinfrastructure, and of the six critical applications for cyberinfrastructure that the panel notes, half are related to geography: understanding global climate change, protecting our natural environment, and predicting and protecting against natural and human disasters.

Currently, however, most cyberinfrastructure applications are in the physical and life sciences,9 and their emphasis is on metadata standards for data interchange. The few prominent geographic infrastructure projects reflect this focus: the Open Geospatial Consortium (OGC) is a standards-making body for geographic information interchange, and the Geospatial One-Stop is designed as a data warehouse for the National Spatial Data Infrastructure.10

The earliest efforts to create an infrastructure of federated databases focused on data schema; shared schema were thought to be a necessary prerequisite for communication between data sources (Sheth and Larson, 1990). Recognizing the difficulty of achieving agreement about data structure across a domain (and the problem of converting existing databases already bound to different schema), later work focuses instead on schema mapping (e.g., Elmagarmid et al., 1999). This mapping is the (generally manual) process of identifying correspondences between fields of one database and those of another, while allowing those databases to maintain their different representations. The problem with focusing on schema is that semantics are assumed to be implicit in the database structure, and it is up to the data consumer (a) to interpret the meaning in this structure and apply it to a new situation, and (b) to perform the mapping on a dataset-by-dataset basis. The goal of the Semantic Web, and of the cyberinfrastructure projects that use it, is to make integration easier by making semantics explicit. The problem of semantic mapping is, however, not far removed from that of schema mapping; data producers might use different ontologies to mark up their resources, introducing the same problems of interpretation and case-by-case mapping to achieve interoperation. The alternative is to agree on domain semantics by committee, but the success of this approach is clearly limited by diversity of opinion (what Ginsberg (1991), referring to authoritative ontologies recorded in a Knowledge Interchange Format, calls the “KIF of death”). Kazic (2000) suggests a promising middle ground, where information producers map database fields to a shared semantics that describes only the most abstract, simple concepts, called “semiotes” (alluding to Peirce’s semiotic, Section 2.2.3, in the sense that data elements should signify one of these basic concepts).
Two interoperating data sources need not be mapped directly to one another; each is instead mapped independently to semiotes, which become the mechanism by which meaning is brokered. The simplicity of semiotes is also their shortcoming: those ideas “most likely to engender controversy are left where they belong – as the private opinions of people, databases, or algorithms” (Kazic, 2000). But it is these controversial ideas, opinions, hypotheses, and theories that are often important to forming, evaluating, and modifying scientific explanations. Ideally, our ability to represent semantics computationally should not be reduced to the lowest common denominator upon which we can all agree. Disagreement may identify topics ripe for breakthrough.
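A sketch of the semiote idea, with invented field and semiote names: two databases are mapped independently to a handful of abstract concepts, and correspondences between their fields fall out of the shared mapping rather than from any direct pairwise negotiation.

```python
# Brokering meaning through shared "semiotes": each database maps its
# fields to a small set of abstract concepts rather than to the other
# database. All field and semiote names are invented for illustration.

db_a = {"quake_mag": "magnitude", "epicenter_lat": "latitude"}
db_b = {"Ml": "magnitude", "lat_deg": "latitude", "felt_reports": None}

def correspondences(a, b):
    """Pairs of fields from a and b that signify the same semiote."""
    return {(fa, fb)
            for fa, sa in a.items() if sa is not None
            for fb, sb in b.items() if sa == sb}

pairs = correspondences(db_a, db_b)
assert ("quake_mag", "Ml") in pairs
# "felt_reports" maps to no semiote, so it never interoperates: a
# contested idea left, as Kazic puts it, as a private opinion.
assert all(fb != "felt_reports" for _, fb in pairs)
```

The second assertion is the shortcoming the text notes: whatever cannot be reduced to a shared semiote simply drops out of the brokered exchange.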

What recent experience with cyberinfrastructure projects reveals is that access to information is not the key to understanding; there are substantial curatorial issues involved in creating huge information repositories (Hey and Trefethen, 2003). Effective cyberinfrastructure requires forethought about the problems of creating and representing meaning. The developments pursued in this research are not in themselves a form of cyberinfrastructure, but they will offer insight into how to bridge the divide between experimental procedures (i.e., manipulative abduction) and representations of the understanding that results.

9 The most conspicuous examples are the Bioinformatics Research Network (http://www.nbirn.net), myGrid (http://www.mygrid.org.uk), the Science Environment for Ecological Knowledge (http://seek.ecoinformatics.org), the National Ecological Observatory Network (http://www.nsf.gov/bio/neon), and the Geosciences Network (http://www.geongrid.org).

10 OGC: http://www.opengis.org; Geospatial One-Stop: http://www.geo-one-stop.gov

3.3 Handling Knowledge in Geographic Computing

Geographers are principally concerned with describing the world. Geographic Information Science devises mechanisms to represent these descriptions. But reality is not just that which we can represent cartographically, a set of features and attributes; our understanding of the world comes from using a variety of representational schemes, analytical methods, and individual perspectives. How much of this understanding can we, and should we, offload onto current representational tools, such as GIS? Does GIS adequately reflect the integrative nature of modern science? The problem of global climate change, for instance, is explored along a continuum of spatial and temporal scales, from local to global and immediate to future, and incorporates variables from fluid dynamics to policy choices. Are such complex problems well served by contemporary GIScience methods, especially when data-centric techniques can bias outcomes toward perspectives reproduced in quantitative measurements (Pickles, 1995)? If we want to create richer representations of the world that can help us express more complex relationships, we may need to start from a new, more abstract model of geographic information. As GIScience comes to recognize its intellectual roots in the problems of representation, the philosophical tradition can provide guidance about how to develop this model.

Knowledge representation languages, and the construction of ontologies that use them to describe features of the world, have garnered substantial attention in geographic applications (Doel, 2001; Frank, 2001; Fonseca et al., 2002b). Frequently, work on creating mechanisms for interoperable geographic concepts assumes that concepts and their definitions will already exist; what is largely missing from the geographic perspective on knowledge representation, however, is consideration of the process of knowledge generation, promulgation, and revision. As in other fields, semantic integration tasks in GIScience are too often treated as deductive endeavors based on the application of neutral, uncontestable, top-down rules. Geographic ontologies (e.g., Kuhn, 2001; Visser et al., 2002) are usually “hierarchical, categorizable, and sequential” – traits that Ted Nelson claims mask the true interdependence of the knowledge they embody.

The basis of geographic ontologies as top-down entities is frequently found in empirical assessment of commonly held categories (Smith and Mark, 2001). Smith (1995) further suggests that “good ontologies” for geographic information represent a certain fidelity to the real world, based in what Vckowski (1999) calls a “common model of reality.” It remains unanswered how this fidelity is to be measured in light of disagreement about the nature of an ill-structured problem. Making concepts interoperable through ontologies presupposes a “neutral ground” that is an objective foundation for resolving differences in classification (Fonseca et al., 2002a). It might be more useful to suggest that “good ontologies” are consistent with a perspective on the real world; concepts will take on different properties from different perspectives (Sheth, 1999). Functional descriptions have been one way to associate geographic concepts with perspectives (Rodriguez et al., 1999), although functional aspects tend to be (1) assumed to reflect a priori agreement or at least self-evidence, and (2) related to the functions in the world of the object the concept represents, not the functions in an investigation aimed at explicating the meaning of the concept. Bishr (1999) recognizes the first of these shortcomings, acknowledging that different information communities may work from different intensional definitions of concepts. This is the problem of “discipline perception,” which creates a partial account of the context of a concept by describing it in terms of a community’s theoretical (but still tacit) knowledge (Bishr, 1998). However, Bishr reduces the reconciliation of these definitions to the following: “if each of the information communities has its own ontology, the two information communities must share a common ontology to be able to share certain information.” This claim assumes that the legwork of generating, testing, and agreeing on these structures has already been done (the difficulty of which has been demonstrated in the ecological domain by Davey and Tatnall (2004)). In most geographic domains, reaching this agreement, even if possible, would not be a “once and done” process – communities are always in flux, gaining and losing members who contribute unique perspectives. Even individual scientists, as Peircean “contrite fallibilists,” are in flux. It would be preferable to suggest that, far from having to share a common ontology, information communities need only understand or appreciate each other’s ontologies to share information. This appreciation can come from a hermeneutic fusion of horizons (Section 2.2.4).

Investigation into the role of map use in forming geographic concepts confirms the importance of perspective or horizon: the purpose of a single sign or representation changes fluidly with the nature of the inquiry, and a single representation can be applied to multiple problems (Sismondo and Chrisman, 2001). The present study extends this analysis of representational roles to more abstract descriptions of geographic knowledge. Geographers have also adopted the notion of boundary objects to denote points at which different actors’ perspectives cross (Harvey and Chrisman, 1998) – catalysts, perhaps, for horizon fusing. Boundary objects represent concepts that are used differently in different domains, or by different researchers, but they enable communication because collaborators can at least agree that they are talking about the same thing (Star, 1989). In the geographic domain, vulnerability to environmental change is a boundary object; it might be assessed differently from political or geomorphological perspectives, but these differences, if recognized, do not need to be problematic. This work’s explicit representation of perspectives can help identify boundary objects (Section 5.3.1 shows how the resulting implementation achieves this).

The second of the shortcomings noted above – the absence of a representation for processes of geographic inquiry – has been examined indirectly through work on collaborative decision making with geographic information (Armstrong, 1993; Bibby and Shepard, 2000; Jankowski and Nyerges, 2001) and the integration of human expertise and spatial data (Balram et al., 2004). These studies address the construction of meaning through GIScience tools by examining the dual interaction (1) among human stakeholders in a problem and (2) between stakeholders and computational methods. As a result of this work, there is growing recognition that geographic tools and information resources need to be considered at conceptual and social, not just technical, levels. In GIS, conceptual abstractions that mask implementational detail can help focus attention on the salient elements of a problem (Nyerges, 1991). Representations of the pragmatic use of data and tools can also supersede definitional characteristics in their ability to communicate conceptual meaning (Harvey et al., 1999), while Nyerges et al. (2002) formalize strategies for collecting evidence of pragmatic application. Kottman (2001) summarizes early use of GIS as a “classical period” in which the personal and the local bounded data collection and decision making. We are entering a “modern period” in GIScience which is collaborative and global: non-linear, multidimensional, wicked problems necessitate new models for

geographic information representation. Kottman argues that GIS cannot survive if GIS operations cannot be repeated; successful replication, I suggest, means that the pragmatic reasoning that went into the selection of data and methods must be preserved so that it can be reapplied in a new time or to a new problem.

A representational framework capable of reflecting the bottom-up emergence of concepts through application can extend, rather than eliminate, the usefulness of top-down ontological approaches currently common in GIScience. (The goals of this framework were introduced in Section 1.3). Emergent concepts that acknowledge the process of inquiry by which they were created can self-organize into ontologies that may be applied elsewhere in a top-down fashion. (For instance, a set of emergent categories could later be used as an a priori classification for a land use analysis). Top-down ontology can still be meaningful, but its defensibility stems from the pragmatic inquiry that produced it and the mechanism by which it came to be a commonly accepted form.

3.4 Toward Situated Knowledge in Information Systems

The computational representation of information – in a KR tool, a CSCW method, or a GIS – rests on an underlying model, and we can view information systems as essentially these models and the methods that use them to store, retrieve, and analyze information. It is natural, then, to question the role of and reasoning behind these models. A model specification may differ depending on whether the information itself is spatial or tabular or textual, but the role of the model is the same: it imposes restrictions on what can be represented, and on how it can be represented, to ensure that operations using the system are determinate. The system is (it would seem) a perfectly rational entity, since it returns results not because of its beliefs or tradition, but because of a fixed reference on which it can base every action (Marcos and Marcos, 2001). This rationality is not in itself a problem, as long as we do not treat the computational model of a problem as if it were equivalent to a universal mental conception of the same thing.

The traditional information system is built on a schema derived from a particular problem specification (Figure 3.4). The interpretation of resources contained in this system is left to the user (this is the model followed by most geographic applications in knowledge representation, as described in Section 3.3). In the traditional information system, the hermeneutic fusing of horizons occurs implicitly at several levels: between the user and the builder (this fusion occurs during the manipulation of the tool) and between the user and other users (this fusion occurs during the mental evaluation of information presented by the system). The play involved in this fusion is between a questioning user and a largely unresponsive system, just as it would be with a text in the user’s hand.

3.4.1 Interlude: Tying a Brick to a Pencil

How would humankind’s ability to create records, to communicate, and to reason have developed differently if the closest thing we had to a pencil were the size and mass of a brick? Douglas Engelbart uses the case of a brick tied to a pencil as an example of a tool hindering human abilities. Subjects who try to write with this implement must adapt their working style to accommodate its clumsy affordances – to write faster, for instance, one must write larger. To reduce fatigue, one must write less. To what extent is using a contemporary information system like using a pencil tied to a brick?


When we must change the way we reason to align with the schema of our tools (or perhaps more worrisome, learn to reason using their affordances), we are experiencing de-augmentation. When our natural inclinations are subjugated by a problem structure dictated by a tool, we are experiencing de-augmentation. The challenge for information system design is to minimize the cognitive and workflow adjustments that applications entail. To meet this challenge, systems should not restrict users to computing with pre-defined concepts, but instead provide the tools to express new concepts as they arise, and to synthesize broader concepts from multiple expressions. Thus, the ideal system provides a meta-model, rather than a model, for scientific knowledge (Keller and Dungan, 1999). Collaborators should be free to define, structure, and manipulate resources according to their own perspectives (as introduced in Section 2.1.3).

3.4.2 Computing with Perspectives

Hermeneutics offers a way to conceive of augmentative design in information systems. Others have begun to articulate connections between hermeneutics and information science, particularly in the area of tool development (Butler, 1998). Myers (1995), for instance, has shown that design criteria for information systems constitute a dialectical circle, and Gould (1994) suggests that hermeneutic principles inform user-centered approaches to GIS design. Fonseca and Martin (2005) find hermeneutics a useful critique for the lack of interpretive capacity in information system ontologies.

Those things that a tool does well are likely those that are within the builder’s horizon – these are the concepts and structures necessary to create a functional tool, but not necessarily what a potential user needs to create useful knowledge. While this choice of horizon is seemingly one-sided, the alternative is untenable – the user creates a specification that matches his or her needs but that is incompatible with anyone else’s (and potentially incompatible with any tool). There is also the problem of asking for too much introspection; would many of us be able to describe our models (our horizons) in such detail that an information system could be built around them? The challenge, then, is to design a model for information systems that combines the interoperability afforded by a shared schema with the capacity for idiosyncrasy. This is the challenge taken up in Chapters 4 and 5.

Winograd and Flores (1986) outline four justifications for an appreciation of hermeneutics in computation. These defenses are material to the present course of knowledge-based computing: (1) It is impossible to make all implicit assumptions explicit, due to the situated nature of our viewpoints. Computational tools suffer from an inherent blindness that reflects the commitments of their creators. (2) Descriptions of practice, of acting with tools, are more useful for interpretation than descriptions of theory. (3) Interaction with tools is not based on “detached contemplation” but on the manipulation of concepts. Understanding derives from “thrownness” – the necessity of acting in response to events presented by the system. (4) Meaning is shared socially, and computational environments must accommodate social networks.

Hermeneutics is not prescriptive, but if we work to improve the dialogical nature of information systems we can make the user’s role in the progress of understanding clearer. Knowledge representation languages and methods from computer-supported cooperative work offer a way to give our hidden interlocutors a voice in this dialogue – a voice that texts and data do not offer. Instead of handling only linear query-and-response, as in Figure 3.4, information systems can represent the recursion and discursion of understanding (Figure 3.5). In such a system the user works within a trail of concepts and interpretations that constitute the rational foundation for his or her work. During the process of inquiry, the relationships between interpretations are preserved so that changes in one can be used to detect breakdown in others. By examining these interpretations, the user engages in a playful dialogue with other elements of a problem, and the other thinkers who have engaged it. The spiraling development of understanding that results from the process of searching, evaluating, and reusing information over time (Abou-Zeid, 2003), as reflected in Figure 3.5, will be supported by the system introduced in Chapter 4, and its support for knowledge evolution will be demonstrated through use cases in Chapter 5.
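The idea that preserved relationships between interpretations allow a change in one to flag potential breakdown in others can be pictured as a small dependency graph. The sketch below is purely illustrative – the class, method, and example names (BreakdownSketch, derive, the rainfall/flood resources) are assumptions for this example, not part of any implementation described in this thesis.

```java
import java.util.*;

// Hypothetical sketch: interpretations are derived from one another, and
// revising one flags every downstream interpretation as a breakdown candidate.
public class BreakdownSketch {
    // dependents.get(x) = interpretations that were built upon x
    static Map<String, List<String>> dependents = new HashMap<>();

    static void derive(String newer, String older) {
        dependents.computeIfAbsent(older, k -> new ArrayList<>()).add(newer);
    }

    // Transitively collect interpretations that may break when 'revised' changes.
    static Set<String> affectedBy(String revised) {
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(dependents.getOrDefault(revised, List.of()));
        while (!queue.isEmpty()) {
            String next = queue.poll();
            if (affected.add(next)) {
                queue.addAll(dependents.getOrDefault(next, List.of()));
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        derive("flood-risk-map", "rainfall-model");
        derive("evacuation-plan", "flood-risk-map");
        // Revising the rainfall model flags both downstream interpretations.
        System.out.println(affectedBy("rainfall-model"));
    }
}
```

The point of the sketch is only that recording derivation links at the time of use makes later breakdown detection a mechanical traversal rather than an act of recollection.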

In a step toward “untying the pencil,” revisiting the characteristics of hermeneutic interpretation introduced in Section 2.2.4 summarizes the changes we might wish to see in computational aids to scientific inquiry:

• Dialogue: If resources in the system have a stronger voice – if they are able to indicate when, where, how, and why they were constructed and used – the implicit dialogue that leads to understanding between and among researchers and resources can be made more explicit.
• Play: The playful exploration involved in manipulating tools and data is lost in the traditional information system. By capturing it, we gain insight into the process of experimentation that leads to hypothesis generation.
• Prejudices: The horizon of an information resource (to what problems or places it is relevant) is generally implicit. Augmenting the representation of a resource with a view of its horizon can lead to its reuse in relevant situations.

Figure 3.4. A traditional information system emphasizes the creation and query of resources according to a predefined schema. This model requires the user to conform to the application’s schema.

Figure 3.5. A hermeneutic view of information systems improves their ability to serve as aids to interpretation. The user is situated with a historical trail of concepts (nodes along solid line), playful interaction with which affords the fusion of horizons and the continuation of the user’s and community’s understanding (dotted line). This view is not intended as a replacement for the model in Figure 3.4, but might sit above it as an organizing interface exposed to the user.

• The hermeneutic circle: Scientific understanding evolves, and if we are able to keep track of how particular resources are interpreted through use, we may be able to point out other interpretations that will be broken by new developments.
• Breakdown: Tracking the use of resources helps detect conditions of breakdown, pinpointing the time and circumstances of new insights that resulted from hypothesis failure.

3.5 Summary

The management of knowledge in computational tools straddles the boundary of philosophy and applied science. Current methods of modeling knowledge computationally tend toward the deductive and rigid, while collaborative techniques from CSCW frequently abjure formal semantics in favor of fluid discourse. For an information system both to represent scientific knowledge and to facilitate its interoperation, a middle ground is needed. We can appreciate the benefits of ontological standardization, for instance, while treating ontologies as snapshots in a larger process of discovery and refinement. We want to expose, instead of dismiss, the human conditions under which knowledge is created and applied.

Geographers are shepherds of a class of representational methods, the increasingly global and complex applications of which demand new, semantically aware techniques for information integration. At the same time, they work within a discipline distinguished by its eclecticism. In this ecology, different perspectives must not just coexist, but thrive on each other; geography is a supremely intertwingular pursuit. The goal for future information system design is thus to devise a knowledge model that supports human use and maintenance of information, geographic and otherwise. This model must reflect the flux and fallibilism inherent in scientific inquiry – “revision can strike anywhere” (Quine, 1990). The following chapter proposes a candidate model and describes its implementation in a knowledge-sharing environment.

Chapter 4 – Untying the Brick: Situating Knowledge in a Distributed Computational Environment

Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, “Memex” will do. A Memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

[The Memex] affords an immediate step … to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. The process of tying two items together is the important thing. When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. Moreover, when numerous items have been thus joined together to form a trail, they can be reviewed in turn, rapidly or slowly, by deflecting a lever like that used for turning the pages of a book. It is exactly as though the physical items had been gathered together to form a new book. It is more than this, for any item can be joined into numerous trails. (Bush, 1945)

4.1 Toward a Modern Memex

In the second half of his essay As We May Think, Vannevar Bush proposes a series of technological solutions to the problem of collecting, analyzing, and communicating vast amounts of scientific information. These tools range from the mundane to the fanciful: microphotography will create portable records in pocket-sized film libraries; voice-recognition systems will transcribe the scientist’s thoughts onto punched cards; cameras mounted in his or her glasses will take snapshots of observations for the permanent record. At the center of this analogue world, all Bakelite and mechanical levers, is the Memex. This is scientific infrastructure as a formidable piece of office equipment – a device that at once serves as library, notebook, and culture dish for thought experiments. And what’s more, the Memex enables all of this through a radically new open and extensible structure that foresees Ted Nelson’s “intertwingularity.”

Taken as a chorus, albeit one separated by twenty years and the replacement of analogue switches with digital ones, the voices of Vannevar Bush and Douglas Engelbart are a contrapuntal plea for tools that can represent our thought structures as integral parts of the information resources we use. But more than sixty years since Memex and forty years since H-LAM/T, how much progress has been made? Can we attribute the fact that we are not that much closer to realizing thought-centered computation to the limits of our technology? Likely not – computers are more powerful, faster, and ubiquitous than Bush or Engelbart would have imagined. So if technology is not the limiting factor, what is? I argue that we are still limited by our appreciation for the basic processes of knowledge construction. Without critical examination of the philosophical and cognitive bases for intellect, we are destined to continue producing tools that enforce a discontinuity between human understanding and computational representation.

In this chapter, I put forth a model for computational knowledge representation and a proof-of-concept implementation of that model in an aid to cooperative science work. This model and its implementation are informed by the history of inquiry and the limitations of present computational techniques outlined in the previous chapters. I begin by addressing the basic architecture for a system to support collaborative and situated knowledge representations. The convergence of key philosophical ideas outlined in Chapter 2 – situation, community, experiential manipulation – can motivate more complete models for knowledge-centered computing than are currently available. In support of such philosophically and cognitively informed solutions, John Sowa writes that “independently developed, but convergent theories that stand the test of time are a more reliable basis for standards than the consensus of a committee” (Sowa, 2000b). To represent our thought structures, we should not start by adopting a language like OWL as the Received Pronunciation for our domain, asking only “How can we represent our ideas using this structure?” Instead, we should ask, “What are the fundamental characteristics of knowledge-based work, and how should the design of computational tools follow from them?” The intent of this turnabout is to show that the solutions developed below are not dependent on any one technology, but on a longstanding tradition of inquiry into human understanding. Technological advances will spur changes in how the solution is implemented, but the underlying principles should remain valid. This chapter begins by sketching a system that supports the construction of cooperative expressions of knowledge (Section 4.2), followed by a discussion of the knowledge representation techniques implemented in this system (Section 4.3) and a visual interface to interact with networks of ideas (Section 4.4).
The discussion here is limited to implementational detail; Chapter 5 examines use cases for this system in environmental and geoscientific problem-solving and evaluates the system’s ability to aid the capture, interpretation, and reuse of scientific knowledge.

4.2 Infrastructure for Collaborative Understanding: From Memex to Codex

The ultimate goal of any knowledge representation scheme is to help people solve problems using computers. With this goal in mind, it is useful to begin implementing a KR system by conceiving the environment in which users will work. Scientists, or any information analysts who might use KR tools, should not be asked to become ontologists as well. Users should not have to think in terms of formal representations that add another layer of complexity to their work. We want an environment that masks the implementational details of knowledge representation, allowing users to focus on the work of reporting and collaboration.

Scientific notebooks are a familiar touchstone for thinking about recording measurements, results, and hypotheses as they occur. Leonardo da Vinci’s notebooks might be considered archetypes of the genre (Figure 4.1). In the pages of these notebooks are observations of the natural world, tentative theories, diagrams and explanations of experiments. The notes are not a linear narrative; a single notebook contains insight into dozens of domains, connected by interwoven themes. Each page is bordered by marginalia that provide commentary, revisions, and links to other areas of thought. Today, these notes provide crucial insight into theories of the world from half a millennium ago. We call this style of manuscript a codex, a book made of bound sheets of wood or parchment. The codex offered a number of advantages over the scrolls that preceded it: codices had greater information density – both sides of a sheet could be written upon, and the whole could be stored vertically in a library – and it was easier to navigate through

Figure 4.1. Excerpt from Codex Leicester (1506-1510), one of Leonardo da Vinci's notebooks. © Corbis 1996.

pages than along a continuous scroll. In deference to the advances of these early recordkeeping devices, let us borrow the name Codex for our new tool.11

Over the course of centuries, the practice of making scientific records has not been radically altered. It should be noted, however, that notebooks lacked one crucial motivator for change – widespread use. While some scholars, particularly those in laboratory sciences, maintain detailed descriptions of experimental procedures (often to leave an audit trail for patent or regulatory purposes), other domains like geography never adopted the practice of disciplined recordkeeping. While the lack of records may have reflected the absence of a clear and present need, the increasingly complex, interdisciplinary, and collaborative nature of science now demands some forethought to the problem of communicating knowledge.

The Memex contains some important ingredients for a new Codex. Several months after the publication of Bush’s essay in the Atlantic, Life magazine reprinted an abridged version. In it, a Life illustrator created renderings of what the Memex might look like (Figure 4.2). While the Memex was an entirely hypothetical device, and it is unclear whether Bush contributed to or approved the drawings, they help us imagine how the Memex might fit into the practice of science. First is the form of the Memex itself; the size and shape of a desk, it is designed to be as personal a workspace as one’s regular desk. Second, the Memex promotes the act of manipulating information as a source of understanding. The user does not simply read information that is stored, but interacts with it by using the keypad to create a “codebook” of threads through an information space. Third, the Memex affords collaboration; users can share and copy microfilm records from each other.

11 The etymology of codex offers interesting parallels to the present purposes. Initially, codex (or caudex) referred to a tree or stump, and in Rome was the name for the wooden post to which criminals were tied – although we shall hope that our modern version is not seen as punitive by its users. Codex later came to mean sheets of wood coated in wax upon which rough drafts could be made, suitable for capturing the exploratory nature of investigation. In botany, caudex is a term for the central stalk of some plants, the structure from which leaves branch. In medicine, a codex is a record of pharmacological recipes – in the present case, Codex might store the recipe for performing a particular investigation.

4.2.1 Codex: A Knowledge-Based Portal

Codex is a Web-based application that serves as proof-of-concept for the ideas developed in this thesis. Much as Memex is a permanent and always ready feature of the scientist’s workspace, the Web is a universally accessible information appliance. In a Web environment, however, it is unnecessary for all of a user’s resources to be physically contained within his or her workspace. Codex uses the portal model to organize distributed resources under a single interface. (Here, the term resource is a generic label for a unit of information contained in Codex; a resource might be a binary data file, a description of an abstract concept, or even a representation of a collaborator.) The portal makes it appear to the user as though the resources displayed coexist together, but in the construction of a particular view into the portal, information may have been pulled from many locations.

Codex is principally a knowledge-sharing medium, and it occupies a niche not considered by other information systems. It operates at a level of abstraction different from that of the electronic notebooks described in the previous chapter; while Codex allows data files to be stored and linked together, data are described foremost by the human concepts they signify. An online scientific workbench, meanwhile, might focus on data integration for automated analysis; Codex treats problem-solving as an issue of human consideration and interpretation. A problem-solving environment might integrate several analysis tools in an attempt to support hypothesis generation (Sanchez and Langley, 2003), but fails to leverage the history of resource manipulations that result. Codex is at once a CSCW tool that enables rich semantic descriptions, and a semantic markup platform that relaxes the constraints of common ontological approaches.

Like the Memex, and following from an appreciation for Peircean pragmatic inquiry, Codex is

Figure 4.2. 1945 rendering of Bush’s Memex as a mechanical desktop. At left is a camera that reduces documents on the plate to microfilm. At center, dual displays allow the user to cross-reference material. At right is the keypad used to enter new links. Reprinted from Life Magazine 19(11), 10 September 1945.

built around the concept of workspace. Workspaces can be both private and communal. Each Codex user has a personal workspace to store his or her ideas, data, hypotheses, and so on. Researchers can move resources to shared workspaces where they can be accessed, applied, or modified by collaborators. The cooperative, Web-based nature of Codex means that one user’s insight can be made immediately accessible to colleagues a world away.

The researcher logging in to Codex is first presented with a nexus-like view onto the workspace (Figure 4.3). This starting point groups resources together under a set of default categories, providing quick access to the basic units of an investigation. From the workspace home page the researcher can rapidly upload a file, look in on collaborators, or describe a new analysis. Six types of resource are supported on this page, although these entry points can be supplanted with user-defined categories (further discussion of this topic is found in Chapter 5).

• People. The individuals and groups who create or apply resources accessed through the Portal. Each person maintains a profile that can communicate elements of his or her background and expertise.
• Concepts. Descriptions of abstract ideas, such as “flood” or “earthquake”.
• Files. Binary data that express something about a concept. Files could include spreadsheets, text documents, images, audio clips, maps, or other data formats (quantitative or qualitative) that connect observations or measurements to the cognitive structures represented by concepts.

Figure 4.3. The home page for a Codex user’s workspace.

• Tools. The methods used to analyze data and to construct instantiations of concepts (categories) from data. Tools could include GIS operations, visualization methods, predictive models, interviewing instruments, or statistical tests.
• Places. Geography is fundamental to integrative research, and places help researchers define the locations and scales under study, whether described as bounding polygons or as place names. Place also helps to account for differences in epistemology between researchers.
• Tasks. People, concepts, files, tools, and places are linked together through tasks that might describe a workflow process, an experimental procedure, or a problem-solving approach.

(Time as a separate category will be included in future developments of the methods described here. Time is currently part of the knowledge model [Section 4.3.2], but has not yet been implemented as a distinct view into an information space).

A nexus view into these categories underscores the preeminence of interrelations between resource types in Codex; knowledge structures express the interdependence of data, analyses, hypotheses, and results. For instance, Codex might reveal a thread showing how a file was produced by a particular person as a step in a task aimed at describing the relevance of a concept to a place. As with the Memex, the act of making connections is the important part of knowledge construction.
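The thread just described – a file produced by a person as a step in a task that relates a concept to a place – can be pictured as typed resources joined by named relations. The sketch below is an assumption-laden illustration: the resource kinds follow the six categories listed above, but the relation names, class names, and example resources are invented for this example and are not Codex's actual schema.

```java
import java.util.*;

// Illustrative sketch of a Codex-style "thread": typed resources joined by
// named relations. Kinds follow the six categories in the text; relation
// names and example resources are assumptions.
public class NexusSketch {
    enum Kind { PERSON, CONCEPT, FILE, TOOL, PLACE, TASK }

    record Resource(Kind kind, String name) {}
    record Link(Resource from, String relation, Resource to) {}

    // A file produced by a person as a step in a task relating a concept to a place.
    static List<Link> buildThread() {
        Resource analyst = new Resource(Kind.PERSON, "J. Smith");
        Resource survey  = new Resource(Kind.FILE, "survey.csv");
        Resource task    = new Resource(Kind.TASK, "vulnerability assessment");
        Resource concept = new Resource(Kind.CONCEPT, "vulnerability");
        Resource place   = new Resource(Kind.PLACE, "Susquehanna basin");
        return List.of(
            new Link(analyst, "produced",   survey),
            new Link(survey,  "stepIn",     task),
            new Link(task,    "assesses",   concept),
            new Link(concept, "relevantTo", place));
    }

    public static void main(String[] args) {
        for (Link l : buildThread())
            System.out.println(l.from().name() + " --" + l.relation() + "--> " + l.to().name());
    }
}
```

Because the links, not the nodes, carry the interpretive content, following a chain of relations reconstructs exactly the kind of thread the text describes.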

There are some important correlations between Codex and the theories of knowledge outlined in Chapter 2. To Aristotle, for instance, different communities describe concepts with different characteristics; community workspaces allow collaborators to share a lingua franca of concepts and knowledge structures relevant to their domain. The directedness of Kantian judgments is reflected in task descriptions that can reveal the ends to which resources are put. Following from Peirce’s belief in the stimulating value of manipulating ideas through retroduction, Codex maintains continuous records of the manipulation of resources. As a result we can hope to capture the moment of spark where a plausible explanation was formed. Records of continuous change also preserve the synechistic nature of scientific inquiry, showing how streams of insight flow together to create rivers of understanding. Granting collaborators access to each other’s knowledge structures through shared workspaces also fosters hermeneutic interpretation. And from this shared use comes a representation of the evolution of knowledge. Each person who touches a resource leaves a mark – even if the user does not modify the resource, the very fact that it has been used is added to its description and indicates the transmission of information through a community.
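The claim that each person who touches a resource leaves a mark amounts to appending usage records to a resource's description, even for read-only use. The sketch below illustrates this under assumed names (ProvenanceSketch, touch, UsageRecord) that are hypothetical, not drawn from the actual implementation.

```java
import java.util.*;

// Illustrative sketch: every use of a resource appends a record to its
// description, so its transmission through a community stays visible.
// All names here are assumptions for illustration.
public class ProvenanceSketch {
    record UsageRecord(String user, String action) {}

    static Map<String, List<UsageRecord>> marks = new HashMap<>();

    static void touch(String resource, String user, String action) {
        marks.computeIfAbsent(resource, k -> new ArrayList<>())
             .add(new UsageRecord(user, action));
    }

    public static void main(String[] args) {
        // Even a read-only "viewed" use is recorded as a mark on the resource.
        touch("concept:vulnerability", "alice", "created");
        touch("concept:vulnerability", "bob", "viewed");
        touch("concept:vulnerability", "carol", "revised");
        for (UsageRecord r : marks.get("concept:vulnerability"))
            System.out.println(r.user() + " " + r.action());
    }
}
```

The design point is that provenance accrues as a side effect of ordinary use, so no separate act of documentation is required of the researcher.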

4.2.2 Codex Architecture

Codex is built on a two-layer client-server design (Figure 4.4). At Codex’s core is a set of server-side applications that manage the basic functionality for maintaining a shared knowledge base. Client applications, such as Web portals, mobile devices, and analytical models, interact with this knowledge base through HTTP and XML communication standards. By enforcing a separation between core functionality and client interfaces, the set of services that manages situated, perspective-based representations of scientific knowledge can be used by multiple

Figure 4.4. Codex architecture.

applications at once. These applications can all be thin clients,12 leaving the server to do the heavy lifting of storage and inference while the clients provide interfaces or extensions to the core functions. This architecture also lets users interact with the same knowledge base through a variety of interfaces, supporting the insight that can come from examining information in different formats. Third party clients can also be developed over time that layer domain-specific views over the same underlying applications (for instance, a Codex customized for geoscientists). Currently, the Web portal has received primary development attention, although some work on enabling access to Codex functionality through mobile devices has also been conducted. Chapter 6 outlines possible routes for Codex to interact with cyberinfrastructure analysis nodes such as numerical models.
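The thin-client separation – the server owns storage and inference, clients only render what it returns – can be caricatured with a pair of interfaces. Every name below (CodexServer, PortalClient, fetchResource, InMemoryServer) is a hypothetical illustration of the design principle, not the actual Codex API.

```java
import java.util.*;

// Assumed server-side contract: the server does the heavy lifting
// (storage, inference); clients only call it and display results.
interface CodexServer {
    String fetchResource(String id);
}

// A stand-in server with a toy store, for illustration only.
class InMemoryServer implements CodexServer {
    private final Map<String, String> store = new HashMap<>();
    InMemoryServer() { store.put("concept:flood", "Flood: inundation of normally dry land"); }
    public String fetchResource(String id) { return store.getOrDefault(id, "(not found)"); }
}

// A "thin" client: no application logic, just formatting for one interface.
// Other thin clients (mobile, analytical) could wrap the same server.
class PortalClient {
    private final CodexServer server;
    PortalClient(CodexServer server) { this.server = server; }
    String render(String id) { return "<div>" + server.fetchResource(id) + "</div>"; }
}

public class ArchitectureSketch {
    public static void main(String[] args) {
        PortalClient portal = new PortalClient(new InMemoryServer());
        System.out.println(portal.render("concept:flood"));
    }
}
```

Because each client depends only on the server contract, swapping the rendering layer (portal, mobile device, domain-specific view) leaves the knowledge base untouched, which is the point of the architecture described above.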

12 A “thin” client is one that contains little or no application logic, and is used mainly for data display and user interaction; it depends on a server for most computational tasks. A “thick” client is a larger application that uses the server mainly for storage or communication with other clients.

There are five components to the Codex server.

• Representation. All resources in Codex are described in the OWL Full semantic markup language. The representation module maps between resource expressions in Codex clients and their corresponding OWL primitives, using the Jena API.13 This module is the hook for all user interfaces to Codex; it provides a set of Java methods to interact with OWL resources (e.g., add a new resource, express a new relationship, display an existing resource) while hiding the implementational details of OWL. Consequently, neither users nor client developers need to be facile with OWL, and in fact OWL could be replaced by a next-generation knowledge representation language with no impact on a client’s interface or operation.
• Storage. Concept and workspace files are stored as OWL text files to make them accessible to Semantic Web crawlers that index knowledge resources (just as standard Web crawlers index the syntactic Web for search engines). Storing OWL resources in a database is possible, and although this may lead to performance improvements, doing so puts the resources out of the reach of crawlers. Since the knowledge structures built with Codex are intended to be shared and reused, it makes sense to ease third-party search and retrieval. Representing workspaces in OWL (that is, not only the content of a workspace, but the definition of the Codex workspace itself) means that researchers can use any third-party, OWL-compliant tool to access and manipulate their personal resources. The storage module is also responsible for maintaining links to external resources (for instance, OWL files imported into a workspace but that actually reside on a remote server).
• Registration. Each resource belongs to one or more workspaces, which represent sandboxes in which individuals and groups work. The registration module tracks the assignment of resources to these workspaces and the promotion of resources from one workspace to another (e.g., from a private space to a shared space). Whenever a client requests a resource, the request is passed first to the registration module, which locates the resource in a workspace and retrieves it from the storage module. The registration component also validates user permissions, determining what resources should be exposed to a user or a search engine (keeping hidden, for instance, those tentative knowledge structures that collaborators have marked as private).
• Query and Inference. Storing resources is one thing; finding them is another. The query module uses Jena’s inference engine to return resources based on their semantic relationship to the search criteria. For instance, a user interested in a particular concept can query for tasks in which it was used. The inference engine finds instances of tasks that contain the concept, perhaps limiting the results to only cases where the concept played a certain role specified by the user. Currently, the query interface does not support semantic similarity measures – it cannot use semantic distance to determine degrees of relevance among search results, although others have investigated such similarity measures (Formica and Missikoff, 2002; Raymond et al., 2002). Instead, the inference engine relies on semantics to follow trails of relationships between resources. Further examples of searches within Codex are presented in Chapter 5.
• Versioning. Codex resources change through use, so it is not possible to keep just one copy of a resource for all collaborators to share. Each user might make slight modifications that conform to his or her perspective. For each modification, Codex spawns a new version of the resource that contains a reference to its immediate predecessor (or predecessors, if it was created by merging properties from several resources).
By following these ancestral paths, audit trails emerge that show the steps

13 http://jena.sourceforge.net 49 taken to put a resource into its current form. The versioning module also provides the facility to undo changes (by reverting to a previous copy of the resource) while preserving the evidence that they were made. These tentative paths that were later abandoned can still be informative.
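The versioning module's predecessor chain can be illustrated with a minimal sketch. The class and method names below are hypothetical, not the actual Codex API; the sketch only shows how spawning a new version per modification preserves an auditable ancestral path and supports undo.

```python
# Minimal sketch of Codex-style versioning: every modification spawns a new
# version holding a reference to its immediate predecessor, so audit trails
# can be reconstructed and changes undone without losing history.
# All names here are illustrative, not the actual Codex API.

class VersionedResource:
    def __init__(self, title, properties, prior_version=None):
        self.title = title
        self.properties = dict(properties)
        self.prior_version = prior_version  # immediate predecessor, if any

    def modify(self, **changes):
        """Spawn a new version rather than mutating this one."""
        new_props = {**self.properties, **changes}
        return VersionedResource(self.title, new_props, prior_version=self)

    def audit_trail(self):
        """Walk the ancestral path from newest to oldest version."""
        version, trail = self, []
        while version is not None:
            trail.append(version.properties)
            version = version.prior_version
        return trail

    def undo(self):
        """Revert to the previous copy while preserving the evidence."""
        return self.prior_version if self.prior_version else self

v1 = VersionedResource("Earthquake risk", {"variesWithDistance": "Fault zone"})
v2 = v1.modify(magnitude="Richter")
assert len(v2.audit_trail()) == 2
assert v2.undo() is v1
```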

The modular architecture of Codex leaves open the possibility of integration with other cyberinfrastructure platforms. For example, Codex could serve as the knowledge management node on a broader network, handling the capture and communication of explanations that emerge from manipulating information in a larger online workbench.

4.3 Modeling Knowledge in Codex

Now that the basic architecture of Codex has been outlined, it is possible to provide a closer examination of how concepts, contexts, and situations are modeled. Together, concepts and contexts are the constituent elements of a unit of knowledge; situations show how these atomic units are variously linked to produce explanatory networks. The knowledge model discussed below is informed directly by the cognitive models for conceptual information described in Section 2.2.

4.3.1 Concepts and Contexts

The Concept (capital C) is the universal set in Codex; every resource and set of resources that can be described using Codex is either a member of the class of Concepts or a member of a proper subset. The six resource categories introduced in Section 4.2.1 represent five proper subsets, or specializations, of Concept: file, group, place, task, and tool. The sixth, concepts, contains direct members of the class Concept (such as “chair”). The reason for this top-level category is that it allows certain rules to be instituted for the format of conceptual information and simplifies reification of resource collections as instances of another resource type (for instance, a file, place, and tool can be gathered and reified as a task). Each of the specializations extends the default Concept with unique characteristics; the file subset, for instance, adds properties for file location, size, type, and so on; the place subset accommodates attributes like place name and geographic coordinates; membership in the group subset is limited to instances of people.

The use of Concept as a universal also places Codex’s knowledge model in explicit opposition to contemporary practice. Conventional wisdom holds that the ontology is the top-level category, the container for all knowledge represented computationally. This viewpoint is in fact hard-coded into the OWL language. Each OWL file is intended to represent a single ontology – a single way of structuring resources. There is even a top-level class in OWL called “ontology” that is declared at the start of the OWL file and provides metadata for the entire set of resources. This “ontology-first” model is highly restrictive; it presupposes a structure where there may be none (indeed, the act of manipulating resources in a tool like Codex might be for the very purpose of devising this structure, so imposing one at the start would be futile). Given the emphasis of the present work on structuring knowledge from the bottom up through pragmatic use, Codex takes a “Concept-first” model.14

14 One might well ask, given these shortcomings, why use OWL at all? It turns out that the problems with OWL are more cultural than technical. OWL can be made to do what we want for Codex, but we must change OWL’s default representation of knowledge structures according to the goals of this work.

Each Concept in Codex is an OWL statement that contains intensional, extensional, and contextual components. This tripartite model mirrors the relationship between representamen, interpretant, and object in Peirce’s semiotic (Figure 4.5):

- The representamen, or signifier, is the intensional definition of a Concept. In Codex, intension is modeled through OWL class definitions. This intension is based on a dimensional variety of probabilistic model, compatible with the models of Rosch and Mervis (1975) and Gardenfors (2000): a Concept C is the set of properties {P1…Pn} that characterize it. Each P is another Concept typecast as an OWL property.

- The object is the extension of a Concept. The extensional set may be empty, or it may contain one or more instances that represent cases of C typecast as an OWL individual.

- Context and situation play the role of interpretant, helping collaborators understand the meaning of the Concept. The next section discusses representation of situations in more detail.
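The tripartite structure can be sketched in a few lines. This is an illustrative data structure, not actual Codex code; the class and attribute names are assumptions made for the example.

```python
# Illustrative sketch (not actual Codex code) of the tripartite Concept model:
# intension (representamen) as a set of defining properties, extension (object)
# as a possibly empty set of instances, and context (interpretant) as metadata.

class Concept:
    def __init__(self, title):
        self.intension = {}   # property name -> target Concept (representamen)
        self.extension = []   # instances of this Concept (object)
        self.context = {"title": title}  # metadata tags (interpretant)

    def add_property(self, name, target):
        # Each property is itself another Concept typecast as a relation.
        self.intension[name] = target

    def add_instance(self, instance):
        self.extension.append(instance)

area = Concept("Geographic area")
risk = Concept("Earthquake risk")
risk.add_property("variesWithDistance", area)
area.add_instance("Fault zone")

assert "variesWithDistance" in risk.intension
assert area.extension == ["Fault zone"]   # extension may also be empty
```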

Unlike “real-world” semiotics, where there may be no direct correspondence between representamen and object except through interpretant, in the Codex model the two are directly related as a result of OWL instantiation. An OWL instance explicitly reports the class of which it is an extensional member. However, just as Chapter 2 showed that semiotic roles are flexible, Codex allows Concept components to take on different semiotic positions. The Codex representation of a file, for example, can be both representamen and object. In the former case, the file is a signifier for a set of resources in its extension (derivative data, perhaps). In the latter, the file itself is the extensional example for an abstract concept. Through reification, even the interpretant can be used as representamen or object in another Concept; a situation, for instance, can be treated as a signifier in a meta-cognitive act. Having engaged in a retroductive activity in Codex, a researcher might want to say something about it; reification of interpretant makes it possible to “make a statement about a statement.” In most other information systems, the roles a resource plays are not flexible, limiting the reasoning paths that can be explored. Here, though, it is possible for researchers to describe different situations for the same resources, which can help to identify boundary objects useful for making associations between perspectives (Section 3.3).

Figure 4.5. Semiotic view of Concept structure in Codex.

The elements of context are represented in Codex as a set of metadata tags appended to every resource. Most of these tags conform to the Dublin Core metadata standard (DCMI Usage Board, 2005) in an effort to make individual resources useful to tools outside of Codex, although within Codex they provide extra functionality.15 Only two Dublin Core elements, those that provide descriptive content (title and description) in natural language, are user-specified; the remainder are captured automatically by Codex as resources are manipulated. These automatic elements include a timestamp of each manipulation and references to the username of the person who constructed the resource or added it to the system (creator, contributor). In addition, the publisher tag creates a link to the client used to create the resource (for instance, the Codex portal, a handheld device, a third-party visualization package). Type and format tags extend publisher data with a controlled vocabulary of descriptors that indicate (1) whether a resource represents an abstraction, event, image, service, interactive resource, physical object, sound, text, and so on; and (2) characteristic associations with applications that should be used to further manipulate the resource (for instance, an image viewer for a GIF file, a spreadsheet for tabular data, or a web service for a numerical model). Combinations of publisher, type, and format furnish important resource-level clues that guide appropriate reuse of a resource in new situations.
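The split between user-specified and automatically captured context elements can be sketched as follows. The Dublin Core field names are standard, but the capture function and its parameters are hypothetical conveniences for the example.

```python
# Sketch of how Codex might stamp every resource with Dublin Core-style
# context tags: only title and description come from the user; the rest
# (date, creator, publisher, type, format) are captured automatically.
# Field names follow Dublin Core; the capture function is hypothetical.

import datetime

def capture_context(title, description, username, client, rtype, fmt):
    return {
        "dc:title": title,                # user-specified
        "dc:description": description,    # user-specified
        "dc:date": datetime.datetime.now().isoformat(),  # automatic timestamp
        "dc:creator": username,           # automatic: who built the resource
        "dc:publisher": client,           # automatic: which client created it
        "dc:type": rtype,                 # controlled vocabulary, e.g. "Image"
        "dc:format": fmt,                 # application association
    }

ctx = capture_context("Fault zone", "Area of fault slip",
                      "wpike", "Codex portal", "PhysicalObject", "text/owl")
assert ctx["dc:creator"] == "wpike"
assert set(ctx) >= {"dc:title", "dc:date", "dc:publisher"}
```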

A resource’s versioning history is also included in its context header. There are two versioning tags, prior_version and built_from. The former has a restricted cardinality. Only one resource can be the immediate predecessor of a current resource, and this tag is automatically added to the new resource that is spawned whenever the properties of an existing version are changed. The latter has unlimited cardinality; its value is the set of references to all resources from which a given resource inherited properties. For instance, an earthquake concept could be built by borrowing properties like magnitude and location that already exist in other concepts.

Codex’s approach to versioning ameliorates some of the difficulties with performing inference in OWL Full. In Codex, an OWL class definition for resource x amounts to the logical assertion ∀x (that is, the resource definition is to apply to all current and future instances of x). Creating an individual in Codex asserts ∃x (that is, there exists an x such that…). But in an environment that allows multiple researchers to modify and extend resources in new circumstances, contradictions can easily arise; how should two different ∀ statements about the same x be reconciled? For example, what if one user defines a concept through one set of properties, but another’s definition explicitly excludes one of those properties? Because Codex generates a new version of a resource each time any researcher uses or modifies it, contradictions at the language level are rare. The Codex versioning module ensures that there are never two cases of ∀x – only ∀x1 and ∀x2.

Figure 4.6 is a simplified OWL snippet that illustrates how a typical resource in Codex blends semiotic aspects of intension, extension, and context. The long alphanumeric strings that begin with the letter c uniquely identify each resource, property, and version thereof; coupled with the storage protocol in Codex, this string provides a globally unique path through which any application can directly access any unit of information in Codex. In this example, the resource in blue is an abstract concept (i.e., a direct member of the Codex class Concept) called “Earthquake risk.” This title and accompanying description offer syntactic meaning. The structure of the concept encodes its semantics. Here, earthquake risk is intensionally defined through a single property, which is a cast of a “Distance decay” concept defined elsewhere in the user’s workspace. This decay function has been given an alternate title to use when it acts as a property: “variesWithDistance” supplies a verb form for an otherwise nominative concept. The target of this property is a particular instance of the concept “Geographic area” (“Fault zone” is in the extension of things that are geographic areas).

15 Introduced in Section 2.1.2, the Dublin Core metadata set (http://www.dublincore.org/) provides an interoperable, and highly generic, standard for describing online resources.

Including only intensional properties in Codex resource definitions creates parsimonious knowledge representations. If resource A relates to resource B such that B is part of A’s intension (A→B), A is the relationship’s domain and B is its range. The Codex definition of A includes only those properties for which A is the domain. In the example of Figure 4.6, “Earthquake risk” is the domain and “Fault zone” is the range of the relationship “variesWithDistance.” There may be a resource C for which A is range (C→A), but A’s representation is not aware of this relation. For instance, there may be a concept “Insurance rate” that is defined in part through a relationship to “Earthquake risk.” But because “Insurance rate” is not an intensional part of “Earthquake risk” (“Earthquake risk” is the range, not the domain, of this relation), Codex does not encode insurance as part of risk. The alternative is to encode each relationship twice, once in the domain and once in the range, but Codex avoids this to reduce the likelihood of an inference engine entering a loop while traversing a resource set. The only drawback to this approach is increased search time, since there are fewer hard-coded pathways between resources. In the case of truly transitive relationships A↔B (as would be the case with family relations, say), Codex applies an OWL transitivity clause to the property, but still encodes it in the domain only.

Figure 4.6. Simplified OWL implementation of a representative concept in Codex.
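The tradeoff between parsimonious, domain-only storage and increased search time can be sketched directly. The data and function names below are illustrative, not Codex internals.

```python
# Sketch of domain-only relation storage: each resource records only the
# relations for which it is the domain, so "Earthquake risk" knows about
# "Fault zone" but "Fault zone" carries no back-reference. Finding resources
# that point *to* a target then requires scanning the whole set, which is the
# increased-search-time tradeoff. Hypothetical names throughout.

resources = {
    "Earthquake risk": {"variesWithDistance": "Fault zone"},
    "Insurance rate":  {"dependsOn": "Earthquake risk"},
    "Fault zone":      {},  # range-only participant: no relations stored here
}

def ranges_of(name):
    """Relations encoded on the resource itself (it is the domain)."""
    return set(resources[name].values())

def domains_of(target):
    """Resources that relate to `target` -- found only by scanning."""
    return {name for name, props in resources.items()
            if target in props.values()}

assert ranges_of("Earthquake risk") == {"Fault zone"}
assert domains_of("Earthquake risk") == {"Insurance rate"}
assert ranges_of("Fault zone") == set()  # unaware of incoming relations
```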

In Codex’s Concept model, it is impossible for any resource to be an empty set; even if the resource has no explicitly encoded semantic properties, it still has contextual properties that link it to other resources. Knowledge of the resource’s creator lets it be viewed as a member of a researcher’s corpus; its timestamp puts it in historical context with contemporary resources. Reliance on these contextual properties is the basis of one technique for inferring situation, described below.

4.3.2 Situations and Perspectives

Following from Chapter 2’s examination of the situatedness of cognitive concepts, Codex employs situations to support the description of interpretive stances (knowledge capture) and their adoption or appreciation by others (knowledge sharing). There is a one-to-one relationship between Concepts and contexts (Section 2.1.2); each Concept has a single context, for each new version of a resource initiates a new context. There is a one-to-many relationship between Concepts and situations. A situation is an arbitrary group of resources and the relationships that connect them; a given resource can be used in any number of different situations (Section 2.1.3). The capacity to work with multiply-situated resource descriptions makes Codex unique among CSCW and ontological approaches to knowledge management.

Codex does not explicitly represent situations through a special set of tags, as it does with contexts. Representation of situational meaning is instead software-driven, based on the semantic and contextual relationships already stored within resources. In Codex, a situation S is a subset of the universe of resources U managed by the system, S ⊂ U. Any S is the union of a set of Concepts and a set of property relations, S = {C1…Cn} ∪ {P1…Pn}, although either of these sets may be empty in a valid situation. Situations can be additive, Sx = Sy ∪ Sz, built from the union of two or more other situations. The consilience of two or more situations can be assessed through the set that represents their intersection, Sc = Sy ∩ Sz. Here, Sc shall be called the consilient set, expressing the overlap or agreement between situations.
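The situation algebra above maps directly onto set operations. A minimal sketch, with illustrative resource names:

```python
# Sketch of the situation algebra: a situation is a set of Concepts and
# property relations drawn from the universe U, so union builds additive
# situations and intersection yields the consilient set. Names illustrative.

def union(*situations):
    """Additive situation: Sx = Sy ∪ Sz."""
    return set().union(*situations)

def consilient(*situations):
    """Consilient set: Sc = Sy ∩ Sz, the overlap between situations."""
    result = set(situations[0])
    for s in situations[1:]:
        result &= s
    return result

s_y = {"Earthquake risk", "Fault zone", "variesWithDistance"}
s_z = {"Earthquake risk", "Seismology"}

assert union(s_y, s_z) == {"Earthquake risk", "Fault zone",
                           "variesWithDistance", "Seismology"}
assert consilient(s_y, s_z) == {"Earthquake risk"}
assert consilient(s_y, set()) == set()  # either set may be empty
```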

The justification for not storing situations as explicit collections in Codex is twofold. First, there can be redundancy between situations (one resource stored many times, once for each situation in which it exists), and avoiding the storage of extraneous information is desirable. Second, situations can in theory be inferred around any set of resources. In the long run, it is wiser to store the procedures for creating collections through an arbitrary set of parameters (therefore, as software rules in Codex) than it would be to hard-code each possible collection separately.

Codex supports two varieties of situation, user-defined and inferential. A user-defined situation is formed on the basis of resource selection and/or definition by an individual researcher. For instance, in the course of defining the “Earthquake risk” concept in Figure 4.6, the researcher might:

1. Define two new concepts, “Earthquake risk” and “Distance decay.”
2. Find an existing concept, “Geographic area,” and create a new instance of it, “Fault zone.”
3. Relate “Earthquake risk” to “Fault zone” through a distance decay property.

There is now a situation that contains a small set of concepts and relations (indeed, Figure 4.6 depicts a sample situation in Codex). Should another user query for “Fault zone,” Codex can show that in one situation, a fault zone is a geographic area prone to earthquake risk. (Section 5.3.1 contains examples of this process.)

Inferential situations result from detecting relationships between the contextual elements of the resources in a given set. What makes them special is that they do not require resources to have any predefined semantics; relationships between resources are inferred on the basis of co-occurrent context attributes (Langley et al., 2002). Codex allows users to search for inferential situations over sets bounded by (1) the resources contained in a given workspace or (2) the result set of any query over a larger set. Inferential situations can be useful for spurring retroductive hypothesis generation by presenting candidate knowledge structures to the user. These structures are not ones that he or she created, but that represent other ways in which the resources in one’s workspace could be connected.

Exposing inferential situations in Codex is a variation on collaborative filtering, a self-organizing approach to detecting ad hoc relationships (Ansari et al., 2000). Collaborative filtering is typically used to produce recommendations in e-commerce environments – for instance, based on the books for which I have expressed a preference, an online bookseller will recommend more books I might like by examining what others who share my preferences have read. The same connections are possible in Codex, where an inferential situation can be built around any subset of resources that share a time, place, creator, and so on. The collaborative nature of Codex means that we can look across a user community to find relevant resources. In the simple case, Codex could build a situation around a user’s query for resources that his or her collaborators created within one day of the time a target Concept was created, creating a situation of temporal association, or could recommend one researcher’s “Seismology” resource to someone who already uses that person’s “Earthquake risk” concept (Figure 4.7). (A further example of these situations is found in Section 4.4.2).
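The temporal-association case can be sketched as a simple filter over context attributes. The function signature and record fields are assumptions for the example, not the Codex API.

```python
# Sketch of inferring a situation from co-occurrent context attributes:
# resources created by one's collaborators within a time window of a target
# resource are grouped into a situation of temporal association.

def temporal_situation(target, resources, collaborators, window_days=1):
    """Resources by collaborators created within `window_days` of `target`."""
    return {r["title"] for r in resources
            if r["creator"] in collaborators
            and abs(r["day"] - target["day"]) <= window_days
            and r["title"] != target["title"]}

target = {"title": "Earthquake risk", "creator": "wpike", "day": 100}
pool = [
    {"title": "Seismology",     "creator": "collab1",  "day": 100},
    {"title": "Fault zone",     "creator": "collab2",  "day": 101},
    {"title": "Insurance rate", "creator": "collab1",  "day": 140},  # too late
    {"title": "Chair",          "creator": "stranger", "day": 100},  # not a collaborator
]

assert temporal_situation(target, pool, {"collab1", "collab2"}) == \
       {"Seismology", "Fault zone"}
```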

Figure 4.7. Detecting inferential situations in Codex resource descriptions.

In Chapter 2, perspective was introduced to denote the view onto a problem taken by a particular thinker. A perspective is a choice of situation, and the Concepts and contexts therein, that reveals the directedness or purposiveness of reasoning. In Codex, situations reduce the complexity of an information space by constraining the resources that are relevant to a problem from a particular perspective. If we enable these perspectives to be reused – that is, to use one user’s perspective to contribute information to another user’s understanding – then knowledge can be communicated in radically new ways. Through reuse, the knowledge structures that Codex stores can actively participate in hermeneutic dialogue; the situation-cum-perspective becomes an interactive, interpretive device.

To understand how perspectives work in Codex, consider the children’s amusement where a colored plastic lens is passed over a complex background; suddenly a pattern appears, often a word or image in answer to a riddle. The lens absorbs certain wavelengths of light while permitting others to pass through. The result is that some of the complexity of the printed background is obscured, allowing only the salient elements – those that are compatible with the composition of the lens – to be seen. A Codex perspective works on the same model (Figure 4.8). The perspective filters out some information, revealing only certain “wavelengths” of meaning that conform to the resource types present in a given situation. The remainder of a concept space is masked.

To examine a set of resources from different perspectives, the Codex user foveates on a resource and queries for the situations in which that resource is found. Codex can combine situations to either restrict or expand the selection of resources salient to a perspective.

Figure 4.8. Perspectives filter a complex information space according to particular situations. Perspectives A and B preferentially select different types of resources and relations from the universal set of all Codex resources.

- In the union of situations, the researcher finds the bounds of a problem space, given by the complete set of resources that a community deems relevant to it.

- The uniqueness of a particular perspective is found in the relative complement (or set-theoretic difference) between situations. That is, the uniqueness of a researcher’s perspective can be described as the set of resources that are in the situation through which he or she describes a problem, but that are not found in anyone else’s. Taking the complement of a perspective can also reveal areas of hermeneutic breakdown, where concepts in one user’s perspective fail to correspond to those in others’.

- The resources and relations in an intersection, the consilient set, constitute a new situation that represents the points of agreement within a community. Codex uses the consilient set as the basis for expressions of community or domain belief that might qualify to be used elsewhere as top-down knowledge structures.
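These three operations over users' perspectives can be sketched concretely. The user names and resources are invented for illustration:

```python
# Sketch of comparing perspectives as set operations over users' situations:
# union bounds the problem space, relative complement exposes what is unique
# to one researcher (potential hermeneutic breakdown), and the intersection
# is the community's consilient set. Illustrative data only.

perspectives = {
    "alice": {"Earthquake risk", "Fault zone", "Distance decay"},
    "bob":   {"Earthquake risk", "Fault zone", "Insurance rate"},
}

problem_space = set().union(*perspectives.values())           # union: bounds
unique_to_alice = perspectives["alice"] - perspectives["bob"]  # complement
consilient_set = perspectives["alice"] & perspectives["bob"]   # agreement

assert problem_space == {"Earthquake risk", "Fault zone",
                         "Distance decay", "Insurance rate"}
assert unique_to_alice == {"Distance decay"}
assert consilient_set == {"Earthquake risk", "Fault zone"}
```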

Although perspectives can be compared and integrated in Codex, Codex does not mandate the use of a “neutral” ontology as might be the case with other tools. A neutral ontology amounts to a mapping vocabulary that regulates interoperability between terms in different ontologies. In Codex, a common vocabulary can be discovered if one exists, but it is not an a priori requirement to describe knowledge in Codex. Examples of perspective comparison are discussed through use cases in the next chapter.

4.4 Interacting with Situated Concept Networks

Now that we have defined the architecture for a collaborative concept management system and a concept model capable of reflecting the situatedness of scientific knowledge, we need a mechanism for researchers to encode and extract knowledge. There is a precedent for this mechanism to be a visual one; the Memex interface (Figure 4.9) consisted of dual microfilm displays and a series of code buttons through which a scientist threaded resources together. Memex preserved a correspondence between the internal representation of the relationship and its presentation to the user. More recently, many concept visualization techniques have been developed and successfully used to depict relationships among electronic resources (e.g., Kamada and Kawai, 1991; Hetzler et al., 1998; Havre et al., 2000; Fabrikant and Buttenfield, 2001). In particular, visual techniques can provide contextual cues that help users navigate dense information spaces (Utting and Yankelovich, 1989). Visualization schemes have also become standard fare for online scientific notebooks (Edelson et al., 1996) and cyberinfrastructure applications (Schissel et al., 2002).

Most visualization techniques applied to the problem of knowledge construction are not two-way – they are used as presentation media, but do not let researchers describe the construction of new knowledge through them. Some approaches founded in ontology visualization (e.g., Storey et al., 2001; Suh and Bederson, 2001) do enable both creation and depiction of semantic information and provide clues to the role of a concept, but tend to rely heavily on text, rather than on spatialization and other visual variables, to communicate semantic information. There is evidence, however, that visual faculties are implicated heavily in the process of abduction (Shelley, 1996) and that visualization aids problem formulation in geographic domains (Blaser et al., 2000). Ideally, an interface to a knowledge construction tool like Codex will trade on the relationship between visualization and hypothesis generation.

Figure 4.9. The Memex user interface. Bush recognized that the visual depiction of resource relationships was vital to effectively describing information trails. Reprinted from Life Magazine 19(11), 10 September 1945.

4.4.1 Describing Knowledge Structures with Concept Maps

The selection of a visual interface is helped along by the graph structure of the OWL notation. In Codex, a set of OWL resources and relationships amounts to a set of nodes and edges, a directed graph where resources point to both their intensional structure and their extensional examples (although undirected edges are sometimes possible, such as in the case of transitive relationships). In Chapter 3, we saw that Peirce (and later, Sowa) preferred visual depictions of knowledge – the Existential Graph was a logical formalism, but it was also a map-like illustration of knowledge. Peirce thought that the process of graph construction, what he called diagrammatic reasoning, stimulated hypothesis creation.
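The translation from OWL-style statements to a directed graph can be sketched briefly. This is an illustrative structure, not the Jena or TouchGraph API; the triples are invented for the example.

```python
# Sketch of viewing a set of OWL-style statements as a directed graph:
# resources become nodes; intensional properties and extension ("instanceOf")
# links become directed edges from subject to object.

triples = [
    ("Earthquake risk", "variesWithDistance", "Fault zone"),  # intension
    ("Fault zone", "instanceOf", "Geographic area"),          # extension link
]

nodes, edges = set(), []
for subject, predicate, obj in triples:
    nodes.update((subject, obj))
    edges.append((subject, obj, predicate))  # directed: subject -> object

assert nodes == {"Earthquake risk", "Fault zone", "Geographic area"}
assert ("Earthquake risk", "Fault zone", "variesWithDistance") in edges
```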

The concept map is a visual language for graph structure and an evolutionary cousin of the Existential Graph. While the name and popularity of these maps arose within the last forty years (Novak and Gowin, 1984), the idea itself is of much longer standing. Unlike EGs, however, concept maps tend to be highly informal, and thus highly expressive. There are rarely constraints imposed on the content of nodes and edges, although by convention nodes often represent nouns and edges verbs. In pedagogical contexts, concept maps can depict an existing cognitive structure into which new resources are situated – and reasoning about situation is the essence of learning (Novak, 1990). To Bush, Engelbart, Gadamer, Peirce, and legions before, knowledge is all about making connections between ideas to create something of greater value. Concept maps are a useful way to privilege the process of making those connections.

Implementations of concept mapping in computational environments fall into two categories. Following from the informal nature of most concept maps, the majority of applications are basically sketch tools that provide a visual palette of nodes, edges, and labels to the user. The semantics of the concept map are implicit in the labels; without them, the concept map is meaningless. Examples of this “syntactic” concept mapping include CMap (Cañas et al., 1999) and Belvedere (Suthers, 1999). Alternatively, techniques like UML (Unified Modeling Language) derive from the highly formal needs of computational system design, and employ a strict visual vocabulary to ensure fidelity of communication between parties. UML captures semantic associations, but it is not appropriate for everyday use as a knowledge construction tool by lay users.

Codex seeks a middle ground, a concept mapping approach that permits expressiveness while preserving the semantic relationships between resources. The concept mapping interface built for Codex is a client plugin that translates the graph structure of a set of OWL resources into a visual map. Thus, the formality of the concept map in Codex corresponds to the formality of OWL, that is, it is backed up by the capacity for inference across a graph but permits the complete expressivity of OWL Full. The concept mapping client is intended to be but one of a suite of techniques to interact with Codex; drawing concept maps might be useful, but it is not the only way one might want to describe and explore knowledge. Moreover, reasoning with concept maps can entail cognitive shift that is not appropriate in every domain or for every task (Kinchin, 2001).

Concept mapping approaches that lack underlying semantics often rely on graph-theoretic measures of conceptual similarity. Any comparison between knowledge structures must be done on the basis of graph depth and topology alone (e.g., Redeker, 2000; Leake et al., 2002). These sorts of comparison often result in convenient similarity metrics, but of how much use is the quantification of similarity to the average user? Certainly topologically derived metrics can provide a starting point for measures of relevancy when searching for resources, but a single similarity value of, say, 0.85 does not offer much help to the researcher wishing to understand how his or her methods compare to a collaborator’s. But because information resources in Codex exist in multiple situations, each with a particular purposiveness, researchers can compare their ideas in terms of the purposes to which they are put. Codex can make use of the semantics of perspective to depict – visually – how resources are used in different situations.
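The kind of purely topological comparison criticized here can be made concrete with a Jaccard-style overlap of edge sets, one common graph-theoretic measure (the specific metric is our choice for illustration, not one prescribed by the works cited).

```python
# Sketch of a purely graph-theoretic comparison: a Jaccard-style overlap of
# two concept maps' edge sets yields a single similarity number, with no
# insight into the purposes the maps serve. Illustrative only.

def topological_similarity(edges_a, edges_b):
    """Jaccard similarity of two concept maps' edge sets (0.0 to 1.0)."""
    if not edges_a and not edges_b:
        return 1.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)

map_a = {("risk", "fault zone"), ("risk", "decay"), ("fault zone", "area")}
map_b = {("risk", "fault zone"), ("risk", "insurance")}

score = topological_similarity(map_a, map_b)
assert 0.0 < score < 1.0  # a bare number, telling the researcher little
```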

Codex’s concept mapping client is developed around a dynamic graph browser derived from a third-party, open source graph library.16 Figure 4.10 shows a sample Codex concept map developed around a set of information resources related to the notion of seismology. In the client, OWL classes or individuals are depicted as nodes; relationships between nodes (i.e., their properties) are depicted as edges. The semiotic nature of concept representations in Codex is seen here in the form of iconic representations for selected nodes. For instance, the concept “Seismic reflection” (here, a representamen, or signifier) stands for a particular reflection profile (the object), which is pictured. The graph structure itself is one situation in which the resources it contains are found; the structure is thus an interpretant for its resources.

16 TouchGraph: http://www.touchgraph.com/

Figure 4.10. Codex concept map client.

Typically, collaboration via Codex is asynchronous – one user views another’s concept structures after they have been created. Codex’s client-server model supports synchronous collaboration as well, however. Two or more users can load a shared concept set into their map canvas and manipulate it individually. As they add new resources, or add and change relationships between them, Codex can update each collaborator’s view to show updates in near-realtime.

4.4.2 Creating and Using Situations through Concept Mapping The situatedness of knowledge structures is captured on the fly as a user manipulates resources in the Codex concept map client. In the case of defining a new concept, for instance, the user might drag a new node onto the map canvas or drag an edge between two existing nodes (creating a concept cast as a property). Either action causes the client to initiate a request for the newly defined resource (step 1 in Figure 4.11) from the Codex server. In response, the server creates an empty OWL resource waiting to be populated with user parameters, and presents a context-sensitive control panel to the user to collect these parameters. When these parameters

Figure 4.11. Flow of control for a sample “add new resource” action in Codex.

are returned, the server builds the OWL statement, registers the resource to the user’s personal or shared workspace, and returns a representation of the new resource to the client. The form this resource takes varies according to the capabilities of the client; in the case of the concept mapping tool, the server produces an object that maps resource parameters onto visual variables (e.g., color, node size, or node symbol).
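The request–populate–commit exchange described above can be summarized in a short sketch. This is an illustrative reconstruction only; the class and method names are assumptions, not Codex's actual API.

```python
# Hypothetical sketch of the "add new resource" exchange (cf. Figure 4.11).
# All names here are illustrative, not Codex internals.

class CodexServer:
    def __init__(self):
        self.workspaces = {}  # user -> list of registered resources

    def create_resource(self, user, kind):
        # Steps 1-2: the client's request arrives; the server allocates an
        # empty OWL-style resource awaiting user-supplied parameters.
        return {"owner": user, "kind": kind, "properties": {}}

    def commit_resource(self, user, resource, params):
        # Steps 3-4: parameters collected by the control panel are folded
        # into the resource, which is registered to the user's workspace.
        resource["properties"].update(params)
        self.workspaces.setdefault(user, []).append(resource)
        # The client-facing representation maps parameters onto visual
        # variables (color, node size, symbol).
        return {"label": params.get("label", "?"),
                "color": "blue" if resource["kind"] == "concept" else "gray",
                "shape": "node" if resource["kind"] == "concept" else "edge"}

server = CodexServer()
draft = server.create_resource("alice", "concept")
view = server.commit_resource("alice", draft, {"label": "Seismic velocity"})
```

The two-phase design (allocate, then commit) mirrors the figure: the empty resource exists on the server while the control panel gathers the descriptive context that cannot be captured automatically.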

Figure 4.12 details some of the options available through control panels during the concept map construction process. When defining a new concept (top left), for example, a user provides the descriptive elements of context that cannot be captured automatically. Meanwhile, the content of the concept – its semantics, defined through its intensional properties – is captured automatically through the edges that the user draws. When drawing an edge, the Codex server presents a

Figure 4.12. Sample control panels that constitute a portion of the Codex user interface and that manage interaction between client and server. Left, top: defining a new resource. Right: adding a property cast of an existing concept. Left, bottom: A nexus-like wheel for switching between perspectives.

control panel for defining a property (right). Here the user has the option of selecting ontological operators (the generalization/specialization hierarchies that OWL typically supports) or using a property cast of an existing concept.

As a user draws a graph, he or she is creating a situation for its set of resources. When the user adds a copy of an existing resource (whether as a node or an edge), Codex now contains multiple situations for that resource: the situation in which it was originally created, and the new situation in which it is being applied. Each of these situations constitutes a different perspective on the same resource (either the perspectives of different researchers, if the resource has been used by multiple people, or the various perspectives that an individual scientist adopts when thinking about the resource in different situations). In facilitating reuse of existing resources, Codex enables users to extend their understanding by borrowing ideas from collaborators. This mechanism is a variation on the network elaboration technique (Eckert, 1998), which has been shown to be especially useful in pedagogical settings: learners can start with a simple structure provided by an instructor, text, or colleague and gradually extend it with new information as their learning progresses.

When a resource is found in multiple situations, we want the user to be able to navigate between them easily. Moving between different perspectives affords the adoption of interpretive stances – it is a kind of hermeneutic play. A user interface widget to speed the application of different perspective filters to a Codex concept space is shown in the bottom left of Figure 4.12. This interface reinforces the nexus-like connections between resources; in its default state, it provides six clickable nodes that change the perspective on a selected resource by preferentially filtering for concepts, files, tools, social networks, places, or tasks that are related to that resource. These default filters operate on the full set of resources in a user’s workspace (including the shared workspaces of groups to which the user belongs). The user can add new filters to this default set based on any situation. That is, the user loads or creates a graph in the map canvas, clicks the “new” button in the perspective nexus, and Codex searches for other situations that contain this set of resources and relations. (Currently, these new filters are not stored permanently, but continued development of Codex will see them added to a user’s library of personal filters).
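The behavior of the default perspective filters can be sketched as a simple function over a workspace of typed resources and their relations. The data layout and function name are assumptions for illustration, not Codex's actual representation.

```python
# Illustrative sketch of a perspective filter: given a selected resource,
# keep only related resources of the chosen facet (concept, file, tool,
# person, place, or task).

def perspective_filter(workspace, relations, selected, facet):
    """Return resources of the given facet that are related to `selected`.

    workspace: dict mapping resource name -> facet type
    relations: list of (resource, resource) pairs
    """
    related = ({b for (a, b) in relations if a == selected} |
               {a for (a, b) in relations if b == selected})
    return sorted(r for r in related if workspace.get(r) == facet)

workspace = {"seismic velocity": "concept", "measure.py": "tool",
             "survey.dat": "file", "Alice": "person",
             "velocity model": "task"}
relations = [("seismic velocity", "survey.dat"),
             ("seismic velocity", "Alice"),
             ("seismic velocity", "velocity model")]

perspective_filter(workspace, relations, "seismic velocity", "file")
```

Each click in the perspective nexus corresponds to calling the filter with a different facet; the display then shows only the nodes salient to that situation.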

A simple example shows how different perspectives are displayed in the Codex concept map client. Suppose a geoscientist has created a concept map describing the domain of seismology (Figure 4.13a). In this example, the graph represents an ontology and contains only concepts, so we would say that this situation is directed toward describing the structure of a domain. Now that the user has described this structure, he or she is interested in finding other situations for one of the concepts it contains, “seismic velocity.” (It is possible to find situations for a set of resources of any size, but for simplicity we will use a single node). The user selects the seismic velocity node and clicks the “task” button in the perspective nexus. Codex searches for situations in which a task has been described that includes this concept; Figure 4.13b reveals such a task. Now the user has gained a new perspective on the concept of seismic velocity, seeing it situated in a network with a different purposiveness. Next, the user might want to know which collaborators have also used this seismic velocity concept; clicking on the social network button produces Figure 4.13c; based on information in context headers, Codex finds users who have applied the concept. In this case, two of the “users” are in fact groups that have applied the concept in shared workspaces, so the researchers shown use the resource indirectly through their

membership in the groups. Finally, the geoscientist wants to know what data files contain information on seismic velocity. Clicking on the “files” button in the perspective nexus produces Figure 4.13d; here, nodes represent instances of files that other researchers have included in the extension of seismic velocity. While the first two situations are user-defined (they only exist if a Codex user has built them), the latter two are inferential – they can be found automatically on the basis of information in resources’ contexts (Figure 4.13c) or resources’ content (Figure 4.13d).

Figure 4.13. Four perspectives on a “seismic velocity” concept (red node). a) Intensional concept structure. b) A task that describes how seismic velocity can be measured. c) A social network built around users of the concept. d) Data resources that have been used to describe seismic velocity.
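An inferential situation of the kind behind Figure 4.13c can be sketched as a scan over per-resource context headers. The header fields below are assumptions for illustration; they are not Codex's actual context format.

```python
# Minimal sketch of a context-derived inferential situation: discovering
# which users (or groups) have applied a concept, by scanning the context
# headers attached to resources. Field names are illustrative.

def users_of(concept, contexts):
    """Collect the agents whose context headers mention the concept."""
    found = set()
    for ctx in contexts:
        if concept in ctx["resources"]:
            found.add(ctx["agent"])
    return sorted(found)

contexts = [
    {"agent": "Bob", "resources": ["seismic velocity", "survey.dat"]},
    {"agent": "HERO group", "resources": ["seismic velocity"]},
    {"agent": "Carol", "resources": ["porosity"]},
]

users_of("seismic velocity", contexts)
```

Because such situations are derived from recorded context rather than explicitly authored graphs, they can be computed on demand, which is what distinguishes them from user-defined situations.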

Through the nexus interface, the visual display of information in Codex changes to correspond with a user’s cognitive focus. Rather than show all possible relations between seismic velocity and any other resource, the perspective filters facilitate interpretation by displaying only the nodes that are salient to the chosen situation. Changing the visualization to suit the user’s goals has been shown to increase the efficiency and effectiveness of interpretation (Neuwirth et al., 1998); in light of the fluid, adaptive nature of problem-solving, enabling this sort of change can quickly communicate multiple approaches to a problem to a thinker coming to grips with its complexity (Chung et al., 2003). The Codex concept map client and its mechanism for navigating across perspectives are a step toward increasing the transfer of understanding among collaborators.

4.5 Summary

The Codex system attempts to put the philosophical and cognitive bases for scientific knowledge, outlined in Chapter 2, into practice. In so doing, it provides a new model for knowledge-sharing software, one that concentrates on the social and evolutionary aspects of understanding. On the need to capture and share threads of meaning through dense information spaces, Bush writes, “There may be millions of fine thoughts, and the account of the experience on which they are based, all encased within stone walls of acceptable architectural form; but if the scholar can get at only one a week by diligent search, his syntheses are not likely to keep up with the current scene.” Inspired by Memex, Codex reconfigures the “stone walls” of our information systems, making it easier for researchers to access collaborators’ thoughts, along with the experiences and manipulations that created them.

As a semantic mediator, an application that can broker between different representations of concepts, Codex brings to the fore what information science typically relegates to the backstage domain of database interoperability (Moulton et al., 2001). In the practice of science, semantic interchange is the star player. Without it, and without tools that aid it, the progress of knowledge is slowed. By representing the human perspectives on a problem, Codex manifests the keys to mediating understanding. In the following chapter, the knowledge model and interfaces introduced here are applied to two use cases in the environmental and geological sciences. Through these cases, the ability of Codex to support the interpretation and reuse of knowledge – its ability to “untie the brick” that encumbers communication by suppressing situations – can be evaluated.

Chapter 5 – Putting Perspectives into Practice

Systems like Codex have the potential to change the way researchers interact with computers and with each other. To assess this impact, this chapter evaluates the roles that situated knowledge representations, and their implementation in Codex, can play in real-world research activities. This evaluation comprises three parts. First, I address how well Codex meets the criteria for effective knowledge sharing as expressed in the knowledge management literature. Second, I describe the formative evaluation of Codex in the two application domains in partnership with which it was developed and deployed. Third, the literature rubrics and use cases will help judge the present implementation against the objectives outlined in Chapter 1.

5.1 Assessing the Impact of New Tools

There are two components to evaluating the tools and methods of which Codex is composed: the dimensions along which assessment is performed, and the mechanisms of assessment. The former constrains the things we want to measure; the latter constrains the techniques by which we measure them.

• Assessment dimensions examine the changes to an individual’s or organization’s knowledge building practices brought about by an information system.
  o Psychological changes encompass the modifications to user work practices induced by a tool or method. Positive changes could include better understanding of one’s place in a community, stronger connections with collaborators, and enhanced ability to express complex ideas clearly. Negative psychological changes (Section 3.4.1) include increased cognitive overhead, forced alignment of user actions with the system horizon, and confusion caused by obscure interfaces.
  o Operational changes reflect productivity adjustments. Positive changes could include shortened response times, increased quality and volume of stored resources, and greater frequency with which a knowledge base is consulted. Negative changes include difficulty in creating and finding relevant resources, less communication between collaborators, and decreased workflow efficiency as a result of poor system performance.
• Assessment techniques provide ways to describe the performance of knowledge building tools.
  o Quantitative approaches employ performance metrics (such as time-to-completion for a given task) or usability quantifiers (ease-of-completion) to produce expressions of success or failure. Such approaches might rely on either software sentinels that monitor user actions or comparative usability experiments using existing tools as controls.
  o Qualitative techniques address the integration of new tools into work practices through descriptive means. These descriptions might include use cases, task observations, and user feedback through the prototyping process.

Here, I focus on qualitative assessment of the psychological and operational changes brought about by situation-driven knowledge representation strategies. Existing quantitative measures (some examples are given in the next section) provide a precision that comes as a result of tight integration with user tasks. Often, collecting information at this level of detail can require a separate layer of monitoring software atop the system we are interested in evaluating; devising

techniques for this monitoring would require much additional research, and is left for future work. Since the aim of the methods under development here is to support situated expressions of knowledge, it follows that an examination of how the methods themselves are situated in science work can be informative.

5.2 Rubrics for Evaluating Knowledge Management Systems

The increasing adoption of knowledge sharing tools, albeit focused on document- and data-centered tactics, has led to the development of guidelines for assessing knowledge management methods. The literature’s prominent rubrics, enumerated in this section, can be treated as prescriptive measures for good system design.

Collins (2003) identifies four broad goals for knowledge-based portals:

1) Knowledge requirements must drive technology, not the other way around.
2) Desktop access should be customized around individual requirements and work styles.
3) Quick access to information should lead to better decision making.
4) Knowledge transfer should be made faster, more accurate, and more flexible than with conventional tools.

The first two of these criteria deal with the design of the system, the latter two with its operation. In terms of design, the Codex implementation is driven by what we know about the cognitive structure of knowledge and the processes of its construction. Technological advances do not dictate the decisions about how best to facilitate inquiry – they merely enable them to be effected. Further, the workspace model employed in Codex does customize views onto a dense information space according to group membership and personal interest. Perspective-based representations of knowledge are intended to suit the particular approaches that each researcher brings to his or her work.

Collins’ third and fourth goals are largely achieved through the psychological changes brought about by a system. A typical quantifier for such change might estimate the level of human effort that is applied to maintaining and using a knowledge base (Graves and Mockus, 2001), or the level of interest in resources that a particular tool cultivates (Stojanovic et al., 2003). These metrics would suggest that, all else being equal, the better tool is one that requires less effort to complete a task or produces more interest than its alternatives. Quantifying effort and interest can help identify areas of knowledge degradation or difficult-to-understand concepts, both of which should be reflected in increased effort and possibly decreased interest. In practice, the result of applying these metrics is not so much an assessment of a tool’s abilities as it is a measure of the quality of the tool’s content – a related, but different, parameter. These measures are not generally applied to tools that operate in knowledge-discovery settings, perhaps because users cannot articulate their objectives in advance. It becomes difficult to determine how much less effort could have been expended to achieve the same discovery. Where the creation of new information is the goal in using a tool, quantitative metrics for psychological change can convey undue precision (Kleist et al., 2004).

Because of the “noise” involved in trying to quantify psychological factors in a knowledge construction activity, some approaches to system evaluation turn instead to quantifying operational changes. Ehrig et al. (2004) evaluate distributed knowledge sharing systems on the basis of number of resources managed, the topological structure between system components,

and number of collaborators. This style of assessment produces measures of reliability, relevance, recall, precision, and information loss. Yet again, measuring the quality of recall and precision can require that expected results of a knowledge search are known in advance. This foreknowledge cannot be assumed in a knowledge discovery environment where the user is hoping for unexpected (but useful) knowledge to be uncovered. Presently, the maturity of the Codex tool and methods is insufficient to make these measures meaningful.

Most of the quantitative measures developed so far to evaluate system performance take a limited view of what constitutes success. Here, however, the successful implementation is one that demonstrates that it is possible to build an information system that works with situated reasoning. We can thus turn to qualitative rubrics to help assess achievement of Collins’ third and fourth goals: improved decision making ability and knowledge transfer. In devising these rubrics, the literature looks not at specific technical solutions, but instead at the human component in knowledge management. Indeed, most see the technical implementation as only one part of a larger process. Kawalek (2004), for instance, characterizes knowledge management as the interaction between two systems (Figure 5.1): the human system of knowledge development (described in Chapter 2) and the operational system of knowledge construction aids (described in Chapter 4). In enabling the expression of knowledge under development through an operational platform, Codex helps achieve this interaction. In addition, the development of Codex from within domains that are part of its intended application achieves the interdependence between human and technical systems that Kawalek calls for. This interdependence is reinforced through the act of manipulating knowledge in Codex. In exploring each other’s expertise, users become involved in the larger process of inquiry into how they and their community create understanding.

Opening up the conceptual basis of knowledge construction tasks by accommodating the synechistic and situated nature of reasoning meets Kawalek’s further call for systems to promote transformative knowledge transfer. The role of the knowledge-based system is not just to communicate information to the user (as an internet database might), but to transform the user’s thinking and action. To effect the transformation of human understanding, Kawalek sets forth two specific requirements: (i) Human learners should be able to utilize information in a way that makes learning more

Figure 5.1. An ideal knowledge-based system acknowledges that knowledge production emerges from the reflexive development of both human understanding (top node) and the technological systems (bottom node) that can spark understanding (after Kawalek, 2004).

personal, through activities that are both critical and reflexive. (ii) Human learners should be able to reflect critically on their own thinking in action and be able to participate in refinement of knowledge.

As a perspective-driven information system, Codex is transformative by design. Kawalek’s first requirement is achieved through Codex’s capacity to induce introspection; the second is achieved through the capacity to see how one’s understanding is enacted by others in the pursuit of new knowledge. Codex furthers introspection through the provision of a personal workspace for researchers to express the development of their ideas. Unlike information-retrieval systems where information flow is one-way, with Codex concept maps, interaction involves active dialogue between the user and the system, as well as implicit dialogue between the user and his or her collaborators. Through the use of perspectives to illustrate both the consilience and uniqueness of individual understanding, Codex promotes critical evaluation.

Codex meets Kawalek’s second requirement, that of understanding how knowledge is put into action, through the capability for users to extend and repurpose others’ knowledge in new situations. Researchers can view the impact of their perspectives on the course of their community’s thinking, and critical reflection and refinement become possible when users question why certain knowledge structures might be reused more often than others. Part of this critical reflection is afforded by the cognitive adjustments induced by Codex. The psychological changes spurred by an operational system introduce new ways of seeing the larger knowledge development process.

Another set of knowledge management criteria emerges from a set of case studies in which the course of innovation demanded that knowledge be reused. In the reuse-for-innovation case, three major actions need to be supported by the interdependent human-computer system (Majchrzak et al., 2004):

1) Reconceptualize the problem and approach, and decide to search for others’ ideas to reuse.

In the first stage of developing an idea with Codex, the user can begin to sketch his or her ideas by situating core resources in a number of different conceptualizations, essentially testing various approaches to describing a problem. The user’s decision to search for collaborators’ insights is spurred by visually sensing gaps in understanding or available information.

2) Find and evaluate others’ ideas.

Codex queries produce situated depictions of resources, supporting the searcher in evaluating both the quality and relevance of collaborators’ knowledge.

3) Develop the selected ideas.

Importing a collaborator’s ideas into a personal workspace, the Codex user integrates team knowledge with his or her own understanding. The individual user’s ideas are

developed by reusing the insight of others, but the knowledge held by those colleagues is also furthered, however tacitly. When it becomes possible to see how others have reused one’s ideas, the development of these ideas by others can be incorporated back into one’s own understanding.

Majchrzak et al. summarize four empirical findings that point to the critical importance of situated and abductive inquiry in knowledge construction tools. First, the decision to reuse a knowledge component from someone else – not yet knowing which resource to reuse, but that it will be necessary to find one – comes from sensing an insurmountable gap in one’s understanding. In the language introduced in Chapter 2, this gap points to hermeneutic breakdown. Second, there must be an adapter to bridge the idea source and recipient; visualizing source knowledge in its situations of use can serve as this adapter in Codex. Third, knowledge reuse is a layered activity. The first layer involves scanning to find ideas to reuse, the second briefly examines resultant ideas for desired attributes, and the last accomplishes a detailed analysis of how to integrate the selected new ideas with existing ones. This three-layer process is clearly abductive: the first layer generates candidates, the second and third test them. In Codex, this abduction is accomplished by the ability to search for candidates in other situations and to test those candidates by visually inserting them into a present conceptualization. Majchrzak et al.’s final finding is that inquirers use “metaknowledge” to evaluate each candidate for reuse. In Codex, the contextual elements of every resource populate this metaknowledge – who constructed it, when, using what tools, and so on.

Holsapple and Joshi (2001) create a rubric for categorizing knowledge resources in cooperation with a panel of knowledge management experts. Their taxonomy enumerates the kinds of resources that an organizational knowledge base should include, based on two upper-level types: schema and content. Schema includes four components – purpose, strategy, culture, and infrastructure – that describe the ways that knowledge is applied. Content includes participant knowledge and artifacts; the former is abstract understanding, the latter are the data resources that manifest that understanding. Codex meets the requirements of this model by representing schema through context and situation, and content through Concept intension and extension. For content, the relationship between participant knowledge and artifact in Codex is modeled semiotically. For schema, situations capture the purposiveness of knowledge schema and the problem-solving strategies used to create them. Looking at situations as perspectives onto a problem taken by an individual or group helps identify the cultural aspects of knowledge through the search for consilience across perspectives.

The literature on evaluating knowledge sharing strategies has recently begun to highlight the problem of “knowledge sourcing” (Gray and Meister, 2004). Instead of measuring success from the supply-side, by looking at the amount of information stored, knowledge sourcing is a demand-side argument. To what extent do researchers intentionally access and reuse each other’s knowledge? In one use case described below, an initial knowledge-sharing implementation was a supply-side solution that emphasized the submission of evidence rather than understanding; a perspective-driven approach to knowledge representation, however, introduces greater motivation for researchers to want access to each other’s knowledge. The ability to immediately reuse knowledge structures (rather than, say, having to infer them from reading collaborators’ papers) can drive demand for knowledge sharing. The necessary

antecedent to demand for knowledge reuse – being faced with a problem that requires expertise beyond one’s own horizon – is brought to the fore in Codex through automated comparison of perspectives. While collaborators in large teams tend to interrate each others’ expertise poorly, the basis for these ratings is usually lack of knowledge about collaborators’ skills, not firm knowledge of their lack of skill (Denrell et al., 2004). The solutions devised in the present work help overcome the problem of lack of knowledge by depicting collaborator expertise in its situations of use.

5.3 Use Cases

The representational methods developed in this work have been created in concert with research programs in two application domains: human-environment interaction and geoscience. In each case, the techniques described here were developed from the author’s experiences working in a research group in each domain. Internal access to the problems and practices of researchers in these domains permitted rapid prototyping, where knowledge model design and Codex functionality could be refined over time through informal interaction with domain experts. The creation and deployment of prototypes in response to the needs of the science communities that might adopt them amounts to ongoing formative evaluation. This formative evaluation makes it possible to identify psychological and operational changes in the work practices of domain members. These use cases also demonstrate the relevance of an epistemologically driven methodology to the real-world practice of science.

5.3.1 Aiding Pedagogy in Human-Environment Relations

The Human-Environment Regional Observatory (HERO) is a multi-site collaboration to explore the local and regional drivers and impacts of environmental change. HERO is a synthetic endeavor where explanations of national and global patterns and processes are derived from the bottom up by integrating local observations. Information systems that support this synthesis across space and over time can speed the discovery of important associations and the creation of viable explanations. Groups of undergraduate students are often responsible for performing local analyses of environmental change, and in the course of their work with HERO have used Codex to produce a body of knowledge structures relevant to the problem of human-environment relations. Examining these structures shows how situation-based knowledge representations can drive the discovery of shared meaning. Students are an appropriate user community for investigating some of the affordances of the present knowledge model, because their engagement in active learning makes them well disposed to borrowing and extending each other’s knowledge.

The effort to provide HERO collaborators with an information-sharing system began with the deployment of a third-party electronic notebook (Myers et al., 2001). This system used a fixed hierarchy of files and folders to structure resources, and every user had to abide by the same view onto the shared information space (Figure 5.2). The notebook promoted the sharing of tangible resources such as data files, but not the reasoning behind them. Using the early notebook resulted in negative psychological and operational changes; in the fixed horizon of the system, information could not be stored as an explanation, but only as atomic and largely unconnected data resources. Users occasionally contributed information to the notebook, but rarely retrieved others’ resources. It was hypothesized that the reason for this lack of use was that data resources alone were not an effective mechanism for describing “why” or “how” explanations came to be,

Figure 5.2. Electronic notebook based on a static directory structure.

and without this situated understanding the resources themselves were of little value (Pike et al., 2005). The notebook’s underlying model for the practice of science (or more properly, lack of such a model) severely limited its potential to induce Gray and Meister’s (2004) knowledge sourcing. In other domains, research has suggested that information displays need to be relevant to the task at hand. Instead of contending with the entire information space, as would be necessary in the early notebook, stakeholders only want information relevant to their goals (Arias and Fischer, 2000). The Codex perspective model, and the ability to filter a knowledge space by situation, was built to accommodate selective views onto a complex and collaborative understanding.

Examples of the actual concept structures created by HERO students help demonstrate the ability of the knowledge model to meet the requirements laid out in Chapter 1. In particular, they illustrate knowledge reuse, the merging of top-down and bottom-up structures, and the union and intersection of perspectives. The structures described in the following discussion were created by twelve students trying to understand the process of local environmental planning.

Figure 5.3 depicts a simple case of knowledge reuse. (For the sake of clarity in this and subsequent concept maps, only snippets of much larger structures are shown). A student has begun by intensionally defining a concept called “Policy” as something that affects local environmental initiatives. These initiatives in turn, affect the environment itself; to express this relationship, the student has opted to search for ways of describing the concept “Environment”;

this can be accomplished in Codex by either searching syntactically for resources that have this word in their label or description, or semantically by setting intensional criteria that a resource must have to qualify as a description of “Environment”. Here, the student has selected a situation in which environment is a part of quality of life. This description was from an existing ontology of environmental variables, so by including it in the current graph the student is using authoritative domain knowledge to extend her own understanding. The existing situation has been ingested into the knowledge structure she is building, adding a new situation to the ontology snippet; this is an example of a user-defined situation (Section 4.3.2).

Figure 5.3. Existing ontological structure for the concept of “Environment” (bottom graph) used to extend a user-defined structure (top graph).
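The two search modes just described, syntactic label matching and semantic matching against intensional criteria, can be contrasted in a small sketch. The resource format below is an assumption for illustration.

```python
# Sketch of Codex's two search modes as described above. Resources are
# represented as dicts with a label and a set of intensional relations;
# this format is illustrative, not Codex's actual data model.

def syntactic_search(resources, term):
    """Match resources whose label or description contains the term."""
    t = term.lower()
    return [r["label"] for r in resources
            if t in r["label"].lower()
            or t in r.get("description", "").lower()]

def semantic_search(resources, required_relations):
    """Match resources whose intension contains every required relation."""
    return [r["label"] for r in resources
            if required_relations <= r.get("intension", set())]

resources = [
    {"label": "Environment",
     "intension": {("part of", "quality of life")}},
    {"label": "Built environment",
     "intension": {("part of", "city")}},
]

semantic_search(resources, {("part of", "quality of life")})
```

The syntactic search would return both resources for the term "environment", while the semantic search narrows the result to the one whose intension satisfies the stated criteria, which is what lets a user find a specific authoritative description to reuse.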

Inferential situations can display emergent behavior by expanding the horizon of the inquirer or by limiting it. In the first case, taking the union of a set of perspectives creates a representation of the breadth of interest in a community. For example, three HERO students’ perspectives on environmental change can be merged along points of agreement in Codex, creating a community overview (Figure 5.4). Each perspective amounts to a hermeneutic horizon, the fusion of which is afforded by Codex’s search for inferential situations. The ability to create these views overcomes the problem of “symmetry of ignorance” – no one researcher is likely to hold all of the knowledge relevant to a problem (Fischer, 2000). In the Codex concept model, individual expertise becomes interoperable; the result is a broader explanation than any one researcher could provide. The points of agreement along which situations can be joined identify boundary objects (Section 3.3). Boundary objects are resources that distinguish individual perspectives while also serving as points of reference between them. In Figure 5.4, “Water quality” is a typical boundary object; it is situated differently in each perspective, yet we can impart its intensional definition in the red ellipse to the version in the blue ellipse, creating a point along which the two perspectives can be compared.

Figure 5.4. The union of multiple user perspectives can bound the problem space in which a community is interested. Here, three user situations (red, green, and blue ellipses) are joined along points of agreement (“water quality” joins red and blue, “urban development” joins red and green).
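The merging shown in Figure 5.4 can be sketched in miniature. The fragment below is a hedged illustration only: the concept and relation labels are invented, and Codex's actual representation is OWL-based rather than Python sets. Each perspective is treated as a set of (concept, relation, concept) edges; the union gives the community's collective horizon, and boundary objects are the concepts appearing in more than one perspective.

```python
from collections import Counter

# Each perspective is a set of (concept, relation, concept) edges.
# Labels are invented for illustration, not taken from the actual HERO maps.
red = {("Water quality", "part of", "Environmental change"),
       ("Urban development", "influences", "Water quality")}
blue = {("Water quality", "affects", "Public health")}
green = {("Urban development", "consumes", "Farmland")}

def merge(*perspectives):
    """Union of perspectives: the community's collective horizon."""
    return set().union(*perspectives)

def boundary_objects(*perspectives):
    """Concepts present in more than one perspective serve as points of reference."""
    counts = Counter()
    for p in perspectives:
        for node in {n for (a, _, b) in p for n in (a, b)}:
            counts[node] += 1
    return {n for n, c in counts.items() if c > 1}

community = merge(red, blue, green)          # all four edges, spanning the three views
shared = boundary_objects(red, blue, green)  # the concepts that join perspectives
```

As in the figure, "Water quality" and "Urban development" emerge as the boundary objects along which the three student perspectives can be compared.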

The limiting behavior of situations is illustrated by the emergence of consilience. The intersection of perspectives limits the horizon of an inquirer by restricting the view into a concept space to only that which is held in common by a group. Within an existing community of inquirers, consilience shows the largest knowledge structure on which all parties agree. Across any set of workspaces, Codex’s search for consilience can create an ad hoc community built around inferred agreement. One possible consequence of this feature is that researchers can identify potential collaborators. In Figure 5.5, the intersection of knowledge structures created by different students describing land use planning reveals a set of three consilient concepts. This agreement was not detected on the basis of similar syntactic labels alone, but as a result of similar intensions. Here, the consilient set includes three nodes and the portion of their intensions that overlaps (that is, only those relations that connect the consilient nodes in both maps). While this is a simple example, the resources in this set represent emergent agreement – a core of understanding shared by team members – and could form the basis of a new community-wide ontology. Each of the concept maps in Figure 5.5 represents a different perspective on the same problem, an idiosyncratic filtering of the resources in a shared concept space to include just those that their creators believe salient to the issue of land use planning. The complement of their intersection contains the resources that are unique to each researcher’s horizon.
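The intersection behavior admits a similarly small sketch. In this hedged illustration the labels are invented, and real Codex matches on intensional similarity rather than identical labels; here, consilience is simply the set of relations two maps share, and the consilient concepts are those the shared relations connect.

```python
# Two students' concept maps as sets of (concept, relation, concept) edges.
# All labels are illustrative, not taken from the actual student maps.
student_1 = {("Land use planning", "constrains", "Urban development"),
             ("Urban development", "requires", "Zoning"),
             ("Zoning", "protects", "Open space")}
student_2 = {("Land use planning", "constrains", "Urban development"),
             ("Urban development", "requires", "Zoning"),
             ("Open space", "improves", "Quality of life")}

def consilience(map_a, map_b):
    """Shared relations, plus the concepts those relations connect."""
    shared_edges = map_a & map_b
    shared_nodes = {n for (a, _, b) in shared_edges for n in (a, b)}
    return shared_nodes, shared_edges

nodes, edges = consilience(student_1, student_2)
# nodes: the consilient concepts; edges: the overlapping part of their intensions
```

In this toy example, three consilient concepts emerge, mirroring the structure of Figure 5.5; the edges unique to each student form the complement described above.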

Integrating, comparing, and reusing knowledge in educational settings creates a kind of “living book” in which slices of information can be structured differently for each learner (Baumgartner et al., 2004). Through an implementation that spurs positive operational changes compared to its predecessor, Codex makes possible efficient knowledge transfer between students.

Figure 5.5. The search for consilience between two knowledge structures reveals three concepts (red) that share full or partial intensional overlap.

5.3.2 An Idea Lab for Distributed Geoscience

The GEON project is a cyberinfrastructure effort in the geosciences, focusing on building the technological backbone to support distributed access to large data stores and high-performance computing nodes. The locus of GEON’s approach to data-sharing is the creation of ontological hierarchies that will be used to attach semantic attributes to data elements. The knowledge model developed here can serve as the mechanism by which researchers express personal perspectives on the associations between datasets and shared concepts. In addition, Codex’s knowledge-based interface presents GEON’s distributed resources in their situations of use; this presentation style helps collaborators understand how to reuse these resources themselves.

A hypothetical application in the domain of sedimentary geology illustrates how Codex can support GEON science activities. Versioning histories are one way for researchers from different backgrounds to reconstruct each other’s reasoning. Figure 5.6 shows how this reconstruction proceeds through Codex’s retracing of historical manipulations. At top left is the current state of one sedimentologist’s “Sea level” concept, retrieved in response to a query. This query result shows just that concept’s immediate intension (the concepts “Depositional environment” and “Epeiric sea”) and extension (a stratigraphic column signifier). Suppose that an inquirer now wants to understand how this sea level concept came to be. Through version tracking, Codex can display the original version of the concept (top right), revealing that at first it had null intension; it was simply the abstract concept that a particular stratigraphic column resource signified. This stratigraphic column described the lithology in a region. At a later point in time (t1, center), the lithology concepts had been extended with observed features (wave ripples and fossil specimens) and possible explanations for those features (marine depositional environments). Still later (t2, bottom), the sedimentologist was able to link depositional environments back to the original sea level concept. Finally, a collaborating sedimentologist (Figure 5.7) made the conjecture that an epeiric sea explained the depositional pattern expressed through his colleague’s concept of sea level.
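The replaying of a version history can be suggested with a small sketch. The event structure below is hypothetical: Codex records provenance in OWL context wrappers, not Python objects, and the authors and timestamps are invented to parallel the t1, t2 steps described above.

```python
from dataclasses import dataclass

@dataclass
class Version:
    time: int              # analogous to the t1, t2 steps in the example
    author: str
    intension_added: set   # concepts newly linked to this one at this step

def state_at(history, t):
    """Reconstruct a concept's intension by accumulating versions up to time t."""
    intension = set()
    for v in sorted(history, key=lambda v: v.time):
        if v.time <= t:
            intension |= v.intension_added
    return intension

sea_level = [
    Version(0, "sedimentologist A", set()),            # null intension: only a signifier
    Version(1, "sedimentologist A", {"Depositional environment"}),
    Version(2, "sedimentologist B", {"Epeiric sea"}),  # the collaborator's conjecture
]

earlier = state_at(sea_level, 1)  # the concept before the collaborator's contribution
```

Replaying to any intermediate time recovers the concept as a colleague would have seen it then, which is the essence of the reconstruction Figure 5.6 depicts.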

While versioning histories help colleagues retrace how concepts emerged from continuously evolving situations, they are not a complete solution for GEON’s knowledge sharing needs. Compared to the HERO case, GEON participants are greater in number and more diverse in expertise. GEON members represent a broad “community of interest” (CoI) within which are a number of “communities of practice” (CoP). The GEON CoI includes scientists performing basic research in geophysics, hard- and soft-rock earth history, geologic mapping, and earth science education, among others. The members of this CoI buy into the same overarching goals for greater sharing of both knowledge and data, but come at the problem from perspectives entrenched in the various CoPs to which they belong. Thus, there might be agreed-upon taxonomies and ontologies at the CoP level (for instance, to describe geophysical processes, or learning goals), but these often lose traction at the CoI level (Wenger, 1998).

Figure 5.6. Using Codex to reconstruct the creation of meaning: how “sea level” got its intension.

Figure 5.7. Another sedimentologist reused the sea level situation, and hypothesized that an epeiric sea could explain the sea level changes reflected in this depositional pattern.

A continuation of the sedimentology example demonstrates how Codex perspectives help different CoPs use shared concepts differently. Turning to the “Depositional environment” concept, which could be shared among the full GEON community of interest (that is, it is a concept available in the project-wide workspace), we can examine how it might be registered to different GEON ontologies by members of each CoP using their unique horizon on the information space. In Figure 5.8, a paleobiologist (top) classifies specimens according to a standard geologic time scale; registering the depositional environment in which a specimen is found to the period in which the environment obtained helps collaborators find other resources from the same period. Meanwhile, the sedimentologist (middle) defines a particular depositional environment as subaqueous, constraining the genesis of any rock samples representing that environment. Finally, the geoscience educator (bottom) defines depositional environments as one of the earth surface changes that, according to the AAAS (2001), should be taught in grades 6 through 8. Codex now maintains a series of intensional definitions for “Depositional environment” that help GEON researchers understand the concept from the point of view of their own community of practice as well as from that of others. In the latter case, fusion of horizons is eased.

In the HERO application, Codex and its situated knowledge model can be the route to discovering larger patterns of understanding from expressions of local belief. In GEON, the goal is somewhat different; researchers all need access to the same body of shared geoscientific resources, but their “discipline perception” as members of different CoPs complicates arriving at a single organizational system that accommodates their various perspectives on the meaning of those resources, or the meaning that can be generated from them. The ability of Codex to accommodate user-specific situations provides a means of translating the fixed ontologies on which GEON tools might depend into CoP-appropriate representations for human consumption.

Figure 5.8. Using Codex to register a concept at the CoI level to different CoP ontologies.

A particular hurdle in the GEON approach to data integration is the semantic registration of data fields to ontological categories. When the fields of each dataset are mapped to labels in one or more ontologies, GEON tools can make the data interoperable; GEON has used this technique to register geologic maps that use different classification schemes to the same general ontology, allowing the maps to be aggregated (Lin et al., 2003). In most cases, this registration can only be performed (or at least verified) manually – unless the field and category are named identically, there may be no indication that they share the same meaning. Moreover, different semiotic relationships will hold to different researchers, so there may not be a single mapping that works from all perspectives. However, a database schema is a special case of an ontology and could be depicted in Codex as a concept map. Using the same process of registering user-defined concepts to domain ontologies as shown in Figure 5.8, it would be possible to perform database registration visually and flexibly, on a task-by-task basis. The GEON researcher can load the schema as a top-down ontology and create a knowledge structure out of it by specifying the concepts of which the data fields are extensions.
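The field-to-concept registration described here might be sketched as follows. All schema fields, workspace names, and concept labels below are invented, and the actual GEON registration works over OWL ontologies rather than Python dictionaries; the point is only that registrations live at the workspace level, so no single global mapping is imposed.

```python
# A database schema treated as a flat list of fields awaiting registration.
schema = ["unit_name", "lithology_code", "age_ma"]

# Registrations are kept per workspace, so different perspectives can map
# the same field to different concepts -- there is no system-wide mapping.
registrations = {
    "sedimentology_workspace": {"lithology_code": "Lithology",
                                "age_ma": "Geologic age"},
    "education_workspace": {"lithology_code": "Rock type"},
}

def concept_for(field, workspace):
    """Resolve a data field to a concept within one perspective, if registered."""
    return registrations.get(workspace, {}).get(field)

def unregistered(workspace):
    """Fields still needing manual registration in a given workspace."""
    return [f for f in schema if concept_for(f, workspace) is None]
```

Because resolution takes the workspace as a parameter, the same `lithology_code` field can legitimately mean "Lithology" to one CoP and "Rock type" to another, which is the flexibility the text argues a single fixed mapping cannot provide.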

When versioning histories and registration of concepts with different perspectives are combined, it becomes possible to present a single view onto the evolution of a resource. Figure 5.9 shows such a view, summarizing the intensional and extensional changes in the “Depositional environment” concept. (This versioning history can be retrieved automatically by Codex, although its visual display here is only a prototype.) The original concept, at upper left, was created by one sedimentologist and contained a single extensional element. As successive researchers adopted the concept, the connections they made between top-level ontologies (in Figure 5.8) were added to the concept’s intension. Signifiers for “Depositional environment”, which might be data resources in GEON’s online repositories, become associated with elements in an “authoritative” ontology, thus furthering GEON’s data integration goals.

Figure 5.9. Evolution of “Depositional environment” concept through use by researchers in different communities of practice, progressing from upper left to lower right.

Codex’s semiotic model of concept representation is central to accommodating the user-specific situations of meaning shown in these examples. Different users are permitted to use different data to signify the same concept, or the same data to signify different concepts; there is none of the one-size-fits-all registration between data and ontology that is common in other systems. The Codex semiotic is schematized in Figure 5.10. An object of study has some representation in the mind, which is translated into a computational knowledge structure through Codex’s concept mapping interface and registered to a particular workspace. Any resource in this map has further levels of signification, including not just the underlying OWL statements, but any links to extensional data that describe a concept; again, these links are registered at the workspace, not system, level. Note that in this diagram there is not a direct correspondence between the object and data about it; the correspondence is only between data and the concepts it represents. Section 4.3.1 showed that extensional objects and representamen are related through the situations in which their relationship holds. The concept structure in Codex brokers between objects (in GEON’s case, these are generally quantitative data) and the representations of their real-world meaning.
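A minimal sketch of this workspace-level signification follows. The file names, workspace names, and concept labels are all invented, and Codex's real links are OWL statements rather than a Python dictionary; the sketch only shows the many-to-many relationship the semiotic model requires.

```python
# Signification links are registered at the workspace level, never system-wide:
# the same data may signify different concepts to different users, and
# different data may signify the same concept.
signification = {
    ("strat_column_017.csv", "workspace_A"): "Sea level",
    ("strat_column_017.csv", "workspace_B"): "Basin subsidence",
    ("tide_gauge_record.csv", "workspace_A"): "Sea level",
}

def signified_concept(data, workspace):
    """What a dataset means is a property of the workspace, not of the system."""
    return signification.get((data, workspace))

def signifiers_of(concept, workspace):
    """All data registered as extensional evidence for a concept in one workspace."""
    return {d for (d, w), c in signification.items()
            if w == workspace and c == concept}
```

Note that, as in Figure 5.10, there is no entry mapping an object of study directly to data; the dictionary relates data only to the concepts that represent the object within some perspective.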

GEON is also illustrative of how a knowledge-centered portal like Codex can be an organizing metaphor for large cyberinfrastructure efforts. Thus far the focus of GEON’s work has been on the use of semantic markup to support the backend interoperation of data and systems. Once collected, however, this large and ever-growing body of geologic resources needs to be accessible to researchers. By organizing resources into shared workspaces, each CoP can use different organizational situations to accommodate their perspectives on geoscientific problems. When new users join the project, they can register with one or more CoPs and inherit the default perspectives appropriate to that group (such as rock genesis ontologies or curricula). But by moving between CoP views, as in Figure 5.8, GEON researchers can engage in the sort of playful dialogue that can illuminate the horizons of collaborators in other CoPs. These horizons fulfill the need for “situations that talk back to us” (Schön, 1995), helping GEON collaborators manage a complex and competing CoI.

Figure 5.10. Trails of signification in Codex concept representation.

5.4 Achieving the Objectives of Situated Science

The principal convictions of this work are that scientific computing should be based on concepts, not on data; that particular arrangements of concepts constitute expressions of situations in which those concepts can occur; and that an individual’s or community’s perspective on the world can be revealed through its choice of concepts and situations.

Revisiting the goals and objectives first laid out in Chapter 1 highlights the extent to which these convictions were reflected in the knowledge model and proof-of-concept implementation. This work embraced three goals:

(1) Theory: Development of a theoretical model for concept representation that accounts for evolution and context.

Chapters 2 and 3 articulated the theoretical basis for representing scientific resources as historically situated entities reflective of the judgments and perspectives of their creators and users. Any resource can be cast as a specialization of a generalized Concept type, carrying with it a contextual header that includes information on the circumstances of its use. Collections of resources constitute situations that reveal the purposes to which their constituent elements have been put.

(2) Implementation: Translate the theoretical model into a schema for a standard notation for knowledge sharing in computational environments.

In Chapter 4, the desiderata for a situated model of knowledge were translated into a system for describing concepts, contexts, and situations using OWL notation. In addition, a concept graph interface was introduced that maps resource intension, extension, and situation onto a visual mechanism for both displaying existing concept structures and capturing new knowledge from users.
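The flavor of that OWL encoding can be suggested with RDF-style triples held as plain tuples. Every URI and property name below is invented for illustration; the actual schema is the one defined in Chapter 4.

```python
# A concept, its intension and extension, and a context wrapper recording the
# circumstances of its creation -- all as subject-predicate-object triples.
EX = "http://example.org/codex#"

triples = [
    (EX + "SeaLevel", "rdf:type",          EX + "Concept"),
    (EX + "SeaLevel", EX + "hasIntension", EX + "DepositionalEnvironment"),
    (EX + "SeaLevel", EX + "hasExtension", EX + "StratColumn017"),
    # context wrapper: provenance travels with the resource itself
    (EX + "SeaLevel", EX + "createdBy",    EX + "SedimentologistA"),
    (EX + "SeaLevel", EX + "inSituation",  EX + "SeaLevelStudy"),
]

def describe(subject):
    """Every statement about a resource: intension, extension, and context."""
    return [(p, o) for (s, p, o) in triples if s == subject]
```

Because the context statements are ordinary triples alongside the definitional ones, any tool that can read the concept can also read the trail of its development, as objective (a) below requires.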

(3) Evaluation: Demonstrate (i) that the implementation accurately reflects the theoretical model, (ii) that it can be integrated into the practice of science through concept capture and exploration activities, and (iii) that it meets the requirements of user communities.

Chapter 4 and the present chapter illustrate how the situated model of knowledge is implemented in a Web portal called Codex. The implementation enables resources to be depicted in situations of their use and to be queried and reused in new situations. The manipulation of resources in Codex achieves (i) the requirements laid out by philosophical and cognitive theory and (ii) integration with the pragmatic acts of retroduction and play that are part of scientific practice. Use by HERO and GEON communities showed that systems of the sort developed here can (iii) be part of the

research process – Codex supersedes its e-Notebook predecessor, and directly improves upon many of the latter’s failings.

In terms of the four objectives for a successful implementation, this work achieved the following:

(a) Concepts are grounded in the situated practice of science work and are never independent of the processes by which they are constructed or used. The representation will expose the manner by which concepts are created and revised.

Context wrappers reflect continuous records of resource creation and use as an intrinsic part of the resource itself. Unlike current ontologies, which users must take as given in their final form, any system that accesses a Codex resource can also follow the trail of that resource’s development, using context tags that can be parsed by any OWL-compliant tool.

(b) Concepts are socially constructed, and their form reflects the confluence of ideas at different scales, from individual practitioners to entire communities. The representation will explicitly encapsulate the perspectives of these inquirers.

Using personal and shared workspaces, inquirers can express knowledge structures that reflect individual and shared belief. The set of resources in a given workspace bounds the perspective of its owner or group. Codex also supports manual and inferential construction of shared structures, such as authoritative ontologies, from the resources held by individuals.

(c) Concepts can emerge and evolve over time. The representation will support the capture of provenance information that allows historical trails to be reconstructed.

Versioning histories reveal the synechism inherent in scientific thought. Comparison between versions shows which intensional characteristics were borrowed from other resources, when, by whom, and in the course of creating what situation. New concepts can emerge by detecting overlap in the intensional definitions of existing concepts.

(d) Concepts are associated with people, places and times. The representation will incorporate temporal and spatial versioning information that connects individual instances of resources to particular users and the geographic and temporal context of their work.

Codex’s registration module keeps track of which researchers are responsible for which resources. Coupled with versioning information, collaborators can reconstruct the status of each other’s reasoning at any point in time (as shown in Figure 5.6). Currently, specifiers of place are recorded as free text (place names) or numeric values (coordinates or bounding boxes), but continued work should see the development of richer geographic descriptors (as might be achieved by linking Codex to a gazetteer).

5.5 Summary

The adoption of knowledge-sharing tools engenders psychological and operational changes in the work practices of scientists. Attempts to evaluate these changes quantitatively can be difficult in knowledge-discovery environments, however, for lack of a baseline against which to measure objective performance. Qualitative assessments of the conformity of the Codex approach with rubrics for the design and evaluation of knowledge-based tools are nonetheless possible, as are explorations of use cases in which Codex can trigger positive changes.

Guidelines for desirable characteristics in knowledge management tools reflect the general sense that knowledge construction and use should be treated as a process. As shown in Chapter 3, however, this process is rarely treated explicitly. Codex is different, modeling concepts as continually evolving and differentially situated resources. The literature also suggests that the effective knowledge management tool imparts some sense of how to act with the knowledge it maintains. Codex situations of use show how thinkers have acted with knowledge.

Compared to its e-Notebook predecessor in HERO, Codex has clear operational benefits. The greatest of these is that communities of learners can integrate local patterns into larger explanations. This operational integration of resources is mirrored by psychological changes, particularly suggestions of a sense of community understanding among researchers. For the HERO students that use Codex, the concept mapping interface allows complex ideas to be expressed simply. Moreover, the system affords experimentation and revision of ideas, encouraging students to engage in learning. In the GEON case, the operational changes enabled by a knowledge-driven portal are significant: the integration of distributed resources can be achieved on a task-by-task basis as individual researchers create mappings between personal knowledge and shared ontologies. The numerous communities of practice that make up the GEON team can also realize the psychological benefits that accrue from visualizing each other’s idiosyncratic methods of attributing top-down meaning to shared concepts.

Not all the changes induced by Codex will be uniformly positive. Codex is not yet a generic enough container for science work to suit every user, making some negative psychological changes unavoidable. Some such changes include the simple need to reflect more deeply on the conceptual structure of one’s ideas than may be comfortable, and the difficulty of interacting with concept maps for someone who might express knowledge most easily through prose. The effect of these psychological changes is negative operational change – if it is difficult to achieve the cognitive adjustment a tool requires, the amount and quality of information stored within the tool will suffer. The next chapter outlines directions for the continued development of the methods introduced here; the thrust of these improvements is the amelioration of negative changes by enhancing both the content of and interaction with knowledge managed by Codex.

Chapter 6 – Social Wisdom through Situated Computing

The machine’s primary service lies in extending the mass of recollection, and in rendering this explicitly rather than vague. It also provides a memory which does not fade, and by causing it to be more promptly accessible than by the somewhat haphazard trails of association in the brain itself. Its trails are formed deliberately, under full control of the user, ultimately in accordance with the dictates of experience in the art of trail architecture. This, in turn, remolds the trails of the user’s brain, as one lives and works in close interconnection with a machine of scanned records and transistors. For the trails of the machine become duplicated in the brain of the user, vaguely as all human memory is vague, but with a concomitant emphasis by repetition, creation and discard, refinement, as the cells of the brain become realigned and reconnected, better to utilize the massive explicit memory which is its servant.

[…]

The race progresses as the experience and reasoning of one generation is handed to the next. Can a son inherit the Memex of his father, or the disciple that of his master, refined and polished over the years, and go on from there?... Can science and technology, as they support and extend man’s power of thought, bring us nearer to social wisdom, rather than merely to extend control over the forces of nature for good or ill?... The path toward the objective has only recently been entered upon. Progress along the path depends upon the advent of new technical instrumentalities, and still more upon greater understanding of how to use them. (Bush, 1959)

6.1 Memex Revisited

This thesis is about knowledge, and about the construction of information systems to support the human processes of its development and use. It is also about an appreciation for, and advocacy on behalf of, social wisdom. Unlike Bush, I suggest that we do not lack this wisdom, but that too often it goes unexpressed, and is perhaps inexpressible by conventional means. Social wisdom exists in the transfer of knowledge from one thinker to another, in the evolution of ideas through cooperative exploration, and in the shared perspectives that make meaning mobile. But the mechanisms by which this wisdom grows are generally locked inside our tools and our heads. Exposing these mechanisms to leverage the power of shared wisdom is not, at bottom, a computational problem; it is a human problem. As Bush argues above, tools are not enough – indeed, from the title of his 1967 book of essays, Science is Not Enough. To capture knowledge, and to express it to others, we need to examine how we know what we know, and only then build information systems based on new models that promote expressions of shared understanding.

In the years following the publication of As We May Think in 1945, Vannevar Bush occasionally returned to the topic of Memex, refreshing his ideas in light of continued technological advancement. In 1959, Bush penned Memex II, which was posthumously published by Nyce and Kahn (1991); in 1965, Memex Revisited appeared in Science is Not Enough. In both essays,

Bush updates his original vision to accommodate changes such as the supplanting of analogue machinery with transistor-based digital computers. The crux of his vision remains consistent, however, and it is this vision that the present work updates. Memex trails, reinforced through use, come to reflect the personality of their creators. The perspective of the inquirer becomes an intrinsic part of the representation of knowledge, not because it can be made so, but because it is so in the mind, and ought to be made so if knowledge is to be reused intelligently by others.

It is fitting for geographers to be concerned with the problems of representing and manipulating knowledge. As a discipline, geography is all about representations of the world, visual, narrative, or otherwise. Even without introducing the vast literature on collaborative information systems that geographers have produced, the act of creating and sharing descriptions of worldly phenomena is inherently geographic. To an extent, the integrative nature of geography can overcome the classic “two cultures” problem (Snow, 1959).17 Using a tool like Codex in this practice can help communicate unique perspectives across disciplinary cultures even further. The tool becomes a partner in inquiry, a mediator between perspectives, a mirror by which we examine our own understanding, and through its ability to express meaning in response to the researcher’s explorations, an interlocutor in its own right.

6.2 The Philosophy of Science and the Science of Computing

This research examined the problem of how we know what we know by first considering the nature of scientific knowledge. An implementation derived from the basic tenets of human inquiry can, it is hoped, guide collaborators toward effective knowledge sharing; the aim is to situate scientific explanations in the processes of their creation and use such that researchers can retrieve both the intellectual and manipulative provenance of any scientific resource, tangible or abstract.

In Chapter 2, Aristotelian categorization paved the way for scientific resources to be differentially connected – thus having different intensions – in different situations. Unfortunately, as some of the “ontological” implementations presented in Chapter 3 showed, categorization is as far as contemporary computing often treads. But in the millennia since Aristotle, some important claims about the nature of knowing have illuminated paths toward more cognitively appropriate computational aids to reasoning. From Kant, we inherit appreciation for the importance of the judgment that aligns our experiences into explanations. Kantian teleological judgment implies a purposiveness in the world that makes it knowable, and it is this purposiveness that situations, in the present implementation, capture. Purposiveness also implicates our pre-judgments, in that they guide the choice of concepts we include in creating an explanation. In Codex, these judgments are not biases but are material to the practice of science; they are the basis for comparing perspectives between researchers.

The motivation for representing scientific resources in Codex as generic types that can take on motile roles – concept, extension, task, and so on – comes from Peirce’s semiotic. Both intension (representamen) and extension (object) vary with situation (and thus with researcher, or with time). Codex also captures the retroductive nature of science by keeping continuous records of resource manipulation and change. This Peircean pragmatism aids the tracking of situations – since resources are situated in the systems in which they are used, Codex does not require further intervention from the researcher to describe what it is that he or she is doing. Codex’s knowledge model also embodies Gadamer’s historically effected consciousness, but in a way unique among computational approaches. The existence of Codex users in a longer history of inquiry (at least within the history managed by Codex) is made explicit, such that the perspectives of a current inquirer can be directly informed by a prior thinker.

17 It is also likely that Bush’s 1959 and 1965 essays (especially Science Pauses, which appeared in Fortune magazine in 1965 and examines the boundaries between scientific inquiry and art, emotion, and faith) were informed in part by Snow’s influential lecture.

Chapter 3 showed that previous attempts to build knowledge-driven or cooperative information systems succeeded on some accounts, but none addressed the need to integrate knowledge representations and collaborative work in a single environment that treats scientific inquiry as a process. Geographic computing, in particular, frequently models human expertise only to the extent that it helps aggregate piecemeal observations. This is an ends-focused approach that does not reap the full value that can come from modeling knowledge for its own sake, not least in terms of our ability to express understanding to each other in rapid and reusable forms. Geographers do appreciate the value of starting from cognitive models of geographic concepts when building interoperable systems (e.g., Agarwal, 2004). But should a discipline increasingly concerned with computational methods remain focused on data-driven computing, efforts to assert geographic knowledge as something more than the merely declarative (Golledge, 2002) would be undermined.

The rise of cyberinfrastructure and Semantic Web technologies anticipates new kinds of computational tools that permit distributed knowledge-based work. For these tools to fulfill their promise, however, they need to emphasize the creation of explanations over workflows; the workflow is, in the end, a means to generating or evaluating a candidate explanation. Online workbenches that follow the model outlined here, by maintaining records of situations in which resources are used, grant collaborators a more substantial inheritance from their intellectual forebears than has yet been offered them.

6.3 Practicing Situated Science

The benefits yielded by situated knowledge models and their implementation in collaborative systems should be felt across the range of inquiry-based activities. At the outset, pedagogy and basic scientific research are obvious application contexts – students and researchers are actively engaged in (and indeed have as part of their remit) knowledge exposition and communication. In these applications, a tool like Codex formalizes, extends, and speeds an extant knowledge transfer process. Still, there are possibilities for a Codex-like approach in other application domains. Indeed, in any field where information is reused, built upon, or acted upon, the representation of the situated reasoning that went into a resource can help guide appropriate use. Planning and policymaking tasks, for instance, can require searching for consilience in group perspectives over time. In strategy-setting endeavors, understanding the coherence of individual perspectives and detecting common threads can lead to policy candidates likely to have widespread approval. (Although Codex, like policy, need not be democratic; particularly distinctive perspectives on a problem can be detected, appreciated, and adopted, or not, by others). In competitive intelligence, systems that help analysts create meaning out of disparate information resources can support effective sense-making. A knowledge model that preserves audit trails of resource manipulation and concept growth can increase the transparency of a

86 research enterprise; audiences can engage in deeper critical examinations than prima facie reports might ordinarily allow. Furthermore, the trustworthiness of any information resource can derive from its coherence with a “reference” perspective – thus, trust need not be a consistent measure for all circumstances, but can vary according to the needs of an inquirer or the circumstances of a situation.

Across these possible application areas, however, there is a tremendous barrier to adopting a tool like Codex. In some domains, data and knowledge are, by convention if not necessity, kept very close to the researcher’s vest. Approaches like Codex are not likely to be successful, at least in the short term, if they are perceived as flinging open the doors to the academy, the boardroom, or the halls of governance. Power structures are reinforced in part through making it difficult for potential competitors to access one’s products and reasoning. Exposing this reasoning to scrutiny risks loss of competitive advantage in the race for funding and repute, or the revelation that one’s reasoning is more specious than one lets on.

Two directions could be taken to address these social concerns. The first and most difficult is cultural change. It could be hoped that, with enough time, communities of inquirers would gradually appreciate the affordances of knowledge sharing systems. In some cyberinfrastructure projects, for instance, codes of conduct have been introduced that stipulate that continued funding is contingent on contributing one’s resources back to the community; to complement this stick, Section 6.4.4 proposes some carrots to encourage Codex use, such as new ways of measuring scientific impact by examining resource uptake and reuse in a community. A wholesale shift is unlikely, but a movement toward greater sharing of knowledge resources might be led by those research communities already at the vanguard of cyberinfrastructure (such as geoscience) or integrative analysis (such as geography). An alternative (and perhaps complementary) direction is to pursue technological controls on knowledge access and reuse. Codex already supports a limited version of such controls, in the form of separate workspaces for individuals and teams. Codex users can make a per-resource determination about granting others access, and if they decide to share a resource, can control which groups are permitted to use it (Figure 6.1). Beyond this level of control, the albatross of Digital Rights Management could be implemented in Codex. Resources could be obfuscated, for instance, to complicate reverse-engineering, or encrypted and digitally signed so that only approved viewers could make sense of them. The Creative Commons project has produced a set of licenses for digital information that can be freely adopted by third-party tools like Codex. There are machine-readable (RDF, in fact) versions of these licenses that stipulate the conditions under which a resource can be used and shared. Codex and like systems could be extended to read and respect such licenses, offering users a measure of security when releasing their products through these tools. The licensing of educational materials is already a Creative Commons thrust area.18

Figure 6.1. Codex users can select a group with which to share each resource.
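As a sketch of how a tool like Codex might read and respect such a machine-readable license, the fragment below parses an RDF license description and extracts what it permits, requires, and prohibits. The sample RDF is hand-written in the style of the Creative Commons schema; the namespace and term names are illustrative assumptions, not quoted from Creative Commons or from Codex.

```python
# Sketch: reading a Creative Commons-style machine-readable license so a
# sharing tool could decide whether a resource may be reused. The RDF below
# is an illustrative example; namespace and term names are assumptions.
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
CC = "{http://web.resource.org/cc/}"

sample_license = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cc="http://web.resource.org/cc/">
  <cc:License rdf:about="http://example.org/licenses/by-nc/1.0/">
    <cc:permits rdf:resource="http://web.resource.org/cc/Reproduction"/>
    <cc:permits rdf:resource="http://web.resource.org/cc/Distribution"/>
    <cc:requires rdf:resource="http://web.resource.org/cc/Attribution"/>
    <cc:prohibits rdf:resource="http://web.resource.org/cc/CommercialUse"/>
  </cc:License>
</rdf:RDF>"""

def license_terms(rdf_xml):
    """Return {"permits", "requires", "prohibits"} sets of term names."""
    root = ET.fromstring(rdf_xml)
    terms = {"permits": set(), "requires": set(), "prohibits": set()}
    for lic in root.iter(CC + "License"):
        for verb in terms:
            for el in lic.findall(CC + verb):
                uri = el.get(RDF + "resource", "")
                terms[verb].add(uri.rsplit("/", 1)[-1])
    return terms

terms = license_terms(sample_license)
```

A sharing front end could consult these sets before allowing a download or re-publication, refusing, say, any use when "CommercialUse" appears among the prohibitions.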

While some of the affordances of Codex can be approximated by conventional approaches, none of those approaches formalizes the process of knowledge construction. To be sure, few researchers would suggest that they are incapable of communicating effectively with their colleagues, and few students would argue that current pedagogic practice impedes their learning. Email, written publications, and open spaces as sites for informal exchange will remain vital to research tasks. Indeed, many of the technological solutions identified in Chapter 3 (such as Computer-Supported Cooperative Work tools) will become increasingly indispensable to work communities. However, is there a larger process in which these tools and techniques are embedded? Codex, and the knowledge model it embodies, attempts to integrate knowledge production tasks into the larger process of forging understanding. Formalizing a framework for knowledge construction transcends, but at the same time accommodates, the multitude of existing tools.

A further problem facing the adoption of Codex and like systems is the persistence of the resources they maintain. When responsibility for maintaining a resource base is distributed, as it might be if Codex is used as the front end to a larger knowledge- and data-storage infrastructure, the potential for those resources to be ephemeral is great. The benefits of Codex will not be realized if users cannot reliably access the resources they might discover using it. Solutions to the problem of persistence can take several forms. First, if Codex is integrated with Grid computing infrastructure, as might be the case with GEON, a grid node replication service might ensure that resources are always available. Second, there are attempts to create persistent URLs using services that resolve a persistent identifier to a changing physical storage location.19 Third, integration of Codex with digital libraries could provide the long-term curatorial support needed to ensure that present-day insight, as recorded through Codex, is available to future researchers.

It is important to avoid the suggestion that just because we can foist a fundamental change in the nature of knowledge production on research communities, we should. But by the same token, we should not disregard the possibilities offered by technological advancement out of hand. Just because science is not paralyzed does not mean that it could not be improved. The possibilities offered by the interaction between cyberinfrastructure and a knowledge construction platform like Codex can fulfill Vannevar Bush’s ideals on a grand scale.

6.4 Making Social Knowledge Sharing Better

Meeting the needs of increasingly global and integrative research communities depends on continued investment in knowledge sharing methods. Broadly speaking, there are two areas that necessitate deeper exploration. The first deals with knowledge representation techniques themselves; how can human knowledge be better described in computational terms, or more appropriately, how can computational representations be made more consistent with human mental structures? The second research thrust questions the nature of the human-computer interface, and how meaning can be more efficiently elicited from inquirers. Below, I outline a course for the future expansion of the ideas presented in this thesis. The development of rich concept definitions and improved perspective comparison addresses the first research area above; creating knowledge-driven desktops and new knowledge-sharing modalities addresses the second.

18 http://www.creativecommons.org/education
19 OCLC is one organization with the resources to create a viable persistence service: http://www.purl.org

6.4.1 Richer Concept Definitions

In its current state, the knowledge model developed here depends on the capabilities of OWL to represent concept intension. Roschian prototypical effects are modeled to the extent that individual users can modify each other’s concept definitions to make them “better” examples of their type, but this is a weak solution to typicality that only accounts for one type of uncertainty in class membership – disagreement. There are other kinds of uncertainty, specifically vagueness and imprecision, that if modeled properly would be useful to collaborators in understanding the relative importance of a concept’s intensional dimensions. Uncertainty can also aid estimates of semantic similarity between concepts, in that highly certain relationships can be privileged over those less certain.

One approach to enhancing concept definitions with uncertainty measures uses membership functions to account for fuzzy intensions (Ahlqvist et al., 2003). In this model, a concept is intensionally defined through its properties (in other words, through its relationships to other concepts), as is currently the case in Codex. But the fuzzy membership model extends Codex by expressing, for each property, its degree of membership in the intension through a single value or a range of values (Gardenfors, 2000). For example, the concept “Red Delicious Apple” might be defined along intensional dimensions of redness and sweetness. Rather than simply being defined as something that is red and sweet, an inquirer could apply a membership parameter, often as a value between 0 and 1, to each property. One could be crisp about things and indicate that the apple has a redness of 0.85, or fuzzy bounds could be used instead – the redness is a region between 0.7 and 0.9. Similarly, one could query for concepts by looking for concepts that have not just “redness”, but that are very red or certainly red – those that have a redness membership greater than 0.9, say. There is a distinction between this sort of membership function and providing a unique scale for each property. The unique-scale approach might, for example, use a hue/saturation/value color space to describe redness, and sugar content to define sweetness.20
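The fuzzy-membership scheme described above can be made concrete in a few lines of code. In this sketch, each property of a concept carries a membership interval on the abstract 0-to-1 scale (a crisp value is just a degenerate interval), and a query for “certainly red” concepts keeps those whose lower bound clears a threshold. The class and property names are hypothetical, not drawn from the Codex implementation.

```python
# Sketch: fuzzy property membership for concept intensions, assuming a
# single abstract [0, 1] scale shared by all properties. Names are
# hypothetical illustrations of the thesis's apple example.

class FuzzyProperty:
    """A property with a membership interval; a crisp value is (v, v)."""
    def __init__(self, name, low, high=None):
        self.name = name
        self.low = low
        self.high = low if high is None else high

class Concept:
    def __init__(self, name, *props):
        self.name = name
        self.props = {p.name: p for p in props}

    def membership(self, prop):
        """Return the (low, high) interval for a property, or None."""
        p = self.props.get(prop)
        return (p.low, p.high) if p else None

def query(concepts, prop, at_least):
    """Concepts whose membership in `prop` is certainly >= at_least."""
    return [c.name for c in concepts
            if c.membership(prop) and c.membership(prop)[0] >= at_least]

apple = Concept("Red Delicious Apple",
                FuzzyProperty("redness", 0.7, 0.9),   # fuzzy bounds
                FuzzyProperty("sweetness", 0.85))     # crisp value
brick = Concept("Brick", FuzzyProperty("redness", 0.95))

# "Certainly red": lower bound of the redness interval must exceed 0.9.
very_red = query([apple, brick], "redness", 0.9)
```

The apple, whose redness interval dips to 0.7, fails the “certainly red” query while the brick passes, which is exactly the distinction a point-valued membership could not express.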

Using an abstract membership function is initially desirable because the same scale can be applied to any property. Since a serious difficulty in recording fuzzy sets lies in encouraging users to reflect on their uncertainty, it may be beneficial to start with a simple, visual approach to uncertainty. A prototype interface for controlling user interaction with such an abstract scale has been developed for Codex (Figure 6.2). This interface is a Scalable Vector Graphics (SVG) device that sits below the concept map (but could in future be used independently, or in combination with another interface) and presents a means of both defining and displaying fuzzy typicality. On the left is a “property stack” that lists the properties that have been defined for a resource; any concept can be viewed as a column of properties. When a user selects a node in a concept map, that node is displayed here as a stack of tokens. Each token is a property; here, the stack represents the concept “Seismic model” and the property “Outputs” has been selected. The things that the model “Outputs” are shown on the right side; in the concept map view, the node “Seismic model” would have edges labeled “Outputs” directed toward the nodes “Magnetic maps,” “Tomography,” and “Wave drag.” The user drags a red column to indicate the membership degree of each property instance – here, tomography output is quite certain, wave drag much less so.

Figure 6.2. Describing concept intension through a combination of dimensional and fuzzy approaches.

20 I am indebted to Ola Ahlqvist for his insight and encouragement into the prospect of defining concepts through the fuzzy areas they occupy along multiple property dimensions.

Extensions to OWL that can accommodate the sort of fuzzy definitions that can be captured using this interface are only now beginning to be considered (e.g., Nagypal and Motik, 2003). Mechanisms to combine experts’ subjective estimates into expressions of community certainty will prove enormously valuable to planning and decision-making (Wallsten et al., 1997; Wallsten and Diederich, 2001), and must also be integrated into computational tools.

In addition to fuzzy membership functions, Codex concept definitions could be enriched through integration with other tools, particularly the e-Delphi system I have also implemented (Pike and Gahegan, 2003; MacEachren et al., 2004; Pike et al., 2005). In e-Delphi, collaborators participate in iterative, discussion-based exploration. Discussions are augmented with balloting and preference-ranking modules. The content of these discussions can be treated as another example of situated knowledge. Already, it is possible to create simple concept maps from e-Delphi activities that depict the most salient terms in the discussion. Coupled with improved natural language processing and the information e-Delphi contains about the time and author of each item, it would be possible to create more complex knowledge structures in a semi-automatic fashion. If e-Delphi were extended to output OWL representations of its concept maps, the text a discussant wrote could be used as a signifier for the concepts it contains. E-Delphi discussions could help populate the Codex knowledge base with tentative conceptual structures.

6.4.2 Improved Perspective Comparison

Codex is not yet at the point where a researcher can use the structure of a perspective he or she defined to organize the resources in another’s workspace, but this is a direction for continued exploration. To truly capture the lens-like quality of perspectives, and to afford the hermeneutic fusion of horizons, users should be able to overlay their perspectives on any set of resources, bringing them into an order that is coherent with their worldview.

Part of the ability to overlay new perspectives on existing corpora can be achieved by leveraging more fully the semantic search capabilities of an OWL inference engine. For instance, if we want to examine how well a target set of resources matches our own perspective on the same problem, it might be desirable to compute degrees of difference. Currently, Codex only looks for overlapping concepts on the basis of overlapping intensions – two entities with either identical intensions or overlapping intensions will be deemed similar. But a more sophisticated semantic search might examine further degrees of distance – not just the intension of a concept, but the intension of its intensional elements, and so on. For example, from one researcher’s perspective, earthquakes might be measured with seismographs, and seismographs might elsewhere be defined as measuring ground shaking, as in seismic imaging. If another researcher defines earthquakes as measured by ground shaking, these intensions will currently have no overlap. However, the fact that a colleague has made essentially the same association between earthquakes and ground shaking, but separated by a degree of distance (passing through the seismograph node), would be useful information to an inquirer. Ultimately, we would want an inference engine to be able to traverse trails of concepts, up to a depth specified by a user, to find potential overlaps.
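One way to implement this bounded traversal is a breadth-first search over the graph of intensional links: from each concept, follow links up to a user-specified depth and report any concepts reachable from both sides, along with the combined distance. The sketch below reproduces the earthquake example; the graph encoding and function names are my own illustration, not the Codex inference engine.

```python
# Sketch: finding intensional overlap between two researchers' concepts by
# traversing intension links up to a bounded depth (a simple BFS).
from collections import deque

def reachable(graph, start, max_depth):
    """Map each node reachable from `start` to its link distance."""
    dist = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if dist[node] == max_depth:
            continue
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                frontier.append(nbr)
    return dist

def overlaps(graph, a, b, max_depth):
    """Concepts in both intensional neighborhoods, with combined distance."""
    da, db = reachable(graph, a, max_depth), reachable(graph, b, max_depth)
    return {n: da[n] + db[n] for n in da.keys() & db.keys()
            if n not in (a, b)}

# One researcher: earthquakes are measured with seismographs, which measure
# ground shaking; another relates earthquakes to ground shaking directly.
graph = {
    "earthquake-A": ["seismograph"],
    "seismograph": ["ground shaking"],
    "earthquake-B": ["ground shaking"],
}

# At depth 1 the intensions do not overlap; at depth 2 the traversal passes
# through the seismograph node and reveals the shared "ground shaking".
shallow = overlaps(graph, "earthquake-A", "earthquake-B", 1)
deep = overlaps(graph, "earthquake-A", "earthquake-B", 2)
```

The combined distance could then feed a similarity score in which closer overlaps are weighted more heavily than ones found deep in the traversal.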

Improving perspective comparison also necessitates improvements to perspective navigation. The current navigational device (Figure 4.12) can be cumbersome to use, and is best for showing one perspective at a time. Users also need to keep track of which “perspective” they are editing or viewing. It would be desirable to create a more seamless transition between perspectives, perhaps by building support for multiple perspectives into the display of the concept map itself. To this end, a three-dimensional graph display could paint a concept map pertaining to each perspective on the sides of a multifaceted, rotatable volume. Alternatively, the two-dimensional nature of concept maps could be preserved, but instead of showing a “fixed” map, the nodes and edges displayed could change as the user hovers over each node. In this approach, the initial view into the concept structure might be sparse, but using the cursor as a focusing device reveals further detail. Lastly, the lens-like nature of the perspective could be reinforced, perhaps by allowing users to drag successive filters from a list of available perspectives onto a dense concept map representing all of the possible connections among resources in a workspace. This device makes manifest the schematic in Figure 4.8.

6.4.3 The Knowledge-Driven Desktop

Beyond improvements to the way that we might represent and manipulate human concepts, a more fundamental change to the way we work (or rather, at present, do not work) with knowledge using computers is afoot. Currently, users interact with the knowledge capture facilities of Codex explicitly; they make a conscious decision to access the tool, describe a knowledge structure, or search for collaborator knowledge.

A tool like Codex can only be effective if it is of clear benefit to its users, demonstrably increasing the efficiency or quality of their work. Depending on researchers’ altruistic natures to contribute their concepts to a growing body of knowledge will only take a community so far (and this is likely not far at all). Populating the knowledge base with enough cases to enable reliable reasoning is the bootstrapping problem. Codex provides a framework for describing knowledge, but is not a knowledge base itself. Steps must be taken to ensure that it does not remain an empty bucket; until there is a critical mass of information in Codex, users will be asked to travel a one-way street, sharing but not receiving. Some of the benefits of a tool like Codex only accrue gradually, and only through use. Once a concept base has been populated, though, researchers will find that the ability to perform comparative analyses, reuse analytical components, and visualize the breadth and depth of a problem space makes continued contributions to the system worthwhile. Unfortunately, reaching that stage could require that investigators spend weeks, even months, contributing and organizing concept definitions before there is a sufficient basis for query, comparison, and reuse.

Techniques for solving the bootstrapping problem can simultaneously address the need to minimize the obtrusiveness of Codex on a researcher’s workflow. It is unreasonable to expect users to pause every few minutes to update the system on their current thinking, although the opposite – where researchers work through entire analyses without reflection – is currently the norm and equally undesirable. To reduce the knowledge-collection costs imposed by tools like Codex, we should begin to think about concept capture as an embedded component in one’s everyday workflow. Codex, after all, is not (yet) the tool that is used to actually perform analyses – that role is occupied by the word processors, the GIS packages, the spreadsheets, and so on, with which we interact in the course of our daily work. Are there ways to extract concept structures, or at least their tentative skeletons, from simply observing and recording how resources are manipulated in these tools? Indeed, if comprehending the situatedness of knowledge is key to understanding its relevance, then perhaps the elicitation of that knowledge should be truly situated in the low-level practices of everyday work. We can then posit a level of abstraction above that of the individual tool, one that takes a more holistic view of the entire knowledge construction process. Individual desktop applications feed information upward, where a system built on a framework like Codex’s integrates it into a synechistic model of a researcher’s understanding. When concepts and situations are captured in the background, the Codex knowledge base can be bootstrapped over time. But for this approach to be workable, user communities must accept a “training” period where the system operates with limited functionality, if they are to reap its later benefits.

There are a number of tactics we could take to intermesh the collection of knowledge structures with the flow of day-to-day work. One option is to seek tight coupling of knowledge representation mechanisms with desktop applications, so that a researcher’s own descriptions of an analysis could be augmented with details of the specific operations performed. Tight coupling is not likely to be a viable option on any significant scale – proprietary systems, and the requirements of creating a series of one-off sentinels for popular programs, would impede progress. At the other extreme, a very loosely coupled approach might consist of a desktop widget that periodically queries the user to update Codex’s records with current progress; in this case, as with paper notebooks, the comprehensiveness of the record is at the sole mercy of the researcher’s diligence.

Systems like Google Desktop Search, the detection of knowledge communities using Web links (Henzinger and Lawrence, 2004), and Microsoft’s prototype Stuff I’ve Seen (Dumais et al., 2003) offer insight into one potential middle ground. These techniques detect latent associations

between resources, generally based on time, creator, or keywords. Moreover, they integrate into the operating system, building indices through continuous recording. It may be fruitful to begin thinking about how semantic associations could be detected by these tools, and how they could be merged with something like Codex to create social wisdom through collating usage patterns over networks. Eventually, desktop applications may be superseded by Web service clients, making the possibilities for embedded collection even greater.
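A minimal version of such background capture might log (timestamp, user, resource) events as applications are used, and propose a latent association between any two resources the same user touches within a short window. The event format, the file names, and the ten-minute window in this sketch are all illustrative assumptions rather than features of any of the systems named above.

```python
# Sketch: deriving tentative resource associations from passively logged
# usage events (time, user, resource). Resources touched by the same user
# within a short window are proposed as latently associated.
from collections import Counter
from itertools import combinations

def latent_associations(events, window=600):
    """events: iterable of (timestamp_sec, user, resource). Returns a
    Counter of resource pairs weighted by per-user co-occurrence."""
    pairs = Counter()
    by_user = {}
    for t, user, res in sorted(events):
        by_user.setdefault(user, []).append((t, res))
    for history in by_user.values():
        # history is time-ordered, so t2 >= t1 for each pair.
        for (t1, r1), (t2, r2) in combinations(history, 2):
            if r1 != r2 and t2 - t1 <= window:
                pairs[frozenset((r1, r2))] += 1
    return pairs

events = [
    (0,    "pike", "fault_map.shp"),
    (120,  "pike", "seismic_model.doc"),   # same session: associated
    (5000, "pike", "budget.xls"),          # outside the window: ignored
    (60,   "gahegan", "fault_map.shp"),
    (300,  "gahegan", "seismic_model.doc"),
]

assoc = latent_associations(events)
```

Pairs proposed by several users independently accumulate higher weights, which is one crude way that collated usage patterns could begin to yield the “social wisdom” suggested above; a real system would of course add keyword and content evidence.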

An alternative approach to situating knowledge collection in everyday practice is to work toward the promulgation of standards for recording resource use histories. Standards-making can be an expensive, time-consuming, and divisive affair, but it may be the only way to extract descriptions of manipulations from closed tools. If practicality prevents insinuating a perspective-capturing device into every tool, standard markup notations can provide a mechanism for tools to report outward on their own.

Naturally, the use of tacit capture devices should be balanced with the opportunity for explicit interaction with and consideration of one’s (and one’s collaborators’) knowledge structures. This explicit interaction is afforded by the current Codex implementation. It is a common refrain, loosely adapted from Gödel, that a system cannot establish its own consistency without stepping outside itself (Gödel, 1962) – the goal in using tools like Codex is to encourage thinkers to consider their own “systems” and their congruity with others’. Even if it is impossible to completely shed one’s hermeneutic horizon and step outside one’s system entirely, due consideration of the interplay between perspectives is a desirable behavior to inculcate among inquirers.

6.4.4 New Knowledge Sharing Modalities

Codex is currently a destination; its users must actively seek it out. The Codex implementation, however, is just one way of realizing the impact of the situated knowledge model developed here. There are a number of other techniques by which this model could influence the communal act of building understanding.

Social networking tools are gaining currency outside of academe as a means of creating ad hoc communities on the basis of shared interests or mutual acquaintance. What social networks show is that we are members of nested and overlapping communities, many of which we are unaware of, but which could still be leveraged to achieve some goal. Visions of the potential of social networks in the sciences have already been articulated in terms of virtual teams (Chin et al., 2002) and in geoscience and seismology research in particular (Wagner and Leydesdorff, 2003). Future social networking systems could take advantage of the perspectives modeled here to create networks based on more than just latent associations (such as the co-citation patterns that are common fodder for these networks). When starting to tackle problems of great geographic or intellectual scope, researchers might search for potential collaborators based on their facility with a particular set of concepts.

The ability to formalize knowledge structures in Codex also hints at new publication models. The repeatability of studies can be enhanced when the analysis process is described as a workflow that can be imported into a researcher’s workspace, modified, and re-applied. Could a knowledge resource be offered to one’s community as a contribution to the field, much as a journal article is today (with the cultural caveats described earlier)? There are parallels between these reporting mechanisms – peer review can be implemented in a platform like Codex through measures of authority and reuse; concept structures that find widespread adoption align with seminal articles. Efforts to rethink the dissemination of scientific information are burgeoning: examples include SciX, the Open, Self Organising Repository for Scientific Information Exchange (Martens et al., 2003), and the open-source DSpace project.21 Both of these promising projects are still heavily focused on what we currently think of as the tangible products of research – words, images, and so on. But it is not too far-fetched to think that knowledge markets could soon open for the sort of situated representations that Codex can capture.

For new publication models to gain traction, acceptance of alternatives to traditional written knowledge representations must increase. In the current academic model, tight control over the release of information helps safeguard one’s career progression. When administrators make promotional decisions, a researcher’s influence in a community is measured in large part by his or her publication record. Could a researcher still accrue rewards by contributing knowledge to a community in a Codex-like format? Ostensibly the impact of these contributions could be at least as great as that of publications, given their ability to directly influence the work of others, but the established academy must grow to accept measures of knowledge reuse as indicators of status.
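One simple way to operationalize such measures of knowledge reuse would be to credit a contributor each time another researcher builds on one of their resources, with reuse of a derived resource crediting the original author at a discount. The event format, the derivation chain, and the 0.5 discount in this sketch are purely illustrative assumptions, not a proposal from the thesis.

```python
# Sketch: a toy impact score crediting contributors when their knowledge
# resources are reused by others. Direct reuse counts fully; reuse of a
# derived resource credits upstream authors at a discount. All names and
# the discount factor are illustrative assumptions.

def impact_scores(resources, reuses, discount=0.5):
    """resources: {resource: (author, parent_or_None)};
    reuses: list of (resource, reusing_user). Returns {author: score}."""
    scores = {}
    for res, user in reuses:
        credit = 1.0
        node = res
        while node is not None:            # walk up the derivation chain
            author, parent = resources[node]
            if user != author:             # no credit for self-reuse
                scores[author] = scores.get(author, 0.0) + credit
            credit *= discount
            node = parent
    return scores

resources = {
    "quake_ontology": ("pike", None),
    "quake_ontology_v2": ("yarnal", "quake_ontology"),  # derived work
}
reuses = [
    ("quake_ontology", "gahegan"),     # credits pike directly
    ("quake_ontology_v2", "downs"),    # credits yarnal, and pike upstream
    ("quake_ontology", "pike"),        # self-reuse earns nothing
]

scores = impact_scores(resources, reuses)
```

Scores of this kind would have to be hardened against gaming before any promotion committee could lean on them, but they illustrate how influence could be read off resource uptake rather than publication counts alone.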

Lastly, the growth of pervasive networks creates new opportunities for scientists to interact with each other and with their stored knowledge bases. Sensing the impact that wireless two-way radio could have on society, Vannevar Bush imagined the field scientist able to dictate to his or her recordkeeping device back in the laboratory. Today, prototypes for interacting with Codex via handheld computer and mobile phone have been built; researchers can add to and query the record from, in theory, anywhere, and their collaborators can have instant access to these insights. Grid computing endeavors present a further opportunity for knowledge structures to be built from distributed resources (concepts, data files, and analysis procedures located on servers pre-authorized to allow incoming connections and to initiate outgoing requests). It is conceivable for some task descriptions in Codex to be operationalized as grid workflows, making them eminently repeatable. Grids provide an appealing virtualization of scientific knowledge – resources appear to exist in an ether in which knowledge diffuses instantly.

In the future, Codex could serve as a knowledge-based front end to grid-based cyberinfrastructure such as the workbench that GEON is creating. If all of the resources currently managed by cyberinfrastructure environments could be accessed through Codex, then provenance information would be captured automatically as those resources were manipulated. (There is currently user action logging and mining functionality in Codex, although questions regarding the granularity of logging remain outstanding). Codex does not need to subsume the technical aspects of facilitating interoperation between grid nodes, but it will augment these operations with a layer of knowledge-capture and knowledge-driven search. Furthermore, the collaborative aspects of cyberinfrastructure, often neglected in favor of emphasis on high-performance computing, would be brought to the fore. When cyberinfrastructure makes resources accessible, Codex can make them useful.

21 http://www.dspace.org/

6.5 Moving Knowledge across the Human-Computer Interface

Aspiring toward the development of worldly representations of human expertise brings us closer to realizing true knowledge-driven computing. Far from a static report or downloadable ontology, a knowledge representation situated in the circumstances of its creation and use, reflective of the perspectives of its creators, and continually changing as understanding evolves, opens up new avenues for cooperative solutions to the tough problems of modern science. Individual intellect is augmented with the ability to retrieve immediately understandable and applicable knowledge from colleagues known and foreign. Research and pedagogical communities gain insight into the consilience of shared concepts. Moving meaning between researchers via compelling concept-based tools can dramatically change the way we establish understanding of the world.

Geographers are relevant to the development of these tools through their concern for representation and their concern for place. Geographic representations express what we know about the world; how we know it should become an intrinsic part of these representations. And in these representations, place need not be just an earthly element – “where” in a knowledge space is equally important to achieving understanding as “where” in geographic space.

Ultimately, the combined knowledge model and computational implementation described here is a step toward fulfillment of Vannevar Bush’s vision for fundamentally new ways of representing the process of scientific inquiry. His entreaty in Memex Revisited is as applicable now as ever: “Each generation will receive from its predecessor, not a conglomerate mass of discrete facts and theories, but an interconnected web which covers all that the race has thus far attained.”

References

AAAS (2001). Atlas of Science Literacy. Washington, DC, American Association for the Advancement of Science. 165 p.
Abou-Zeid, E. S. (2003). What can ontologists learn from knowledge management? Journal of Computer Information Systems 43(3): 109-117.
Agarwal, P. (2004). Contested nature of place: Knowledge mapping for resolving ontological distinctions between geographical concepts. Geographic Information Science: Proceedings. M. J. Egenhofer, C. Freksa and H. Miller (eds.). Lecture Notes in Computer Science. 3234: 1-21.
Ahlqvist, O., J. Keukelaar and K. Oukbir (2003). Rough and fuzzy geographical data integration. International Journal of Geographical Information Science 17(3): 223-234.
Ansari, A., S. Essegaier and R. Kohli (2000). Internet recommendation systems. Journal of Marketing Research 37: 363-375.
Arias, E. and G. Fischer (2000). Boundary objects: Their role in articulating the task at hand and making information relevant to it. Proceedings of the International ICSC Symposium on Interactive and Collaborative Computing (ICC2000), Wetaskiwin, Canada.
Armstrong, M. (1993). Perspectives on the development of group decision support systems for locational problem-solving. Geographical Systems 1(1): 69-81.
Baker, V. (1999). Geosemiosis. GSA Bulletin 111(5): 633-645.
Balram, S., D. E. Suzana and S. Dragicevic (2004). A collaborative GIS method for integrating local and technical knowledge in establishing biodiversity conservation priorities. Biodiversity and Conservation 13(6): 1195-1208.
Barsalou, L. (2002). Being there conceptually: Simulating categories in preparation for situated action. Representation, Memory, and Development: Essays in Honor of Jean Mandler. N. Stein, P. Bauer and M. Rabinowitz (eds.). Mahwah, NJ, L. Erlbaum: 1-16.
Barsalou, L. and D. Medin (1986). Concepts: Static definitions or context-dependent representations? Cahiers de Psychologie Cognitive 6: 187-202.
Barsalou, L. and K. Wiemer-Hastings (in press). Situating abstract concepts. Grounding Cognition: The Role of Perception and Action in Memory, Language, and Thought. R. Zwaan (ed.). New York, Cambridge University Press.
Barsalou, L., W. Yeh, B. Luka, K. Olseth, K. Mix and L. Wu (1993). Concepts and meaning. Chicago Linguistics Society: Papers from the Parasession on Conceptual Representations. K. Beals, G. Cooke, D. Kathman et al. (eds.). Chicago, University of Chicago. 29: 23-61.
Baumgartner, P., U. Furbach, M. Gross-Hardt and A. Sinner (2004). Living book - Deduction, slicing, and interaction. Journal of Automated Reasoning 32(3): 259-286.
Berners-Lee, T., J. Hendler and O. Lassila (2001). The semantic web. Scientific American 284(5): 34-43.
Bernstein, R. (1983). Beyond Objectivism and Relativism: Science, Hermeneutics, and Praxis. Philadelphia, University of Pennsylvania Press. 284 p.
Bibby, P. and J. Shepard (2000). GIS, land use, and representation. Environment and Planning B 27: 583-598.
Bishr, Y. (1998). Overcoming the semantic and other barriers to GIS interoperability. International Journal of Geographical Information Science 12(4): 299-314.
Bishr, Y. A., H. Pundt, W. Kuhn and M. Radwan (1999). Probing the concepts of information communities. Interoperating Geographic Information Systems. M. Goodchild, M. Egenhofer, R. Fegeas and C. Kottman (eds.). Dordrecht, Kluwer: 55-69.
Blaser, A., M. Sester and M. Egenhofer (2000). Visualization in an early stage of the problem-solving process in GIS. Computers & Geosciences 26: 57-66.
Borghoff, U. (2000). Computer-Supported Cooperative Work. New York, Springer. 529 p.
Bozsak, E., M. Ehrig, S. Handschuh, A. Hotho, A. Maedche, B. Motik, D. Oberle, C. Schmitz, S. Staab, L. Stojanovic, N. Stojanovic, R. Studer, G. Stumme, Y. Sure, J. Tane, R. Volz and V. Zacharias (2002). KAON - Towards a large scale Semantic Web. E-Commerce and Web Technologies, Proceedings. Lecture Notes in Computer Science. 2455: 304-313.
Brachman, R. and J. Schmolze (1985). An overview of the KL-ONE knowledge representation system. Cognitive Science 9(2): 171-216.
Brewer, I., A. MacEachren, H. Abdo, J. Gundrum and G. Otto (2000). Collaborative geographic visualization: Enabling shared understanding of environmental processes. IEEE Information Visualization Symposium: INFOVIS 2000, Salt Lake City, IEEE. 137-141.
Brodaric, B. and M. Gahegan (2001). Learning geoscience categories in situ: Implications for geographic knowledge representation. Proceedings, ACM-GIS 2000: The Ninth ACM International Symposium on Advances in Geographic Information Systems, Atlanta, GA. 130-135.
Buckingham Shum, S., E. Motta and J. Domingue (2000). ScholOnto: An ontology-based digital library server for research documents and discourse. International Journal on Digital Libraries 3(3): 237-248.
Bunt, H. C. and W. J. Black, Eds. (2000). Abduction, belief, and context in dialogue: Studies in computational pragmatics. Natural language processing; v 1. Philadelphia, John Benjamins. 471 p.
Bush, V. (1945). As we may think. The Atlantic 176(1): 101-108.
Bush, V. (1959). Memex II. Vannevar Bush Papers, MC78, Box 21. MIT Archives.
Butler, T. (1998). Towards a hermeneutic method for interpretive research in information systems. Journal of Information Technology 13(4): 285-300.
Buzaglo, M. (2002). The Logic of Concept Expansion. New York, Cambridge University Press. 182 p.
Cañas, A., D. Leake and D. Wilson (1999). Exploring the Synergies of Knowledge Management & Case-Based Reasoning. AAAI Workshop Technical Report WS-99-10. Menlo Park, CA, AAAI Press.
Cao, C. G., Q. Z. Feng, Y. Gao, F. Gu, J. X. Si, Y. F. Sui, W. Tian, H. T. Wang, L. L. Wang, Q. T. Zeng, C. X. Zhang, Y. F. Zheng and X. B. Zhou (2002). Progress in the development of national knowledge infrastructure. Journal of Computer Science and Technology 17(5): 523-534.
Carroll, J., M.-B. Rosson, P. Isenhour, C. Van Metre, W. Schafer and C. Ganoe (2001). MOOsburg: Multi-user domain support for a community network. Internet Research: Electronic Networking Applications and Policy 11(1): 65-73.
Cerf, V., A. G. W. Cameron, J. Lederberg, C. T. Russell, B. R. Schatz, P. M. B. Shames, L. S. Sproull, R. A. Weller and W. A. Wulf (1993). National Collaboratories: Applying Information Technology for Scientific Research. Washington, DC, National Academy Press. 118 p.

97 Chen, C. (2004). Searching for intellectual turning points: Progressive knowledge domain visualization. Proceedings of the National Academy of Sciences 101(Suppl. 1): 5303- 5310. Chin, G., J. Myers and D. Hoyt (2002). Social networks in the virtual science laboratory. Communications of the ACM 45(8): 87-92. Chung, P. W. H., L. Cheung, J. Stader, P. Jarvis, J. Moore and A. Macintosh (2003). Knowledge- based process management - an approach to handling adaptive workflow. Knowledge- Based Systems 16(3): 149-160. Churcher, C. and N. Churcher (1999). Realtime conferencing in GIS. Transactions in GIS 3(1): 23-30. Clancey, W. (1994). Situated cognition: How representations are created and given meaning. Lessons from Learning. R. Lewis and P. Mendelsohn (eds.). Amsterdam, North-Holland: 231-242. Collins, H. (2003). Enterprise Knowledge Portals. New York, American Management Association. 430 p. Colmerauer, A. and P. Roussel (1993). The Birth of Prolog. History of Programming Languages II. Cambridge, MA, SIGPLAN Notices. 28: 37-52. Conklin, J. and M. Begeman (1989). gIBIS: A tool for all reasons. Journal of the American Society for Information Science 40(3): 200-213. Cooke, N. (1994). Varieties of knowledge elicitation techniques. International Journal of Human-Computer Studies 41: 801-849. Davey, B. and A. Tatnall (2004). Misinforming knowledge through ontology. Informing Science InSITE 2004, Rockhampton, Australia, Informing Science Institute. Davis, R., H. Shrobe and P. Szolovits (1993). What is a knowledge representation? AI Magazine 14(1): 17. DCMI Usage Board. DCMI Metadata Terms. http://dublincore.org/documents/2005/01/10/dcmi- terms/. Accessed: 20 January 2005 2005. de Moor, A., M. Keeler and G. Richmond (2002). Towards a pragmatic web. International Conference on Computational Science, Proceedings. U. Priss, D. Corbett and G. Angelova (eds.), Springer-Verlag. Lecture Notes in Computer Science. 2393: 235-249. Decker, S., S. Melnik, F. Van Harmelen, D. Fensel, M. 
Klein, J. Broekstra, M. Erdmann and I. Horrocks (2000). The Semantic Web: The roles of XML and RDF. IEEE Internet Computing 4(5): 63-74. Denrell, J., N. Arvidsson and U. Zander (2004). Managing knowledge in the dark: An empirical study of the reliability of capability evaluations. Management Science 50(11): 1491-1503. Doel, M. A. (2001). Qualified quantitative geography. Environment and Planning D 19(5): 555- 572. Dumais, S., E. Cutrell, J. Cadiz, G. Jancke, R. Sarin and D. Robbins (2003). Stuff I've Seen: A system for personal information retrieval and reuse. SIGIR 2003, Toronto. Dustdar, S. (2004). Caramba - A process-aware collaboration system supporting ad hoc and collaborative processes in virtual teams. Distributed and Parallel Databases 15(1): 45- 66. Eckert, A. (1998). The "Network Elaboration Technique": A computer-assisted instrument for knowledge assessment. Diagnostica 44(4): 220-224. Edelson, D., R. Pea and L. Gomez (1996). The Collaboratory Notebook. Communications of the ACM. 39: 32-33.

98 Ehrig, M., C. Schmitz, S. Staab, J. Tane and C. Tempich (2004). Towards evaluation of peer-to- peer-based distributed knowledge management systems. Agent-Mediated Knowledge Management. Lecture Notes in Artificial Intelligence. 2926: 73-88. Elmagarmid, A., M. Rusinkiewicz and A. Sheth, Eds. (1999). Management of Heterogenous and Autonomous Database Systems. San Francisco, Morgan Kaufmann. 413 p. Engelbart, D. (1962). Augmenting human intellect: A conceptual framework. AFOSR-3233. Menlo Park, CA, Stanford Research Institute. Fabrikant, S. and B. Buttenfield (2001). Formalizing semantic spaces for information access. Annals of the Association of American Geographers 91(2): 263-280. Fellbaum, C., Ed. (1998). WordNet: An electronic lexical database. Cambridge, MA, MIT Press. 433 p. Fensel, D. (2001). Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce. New York, Springer. 138 p. Feyerabend, P. (1988). Against Method. Revised ed. London, Verso. 296 p. FGDC (1998). FGDC-STD-001-1998. Content standard for digital geospatial metadata. Washington, DC, Federal Geographic Data Committee. Finholt, T. A. (2001). Collaboratories. Annual Review of Information Science and Technology. B. Cronin. 36: 73-108. Fischer, G. (2000). Symmetry of ignorance, social creativity, and meta-design. Knowledge-Based Systems 13(7-8): 527-537. Fischer, G. and J. Ostwald (2001). Knowledge management: Problems, promises, realities, and challenges. IEEE Intelligent Systems 16(1): 60-72. Fodor, J. (1998). Concepts: Where cognitive science went wrong. New York, Oxford University Press. 174 p. Fonseca, F. and J. Martin (2005). Toward an alternative notion of information system ontologies: Information engineering as a hermeneutic enterprise. Journal of the American Society for Information Science and Technology 56(1): 46-57. Fonseca, F., J. Martin and M. A. Rodriguez (2002a). From geo- to eco-ontologies. Geographic Information Science, Proceedings. M. J. Egenhofer and D. M. 
Mark (eds.). Boulder, CO. Lecture Notes in Computer Science. 2478: 93-107. Fonseca, F. T., M. J. Egenhofer and P. Agouris (2002b). Using ontologies for integrated Geographic Information Systems. Transactions in GIS 6(3). Formica, A. and M. Missikoff (2002). Concept similarity in SymOntos: An enterprise ontology management tool. The Computer Journal 45(6): 583-594. Frank, A. U. (2001). Tiers of ontology and consistency constraints in geographical information systems. International Journal of Geographical Information Science 15(7): 667-678. Frodeman, R. (2003). Geo-Logic: Breaking Ground Between Philosophy and the Earth Sciences. Albany, SUNY Press. 184 p. Gadamer, H.-G. (1975). Truth and Method. G. Barden and J. Cumming (trans.). New York, Continuum. 551 p. Gardenfors, P. (2000). Conceptual Spaces: The Geometry of Thought. Cambridge, MA, MIT Press. 307 p. Garson, G. (1998). Neural networks: an introductory guide for social scientists. London, Sage. 194 p. Gazan, R. (2003). Metadata as a realm of translation: Merging knowledge domains in the design of an environmental information system. Knowledge Organization 30(3-4): 182-190.

99 Ginsberg, M. (1991). Knowledge interchange format - the KIF of death. AI Magazine 12(3): 57- 63. Gödel, K. (1962). On Formally Undecidable Propositions of and Related Systems. B. Meltzer (trans.). Edinburgh, Oliver and Boyd. 72 p. Goldstone, R., D. Medin and J. Halberstadt (1997). Similarity in context. Memory and Cognition 25(2): 237-255. Golledge, R. G. (2002). The nature of geographic knowledge. Annals of the Association of American Geographers 92(1): 1-14. Gould, M. (1994). GIS design - A hermeneutic view. Photogrammetric Engineering and Remote Sensing 60(9): 1105-1116. Graves, T. and A. Mockus (2001). Identifying productivity drivers by modeling work units using partial data. Technometrics 43(2): 168-179. Gray, P. H. and D. B. Meister (2004). Knowledge sourcing effectiveness. Management Science 50(6): 821-834. Gruber, T. (1995). Toward principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies 43(5/6): 907-928. Guarino, N. (1997). Understanding, building, and using ontologies. International Journal of Human-Computer Studies 46: 293-310. Guha, R. V. and D. B. Lenat (1991). Cyc - a Midterm Report. Applied Artificial Intelligence 5(1): 45-86. Halbert, M., J. Kaczmarek and K. Hagedorn (2003). Findings from the Mellon Metadata Harvesting initiative. Research and Advanced Technology for Digital Libraries. T. Koch and I. Solvberg (eds.). Lecture Notes in Computer Science. 2769: 58-69. Harvey, F. and N. Chrisman (1998). Boundary objects and the social construction of GIS technology. Environment and Planning A 30(9): 1683-1694. Harvey, F., W. Kuhn, H. Pundt, Y. Bishr and C. Riedemann (1999). Semantic interoperability: A central issue for sharing geographic information. Annals of Regional Science 33(2): 213- 232. Havre, S., B. Hetzler and L. Nowell (2000). ThemeRiver: Visualizing theme changes over time. Proceedings of IEEE Symposium on Information Visualization, InfoVis 2000. 115-123. Head, C. (1991). 
Mapping as language or semiotic system: Review and comment. Cognitive and Linguistic Aspects of Geographic Space. D. M. Mark and A. U. Frank (eds.), Kluwer: 237-262. Heemskerk, M., K. Wilson and M. Pavao-Zuckerman (2003). Conceptual models as tools for communication across disciplines. Conservation Ecology 7(3). Henzinger, M. and S. Lawrence (2004). Extracting Knowledge from the World Wide Web. Proceedings of the National Academy of Sciences 101(Suppl. 1): 5186-5191. Hetzler, B., P. Whitney, L. Martucci and J. Thomas (1998). Multi-faceted insight through interoperable visual information analysis paradigms. Proceedings of IEEE Symposium on Information Visualization, InfoVis 1998, Research Triangle Park, NC. 137-144. Hey, T. and A. Trefethen (2003). e-Science and its implications. Philosophical Transactions of the Royal Society of London Series A - Mathematical, Physical and Engineering Sciences 361(1809): 1809-1825. Holsapple, C. W. and K. D. Joshi (2001). Organizational knowledge resources. Decision Support Systems 31(1): 39-54. Husserl, E. (1970). Logical Investigations. J. Findlay (trans.). London, Routledge. 877 p.

100 Jankowski, P. and T. Nyerges (2001). GIS-supported collaborative decision making: Results of an experiment. Annals of the Association of American Geographers 91(1): 48-70. Kamada, T. and S. Kawai (1991). A general framework for visualizing abstract objects and relations. ACM Transactions on Graphics 10(1): 1-39. Kant, I. (1987). Critique of Judgment. W. Pluhar (trans.). Indianapolis, Hackett. 576 p. Kant, I. (1996). Critique of Pure Reason. Unified ed. W. Pluhar (trans.). Indianapolis, Hackett. 1030 p. Kawalek, J. P. (2004). Systems thinking and knowledge management: Positional assertions and preliminary observations. Systems Research and Behavioral Science 21(1): 17-36. Kazic, T. (2000). Semiotes: A semantics for sharing. Bioinformatics 16(12): 1129-1144. Keller, R. InvestigationOrganizer. http://io.arc.nasa.gov/. Accessed: 10 March 2005. Keller, R. and J. Dungan (1999). Meta-modeling: A knowledge-based approach to facilitating process model construction and reuse. Ecological Modelling 119: 89-116. Kifer, M., G. Lausen and J. Wu (1995). Logical foundations of object-oriented and frame-based languages. Journal of the ACM 42(4): 741-843. Kinchin, I. (2001). If concept mapping is so helpful to learning biology, why aren't we all doing it? International Journal of Science Education 23(12): 1257-1269. Klein, M., D. Fensel, A. Kiryakov and D. Ogyanov (2002). Ontology versioning and change detection on the Web. Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web. A. Gomez-Perez and V. Richard Benjamins (eds.). Berlin, Springer-Verlag. Lecture Notes in Computer Science. 2473: 197-212. Kleist, V. F., L. Williams and A. G. Peace (2004). A performance evaluation framework for a public university knowledge management system. Journal of Computer Information Systems 44(3): 9-16. Kottman, C. (2001). White paper on trends in the intersection of GIS and IT. National Academies Computer Science and Telecommunications Board. Kouzes, R., J. Myers and W. 
Wulf (1996). Collaboratories: Doing science on the internet. Computer 29(8): 40-46. Kuhn, T. (1962). The Structure of Scientific Revolutions. Chicago, University of Chicago Press. 172 p. Kuhn, W. (2001). Ontologies in support of activities in geographical space. International Journal of Geographical Information Science 15(7): 613-631. Landauer, T., D. Laham and M. Derr (2004). From paragraph to graph: latent semantic analysis for information visualization. Proceedings of the National Academy of Sciences 101(suppl. 1): 5220-5227. Lang, K. and M. Burnett (2000). XML, metadata, and efficient knowledge discovery. Knowledge-Based Systems 13: 321-331. Langley, P., J. Shrager and K. Saito (2002). Computational discovery of communicable scientific knowledge. Logical and Computational Aspects of Model-Based Reasoning. L. Magnani, N. Nersessian and C. Pizzi (eds.). Amsterdam, Kluwer. Lave, J. and E. Wenger (1991). Situated Learning: Legitimate Peripheral Participation. New York, Cambridge University Press. 138 p. Leake, D., A. Maguitman and A. Cañas (2002). Assessing conceptual similarity to support concept mapping. Proceedings of the Fifteenth International Florida Artificial Intelligence Research Society Conference, AAAI Press. 168-172.

101 Lemke, J. (1997). Cognition, context, and learning: A social semiotic perspective. Situated Cognition: Social, Semiotic, and Psychological Perspectives. D. Kirshner and J. Whitson (eds.). Mahwah, NJ, Erlbaum: 37-55. Li, G., V. Uren, E. Motta, S. Buckingham Shum and J. Domingue (2002). ClaiMaker: Weaving a semantic web of research papers. The Semantic Web - ISWC 2002: First International Semantic Web Conference. I. Horrocks and J. Hendler (eds.). Berlin, Springer-Verlag. Lecture Notes in Computer Science. 2342: 436-441. Lin, K., B. Ludaescher, B. Brodaric, D. Seber, C. Baru and K. Sinha (2003). Semantic mediation services in geologic data integration: A case study from the GEON grid. Geological Society of America Abstracts with Programs 35(6): 365. Lysakowski, R. and L. Doyle (1998). Electronic lab notebooks: Paving the way of the future of R&D. Records Management Quarterly: 23-28. MacEachren, A. (1995). How Maps Work. New York, Guilford. 513 p. MacEachren, A. (2001). Cartography and GIS: Extending collaborative tools to support virtual teams. Progress in Human Geography 25(3): 431-444. MacEachren, A., M. Gahegan and W. Pike (2004). Visualization for constructing and sharing geo-scientific concepts. Proceedings of the National Academy of Sciences 101(Suppl. 1): 5279-5286. MacGregor, R. (1994). A description classifier for the predicate calculus. Proceedings AAAI 94 National Conference, Seattle. 213-220. Magnani, L. (1999). Withdrawing unfalsifiable hypotheses. Foundations of Science 4(2): 133- 153. Magnani, L. (2001). Abduction, Reason, and Science: Processes of Discovery and Explanation. New York, Kluwer. 205 p. Majchrzak, A., L. P. Cooper and O. E. Neece (2004). Knowledge reuse for innovation. Management Science 50(2): 174-188. Marcos, E. and A. Marcos (2001). A philosophical approach to the concept of data model: Is a data model, in fact, a model? Information Systems Frontiers 3(2): 267-274. Martens, B., Z. Turk, B. C. Bjork and G. Cooper (2003). 
Re-engineering the scientific knowledge management process: the SciX project. Automation in Construction 12(6): 677-687. Martin, R. (1998). One Long Experiment. New York, Columbia University Press. 272 p. McGuinness, D. and F. van Harmelen. OWL Web Ontology Language Overview. http://www.w3.org/TR/owl-features. Accessed: 17 November 2004. Moulton, A., S. E. Madnick and M. D. Siegel (2001). Knowledge representation architecture for context interchange mediation. MIT Sloan Working Paper No. 4184-01. Cambridge, MA, MIT Sloan School of Management. Myers, J., E. Mendoza and B. Hoopes (2001). A collaborative electronic notebook. Proceedings of the IASTED International Conference on Internet and Multimedia Systems and Applications, Honolulu. Myers, M. (1995). Dialectical hermeneutics: A theoretical framework for the implementation of information systems. Information Systems Journal 5(1): 51-70. Nagypal, G. and B. Motik (2003). A fuzzy model for representing uncertain, subjective, and vague temporal knowledge in ontologies. On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. Berlin, Springer-Verlag. Lecture Notes in Computer Science. 2888: 906-923.

102 Nake, F. and S. Grabowski (2001). Human-computer interaction viewes as pseudo- communication. Knowledge-Based Systems 14: 441-447. Nelson, T. (1965). A file structure for the complex, the changing, and the indeterminate. Proceedings of the 20th National ACM Conference, Cleveland, ACM Press. 84-100. Nelson, T. (1974). Computer Lib/Dream Machines. Chicago, T.H. Nelson. 69 p. Neuwirth, C. M., J. H. Morris, S. H. Regli, R. Chandhok and G. C. Wenger (1998). Envisioning communication: Task-tailorable representations of communication in asynchronous work. Proceedings of the 1998 ACM Conference on Computer Supported Cooperative Work, Seattle, ACM Press. 265-274. Newell, A. and H. Simon (1972). Human Problem Solving. Englewood Cliffs, NJ, Prentice-Hall. 920 p. Newell, A. and H. Simon (1976). Computer science as empirical inquiry. Communications of the ACM 19(3): 113-126. Novak, J. (1990). Concept mapping: A useful tool for science education. Journal of Research in Science Teaching 27(10): 937-949. Novak, J. and D. Gowin (1984). Learning How to Learn. New York, Cambridge University Press. 199 p. Noy, N. F. and C. D. Hafner (2000). Ontological foundations for experimental science knowledge bases. Applied Artificial Intelligence 14(6): 565-618. NSF (2003). Revolutionizing Science and Engineering Through Cyberinfrastructure. Washington, DC, National Science Foundation. Nyce, J. and P. Kahn, Eds. (1991). From Memex to Hypertext: Vannevar Bush and the Mind's Machine. Boston, Academic Press. 367 p. Nyerges, T. (1991). Geographic information abstractions: conceptual clarity for geographical modeling. Environment and Planning A 23: 1483-1499. Nyerges, T., P. Jankowski and C. Drew (2002). Data-gathering strategies for social-behavioural research about participatory geographical information system use. International Journal of Geographical Information Science 16(1): 1-22. Ogden, C. and I. Richards (1927). The Meaning of Meaning. New York, Harcourt Brace. 363 p. Peirce, C. (1868). 
On a new list of categories. Proceedings of the American Academy of Arts and Sciences 7: 287-298. Peirce, C. (1877). Fixation of belief. Popular Science Monthly 12(November): 1-15. Peirce, C. (1905). What pragmatism is. The Monist 15(2): 161-181. Peirce, C. (1931). Collected Papers. C. Hartshorne and P. Weiss (eds.). Cambridge, MA, Harvard University Press. Pickles, J. (1995). Representations in an electronic age: Geography, GIS, and Democracy. Ground Truth: The Social Implications of Geographic Information Systems. J. Pickles (ed.). New York, Guilford Press: 1-30. Pike, W. and M. Gahegan (2003). Constructing semantically scalable cognitive spaces. Conference on Spatial Information Theory 2003, Kartause Ittingen, Switzerland. 332- 348. Pike, W., B. Yarnal, A. M. MacEachren, M. Gahegan and C. Yu (2005). Retooling collaboration: A vision for environmental change research. Environment 47(2): 8-21. Popper, K. (1959). The Logic of Scientific Discovery. London, Hutchinson. 479 p. Prakash, A., H. Shim and L. Lee (1999). Data management issues and trade-offs in CSCW systems. IEEE Transactions on Knowledge and Data Engineering 11(1): 213-227.

103 Putnam, H. (1988). Representation and Reality. Cambridge, MA, MIT Press. 136 p. Quine, W. (1990). Pursuit of Truth. Cambridge, MA, Harvard University Press. 113 p. Rao, A. and M. Georgeff (1991). Modeling rational agents within a BDI-architecture. Proceedings of Knowledge Representation and Reasoning. R. Fikes and E. Sandewall (eds.). San Mateo, CA, Morgan Kaufmann: 473-484. Raymond, J., E. Gardiner and P. Willett (2002). RASCAL: Calculation of graph similarity using maximum common edge subgraphs. The Computer Journal 45(6): 631-644. Redeker, G. (2000). Coherence and structure in text and discourse. Abduction, Belief and Context in Dialogue. H. C. Bunt and W. J. Black (eds.). Philadelphia, John Benjamins: 233-264. Reichgelt, H. (1991). Knowledge Representation: An AI Perspective. Norwood, NJ, Ablex. 251 p. Rittel, H. and M. Webber (1973). Dilemmas in a general theory of planning. Policy Sciences 4: 155-169. Rodriguez, M. A., M. Egenhofer and R. Rugg (1999). Assessing semantic similarities among geospatial feature class definitions. Interoperating Geographic Information Systems INTEROP'99. A. Vckowski, K. Brassel and H.-J. Schek (eds.). Zurich, Springer-Verlag. Lecture Notes in Computer Science. 1580: 189-202. Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology 104(3): 192-233. Rosch, E. and C. Mervis (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology 7: 573-605. Rothbart, D. and I. Scherer (1997). Kant's Critique of Judgment and the scientific investigation of matter. International Journal for Philosophy of Chemistry 3: 65-80. Sanchez, J. and P. Langley (2003). An interactive environment for scientific modeling and discovery. Proceedings of the International Conference on Knowledge Capture, Sanibel Island, FL, ACM Press. 138-145. Saussure, F. (1974). Course in General Linguistics. W. Baskin (trans.). C. Bally, A. Sechehaye and A. Reidlinger (eds.). 
London, Fontana. 240 p. Schissel, D. P., A. Finkelstein, I. T. Foster, T. W. Fredian, M. J. Greenwald, C. D. Hansen, C. R. Johnson, K. Keahey, S. A. Klasky and K. Li (2002). Data management, code deployment, and scientific visualization to enhance scientific discovery in fusion research through advanced computing. Fusion Engineering and Design 60(3): 481-486. Schön, D. A. (1995). The Reflective Practitioner: How Professionals Think in Action. Aldershot, England, Arena. 374 p. Schultze, U. and R. J. Boland (2000). Knowledge management technology and the reproduction of knowledge work practices. Journal of Strategic Information Systems 9: 193-212. Searle, J. (2002). Twenty-one years in the chinese room. Views into the Chinese Room. J. Preston and M. Bishop (eds.). Oxford, Clarendon Press: 51-69. Sellen, A. and R. Harper (2002). The Myth of the Paperless Office. Cambridge, MA, MIT Press. 231 p. Shelley, C. (1996). Visual abductive reasoning in archaeology. Philosophy of Science 63: 278- 301. Sheth, A. (1999). Changing focus on interoperability in information systems. Interoperating Geographic Information Systems. M. Goodchild, M. Egenhofer, R. Fegeas and C. Kottman (eds.). Dordrecht, Kluwer: 5-30.

104 Sheth, A. and J. Larson (1990). Federated database systems for managing distributed, heterogenous, and autonomous databases. ACM Computing Surveys 22(183-236). Sismondo, S. and N. Chrisman (2001). Deflationary metaphysics and the natures of maps. Philosophy of Science 68(3 Suppl.): S38-S49. Smith, B. (1995). Formal ontology, common sense, and cognitive science. International Journal of Human-Computer Studies 43: 641-667. Smith, B. and D. M. Mark (2001). Geographical categories: an ontological investigation. International Journal of Geographical Information Science 15(7): 591-612. Smith, E. and D. Medin (1981). Categories and concepts. Cambridge, MA, Harvard University Press. 203 p. Snow, C. (1959). The Two Cultures and the Scientific Revolution. Cambridge, Cambridge University Press. 51 p. Solomon, K., D. Medin and E. Lynch (1999). Concepts do more than categories. Cognitive Science 3(3): 99-104. Sowa, J. (2000a). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove, CA, Brooks/Cole. 594 p. Sowa, J. (2000b). Ontology, metadata, and semiotics. Conceptual structures: Logical, linguistic, and computational issues. B. Ganter and G. Mineau (eds.). Berlin, Springer-Verlag. Lecture Notes in Computer Science. 1867: 55-81. Star, S. (1989). The structure of ill-structured solutions: Heterogenous problem-solving, boundary objects, and distributed artificial intelligence. Distributed Artificial Intelligence. M. Huhns and L. Gasser (eds.). Menlo Park, CA, Morgan Kaufmann: 37-54. Stevens, R., A. Robinson and C. Goble (2003). myGrid: Personalised bioinformatics on the information grid. Bioinformatics 19(Suppl. 1): i302-i304. Stojanovic, L., N. Stojanovic and A. Maedche (2003). Change discovery in ontology-based knowledge management systems. Advanced Conceptual Modeling Techniques. Lecture Notes in Computer Science. 2784: 51-62. Storey, M.-A., M. Musen, J. Silva, C. Best, N. Ernst, R. Fergerson and N. Noy (2001). 
Jambalaya: Interactive visualization to enhance ontology authoring and knowledge acquisition in Protege. Workshop on Interactive Tools for Knowledge Capture, K-CAP- 2001, Victoria, B.C., Canada. Sugumaran, V. and V. C. Storey (2002). Ontologies for conceptual modeling: their creation, use, and management. Data & Knowledge Engineering 42(3): 251-271. Suh, B. and B. Bederson (2001). OZONE: A zoomable inferface for navigating ontology information. HCIL Tech Report #2004-04. College Park, MD, University of Maryland. Suthers, D. (1999). Representational support for collaborative inquiry. Proceedings of the 32nd Hawaii International Conference on System Sciences, Maui, HI, IEEE. 1076. Szuba, T. (2001). Computational Collective Intelligence. New York, Wiley. 424 p. Thagard, P. (1992). Conceptual Revolutions. Princeton, NJ, Princeton University Press. 285 p. Toulmin, S. (1958). The Uses of Argument. Cambridge, Cambridge University Press. 264 p. Turoff, M. and S. Hiltz (1996). Computer based Delphi processes. Gazing into the Oracle: The Delphi Method and its Application to Social Policy and Public Health. M. Adler and E. Ziglio (eds.). London, Kingsley. Turoff, M., S. Hiltz, M. Bieber, J. Fjermestad and A. Rana (1999). Collaborative discourse structures in computer mediated group communications. Proceedings of the Thirty- Second Annual Hawaii International Conference on Systems Sciences, Maui, HI, IEEE.

105 Tversky, B. (2005). Form and function. Functional Features in Language and Space. L. Carlson and E. van der Zee (eds.). New York, Oxford University Press. Utting, K. and N. Yankelovich (1989). Context and orientation in hypermedia networks. ACM Transactions on Information Systems 7(1): 58-84. van Bruggen, J., H. Boshuizen and P. Kirschner (2003). A cognitive framework for cooperative problem soving with argument visualization. Visualizing Argumentation. P. Kirschner, S. Buckingham Shum and C. Carr (eds.). London, Springer-Verlag: 25-47. Vckowski, A. (1999). Interoperability and spatial information theory. Interoperating Geographic Information Systems. M. Goodchild, M. Egenhofer, R. Fegeas and C. Kottman (eds.). Dordrecht, Kluwer: 31-37. Visser, U., H. Stuckenschmidt, G. Schuster and T. Vogele (2002). Ontologies for geographic information processing. Computers & Geosciences 28(1): 103-117. Voisard, A. (1999). Abduction and deduction in geologic hypermaps. Advances in Spatial Databases: Sixth International Symposium. R. Güting, D. Papadias and F. Lochovsky (eds.). Lecture Notes in Computer Science. 1651: 311-329. Wagner, C. and L. Leydesdorff (2003). Seismology as a dynamic, distributed area of scientific research. Scientometrics 58(1): 91-114. Wallsten, T. S., D. V. Budescu, I. Erev and A. Diederich (1997). Evaluating and combining subjective probability estimates. Journal of Behavioral Decision Making 10(3): 243-268. Wallsten, T. S. and A. Diederich (2001). Understanding pooled subjective probability estimates. Mathematical Social Sciences 41(1): 1-18. Wenger, E. (1998). Communities of Practice: Learning, Meaning, and Identity. New York, Cambridge University Press. 318 p. Whitehead, A. (1929). Process and Reality: An Essay in Cosmology. New York, Social Science Book Store. 546 p. Winograd, T. and F. Flores (1986). Understanding computers and cognition. Norwood, NJ, Ablex. 207 p. Wittgenstein, L. (1953). Philosophical Investigations. G. Anscombe (trans.). 
New York, Macmillan. 232 p. Zaff, B., M. McNeese and D. Snyder (1993). Capturing multiple perspectives: A user-centered approach to knowledge and design acquisition. Knowledge Acquisition 5: 79-116.

Vita

WILLIAM A. PIKE

EDUCATION

2001-2005  The Pennsylvania State University. Ph.D., Geography, 2005.
1999-2001  The Pennsylvania State University. M.S., Geography, 2001.
1995-1999  Carleton College. B.A., Geology, 1999.

PUBLICATIONS

Pike W, Yarnal B, MacEachren A, Gahegan M, Yu C, 2005, “Retooling collaboration: A vision for the future of environmental change science”, Environment 47(2): 8-21.

Fuhrmann S and Pike W, “User-centered design of collaborative geovisualization tools”, forthcoming in: A. M. MacEachren, M.-J. Kraak and J. Dykes (eds.), Exploring Geovisualization, Elsevier, London.

Pike W, 2004, “Modeling water quality violations with Bayesian networks”, Journal of the American Water Resources Association 40(6): 1563-1578.

MacEachren AM, Gahegan M, Pike W, Brewer I, Cai G, Langerich E, Hardisty F, 2004, “Geovisualization for knowledge construction and decision-support”, Computer Graphics and Applications 24(1): 13-17.

MacEachren AM, Gahegan M, Pike W, 2004, “Visualization for constructing and sharing geo-scientific concepts”, Proceedings of the National Academy of Sciences 101(Suppl. 1): 5279-5286.

Pike W and Gahegan M, 2003, “Constructing semantically scalable cognitive spaces”, in: Spatial Information Theory: Foundations of Geographic Information Science. Conference on Spatial Information Theory COSIT03, Lecture Notes in Computer Science 2825, Kuhn W, Worboys M and Timpf S (eds.), Springer-Verlag, Berlin: 332-348.

MacEachren AM, Pike W, Yu C, Brewer I, Gahegan M, Weaver W, Yarnal B, “Building a Geocollaboratory: Supporting Human-Environment Regional Observatory (HERO) Collaborative Science Activities”, submitted to: Computers, Environment, and Urban Systems.

PRESENTATIONS

Pike W and Gahegan M, “Beyond ontologies: Toward situated representations of concept development”, Workshop on the Potential of Cognitive Semantics for Ontologies, International Conference on Formal Ontologies in Information Systems, Torino, Italy, November 2004.

Pike W and Gahegan M, “Visualizing concept relationships in a distributed knowledge sharing environment”, GIScience 2004, Adelphi, MD, October 2004.

Pike W, Ahlqvist O, Gahegan M, Oswal S, “Supporting collaborative science through a knowledge and data management portal”, Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data, Second International Semantic Web Conference, Sanibel Island, FL, October 2003.

Pike W and Gahegan M, “Constructing semantically scalable cognitive spaces”, Conference on Spatial Information Theory 2003, Ittingen, Switzerland, September 2003.

Pike W, “Concept visualization for Web-based group collaboration”, 99th Annual Meeting of the Association of American Geographers, New Orleans, LA, March 2003.

Pike W, “HERO project overview and infrastructure development” (panel participant), 99th Annual Meeting of the Association of American Geographers, New Orleans, LA, March 2003.

Pike W, “Modeling water quality violations with Bayesian networks”, 98th Annual Meeting of the Association of American Geographers, Los Angeles, CA, March 2002.

Pike W, Bertka CM, Fei YM, “High pressure melting relations in the Fe-Ni-S system: Implications for the core of Mars”, 30th Lunar and Planetary Science Conference, Houston, TX, March 1999.

Pike W, et al., “Viscosity of venusian lava flows: Constraints from fractal dimension and chemical composition”, 29th Lunar and Planetary Science Conference, Houston, TX, March 1998.

AWARDS AND MEMBERSHIPS

AT&T Wireless Graduate Research Fellowship, 2004-2005
Stephen E. Dwornik Planetary Geosciences Student Paper Award, March 1999
Sigma Xi Scientific Research Honor Society, elected May 1999