Contextual Retrieval of Digital Content in Topic Maps
Christo Dichev, Darina Dicheva, Jan Fischer
Winston-Salem State University
601 M.L.K. Jr. Dr., Winston Salem, N.C. 27110, USA
{dichevc, dichevad, jfisc139}@wssu.edu
http://gorams.wssu.edu/faculty/dichevc/
http://myweb.wssu.edu/dichevad/
Abstract

In this paper we discuss the contextual support provided in TM4L, an environment for building and using Topic Maps-based learning repositories. We propose an extension of the Topic Maps model with contexts and an approach for contextual retrieval of information. Our approach is based on defining context primitives and a simple language for expressing context queries. The resources retrieved by running the queries can be directly examined or used as seeds for further locating relevant resources in the repository. The contextual retrieval functionality is provided through an intuitive user interface. We discuss the context representation and the contextual retrieval implementation in TM4L.

Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

1 Introduction

The exploration of information resources is always affected by users' context. Consequently, in the case of concept-driven access to digital collections, the conceptual structure should not be predetermined on the basis of many assumptions, but should rather allow views from different perspectives, depending on the context in which the digital collection is being used. There are two major problems related to information retrieval (IR) in such collections. The first problem is related to the user's assumption of a certain structure when exploring the collection. This problem stems from the user's previous experience with digital classifications, textbooks, etc. that impose a particular view on the information. Indeed, since a classification is intrinsically contextual, digital collections tend to inherit certain individual perspectives from their authors. The second problem is related to searching for information relevant to a particular subject. Although various search engines and classification systems are available, finding relevant information is still a challenging task. Whether the retrieved documents are relevant to the user's request depends on much more than just the subject of the documents. A big part of the problem is that relevance is contextual and subjective to the individual who is performing the search. What is relevant depends on the context. Only the users can judge how relevant a particular bit of information is to what they are attempting to discover. The documents may be too technical, too general, or out of date for one's needs.

Information retrieval would clearly be more efficient if the user's context were taken into account. In this paper we propose an approach to contextual retrieval of information in Topic Map-based digital collections. We extend the Topic Map model (Biezunski, Bryan, & Newcomb 2002) with a notion of context beyond scopes and attempt to solve the above-mentioned IR problems by: (i) making topic map authors' vocabulary available to users for expressing their information needs, and (ii) using samples identified by the users to guide them to the location of relevant resources within the collection.

We have exemplified the proposed approach in TM4L, a Topic Map-based environment for building, maintaining, and using standards-based, ontology-aware e-learning repositories (Dicheva & Dichev 2006) (see http://compsci.wssu.edu/iis/nsdl/).

The paper is organized as follows. In Section 2 we discuss briefly the Topic Maps model and TM4L. In Section 3 we define contexts incorporated in topic maps and propose an approach for context mapping. In Section 4 we discuss the extension of TM4L to support contextual retrieval in learning repositories.

2 Building Topic Map Repositories

Topic Maps (TM) are a technology (Park & Hunting 2002) that provides a flexible and intuitive modeling paradigm for defining a conceptual navigation layer that supports finding of web resources of various kinds, such as documents, images, database records, audio/video clips, etc. The advantage of using the TM technology for developing digital learning collections is twofold: on the one hand, it provides a convenient and intuitive presentation of interrelated concepts embedded in information resources; on the other, the material is in a standards-based format, which makes it interchangeable and interoperable.

Basically, topic maps are collections of topics, associations, resources, and scopes. In the Topic Maps model the concepts are reified in topics, and they can be categorized using types. TMs describe by means of topics what a resource is about. Associations express semantic relationships between topics, and the extent of validity of topics, associations, and resources is called scope. The concept of scope is used in topic maps basically as a constraint on the interpretation of a topic characteristic. A typical application of scopes is to resolve name collisions. For example, a topic map may contain two topics named Washington, one in the scope of City and the other in the scope of State. A user may specify a scope while searching a topic map in order to reduce the number of returned hits. However, a scope does not provide a mechanism for defining regions of related topics within a given topic map.

The basic concepts of TMs such as topics, associations, etc., are simple but expressive enough for representing complex structures. TMs can be viewed as a method for structuring and organizing information on the semantic and metadata level.

TM4L is an environment for building and using ontology-aware learning repositories represented as topic maps. The TM4L Editor allows creating hierarchies of topics, topic types and instances as well as relations between topics and learning resources. It supports TM-related operations such as merging, browsing, searching, and scoping. The learning content created by the Editor is compliant with the ISO XML Topic Maps (XTM) standard (Biezunski, Bryan, & Newcomb 2002), and thus interoperable with any XTM-based applications.

In order to account for personalization in the topic map navigation, we propose to mediate the access to the learning collection by using contexts. Contexts can be considered as specific views on a domain, dependent on the users' information needs. Our approach is based on the assumption that a successful information support application should not force users to change their way of looking at the learning repository, as an externally imposed schema might be perceived either as unfamiliar or unusual. The adopted Topic Map platform provides the scope mechanism that allows representation of context-sensitive information. Exploiting the scope features, we extend the Topic Maps paradigm in order to incorporate the notion of context.

3 Using Context in Topic Maps

In an ideal world, information seeking applications would respond to their users by always presenting the precise set of resources they might need to complete their current task, automatically putting away any irrelevant documents. This type of interaction implies that the application is aware of the user's personal context. There are many ways to model or represent context, and scope is just one means explicitly provided by the TM model. The scope mechanism enables any topic characteristic to be qualified by defining a range within which the information is valid. Scopes may be used to define different perspectives on the same set of information. For example, scopes may be used to separate "beginner" resources from "advanced" resources, thus enabling different sets of information to be presented to users with different knowledge levels. The scope concept is intended as a syntactic shorthand in cases where more elaborate modeling of context by other means is beyond the intended 'expressiveness' of the topic maps.

Intuitively, we define a context as a collection of entities grouped on the basis of their relations to a common set of features determined by the current user's goal. Entities can be any objects, facts, statements, resources, etc. The common set of features defines the grouping criterion for entities. The goal specifies the boundaries of the context, which consists of all entities directly or indirectly related to the defining set of features and relevant for achieving the goal. The set of assumptions that provides a background of a particular task such as finding a book in a library is an example of such a context. Here, the context includes the assumptions about the library organization, classification system, location of specific shelves, etc.

Often certain assumptions are left implicit. That is why the notion of context is so elusive, despite the various models proposed and developed (Bouquet et al. 2005; Bouquet et al. 2004; Dey 2000; Giunchiglia 1993; Guha 1991; McCarthy 1987). In this work we do not intend to propose a general framework for modeling contexts; rather, we take a pragmatic approach.

In the Topic Maps world all objects of interest are mapped to topics, associations, occurrences and scopes. Our intuitive interpretation of context in Topic Map terms is as the minimal environment that surrounds and gives meaning to the topics of interest. In this sense, context can be interpreted as the minimal sum of all related topics and related occurrences. Thus, we define context as a set of facts (topics or occurrences), grouped on the basis of their relations with a common set of topics, associations, or scopes, determined by a given goal. The set of all topics related to a given topic is an example of such a context, where the task is to find all relevant topics. In this example all objects that belong to the context share a common topic. Another example is the set of all pairs of topics generated by the whole-part associations starting from a particular topic. This context can be used when exploring digital content from the viewpoint of its natural parts, such as chapters, subchapters, etc. Learning materials for advanced students in a course support system are yet another example: here the context is the set of topics and occurrences sharing a common scope ("Advanced students").

3.1 Contexts in TM4L

The typical use of TM4L is for supporting users in their task of finding relevant resources. Therefore, in TM4L we use the above context definition with the restriction of the set of entities to a set of occurrences. The common set of features defining the context of a TM4L user's information needs is a combination of topics, relation (association) types, resource types, and themes (scopes).

We describe contexts using a simple language for creating contextual expressions. A contextual expression is interpreted as a query for information resources that represents a user's current information needs. The evaluation of the query results in a set of resources satisfying the context expression. Each context expression is defined as follows:

[...] to the terminal topics generated in each branch of the topic generation procedure.

If the context expression specifies no relations, resource types, themes, or range, it is interpreted as all relation types, all resource types, no themes, and full range, respectively. The list of resources is obtained by adding to an initially empty list all resources linked to the set of topics specified in the context expression, followed by generating all children topics of the latter set of topics, by applying the relations specified in the context expression. This procedure is recursively applied to the list of the newly generated topics until no more new children can be generated. An additional filtering is performed at each step, using the themes and resource properties specified by the remaining part of the context expression.

In creating a context expression, the user may constrain the retrieved resources to introductory material with easy explanation of the key concepts, by selecting the theme "Beginner". In addition, the user may indicate preferred resource types, e.g. "Online notes" or "Examples". The user may also wish to constrain the resources based on the resource coverage of the topical structure generated by the context expression. Assume that the topic "List Processing" includes subtopics "List Representation" and "Operations on Lists". If the user wants only resources linked to the terminal topics (leaves) of the structure rooted by the topic "List Processing" (i.e. "List Representation" [...]
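The evaluation procedure described in Section 3.1 (collect the resources linked to the starting topics, expand to children topics along the selected relations, repeat until no new topics appear, and filter by themes and resource types at each step) might be sketched as follows. This is a minimal illustration under stated assumptions, not the TM4L implementation: the `TopicMap`, `Resource`, and `evaluate` names, the `leaves_only` flag standing in for the "range" setting, the resource names, and the rule that a resource must carry every selected theme are all hypothetical. Only the topic names, the "whole-part" relation, the theme "Beginner", and the resource types come from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    name: str
    rtype: str                        # resource type, e.g. "Online notes"
    themes: frozenset = frozenset()   # scopes, e.g. {"Beginner"}


@dataclass
class TopicMap:
    # children[(parent_topic, relation_type)] -> list of child topics
    children: dict = field(default_factory=dict)
    # resources[topic] -> list of Resource objects linked to that topic
    resources: dict = field(default_factory=dict)


def evaluate(tm, topics, relations=None, rtypes=None, themes=None,
             leaves_only=False):
    """Evaluate a (hypothetical) context expression.

    None means "no constraint": all relation types, all resource types,
    no theme filter. leaves_only=False corresponds to full range.
    """
    def kids(topic):
        return [child
                for (parent, rel), cs in tm.children.items()
                if parent == topic and (relations is None or rel in relations)
                for child in cs]

    def keep(res):
        return ((rtypes is None or res.rtype in rtypes)
                and (themes is None or themes <= res.themes))

    result, seen, frontier = [], set(), list(topics)
    while frontier:
        topic = frontier.pop(0)
        if topic in seen:             # guard against revisiting topics
            continue
        seen.add(topic)
        next_topics = kids(topic)
        # A terminal topic has no children under the selected relations.
        if not leaves_only or not next_topics:
            for res in tm.resources.get(topic, []):
                if keep(res) and res not in result:
                    result.append(res)
        frontier.extend(next_topics)
    return result


tm = TopicMap(
    children={("List Processing", "whole-part"):
              ["List Representation", "Operations on Lists"]},
    resources={
        "List Processing":
            [Resource("Overview", "Online notes", frozenset({"Beginner"}))],
        "List Representation":
            [Resource("Cons cells", "Examples", frozenset({"Beginner"}))],
        "Operations on Lists":
            [Resource("Append in depth", "Online notes", frozenset({"Advanced"}))],
    },
)

beginner = evaluate(tm, ["List Processing"], themes={"Beginner"})
leaf_only = evaluate(tm, ["List Processing"], leaves_only=True)
```

With this toy data, `beginner` holds the two resources scoped "Beginner", while `leaf_only` holds only the resources attached to the two subtopics of "List Processing". A breadth-first traversal is used here for simplicity; the text only requires that expansion continue until no new children are generated.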
Fig. 1. Defining a context in TM4L.

4.2 Context Mapping and Structure Cognition

Context mapping is a relation between a context called source and another context called target. The mapping can be formally represented by a triple $\langle c_s; M; c_t \rangle$, where $c_s$ and $c_t$ are the source and target contexts, respectively, and $M$ is the mapping, i.e. the relation between the explicit representations of $c_s$ and $c_t$. For a Topic Map representation we define the mapping (a special case of (Bouquet et al. 2005)) as a triple [...]

[...] resources of c1. The fact implies that ) At. Denote next by $T_p$ the set of all tuples $o(t, r_i)$ with a fixed topic $t$ (e.g. the set of all resources $r_i$ linked to the topic $t$): $T_p = \{\, r(t, r_i) \mid r(t, r_i) \in c_t,\ i = 1, 2, \ldots, m \,\}$. Assume now that there is a device such that for each $r_j \in c_s$ it is able to display $r_j$'s image in $c_t$ within $T_p$ (e.g. $r_i$ within the group of the corresponding resources linked to $t$ along with $t$ and its ancestors).

The key insight here is that the knowledge acquired by observing the traces of s) (e.g. the locations and the distribution of resources from As inside the target context structure) will help the agent in exploring the target context $c_t$ in a more systematic way and make the search for relevant resources $r_j$ more logical. The actual goal of this approach is to provide a framework for information exchange based on topic maps, not by eliminating differences between TM authors and users or between users, but by offering a system that enables semantic interoperability between different types of users.

4.3 Use of Contexts

TM4L allows formulation of context queries, which enable, on one hand, specifying the boundaries of the contextual space and, on the other, filtering the result on the basis of certain semantic properties.
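The tracing idea from Section 4.2 (showing where each source-context resource falls inside the target context, together with the other resources linked to the same topic and that topic's ancestors) can be illustrated with a small sketch. It rests on several assumptions that are not in the paper: the target context is given as a list of (topic, resource) pairs, the "image" of a source resource is simply the same resource appearing in the target, ancestors come from a single-parent tree, and the names `group_by_topic`, `ancestors`, `trace`, the resource identifiers, and the "Prolog" parent topic are all hypothetical.

```python
def group_by_topic(target_pairs):
    """Group target-context resources by topic (one group per fixed topic t)."""
    groups = {}
    for topic, res in target_pairs:
        groups.setdefault(topic, []).append(res)
    return groups


def ancestors(topic, parent):
    """Return the chain of ancestors of a topic, nearest first.

    Assumes `parent` encodes a tree (no cycles).
    """
    path = []
    while topic in parent:
        topic = parent[topic]
        path.append(topic)
    return path


def trace(source_resources, target_pairs, parent):
    """Locate each source resource inside the target context.

    Returns tuples (resource, topic, ancestors of topic, resources linked to topic).
    """
    groups = group_by_topic(target_pairs)
    return [(res, topic, ancestors(topic, parent), groups[topic])
            for topic, res in target_pairs
            if res in source_resources]


# Hypothetical data: topic hierarchy and target context.
parent = {"List Representation": "List Processing",
          "List Processing": "Prolog"}
c_t = [("List Representation", "notes-1"),
       ("List Representation", "example-2"),
       ("List Processing", "overview-3")]
c_s = {"example-2"}

traces = trace(c_s, c_t, parent)
```

Here `traces` contains one entry placing "example-2" under "List Representation", alongside its sibling resource and the ancestor chain of that topic. A user who recognizes a sample resource can then explore the neighbouring resources and topics from that location.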
5 Related Work

In the first TM applications the most common way of searching topic maps was by walking through topics, occurrences and associations. While such an approach is suitable for small and uncomplicated maps, it does not turn out to be useful for large sets of topics. It became obvious that there was a need for a query language specialized for topic maps. There are several different proposals for TM query languages:

Topic Maps Query Language, TMQL (Wrightson 2000): an XML-based extension of the Structured Query Language (SQL), developed for meeting the specialized data access requirements of Topic Maps.

AsTMa? (Barta 2003): a query language similar to SQL, with queries in the form of 'LET – IN – WHERE – RETURN' expressions. AsTMa? is part of the AsTMa* language family, a set of languages for Topic Map authoring, constraining, manipulation, and querying.

Tolog (Garshol 2001): a query language based on the logic programming language Prolog. Using Tolog one can ask for all topics of a particular type, the names of all topics of a particular type in a particular scope, etc.

However, these languages are designed for selecting specified TM elements within the whole (possibly scoped) topic map. In our approach, we can constrain the query to certain localities defined by a set of topics and a set of relations. With regard to its constraining power, our context [...]

6 Conclusion

Efficient information retrieval requires information filtering and search adaptation to the user's current needs, interests, knowledge level, etc. The notion of context is intrinsically related to this subject. In this paper we propose an approach to context modeling in Topic Map-based digital library applications. It is based on the standard TM support for associations and scopes and defines the context as an abstraction of grouping of related information based on a specified task. The proposed model of context is utilized in an extension of TM4L, an e-learning environment aimed at supporting the development of efficiently searchable and reusable learning repositories, focused on aiding the retrieval of relevant information.

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant No. DUE-0333069 "NSDL: Towards Reusable and Shareable Courseware: Topic Maps-Based Digital Libraries" and Grant No. DUE-0442702 "CCLI-EMD: Topic Maps-based courseware to Support Undergraduate Computer Science Courses".

References

Barta, R. 2003. AsTMa? Language Definition. Bond University Technical Report. Online: http://astma.it.bond.edu.au/astma%3Fec.dbk?style=printable.

Bharat, K. 2000. Searchpad: Explicit Capture of Search Context to Support Web Search. International Journal of Computer and Telecommunications Networking, North-Holland Publishing Co., 33(1-6), 493–501.

Biezunski, M., Bryan, M., and Newcomb, S. 2002. ISO/IEC 13250:2000 Topic Maps: Information Technology. http://www1.y12.doe.gov/capabilities/sgml/sc34/document/0323.htm. [Last viewed April 15, 2006].

Bouquet, P., Giunchiglia, F., Van Harmelen, F., Serafini, L., and Stuckenschmidt, H. 2004. Contextualizing Ontologies. Journal of Web Semantics, 1(4), 1-19.

Bouquet, P., Serafini, L., and Zanobini, S. 2005. Peer-to-Peer Semantic Coordination. J. of Web Semantics, 2(1).

Dey, A.K. 2000. Providing Architectural Support for Building Context-Aware Applications. Ph.D. Thesis, College of Computing, Georgia Institute of Technology. Online: http://www.cc.gatech.edu/fce/ctk/pubs/dey-thesis.pdf.

Dichev, C. and Dicheva, D. 2005. Contexts as Abstraction of Grouping. Proceedings of the Workshop on Contexts and Ontologies, 12th National Conf. on Artificial Intelligence, AAAI 2005, Pittsburgh, 49-56.

Dicheva, D. and Dichev, C. 2006. TM4L: Creating and Browsing Educational Topic Maps. British Journal of Educational Technology - BJET, 37(3), 391-404.

Dodds, L. 2005. Introducing SPARQL: Querying the Semantic Web. O'Reilly XML. Online: http://www.xml.com/pub/a/2005/11/16/introducing-sparql-querying-semantic-web-utorial.html.

Garshol, L.M. 2001. Tolog: Topic maps query language. Proceedings of the XML Europe 2001 Conference, IDEAlliance. http://www.ontopia.net/topicmaps/materials/tolog.html.

Giunchiglia, F. 1993. Contextual reasoning. Epistemologia, Special issue on I Linguaggi e le Macchine, XVI, 345–364.

Goker, A. 1999. Capturing information need by learning user context. In Proceedings of Sixteenth International Joint Conference in Artificial Intelligence: Learning About Users Workshop, 21–27.

Goker, A., Watt, S., Myrhaug, H.I., Whitehead, N., Yakici, M., Bierig, R., Nuti, S.K., and Cumming, H. 2004. User context learning for intelligent information retrieval. In EUSAI'04: Proceedings of the 2nd European Union symposium on Ambient Intelligence, ACM Press, 19–24.

Guha, R. 1991. Contexts: A Formalization and Some Applications. Technical Report ACT-CYC-423-91, MCC, Austin, Texas.

Henzinger, B., Chang, B.W., Milch, B., and Brin, S. 2003. Query-free news search. In Proceedings of the Twelfth International World Wide Web Conference (WWW-2003), Budapest, Hungary, May 20-24, 2003.

Kraft, R., Maghoul, F., and Chang, C.C. 2005. Y!Q: Contextual Search at the Point of Inspiration. In ACM Fourteenth Conference on Information and Knowledge Management (CIKM 2005), Bremen, Germany, November 2005.

McCarthy, J. 1987. Generality in Artificial Intelligence. Communications of the ACM, 30(12), 1030–1035.

Park, J. and Hunting, S. 2002. XML Topic Maps: Creating and Using Topic Maps for the Web. Addison-Wesley.

Wrightson, A. 2000. TMQL Draft (Topic Map Query Language). Ontopia, BSI. Online: http://www.y12.doe.gov/sgml/sc34/document/0186.doc.