LIS 663 /Dr. Péter Jacsó

Term Paper

Basic Searching

By HaiYing Wang

UNIVERSITY OF HAWAII AT MANOA Library and Information Science Program

Dec. 1st, 2003

I. Topic Analysis

I like to begin my searches with Google. It searches quickly and retrieves highly relevant information from millions of resources with a single query. It lets me build my search strategy in natural language, and it also suggests queries. That is why I am so interested in the OneSearch feature of DialogWeb.

Therefore, the topic of my search is “cross database searching.” I aim to find information on the technical aspects of the topic, that is, why cross database searching can be implemented and how it is executed. The first three articles that I read about the topic are:

i. Jacsó, Péter. Cross-searching electronic journal archives. Information Today 19(6) (June, 2002): 34.

ii. Jacsó, Péter. Cross-database searching on the Web with term mapping from multiple thesauri. In Proceedings of 20th National Online Meeting, New York, NY 18-20 May 1999 (M.E. Williams, ed.): 217-25.

iii. Tennant, Roy. The right solution: federated search tools. Library Journal 128(11) (Jun 15, 2003): 28.

These three articles taught me that there are three main techniques in cross-database searching: query mapping, metadata search, and the Z39.50 protocol/standard. I initially defined two concept groups and corresponding terms as follows:

Concept groups             Corresponding terms
Cross-database searching   Cross database searching
                           simultaneous database searching
                           multiple database searching
Techniques                 Query mapping
                           Metadata
                           Z39.50

Because the Z39.50 protocol and metadata search are fairly new topics, the publication year can be limited to 1990 to the present. I prefer journal articles because their full text is easier to find than that of conference papers or research reports. The language is limited to English only.

II. Six Nominated Databases Chosen in DIALINDEX

Because my topic is related to computer and information science, I began the DIALINDEX search by setting the files to two subject categories: CompSci and InfoSci (b 411; sf compsci,infosci). This selected 26 files. Based on the two concept groups mentioned above, I tested several queries:

According to the posting numbers for each test query, I took the following 10 files/databases as nominees.

File   Database Name
1      ERIC (1966-present)
2      INSPEC (1969-present)
6      NTIS (1964-present)
8      Ei Compendex® (1970-2003)
34     SciSearch® - a Cited Reference Science Database (1990-present)
144    PASCAL (1973-2003)
148    Gale Group Trade & Industry Database(TM) (1976-present)
202    Information Science & Technology Abstracts (1966-present)
233    Internet & Personal Computing Abstracts(TM) (1981-present)
438    Library Literature and Information Science (1984-present)

After studying the File Description and Subject Coverage in the blue sheets of these 10 files, I eliminated File 438 first because it is an index-only database, although its subject coverage is perfect for the topic. I eliminated File 148 second because it has too much information about business products and its document types are limited to journal articles and newsletter/newspaper articles. I eliminated File 233 because it indexes and abstracts only 90 journals, so its only document type is journal articles.

Eliminating File 34 was the most difficult decision I made. This file is the most famous citation database, produced by the Institute for Scientific Information (ISI), and had over 12,104,390 records as of November 2003. Judging by the posting numbers in each test query, it should have been kept. However, when I used File 414 (Dialog Journal Name Finder) to look up the journal name “Library Journal,” I found that none of Files 2, 6, 8, and 34 includes this important journal. Compared with the other 6 remaining files (Files 1, 2, 6, 8, 144, 202), File 34 has only 2 document types, Journal Articles and Book Reviews, and an abstract is not always available for each record. Although there is a descriptor index, the descriptors are not consistent because they come from the authors’ keywords.

Comparing the 6 nominated files/databases (Files 1, 2, 6, 8, 144, 202) shows clearly that their time coverage is almost the same, the geographic coverage of each file is international, and the document types of each file include journal articles, conference papers, books, reports, and theses/dissertations (except File 8: Ei Compendex). Three basic indexes are available for all 6 files: TI (word indexing), AB (word indexing), and DE (word and phrase indexing).

III. Two Best Databases Chosen

The 6 nominated files/databases mentioned above fall into 2 clusters: Files 2, 6, and 8 are more engineering-oriented, and Files 1, 144, and 202 cover more general science. I therefore attempted to choose the best database from each cluster.

Based on the test queries for the concept groups executed in Section II of this paper, I formed the following queries by including synonyms and term variations and by widening the proximity.

Items is the total for all files; the columns 202, 144, 8, 6, 2, and 1 give the items retrieved in each file.

Set   Items    202    144      8      6      2      1  Term Searched
S1       53     16      8      4      1     18      6  CROSS()DATABASE()SEARCH?
S2      423     95     83     45     39    120     41  (CROSS OR MULTI OR MULTIPLE OR SIMULTAN?)(2N)(DATABASE? OR DATA()BASE?)(2N)SEARCH?
S3      603     75     90    103     57    242     36  INFORMATION()RETRIEVAL()TECHNI?
S4     9818   1869   1553   1731   1451   1568   1646  INFORMATION(2N)RETRIEVAL?(2N)(TECHNI? OR RESEARCH OR METHOD?)
S5    16396    533   1325   2612    286  11457    183  (QUERY OR QUERIES)(2N)(PROCESS? OR HANDL? OR MAP?)
S6     1582    110    402    275     55    604    136  (METADATA OR META()DATA OR META)(2N)(SEARCH? OR ACCESS? OR APPROACH?)
S7      727    199    207     41     10    226     44  Z39()50
S8    18607    834   1920   2916    350  12228    359  S5 OR S6 OR S7
S9      376    109     36     60     10    125     36  S4 AND S8
S10       2      0      2      0      0      0      0  S2 AND S9
S11     797    204    117    105     49    245     77  S2 OR S9
S12     797      -      -      -      -      -      -  ID (sorted in duplicate order)
S13     170      -      -      -      -      -      -  IDO S11 (duplicates only)
S14     701    200     78     76     33    237     77  RD S11 (unique items)
S15     667    195     72     75     29    219     77  S14/ENG
S16     508    156     71     53     13    169     46  S15/1990:2003

The query log above shows that Files 2 and 202 have the two highest posting numbers in Sets 2, 9, and 11. Judging by posting numbers, these were the two databases I initially thought best. After removing duplicate records, limiting the language to English, and limiting the publication year to 1990 to the present, 169 records remain in File 2 and 156 in File 202. The two databases seem roughly equivalent.
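The posting-number comparison can be checked with a few lines of Python. This is only a sketch added for illustration, not part of the original search session: the counts are copied from Sets 2, 9, and 11 of the query log, and the names `postings` and `top_two` are assumptions.

```python
# Posting counts per file, copied from Sets 2, 9, and 11 of the query log.
postings = {
    "S2": {202: 95, 144: 83, 8: 45, 6: 39, 2: 120, 1: 41},
    "S9": {202: 109, 144: 36, 8: 60, 6: 10, 2: 125, 1: 36},
    "S11": {202: 204, 144: 117, 8: 105, 6: 49, 2: 245, 1: 77},
}


def top_two(counts):
    """Return the two file numbers with the highest posting counts."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return ranked[:2]


for set_name, counts in postings.items():
    # Each line prints the set name followed by its two leading files.
    print(set_name, top_two(counts))
```

For all three sets the two leading files are File 2 and File 202, in that order.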

From the records sampled from all 6 nominated databases, I can conclude that most of the retrieved records are relevant to the topic. In the comparisons between Files 1 and 202 and between Files 8 and 2, Files 1 and 8 were eliminated mainly because of their posting numbers, although they contain some good records. File 6 was eliminated not only for its posting numbers but also for its document types (reports only; no papers or articles).

Set   Items    202    144      8      6      2      1  Term Searched
S22     403    112     67     20      0    163     41  S16 AND (DT=J? OR DT=P OR DT=PA OR DT=PAP? OR DT=ART?)

File 144 was eliminated mainly because of language. Although English descriptors and titles are available, some abstracts are missing or are in French.

IV. Query (Re)formulation

Comparing the blue sheets of Files 2 and 202 shows that their subject scopes differ from each other but also complement each other in technical aspects.

On one hand, the two files have many features in common: both File 2 and File 202 have TI, AB, and DE as basic indexes; both can be limited by ENG, NOENG, and publication year; and both can be sorted by TI, AU, JN, and PY. On the other hand, each file has special features: File 2 has an ID basic index, while File 202 has MH and SH. In the next steps, formulating the best query and evaluating the results, I must keep these common and special features of the two databases in mind.

[Figures for File 2 and File 202]

I must also be aware of differences in the descriptors/identifiers indexed. For example, the identical phrase “cross database searching” is indexed in File 2 as an ID but is not indexed in File 202.

My search strategy is very simple. Based on the concept groups and related terms defined in Section I of this paper:
• First, find as many synonyms of each word as possible;
• Second, use unlimited truncation (?) at the end of a word to match its variations (plural, verb, noun, ...);
• Then, use Boolean OR to group synonyms and their variations;
• Use the proximity operator (2N) between the grouped terms to form a concept in as many term orders/directions as possible;
• Use Boolean AND or OR to narrow or group a concept;
• Repeat until the final concept matches the topic.
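The steps above can be sketched as a small query builder. This is only an illustration in Python, not a Dialog feature: the helper names `or_group` and `near` are assumptions, and the synonym lists are taken from Set 2 of the query log in Section III.

```python
def or_group(terms):
    """Join synonyms (already truncated where wanted) with Boolean OR."""
    if len(terms) == 1:
        return terms[0]  # a single term needs no parentheses
    return "(" + " OR ".join(terms) + ")"


def near(groups, distance=2):
    """Link OR-groups with the (nN) proximity operator (either word order)."""
    operator = f"({distance}N)"
    return operator.join(or_group(group) for group in groups)


# Synonym groups for the concept "cross-database searching".
modifiers = ["CROSS", "MULTI", "MULTIPLE", "SIMULTAN?"]
targets = ["DATABASE?", "DATA()BASE?"]
actions = ["SEARCH?"]

query = near([modifiers, targets, actions])
print(query)
# (CROSS OR MULTI OR MULTIPLE OR SIMULTAN?)(2N)(DATABASE? OR DATA()BASE?)(2N)SEARCH?
```

Keeping each synonym group in its own list makes it easy to add or remove a term after sampling records and then regenerate the query string.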

To me, the most difficult and time-consuming step is finding synonyms, which is the first step of my search strategy. From browsing the indexes and sampling records, I learned and found many synonyms. Some of the synonyms are good, but there are always bad ones. I refined these synonyms through new queries and re-sampling. For “cross database searching,” for example, I at first just added “multi?” to “cross.” From the sampled records I found good synonyms, such as “federated,” “simultaneous,” and “integrated,” and I also found that the truncation “multi?” is too loose: it pulled in unrelated terms such as MULTIELECTRODE.
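The looseness of “multi?” can be illustrated with a short Python sketch that imitates unlimited truncation with a regular expression. This only approximates how a truncated stem matches index words, it is not Dialog's implementation. The function name `matches` and the word list are assumptions; only MULTIELECTRODE comes from the sampled records.

```python
import re


def matches(term, word):
    """Return True if an index word satisfies a Dialog-style search term.

    A trailing '?' is treated as unlimited truncation: any number of
    additional characters may follow the stem.
    """
    if term.endswith("?"):
        pattern = re.escape(term[:-1]) + r"\w*"
    else:
        pattern = re.escape(term)
    return re.fullmatch(pattern, word, flags=re.IGNORECASE) is not None


# Illustrative index words; only MULTIELECTRODE is reported in the paper.
index_words = ["MULTI", "MULTIPLE", "MULTIDATABASE", "MULTIELECTRODE"]

loose = [w for w in index_words if matches("MULTI?", w)]
explicit = [w for w in index_words
            if matches("MULTI", w) or matches("MULTIPLE", w)]

print(loose)     # every word in the list, including MULTIELECTRODE
print(explicit)  # only MULTI and MULTIPLE
```

Listing MULTI and MULTIPLE explicitly, as the final query does, trades some recall for the removal of unrelated terms.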

The following is the final query log run in DialogWeb OneSearch mode.

Items is the total for both files; the columns 202 and 2 give the items retrieved in each file.

Set     Items      202        2  Term Searched
S1        905      213      692  (CROSS? OR MULTI OR MULTIPLE OR SIMULTAN? OR FEDERATED OR INTEGRATED)(2N)(DATABASE? OR DATA()BASES? OR FILE?)(2N)(SEARCH? OR ACCESS? OR APPROACH?)
S2     111410    53970    57440  INFORMATION(2N)(SEARCH? OR RETRIEV? OR SERVIC?)
S3    2870315    60267  2810048  STANDARD? OR TECHNI? OR METHOD? OR TOOL?
S4      11990      533    11457  (QUERY OR QUERIES)(2N)(PROCESS? OR HANDL? OR MAP?)
S5        714      110      604  (METADATA OR META()DATA OR META)(2N)(SEARCH? OR ACCESS? OR APPROACH?)
S6        425      199      226  Z39()50
S7         63        7       56  (S5 OR S6) AND S4
S8        133       47       86  S1 AND S2 AND S3
S9        196       54      142  S7 OR S8
S10       135       33      102  S9/1990:2003
S11       130       32       98  S10/ENG
S12   6185636    82990  6102646  E13-E16
S13        68       26       42  S11 AND S12
S14        68        -        -  ID (sorted in duplicate order)
S15         0        -        -  IDO S13 (duplicates only)

Note: “S E13-E16” is based on browsing the document type index with “e dt=a,” as shown below.

V. Viewing Results

Title: Accessing research-level journal literature about phytomedicinals: a searching conundrum. Author: Moberly, H.K.
Title: ADMIRE: an adaptive data model for meta search engines. Author: Huang, L.; Hemmje, M.; Neuhold, E.J.
Title: Aquarelle to Z39.50: the A to Z of access to cultural heritage information. Author: Dawson, D.
Automation and technical services organization. Author(s): Bazirjian, R
Building a better meta-search. Author(s): Arnold, Stephen
Title: Building efficient and effective metasearch engines. Author: Weiyi Meng; Yu, C.; King-Lup Liu
Title: Building the infrastructure of resource sharing: union catalogs, distributed search, and cross-database linkage. Author: Lynch, C.A.
Building the infrastructure of resource sharing: union catalogs, distributed search, and cross-database linkage. Author(s): Lynch, C A
A comparison of two databases for the retrieval of business literature: ABI/INFORM and Management Contents. Author(s): Rey, Jesus; Fereres, Raquel
Title: Context and selective retreat in hierarchical menu structures. Author: Field, G.E.; Apperley, M.D.
Consumer health information and the not-for-profit health agency. Author(s): Calvano, M
Title: Cost benefits of ontologies. Author: Menzies, T.
Title: Decision points for databases. Author: Basch, R.
Title: DejaNews and other Usenet search tools. Author: Notess, G.R.
Title: Database publishers' challenges for the future. Author: Lawrence, B.
Title: A database selection expert system based on reference librarian's database selection strategy: a usability and empirical evaluation. Author: Wei Ma
Discriminating meta-search: a framework for evaluation. Author(s): Chignell, Mark H; Gwizdka, Jacek; Bodner, Richard D
Title: A distributed architecture for resource discovery using metadata. Author: Roszkowski, M.; Lukas, C.
Title: Enhancing NTIS database access at a multi-campus university. Author: Conkling, T.W.; Jordan, K.
Enhancing NTIS database access at a multi-campus university. Author(s): Conkling, T W; Jordan, K
Evaluation of a computerized community information system through transaction analysis and user survey. Author(s): De Smet, E
FEATURES: real-time adaptive feature and document learning for Web search. Author(s): Chen, Zhixiang; Meng, Xiannong; Fowler, Richard H; Zhu, Binhai
Title: Fast and quasi-natural language search for gigabytes of Chinese texts. Author: Lee-Feng Chien
Title: GPO Access-government at its best? Author: Platt, N.
Title: Grainger Engineering Library: an object-enhanced user interface for information retrieval. Author: Johnson, E.H.
Grainer Engineering Library: an object-enhanced user interface for information retrieval. Author(s): Johnson, Eric H
Title: Grouper: a dynamic clustering interface to Web search results. Author: Zamir, O.; Etzioni, O.
Title: Identification of duplicate and near-duplicate full-text records in database search-outputs using hierarchic cluster analysis. Author: Kirriemuir, J.W.; Willett, P.
Image retrieval: combining content-based and metadata-based approaches. Author(s): Day, Michael
Information access in the humanities: perils and pitfalls. Author(s): Walker, G; Atkinson, S D
Informal research on digital images on the WWW. Author(s): Gibboney, Richard
Title: Inquirus, the NECI meta. Author: Lawrence, S.; Giles, C.L.
Title: Integrating ontologies and thesauri for RDF schema creation and metadata querying. Author: Amann, B.; Fundulaki, I.; Scholl, M.
Integrating ontologies and thesauri for RDF schema creation and metadata querying. Author(s): Amann, Bernd; Fundulaki, Irini; Scholl, Michel
A mulligan stew of searching techniques. CASLINK and SmartSELECT. Author(s): Buntrock, R E
Title: MultiLink-an intermediary system for multi-database access. Author: Wu, G.; Ahlfeldt, H.; Wigertz, O.
Title: A metadata approach to statistical query processing. Author: Froeschl, K.A.
Title: Networking CD-ROM in German libraries. Author: Heinisch, C.
New search and navigation techniques in the digital library. Author(s): Stern, David
Title: New search and navigation techniques in the digital library. Author: Stern, D.
Title: On Z39.50 wrapping and description logics. Author: Velegrakis, Y.; Christophides, V.; Constantopoulos, P.
Title: Problems and pitfalls in using CD-ROM technology. Author: van Brakel, P.A.
A primer on Z39.50. Part five. Author(s): Hinnebusch, M
A quick guide to...Z39.50. Author(s): Taylor, Stephanie
Title: A quick guide to ... Z39.50. Author: Taylor, S.
The right solution: federated search tools. Author(s): Tennant, Roy
Title: Retrieval technology and the information explosion. Author: Stubbs, R.
Title: Revolution in current awareness services. Author: Rowley, J.
Title: Schema coordination in federated database management: a comparison with schema integration. Author: Zhao, J.L.
Scientific and technical information: this millennium and the next. Author(s): Lambert, Nancy
Title: SciSearch on STN-unique features for sophisticated searching. Author: Huber, C.F.
Title: Similarity searching in the CORDIS text database. Author: Petrakis, E.G.M.; Tzeras, K.
Similarity searching in the CORDIS text database. Author(s): Petrakis, Euripides G M; Tzeras, Kostas
Title: Semantic caching of Web queries. Author: Chidlovskii, B.; Borghoff, U.M.
Title: STARTS: Stanford proposal for Internet meta-searching. Author: Gravano, L.; Chang, C.-C.K.; Garcia-Molina, H.; Paepcke, A.
Title: System architecture for information processing. Author: Ozkarahan, E.
Thesaurus federations: loosely integrated thesauri for document retrieval in networks based on Internet technologies. Author(s): Kramer, Ralf; Nikolai, Ralf; Habeck, Corinna
Title: Thesaurus federations: loosely integrated thesauri for document retrieval in networks based on Internet technologies. Author: Kramer, R.; Nikolai, R.; Habeck, C.
Title: Thesauri on the Web: current developments and trends. Author: Asghar Shiri, A.; Revie, C.
Thesauri on the Web: current developments and trends. Author(s): Shiri, Ali Asghar; Revie, Crawford
Title: Towards distributed library systems: Z39.50 in a European context. Author: Dempsey, L.; Russell, R.; Kirriemuir, J.
Title: Unifying heterogeneous information models. Author: Singh, N.
Title: Using technical databases with minority patent coverage to enhance retrieval. Author: Adams, S.R.
Title: Visualization of metadata. Author: Beagle, D.
Visualization of metadata. Author(s): Beagle, Donald
Z39.50: an information retrieval protocol. Author(s): Aruna, A

VI. Evaluating Results

The ID list clearly shows 18 duplicated records. The IDO command could not recognize them as duplicates because the author(s), journal name, or title are formulated differently in the two databases.
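One way to spot such near-duplicates by hand is to compare titles after stripping case, spacing, and punctuation. The Python sketch below illustrates that idea only; it is not how Dialog's ID or IDO commands work, and the function name `title_key` is an assumption. The sample records are taken from the results list in Section V.

```python
import re
from collections import defaultdict


def title_key(title):
    """Reduce a title to lowercase letters and digits for loose matching."""
    return re.sub(r"[^a-z0-9]+", "", title.lower())


# (file number, title, author) as the records appear in each database.
records = [
    (2, "Visualization of metadata", "Beagle, D."),
    (202, "Visualization of metadata.", "Beagle, Donald"),
    (2, "A quick guide to ... Z39.50", "Taylor, S."),
    (202, "A quick guide to...Z39.50.", "Taylor, Stephanie"),
    (2, "Cost benefits of ontologies", "Menzies, T."),
]

groups = defaultdict(list)
for file_no, title, author in records:
    groups[title_key(title)].append((file_no, author))

# Keep only the keys that occur in more than one record.
duplicates = {key: hits for key, hits in groups.items() if len(hits) > 1}
for key, hits in duplicates.items():
    print(key, hits)
```

A normalized title key catches differences in punctuation and author formatting, but it would still miss spelling variants such as “Grainger” and “Grainer” in the list above.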

The following 3 pairs of duplicated records show that, even for the same record, the indexed descriptors differ, because each database follows its own indexing rules. For example, the File 202 descriptor “Abstracting and indexing services” was indexed as two descriptors, “Abstracting” and “indexing,” in File 2. The content of the abstract may also differ between duplicated records.

In order to compare the coverage differences between the two databases, I split the final results into two sets and ran the RANK command by JN, PY, and DE,ID/DE,MH for each set.

Set   Term Searched   Items   File
S16   S13 FROM 2         42      2
S17   S13 FROM 202       26    202
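Ranking a field means counting how often each value occurs in a result set and listing the most frequent values first. The Python sketch below shows the same idea with a frequency counter. It is a rough analogy to the RANK command, not a reproduction of it; the name `descriptor_rank` is an assumption, and the sample descriptors are copied from four entries in the Section VII bibliography.

```python
from collections import Counter

# Descriptor fields copied from four bibliography records (Amann, Arnold,
# Beagle, Chignell); each string is one record's "Descriptors:" line.
descriptor_fields = [
    "Thesauri; Ontologies; Metadata; Query processing",
    "Search engines; Computer software; Query processing; Reviews",
    "Information retrieval; Databases; Query processing",
    "Information retrieval; Models; Query processing",
]

descriptor_rank = Counter()
for field in descriptor_fields:
    # Split the field into individual descriptors and count each one.
    descriptor_rank.update(term.strip() for term in field.split(";"))

for term, count in descriptor_rank.most_common(2):
    print(count, term)
```

In this small sample “Query processing” ranks first with 4 occurrences and “Information retrieval” second with 2.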

[Figures: RANK results for File 2 and File 202 by journal name (rank jn), by publication year (rank py), and by descriptors (rank de,id for File 2; rank de,mh for File 202)]

Both of the databases I chose are good. I prefer the descriptor index of File 2, which is more complete because of the special INSPEC thesaurus. However, its journal name index is not good (it has many duplicated entries). File 202 has a Main Heading index, which helps the user distinguish the main points of a record.

VII. Bibliography of the 30 Most Relevant Records

Amann, Bernd; Fundulaki, Irini; Scholl, Michel. Integrating ontologies and thesauri for RDF schema creation and metadata querying. International Journal on Digital Libraries vol. 3, no. 3 (October 2000): 221-236
Presents a new approach for building metadata schemas by integrating existing ontologies and structured vocabularies (thesauri). The integration is based on the specification of inclusion relationships between thesaurus terms and ontology concepts, and results in application-specific metadata schemas incorporating the structural views of ontologies and the deep classification schemes provided by thesauri. Shows how the result of this integration can be used for RDF schema creation and metadata querying. In this context, (metadata) queries exploit the inclusion semantics of term relationships, which introduces some recursion. Describes a fairly simple database-oriented solution for querying such metadata which avoids a (recursive) tree traversal and is based on a linear encoding of thesaurus hierarchies.
Descriptors: Thesauri; Ontologies; Metadata; Query processing

Arnold, Stephen. Building a better meta-search. Information World Review, no. 185 (November 2002): 33
Presents an in-depth review of the Copernic Agent 6 software, a tool which can be launched from within Microsoft Office applications or from a browser as a type of search-and-retrieval software tool that performs what is known as a meta-search. Users can select a content area to search (there are 80, grouped under seven broad categories such as business and economy), enter a query as a word, phrase, or natural language sentence, and start scanning results ten to 15 seconds later. Results are relevance ranked; search terms are highlighted; and the document is summarized in well-formed sentences. Quebec-based Copernic claims to have 20 million users worldwide, as well as 250,000 users of the Copernic Plus and Pro products, and seems to have made state-of-the-art search, real-time indexing, agent technology, and snazzy interfaces its obsession. Copernic has another wave of innovations scheduled for late 2002 or early 2003, including projects for enhanced relevancy and a common interface for the Web and for content.
Descriptors: Search engines; Computer software; Query processing; Reviews

Aruna, A. Z39.50: an information retrieval protocol. DESIDOC Bulletin of Information Technology vol. 21, no. 6 (November 2001): 25-39
The enormous growth of various online bibliographic databases on the Web has led library professionals to learn the diverse data structures and various search interfaces of these databases for effective information retrieval. Z39.50 is an open-communication protocol which reduces the burden of library professionals by providing one search interface for multiple databases, and it also enables interlibrary loan to be totally electronic. Presents an in-depth discussion of the Z39.50 protocol, officially known as Information Retrieval (Z39.50): Application Services Definition and Protocol Specification, ANSI/NISO Z39.50-1995.
Descriptors: Communications protocols; Electronic data interchange; Libraries

Bazirjian, R. Automation and technical services organization. Library Acquisitions: Practice and Theory vol. 17, no. 1 (Spr 1993): 73-77
This article highlights four factors about automation that prompt reorganization and must be considered in the reorganization process. The streamlining of functions, cost-effectiveness, immediate access, and the integrated database are focused upon. The reorganization of technical services at Syracuse University is detailed. Several products and innovations which are mandating additional change in the traditional technical services divisions in libraries are also studied.
Descriptors: Academic libraries; Acquisitions; Cataloging; Library automation

Beagle, Donald. Visualization of metadata. Information Technology and Libraries vol. 18, no. 4 (December 1999): 192-199
Visualization research has transformed the operating system environment of Web browsers and online public access catalogs (OPACs), but has not yet changed the way their contents are manipulated. Discusses the potential for visualization of metadata and metadata-based surrogates, including a command interface for metadata viewing, site mapping and data aggregation tools, dynamic derivation of surrogates, and a reintroduction of transient hypergraphs from the tradition of cocitation networking. Also examines digital library research into query-specific instantiation through agents accessing a central metadata repository in the context of potential synergies between querying, browsing, and group information sharing.
Descriptors: Information retrieval; Databases; Query processing

Buntrock, R E. A mulligan stew of searching techniques. CASLINK and SmartSELECT. Database vol. 17, no. 3 (Jun 1994): 116-118, 120-121
The author briefly reviews some information database searching techniques recently introduced on STN primarily to facilitate multifile patent searching. The author believes the techniques are also applicable to other kinds of searching. The techniques are described originally in a brochure, "Patents on STN," which includes CASLINK and SmartSELECT. CASLINK is a searching technique that incorporates substructure searching in multiple files, searching the resulting structures in bibliographic files, and removal of duplicate references from the results. SmartSELECT has two important uses: (1) facilitation of transfer of the search terms derived in one file to another, and (2) statistical analyses of the results.
Descriptors: Databases; Information retrieval; Online retrieval; Patent information

Chen, Zhixiang; Meng, Xiannong; Fowler, Richard H; Zhu, Binhai. FEATURES: real-time adaptive feature and document learning for Web search. Journal of the American Society for Information Science and Technology vol. 52, no. 8 (June 2001): 655-665
Reports research on building FEATURES, an intelligent Web search engine able to perform real-time adaptive feature (i.e., keyword) and document learning. Not only does FEATURES learn from the user's document relevance feedback, but it also automatically extracts and suggests indexing keywords relevant to a search query and learns from the user's keyword relevance feedback, so that it is able to speed up its search process and enhance its search performance. Describes two efficient and mutual-benefiting learning algorithms that work concurrently: one for feature learning and the other for document learning. FEATURES employs these algorithms, together with an internal index database and a real-time meta-searcher, to perform adaptive real-time learning to find desired documents with as little relevance feedback from the user as possible. Concludes by discussing the architecture and performance of FEATURES.
Descriptors: Machine Learning; Query Processing; Documents; Keywords

Chignell, Mark H; Gwizdka, Jacek; Bodner, Richard D. Discriminating meta-search: a framework for evaluation. Information Processing and Management vol. 35, no. 3 (May 1999): 337-362
Meta-search engines were developed to improve search performance by querying multiple search engines at once. In principle, meta-search engines can greatly simplify the search for electronic information by selecting a subset of first-level search engines and digital libraries to submit a query based on the characteristics of the user, the query/topic, and the search strategy. This selection would be guided by diagnostic knowledge about which of the first-level search engines works best under what circumstances. Programmatic research is required to develop this diagnostic knowledge about first-level search engine performance. Introduces an evaluative framework for this type of research, and illustrates its use in two experiments. Observes significant interactions between search engine and two other factors (time of day and Web domain). Derives preliminary information about the complex relationship between search engine functionality and performance in different contexts.
Descriptors: Information retrieval; Models; Query processing

Dawson, D. Aquarelle to Z39.50: the A to Z of access to cultural heritage information. New Review of Hypermedia and Multimedia (UK) vol. 4 (1998): 245-54
The way forward for handling the variety of dynamic data held by museums has shifted from conventional databases and CD-ROM publication, to the Internet. At the same time, a new technology has emerged from the library world-the Z39.50 protocol, recently adopted as an ISO Standard (ISO 23950). The problem faced in the museum world was also identified when libraries were making their catalogues available over the Internet-despite the availability of many databases, the only way of finding rare works was to search databases one-by-one. Z39.50 enables searches to be made across databases on different platforms, with different data structures, using different software and different database technologies. The key concept behind Z39.50 is that of the Access Point. Access Points are broad conceptual groupings of information that are used to aggregate fields from within, and across, databases. This approach is being further enhanced by the Aquarelle Project. The project aims to enable access to cultural heritage information. It will employ Z39.50 protocols to access multiple databases, and Standard Generalised Mark-up Language (SGML) to deliver object records, bibliographic records and catalogue essays to the end user. Aquarelle will also have the added elements of multilingual searching and the ability to create SGML digital folders, for publishing on the Internet.
Descriptors: humanities; information resources; information retrieval; Internet; page description languages; protocols; publishing

Day, Michael Image retrieval: combining content-based and metadata-based approaches. Ariadne , no. 19 (March 1999) Image-based information is a key component of human progress in a number of distinct subject domains, and digital image retrieval is a fast-growing research area with regard to both still and moving images. Reports on the Second UK Conference on Image Retrieval held in Newcastle, UK in February 1999, noting that participants included researchers and practitioners in the area of image retrieval. Gives highlights of papers presented on the following topics: content-based image retrieval, image retrieval on the , CBIR user studies, content filtering applications, and standards with relation to image retrieval. Descriptors: Information retrieval; Images; Query processing; Conferences

Dempsey, L.; Russell, R.; Kirriemuir, J. Towards distributed library systems: Z39.50 in a European context . Program (UK) vol.30, no.1 (Jan. 1996) : 1-22 Z39.50 is an information retrieval protocol. It has generated much interest but is so far little deployed in UK systems and services. The article gives a functional overview of the protocol itself and the standards background, describes some European initiatives which make use of it, and outlines various issues to do with its future use and acceptance. It is argued that Z39.50 is a crucial building block of future distributed information systems but that it needs to be considered alongside other protocols and services to provide useful applications. Descriptors: client-server systems; information systems; library automation; protocols; query processing; standards

Froeschl, K.A. A metadata approach to statistical query processing. Statistics and Computing (UK) vol.6, no.1 (March 1996) : 11-29 Concerning the task of integrating census and survey data from different sources as it is carried out by supranational statistical agencies, a formal metadata approach is investigated which supports data integration and table processing simultaneously. To this end, a metadata model is devised such that statistical query processing is accomplished by means of symbolic reasoning on machine-readable, operative metadata. As in databases, statistical queries are stated as formal expressions specifying declaratively what the intended output is; the operations necessary to retrieve appropriate available source data and to aggregate source data into the requested macrodata are derived mechanically. Using simple mathematics, this paper focuses particularly on the metadata model devised to harmonize semantically related data sources as well as the table model providing the principal data structure of the proposed system. Only an outline of the general design of a statistical information system based on the proposed metadata model is given and the state of development is summarized briefly.

Descriptors: data handling; data structures; query processing; statistical databases

Hinnebusch, M. A primer on Z39.50. Part five. Academic & Library Computing vol. 9 , no. 6 , (Jun 1992) : 7-11 This article discusses the query mechanism built into the Z39.50 protocol. The type 1 query, which is an integral part of Z39.50, is described. The article also examines the power of defining various query types; Reverse Polish Notation, which defines expressions consisting of operators and operands; and the relationship of RPN query and RPN structure. Descriptors: Library automation; Protocols; Query processing; Standards
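The Reverse Polish Notation idea mentioned in the Hinnebusch abstract (operands first, operator last) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function name evaluate_rpn, the operator names, and the toy index are hypothetical, and the code does not reproduce the actual Z39.50 type 1 query encoding.

```python
def evaluate_rpn(tokens, index):
    """Evaluate a postfix (RPN) Boolean query.

    `index` is a hypothetical term -> set-of-record-ids mapping.
    Any token that is not an operator is treated as a search term.
    """
    operators = {
        "AND": lambda left, right: left & right,
        "OR": lambda left, right: left | right,
        "AND-NOT": lambda left, right: left - right,
    }
    stack = []
    for token in tokens:
        if token in operators:
            if len(stack) < 2:
                raise ValueError("operator is missing an operand")
            right = stack.pop()
            left = stack.pop()
            stack.append(operators[token](left, right))
        else:
            # Operand: push the set of records that contain the term.
            stack.append(set(index.get(token, set())))
    if len(stack) != 1:
        raise ValueError("malformed RPN query")
    return stack[0]


# Toy data, invented for illustration only.
index = {"z39.50": {1, 2, 3}, "protocol": {2, 3, 4}, "museum": {3}}

# Infix equivalent: (z39.50 AND protocol) AND-NOT museum
result = evaluate_rpn(["z39.50", "protocol", "AND", "museum", "AND-NOT"], index)
```

Because every operator applies to the two results most recently placed on the stack, the query needs no parentheses, which is the property that makes RPN convenient for machine-to-machine query exchange.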

Huang, L.; Hemmje, M.; Neuhold, E.J. ADMIRE: an adaptive data model for meta search engines. Computer Networks (Netherlands) vol.33, no.1-6 (June 2000 ): 431-448 Considering the diversity among search engines, efficient integration of them is an important but difficult job. It is essential to provide a data model that can provide a detailed description of the query capabilities of heterogeneous search engines. By means of this model, the meta-searcher can map users' queries into specific sources more accurately, and it can achieve good precision and recall. Moreover, it will benefit the selection of target source and computing priority. Because new search engines emerge frequently and old ones are updated when their function and content change, the data model needs good adaptivity and scalability to keep in step with the rapidly developing World Wide Web. This paper gives a formal description of the query capabilities of heterogeneous search engines and an algorithm for mapping a query from a general mediator format into the specific wrapper format of a specific search engine. Compared with related work, the special features of our work are that we focus more on the constraint of/between the terms, attribute order, and the impact of logical operator restraints. The contribution of our work is that we offer a data model that is both expressive enough to meticulously describe the query capabilities of current World Wide Web search engines and flexible enough to integrate them efficiently. Descriptors: adaptive systems; data models; information resources; search engines

Huber, C.F. SciSearch on STN-unique features for sophisticated searching . Database vol.18, no.2 (April-May 1995) : 52-4, 56-60, 62 STN International has been very active in the past couple of years, responding to customer demand by adding major databases to round out its coverage of scientific and technical information. The most recent major addition was one long requested and much anticipated: ISI's SciSearch, which made its debut at the end of August 1994. Was it worth the wait? In a word, yes! STN's implementation of SciSearch takes advantage of the unique features of STN's Messenger command language, adding special features for citation and cross-file searching to give the searcher powerful techniques never before publicly available. Where the STN implementation really shines, however, is in the power of its SELECT and SORT commands, which allow some very sophisticated applications in the SciSearch file that are unavailable elsewhere. Descriptors: information retrieval system evaluation; natural sciences computing; sorting

Kirriemuir, J.W.; Willett, P. Identification of duplicate and near-duplicate full-text records in database search-outputs using hierarchic cluster analysis. Program (UK ), vol.29, no.3 p. 241-56 July 1995 Clustering the output of a multi-database enables a user to obtain an overview of the information that has been retrieved without the need to inspect any documents that contain only redundant information. In this paper we describe a classification scheme that characterises the degree of relationship between pairs of documents in database search-outputs and then report the application of a range of clustering methods and similarity coefficients to 20 such outputs. These experiments demonstrate that clustering is capable of grouping documents that are identical to, or closely-related to, other documents in the search-output on the basis of their term similarities.

Descriptors: full-text databases; information analysis; information retrieval; tree searching
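The idea of grouping duplicate and near-duplicate records by their term similarities can be sketched as follows. This is a simplified stand-in, not the method of Kirriemuir and Willett: it assumes the Dice coefficient as the similarity measure and a single-link grouping at a fixed threshold, and all names and sample records are hypothetical.

```python
import re
from itertools import combinations


def terms(text):
    """Return the set of lower-cased word tokens in a record."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(a, b):
    """Dice similarity coefficient between two term sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def group_near_duplicates(records, threshold=0.8):
    """Group record positions whose pairwise similarity reaches the threshold.

    Single-link grouping: two records share a group if a chain of
    sufficiently similar pairs connects them.
    """
    term_sets = [terms(record) for record in records]
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if dice(term_sets[i], term_sets[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Invented sample output from a multi-database search.
records = [
    "Z39.50 enables searching across many library databases",
    "Z39.50 enables searching across many library databases.",
    "Clustering groups duplicate records in search output",
]
groups = group_near_duplicates(records)
```

A searcher would then inspect one record per group instead of every retrieved record.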

Kramer, Ralf; Nikolai, Ralf; Habeck, Corinna. Thesaurus federations: loosely integrated thesauri for document retrieval in networks based on Internet technologies. International Journal on Digital Libraries vol. 1, no. 2 (June 1997): 122-131 Due to the distribution of interrelated information over several different information systems, the interconnection of information systems has increased in recent years. However, a purely technical interconnection is insufficient for users who need to find their way to information they need. Thesauri are a proven means to identify documents, e.g., books of interest in a library. For different domains, different thesauri are available, which can be used in information systems as well, e.g., for the indexing and retrieval of data objects. Furthermore, recent advances in open interoperability technologies (Web, CORBA, and Java) offer the potential for completely new technical solutions for employing thesauri. Presents an approach for integrating multiple thesaurus databases. Concentrates on the integration of distributed and heterogeneous thesaurus databases and the integration of multilingual and monolingual thesauri. The software architecture takes advantage of the most advanced Internet and CORBA technology currently available in public domain and commercial applications. Descriptors: Databases; Distributed systems; Information retrieval; Network management systems

Lambert, Nancy. Scientific and technical information: this millennium and the next. Searcher vol. 8, no. 1 (January 2000): 24-38 Predicts what scientific and patent information will look like in the next millennium and beyond. Begins by looking back over the past history of science and technology, as well as education. Notes that computers used for information retrieval came into existence not too long after modern computers themselves, and although early searching tended to be primitive Boolean, now powerful search engines exist that permit searching of multiple databases. Predicts computers becoming much smaller, a linking and melding of media, advances in capabilities to search more than just text and indexing, and more. Provides a sidebar (p26-34) titled "Comments from the Patent and Sci-Tech Communities," and another (p36), "Patent and Technical Information Searching in the Next Millennium: Another World View" by Robert E. Buntrock. Descriptors: Future; Patent information; Searching

Lynch, C A. Building the infrastructure of resource sharing: union catalogs, distributed search, and cross-database linkage. Library Trends vol. 45, no. 3 (Win 1996): 448-461 Effective resource sharing presupposes an infrastructure which permits users to locate materials of interest in both print and electronic formats. Two approaches for providing this are union catalogs and Z39.50-based distributed search systems. The advantages and limitations of each approach are considered, paying particular attention to a realistic assessment of Z39.50 implementations. This article argues that the two approaches should be considered complementary rather than competitive. Technologies to create linkage between the bibliographic apparatus of catalogs and abstracting and indexing databases and primary content in electronic form, such as the new Serial Item and Contribution Identifier (SICI) standard, are also discussed as key elements in the infrastructure to support resource sharing. Descriptors: Abstracting and indexing services; Bibliographic systems; Distributed databases; Information infrastructure

Meng, Weiyi; Yu, C.; King-Lup Liu. Building efficient and effective metasearch engines. ACM Computing Surveys vol.34, no.1 (March 2002): 48-89 Frequently a user's information needs are stored in the databases of multiple search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents from the returned results. To support unified access to multiple search engines, a metasearch engine can be constructed. When a metasearch engine receives a query from a user, it invokes the underlying search engines to retrieve useful information for the user. Metasearch engines have other benefits as a search tool such as increasing the search coverage of the Web and improving the scalability of the search. In this article, we survey techniques that have been proposed to tackle several underlying challenges for building a good metasearch engine. Among the main challenges, the database selection problem is to identify search engines that are likely to return useful documents to a given query. The document selection problem is to determine what documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines. We also point out some problems that need to be further researched. Descriptors: bibliographies; merging; online front-ends; search engines
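Of the three challenges named in the Meng, Yu and Liu abstract, the result merging problem is the easiest to illustrate. The sketch below assumes a simple round-robin interleaving with de-duplication, which is only one possible merging strategy and is not taken from the survey itself; the function name and the document identifiers are hypothetical.

```python
from itertools import zip_longest


def merge_results(result_lists):
    """Merge ranked result lists from several search engines.

    Takes the first hit of every engine, then the second hit of every
    engine, and so on, skipping documents that were already returned.
    """
    merged = []
    seen = set()
    for rank_slice in zip_longest(*result_lists):
        for doc in rank_slice:
            # zip_longest pads shorter lists with None.
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Invented ranked outputs from two underlying engines.
engine_a = ["doc1", "doc2", "doc3"]
engine_b = ["doc2", "doc4"]
merged = merge_results([engine_a, engine_b])
```

Round-robin merging treats every engine as equally reliable. A merger that weights engines by their expected usefulness for the query would also have to address the database selection problem described in the same abstract.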

Ozkarahan, E. System architecture for information processing . Information Processing & Management (UK) vol.27, no.4 : 347-69 Text and fact retrieval efficiency can be achieved only within the framework of a computer system architecture embodying parallel search hardware, intelligent I/O architecture, and effective data partitioning and search strategies. Parallelism by itself is not a panacea. This paper presents the framework of the system architecture and the details of the intelligent I/O architecture called IMSA, data partitioning, and the RAP.3 integrated retrieval system. The RAP.3 system is driven by cluster searching for narrowing the search space. All media documents are converted to special relational formats and searched by relational database and text search commands. An integrated multimedia information system supported by object-oriented database methodology within the RAP.3 system architecture is included as a demonstration. Descriptors: information retrieval; knowledge based systems; multimedia systems; object- oriented databases; parallel architectures; relational databases

Pan, Miin-Jeng; Shi-Kuo Chang; Chien-Chiao Yang A two-level metadata dictionary approach for semantic query processing in multidatabase systems . International Journal of Software Engineering and Knowledge Engineering (Singapore) vol.3, no.2 (June 1993) : 231-55 A multidatabase system (MDBS) is a system that integrates several autonomous database systems and provides users with a uniform access to all the databases. The author develops a two-level active metadata dictionary approach for semantic query processing. To capture the global view of data schemas of participating databases which may be heterogeneous, a Horn- clause data model is used. The lower-level metadata dictionaries (LLMDs) keep metadata for each corresponding local database in MDBS. The higher-level metadata dictionary (HLMD) integrates the metadata about all LLMDs. The database integration strategy includes two phases: schema translation and schema integration. A bottom-up approach integrates schema from the underlying database schemas. The evaluation strategy is a top-down approach. It starts with a query as a global goal to be achieved, unifies and optimizes the query to decompose the goal into subgoals that can be evaluated against the extensional database, then translates these subgoals into corresponding queries against underlying DBMSs. To solve the control problem, he employs a G-net model for procedure control and inference control. An experimental implementation in Prolog is described. Descriptors: distributed databases; Horn clauses; query processing

Roszkowski, M.; Lukas, C. A distributed architecture for resource discovery using metadata. D-Lib Magazine June 1998 The article describes an approach for linking geographically distributed collections of metadata so that they are searchable as a single collection. The authors describe the infrastructure, which uses standard Internet protocols such as the Lightweight Directory Access Protocol (LDAP) and the Common Indexing Protocol (CIP), to distribute queries, return results, and exchange index information. The authors discuss the advantages of using linked collections of authoritative metadata as an alternative to using a keyword indexing search-engine for resource discovery. They examine other architectures that use metadata for resource discovery, such as Dienst/NCSTRL, the AHDS HTTP/Z39.50 Gateway, and the ROADS initiative. Finally, they discuss research issues and future directions of the project. Descriptors: access protocols; information retrieval; Internet; query processing

Rowley, J. Revolution in current awareness services. Journal of Librarianship and Information Science (UK) vol.26, no.1 (March 1994): 7-14 Reviews the market-place for existing current awareness services and uses the main features of these products as standards against which to assess three new current awareness services: Inside Information; SwetScan; and UnCover. Inside Information, from the British Library Document Supply Centre, provides a short listing of the key features of each article in the world's principal scholarly periodicals. Approximately one million references are added to the database each year. SwetScan, from Swets Subscription Service, covers 7000 periodical titles, and provides title, ISSN, year-volume, some article names, authors, page numbers, and, if the library is a Swets subscriber, the library's subscription number. UnCover is a collaborative venture between Blackwells and CARL and offers access to a multi-disciplinary database based on the holdings of the participating libraries. The main focus of UnCover is speed of document delivery and the intention is to provide a fax of a document within 24 hours or less (within the hour if the document is stored on optical disc). Descriptors: bibliographic systems; document delivery; information services; library automation

Shiri, A Asghar.; Revie, C. Thesauri on the Web: current developments and trends . Online Information Review vol.24, no.4 (2000): 273-9 This article provides an overview of recent developments relating to the application of thesauri in information organisation and retrieval on the World Wide Web. It describes some recent thesaurus projects undertaken to facilitate resource description and discovery and access to wide-ranging information resources on the Internet. Types of thesauri available on the Web, thesauri integrated in databases and information retrieval systems, and multiple-thesaurus systems for cross-database searching are also discussed. Collective efforts and events in addressing the standardisation and novel applications of thesauri are briefly reviewed. Descriptors: information resources; information retrieval; Internet; thesauri

Stern, D. New search and navigation techniques in the digital library. Science & Technology Libraries vol.17, no.3-4 (1999): 61-80 The introduction of technology into library information systems has provided new and enhanced search powers in the following areas: the speed of searching large individual and federated databases, keyword access, access to value-added metadata, customized interfaces (that relieve the burden of difficult techniques for sophisticated options), combinatorics for citation and semantic analysis, post-search relevancy analysis, release from the cost recovery scenario and smart agent assistance. These advances save time, provide new research possibilities and create new data relationships and research areas. However, there are still many areas in which improvements are needed: filters for handling information overload, cross-database searching standards, subject schema normalization, and balancing the need for subject-specific customization and cross-disciplinary standardization. Regardless of the technological advances, there will always be a need for critical thinking skills in order to perform an adequate research search. Descriptors: digital libraries; distributed databases; information analysis; information retrieval; meta data; online front-ends; software agents; standardisation

Stubbs, R. Retrieval technology and the information explosion. Business Information Review (UK) vol.15, no.4 p. 224-8 Dec. 1998 Among a range of important technical issues faced by publishers for profit, the most urgent would seem to be that, while CD-ROM represents an established, proven source of revenue, the vast majority of investment is currently going into Internet and Web development, where evidence to date shows that revenue is scarce. The paper considers the question of how to generate Web revenues with reference to the briefing paper "A model approach", produced by Optosof, the specialist in Internet and CD-ROM retrieval solutions. The paper looks at the relationship between publishers' existing business models, and the technology issues affecting their ability to establish the Web as a profitable source of revenue in the short term. The paper suggests that a logical strategy might be to put core content on CD-ROM, while supplementing it with Internet based access to topical and real time content. The key benchmark as far as performance is concerned will be to have a product built around a fast search engine that offers simultaneous search of multiple databases stored on one or more servers and where retrieval of data allows instant analysis and easy view, copy, export and print. It concludes with useful advice on implementing a successful strategy based on these requirements. Descriptors: CD-ROMs; information resources; information retrieval; Internet; investment; search engines

Taylor, S. A quick guide to ... Z39.50. Interlending & Document Supply (UK) vol.31, no.1 (2003): 25-30 Z39.50 is a standard that defines a protocol to support the searching and retrieval of information across networks. It is much used in libraries, working "behind the scenes" to enable complex searches and multiple database searches to be carried out by users in a user-friendly way, where the complexity of search and retrieval is hidden from them. This guide offers an overview of Z39.50: what it is, how and why it has been developed, how it works and where future developments might lead. In addition, the "Jargon buster" section contains definitions of the terms and acronyms associated with Z39.50 that can be confusing to the beginner. The "resources" section offers suggestions for further reading on the subject, to help in building on the basic overview offered here. Descriptors: information retrieval; protocols; standards

Tennant, Roy. The right solution: federated search tools. Library Journal vol. 128 , no. 11 , (June 15, 2003): 28-30 Discusses why federated or cross-database search tools now available on the market represent the correct solution for unifying access to a variety of information resources. These tools can search not only library catalogs, but also commercial abstracting and indexing databases, Web search engines, and a variety of other databases, while often merging and de-duplicating results. While these software solutions are still in an early stage of development, they already offer key functionality, both proving the benefits of federated search tools and pointing to their potential. A number of libraries already use such tools to serve their clientele. Descriptors: Information services; Crossfile searching; Bibliographic databases; Library and archival services

Velegrakis, Y.; Christophides, V.; Constantopoulos, P. On Z39.50 wrapping and description logics. International Journal on Digital Libraries (Germany) vol.3, no.3 (Oct. 2000) : 208-20 Z39.50 is a client/server protocol that is widely used in digital libraries and museums for searching and retrieving information spread over a number of heterogeneous sources. To overcome semantic and schematic discrepancies among the various data sources, the protocol relies on a world view of information as a flat list of fields, called access points (AP). One of the major issues for building Z39.50 wrappers is to map this unstructured list of APs to the underlying source data structure and semantics. For highly structured sources (e.g. database management systems, knowledge base systems) this mapping is quite complex and considerably affects the quality of the retrieved data. Unfortunately, existing Z39.50 wrappers have been developed from scratch and they do not provide high-level mapping languages with verifiable properties. In this paper, we propose a description logic (DL) based toolkit for the declarative specification of Z39.50 wrappers. We claim that the conceptualization of AP mappings enables a formal validation of the query translation quality (e.g., ill-defined mappings, inappropriate APs, etc.) and allows one to tackle a number of Z39.50 pending issues (e.g. meta-data retrieval, query failures due to unsupported APs, etc.). Furthermore, our DL-based approach allows the development of Z39.50 wrappers enriched with a number of added-value services such as conceptual structuring of flat Z39.50 vocabularies and intelligent Z39.50 query assistants. These services are quite useful for profile developers, Z39.50 wrapper administrators and end-users. Descriptors: access protocols; client-server systems; computer aided software engineering; data structures; digital libraries; formal logic; formal specification; query processing; vocabulary

Wu, G.; Ahlfeldt, H.; Wigertz, O. MultiLink-an intermediary system for multi-database access. Methods of Information in Medicine (West Germany) vol.32, no.1 (Feb. 1993): 82-9 The design and implementation of MultiLink is described, a desktop system for direct end-user access to remote bibliographic information. The system is an application of a client-server technique based on a campuswide network, with the objective to assist end-users to accomplish the information retrieval process by capturing knowledge and expertise for searches in query formulations. MultiLink, via intelligent interfaces, allows users to access several dozens of bibliographic databases. The application integrates regional, national and international resources, and brings library services to the user's desktop level. Descriptors: bibliographic systems; information retrieval; online front-ends; user interfaces

Zhao, J.L. Schema coordination in federated database management: a comparison with schema integration . Decision Support Systems (Netherlands) vol.20, no.3 (July 1997) : 243-57 We introduce a new approach, termed schema coordination, as an alternative to the well-known schema integration approach for processing cooperative queries in federated database systems. The schema coordination approach is based on the attribute correspondence matrix that links similar attributes in all component databases. We compare the schema coordination approach with the schema integration approach for both metadata management and cooperative query processing. We demonstrate that schema coordination offers a much simpler methodology that enables logical data independence and is especially better suited for database federations consisting of many competing databases with ever-evolving metadata. Descriptors: business data processing; distributed databases; query processing; relational databases
