Basic Database Searching
LIS 663 / Dr. Péter Jacsó
Term Paper: Basic Database Searching
By HaiYing Wang
UNIVERSITY OF HAWAII AT MANOA
Library and Information Science Program
Dec. 1st, 2003

I. Topic Analysis

I like to begin my searches with Google. It is fast and retrieves highly relevant information from millions of resources with a single search query. It lets me build my search strategy in natural language, and it also offers suggestions about my queries. That is why I am so interested in the OneSearch feature of DialogWeb. The topic of my search is therefore “cross database searching.” I aim to find information on the technical aspects of the topic, that is, why cross-database searching can be implemented, how it is executed, and so on.

The first three articles that I read about the topic are:

i. Jacsó, Péter. Cross-searching electronic journal archives. Information Today 19(6) (June, 2002): 34.
ii. Jacsó, Péter. Cross-database searching on the Web with term mapping from multiple thesauri. In Proceedings of 20th National Online Meeting, New York, NY 18-20 May 1999 (M.E. Williams, ed.): 217-25.
iii. Tennant, Roy. The right solution: Federated search tools. Library Journal 128(11) (Jun 15, 2003): 28.

From these three articles I learned that there are three main techniques in cross-database searching: query mapping, metadata search, and the Z39.50 protocol/standard. I initially defined two concept groups and corresponding terms as follows:

Concept groups                     Corresponding terms
Cross-database searching           Cross database searching
                                   Simultaneous database searching
                                   Multiple database searching
Information retrieval techniques   Query mapping
                                   Metadata
                                   Z39.50

As the Z39.50 protocol and metadata search are very new topics, the publication year can be limited to 1990 to the present. I prefer journal articles because their full text is easier to find than that of conference papers or research reports. The language is limited to English.
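The two concept groups can be turned into a first-draft search statement by ORing the terms inside each group and ANDing the groups together. The sketch below is only an illustration of that idea, not the query actually run for this paper: the helper names `phrase_to_adjacency` and `build_query` are hypothetical, and it assumes that `()` between words means strict adjacency, as in the queries used later in this paper. It leaves out truncation (`?`) and the wider `(2N)` proximity that the final queries use.

```python
import re

# Concept groups and terms copied from the table above.
CONCEPT_GROUPS = {
    "Cross-database searching": [
        "cross database searching",
        "simultaneous database searching",
        "multiple database searching",
    ],
    "Information retrieval techniques": [
        "query mapping",
        "metadata",
        "Z39.50",
    ],
}


def phrase_to_adjacency(phrase):
    """Join the words of a phrase with the () adjacency operator (assumed syntax)."""
    # Split on anything that is not a letter or digit, so "Z39.50" -> ["Z39", "50"].
    words = [w for w in re.split(r"[^A-Za-z0-9]+", phrase) if w]
    return "()".join(w.upper() for w in words)


def build_query(groups):
    """OR the terms inside each concept group, then AND the groups together."""
    clauses = []
    for terms in groups.values():
        clauses.append("(" + " OR ".join(phrase_to_adjacency(t) for t in terms) + ")")
    return " AND ".join(clauses)


query = build_query(CONCEPT_GROUPS)
print(query)
```

A draft built this way is deliberately narrow. Widening it with synonyms, truncation, and looser proximity is a separate step.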
II. Six Nominated Databases Chosen in DIALINDEX

Because my topic is related to computer and information science, I began the DIALINDEX search by setting the files to two subject categories, CompSci and InfoSci (b 411; sf compsci,infosci). This selected 26 files. Following the two concept groups mentioned above, I tested several queries.

Based on the posting numbers for each test query, I took the following 10 files/databases as nominees.

File   Database Name
1      ERIC (1966-present)
2      INSPEC (1969-present)
6      NTIS (1964-present)
8      Ei Compendex® (1970-2003)
34     SciSearch® - a Cited Reference Science Database (1990-present)
144    PASCAL (1973-2003)
148    Gale Group Trade & Industry Database(TM) (1976-present)
202    Information Science & Technology Abstracts (1966-present)
233    Internet & Personal Computing Abstracts(TM) (1981-present)
438    Library Literature and Information Science (1984-present)

After studying the File Description and Subject Coverage in the blue sheets of these 10 files, I eliminated File 438 first because it is an index-only database, although its subject coverage is ideal for the topic. I eliminated File 148 second because it carries too much information about business products and its only document types are journal articles and newsletter/newspaper articles. I eliminated File 233 because it indexes and abstracts only 90 journals, so its only document type is journal articles.

Eliminating File 34 was the most difficult decision I made. This file is the most famous citation database, produced by the Institute for Scientific Information (ISI), and it had over 12,104,390 records as of November 2003. Judged by its posting numbers in each test query, it should be kept. However, when I used File 414 (Dialog Journal Name Finder) to look up the journal “Library Journal”, I found that none of Files 2, 6, 8, and 34 includes this important journal.
Compared with the other 6 remaining files (Files 1, 2, 6, 8, 144, and 202), File 34 has only 2 document types, Journal Articles and Book Reviews, and an abstract is not always available for each record. Although there is a descriptor index, the descriptors are not consistent because they come from the authors’ keywords.

A comparison of the 6 nominated files/databases (Files 1, 2, 6, 8, 144, and 202) shows clearly that their time coverage is almost the same, that the geographic coverage of each file is international, and that the document types of each file include journal articles, conference papers, books, reports, and theses/dissertations (except File 8: Ei Compendex). Three basic indexes are available for all 6 files: TI (word indexing), AB (word indexing), and DE (word and phrase indexing).

III. Two Best Databases Chosen

The 6 nominated files/databases mentioned above fall into 2 clusters: Files 2, 6, and 8 are more engineering-oriented, and Files 1, 144, and 202 cover more general science. I therefore attempted to choose one best database from each cluster. Starting from the test queries for the concept groups executed in Section II of this paper, I formed the following queries by including synonyms and term variations and by expanding the proximity.

Items by file (All = all six files combined; a dash means no per-file count was reported)

Set     All    202    144      8      6      2      1  Term Searched
S1       53     16      8      4      1     18      6  CROSS()DATABASE()SEARCH?
S2      423     95     83     45     39    120     41  (CROSS OR MULTI OR MULTIPLE OR SIMULTAN?)(2N)(DATABASE? OR DATA()BASE?)(2N)SEARCH?
S3      603     75     90    103     57    242     36  INFORMATION()RETRIEVAL()TECHNI?
S4     9818   1869   1553   1731   1451   1568   1646  INFORMATION(2N)RETRIEVAL?(2N)(TECHNI? OR RESEARCH OR METHOD?)
S5    16396    533   1325   2612    286  11457    183  (QUERY OR QUERIES)(2N)(PROCESS? OR HANDL? OR MAP?)
S6     1582    110    402    275     55    604    136  (METADATA OR META()DATA OR META)(2N)(SEARCH? OR ACCESS? OR APPROACH?)
S7      727    199    207     41     10    226     44  Z39()50
S8    18607    834   1920   2916    350  12228    359  S5 OR S6 OR S7
S9      376    109     36     60     10    125     36  S4 AND S8
S10       2      0      2      0      0      0      0  S2 AND S9
S11     797    204    117    105     49    245     77  S2 OR S9
S12     797      -      -      -      -      -      -  ID (sorted in duplicate order)
S13     170      -      -      -      -      -      -  IDO S11 (duplicates only)
S14     701    200     78     76     33    237     77  RD S11 (unique items)
S15     667    195     72     75     29    219     77  S14/ENG
S16     508    156     71     53     13    169     46  S15/1990:2003

The query log above shows that Files 2 and 202 have the two highest posting numbers in Sets 2, 9, and 11.
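The comparison of posting numbers can be checked mechanically. The following is a minimal sketch that ranks the six files within Sets 2, 9, and 11, using only the counts copied from the query log above. The function name `top_files` and the data layout are illustrative assumptions, not part of any Dialog tool.

```python
# File order used in the query log: 202, 144, 8, 6, 2, 1.
FILES = [202, 144, 8, 6, 2, 1]

# Postings per file, copied from the query log above.
POSTINGS = {
    "S2": [95, 83, 45, 39, 120, 41],
    "S9": [109, 36, 60, 10, 125, 36],
    "S11": [204, 117, 105, 49, 245, 77],
}


def top_files(counts, n=2):
    """Return the n file numbers with the highest postings, best first."""
    ranked = sorted(zip(FILES, counts), key=lambda pair: pair[1], reverse=True)
    return [file_no for file_no, _ in ranked[:n]]


top_two = {name: top_files(counts) for name, counts in POSTINGS.items()}
for name, files in top_two.items():
    print(name, files)
```

In each of the three sets the ranking puts File 2 first and File 202 second, which matches the statement above.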