Mark Andrea

Standards in Single Search: An Annotated Bibliography

Mark Andrea
INFO 522: Information Access & Resources
Winter Quarter 2010

Introduction and Scope

The following bibliography is a survey of scholarly literature on metasearch standards as defined by the Library of Congress (LOC) and the National Information Standards Organization (NISO). Of particular interest is the application of the protocols these standards describe to real-world searching of library literature found in scholarly databases, library catalogs and internally collected literature. These protocols include Z39.50, Search/Retrieve via URL (SRU), Search/Retrieve Web Service (SRW) and Contextual Query Language (CQL), as well as Metasearch XML Gateway (MXG).

Description

Libraries must compete with the web to capture users who often do not consider the wealth of information resources the library provides. This has been an issue only in the last decade. Before that, most users, including academic and specialty library users such as corporate users, went to a physical library for their research. With the rise of web-based information, users have become accustomed to easy keyword searching from web pages, where sources range from established authorities to the complete opposite. Libraries have responded by attempting to provide easy search interfaces on top of complex materials that have been cataloged and indexed according to controlled vocabularies and other metadata tools. These tools have enabled users to find information effectively for decades. In some cases the issue is merely one of education, which most researchers lack. So are these metasearch systems ultimately a step backward to accommodate the new search community, or do they really address the need to find information as the volume of information continues to grow exponentially?

Summary of Findings

I chose this topic to become more familiar with the origins of these metasearch technologies and with how they begin as published standards and eventually become working products that serve users daily.

The articles detailed below range from very technical discussions of how the standards are implemented to case studies in which a metasearch system was implemented to solve specific problems. A few articles examine user behavior in the context of experiments with metasearch systems. The knowledge gained from such studies enabled further program development better suited to the needs of that user population (Jung 2008, Entry 2). In the end, despite best efforts, users were still drawn to the search tools provided by the web rather than those provided by the library, although users who are educated in library databases are more drawn to using the library than the web. This is a crucial point, since large amounts of money and time are spent trying to make searching easier or more like the web when, in some cases, the effort might not deliver the payoff everyone is looking for. Conversely, with so much literature being produced, the standards and protocols implemented as metasearch continue to improve, to the extent that the merger of precise results and ease of use might soon become a reality. The problem is certainly understood by those working in this field, and the use of metadata to address it continues to advance.

Some articles that I found but did not include in this bibliography would be of use to those interested in the topic, but they would not be considered academic scholarship. These appeared in publications such as Computers in Libraries and Library Journal and focused more on personal accounts of the technology and opinions of the tools provided by vendors. They are helpful for getting to know what is available and could be the source of another type of bibliography, but they were not appropriate for this project. Deciding when to exclude a paper was not always easy.

Bibliography

Entry 1

Tyagi, A. K., Dhanwantari, M. M., Raghuraman, A., & Kalbhor, P. D. (2009). Improving Visibility of Libraries through SRU. Journal of Library & Information Technology, 29(3), 12-15.

Annotation: The paper discusses the need for a single search interface and a common results format across multiple library databases, based on the Search/Retrieve via URL (SRU) and Search/Retrieve Web Service (SRW) standards issued by the Library of Congress. These standards define a web-based method for exchanging queries and responses, which gives developers an application framework for building search applications that span multiple databases. In this framework a gateway handles communication between the user application and each database, and the databases must be SRU or SRW enabled. The article goes into detail about how this gateway can be implemented. The authors also discuss test-case scenarios with various SRU-compliant library catalogs.

Abstract: Library users nowadays expect digital libraries to be searchable through a single search form. The ultimate goal is to provide a high ranking search quality to enable access to documents containing relevant information from all participating libraries. The Information Centre and Library at the Defence Institute of Advanced Technology (DIAT), Deemed University (DU), at Pune, aims to ensure wide dissemination of knowledge, through an innovative pilot project and achieve a gradual change towards indigenous knowledge storage. The present work is intended to develop a client gateway using search and retrieval via URL (SRU) protocol for searching SRU-compliant databases, keeping in mind trials done in information retrieval in the 1980s and 90s. The client has been developed, tested and implemented successfully with some limitations. Efforts are on to extend the facility for searching DSpace that conforms to industry standard for developing IR's. We are in the process of developing and implementing a server for DSpace to make it SRU compliant. The client part however, has been developed, and tested successfully, though with some limitations. Keywords: Information retrieval standards, digital libraries, client gateway, databases
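As a rough illustration of the kind of request an SRU client gateway sends, the sketch below builds a searchRetrieve URL in Python. The parameter names (operation, version, query, startRecord, maximumRecords, recordSchema) come from the SRU 1.x specification. The base URL, the helper name build_sru_url, the "dc" schema short name and the sample query are assumptions made for illustration and are not taken from the article.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url, cql_query, start_record=1, maximum_records=10,
                  record_schema="dc", version="1.2"):
    """Return an SRU searchRetrieve URL (hypothetical helper, not from the paper)."""
    params = {
        "operation": "searchRetrieve",   # SRU operation name
        "version": version,              # SRU protocol version requested
        "query": cql_query,              # the search itself, written in CQL
        "startRecord": start_record,     # 1-based position of first record wanted
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,   # assumed short name; servers may differ
    }
    # urlencode percent-encodes quotes, spaces and '=' inside the CQL query.
    return f"{base_url}?{urlencode(params)}"


# The host below is a placeholder, not a real SRU endpoint.
url = build_sru_url("https://sru.example.org/catalog",
                    'dc.title = "digital libraries"')
print(url)

# Decoding the query string again recovers the original CQL query.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Because the whole request is an ordinary URL, any SRU-compliant catalog can in principle be queried with the same helper by changing only the base URL.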

Search Strategy: I chose LISA (Library and Information Science Abstracts) because of the subject focus of the database. I started with a keyword search and then narrowed it down based on descriptors in the most relevant records.

Database: Library and Information Science Abstracts LISA [Dialog]

Method of Searching: Keyword searching

Search String: SRU? OR SEARCH()RETRIEVAL(2N)URL OR FEDERATED()SYSTEM? AND (CHALLENGE? OR DIFFICULT?)

Entry 2

Jung, S. (2008). LibraryFind: System design and usability testing of academic metasearch system. Journal of the American Society for Information Science and Technology, 59(3), 375-389.

Abstract: Using off-the-shelf search technology provides a single point of access into library resources, but we found that such commercial systems are not entirely satisfactory for the academic library setting. In response to this, Oregon State University (OSU) Libraries designed and deployed LibraryFind, a metasearch system. We conducted a usability experiment comparing LibraryFind, the OSU Libraries Web site, and Google Scholar. Each participant used all three search systems in a controlled setting, and we recorded their behavior to determine the effectiveness and efficiency of each search system. In this article, we focus on understanding what factors are important to undergraduates in choosing their primary academic search system for class assignments. Based on a qualitative and quantitative analysis of the results, we found that mimicking commercial Web search engines is an important factor to attract undergraduates; however, when undergraduates use these kinds of search engines, they expect similar performance to Web search engines, including factors such as relevance, speed, and the availability of a spell checker. They also expected to be able to find out what kinds of content and materials are available in a system. Participants' prior experience using academic search systems also affected their expectations of a new system.

Annotation: The paper studies the behavior of undergraduate students with respect to finding information electronically. OSU attempted to build a Google-like simple search interface to multiple local and third-party licensed databases in order to draw students into using library materials instead of the web exclusively. The study found that students wanted a simple single search interface that quickly returned relevant results. Even after OSU built LibraryFind, which incorporated metasearch elements as well as results recommendations, students with limited knowledge of library database searching found Google Scholar easier to use and preferred it over both LibraryFind and the Library's web site, which served as a direct gateway to the databases.

Search Strategy: Related Record in Web of Science.

Database: Web of Science

Method of Searching: Keyword field search with recommendation

Search String: in the title

Entry 3

Kennedy, P. (May 2008). Manifestations of metadata: from Alexandria to the Web--old is new again. The Australian Library Journal, 57, 2. p.128(19). Retrieved March 13, 2010, from Academic OneFile via Gale:

Abstract: This paper is a discussion of the use of metadata, in its various manifestations, to access information. Information management standards are discussed. The connection between the ancient world and the modern world is highlighted. Individual perspectives are paramount in fulfilling information seeking. Metadata is interpreted and reflected upon in a broad sense, using its literal meaning.

Annotation: This paper addresses the use of metadata throughout history as a natural extension of the human need to organize the surrounding world. It identifies the historical roots of metadata and traces a consistent theme through standards past and present. It also surveys the various metadata standards in use in libraries, with a focus on their use in databases and information retrieval.

Search Strategy: Chose INFOSCI to expand my search beyond LISA.

Database: INFOSCI Dialog

Method of Searching: Known controlled vocabulary from LISA in the descriptor field combined with terms in the abstract.

Search String: (INTERNET AND STANDARDS/DE AND META?/DE AND INFORM?/AB) AND PY=2006:2010

Entry 4

LeVan, R. (2006). OpenSearch and SRU: A Continuum of Searching. Information Technology and Libraries, 25(3), 151-3. Retrieved March 13, 2010, from Library Lit & Inf Full Text database.

Abstract: Not all library content can be exposed as HTML pages for harvesting by search engines such as Google and Yahoo!. If a library instead exposes its content through a local search interface, that content can then be found by users of metasearch engines such as A9 and Vivísimo. The functionality provided by the local search interface will affect the functionality of the metasearch engine and the findability of the library’s content. This paper describes that situation and some emerging standards in the metasearch arena that choose different balance points between functionality and ease of implementation.

Annotation: This paper provides an overview of the problem of producing consistent result sets in metasearching. It advocates the use of SRU, OpenSearch and other metasearch standards when implementing a single search interface to library-controlled content sources, including the OPAC and licensed databases.
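To make the "ease of implementation" end of this continuum concrete, the sketch below fills in an OpenSearch-style URL template in Python. The {searchTerms} and {startIndex} parameter names and the trailing "?" that marks a parameter as optional follow the OpenSearch 1.1 URL template syntax. The template URL, the function name fill_template and the sample values are assumptions for illustration only and do not come from LeVan's article.

```python
import re
from urllib.parse import quote_plus

# Matches {name} or {name?}; the optional "?" marks a parameter as optional.
_PARAM = re.compile(r"\{([A-Za-z:]+)(\?)?\}")


def fill_template(template, values):
    """Substitute values into an OpenSearch-style URL template (hypothetical helper)."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote_plus(str(values[name]))  # URL-encode the supplied value
        if optional:
            return ""                             # optional and absent: leave empty
        raise KeyError(f"missing required template parameter: {name}")
    return _PARAM.sub(substitute, template)


# Placeholder template of the kind a library's description document might publish.
template = "https://opac.example.org/search?q={searchTerms}&start={startIndex?}&format=rss"

print(fill_template(template, {"searchTerms": "metasearch standards"}))
print(fill_template(template, {"searchTerms": "metasearch standards", "startIndex": 11}))
```

A client needs nothing more than string substitution to query such a server, which is the trade-off against the richer query semantics that SRU offers.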

Search Strategy: Chose INFOSCI to expand my search beyond LISA.

Database: INFOSCI Dialog

Method of Searching: Known controlled vocabulary from LISA in the descriptor field combined with terms in the abstract.

Search String: (INTERNET AND STANDARDS/DE AND META?/DE AND INFORM?/AB) AND PY=2006:2010

Entry 5

Alling, E. & Naismith, R. (2007). Protocol Analysis of a Federated Search Tool: Designing for Users. Internet Reference Services Quarterly, 12(1), 195-210. doi:10.1300/J136v12n01_10

Abstract: Librarians at Springfield College conducted usability testing of Endeavor's federated search tool, ENCompass for Resource Access. The purpose of the testing was to make informed decisions prior to customizing the look and function of the software's interface in order to make the product more usable for their patrons. Protocol, or think-aloud, analysis was selected as a testing and analysis method. Subjects from the general college community were recruited and given a list of tasks to perform on ENCompass, and they were asked to speak all of their thoughts out loud as they worked. Upon analyzing the test results, researchers found that subjects' problems fell into certain categories, such as unfamiliarity with terms or navigation from screen to screen. The researchers were able to use their findings to recommend extensive revisions to the interface, which improved usability for this library's patrons.

Annotation: This paper is a usability study of ENCompass, a federated search product from Endeavor Information Systems, undertaken primarily because the product's interface can be customized. Students' reactions to searching the system were recorded on videotape and later analyzed to determine usability. The study discovered, more or less empirically, what users found difficult about federated search, and the researchers used that information to enhance their particular implementation.

Search Strategy: Chose INFOSCI to expand my search beyond LISA.

Database: INFOSCI Dialog

Method of Searching: Known controlled vocabulary from LISA in the descriptor field combined with terms in the abstract.

Search String: (INTERNET AND STANDARDS/DE AND META?/DE AND INFORM?/AB) AND PY=2006:2010

Entry 6

LeVan, Ralph R., Hickey, Thomas B., and Jenny Toves. 2005. Parallel Text Searching on a Beowulf Cluster using SRW. D-Lib Magazine 11, no. 9. http://www.dlib.org/dlib/september05/levan/09levan.html (accessed November 2, 2005).

Abstract: While the news is full of reports of the success of the Internet search engines at searching billions of web pages at prices so low that they can afford to give the searching away for free, such affordable searching is not common in the rest of the world. What searching is available in the rest of the world is not scalable, not cheap or not fast. Often it suffers from a combination of those flaws.

This article describes our experience building a scalable, relatively inexpensive, and fast searching framework that demonstrated 172 searches per second on a database of 50 million records. The article should be of interest to anyone seeking an inexpensive, open source, text-searching framework that scales to extremely large databases. The technology described uses the SRW (Search/Retrieve Web) service in a manner nearly identical to federated searching in the metasearch community and should be of interest to anyone doing federated searching.

Annotation: The authors undertake an experiment to build a large database search system based on the SRW (Search/Retrieve Web Service) protocol, which simulates the type of search and retrieval found in commercial metasearch applications. Using open source cluster technology (Beowulf), the authors were able to build a system that exceeded their original searches-per-second expectations for the target price.
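The general pattern behind this kind of system, sending one query to several partitions in parallel and merging the answers, can be sketched in a few lines of Python. This is a generic fan-out illustration and not the authors' implementation: the in-memory PARTITIONS list stands in for per-node databases, the titles in it are invented sample data, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented sample data: each inner list stands in for one node's slice of the database.
PARTITIONS = [
    ["Metasearch standards overview", "Cataloging rules"],
    ["SRW and federated metasearch", "Beowulf cluster design"],
    ["Usability of library portals"],
]


def search_partition(records, term):
    """Case-insensitive substring search over one partition."""
    term = term.lower()
    return [record for record in records if term in record.lower()]


def federated_search(term, partitions=PARTITIONS):
    """Send the same query to every partition concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # map() preserves partition order; each call runs in its own worker thread.
        per_node = pool.map(search_partition, partitions, [term] * len(partitions))
        merged = [hit for hits in per_node for hit in hits]
    return sorted(merged)  # simple alphabetical merge, standing in for real ranking


print(federated_search("metasearch"))
```

In a real deployment each call to search_partition would be a network request to a separate server, so the overall response time is governed by the slowest partition rather than by the sum of all of them.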

Search Strategy: Reading through the source article.

Database: N/A

Method of Searching: Footnote chasing

Search String: Referenced in Reiss, K. (2007). SRU, Open Data and the Future of Metasearch. Internet Reference Services Quarterly, 12(3), 369-386. doi:10.1300/J136v12n03_09

Entry 7

Reiss, K. (2007). SRU, Open Data and the Future of Metasearch. Internet Reference Services Quarterly, 12(3), 369-386. doi:10.1300/J136v12n03_09

Abstract: Search/Retrieve via URL (SRU) is a REST-ful Web service defined by the Library of Congress that supports the standardized, machine-readable transmission of textual queries via HTTP using XML. This article evaluates the suitability of SRU for wide implementation by information providers of all types as a tool to enable effective metasearch services. SRU proves to be an excellent choice for commercial publishers, content aggregators, Internet search engines, and digital library maintainers who wish to make their content available to metasearch tools. This article also presents a hypothetical scenario illustrating how a library could use SRU compliant information resources to implement a metasearch service. doi:10.1300/J136v12n03_09

Annotation: This paper provides a detailed examination of SRU and related protocols for implementing gateways into databases for metasearch applications. The paper describes several methodologies for this implementation with case histories involving libraries around the world.
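Since SRU returns its results as XML, a metasearch tool has to parse each response before it can merge records from several sources. The Python sketch below shows one way this might look. The srw namespace URI and the numberOfRecords, records, record and recordData element names are those of SRU 1.1/1.2, and the dc namespace is standard Dublin Core. The sample response is invented and simplified for illustration (real servers usually wrap the Dublin Core fields in a schema-specific container element), and parse_sru_response is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "srw": "http://www.loc.gov/zing/srw/",        # SRU 1.1/1.2 response namespace
    "dc": "http://purl.org/dc/elements/1.1/",     # Dublin Core elements
}

# Invented, simplified response used only to exercise the parser.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/"
                            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:version>1.2</srw:version>
  <srw:numberOfRecords>2</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordData><dc:title>First sample title</dc:title></srw:recordData>
      <srw:recordPosition>1</srw:recordPosition>
    </srw:record>
    <srw:record>
      <srw:recordData><dc:title>Second sample title</dc:title></srw:recordData>
      <srw:recordPosition>2</srw:recordPosition>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""


def parse_sru_response(xml_text):
    """Return (total hit count, list of titles) from an SRU searchRetrieve response."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    titles = []
    for record in root.iterfind(".//srw:record", NS):
        # Search at any depth so a wrapper element around the DC fields is tolerated.
        title = record.findtext(".//dc:title", namespaces=NS)
        if title is not None:
            titles.append(title)
    return total, titles


total, titles = parse_sru_response(SAMPLE_RESPONSE)
print(total, titles)
```

Because every compliant server uses the same response envelope, the same parsing code can be reused across sources, which is much of what makes SRU attractive for metasearch.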

Search Strategy: I chose LISA (Library and Information Science Abstracts) because of the subject focus of the database. I started with a keyword search and then narrowed it down based on descriptors in the most relevant records.

Database: Library and Information Science Abstracts Dialog

Method of Searching: Keywords

Search String: SRU? OR SEARCH()RETRIEVAL(2N)URL OR FEDERATED()SYSTEM? AND (CHALLENGE? OR DIFFICULT?)

Entry 8

Avrahami, T. T. (2005). The FedLemur project: Federated search in the real world. Journal of the American Society for Information Science and Technology, 57(3), 347-358. doi:10.1002/asi.20283

Abstract: Federated search and distributed information retrieval systems provide a single user interface for searching multiple full-text search engines. They have been an active area of research for more than a decade, but in spite of their success as a research topic, they are still rare in operational environments. This article discusses a prototype federated search system developed for the U.S. government's FedStats Web portal, and the issues addressed in adapting research solutions to this operational environment. A series of experiments explore how well prior research results, parameter settings, and heuristics apply in the FedStats environment. The article concludes with a set of lessons learned from this technology transfer effort, including observations about search engine

Annotation: The paper describes a federated search system, developed in-house, that retrieves statistics from multiple federal government databases. It was built on top of an existing search system, which was extended for federated searching. The paper provides extensive details on all aspects of the project, including metrics for searching with particular keywords and concepts.

Search Strategy: I chose LISA (Library and Information Science Abstracts) because of the subject focus of the database. I started with a keyword search and then narrowed it down based on descriptors in the most relevant records.

Database: Library and Information Science Abstracts Dialog

Method of Searching: Keywords

Search String: (SRU OR SEARCH()RETRIEVAL(2N)URL? OR FEDERATED?)

Entry 9

Freund, L., Nemmers, J. R. & Ochoa, M. N. (2007). Metasearching -- An Annotated Bibliography. Internet Reference Services Quarterly, 12(3), 411-430. doi:10.1300/J136v12n03_11

Abstract: This is a selective annotated bibliography of journal articles and electronic resources relating to metasearching and metasearch tools. This bibliography is intended as a guide to the resources available to assist librarians and other information professionals interested in implementing or enhancing their metasearch projects. In addition to reviewing the major topics addressed in the publications, each annotation summarizes the important conclusions reached by the authors of the works. doi:10.1300/J136v12n03_11

Annotation: An excellent survey of metasearch products, specific user experiences and case studies. Over 25 annotations are included, mostly from a user and librarian perspective, with a focus on the need for such systems in the age of Google. The listing is less technical than other articles in this bibliography, but it provides a useful arc on the theme of practical usability in actual context.

Search Strategy: I chose LISA (Library and Information Science Abstracts) because of the subject focus of the database. I started with a keyword search and then narrowed it down based on descriptors in the most relevant records.

Database: Library and Information Science Abstracts Dialog

Method of Searching: Keywords

Search String: (SRU OR SEARCH()RETRIEVAL(2N)URL? OR FEDERATED?)

Entry 10

Walker, J. (2006). Cross-Provider Search -- New Standards for Metasearch. The Serials Librarian, 50(1), 125-135. doi:10.1300/J123v50n01_12

Abstract: This article, one of two in these proceedings addressing the issue of cross-provider search, concentrates on metasearch and the National Information Standards Organization (NISO) initiative to create and promote standards in this area. Metasearch is the process whereby a user can search heterogeneous resources simultaneously, through a single query form, and receive results back in a consistent way that will enable both merging of the results and general reuse of the results. The NISO Metasearch Initiative addresses three key aspects of metasearch-access management, collection description and search and retrieve.

Annotation: This paper outlines the issues facing metasearch in libraries and provides a detailed history of the various metasearch standards coming from NISO. The close connection between the standards and their real-world implementations is well presented, and the paper gives a good overview for readers without a programming background who need a quick yet thorough introduction.

Search Strategy: I chose LISA (Library and Information Science Abstracts) because of the subject focus of the database. I started with a keyword search and then narrowed it down based on descriptors in the most relevant records.

Database: Library and Information Science Abstracts Dialog

Method of Searching: Keywords

Search String: INTERNET AND STANDARDS/DE AND XML/DE AND METADATA

Entry 11

Sanderson, Robert, Young, Jeffrey, and Ralph LeVan. 2005. SRW/U with OAI: Expected and Unexpected Synergies. D-Lib Magazine 11, no. 2. http://www.dlib.org/dlib/february05/sanderson/02sanderson.html (accessed November 3, 2005).

Abstract: SRW/U (the Search/Retrieve Web service) and OAI (Open Archives Initiative) are both modern information retrieval protocols developed by distinct groups from different backgrounds at around the same time. This article sets out to briefly contrast the two protocols' aims and approaches, and then to look at some novel ways in which they have been or may be usefully co-implemented. While using SRW as a search service to an OAI repository or aggregated data set is an obvious synergy, there are also many other useful architectures that can be constructed without bending the protocols' semantics.

Annotation: This article provides a historical and technical overview of two metadata-related protocols, SRW and OAI (Open Archives Initiative), which have converged to a certain extent around the use of web services, particularly the SRW protocol for metasearch. The authors provide detailed examples of using OAI with SRW to retrieve records and maintain control over the display format of results.

Search Strategy: Reading through the source article.

Database: N/A

Method of Searching: Footnote chasing

Search String: Referenced in Reiss, K. (2007). SRU, Open Data and the Future of Metasearch. Internet Reference Services Quarterly, 12(3), 369-386. doi:10.1300/J136v12n03_09

Entry 12

Denenberg, Ray (2009) Search Web Services - The OASIS SWS Technical Committee Work: The Abstract Protocol Definition, OpenSearch Binding, and SRU/CQL 2.0

Abstract: The OASIS Search Web Services Technical Committee is developing search and retrieval web services, integrating various approaches under a unifying model, an Abstract Protocol Definition. SRU/CQL and OpenSearch are the two approaches featured by the current work, and we hope that additional protocols will be similarly integrated into this model.

The model provides for the development of bindings. Three bindings will be developed by the Committee: SRU 1.2, OpenSearch, and SRU 2.0. These three are so-called "static" bindings; they are human-readable documents. The first two are simply renderings of the respective existing specifications. The SRU 2.0 binding however is a major new version of SRU, and there will also be a new version of the companion query language, CQL 2.0. The model also defines the concept of a "dynamic" binding, a machine-readable description file that a server provides for retrieval by a client that may then dynamically configure itself to access that server. The premise of the dynamic binding concept is that any server – even one that pre-dated the concept – need only provide a self-description in order to be accessible. A client will be able to access the server simply by reading and interpreting the description and, based on that description, formulating a request (including a query) and interpreting the response. Of course, the premise behind this concept is a standard description language, and that will also be part of the OASIS work.

Annotation: This paper outlines very recent technical work on SRU and CQL (Contextual Query Language) by OASIS, a group responsible for the larger development of XML-based web service technologies, which in turn relate to the much broader field of application development over the web. Other metasearch protocols, such as OpenSearch, are also discussed. Technical details are presented with real-world examples.
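For readers unfamiliar with CQL, the short Python sketch below assembles a simple query of the "index relation term" form and joins clauses with the boolean operator "and". The dc.title and dc.creator index names belong to the Dublin Core context set. The helper names (cql_term, cql_clause, cql_and) are hypothetical, and the quoting rule is deliberately simplified: it covers whitespace, a few special characters and reserved words, not the full grammar in the specification.

```python
# Words with special meaning in CQL; quoted here when used as search terms.
_RESERVED = {"and", "or", "not", "prox", "sortby"}


def cql_term(term):
    """Quote a term when it is empty, reserved, or contains spaces or special characters."""
    needs_quotes = (
        not term
        or term.lower() in _RESERVED
        or any(ch.isspace() or ch in '"()=<>/' for ch in term)
    )
    if not needs_quotes:
        return term
    # Inside a quoted term, an embedded double quote is escaped with a backslash.
    return '"' + term.replace('"', '\\"') + '"'


def cql_clause(index, relation, term):
    """Build one search clause such as: dc.title = "federated search"."""
    return f"{index} {relation} {cql_term(term)}"


def cql_and(*clauses):
    """Combine clauses with the boolean 'and', parenthesising each for clarity."""
    return " and ".join(f"({clause})" for clause in clauses)


query = cql_and(
    cql_clause("dc.title", "=", "federated search"),
    cql_clause("dc.creator", "=", "Reiss"),
)
print(query)
```

A string like this is what travels in the query parameter of an SRU request, so the same query can be sent unchanged to any server that supports the indexes it names.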

Search Strategy: I chose LISA (Library and Information Science Abstracts) because of the subject focus of the database. I started with a keyword search and then narrowed it down based on descriptors in the most relevant records.

Database: Library and Information Science Abstracts Dialog

Method of Searching: Keywords

Search String: SRU? OR SEARCH()RETRIEVAL(2N)URL OR FEDERATED()SYSTEM? AND (CHALLENGE? OR DIFFICULT?)