Searching for Information on Occupational Accidents
SEARCHING FOR INFORMATION ON OCCUPATIONAL ACCIDENTS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Shih-Kwang Chen, M.S.

*****

The Ohio State University
2008

Dissertation Committee:
Professor Philip J. Smith, Adviser
Professor Jerald R. Brevick
Professor Steven A. Lavender

Approved by

Adviser
Industrial and Systems Engineering Graduate Program

Copyright by
Shih-Kwang Chen
2008

ABSTRACT

Effective retrieval of the documents most relevant to a topic of interest from the Internet is difficult because of the large amount of information available in many different formats. Studies have examined ways to improve information retrieval (IR). One approach to improving searches in large collections, such as the Web, is to take advantage of the semantic representations in pre-existing relational databases that were developed for explicit purposes other than supporting general Internet searches. To enhance IR on the Internet, a prototype topic-oriented search agent, SAOA-1, was developed that uses embedded semantics and domain-specific knowledge extracted from such a database. SAOA-1 is activated when a set of retrieved keywords appears related to the topic of “occupational accidents”. It then constructs an alternative search query, together with pruned lists of suggested refinements, by applying search engine knowledge and the domain-specific knowledge and semantics extracted from a relational database. Information seekers could then use the alternative search query as is, or refine it further with modified search queries suggested by SAOA-1. These suggestions are derived from SAOA-1's semantic representation of the topic of occupational accidents, which it uses to carry out context-sensitive pruning of the semantic neighborhood.

An empirical study was conducted to evaluate how useful SAOA-1 is in helping information seekers retrieve relevant documents.
Sixty participants were randomly assigned to one of two treatments, performing Internet searches either with or without the assistance of SAOA-1. Before searching, each participant chose a research topic based on two given articles addressing hand injuries in the workplace. The participants then performed searches and, when satisfied with the results, rated the relevance of the first forty documents returned by their final search to their research topics on a 1-to-5 Likert scale. It was hypothesized that the treatment type would have an overall effect on the expected rating. Based on the data collected, the effect of the treatment on the average expected rating was statistically significant (p < 0.001), with a slight improvement of 10% attributable to the treatment. From a practical perspective, however, the size of the effect was modest. The findings suggest that a topic-oriented search agent might help information seekers retrieve more relevant documents. They also point to important directions for further evaluating methods and settings that take advantage of such semantically based search techniques.

Dedicated to my family

ACKNOWLEDGMENTS

This dissertation could not possibly have been written without the dedicated support of my advisor, Dr. Phil Smith. My many thanks and appreciation go to him for “adopting” me as his graduate student. I am deeply indebted to him for his guidance, challenge, encouragement, and patience, and for his help in many ways. Thank you, Dr. Smith!

I am grateful to Drs. Jerald Brevick and Steven Lavender for serving on my committee and for their friendship and contributions. I wish to thank Drs. Mark McCord (Civil Engineering), Jane Fraser, and Gary Maul, who advised me during earlier stages of my study. I wish to thank the faculty and staff of the Industrial, Welding, and Systems Engineering department for their friendship during my years in the department, especially Drs. Clark Mount-Campbell, Jose Castro, James Connors (Agriculture Education), David Dickinson, Ralph Gardner III (Special Education), Julie Higle, Robert Lundquist, and Allen Miller, who were involved in my program. I wish to thank Ms. Pam Hussen for always making me feel welcome, even during her busiest hours. I wish to thank Dr. William Harper for his encouragement and for spending his precious consulting time on my questions about statistics. I would like to thank Mr. Tim Watson at the Graduate School for his generous assistance.

I want to express my gratitude to Mr. Cedric Sze for his moral and financial support and for sharing his experiences and knowledge all these years. He has been my boss, my mentor, and my good friend. He is always positive and looks on the brighter side of things.

I wish to thank my colleagues in the department’s computer lab office for their friendship and their expertise in computers. Working with them significantly enhanced and broadened my computer skills. I wish to thank Dr. Mohammed Rahman at the Office of Information Technology for his statistics consulting service. I wish to thank Miss Laurie Brevick for serving as my editor. I want to thank my neighbors Kelly Jeppensen (Doctor-to-be) and Jeff Dotson (Ph.D.-to-be) for our frequent discussions on statistics; the final data analysis was run in Stata on Kelly’s laptop. I feel deeply blessed to have had many other friendships, with special thanks to Scott and Angie Kelly for being such wonderful neighbors. I’d like to thank Peir-En Yeh and her mother for treating my family as part of their own. I want to thank my buddy John Hoffman, Jr. He is my “twin” brother, my dictionary of English, and my map of the U.S.A. Thanks, Jack, for always being there for me!
Finally, I want to thank my wife Amy and sons Jerem and Jed, my mother-in-law Chiou-Tzu Lin Wang, my mother Nancy and father Sankuo, my sister Alice and her family (Oliver, Dennis, and Debbie), and my other sisters Joyce and Doris for their love and support. Special thanks to my Mom for helping Amy and me throughout these past two years.

VITA

2000: M.S., Industrial and Systems Engineering, The Ohio State University

FIELDS OF STUDY

Major Field: Industrial and Systems Engineering

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
2. Literature Review
2.1 Introduction
2.2 General IR techniques
2.2.1 Relational models
2.2.2 Character string models
2.2.2.1 Boolean models
2.2.2.2 Stemming
2.2.3 Statistical models
2.2.3.1 Vector space models
2.2.3.2 Latent semantic indexing models
2.2.3.3 Probabilistic models
2.2.4 Models based on browsing
2.2.5 Semantic representations
2.2.5.1 Thesaurus
2.2.5.2 Semantic Web
2.3 Post-retrieval processing
2.3.1 Ranking
2.3.2 Refinements
2.4 Search assistance provided by search engines
2.5 Summary
3. Software design: conceptual development and implementation
3.1 Introduction
3.2 Test results with the assistance of SAOA-1
3.3 The assistance of SAOA-1