INFORMATION RETRIEVAL USING LUCENE AND WORDNET: A Thesis

Total Pages: 16

File Type: PDF, Size: 1020 KB

INFORMATION RETRIEVAL USING LUCENE AND WORDNET

A Thesis Presented to The Graduate Faculty of The University of Akron in Partial Fulfillment of the Requirements for the Degree Master of Science

Jhon Whissel
December, 2009

Thesis Approved:
Advisor: Dr. Chien-Chung Chan
Committee Member: Dr. Zhong-Hui Duan
Committee Member: Dr. Kathy Liszka
Department Chair: Dr. Chien-Chung Chan

Accepted:
Dean of the College: Dr. Chand Midha
Dean of the Graduate School: Dr. George R. Newkome

ABSTRACT

This thesis outlines the use of Apache Lucene, its subproject Nutch, and WordNet as tools for information retrieval, the science of searching through data to obtain knowledge, which has become increasingly relevant in the Information Age. Lucene is a software library released by the Apache Software Foundation which provides information retrieval capabilities to programmers. Nutch, based on Lucene, adds Internet search functionality. WordNet is a lexical database which groups similar words into sets of synonyms, or synsets, and tracks their semantic relationships. This creates a combination of a dictionary and a thesaurus which may be browsed as such or used in software applications. Lucene and Nutch are released under the Apache Software License and are free and open source; WordNet is released under a similar license. The availability of free and open source tools such as Lucene, Nutch, and WordNet grants software developers a base from which they may create applications tailored to their specifications, while eliminating the need to write the entire code base for their projects from scratch. This practice can reduce programming time and lower software development costs. The remainder of this document outlines the use of these tools and then presents the methodology for integrating their combined capabilities into a search engine capable of online information retrieval. The discussion culminates with a demonstration of how WordNet may be employed to remove search query ambiguity.

DEDICATION

This thesis is dedicated to
My father, Fred Whissel, for inspiring creativity
My mother, Barbara Whissel, for inspiring perseverance
My aunt, Carol Decker, for her unwavering support
And to all of my family, for allowing me to stand on their shoulders
So that I might reach greater heights.

ACKNOWLEDGEMENTS

I would like to extend my most sincere gratitude to everyone who has played a role in my life and education throughout the years. The guidance, support, and inspiration of these numerous individuals has helped shape me into who I am today. Many thanks to Dr. Pelz, Peggy Speck, Chuck Van Tilburg, and the rest of the faculty and staff of the Department of Computer Science for both sharing their knowledge and lending their aid whenever it was called upon. Though I was unable to attend classes with all of you, there was not a single faculty member who did not contribute in some way to my education over the last several semesters. Thanks to the Department as well for awarding me an assistantship, providing me with the opportunity to teach at the University of Akron and an experience that I will never forget. Special thanks in particular to Dr. Chan for agreeing to become my mentor and aiding me with my research. The crucial guidance and direction that I received has bestowed upon me the skills needed to successfully complete this thesis.
Thanks to my Aunt Carol and the rest of my family for helping me through the challenges I have faced, and for demonstrating selfless compassion and understanding. Finally, I would like to offer my deepest thanks to my parents for raising me, instilling me with their wisdom, setting a good example for me, and molding my personality, and without whose sacrifices the life I know today would not be possible.

TABLE OF CONTENTS

LIST OF FIGURES

CHAPTER

I. INTRODUCTION
1.1. Society in Transition
1.2. The Information Revolution
1.3. Information as a Commodity
1.4. Impact of the Internet
1.5. Information Abstractions
1.6. Challenges of Information Management
1.7. Information Retrieval
1.8. Software Development in the Information Age
1.9. Free and Open Source Software Movements
1.10. Open Source Today
1.11. Drawbacks of Traditional Search Engines
1.12. Open Source Information Retrieval
1.13. Benefits of Open Source Development
1.14. WordNet
1.15. Combination of Open Source Tools
1.16. Working with Lucene and WordNet

II. APACHE LUCENE AND WORDNET
2.1. Introduction to Lucene
2.2. Features of Lucene
2.3. Lucene Drawbacks and Nutch
2.4. The Lucene Query Parser
2.5. Terms and Operators
2.6. Term Modifiers
2.7. Lucene Indexes
2.8. Index Segments
2.9. The WordNet Database
2.10. Word Relationships
2.11. Word Types
2.12. Word Types Excluded from WordNet
2.13. Relationships Between Word Types
2.14. Using WordNet
2.15. Type Relationships in WordNet
2.16. Noun Hypernyms and Hyponyms
2.17. Noun Holonyms and Meronyms
2.18. Verb Hypernyms and Hyponyms
2.19. Verb Entailments
2.20. Adjective Relationships
2.21. Adverb Relationships
2.22. Lexical Relationships in WordNet
2.23. Practical Applications of WordNet

III. IMPLEMENTING A WEB SEARCH ENGINE
3.1. Using Nutch
3.2. Installing the Java Environment
3.3. Selecting a Web Interface
3.4. Installing a Shell Environment
3.5. Web Crawling with Nutch
3.6. Large Scale vs. Small Scale Crawls
3.7. Advantages of Large Scale Crawls
3.8. Preparing Nutch for Crawling
3.9. Configuring Nutch
3.10. Creating
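The abstract above presents Lucene as a library that gives Java programmers indexing and search capabilities, and WordNet as a source of synsets that can be used to expand or disambiguate queries. The sketch below is not taken from the thesis; it is a minimal illustration of that combination using a Lucene 3.x-era API, in which two documents are indexed and a query term is OR-ed together with synonyms before searching. The SynonymSearchSketch class name and the tiny SYNONYMS map are illustrative stand-ins: a real pipeline would obtain synsets from the WordNet database (for example via a library such as JWNL) rather than from a hard-coded map.

```java
// Minimal sketch (not from the thesis): index two documents with Lucene and
// expand a query term with synonyms before searching.  The SYNONYMS map is a
// stand-in for a WordNet synset lookup.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymSearchSketch {
    // Hypothetical stand-in for WordNet synsets.
    private static final Map<String, List<String>> SYNONYMS = new HashMap<String, List<String>>();
    static {
        SYNONYMS.put("car", Arrays.asList("automobile", "auto"));
    }

    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
        Directory dir = new RAMDirectory();

        // Index two small documents.
        IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        writer.addDocument(makeDoc("Doc 1", "an automobile parked outside"));
        writer.addDocument(makeDoc("Doc 2", "a bicycle parked outside"));
        writer.close();

        // Expand the user term "car" with its synonyms, OR-ing the clauses together.
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("contents", "car")), BooleanClause.Occur.SHOULD);
        for (String syn : SYNONYMS.get("car")) {
            query.add(new TermQuery(new Term("contents", syn)), BooleanClause.Occur.SHOULD);
        }

        IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
        for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
            System.out.println(searcher.doc(hit.doc).get("title"));  // prints "Doc 1"
        }
        searcher.close();
    }

    private static Document makeDoc(String title, String text) {
        Document doc = new Document();
        doc.add(new Field("title", title, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("contents", text, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }
}
```

Using Occur.SHOULD for every clause keeps documents that match the original term while also surfacing documents that only contain a synonym, which is the behaviour a thesaurus-backed query expansion generally aims for.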
Recommended publications
  • Security Log Analysis Using Hadoop, by Harikrishna Annangi ([email protected])
    St. Cloud State University, theRepository at St. Cloud State, Culminating Projects in Information Assurance, Department of Information Systems, 3-2017. Recommended Citation: Annangi, Harikrishna, "Security Log Analysis Using Hadoop" (2017). Culminating Projects in Information Assurance. 19. https://repository.stcloudstate.edu/msia_etds/19. Security Log Analysis Using Hadoop, by Harikrishna Annangi. A Starred Paper Submitted to the Graduate Faculty of St. Cloud State University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Assurance, April, 2016. Starred Paper Committee: Dr. Dennis Guster, Chairperson; Dr. Susantha Herath; Dr. Sneh Kalia. Abstract: Hadoop is used as a general-purpose storage and analysis platform for big data by industries. Commercial Hadoop support is available from large enterprises, like EMC, IBM, Microsoft and Oracle, and Hadoop companies like Cloudera, Hortonworks, and Map Reduce. Hadoop is a framework written in Java that allows distributed processing of large data sets across clusters of computers using programming models. A Hadoop framework application works in an environment that provides storage and computation across clusters of computers (a minimal MapReduce sketch, not from this paper, appears after this list).
  • Building a Scalable Index and a Web Search Engine for Music on the Internet Using Open Source Software
    Department of Information Science and Technology Building a Scalable Index and a Web Search Engine for Music on the Internet using Open Source software André Parreira Ricardo Thesis submitted in partial fulfillment of the requirements for the degree of Master in Computer Science and Business Management Advisor: Professor Carlos Serrão, Assistant Professor, ISCTE-IUL September, 2010 Acknowledgments I should say that I feel grateful for doing a thesis linked to music, an art which I love and esteem so much. Therefore, I would like to take a moment to thank all the persons who made my accomplishment possible and hence this is also part of their deed too. To my family, first for having instigated in me the curiosity to read, to know, to think and go further. And secondly for allowing me to continue my studies, providing the environment and the financial means to make it possible. To my classmate André Guerreiro, I would like to thank the invaluable brainstorming, the patience and the help through our college years. To my friend Isabel Silva, who gave me a precious help in the final revision of this document. Everyone in ADETTI-IUL for the time and the attention they gave me. Especially the people over Caixa Mágica, because I truly value the expertise transmitted, which was useful to my thesis and I am sure will also help me during my professional course. To my teacher and MSc. advisor, Professor Carlos Serrão, for embracing my will to master in this area and for being always available to help me when I needed some advice.
  • Natural Language Processing Technique for Information Extraction and Analysis
    International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 2, Issue 8, August 2015, PP 32-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Natural Language Processing Technique for Information Extraction and Analysis T. Sri Sravya1, T. Sudha2, M. Soumya Harika3 1 M.Tech (C.S.E) Sri Padmavati Mahila Visvavidyalayam (Women’s University), School of Engineering and Technology, Tirupati. [email protected] 2 Head (I/C) of C.S.E & IT Sri Padmavati Mahila Visvavidyalayam (Women’s University), School of Engineering and Technology, Tirupati. [email protected] 3 M. Tech C.S.E, Assistant Professor, Sri Padmavati Mahila Visvavidyalayam (Women’s University), School of Engineering and Technology, Tirupati. [email protected] Abstract: In the current internet era, there are a large number of systems and sensors which generate data continuously and inform users about their status and the status of devices and surroundings they monitor. Examples include web cameras at traffic intersections, key government installations etc., seismic activity measurement sensors, tsunami early warning systems and many others. A Natural Language Processing based activity, the current project is aimed at extracting entities from data collected from various sources such as social media, internet news articles and other websites and integrating this data into contextual information, providing visualization of this data on a map and further performing co-reference analysis to establish linkage amongst the entities. Keywords: Apache Nutch, Solr, crawling, indexing 1. INTRODUCTION In today’s harsh global business arena, the pace of events has increased rapidly, with technological innovations occurring at ever-increasing speed and considerably shorter life cycles.
  • Building a Search Engine for the Cuban Web
    Building a Search Engine for the Cuban Web. Jorge Luis Betancourt, Search/Crawl Engineer. November 16-18, 2016, Seville, Spain. Who am I: Jorge Luis Betancourt González, Search/Crawl Engineer, Apache Nutch Committer & PMC, Apache Solr/ES enthusiast. Agenda: introduction & motivation; technologies used; customizations; conclusions and future work. Introduction / motivation: global search engines can't access documents hosted on the Cuban Intranet. Writing your own web search engine from scratch? Or ... Common search engine features: (1) web search over HTML & documents (PDF, DOC): highlighting, suggestions, filters (facets), autocorrection; (2) image search (size, format, color, objects): thumbnails, show metadata, filters (facets), match text with images; (3) news search (alerting, notifications): near real time; email, push, SMS. How to fulfill these requirements? At its core a search engine stores some information and retrieves this information when a question is received. Open source to the rescue: a crawler, an index server, and a web interface. "Nutch is a well matured, production ready Web crawler. Enables fine grained configuration, relying on Apache Hadoop™ data structures, which are great for batch processing." Apache Nutch: highly scalable, highly extensible, pluggable (parsing, protocols, storage, indexing, scoring), active community, Apache License. Apache Solr: 8M+ total downloads, 250,000+ monthly downloads, Apache License, great community, highly modular, stability / scalability.
  • Next Generation Catalogues: An Analysis of User Search Strategies and Behavior, by Fredrick Kiwuwa Lugya
    NEXT GENERATION CATALOGUES: AN ANALYSIS OF USER SEARCH STRATEGIES AND BEHAVIOR BY FREDRICK KIWUWA LUGYA. DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Library and Information Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2017. Urbana, Illinois. Doctoral Committee: Associate Professor Kathryn La Barre, Chair and Director; Assistant Professor Nicole A. Cooke; Dr. Jennifer Emanuel Taylor, University of Illinois Chicago; Associate Professor Carol Tilley. ABSTRACT The movement from online catalogues to search and discovery systems has not addressed the goals of true resource discoverability. While catalogue user studies have focused on user search and discovery processes and experiences, and construction and manipulation of search queries, little insight is given to how searchers interact with search features of next generation catalogues. Better understanding of user experiences can help guide informed decisions when selecting and implementing new systems. In this study, fourteen graduate students completed a set of information seeking tasks using UIUC's VuFind installation. Observations of these interactions elicited insight into both search feature use and user understanding of the function of features. Participants used the basic search option for most searches. This is because users understand that basic search draws from a deep index that always gives results regardless of search terms; and because it is convenient, appearing at every level of the search, thus reducing effort and shortening search time.
  • Chapter 2 Introduction to Big Data Technology
    Chapter 2 Introduction to Big Data Technology. Bilal Abu-Salih (1), Pornpit Wongthongtham (2), Dengya Zhu (3), Kit Yan Chan (3), Amit Rudra (3). (1) The University of Jordan, (2) The University of Western Australia, (3) Curtin University. Abstract: Big data is no more "all just hype" but is widely applied in nearly all aspects of our business, governments, and organizations with the technology stack of AI. Its influences are far beyond a simple technique innovation and involve all areas in the world. This chapter will first give a historical review of big data, followed by a discussion of the characteristics of big data, i.e. from the 3V's up to the 10V's of big data. The chapter then introduces technology stacks for an organization to build a big data application, from infrastructure/platform/ecosystem to constructional units and components. Finally, we provide some big data online resources for reference. Keywords: Big data, 3V of Big data, Cloud Computing, Data Lake, Enterprise Data Centre, PaaS, IaaS, SaaS, Hadoop, Spark, HBase, Information retrieval, Solr. 2.1 Introduction The ability to exploit the ever-growing amounts of business-related data will allow us to comprehend what is emerging in the world. In this context, Big Data is one of the current major buzzwords [1]. Big Data (BD) is the technical term used in reference to the vast quantity of heterogeneous datasets which are created and spread rapidly, and for which the conventional techniques used to process, analyse, retrieve, store and visualise such massive sets of data are now unsuitable and inadequate. This can be seen in many areas such as sensor-generated data, social media, uploading and downloading of digital media.
  • A Framework for Evaluating the Retrieval Effectiveness of Search Engines Dirk Lewandowski Hamburg University of Applied Sciences, Germany
    1 A Framework for Evaluating the Retrieval Effectiveness of Search Engines Dirk Lewandowski Hamburg University of Applied Sciences, Germany This is a preprint of a book chapter to be published in: Jouis, Christophe: Next Generation Search Engine: Advanced Models for Information Retrieval. Hershey, PA: IGI Global, 2012 http://www.igi-global.com/book/next-generation-search-engines/59723 ABSTRACT This chapter presents a theoretical framework for evaluating next generation search engines. We focus on search engines whose results presentation is enriched with additional information and does not merely present the usual list of “10 blue links”, that is, of ten links to results, accompanied by a short description. While Web search is used as an example here, the framework can easily be applied to search engines in any other area. The framework not only addresses the results presentation, but also takes into account an extension of the general design of retrieval effectiveness tests. The chapter examines the ways in which this design might influence the results of such studies and how a reliable test is best designed. INTRODUCTION Information retrieval systems in general and specific search engines need to be evaluated during the development process, as well as when the system is running. A main objective of the evaluations is to improve the quality of the search results, although other reasons for evaluating search engines do exist (see Lewandowski & Höchstötter, 2008). A variety of quality factors can be applied to search engines. These can be grouped into four major areas (Lewandowski & Höchstötter, 2008): • Index Quality: This area of quality measurement indicates the important role that search engines’ databases play in retrieving relevant and comprehensive results.
  • E-Book: an Introduction to Custom Search Engines for Websites
    E-book: An Introduction to Custom Search Engines for Websites (www.sitesearch360.com). Contents: An introduction to search engines; What is site search?; Why is site search important?; Navigation vs. search; How are people using search?; Why are custom search engines the perfect choice?; Semantic search; Natural search language; Actionable analytics; Your visitor's search journey: the key elements; Presentation — how does it look?; Processing — how does it work?; Results — how does it deliver?; Implementation challenges; JavaScript; Security; WordPress; Cloudflare and
  • Code Smell Prediction Employing Machine Learning Meets Emerging Java Language Constructs"
    Appendix to the paper "Code smell prediction employing machine learning meets emerging Java language constructs". Hanna Grodzicka, Michał Kawa, Zofia Łakomiak, Arkadiusz Ziobrowski, Lech Madeyski (B). The Appendix includes two tables containing the dataset used in the paper "Code smell prediction employing machine learning meets emerging Java language constructs". The first table contains information about 792 projects selected for the R package reproducer [Madeyski and Kitchenham(2019)]. These projects were the base dataset for creating the dataset used in the study (Table I). The second table contains information about 281 projects filtered by Java version from the Maven build tool (Table II), which were directly used in the paper.
    TABLE I: Base projects used to create the new dataset (# | Organisation | Project name | GitHub link | Commit hash | Build tool | Java version):
    1 | adobe | aem-core-wcm-components | www.github.com/adobe/aem-core-wcm-components | 1d1f1d70844c9e07cd694f028e87f85d926aba94 | other or lack of | unknown
    2 | adobe | S3Mock | www.github.com/adobe/S3Mock | 5aa299c2b6d0f0fd00f8d03fda560502270afb82 | MAVEN | 8
    3 | alexa | alexa-skills-kit-sdk-for-java | www.github.com/alexa/alexa-skills-kit-sdk-for-java | bf1e9ccc50d1f3f8408f887f70197ee288fd4bd9 | MAVEN | 8
    4 | alibaba | ARouter | www.github.com/alibaba/ARouter | 93b328569bbdbf75e4aa87f0ecf48c69600591b2 | GRADLE | unknown
    5 | alibaba | atlas | www.github.com/alibaba/atlas | e8c7b3f1ff14b2a1df64321c6992b796cae7d732 | GRADLE | unknown
    6 | alibaba | canal | www.github.com/alibaba/canal | 08167c95c767fd3c9879584c0230820a8476a7a7 | MAVEN | 7
    7 | alibaba | cobar | www.github.com/alibaba/
  • Apache Solr 3.1 Cookbook
    Apache Solr 3.1 Cookbook Over 100 recipes to discover new ways to work with Apache's Enterprise Search Server Rafał Kuć BIRMINGHAM - MUMBAI Apache Solr 3.1 Cookbook Copyright © 2011 Packt Publishing All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews. Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book. Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information. First published: July 2011 Production Reference: 1140711 Published by Packt Publishing Ltd. 32 Lincoln Road Olton Birmingham, B27 6PA, UK. ISBN 978-1-849512-18-3 www.packtpub.com Cover Image by Rakesh Shejwal ([email protected]) Credits Author Project Coordinator Rafał Kuć Srimoyee Ghoshal Reviewers Proofreader Ravindra Bharathi Samantha Lyon Juan Grande Indexer Acquisition Editor Hemangini Bari Sarah Cullington Production Coordinators Development Editor Adline Swetha Jesuthas Meeta Rajani Kruthika Bangera Technical Editors Cover Work Manasi Poonthottam Kruthika Bangera Sakina Kaydawala Copy Editors Kriti Sharma Leonard D'Silva About the Author Rafał Kuć is a born team leader and software developer.
  • Natural Language Processing for Online Applications Text Retrieval, Extraction and Categorization
    Natural Language Processing for Online Applications Natural Language Processing Editor Prof. Ruslan Mitkov School of Humanities, Languages and Social Sciences University of Wolverhampton Stafford St. Wolverhampton WV1 1SB, United Kingdom Email: [email protected] Advisory Board Christian Boitet (University of Grenoble) John Carroll (University of Sussex, Brighton) Eugene Charniak (Brown University, Providence) Eduard Hovy (Information Sciences Institute, USC) Richard Kittredge (University of Montreal) Geoffrey Leech (Lancaster University) Carlos Martin-Vide (Rovira i Virgili Un., Tarragona) Andrei Mikheev (University of Edinburgh) John Nerbonne (University of Groningen) Nicolas Nicolov (IBM, T.J. Watson Research Center) Kemal Oflazer (Sabanci University) Allan Ramsey (UMIST, Manchester) Monique Rolbert (Université de Marseille) Richard Sproat (AT&T Labs Research, Florham Park) K-Y Su (Behaviour Design Corp.) Isabelle Trancoso (INESC, Lisbon) Benjamin Tsou (City University of Hong Kong) Jun-ichi Tsujii (University of Tokyo) Evelyne Tzoukermann (Bell Laboratories, Murray Hill) Yorick Wilks (University of Sheffield) Volume 5 Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization by Peter Jackson and Isabelle Moulinier Natural Language Processing for Online Applications Text Retrieval, Extraction and Categorization Peter Jackson Isabelle Moulinier Thomson Legal & Regulatory John Benjamins Publishing Company Amsterdam / Philadelphia The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984. Library of Congress Cataloging-in-Publication Data Jackson, Peter, 1948- Natural language processing for online applications : text retrieval, extraction, and categorization / Peter Jackson, Isabelle Moulinier. p. cm. (Natural Language Processing, ISSN 1567-8202; v. 5) Includes bibliographical references and index.
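The "Security Log Analysis Using Hadoop" entry above describes Hadoop as a Java framework for distributed processing of large data sets across clusters using programming models. The sketch below is not from that paper; it is a minimal illustration of the MapReduce programming model being referred to, written against Hadoop's org.apache.hadoop.mapreduce API and counting log entries per severity level. The LogEventCount class name and the assumed "timestamp level message" log line format are illustrative assumptions.

```java
// Minimal MapReduce sketch (not from the cited paper): count log entries per
// severity level.  Assumes lines of the form "<timestamp> <level> <message...>".
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogEventCount {

    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Emit (level, 1) for each log line; the level is the second field.
            String[] parts = line.toString().split("\\s+", 3);
            if (parts.length >= 2) {
                ctx.write(new Text(parts[1]), ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text level, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            // Sum all the 1s emitted for this log level.
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();
            ctx.write(level, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log level counts");
        job.setJarByClass(LogEventCount.class);
        job.setMapperClass(EventMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, such a job would be launched with a command of the form "hadoop jar logcount.jar LogEventCount /logs/in /logs/out", and the framework distributes the map and reduce tasks across the cluster.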