Indexing and Searching Document Collections Using Lucene

University of New Orleans
ScholarWorks@UNO
University of New Orleans Theses and Dissertations
5-18-2007

Sridevi Addagada, University of New Orleans

Follow this and additional works at: https://scholarworks.uno.edu/td

Recommended Citation
Addagada, Sridevi, "Indexing and Searching Document Collections using Lucene" (2007). University of New Orleans Theses and Dissertations. 1070. https://scholarworks.uno.edu/td/1070

This Thesis is protected by copyright and/or related rights. It has been brought to you by ScholarWorks@UNO with permission from the rights-holder(s). You are free to use this Thesis in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/or on the work itself. This Thesis has been accepted for inclusion in University of New Orleans Theses and Dissertations by an authorized administrator of ScholarWorks@UNO. For more information, please contact [email protected].

A Thesis Submitted to the Graduate Faculty of the University of New Orleans in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

by Sridevi Addagada
B.Tech. Jawaharlal Nehru Technology University, 2002
May 2007

ACKNOWLEDGEMENTS

I take this opportunity to thank my thesis advisor Dr. Shengru Tu. I thank him for giving me the freedom to explore the various possibilities in this fast growing field. I would also like to take this opportunity to thank my thesis committee members, Dr. Jing Deng and Dr. Adlai DePano.

I once again thank Dr. Shengru Tu, graduate coordinator, Department of Computer Science, UNO for guiding me all through my Master's program. I would also like to thank Dr. Mahdi Abdelguerfi, Chairman, Department of Computer Science, University of New Orleans, for his support. I would also like to express my gratitude to all the other professors in the department. I finally thank my parents and friend for their overwhelming support all the way.

Table of Contents

List of Figures
Abstract
1. INTRODUCTION
2. BACKGROUND
    2.1 History of Lucene
    2.2 Installing Lucene
    2.3 Lucene Ports
    2.4 Lucene Classes
    2.5 Indexing
        2.5.1 Structure of Lucene Index
        2.5.2 Factors affecting Indexing Speed
        2.5.3 In-Memory Indexing
        2.5.4 Multi-Threaded Indexing
        2.5.5 Optimizing a Lucene index
    2.6 Searching
        2.6.1 Uninformed search
        2.6.2 Informed search
        2.6.3 Adversarial search
        2.6.4 Interpolation search
    2.7 Analyzer
    2.8 Overview of Lucene Architecture and Lucene Applications
    2.9 Lucene in Action
        2.9.1 Indexing
        2.9.2 A Lucene Index
3. Application Design
    3.1 System Configuration
    3.2 Information Collection
    3.3 Database table management
    3.4 Indexing
    3.5 Searching
    3.6 Integrating keyword search with course property query
4. System Implementation
    4.1 Database queries
    4.2 Indexer
        4.2.1 Lucene Index Classes
        4.2.2 Indexing PDF files
        4.2.3 Indexing Microsoft Word and Rich Text Format Documents
        4.2.4 Using Lucene Index Classes
    4.3 Searcher
        4.3.1 Search classes in Lucene
        4.3.2 Searching PDF Documents
        4.3.3 Search Results
5. RESULTS
6. CONCLUSION / FUTURE WORK
7. REFERENCES
VITA

List of Figures

Figure 2.1: Structure of a Lucene index
Figure 2.2: Overview of Lucene applications
Figure 2.3: A Lucene Index
Figure 2.4: Lucene index screen capture
Figure 3.1: System Configuration GUI
Figure 3.2: Add/Edit course properties information
Figure 3.3: Add/Edit GUI Buttons
Figure 3.4: Database management
