Enterprise Search Technology Using Solr and Cloud Padmavathy Ravikumar Governors State University

Governors State University
OPUS Open Portal to University Scholarship
All Capstone Projects, Student Capstone Projects
Spring 2015

Part of the Databases and Information Systems Commons

Recommended Citation
Ravikumar, Padmavathy, "Enterprise Search Technology Using Solr and Cloud" (2015). All Capstone Projects. 91.
http://opus.govst.edu/capstones/91

ENTERPRISE SEARCH TECHNOLOGY USING SOLR AND CLOUD
By
Padmavathy Ravikumar

Masters Project
Submitted in partial fulfillment of the requirements
For the Degree of Master of Science,
With a Major in Computer Science

Governors State University
University Park, IL 60484
Fall 2014

Abstract

Solr is a popular, fast, open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable, and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more. Solr powers the search and navigation features of many of the world's largest internet sites.

Databases and Solr have complementary strengths and weaknesses. SQL supports only simple wildcard-based text search with basic normalization, such as matching upper case to lower case, and such searches generally require full table scans. In Solr, all searchable words are stored in an "inverted index", which can be searched orders of magnitude faster.

Solr is a standalone or cloud enterprise search server with a REST-like API. Documents are put into it (called "indexing") via XML, JSON, CSV, or binary over HTTP. It is queried via HTTP GET and returns XML, JSON, CSV, or binary results.

The project will be implemented using Amazon/Azure cloud, Apache Solr, Windows/Linux, MS-SQL Server, and open source tools. The software requirements of the project are as follows:

Content         Description
OS              Windows / Linux / Unix
Database        MS-SQL Server 2012
Technologies    Apache Solr
Cloud           Amazon, Azure
Browser         IE, Chrome, Firefox

Contents

LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
  OVERVIEW
    Data and Information Technology
    Searching and Search Engines
    Database Approach to Searching
    Apache Solr and Lucene Approach to Searching
  FOCUS
  TOOLS
  CONCEPTS AND TERMINOLOGIES
  INDEXING
TOOLS OF THE TRADE
  LUCENE
    Lucene Architecture
    Lucene Functions:
  SOLR
    Solr Features
    Solr Concepts and Terminologies
    Solr Cloud Architecture
    Solr/Lucene Architecture
    Solr History
    Getting started with Solr
CHAPTER 3: GETTING STARTED
  VIEWING INSTANCES IN SERVER
  ACCESSING INSTANCES
  INDEXING DATA
  SEARCHING DATA
EXAMPLE 1
  AIRPORT APPLICATION
REFERENCES:

LIST OF FIGURES

Fig 1. Yahoo Website
Fig 2. Indexing and Searching
Fig 3. Indexing
Fig 4. Indexing Process
Fig 5. Querying
Fig 6. Inverted Indexing
Fig 8. Lucene Architecture
Fig 9. Solr
Fig 10. Solr Cloud Architecture
Fig 11. Lucene/Solr Architecture
Fig 12. Solr History
Fig 13. Solr Master Slave Architecture
Fig 14. Solr in Web Architecture
Fig 15. Enterprise Search Architecture
Fig 16. Amazon AWS EC2
Fig 17. View Instance
Fig 18. Instance Properties
Fig 19. RedHat_Solr1
Fig 20. Putty Key Generator
Fig 21. Putty Configuration
Fig 22. Terminal Window 1
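
The abstract contrasts a database-style scan of every row with a lookup in an inverted index. The following is a minimal sketch of that contrast in Python, not the project's code and not how Lucene is implemented internally: the sample documents, the whitespace tokenizer, and all function names are assumptions made purely for illustration, and Lucene's real index structures are considerably more elaborate.

```python
from collections import defaultdict

# Hypothetical sample documents (id -> text); not taken from the project.
DOCS = {
    1: "Solr is an enterprise search platform",
    2: "SQL supports wildcard text search",
    3: "Lucene stores words in an inverted index",
}


def tokenize(text):
    """Lowercase and split on whitespace: a deliberately simple normalization."""
    return text.lower().split()


def scan_search(docs, word):
    """Scan approach: every document is examined for every query."""
    word = word.lower()
    return sorted(doc_id for doc_id, text in docs.items()
                  if word in tokenize(text))


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def index_search(index, word):
    """Index approach: a single dictionary lookup per query word."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_inverted_index(DOCS)
print(scan_search(DOCS, "search"))    # [1, 2]
print(index_search(INDEX, "search"))  # [1, 2]
```

Both functions return the same matches. The difference is where the work happens: the scan re-reads all documents on every query, while the index pays that cost once at indexing time and answers each query with a lookup.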
