Elasticsearch in Action MEAP
Total Page:16
File Type:pdf, Size:1020Kb
MEAP Edition Manning Early Access Program Elasticsearch in Action Version 11 Copyright 2014 Manning Publications For more information on this and other Manning titles go to www.manning.com ©Manning Publications Co. We welcome reader comments about anything in the manuscript - other than typos and other simple mistakes. These will be cleaned up during production of the book by copyeditors and proofreaders. http://www.manning-sandbox.com/forum.jspa?forumID=871 Licensed to zhailiang UNKNOWN <[email protected]> Welcome Thanks for purchasing the MEAP for Elasticsearch in Action. We're excited to see the book reach this stage, and we're looking forward to its continued development and eventual release. This is an intermediate level book, designed for anyone writing applications using Elasticsearch, or responsible for managing Elasticsearch in a production environment. We've worked to make the content both approachable and meaningful, and to explain not just how to do things with Elasticsearch, but also why things are done the way they are. We feel it's important to know Elasticsearch's foundations prior to diving into all of the different ways you can leverage it. We’re releasing the first five chapters to start. Chapter 1 explains what Elasticsearch is: what a typical use-case looks like, the tasks it's good at, and the challenges it faces. By the end of chapter one, you'll know if Elasticsearch is likely to be a good fit for you and what you can expect from it. You will also learn various ways to install it. Chapter 2 is about getting your hands dirty with Elasticsearch's functionality. You will understand how data is organized, both logically and physically. By the end, you'll know how to index and search for documents, as well as how to configure Elasticsearch. Chapter 3 is all about indexing, updating and deleting documents. You will understand the types of fields you can have in your documents – all of which goes into your mapping definition, as well as how updating and deleting data works under the covers. Chapter 4 is all about searching. You will understand most types of queries, and when you'll use which. You will also get a clear idea about the difference between queries and filters: how they work and where you'd use which. Chapter 5 explains analysis, which is a big chunk of how Elasticsearch can make your searches fast, and also return the right results in the right order. Analysis is the process of making tokens out of your text and by the end, you'll understand how it works and how to configure analysis for your use-case. Looking ahead in part 2, other chapters will deal with more advanced functionality: how to make searches more relevant – where you'll discover some more query types; how to work with relational data, and how to extract statistics from your data in real-time, using aggregations. We'll also show how to make Elasticsearch perform to your production standards: we'll talk about scaling out, tuning indexing and search performance, as well as administering your cluster. As you're reading, we hope you’ll take advantage of the Author Online forum. We’ll be reading your comments and responding. We appreciate any feedback, as it's very helpful in the development process. Thanks again! Matthew Lee Hinman and Radu Gheorghe ©Manning Publications Co. We welcome reader comments about anything in the manuscript - other than typos and other simple mistakes. These will be cleaned up during production of the book by copyeditors and proofreaders. http://www.manning-sandbox.com/forum.jspa?forumID=871 Licensed to zhailiang UNKNOWN <[email protected]> brief contents PART 1: CORE ELASTICSEARCH FUNCTIONALITY 1 Introducing Elasticsearch 2 Diving into the functionality 3 Indexing, updating and deleting data 4 Searching your data 5 Analyzing your data 6 Searching with relevancy 7 Exploring your data with Aggregations 8 Relations among documents PART 2: ADVANCED FUNCTIONALITY 9 Scaling out 10 Improving performance 11 Administering your cluster APPENDIXES: Appendix A Working with geo-spatial data Appendix B Plugins Appendix C Highlighting Appendix D Introduction to the Java API Appendix E JVM and JC tuning 101 ©Manning Publications Co. We welcome reader comments about anything in the manuscript - other than typos and other simple mistakes. These will be cleaned up during production of the book by copyeditors and proofreaders. http://www.manning-sandbox.com/forum.jspa?forumID=871 Licensed to zhailiang UNKNOWN <[email protected]> 1 1 Introducing Elasticsearch This chapter covers • Understanding search engines and the issues they address • How Elasticsearch fits in the context of search engines • Typical scenarios for Elasticsearch • Features Elasticsearch provides • Installing Elasticsearch We use search everywhere these days. And that’s a good thing, because search helps you finish tasks quickly and easily. Whether you’re buying something from an online shop or visiting a blog, you expect to have a search box somewhere to help you find what you’re looking for, without scanning the entire website. Maybe it’s me, but when I wake up in the morning, I wish I could enter the kitchen and type in “bowl” in a search box somewhere and have my favorite bowl highlighted. We’ve also come to expect those search boxes to be smart. I don’t want to have to type the entire word “bowl”; I expect the search box to come up with suggestions, and I don’t want results and suggestions to come to me in random order. I want the search to be smart and give me the most relevant results first: to guess what I want, if that’s possible. For example, if I search for “laptop” from an online shop but have to scroll through laptop accessories before I get to a laptop, I’m likely to go somewhere else after the first page of results. And this isn’t only because we’re in a hurry and spoiled with good search interfaces, it’s also because there’s increasingly more stuff to choose from. For example, a friend asked me to help her buy a new laptop. Typing “best laptop for my friend” in the search box of an online store that sells ©Manning Publications Co. We welcome reader comments about anything in the manuscript - other than typos and other simple mistakes. These will be cleaned up during production of the book by copyeditors and proofreaders. http://www.manning-sandbox.com/forum.jspa?forumID=871 Licensed to zhailiang UNKNOWN <[email protected]> 2 thousands of laptops wouldn’t be effective. Good keyword search is often not enough; you need aggregated data so you can narrow down the results to what the user is interested in. I narrowed down my laptop search by selecting the size of the screen, the price range, and so on, until I had five or so laptops to choose from. Finally, there’s the matter of performance—because nobody wants to wait. I’ve seen websites where you search for something and get the results in few minutes. Minutes! For a search. If you want to provide search for your data, you’ll have to deal with all these issues: returning relevant search results, returning statistics, and doing all that quickly. This is where search engines like Elasticsearch come into play because they’re built to meet exactly those challenges. You can deploy a search engine on top of a relational database to create indexes and speed up the SQL queries. Or, you can index data from your NoSQL data store to add search capabilities there. You can do that with Elasticsearch, and it works well with document- oriented stores like MongoDB because data is represented in Elasticsearch as documents, too. Modern search engines like Elasticsearch also do a good job storing your data, so you can use it as a NoSQL data store with powerful search capabilities. Elasticsearch is open-source, distributed, and it’s built on top of Apache Lucene, an open-source search engine library, which allows you to implement search functionality in your own Java application. Elasticsearch takes this functionality and extends it to make storing, indexing, and searching faster, easier, and, as the name suggests, elastic. Also, your application doesn’t need to be written in Java to work with Elasticsearch; you can send data over HTTP in JSON to index, search, and manage your Elasticsearch cluster. In this chapter, we expand on these searching and data features, and you’ll learn to use them throughout this book. First, let’s take a closer look at the challenges search engines are typically confronted with and Elasticsearch’s approach to solving them. 1.1 Elasticsearch as a search engine To get a better idea of how Elasticsearch works, let’s look at an example. Imagine that you’re working on a website that hosts blogs, and you want to let users search across the entire site for specific posts. Your first task is to implement keyword search. For example, if a user searches for “elections”, you’d better return all posts containing that word. A search engine will do that for you, but for a robust search feature, you need more than that: results need to come in quickly, and they need to be relevant. And it’s also nice to provide features that help users search when they don’t know the exact words of what they’re looking for. Those features include detecting typos, providing suggestions, and breaking down results into categories. TIP In most of this chapter, you’ll get an overview of Elasticsearch’s features. If you want to get practical and jump to installing it, skip to section 1.5.