Elastic Stack Overview

The world’s most popular enterprise open source products for real-time search, logging, analytics, and more

Agenda

• Elastic Stack Overview
• Architecture
• Demos: Logging, Search
• Logstash & Beats
• Elasticsearch

✦ The Distributed Model

✦ Text Analysis

✦ Search

✦ Aggregation

Once upon a time …

• As any good story begins, “Once upon a time...”

✦ More precisely: in 1999, Doug Cutting created an open-source project called Lucene

• Lucene is:

✦ a search engine library entirely written in Java

✦ a top-level Apache project, as of 2005

✦ great for full-text search

• But Lucene is also:

✦ a library (you have to incorporate it into your application)

✦ challenging to use

✦ not originally designed for scaling

The Birth of Elasticsearch

• In 2004, Shay Banon developed a product called Compass

✦ Compass was built on top of Lucene; Shay’s goal was to integrate search into Java applications as simply as possible

• The need for scalability became a top priority

• In 2010, Shay completely rewrote Compass with two main objectives:

1. Distributed from the ground up in its design
2. Easily usable from any programming language

• He called it Elasticsearch ... and we all lived happily ever after!

• Today, Elasticsearch is the most popular enterprise search engine

• 85,000+ community members
• 100M+ product downloads
• 3,000+ subscription customers

Statistics since 2012, the year Elastic was founded

Who is using Elasticsearch?

Tech

Finance

Telco

Consumer

Enterprise Customers in Every Industry

“Improving patient care with real-time clinical decision making.”

“Combating our global human trafficking problem.”

“Mining 3-4 billion events per day to ensure security intelligence.”

“Many use cases from trade optimization to compliance to HR recruiting.”

Solving Problems Beyond ‘Search’


X-Pack: Extensions for the Elastic Stack

• Security
• Alerting
• Monitoring
• Reporting
• Graph
• Machine Learning

Single install, subscription pricing

Elastic Cloud

• Hosted Elasticsearch & Kibana
• Includes X-Pack features
• Available in AWS today
• Available in (Beta)
• Available as a private cloud/on-premises solution (Elastic Cloud Enterprise)

Enterprise Deployment Architecture

[Figure: data sources (log files, metrics, wire data, your{beat}, datastores, web APIs, social, sensors) are collected by Beats and Logstash instances, optionally buffered through a messaging queue (Kafka, Redis), and indexed into an Elasticsearch cluster of 3 master nodes, ingest nodes, and hot and warm data nodes. Kibana nodes and custom UIs sit on top; X-Pack adds authentication (LDAP, AD, SSO) and notification, and ES-Hadoop connects the Hadoop ecosystem. Layers: Elastic Stack, X-Pack, Elastic Cloud.]

Solving many diverse & complex use cases

• Application Search
• Log Analytics
• Security Analytics
• Metrics Analytics
• Business Analytics
• Many more …

Demo: Apache Logging

Logstash

Data processing pipeline

• Ingest data of all shapes, sizes, and sources
• Parse and dynamically transform data
• Transport data to any output
• Secure and encrypt data inputs
• Build your own pipeline
• 200+ plugins

Parsing Logs Using Logstash

Logstash Configuration Example – Apache Access Logs

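input {
  file {
    path => "/Users/aquan/Desktop/JUG/demo/access_log"
    # Read a file from its beginning the first time it is seen, not only new lines
    start_position => "beginning"
  }
}

filter {
  if [path] =~ "access" {
    mutate { replace => { "type" => "apache_access" } }
    # Split the raw line into clientip, timestamp, verb, request, response, ...
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
    # Add geographic information for the client IP
    geoip { source => "clientip" }
  }
  # Use the request time from the log line as the event's @timestamp
  date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
}

Logstash Configuration Example – Spring Boot Logs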

filter {
  # A line containing a tab followed by "at" is tagged as part of a stack trace
  if [message] =~ "\tat" {
    grok {
      match => ["message", "^(\tat)"]
      add_tag => ["stacktrace"]
    }
  }

  # Grok Spring Boot's default log format ("thread", "class" and "logmessage" are assumed capture names;
  # \s+ because the level is padded to 5 characters)
  grok {
    match => [ "message", "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME})\s+%{LOGLEVEL:level} %{NUMBER:pid} --- \[(?<thread>[A-Za-z0-9-]+)\] [A-Za-z0-9.]*\.(?<class>[A-Za-z0-9#_]+)\s*:\s+(?<logmessage>.*)" ]
  }

  # Parse the timestamp field into the event's @timestamp
  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
  }
}

Beats

Lightweight data shippers

• Ship data from the source
• Ship and centralize in Elasticsearch
• Ship to Logstash for transformation and parsing
• Ship to Elastic Cloud
• Libbeat: API framework to build custom beats
• 30+ community Beats

FILEBEAT: Log Files
METRICBEAT: Metrics
PACKETBEAT: Network Data
WINLOGBEAT: Windows Events

More than 30 community Beats, and growing …: apachebeat, dockbeat, httpbeat, mysqlbeat, nginxbeat, redisbeat, twitterbeat, and more
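To make the Apache access-log pipeline from the Logstash configuration example concrete, here is a rough Python sketch of what its grok and date filters do to a single line. It is an approximation under stated assumptions, not Logstash itself: the regular expression is a simplified stand-in for %{COMBINEDAPACHELOG}, the field names reuse clientip and timestamp (the fields the geoip and date filters read) plus similar names for the rest, and the geoip lookup is skipped.

```python
import re
from datetime import datetime, timezone

# Simplified stand-in for grok's %{COMBINEDAPACHELOG}; the real pattern is more permissive.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of fields for one combined-format line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None  # Logstash would instead tag the event with _grokparsefailure
    event = m.groupdict()
    # Rough equivalent of: date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    ).astimezone(timezone.utc)
    return event


if __name__ == "__main__":
    sample = ('127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
              '"http://www.example.com/start.html" "Mozilla/4.08"')
    event = parse_access_line(sample)
    print(event["clientip"], event["response"], event["@timestamp"].isoformat())
```

Running it prints 127.0.0.1 200 2000-10-10T20:55:36+00:00: the -0700 offset in the log line is normalized to UTC, just as Logstash stores @timestamp in UTC.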

Elasticsearch

Heart of the Elastic Stack

• Distributed, Scalable
• High-availability
• Multi-tenancy
• Developer Friendly
• Real-time, Full-text Search
• Aggregations

Clusters, Nodes and Indices

[Figure: cluster my_cluster on Server 1, with a single Node A holding two indices, twitter and logs, each containing documents (d1, d2, …)]

Split Indices into Shards

[Figure: the twitter and logs indices on Node A, each split into shards]

Distribute Shards over Multiple Nodes
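Once an index is split into shards, every document must land on exactly one primary shard. The basic rule documented for the _routing field is shard = hash(_routing) % number_of_primary_shards, with _routing defaulting to the document ID. The Python sketch below illustrates that rule under two simplifications: zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses, and shards are spread over nodes round-robin, whereas the real allocator also balances load and keeps replicas off their primary's node. The node names and shard counts (5 for twitter, 2 for logs) are chosen only to echo the figure.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Pick a primary shard: hash(_routing) % number_of_primary_shards.

    Assumption: zlib.crc32 stands in for Elasticsearch's Murmur3 hash.
    """
    routing = doc_id  # _routing defaults to the document's _id
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def allocate(index_shards, nodes):
    """Round-robin the shards of several indices over nodes (a simplified allocator)."""
    placement = {node: [] for node in nodes}
    i = 0
    for index, count in index_shards.items():
        for shard in range(count):
            placement[nodes[i % len(nodes)]].append(f"{index} shard {shard}")
            i += 1
    return placement


if __name__ == "__main__":
    print(allocate({"twitter": 5, "logs": 2}, ["Node A", "Node B"]))
    for doc_id in ["d1", "d2", "d3"]:
        print(doc_id, "-> twitter shard", shard_for(doc_id, 5))
```

Because the target shard depends on the number of primary shards, that number is fixed when the index is created; changing it would send existing documents to different shards.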

[Figure: cluster my_cluster now spans Node A (Server 1) and Node B (Server 2); the shards of twitter (shards 0 to 4) and logs (shards 0 and 1) are distributed across both nodes]

CRUD

Text Analysis

Inverted Index

Most think of search as…

SEARCH

[Figure: a search results page annotated with the features behind it: multilingual full-text search, stemming, type-ahead, mobile, time range, geo search, ranking influenced by rating, personalized ranking, pagination, time range / numeric / geo range filters, and highlighting]

Demo: e-Commerce Search

Search – Finding the Needles in the Haystack

• Relevance – scoring a document based on how closely it matches the query

✦ TF (term frequency): The more a term appears in a field, the more important it is

✦ IDF (inverse document frequency): The more documents that contain the term, the less important the term is

✦ Field length: a term found in a short field counts for more than the same term in a long field

• Structured Search
• Full-Text Search

Structured Search

• Answer is always “Yes” or “No”
• Does not worry about document relevance or scoring
• Filters – very fast, easily cached, no relevance scoring; use them as often as you can

✦ Term Filter, Terms Filter – numbers, Booleans, dates, and text

✦ Bool Filter (compound filter) – must, must_not, should

✦ Range Filter – number, date (date math), string

✦ Exists Filter

✦ Missing Filter (removed in Elasticsearch 5.0; use a bool query with must_not and exists instead)

• Filter Order – important for performance

✦ More specific filters should be placed before less-specific filters

Full-Text Search

• Relevance

✦ The match Query

✦ Multiword Queries – Precision control

๏ Operator: and, or

๏ minimum_should_match

✦ Bool Query - Combining Queries

✦ Boosting Query – boost parameter

• Multi-field Search

✦ The multi_match Query

✦ Types: best_fields, most_fields, cross_fields

✦ Boosting Individual Fields – ^

Proximity Matching – Phrase Matching

• Search for “sue alligator”

✦ Sue ate the alligator

✦ The alligator ate Sue

✦ Sue never goes anywhere without her alligator-skin purse

• The match_phrase Query

✦ Find words that are near each other – “quick fox”

✦ Closer is better

✦ Flexibility – slop

Partial Matching

• The prefix Query
• Wildcard and regexp Queries
• Completion Suggester

✦ Query-Time Search-as-You-Type

๏ match_phrase_prefix – “johnnie walker bl”

- slop

- max_expansions

✦ Index Time Search-as-You-Type – edge n-grams

๏ “quick” → q, qu, qui, quic, quick

๏ Storage vs. performance

Dealing with Human Language

• Language Analyzers – many

✦ Tokenize text into individual words – think of Chinese, which has no spaces between words

✦ Lowercase tokens

✦ Remove stopwords – a, an, and, are, as, at, be, but, for, if, into …

✦ Stem tokens to their root form – foxes → fox

• Synonyms – jump, leap, and hop
• Dictionary
• Typos and misspellings – Fuzzy Query

Real-time Reporting & Analytics – Aggregation

• Aggregations are a way to perform analytics on your indexed data

✦ Combination of buckets and metrics

✦ Buckets – collections of documents that meet a criterion

✦ Metrics – statistics calculated on the documents in a bucket

• Example: average salary per country, gender, and age range, in one request with one pass over the data!

✦ Partition documents by country (bucket)

✦ Partition each country by gender (bucket)

✦ Partition each gender bucket by age ranges (bucket)

✦ Calculate the average salary for each age range (metric)

Aggregations: Count by Country
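Before the simpler count-by-country request, the nested example from the previous slide can be mimicked in plain Python: a single pass over the documents drops each one into a country bucket, then a gender sub-bucket, then an age-range sub-bucket, and keeps a running sum and count so that each leaf can report an average salary. This only illustrates the bucket/metric idea and is not how Elasticsearch executes aggregations; the document fields (country, gender, age, salary) and the age ranges are hypothetical.

```python
from collections import defaultdict

# Hypothetical age ranges as [from, to), matching a range aggregation's semantics
AGE_RANGES = [(0, 30), (30, 50), (50, 200)]


def age_bucket(age):
    for lo, hi in AGE_RANGES:
        if lo <= age < hi:
            return f"{lo}-{hi}"
    return None  # ages outside every range fall into no bucket


def avg_salary_by_country_gender_age(docs):
    """One pass: country -> gender -> age range buckets, with an average-salary metric per leaf."""
    acc = defaultdict(lambda: [0.0, 0])  # running [sum, count] per leaf bucket
    for doc in docs:
        age_key = age_bucket(doc["age"])
        if age_key is None:
            continue
        leaf = acc[(doc["country"], doc["gender"], age_key)]
        leaf[0] += doc["salary"]
        leaf[1] += 1
    # Nest the flat accumulator into buckets and compute the metric
    result = defaultdict(lambda: defaultdict(dict))
    for (country, gender, age_key), (total, count) in acc.items():
        result[country][gender][age_key] = {"doc_count": count, "avg_salary": total / count}
    return result


if __name__ == "__main__":
    docs = [
        {"country": "France", "gender": "f", "age": 34, "salary": 50000},
        {"country": "France", "gender": "f", "age": 41, "salary": 60000},
        {"country": "Spain", "gender": "m", "age": 25, "salary": 30000},
    ]
    print(avg_salary_by_country_gender_age(docs)["France"]["f"]["30-50"])
```

In Elasticsearch the same shape is a terms aggregation on country containing a terms aggregation on gender, which contains a range aggregation on age with an avg aggregation on salary.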

GET /person/person/_search
{
  "size": 0,
  "aggs": {
    "by_country": {
      "terms": { "field": "address.country" }
    }
  }
}

Response (excerpt):

{
  ...,
  "aggregations": {
    "by_country": {
      "buckets": [
        { "key": "England", "doc_count": 30051 },
        { "key": "Germany", "doc_count": 30004 },
        { "key": "France", "doc_count": 15034 },
        { "key": "Spain", "doc_count": 14912 }
      ]
    }
  }
}

[Chart: England 33%, Germany 33%, France 17%, Spain 17%]

A lot more …

Elasticsearch Clients

• Java API
• Java REST Client
• JavaScript API
• Groovy API
• .NET API
• PHP API
• Perl API
• Python API
• Ruby API
• Community Contributed Clients: B4J, Clojure, Erlang, Go, Groovy, Haskell, Java, JavaScript, Kotlin, Lua, .NET, OCaml, Perl, PHP, Python, R, Ruby, Rust, Scala, Smalltalk, Vert.x

Kibana

Window into the Elastic Stack

• Visualize and analyze
• Geospatial
• Customize and share reports
• Graph exploration
• UX to secure and manage the Elastic Stack
• Build custom apps

Become an Elastic Pioneer

1. Download the 6.0 preview release
2. Provide feedback via GitHub or the Discuss forum
3. Get limited-edition Pioneer swag

Elastic Pioneer Program

We want your feedback!

1. Download the 6.0 preview release (alpha, beta, etc.)
2. Provide feedback via GitHub or the Discuss forum
3. Get limited-edition Pioneer swag

THANK YOU

@elastic www.elastic.co