Copyright © 2016 Splunk Inc. Use Splunk With Big Data Repositories Like Spark, Solr, Hadoop And Nosql Storage

Raanan Dagan, May Long Big Data Architect, Splunk Disclaimer

During the course of this presentaon, we may make forward looking statements regarding future events or the expected performance of the company. We cauon you that such statements reflect our current expectaons and esmates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward- looking statements made in the this presentaon are being made as of the me and date of its live presentaon. If reviewed aer its live presentaon, this presentaon may not contain current or accurate informaon. We do not assume any obligaon to update any forward looking statements we may make. In addion, any informaon about our roadmap outlines our general product direcon and is subject to change at any me without noce. It is for informaonal purposes only and shall not, be incorporated into any contract or other commitment. Splunk undertakes no obligaon either to develop the features or funconality described or to include any such feature or funconality in a future release.

2 Agenda

Use Cases: Fraud With Solr, Splunk, And Splunk Analycs For Hadoop Business Analycs With Cassandra, Splunk Cloud, And Splunk Analycs For Hadoop Document Classificaon With Spark And Splunk Network IT With Kaa And Splunk Kaa Add On Demo

3 Fraud With Solr, Splunk, And Splunk Analycs For Hadoop Use Case: Fraud – Why

Apache Solr is an open source enterprise search plaorm from the API. Its major features include full-text search, hit highlighng, faceted search, and real-me indexing.

1. Problem: To scan a whole month of data or longer looking for a key words in Splunk Analycs for Hadoop takes long me since we have many log files to search through. 2. Goal: We would like to limit number of files to search and reduce number of map/ reduce jobs to run. 3. Soluon: To achieve this affect, we have created summary of key words in Solr. This Solr summary data contains key words and in which HDFS files the key words are found in.

5 Architecture Splunk / Splunk Analytics for Hadoop

2 3

Splunk Indexers Hadoop Raw Data Hadoop Indexed Data Real-Time Data 1 Fraud – Technical Details

Hadoop - Solr Splunk – Solr Cassandra - Splunk Analycs for Hadoop • Solr monitors any changes • Splunk Form dashboard • Splunk Analycs for to Hadoop Directory • User enter Key words Hadoop runs MapReduce • Index Key words based on • Python script calls Solr Hadoop jobs with the Hadoop Source files • Solr returns to Splunk all Specific Source Files Hadoop source files with • Eliminates massive the Key words Hadoop scan

Hadoop Raw data Hadoop Indexed data Splunk Search Head

7 Business Analycs With Cassandra, Splunk Cloud, And Splunk Analycs For Hadoop Business Analycs – Why Cassandra

Apache Cassandra is an open-source distributed NoSQL database system designed to handle large amounts of data across cluster of commodity servers.

1. Problem: Lack of Visibility into customer behavior from Mobile Applicaons. 2. Goal: Visualize and analyze all data that is stored in Cassandra. 3. Soluon: To achieve this affect, we stored all Mobile acvity into Cassandra and use Splunk Analycs for Hadoop add on for Cassandra to query that data.

9 Architecture Splunk Cloud

4

Splunk Cloud Splunk Heavy UF 3 Forwarder / Splunk Analycs for Hadoop 2 1

Cassandra RabbitMQ iPhone Apps

10 Business Analycs – Technical Details

Cassandra - Splunk Analycs for Splunk Analycs for Hadoop – Summary Index – Splunk Cloud Hadoop Summary Index • Splunk Analycs for Hadoop • index = cassandra_weathercql • Output.conf [tcpout] Add On for Cassandra | table * And forwardedindex.0.whitelist = • [cassandra_weathercql] Schedule Search SummaryIndex vix.provider = cassandra_erp • index = cassandra_weathercql • SummaryIndex for 5 Min vix.cassandra.cql.cmd = | Collect SummaryIndex • Use the normal Splunk SELECT * FROM Cloud UF weathercql.monthly

Splunk Search Head Summary Index Cassandra Splunk Cloud

11 Document Classificaon With Spark And Splunk Document Classificaon – Why Spark

Apache Spark provides APIs that provides a fast processing and it was developed in response to limitaons into the Hadoop MapReduce cluster compung paradigm. The main components for Spark are: Core, SQL, Machine Learning, Stream, and Graph APIs..

1. Problem: Spark processing does not provides an easy analycs or any visualizaon. 2. Goal: Allows analysts and regulators the ability to know exactly where each file exists in the system. 3. Soluon: Apache Nifi collect all new files from NFS and store it on Hadoop. Spark Core, Spark Machine Learning, and creates Metadata classificaon. Splunk Analycs for Hadoop expose Metadata classificaon files to end users.

13 Architecture Splunk / Splunk Analytics for Hadoop Search Head

5

4 Hadoop Metadata

3 2 1 Machine Learning Extracts Metadata Hadoop Raw data Transport Documents

14 Network IT With Kaa And Splunk Kaa Add On Network IT – Why Kaa

Apache Kaa is a fast publish-subscribe messaging system. A single Kaa broker can handle hundreds of megabytes of reads and writes per second from thousands of clients a indexing.

1. Problem: No unified collecon framework. 2. Goal: Real Time visualizaon and analycs using Splunk, Batch visualizaon and analycs using Hadoop and RDBMS. 3. Soluon: To achieve this affect, we used a Splunk Add-on for Kaa.

16 Architecture Splunk Indexers Splunk Add-on for Kafka Data Consumers 2 3 • Real-me, fast Data Producers analyc • Alerts CEP 1 • Network Data Debugging Kafka • Dashboards Window Logs NoSQL Cisco Logs Real-Time Pipeline Mobile Data Security Logs Batch Pipeline Streaming Hp Server data 2 • Ad-hoc 3 exploraon • Data Science Oracle

17 Network IT – Technical Details

Kaa - Splunk Add-on for Kaa

• Splunk Add-on for Kaa • index = networkdata • kaa_brokers = sandbox:6667 • kaa_paron_offset = earliest • kaa_topic_whitelist = truckevent • Search • index=networkdata sourcetype="kaa:topicEvent"

18 Demo Addional Resources

Use Cases: • Fraud with Solr: hps://lucidworks.com/soluons/lucidworks-splunk-connector/ • Business Analycs with Cassandra: hps://splunkbase.splunk.com/app/2668/ • Document classificaon with Spark: hps://splunkbase.splunk.com/app/2686/ (Spark SQL) • Network IT with Kaa: hps://splunkbase.splunk.com/app/2935/

20

THANK YOU