Copyright © 2016 Splunk Inc. Use Splunk With Big Data Repositories Like Spark, Solr, Hadoop And Nosql Storage
Raanan Dagan, May Long Big Data Architect, Splunk Disclaimer
During the course of this presenta on, we may make forward looking statements regarding future events or the expected performance of the company. We cau on you that such statements reflect our current expecta ons and es mates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward- looking statements made in the this presenta on are being made as of the me and date of its live presenta on. If reviewed a er its live presenta on, this presenta on may not contain current or accurate informa on. We do not assume any obliga on to update any forward looking statements we may make. In addi on, any informa on about our roadmap outlines our general product direc on and is subject to change at any me without no ce. It is for informa onal purposes only and shall not, be incorporated into any contract or other commitment. Splunk undertakes no obliga on either to develop the features or func onality described or to include any such feature or func onality in a future release.
2 Agenda
Use Cases: Fraud With Solr, Splunk, And Splunk Analy cs For Hadoop Business Analy cs With Cassandra, Splunk Cloud, And Splunk Analy cs For Hadoop Document Classifica on With Spark And Splunk Network IT With Ka a And Splunk Ka a Add On Demo
3 Fraud With Solr, Splunk, And Splunk Analy cs For Hadoop Use Case: Fraud – Why Apache Solr
Apache Solr is an open source enterprise search pla orm from the Apache Lucene API. Its major features include full-text search, hit highligh ng, faceted search, and real- me indexing.
1. Problem: To scan a whole month of data or longer looking for a key words in Splunk Analy cs for Hadoop takes long me since we have many log files to search through. 2. Goal: We would like to limit number of files to search and reduce number of map/ reduce jobs to run. 3. Solu on: To achieve this affect, we have created summary of key words in Solr. This Solr summary data contains key words and in which HDFS files the key words are found in.
5 Architecture Splunk / Splunk Analytics for Hadoop
2 3
Splunk Indexers Hadoop Raw Data Hadoop Indexed Data Real-Time Data 1 Fraud – Technical Details
Hadoop - Solr Splunk – Solr Cassandra - Splunk Analy cs for Hadoop • Solr monitors any changes • Splunk Form dashboard • Splunk Analy cs for to Hadoop Directory • User enter Key words Hadoop runs MapReduce • Index Key words based on • Python script calls Solr Hadoop jobs with the Hadoop Source files • Solr returns to Splunk all Specific Source Files Hadoop source files with • Eliminates massive the Key words Hadoop scan
Hadoop Raw data Hadoop Indexed data Splunk Search Head
7 Business Analy cs With Cassandra, Splunk Cloud, And Splunk Analy cs For Hadoop Business Analy cs – Why Cassandra
Apache Cassandra is an open-source distributed NoSQL database system designed to handle large amounts of data across cluster of commodity servers.
1. Problem: Lack of Visibility into customer behavior from Mobile Applica ons. 2. Goal: Visualize and analyze all data that is stored in Cassandra. 3. Solu on: To achieve this affect, we stored all Mobile ac vity into Cassandra and use Splunk Analy cs for Hadoop add on for Cassandra to query that data.
9 Architecture Splunk Cloud
4
Splunk Cloud Splunk Heavy UF 3 Forwarder / Splunk Analy cs for Hadoop 2 1
Cassandra RabbitMQ iPhone Apps
10 Business Analy cs – Technical Details
Cassandra - Splunk Analy cs for Splunk Analy cs for Hadoop – Summary Index – Splunk Cloud Hadoop Summary Index • Splunk Analy cs for Hadoop • index = cassandra_weathercql • Output.conf [tcpout] Add On for Cassandra | table * And forwardedindex.0.whitelist = • [cassandra_weathercql] Schedule Search SummaryIndex vix.provider = cassandra_erp • index = cassandra_weathercql • SummaryIndex for 5 Min vix.cassandra.cql.cmd = | Collect SummaryIndex • Use the normal Splunk SELECT * FROM Cloud UF weathercql.monthly
Splunk Search Head Summary Index Cassandra Splunk Cloud
11 Document Classifica on With Spark And Splunk Document Classifica on – Why Spark
Apache Spark provides APIs that provides a fast processing and it was developed in response to limita ons into the Hadoop MapReduce cluster compu ng paradigm. The main components for Spark are: Core, SQL, Machine Learning, Stream, and Graph APIs..
1. Problem: Spark processing does not provides an easy analy cs or any visualiza on. 2. Goal: Allows analysts and regulators the ability to know exactly where each file exists in the system. 3. Solu on: Apache Nifi collect all new files from NFS and store it on Hadoop. Spark Core, Spark Machine Learning, and Apache Tika creates Metadata classifica on. Splunk Analy cs for Hadoop expose Metadata classifica on files to end users.
13 Architecture Splunk / Splunk Analytics for Hadoop Search Head
5
4 Hadoop Metadata
3 2 1 Machine Learning Extracts Metadata Hadoop Raw data Transport Documents
14 Network IT With Ka a And Splunk Ka a Add On Network IT – Why Ka a
Apache Ka a is a fast publish-subscribe messaging system. A single Ka a broker can handle hundreds of megabytes of reads and writes per second from thousands of clients a indexing.
1. Problem: No unified collec on framework. 2. Goal: Real Time visualiza on and analy cs using Splunk, Batch visualiza on and analy cs using Hadoop and RDBMS. 3. Solu on: To achieve this affect, we used a Splunk Add-on for Ka a.
16 Architecture Splunk Indexers Splunk Add-on for Kafka Data Consumers 2 3 • Real- me, fast Data Producers analy c • Alerts CEP 1 • Network Data Debugging Kafka • Dashboards Window Logs NoSQL Cisco Logs Real-Time Pipeline Mobile Data Security Logs Batch Pipeline Streaming H p Server data 2 • Ad-hoc 3 explora on • Data Science Oracle
17 Network IT – Technical Details
Ka a - Splunk Add-on for Ka a
• Splunk Add-on for Ka a • index = networkdata • ka a_brokers = sandbox:6667 • ka a_par on_offset = earliest • ka a_topic_whitelist = truckevent • Search • index=networkdata sourcetype="ka a:topicEvent"
18 Demo Addi onal Resources
Use Cases: • Fraud with Solr: h ps://lucidworks.com/solu ons/lucidworks-splunk-connector/ • Business Analy cs with Cassandra: h ps://splunkbase.splunk.com/app/2668/ • Document classifica on with Spark: h ps://splunkbase.splunk.com/app/2686/ (Spark SQL) • Network IT with Ka a: h ps://splunkbase.splunk.com/app/2935/
20
THANK YOU