Use Splunk with Big Data Repositories Like Spark, Solr, Hadoop and Nosql Storage
Total Page:16
File Type:pdf, Size:1020Kb
Copyright © 2016 Splunk Inc. Use Splunk With Big Data Repositories Like Spark, Solr, Hadoop And Nosql Storage Raanan Dagan, May Long Big Data Architect, Splunk Disclaimer During the course of this presentaon, we may make forward looking statements regarding future events or the expected performance of the company. We cauJon you that such statements reflect our current expectaons and esJmates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward- looking statements made in the this presentaon are being made as of the Jme and date of its live presentaon. If reviewed aer its live presentaon, this presentaon may not contain current or accurate informaon. We do not assume any obligaon to update any forward looking statements we may make. In addiJon, any informaon about our roadmap outlines our general product direcJon and is subject to change at any Jme without noJce. It is for informaonal purposes only and shall not, be incorporated into any contract or other commitment. Splunk undertakes no obligaon either to develop the features or funcJonality described or to include any such feature or funcJonality in a future release. 2 Agenda Use Cases: Fraud With Solr, Splunk, And Splunk AnalyJcs For Hadoop Business AnalyJcs With Cassandra, Splunk Cloud, And Splunk AnalyJcs For Hadoop Document Classificaon With Spark And Splunk Network IT With Kaa And Splunk Kaa Add On Demo 3 Fraud With Solr, Splunk, And Splunk AnalyJcs For Hadoop Use Case: Fraud – Why Apache Solr Apache Solr is an open source enterprise search plaorm from the Apache Lucene API. Its major features include full-text search, hit highlighJng, faceted search, and real-Jme indexing. 1. Problem: To scan a whole month of data or longer looking for a key words in Splunk AnalyJcs for Hadoop takes long Jme since we have many log files to search through. 2. Goal: We would like to limit number of files to search and reduce number of map/ reduce jobs to run. 3. Soluon: To achieve this affect, we have created summary of key words in Solr. This Solr summary data contains key words and in which HDFS files the key words are found in. 5 Architecture Splunk / Splunk Analytics for Hadoop 2 3 Splunk Indexers Hadoop Raw Data Hadoop Indexed Data Real-Time Data 1 Fraud – Technical Details Hadoop - Solr Splunk – Solr Cassandra - Splunk Analy<cs for Hadoop • Solr monitors any changes • Splunk Form dashboard • Splunk Analy<cs for to Hadoop Directory • User enter Key words Hadoop runs MapReduce • Index Key words based on • Python script calls Solr Hadoop jobs with the Hadoop Source files • Solr returns to Splunk all Specific Source Files Hadoop source files with • Eliminates massive the Key words Hadoop scan Hadoop Raw data Hadoop Indexed data Splunk Search Head 7 Business AnalyJcs With Cassandra, Splunk Cloud, And Splunk AnalyJcs For Hadoop Business AnalyJcs – Why Cassandra Apache Cassandra is an open-source distributed NoSQL database system designed to handle large amounts of data across cluster of commodity servers. 1. Problem: Lack of Visibility into customer behavior from Mobile Applicaons. 2. Goal: Visualize and analyze all data that is stored in Cassandra. 3. Soluon: To achieve this affect, we stored all Mobile acJvity into Cassandra and use Splunk AnalyJcs for Hadoop add on for Cassandra to query that data. 9 Architecture Splunk Cloud 4 Splunk Cloud Splunk Heavy UF 3 Forwarder / Splunk AnalyJcs for Hadoop 2 1 Cassandra RabbitMQ iPhone Apps 10 Business AnalyJcs – Technical Details Cassandra - Splunk Analy<cs for Splunk Analy<cs for Hadoop – Summary Index – Splunk Cloud Hadoop Summary Index • Splunk Analy<cs for Hadoop • index = cassandra_weathercql • Output.conf [tcpout] Add On for Cassandra | table * And forwardedindex.0.whitelist = • [cassandra_weathercql] Schedule Search SummaryIndex vix.provider = cassandra_erp • index = cassandra_weathercql • SummaryIndex for 5 Min vix.cassandra.cql.cmd = | Collect SummaryIndex • Use the normal Splunk SELECT * FROM Cloud UF weathercql.monthly Splunk Search Head Summary Index Cassandra Splunk Cloud 11 Document Classificaon With Spark And Splunk Document Classificaon – Why Spark Apache Spark provides APIs that provides a fast processing and it was developed in response to limitaons into the Hadoop MapReduce cluster compuJng paradigm. The main components for Spark are: Core, SQL, Machine Learning, Stream, and Graph APIs.. 1. Problem: Spark processing does not provides an easy analyJcs or any visualizaon. 2. Goal: Allows analysts and regulators the ability to know exactly where each file exists in the system. 3. Soluon: Apache Nifi collect all new files from NFS and store it on Hadoop. Spark Core, Spark Machine Learning, and Apache Tika creates Metadata classificaon. Splunk AnalyJcs for Hadoop expose Metadata classificaon files to end users. 13 Architecture Splunk / Splunk Analytics for Hadoop Search Head 5 4 Hadoop Metadata 3 2 1 Machine Learning Extracts Metadata Hadoop Raw data Transport Documents 14 Network IT With Kaa And Splunk Kaa Add On Network IT – Why Kaa Apache KaVa is a fast publish-subscribe messaging system. A single Kaa broker can handle hundreds of megabytes of reads and writes per second from thousands of clients a indexing. 1. Problem: No unified collecJon framework. 2. Goal: Real Time visualizaon and analyJcs using Splunk, Batch visualizaon and analyJcs using Hadoop and RDBMS. 3. Soluon: To achieve this affect, we used a Splunk Add-on for Kaa. 16 Architecture Splunk Indexers Splunk Add-on for Kafka Data Consumers 2 3 • Real-Jme, fast Data Producers analyJc • Alerts CEP 1 • Network Data Debugging Kafka • Dashboards Window Logs NoSQL Cisco Logs Real-Time Pipeline Mobile Data Security Logs Batch Pipeline Streaming Hp Server data 2 • Ad-hoc 3 exploraon • Data Science Oracle 17 Network IT – Technical Details KaVa - Splunk Add-on for KaVa • Splunk Add-on for KaVa • index = networkdata • kaa_brokers = sandbox:6667 • kaa_parJJon_offset = earliest • kaa_topic_whitelist = truckevent • Search • index=networkdata sourcetype="kaa:topicEvent" 18 Demo AddiJonal Resources Use Cases: • Fraud with Solr: hops://lucidworks.com/soluJons/lucidworks-splunk-connector/ • Business AnalyJcs with Cassandra: hops://splunkbase.splunk.com/app/2668/ • Document classificaon with Spark: hops://splunkbase.splunk.com/app/2686/ (Spark SQL) • Network IT with Kaa: hops://splunkbase.splunk.com/app/2935/ 20 THANK YOU .