THE ERA OF BIG DATA: From IoT to NewSQL

Daniela Barreiro Claro

Outline

• The era of Big Data
• RDBMS
• NOSQL
• NewSQL
• Big Data Analytics
• Where is our course?

Introduction

• Are you ready for the Big Data era?


Introduction

Big Data = cloud + social + mobile

Introduction

• What is Big Data?
  • Big data is data that exceeds the processing capacity of conventional systems.
  • The data is too big, moves too fast, or doesn’t fit the structures of a database architecture.
• The term had become a buzzword by 2012.

FORMAS - UFBA

Internet of Things

Physical Objects + Controller, Sensors, and Actuators + Internet = Internet of Things

1. Adrian McEwen & Hakim Cassimally. Designing the Internet of Things.

Internet of Things

• Integrate things into the existing web
• HTML and REST
• Smart things

Introduction

• RDBMS are 25-year-old legacy code lines that should be retired in favor of a collection of from-scratch specialized engines (Stonebraker et al.)
• Are we really prepared for the death of the relational era?

RDBMS

• One-size-fits-all
• If you wanted to build
  • an e-commerce shop
  • a banking core
  • a car rental website
• Database skills: you need to know a SINGLE RDBMS in depth


RDBMS

Strengths:
• Experts in only one database technology
• Standard
• SQL
• Security (ACID)
• Triggers
• Joins
• Composite keys
• Structured

Drawbacks:
• Experts in only one database technology
• Vertical scalability
• Horizontal scalability is hard and costly
• Models do not fit all cases
• Structured
• Does not deal well with unstructured data

RDBMS

• ACID guarantees are absolutely essential for most operational and online systems, including retail, banking, and finance.
• ACID compliance may not be important to
  • a search engine that may return different results to two users simultaneously, or
  • Amazon when returning different sets of reviews to two users.
• In these applications, speed and performance trump the consistency of the results.
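The banking case above can be made concrete with a minimal sketch of atomicity, the "A" in ACID. It uses Python's built-in sqlite3 module; the account table, the transfer helper, and the balances are hypothetical and are not taken from the slides. The credit is deliberately applied before the debit so that a failed debit shows the whole transaction being rolled back.

```python
import sqlite3

# Hypothetical schema: a CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Credit the target, then debit the source, as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was undone too.
        return False


def balances(conn):
    """Return all (id, balance) pairs ordered by id."""
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```

Calling transfer(conn, 1, 2, 500) fails and leaves both balances untouched, while transfer(conn, 1, 2, 30) moves the money in a single step. A search engine or a review listing has no equivalent invariant to protect, which is why those systems can trade this guarantee for speed.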

NOSQL

• First “No SQL”, then “Not Only SQL”
• Unstructured data
• Eventual consistency
• CAP theorem (Consistency, Availability, Partition tolerance)
• Main memory
• Data stored in graph, key-value, or column formats
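Eventual consistency, listed above, can be illustrated with a toy sketch. It assumes two in-memory key-value replicas that reconcile with a last-write-wins rule based on a caller-supplied version number; the Replica class and the synchronize function are hypothetical names, not the API of any particular NOSQL product.

```python
class Replica:
    """A toy key-value replica that stores (version, value) per key."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, version):
        # Last-write-wins: accept the write only if it is newer than what we hold.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def synchronize(a, b):
    """Exchange state so that both replicas end up with the newest version of each key."""
    for key in set(a.store) | set(b.store):
        for source, target in ((a, b), (b, a)):
            if key in source.store:
                version, value = source.store[key]
                target.write(key, value, version)


# A write lands on one replica; the other one is stale until they synchronize.
replica_a, replica_b = Replica(), Replica()
replica_a.write("cart", "book", version=1)
stale_read = replica_b.read("cart")   # the write has not propagated yet
synchronize(replica_a, replica_b)
fresh_read = replica_b.read("cart")   # both replicas now agree
```

Between the write and the synchronization, two users reading from different replicas see different results, which is exactly the behavior an ACID system would not allow.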


NOSQL

Strengths:
• High performance
• Horizontal scalability
• Diversity of models
• Flexible schema
• High availability
• Manages unstructured data and big data well

Drawbacks:
• Flexible schema
• Not secure at all (no ACID guarantees)
• Eventual consistency
• There is no standard

NOSQL

• 3-4 “V”s
  • Volume
  • Variety
  • Velocity
  • Value

NOSQL

CAP theorem: you can only have two out of three: Consistency, Partition tolerance, Availability.

[Figure: CAP diagram with regions labeled "Few solutions are here" and "Most NOSQL lives here"]


NOSQL

Analytical queries

select sum(salary) from customerperson
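The figure for this slide did not survive. One plausible reading, assumed here, is that it contrasts row-oriented and column-oriented storage for the query above. The sketch below counts how many stored values each layout has to touch to answer it; the customerperson rows and both helper functions are made up for illustration.

```python
# Hypothetical contents of customerperson; only the salary column matters to the query.
rows = [
    {"id": 1, "name": "Ana", "city": "Salvador", "salary": 3000},
    {"id": 2, "name": "Bia", "city": "Salvador", "salary": 4500},
    {"id": 3, "name": "Caio", "city": "Recife", "salary": 5200},
]

# Column-oriented layout: one list per attribute.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def sum_salary_row_store(rows):
    """Read every full record, then pick out salary. Returns (total, values touched)."""
    total, touched = 0, 0
    for row in rows:
        touched += len(row)  # the whole record is loaded
        total += row["salary"]
    return total, touched


def sum_salary_column_store(columns):
    """Read only the salary column. Returns (total, values touched)."""
    salaries = columns["salary"]
    return sum(salaries), len(salaries)
```

Both functions return the same total, but the row layout touches every attribute of every record while the column layout touches only the salaries, which is why column stores tend to suit analytical queries.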

NOSQL

Compression

• Poor compression ratio (low repetition)
• Good compression ratio (high repetition)
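The link between repetition and compression ratio can be sketched with run-length encoding. The example assumes, since the original figure is missing, that the slide compares values stored row by row with values stored column by column; the sample data and the run_length_encode helper are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical (id, city) records.
records = [
    (1, "Salvador"),
    (2, "Salvador"),
    (3, "Salvador"),
    (4, "Recife"),
    (5, "Recife"),
]

# Row by row: ids and cities alternate, so adjacent values never repeat.
row_major = [value for record in records for value in record]

# Column by column: all cities sit next to each other, so runs are long.
city_column = [city for _, city in records]

row_runs = run_length_encode(row_major)
city_runs = run_length_encode(city_column)
```

Here row_runs has as many entries as the input (no gain), while city_runs shrinks five values to two pairs.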

NOSQL

Insertion

insert into customerperson values (...)

NewSQL

• A problem situation
  • Perhaps you have gigabytes to terabytes of data that need high-speed transactional access.
  • You have an incoming event stream (sensors, mobile phones, network access points) and need per-event transactions to compute responses and analytics in real time.
  • Your problem follows a pattern of “ingest, analyze, decide,” where the analytics and the decisions must be calculated per request and not post hoc in batch processing.
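The “ingest, analyze, decide” pattern can be sketched as a handler that runs once per event. This is only an illustration in plain Python: the PerEventDecider class, the per-device running total, and the limit are hypothetical, and in a NewSQL system the three steps would run inside a single ACID transaction rather than on an in-memory dictionary.

```python
from collections import defaultdict


class PerEventDecider:
    """Keeps a running total per device and decides on every incoming event."""

    def __init__(self, limit):
        self.limit = limit
        self.totals = defaultdict(int)

    def handle(self, device_id, reading):
        # Ingest: record the new event in the current state.
        self.totals[device_id] += reading
        # Analyze: evaluate the updated state immediately, not in a later batch.
        exceeded = self.totals[device_id] > self.limit
        # Decide: return the response for this very request.
        return "alert" if exceeded else "ok"


decider = PerEventDecider(limit=100)
decisions = [
    decider.handle("sensor-1", 60),
    decider.handle("sensor-2", 10),
    decider.handle("sensor-1", 50),
]
```

The third event pushes sensor-1 over the limit, so the decision changes at that exact event instead of being discovered afterwards by a batch job.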


NewSQL

• It is a new concept from 2011
• Brings together the best of relational databases and the best of NOSQL
• More tables…


NewSQL

Strengths:
• ACID
• SQL
• Standard
• Structured
• High performance
• Horizontal scalability
• High availability

Drawbacks:
• Model does not fit all cases
• Does not cope well with unstructured data
• Structured
• New concept (2011)
• Lacks the resources and tools available for relational and …

NewSQL

• NuoDB
  • a cluster-first SQL database with a focus on cloud:
    • run on many nodes across many datacenters
    • let the underlying system manage data locality and consistency for you
  • NuoDB is the closest of the NewSQL systems to being called eventually consistent
• Hekaton
  • adds sophisticated in-memory processing to the more traditional Microsoft SQL Server.

NewSQL

• MemSQL
  • often offers faster OLAP analytics than all-in-one OldSQL systems, with higher concurrency and the ability to update data as it’s being analyzed
  • focus on clustered analytics
  • distributed, with MySQL compatibility
• VoltDB
  • the most mature of these systems; combines streaming analytics, strong ACID guarantees, and native clustering

NewSQL

• VoltDB
  • Is the system of record for data-intensive applications, while offering an integrated high-throughput, low-latency ingestion engine.
  • It’s a great choice for policy enforcement, fraud/anomaly detection, or other fast-decisioning apps

RDBMS x NOSQL x NewSQL

Data Analytics

• Traditional approach
  • Decision makers wait for reports from disparate OLTP systems
  • Put it all together in a spreadsheet
  • Highly manual process
• In the Web context
  • Data capture at the user interaction level, in contrast to the client transaction level in the Enterprise context
  • As a consequence, the amount of data increases significantly
  • Greater need to analyze such data to understand user behaviors

Data Analytics

• Scalability to large data volumes:
  • Scan 100 TB on 1 node @ 50 MB/sec = 23 days
  • Scan on 1000-node cluster = 33 minutes
  • Divide-and-conquer (i.e., data partitioning)
• Cost-efficiency:
  • Commodity nodes (cheap, but unreliable)
  • Commodity network
  • Automatic fault-tolerance (fewer admins)
  • Easy to use (fewer programmers)
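The scan times quoted under scalability can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly even partitioning with no coordination overhead; the scan_seconds helper is a hypothetical name.

```python
TB = 10**12  # bytes, decimal units assumed
MB = 10**6   # bytes, decimal units assumed


def scan_seconds(data_bytes, bytes_per_second_per_node, nodes=1):
    """Time to scan the data when it is split evenly across the nodes."""
    return data_bytes / (bytes_per_second_per_node * nodes)


single_node = scan_seconds(100 * TB, 50 * MB)            # seconds on 1 node
cluster = scan_seconds(100 * TB, 50 * MB, nodes=1000)    # seconds on 1000 nodes

single_node_days = single_node / 86_400   # about 23.1 days
cluster_minutes = cluster / 60            # about 33.3 minutes
```

The single-node scan takes 2,000,000 seconds and the partitioned scan 2,000 seconds, matching the 23 days and 33 minutes on the slide. Real clusters fall somewhat short of this ideal because of skew, network transfer, and failures.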

Data Analytics

• Evolution
  • Operational: Reporting
  • Tactical: Data Analysis
  • Strategic: Mining & Statistics
  • Future: Learning?


Big Data Analytics

• Big data analytics is the process of examining large data sets containing a variety of data types (i.e., Big Data) to discover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information.
• It analyzes large volumes of transaction data, as well as other forms of data.
  • Examples: web server logs and Internet stream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records, and machine data captured by sensors connected to the Internet of Things.

Big Data Analytics

• Traditional analytical tools comprise basic business intelligence and examine historical data
• Tools for advanced analytics
  • focus on forecasting future events and behaviors, allowing businesses to conduct what-if analyses to predict the effects of potential changes in business strategies.
  • Predictive analytics, data mining, big data analytics, and location intelligence are just some of the analytical categories that fall under the heading of advanced analytics.
  • These technologies are widely used in industries including marketing, healthcare, risk management, and economics.

Where is our course?

• Data Analytics
  • Big Data Analytics
  • Data Mining for Structured Data

FORMAS - UFBA
Prof. Daniela Barreiro Claro
Email: [email protected]

Semantic Applications and Formalisms Research Group
www.formas.ufba.br
/formasresearchgroup /formasresearchgroup