THE ERA OF BIG DATA: From IoT to NewSQL
Daniela Barreiro Claro

Outline
The era of Big Data
RDBMS
NOSQL
NewSQL
Big Data Analytics
Where is our course?

Introduction
Are you ready for the Big Data era?
Big Data = cloud + social + mobile
What is Big Data?
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the structures of a database architecture.
The term had become a buzzword by 2012.
FORMAS - UFBA

Internet of Things
Physical Objects + Controller, Sensor, and Actuators + Internet = Internet of Things
1. Adrian McEwen & Hakim Cassimally. Designing the Internet of Things.
Integrate things into the existing web
HTML and REST
Smart things
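To make the "HTML and REST" point concrete, here is a minimal sketch of a smart thing that exposes its state as REST resources, using only the Python standard library. The device, its fields (`lamp-1`, `temperature_c`, `light_on`), the URL paths, and the port are all hypothetical; a real thing would read its values from hardware.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sensor state; a real device would read this from hardware.
THING_STATE = {"id": "lamp-1", "temperature_c": 21.5, "light_on": False}


def get_resource(path):
    """Map a URL path to (status code, JSON body) for the smart thing."""
    if path == "/thing":
        return 200, json.dumps(THING_STATE)
    if path == "/thing/temperature":
        return 200, json.dumps({"temperature_c": THING_STATE["temperature_c"]})
    return 404, json.dumps({"error": "unknown resource"})


class ThingHandler(BaseHTTPRequestHandler):
    """Answers HTTP GET requests with the JSON representation of a resource."""

    def do_GET(self):
        status, body = get_resource(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Serve the thing's resources until interrupted (blocks the caller)."""
    HTTPServer(("127.0.0.1", port), ThingHandler).serve_forever()
```

Calling `serve()` and then opening `http://127.0.0.1:8080/thing/temperature` in a browser would return the reading as JSON, which is the sense in which the thing becomes part of the existing web.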

Introduction
RDBMSs are 25-year-old legacy code lines that should be retired in favor of a collection of from-scratch specialized engines (Stonebraker et al.)
Are we really prepared for the death of the relational era?

RDBMS
One-size-fits-all
If you wanted to build
  an e-commerce shop
  a banking core
  a rental car website
Database skills: you need to know a SINGLE RDBMS in depth
Strengths:
  Expertise in only one database technology
  Standard SQL
  Security (ACID)
  Triggers
  Joins
  Composite keys
  Structured
Drawbacks:
  Expertise in only one database technology
  Vertical scalability
  Horizontal scalability is hard and costly
  Models do not fit all cases
  Structured
  Does not deal well with unstructured data
ACID properties are absolutely essential for most operational systems and online transaction processing systems, including retail, banking, and finance.
ACID compliance may not be important to
  a search engine that may return different results to two users simultaneously, or
  Amazon when returning different sets of reviews to two users.
In these applications, speed and performance trump the consistency of the results.
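As an illustration of why ACID matters for banking-style workloads, the sketch below runs a money transfer inside a transaction using Python's built-in `sqlite3` module. The `account` table, the names, and the balances are hypothetical; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the second call the credit to `bob` is applied first, then the debit fails the `CHECK` constraint, and the rollback undoes the credit, so `balances` reflects only the first transfer.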

NOSQL
First "No SQL", then "Not Only SQL"
Unstructured data
Eventual consistency
CAP theorem (Consistency, Availability, Partition tolerance)
Main memory
Data stored in graph, key-value, and column formats
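Eventual consistency can be sketched with two in-memory key-value replicas that accept writes independently and converge after a synchronization pass. This is a toy model under simplifying assumptions (a caller-supplied version number and a last-write-wins rule); the class and function names are illustrative and do not correspond to any particular NOSQL product.

```python
class Replica:
    """A key-value replica that keeps (version, value) per key."""

    def __init__(self):
        self.store = {}

    def put(self, key, value, version):
        # Last-write-wins: keep the entry with the highest version.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Anti-entropy pass: exchange entries so both replicas converge."""
    for key, (version, value) in list(a.store.items()):
        b.put(key, value, version)
    for key, (version, value) in list(b.store.items()):
        a.put(key, value, version)


r1, r2 = Replica(), Replica()
r1.put("cart:42", ["book"], version=1)         # write lands on replica 1 only
stale = r2.get("cart:42")                      # replica 2 has not seen it yet
r2.put("cart:42", ["book", "pen"], version=2)  # newer write lands on replica 2
sync(r1, r2)
converged = r1.get("cart:42") == r2.get("cart:42") == ["book", "pen"]
```

Before `sync`, a reader on replica 2 sees nothing for the key; after it, both replicas agree on the newest value. That window of disagreement is what "eventual" refers to.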
Strengths:
  High performance
  Horizontal scalability
  Diversity of models
  Flexible schema
  High availability
  Manages unstructured data and big data well
Drawbacks:
  Flexible schema
  Not secure in the ACID sense (weak transactional guarantees)
  Eventual consistency
  No standard query language
The 3-4 "V"s
  Volume
  Variety
  Velocity
  Value
CAP theorem: you can only have two out of three
  Consistency
  Partition tolerance
  Availability
[Figure: CAP diagram with regions labeled "Few solutions are here" and "Most NOSQL lives here"]
Analytical queries
  select sum(salary) from customerperson
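Assuming this slide contrasts row-oriented and column-oriented storage (the original figure is not available), the sketch below shows why an aggregate such as `sum(salary)` favors a column layout. The `customerperson` attributes other than `salary`, and all the values, are hypothetical.

```python
# Row-oriented layout: one record per entry, all attributes together.
rows = [
    {"id": 1, "name": "Ana", "city": "Salvador", "salary": 3000},
    {"id": 2, "name": "Bruno", "city": "Salvador", "salary": 4500},
    {"id": 3, "name": "Carla", "city": "Recife", "salary": 5200},
]

# Column-oriented layout: one array per attribute.
columns = {
    "id": [1, 2, 3],
    "name": ["Ana", "Bruno", "Carla"],
    "city": ["Salvador", "Salvador", "Recife"],
    "salary": [3000, 4500, 5200],
}


def sum_salary_rows(table):
    """Visits every record to pull out a single field."""
    return sum(record["salary"] for record in table)


def sum_salary_columns(table):
    """Reads only the salary column; the other columns are never touched."""
    return sum(table["salary"])


total_rows = sum_salary_rows(rows)
total_columns = sum_salary_columns(columns)
```

Both functions return the same total. On disk, the row version would have to read every attribute of every record, while the column version reads one contiguous array.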
Compression
  Poor compression ratio (low repetition)
  Good compression ratio (high repetition)
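The link between repetition and compression ratio can be sketched with run-length encoding. The example assumes the slide compares a row layout (low repetition) with a column layout (high repetition); the records are hypothetical, and real column stores use more elaborate encodings.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical customerperson data, sorted by city.
records = [
    (1, "Ana", "Recife"),
    (2, "Bruno", "Recife"),
    (3, "Carla", "Salvador"),
    (4, "Davi", "Salvador"),
    (5, "Eva", "Salvador"),
]

# Row layout: values of different attributes are interleaved, so runs are rare.
row_stream = [field for record in records for field in record]
# Column layout: all values of one attribute are adjacent, so runs are common.
city_column = [city for _, _, city in records]

row_encoded = run_length_encode(row_stream)
city_encoded = run_length_encode(city_column)
```

The row stream has no adjacent repeats, so encoding it saves nothing, while the five-entry city column collapses to two pairs.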
Insertion
  Insert * into customerperson

NewSQL
A problem situation
Perhaps you have gigabytes to terabytes of data that needs high-speed transactional access.
You have an incoming event stream (sensors, mobile phones, network access points) and need per-event transactions to compute responses and analytics in real time.
Your problem follows a pattern of “ingest, analyze, decide,” where the analytics and the decisions must be calculated per-request and not post-hoc in batch processing.
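The "ingest, analyze, decide" pattern can be sketched as a per-event handler that returns its decision immediately instead of waiting for a batch job. The fraud-screening rule, the class name, and the thresholds below are hypothetical, and the state lives in plain Python memory; a NewSQL system would run the same three steps inside a transaction.

```python
from collections import defaultdict, deque


class FraudScreen:
    """Per-event pipeline: ingest the event, analyze recent history, decide."""

    def __init__(self, window_seconds=60, max_events=3):
        self.window_seconds = window_seconds
        self.max_events = max_events
        self.recent = defaultdict(deque)  # card id -> timestamps in the window

    def handle(self, card_id, timestamp):
        history = self.recent[card_id]
        # Ingest: record the new event.
        history.append(timestamp)
        # Analyze: drop events that fell out of the sliding window.
        while history and timestamp - history[0] > self.window_seconds:
            history.popleft()
        # Decide: answer for this request, not in a later batch job.
        return "deny" if len(history) > self.max_events else "allow"


screen = FraudScreen(window_seconds=60, max_events=3)
events = [("card-1", 0), ("card-1", 10), ("card-1", 20), ("card-1", 30), ("card-1", 200)]
decisions = [screen.handle(card, ts) for card, ts in events]
```

The fourth event is denied because it is the fourth within 60 seconds; the fifth is allowed again because the earlier events have left the window by then.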
It is a new concept from 2011
Brings together the best of relational databases and the best of NOSQL
More tables…distributed database
Strengths:
  ACID
  SQL
  Standard
  Structured
  High performance
  Horizontal scalability
  High availability
Drawbacks:
  Model does not fit all cases
  Does not handle unstructured data well
  Structured
  New concept (2011)
  Fewer resources and tools than relational and NOSQL systems
NuoDB
  a cluster-first SQL database with a focus on cloud: run on many nodes across many datacenters and let the underlying system manage data locality and consistency for you
  NuoDB is the closest of the NewSQL systems to being called eventually consistent
Hekaton
  adds sophisticated in-memory processing to the more traditional Microsoft SQL Server
MemSQL
  often offers faster OLAP analytics than all-in-one OldSQL systems, with higher concurrency and the ability to update data as it’s being analyzed
  focus on clustered analytics
  distributed, with MySQL compatibility
VoltDB
  the most mature of these systems; combines streaming analytics, strong ACID guarantees, and native clustering
  is the system of record for data-intensive applications, while offering an integrated high-throughput, low-latency ingestion engine
  a great choice for policy enforcement, fraud/anomaly detection, or other fast-decisioning apps

RDBMS x NOSQL x NewSQL

Data Analytics
Traditional approach
  Decision makers wait for reports from disparate OLTP systems
  Put it all together in a spreadsheet
  Highly manual process
In the Web context
  Data capture at the user interaction level, in contrast to the client transaction level in the Enterprise context
  As a consequence, the amount of data increases significantly
  Greater need to analyze such data to understand user behaviors
Scalability to large data volumes:
  Scan 100 TB on 1 node @ 50 MB/sec = 23 days
  Scan on 1000-node cluster = 33 minutes
  Divide-and-conquer (i.e., data partitioning)
Cost-efficiency:
  Commodity nodes (cheap, but unreliable)
  Commodity network
  Automatic fault-tolerance (fewer admins)
  Easy to use (fewer programmers)
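The scan-time figures above can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), an even split of the data across nodes, and no coordination overhead; the function name is illustrative.

```python
def scan_seconds(data_bytes, bytes_per_second_per_node, nodes=1):
    """Time to scan the data if it is split evenly and nodes read in parallel."""
    return data_bytes / (bytes_per_second_per_node * nodes)


TB = 10**12
MB = 10**6

one_node = scan_seconds(100 * TB, 50 * MB)             # 2,000,000 seconds
cluster = scan_seconds(100 * TB, 50 * MB, nodes=1000)  # 2,000 seconds

one_node_days = one_node / 86_400  # about 23.1 days
cluster_minutes = cluster / 60     # about 33.3 minutes
```

Partitioning the data over 1000 nodes divides the scan time by 1000 under these idealized assumptions, which is where the drop from roughly 23 days to roughly 33 minutes comes from.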
Evolution
  Operational: Reporting
  Tactical: Data Analysis
  Strategic: Mining & Statistics
  Future: Learning?

Big Data Analytics
Big data analytics is the process of examining large data sets containing a variety of data types (i.e., Big Data) to discover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information.
The goal is to analyze large volumes of transaction data, as well as other forms of data.
Examples: Web server logs and Internet stream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records, and machine data captured by sensors connected to the Internet of Things.
Traditional analytical tools comprise basic business intelligence
  examine historical data
Tools for advanced analytics
  focus on forecasting future events and behaviors, allowing businesses to conduct what-if analyses to predict the effects of potential changes in business strategies
Predictive analytics, data mining, big data analytics, and location intelligence are just some of the analytical categories that fall under the heading of advanced analytics.
These technologies are widely used in industries including marketing, healthcare, risk management, and economics.

Where is our course?
  Data Analytics
  Big Data Analytics
  Data Mining for Structured Data

Prof. Daniela Barreiro Claro
Email: [email protected]
Semantic Applications and Formalisms Research Group
www.formas.ufba.br
/formasresearchgroup