
NAVIGATING THE DATABASE UNIVERSE
Dr. Michael Stonebraker and Scott Jarr

About Our Presenters

Mike Stonebraker, Co-founder & CTO, VoltDB
A pioneer of database research and technology for more than a quarter of a century, and the main architect of the relational DBMS Ingres and the object-relational DBMS PostgreSQL

Scott Jarr, Co-founder & Chief Strategy Officer, VoltDB
More than 20 years of experience building, launching and growing technology companies from inception to market leadership in the search, mobile, security, storage and virtualization markets

Agenda

• The (proper) design of DBMSs
  – Presented by Dr. Michael Stonebraker
• The database universe
• Where the future value comes from

We Believe…

• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are the answer
• Products must provide tangible customer value… Fast

Dr. Michael Stonebraker

THE (PROPER) DESIGN OF THE DBMS

Lessons from 40 Years of Database Design

1. Get the user interaction right
  – Bet on a small number of easy-to-understand constructs
  – Plus standards
2. Get the implementation right
  – Bet on a small number of easy-to-understand constructs
3. One size does not fit all
  – At least not if you want fast, big or complex

“Those who don’t learn from history are destined to repeat it.”
– Winston Churchill

#1: Get the User Interaction Right

Historical Lesson: RDBMS vs. CODASYL vs. OODB

Winner: RDBMS
• Simple data model (tables)
• Simple access language (SQL)
• ACID (transactions)
• Standards (SQL)

Loser: CODASYL
• Complicated data model (records participate in “sets”; a set has one owner and, perhaps, many members, etc.)
• Messy access language (a sea of “cursors”; some, but not all, move on every command; navigation through this sea)

Loser: OODBs
• Complex data model (hierarchical records, pointers, sets, arrays, etc.)
• Complex access language (navigation, programming)
• No standards

Interaction Take Away − Simple is Good

• ACID was easy for people to understand

• SQL provided a standard, high-level language and made people productive (transportable skills)

#2: Get the Implementation Right

Historical Winners
• Early relational implementations: leverage a few simple ideas
  – System R storage system dropped links
  – Views (protection, schema modification, performance)
  – Cost-based optimizer
• Postgres: leverage a few simple ideas
  – User-defined data types and functions (adopted by almost everybody)
  – Rules/triggers
  – No-overwrite storage
• Vertica: leverage a few simple ideas
  – Store data by column
  – Aggressive compression (“compressed up the ging gong”)
  – Parallel load without compromising ACID

#3: One Size Does NOT Fit All
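A column store is one concrete case of a purpose-built design. The Vertica ideas above (store data by column, compress heavily) can be sketched in a few lines. This is a toy under stated assumptions: an in-memory table, run-length encoding (RLE) as the compression scheme, and hypothetical sample data and helper names (`to_columns`, `rle_encode`, `rle_decode`). It is not Vertica's actual storage format.

```python
from itertools import groupby

# Hypothetical sample data: (day, state, amount) rows, sorted by day.
rows = [
    ("2013-01-01", "NY", 10),
    ("2013-01-01", "NY", 20),
    ("2013-01-01", "CA", 5),
    ("2013-01-02", "CA", 7),
    ("2013-01-02", "CA", 3),
]

def to_columns(rows, names):
    """Pivot row-oriented tuples into one list per column (a column store)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def rle_encode(values):
    """Run-length encode a column as [(value, run_length), ...]."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(runs):
    """Expand [(value, run_length), ...] back into the full column."""
    return [value for value, count in runs for _ in range(count)]

columns = to_columns(rows, ["day", "state", "amount"])
day_runs = rle_encode(columns["day"])

# SUM(amount) reads only the "amount" column, not every field of every row.
total_amount = sum(columns["amount"])

# COUNT(*) GROUP BY day straight from the runs, without decompressing.
# This works because the column is sorted, so each day forms exactly one run.
rows_per_day = dict(day_runs)

print(day_runs)      # [('2013-01-01', 3), ('2013-01-02', 2)]
print(total_amount)  # 45
print(rows_per_day)  # {'2013-01-01': 3, '2013-01-02': 2}
```

The sorted, low-cardinality `day` column collapses to two runs. That is the effect a column layout is meant to exploit: similar values sit next to each other, so they compress well and can sometimes be queried in compressed form.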

• OSFA (one size fits all) is an old technology with hundreds of bags hanging off it
• It breaks 100% of the time when under load
• Load = size or speed or complexity
• Load is increasing at a startling rate
• Purpose-built systems will outperform it by 10x to 100x
• History has not been completely written yet… but let’s look at VoltDB as an example

“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system… A factor of 50 is nothing to sneeze at.”
– My Top 10 Assertions About Data Warehouses, 2010

Example: VoltDB

• Get the interface right
  – SQL
  – ACID
• Implementation: leverage a few simple ideas
  – Main memory
  – Stored procedures
  – Deterministic scheduling
• Specialization
  – OLTP focus allowed for the implementation choices above

Proving the Theory

• Challenge: OLTP performance
  – TPC-C CPU cycles
  – On the Shore DBMS prototype
  – Elephants (the large legacy DBMSs) should be similar

(Figure: TPC-C CPU cycles on Shore. Useful work 4%, buffer pool 24%, latching 24%, locking 24%, recovery 24%.)

Implementation Construct #1: Main Memory
• Main memory format for data
  – Disk format gets you buffer pool overhead
• What happens if data doesn’t fit?
  – Return to disk-buffer pool architecture (slow)
  – Anti-caching

Anti-caching
• Main memory format for data
• When memory fills up, bundle together elderly (least recently accessed) tuples and write them out
• Run a transaction in “sleuth mode” to find the required records, move them to main memory, and pin them
• Run the transaction normally

Implementation Construct #2: Stored Procedures

• Round trip to the DBMS is expensive
  – Do it once per transaction
  – Not once per command
  – Or even once per cursor move
• Ad-hoc queries supported
  – Turn them into dynamic stored procedures

Implementation Construct #3: Deterministic and Non-deterministic Scheduling
• Non-deterministic (can’t tell order until commit time)
  – MVCC
  – Dynamic locking
• Deterministic
  – Timestamp order

Result of Design Principles: VoltDB Example
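The three implementation constructs (main memory with anti-caching, stored procedures, deterministic scheduling) fit together roughly as follows in a deliberately tiny, single-threaded sketch. Everything here is an assumption for illustration. The names `ToyEngine`, `SleuthView`, `transfer`, `load`, `submit` and `run_all` are hypothetical. "Elderly" tuples are approximated by least-recently-used order, and evicted tuples go to a dict standing in for disk. Sleuth mode is modeled as a dry run that buffers its own writes. None of this is VoltDB's API or code.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Tuple

class SleuthView:
    """Sleuth mode (hypothetical): read memory and the anti-cache without changing
    either, buffer writes locally, and record every key the transaction touches."""

    def __init__(self, memory: Dict[str, int], disk: Dict[str, int]):
        self.memory, self.disk = memory, disk
        self.writes: Dict[str, int] = {}
        self.touched: set = set()

    def get(self, key: str) -> int:
        self.touched.add(key)
        if key in self.writes:
            return self.writes[key]
        return self.memory.get(key, self.disk.get(key, 0))

    def put(self, key: str, value: int) -> None:
        self.touched.add(key)
        self.writes[key] = value  # buffered only; discarded after the sleuth pass

class ToyEngine:
    """Hypothetical single-threaded engine; not VoltDB's real API or code."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # max tuples held in main memory
        self.memory: "OrderedDict[str, int]" = OrderedDict()  # least recently used first
        self.disk: Dict[str, int] = {}  # anti-cache: evicted tuples
        self.pinned: set = set()
        self.queue: List[Tuple[int, Callable]] = []

    def get(self, key: str) -> int:
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as recently used
        return self.memory.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)

    def load(self, key: str, value: int) -> None:
        self.put(key, value)
        self._evict()

    def _evict(self) -> None:
        # While over capacity, write the least recently used unpinned tuple out.
        while len(self.memory) > self.capacity:
            victim = next((k for k in self.memory if k not in self.pinned), None)
            if victim is None:
                break  # everything left is pinned
            self.disk[victim] = self.memory.pop(victim)

    def submit(self, timestamp: int, procedure: Callable) -> None:
        # One round trip per transaction: the client ships the whole stored procedure.
        self.queue.append((timestamp, procedure))

    def run_all(self) -> List[str]:
        results = []
        # Deterministic scheduling: run serially in timestamp order, with no locking.
        for _, procedure in sorted(self.queue, key=lambda item: item[0]):
            sleuth = SleuthView(self.memory, self.disk)
            procedure(sleuth)  # 1. sleuth mode: find the records it needs
            for key in sorted(sleuth.touched):  # 2. move them to main memory...
                if key in self.disk:
                    self.memory[key] = self.disk.pop(key)
            self.pinned = set(sleuth.touched)  # ...and pin them
            self._evict()  # make room by evicting other (unpinned) tuples
            results.append(procedure(self))  # 3. run the transaction normally
            self.pinned = set()
            self._evict()
        self.queue.clear()
        return results

def transfer(src: str, dst: str, amount: int) -> Callable:
    """Hypothetical stored procedure: move `amount` from src to dst."""
    def procedure(db) -> str:
        balance = db.get(src)
        if balance < amount:
            return "rejected"
        db.put(src, balance - amount)
        db.put(dst, db.get(dst) + amount)
        return "ok"
    return procedure

engine = ToyEngine(capacity=2)
for name, balance in [("alice", 100), ("bob", 50), ("carol", 0)]:
    engine.load(name, balance)  # loading "carol" evicts "alice" to the anti-cache
engine.submit(2, transfer("bob", "carol", 30))
engine.submit(1, transfer("alice", "bob", 80))  # earlier timestamp, so it runs first
results = engine.run_all()
print(results)  # ['ok', 'ok']
```

In this sketch, transactions run one at a time in timestamp order, so no locks or latches are needed, and replaying the same queue yields the same final state. The sleuth pass fetches evicted tuples before the real run starts, so a transaction never waits on "disk" while it holds the single execution thread.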

• Good interface decisions made developers more productive
  – SQL & ACID
• Leveraging a few simple implementation ideas made VoltDB wicked fast
  – Main memory
  – Stored procedures
  – Deterministic scheduling

Proving the Theory

• Answer: OLTP performance
  – 3 million transactions per second
  – 7x Cassandra
  – 15 million SQL statements per second
  – 100,000+ transactions per commodity server

“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”
– The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007

Scott Jarr

THE DATABASE UNIVERSE
Technology Meets the Market

We Believe…
• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are the answer
• Products must provide tangible customer value… Fast

Observations
• Noisy, crowded and new, kinda like Christmas shopping at the mall
• Everyone wants to understand where the pieces fit
• Analysts build maps on technology, NOT use cases

What we need is…

Data Value Chain

(Figure: x-axis “Age of Data”, running from milliseconds to hours.)
• Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve transaction
• Real-time Analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count
• Record Lookup (second(s)): retrieve click stream, show orders
• Historical Analytics (minutes): backtest algo, BI, daily reports
• Exploratory Analytics (hours): algo discovery, log analysis, fraud pattern match

(Second build of the same chart adds two curves plotted against Age of Data: Value of Individual Data Item and Aggregate Data Value.)

The Database Universe

(Figure: a chart with axes running Slow to Fast, Simple to Complex (application complexity), and Small to Large. Workloads are grouped as Transactional, Analytic, and Exploratory, covering Interactive, Real-time Analytics, Record Lookup, Historical Analytics, and Exploratory Analytics. The Value of Individual Data Item and Aggregate Data Value curves are overlaid. Traditional RDBMS occupies the Simple, Slow, Small corner.)

(Second build of the same chart adds the Velocity label and the product categories NewSQL, NoSQL, Data Warehouse, and “Hadoop, etc.”. It also lists incoming data: logins, trades, authorizations, clicks, orders, impressions, sensors.)

Closed-loop Big Data

(Figure: logins, trades, authorizations, clicks, orders, impressions, and sensor data feed three stages: Interactive & Real-time Analytics, Historical Reports & Analytics, and Exploratory Analytics. Knowledge from the later stages flows back into real-time decisions.)

• Interactive & Real-time Analytics: make the most informed decision every time there is an interaction
• Real-time decisions are informed by operational analytics and past knowledge

The Velocity Use Case

What does it look like?
  – High-throughput, relentless data feeds
  – Fast decisions on high-value data
  – Real-time operational analytics provide immediate visibility

What’s the big deal?
  – Converting batch to real time = efficiency
  – Decisions made at the time of the event = better decisions
  – Ability to micro-segment/target/personalize/etc. = conversion, satisfaction
  – More data is coming at you; use it to improve your business

QUESTIONS AND ANSWERS

www.voltdb.com

THANK YOU