Intro to Apache Kudu: Hadoop Storage for Fast Analytics on Fast Data
Intro to Apache Kudu: Hadoop storage for fast analytics on fast data
Mike Percy, Software Engineer at Cloudera, Apache Kudu PMC member
© Cloudera, Inc. All rights reserved.

Apache Kudu
Storage for fast (low-latency) analytics on fast (high-throughput) data
• Simplifies the architecture for building analytic applications on changing data
• Optimized for fast analytic performance
• Natively integrated with the Hadoop ecosystem of components
[Figure: Hadoop stack diagram. Data engineering (batch: Spark, Hive, Pig; stream: Spark), data discovery & analytics (SQL: Impala; search: Solr; model: Spark), data apps (online: HBase); unified data services (resource management: YARN; security: Sentry); data integration & storage (filesystem: HDFS; columnar store: Kudu; NoSQL: HBase; ingest: Sqoop, Flume, Kafka)]

Why Kudu?

Previous Hadoop storage landscape
HDFS (GFS) excels at:
• Batch ingest only (e.g. hourly)
• Efficiently scanning large amounts of data (analytics)
HBase (BigTable) excels at:
• Efficiently finding and writing individual rows
• Making data mutable
Gaps exist when these properties are needed simultaneously.

Kudu design goals
• High throughput for big scans (goal: within 2x of Parquet)
• Low latency for short accesses (goal: 1 ms read/write on SSD)
• Database-like semantics (initially, single-row atomicity)
• Relational data model
  • SQL queries should be natural and easy
  • Include NoSQL-style scan, insert, and update APIs
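The "NoSQL-style scan, insert, and update APIs" goal can be pictured with a toy in-memory model. This is not the Kudu client API (the real clients are Java, Python, and C++ and talk to tablet servers); it is only a sketch of the intended semantics: unique primary keys, single-row operations, and scans that return rows in key order.

```python
# Toy in-memory model of Kudu-style row semantics (illustrative only;
# names and structure here are invented for the sketch, not Kudu's API).
class ToyTable:
    def __init__(self, key_columns):
        self.key_columns = key_columns
        self.rows = {}  # primary-key tuple -> row dict

    def _key(self, row):
        return tuple(row[c] for c in self.key_columns)

    def insert(self, row):
        k = self._key(row)
        if k in self.rows:
            raise KeyError("duplicate primary key")  # keys must be unique
        self.rows[k] = dict(row)                     # single-row atomicity

    def update(self, row):
        self.rows[self._key(row)].update(row)        # key must already exist

    def delete(self, **key):
        del self.rows[tuple(key[c] for c in self.key_columns)]

    def scan(self, predicate=lambda r: True):
        # Rows come back in primary-key-sorted order, as within a Kudu tablet.
        return [self.rows[k] for k in sorted(self.rows) if predicate(self.rows[k])]

t = ToyTable(["host", "metric"])
t.insert({"host": "a", "metric": "cpu", "value": 0.9})
t.insert({"host": "b", "metric": "cpu", "value": 0.4})
t.update({"host": "a", "metric": "cpu", "value": 0.7})
print([r["value"] for r in t.scan()])  # [0.7, 0.4]
```

A second insert with the same ("host", "metric") pair raises, mirroring the primary-key uniqueness rule discussed later in the deck.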
Changing hardware landscape
• Spinning disk -> solid-state storage
• NAND flash: up to 450k read / 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at a price of less than $3/GB and dropping
• 3D XPoint memory (1000x faster than flash, cheaper than RAM)
• RAM is cheaper and more abundant: 64 -> 128 -> 256 GB over the last few years
• Takeaway: the next performance bottleneck is CPU, and current storage systems weren't designed with CPU efficiency in mind

Apache Kudu: scalable and fast structured storage
Tables
• Represents data in structured tables like a normal database
• Individual record-level access to 100+ billion row tables
Fast
• Millions of read/write operations per second across the cluster
• Multiple GB/second read throughput per node
Scalable
• Tested up to 275 nodes (~3 PB cluster)
• Designed to scale to 1000s of nodes and tens of PBs

Storing records in Kudu tables
• A Kudu table has a SQL-like schema with a finite number of columns (unlike HBase/Cassandra)
• Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
• Some subset of columns makes up a possibly-composite primary key
• Fast ALTER TABLE
• Java, Python, and C++ NoSQL-style APIs: Insert(), Update(), Delete(), Scan()
• SQL via integrations with Impala and Spark
• Community work in progress / experimental: Drill, Hive

Primary key
• Every table must have a primary key, comprised of one or more columns
• Primary key values must be unique
• The columns that comprise a primary key may not be boolean or floating-point typed, and may not be nullable
• Kudu does not allow the primary key values of a row to be updated
• Kudu requires primary key fields to be defined as the first fields of the table schema
• Rows within a tablet are stored in primary key sorted order
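The primary-key rules above can be condensed into a small checker. This is a toy validator mirroring the bullet list, not Kudu's actual schema-validation code; the column-tuple format is invented for the sketch.

```python
# Toy validator for the primary-key rules on this slide (not Kudu client code).
# columns: list of (name, type, nullable) tuples in declared order.
DISALLOWED_KEY_TYPES = {"BOOL", "FLOAT", "DOUBLE"}

def validate_schema(columns, key_names):
    names = [c[0] for c in columns]
    assert key_names, "every table must have a primary key"
    # Key columns must be the first fields of the table schema.
    assert names[:len(key_names)] == key_names, "key columns must be declared first"
    for name, ctype, nullable in columns[:len(key_names)]:
        # Key columns may not be boolean or floating-point typed, nor nullable.
        assert ctype not in DISALLOWED_KEY_TYPES, f"{name}: {ctype} not allowed in key"
        assert not nullable, f"{name}: key columns may not be nullable"
    return True

ok = validate_schema(
    [("host", "STRING", False), ("metric", "STRING", False),
     ("timestamp", "TIMESTAMP", False), ("value", "DOUBLE", True)],
    ["host", "metric", "timestamp"])
print(ok)  # True
```

The (host, metric, timestamp) composite key reused here is the same example the deck uses later when discussing partitioning.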
Integrations
Kudu is designed for integration with higher-level compute frameworks. Integrations exist for:
• Impala
• Spark
• MapReduce
• Flume
• Drill

Use cases

Kudu use cases
Kudu is best for use cases requiring:
• A simultaneous combination of sequential and random reads and writes
• Minimal-to-zero data latency
Time series
• Examples: streaming market data; fraud detection & prevention; network monitoring
• Workload: inserts, updates, scans, lookups
Online reporting / data warehousing
• Example: Operational Data Store (ODS)
• Workload: inserts, updates, scans, lookups

"Traditional" real-time analytics in Hadoop
Fraud detection in the real world = storage complexity.
[Figure: pipeline from Kafka into HBase, with the most recent HBase partition periodically reorganized into Parquet files in HDFS; reporting requests read both the historical Parquet data and the newest HBase data]
Considerations:
• How do I handle failure during this process?
• How often do I reorganize data streaming in into a format appropriate for reporting? Have we accumulated enough data?
• When reporting, how do I see data that has not yet been reorganized?
• How do I ensure that important jobs aren't interrupted by maintenance?
  • Wait for running operations to complete
  • Define a new Impala partition referencing the newly written Parquet file

Real-time analytics in Hadoop with Kudu
Storage in Kudu. Improvements:
• One system to operate
• No cron jobs or background processes
• Handle late arrivals or data corrections with ease
• New data available immediately for analytics or operations
[Figure: incoming data (e.g. Kafka) flows directly into Kudu, which serves both historical and real-time data for reporting requests]

Xiaomi use case
• World's 4th largest smartphone maker (most popular in China)
• Gathers important RPC tracing events from the mobile app and backend services
• Service monitoring & troubleshooting tool
Requirements:
• High write throughput: >20 billion records/day and growing
• Query the latest data with quick response: identify and resolve issues quickly
• Can search for individual records: easy troubleshooting

Xiaomi big data analytics pipeline, before Kudu
Long pipeline:
• High data latency (approx. 1 hour to 1 day)
• Data conversion pains
No ordering:
• Log arrival (storage) order is not exactly logical order
• Must read 2-3 days of data to get all of the data points for a single day

Xiaomi big data analytics pipeline, simplified with Kafka and Kudu
ETL pipeline (OLAP scan, side-table lookup, result store):
• 0-10 s data latency
• For apps that need to avoid backpressure or need ETL
Direct pipeline (no added latency):
• For apps that don't require ETL or backpressure handling

JD.com use case
• 2nd largest online retailer in China
• Real-time ingestion via Kafka: click logs, application/browser tracing, web logs
• ~70 columns per row
• 6/18 sale day: 15B transactions, 10M inserts/sec peak
• 200-node cluster
• Query via JDBC -> Impala -> Kudu (web-app developers, marketing dept.)

Kudu + Impala vs. MPP data warehouses
Commonalities:
✓ Fast analytic queries via SQL, including most commonly used modern features
✓ Ability to insert, update, and delete data
Differences:
✓ Faster streaming inserts
✓ Improved Hadoop integration: JOIN between HDFS and Kudu tables, run on the same cluster; Spark, Flume, and other integrations
✗ Slower batch inserts
✗ No transactional data loading, multi-row transactions, or indexing

How it works: replication and fault tolerance
Tables, tablets, and tablet servers
• Each table is horizontally partitioned into tablets, by range or hash partitioning, e.g.:
  PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
• Each tablet has N replicas (default = 3) with Raft consensus
  • Automatic fault tolerance; MTTR: ~5 seconds
• Tablet servers host tablets on local disk drives
• Master servers manage the cluster's metadata

Master servers
• Master servers (3-5 of them) manage the cluster's metadata
• Manage schemas and tables (and the corresponding tablets): CREATE / ALTER / DROP TABLE
• Track the locations of all of the tablet replicas
• Detect when tablet replicas fail and initiate data re-replication
• Internally, the "master" metadata is stored in a special type of tablet that lives only on the master servers
• The master tablet uses Raft consensus for replication across the master servers' local disk drives

How it works: columnar storage

Columnar storage
[Figure: a tweets table stored column by column, with the tweet_id, user_name, created_at, and text values each grouped together]

Columnar storage
[Figure: the same table with per-column sizes: tweet_id 1 GB, user_name 2 GB, created_at 1 GB, text 200 GB. The query SELECT COUNT(*) FROM tweets WHERE user_name = 'newsycbot'; only needs to read the 2 GB user_name column]

Columnar compression
• Many columns can compress to a few bits per row!
• Especially: timestamps, time series values, low-cardinality strings
• Massive space savings and throughput increase!
Example: delta-encoding created_at timestamps:

  created_at    diff(created_at)
  1442825158    n/a
  1442826100    942
  1442827994    1894
  1442828527    533

  64 bits each -> 11 bits each

Representing time series in Kudu

What is time series?
Data that can be usefully partitioned and queried based on time. Examples:
• Web user activity
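The delta-encoding arithmetic from the columnar compression slide can be checked in a few lines of Python (a sketch of the idea only; Kudu's real encoders are more sophisticated than storing raw deltas):

```python
# Delta-encoding sketch using the timestamps from the compression slide.
ts = [1442825158, 1442826100, 1442827994, 1442828527]

# Store the first value, then only successive differences.
deltas = [b - a for a, b in zip(ts, ts[1:])]
print(deltas)                    # [942, 1894, 533]

# Raw second-resolution timestamps need ~31 significant bits
# (stored as 64-bit integers); the deltas fit in 11 bits each.
print(max(ts).bit_length())      # 31
print(max(deltas).bit_length())  # 11
```

Because consecutive timestamps in a sorted time series are close together, the deltas are tiny relative to the raw 64-bit values, which is where the "64 bits each -> 11 bits each" savings on the slide comes from.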