Apache Apex: Next Gen Big Data Analytics
Thomas Weise <[email protected]> @thweise
PMC Chair Apache Apex, Architect DataTorrent
Apache Big Data Europe, Sevilla, Nov 14th 2016

Stream Data Processing

(Diagram: data sources - events, logs, sensor data, social media, databases, CDC (roadmap) - feed a DAG of operators (Oper1 -> Oper2 -> Oper3) built from the operator library, which transforms and analyzes the stream and delivers it to real-time visualization and other consumers. Higher-level APIs sit on top of the DAG API: declarative SQL, Apache Beam, and Apache SAMOA.)

Industries & Use Cases

Financial Services: fraud and risk monitoring; credit risk assessment; improving turnaround time of trade settlement processes
Ad-Tech: real-time customer-facing dashboards on key performance indicators; click fraud detection; billing optimization
Telecom: call detail record (CDR) & extended data record (XDR) analysis; understanding customer behavior AND context; packaging and selling anonymous customer data
Manufacturing: supply chain planning & optimization; preventive maintenance; product quality & defect tracking
Energy: smart meter analytics; reducing outages & improving resource utilization; asset & workforce management
IoT: data ingestion and processing; predictive analytics; data governance

Horizontal:
• Large-scale ingest and distribution
• Enforcing data quality and data governance requirements
• Real-time ELTA (Extract Load Transform Analyze)
• Real-time data enrichment with reference data
• Dimensional computation & aggregation
• Real-time machine learning model scoring

Apache Apex

• In-memory, distributed stream processing
ᵒ Application logic is broken into components (operators) that execute in a distributed fashion across a cluster
ᵒ Unobtrusive Java API to express (custom) logic
ᵒ State and metrics are maintained in member variables
ᵒ Windowing and event-time processing
• Scalable, high throughput, low latency
ᵒ Operators can be scaled up or down at runtime according to load and SLA
ᵒ Dynamic scaling (elasticity), compute locality
• Fault tolerance & correctness
ᵒ Automatic recovery from node outages without reprocessing from the beginning
ᵒ State is preserved: checkpointing, incremental recovery
ᵒ End-to-end exactly-once
• Operability
ᵒ System and application metrics; record and visualize data
ᵒ Dynamic changes and resource allocation, elasticity

Native Hadoop Integration

• YARN is the resource manager
• HDFS is used for storing persistent state

Application Development Model

• A Stream is a sequence of data tuples
• An Operator takes one or more input streams, performs computations, and emits one or more output streams
ᵒ Each operator is YOUR custom business logic in Java, or a built-in operator from the open source library
ᵒ An operator has many instances that run in parallel, and each instance is single-threaded
• A Directed Acyclic Graph (DAG) is made up of operators and streams
(Diagram: an input operator emits a stream that passes through filter and enrichment operators, producing filtered and enriched streams, before reaching an output operator.)

Development Process

Kafka Input -> (lines) -> Parser -> (words) -> Filter -> (filtered) -> Word Counter -> (counts) -> JDBC Output
(Kafka is the source, a database the sink; the pipeline in between is the Apex application.)

• Use operators from the library, or develop operators for custom logic
• Connect operators to form the application
• Configure operator properties
• Configure scaling and other platform attributes
• Test functionality and performance, iterate

Application Specification

• DAG API (compositional)
• Java Stream API (declarative)

Developing Operators
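The deck shows this slide as a code screenshot. As a stand-in, here is a minimal sketch of a custom operator; the KeywordFilter class and its keyword property are hypothetical, while BaseOperator, DefaultInputPort, and DefaultOutputPort are the actual Apex API types:

    import com.datatorrent.api.DefaultInputPort;
    import com.datatorrent.api.DefaultOutputPort;
    import com.datatorrent.common.util.BaseOperator;

    // Hypothetical operator: forwards only the lines that contain a keyword.
    public class KeywordFilter extends BaseOperator {
      private String keyword = "error";  // operator property, settable via configuration

      public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();

      public final transient DefaultInputPort<String> input = new DefaultInputPort<String>() {
        @Override
        public void process(String line) {
          if (line.contains(keyword)) {
            output.emit(line);  // emit on the output stream
          }
        }
      };

      public String getKeyword() { return keyword; }
      public void setKeyword(String keyword) { this.keyword = keyword; }
    }

Non-transient member variables such as keyword (or any accumulated state) are checkpointed automatically; the ports are declared transient because they are wired up by the engine rather than persisted.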
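To connect such operators into the pipeline from the Development Process slide, the compositional DAG API is used. A sketch, assuming hypothetical KafkaReader, Parser, WordCounter, and JdbcWriter operator classes (the operator library provides real Kafka input and JDBC output operators); StreamingApplication, DAG, and ApplicationAnnotation are the actual API:

    import org.apache.hadoop.conf.Configuration;
    import com.datatorrent.api.DAG;
    import com.datatorrent.api.StreamingApplication;
    import com.datatorrent.api.annotation.ApplicationAnnotation;

    @ApplicationAnnotation(name = "WordCountApp")
    public class WordCountApp implements StreamingApplication {
      @Override
      public void populateDAG(DAG dag, Configuration conf) {
        // operators from the library or custom logic (names are hypothetical)
        KafkaReader reader = dag.addOperator("kafkaInput", new KafkaReader());
        Parser parser = dag.addOperator("parser", new Parser());
        KeywordFilter filter = dag.addOperator("filter", new KeywordFilter());
        WordCounter counter = dag.addOperator("counter", new WordCounter());
        JdbcWriter writer = dag.addOperator("jdbcOutput", new JdbcWriter());

        // streams connect output ports to input ports, forming the DAG
        dag.addStream("lines", reader.output, parser.input);
        dag.addStream("words", parser.output, filter.input);
        dag.addStream("filtered", filter.output, counter.input);
        dag.addStream("counts", counter.output, writer.input);
      }
    }

Operator properties (such as the filter's keyword) and scaling attributes are then set through configuration, per the Development Process steps above.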
Operator Library

Messaging: Kafka; JMS (ActiveMQ, …); Kinesis, SQS; Flume, NiFi
NoSQL: Cassandra, HBase; Aerospike, Accumulo; Couchbase, CouchDB; Redis, MongoDB; Geode
RDBMS: JDBC; MySQL; Oracle; MemSQL
File Systems: HDFS, Hive; NFS; S3
Parsers: XML; JSON; CSV; Avro; Parquet
Transformations: Filter, Expression, Enrich; Windowing, Aggregation; Join; Dedup
Analytics: Dimensional Aggregations (with state management for historical data + query)
Protocols: HTTP; FTP; WebSocket; MQTT; SMTP
Other: Elasticsearch; Script (JavaScript, Python, R); Solr; Twitter

Stateful Processing with Event Time

(Diagram: events carry a key and an event time, e.g. k=A, t=4:00, and arrive out of order relative to processing time. As processing time advances (+30s, +60s, +90s), state accumulates counts overall, per event-time window, and per key and window; t=4:30 falls into the t=4:00 window. After 90 seconds: (All): 5; t=4:00: 2; t=5:00: 3; k=A, t=4:00: 2; k=A, t=5:00: 1; k=B, t=5:00: 2.)

Windowing - Apache Beam Model

Concepts: event-time windows, session windows, merging streams, watermarks, allowed lateness, triggers, accumulation mode, keyed or not keyed.

    ApexStream<String> stream = StreamFactory
        .fromFolder(localFolder)
        .flatMap(new Split())
        .window(new WindowOption.GlobalWindow(),
            new TriggerOption().withEarlyFiringsAtEvery(Duration.millis(1000))
                .accumulatingFiredPanes())
        .countByKey(new ConvertToKeyVal())
        .print();

Fault Tolerance

• Operator state is checkpointed to a persistent store
ᵒ Performed automatically by the engine, no additional coding needed
ᵒ Asynchronous and distributed
ᵒ In case of failure, operators are restarted from checkpointed state
• Automatic detection and recovery of failed containers
ᵒ Heartbeat mechanism
ᵒ YARN process status notification
• Buffering enables replay of data from the recovery point
ᵒ Fast, incremental recovery; spike handling
• Application master state is checkpointed
ᵒ Snapshot of the physical (and logical) plan
ᵒ Execution layer change log

Checkpointing State

• Distributed, asynchronous
• No artificial latency
• Periodic callbacks
• Pluggable storage
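A minimal sketch of how checkpointed state and the periodic callbacks surface in operator code; the RunningSum operator is hypothetical, while Operator.CheckpointListener is the actual callback interface:

    import com.datatorrent.api.DefaultInputPort;
    import com.datatorrent.api.Operator;
    import com.datatorrent.common.util.BaseOperator;

    // Hypothetical operator keeping a running sum. The non-transient field
    // "sum" is checkpointed asynchronously by the engine and restored on
    // recovery; no explicit persistence code is needed.
    public class RunningSum extends BaseOperator implements Operator.CheckpointListener {
      private long sum;

      public final transient DefaultInputPort<Long> input = new DefaultInputPort<Long>() {
        @Override
        public void process(Long value) {
          sum += value;
        }
      };

      @Override
      public void checkpointed(long windowId) {
        // state up to windowId has been written to the (pluggable) store
      }

      @Override
      public void committed(long windowId) {
        // the whole DAG has processed up to windowId; resources kept for
        // potential replay of older windows can now be released
      }
    }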
Buffer Server & Recovery

(Diagram: Operator 1 in Container 1 on Node 1 publishes to a buffer server, from which Operator 2 in Container 2 on Node 2 subscribes.)

• In-memory pub-sub between containers
• Stores results until committed
• Backpressure / spillover to disk
• Ordering, idempotency
• Pipelines are independent; downstream operators are reset on recovery (can also be used for speculative execution)

Recovery Scenario

(Diagram: a tuple stream with window markers, … EW2, 1, 3, BW2, EW1, 4, 2, 1, BW1, flows into a sum operator. The operator's state advances from 0 to 7 at the end of window 1 and on to 10 within window 2; after a failure it is restored to the checkpointed value 7 and the buffer server replays window 2.)

Processing Guarantees

At-least-once
• On recovery, data is replayed from a previous checkpoint
ᵒ No messages are lost
ᵒ Default; suitable for most applications
• Can be used to ensure data is written once to a store
ᵒ Transactions with meta information, rewinding output, feedback from an external entity, idempotent operations

At-most-once
• On recovery, the latest data is made available to the operator
ᵒ Useful where some data loss is acceptable and the latest data is sufficient

Exactly-once
• At-least-once + idempotency + transactional mechanisms (operator logic) to achieve end-to-end exactly-once behavior

End-to-End Exactly Once

• Important when writing to external systems
• Data should be neither duplicated nor lost in the external system in case of application failure
• Common external systems
ᵒ Databases
ᵒ Files
ᵒ Message queues
• Exactly-once = at-least-once + idempotency + consistent state
• Data duplication must be avoided when data is replayed from a checkpoint (see the sketch below)
ᵒ Operators implement the logic, which depends on the external system
ᵒ The platform provides checkpointing and repeatable windowing
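A minimal sketch of the idempotent-output pattern described above, assuming a hypothetical Store interface that can write a batch of records and a window id in one atomic transaction; beginWindow/endWindow and repeatable windowing are provided by the platform:

    import com.datatorrent.api.Context.OperatorContext;
    import com.datatorrent.api.DefaultInputPort;
    import com.datatorrent.common.util.BaseOperator;
    import java.util.ArrayList;
    import java.util.List;

    public class ExactlyOnceWriter extends BaseOperator {

      // Hypothetical external store with atomic batch + window id writes.
      public interface Store {
        long readCommittedWindowId();
        void writeAtomically(List<String> batch, long windowId);
      }

      private transient Store store;  // in a real operator, created in setup()
      private transient long committedWindowId;
      private transient long currentWindowId;
      private final transient List<String> batch = new ArrayList<>();

      public final transient DefaultInputPort<String> input = new DefaultInputPort<String>() {
        @Override
        public void process(String tuple) {
          batch.add(tuple);
        }
      };

      @Override
      public void setup(OperatorContext context) {
        // connect to the store (omitted) and learn what was already written
        committedWindowId = store.readCommittedWindowId();
      }

      @Override
      public void beginWindow(long windowId) {
        currentWindowId = windowId;
        batch.clear();
      }

      @Override
      public void endWindow() {
        // windows replayed after recovery were already written: skip them
        if (currentWindowId > committedWindowId) {
          store.writeAtomically(batch, currentWindowId);
          committedWindowId = currentWindowId;
        }
      }
    }

Because a recovered window is replayed with the same tuples in the same window (repeatable windowing), skipping windows at or below the committed window id yields exactly-once output.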
Scalability

(Diagram: logical DAG 0 -> 1 -> 2 -> 3, with operator 1 partitioned into 1a, 1b, 1c and operator 2 into 2a, 2b. If all upstream partitions are merged by a single intermediate unifier before being repartitioned, that unifier becomes a bottleneck; with NxM partitioned streams, one unifier per downstream partition, there is no bottleneck.)

Advanced Partitioning

• Parallel partition: in the logical DAG 0 -> 1 -> 2 -> 3 -> 4, the partitions of operators 1-3 form parallel pipelines (1a -> 2a -> 3a and 1b -> 2b -> 3b) with no shuffle between them; the streams are unified only in front of operator 4
• Cascading unifiers: for the logical plan uopr -> dopr with N = 4 upstream partitions and M = 1 downstream partition, a single unifier container receives the full fan-in on its NIC; with K = 2 cascading unifiers, two intermediate containers each merge two streams before the final unifier, spreading the network load

Dynamic Partitioning

(Diagram: successive snapshots of a physical plan in which operator 2 grows from two partitions (2a, 2b) to four (2a-2d) and operator 3 from one to two (3a, 3b) at runtime; unifiers not shown.)

• Partitioning can change while the application is running
ᵒ Change the number of partitions at runtime based on stats
ᵒ Determine the initial number of partitions dynamically
• Kafka operators scale according to the number of Kafka partitions
ᵒ Re-distribution of state is supported when the number of partitions changes
ᵒ API for custom scaler or partitioner

How Dynamic Partitioning Works

• Partitioning decision (yes/no) is made by a trigger (StatsListener)
ᵒ Pluggable component; can use any system or custom metric
ᵒ Externally driven partitioning example: KafkaInputOperator
• Stateful!
ᵒ Uses checkpointed state
ᵒ Ability to transfer state from old to new partitions (partitioner, customizable)
ᵒ Steps:
  • Call the partitioner
  • Modify the physical plan, rewriting checkpoints as needed
  • Undeploy old partitions from the execution layer
  • Release/request container resources
  • Deploy new partitions (from the rewritten checkpoint)
ᵒ No loss of data (buffered)
ᵒ Incremental operation: partitions that don't change continue processing
• API: Partitioner interface

Compute Locality

• By default, operators are deployed in containers (processes) on different nodes across the Hadoop cluster
• Locality options for streams
ᵒ RACK_LOCAL: data does not traverse network switches
ᵒ NODE_LOCAL: data transfer via the loopback interface, freeing up network bandwidth
ᵒ CONTAINER_LOCAL: data transfer via in-memory queues between operators, no serialization required
ᵒ THREAD_LOCAL: data passed through the call stack, operators share a thread
• Host locality
ᵒ Operators can be deployed on specific hosts
• (Anti-)affinity
ᵒ Ability to express relative deployment without specifying a host
(A configuration sketch for partitioning and locality follows at the end of this section.)

Performance: Throughput vs. Latency?

https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-
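As referenced from the Compute Locality slide above, a sketch of how partitioning and stream locality can be configured, continuing the hypothetical WordCountApp wiring from earlier; StatelessPartitioner, OperatorContext.PARTITIONER, and DAG.Locality are the actual Apex types:

    import com.datatorrent.api.Context.OperatorContext;
    import com.datatorrent.api.DAG.Locality;
    import com.datatorrent.common.partitioner.StatelessPartitioner;

    // inside populateDAG(), continuing the earlier wiring sketch:

    // run the counter as three static partitions; a custom Partitioner plus a
    // StatsListener could adjust this number at runtime instead
    dag.setAttribute(counter, OperatorContext.PARTITIONER,
        new StatelessPartitioner<WordCounter>(3));

    // co-locate filter and counter in one container: in-memory queues,
    // no serialization on this stream
    dag.addStream("filtered", filter.output, counter.input)
        .setLocality(Locality.CONTAINER_LOCAL);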