Apache Pulsar and Its Enterprise Use Cases

Yahoo Japan Corporation
Nozomi Kurihara
July 18th, 2018

Who am I?
Nozomi Kurihara
• Software engineer at Yahoo! JAPAN (April 2012 ~)
• Working on internal messaging platform using Apache Pulsar
• Committer of Apache Pulsar

Copyright (C) 2018 Yahoo Japan Corporation. All Rights Reserved.

Agenda
1. What is Apache Pulsar?
2. Why is Apache Pulsar useful?
3. How does Yahoo! JAPAN use Apache Pulsar?

What is Apache Pulsar?

Agenda
1. What is Apache Pulsar?
   › History & Users
   › Pub-Sub messaging
   › Architecture
   › Client libraries
   › Topic
   › Subscription
   › Sample codes
2. Why is Apache Pulsar useful?
3. How does Yahoo! JAPAN use Apache Pulsar?

Apache Pulsar
Flexible pub-sub system backed by durable log storage
▪ History:
   › 2014 Development started at Yahoo! Inc.
   › 2015 In production at Yahoo! Inc.
   › Sep. 2016 Open-sourced (Apache License 2.0)
   › June 2017 Moved to the Apache Incubator
   › June 2018 Major version update: 2.0.1
▪ Competitors:
   › Apache Kafka
   › RabbitMQ
   › Apache ActiveMQ
   › Apache RocketMQ
   etc.
▪ Users:
   › Oath Inc. (Yahoo! Inc.)
   › Comcast
   › The Weather Channel
   › Mercado Libre
   › Streamlio
   › Yahoo! JAPAN
   etc.

Pub-Sub messaging
Message transmission from one system to another via a Topic
▪ Producers publish messages to Topics
▪ Consumers receive messages only from the Topics to which they subscribe
▪ Producers and Consumers are decoupled (no need to know each other) → asynchronous, scalable, resilient
[Figure: a Producer publishes messages (log, notification, etc.) to a Topic; Consumers 1, 2, and 3 subscribe to it. The Topic sits inside the pub-sub system.]
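The pub-sub pattern on the slide above can be sketched with a toy in-memory broker. This is only an illustration of the pattern, not of how Pulsar works internally: the class name PubSubSketch and its methods are hypothetical, delivery here is synchronous and in-process, and nothing is persisted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy in-memory pub-sub (hypothetical; not Pulsar's implementation).
class PubSubSketch {
    // topic name -> callbacks of the consumers subscribed to that topic
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // A consumer registers interest in exactly one topic.
    public void subscribe(String topic, Consumer<String> consumer) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(consumer);
    }

    // A producer publishes to a topic without knowing who is subscribed.
    public void publish(String topic, String message) {
        List<Consumer<String>> targets =
                subscribers.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> c : targets) {
            c.accept(message);
        }
    }

    public static void main(String[] args) {
        PubSubSketch bus = new PubSubSketch();
        List<String> consumer1 = new ArrayList<>();
        List<String> consumer2 = new ArrayList<>();
        bus.subscribe("user-log", consumer1::add);
        bus.subscribe("notifications", consumer2::add);

        bus.publish("user-log", "login");
        bus.publish("notifications", "new bid");

        System.out.println(consumer1); // [login]
        System.out.println(consumer2); // [new bid]
    }
}
```

Each consumer sees only the messages of the topic it subscribed to, and the producer side never references a consumer directly, which is the decoupling the slide refers to.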

Architecture
■ 3 components:
   ‣ Broker
   ‣ Bookie
   ‣ ZooKeeper
[Figure: a Pulsar Cluster with Producer and Consumer, Brokers 1-3, Bookies 1-3, Local ZK, and Configuration Store (Global ZK).]

Architecture - Broker
■ Broker
   ‣ Serving node for clients' requests
   ‣ No data locality (stateless)

Architecture - Bookie
■ Bookie (Apache BookKeeper)
   ‣ Storage node for messages
   ‣ Durable, scalable, consistent, fault-tolerant, low-latency
Apache BookKeeper: distributed write-ahead log system
Copyright © 2016 - 2018 The Apache Software Foundation, licensed under the Apache License, version 2.0.

Architecture - ZooKeeper
■ Apache ZooKeeper
   ‣ Stores metadata and configuration
   ‣ Local ZK: within the local cluster
   ‣ Configuration Store: across all clusters

Client libraries
Supported client libraries:
▪ Java
▪ C++
▪ Python
▪ Go (Coming soon)
The WebSocket API can also be used.

Topic
Topic URI format: persistent://tenant/namespace/topicname
   ‣ persistent: persistency
   ‣ tenant: service, etc.
   ‣ namespace: usage, etc.
   ‣ topicname: usage detail, etc.
Topic types: persistent, non-persistent
Examples:
   ‣ persistent://shopping/user-log/buy
   ‣ persistent://auction/notifications/bid
   ‣ persistent://news/sports/articles
   …
[Figure: side-by-side diagrams of a persistent and a non-persistent topic (Producer, Consumer, Broker, Bookie), with a "persist" step between Broker and Bookie.]

Subscription
• Acknowledgement (ACK): confirmation that a message has been received
• Cursor: tracks which messages have been acknowledged
• Subscription: a cursor together with the consumers connected to it
Note: Each message is kept until all subscriptions have acknowledged it.
[Figure: a Producer has published messages 1-6. The Consumer on sub-A has acknowledged messages 1 and 2; the Consumer on sub-B has acknowledged messages 1 to 4. Messages 1 and 2 are acked by all subscriptions → can be deleted.]

Sample code (Java) - Producer

    import org.apache.pulsar.client.api.Producer;
    import org.apache.pulsar.client.api.PulsarClient;

    // Create client by specifying the broker URI
    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://localhost:6650")
            .build();

    // Create producer by specifying the topic URI
    Producer<byte[]> producer = client.newProducer()
            .topic("persistent://my-tenant/my-ns/my-topic")
            .create();

    // Send a message
    producer.send("My message".getBytes());

    // Release resources when done
    producer.close();
    client.close();

Sample code (Java) - Consumer

    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.Message;
    import org.apache.pulsar.client.api.PulsarClient;

    // Create client by specifying the broker URI
    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://localhost:6650")
            .build();

    // Create consumer by specifying the topic URI and subscription name
    Consumer<byte[]> consumer = client.newConsumer()
            .topic("persistent://my-tenant/my-ns/my-topic")
            .subscriptionName("my-subscription")
            .subscribe();

    // Receive a message (blocks until one is available)
    Message<byte[]> msg = consumer.receive();
    System.out.printf("Message received: %s%n", new String(msg.getData()));

    // Acknowledge the message so that it can be deleted by the broker
    consumer.acknowledge(msg);

    // Release resources when done
    consumer.close();
    client.close();

Why is Apache Pulsar useful?

Agenda
1. What is Apache Pulsar?
2. Why is Apache Pulsar useful?
   › High performance
   › Scalability
   › Multi-tenant
   › Geo-replication
   › Pulsar Functions
   › Pulsar on Kubernetes
   › Apache Pulsar vs. Apache Kafka
3. How does Yahoo! JAPAN use Apache Pulsar?

High performance
High throughput & low latency even if
▪ the number of topics is huge
▪ high reliability is required

ex. At Oath Inc.
Requirements:
• Topics: 2.3 million
• Messages: 100 billion [msg/day]
• Message loss is not acceptable
• Order needs to be guaranteed
Performance (1 KB message):
• Throughput: 100 thousand [msg/s]
• Latency: 5 [ms]
[Figure: Throughput and 99pct latency, 1 Topic – 1 Producer. x-axis: Throughput [msg/s], 1,000 to 10,000,000; y-axis: 99pct Latency [ms], 0 to 6; series: 10 Bytes, 100 Bytes, 1 KB.]
Graph is from: https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale

Scalability
Just adding Brokers/Bookies increases serving/storage capacity
▪ Add Brokers when more producer/consumer requests need to be served
▪ Add Bookies when more messages need to be stored
[Figure: the Pulsar Cluster diagram with Broker X added next to Brokers 1-3 for more serving capacity and Bookie Y added next to Bookies 1-3 for more storage capacity.]

Multi-tenancy
Multiple services can share one Pulsar system
▪ Each service just uses Pulsar as a “Tenant” → no need to maintain its own messaging system
▪ Authentication/Authorization mechanism protects messages from interception
[Figure: Services A-D each have their own Producer, Topic (A-D), and Consumer inside one Pulsar System; Authentication/Authorization blocks unauthorized access.]

Geo-replication
Pulsar can replicate messages to another cluster
1. Producers only have to publish messages to the Pulsar cluster in their own data center
2. Pulsar asynchronously replicates the messages to another cluster
3. Consumers can receive the messages from the Pulsar cluster in their own data center
[Figure: Pulsar Cluster A in Data center A and Pulsar Cluster B in Data center B. A Producer publishes to a Topic in Cluster A; geo-replication copies it to the Topic in Cluster B; Consumers in each data center read from the local cluster.]

Pulsar Functions
• Lightweight compute framework (similar to AWS Lambda or Google Cloud Functions)
• No need to launch extra systems (e.g., Apache Heron, Apache Storm, Apache Spark)
• All you have to do is implement your logic and deploy it to the Pulsar cluster
• Java and Python are available; other languages will be supported in the future
[Figure: Source → Consumer → F(x) → Producer → Sink inside a Pulsar Function Worker; the input "Hello" becomes "Hello!".]

    def process(input):
        return input + '!'

Pulsar on Kubernetes
• Pulsar can be easily deployed in Kubernetes clusters (e.g., AWS, GKE)
• See https://pulsar.incubator.apache.org/docs/latest/deployment/Kubernetes/

Apache Pulsar vs. Apache Kafka
[Figure: architecture image for # of partition=1, # of storage=3, # of replication=2. Pulsar: Producer and Consumer connect to Brokers 1-3, and segments seg-1 to seg-3 are spread over Bookie1 to Bookie3. Kafka: Producer and Consumer connect to Brokers 1-3, and the partition is stored on a leader Broker and a replica Broker.]

                     Apache Pulsar                                  Apache Kafka
Replication unit     Segment (more granular than Partition)         Partition
Data distribution    Distributed and balanced across all Bookies    Kept only in leader and replica Brokers
Max capacity         Not limited by any single node                 Limited by the smallest broker's capacity
Data rebalancing     Not required                                   Required when scaling
Geo-replication      Built-in                                       Needs an extra process: MirrorMaker
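
The "Data distribution" and "Max capacity" rows can be illustrated with a toy placement model. This is a sketch under stated assumptions, not the real placement logic of either system: the class PlacementSketch, its method names, and the round-robin rule are all hypothetical, chosen only to show why striping small segments spreads load while a partition pinned to fixed nodes does not.

```java
import java.util.Arrays;

// Toy placement model (hypothetical; neither Pulsar's nor Kafka's actual algorithm).
class PlacementSketch {

    // Segment-style: the copies of each segment go to the next `replicas` nodes, round-robin.
    // Returns how many segment copies each node ends up storing.
    static int[] segmentStyle(int segments, int nodes, int replicas) {
        int[] copiesPerNode = new int[nodes];
        for (int s = 0; s < segments; s++) {
            for (int r = 0; r < replicas; r++) {
                copiesPerNode[(s + r) % nodes]++;
            }
        }
        return copiesPerNode;
    }

    // Partition-style: the whole partition (all its segments) lives on the first `replicas` nodes.
    static int[] partitionStyle(int segments, int nodes, int replicas) {
        int[] copiesPerNode = new int[nodes];
        for (int n = 0; n < replicas && n < nodes; n++) {
            copiesPerNode[n] = segments;
        }
        return copiesPerNode;
    }

    public static void main(String[] args) {
        // Same setting as the architecture image: 1 partition, 3 storage nodes, 2 copies,
        // with the partition's data split into 3 segments.
        System.out.println(Arrays.toString(segmentStyle(3, 3, 2)));   // [2, 2, 2]
        System.out.println(Arrays.toString(partitionStyle(3, 3, 2))); // [3, 3, 0]
    }
}
```

In this model every storage node carries an equal share under segment-style placement, whereas partition-style placement fills two nodes completely and leaves the third unused, so the partition can never grow beyond what the smaller of those two nodes can hold.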
