Apache Pulsar and Its Enterprise Use Cases

Apache Pulsar and its enterprise use cases
Nozomi Kurihara, Yahoo Japan Corporation, July 18th, 2018

Who am I?
Nozomi Kurihara
• Software engineer at Yahoo! JAPAN (April 2012 ~)
• Working on the internal messaging platform built on Apache Pulsar
• Committer of Apache Pulsar

Agenda
1. What is Apache Pulsar?
2. Why is Apache Pulsar useful?
3. How does Yahoo! JAPAN use Apache Pulsar?

What is Apache Pulsar?
› History & users
› Pub-Sub messaging
› Architecture
› Client libraries
› Topic
› Subscription
› Sample code

Apache Pulsar
A flexible pub-sub system backed by durable log storage.
▪ History:
  › 2014: development started at Yahoo! Inc.
  › 2015: available in production at Yahoo! Inc.
  › Sep. 2016: open-sourced (Apache License 2.0)
  › June 2017: moved to the Apache Incubator
  › June 2018: major version update 2.0.1
▪ Competitors: Apache Kafka, RabbitMQ, Apache ActiveMQ, Apache RocketMQ, etc.
▪ Users: Oath Inc. (Yahoo! Inc.), Comcast, The Weather Channel, Mercado Libre, Streamlio, Yahoo! JAPAN, etc.

Pub-Sub messaging
Message transmission from one system to another via a Topic.
▪ Producers publish messages (logs, notifications, etc.) to Topics.
▪ Consumers receive only the messages from the Topics to which they subscribe.
▪ Producers and consumers are decoupled (they do not need to know each other), which makes the system asynchronous, scalable, and resilient.

Architecture
A Pulsar cluster consists of three components:
‣ Broker: serving node that handles clients' requests; it has no data locality and is stateless.
‣ Bookie (Apache BookKeeper): storage node for messages; durable, scalable, consistent, fault-tolerant, and low-latency. Apache BookKeeper is a distributed write-ahead log system.
‣ ZooKeeper: stores metadata and configuration. The Local ZK serves the local cluster; the Configuration Store (Global ZK) is shared across all clusters.
Client libraries
Supported client libraries: Java, C++, Python, and Go (coming soon). The WebSocket API can also be used.

Topic
Topic URI format: persistent://tenant/namespace/topicname
› persistency: persistent or non-persistent (persistent topics are written to Bookies; non-persistent topics are not)
› tenant: the service, etc.
› namespace: the usage, etc.
› topicname: the usage detail, etc.
Examples:
  persistent://shopping/user-log/buy
  persistent://auction/notifications/bid
  persistent://news/sports/articles

Subscription
• Acknowledgement (ACK): confirmation that a message has been received.
• Cursor: tracks which messages have been acknowledged.
• Subscription: a cursor plus the consumers connected to it.
Note: each message is kept until all subscriptions have acknowledged it; once every subscription has acknowledged a message, it can be deleted.

Sample code (Java) - Producer

import org.apache.pulsar.client.api.*;

// Create a client by specifying the broker URI
PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")
        .build();

// Create a producer by specifying the topic URI
Producer<byte[]> producer = client.newProducer()
        .topic("persistent://my-tenant/my-ns/my-topic")
        .create();

// Send a message
producer.send("My message".getBytes());

Sample code (Java) - Consumer

import org.apache.pulsar.client.api.*;

// Create a client by specifying the broker URI
PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")
        .build();

// Create a consumer by specifying the topic URI and subscription name
Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://my-tenant/my-ns/my-topic")
        .subscriptionName("my-subscription")
        .subscribe();

// Receive a message
Message<byte[]> msg = consumer.receive();
System.out.printf("Message received: %s", new String(msg.getData()));

// Acknowledge the message so that the broker can delete it
consumer.acknowledge(msg);
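The retention rule described in the Subscription section (a message is deleted only after every subscription has acknowledged it) can be seen with two subscriptions on one topic. The following sketch is an illustration added here, not part of the original slides; the topic and subscription names are made up, and it assumes a broker running at pulsar://localhost:6650.

import org.apache.pulsar.client.api.*;

public class TwoSubscriptionsExample {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Two independent subscriptions ("sub-A" and "sub-B") on the same topic:
        // each keeps its own cursor, and the broker may delete a message only
        // after both subscriptions have acknowledged it.
        Consumer<byte[]> subA = client.newConsumer()
                .topic("persistent://my-tenant/my-ns/my-topic")
                .subscriptionName("sub-A")
                .subscribe();
        Consumer<byte[]> subB = client.newConsumer()
                .topic("persistent://my-tenant/my-ns/my-topic")
                .subscriptionName("sub-B")
                .subscribe();

        Message<byte[]> msgA = subA.receive();
        subA.acknowledge(msgA);   // acknowledged by sub-A only: the message is still retained

        Message<byte[]> msgB = subB.receive();
        subB.acknowledge(msgB);   // once every subscription has acknowledged it, the message can be deleted

        subA.close();
        subB.close();
        client.close();
    }
}

Because each subscription has its own cursor, sub-A and sub-B both receive every message published to the topic, which is how one topic can feed several independent downstream systems.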
Why is Apache Pulsar useful?
› High performance
› Scalability
› Multi-tenancy
› Geo-replication
› Pulsar Functions
› Pulsar on Kubernetes
› Apache Pulsar vs. Apache Kafka

High performance
High throughput and low latency, even with a huge number of topics and strict reliability requirements.
Example requirements at Oath Inc.:
• Topics: 2.3 million
• Messages: 100 billion per day
• Message loss is not acceptable
• Ordering must be guaranteed
Measured performance with 1 KB messages:
• Throughput: 100 thousand msg/s
• 99th-percentile latency: 5 ms
[Figure: throughput vs. 99th-percentile latency for 1 topic and 1 producer with 10-byte, 100-byte, and 1 KB messages; graph from https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale]

Scalability
Simply adding Brokers or Bookies increases serving or storage capacity.
▪ Add Brokers when more producer/consumer requests need to be served.
▪ Add Bookies when more messages need to be stored.

Multi-tenancy
Multiple services can share one Pulsar system.
▪ Each service just uses Pulsar as a "tenant" and does not need to maintain its own messaging system.
▪ An authentication/authorization mechanism protects messages from interception and blocks unauthorized access.

Geo-replication
Pulsar can replicate messages to another cluster.
1. Producers only have to publish messages to the Pulsar cluster in the same data center.
2. Pulsar asynchronously replicates the messages to the other cluster.
3. Consumers can receive the messages from their own data center.

Pulsar Functions
• A lightweight compute framework, comparable to AWS Lambda or Google Cloud Functions.
• No need to launch extra systems (e.g., Apache Heron, Apache Storm, Apache Spark).
• All you have to do is implement your logic and deploy it to the Pulsar cluster; a function worker consumes from a source topic, applies the function, and produces the result to a sink topic (e.g., "Hello" becomes "Hello!").
• Java and Python are available, and other languages will be supported in the future (a Java sketch is shown at the end of this section).

  def process(input):
      return input + '!'

Pulsar on Kubernetes
• Pulsar can easily be deployed to Kubernetes clusters (e.g., on AWS or GKE).
• See https://pulsar.incubator.apache.org/docs/latest/deployment/Kubernetes/

Apache Pulsar vs. Apache Kafka
Architecture comparison for 1 partition, 3 storage nodes, and a replication factor of 2: Pulsar splits the partition into segments spread across the Bookies, while Kafka keeps the whole partition on the leader Broker and its replica.

                      Apache Pulsar                                  Apache Kafka
  Replication unit    Segment (more granular than a partition)      Partition
  Data distribution   Distributed and balanced across all Bookies   Kept only on the leader and replica Brokers
  Max capacity        Not limited by any single node                Limited by the smallest Broker's capacity
  Data rebalancing    Not required                                  Required when scaling
  Geo-replication     Built-in                                      Needs an extra process (MirrorMaker)
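To complement the Python function shown in the Pulsar Functions section, here is what the same logic can look like in Java. This sketch is an illustrative addition, not part of the original slides: Pulsar Functions in Java can be written against the standard java.util.function.Function interface, and the class name and any packaging or deployment details are assumptions that depend on your cluster setup.

import java.util.function.Function;

// A minimal Pulsar Function sketch in Java, mirroring the Python example:
// it appends "!" to every value read from the source topic, and the function
// worker publishes the returned value to the sink topic.
public class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }
}

Once deployed, the function worker wires the source consumer, the apply() call, and the sink producer together; the class itself contains only the transformation logic.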