Message-Oriented Middleware for Scalable Data Analytics Architectures

Total Page:16

File Type:pdf, Size:1020Kb

Message-Oriented Middleware for Scalable Data Analytics Architectures DEGREE PROJECT, IN MASTER'S PROGRAMME COMMUNICATION SYSTEMS , SECOND LEVEL STOCKHOLM, SWEDEN 2015 Message-oriented Middleware for Scalable Data Analytics Architectures NICOLAS NANNONI KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY Master’s Thesis KTH – Information and Communication Technology School Examiner Markus Hidell (ICT) Supervisor Marko Helin (Accedo) 2015-02-13 Message-oriented middleware for scalable data analytics architectures Author Nicolas Nannoni ([email protected] | [email protected]) Master’s Thesis – Report Message-oriented middleware and data analytics Abstract The democratization of Internet allowed many more people to use online services and enjoy their benefits. The traffic towards websites became tremendous those recent years, especially with the apparition of social networks. Mobile application, televisions and other non-computer devices also get connected to the Internet and use it to provide services to the end-users: Video on-demand, music streaming and so on. These applications rely on powerful backend servers that handle the requests made by devices and provide statistics and metrics about application usage. These metrics can be generated by aggregating the access logs (e.g. HTTP requests log), logs that are potentially extremely large. Big data tools and analytics, providing a way to handle this huge number of records, come then in hand, as typical client-server architectures, with a single database storing all the data, reach their limits in terms of performance and capacity. Data duplication, combined to dedicated and specialized databases storing it, is the key to efficient data handling. How to fill up these databases in an elegant, efficient and scalable manner is the remaining question, and message-oriented middleware may be a viable answer. This project aims at exploring the capabilities of such middleware, identifying what are the benefits and the drawbacks in using them and presenting how they can be integrated in a real-world application that needs to aggregate events and logs on a large scale. Apache Kafka and RabbitMQ, two message-oriented middleware, are benchmarked and compared, on both performance metrics and qualitative criteria. A fully working proof- of-concept (of an already-existing industry product modified to use a message-oriented middleware and a specialized data warehouse system) is developed and presented, to conclude on the usefulness of message-oriented middleware when designing scalable data analytics architectures. 2015-02-13 2 Master’s Thesis – Report Message-oriented middleware and data analytics Sammanfattning Demokratiseringen av Internet har tillåtit många fler att använda online-tjänster och deras fördelar. Trafiken till webbsidor har blivit enorm de senaste åren. Speciellt i och med de sociala nätverken. Mobil-applikationer, TV-apparater och andra enheter ansluter sig i allt större omfattning till Internet och tillhandahåller tjänster till slutanvändare: Video On-Demand, strömmande musik o.s.v. Applikationerna förlitar sig på kraftfull infrastruktur som kan hantera de förfrågningar enheterna gör och tillhandahålla statistik och mätetal om applikationernas användning. Dessa mätetal kan skapas genom att aggregera access-loggar (ex. HTTP-loggar). Dessa loggar är potentiellt väldigt stora. Så kallade Big Data-verktyg kan lösa problemet med att hantera denna stora mängd data. Typiskt är dessa verktyg klient-server-arkitekturer med en enskild, central databas som lagrar all data. Dessa databaser har i regel begränsningar när det gäller prestanda och kapacitet. Duplicering av data kombinerat med en dedikerad och specialiserad databas är nyckeln till en effektiv lösning på detta problem. Frågan är hur man på ett effektivt, elegant och skalbart sätt fyller dessa databaser med information. Här kan meddelande-baserad mellanprogramvara vara en lösning. Det här examensarbetet syftar till att granska hur sådan mellanprogramvara kan integreras i en applikation som används i branschen idag och som behöver aggregera stora mängder loggar. Apache Kafka och RabbitMQ, som är två meddelande-baserade mellanprogramvaror, granskas och jämförs. Prestanda och effektivitet av lösningarna testas. En fullständig prototyp skapas. Den baseras på ett befintligt system och ändras för att använda meddelande-baserad mellanprogramvara och ett specialiserat Data Warehouse-system. Slutligen dras slutsatser om meddelande- baserad mellanprogramvara är effektivt när man vill skapa ett skalbart system för aggregering av loggar. 2015-02-13 3 Master’s Thesis – Report Message-oriented middleware and data analytics Contents 1 Introduction .............................................................................................................................. 7 1.1 Background ............................................................................................................................................................ 7 1.1.1 Accedo Appgrid ......................................................................................................................................................................... 7 1.2 Motivations ............................................................................................................................................................. 8 1.3 Problem statement .............................................................................................................................................. 9 1.4 Content of this document and methodology ............................................................................................... 9 1.5 Outcomes, ethics and sustainaBility ........................................................................................................... 10 2 Message-oriented middleware: introduction ........................................................................... 12 2.1 What is a message-oriented middleware? ................................................................................................ 12 2.2 Typical use cases for message-oriented architectures ......................................................................... 13 2.3 General Benefits of message-oriented architectures ............................................................................ 14 2.4 General disadvantages or difficulties in implementing message-oriented middleware .......... 16 3 Case study: Accedo Appgrid ..................................................................................................... 19 3.1 Current Appgrid architecture ....................................................................................................................... 19 3.2 API overview ....................................................................................................................................................... 20 3.3 Log, event generation and management ................................................................................................... 20 3.4 Original log and event insertion process .................................................................................................. 21 3.5 Introducing a message-oriented middleware in Appgrid architecture .......................................... 22 3.6 Message-oriented log and event insert process ...................................................................................... 23 3.7 Possible improvements to the basic message-oriented log insertion process ............................. 24 4 Choice of a message Broker ..................................................................................................... 26 4.1 State of the art in message-oriented middleware .................................................................................. 26 4.2 Apache Kafka ...................................................................................................................................................... 28 4.2.1 Overall structure ................................................................................................................................................................... 29 4.2.2 Main features .......................................................................................................................................................................... 30 4.2.3 Set-up and management .................................................................................................................................................... 33 4.3 RaBBitMQ .............................................................................................................................................................. 33 4.3.1 Overall structure ................................................................................................................................................................... 34 4.3.2 Main features .......................................................................................................................................................................... 36 4.3.3 Set-up and management .................................................................................................................................................... 38 4.4 General feature comparison Between Apache Kafka and RaBBitMQ ............................................... 39 4.5 Preliminary conclusions ................................................................................................................................. 41
Recommended publications
  • Analysis of Notification Methods with Respect to Mobile System Characteristics
    Proceedings of the Federated Conference on DOI: 10.15439/2015F6 Computer Science and Information Systems pp. 1183–1189 ACSIS, Vol. 5 Analysis of notification methods with respect to mobile system characteristics Piotr Nawrocki ∗, Mikołaj Jakubowski † and Tomasz Godzik ‡ ∗AGH University of Science and Technology, al. A. Mickiewicza 30, 30-059 Krakow, Poland e-mail:[email protected] †e-mail:[email protected] ‡e-mail:[email protected] Abstract—Recently, there has been an increasing need for most promise and therefore the purpose is to discern their secure, efficient and simple notification methods for mobile usefulness in the best way possible. systems. Such systems are meant to provide users with precise In addition to the protocols and methods above, we inves- tools best suited for work or leisure environments and a lot of effort has been put into creating a multitude of mobile tigated other solutions, such as the Apple push notification applications. However, not much research has been put at the or Line application which, for various reasons, were not same time into determining which of the available protocols considered further. The Apple push notification technology is a are best suited for individual tasks. Here a number of basic good solution, but it is proprietary, i.e. limited to Apple devices notification methods are presented and tests are performed for and that is why we decided to test more universal solutions the most promising ones. An attempt is made to determine which methods have the best throughput, latency, security and other first. There are also solutions (applications) that use their own characteristics.
    [Show full text]
  • AMQP and Rabbitmq Message Queuing As an Integration Mechanism
    2/19/2018 MQTT A protocol for communicating with your things David Brinnen and Brett Cameron SESAM Seminar, March 2018 Abstract The Internet of Things refers to the ever-growing network of physical devices that have IP connectivity, allowing them to connect to the internet, and the communication that occurs between these devices and other internet-enabled devices and systems. In this talk, Brett and David will introduce the Internet of Things and will discuss some of the key technologies associated with the creation of Internet of Things solutions and services within the manufacturing domain. Particular attention will be given to MQTT, which is gaining acceptance as the preferred protocol for use by the Internet of Things applications. Currently available implementations of MQTT will be briefly reviewed and case studies illustrating the application of the protocol will be presented. Some examples of how MQTT might be used to implement secure, fault-tolerant, and scalable Internet of Things solutions will be discussed, and the application and use of MQTT in Next Generation SCADA systems that need to monitor and control devices at scale in a connected world will be will be discussed and demonstrated. 2 2/19/2018 About David Since completing his Masters studies in Embedded Software at the Swedish Royal Institute of Technology 2015 (KTH), David has been working as a software engineer across a range of projects, including the implementation of control logical and the commissioning of Energy Machines integrated energy systems and air handling units (https://www.energymachines.com/), and the development of a next-generation SADA platform (ControlMachines™) and simulation software to control, monitor, and model Energy Machines deployments.
    [Show full text]
  • Location Independent Inter-Process Communication As Software Buses
    Location independent inter-process communication as software buses Erik Samuelsson Erik Samuelsson VT 2016 Bachelor Thesis, 15 credits Supervisor: Cristian Klein, Ewnetu Bayuh Lakew Extern Supervisor: Clas H¨ogvall, Rickard Sj¨ostr¨om, Johan Forsman Examiner: Jerry Eriksson Bachelor’s Programme in Computing Science, 180 credits Abstract Telecommunication networks will transform and gradually migrate into virtualized cloud environments as a result of the potential for higher profitability through reduced costs and increased revenues. The purpose of this thesis is to investigate architectural mecha- nisms for location-independent communication between software components in a virtualized base station. Systems that provide such mechanisms are typically referred to as middleware and de- ployed as Platform-as-a-Service (PaaS). The overall goal is to achieve desired characteristics in cloud deployment regarding on- demand self-service, rapid elasticity of capacity while upholding services and high availability. Four communication protocols are examined and evaluated based on a set of functional and non- functional requirements that are especially relevant for a virtu- alized base station. In comparison with the Advanced Message Queuing Protocol (AMQP), Message Queuing Telemetry Trans- port (MQTT) and the eXtensible Messaging and Presence Pro- tocol (XMPP), the Data Distribution Service (DDS) standard is found to have excellent performance characteristics. Its complex- ity might have implications for the development and deployment though, that will increase the time it takes to reap the benefits from its advantages. Acknowledgements I wish to sincerely thank my supervisors at Tieto, Clas H¨ogvall, Rickard Sj¨ostr¨om and Johan Forsman for introducing an interesting and challenging thesis idea and for entrusting me with the task.
    [Show full text]
  • Industry Paper: Kafka Versus Rabbitmq a Comparative Study of Two Industry Reference Publish/Subscribe Implementations
    Industry Paper: Kafka versus RabbitMQ A comparative study of two industry reference publish/subscribe implementations Philippe Dobbelaere Kyumars Sheykh Esmaili Nokia Bell Labs Nokia Bell Labs Antwerp, Belgium Antwerp, Belgium ABSTRACT 1 INTRODUCTION Publish/subscribe is a distributed interaction paradigm well adapted e Internet has considerably changed the scale of distributed sys- to the deployment of scalable and loosely coupled systems. tems. Distributed systems now involve thousands of entities po- Apache Kaa and RabbitMQ are two popular open-source and tentially distributed all over the world whose location and behav- commercially-supported pub/sub systems that have been around for ior may greatly vary throughout the lifetime of the system. ese almost a decade and have seen wide adoption. Given the popularity constraints underline the need for more exible communication of these two systems and the fact that both are branded as pub/sub models and systems that reect the dynamic and decoupled na- systems, two frequently asked questions in the relevant online ture of the applications. Individual point-to-point and synchronous forums are: how do they compare against each other and which communications lead to rigid and static applications, and make the one to use? development of dynamic large-scale applications cumbersome [14]. In this paper, we frame the arguments in a holistic approach by To reduce the burden of application designers, the glue between establishing a common comparison framework based on the core the dierent entities in such large-scale seings should rather be functionalities of pub/sub systems. Using this framework, we then provided by a dedicated middleware infrastructure, based on an ad- venture into a qualitative and quantitative (i.e.
    [Show full text]
  • A Survey of Distributed Message Broker Queues
    A Survey of Distributed Message Broker Queues Vineet John Xia Liu University of Waterloo University of Waterloo [email protected] [email protected] ABSTRACT This paper surveys the message brokers that are in vogue today for distributed communication. Their primary goal is to facilitate the construction of decentralized topolo- gies without single points of failure, enabling fault tol- erance and high availability. These characteristics make them optimal for usage within distributed architectures. However, there are multiple protocols built to achieve this, and it would be beneficial to have a empirical comparison between their features and performance to determine their real-world applicability. Figure 1: Kafka Architecture This paper focuses on two popular protocols (Kafka and AMQP) and explores the divergence in their fea- RQ0 What are the message broker implementations tures as well as their performance under varied testing commonly in use today? workloads. RQ1 What are the common requirements for the im- plementation of message queues? KEYWORDS RQ2 What are the divergent functionalities in the distributed message broker, message queue, kafka, amqp, current message queue offerings? rabbitmq RQ3 How do each of the implementations offer relia- bility, partitioning and fault tolerance? 1 INTRODUCTION 3 KAFKA Kafka was developed at LinkedIn and primarily used Distributed Message Brokers are typically used to decou- for log processing. This worked well for Kafka’s user en- ple separate stages of a software architecture. They per- gagement metrics collection use-case. The fundamental mit communication between these stages asynchronously, features behind Kafka are performance over reliability by using the publish-subscribe paradigm.[1]. These mes- and it offers high throughput, low latency message queu- sage brokers are also finding new applications in the ing.
    [Show full text]
  • Rabbitmq : a MOM (Message Oriented Middleware)
    RabbitMQ : A MOM (Message Oriented Middleware) Yvon Kermarrec With figures from RabbitMQ tutorial and from “TabbitMQ in action” 1 Bibliography https://www.amqp.org/ https://www.rabbitmq.com/ « RabbitMQ in action » Alvaro Videla and Jason J.W. Williams « learning RabbitMQ » Martin Toshev « RabbitMQ essentials » David Dossot 2 Y Kermarrec Agenda AMQP Understanding messaging Using RabbitMQ illustrated with examples Administration of the broker Synthesis 3 Y Kermarrec Why AMQP ? Business, applications, services and devices need to interact and communicate information IBM technologies controled over 80% of the market with the WebSphere and MQ Hughe cost and danger of a monopoly… 4 Y Kermarrec AMQP The Advanced Message Queuing Protocol (AMQP) is an open standard application layer protocol for message-oriented middleware. The defining features of AMQP are message orientation, queuing, routing (including point-to-point and publish-and-subscribe), reliability and security (wikipedia) 5 Y Kermarrec AMQP AMQP provides a platform-agnostic method for ensuring information is safely transported between applications, among organizations, within mobile infrastructures, and across the Cloud. AMQP is used in areas as varied as financial front office trading, ocean observation, transportation, smart grid, computer-generated animation, and online gaming. Many operating systems include AMQP implementations, and many application frameworks are AMQP-aware 6 Y Kermarrec AMQP user group Set up by JPMorgan in 2006 • Goal to make Message Oriented
    [Show full text]
  • Hybrid Decentralized PBFT Blockchain Framework for Openstack Message Queue
    Kim and Park Hum. Cent. Comput. Inf. Sci. (2020) 10:31 https://doi.org/10.1186/s13673-020-00238-6 RESEARCH Open Access Hybrid decentralized PBFT Blockchain Framework for OpenStack message queue Youngjong Kim* and Jinho Park* *Correspondence: [email protected]; Abstract [email protected] Cloud computing based on OpenStack is widely used as a distributed computing School of Software, Soongsil University, 369 Sangdo-ro, platform. OpenStack has progressed at a rapid pace, incorporating a variety of service Dongjak-gu, Seoul, Republic modules; it is supported by many companies, has a community of active developers, of Korea and a diverse user base. OpenStack uses message queue to coordinate and exchange operation and status information between services. OpenStack supports various mes- sage queue services including RabbitMQ, Qpid, and ZeroMQ, whereas its distribution architecture uses RabbitMQ. As an OpenStack’s message queue service, RabbitMQ runs on a controller node as a centralized service. In case of the centralized service, increased usage may cause slowed response times and security vulnerability. This paper proposes a Hybrid decentralized Practical byzantine fault tolerance Blockchain Framework with two-step verifcation for OpenStack message queue service. When compared to existing OpenStack message queue service, OpenStack with the pro- posed framework demonstrates identical reliability a faster response time by approxi- mately 46.75% with a two-step verifcation process and decentralization approach. Additionally, a reduction in the security vulnerability in the OpenStack message queue information with saving the message queue information into each node by block- chain-based decentralized data duplication approach. Keywords: Cloud, OpenStack, Message queue, Hybrid, Decentralization, PBFT Blockchain Introduction OpenStack, which is the open-source Infrastructure-as-a-Service (laaS) Platform Pro- ject, is widely used in various felds.
    [Show full text]
  • A Pentesters Guide to Hacking Activemq Based
    WHITE PAPER A Pentesters Guide to Hacking ActiveMQ-Based JMS Applications 1 A Pentesters Guide to Hacking ActiveMQ-Based JMS Applications WHITE PAPER Table of Contents 4 Introduction Author 4 Messaging 101 5 Anatomy of a JMS Message This white paper was written by: Gursev Singh Kalra, Senior Principal 5 Message Broker Consultant,McAfee® Foundstone® 6 Messaging Models Professional Services 7 JMS API 8 Apache ActiveMQ Basics 8 ActiveMQ Authentication Options 10 ActiveMQ Authorization Controls 11 ActiveMQ Administration Console 11 Pentesting JMS Applications 11 Discovery and Configuration 11 Understanding ActiveMQ Configuration File 13 Review for Weak Configuration and Known Vulnerabilities 14 Data Protection in Storage and Transit 14 Insecure Communication 14 Insecure Password Storage 14 Weak Encryption Password 15 Unencrypted KahaDB 15 User Management and Authentication 15 No Authentication 15 Simple Authentication Plug-In 15 JAAS Authentication Plug-In 15 Account Lockout Testing 2 A Pentesters Guide to Hacking ActiveMQ-Based JMS Applications WHITE PAPER Table of Contents, continued 16 Data Validation and Error Handling 16 Injection Attacks 16 Attacking Other Clients 17 Authorization 17 Destination Access 17 Exploitation 17 Reading Queues with QueueBrowser 18 Retrieving Messages from Topics with TopicSubscriber 18 Retrieving Messages from Topics with Durable Subscribers 19 Additional Durable Subscriber Attacks 19 Retrieving Statistics for the Broker and its Destinations 20 Dynamic Destinations 20 JMSDigger―A GUI-Based JMS Assessment Tool
    [Show full text]
  • Providing High Availability to the Openstack Cloud Controller on Oracle Solaris with Oracle Solaris Cluster Release 1.0
    Providing High Availability to the OpenStack Cloud Controller on Oracle Solaris with Oracle Solaris Cluster Release 1.0 ORACLE WHITE PAPER | A P R I L 2 0 1 5 PROVIDING HIGH AVAILABILITY TO THE OPENSTACK CLOUD CONTROLLER ON ORACLE SOLARIS WITH ORACLE SOLARIS CLUSTER Table of Contents Introduction 1 Oracle Solaris Cluster Features 3 Overview of the High Availability Cloud Controller Configuration 5 Oracle Solaris Cluster Framework and Infrastructure Setup 7 Configuration assumptions................................................................................................ 8 Configure zpools on shared storage and a dataset per node for later usage.....................9 Zone cluster installation.....................................................................................................9 OpenStack Components on the HA Cloud Controller 12 Database - MySQL..........................................................................................................12 Configure cluster infrastructure objects for HA MySQL within zone cluster os-db......12 Install and configure the MySQL database within zone cluster os-db........................12 Configure the MySQL database as HA MySQL failover cluster resource os-mysql-rs within zone cluster os-db........................................................................................... 16 Message Queue - RabbitMQ...........................................................................................19 Configure cluster infrastructure objects for HA rabbitmq within zone cluster os-mq...19
    [Show full text]
  • Performance Comparison of Message Queue Methods
    UNLV Theses, Dissertations, Professional Papers, and Capstones August 2019 Performance Comparison of Message Queue Methods Sanika Raje Follow this and additional works at: https://digitalscholarship.unlv.edu/thesesdissertations Part of the Computer Sciences Commons Repository Citation Raje, Sanika, "Performance Comparison of Message Queue Methods" (2019). UNLV Theses, Dissertations, Professional Papers, and Capstones. 3746. http://dx.doi.org/10.34917/16076287 This Thesis is protected by copyright and/or related rights. It has been brought to you by Digital Scholarship@UNLV with permission from the rights-holder(s). You are free to use this Thesis in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/ or on the work itself. This Thesis has been accepted for inclusion in UNLV Theses, Dissertations, Professional Papers, and Capstones by an authorized administrator of Digital Scholarship@UNLV. For more information, please contact [email protected]. PERFORMANCE COMPARISON OF MESSAGE QUEUE METHODS By Sanika N. Raje Bachelor of Science - Information Technology University of Mumbai 2012 Master in Computer Applications University of Mumbai 2015 A thesis submitted in partial fulfillment of the requirements for the Master of Science in Computer Science Department of Computer Science Howard R. Hughes College of Engineering The Graduate College University of Nevada, Las Vegas August 2019 ©Sanika N Raje, 2019 All Rights Reserved Thesis Approval The Graduate College The University of Nevada, Las Vegas May 16, 2019 This thesis prepared by Sanika N.
    [Show full text]
  • Examples of Open Source Software on Openvms
    9/2/2014 Examples of Open Source software on OpenVMS What, where, why, how and when Brett Cameron September 2014 Abstract The speaker has done a considerable amount of work over the past decade porting various pieces of Open Source software to the OpenVMS platform and working with OpenVMS customers to craft solutions using this Open Source software. During this talk the speaker will describe some of software that has been ported and where and how this software might be useful, with particular emphasis on Open Source web services and message queuing technologies. Work in progress and future plans will also be discussed. Additionally, the talk will examine the general approach taken by the speaker to the porting process, along with some of the pitfalls that are commonly encountered and how these may be resolved. In addition to the porting of software to OpenVMS, work has been done to implement interfaces for some of the ported Open Source packages to simplify their use from OpenVMS high-level languages such as COBOL, Pascal, and FORTRAN. Examples of such interfaces will be presented and the approaches taken to implementing them will be briefly discussed. 2 9/2/2014 About me Brett Cameron currently works as a senior architect with HP's corporate Cloud Services group, focusing on the design and implementation of message queuing and related integration services for customers and for internal use. Brett lives in Christchurch, New Zealand, and has worked in the software industry for some 22 years. In that time he has gained experience in a wide range of technologies, many of which have long since been retired to the software scrapheap of dubious ideas.
    [Show full text]
  • WSIT Release Notes
    Web Services Integration Toolkit for OpenVMS I64 Installation Guide and Release Notes February 2021 Software Version: Web Services Integration Toolkit Version 3.4-2D Introduction VMS Software Inc. is pleased to provide you with this new version of the Web Services Integration Toolkit for OpenVMS on Integrity servers. The Web Service Integration Toolkit for OpenVMS (WSIT) contains a collection of integration tools. These tools are easy to use, highly extensible, based on standards, and built on open source technology. The toolkit can be used to call OpenVMS applications written in 3GL languages, such as C, BASIC, COBOL, FORTRAN, and ACMS from newer technologies and programming languages such as Java, C#, Python, and JavaScript1. What’s new in this release The new options (XML tags) <protocol> and <CLOPT> have been introduced to the out-of- process section of server configuration file. The <protocol> tag is used to specify the inter- process communication protocol to be used for the application, and the <CLOPT> tag can be used to specify command line options to be supplied to the server by wsi$manager.exe at server start-up. A generic RabbitMQ consumer component (wsi$server_amqp.exe) that allows RabbitMQ and AMQP to be used in place of the traditional WSIT ICC protocol for out-of-process deployments server deployments. The corresponding ICC-based server has been renamed to wsi$server_icc.exe. 1 Use of C#, Python, and JavaScript for client development is supported on Linux and Windows only and requires the use of AMQP (the Advanced Message Queuing Protocol) in conjunction with the RabbitMQ message broker.
    [Show full text]