DEGREE PROJECT IN THE MASTER'S PROGRAMME COMMUNICATION SYSTEMS, SECOND LEVEL
STOCKHOLM, SWEDEN 2015

Message-oriented Middleware for Scalable Data Analytics Architectures

NICOLAS NANNONI

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY
Master's Thesis

Examiner Markus Hidell (ICT)

Supervisor Marko Helin (Accedo)

2015-02-13

Message-oriented middleware for scalable data analytics architectures

Author Nicolas Nannoni ([email protected] | [email protected])

Abstract

The democratization of the Internet has allowed many more people to use online services and enjoy their benefits. Traffic towards websites has become tremendous in recent years, especially with the rise of social networks. Mobile applications, televisions and other non-computer devices are also connected to the Internet and use it to provide services to end-users: video on demand, music streaming and so on. These applications rely on powerful backend servers that handle the requests made by devices and provide statistics and metrics about application usage. These metrics can be generated by aggregating the access logs (e.g. HTTP request logs), logs that are potentially extremely large. Big data tools and analytics, providing a way to handle this huge number of records, then come in handy, as typical client-server architectures, with a single database storing all the data, reach their limits in terms of performance and capacity. Data duplication, combined with dedicated and specialized databases storing it, is the key to efficient data handling.

How to fill up these databases in an elegant, efficient and scalable manner is the remaining question, and message-oriented middleware may be a viable answer. This project aims at exploring the capabilities of such middleware, identifying the benefits and drawbacks of using them, and presenting how they can be integrated into a real-world application that needs to aggregate events and logs on a large scale. Apache Kafka and RabbitMQ, two message-oriented middleware, are benchmarked and compared, on both performance metrics and qualitative criteria. A fully working proof-of-concept (of an already-existing industry product modified to use a message-oriented middleware and a specialized data warehouse system) is developed and presented, to conclude on the usefulness of message-oriented middleware when designing scalable data analytics architectures.


Sammanfattning

The democratization of the Internet has allowed many more people to use online services and enjoy their benefits. Traffic to websites has become enormous in recent years, especially with the social networks. Mobile applications, TV sets and other devices connect to the Internet in ever greater numbers and use it to provide services to end-users: video on demand, music streaming and so on. The applications rely on powerful infrastructure that can handle the requests the devices make and provide statistics and metrics about application usage. These metrics can be created by aggregating access logs (e.g. HTTP logs). These logs are potentially very large. So-called big data tools can solve the problem of handling this large amount of data. Typically, these tools are client-server architectures with a single, central database storing all the data. These databases generally have limitations in terms of performance and capacity.

Duplication of data combined with dedicated and specialized databases is the key to an efficient solution to this problem. The question is how to fill these databases with information in an efficient, elegant and scalable way. Here, message-oriented middleware can be a solution. This degree project aims at examining how such middleware can be integrated into an application that is used in the industry today and that needs to aggregate large amounts of logs. Apache Kafka and RabbitMQ, two message-oriented middleware, are examined and compared. The performance and efficiency of the solutions are tested. A complete prototype is created, based on an existing system modified to use message-oriented middleware and a specialized data warehouse system. Finally, conclusions are drawn on whether message-oriented middleware is effective when building a scalable system for log aggregation.


Contents

1 Introduction ...... 7
  1.1 Background ...... 7
    1.1.1 Accedo Appgrid ...... 7
  1.2 Motivations ...... 8
  1.3 Problem statement ...... 9
  1.4 Content of this document and methodology ...... 9
  1.5 Outcomes, ethics and sustainability ...... 10
2 Message-oriented middleware: introduction ...... 12
  2.1 What is a message-oriented middleware? ...... 12
  2.2 Typical use cases for message-oriented architectures ...... 13
  2.3 General benefits of message-oriented architectures ...... 14
  2.4 General disadvantages or difficulties in implementing message-oriented middleware ...... 16
3 Case study: Accedo Appgrid ...... 19
  3.1 Current Appgrid architecture ...... 19
  3.2 API overview ...... 20
  3.3 Log, event generation and management ...... 20
  3.4 Original log and event insertion process ...... 21
  3.5 Introducing a message-oriented middleware in Appgrid architecture ...... 22
  3.6 Message-oriented log and event insert process ...... 23
  3.7 Possible improvements to the basic message-oriented log insertion process ...... 24
4 Choice of a message broker ...... 26
  4.1 State of the art in message-oriented middleware ...... 26
  4.2 Apache Kafka ...... 28
    4.2.1 Overall structure ...... 29
    4.2.2 Main features ...... 30
    4.2.3 Set-up and management ...... 33
  4.3 RabbitMQ ...... 33
    4.3.1 Overall structure ...... 34
    4.3.2 Main features ...... 36
    4.3.3 Set-up and management ...... 38
  4.4 General feature comparison between Apache Kafka and RabbitMQ ...... 39
  4.5 Preliminary conclusions ...... 41
5 Implementation in Appgrid and test ...... 42
  5.1 Motivations and goals ...... 42
  5.2 Code architecture ...... 42
    5.2.1 Core Appgrid modifications ...... 42
    5.2.2 Consumer process ...... 44
  5.3 Message structure and protocol ...... 47
  5.4 Queue/topic architecture ...... 50
  5.5 Overall test architecture and load generation tool ...... 51
  5.6 Test scenario ...... 53
  5.7 Load profile ...... 54
  5.8 Tested configurations ...... 55
6 Test results ...... 56
  6.1 Testing hardware and conditions ...... 56
    6.1.1 Java Virtual Machine and JIT compilation ...... 56
    6.1.2 Test reliability ...... 57
    6.1.3 Producer, broker and consumer settings ...... 57


  6.2 Followed metrics and instrumentation ...... 57
  6.3 Data analytics methods ...... 59
  6.4 Appgrid performance results ...... 61
  6.5 Broker specific results ...... 62
  6.6 Broker and consumer performance ...... 65
  6.7 Test conclusions ...... 68
7 Leveraging the advantages of Message-Oriented Middleware for data analytics ...... 69
  7.1 Current aggregation process in Appgrid ...... 69
    7.1.1 Problems with this solution ...... 70
  7.2 Proposed solution: split the work ...... 71
    7.2.1 Amazon Redshift ...... 72
    7.2.2 Database architecture ...... 73
    7.2.3 Aggregation processor ...... 74
  7.3 Performance testing architecture ...... 75
  7.4 Test configurations and checking process ...... 77
  7.5 Test results and conclusions ...... 77
    7.5.1 Single vs. multithreaded aggregator process ...... 78
    7.5.2 Disk space usage ...... 79
    7.5.3 RabbitMQ and Amazon Redshift resiliency ...... 80
    7.5.4 General conclusion ...... 81
8 Conclusion and future work ...... 82
  8.1 Advantages and perspectives offered by message-oriented middleware ...... 82
  8.2 RabbitMQ vs. Kafka ...... 83
  8.3 Event-oriented and service-oriented architecture ...... 84
  8.4 Future work ...... 86
9 Glossary ...... 88
10 List of figures ...... 92
11 List of tables ...... 93
12 References ...... 94
Appendix A — Apache Kafka – Quick reference ...... 101
  1. Install Kafka on the broker ...... 101
  2. Edit the brokers' and Zookeeper preferences ...... 101
  3. Start Zookeeper and Kafka ...... 101
  4. Updated ZooKeeper CLI tool ...... 102
  5. Management operations ...... 102
  6. A Web UI for Kafka: kafka-web-console ...... 103
Appendix B — RabbitMQ – Quick reference ...... 104
  1. Install RabbitMQ on the broker ...... 104
  2. Edit RabbitMQ preferences ...... 104
  3. Start and stop RabbitMQ ...... 104
  4. Enable and use the management UI ...... 105
  5. Python script to parse queue logs ...... 105
Appendix C — Tsung – Quick reference ...... 107
  1. Download and install Tsung ...... 107
  2. Launch a test ...... 107
  3. Generate test reports ...... 107
  4. Frequently asked questions ...... 108
  5. Example Tsung XML test scenario description ...... 112
  6. Helper scripts (Bash) to install Tsung, and deploy and run tests ...... 113


Acknowledgements

Now that this project is done, I can take a few steps back and realize where I am now, and whom I have to thank for it. I would first like to thank Fredrik Sandberg and Marko Helin from Accedo for their genuine trust from the beginning, trust that was recently renewed as I became part of their great, permanent work team. Alexej Kubarev and Erik Gulliksson were also of great help and support throughout the thesis and I am therefore very grateful to them! I also wish to thank Markus Hidell from KTH, for the excellent courses he gave me at the university, and also for having accepted to be my examiner for this thesis.

Moreover, I would like to thank my parents, my sister and my brothers, back in France, for supporting me even though I am not always close to them, and not as available as the first-born child should be. Finally, I want to specially thank my dear friend Thibault Doubliez, not only for bearing with me at home, but also for providing me with his complete support and highly valuable input in the projects in which I am involved.

Stockholm, November 25, 2014.


1 Introduction

1.1 Background
Big data is one of the most popular expressions in the computing world nowadays. While not a new concept, as some financial companies like MasterCard have relied for decades on large data warehouses, it has gained considerable interest from the industry in recent years [1, Sec. 1]. The quantity of data one can easily and cheaply gather, store and therefore be willing to analyse has indeed become enormous. Traditional tools are not powerful enough to analyse such large data sets, but "big data isn't just a description of raw volume": identifying and producing meaningful data out of it is the main challenge. The information one can extract from carefully crafted queries, run against enormous sets of data, can help decision-makers, advertisers, researchers or e-merchants to improve their knowledge of the field they are working in.

Accedo, the company at which this Master's thesis was carried out, is one of the leading application and software providers in the IPTV and Smart TV area. Accedo develops different solutions, used by customers such as media companies, telecom operators and device manufacturers, to help them seamlessly deliver media content on different platforms (smartphones, tablets, gaming consoles, etc.). One of their solutions, Accedo Appgrid, is the main focus of this thesis, and is presented hereafter.

1.1.1 Accedo Appgrid
Appgrid is a product that acts as a sort of Content Management System (CMS) for mobile and Smart TV applications. It can store in the cloud metadata (i.e. settings stored using a key/value scheme) and assets (e.g. images, short movies) that different applications on different platforms can fetch through a simple REST HTTP API. Appgrid therefore allows content providers to remotely update, in one place, what they serve to their clients, for all the different architectures, platforms and operating systems that clients might run the app on. The data available in the native applications they develop can be changed without requiring any modification and republishing of each and every application (also avoiding the revalidation process of, for instance, Apple), thus greatly lowering the time between a content change and its delivery to the end users.

Figure 1 presents the overall landscape of an application using Appgrid: on the right-hand side, all the devices that may be used to run the application (no matter the platform: Android, iOS, Smart TV, etc.) contact the Appgrid API server in the cloud via HTTP requests to fetch assets and metadata. Each device identifies itself by requesting a session key (which it obtains by sending both the application key, identifying the application and its specific platform, and a unique device ID). Application developers implement the logic required to request a session key and query for assets in the core code of their apps.


[Figure 1 - Accedo Appgrid solution overview. Devices running the application (right) reach the Appgrid HTTP API for sessions, assets and metadata; content providers and application developers use the Appgrid front-end (left), backed by user data, analytics, and logs & events stored in AppGrid.]

Besides this CMS-like function, Appgrid also acts as a powerful tool for statistics and analytics. Indeed, Appgrid features an administration interface (left-hand side of Figure 1) that helps application and content providers analyse application usage with a huge variety of criteria, including geographical and platform distribution. Events are triggered by the API itself (e.g. when an asset like "image.jpg" is requested) and are then aggregated to provide content editors with meaningful indicators about their applications (like "how many people use my application?" or "which country uses it most?"). Application developers may also trigger events themselves by sending "logs" to the API through a specific HTTP request. Those logs may include any kind of information (e.g. an error encountered by the app, a notification telling that the user disabled one specific function, etc.); they flow through the same API, and are stored in a single database instance in the current architecture.

1.2 Motivations
What is really valuable for a company or an individual is finding tools, strategies and architectures that are powerful, resilient and efficient enough to handle, store and then analyse large streams of data. One of the keywords that should be kept in mind in this quest is "multiplicity" [1, Sec. 2]. Indeed, a single tool, a single database is not likely to provide the best answer to all the queries one might want to perform. Therefore, single-centred architectures, with a single "source of trust", where all the data is stored in a single database, are probably not the way to go, and some duplication of data must be performed, dispatched to various, specialized tools.

This is where message-oriented middleware, the essential part of the thesis subject, comes onto the scene [2]: cleverly inserted into a server architecture, this "application-level" postman system may help distribute the data efficiently to the right place, the right storage system, and eventually contribute to the design of sustainable big data architectures.

Writing the queries that analyse the data sets is a task left to data scientists. Quite often, from a theoretical point of view, the question to be answered by a query on large sets is very simple. In this thesis, the questions are of the kind "how many people downloaded the logo of this company in September?" or "in which country is this feature mostly used?". There can be workflows and situations in which queries get complicated, but that should not affect the overall data architecture needed to process them.


1.3 Problem statement
The current Appgrid architecture is already capable of performing queries on the large amount of data it is handling. However, the number of end-users using the service is expected to soar in the coming months, causing the gathered data volumes to swell considerably. A more sustainable and scalable solution must be found to ensure that Appgrid offers top-grade performance, no matter the number of customers. This is the problem addressed by this thesis.

The hypothesis is that message-oriented middleware can allow a clear task separation between the application's main business logic and the log/event management process. This would enable the use of new and flexible tools to better aggregate the events, while offloading this operation from the main application. The thesis explores the implementation of message-oriented middleware in Appgrid, tests it in real conditions and concludes on this question.

1.4 Content of this document and methodology
This report briefly presents what Appgrid is, introduces in section 2 the notion of message-oriented middleware, and explains why it is interesting, theoretically speaking. It then focuses in section 3 on the Appgrid case and studies how the original architecture may be updated to take advantage of such middleware.

The report goes onwards and presents in section 4 two data buses (RabbitMQ and Apache Kafka); section 5 then exposes the test architecture and the modifications made in Appgrid to test them in real conditions. The comparative and quantitative tests carried out in section 6 helped draw preliminary conclusions on those systems. Perspectives opened by the tests performed, and which tests could be done in a future study, are also presented. Finally, section 7 provides a concrete and complete example of a sustainable big data architecture relying on the concepts presented in the report, with a quantitative comparison of their benefits.

Comparing two message-oriented middleware through quantitative tests was justified by the need to get real figures and to assess the stability of both tools under production-like conditions. As presented in section 4.1, some benchmarks of these systems already exist, but these are brute tests focusing on the broker itself. They neglect the impact of the libraries used by the processes around the broker that interact with it. Based on trustworthy figures and a thorough analysis of qualitative criteria (e.g. quantity and quality of documentation, ease of use, etc.), the best broker could be chosen and the outcome used to achieve the ultimate goal of the thesis in section 7: building a complete example of an optimized data aggregation process.

For the sake of reproducibility, the reader will also find appendixes at the end of the document, which should give enough material to test the presented solutions and help with the problems the author encountered when trying them. Also, a glossary is provided on page 88 and can be used as a quick reference for specific terms used in this document.


1.5 Outcomes, ethics and sustainability
This report presents message-oriented middleware from a general point of view and then uses Appgrid as a case study, to test this middleware in real conditions. A complete proof of concept has been developed, and a fully working Appgrid version, relying on the presented systems, has been delivered. While the outcomes of the tests in terms of quantitative results may not be applicable to other applications, the overall conclusions should help anyone choose one of the message buses presented if their workflow is somehow similar to the one presented. Section 7 also presents a high-level overview of all the advantages that the concepts presented in the report can give to system architects as well as to non-technical decision makers, with a simple cost analysis and system metrics comparison.

The thesis focuses on the technical questions linked to architecture design more than on pure calculation and query contents. As such, it does not go any deeper into how databases process queries, what they store, and what could be inferred from what is stored using machine-learning techniques and other prediction algorithms. These ethical issues relate more to the use one can make of the systems presented here than to the systems themselves. They are however critical and should not be dismissed: it has been shown that big data analytics can for instance be used to "create instantaneous and surprisingly detailed psycho-demographic user profiles" using simple data extracted from social networks [3], [4]. Storing data relative to medical records or tracking user behaviour on the Internet also allows one to draw conclusions about people's personalities, threatening their privacy [5], [6].

Appgrid tracks users' actions in applications via the calls they make to the API. Each user is uniquely but non-personally identified. The data is aggregated in such a way that links between individual users cannot be inferred from the final results (in a process similar to what is achieved with k-anonymity [7]). Proper information should be given to the end-users, telling them that their actions are being tracked and then processed by an automatic system. The metrics provided by Appgrid regarding application usage are among the most valuable features of the platform for content providers. Therefore, having a comprehensive set of tools and data aggregation processes, whose outputs are general but meaningful enough while avoiding individual user tracking, is a real challenge. Securing the whole architecture, to also prevent unwanted data leaks or intrusions, is needed when managing such a gigantic amount of data. These matters are not discussed in the thesis but need to be taken into account when releasing such analytics systems.

Another point that is essential to keep in mind, given the global climate-change situation the world is facing, is that storing and crunching this gigantic amount of data is highly energy-consuming. In the United States, 2% of the total electricity used is spent powering data centres [8]. In Sweden, the government has recently proposed to lower taxes on electricity bills to attract more IT companies willing to build such server farms in the country [9], even though a single data centre owned by Facebook and located in Luleå consumes as much electricity as 16,000 households [10].


The environmental footprint of such data centres is therefore far from neutral, and efficiently designing the applications that run on them may affect to some extent the energy they consume. Making the crunching processes more efficient would not only be better for the environment, but would also decrease the operating costs of such data centres, which are tremendous [11]. Carefully thinking through the whole process of data collection, storage and crunching, so that it takes less time to do the same job, is therefore not only good for end-user satisfaction, but also beneficial in terms of energy efficiency and cost reductions.


2 Message-oriented middleware: introduction

The purpose of this section is to give a broad introduction to message-oriented architectures. No prior knowledge of the field should be required to understand this part. If the reader feels comfortable with these matters, he can proceed directly to section 3 page 19, which deals with the implementation of those systems in a real application.

2.1 What is a message-oriented middleware?
The whole point of the thesis is to demonstrate the use of a message-oriented middleware, and how it may allow typical backend architectures to scale more easily, while being even more extensible. But what is it?

[Figure 2 - Overview of a message-oriented architecture: producer(s) send messages to the broker, whose routing process dispatches them into queues, from which consumer(s) receive them.]

As the name implies, a message-oriented middleware (or data bus) is a piece of software that deals with messages, those messages being any kind of data (usually a raw string of bytes) wrapped into a structure that features a very simple header. Messages are sent by one or many processes called "producers" toward the message-oriented middleware, usually called the "broker". The broker will not process the messages (i.e. it will never read their content) but may route them into zero, one or more queues according to the information enclosed in the header of the messages and its own routing table. The queues are stored locally in the broker: the term "routing table" does not refer to any kind of network-level routing here.

Those messages were sent by the producers to get processed (e.g. a calculation must be done, a file should be stored, a record should be inserted in a database). The processes responsible for opening the messages and performing real actions with their content are called "consumers". They will (depending on the kind of data bus used) either have messages pushed to them by the broker, or fetch them by regularly querying the broker. In the simplest use case, messages are discarded once processed, and consumers carry on with the next ones.
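To make the producer, broker and consumer roles concrete, the following minimal sketch shows how both sides could interact with a RabbitMQ broker using a recent version of its Java client. The queue name, payload and broker address are invented for the example and are not part of Appgrid; this is an illustration, not the implementation used in the thesis.

```java
import com.rabbitmq.client.*;

import java.nio.charset.StandardCharsets;

public class HelloBroker {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker location

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // Declare a durable queue; the broker creates it if it does not exist yet.
            String queue = "demo.events";
            channel.queueDeclare(queue, true, false, false, null);

            // Producer side: publish an opaque byte payload to the queue.
            byte[] body = "event: asset requested".getBytes(StandardCharsets.UTF_8);
            channel.basicPublish("", queue, MessageProperties.PERSISTENT_TEXT_PLAIN, body);

            // Consumer side: receive messages pushed by the broker and acknowledge them.
            channel.basicConsume(queue, false, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope,
                                           AMQP.BasicProperties properties,
                                           byte[] body) throws java.io.IOException {
                    System.out.println("Consumed: " + new String(body, StandardCharsets.UTF_8));
                    // Tell the broker the message was processed, so it can be discarded.
                    getChannel().basicAck(envelope.getDeliveryTag(), false);
                }
            });
            Thread.sleep(1000); // give the consumer a moment before the connection closes
        }
    }
}
```

Note that the producer never learns whether, when or by whom the message was consumed; the two sides only share the broker and the message format.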

Please note that the messages this document talks about have nothing to do with typical emails. Messages here are sent by a server process running on one machine to another machine, and are likely not to be human-readable at all. A complete explanation for readers uncomfortable with this notion can be found in section 5.3 page 47.


2.2 Typical use cases for message-oriented architectures
One clear distinction that should be made between typical direct process-to-process interaction and interaction involving a message broker is that all the operations are inherently asynchronous: the producer pushes a message to the broker but has no guarantee at all regarding the time between the moment it posts the message and its consumption. Moreover, it will (in typical cases) not be notified when the operation has eventually been done.

Therefore, the messaging and queuing routines presented in the previous section cannot be used for operations that require (immediate or otherwise) responses. Reading a value is in most cases not a deferrable operation. But writing a value may be. Indeed, if the value to be inserted is not needed (at the time its creation is requested, nor in the near future) by any other process, its insertion can be delayed and queued for a while without any trouble.

The most obvious use of such a deferred write operation is when the written entities are logs [12, Sec. 12.5]. A log entry is a standalone entity: no other entities have a key pointing to a log entry. Besides, it is not vital (in most production environments) for an application to publish its logs in real-time. The only requirement is that the data should eventually be written down somewhere. That is precisely what a message-oriented architecture can guarantee: that, provided a consumer is available, a message will get processed, but without any indication of the time it will take.

Other use cases may include email sending (note, however, that messages are no more equivalent to emails than in the previous example: messages would contain email jobs, not emails) or parallelized and distributed computations [12, Sec. 12.1, 12.3]. Bi-directional communication between two asynchronous processes may also fit these architectures, as depicted in Figure 3.

[Figure 3 - Full-duplex communication between processes: a job creator/presenter pushes messages with jobs into a "jobs" queue on the broker; a job processor consumes them and pushes messages with results back through a second queue on the same broker.]


Here, jobs and results are stored in queues in the same broker, which is not an absolute requirement. However, it should be kept in mind that all the operations are done asynchronously, and that there is therefore no guarantee that a successfully queued job will even be processed. The job creator must also have a way to match the results it receives with the jobs it sent, to present them correctly when the answer eventually comes. Some message brokers provide a way to tag messages and would thus allow such identification, but that is not part of the default feature set of brokers. This specific use case (which may amount to an event-driven architecture – EDA – or even a staged event-driven architecture – SEDA [13], [14]) may be explored in a future study, building upon the findings of this work.

2.3 General benefits of message-oriented architectures
Besides the fact that a message-oriented architecture enforces a true and reliable asynchronous mode of operation of the systems it is applied to, it offers various additional advantages:

• Flexibility and extensibility
The broker can handle more than one type of message, can have one or more queues and can duplicate messages to multiple queues if needed. But the most powerful functionality it gives to system architects is its total independence from the consumer and producer processes: these processes can typically be written in a wide variety of programming languages. Provided that they all use the same protocol, the same structure to define the content of the messages, they can seamlessly communicate with each other.

Producers can be of various kinds, and consumers as well. Multiple consumers can eat up the data from one queue on a Round-Robin-like basis (i.e. only one consumer gets a copy of each message) or they can also perform different tasks with the same input (i.e. all of them receive all the messages). This is a very powerful feature, as it can help replicate data in real-time to different data storage systems, for instance (see Figure 38 page 83 for an example; a sketch of both consumption modes is given after this list).

• Scalability
Any number of producers may fill any number of queues, from which any number of consumers can consume the data, simultaneously or not. This makes horizontal scaling very easy: the more HTTP requests arrive, the more public-facing backend servers can be set up, sending data to the broker. The same can be done with the consumers, to increase the rate at which messages are consumed. The broker can also be clustered, to perform both load balancing and replication. However, the instances with which the producers and consumers have to exchange data (e.g. databases, file storage systems) may get overloaded too. As such, they should be able to scale as well, to really give an architecture implementing a messaging middleware the ability to scale up entirely.


• Resilience and fault tolerance
Such architectures can be stronger against software and hardware failures. Once a message is pushed to the broker, the broker can take care of replicating it and storing it on disk, so that a bug causing the broker to crash before it could deliver the message does not lose it. The same applies to consumers: if they encounter a problem, they may close the connection they maintain with the broker and simply stop processing messages until they are restarted, or until another consumer takes up the task. Consumption acknowledgements can be used on the consumer side to let the broker know when a message has been processed, so that the broker knows when to redeliver it in case a consumer fails.

• Burst management and batching
Traffic bursts occur quite often on the Internet, and may overload some servers for a short while. By relieving them of some operations that can be deferred, the overall amount of resources dedicated to handling connections can be increased without changing the whole server code. Message-oriented middleware are designed to have a very high throughput capacity, so that they can play a "shock-absorber" role.

This "shock-absorbing" role also gives freedom to system architects on the consumer side: namely, different consumers can be set up, consuming the same set of messages (all receive all messages), but possibly not at the same pace, and even not at the same time. It provides a great way to transform costly one-shot operations (like database insertions) into batch operations, by waiting for enough messages to arrive before pushing them to a database.

Also, if one process is slower than another (e.g. one process adds the data into a MongoDB instance, while the other does it in a Hadoop cluster), they will not interfere with or hinder each other, and each will perform the best it can.

• Process isolation Message-oriented architectures enforce a de facto task separation. Processes that are not time dependent, critical and/or that must be replicated are better handled outside of the main application server. Dedicated small processes take care of the tasks independently, while the broker acts as a queuing and scheduling system.

• Streamlined maintenance, management and delivery process
Together with the process isolation and the resilience it provides, a message-oriented architecture is also easier to manage: tasks can be redirected on the fly toward new consumers, which can be turned off and on without impacting the application server in production. New consumers can be installed while the old ones are torn down without any downtime: the broker in between acts as a queue, continuing to receive and keep the messages from the producers while it waits for new consuming processes to be brought up.
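As a hedged illustration of the two consumption modes mentioned in the flexibility item above, the sketch below uses the RabbitMQ Java client (queue and exchange names are invented for the example): consumers sharing a single queue are served round-robin, while binding one queue per consumer to a fanout exchange makes every consumer receive every message.

```java
import com.rabbitmq.client.*;

public class ConsumptionModes {
    public static void main(String[] args) throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        Channel ch = conn.createChannel();

        // Round-Robin: two consumers share one queue; each message goes to only one of them.
        ch.queueDeclare("work.queue", true, false, false, null);
        ch.basicConsume("work.queue", true, printer(ch, "worker-1"));
        ch.basicConsume("work.queue", true, printer(ch, "worker-2"));

        // Broadcast: a fanout exchange copies each message into every bound queue,
        // so each consumer (e.g. one per storage system) receives all messages.
        ch.exchangeDeclare("events.fanout", "fanout");
        for (String q : new String[]{"to.mongodb", "to.hadoop"}) {
            ch.queueDeclare(q, true, false, false, null);
            ch.queueBind(q, "events.fanout", "");
            ch.basicConsume(q, true, printer(ch, q));
        }
        ch.basicPublish("events.fanout", "", null, "one event, two stores".getBytes());
    }

    private static Consumer printer(Channel ch, String name) {
        return new DefaultConsumer(ch) {
            @Override
            public void handleDelivery(String tag, Envelope env,
                                       AMQP.BasicProperties props, byte[] body) {
                System.out.println(name + " got: " + new String(body));
            }
        };
    }
}
```

The fanout pattern is what enables real-time replication of the same data stream into several specialized storage systems, each fed at its own pace.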


2.4 General disadvantages or difficulties in implementing message-oriented middleware
The basic idea behind architectures relying on server messaging is not difficult to grasp, but can be somewhat hard to implement in a real-world application. Some other problems might arise too:

• Multiplication of instances and applications to manage
As a direct implication of the process isolation these architectures give, the total number of machines, application code bases and instances to watch increases. While a large application, with medium to heavy data load and possibly data replication needs, would probably enjoy the benefits of this more complex architecture, smaller applications might not find it useful enough to justify adding the required new instances to their workload.

• Duplicated messages
All broker systems on the market can lead to duplicated messages, that is, messages that get processed by a consumer that does not report to the broker that it has consumed them, which causes the messages to get processed again (see Figure 4). Depending on the task accomplished by the consumers, this may create serious data integrity problems (e.g. duplicated entries in databases).

[Figure 4 - Example of a scenario with duplicated messages: consumer 1 receives insert jobs A and B, inserts objects A and B, but crashes after confirming only job A; the broker redelivers job B to consumer 2, which inserts object B a second time, leaving the database with A; B; B.]

Note that the same problem can happen between the broker and the producers: depending on the system used (whether push-to-queue acknowledgements1 are used/available or not), messages can get lost due to network problems, or a queuing acknowledgement can get lost and therefore cause a message to be pushed twice (or more) into the queue.

1 Also known as “publisher confirm”. See glossary page 88.


• Ordering
Depending on the system used, different guarantees (if any) are provided regarding the order in which messages are stored on the broker side and consumed by the consumers. This must be taken into account when designing the protocol used to exchange messages, if ordering is important. Including sequence numbers may help mitigate the problem, for instance (but the question of their synchronization among all the instances may arise). Otherwise, deferring only idempotent operations (that is, operations that can be performed in any order and would still give the same results [15]) solves the problem by not creating it (see the sketch after this list).

• Code/process duplication
Depending on the deferred process and what it has to achieve, the isolation of the different processes that messaging architectures provide may lead to code duplication. For instance, deferring an insert into a database that has to perform a query to populate a foreign-key attribute prior to pushing data into a table may create new bottlenecks on the database, due to the number of simultaneous reads performed on the database by the different consumers. Moreover, if the data is not "ready to eat" in the queues on the broker, it is likely that all the different consumers that consume the same set of data (all receive all data, to produce different results) will perform the same queries, leading to resource waste and possible code inconsistency across the consumers.

• Old and new data mixing
The flexibility in terms of delivery and maintenance that message-oriented architectures give may also create problems. Assume that messages enclose an entity ready to be inserted into a database, and that a queue is full of those messages, without any consumer. Now assume that the database model is changed (e.g. one of the fields is removed) and the same consumers are restarted. The insert operations are then likely to fail or to create integrity problems, as the data is inserted "as is" by the consumers. Producer and consumer code needs to be updated as well, and extreme care must be taken when dealing with inconsistent data queues. A clever yet simple message protocol design might help a lot to mitigate these issues (see section 5.3 page 47).

• Induced delays
The main purpose of implementing a message-oriented middleware is to completely separate asynchronous operations from the main application process. This is likely to induce a higher delay before completion, as the number of steps from the triggering of the asynchronous operation in the main application to its complete execution is higher. Moreover, there is no guarantee at all that a consumer is connected and ready to process data. This relative unpredictability may be a problem for near real-time operations. Having enough consumers and ensuring that they do not get overwhelmed can mitigate those delay issues.


• "Moving" performance bottlenecks
When designing asynchronous architectures, especially with messaging systems, one attractive goal is certainly the performance gain on the main application. Of course, deferring more processes to smaller and dedicated applications will relieve the main system, but this "magic" cannot replace a thoroughly thought-out architecture. Avoiding a bottleneck due to limited database insertion capabilities in the main application will not make it completely disappear: it will most likely show up again, in the same or even worse proportions, in the consumer process that performs the same operation. However, consuming processes can be optimized to take advantage of the large number of elements to process they may receive at the same time (e.g. using batch inserts in a database).
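To illustrate how deferring only idempotent operations defuses both the duplicated-messages and the ordering problems above, the following hedged sketch (using a recent MongoDB Java driver; the collection and field names are invented for the example) derives the document _id deterministically from the event, so that a redelivered message overwrites the same document instead of creating a duplicate.

```java
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;

public class IdempotentEventWriter {
    private final MongoCollection<Document> events;

    public IdempotentEventWriter() {
        // Assumed local MongoDB instance; database/collection names are illustrative.
        this.events = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("appgrid").getCollection("events");
    }

    /** Processing the same message twice leaves the collection unchanged. */
    public void handle(String eventId, String type, long timestamp) {
        Document doc = new Document("_id", eventId) // deterministic key, e.g. a producer-generated UUID
                .append("type", type)
                .append("timestamp", timestamp);
        // Upsert: insert if absent, replace with identical content if redelivered.
        events.replaceOne(eq("_id", eventId), doc, new ReplaceOptions().upsert(true));
    }
}
```

A sequence number in the message header could complement this, letting a consumer ignore a redelivered message that is older than the document already stored.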


3 Case study: Accedo Appgrid

To really test and appreciate the performance and the features of a specific message-oriented middleware, implementing it in a real application can be a good idea. This section presents the architecture that the target Appgrid solution currently relies on, and proposes a few modifications (with the help of message-oriented middleware) that can be implemented to improve the overall performance of the application, while testing the data bus systems.

3.1 Current Appgrid architecture
As presented in section 1.1.1, Appgrid is a cloud solution that enables content providers to serve assets and metadata to mobile devices and Smart TVs. These clients are at the top of Figure 5, which presents the overall Appgrid backend architecture. Devices send HTTP requests toward a single URL that is linked to an Elastic Load Balancer [16], which is Amazon's implementation of an HTTP sprayer. It evenly distributes the request load across two identical Appgrid API servers.

[Figure 5 - Current Appgrid architecture: REST API HTTP requests pass through an ELB load balancer to the two main Appgrid API servers, while administration-interface HTTP requests reach a dedicated Appgrid web interface server; all servers share a MongoDB replica set.]


Appgrid is not limited to a pure API/backend system and also provides a frontend interface to allow content producers to change the assets they provide through the Appgrid API, and to access statistics. This frontend is handled by a dedicated Appgrid server, which runs the exact same code as the two API servers, but only deals with frontend requests. Both the API servers and the administration frontend server share the same MongoDB database1, and can therefore stay in sync by storing their state in it (e.g. sharing session keys across the two API servers). The MongoDB database is set up in a master-replica scheme that allows seamless failover between the two MongoDB servers in case the primary fails.

3.2 API overview
Appgrid offers a convenient RESTful HTTP API to allow application developers to fetch metadata (key/value objects) and assets (e.g. films, images, music) stored in the cloud. Content providers define Applications in Appgrid, which are linked to an applicationId key that is used to uniquely identify each application when requests are made toward the API. An application that wants to interact with the API first requests a session key (that is, a unique random string that identifies a specific user session) and identifies itself by passing the applicationId and a unique identifier of the device that sends the request (to track returning users).

Using this session key, the application can make any kind of request, asking for data or settings' values. These interactions are synchronous: the client makes the request and expects the response to come as quickly as possible, because it needs it to complete its task. It is a standard client/server model and it cannot be changed.

However, the API also features some so-called "fire-and-forget" methods, i.e. requests that clients can send, but that will not return any response other than a 200 HTTP code. These methods usually trigger actions on the server whose result the client is not interested in; it just wants them to be done. Log and event pushing methods are among those "fire-and-forget" methods.

3.3 Log, event generation and management
Appgrid is not only a content management and delivery system, but also an analytics tool. It can give content providers information about application usage. To provide meaningful data to content administrators, a lot of data about API usage needs to be collected and aggregated. But prior to all this statistics and data analytics work, the API usage data must be generated and stored.

Appgrid has, in its current architecture, an internal event mechanism that is triggered when certain events arise (e.g. when a specific URL is hit, when a session key is created, etc.). The events are ultimately pushed, together with some context information (e.g. the session and application key of the client that originated the request), to a specific collection in a log database in the MongoDB replica set.

1 MongoDB is a NoSQL database. More information can be found in the glossary page 88.


While many of these events are triggered automatically in the Appgrid core code, some events can also be pushed manually to the API by the applications running on the clients' devices. Dedicated HTTP requests can for instance be used to store application events (e.g. when the application starts or closes). A single POST request, with the session key and the event type, may create an event entity that gets saved in a specific collection in the MongoDB replica set.

Finally, application developers may also use Appgrid as a log collector and push technical or usage information to a specific collection that they can then access via the administration frontend. Statistics on these logs can also be provided. POST requests are also used to push these logs to the database.

3.4 Original log and event insertion process
The general process by which the application handles events and logs is described in Figure 6. Some simplifications have been made (the read operations in the database are not shown), but it gives the overall working process.

[Figure 6 - Original log and event insertion process (simplified): the asynchronous HTTP handler on the API server answers the asset request, queues the corresponding insert-event tasks, and a pool of asynchronous threads later writes each event to the database, one at a time.]

The HTTP request handler receives the requests from the clients' devices and performs the required operations. In this example, the request targets an asset. The handler checks the request (session key validity, permissions, etc.), fetches the requested asset and sends it back to the client. Those operations are not shown on the diagram. However, the HTTP request handler also triggers an event, with all the required information about the query, to be stored in the logging database. It defers the task of writing the event to the database to a virtual asynchronous queue. The queue, or pool of asynchronous threads, is a Spring-generated system that distributes tasks across its threads to eventually write the events to the database [17].


The workload is similar for event and log pushing by clients' applications. Note that the requests return OK even though nothing has been effectively written to the database yet.

Note also that all the asynchronous operations, while being done asynchronously, are still part of the API server. The server takes care of the HTTP requests and the read operations in the database, as well as the deferred write operations. Therefore, these threads consume CPU and memory, as well as network bandwidth between the database and the API. Moreover, the threads insert only one record each time they are activated, which is really inefficient (a sketch of this per-event pattern is given below). Finally, remember that the load is split across two API servers that are exactly the same, and that therefore have the same kind of thread pool.
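The original per-event pattern might look like the following hedged sketch of a Spring @Async method (class and field names are invented; Appgrid's actual code is not shown in this report). Each call occupies a pool thread and performs a single-document insert, which is exactly the work the message-oriented redesign moves out of the API server.

```java
import org.bson.Document;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

import com.mongodb.client.MongoCollection;

@Service
public class EventWriter {
    private final MongoCollection<Document> events;

    public EventWriter(MongoCollection<Document> events) {
        this.events = events;
    }

    // Assumes a Spring context with @EnableAsync and a configured task executor.
    // Runs on a Spring-managed thread pool, but still inside the API server process:
    // it consumes API-server CPU, memory and a database connection, and writes
    // exactly one record per activation.
    @Async
    public void persist(String type, String sessionKey, long timestamp) {
        events.insertOne(new Document("type", type)
                .append("sessionKey", sessionKey)
                .append("timestamp", timestamp));
    }
}
```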

3.5 Introducing a message-oriented middleware in Appgrid architecture
Based on the way Appgrid currently works and the message-oriented middleware presentation given in section 2, a new architecture can be suggested to improve the way tasks are currently performed.

[Figure 7 - Message-oriented middleware in the Appgrid architecture: the architecture of Figure 5 (ELB, API servers, web interface server), with the API servers now publishing logs and events to a message broker, and a dedicated event/log consumer inserting them into the MongoDB replica set.]


A possible architecture is the one depicted in Figure 7. A message broker, hosting various job queues, receives messages (i.e. logs and events to insert into the database) from the data producers (which are the API servers in this architecture). It then sends these messages to the consumers, which in turn insert their content into the right MongoDB collection.

The broker may host multiple queues (e.g. one for the events triggered by the application and one for the events triggered by the requests themselves, internally by the API). The consumer may consume data from one or more queues. Messages flow from the API servers to the message broker. In the simplest case, it is a unidirectional communication: messages are sent toward the broker and then forgotten by the API servers1. The consumer then requests or receives those messages from the broker and processes them at its own pace, inserting them into the database.

3.6 Message-oriented log and event insert process
Considering the architecture described in section 3.5, the event insertion process depicted in section 3.4 can be updated as follows:

[Figure 8 - Basic message-oriented Appgrid log insertion process: the HTTP handler answers the asset request and sends one message per triggered event to the message broker; the broker delivers the messages one by one to the event/log consumer, which inserts each event into the database and acknowledges each message in turn.]

The workload proposed in Figure 8 is very basic and not optimized, but shows the big picture: the API server is completely relieved from the asynchronous processes and instead defers them to the event/log consumer, to which it sends messages via the message broker. The message broker here is set up to feed the consumer on a message-per-message basis, meaning that it waits for the consumer to confirm that the last message it sent has been consumed properly before sending the next one.

1 I.e. without publisher confirmations. See glossary page 88.


Though non-optimal, this scheme already has multiple benefits:

• Process isolation The asynchronous insert operations are completely deferred to other instances, and do not interfere with the primary application any longer.

• Limited resource waste
While the original process required multiple asynchronous threads to be opened on each API server, none is needed here. One synchronous process (the consumer) inserts events one at a time and confirms each to the broker. This should limit the burden on the database, since there is only one writing connection.

• Fault tolerance and burst management
If the API server were to fail, not a single pushed event would be lost, since the queue is now on the message broker, and not on the API any more. This also allows the API to handle much larger traffic bursts without being really impacted (provided the only task a method performs is to insert an object into a database, which is not always the case), leaving all the work such bursts induce to the broker and the consumer.

• Eased delivery process and operational management
If the consuming process had to be changed (e.g. adding a pre-processing phase where a field value in the received entities would be changed before being inserted into the database), it could be done online without any data loss. Data would remain queued until the consumer is brought up again. Also, if the traffic becomes really heavy, an operational team may choose to switch off or throttle down the consumer, to relieve the database while the system faces a traffic peak.

3.7 Possible improvements to the basic message-oriented log insertion process
Considering the Appgrid environment in which the message-oriented middleware is to be installed, some cleverer solutions can be used to improve the workload presented in section 3.6:

• Batch inserts
Since the consumer's task is to push data to MongoDB instances, it can take advantage of all the optimized insertion methods the database has to offer. One of them is called batch/bulk insert [18]. It means that instead of sending one record at a time to write in the database, the consumer can send n records to insert into the same collection at once, thus limiting the number of round trips between the database and the server, as well as triggering optimized insertion procedures on the database side.


Instead of passing one document to the insert() method in the code, an array of documents can be passed. The problem is then populating this array. In the workflow presented in section 3.6, the consumer only had one message to process at a time. Depending on the message-oriented middleware used, this behaviour may be changed, so that the consumer may receive n messages without sending a single acknowledgement. This feature is called prefetch by some brokers, and can be enforced at the consumer side by others.

By combining n message deliveries with a local queue on the consumer side, which would fill up to a certain point and get flushed to the database in a single bulk insert when the limit is reached, or after a certain period of time (to prevent data retention in a consumer process that does not receive enough messages), the global write throughput is likely to get much better (see the sketch after this list).

• Batch message sending
Whether from the producer to the broker or from the broker to the consumers, messages are in section 3.6 sent one by one to their destination. This may be fairly inefficient, and the same sort of short-term caching and batch sending can be used on both the producer and the broker to limit exchanges between peers. This may not be possible with all the message-oriented middleware available on the market, and may have to be simulated by one's own code if wanted.
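A minimal sketch of the batching consumer described above, assuming a RabbitMQ broker and a recent MongoDB Java driver (the queue and collection names, batch size and message format are invented for the example): basicQos lets the broker deliver up to n unacknowledged messages, a local buffer collects them, and a single insertMany plus one cumulative acknowledgement flushes the whole batch.

```java
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.rabbitmq.client.*;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class BatchingConsumer {
    private static final int BATCH_SIZE = 100;

    public static void main(String[] args) throws Exception {
        MongoCollection<Document> events = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("appgrid").getCollection("events");

        Channel channel = new ConnectionFactory().newConnection().createChannel();
        channel.queueDeclare("appgrid.events", true, false, false, null);
        channel.basicQos(BATCH_SIZE); // prefetch: up to n unacknowledged deliveries

        List<Document> buffer = new ArrayList<>();
        channel.basicConsume("appgrid.events", false, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String tag, Envelope env,
                                       AMQP.BasicProperties props,
                                       byte[] body) throws java.io.IOException {
                buffer.add(Document.parse(new String(body))); // messages assumed to carry JSON events
                if (buffer.size() >= BATCH_SIZE) {
                    events.insertMany(new ArrayList<>(buffer)); // one bulk insert instead of n
                    buffer.clear();
                    // one cumulative ack covers every delivery up to this tag
                    getChannel().basicAck(env.getDeliveryTag(), true);
                }
            }
        });
        // A production version would also flush the buffer on a timer, so that a
        // slow trickle of messages is not retained in the consumer indefinitely.
    }
}
```

Acknowledging only after the bulk insert succeeds means that if the consumer crashes mid-batch, the broker redelivers the unacknowledged messages, trading the duplicated-messages risk of section 2.4 for zero data loss.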

A cleverer architecture could then be built using a process similar to the one presented in Figure 9.

[Figure 9 - Optimized message-oriented Appgrid log insertion process: the HTTP handler sends messages 1, 2 and 3 to the broker as they are produced; the broker delivers them to the event/log consumer as a group, which inserts events 1, 2 and 3 in one batch and acknowledges the three messages at once.]


4 Choice of a message broker

The task the broker has to achieve being defined, it is now time to get an exhaustive idea of which message-oriented middleware exist on the market, and what their main features are. This section presents the state of the art in the domain, and then focuses on two of the main competitors, Apache Kafka and RabbitMQ.

4.1 State of the art in message-oriented middleware
As a new way to think about process-to-process communication and to solve the scalability and performance problems that Remote Procedure Call (RPC) involves, several message-oriented middleware papers were published in the 90s [19], [20]. At that time, IBM was the most serious competitor in the field with its MQSeries products, along with some other companies like NEON (New Era of Networks) [21]. IBM's products still exist, rebranded as WebSphere MQ, but they are commercial, proprietary, and therefore not open source.

Other tools appeared in the 2000s: the Apache foundation originated various projects, like ActiveMQ in 2004 and Qpid in 2005 [22], [23]. Still supported as of 2014, ActiveMQ is said to be "the most popular and powerful open source messaging" server, according to the project's website. Both ActiveMQ and Qpid also support JMS, or Java Message Service, a Java API that allows Java applications to connect to any broker implementing it, without actually defining anything but interfaces to be implemented by JMS-compliant message broker systems [12, Sec. 5.7].

The message-oriented middleware landscape changed with the initiation of an open standard, the Advanced Message Queuing Protocol (AMQP), in 2003 [24]. Originated at JPMorgan Chase in London, in an effort toward simplified interoperability between financial market operators, it became a global industry effort to which 23 companies contributed (including Barclays, Cisco Systems, Microsoft Corporation and VMware). AMQP defines a standard way to build message-oriented middleware (e.g. in terms of acknowledgement support, push and pull message delivery, etc.), in a programming-language- and architecture-neutral way [25]. The final 1.0 version was eventually released in October 2011.

[Figure 10 - Logos of various message-oriented middleware or providers (trademarks)]


Qpid was the first implementation of AMQP, while RabbitMQ (created by Rabbit Technologies) arrived in 2007, implementing the same protocol but using Erlang instead of Java and C++ as the programming language. As of today, "RabbitMQ is used by everyone from small Silicon Valley start-ups to some of the largest names on the Internet" [26, Sec. 1.3]. It has strong community support behind it, completely supports AMQP, and can take advantage of the capabilities that Erlang provides to ease clustering and load-distributing operations. It is also open source, seems fast according to some benchmarks, and is used by big companies like Google, VMware and Mozilla [27]–[31]. For these reasons, RabbitMQ was chosen as one of the message-oriented middleware to test in this study.

AMQP is, however, not the only way to go. Other protocols like STOMP (Streaming/Simple Text Oriented Messaging Protocol) and MQTT (MQ Telemetry Transport) also aim at standardizing exchanges between processes [32], [33]. Other, perhaps less popular, message-oriented middleware implement some of them: Qpid provides an implementation of STOMP, and Apache Apollo (which is seen as the future of ActiveMQ) supports all three messaging standards mentioned above [34]. Some quick benchmarks suggest, though, that they are not as efficient as AMQP-based middleware [35], [36]. Mentioning ZeroMQ in this section is a bit off-topic, as it is not a complete message-broker system but rather a library that allows one to build one's own messaging system [37, Ch. 1]. It was therefore not considered an applicable solution for the problem presented in section 3.

A last category of message-oriented middleware gathers persistence-optimized systems. Apache Kafka and HornetQ are two message brokers specifically designed to be efficient when writing queue data to disk [38], [39]. They rely on clever page-caching methods to avoid the large performance hit caused by the many small write operations such a system has to perform when receiving thousands of messages each second. Apache Kafka is a recent piece of software, originated at LinkedIn, which was then made open source and is now maintained by the Apache Software Foundation. Since the possibilities offered by a long-term message storage scheme may be interesting for some applications – including the one studied in section 3 – and since some benchmarks showed that Apache Kafka performs well, it was chosen as the second competitor in this study [40].


4.2 Apache Kafka

Apache Kafka [38] is an open source message-broker middleware that relies on Apache ZooKeeper [41], both being maintained by the Apache Software Foundation; it originated at LinkedIn in 2011 [42]. Designed with performance and persistence in mind, Apache claims that "a single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients" [38]. Developed in Scala (an object-oriented and functional programming language that is compiled to Java bytecode and runs in typical Java Virtual Machines [43]), it sometimes uses specific vocabulary that is described in the glossary on page 88. The reader is strongly encouraged to refer to the glossary when concepts do not seem clear in the main document.

Figure 11 – Global Apache Kafka architecture (with one topic, one partition, replication factor 4: producers send to the primary broker, which replicates to the other brokers; producers, brokers and consumer groups coordinate via a ZooKeeper instance)

Kafka is special in that it writes all messages to disk as soon as it receives them. In fact, no message is forwarded before it has been persisted to disk (strictly speaking, messages are considered persisted once they are in the kernel's page cache [44, Sec. 4.2.]). Therefore, provided that the primary broker received a message, it should not be possible for it to be lost, even if there is an important failure at the broker. Moreover, brokers can have replicas, and a message can optionally be considered enqueued only once it has also been replicated to all the replicas in the group. This persistence-by-design architecture provides the maximal level of security one could expect against software crashes and, according to the documentation, should not hurt performance compared to in-memory queues [44].


Note that Kafka is not self-sufficient: it depends on an external Apache ZooKeeper infrastructure that acts as a scheduler and provides a shared database (namely, a hierarchical key/value tree) that all the entities involved in the exchanges use to get and modify the state of the system (e.g. producers pull from the ZooKeeper instance the address of the primary broker for a topic, consumer groups store the offset of the last message they read in the tree, etc.). As a consequence, Apache Kafka is not available as a fully bundled application: it also needs a running ZooKeeper instance (with proper, dedicated redundancy). An outdated version of ZooKeeper is provided in Kafka's package, though.

4.2.1 Overall structure

Like the other message-oriented middleware, Kafka has the concepts of data producers and consumers. Producers push messages to brokers, while consumers process them. However, it also introduces new concepts that it does not share with the others:

• Topics and partitions
A topic may be seen as the equivalent of a queue in the architecture described in section 2.1. It is the largest message grouping entity. However, producers do not push messages only to a specific topic, but also to a specific partition. A partition is a subdivision of a topic, and one broker has the responsibility of {0 … N} partitions, N being the number of partitions into which a topic is subdivided (see Figure 12).

Figure 12 - Architecture of a topic (a topic is subdivided into partitions 1 … N, each with its own sequence of occupied and free message offsets)

Each partition has its own message offset counter. This means that messages can be pushed unevenly to the partitions inside a topic. However, since partitions are the smallest entity a broker can be responsible for (and, as such, the smallest replicated entity), doing so may not be wise. In typical cases, round-robin partition selection is used. Note that the choice of which partition a message should be delivered to is up to the producer that sends the message.


• Message offset
Each message stored in a partition is given a unique, incremental integer identifier, as depicted in Figure 12. As a consequence, messages queued in a partition are stored (and thus consumed) in the order they were sent. Having a unique and incremental identifier also enables consumers to "rewind" and replay messages with past message offsets.

• Consumer groups
Kafka provides a clever way to spread consumer load across different instances while also allowing simultaneous consumption of the same messages. It introduces for this purpose the concept of a consumer group. A consumer group is a group of consumers that perform the same task on the same topics and partitions. As such, only one of the consumers a group consists of should receive a copy of a given message. Other consumer groups listening for messages on the same topic also receive a copy of the message (one per consumer group). Note that the last read message offset for a partition is kept on a per-consumer-group basis, which allows consumer groups to run at different times and paces. The consumer group concept therefore gives Kafka support for both publish/subscribe message delivery (consumers register themselves so that each receives a copy of the same message; creating different consumer groups achieves this mode of operation) and simple queue delivery (one message is processed only once; this mode of operation is achieved when only one consumer group processes the contents of a topic).

Figure 13 - Consumer group concept (example with one topic and one partition): each consumer group receives its own copy of every message produced to the topic
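As an illustration of how a consumer joins a group, consider the following minimal sketch (not taken from Appgrid) using the high-level Java consumer API of the Kafka 0.8 era; the ZooKeeper address, the topic name EventLog and the group id db-writers are illustrative assumptions. Every process started with the same group.id shares the partitions of the topic with the other members of its group, while a process started with a different group.id receives its own copy of every message.

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class ConsumerGroupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // consumers coordinate via ZooKeeper
        props.put("group.id", "db-writers");              // same id = same consumer group

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for one stream on the EventLog topic.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("EventLog", 1);
        KafkaStream<byte[], byte[]> stream =
                connector.createMessageStreams(topicCountMap).get("EventLog").get(0);

        // Only one member of the db-writers group receives each message;
        // a process using another group.id would get its own copy of all of them.
        for (ConsumerIterator<byte[], byte[]> it = stream.iterator(); it.hasNext(); ) {
            System.out.println(new String(it.next().message()));
        }
    }
}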

4.2.2 Main features

• Message and queue persistence
The strongest advantage of Apache Kafka, compared with most of the other messaging systems available, is that it enforces message persistence as the rule: persistence is there by design, and one cannot choose to work only with in-memory queues. Kafka is therefore optimized for disk-intensive I/O operations [40]. Note that message persistence also implies queue persistence, meaning that queues are defined in advance by the broker administrator and exist until the administrator explicitly deletes them.


• Per-partition load balancing and replication
Kafka offers an easy and convenient way to perform load balancing and improve fault tolerance at the broker level: each broker is responsible for some partitions and keeps a copy of them. The example in Figure 14 shows how a 3-broker system with 3 partitions can split the load across the different brokers. Note that broker 3 does not receive any messages from the producers (and would not send any message to the consumers): only the leader can send and receive messages; it is then its duty to keep its replicas in sync. Note also that a broker may be the leading broker for none, one or several of the partitions it is responsible for. The partition distribution across the brokers is done automatically by default, but it can also be given manually when the topics are created.

Figure 14 - Replication example (replication factor of 2, 3 partitions, 1 topic): a message for partition 3 goes to the broker leading that partition, which then replicates it to the replica broker

• Asynchronous producer
Contrary to some other message-oriented middleware, Apache Kafka provides a convenient, built-in way to let producers post messages asynchronously to the broker. This is the behaviour explained in section 3.7 and referred to as batch message sending. Producers have a local queue in which they post messages, and they flush this queue to the broker when a certain number of messages has been queued, or after a certain time. This optimizes network and overall resource utilization.


Asynchronous producers require additional parameters to be set (and possibly tweaked) for them to work (see the configuration sketch after the list). They include:

§ A queue timeout: if reached, the queue is flushed to the broker no matter how many messages it contains.
§ A queue maximum size: the maximum number of messages that can be queued without being sent to the broker.
§ A batch size: the maximum number of messages that can be sent to the broker in a single operation.
§ An action to perform if a new message comes in while the queue is full: drop it, wait some time to see whether space becomes available, or block the thread until the message can be added to the local queue.
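As a minimal sketch (not Appgrid code), this is how those parameters map to the configuration properties of the official Kafka 0.8-era Java producer; the broker address, topic name and values are illustrative assumptions:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AsyncProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");                // enable the local queue
        props.put("queue.buffering.max.ms", "500");         // queue timeout
        props.put("queue.buffering.max.messages", "10000"); // queue maximum size
        props.put("batch.num.messages", "200");             // batch size per send
        props.put("queue.enqueue.timeout.ms", "-1");        // -1 = block when the queue is full,
                                                            // 0 = drop, >0 = wait that long
        props.put("request.required.acks", "1");            // see the next bullet: 1 = leader only,
                                                            // -1 = leader and all in-sync replicas

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("EventLog", "some-event-json"));
        producer.close(); // flushes any locally queued messages
    }
}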

• Enqueuing acknowledgements
Producers can require the broker to send acknowledgements once messages have been successfully queued (also known as publisher confirmations). They may choose not to, but then take the risk of losing messages (e.g. due to network problems). Multiple acknowledgement schemes are available, among them a simple acknowledgement, sent when the message is received and enqueued by the leading broker, and a "commit" acknowledgement, sent only when the leader and all its replicas have a copy of the message.

• High-level consumer API
Kafka provides two APIs that can be used to build consumer processes: a low-level and a high-level API. The difference lies in the number of Kafka-management tasks left to the programmer. The high-level API automatically takes care of the offset that the application consumes (i.e. it stores it in ZooKeeper once the message is delivered to the application logic), automatically detects and connects to the primary broker of a specific partition, and is able to detect leader changes in a replica set [45]. It therefore relieves developers of various tedious tasks, but it also prevents them from performing more advanced operations, such as replaying messages or reading only specific message offsets. For such applications, the low-level API, with its higher management burden, should be used.

• Log compaction and compression
As Kafka stores everything to disk, the amount of data it keeps constantly increases and may reach very high values. Keeping all the messages may not be necessary, and logs (a synonym for topics and partitions) may be cleaned to reclaim space. Two modes are available: delete mode and compact mode. Delete mode is simple: records are deleted after a given period of time, i.e. when they reach a defined time-to-live, or when the topic reaches a size limit. Compact mode cleans the log by keeping only the last record of interest (e.g. for a topic storing key/value messages, keeping only the last message for each distinct key).

If the topics are to store large messages, it can be a good idea to enable data compression. Kafka provides built-in settings that allow compression using either the gzip or the Snappy compression algorithm [46].


4.2.3 Set-up and management

Apache Kafka is clearly not the easiest and cleanest package one can install and run on a server. Multiple instances (ZooKeeper, brokers) have to be launched to get it up and running, and testing it on one's local machine quickly requires a large screen to watch everything simultaneously. Appendix A — page 101 gathers some information that will help the reader run Apache Kafka on their own machine, as well as perform the usual administration tasks.

Apache Kafka is programmed in Scala and therefore requires the Scala runtime to work; packaged versions are available for most architectures and operating systems. A Java Virtual Machine is required as well.

All management operations are done via shell scripts that in turn call Scala classes. As a result, most command invocations have a latency that may be a bit bothersome while setting up and administrating the instances. The lack of a proper management console makes it quite difficult to be sure that everything is going well in the system while testing.

Kafka's brokers, as well as its consumer and producer clients, feature multiple sensors that help keep an eye on the overall activity in the architecture. These metrics (e.g. incoming message rate, byte rate, transmission error count) are available as MBeans and can thus be monitored through any JMX-compatible console. Kafka relies on the Yammer Metrics library to populate those MBeans, which provides, beyond the instantaneous value, some statistics about past values (moving average, sum, etc.) [44, Sec. 6.6.], [47]. However, no value history is kept, so a monitoring system must watch the instances 24/7 to get a proper picture of the system.

4.3 RabbitMQ

RabbitMQ is an open source message-broker middleware created in 2007 and now managed by GoPivotal [48], [49]. It follows the AMQP 0-9-1 (Advanced Message Queuing Protocol) protocol, a standardized and open protocol that "defines the behaviour of the messaging provider and client as well as the broker component in order to achieve interoperability" [25], [50]. RabbitMQ is programmed in Erlang. Once again, this section may use technical and unusual words that are specific to RabbitMQ. Those words are explained in the glossary on page 88, and the reader is strongly encouraged to look them up there if the concepts presented here are not clear enough.

Most of the operations are performed in memory. RabbitMQ is not "disk-oriented": messages are received by brokers via an exchange (i.e. a logical entry point that decides, based on some criteria, in which queue(s) the broker should place a message) and are then pushed to the registered consumers. The broker pushes queued messages to the consumers in no particular order. Consumers thus receive unordered messages and do not need to remember anything about the queue state (messages are unordered and pushed by the broker; consumers do not and cannot fetch specific messages on their own). Messages are paged out to disk only if no more memory is available, or if they are explicitly marked to be stored.


Figure 15 – Simplified overall RabbitMQ architecture (producers send to exchanges inside the broker, which route messages to queues, from which they are pushed to consumers)

As presented in Figure 15, RabbitMQ features producers that generate messages and send them to an exchange. Exchanges apply routing rules to each message (possibly based on a routing key that the producer put in the message header) to decide whether the message should be delivered to zero, one or more queues (duplicating the message if needed). Consumers have a permanent connection with the broker, which therefore knows which consumers are available and to which queues they subscribed. The broker pushes messages to the consumers whenever possible (that is, until the prefetch count is reached or the consumer refuses messages).

One main difference with Apache Kafka (presented in section 4.2) is that messages (as well as queues and exchanges) are not persistent by default. With the default settings, none of the elements the broker manages survive a broker restart or failure. Everything is kept in memory, which is a fundamental difference with the way Kafka works. Fortunately, RabbitMQ offers convenient settings to make both queues and messages durable. Messages themselves can be tagged as durable or not by the producers, so that producers can choose, on a per-message basis, not to persist some non-critical messages sent to a durable queue.
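A minimal sketch of these durability flags with the official RabbitMQ Java client (broker address, queue name and payload are illustrative assumptions): the queue is declared durable, and the individual message is marked persistent.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class DurablePublishSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // durable = true: the queue definition survives a broker restart.
        channel.queueDeclare("EventLog", true, false, false, null);

        // PERSISTENT_TEXT_PLAIN marks this particular message as durable;
        // a non-critical message could be published without it.
        channel.basicPublish("", "EventLog",
                MessageProperties.PERSISTENT_TEXT_PLAIN,
                "some-event-json".getBytes("UTF-8"));

        channel.close();
        connection.close();
    }
}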

4.3.1 Overall structure

• Exchanges and bindings
Constitutive parts of the AMQP protocol, exchanges and bindings dispatch the messages coming from producers to queues [51, Sec. 5.11]. A producer sends a message to a specific exchange (not a queue), which in turn decides whether it should redirect, duplicate or discard the message. Different types of exchanges are standardized, and RabbitMQ supports all of them [52].


These routing choices can be altered by the value of the message's routing key, which is put in the headers by the producers. The simplest example of an exchange is the direct exchange, as depicted in Figure 16:

Figure 16 - Direct exchange example (routing keys = queue names = colours): a message with a given routing key is routed via the binding with the same key to the matching queue

A direct exchange routes each message via the binding that has the same routing key as the message. In the example in Figure 16, the presented exchange is also the default exchange, an exchange that producers can send messages to by specifying an empty string as the destination exchange. The default exchange automatically binds all the queues created in the system to itself, with a routing key equal to the name of each queue. Therefore, sending a message to the default exchange with the routing key "appgrid" routes the message to the queue named "appgrid", if it exists.

Another interesting exchange type is the fan-out exchange. As shown in Figure 17, it simply duplicates the messages it receives and puts them in all the queues bound to it. This is particularly useful to achieve a publish/subscribe message distribution scheme, where multiple consumers should perform different tasks on the same messages. In RabbitMQ, a message put in a queue is consumed by one and only one of the consumers listening to that queue; duplicating messages is thus the only way to have a message processed multiple times by different consumers.

Figure 17 - Fan-out exchange example (routing key meaningless, three queues, two bindings only): each bound queue receives its own copy of every message


More complex routing can be achieved using the other exchange types (headers and topic exchanges), which are not explained in this document. The reader may find additional information in the references and the glossary [52].

• Queues
Queues in RabbitMQ are the endpoints for all messages that were successfully routed (i.e. that were not dropped by an exchange). A message in a queue is (ideally) delivered only once, to only one of the consumers subscribed to the queue. This design implies that, to mimic the consumer group concept Kafka features (see section 4.2.1), messages must be duplicated, each emulated consumer group being bound to its own queue.

• Channels
Channels are a low-level component of AMQP, implemented in RabbitMQ, used to multiplex connections from the same producer or consumer to the same broker, for instance when several threads are involved. Instead of establishing multiple TCP connections between the peer and the broker, only one is instantiated and kept active, and RabbitMQ multiplexes all the channels over this single TCP connection.

4.3.2 Main features

RabbitMQ is a fairly standard message queuing system and shares most of the general concepts presented in section 2.1. However, even though it follows the AMQP standard (which in itself features some "unusual" functionalities), the development team has added some extensions to the protocol that can be useful in the studied use case:

• In-memory storage and optional (limited) durability
As already mentioned in the previous section, RabbitMQ stores all the elements it deals with in memory, while providing flags that can be put on queues and messages to mark them durable. However, the achieved durability is not comparable to what Apache Kafka features, in that it is only temporary: messages are written to disk only if no consumer can process them right away. Messages (no matter the value of their durable flag) can also be paged out to disk if memory gets exhausted on the broker. Once a message has been delivered (or its consumption confirmed, if confirmations are enabled), it is forgotten, even if it was tagged as durable. RabbitMQ does not provide a way to enforce permanent storage in a fashion similar to Apache Kafka, and as such does not allow consumers to replay old messages.


• Reliable delivery and message rejection
Messages delivered to consumers can be required to be acknowledged: this ensures that messages have been successfully consumed before they are removed from the broker's memory. By default, no acknowledgements are needed, and a successful delivery to one of the registered consumers of a queue is enough for the broker to forget about the message. Consumption confirmations help reduce the number of unprocessed messages, while also giving consumer processes the opportunity to notify the broker that a message could not be processed. They may indeed reject a message, and potentially queue it again for redelivery. For some use cases (e.g. a database that becomes temporarily unavailable and therefore prevents a message's content from being written to it, while the connection with RabbitMQ does not encounter any failure), this can be a convenient feature.
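A minimal consumer-side sketch of these acknowledgements with the Java client (queue name and processing logic are illustrative assumptions); automatic acknowledgement is disabled so that the broker waits for an explicit verdict, and a failed message is negatively acknowledged and requeued:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.QueueingConsumer;

public class AckSketch {
    public static void main(String[] args) throws Exception {
        Connection connection = new ConnectionFactory().newConnection();
        Channel channel = connection.createChannel();

        QueueingConsumer consumer = new QueueingConsumer(channel);
        channel.basicConsume("EventLog", false, consumer); // autoAck = false

        while (true) {
            QueueingConsumer.Delivery delivery = consumer.nextDelivery();
            long tag = delivery.getEnvelope().getDeliveryTag();
            try {
                process(delivery.getBody());         // e.g. write the content to the database
                channel.basicAck(tag, false);        // success: the broker may forget the message
            } catch (Exception e) {
                channel.basicNack(tag, false, true); // failure: requeue for redelivery
            }
        }
    }

    private static void process(byte[] body) {
        // application logic would go here
    }
}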

• Dynamically declared queues and exchanges
While the administrator of the broker can create them in advance, new queues and exchanges can also be declared directly in the code of producers and consumers. Declaring them in the code before use is actually considered very good practice, as it ensures that messages will not be pushed to nonexistent queues and exchanges [53].

• Permission management and SSL support
RabbitMQ requires producers, consumers and administrators to authenticate before they can perform any operation on the broker. Different permissions can be given to different users, thus allowing the enforcement of efficient and fine-grained access control. SSL is also supported and can be used to authenticate peers and encrypt the data transmitted over the channels between the brokers and the producers and consumers.

• Multiple load balancing and replication alternatives
Thanks to the distribution capabilities that an application coded in Erlang offers, RabbitMQ features different ways to replicate configurations and queue contents. One of them, as depicted in Figure 18, is to create a RabbitMQ cluster:

Figure 18 - RabbitMQ clustering with queue mirroring policy set to all (full replication between a disk node and a RAM node, with automatic failover for producers/consumers when a node fails)


In a cluster, individual nodes can fail, as there is permanent full replication between all the members. Two types of nodes, disk and RAM nodes, make up the cluster. A disk node writes its content (exchanges, queues and their contents, etc.) to disk, while a RAM node keeps everything in memory (except for queue contents, if they are explicitly marked durable or are too large). RAM nodes are therefore usually faster, and disk nodes are there to allow cluster recovery in case of complete shutdown (at least one is required). Peers can push and pull data from any of the nodes, at any time, provided that the node to which messages were pushed is still up and running (i.e. messages are stored only on the node that was the destination of the message; messages are not replicated across nodes by default). Please note, however, that the links between the nodes of the cluster must be highly reliable, with low latencies, for it to work properly, and that a bidirectional connection must be maintained between all the nodes [54].

The fact that messages are not replicated across the replica set makes the cluster of limited use on its own. Thanks to the mirroring capabilities featured by RabbitMQ's policies, however, queue contents can be replicated across none, a fixed number, or all of the members of a replica set, with automatic master election and failover [55]. New nodes can be added on the fly, either catching up naturally (no pre-provisioning: as the queues are emptied on the old nodes, new nodes eventually have the same contents as the other nodes) or being provisioned with the queue data before the node is put into service. This provisioning may, however, imply some unavailability time, as the node must end up with exactly the same contents as the others. Note also that this mirroring feature does not do load balancing by itself: peers publish and consume messages from the master node only, the replicas being used only in case of failure.

RabbitMQ also provides two other alternatives to share the load across multiple brokers: federation and shovel can be used over unreliable network links to share or copy messages from one broker to another. Messages can be buffered at the entry point in case of network link failure, and sent once the link comes up again. These alternatives allow a partial load-balancing approach: one may federate only some of the queues and keep the others local. More information on these approaches can be found in the references [56].

4.3.3 Set-up and management

RabbitMQ is available for multiple operating systems and platforms, quite often as a packaged application. Nicely integrated and running as a usual system service (therefore manageable through typical service command-line invocations), it can be started right away after installation. The default settings are usually sufficient and can be used without any problem. The reader may find Appendix B — page 104 useful to test and manage a RabbitMQ installation.


RabbitMQ comes bundled with different plugins, one of them dedicated to management and monitoring [57]. Very easy to set up, it gives administrators a convenient way to manage a broker via an all-in-one web user interface. Queues, exchanges and channels can be watched and configured using the interface. Graphs are plotted, and the broker automatically and continually records the values of the different metrics, so that they are always accessible, together with historical data. Very handy when implementing and testing a new architecture, queue contents can be completely or partially dumped and displayed through the web interface, and can also be completely purged if necessary. Managing the whole system is really easy and is a matter of a few mouse clicks.

4.4 General feature comparison between Apache Kafka and RabbitMQ

The following tables summarize and compare the sets of features, described in sections 4.2 and 4.3, available in Apache Kafka and RabbitMQ:

                                     Apache Kafka                   RabbitMQ
Creation year                        2011                           2007
License                              Apache (open source)           Mozilla Public License (open source)
Enterprise support available         ✘                              ✓
Programming language                 Scala                          Erlang
AMQP compliant                       ✘                              ✓
Officially supported clients1        Java2                          Java, .NET/C#, Erlang
Other available clients              in ~10 languages               in ~13 languages
                                     (incl. Python and Node.js)     (incl. Python, PHP and Node.js)
Documentation, examples              ∿ Immature and incomplete      ✓
and community3

Table 1 - General information comparison between Apache Kafka and RabbitMQ

1 Used to develop producer and consumer processes.
2 Though unclear. See [58].
3 Author's opinion. Details about this point are given in the results part.

Broker capabilities                  Apache Kafka                   RabbitMQ
Main storage space                   Disk                           RAM
Ordered storage and delivery         ✓ At partition level           ✘
Queue content persistence            ✓ Complete and mandatory       ∿ Temporary and optional
Message deletion                     After reaching size            Immediately after
                                     or time limit                  confirmed consumption
Queue data compression               ✓                              ✘
Predefined queues on broker          ✓                              ✓
Multiple different consumers         ✓                              ∿ Needs queue duplication
of same data set1
Load balancing across a set          ✓                              ✓
of same consumers
Remote2 queue definition             ✘                              ✓
Advanced/conditional                 ✘ Message to partition only    ✓
message routing
Permission and access                ✘                              ✓
control list support
SSL support                          ✘                              ✓
Clustering support                   ✓                              ✓
Self-sufficient                      ✘ Needs ZooKeeper              ✓
Management and                       ✘ JMX and CLI-based            ✓ Web and CLI-based
monitoring interface

Table 2 – Broker capabilities comparison between Apache Kafka and RabbitMQ

Java producer capabilities           Apache Kafka                   RabbitMQ
Batch delivery                       ✓                              ✘
Asynchronous publishing              ✓                              ✘
Publisher confirms                   ✓                              ✓
(acknowledgements)

Table 3 – Producer (official Java client only) capabilities comparison between Apache Kafka and RabbitMQ

Java consumer capabilities           Apache Kafka                   RabbitMQ
Batch fetch/delivery3                ✘                              ✘
Push delivery from broker            ✘                              ✓
Pull delivery (fetch)                ✓                              ✓
Prefetch count4                      ∿ Implicit                     ✓
Message replay capability            ✓                              ✘
Consumer confirms                    ∿ Implicit                     ✓
(acknowledgements)
Message rejection5                   ✘                              ✓

Table 4 – Consumer (official Java client only) capabilities comparison between Apache Kafka and RabbitMQ

1 I.e. support for the publish/subscribe delivery model (see section 4.2.1 page 29 about consumer groups and section 4.3.1 page 34 about fan-out exchanges).
2 I.e. by consumers and producers.
3 Reception of multiple messages sent in a single batch from the broker.
4 Refer to section 3.7 page 24.
5 Also known as re-queuing. See section 4.3.2 page 36.


4.5 Preliminary conclusions

Apache Kafka and RabbitMQ are two message-oriented middleware with fairly different sets of capabilities, the main differentiating point being the way they store and keep (or do not keep) messages. Depending on the expected data flow and the tasks to be performed, one or the other may perform better.

However, from a workflow point of view, they both allow messages to be queued and tasks to be distributed to one or more consumers. Messages can be written to disk, and consumers may process them whenever they are up and running, which is the main objective sought in implementing such a message bus: isolating the producing processes from the asynchronous consuming processes.

To get a better idea of the performance, ease of use and features of these systems, both were implemented and tested in Appgrid. The following sections describe this implementation.


5 Implementation in Appgrid and test

Apache Kafka and RabbitMQ both seem to have the features required to be used in Appgrid. To make sure this is the case, both were implemented in Appgrid and tested using the same tools, so that their behaviours could be compared. This section presents the implementation details and the test infrastructure.

5.1 Motivations and goals

The ultimate goal of implementing a message-oriented middleware in Appgrid is to completely isolate the database write operations that are currently performed asynchronously by the main application itself. As described in section 3, the Appgrid API itself and the devices connecting to it generate events and push logs that need to be written to the MongoDB database the application relies on. This does not need to be done synchronously, nor are there any critical time constraints on the delay between the publishing of the events and the time they get written to the database. Therefore, these write operations can be deferred to another, external process (a data consumer) that performs them in a dedicated context, at its own, throttled pace.

Two MongoDB collections (namely, EventLog and ApiLog) are concerned, and the write operations currently performed asynchronously by the main application will be moved to another, completely independent process. More information on the Appgrid architecture itself and the expected benefits of this update may be found in section 3, page 19.

5.2 Code architecture

Since Appgrid is an all-in-one, bundled Spring Java application, it is not very convenient to make a quick change in the code and test the new version, as the build and deployment times are prohibitive (on the order of 500 seconds). Therefore, hardcoding parameters in the code was not a good idea, even for testing purposes, and a flexible version, configurable via external files, was developed.

5.2.1 Core Appgrid modifications

Taking advantage of Java features like interfaces, the implementation was developed in a modular way, to avoid code duplication and to separate generic message processing from broker-specific operations. The class diagram for the newly created tv.accedo.appgrid.data.bus package is presented in Figure 19.


Figure 19 - Class diagram, tv.accedo.appgrid.data.bus package (BusConfig, the MessagePusher interface with its KafkaPusher and RBMQPusher implementations, KafkaMessagePartitioner, and the Message model class)

• BusConfig
Singleton class used to load settings from the configuration file and provide them to the other classes (a default configuration is hardcoded if none is provided in /conf/accedo-bus.properties).

• MessagePusher
Interface that provides a link to the configuration singleton and defines the two primary methods that all message pushers must implement: pushMessage(Message) and shutdown(). Wherever a thread needs to open a connection toward a message broker, it should hold a reference to a MessagePusher object, instantiated as either KafkaPusher or RBMQPusher.

• KafkaPusher
Class that holds all the logic to push messages to a Kafka broker. Each instantiation results in the creation of a new TCP connection to the broker; messages are pushed to the specified topic using the KafkaMessagePartitioner partition chooser.


• KafkaMessagePartitioner
Class used by KafkaPusher objects to decide to which partition of a topic a message should be sent. It achieves a random distribution among all the partitions of the topic, no matter the routing key. Whereas Kafka's documentation claims that this behaviour should be obtained by sending messages with a null routing key, this was not the case during the tests, hence this manual implementation [44, Sec. 5.1].
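A hedged reconstruction of what such a partitioner might look like (the actual Appgrid source is not reproduced here); it implements the kafka.producer.Partitioner interface of the 0.8-era client, whose exact signature varies slightly between releases, and would be activated by setting the partitioner.class producer property to this class name:

import java.util.Random;

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

// Selects a random partition regardless of the message key, approximating
// an even spread of the message stream over the partitions of a topic.
public class RandomPartitionerSketch implements Partitioner {
    private final Random random = new Random();

    // Kafka instantiates partitioners reflectively with this constructor.
    public RandomPartitionerSketch(VerifiableProperties props) {
    }

    public int partition(Object key, int numPartitions) {
        return random.nextInt(numPartitions);
    }
}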

• RBMQPusher
Class that has all the logic to push messages to a RabbitMQ broker. Each instantiation results in the creation of a new TCP connection to the broker, together with a new channel dedicated to the queue whose name is given as argument when the object is constructed.

• Message
Model class that defines the structure of a message's content. This is a serializable class. Please refer to section 5.3 for more details.

Note that this architecture makes it easy to change brokers in the code: wherever a connection to the broker is needed (in our case, in the method handlers related to the posting of events to the EventLog and ApiLog databases), one simply instantiates an object that implements the MessagePusher interface.
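The following self-contained sketch illustrates this indirection (the interface shape follows the class diagram; the stub classes and the configuration flag are hypothetical stand-ins for the real KafkaPusher, RBMQPusher and BusConfig):

public class PusherSelectionSketch {

    // Shape of the interface described in the class diagram.
    interface MessagePusher {
        void pushMessage(String message);
        void shutdown();
    }

    // Stand-ins for the real classes, which hold actual broker connections.
    static class KafkaPusher implements MessagePusher {
        KafkaPusher(String topic) { /* open TCP connection, remember the topic */ }
        public void pushMessage(String m) { /* send to the Kafka topic */ }
        public void shutdown() { /* close the producer */ }
    }

    static class RBMQPusher implements MessagePusher {
        RBMQPusher(String queue) { /* open connection and channel for the queue */ }
        public void pushMessage(String m) { /* publish to the RabbitMQ queue */ }
        public void shutdown() { /* close the channel and connection */ }
    }

    public static void main(String[] args) {
        boolean kafkaEnabled = true; // would come from BusConfig in Appgrid
        MessagePusher pusher = kafkaEnabled
                ? new KafkaPusher("EventLog") : new RBMQPusher("EventLog");
        pusher.pushMessage("{\"version\":1,\"messageContent\":\"...\"}");
        pusher.shutdown();
    }
}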

5.2.2 Consumer process

The consumer process is organized around the same type of modular architecture and can consume data from both Apache Kafka and RabbitMQ (at the same time, if needed). It uses a per-queue and per-topic threaded architecture, which results in one TCP connection being opened for each queue the consumer registers to.

Figure 20 - Thread architecture of the consumer process (class diagram: Application, RBMQConsumerThread, KafkaConsumerThreadHandler and KafkaConsumerThread)


Figure 20 presents the main class and the two message listener classes that are instantiated and run as separate threads, RBMQConsumerThread and KafkaConsumerThread. They implement the logic required to connect to the brokers and to listen for or fetch messages. RBMQConsumerThread implements the push API (it receives messages directly from the broker, with no need for threads to fetch them), while KafkaConsumerThread fetches them. The Application main method instantiates a thread of each, depending on their activation state (read from the /conf/appgrid-bus.properties configuration file). One thread (per broker) is created for each type of message that needs to be processed: namely, one to process EventLog messages and one for ApiLog messages. Each thread holds a reference to a class that inherits from the abstract MessageProcessor class.

Figure 21 shows the implementation of the message processing classes (i.e. those in charge of effectively dealing with the content of the messages).

Figure 21 - Message processing classes of the consumer process (class diagram: the abstract MessageProcessor, its EventLogProcessor and ApiLogProcessor subclasses, DBQueueFlusher and Message)


• MessageProcessor
Abstract class that implements the logic needed to perform actions on the database and holds the attributes needed to handle an array (or queue) of DBObjects, as well as a timed flush of this queue to the database. A DBObject is a MongoDB-friendly class that represents a document to be inserted into a MongoDB collection. MessageProcessor also provides control over the DBQueueFlusher task used to drive the timed flush of the DBObject queue.

• EventLogProcessor
Class that extends MessageProcessor and defines the content of the two abstract methods processMessage(String) and checkMessage(Message). It takes care of checking, deserializing and processing messages pushed to an EventLog queue.

• ApiLogProcessor
Same as EventLogProcessor, but for ApiLog queues.

• DBQueueFlusher
Class that extends TimerTask and is used by MessageProcessor objects. It is the task executed on a regular basis by MessageProcessor objects whose startFlusher() method has been called. When executed, it simply calls the flushDBObjectQueue() method of the MessageProcessor object it is bound to.

• Message
The same class as presented in section 5.2.1.

The way DBObjects get written to the MongoDB collection needs some explanation: it uses the bulk/batch insertion capabilities of MongoDB. As presented in section 3.7 page 24, batch insertion means that instead of sending one document at a time to be inserted into the database, an array of documents is sent at once. Therefore, MessageProcessor objects hold an ArrayList of DBObjects. Whenever a thread requires a document to be written to the database, the document is pushed into this array. The methods called to put a new document in the array and to flush the queue are declared synchronized, to prevent simultaneous flushes and document losses (as two threads can act on the queue).

The queue can be flushed in three cases:

1. The queue reaches its maximum capacity
This capacity can be set in the configuration file; by default, it is set to 500 documents. The current number of elements in the queue is checked after each insertion.

2. The scheduled DBQueueFlusher task of the object is executed
If the automatic DBQueueFlusher task is started, the queue is flushed regularly, at a rate that can be set in the configuration file. Note that the timer is reset each time the queue is flushed (i.e. when the queue is full and then flushed, the timer is reset).


3. The shutdown() method of the object is called
This prevents any non-pushed modifications from being lost in case of a controlled shutdown.

Note that the consumer process relies on the official, native Java MongoDB client to connect to the database. The insert() method is used when flushing the queue, with the ArrayList of DBObjects as argument. The default settings are used to connect to the MongoDB instance.
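A minimal sketch of this synchronized queue-and-flush logic, using the 2.x-era MongoDB Java driver API named in the text (database and collection names, as well as the capacity, are illustrative assumptions):

import java.util.ArrayList;
import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class BatchFlushSketch {
    private static final int MAX_QUEUE_SIZE = 500; // default capacity from the configuration

    private final DBCollection collection;
    private final List<DBObject> queue = new ArrayList<DBObject>();

    public BatchFlushSketch() throws Exception {
        collection = new MongoClient("localhost")
                .getDB("appgrid").getCollection("EventLog");
    }

    // synchronized: several consumer threads may push and flush concurrently.
    public synchronized void pushDBObjectToQueue(BasicDBObject doc) {
        queue.add(doc);
        if (queue.size() >= MAX_QUEUE_SIZE) {
            flushDBObjectQueue();
        }
    }

    public synchronized void flushDBObjectQueue() {
        if (!queue.isEmpty()) {
            collection.insert(queue); // one bulk insert of all queued documents
            queue.clear();
        }
    }
}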

A last word about consumer acknowledgements: for RabbitMQ, they can be enabled. In the test configurations used, consumption acknowledgements were sent as soon as the DBObject contained in the message was pushed to the queue, not when the queue was flushed. This may lead to data loss in case of a consumer process failure, but may speed up the overall insertion process.

5.3 Message structure and protocol

Message-oriented middleware deal with messages. As already mentioned in section 2.1 page 12, the messages they carry are strings of bytes and nothing more. Depending on the broker, they may be wrapped into a more structured element that provides some headers. Kafka does not provide such headers, while RabbitMQ has some flags (like the redelivery flag in case of rejected delivery; see section 4.3.2 page 36), a routing key and possibly additional custom headers that the consumer can see. Complete freedom is given to the developers using these systems: it is their duty to design a protocol and a standardized structure for the message contents they exchange, in order to obtain a reliable and sustainable architecture.

The content of the messages depends on the type of operations the consumers need to do. It may be totally fine to just post a raw ASCII string to the queue if the only task the consumer performs is printing it asynchronously on a printer or a screen. In most cases, however, consumers need to perform more advanced tasks, and thus require a more carefully thought-out design.

The operations to be deferred to the consumers are database insertions. For this first test, this is the only operation they have to do: they receive prepopulated model objects that they just have to insert into the database. Why not serialize these objects using some standard format like JSON and push the resulting string to the broker? It would work, but some considerations related to sustainability and architecture evolution should also be taken into account:


• Overhead due to serialization
If the objects to be serialized are small, the overhead due to serialization may exceed the actual size of the data inside the object. Take the example of this simple object, serialized in JSON:

{
  "my-first-very-long-boolean-field": "true",
  "the-second-boolean": "false"
}

Assume that the application logic receiving this message knows that it should contain two Boolean values, in a given order. Why not just send them as "1;0", or even "10"? Serialization using a standardized format might not always be the best solution.

• Collection and table model changes
While not very important with MongoDB collections, as documents can be stored in a collection regardless of the structure of the previous ones, it may not be a good idea to use model objects directly serialized as message content. Assume that the collection structure has changed and now stores documents with a dateTime field instead of a date field. Even if the code on both the producer and the consumer side is updated to take this specific change into consideration, it is likely that stale messages (or rather, messages that comply with an old model structure) are still in the queue. Those messages would cause the consumer to fail and, depending on the implementation of the consumer, might even cause problems in the database.

Wrapping the actual model (the serialized data object structure) into an immutable, predefined envelope might be a better idea. It may be a very simple one, but it would still provide a level of flexibility that prevents most common errors. Adding an incremental version counter to this structure, together with the real message content, allows a consumer thread to check whether it is compatible with, and knows how to deal with, a given message version. Besides, routines could be added to the consumer so that it can upgrade old data structures to the most recent one (e.g. converting the date field to a dateTime field with the time initialized to midnight). This may be particularly useful with Kafka, as old messages are de facto stored for quite a long time and may be replayed by consumers.
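A minimal sketch of such a version guard on the consumer side (the version numbers and the upgrade routine are hypothetical illustrations; a real routine would parse the JSON instead of rewriting the raw string):

public class VersionGuardSketch {
    static final int SUPPORTED_VERSION = 2;

    // Returns the JSON content, upgraded if needed, or null if incompatible.
    static String checkAndUpgrade(int version, String jsonContent) {
        if (version == SUPPORTED_VERSION) {
            return jsonContent;
        }
        if (version == 1) {
            // Crude illustration of an upgrade routine: rename the old
            // "date" field to the new "dateTime" field.
            return jsonContent.replace("\"date\"", "\"dateTime\"");
        }
        return null; // unknown version: reject the message instead of failing
    }
}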

• Serialization performance and binary serialization
Multiple serialization formats exist: JSON, XML, CSV, YAML… The most obvious and most used are human-readable formats. One can take the representation of an object in XML or YAML and still figure out the content of the object and its structure. While this is good for testing, it may not be the most optimized way to encode data sent from one software process to another.


Binary serialization formats can come in handy in this case. Relying quite often on a schema or an IDL1 to define their structure, they are in most cases impossible for a human to read directly. Instead of a text-based serialization, they generate binary representations of the objects which, as such, are likely to be more compact and more efficiently processed. BSON ("Binary JSON" [59]) and Apache Avro (the serialization process used in the communication between Hadoop nodes) are examples of binary formats [60].

According to some tests, Apache Avro is more compact (it takes about 3 times less space than JSON-encoded objects), and (de)serialization operations are also faster (depending on the JSON library used, operations can be 2–4x faster than with JSON) [61]. If the number of messages to parse is large, and their size becomes important, switching to this kind of format might improve the overall performance while limiting the waste of disk space, memory and bandwidth.

For these tests, a very simple object structure was used. Figure 22 describes the overall message structure and how its content is converted back and forth to be exchanged via a broker.

Figure 22 - Message structure and serialization logic: an EventLog (uuid, eventType, timestamp, retentionTime, …) or ApiLog (uuid, date, requestMethod, dataTransferParameters, …) object is JSON-serialized into the messageContent attribute of a Message (versionNumber, messageContent), which is itself JSON-serialized into the string exchanged via the broker (producers serialize, consumers deserialize)

1 See glossary page 88.


Message, EventLog and ApiLog are three Java classes that act as model classes. An EventLog or ApiLog object is serialized to JSON, and the resulting string is put in the messageContent attribute of a Message object. The Message object itself is then serialized to JSON, and the resulting string is used as the actual message sent to the broker for delivery. Consumers reverse the process, but they have to know which final model class to use, that is, whether the message contains an EventLog or an ApiLog object representation. This differentiation is done implicitly by the queue from which the consumer fetches the data: each queue is filled with only one type of message, making it easy for consumers to identify what kind of message they receive.
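A minimal sketch of this double (de)serialization round trip, assuming a Jackson 2.x ObjectMapper (the library family the Message class relies on); the EventLog model is reduced to a single field for illustration:

import com.fasterxml.jackson.databind.ObjectMapper;

public class WrapUnwrapSketch {

    public static class EventLog {
        public String uuid;
    }

    public static class Message {
        public int version;
        public String messageContent;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Producer side: serialize the model, then wrap it in a Message.
        EventLog event = new EventLog();
        event.uuid = "abc-123";
        Message wrapper = new Message();
        wrapper.version = 1;
        wrapper.messageContent = mapper.writeValueAsString(event);
        String onTheWire = mapper.writeValueAsString(wrapper); // sent to the broker

        // Consumer side: unwrap the envelope, then deserialize the inner model.
        Message received = mapper.readValue(onTheWire, Message.class);
        EventLog restored = mapper.readValue(received.messageContent, EventLog.class);
        System.out.println(restored.uuid);
    }
}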

5.4 Queue/topic architecture

As mentioned in the goals of this implementation (section 5.1 page 42), two MongoDB collections (ApiLog and EventLog) are filled by the consumer process. To keep management simple, and even though the messages share the same wrapped structure, two queues (or topics, for Kafka) were created, one for each collection. Figure 23 presents the architecture implemented in Apache Kafka for the tests.

Figure 23 - Kafka topic architecture (one broker hosting two topics, EventLog and ApiLog, each with two partitions, between the producer and the consumer)

Two topics are created, each with two partitions. The producer (i.e. the modified Appgrid API) is coded to put messages randomly in one partition or the other (see the KafkaMessagePartitioner class description in section 5.2.1 page 42), so that each partition receives on average 50% of the total message stream. All the partitions are hosted on a single broker, with a replication factor1 of 1. This architecture was chosen to take advantage of Kafka's parallelism capabilities: each partition can be filled and read independently, so the consumer process in fact runs 4 consuming threads, one for each partition of the 2 topics. This complies with the specific guidelines to be followed when designing a consumer that uses the high-level API to consume data from multiple partitions [62].

1 See glossary page 88 and section 4.2.2 page 30.


The test architecture accommodating the two message streams with RabbitMQ is shown in Figure 24.

Figure 24 - RabbitMQ queue architecture (the producer publishes to a direct exchange with two bindings, routing messages with the ApiLog or EventLog key to the corresponding queue, which the consumer subscribes to)

The producer sends messages to a direct exchange, which has two bindings whose routing keys are the names of the two queues, ApiLog and EventLog. Therefore, any message sent to the exchange with ApiLog as routing key is delivered to the ApiLog queue. The same applies to EventLog, while specifying any other routing key causes the message to be dropped.

Contrary to Kafka, RabbitMQ does not implement partitions. As a consequence, the consumer runs only two threads, one per queue. Each thread subscribes to one of the queues, which causes the broker to push data directly to the consumer whenever a message comes in. The consumer uses the push API that RabbitMQ provides (see section 5.2.2 page 44).

5.5 Overall test architecture and load generation tool

The primary goal of the tests to follow is to assess the performance and reliability of the two message-oriented middleware presented so far, in comparison with the equivalent architecture without such a messaging system. It is expected that relieving the API of the write operations it has to do when an event is triggered should lower the global resource usage on the main Appgrid application.

To test Appgrid in adverse yet realistic conditions, a high number of HTTP requests needs to be generated toward the API server to simulate users. The test architecture is presented in Figure 25.


Figure 25 - Architecture used for load testing (load injectors at two locations send HTTP POST requests to the main Appgrid API server, which pushes messages to the message broker; the event/log consumer writes to the MongoDB replica set)

There are a few differences between the production architecture and the one proposed in section 3.5:

1. There is only one API server
The administration UI is not needed for the tests. It would not impact the results, and it was therefore not running at all.

2. The load balancer is not used
Since there is only one API server, the load balancer is not needed. Using only one server allows more load to be injected into a single Appgrid instance with the same load injection capacity. It also makes identifying possible bottlenecks easier.

3. Clients are load injectors
Three clients (computers) were used to generate load toward the API server. Two of them were located somewhere in Stockholm, sharing the same Internet connection, while the third was also located in Stockholm but at a different place, to prevent any local network trouble and latency from interfering with the benchmark results.


The Appgrid API server ran either the original version of Appgrid, without message bus support, or the updated one presented in section 5.2.1. The broker was either Apache Kafka or RabbitMQ, and the consumer ran the process described in section 5.2.2. The MongoDB replica set was an exact copy of the one in production at that time and hosted both the logging and the application databases (e.g. the metadata served by Appgrid, the active session keys).

Clients here are load injectors. A tool called Tsung was used to define a test scenario, federate the clients so that they replay that scenario, and generate load on the server [63]. Tsung is an Erlang tool that makes distributed load testing easier. Probes and sensors can be installed on machines to monitor the activity of processes, and statistics are generated at the end of the tests.

Other tools like JMeter and httperf from HP were considered and tested, but eventually rejected due to either performance issues (httperf performed very well when testing locally, but struggled with remote testing on Mac OS X) or complexity (extracting meaningful metrics from JMeter proved somewhat difficult) [64], [65].

5.6 Test scenario

The test scenario is quite simple: since there is a POST method in the API whose mission is to write an event to the database, it was the most straightforward way to get things started and benchmark the messaging system. In the following, a transaction refers to the scenario presented on Figure 26.

Figure 26 - Test transaction (the load injector POSTs to /event/log?sessionKey=[KEY]; the API queries the session details, answers with a 200 code and some data, and writes the EventLog and ApiLog to the database asynchronously or pushes them as messages)

A transaction begins with a TCP connection established toward the server, to perform an HTTP POST request to the URL /event/log?sessionKey=[key], which is used by applications to push a notification to the server (e.g. when the application is terminated). The method called by this request needs to fetch data about the session key from the database to check whether it is authorized to perform this action (freshness) or not. Once it has it, it creates the EventLog object that needs to be inserted into the database. Note that the session key used is always the same for all transactions, which is likely not what would happen in reality, but appropriate enough to test the performance of message-oriented middleware.
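For illustration, a single such transaction could be driven as follows from a Java client; the host is a placeholder, the session key is passed as an argument, and the empty request body is an assumption (the scenario only specifies the session key in the query string):

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class TransactionExample {
        public static void main(String[] args) throws Exception {
            // args[0] stands in for a valid session key ([KEY] in the text).
            URL url = new URL("http://api.example.com/event/log?sessionKey=" + args[0]);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.getOutputStream().close(); // empty body in this sketch

            // The transaction completes when the API answers 200 with the queried data.
            System.out.println("HTTP " + conn.getResponseCode());
            InputStream in = conn.getInputStream();
            in.close();
            conn.disconnect();
        }
    }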


What happens then differs depending on which Appgrid version is used: if the original Appgrid infrastructure (without messaging system) is running, the message is sent to a Spring asynchronous thread to be written to the database. If the new Appgrid infrastructure is in use, the API prepares and sends a message to the appropriate queue/topic.

However, even though the function performed by this method is to put an event in EventLog, it is still an API call. Therefore, calling this method triggers an event that needs to be stored in ApiLog as well. This is done similarly to what was described earlier for EventLog objects.

Eventually, the transaction finishes when the API returns a 200 HTTP code together with the queried information from the database (which includes, among other things, information about the application linked to the given session key), and the TCP connection is terminated.

5.7 Load profile

To perform load testing, a load profile must be defined. In the studied case, the load profile was defined as a succession of steps, each one set to last a certain time. Those steps correspond to a user rate, at which users are generated and connect to the API. Each user is responsible for performing one transaction. The load profile that is expected to be generated by all the load injectors is shown on Figure 27.

Figure 27 - User load that should be generated vs. time (user rate [u/s] vs. time [s])

The test scenario duration is fixed to 290 seconds. Most steps are 30 seconds long, while those with larger user connection rates last longer, to make sure that the server can handle such a load for a certain time.


The load profile is said to be “expected” as there is no guarantee either that the load generation capacity will be sufficient, or that Appgrid will be able to handle such a large number of requests.

Compared with the historical numbers extracted from the statistics and surveillance tools that monitor the Appgrid production servers, this load profile is really aggressive. The average number of requests per minute over a 3-month period (between mid-April and mid-July 2014) is about 400, which means 7 requests per second. Over this same period, apart from an unusual peak at 2,400 requests per minute (40 requests per second) one day, the number of requests the API has to handle under normal conditions is way lower than even the first step of the proposed load profile. However, as Appgrid is expected to be used more and more, knowing the limits of a single instance gives system architects visibility and helps build a scalability plan.

5.8 Tested configurations

To compare the performance of the different brokers with different parameters, five configurations are defined and listed in Table 5.

Architecture                 | Original Appgrid | Kafka S          | Kafka A          | RabbitMQ NP      | RabbitMQ P
Broker                       | None             | Kafka            | Kafka            | RabbitMQ         | RabbitMQ
Queues                       | None             | EventLog, ApiLog | EventLog, ApiLog | EventLog, ApiLog | EventLog, ApiLog
Consumption acknowledgements | N/A              | ∿ Implicit       | ∿ Implicit       | ✓                | ✓
Production acknowledgements  | N/A              | ✘                | ✘                | ✘                | ✘
Persistence                  | N/A              | ✓                | ✓                | ✘                | ✓
Producer push method         | Async (Spring)   | Sync             | Async            | Sync             | Sync
Consumer message retrieval   | N/A              | Pull             | Pull             | Push             | Push
Message consumer             | N/A              | ✓                | ✓                | ✓                | ✓

Table 5 - Test configurations’ summary

All the tests with brokers involve the two queues mentioned in section 5.4, EventLog and ApiLog. The names of the architectures are simple: the difference between the two Kafka configurations is the push method used by the producer to publish messages. Kafka S stands for Kafka Synchronous and Kafka A for Kafka Asynchronous. The same logic applies to RabbitMQ regarding persistence: RabbitMQ NP stands for RabbitMQ Non Persistent, while RabbitMQ P is for RabbitMQ Persistent.


6 Test results

The test architecture and configurations presented in section 5 were loaded using Tsung, and this section presents the results of those tests and how they were obtained. Some specific details about the tests are listed for the sake of reproducibility and verifiability. Readers interested in using Tsung to load-test their own applications may find useful information in Appendix C (page 107).

6.1 Testing hardware and conditions

The Tsung load generators were two MacBook Pro (2006 and mid-2010) and one MacBook Air (2011). They were located in two distinct places, both equipped with 100 Mbps wired WAN access via optical fibre. The test server infrastructure was hosted on Amazon EC2 instances, just as the production environment [66]. All the servers were on the same private network at Amazon, in the same geographical location. Table 6 presents the specifications of the Amazon instances used for the tests:

Instance            | API server                         | Broker                             | Consumer                           | MongoDB instance
Amazon denomination | c3.large                           | c3.large                           | c3.large                           | m3.large
CPU                 | 2x Intel Xeon E5-2680 v2 (2.8 GHz) | 2x Intel Xeon E5-2680 v2 (2.8 GHz) | 2x Intel Xeon E5-2680 v2 (2.8 GHz) | 2x Intel Xeon E5-2670 v2 (2.5 GHz)
Memory              | 3.75 GiB                           | 3.75 GiB                           | 3.75 GiB                           | 7.5 GiB
Storage             | 2x16 GiB SSD                       | 2x16 GiB SSD                       | 2x16 GiB SSD                       | 1x32 GiB SSD

Table 6 - Hardware specifications of the server instances used for the tests

6.1.1 Java Virtual Machine and JIT compilation

The tested application is a Spring Java application. It runs on the Java HotSpot Virtual Machine, and as such benefits from the optimizations that this JVM features [67]. One of these optimizations deals with Just-In-Time (JIT) compilation. Java bytecode is converted to machine code on the fly, when the program is run. The first time this conversion occurs, the main objective of the compiler is to produce code as quickly as possible, thus skipping advanced optimizations it could make. The code works, but could probably be better if the compiler spent more time analysing and optimizing it.

This is where the “adaptive optimization technology” comes onto the scene: this feature, part of the HotSpot JVM, passively profiles the application, detects which parts of the code and which methods are called often, and recompiles those code sections to produce more optimized code. Typically, software runs only a few parts of its code regularly, while most of the program is almost never invoked. With adaptive optimization technology, the compiler is able to produce applications that start quickly and get better with time.


This “warm-up” time must be taken into consideration while testing. Note that the compiler detects and reruns optimization after profiling for some machine cycles, not user time [67, Ch. 3]. Therefore, before launching the real measurements, a “warm-up” test was run, which was simply the same as the test to be executed, while following the activity of the JIT compiler. Once every metric was stable enough, this test was interrupted and the real one was launched, after a short pause.

6.1.2 Test reliability

The benchmark needs to be reliable. To fulfil this requirement, the tests were run multiple times (at least three times), with and without restarting the Java Virtual Machine, to make sure that the results did not fluctuate in large proportions. The most representative test was eventually kept and used as a basis for further analysis.

6.1.3 Producer, broker and consumer settings

The following parameters were used in the producers, brokers and consumers:

• Kafka producer, maximum number of messages in the async. queue1: 10,000 messages.
• Kafka producer, message batch size capacity: 200 messages.
• Kafka producer, message batch timeout: 5 seconds.
• Kafka mode of operation on full local queue (async. mode): block and wait for the queue to offer free space.
• RabbitMQ prefetch count (per queue): 500 messages.
• Consumer DBObject queue length (per thread): 500 objects.
• Consumer flush queue timeout (per thread): 1 second.
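As an illustration, these parameters roughly map to the 0.8-era Kafka producer properties and to a RabbitMQ basicQos call. The sketch below is an assumption about how such a configuration could look; property names and the broker host should be checked against the exact client versions used:

    import java.util.Properties;
    import com.rabbitmq.client.Channel;

    public class BrokerSettings {
        // Kafka 0.8-era producer properties matching the listed parameters.
        static Properties kafkaProducerProperties() {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker.example.com:9092"); // hypothetical host
            props.put("producer.type", "async");                // "Kafka A" configurations
            props.put("queue.buffering.max.messages", "10000"); // async local queue size
            props.put("batch.num.messages", "200");             // batch size capacity
            props.put("queue.buffering.max.ms", "5000");        // batch timeout: 5 seconds
            props.put("queue.enqueue.timeout.ms", "-1");        // block when the queue is full
            return props;
        }

        // RabbitMQ consumer side: at most 500 unacknowledged messages per queue.
        static void applyPrefetch(Channel channel) throws java.io.IOException {
            channel.basicQos(500);
        }
    }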

6.2 Followed metrics and instrumentation

To get as wide a picture as possible of the overall architecture, multiple tools were used to follow a large number of metrics of interest when comparing the performance of the different configurations. Here follows the list of the different metrics, with the tool used to monitor and record them, per instance:

• Appgrid API server
  § Tools: JVM monitored via JMX [68]. VisualVM with the Tracer and MBeans Browser plugins provided a convenient way to monitor different metrics on the Java instance and export them to CSV afterwards [69].

1 There is a subtle difference between the queue size and the batch capacity: if the producer generates more messages than what the Kafka library (in asynchronous mode) can send, the process may pile messages up to the queue size. The batch capacity is the maximum number of messages that are sent in one operation to the broker. See section 4.2.2 page 30 for more information.


  § Metrics:
    o CPU usage of the Java instance.
    o Memory usage of the Java instance.
    o Garbage Collector activity of the Java instance.
    o Number of threads alive, maintained by the Java instance.
    o Number of file descriptors opened by the Java instance.
    o Java compiler activity.

  Kafka only:
    o Cumulated number of messages received per topic (MBean).
    o Cumulated number of messages received in all topics (MBean).

• Kafka broker
  § Tools: As mentioned in section 4.2.3, Kafka relies on MBeans to expose its metrics. VisualVM with the Tracer and MBeans Browser plugins was used to fetch those metrics, but from the producer side, as only one JMX console could be used to reach a JVM at a time (metrics about Kafka were thus read in the MBeans exposed by Kafka’s library in the Appgrid API instance). sar (System Activity Report Unix command) was used to get system metrics [70].
  § Metrics:
    o CPU usage on the machine (including I/O wait).
    o Memory usage on the machine.

• RabbitMQ broker
  § Tools: RabbitMQ has a fully featured web interface and API that allow metric extraction. The rabbitmq-management plugin was used, together with sar to get system metrics [57], [70].
  § Metrics:
    o CPU usage on the machine (including I/O wait).
    o Memory usage on the machine.
    o Cumulated number of messages received per queue.
    o Cumulated number of messages received in all queues.
    o Number of queued messages at a given time per queue.
    o Number of queued messages at a given time in all queues.
    o Publishing rate per queue (i.e. rate at which messages were received from the producer).
    o Publishing rate in all queues.
    o Delivery rate per queue (i.e. rate at which messages were sent to consumers).
    o Delivery rate for all queues.
    o Acknowledgement rate per queue (i.e. rate at which acknowledgements of consumption arrived from consumers).
    o Acknowledgement rate for all queues.


• Consumer (RabbitMQ or Kafka)
  § Tools: Though MBeans are exposed by Kafka in the client library just as they are in the producer library, the limitation of one JMX console available during the tests did not allow reading them. Instead, verbose logging was enabled on the consumer process, which printed a line each time a queue was flushed to the database, together with a timestamp and the name of the queue. Parsing those logs could give some data. sar was used to get system metrics [70].

  § Metrics:
    o CPU usage on the machine (including I/O wait).
    o Memory usage on the machine.
    o Cumulated number of documents inserted into the database per collection.

Beyond system metrics, Tsung gives several indicators regarding the test itself and the HTTP operations. Here are the metrics that were followed (available with a 10-second interval):

• Number of users connected.
• Number of users that completed their transaction.
• Mean transaction time.
• Mean transaction rate.

6.3 Data analytics methods

To compare performance and metrics between the different test configurations, figures must be homogeneous. Performance indicators like CPU usage are difficult to compare from one test to another, as traditional statistical tools like min/max and average are of little interest here.

Nevertheless, figures must be compared to have a quantitative estimation of the performance of the various test configurations. RabbitMQ configurations were the easiest to analyse: the management interface gives all the needed metrics at whichever specified time resolution. Message rates were directly captured from the interface and compared against each configuration.

For Apache Kafka, the gathered metrics are much less usable and highly inhomogeneous. For instance, while the average interval between values gathered from the MBeans described in section 6.2 is 5 seconds, this is far from being the usual case: values can be separated by 0.2, 18 or 24 seconds, to give some examples. Moreover, no message rates were available, just the cumulated message count. Rates were therefore inferred from the count, either by calculating a variation rate between two instantaneous values at each step (numerical derivation), or by making a linear approximation when the rate was almost constant (used at the consumer side).
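A minimal sketch of this numerical derivation, with made-up sample values:

    // Derives message rates from cumulative counts sampled at irregular
    // intervals, as obtained from Kafka's MBeans.
    public class RateDerivation {
        // counts[i] is the cumulative message count observed at times[i] (seconds).
        static double[] rates(double[] times, long[] counts) {
            double[] rates = new double[counts.length - 1];
            for (int i = 1; i < counts.length; i++) {
                rates[i - 1] = (counts[i] - counts[i - 1]) / (times[i] - times[i - 1]);
            }
            return rates;
        }

        public static void main(String[] args) {
            double[] t = { 0.0, 5.2, 23.1, 28.0 }; // irregular sampling times
            long[] c = { 0, 13000, 58000, 70500 }; // cumulative counts (made up)
            for (double r : rates(t, c)) {
                System.out.printf("%.0f mess/s%n", r);
            }
        }
    }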


Some values have been removed from the tests with Kafka, as the JMX console gave very unrealistic results while the garbage collector was running. When such a garbage collection event happened, the interval over which the rate was calculated was extended to cover the whole garbage collection time, plus the value before and the value that followed. In this way, the value was averaged over quite a long time, which preserved the general shape of the curve.

CPU comparison was done by numerically integrating (using the trapezoid rule [71, Sec. 9.2]) the CPU usage curve over the course of the test. By itself, the figure obtained after this operation is meaningless, being the area between the x-axis and the CPU usage curve. However, this value can be used as a reference for comparison between the same CPU curves, captured during the same time interval and integrated in the same way. The relative difference between these values then gives a quantitative idea of the improvement (or issues) one configuration might have over another.
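A short sketch of this computation (sample-based trapezoid integration and the index derived from it) under the same assumptions:

    // Trapezoid-rule integration of a sampled CPU usage curve, and the
    // index used for comparison (reference configuration = 100).
    public class CpuIndex {
        static double integrate(double[] times, double[] cpuUsage) {
            double area = 0.0;
            for (int i = 1; i < times.length; i++) {
                area += (cpuUsage[i] + cpuUsage[i - 1]) / 2.0 * (times[i] - times[i - 1]);
            }
            // Meaningless alone, but comparable between identically captured curves.
            return area;
        }

        static double index(double area, double referenceArea) {
            // e.g. an index of 80 means 20% less CPU consumed than the reference.
            return 100.0 * area / referenceArea;
        }
    }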

The original Appgrid API was defined as the reference for CPU usage comparison of the core API application. The value of the numerical integration of its CPU curve was set to an index of 100, while the curves of the other test configurations were indexed against the original Appgrid API CPU usage. It means that if configuration A has a corrected CPU usage index of 80, then configuration A consumed over the course of the test 20% less CPU than the original Appgrid API. If the index was 150, then it required 50% more CPU over the test course. Note that all CPU usage percentage metrics are defined as CPU usage = 1 − CPU idle.

The same calculation rule was applied for consumer and broker CPU consumption comparisons. However, the reference could not be the original Appgrid API, as there is no consumer or broker in that case. The reference was arbitrarily set to Kafka using the synchronous push method on the producer side. It therefore has a corrected CPU usage index of 100, while the other configurations are compared against it.

The metrics’ sources were quite different from each other. Python scripts were used to parse RabbitMQ’s JSON structures, capture the interesting metrics, compute new ones and convert them to CSV files (refer to Appendix B, page 104). Tsung results were used as generated. The VisualVM Tracer plugin allows easy CSV export of the monitored metrics. All the data was gathered into a single Excel workbook per tested configuration. Statistical analysis was made using Microsoft Excel.


6.4 Appgrid performance results

Table 7 presents the aggregated results from the tests about the Appgrid API performance metrics.

Architecture                                 | Original Appgrid | Kafka S | Kafka A | RabbitMQ NP | RabbitMQ P
Number of users generated                    | 168,488          | 179,278 | 175,358 | 176,162     | 175,656
Test duration [s]                            | 290              | 290     | 290     | 290         | 290
Mean transaction rate [req/s]                | 581              | 618     | 604     | 607         | 605
Maximum 10-sec mean transaction rate [req/s] | 788              | 854     | 886     | 874         | 850
Mean transaction time [ms]                   | 2,923            | 1,490   | 2,186   | 908         | 1,394
Maximum 10-sec mean transaction time [ms]    | 21,388           | 8,129   | 17,723  | 5,145       | 13,481
API CPU usage index1                         | 100              | 92.1    | 82.5    | 79.6        | 85.8
Number of ParNew GC2 runs                    | 8                | 1       | 1       | 0           | 1

Table 7 - Synthetized test results about the Appgrid API

Colours have been added to make it easier to identify the best (in green) and worst (in red) configuration for each metric. The test settings and configurations can be found in section 5.8 page 55.

The first conclusion that can be drawn is that messaging systems always make the API perform better, while consuming less CPU. The original Appgrid architecture is outperformed on all the followed metrics, sometimes to great extents.

One of the primary explanations for this situation is that garbage collection occurs fewer times when messaging systems are used. The original Appgrid API performed 8 ParNew GCs during the test. During about 120 seconds of the test, the JVM was busy cleaning the memory, which made the application really slow and unresponsive. It resulted in a large number of timeouts and connection errors (about 4,000) and in a significant increase of the maximum 10-sec mean transaction time. The 10-sec means are means calculated only on values from a 10-second interval, without taking into account the values before or the ones that follow. In the case of the transaction rate, it reveals the best performance of the API (usually under low user load), while the mean transaction time is likely maximized when the load is high (i.e. in adverse conditions).

The best performer according to the tests is RabbitMQ with persistence disabled. Under the same load, no stop-the-world garbage collection was initiated, which greatly decreased the overall CPU usage of the API, by more than 20%. The mean transaction time was below 1 second, thus 3 times lower than with the original API. The maximum 10-sec mean transaction time was also 4 times lower than with the original architecture.

1 Reference: Original Appgrid = 100.
2 See glossary page 88.


Persistence, while being taken care of at the broker side, impacted the performance on the producer side as well with RabbitMQ. Indeed, as shown in Table 9 below, the broker becomes far more stressed with persistence enabled and therefore takes more time to answer, while the pressure on the API remains the same. As a consequence, the API generally responds more slowly and consumes more resources. The reader should remember that messages are sent synchronously with RabbitMQ, and that Appgrid is therefore dependent on the broker’s ability to accept messages quickly in order to answer HTTP queries quickly.

Kafka, on the other hand, always stores messages. Compared with RabbitMQ with persistence enabled, the two present similar performance. As the main difference between the two Kafka configurations is the method used to push messages to the broker, the asynchronous version was expected to present better performance, since it batches messages before sending them. But if more requests can be sent toward the API, they take on average almost twice as much time to be processed. This may be explained by the mode of operation on full queue specified in section 6.1.3: if more messages are produced than what the Kafka library is able to send to the broker, the thread that wants to push a new message gets stuck until the message can eventually be pushed to the local queue. However, it is very unlikely that 10,000 messages got stuck in the local queue. Another reason may explain this curious latency, but it has not been found yet.

Comparing the two message-oriented middleware on those metrics does not designate a real “winner”: they both perform well, with a special plus for RabbitMQ without persistence, which is the best in almost all the tests.

Apart from these broker considerations, it is worth noting that testing the original API very quickly hit some kernel limitations, in terms of the number of files opened system-wide and per process. A java.io.IOException: Too many open files showed up as soon as the number of simultaneous connections became larger than 200 (sockets are handled as file descriptors on Unix systems), and the kernel limits therefore had to be raised (via ulimit -n) to handle more load.

6.5 Broker specific results

Table 8 presents some figures about the message streams.

Broker specific results                           | Original Appgrid | Kafka S | Kafka A | RabbitMQ NP | RabbitMQ P
Maximum global publishing rate [mess/s]           | N/A              | 2,591   | 3,177   | 1,799       | 1,477
Maximum global database insertion rate [mess/s]   | N/A              | 1,241   | 1,270   | 1,307       | 1,187
Maximum global delivery to consumer rate [mess/s] | N/A              | ?       | ?       | 1,337       | 1,160

Table 8 – Synthetized results about message rates


The global publishing rate is the combined rate at which the API sends messages to the broker. For Kafka, this value is measured on the producer process, while for RabbitMQ it is measured on the broker side. Kafka values are almost twice as high as the values one can obtain using RabbitMQ with persistence enabled. However, the figures for Kafka should be taken with caution, as they were only available via MBeans, which did not give values on a regular basis. It appears on Figure 28 that Kafka’s metrics were clearly not stable, while RabbitMQ presents a steadier curve, in line with what is expected.

Figure 28 - Comparison of the global message push rate between Kafka in synchronous mode and RabbitMQ with persistence (message rate [mess/s] vs. time [s])

One can wonder why messages are still being sent after the end of the test (which is meant to finish at 290 seconds). It is simply because the test is brutally interrupted at 290 seconds; since the average request time at the end of the test is usually high (in the magnitude of 10 seconds), the connections that were established when the test was interrupted were still being processed on the server side, which takes some time to complete. Therefore, the number of messages sent decreases slowly and reaches zero after a short while. The end peak with Kafka is another artefact of the JMX value processing.

The second metric presented in Table 8 is related to database insertion. It is the result of parsing the logs of the consumer process, which prints a line each time the DBObject queue is flushed to the MongoDB instance. The variations are, as expected, quite limited, since the implementation of the consuming process is the same no matter the broker. The only change is the client library used by the consumer to fetch or receive messages, and it does not seem to impact performance. Note the difference between the insertion rate and the publish rate, which illustrates the “shock absorbing” role of the message broker.


Figure 29, which plots the total number of messages in all queues for RabbitMQ1, makes this buffer effect all the more visible. While the system was tested under relatively low load (between 0 and 110 seconds), the consumer was able to follow the stream and almost no messages were queued. After 110 seconds and until the end of the test, more messages were received, exceeding the consumer’s capacity, and therefore many got queued. At the end of the test, the queue is emptied by the consumer at full and constant rate (the number of queued messages decreases linearly).

Figure 29 - Total number of messages (in queue + not acknowledged by consumer) vs. time, for RabbitMQ with persistence

The last metric, only measurable for RabbitMQ, is the rate at which the broker delivers messages to the consumer. What is interesting with this metric is the comparison with the database insertion rate: if they are approximately equal, it means that the consumer is able to process data at the same rate at which it receives it, so data stays in transit at the consumer side only for a short time. This is the case here, but the result is a bit artificial, as the rate at which the broker sends messages to the consumer is throttled by a rather conservative prefetch count (see section 6.1.3 page 57).

1 Remember that RabbitMQ does forget about a message once it has been consumed. Therefore, a message is in queue only if no consumer has acknowledged its consumption (or if it has not been delivered to a consumer at all). See section 4.4 page 39.


6.6 Broker and consumer performance

Table 9 presents the CPU usage indexes of the brokers and consumers:

Broker and consumer performance | Original Appgrid | Kafka S | Kafka A | RabbitMQ NP | RabbitMQ P
Consumer CPU usage index1       | N/A              | 100     | 97.1    | 86.7        | 93.7
Broker CPU usage index2         | N/A              | 100     | 49.9    | 202.1       | 1,038.2

Table 9 - Consumer and broker performance comparison

One very important point when installing a new system in a production architecture, especially when its primary mission is to ease scalability, is to be sure that it will not create more problems than those it is meant to solve. Therefore, the behaviour of both the broker and the consumer were followed with interest and compared.

Apache Kafka presents top-rank performance values, both for the consumer process and the broker. Specifically, the broker is very efficient, and the asynchronous version even halves the overall CPU consumption of the broker compared to its synchronous version, as shown on Figure 30. It is also worth noting that almost no I/O wait CPU time is experienced while running Kafka, thanks to its pagecache method of writing data to the disk (see section 4.1 page 26).

Figure 30 - Broker CPU usage (for Kafka) vs. time (CPU usage [%] vs. time [s], Kafka asynchronous vs. Kafka synchronous)

1 Reference: Kafka Sync = 100.
2 Reference: Kafka Sync = 100.


On the other hand, RabbitMQ’s broker consumes more CPU: twice as much as Kafka in non-persistent mode, and as much as 10 times more when persistence is enabled. Remember that RabbitMQ pages out to disk only messages that cannot be delivered and acknowledged right away by a consumer, or messages that exceed the available space in memory (see section 4.3.2 page 36). Since the space available in memory was way larger than the total size of all the delivered messages, messages were written to disk because they exceeded the consumer’s absorption capacity (as shown on Figure 29 page 64). Figure 31 compares the two RabbitMQ configurations in terms of CPU consumption. When comparing this graph with the one for Kafka, pay attention to the fact that the y-scales are different.

Figure 31 - Broker CPU usage (for RabbitMQ) vs. time (CPU usage [%] vs. time [s], non-persistent vs. persistence enabled)

Remember that the CPU usage metrics plotted above are defined as CPU usage = 1 − CPU idle (see section 6.3 page 59). They therefore comprise the CPU I/O wait time, which is very high when persistence is enabled. The arithmetical mean of the I/O wait CPU time percentage is as high as 26.2% over the course of the test when persistence is enabled, while being below 0.2% without. Therefore, one quarter of the CPU time is wasted waiting for the disk. Persistence really hurts the overall performance of the broker instance, and enabling it should be avoided whenever possible. Having a machine with higher I/O capacities would help reduce the resource consumption on the broker side.

The consumer process was also monitored, and presents similar performance no matter the broker it has to communicate with. This is largely explained by the fact that the CPU is highly solicited throughout the course of the test in all the configurations, as shown by Figure 32.


Figure 32 - CPU usage by consumer process (with RabbitMQ non-persistent) (CPU usage [%] vs. time [s])

According to the figures in Table 9, the configuration corresponding to the CPU consumption presented above is the one that uses the least resources. Yet the CPU appears to be fully busy during most of the test, which highlights an implementation bottleneck in the consumer’s code. Tweaking the queue flushing parameters (regarding bulk inserting) may lead to better performance. Database performance limits may also have been hit; since the consuming process neither monitors nor reacts to the database’s load, it may be overwhelming the database, which then takes longer to answer.

Nevertheless, it is worth noting that a version of the consumer without bulk inserting (and with only one collection to be filled, while the bulk insertion figures come from the two-queue scenario studied so far) was tested. Figure 33 shows the comparison.

Figure 33 - Benefits of bulk insertions into MongoDB vs. single insertion (using RabbitMQ without persistence): maximum acknowledgement rate of 1,307 mess/s with bulk insert and two queues, against 360 mess/s without bulk insert and one queue (higher is better)


Despite the fact that the consumer was more loaded when it had to do bulk insertions, it could absorb 4x more messages than the single-queue consumer process doing one-by-one insertions that was used at the beginning. One may notice that the maximum acknowledgement rate in the single-queue case was way lower than the maximum publishing rate reached (about 700 messages per second in this test, not presented in the report as it is irrelevant for the comparison). The consumer therefore had to run for almost 300 additional seconds after the end of the test to empty the queue completely. While illustrating the ability of message-oriented middleware to absorb traffic bursts, it also revealed how inefficient the consuming process was, and how important it is to design a consumer process that takes full advantage of the architecture.
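A minimal sketch of the difference, using the MongoDB Java driver of that era; the host, database name and document contents are placeholders:

    import java.util.ArrayList;
    import java.util.List;
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import com.mongodb.MongoClient;

    public class BulkInsertSketch {
        public static void main(String[] args) throws Exception {
            DBCollection collection = new MongoClient("mongo.example.com") // hypothetical host
                    .getDB("logs").getCollection("EventLog");

            // One-by-one: one round-trip to the database per message.
            collection.insert(new BasicDBObject("eventType", "example"));

            // Bulk: flush the whole local queue in a single round-trip.
            List<DBObject> batch = new ArrayList<DBObject>();
            for (int i = 0; i < 500; i++) { // 500 = consumer DBObject queue length
                batch.add(new BasicDBObject("eventType", "example").append("seq", i));
            }
            collection.insert(batch);
        }
    }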

6.7 Test conclusions

These tests in a real environment show how interesting it is to implement a message-oriented middleware in a typical, monolithic architecture with self-hosted asynchronous threads: performance improvements, efficiency, flexibility, load balancing, burst management… In all the tested configurations, Apache Kafka and RabbitMQ relieved the main API from the asynchronous processes that insert data into a MongoDB instance, yielding overall performance improvements. RabbitMQ without data persistence won the API performance race, by reducing by 20% the CPU power consumed by the API compared to the original one, while managing to keep the average transaction time below a second when calculated over the test course.

However, the performance of the broker instance revealed something different: Apache Kafka is in fact more CPU-conscious and consumes in all situations at most half as much CPU as its RabbitMQ counterpart, while writing everything to disk. RabbitMQ with persistence enabled revealed awful performance, mainly due to poor I/O capacities on the server hosting the broker, but also highlighting the inefficiency of the persistence method that RabbitMQ relies on.

Consumer performance was not affected by the broker used, but rather by the way messages were processed. The broker cannot do anything against other performance bottlenecks in the architecture (e.g. database capacity). However, by taking advantage of the fact that messages can be processed in batches at the consumer side, bulk features can bring great performance improvements and make the whole architecture even more capable.

These tests did not study the impact of the message size on the performance of the tested systems, as the size was fixed by the application logic and could not possibly be changed. However, it has been shown in other studies that message size does matter, and this criterion should be considered when selecting such a system [35].


7 Leveraging the advantages of Message-Oriented Middleware for data analytics

This section presents how implementing a message-oriented middleware together with a data warehouse system in Appgrid enabled the overall data aggregation and analytics process to be more reliable, flexible and especially way faster than it was. It focuses on RabbitMQ and the hosted data warehouse system provided by Amazon, Redshift, to demonstrate the benefits of using messaging to achieve nearly real-time data aggregation.

7.1 Current aggregation process in Appgrid

As presented in section 3.1 page 19, the original Appgrid architecture relied on a MongoDB database to store all the information regarding the events managed by Appgrid. Appgrid itself relied on the Aggregation Pipeline of MongoDB and Map-reduce jobs to aggregate the data (i.e. process the raw events to produce meaningful data) [72], [73]. An example of what “aggregation” means is given on Figure 34:

Figure 34 - General aggregation process (raw events, such as sessions created and terminated for a given application at given timestamps, are fed together with aggregation code into an aggregation processor, which outputs per-application, per-day results such as the number of sessions and the mean session duration)

In the current architecture, the events part on the left-hand side of Figure 34 consists of all the raw event logs stored as documents in a few collections of the MongoDB replica set. Specific and rather complex aggregation code is sent to MongoDB to be run by the aggregation processor (which is currently the MongoDB replica set as well). This processor takes the input logs, parses them and outputs the statistics that the aggregation code was set to calculate. For instance, event logs can be parsed to produce one document or row per application that produced logs, summing the number of sessions created and giving their average duration, on a daily basis.


One of the main advantages for customers to use Appgrid is its ability to provide these figures and statistics about application usage very easily, without each developer having to build their own system. The more metrics are followed, the more likely customers are to find one that fits their needs. Therefore, Appgrid aggregates the events it manages in multiple ways, over various time periods (daily, weekly, monthly basis), on various grouping attributes (per application, per platform, per country, and a combination of those) and with various needs (counting records, summing counters, averaging durations, etc.). Computing these statistics is highly time and resource consuming.

To avoid an enormous and unnecessary load on the database due to queries made by Appgrid users on the logs, this aggregation process is batched, run several times a day and its output is saved into multiple MongoDB collections. The analytics front-end, served to the end user, is actually fetching already-crushed data from these collections, instead of running each time the process described in Figure 34. These collections are therefore represented by the Aggregation results part of the figure, on its right-hand side.

7.1.1 Problems with this solution

While being totally functional, this scheme has a number of disadvantages that prevent the system from being really scalable:

• Aggregation computations are run on “online” and shared machines. The queries to produce the aggregated data sets are sent to the MongoDB replica set that is actually storing the data. It means that the machines used and contacted by the Appgrid API are the same as those which, when the aggregation process is started, will crunch the logs. Since the aggregation process might take a while (20 minutes at least, but up to 2 hours in production), it is likely that the overall performance of Appgrid itself will suffer from a process that should run in the background and not impact the other tasks. No isolation is provided.

• Complex and database-dependent aggregation code. Some queries that must be run to produce meaningful data out of the raw logs can be quite twisted and difficult to come up with. While such processes may be difficult to describe in plain English, it may be all the more difficult to translate them into programming code that the aggregation processor can understand. SQL, or Structured Query Language, is a widely used and standardized language in which those queries can be written [74]. However, as the data is stored in MongoDB instances (which is by definition a so-called no-SQL database system), this language cannot be used to define the queries; they should instead be written as MongoDB Map-reduce jobs, described in a combination of JSON and JavaScript files. This requires one to learn another language, which is not the most popular one for this kind of operation, and probably not the easiest to use when comparing equivalent queries written in SQL. The feature sets of the two alternatives are different, and one may be better than the other in some cases, but for the operations that matter here, MongoDB Map-reduce jobs can become quite difficult to read (one may find an interesting example of an SQL/MongoDB comparison in the references [75]). Such complexity in writing the queries also makes them even more difficult to debug.


• Multi-step calculations. Map-reduce jobs on MongoDB do not allow one to take advantage of some niceties that various relational database systems feature, like Common Table Expressions, or CTE [76]. This specific functionality (recognizable in queries as it is introduced by the keyword WITH) makes it possible, among other things, for a query to use as its input the result of another one, whose results are computed when needed and destroyed when the final output is produced. CTE can help avoid writing intermediate query results to disk when they are only needed as input for another job computing the final set of data (a sketch of such a query is given after this list).

For some large or complex calculations, MongoDB Map-reduce jobs require the use of intermediate collections, into which the output of intermediate queries is written and used as input by other Map-reduce jobs, to be eventually aggregated as wanted. This consumes disk space and may be fairly inefficient.

• No easy “quick-and-dirty” testing. Writing Map-reduce jobs is really not as easy as writing SQL queries. Data scientists may be interested, in their quest for new metrics to propose to customers, in querying the real data set using a simple query language they already master. Launching Map-reduce jobs and interpreting the results is a time-consuming task, and hampers creativity in the long run: one tends to stick to the metrics already computed to avoid the hassle of having to write and test new things. Doing the same in SQL is a matter of seconds with any kind of decent SQL console, and makes this “creativity” exercise open to more people, even those with limited technical knowledge.
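As an illustration of the CTE functionality mentioned above, here is a hedged sketch of a WITH query issued over JDBC; the endpoint, credentials and the 'SESSION_CREATED' event type value are assumptions, while the table and column names follow the eventLog model used later in the tests:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CteExample {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://cluster.example.redshift.amazonaws.com:5439/logs",
                    "user", "password"); // hypothetical cluster and credentials
            Statement stmt = conn.createStatement();

            // The WITH clause computes daily session counts per application once;
            // the outer query reuses that intermediate result without any
            // intermediate collection or table being written to disk.
            ResultSet rs = stmt.executeQuery(
                    "WITH daily AS (" +
                    "  SELECT evt_applicationId, TRUNC(evt_timestamp) AS evt_day," +
                    "         COUNT(*) AS sessions" +
                    "  FROM eventLog" +
                    "  WHERE evt_eventType = 'SESSION_CREATED'" +
                    "  GROUP BY evt_applicationId, TRUNC(evt_timestamp)" +
                    ") SELECT evt_day, AVG(sessions) FROM daily" +
                    "  GROUP BY evt_day ORDER BY evt_day");
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getDouble(2));
            }
            conn.close();
        }
    }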

7.2 Proposed solution: split the work

It is obvious that MongoDB may not be the perfect tool to aggregate data on a large scale when complex data manipulations have to be done on the raw documents. The whole point of the thesis was to investigate how introducing a message-oriented middleware in the architecture could be beneficial for an application, specifically for data analytics matters. The idea is as depicted on Figure 35: split the tasks between specialized instances.

Figure 35 - Message-oriented middleware with Amazon Redshift (producers send each message once to a routing point on the broker, which duplicates incoming messages into two queues; one consumer fills MongoDB, the other Amazon Redshift)


The process is simple: the producers (i.e. the Appgrid API servers) send events to the broker only once. Those events are duplicated by the broker and stored in two different queues, one emptied by the already-studied consumer process that fills the MongoDB database (see section 5.2.2 page 44), and the other one by a new consumer process that takes the raw logs and stores them into Amazon Redshift. As described in section 2.3, having two consumer processes and two queues completely isolates the storage of raw logs for display in MongoDB from their storage for aggregation in Amazon Redshift. One process can be faster than the other, or even crash, without any impact on the other.
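The report does not spell out the exact exchange type used as the routing point; with RabbitMQ, a fanout exchange is one natural way to obtain this duplication, as sketched below (the exchange, queue names and host are illustrative placeholders):

    import com.rabbitmq.client.*;

    public class DuplicatingTopology {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("broker.example.com"); // hypothetical broker host
            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();

            // A fanout exchange copies every message into all bound queues,
            // so each event is published once but consumed independently twice.
            channel.exchangeDeclare("events", "fanout", true);
            channel.queueDeclare("toMongo", true, false, false, null);
            channel.queueDeclare("toRedshift", true, false, false, null);
            channel.queueBind("toMongo", "events", "");
            channel.queueBind("toRedshift", "events", "");

            channel.basicPublish("events", "", null,
                    "one event, stored twice".getBytes("UTF-8"));
            connection.close();
        }
    }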

7.2.1 Amazon Redshift

Amazon Redshift is a data warehouse service that relies on a lazy1 relational database schema to store data in the most efficient way possible, and to make data manipulation both easy and fast [77]. Easy because the whole system is based on SQL: the database schema is created and managed via typical CREATE TABLE commands, and data can be inserted using traditional SQL command sets. Amazon Redshift is derived from PostgreSQL and therefore inherits parts of its feature set (some functions or features are not available, like timestamps with time zones and the DISTINCT ON option in SELECT queries [78]), like its support for CTE. It also supports connections from typical SQL clients and JDBC drivers, which greatly facilitates the interconnection of various tools with the system.

Amazon Redshift is also blazingly fast when the data it contains is queried, even if the data sets it needs to deal with are very large. Thanks to intelligent load balancing done behind the scenes across multiple computing nodes (an architecture known as Massively Parallel Processing, or MPP) and its columnar storage system, the system is able to return query results within seconds, while other systems can require minutes or hours to do the same job [79].

Many small and big companies report their success stories and how happy they are with the services provided by Amazon Redshift [80], [81]. Some benchmarks compare this tool against others offering the same kind of aggregation facilities, like Hive or Shark, and reveal how efficient and versatile Amazon Redshift is [82]. The point of this section is not to compare the performance of different data warehouse and aggregation systems, but rather to highlight how powerful and handy they are when combined with a message-oriented middleware, and how efficient they are compared to the previous aggregation solution.

Amazon Redshift was eventually chosen for testing as it is SQL-driven, easily reachable from third-party tools (thanks to its JDBC compliance), well documented and also cheap: there is even a completely free trial version (as of August 2014) that allows one to run a cluster for free during two months. Google presented its own cloud-based aggregation service in 2010, Google BigQuery, but it was not possible to test it for free and it has a per-query cost that is a bit difficult to determine [83]–[85].

1 Data are stored in tables that have a fixed number of columns and each has a type, but foreign keys and integrity constraints, though they can be defined, are not enforced by the system itself.


Table 10 presents the technical specifications of the test cluster, and an estimation of its cost (this section was valid as of August 2014). The tests were carried out using a trial version of Amazon Redshift, which was fully functional and free.

Cluster type              | Single node
Node type                 | dw2.large
Storage                   | 160 GB (SSD)
Cost (pay-as-you-go)      | $0.25/h, $180/month, $1790/year
Cost (reserved instances) for 1 year1 | $1407 with 1-year commitment, $880 with 3-year commitment
Extra costs2              | $10/month

Table 10 - Amazon Redshift specifications and cost

7.2.2 Database architecture

The MongoDB part of the architecture is not changed at all: the very same consumer process as the one presented in section 5.2.2 is used in this section, and the same replica set is used as well, without any changes to the logs that are being stored.

The database in Redshift is however slightly different from the structure of the documents stored in MongoDB. Not all the fields contained in the logs are stored in the database: events containing application logs also carry a message, which is a string sent by the application developer. Appgrid will not provide any statistics on the value of this message field, and it may be quite large. Therefore, those kinds of fields were simply not inserted into Redshift. For information, the table model used in Amazon Redshift for the tests is presented on Figure 36.

Figure 36 - Amazon Redshift table model:

apiLog: api_timestamp (timestamp), api_timestampUTC (timestamp), api_uuid (varchar(40)), api_applicationId (char(24)), api_target (varchar(10)), api_pluginName (varchar(50)), api_region (varchar(60)), api_country (varchar(50)), api_requestMapping (varchar(70)), api_requestTime (bigint), api_responseTime (bigint), api_requestHeadersSize (int), api_requestContentLength (int), api_requestQueryStringSize (int), api_responseHeadersSize (int), api_responseContentLength (int), api_responseStatus (int)

appLog: app_timestamp (timestamp), app_timestampUTC (timestamp), app_applicationId (char(24)), app_platformId (char(24)), app_code (varchar(9)), app_logType (varchar(9)), app_dim1 (varchar(60)), app_dim2 (varchar(60)), app_dim3 (varchar(60)), app_dim4 (varchar(60)), app_country (varchar(50))

eventLog: evt_timestamp (timestamp), evt_timestampUTC (timestamp), evt_uuid (varchar(40)), evt_applicationId (char(24)), evt_platformId (char(24)), evt_sessionKey (char(40)), evt_eventType (varchar(30)), evt_retentionTime (int), evt_country (varchar(50))

1 The price depends on whether the cluster is reserved for 1 year, 3 years, or not at all (on-demand use: the cluster can be shut down at any time and costs only when it is running). The costs for 1 year and 3 years are calculated with reserved instances.
2 Loading data into Redshift in a simple and efficient way requires the use of Amazon S3 buckets. The calculated cost includes 160 GB of storage per month, 1,000,000 PUT and 1,000,000 GET requests of files in a bucket.


Some other field values were adapted before being inserted. Amazon Redshift does not support time zones, for instance. However, Appgrid is used by applications worldwide, and the aggregation of events by day should be done for a day in the application’s time zone, not any other arbitrary time zone. For this reason, the consumer process in charge of inserting events into Redshift stores two timestamps: one in Coordinated Universal Time (UTC), which is the server’s time, and one in the application’s time zone. This greatly simplifies the queries made on the data stored in the system, while relieving it from having to shift the timestamps each time a query is made.
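A minimal sketch of this double-timestamp preparation, using the pre-Java-8 date API available at the time; the application time zone value is a made-up example:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class TimestampShift {
        public static void main(String[] args) {
            Date eventTime = new Date(); // event timestamp, captured on the server
            String appTimeZone = "America/New_York"; // hypothetical application time zone

            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

            // First column: the timestamp in UTC (the server's time).
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            String utc = fmt.format(eventTime);

            // Second column: the same instant rendered in the application's time
            // zone, so daily aggregations can group by the local day directly.
            fmt.setTimeZone(TimeZone.getTimeZone(appTimeZone));
            String local = fmt.format(eventTime);

            System.out.println(utc + " (UTC) / " + local + " (" + appTimeZone + ")");
        }
    }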

Also, data denormalization1 operations are performed on the events pushed into the database. The timestamp shifting can be seen as one of them, but some redundant information is stored as well, so as to eliminate the need for JOIN operations in queries.

There are also some specific cases and fundamental differences between MongoDB and typical relational databases: document-based databases like MongoDB can store documents with any number of fields, of any type, while relational databases like Amazon Redshift have an immutable field set per table. Documents can have subdocuments and store nested arrays of objects, while table-based databases have a flat architecture, and therefore require data to be “flattened” before being inserted. That is also a task done by the consumer process.

7.2.3 Aggregation processor

In the original architecture, the aggregation processor, in charge of querying the data it needs to crunch to generate the desired output, was the MongoDB replica set itself. In the architecture proposed in this section, a new Java application, completely independent from the other instances, is responsible for connecting to Amazon Redshift, issuing the required SQL requests, waiting for the results and then filling the appropriate MongoDB collections with the aggregation results.

This new Java application has been developed in a modular way: new aggregation queries can easily be added to the system. It is a standalone and self-sufficient application, which can run in a multithreaded mode to aggregate Event, Application and API raw logs in parallel. The application also supports bulk upsertions2 of aggregation results into MongoDB to make the overall process even more efficient. The whole process can be controlled with arguments when invoked. In particular, the period over which the data should be aggregated can be specified either loosely or strictly. Loosely means that the process automatically expands the provided time boundaries if needed to compute reliable data: for instance, if the boundaries span only one day, the aggregation process extends them to the enclosing week and month, to aggregate the required data set for each aggregation granularity. Strictly forces the aggregation between the specified limits, which leads to inaccurate aggregation if the period is not exactly one month long. Specifying a strict limit may nonetheless be interesting for testing purposes.

1 See glossary page 88. 2 See glossary page 88.


7.3 Performance testing architecture

To test the overall efficiency of the proposed architecture, the test environment depicted on Figure 37 has been set up.

Figure 37 - Appgrid and Amazon Redshift test environment (within the Amazon local network, load injectors send HTTP GET/POST requests to the main Appgrid API server; RabbitMQ messages go to an event/log Mongo consumer feeding the MongoDB replica set, and to an event/log Redshift consumer producing CSV files uploaded to an S3 bucket and loaded into Redshift; the Redshift aggregator talks to Redshift over a JDBC connection (SQL) and writes MongoDB documents)

The same tools as in section 5.5 were used to inject load toward the Appgrid API server (i.e. Tsung). Five low-end load injectors were used here and, contrary to the tests carried out in section 5, they were located in the same local network as the server itself, therefore removing any network latency.


Moreover, the scenario played by the load injectors was far more complicated and thorough than the one used in section 5: the load injectors were in fact simulating real users, performing real and meaningful actions toward the API server. Different scenarios were randomly chosen each time a new virtual user was started by Tsung. Those scenarios always included requesting a session key (some server spamming was also simulated, with users requesting multiple session keys within a short period), and were designed to test all the supported requests that one can make toward Appgrid. Random simulated think times (i.e. pauses between the requests) were also introduced to reproduce the behaviour of real users.

Therefore, the load generated here was highly similar to the one that can be expected in production. The load profile was very simple: during the course of a test, 15 users were started each second, a value selected after trying several rates as the one that would keep the server in a stable state. However, this value does not mean much on its own, since different users run different scenarios, and the scenarios themselves have variable durations due to the random think times included between the requests. These tests showed that one API server could roughly support 3,000 simultaneous “real-world” users.

Nevertheless, this test architecture has proven its efficiency and allowed a large number of events to be generated through the real Appgrid API and stored in the database. This is perfect for testing the consumers and aggregators, to see whether all the events that the API may produce are correctly parsed and integrated.

One can notice on Figure 37 that the log consumer for Redshift sends CSV files toward an Amazon S3 bucket, which in turn feeds the Redshift database itself. This is one of the possible ways to efficiently insert data into Redshift tables [86]. Amazon Redshift is capable of receiving INSERT SQL commands, but this is supported almost only for compatibility reasons and should not be used to stream data into the database. Instead, Redshift is really optimized to import multiple files (e.g. CSV) simultaneously. Having the message broker in the architecture allows the consumer to easily implement local batching of records, just as was done for the MongoDB consumer (see section 5.2.2 page 44). The queue is just way larger (for the tests, its size was set to 30,000 messages, split into 3 CSV files of 10,000 lines each), and so is the flush timeout (set to 20 minutes).

Each time a message is received, it is deserialized, modified as needed (timestamps, etc.), then serialized to CSV and appended to one of the files constituting the queue. When the queue is flushed, the file descriptors are closed and each file is individually compressed (using GZIP). A manifest file (a JSON file telling how many files were compressed and their locations) is generated, and both the manifest and the zipped CSV files are put on an Amazon S3 bucket1. A JDBC connection is then established between the consumer and Redshift, and a COPY SQL command is issued, telling the database to fetch the manifest file and import what is described inside. Once the import is finished, the modified table is vacuumed and analysed (two operations required to keep the performance of the database at its best level), and the messages are acknowledged to the RabbitMQ broker. A sketch of the SQL side of this flush sequence is given below.

1 See glossary page 88.
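As an illustration of this import step, the sketch below issues the COPY, VACUUM and ANALYZE commands over JDBC once the manifest has been uploaded. It is a minimal sketch only: the cluster endpoint, table name, bucket path and credentials are hypothetical placeholders, not the actual values used by the consumer.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftManifestImport {

    // Hypothetical cluster endpoint; Redshift speaks the PostgreSQL protocol.
    private static final String JDBC_URL =
            "jdbc:postgresql://redshift-cluster.example.com:5439/appgrid";

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(JDBC_URL, "user", "password");
             Statement stmt = conn.createStatement()) {

            // Ask Redshift to fetch the manifest from S3 and import, in
            // parallel, every gzipped CSV file listed inside it.
            stmt.execute("COPY eventlog "
                    + "FROM 's3://example-bucket/batches/batch-42.manifest' "
                    + "CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' "
                    + "MANIFEST GZIP CSV");

            // Keep the table performing at its best after the bulk load.
            stmt.execute("VACUUM eventlog");
            stmt.execute("ANALYZE eventlog");
        }
    }
}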


7.4 Test configurations and checking process

The goal of this section is to demonstrate the efficiency of the new aggregation processor, using Amazon Redshift, compared to the original one using only the data stored in MongoDB. For the test to be meaningful, the two databases should hold the same raw input logs and produce the same aggregated output. The message-oriented middleware comes in handy here, as it guarantees that MongoDB and Redshift store the exact same number of logs.

After running a load injection test, all the queues (both the queues on RabbitMQ and the local record-batching ones on the consumer side) are emptied, and the number of logs in MongoDB and Redshift is checked and must match. Then, the original aggregation process, crunching the data contained in MongoDB, is launched and its execution time is measured. Multiple executions were performed to make sure that the results did not fluctuate to a great extent.
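A minimal sketch of this consistency check, assuming a hypothetical apilog collection/table and the standard MongoDB and PostgreSQL JDBC drivers (the host names are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import com.mongodb.MongoClient;

public class LogCountCheck {
    public static void main(String[] args) throws Exception {
        // Number of raw logs stored on the MongoDB side.
        MongoClient mongo = new MongoClient("mongodb-host");
        long mongoCount = mongo.getDB("appgrid").getCollection("apilog").count();
        mongo.close();

        // Number of raw logs stored on the Redshift side.
        long redshiftCount;
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://redshift-cluster.example.com:5439/appgrid",
                     "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM apilog")) {
            rs.next();
            redshiftCount = rs.getLong(1);
        }

        // Both databases must contain the exact same number of records.
        System.out.printf("MongoDB: %d, Redshift: %d, match: %b%n",
                mongoCount, redshiftCount, mongoCount == redshiftCount);
    }
}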

Once the aggregation process is done, the aggregated data is kept in another database to be compared against the output that Redshift is to produce. The aggregated database is then wiped and the new aggregator using Redshift is launched (to aggregate data over the same period as the original aggregation process). Its running time is recorded and its output is compared with MongoDB’s aggregation output to confirm that both provide the same results.

Note that the records were spread over a date period of about one week. Each time the aggregation processes were launched, they were asked to aggregate data for the whole month that this week was part of. No other records were in the databases for this specific month.

7.5 Test results and conclusions

Table 11 presents the results of the tests. Note that the new (or Redshift) aggregator process was run in single-threaded mode, to be able to compare its results “fairly” against the original aggregation process. The multi-threaded process is compared to the single-threaded one in Table 12.

Limited data set

                          # records      Aggreg. time [s]
                                         Original              Redshift
EventLog                  1,934,283      3,420 (EventLog and   7
ApiLog                    6,044,317      ApiLog combined)      5
AppLog                    1,218,124      60                    28
Total                     9,196,724      3,480                 40
(Diff/original)                                                (-99%)
Records aggregated/sec                   2,643                 229,918

Large data set

                          # records      Aggreg. time [s]
                                         Original              Redshift
EventLog                  4,486,164      9,853 (EventLog and   12
ApiLog                    14,355,970     ApiLog combined)      16
AppLog                    2,960,627      Failed¹               52
Total                     21,802,761     9,853²                80
(Diff/original)                                                (-99%)
Records aggregated/sec                   2,213                 272,535

Table 11 - Aggregation time comparison

1 The aggregation output was too large to be handled correctly by the aggregation system (it exceeded MongoDB’s maximum document size of 16 MB). The results assume AppLog’s records were aggregated in 0 seconds.
2 Almost three hours.


The original aggregation process is not clearly separated on a “per-collection” basis when it aggregates the data. It is therefore impossible to differentiate the time the process spends aggregating documents from EventLog from the time it spends on ApiLog. The sum of the times spent by the process on each is used instead.

The results are impressive: on the limited data set, the original aggregation process required almost an hour to generate all the figures needed, while the Redshift equivalent was done in less than a minute. The number of records processed per second is about 100 times larger than what can be obtained with MongoDB Map-reduce jobs.

On the large data set, the results are still very good for Redshift: the number of records aggregated per second increased by almost 20% compared to the limited data set, while MongoDB’s performance decreased in approximately the same proportion. MongoDB’s values (namely, the total time spent on aggregation and the rate of records aggregated per second) are also underestimated, as the application logs could not be properly aggregated. The performance of the original aggregation process therefore appears slightly better than it is in reality.

7.5.1 Single vs. multithreaded aggregator process

Those results show how interesting it is to populate two specialized databases with almost the same data at the same time. But the results can be even better in the Redshift case when the aggregator is run in multithreaded mode. Redshift is indeed capable of running up to 50 queries simultaneously. By default, however, the number of simultaneous queries is capped at 5, and this setting was kept for the tests. Three connections, one for each table, were opened toward Redshift by the aggregation processor in multithreaded mode.
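As a minimal sketch of this multithreaded mode (the table names and the aggregation query are hypothetical simplifications), one thread and one JDBC connection can be opened per table, so that Redshift executes the three aggregations concurrently:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultithreadedAggregator {

    private static final String JDBC_URL =
            "jdbc:postgresql://redshift-cluster.example.com:5439/appgrid";

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("eventlog", "apilog", "applog");

        // One thread per table; each holds its own connection so the
        // queries run simultaneously (within Redshift's concurrency cap).
        ExecutorService pool = Executors.newFixedThreadPool(tables.size());
        for (final String table : tables) {
            pool.submit(new Runnable() {
                public void run() {
                    try (Connection conn =
                                 DriverManager.getConnection(JDBC_URL, "user", "password");
                         Statement stmt = conn.createStatement()) {
                        // Simplified stand-in for the real aggregation query;
                        // the results would then be upserted into MongoDB.
                        stmt.executeQuery("SELECT appid, COUNT(*) FROM " + table
                                + " GROUP BY appid");
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}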

Table 12 compares the performance of the Redshift aggregator, whether it is run in single or multithreaded mode. The data sets used for these tests are the same as those used for the results presented in Table 11.

                          Limited data set          Large data set
                          Aggregation time [s]      Aggregation time [s]
                          Single      Multi         Single      Multi
                          thread      thread        thread      thread
EventLog                  7           12            12          29
ApiLog                    5           10            16          23
AppLog                    28          22            52          53
Total                     40          22            80          53
Records aggregated/sec    229,918     418,033       272,534     411,372
(Diff/single-threaded)                (+81%)                    (+51%)

Table 12 - Single/multi-threaded Redshift aggregation process comparison

Aggregating the data simultaneously almost halves the duration of the aggregation process compared to the single-threaded version on the limited data set. This really makes the process suitable for near real-time aggregation.


Note however that this architecture does not deserve the “real-time” data aggregation label, as there are multiple points at which the data can be stored and buffered: in the broker, where messages can be kept for an unlimited period (if no consumer can process them right away), as well as in the consumers, where they are buffered to be inserted in batches into either MongoDB or Redshift. The consumer in charge of filling the Redshift database flushes its object queue at the latest every 20 minutes. This value can of course be changed in the consumer process, but it seemed in this case to be a good balance between efficiency and costs1.

Therefore, with this setup and these settings, aggregation can be achieved with a 30-minute delay, with the highest accuracy possible: the aggregation process does not perform simplified computations, as may be the case in some other real-time aggregation architectures such as the Lambda architecture and its real-time layer. The results are exact, based on the data actually collected, and aggregated the way they should be [87], [88, Sec. Speed layer]. Moreover, the architecture is not made more complex than it needs to be: aggregation is done using only one tool, and queries are written only once, in a unified language.

On large data sets, the difference between the single and multithreaded versions is limited because most of the time is spent not issuing and computing the results of the queries, but inserting those results into MongoDB. This is especially true for the aggregation of AppLog: more than 500,000 documents2 have to be inserted into the collection storing the results of the aggregation of application logs. This process is highly time-consuming and limited by the performance of MongoDB. According to the Redshift monitoring console, the two queries that aggregate application logs execute in less than 10 seconds, the rest of the time being used to extract the results and push them to MongoDB.

7.5.2 Disk space usage

One interesting fact about Amazon Redshift is the efficiency of its compression algorithms. Data can in fact be compressed when stored into the tables, without affecting performance. Table 13 and Table 14 present the raw results.

MongoDB disk usage              ApiLog        AppLog      EventLog      Total
Number of records               77,662,075    363,881     30,527,012    108,552,968
Total size³ [MB]                98,587        381         24,965        123,932
Size of 1,000,000 records [MB]  1,270         1,046       818           3,134

Table 13 - Space disk usage by MongoDB

1 Pushing data into Redshift is not free, while querying it is. The more data the consumer sends in one batch, the lower the bill. See section 7.2.1 page 72.
2 To be exact, 2 × 250,000 documents, in two different aggregated collections. This number is related to the way the load is generated: load generators create pseudo-random logs with a log code randomly picked between 0 and 99,999, on which the logs are aggregated every day on a per-application basis.
3 The sizes of the indexes are included in the total size for each collection.


Amazon Redshift disk usage      ApiLog        AppLog       EventLog     Total
Number of records               16,826,708    2,970,197    6,856,435    26,653,340
Total size [MB]                 640           136          480          1,256
Size of 1,000,000 records [MB]  38            46           70           154

Table 14 - Space disk usage by Amazon Redshift

Redshift compression clearly does a good job, being far more efficient than MongoDB, which is impacted to a great extent by the size of the indexes it must maintain to be able to answer queries in a reasonable time. Redshift does not need those indexes and, combined with the efficient columnar storage and compression scheme it uses, can store more than 20 times more data than MongoDB for the same disk space usage, according to these specific tests.

The smallest Redshift node available has 160 GB of storage space. Therefore, more than 1 billion records can be stored in the available disk space of just one node. If one user session is assumed to generate 30 records on average (see section 7.5.3 below), it appears that one single-node Redshift cluster could store more than 35 million sessions. MongoDB would only be able to store 1.7 million sessions with a similar disk storage capacity.
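The arithmetic behind these estimates, using the per-million-record sizes from Table 13 and Table 14:

$$\frac{160{,}000\ \text{MB}}{154\ \text{MB}/10^6\ \text{records}} \approx 1.04\times 10^9\ \text{records}, \qquad \frac{1.04\times 10^9}{30\ \text{records/session}} \approx 35\times 10^6\ \text{sessions}$$

$$\frac{160{,}000\ \text{MB}}{3{,}134\ \text{MB}/10^6\ \text{records}} \approx 51\times 10^6\ \text{records}, \qquad \frac{51\times 10^6}{30\ \text{records/session}} \approx 1.7\times 10^6\ \text{sessions}$$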

7.5.3 RabbitMQ and Amazon Redshift resiliency

One side note about the resiliency of the system: one of the advantages of using message-oriented middleware is that, provided the broker is kept up and running, applications should not be affected in any way by a failure of the database or the consumers. The broker queues the messages and flushes them once the failure is resolved. A quick failure simulation (done by shutting down all the consumers for four days, while still using the API, thus filling the queues) revealed how handy and efficient this can be. The results of this side test are presented in Table 15.

Number of queued messages (all queues¹)     9,874,310
Time to complete flush                      1,290 s (21 min 30 s)
Combined average message flush rate/s       7,634

Table 15 - Failure recovery test results

1 The reader should remember that each message is duplicated when it enters the broker. Therefore, the API effectively sent only 4,923,655 messages.


The results are very good, considering that a single consumer was used, consuming the data from 6 different queues (two per collection/table to fill: one in MongoDB, one in Redshift). The complete catch-up occurred in less than 30 minutes, with absolutely no downtime or data loss on the API side. If a session is considered to generate 30 messages (before duplication) on average1, this very simple architecture, with a single consumer, was able to absorb more than 160,000 user sessions in 21 minutes, or 7,630 sessions per minute. Note also that with the batch sizes presented in section 7.3, all the queues were on average emptied at the same rate, whether the data was inserted into Redshift or MongoDB.

7.5.4 General conclusion

The architecture proved its efficiency, and tools like Amazon Redshift, with their linear scaling scheme, can help a lot when millions of rows need to be processed. (The number of records aggregated per second even improves as the data set grows, although the results are slightly biased, as MongoDB operations are also involved to store the aggregation results and do not reflect only Redshift’s performance.) Profiling the Java application issuing the queries and storing the results may lead to additional performance enhancements, while a different design of the collections in which the aggregation outputs are stored may also speed up the overall process to a great extent.

1 This is double what is currently seen in production: on average, 2 entries in EventLog and 12 entries in ApiLog, while the samples used to determine these values did not include any information about AppLog.


8 Conclusion and future work

This section aims at summarizing what has been explained in this report, formulating some quick guidelines and proposing new perspectives for carrying on with this work on message-oriented middleware, to, as the title of the thesis suggests, contribute to building strong and scalable big data analytics architectures.

8.1 Advantages and perspectives offered by message-oriented middleware

As described in section 2.3, message-oriented middleware gives system architects the ability to rethink the way asynchronous operations are done. By externalizing those processes, it gives them the freedom to really defer operations and relieve the main application servers. Applicable operations include log and event management, e-mail and other contact jobs, and more globally all one-way asynchronous communication between two processes or instances.

The producer-broker-consumer triumvirate enforced by message broker systems may well be the way to go to save resources on the main application instances and distribute the work across multiple servers while keeping the future in mind: those systems are inherently scalable, as more producers and consumers can be connected to the broker as the number of messages to handle increases. The two tested systems, Apache Kafka and RabbitMQ, provide replication, clustering, fault tolerance and load balancing capabilities that make the broker itself scalable as well.

Consumer processes can also be optimized to systematically take advantage of features that the receiving applications offer but that are typically used only in very specific cases: bulk insertion into databases, small-scale batch pre-aggregation of data, consumption throttling (either by auto-sensing the health of the manipulated instances or under administrator control), and so on. Architectures relying on messaging middleware are not so new, but they open interesting new perspectives in terms of operational management as well: downtime and maintenance of consumers or databases can be (up to a certain point) completely transparent for the end user thanks to the message queuing system. Traffic bursts can also be absorbed by these systems without endangering the whole architecture.

One very nice advantage, as presented in section 7, and perhaps eventually the main reason leading to the adoption of such systems in big data workflows, is the fact that multiple consumers can, from a single message posted by the producers, process it in different ways. Depending on the broker queue architecture, this may involve message duplication on the broker, but the principle in itself is really powerful, as shown by the example given in Figure 38.


Figure 38 - Multi-consumer/task architecture (producers push messages to a routing point on the broker, which duplicates incoming messages into two queues, each consumed separately toward Hadoop and MongoDB)

The same message that originated at a producer would be processed twice, by two different processes, at two different times and paces, to be inserted into two different data storage spaces (here, MongoDB and Apache Hadoop [89]). This kind of architecture is really powerful in that it allows multiple data storage systems to be used at the same time (one main traditional MongoDB cluster for individual record access, and an aggregation, analytics-oriented system like Apache Hadoop or Amazon Redshift to run queries over huge numbers of rows [77]), without requiring one specific data storage space to serve as the source for the others. In other words, the main data set is the messages, not any sort of database. As such, the additional database systems would not have to query the others for data they can get from the messages, relieving them of thousands of avoidable queries.
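As a sketch of how this duplication could be wired with the RabbitMQ Java client (exchange and queue names are hypothetical), a fan-out exchange delivers a copy of every published message to each bound queue:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class FanOutSetup {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("broker-host");
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // A fan-out exchange duplicates every incoming message into
        // all the queues bound to it (the routing key is ignored).
        channel.exchangeDeclare("logs", "fanout", true);

        // One durable queue per consumer family; each receives its own copy.
        channel.queueDeclare("logs.mongodb", true, false, false, null);
        channel.queueDeclare("logs.hadoop", true, false, false, null);
        channel.queueBind("logs.mongodb", "logs", "");
        channel.queueBind("logs.hadoop", "logs", "");

        // The serialized event ends up once in each queue.
        channel.basicPublish("logs", "", null, "serialized event".getBytes());

        channel.close();
        conn.close();
    }
}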

This multi-consumer possibility, combined with all the advantages presented before, confirms the high potential of message-oriented middleware for building efficient, rational and scalable data management and analytics architectures.

8.2 RabbitMQ vs. Kafka

The tests presented in this report only apply to a specific architecture, and show trends more than absolute figures. Two systems (RabbitMQ and Kafka) were tested among all of those available on the market and, as such, the tests are far from exhaustive enough to affirm that one of them is the best system to use.

However, some conclusions and guidelines can be drawn from them. The primary reason to prefer Apache Kafka to RabbitMQ is the need for complete and relatively long-term data persistence. As shown during the tests, Kafka is really the way to go if queue data should be stored on disk, and it performs much better than RabbitMQ in this use case. If high message rates are expected and if one is afraid of losing data because of a broker failure, then Kafka provides a proper level of protection regarding data integrity and availability.


Still, Kafka has some major disadvantages, and RabbitMQ is far better on almost all of these points. The first one is the way Kafka systems are set up, managed and monitored. As mentioned in section 4.2.3 page 33, managing a Kafka instance is clearly not the simplest thing, as the software is not conveniently packaged, is not self-sufficient (it needs an Apache ZooKeeper instance) and is absolutely not easy to monitor. Exposing MBeans does not make system surveillance easy.

On all those points, RabbitMQ is much more attractive: well packaged for most architectures, completely self-sufficient and providing a handy and convenient web management and monitoring interface. Contrary to Kafka, which looks like a fresh, nearly experimental product (which it was, but it has now left the Apache Incubator and should therefore be more of a finished product by now), RabbitMQ has the tools and features that make it “professional”. Kafka’s documentation and examples are often messy or mistaken, and debugging problems is just as troublesome. During the tests, very annoying and undocumented problems occurred, with unexpected exceptions (some of them even present under normal conditions and to be disregarded in the logs [90]!) in cases in which such systems are expected to give more resilience against faults (e.g. killing either the broker or the consumer sometimes led to impossible situations, requiring the broker to be wiped or the system to be restarted to reactivate things).

For all those reasons, RabbitMQ appears to be the tool of choice, the only one reliable enough to be called production-ready. Apache Kafka has great potential and is used by some very big companies like LinkedIn (the company at which it originated [42]), but it is too young and has too limited a community behind it for a company like Accedo to rely on such a product. RabbitMQ (via Pivotal, its editor) provides enterprise-grade support if needed, as well as talks and training programmes, and has extensive documentation and a large user base.

The tool worked the way it should and, should there be one tool to recommend for implementing message-oriented middleware in an existing architecture, it would without much hesitation be RabbitMQ.

8.3 Event-oriented and service-oriented architecture

As discussed in section 2.2 page 13, event-driven architecture (EDA) and even staged event-driven architecture (SEDA) are becoming popular and may rely on the results of this study to be implemented with the right tools [13], [14]. The principle behind them is to do processing asynchronously as much as possible, and to distribute the work across entities that react to the events to which they subscribe.

In this spirit, one possible breakthrough change would be to test a completely isolated implementation of the producer, disconnecting it totally from the database (for both read and write operations) where event and log posting is concerned. The main API server would then be all the more responsive, as the only task it would have to do would be to send back an OK HTTP code to the client while pushing a message to the broker. All the other tasks would be deferred to the consumers.


As attractive as this solution seems, it still has some major disadvantages. Queries are currently queued only if they are valid and linked to a valid session key. Here, the validation would be done asynchronously, while a 200 HTTP code would be returned to the client even if the query were incorrect. Moreover, it would generate more load than necessary on the broker and the consumers, especially in case of Denial of Service (DoS) attacks, as faulty messages would get queued and processed even though they were incorrect. With a Kafka-like broker, it would also irremediably alter the “master copy” of the data the broker is meant to store.

Along with those problems come code duplication and query multiplication (if processing must be done, then all the different consumers consuming the same message set, each receiving a copy of all messages, are likely to issue the same queries). Considering also that, for backwards compatibility reasons, the query should return to the client some information fetched from the database, this may not be the easiest and cleverest modification to make. Moreover, while writing to a MongoDB replica set requires communication with the primary server only, data can be read from all the members of the replica set, and the read load can therefore be spread across the set (with some concerns about data freshness, though [91]).

Nevertheless, message-oriented middleware may help implement such an architecture, using some sort of “chained queues”, as depicted in Figure 39.

Figure 39 - Distributed and progressive data processing using chained queues (producers push raw events to queues; data fetchers hydrate them from the application database; the enriched events are queued again and written out by a raw log writer and an analytics log writer into their respective databases)


Note the arrow orientation between the databases and each consumer/producer process. The principle here is that the initial producers push basic events into a queue, without much information inside the message. A first consuming process then “hydrates” each event, appending to it additional information, fetched from the application database, that is required by processes further down the chain. The messages are then pushed back to the broker and duplicated into two queues, to be inserted into two different kinds of databases. This architecture brings together all the benefits discussed in this section: the initial producers are disconnected from the application database, events are progressively “augmented” as they flow through the queues, and everything is done asynchronously. Moreover, no code duplication is needed, and queries for elements shared by the processes at the bottom of the processing chain are made only once, by those at the top.
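A minimal sketch of the “hydrating” stage with the RabbitMQ Java client (queue and exchange names are hypothetical, and the database lookup is reduced to a placeholder):

import java.io.IOException;

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

public class HydratingConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("broker-host");
        Connection conn = factory.newConnection();
        final Channel channel = conn.createChannel();

        // Consume raw events, enrich them, and push them back to the broker,
        // where a fan-out exchange duplicates them for the next consumers.
        channel.basicConsume("events.raw", false, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                                       AMQP.BasicProperties properties, byte[] body)
                    throws IOException {
                byte[] enriched = enrichFromApplicationDatabase(body);
                channel.basicPublish("events.enriched", "", null, enriched);
                channel.basicAck(envelope.getDeliveryTag(), false);
            }
        });
    }

    // Placeholder for the lookup that appends data from the application database.
    static byte[] enrichFromApplicationDatabase(byte[] rawEvent) {
        return rawEvent;
    }
}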

This kind of successive processing may also be seen as pure event processing and take advantage of event database systems like Event Store [92]. Messages would be directly stored in this database instead of queues, and events would trigger actions on the consuming processes. Those kinds of architectures are also closely related to the micro-service and service-oriented architectures (SOA) that are getting more and more popular [93, Ch. 6], [94, Ch. 4], [95]. Using a message-oriented middleware to have the different services communicate seamlessly really helps to build such an infrastructure, instead of relying on RMI or HTTP; the latter induces a non-negligible latency on each query and does not provide any built-in queuing facility.

8.4 Future work

All the following assumes that a message-oriented middleware has been selected and implemented in the architecture; RabbitMQ is the one discussed hereafter. Performance tweaking of the broker’s settings should be performed where applicable. Different prefetch count values and their impact on performance should be tested. Clustering and load balancing should also be tried.
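With the RabbitMQ Java client, the prefetch count is a single setting on the channel; the value below is only an illustrative starting point for such tests:

// Allow up to 250 unacknowledged messages in flight per consumer
// before the broker stops delivering (see glossary: prefetch count).
channel.basicQos(250);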

Multiple consumers (as Figure 38 page 83 suggests) should be added to make sure this does not badly impact the performance of the broker, and concurrent consumers processing data from the same queue can also be tested (the easiest way to scale systems horizontally). While testing, extreme care should be taken in monitoring the database systems that producers and consumers access. Databases may indeed get overwhelmed by the number of highly optimized processes connecting to them.

For RabbitMQ with persistence enabled, one identified bottleneck was the I/O throughput capacity of the machine. Retrying the test on a disk-optimized instance may show better performance and perhaps compensate for the real performance trade-off that this safety brings in RabbitMQ, as shown in the tests that this report presents.

A large range of potential optimizations can also be carried out in the code of both the producing and consuming processes. Producers may try to simulate batch delivery, while tweaking the bulk insertion parameters (number of records to write simultaneously and timeout value) should bring some additional performance improvements.


But the best thing to do next is to take advantage of this architecture to feed multiple database systems and try many of them (even directly in production, as consumers can be plugged in and out without any interference with other processes!), to see which one is the best at aggregating large numbers of records. Section 7 presented Amazon Redshift and its power, but there are other tools on the market, like Google BigQuery, that can perform similar tasks. Comparing them using message-oriented middleware is easy and would make it possible to build the most efficient and scalable data analytics architecture.


9 Glossary

This glossary provides quick definitions of most of the unusual terms used in this document. More details can be found in the references. Please note that this glossary has multiple parts, to separate the vendor-specific words from the general ones.

Common Table Expression (CTE): SQL feature that allows one to write queries that use as input the results of one or more specified, temporary queries. WITH introduces CTEs in SQL queries.

Consumer: independent process that consumes data (i.e. dequeues messages from the broker).

Data denormalization: technique to improve the performance of read operations in a database that relies on data duplication and grouping (e.g. instead of a Customer table with a town attribute and an Order table with a link to a Customer, if the only information needed is the town, then the town field may be duplicated in all the Order rows to avoid a costly JOIN operation) [96].

Interface Description Language (IDL): language used to specify the structure of objects to build components that can deal with them, no matter their programming language. For instance, CORBA is an IDL, and RMI is based on it.

Map-reduce: programming paradigm originated at Google that separates data aggregation processes on large sets in two parts: one mapping operation, in which the set is filtered to keep the data to aggregate, and one reduce operation, in which the remaining rows are summarized [97].

Message: raw string of bytes, sent by a producer to a broker together with a simple header, that is queued and then delivered to one or more consumers, which are programmed to deserialize the message’s content and process it.

MongoDB: database system that stores “documents” in “collections”. This is a NoSQL database: a document is a BSON (Binary JSON) entity and is the equivalent of a row in a SQL database, while a collection is like a SQL table. It is made to be distributed across multiple machines, and does not enforce any fixed table/collection scheme. It supports Map-Reduce jobs for complex data aggregation operations on large data sets [98].

ParNew GC: garbage collector (GC) that uses multiple GC threads to do the work [99]. ParNew is a “stop-the-world” GC, which means that the application becomes unresponsive during its execution.

Persistence: property of queues and messages that survive a software failure or restart (i.e. they are written to the disk).

Producer: process that sends messages to the broker. Producers can be synchronous or asynchronous (i.e. wait or not for acknowledgements and/or for having more messages to send, to batch them).

Publish/subscribe messaging: messaging scheme in which multiple consumers can consume the same message, at different rates [2, Sec. 1.3.2]. By opposition to queuing messaging.

Publisher confirmation: acknowledgement sent by the broker to the producer process of a message when it has successfully enqueued it.

Queuing messaging (or point-to-point): messaging scheme in which only one consumer listening to a queue will have the opportunity to receive a specific message sent to this queue [2, Sec. 1.3.1]. By opposition to publish/subscribe messaging.

S3 (bucket): Amazon’s file storage system in the cloud. A bucket is a space (like a folder) in which files can be placed (or put) and retrieved.

Serialization: process that converts a data structure to something that can be either stored or transferred (usually, an object converted to a string) [100]. JSON is a common serialization format.

Upsert: database operation that is made of a query Q and a document to upsert D, and that results either in the insertion of document D into the database if query Q returns no result, or in an update of the returned records with the values of document D if the result set of query Q is not empty. It is therefore a “create or update” operation, or “update or insert”, contracted as upsert.

Kafka specifics

Acknowledged message: state of a message that has been successfully enqueued by the leader of a given partition but not replicated yet on its replicas.

Asynchronous producer: producer that does not wait for a message to get either acknowledged or committed before moving on and sending another message. No guarantee is then provided regarding the enqueuing status of the messages in case of producer failure (before they are sent and acknowledged).

Broker: logical instance of Kafka that can receive messages and redistribute them. Talks with one Zookeeper server, and stores one or more partitions.

Committed message: state of a message that has been successfully enqueued by the leader of a given partition, as well as successfully replicated on all its replicas (if any). Refer to “in-sync”.

Consumer: process that consumes the data of one or more partitions. It sends to the leading broker the offset from which it wants data and receives it.

Consumer group: represents a set of consumers that usually do the same task. Consumers tag themselves with the consumer group they belong to. Zookeeper keeps the consumer groups’ namespace globally. Only one consumer in a consumer group receives a message from a specific partition. Each message will be sent to one consumer in every consumer group that exists.

In-sync: status of a set of Kafka brokers that are replicas of a given partition, meaning that all the replicas have the same data as the leader for this partition.

Leader: when multiple brokers exist, partitions will be distributed across a set of brokers (distribution automatically chosen or manually given). One will act as the leader (i.e. producers will send messages to it, and consumers will read data from it) and the others will act as replicas. The leader replicates the data to the replicas.

Offset: “identifier” of a location in a partition’s queue. Each message has a different (increasing) offset, and consumers can query specific offsets to (re)get all the data they need to catch up.

Partition: represents a sub-queue of a topic, that starts at a specific offset and that is then filled with data sent by producers to this partition. For efficiency, producers can decide to put data in a specific partition, or use any kind of load-balancing process to fill the partitions in a clever way. A partition stores messages in the order it receives them. This is the smallest entity that Kafka brokers can replicate.

Replicas: when multiple brokers exist, a partition may be replicated on different brokers to improve resiliency. Each partition has a leader broker and a set of replicas (if any available).

Replication factor: number of brokers on which every partition will be replicated. 1 then means no replica; just the “leader” holds the partition.

Synchronous producer: producer that waits for either an acknowledgment or a commit notification for each message it sends. Guarantees that messages have been enqueued.

Topic: a set of partitions, constituting a queue.

RabbitMQ specifics

Binding: a queue has to be bound to one or more exchanges to receive messages. Bindings connect queues and exchanges, with an optional routing key that can be used to specify conditional routing rules. The default exchange (which has an empty string for name, and is a “direct” exchange) is automatically bound to all the queues defined in the broker, with the name of each queue as routing key. It means that messages sent to the default exchange with the name of a queue as routing key will get delivered to this queue (and only this one).

Consumer: process that runs independently and that will consume data. The consumer may either be passive, once connected to the broker, receiving data without doing anything when using the push API (the broker sends messages when needed), or regularly ask the broker for data with the pull API.

Consumption acknowledgment: RabbitMQ requires consumers to send an acknowledgment to the broker for each message that has been successfully processed. A message may be deleted on the broker only once it has been acknowledged. If it has not and the connection between the broker and the consumer gets broken, the message is assumed lost, requeued and then retransmitted to another consumer.

Durability: RabbitMQ name for persistence. Can be applied to exchanges, queues and messages.

Exchange: entry point for messages. Producers push messages to a specific exchange, which then delivers them to zero or more queues according to their routing key, the exchange type and the bindings the exchange has. Exchange types include “direct” (i.e. received messages whose routing key is equal to one of the bindings’ keys are delivered to the queue linked to this binding), “fan-out” (i.e. messages are pushed to all the queues this exchange is bound to), and the more advanced “topic” and “header” [52].

Prefetch count: number of messages a broker can send to a consumer without receiving any acknowledgement [101]. Can be seen as the window size in TCP exchanges.

Queue: space in which messages will eventually end (if not dropped) before being transferred to consumers. Can be tagged as “durable” (persistent).


10 List of figures

Figure 1 - Accedo Appgrid solution overview ...... 8
Figure 2 - Overview of a message-oriented architecture ...... 12
Figure 3 - Full-duplex communication between processes ...... 13
Figure 4 - Example of a scenario with duplicated messages ...... 16
Figure 5 - Current Appgrid architecture ...... 19
Figure 6 - Original log and event insertion process (simplified) ...... 21
Figure 7 - Message-oriented middleware in Appgrid architecture ...... 22
Figure 8 - Basic message-oriented Appgrid log insertion process ...... 23
Figure 9 - Optimized message-oriented Appgrid log insertion process ...... 25
Figure 10 - Logo of various message-oriented middleware or providers (trademarks) ...... 26
Figure 11 – Global Apache Kafka architecture (with one topic, one partition, replication factor 4) ...... 28
Figure 12 - Architecture of a topic ...... 29
Figure 13 - Consumer group concept (example with one topic and one partition) ...... 30
Figure 14 - Replication example (replication factor of 2, 3 partitions, 1 topic) ...... 31
Figure 15 – Simplified overall RabbitMQ architecture ...... 34
Figure 16 - Direct exchange example (routing keys = queue names = colours) ...... 35
Figure 17 - Fan-out exchange example (routing key meaningless, three queues, two bindings only) ...... 35
Figure 18 - RabbitMQ clustering with queue mirroring policy set to all ...... 37
Figure 19 - Class diagram, tv.accedo.appgrid.data.bus package ...... 43
Figure 20 - Thread architecture of consumer process class diagram ...... 44
Figure 21 - Message processing classes of consumer process class diagram ...... 45
Figure 22 - Message structure and serialization logic ...... 49
Figure 23 - Kafka topic architecture ...... 50
Figure 24 - RabbitMQ queue architecture ...... 51
Figure 25 - Architecture used for load testing ...... 52
Figure 26 - Test transaction ...... 53
Figure 27 - User load that should be generated vs. time ...... 54
Figure 28 - Comparison of the global message push rate between Kafka in synchronous mode and RabbitMQ with persistence ...... 63
Figure 29 - Total number of messages (in queue + not acknowledged by consumer) vs. time, for RabbitMQ with persistence ...... 64
Figure 30 - Broker CPU usage (for Kafka) vs. time ...... 65
Figure 31 - Broker CPU usage (for RabbitMQ) vs. time ...... 66
Figure 32 - CPU usage by consumer process (with RabbitMQ non-persistent) ...... 67
Figure 33 - Benefits of bulk insertions into MongoDB vs. single insertion (using RabbitMQ without persistence) ...... 67
Figure 34 - General aggregation process ...... 69
Figure 35 - Message-oriented middleware with Amazon Redshift ...... 71
Figure 36 - Amazon Redshift table model ...... 73
Figure 37 - Appgrid and Amazon Redshift test environment ...... 75
Figure 38 - Multi-consumer/task architecture ...... 83
Figure 39 - Distributed and progressive data processing using chained queues ...... 85


11 List of tables

Table 1 - General information comparison between Apache Kafka and RabbitMQ ...... 39
Table 2 – Broker capabilities comparison between Apache Kafka and RabbitMQ ...... 40
Table 3 – Producer (official Java client only) capabilities comparison between Apache Kafka and RabbitMQ ...... 40
Table 4 – Consumer (official Java client only) capabilities comparison between Apache Kafka and RabbitMQ ...... 40
Table 5 - Test configurations’ summary ...... 55
Table 6 - Hardware specifications of the server instances used for the tests ...... 56
Table 7 - Synthetized test results about the Appgrid API ...... 61
Table 8 – Synthetized results about message rates ...... 62
Table 9 - Consumer and broker performance comparison ...... 65
Table 10 - Amazon Redshift specifications and cost ...... 73
Table 11 - Aggregation time comparison ...... 77
Table 12 - Single/multi-threaded Redshift aggregation process comparison ...... 78
Table 13 - Space disk usage by MongoDB ...... 79
Table 14 - Space disk usage by Amazon Redshift ...... 80
Table 15 - Failure recovery test results ...... 80


12 References

Some images used in this report may be protected by a trademark and are used for illustration purposes only. The stencils used to create the diagrams and figures of this document are referenced herein [102], [103]. Figures contained in this document may not be reused without prior permission. While most of the references are in English, some might only be available in Swedish or French.

[1] M. Minelli, M. Chambers, and A. Dhiraj, Big Data, Big Analytics : Emerging Business Intelligence and Analytic Trends for Today’s Businesses. New York: John Wiley & Sons, 2013. [2] E. Curry, “Message-Oriented Middleware,” in Middleware for Communications, Q. H.hmoud, Ed. John Wiley & Sons, Ltd, 2004, pp. 1–28. [3] M. Kosinski, “With big data comes big responsibility,” Financial Times, 14-Mar- 2013. [4] A. Oboler, K. Welsh, and L. Cruz, “The danger of big data: Social media as computational social science,” First Monday, vol. 17, no. 7, Jun. 2012. [5] P. C. Webster and W. Kondro, “Medical data debates: Big is better? Small is beautiful?,” CMAJ Can. Med. Assoc. J., vol. 183, no. 5, pp. 539–540, Mar. 2011. [6] C. Porter, “Little privacy in the age of big data,” The Guardian, 20-Jun-2014. [7] L. Sweeney, “k-anonymity: A model for protecting privacy,” Int. J. Uncertain. Fuzziness Knowl.-Based Syst., vol. 10, no. 05, pp. 557–570, 2002. [8] D. Wogan, “Big Data, Big Energy,” Sci. Am., vol. 310, no. 1, pp. 22–22, Dec. 2013. [9] Karin Svensson, “Borg flörtar med Facebook,” di.se, 20-May-2014. [Online]. Available: http://www.di.se/artiklar/2014/5/20/borg-flortar-med-facebook/. [Accessed: 29-Aug-2014]. [10] Peter Alestig, “Sänkt skatt ska locka hit it-jättar,” SvD.se, 20-May-2014. [Online]. Available: http://www.svd.se/naringsliv/sankt-skatt-ska-locka-fler-it-jattar-till- sverige_3577004.svd. [Accessed: 29-Aug-2014]. [11] S. Flucker and R. Tozer, “Data Centre Energy Efficiency Analysis to minimize total cost of ownership,” Build. Serv. Eng. Res. Technol., vol. 34, no. 1, pp. 103–117, Feb. 2013. [12] S. Tarkoma, Wiley Series on Communications Networking and Distributed Systems, Volume 48 : Publish/Subscribe Systems : Design and Principles. Somerset, NJ, USA: John Wiley & Sons, 2012. [13] J.-L. Maréchaux, “Combining service-oriented architecture and event-driven architecture using an enterprise service bus,” IBM Dev. Works, pp. 1269–1275, 2006. [14] M. Welsh, D. Culler, and E. Brewer, “SEDA: An Architecture for Well-conditioned, Scalable Internet Services,” in Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, New York, NY, USA, 2001, pp. 230–243. [15] Wikipedia contributors, “Idempotence,” Wikipedia, the free encyclopedia. 25-Jun- 2014. [16] Amazon Web Services, “AWS Elastic Load Balancing - Cloud Network Load Balancer,” Amazon Web Services, Inc., 2014. [Online]. Available: //aws.amazon.com/elasticloadbalancing/. [Accessed: 11-Jul-2014]. [17] Pivotal Software, Inc., “25. Task Execution and Scheduling,” Spring Documentation, 2014. [Online]. Available: http://docs.spring.io/spring/docs/3.0.x/spring-framework- reference/html/scheduling.html. [Accessed: 22-Jul-2014].


[18] MongoDB Inc., “Bulk Inserts in MongoDB — MongoDB Manual 2.6.3,” MongoDB Manual 2.6.3, 04-Apr-2014. [Online]. Available: http://docs.mongodb.org/manual/core/bulk-inserts/. [Accessed: 11-Jul-2014]. [19] T. Ouellette, “Middleware safeguards data transactions over the Web. (New Era of Networks Neonweb message-oriented middleware)(Brief Article)(Product Announcement),” Computerworld, vol. 30, no. 50, p. 28, 1996. [20] T. Ouellette, “Delta relies on IBM’s MQSeries to tie systems. (Middleware Takes Flight) (Delta Air Lines Inc using message-oriented middleware) (Technology Information),” Computerworld, vol. 30, no. 49, p. 53, 1996. [21] Mike Gilpin, “Capitalware’s 2000 Forecast for the Application Integration Market Page,” Capitalware Inc., 15-Jun-2000. [Online]. Available: http://www.capitalware.com/forecast_AI_market.html. [Accessed: 21-Jul-2014]. [22] The Apache Software Foundation, “Apache ActiveMQ TM -- Index,” Apache ActiveMQ, 31-Mar-2014. [Online]. Available: http://activemq.apache.org/. [Accessed: 21-Jul-2014]. [23] The Apache Software Foundation, “Home - Apache QpidTM,” , Sep- 2013. [Online]. Available: https://qpid.apache.org/. [Accessed: 21-Jul-2014]. [24] J. O’Hara, “Toward a Commodity Enterprise Middleware,” Queue, vol. 5, no. 4, pp. 48–55, May 2007. [25] S. Vinoski, “Advanced Message Queuing Protocol,” IEEE Internet Comput., vol. 10, no. 6, pp. 87–89, 2006. [26] Alvaro Videla and Jason J. W. Williams, RabbitMQ in Action: Distributed Messaging for Everyone, 1st ed. Manning Publications, 2012. [27] Hemant Gaidhani, “vCenter Orchestrator AMQP Plug-in is now Generally Available,” VMware vCenter Orchestrator Blog, 16-Aug-2011. . [28] Mark Lin, “Get ready to Rocksteady,” Google Open Source Blog, 23-Sep-2010. . [29] Mozilla, “Mozilla Pulse,” Mozilla Pulse, 2014. [Online]. Available: http://pulse.mozilla.org/. [Accessed: 21-Jul-2014]. [30] Simon MacMullen, “RabbitMQ » Blog Archive » RabbitMQ Performance Measurements, part 2 - Messaging that just works,” RabbitMQ’s blog, 25-Apr-2012. . [31] Adam Bloom, “How fast is a Rabbit? Basic RabbitMQ Performance Benchmarks,” VMWare vFabric Blog, 18-Apr-2013. . [32] STOMP, “STOMP,” STOMP, 2014. [Online]. Available: http://stomp.github.io/. [Accessed: 21-Jul-2014]. [33] Dave Locke, “MQ Telemetry Transport (MQTT) V3.1 Protocol Specification,” IBM Developer Works, 19-Aug-2010. [Online]. Available: http://www.ibm.com/developerworks/webservices/library/ws-/. [Accessed: 21- Jul-2014]. [34] The Apache Software Foundation, “Apollo,” ActiveMQ’s next generation of messaging, 2014. [Online]. Available: http://activemq.apache.org/apollo/. [Accessed: 21-Jul-2014]. [35] M. Salvan, “A quick benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo...,” Muriel’s Tech Blog, 10-Apr-2013. . [36] Stuart Charlton, “RabbitMQ vs Kafka: which one for durable messaging with good query features? - Quora,” Quora, 12-Sep-2012. [Online]. Available: http://www.quora.com/RabbitMQ/RabbitMQ-vs-Kafka-which-one-for-durable- messaging-with-good-query-features?share=1. [Accessed: 16-Jun-2014]. [37] F. Akgul, ZeroMQ. Olton, Birmingham, GBR: Packt Publishing, 2013.


[38] The Apache Software Foundation, “Apache Kafka,” Apache Kafka, 29-Apr-2014. [Online]. Available: http://kafka.apache.org/. [Accessed: 10-Jul-2014]. [39] JBoss, “HornetQ - putting the buzz in messaging - JBoss Community,” HornetQ, 16- Dec-2013. [Online]. Available: http://hornetq.jboss.org/. [Accessed: 21-Jul-2014]. [40] J. Kreps, N. Narkhede, and J. Rao, “Kafka: A distributed messaging system for log processing,” in Proceedings of the NetDB, 2011. [41] The Apache Software Foundation, “Apache ZooKeeper - Home,” Apache ZooKeeper, 10-Mar-2014. [Online]. Available: http://zookeeper.apache.org/. [Accessed: 11-Jul-2014]. [42] Jay Kreps, “The Log: What every software engineer should know about real-time data’s unifying abstraction,” 16-Dec-2013. [Online]. Available: http://engineering.linkedin.com/distributed-systems/log-what-every-software- engineer-should-know-about-real-time-datas-unifying. [Accessed: 05-Jun-2014]. [43] Martin Odersky, “What is Scala? | The Scala Programming Language,” Scala, 2014. [Online]. Available: http://www.scala-lang.org/what-is-scala.html. [Accessed: 11-Jul- 2014]. [44] The Apache Software Foundation, “Apache Kafka 0.8.1 - Documentation,” Apache Kafka, 29-Apr-2014. [Online]. Available: http://kafka.apache.org/documentation.html#diskandfs. [Accessed: 14-Jul-2014]. [45] Chris Curtin and Thomas Uribe, “0.8.0 SimpleConsumer Example - Apache Kafka - Apache Software Foundation,” Apache Kafka wiki, 29-Apr-2014. [Online]. Available: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example. [Accessed: 14-Jul-2014]. [46] Neha Narkhede, “Compression - Apache Kafka - Apache Software Foundation,” Apache Kafka wiki, 02-Jan-2012. [Online]. Available: https://cwiki.apache.org/confluence/display/KAFKA/Compression. [Accessed: 14-Jul- 2014]. [47] Code Hale, Yammer Inc., “Home | Metrics,” Metrics, 2013. [Online]. Available: http://metrics.codahale.com/. [Accessed: 14-Jul-2014]. [48] RabbitMQ, “Launch of RabbitMQ Open Source Enterprise Messaging - Press release.” LShift/CohesiveFT, Feb-2007. [49] Pivotal Software, Inc., “RabbitMQ - Messaging that just works,” RabbitMQ, 2014. [Online]. Available: http://www.rabbitmq.com/. [Accessed: 20-Apr-2014]. [50] Pivotal Software, Inc., “RabbitMQ - AMQP 0-9-1 Quick Reference,” RabbitMQ, 2014. [Online]. Available: http://www.rabbitmq.com/amqp-0-9-1-quickref.html. [Accessed: 14-Jul-2014]. [51] S. Tarkoma, Standards and Products. Chichester, UK: Chichester, UK: John Wiley & Sons, Ltd, 2012. [52] Pivotal Software, Inc., “RabbitMQ - AMQP 0-9-1 Model Explained,” RabbitMQ, 2014. [Online]. Available: https://www.rabbitmq.com/tutorials/amqp-concepts.html. [Accessed: 14-Jul-2014]. [53] Pivotal Software, Inc., “RabbitMQ - RabbitMQ tutorial - ‘Hello World!,’” RabbitMQ, 2014. [Online]. Available: https://www.rabbitmq.com/tutorials/tutorial-one-java.html. [Accessed: 14-Jul-2014]. [54] Pivotal Software, Inc., “RabbitMQ - Clustering Guide,” RabbitMQ, 2014. [Online]. Available: http://www.rabbitmq.com/clustering.html. [Accessed: 15-Jul-2014]. [55] Pivotal Software, Inc., “RabbitMQ - Highly Available Queues,” RabbitMQ, 2014. [Online]. Available: http://www.rabbitmq.com/ha.html. [Accessed: 22-Jul-2014].


[56] Pivotal Software, Inc., “RabbitMQ - Distributed RabbitMQ brokers,” RabbitMQ, 2014. [Online]. Available: http://www.rabbitmq.com/distributed.html. [Accessed: 15- Jul-2014]. [57] Pivotal Software, Inc., “RabbitMQ - Management Plugin,” RabbitMQ, 2014. [Online]. Available: https://www.rabbitmq.com/management.html. [Accessed: 14-Jul- 2014]. [58] Jun Rao, “Clients - Apache Kafka - Apache Software Foundation,” Apache Kafka wiki, 02-Jul-2014. [Online]. Available: https://cwiki.apache.org/confluence/display/KAFKA/Clients. [Accessed: 15-Jul-2014]. [59] BSON, “BSON - Binary JSON,” BSON, 2014. [Online]. Available: http://bsonspec.org/. [Accessed: 16-Jul-2014]. [60] The Apache Software Foundation, “Apache AvroTM 1.7.6 Documentation,” Apache Avro, 23-Jan-2014. [Online]. Available: http://avro.apache.org/docs/current/. [Accessed: 16-Jul-2014]. [61] K. Maeda, “Performance evaluation of object serialization libraries in XML, JSON and binary formats,” in Digital Information and Communication Technology and it’s Applications (DICTAP), 2012 Second International Conference on, 2012, pp. 177–182. [62] Chris Curtin and Neha Narkhede, “Consumer Group Example - Apache Kafka - Apache Software Foundation,” Apache Kafka wiki, 11-Sep-2013. [Online]. Available: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example. [Accessed: 16-Jul-2014]. [63] Nicolas Niclausse, “Tsung,” Tsung, 05-Feb-2013. [Online]. Available: http://tsung.erlang-projects.org/. [Accessed: 01-Jul-2014]. [64] The Apache Software Foundation, “Apache JMeter - Apache JMeterTM,” Apache JMeter, 05-Jan-2014. [Online]. Available: http://jmeter.apache.org/. [Accessed: 16-Jul- 2014]. [65] HP Research lab, “Welcome to the httperf homepage,” httperf tool (HP labs), 30- Jan-2008. [Online]. Available: http://www.hpl.hp.com/research/linux/httperf/. [Accessed: 25-Jun-2014]. [66] Amazon Web Services, “AWS | Amazon EC2 | Instance Types,” Amazon Web Services, Inc., 2014. [Online]. Available: http://aws.amazon.com/ec2/instance-types/. [Accessed: 16-Jul-2014]. [67] Oracle, “The Java HotSpot Performance Engine Architecture,” Oracle Whitepapers, 19-Jun-2013. [Online]. Available: http://www.oracle.com/technetwork/java/whitepaper-135217.html#3. [Accessed: 16- Jul-2014]. [68] Oracle, “Java Management Extensions (JMX),” Oracle - Java SE Technologies, 18- Feb-2008. [Online]. Available: http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html. [Accessed: 16-Jul-2014]. [69] Jiri Sedlacek and Tomas Hurka, “Home — Project Kenai,” VisualVM, 01-Jul-2014. [Online]. Available: http://visualvm.java.net/. [Accessed: 16-Jul-2014]. [70] Wikipedia contributors, “sar (Unix) - Wikipedia, the free encyclopedia,” Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia., 30-Jun-2013. [71] C. J. Zarowski, Numerical Integration and Differentiation. Hoboken, NJ, USA: Hoboken, NJ, USA: John Wiley & Sons, Inc., 2004. [72] MongoDB Inc., “Aggregation Pipeline — MongoDB Manual 2.6.4,” https://github.com/mongodb/docs/blob/master/source/core/aggregation-pipeline.txt,


04-Aug-2014. [Online]. Available: http://docs.mongodb.org/manual/core/aggregation- pipeline/. [Accessed: 19-Aug-2014]. [73] MongoDB Inc., “Map-Reduce — MongoDB Manual 2.6.4,” https://github.com/mongodb/docs/blob/master/source/core/map-reduce.txt, 04-Apr- 2014. [Online]. Available: http://docs.mongodb.org/manual/core/map-reduce/. [Accessed: 19-Aug-2014]. [74] ISO/IEC Information Technology Task Force, “ISO/IEC 9075-1:2011 - Information technology -- Database languages -- SQL -- Part 1: Framework (SQL/Framework),” ISO, 04-Feb-2013. [Online]. Available: http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber =53681. [Accessed: 19-Aug-2014]. [75] Rick Osborne, “SQL to MongoDB example,” http://rickosborne.org/download/SQL-to-MongoDB.pdf, 06-Mar-2010. [Online]. Available: http://docs.mongodb.org/manual/tutorial/map-reduce-examples/. [Accessed: 19-Aug-2014]. [76] Microsoft Technet, “Using Common Table Expressions,” Microsoft Technet - Microsoft SQL Server documentation, 2008. [Online]. Available: http://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx. [Accessed: 19- Aug-2014]. [77] Amazon Web Services, “AWS | Amazon Redshift - Cloud Data Warehouse Solutions,” Amazon Web Services, Inc., 2014. [Online]. Available: http://aws.amazon.com/redshift/. [Accessed: 20-Apr-2014]. [78] Amazon Web Services, “Amazon Redshift and PostgreSQL - Amazon Redshift,” Amazon Web Services, Inc., 01-Dec-2012. [Online]. Available: http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html. [Accessed: 20-Aug-2014]. [79] Amazon Web Services, “Columnar storage - Amazon Redshift,” Amazon Web Services, Inc., 01-Dec-2012. [Online]. Available: http://docs.aws.amazon.com/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt .html. [Accessed: 20-Aug-2014]. [80] J. Patti, “Moving product recommendations from Hadoop to Redshift saves us time and money | Monetate Engineering Blog,” Monetate Engineering blog, 18-Jun-2014. . [81] Sean O’Connor, “Speeding things up with Redshift,” Bitly Engineering Blog, 25- Apr-2013. . [82] Patrick Wendell and U.C. Berkeley AMPLab, “Big Data Benchmark,” Big Data Benchmark, Feb-2014. [Online]. Available: https://amplab.cs.berkeley.edu/benchmark/. [Accessed: 20-Aug-2014]. [83] Frederic Lardinois, “Google Launches BigQuery Streaming For Real-Time, Big- Data Analytics | TechCrunch,” TechCrunch, 25-Mar-2014. [Online]. Available: http://techcrunch.com/2014/03/25/google-launches-bigquery-streaming-for-real- time-big-data-analytics/. [Accessed: 20-Aug-2014]. [84] Google Inc., “Google BigQuery — Google Developers,” Google BigQuery, 09-Jun- 2014. [Online]. Available: https://developers.google.com/bigquery/?hl=FR. [Accessed: 20-Aug-2014]. [85] Google Inc., “Pricing - Google BigQuery — Google Developers,” Google BigQuery, 09-Jun-2014. [Online]. Available: https://developers.google.com/bigquery/pricing?hl=FR#free. [Accessed: 20-Aug-2014].


[86] Amazon Web Services, “Loading Data - Amazon Redshift,” Amazon Web Services, Inc., 01-Dec-2012. [Online]. Available: http://docs.aws.amazon.com/redshift/latest/dg/t_Loading_data.html. [Accessed: 20- Aug-2014]. [87] J. Kreps, “Questioning the Lambda Architecture - O’Reilly Radar,” O’Reilly Radar, 02-Jul-2014. [Online]. Available: http://radar.oreilly.com/2014/07/questioning-the- lambda-architecture.html. [Accessed: 22-Aug-2014]. [88] James Kinley, “The Lambda architecture: principles for architecting realtime Big Data systems,” James Kinley’s blog, Aug-2013. [Online]. Available: http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture- principles-for-architecting. [Accessed: 22-Aug-2014]. [89] The Apache Software Foundation, “Welcome to ApacheTM Hadoop®!,” ApacheTM Hadoop®, 2014. [Online]. Available: http://hadoop.apache.org/. [Accessed: 20-Apr- 2014]. [90] David DeMaagd, “Re: Fetch request with correlation id 1171437 from client ReplicaFetcherThread-0-1 on partition [meetme,0] failed due to Leader not local for partition,” Kafka-users mailing list archives, 28-Jun-2013. [Online]. Available: http://mail-archives.apache.org/mod_mbox/kafka- users/201306.mbox/%[email protected]%3E. [Accessed: 15-Jul-2014]. [91] MongoDB Inc., “Read Preference — MongoDB Manual 2.6.3,” MongoDB Manual 2.6.3, 01-May-2014. [Online]. Available: http://docs.mongodb.org/manual/core/read- preference/. [Accessed: 18-Jul-2014]. [92] Event Store LLP, “Event Store,” Event Store, 2014. [Online]. Available: http://geteventstore.com/. [Accessed: 22-Jul-2014]. [93] D. Duggan, Service-Oriented Architecture. Hoboken, NJ, USA: Hoboken, NJ, USA: John Wiley & Sons, Inc., 2012. [94] M. Barai, Service Oriented Architecture with Java. Packt Publishing, 2008. [95] Abel Avram, “The Strengths and Weaknesses of Microservices,” InfoQ, 28-May- 2014. . [96] S. K. Shin and G. L. Sanders, “Denormalization strategies for data retrieval from data warehouses,” Decis. Support Syst., vol. 42, no. 1, pp. 267–282, Oct. 2006. [97] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Commun. ACM, vol. 51, no. 1, pp. 107–113, 2008. [98] MongoDB Inc., “What Is MongoDB?,” MongoDB, Jan-2015. [Online]. Available: http://www.mongodb.com/what-is-mongodb2. [Accessed: 16-Jan-2015]. [99] Jon Masamitsu, “Our Collectors (Jon Masamitsu’s Weblog),” John Madamitsu’s Weblog, 01-Feb-2008. . [100] B. T. Kurotsuchi, “The Wonders of Java Object Serialization,” Crossroads, vol. 4, no. 2, pp. 3–8, Nov. 1997. [101] Pivotal Software, Inc., “RabbitMQ - Consumer Prefetch,” RabbitMQ, 2014. [Online]. Available: http://www.rabbitmq.com/consumer-prefetch.html. [Accessed: 14-Jul-2014]. [102] Cisco Systems, “Network Topology Icons - Doing Business With Cisco - Cisco Systems,” Cisco Sytems, 27-Jan-2013. [Online]. Available: http://www.cisco.com/web/about/ac50/ac47/2.html. [Accessed: 10-Jul-2014]. [103] Microsoft Corporation, “New Office Visio Stencil,” Microsoft Download Center, 22- Apr-2014. [Online]. Available: http://www.microsoft.com/en- us/download/details.aspx?id=35772. [Accessed: 10-Jul-2014].


[104] Michael G. Noll, “Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node,” Michael G. Noll’s blog, 13-May-2013.
[105] GNU, “nohup invocation - GNU Coreutils,” GNU core utilities documentation. [Online]. Available: http://www.gnu.org/software/coreutils/manual/html_node/nohup-invocation.html. [Accessed: 14-Jul-2014].
[106] Claude Mamo, “kafka-web-console - A web console for Apache Kafka,” GitHub, 02-Jul-2014. [Online]. Available: https://github.com/claudemamo/kafka-web-console. [Accessed: 14-Jul-2014].
[107] Pivotal Software, Inc., “RabbitMQ - Installing on RPM-based Linux (CentOS, Fedora, OpenSuse, RedHat),” RabbitMQ, 2014. [Online]. Available: http://www.rabbitmq.com/install-rpm.html. [Accessed: 15-Jul-2014].
[108] Pivotal Software, Inc., “RabbitMQ - Configuration,” RabbitMQ, 2014. [Online]. Available: http://www.rabbitmq.com/configure.html. [Accessed: 15-Jul-2014].
[109] Rémi Prévost, “Homebrew — The missing package manager for OS X,” Homebrew, 2014. [Online]. Available: http://brew.sh/. [Accessed: 18-Jul-2014].
[110] Takeharu Oshida, “Setup tsung for AmazonLinux,” GitHub, 03-Apr-2014. [Online]. Available: https://github.com/ngocdaothanh/tsart. [Accessed: 04-Jul-2014].
[111] T1000, “tsung_stats.pl on Mac OS X Mavericks run into ‘Can’t locate Template.pm’ error,” Stack Overflow, 28-May-2014. [Online]. Available: http://stackoverflow.com/questions/23912087/tsung-stats-pl-on-mac-os-x-mavericks-run-into-cant-locate-template-pm-error. [Accessed: 21-Jul-2014].
[112] bukzor, “ssh configuration: override the default username,” Stack Overflow, 17-Apr-2012. [Online]. Available: http://stackoverflow.com/questions/10197559/ssh-configuration-override-the-default-username. [Accessed: 21-Jul-2014].
[113] Tom Feiner, “Why does an SSH remote command get fewer environment variables then when run manually?,” Stack Overflow, 19-Oct-2008. [Online]. Available: http://stackoverflow.com/questions/216202/why-does-an-ssh-remote-command-get-fewer-environment-variables-then-when-run-man. [Accessed: 02-Jul-2014].
[114] Tom Maguire, “Erlang: Cannot start slave - {error,timeout},” Stack Overflow, 29-May-2012. [Online]. Available: http://stackoverflow.com/questions/10796206/erlang-cannot-start-slave-error-timeout. [Accessed: 21-Jul-2014].
[115] Tsung, “10. Frequently Asked Questions — Tsung 1.5.1 documentation,” Tsung documentation, 2013. [Online]. Available: http://tsung.erlang-projects.org/user_manual/faq.html#what-is-the-format-of-the-stats-file-tsung-log. [Accessed: 23-Jul-2014].


Appendix A — Apache Kafka – Quick reference

This section can serve as a quick reference for users and administrators who want to try, install and manage a Kafka installation. For the sake of reproducibility of the tests presented in this document, the links and commands given here are specific to Apache Kafka 0.8.1.1. Some documents listed in the references may be handy as well [104].

The following commands have been tested successfully on Mac OS X 10.9.4 (13E28) and Red Hat 4.4.7-4 (Linux version 2.6.32-431.11.2.el6.x86_64). However, installation instructions may vary from one system to another.

1. Install Kafka on the broker

# Install the dependencies (Scala)
sudo yum install scala

# Download the binaries
cd ~
wget https://www.apache.org/dyn/closer.cgi?path=/kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz
tar -xvf kafka_2.9.2-0.8.1.1.tgz

# Move Kafka to its final location
mv kafka_2.9.2-0.8.1.1/ /opt/kafka

2. Edit the brokers’ and Zookeeper preferences

You may find the preferences to set up the Zookeeper instance in /opt/kafka/config/zookeeper.properties. You can find the example configuration file for a Kafka broker in /opt/kafka/config/server.properties. Duplicate this file if you want to run multiple logical brokers on the same machine (for testing purposes only).
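Each copy of server.properties must then at least define a unique broker identifier, listening port and log directory. A minimal sketch for a hypothetical second logical broker (the values are illustrative, not those of the test setup):

# server-2.properties (hypothetical second logical broker)
broker.id=2
port=9093
log.dirs=/tmp/kafka-logs-2
zookeeper.connect=localhost:2181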

3. Start Zookeeper and Kafka

One may find the nohup Linux command useful to run processes in background [105]:

# Zookeeper
nohup /opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties > /opt/kafka/log/zookeeper.log 2>&1 &

# Kafka broker
nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties > /opt/kafka/log/broker.log 2>&1 &


Using these commands allows the processes to run in the background. They will stay up and running even if the user disconnects from the machine (e.g. in case the processes were launched through an SSH connection). Outputs are redirected to the log files /opt/kafka/log/zookeeper.log and /opt/kafka/log/broker.log.

4. Updated ZooKeeper CLI tool

The ZooKeeper management tool bundled with Kafka is an old version that, among other things, lacks the ability to recursively prune parts of the shared tree maintained in ZooKeeper: all of a node’s children must be deleted individually before the node itself can be removed. This makes delete operations (e.g. removing a topic) very tedious. It is strongly recommended to switch to an updated version of the ZooKeeper command line to perform those operations with more comfort.

# Get an updated ZooKeeper CLI (that has the convenient rmr
# function to recursively remove branches)
wget http://apache.mirrors.spacedump.net/zookeeper/stable/[file]
tar -xvf [file]

# Access the ZooKeeper instance using the updated CLI
cd zookeeper-[version]/ && bin/zkCli.sh

5. Management operations

Here follow some of the most common setup and management operations one may need to perform on a Kafka broker (the ZooKeeper instance is assumed to run on the same machine as the one the commands are executed from, with default settings):

# List the topics of a Zookeeper instance
cd /opt/kafka/ && bin/kafka-topics.sh --list --zookeeper localhost:2181

# List consumers of a given consumer group, with their states
cd /opt/kafka/ && bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group [GROUP NAME]

# Create a topic
cd /opt/kafka/ && bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor [REPLICATION FACTOR] --partitions [NUMBER OF PARTITIONS] --topic [TOPIC NAME]

# Start to consume raw data of a topic (for testing purposes)
cd /opt/kafka/ && bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic [TOPIC NAME]


# Read the content of a topic from the beginning (for testing purposes)
cd /opt/kafka/ && bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic [TOPIC NAME] --from-beginning
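A console producer is bundled with Kafka as well; it can be used to inject test messages into a topic from standard input (assuming a broker listening on the default port 9092):

# Produce raw data to a topic (for testing purposes)
cd /opt/kafka/ && bin/kafka-console-producer.sh --broker-list localhost:9092 --topic [TOPIC NAME]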

# Delete a topic
# Access the ZooKeeper instance using the updated CLI
cd zookeeper-3.4.6/ && bin/zkCli.sh

# Delete the following metadata in ZooKeeper
rmr /brokers/topics/[TOPIC NAME]
rmr /consumers/[GROUP ID]/owners/[TOPIC NAME]
rmr /consumers/[GROUP ID]/offsets/[TOPIC NAME]

# Delete the topic queue data from the disk
rm -R /tmp/kafka-logs/[TOPIC NAME]

6. A Web UI for Kafka: kafka-web-console

Apache Kafka lacks a proper administration interface. Some independent developers have nonetheless created web interfaces that make it easier to watch and perform some operations on Kafka architectures. One of the most famous is kafka-web-console [106].

# Install kafka-web-console
# sbt is needed (available in brew for Mac)

# Clone the git repository
git clone https://github.com/claudemamo/kafka-web-console.git

# Put Play in the kafka-web-console directory
# (download from http://www.playframework.com/download)

# Add the following line in the conf/application.conf file
applyEvolutions.default=true

# Run activator
cd kafka-web-console/ && ./activator
(...)
[kafka-web-console] $ start

# Access the console
http://localhost:9000


Appendix B — RabbitMQ – Quick reference

This section contains all the information needed to get started with RabbitMQ 3.3.4, from installation to management. The reader may find additional resources in the References (section 10, page 92).

The following commands have been tested successfully on Mac OS X 10.9.4 (13E28) and Red Hat 4.4.7-4 (Linux version 2.6.32-431.11.2.el6.x86_64).

1. Install RabbitMQ on the broker

On Mac OS X, one can directly download RabbitMQ from the RabbitMQ website (http://www.rabbitmq.com/install-standalone-mac.html) with a bundled runtime version of Erlang.

On Red Hat, one needs to perform some extra steps to install RabbitMQ flawlessly [107]:

# Add the EPEL (Extra Packages for Enterprise Linux) repository,
# maintained by the Fedora project
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

# Install Erlang
yum install erlang

# Add the signing key of RabbitMQ
rpm --import http://www.rabbitmq.com/rabbitmq-signing-key-public.asc

# Download and install the package
wget http://www.rabbitmq.com/releases/rabbitmq-server/v3.3.4/rabbitmq-server-3.3.4-1.noarch.rpm
yum install rabbitmq-server-3.3.4-1.noarch.rpm

2. Edit RabbitMQ preferences

Though probably unneeded for a basic setup, one may find (or create, if it does not exist) the configuration file in /etc/rabbitmq/rabbitmq.config. Please refer to the documentation for more information about the available configuration variables [108].
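The file contains a list of Erlang terms. A minimal sketch that only sets the AMQP listening port explicitly (5672 is the default value, given here for illustration only):

% /etc/rabbitmq/rabbitmq.config (illustrative)
[
  {rabbit, [
    {tcp_listeners, [5672]}
  ]}
].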

3. Start and stop RabbitMQ

In most packaged versions, RabbitMQ is available as a service called rabbitmq-server and as such can be managed using the typical service rabbitmq-server start/stop/… command invocation.

On Mac OS X, it can be run using sbin/rabbitmq-server, or executed as a daemon via sbin/rabbitmq-server -detached.


4. Enable and use the management UI

RabbitMQ comes bundled with some optional plugins. A very useful one is the web management UI, which can be activated with this command:

sbin/rabbitmq-plugins enable rabbitmq_management

It may be required to restart the broker for the change to take effect. Once it is restarted, the management UI can be reached at http://localhost:15672. The default credentials are “guest” for both the username and password.
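A dedicated, non-guest account can be created from the command line with rabbitmqctl (a sketch; the user name and password are placeholders):

# Create a user and grant it full permissions on the default virtual host
rabbitmqctl add_user myuser mypassword
rabbitmqctl set_permissions -p / myuser ".*" ".*" ".*"

# Optionally, allow this user to log into the management UI
rabbitmqctl set_user_tags myuser management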

Please note that the default settings only allow connections from localhost to the management interface using the default credentials. To connect producers and consumers to the broker from other machines, such a dedicated account with a different password will also be needed.

5. Python script to parse queue logs

The rabbitmq-management plugin plots graphs in the web management interface. However, to perform statistical analysis, extracting the raw values may be required. The following Python script may be used to convert the JSON structure that the RabbitMQ API can generate about a queue into a CSV file. The script also computes message rates, and adapts to different sampling intervals.

You may get the JSON structure required as input for this script by issuing the following HTTP GET request:

http://[management_URL]/api/queues/%2F/[queue_name]?lengths_age=800&lengths_incr=5&msg_rates_age=800&msg_rates_incr=5

import json
import sys

# Script to process RabbitMQ statistics and convert them to CSV
# Two arguments required:
#   1. input file (JSON structure)
#   2. output file (CSV file)
if len(sys.argv) < 3:
    print "You need to pass two arguments:\n1. input file (JSON structure)\n2. output file (CSV file)"
    sys.exit()

# Open the JSON file
f = open(sys.argv[1], 'r')

# Open the destination file
fOut = open(sys.argv[2], 'w')

# Load the JSON structure
obj = json.loads(f.readline())
f.close()

# obj keys: [u'auto_delete', u'deliveries', u'idle_since', u'message_stats',
# u'messages_unacknowledged', u'consumers', u'durable', u'state', u'arguments',
# u'memory', u'exclusive_consumer_tag', u'messages_ready_details',
# u'consumer_utilisation', u'node', u'messages_details', u'vhost',
# u'consumer_details', u'messages_unacknowledged_details', u'incoming',
# u'name', u'policy', u'backing_queue_status', u'messages', u'messages_ready']
numPt = len(obj['messages_details']['samples']) - 1
title = "Timestamp [s], Total messages (queued+in process), Unacked messages (in process), Queued messages, Publish rate [mes/s], Deliver rate [mes/s], Acknowledgment rate [mes/s]\n"

# Deduce the sampling interval (in seconds) from two consecutive timestamps
interval = (obj['incoming'][0]['stats']['publish_details']['samples'][0]['timestamp']
            - obj['incoming'][0]['stats']['publish_details']['samples'][1]['timestamp']) / 1000
time = numPt * interval
out = ""

# The file is reordered: the array is read from the beginning to the end, but
# the output data is sorted in reverse order, with the most recent times at
# the end of the file
for i in range(0, numPt):
    time = time - interval
    line = str(time) + ","
    line += str(obj['messages_details']['samples'][i]['sample']) + ","
    line += str(obj['messages_unacknowledged_details']['samples'][i]['sample']) + ","
    line += str(obj['messages_ready_details']['samples'][i]['sample']) + ","
    line += str((obj['message_stats']['publish_details']['samples'][i]['sample']
                 - obj['message_stats']['publish_details']['samples'][i+1]['sample']) / interval) + ","
    line += str((obj['message_stats']['deliver_get_details']['samples'][i]['sample']
                 - obj['message_stats']['deliver_get_details']['samples'][i+1]['sample']) / interval) + ","
    line += str((obj['message_stats']['ack_details']['samples'][i]['sample']
                 - obj['message_stats']['ack_details']['samples'][i+1]['sample']) / interval) + "\n"
    out = line + out

# Add a header
out = title + out
fOut.write(out)
fOut.close()
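A minimal usage sketch, assuming the management UI runs on localhost with the guest account, that the script was saved as parse_queue_stats.py (a hypothetical file name) and that the queue is named myqueue:

# Fetch the queue statistics as JSON, then convert them to CSV
curl -u guest:guest "http://localhost:15672/api/queues/%2F/myqueue?lengths_age=800&lengths_incr=5&msg_rates_age=800&msg_rates_incr=5" > queue.json
python parse_queue_stats.py queue.json queue.csv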

2015-02-13 106 Master’s Thesis – Report Message-oriented middleware and data analytics

Appendix C — Tsung – Quick reference

This section gives useful information for beginners with Tsung [63]. It covers only HTTP testing (while Tsung can stress test many other protocols) and contains a list of problems encountered during its use, together with the solutions found to solve them.

The following instructions have been successfully tested on computers running Mac OS X 10.7 and Mac OS X 10.9. They should be adaptable to any typical Linux machine. It is assumed that Homebrew is installed on the computer [109]. The reader is invited to read the Tsung documentation once before going through these synthetic instructions, and to browse through the documents in the references [110].

1. Download and install Tsung

# Install Tsung on Mac via Homebrew
brew install tsung

Tsung is then available via the command tsung.

2. Launch a test

From the test controller machine, issue the following command:

tsung -f [path to XML test file] -l [path to log folder] start

A new folder named after the current date and time will be created in the log folder passed to the command, and the logs and results of the test will be put there.

3. Generate test reports

Tsung is able to generate synthetic test reports in HTML, with graphs and tables. To generate one, issue the following command:

cd [path to the folder with the results]; /usr/local/Cellar/tsung/1.5.1/lib/tsung/bin/tsung_stats.pl; open report.html;

This will generate the report and automatically open it. The report is generated in the same folder as the results.


4. Frequently asked questions

This section gathers most of the solutions that were used when a problem with Tsung was faced.

1) Where are the examples said to be packaged with Tsung? When Tsung is installed via homebrew (on Mac OS X), the examples may be found in /usr/local/Cellar/tsung/1.5.1/share/doc/tsung/examples.

2) A DTD must be set in the XML files used to define a test scenario. Which DTD should be used? When Tsung is installed via homebrew (on Mac OS X), a usable DTD may be found in /usr/local/Cellar/tsung/1.5.1/share/tsung/tsung-1.0.dtd.

3) When generating the statistics using tsung_stats.pl, the script hangs or triggers “division by zero” errors. How to solve this problem? Tsung calculates statistics every 10 seconds. Therefore, make sure that the tests are longer than 10 seconds, and that they generate enough queries during this period.

4) A lot of “connection refused” errors are reported when load gets high. The number of dynamic TCP ports the operating system can use to connect to services is limited. By default, on Mac OS X 10.9, they are picked in the range {49152; 65535}, which allows 16384 simultaneous connections to be established. This may not be enough for certain loads, thus leading to errors. On Mac OS X, this limit can be read and increased without having to restart using these commands:

# Read the current bounds of the dynamic TCP port range sysctl net.inet.ip.portrange.first net.inet.ip.portrange.last

# Change the first port of the range sudo sysctl -w net.inet.ip.portrange.first=20000

Note that the change does not survive a reboot and must therefore be made again after each restart.

5) When attempting to generate reports, tsung_stats.pl yields an error. How to solve it? A required Perl dependency is likely missing [111]. To install it, issue the following command: sudo cpan Template. Other errors may be related to gnuplot. Keep in mind that this script is provided as-is for convenience only, and is not the preferred way to extract metrics. See question 15) for more details.


6) A certain number of “error_connect_emfile” errors appear while testing. The number of files that can be opened simultaneously is probably too low. The limit must be increased so that the kernel allows more file descriptors to be open at the same time. To know the current value of the limit, issue the following command:

ulimit -n

However, ulimit cannot be used on Mac to change the limit. Instead, add the following line in /etc/launchd.conf (create the file if it does not exist) and restart:

limit maxfiles 50000 50000

Note that this change is permanent and survives reboots.

7) In the XML test description file, there is a valid IP address in the attribute host of either a client or a server tag (e.g. <server host="192.168.0.1" port="8080" type="tcp"/>). However, it does not work and Tsung does not connect to it. Why? Tsung requires a hostname in the host attributes. If there is no available hostname for the needed IP address, modify the /etc/hosts file to add an entry to the local DNS cache. Even if there is a valid DNS name in a remote DNS server, it may be a good idea to duplicate the entry in the local DNS cache, to prevent a possibly high DNS query load from affecting the test results.

8) Now there is a valid hostname “abc.def.com”, but it still does not work. Why? Tsung together with Erlang seems to have problems with domain names containing dots when referring to Erlang nodes. Try adding a line to the /etc/hosts file to map the target IP address to a simple hostname without any dot.
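For instance, the following /etc/hosts line (the address and name are illustrative) makes the machine reachable under a dot-free hostname:

192.168.0.10   tsungslave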

9) What are the steps to follow to add and test the connection to slaves for doing distributed tests? All the following worked between Mac OS X machines, though it should work on other systems as well:

i. Install Tsung on all the machines.
ii. SSH keys need to be exchanged between the machines. The controller should be able to connect to all the machines without a password prompt. Copy the public key of the controller to the slaves.
iii. Test the SSH connection between the controller and the slaves. Do it for all the slaves, as the key fingerprints of all the slaves need to be approved.
iv. Close the SSH connection and try to launch it again with the command ssh yourSlaveHostname erl. It is important to specify a hostname and not the IP address here. See question 7) for more details. Do not specify a username before the hostname.


v. If the connection fails without going any further, check your connectivity via ping (if the slave answers ICMP messages). Do not rely on host, nslookup or dig to verify that the local DNS cache is properly set, as those commands contact DNS servers directly and bypass the local hosts file.
vi. If SSH prompts for a password, then it is likely that the username on the controller is different from the username on the slave. See question 10) to solve this issue.
vii. If the SSH connection is successful but the erl invocation fails with “erl: command not found” or something similar, there is likely a path problem. See question 11) to solve this problem.
viii. If the SSH connection is successful and the erl prompt is displayed, disconnect from the slave.
ix. Test the establishment of a connection through Erlang on the controller via the command: erl -rsh ssh -sname test@dns-name-controller -setcookie mycookie. Change “dns-name-controller” to the local DNS name of the controller from which you are initiating the connection. This command does not do anything yet.
x. When Erlang prompts for a command, type slave:start("dns-name-slave",bar,"-setcookie mycookie"). (including the trailing dot). Change dns-name-slave to the local DNS name set for the slave you are connecting to.
xi. If everything is working, it should return {ok,bar@dns-name-slave} after a while. If it returns {error,timeout} or something similar, check out question 12).

10) The connection made by Tsung fails with “ssh_askpass: exec(/usr/libexec/ssh-askpass): No such file or directory” in the log file. How to solve it? Two reasons may lead to this kind of error. First, be sure to follow the steps mentioned in question 9) and to attempt to connect at least once to the slave outside Tsung. If you can successfully connect to the machine (i.e. you have acknowledged the slave’s key), then the problem is likely a difference between the usernames on the controller and the slave. There is no documented way to tell Erlang, via Tsung, to connect with a specific username. A possible workaround is to tell ssh on the controller to automatically use a specific username for a given hostname [112]. Edit (or create) the ~/.ssh/config file and add two lines per slave:

Host server-hostname User username-on-slave

Change the permissions on this file via chmod 600 ~/.ssh/config. Remember to use, as the server hostname, the hostname defined in /etc/hosts, and not the real fully qualified domain name of the server. Otherwise, ssh will not use the correct username. See question 8) for more information.


11) Despite the fact that Erlang is installed and works when invoking it through an interactive SSH session, slaves return “erl: command not found” when invoking erl directly in the ssh command. Why? Non-interactive SSH sessions do not get the same PATH environment variable as interactive sessions [113]. Therefore, it is likely that the erl command cannot be found in the paths listed in the PATH variable. To solve this issue, copy the PATH variable value of an interactive session (echo $PATH) into the file ~/.ssh/environment (create it if it does not exist). Do not forget to add the prefix PATH= at the beginning of the line.

A setting must also be changed in the sshd configuration file (/etc/sshd_config): set the PermitUserEnvironment option to yes. These changes must be performed on all slaves.
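A minimal sketch of the two changes (the PATH value is illustrative and should be copied from an interactive session on the machine):

# ~/.ssh/environment on each slave
PATH=/usr/local/bin:/usr/bin:/bin

# /etc/sshd_config on each slave
PermitUserEnvironment yes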

12) The SSH connection between the nodes is successful, and a non-interactive Erlang invocation works too, but the establishment of a connection through Erlang toward a remote node does not. Why? Usually, these problems are DNS-related. Check that the controller has a valid hostname set for the slave in its /etc/hosts file. Check also that the hostname of the computer is properly set. On Mac OS X, be sure to issue the command hostname controller/slave-hostname on your nodes to define the hostname, substituting the argument with the chosen hostname. Note that the change is not persistent and that the hostname may be reinitialised when the network connection is interrupted. After sleep mode, the hostname is likely to be reset. The current hostname can be checked by invoking the hostname command without arguments.

It is also required for slaves to have a valid DNS entry in their cache for the controller. Make sure to add a line in their /etc/hosts file mapping the controller’s hostname to its IP address. If the IP addresses change, do not forget to update them in all DNS caches.

13) Erlang does not want to connect to the slaves. How to make its output more verbose? The logs about the SSH connection initiated by Tsung can be gathered using a small ssh wrapper shell script [114]. Create a myssh.sh file somewhere with this content:

#!/bin/sh

echo "$0" "$@" > /tmp/my-ssh.log ssh -v "$@" 2>&1 | tee -a /tmp/my-ssh.log

Then, invoke Erlang from the controller with the following command (from the directory in which the shell script has been saved): erl -rsh myssh.sh -sname test@dns-name-controller -setcookie mycookie. In the Erlang prompt, issue slave:start("dns-name-slave",bar,"-setcookie mycookie"). (including the trailing dot). Logs about the SSH connection between the nodes will be saved in /tmp/my-ssh.log.


14) The connection via Erlang to all the slave nodes is successful, but not toward localhost (i.e. the controller) when running a test. Why? The public key of the controller should be approved for connecting to the controller itself as well, as Tsung seems to establish an SSH connection to the controller itself (the controller may not be the local computer). Once this is done, try to connect via ssh to the controller itself, and to invoke erl through a non-interactive console (see question 11)). If it works, try to launch the test again.

15) How can one get access to raw metrics after a test? As a test is going on, a tsung.log file containing all the information about the test is created in the specified log folder. New values are printed every 10 seconds. Please refer to the documentation to understand what the figures presented in this file mean [115].

5. Example Tsung XML test scenario description

Here follows the XML file that describes the test scenario presented in section 5.6 page 53. It may be used as an example for other test scenarios.
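A minimal skeleton of a Tsung HTTP scenario of this kind is given below for orientation; the host names, load phase, rates and URL are hypothetical placeholders, not the exact values used in section 5.6 (the DTD path is the Homebrew one from question 2)):

<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/local/Cellar/tsung/1.5.1/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice">
  <!-- Machine(s) generating the load; hostnames must be dot-free (see question 8) -->
  <clients>
    <client host="tsungcontroller" use_controller_vm="true"/>
  </clients>
  <!-- System under test -->
  <servers>
    <server host="backendserver" port="8080" type="tcp"/>
  </servers>
  <!-- Load profile: new users arriving at a constant rate for 5 minutes -->
  <load>
    <arrivalphase phase="1" duration="5" unit="minute">
      <users arrival_rate="50" unit="second"/>
    </arrivalphase>
  </load>
  <!-- Each simulated user issues a single HTTP GET request -->
  <sessions>
    <session name="http-test" probability="100" type="ts_http">
      <request>
        <http url="/endpoint" method="GET" version="1.1"/>
      </request>
    </session>
  </sessions>
</tsung>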


6. Helper scripts (Bash) to install Tsung, and deploy and run tests

Installing Tsung on Mac OS X and CentOS machines takes time, and it is possible to make mistakes at multiple steps. In order to ease the process of installing Tsung and then running tests, either locally or remotely, multiple Bash scripts are available for download from the author’s Bitbucket repository. These scripts are provided as-is without any warranty, but may really help one to deploy Tsung without too much hassle.

The scripts may be downloaded here: https://bitbucket.org/nicolasnannoni/tsung-bash-helpers.

