Submitted by Christoph Gerstberger BSc.

Submitted at Institute for System Software

Reactive and Event-based Mail Processing based on ReactiveX

Supervisor a.Univ.-Prof. Dipl.-Ing. Dr. Herbert Prähofer

September 2020

Master Thesis to obtain the academic degree of Diplom-Ingenieur in the Master’s Program Computer Science

JOHANNES KEPLER UNIVERSITY LINZ, Altenberger Straße 69, 4040 Linz, Österreich, www.jku.at, DVR 0093696

Statutory Declaration

I declare that I have authored this thesis independently, that I have not used other than the declared sources / resources, and that I have explicitly marked all material which has been quoted either literally or by content from the used sources.

Location, Date Signature

Eidesstattliche Erklärung

I declare under oath that I have written this master's thesis independently and without outside help, that I have not used sources and aids other than those indicated, and that I have marked as such all passages taken literally or in substance from other sources. This master's thesis is identical to the electronically submitted text document.

Location, Date Signature

Acknowledgement

I would like to express my deepest appreciation to Alexander Fried, the founder of Swilox, for making this thesis possible. He provided the idea for this project and, more importantly, funded it. He also supported me with his knowledge of Vert.x and ReactiveX whenever I had questions. Without his financial support and technical expertise, this thesis would not exist. Many thanks for that!

Many thanks also go to my supervisor Herbert Prähofer, who established the contact between Alexander Fried and me. He also supported me with many questions regarding the written thesis and reactive streams.

Abstract

This thesis describes an implementation of a mail server based on a reactive and event-based approach using the ReactiveX library. Reactive programming means asynchronous event processing. For an asynchronous task, a handler can be registered which is called when the computation is completed. Thus, reactive task handling is asynchronous and non-blocking. ReactiveX is a library and API that provides reactive streams as an approach for reactive programming. With reactive streams, event-based, asynchronous computations can be defined in a functional way as chains of function applications.

This style of programming is used in this thesis to build a mail server. Instead of fully loading the mails and processing them at once, the mails are handled in a reactive way, which allows them to be processed in small chunks. A mail usually consists of many lines, where each line contributes certain information to the mail. When a mail is processed reactively, each line can be analyzed individually, processed immediately, and possibly passed to the next processing step. This can have great advantages with respect to performance and throughput.

The mail server consists of multiple components, and communication between components takes place exclusively through events. The mail lines are received by the Mail Receiver, which passes them to the Mail Parser. There the lines are parsed and passed on to other operating blocks, like the Mail Forward and Mail Sender block, which forward a mail, or the Mail Storage block, which stores the mail in the database.

Performance measurements show that a significant improvement has been achieved compared to a conventional thread-based server implementation. A first experiment tested how many clients are able to connect to the server and still keep the connection alive. The classic mail server approach could handle only ca. 1200 clients, whereas the reactive mail server approach could handle ca. 22 000 clients. In a further experiment, the number of messages that can be handled by the server was tested. On a machine with two CPU cores and 4 GB of memory, the reactive server was able to handle ca. 10 000 messages in less than 2 minutes.

The thesis project has been conducted in cooperation with the start-up Swilox. The goal of the system provided by Swilox is to simplify registration and login services in web shops as well as shop-customer communication. A special mail system is used in the Swilox system for supporting shop-customer communication. A reactive mail server was needed to replace the existing mail server in the Swilox system, which was implemented with conventional technology and did not fulfill the demanding performance requirements. The reactive mail system developed in this thesis was able to fulfill the requirements and has therefore been integrated into the production version of the Swilox system in the meantime.

Kurzfassung

This thesis describes the development of a reactive and event-based mail server using ReactiveX. Reactive programming means asynchronous event processing. For an asynchronous computation, a handler can be registered that is called when the computation has completed. The computation is thus asynchronous and non-blocking. ReactiveX is a library and API that provides reactive streams as an approach to reactive programming. With reactive streams, event-based, asynchronous functions can be linked together into function chains.

This style of programming is used in this thesis to build a mail server. Instead of loading the mails completely and processing them as a whole, the mails are handled reactively so that they can be processed in small blocks. A mail usually consists of many lines, each of which contributes certain information to the mail. When a mail is processed reactively, each line can be analyzed individually, processed immediately and, if applicable, passed on to the next processing step. This can have great advantages with regard to performance and throughput.

The mail server consists of several components. Communication between the components takes place exclusively via events. The lines of the mail are received by the Mail Receiver component, which forwards them to the Mail Parser component. There the lines are analyzed and passed on to further components, such as the Mail Forward and Mail Sender components, which forward a mail, or the Mail Storage component, which stores the mail in the database.

Performance measurements show that a considerable improvement was achieved compared to a conventional thread-based server implementation. A first experiment tested how many clients are able to connect to the server and keep the connection alive. The classic mail server could handle only ca. 1200 clients, whereas the reactive mail server could handle ca. 22 000 clients. A further experiment tested the number of messages that the server can process. On a server with two CPU cores and 4 GB of memory, the reactive server was able to process ca. 10 000 messages in less than 2 minutes.

This thesis was carried out in cooperation with the start-up Swilox. The goal of the system offered by Swilox is to simplify the registration and login process in web shops as well as shop-customer communication. A special mail system is used in the Swilox system to support shop-customer communication. A reactive mail server was needed to replace the existing mail server in the Swilox system, which was implemented with conventional technologies and did not meet the high performance requirements. The reactive mail system developed in this thesis was able to meet the requirements and has therefore since been integrated into the production version of the Swilox system.

Table of Contents

1 Introduction
  1.1 Industrial Context
  1.2 Challenges
  1.3 Chapter Preview

2 Use Cases
  2.1 Mail from online shop to user (in Swilox App)
  2.2 Reply to a mail (in Swilox App)
  2.3 Forward mail to external user mail address
    2.3.1 Manual forward
    2.3.2 Automatic forward
  2.4 Reply to mail (from external mail server)

3 Technologies
  3.1 ReactiveX
    3.1.1 Observable
    3.1.2 Flowable
    3.1.3 Processor
    3.1.4 Maybe
    3.1.5 Completable
    3.1.6 Operators
    3.1.7 Observable vs. Iterable
    3.1.8 Hot and Cold Observables
  3.2 Eclipse Vert.x
    3.2.1 Concepts
    3.2.2 Example

4 Mail Theory
  4.1 SMTP
  4.2 Internet Message Format
  4.3 MIME Mail
    4.3.1 MIME Structure

5 Architecture

6 Implementation
  6.1 SMTP In
  6.2 MailParser
    6.2.1 Block hierarchy
    6.2.2 ReactiveStorage
    6.2.3 MimeLines
    6.2.4 Parsing Process
  6.3 MailStream
  6.4 MailStreamFactory
  6.5 Processing Steps
    6.5.1 Receive mail
    6.5.2 Forward mail

7 Extended Libraries
  7.1 Vertx-Mail-Client

8 Tests and Measurements
  8.1 Reactivity Test
  8.2 Memory Test
    8.2.1 Blueglacier MIME Parser
    8.2.2 MailParser and MailStream
    8.2.3 MailParser and MailStream with deletion
    8.2.4 Results
  8.3 Connections Test
    8.3.1 Classic server
    8.3.2 Vert.x server
  8.4 Load Test
    8.4.1 Localhost
    8.4.2 Server to Server

9 Conclusion

Literature

1 Introduction

Reactive and event-based mail processing is a strategy for processing emails and their mail parts iteratively and in a non-blocking way. Usually emails are sent to the server in one piece before they are processed further. With the reactive and event-based approach, the server is able to read single lines from the mail and process them immediately. Reading and processing are therefore interleaved and there is less blocking time.

The reactive and event-based approach provides key advantages. The server is faster in reacting to the mail content. Therefore, it can initialize processing steps, like storing the mail on the server or forwarding it to another mail server, as soon as the required lines are loaded and parsed. For example, if a mail does not belong to the server, the server can reject it as soon as the mail header is parsed. The server does not have to wait for the complete mail to be loaded. Moreover, processing of the mail is event-based, which means whenever a new line is received, a handler is called that initializes the parsing process. As a consequence, no threads are blocked while waiting for new lines.
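The early rejection described above can be illustrated with a small sketch. It is not the implementation developed in this thesis; it assumes the RxJava 3 artifacts (package io.reactivex.rxjava3), and the helper firstRecipient, the sample mail and the domain mail.example are hypothetical.

```kotlin
import io.reactivex.rxjava3.core.Flowable
import io.reactivex.rxjava3.core.Maybe

// Hypothetical helper: emits the address of the first "To:" header line.
// firstElement() cancels the upstream afterwards, so later lines are not consumed.
fun firstRecipient(lines: Flowable<String>): Maybe<String> =
    lines
        .filter { it.startsWith("To:", ignoreCase = true) }
        .map { it.substringAfter(":").trim() }
        .firstElement()

fun main() {
    val mail = listOf(
        "From: shop@example.com",
        "To: unknown@other.example",
        "Subject: Order confirmation",
        "",
        "Thank you for your purchase."
    )
    val seen = mutableListOf<String>()
    val recipient = firstRecipient(
        Flowable.fromIterable(mail).doOnNext { seen.add(it) } // record which lines were read
    ).blockingGet()

    // Assumption: the server is only responsible for the domain mail.example.
    val accepted = recipient?.endsWith("@mail.example") == true
    println("recipient=$recipient accepted=$accepted linesRead=${seen.size} of ${mail.size}")
}
```

The decision is available as soon as the relevant header line has passed through the stream; the body of the mail does not have to be read for it.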

This way of processing is supported by a technology called ReactiveX (www.reactivex.io). It describes an API that allows working with data streams in an event-based and non-blocking way. In this thesis, a system for event-based and reactive email processing has been developed based on ReactiveX.

1.1 Industrial Context

This thesis was done in cooperation with Swilox (www.swilox.com), a start-up company in Linz. Swilox also denotes the software system that it provides.

The main goal of Swilox is to develop a system that simplifies transactions in web shops. 35% of customers decide to buy at a well-known online shop instead of a new one because they are not willing to create a new account. Online shops usually have their own password guidelines, so new passwords have to be remembered. Additionally, many shops tend to send spam mails to the customer's email address. The time-consuming registration process is also a disadvantage for smaller online shops.

Swilox provides a solution for these problems. It has developed a system that takes care of passwords, mail transactions and registration. The customer has to install the Swilox app on his mobile phone and then register with the Swilox system once. After that, the customer can use the Swilox app to log in to all supported online shops. A QR code is shown to the customer, which can be scanned with the Swilox app. The customer data are then sent to the online shop in a secure way without the need to enter a password.

All subsequent mail transactions are handled through a special mailing system with generated mail addresses. The customer does not come into contact with these addresses at all, because the mails are sent to the Swilox app through push messages. The result is a messenger-like system in which the customer communicates with the shop in an intuitive way. The shops receive a generated mail address for each customer who logs in to their shop through Swilox. The generated mail address uniquely identifies the customer without revealing his private mail address. All mails that are sent to this generated address are sent via push messages to the customer's Swilox app.

1.2 Challenges

The challenges of this thesis concern only the special mailing system that runs on the server. Other components like the app, the webpage or the customer data transfer already exist and are not part of this thesis.

A classic mail server architecture, where a new thread is created for every client and handles the mails in a blocking manner, is very memory-intensive and therefore inefficient. A more efficient approach is needed in which mails are handled reactively and in a non-blocking way.

The Swilox system is developed with Vert.x [4], ReactiveX [10] and Kotlin [5]. The same approach is therefore needed for the mailing service. Vert.x is able to handle many clients simultaneously with a small number of threads. By using ReactiveX, a custom implementation of the mail protocols can be developed which processes mails in an event-based and non-blocking way. It works by parsing the lines one after the other without requiring the whole mail to be loaded.

This results in the following main requirements:

• Implementation of a reactive mail processing server

– Mail protocol (SMTP) and mail file format (MIME) need to be reactive and non-blocking

• Reactive processing in

– receiving mails

– parsing mails

– storing mails to the server

– editing mails

– sending mails

• Storage-friendly mail processing

• Comprehensive test environment

Figure 1.1: Stream-based processing of mails (with ReactiveX)

Figure 1.1 shows the core architecture of the system. The system has multiple components which operate together. The email lines are received by the Mail Receiver, which passes them to the Mail Parser. There the lines are parsed and passed on to other operating blocks, like the Mail Forward and Mail Sender block, which forward a mail, or the Mail Storage block, which stores the mail in the database.
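The flow between the blocks can be sketched as follows. This is a strongly simplified illustration and not the architecture described in Chapter 5; it assumes RxJava 3 (package io.reactivex.rxjava3), and ParsedLine, parse and runPipeline are hypothetical stand-ins for the blocks named above.

```kotlin
import io.reactivex.rxjava3.core.Flowable

// Hypothetical event emitted by the parser stage.
data class ParsedLine(val isHeader: Boolean, val text: String)

// Stand-in for the Mail Parser: everything before the first empty line is a header.
// The flag is kept per call, which is sufficient for a single subscription.
fun parse(lines: Flowable<String>): Flowable<ParsedLine> {
    var inHeader = true
    return lines.map { line ->
        if (line.isEmpty()) inHeader = false
        ParsedLine(inHeader, line)
    }
}

// Wires receiver -> parser -> (storage, forward) and returns what each block saw.
fun runPipeline(mail: List<String>): Pair<List<String>, List<String>> {
    val stored = mutableListOf<String>()
    val forwardedHeaders = mutableListOf<String>()

    val received = Flowable.fromIterable(mail)   // stand-in for the Mail Receiver
    val parsed = parse(received).publish()       // one parse run shared by all blocks

    parsed.subscribe { stored.add(it.text) }                                   // Mail Storage
    parsed.filter { it.isHeader }.subscribe { forwardedHeaders.add(it.text) }  // Mail Forward
    parsed.connect()                             // start the flow once all blocks are attached

    return stored to forwardedHeaders
}

fun main() {
    val (stored, headers) = runPipeline(listOf("Subject: Hi", "", "Hello"))
    println("stored=$stored")
    println("forwarded headers=$headers")
}
```

The publish/connect pair is used here so that each line is parsed once and then delivered to every attached block, which mirrors the idea that blocks communicate only through events.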

1.3 Chapter Preview

The thesis is structured as follows:

• Chapter 2 describes important use cases of the Swilox system.

• Chapter 3 describes the technologies that were used in this thesis.

• Chapter 4 covers the mail protocol that is necessary to understand the procedure.

• Chapter 5 explains the architecture of the system.

• Chapter 6 describes how the most important parts of the architecture are implemented.

• Chapter 7 describes the extended libraries.

• Chapter 8 covers important tests and performance measurements.

• Chapter 9 gives a summary of the work.

2 Use Cases

This chapter describes the use cases of the system. They show the email communication between online shop and customer. The communication is built on a special mailing system. Mails are sent from the online shop to the Swilox server. The server then sends push messages to the customer's app. The customer communicates with the shop via a messaging system similar to WhatsApp. The messaging style simplifies the communication for the customer. If the customer wishes, the messages can also be forwarded to an external mail server. A reply message from the external mail server is possible as well.

Figure 2.1 shows the communications between the online shop, the customer and external mail servers. When the shop sends mails to the Swilox server, it sends them to a generated email address, which identifies the customer. The actual private email address of the customer is not used. Therefore the private email address of the customer cannot be spammed. The mails are then processed and sent to the customer app and if requested also to an external mail server.

Figure 2.1: Mail and messaging system

In the following, several usage scenarios are described. The first use case (Section 2.1) describes how the online shop sends a mail to the customer. The second use case (Section 2.2) describes how the customer replies to the shop mail via the mobile phone app. Then in Section 2.3 the forward use case is explained, which is divided into a manual and an automatic forward. The fourth use case (Section 2.4) describes how the customer can reply to a shop mail from an external mail server.

2.1 Mail from online shop to user (in Swilox App)

In this use case the online shop wants to send an email to the customer. A sample scenario would be that the customer buys something at the online shop and uses Swilox as login method. The online shop receives the data of the customer from Swilox. After the customer has confirmed the purchase, the online shop usually sends a purchase confirmation email to the customer.

In our example in Figure 2.2, the online shop "Shop1" sends an email from its account [email protected] to the generated and encoded address that the online shop got from Swilox, e.g. [email protected]. For better understanding the generated address is replaced by [email protected] in this document. The mail address associates the customer with the online shop. This association enables Swilox to connect an online shop to a customer.

When the mail is received by the Swilox server with the generated address, the server can identify the customer by the mail address. The server then sends a push notification to the customer's phone app, which notifies the customer of a new message.

Figure 2.2: Use case - Mail from shop to user (in Swilox App)

2.2 Reply to a mail (in Swilox App)

In this use case the customer replies to a message from the online shop using the Swilox app. The customer selects the message and clicks "Reply". After writing the message, the customer sends it by clicking "Send".

The server receives the message and identifies the customer. From the message content a new email is generated, which is sent from [email protected] to [email protected] (see Figure 2.3).

Figure 2.3: Use case - Reply to mail (in Swilox App)

2.3 Forward mail to external user mail address

2.3.1 Manual forward

After the customer has received a mail from the online shop on his smartphone app, he may also want to have it on his private mail account. The customer can forward the mail by clicking the "Forward" button and entering his private mail address. Swilox then creates a new outgoing forward address, here called [email protected], and forwards the mail to the external mail server, i.e. the Gmail address in the example.

Figure 2.4: Use case - Forward mail manually from App

2.3.2 Automatic forward

The forward function can also be automated for online shops. Every incoming mail from the shop is then sent to the app by push notification and automatically to the external mail server through a newly generated mail address (see Figure 2.5).

Figure 2.5: Use case - Forward mail automatically when receiving a mail

2.4 Reply to mail (from external mail server)

In this use case the customer wants to reply to a forwarded mail from an external mail server. After the reply mail has been written and sent, it arrives at the generated address [email protected]. From this generated address, Swilox can determine the customer who wrote the mail and the online shop to which this mail belongs. Once the shop has been determined, the reply mail is sent to the shop via the original customer shop address [email protected].

Figure 2.6: Use case - Reply to mail from external mail server
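The lookup from a generated address to the customer and the shop can be pictured with a small sketch. The data model is purely hypothetical, since this chapter does not describe how Swilox stores the association; Association, AddressRegistry and all addresses below are placeholders.

```kotlin
// Hypothetical data model: how the association is actually stored is not specified here.
data class Association(val customerId: String, val shopAddress: String)

class AddressRegistry {
    private val byGeneratedAddress = mutableMapOf<String, Association>()

    // Assumption: generated addresses are compared case-insensitively.
    fun register(generatedAddress: String, association: Association) {
        byGeneratedAddress[generatedAddress.lowercase()] = association
    }

    // Returns null when the address was never generated by this registry.
    fun resolve(generatedAddress: String): Association? =
        byGeneratedAddress[generatedAddress.lowercase()]
}

fun main() {
    val registry = AddressRegistry()
    // Placeholder addresses, not the ones used in the figures.
    registry.register(
        "reply-7f3a@mail.example",
        Association("customer-42", "service@shop1.example")
    )

    val hit = registry.resolve("Reply-7F3A@mail.example")
    println(hit?.shopAddress ?: "unknown generated address")
}
```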

3 Technologies

This chapter describes the technologies that are important for this thesis. The essential technologies are ReactiveX and Eclipse Vert.x. This knowledge is needed for Chapter 6 (Implementation).

3.1 ReactiveX

ReactiveX is an API for asynchronous programming with reactive streams. It consists of a variety of libraries for programming with asynchronous and event-based data sequences. They are implemented in multiple languages, including Java, Kotlin, JavaScript and many more (polyglot implementation) [10][9].

Asynchronous programming allows programmers to perform computations (e.g. network requests) without having to wait for them. The programmer calls a function and provides a "callback" function which is executed when the computation is done [9].
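The callback principle can be shown without ReactiveX, using only the Java standard library. The following sketch is an illustration; fetchLength is a hypothetical function in which a short sleep stands in for a network request.

```kotlin
import java.util.concurrent.CompletableFuture

// Simulated slow computation standing in for a network request.
fun fetchLength(text: String): CompletableFuture<Int> =
    CompletableFuture.supplyAsync {
        Thread.sleep(100)
        text.length
    }

fun main() {
    // The lambda passed to thenAccept is the callback; it runs when the result is ready.
    val pending = fetchLength("Subject: Hello")
        .thenAccept { length -> println("callback: length=$length") }

    println("main thread continues without waiting")
    pending.join() // only here so that the demo does not exit before the callback ran
}
```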

The sequences of ReactiveX, so-called Observable streams, are asynchronous and event-based. They combine the advantages of the Iterator Pattern and the Observer Pattern, which simply means event-based notifications and stream transformations with operators (see Section 3.1.6 Operators). Observable streams emit three kinds of events: next, error and complete.

In the Observer Pattern an observer registers itself to a subject, which then notifies the observer of any changes. ReactiveX operates in a similar way. An observer subscribes to an observable (see Figure 3.1). The observable then sends events (e.g. changes) to the observer by executing the onNext callback function (notification). The events are processed one after the other (in order). If an error occurs, the observable notifies the observer by calling the onError callback. Otherwise after all events are sent without error, the onComplete callback is executed. After the onError or onComplete event, no more events follow [9].

Figure 3.1: Observer-Observable

In Figure 3.2 an observable stream, which completes successfully, is visualized on a timeline. First all four events are emitted and at the end the onComplete event is triggered.

Figure 3.2: Observable - Success

In Figure 3.3 an observable stream in which an error occurred is visualized on a timeline. The first three events were emitted successfully, but then an error occurred and the observable stream finished with an onError event. No onComplete event follows!

Figure 3.3: Observable - Error
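The error case of Figure 3.3 can be reproduced with a few lines of code. The sketch assumes RxJava 3 (package io.reactivex.rxjava3); failingStream and recordEvents are hypothetical helper functions and the error message is made up.

```kotlin
import io.reactivex.rxjava3.core.Observable

// Emits three values and then fails. The last onNext call is ignored,
// because the stream has already terminated with onError.
fun failingStream(): Observable<Int> = Observable.create<Int> { emitter ->
    emitter.onNext(1)
    emitter.onNext(2)
    emitter.onNext(3)
    emitter.onError(IllegalStateException("sensor offline"))
    emitter.onNext(4)
}

// Subscribes with all three handlers and records the events in arrival order.
fun recordEvents(source: Observable<Int>): List<String> {
    val events = mutableListOf<String>()
    source.subscribe(
        { item -> events.add("onNext($item)") },
        { error -> events.add("onError(${error.message})") },
        { events.add("onComplete") }
    )
    return events
}

fun main() {
    recordEvents(failingStream()).forEach(::println)
}
```

The recorded list contains three onNext entries followed by a single onError entry and no onComplete entry.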

3.1.1 Observable

The example in Listing 1 shows how an observer subscribes to an observable in Kotlin. The observable is implemented in the getData function and emits Int values (e.g. sensor data from the internet). In this example, the observable simply consists of a fixed array of Int values (Listing 2). In line 3 of Listing 1 a new observer subscribes to the observable by implementing three lambda functions in the subscribe function:

1. onNext: This lambda function defines what to do with the next emitted value. Here the item is just written to the console. Kotlin provides "String templates", an easy way to evaluate expressions within a String. The programmer places a "$" sign before the expression that is to be evaluated.

2. onError: This lambda function defines what should be done if an error occurs.

3. onComplete: This lambda function defines what should be done if the observable completes.

Listing 1: Observer subscribing to Observable - Lambda

 1 fun main() {
 2     val dataStream: Observable<Int> = getData()
 3     dataStream.subscribe({ item: Int ->
 4         //onNext
 5         println("onNext($item)")
 6     }, { exception: Throwable ->
 7         //onError
 8         exception.printStackTrace()
 9     }, {
10         //onComplete
11         println("onComplete")
12     })
13 }

Listing 2: getData() function

    // Emits four fixed Int values and then completes.
    fun getData(): Observable<Int> {
        return Observable.just(2, 14, 6, 9)
    }

Using lambda functions is the usual way to implement observers. Another way is to create an anonymous class of the Observer interface and implement its methods (see Listing 3). However, as Kotlin has very good lambda support, lambdas are typically used.

Listing 3: Observer subscribing to observable - anonymous class

    fun main() {
        val dataStream = getData()
        dataStream.subscribe(object : Observer<Int> {
            override fun onComplete() {
                println("onComplete")
            }
            override fun onSubscribe(disp: Disposable) {
                println("onSubscribe")
            }
            override fun onNext(item: Int) {
                println("onNext($item)")
            }
            override fun onError(exc: Throwable) {
                exc.printStackTrace()
            }
        })
    }

The console output for the previous observable example is shown in Listing 4. First all four Int values are emitted by onNext and then the stream completes with onComplete.

Listing 4: Console output

    onNext(2)
    onNext(14)
    onNext(6)
    onNext(9)
    onComplete

3.1.2 Flowable

Flowables are largely the same as observables, with one difference: flowables implement backpressure strategies, observables do not. When items are produced more frequently than the subscribed observers can handle, an observable has no means of slowing the producer down, so items pile up or a backpressure error occurs. With a flowable, one of several backpressure strategies can be chosen. One of them is DROP, where onNext events are dropped as soon as the downstream cannot keep up with the flowable. Another strategy is BUFFER, which buffers all onNext values until the downstream can consume them [2].

In the mail processing system only flowables are used. Therefore, if too many mails are arriving, they are handled by buffering them.
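The difference between the DROP and the BUFFER strategy can be made visible with a producer that ignores the demand of its subscriber. The sketch assumes RxJava 3 (package io.reactivex.rxjava3); eagerSource and receivedWith are hypothetical helpers, and the TestSubscriber is only used to control the demand explicitly.

```kotlin
import io.reactivex.rxjava3.core.BackpressureStrategy
import io.reactivex.rxjava3.core.Flowable

// Emits 1..5 immediately, regardless of how many items the downstream requested.
fun eagerSource(strategy: BackpressureStrategy): Flowable<Int> =
    Flowable.create<Int>({ emitter ->
        for (i in 1..5) emitter.onNext(i)
        emitter.onComplete()
    }, strategy)

// Subscribes with an initial demand of two items and requests more afterwards.
fun receivedWith(strategy: BackpressureStrategy): List<Int> {
    val subscriber = eagerSource(strategy).test(2L)
    subscriber.request(10L) // late demand: only buffered items can still be delivered
    return subscriber.values().toList()
}

fun main() {
    println("DROP:   ${receivedWith(BackpressureStrategy.DROP)}")
    println("BUFFER: ${receivedWith(BackpressureStrategy.BUFFER)}")
}
```

With DROP the items that arrive while there is no demand are lost, whereas BUFFER keeps them until the subscriber requests more.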

3.1.3 Processor

A Processor is an observable and an observer at the same time. It functions like a bridge. On the one hand, it can subscribe to other observables and receive events. On the other hand, it serves as an observable to which other observers can subscribe. Usually it is used as a processing stage where further computation happens before values are emitted.

Figure 3.4: Processor

An example of how a processor is used is shown in Listing 5. A MulticastProcessor is created and started. This kind of processor allows multiple observers to subscribe. If an observer subscribes after items have been emitted, it has missed them. In the example in Listing 5, Observer1 receives "Line A", "Line B" and the onComplete event. Observer2, however, receives only "Line B" and onComplete (see Listing 6).

When an observable is initialized, the source where the items come from is also defined. The observable itself serves as the source. The initialization of a processor, however, only defines the type. Items can be emitted to the processor from any source (e.g. lines 8, 13 and 14 of Listing 5).

Listing 5: Processor example

 1 fun useProcessor() {
 2     val proc = MulticastProcessor.create<String>()
 3     proc.start()
 4     proc.subscribe(
 5         { println("Observer1: onNext($it)") },
 6         { println("Observer1: onError") },
 7         { println("Observer1: onComplete") })
 8     proc.onNext("Line A")
 9     proc.subscribe(
10         { println("Observer2: onNext($it)") },
11         { println("Observer2: onError") },
12         { println("Observer2: onComplete") })
13     proc.onNext("Line B")
14     proc.onComplete()
15 }

Listing 6: Processor - Output

    Observer1: onNext(Line A)
    Observer1: onNext(Line B)
    Observer2: onNext(Line B)
    Observer1: onComplete
    Observer2: onComplete

3.1.4 Maybe

Maybe is an observable object that can emit one of three different events: success, error or complete. A Maybe object can therefore emit either a single value (onSuccess), no value (onComplete) or an error (onError) [6]. Figure 3.5 visualizes the three possible events. Only one of the three events can be emitted, e.g. no onComplete event follows after onSuccess.

Figure 3.5: Maybe

The code example in Listing 7 shows the three different Maybe events. The first Maybe (line 2) emits an onSuccess event containing the value "Line A". The second Maybe (line 3) emits an onError event containing the Throwable instance. The third Maybe (line 4) emits an onComplete event without any value.

Listing 7: Maybe example

1 fun useMaybe() {
2     val maybeSuccess = Maybe.just("Line A")
3     val maybeError = Maybe.error<String>(Throwable("Error"))
4     val maybeComplete = Maybe.empty<String>()
5 }
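Listing 7 only creates the Maybe objects. How an observer reacts to each of the three events can be sketched as follows, assuming RxJava 3 (package io.reactivex.rxjava3); the function outcome is a hypothetical helper that returns the name of the event it received.

```kotlin
import io.reactivex.rxjava3.core.Maybe

// Subscribes with one handler per possible event and reports which one was called.
fun outcome(maybe: Maybe<String>): String {
    var result = "nothing signalled"
    maybe.subscribe(
        { value -> result = "onSuccess($value)" },
        { error -> result = "onError(${error.message})" },
        { result = "onComplete" }
    )
    return result
}

fun main() {
    println(outcome(Maybe.just("Line A")))
    println(outcome(Maybe.error<String>(Throwable("Error"))))
    println(outcome(Maybe.empty<String>()))
}
```

Exactly one of the three handlers runs for each Maybe, which corresponds to the mutually exclusive events shown in Figure 3.5.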

3.1.5 Completable

A Completable is an observable object which can emit either a complete event or an error event. No elements are emitted by a Completable object. Completable objects are generally used to check whether an operation has completed successfully or not (error). For example, when a network operation is executed, often a completable is returned which indicates whether the operation was successful or not [3].

Figure 3.6: Completable

In Listing 8 the two possible events of a completable are shown. The variable completableComplete holds a Completable object that emits the complete event if an observer subscribes. The variable completableError emits the error event instead.

Listing 8: Completable example

    fun useCompletable() {
        val completableComplete = Completable.complete()
        val completableError = Completable.error(Throwable("Error"))
    }
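A typical use of a Completable as the result of an operation can be sketched as follows. The example assumes RxJava 3 (package io.reactivex.rxjava3); store is a hypothetical operation that fails for empty content, and report is a hypothetical helper that evaluates the result.

```kotlin
import io.reactivex.rxjava3.core.Completable

// Hypothetical operation: "storing" succeeds unless the content is empty.
fun store(content: String): Completable =
    if (content.isEmpty()) {
        Completable.error(IllegalArgumentException("empty mail"))
    } else {
        Completable.fromAction { println("stored ${content.length} characters") }
    }

// Subscribes with a handler for each of the two possible events.
fun report(operation: Completable): String {
    var result = "pending"
    operation.subscribe(
        { result = "completed" },
        { error -> result = "failed: ${error.message}" }
    )
    return result
}

fun main() {
    println(report(store("Hello")))
    println(report(store("")))
}
```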

3.1.6 Operators

ReactiveX provides many operators that can be used to modify and transform observable streams. Some of them offer functionality similar to operators from Java Streams, but ReactiveX provides a greater variety. In the following, some important operators are described.

3.1.6.1 Map Operator

The map operator is one of the most commonly used operators. Its purpose is to apply any form of computation to each of the values emitted by the Observable and return an Observable with the results of the computation [12].

Figure 3.7: Map - Operator

For the following examples the function getData is used to create the data observable. It is defined in Listing 2. The visualization of the map operator in Figure 3.7 shows an operator which multiplies each emitted value by two.

Listing 9: Map - Example

1 fun useMap() {
2     val data = getData()
3     val doubleOutput = data.map { x -> x * 2 }
4     doubleOutput.subscribe { println(it) }
5 }

Listing 10: Map - Output

    4
    28
    12
    18

In line 3 of Listing 9 the map operator is implemented by applying "times 2" to each element of the data stream. The subscription to the observable in line 4 implements only the onNext handler with a lambda function, so onError and onComplete are not handled. This simplification should only be used in demonstration examples. In production all three handlers should be implemented, because errors are hard to find when the onError handler is missing. Listing 10 shows the output of the code example.
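Why the onError handler matters becomes apparent when the function passed to map can fail. The following sketch assumes RxJava 3 (package io.reactivex.rxjava3); divideInto is a hypothetical helper that divides a number by each emitted value and logs all events.

```kotlin
import io.reactivex.rxjava3.core.Observable

// Divides the dividend by every emitted divisor and records all three kinds of events.
fun divideInto(dividend: Int, divisors: Observable<Int>): List<String> {
    val log = mutableListOf<String>()
    divisors
        .map { divisor -> dividend / divisor } // throws ArithmeticException for 0
        .subscribe(
            { value -> log.add("onNext($value)") },
            { error -> log.add("onError(${error.javaClass.simpleName})") },
            { log.add("onComplete") }
        )
    return log
}

fun main() {
    println(divideInto(12, Observable.just(2, 3)))
    println(divideInto(12, Observable.just(2, 0, 3)))
}
```

An exception thrown inside map is delivered to the onError handler and terminates the stream, so the divisor 3 in the second call is never processed.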

3.1.6.2 Filter Operator

The filter operator applies a predicate function to each value of the observable stream. Only items that pass the predicate are emitted further.

Figure 3.8: Filter - Operator

The example of the filter operator in Listing 11 blocks all items that are greater than or equal to 10. Therefore only the values 2, 6 and 9 pass through the operator (see Listing 12).

Listing 11: Filter - Example

    fun useFilter() {
        val data = getData()
        val filteredData = data.filter { x -> x < 10 }
        filteredData.subscribe { println(it) }
    }

Listing 12: Filter - Output

    2
    6
    9

3.1.6.3 Scan Operator

The scan operator takes the emitted items and applies an accumulator function to them. It emits each result, but also feeds it into the computation of the next item as an input argument. In this way, the previously emitted result affects the current item.

Figure 3.9: Scan - Operator

In Listing 13 the data observable is transformed by the scan operator. Every previously emitted result is fed into the computation of the next item. The previous result and the current item are then aggregated and the new result is emitted. This is done until the stream ends. The resulting output is shown in Listing 14.

Listing 13: Scan - Example

fun useScan() {
    val data = getData()
    val res = data.scan { x: Int, y: Int -> x + y }
    res.subscribe { println(it) }
}

Listing 14: Scan - Output

2
16
22
31

3.1.6.4 Chaining Operators

Operators can be combined in the form of a chain. This is possible because every operator returns an observable and therefore new operators can be added with the dot notation.

When combining the map and the filter operator in the example, the values are first multiplied by two and then filtered so that only values smaller than 20 remain. This is shown in Figure 3.10. After applying both operators, only three values remain: the value 14 is multiplied by two, which gives 28, and the filter operator then drops this value, because it is greater than or equal to 20.

Figure 3.10: Chaining Operators

The code and its console output are shown in Listing 15 and Listing 16.

Listing 15: Chaining Operators - Example

fun chainingOperators() {
    val data = getData()
    val res = data
        .map { x -> x * 2 }
        .filter { x -> x < 20 }
    res.subscribe { println(it) }
}

Listing 16: Chaining Operators - Console output

4
12
18
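The outputs of Listings 10, 12, 14 and 16 can be cross-checked without ReactiveX by applying the corresponding Kotlin collection functions to a plain list. The following is a minimal sketch under the assumption that getData emits the values 2, 14, 6 and 9, which is what the listed outputs imply. The function names are chosen for illustration only, and runningReduce serves as the collection counterpart of a scan without an initial value.

```kotlin
// Assumption: getData() (Listing 2) emits these four values in this order.
val data = listOf(2, 14, 6, 9)

// Counterpart of Listing 9: multiply every value by two.
fun mapped(): List<Int> = data.map { x -> x * 2 }

// Counterpart of Listing 11: keep only values smaller than 10.
fun filtered(): List<Int> = data.filter { x -> x < 10 }

// Counterpart of Listing 13: emit every intermediate sum.
fun scanned(): List<Int> = data.runningReduce { acc, x -> acc + x }

// Counterpart of Listing 15: map first, then filter.
fun chained(): List<Int> = data.map { x -> x * 2 }.filter { x -> x < 20 }

fun main() {
    println(mapped())   // [4, 28, 12, 18]
    println(filtered()) // [2, 6, 9]
    println(scanned())  // [2, 16, 22, 31]
    println(chained())  // [4, 12, 18]
}
```

In contrast to the observable versions, the list functions compute all results eagerly and synchronously. They only mirror the transformation logic, not the event-based delivery.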

3.1.7 Observable vs. Iterable

Java Iterable provides a way of handling sequences of data synchronously in a pull-based approach, which is blocking. So if we consider the next function of an Iterable, the program tries to pull data from the Iterable stream and blocks until the data is delivered [11].

Observables, on the other hand, are asynchronous and push-based. The Observer subscribes to the Observable with certain handler functions. The program can then continue. When a new item is ready to be emitted to the Observable stream, it is pushed to the Observers by calling their onNext function. Items can then be processed whenever they arrive (asynchronously) [11].

|              | single items        | multiple items          |
|--------------|---------------------|-------------------------|
| synchronous  | T getData()         | Iterable<T> getData()   |
| asynchronous | Future<T> getData() | Observable<T> getData() |

Table 3.1: Positioning of Observables [11]

Additionally, Observables are very flexible in handling the three major events: retrieve data, discover error and complete. In the Observable implementation, each of them has a dedicated handler that is provided by an Observer. For Iterables, however, the programmer has to take care of them manually, as Table 3.2 shows [11].

| event          | Iterable (pull)  | Observable (push)  |
|----------------|------------------|--------------------|
| retrieve data  | next()           | onNext(T)          |
| discover error | throws Exception | onError(Exception) |
| complete       | !hasNext()       | onComplete()       |

Table 3.2: Comparison - Iterable vs. Observable [11]
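The difference between the two columns of Table 3.2 can be illustrated with a small sketch. SimpleObserver, consumeByPulling and pushTo are hypothetical names introduced for this illustration and are not part of ReactiveX. The sketch runs synchronously; it only shows who drives the control flow, not the asynchronous delivery of a real Observable.

```kotlin
// Hypothetical minimal observer with the three handlers from Table 3.2.
class SimpleObserver<T>(
    val onNext: (T) -> Unit,
    val onError: (Throwable) -> Unit,
    val onComplete: () -> Unit
)

// Pull: the consumer drives the loop and handles error and completion itself.
fun <T> consumeByPulling(source: Iterable<T>, log: MutableList<String>) {
    try {
        val iterator = source.iterator()
        while (iterator.hasNext()) {            // complete: !hasNext()
            log.add("next ${iterator.next()}")  // retrieve data: next()
        }
        log.add("complete")
    } catch (e: Exception) {                    // discover error: thrown exception
        log.add("error ${e.message}")
    }
}

// Push: the producer drives the loop and calls the observer's handlers.
fun <T> pushTo(items: Iterable<T>, observer: SimpleObserver<T>) {
    try {
        for (item in items) observer.onNext(item)
        observer.onComplete()
    } catch (e: Exception) {
        observer.onError(e)
    }
}

fun main() {
    val pulled = mutableListOf<String>()
    consumeByPulling(listOf(2, 14), pulled)
    println(pulled)

    val pushed = mutableListOf<String>()
    pushTo(listOf(2, 14), SimpleObserver<Int>(
        onNext = { pushed.add("next $it") },
        onError = { pushed.add("error ${it.message}") },
        onComplete = { pushed.add("complete") }
    ))
    println(pushed)
}
```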

3.1.8 Hot and Cold Observables

There are two types of observables, hot and cold observables. Hot observables may emit items as soon as they are created. Therefore, an observer that subscribes later on might miss some items that have already been emitted. Cold observables start emitting items only when an observer has subscribed. Therefore no item can be missed, at least by the first subscriber.
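The following sketch illustrates the difference with two hand-written sources. HotSource and ColdSource are hypothetical helper classes, not ReactiveX types, and the cold variant assumes the common case in which every subscription replays the complete sequence.

```kotlin
// Hypothetical hot source: an item reaches only the observers registered when it is emitted.
class HotSource<T> {
    private val observers = mutableListOf<(T) -> Unit>()
    fun subscribe(observer: (T) -> Unit) { observers.add(observer) }
    fun emit(item: T) { observers.forEach { it(item) } }
}

// Hypothetical cold source: emission starts on subscription, from the first item.
class ColdSource<T>(private val items: List<T>) {
    fun subscribe(observer: (T) -> Unit) { items.forEach(observer) }
}

fun main() {
    val hot = HotSource<Int>()
    val early = mutableListOf<Int>()
    val late = mutableListOf<Int>()
    hot.subscribe { early.add(it) }
    hot.emit(2)
    hot.emit(14)
    hot.subscribe { late.add(it) } // subscribes after two items were emitted
    hot.emit(6)
    println("hot:  early=$early late=$late")

    val cold = ColdSource(listOf(2, 14, 6))
    val first = mutableListOf<Int>()
    val second = mutableListOf<Int>()
    cold.subscribe { first.add(it) }
    cold.subscribe { second.add(it) }
    println("cold: first=$first second=$second")
}
```

The late observer of the hot source only receives the last item, whereas both observers of the cold source receive the complete sequence.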

3.2 Eclipse Vert.x

Eclipse Vert.x represents a toolkit for programming asynchronous and non-blocking network applications. Like ReactiveX, Vert.x is a polyglot implementation and supports multiple languages, e.g. Java, Kotlin, JavaScript and a few more. Vert.x is based on the Netty project [8], an asynchronous event-driven network application framework for the JVM, but Vert.x provides a higher-level API for easier use [1].

The classic strategy for handling network communications is to assign each connecting socket a separate thread which deals with the client until it disconnects. For small projects this simple threading strategy works just fine. However, when systems scale up, the overhead of handling every socket in its own thread becomes too high, because threads are not cheap in memory and thread scheduling becomes more complex. Vert.x offers a solution. It heavily relies on event-driven and non-blocking mechanisms and uses concepts like event loops and verticles. Therefore it can handle more concurrent network connections with fewer threads than the classic approach [1].

A blocking mechanism is described as a long lasting operation (usually IO operations), where the thread blocks and waits until the operation is done. Either the result is returned or an exception is thrown directly. The non-blocking mechanism however specifies handler functions for the operation. The operation is executed in the background (e.g. a file system operation), while the thread is not blocked by it. When the operation is completed one of the handler functions is called, either the success or error handler. [19]

3.2.1 Concepts

Verticles are chunks of code that run on a Vertx instance. By deploying multiple verticles, systems can scale up easily. Consider, for example, a ServerVerticle which handles incoming clients. If at any point in time too many clients are connecting, the system can simply deploy additional ServerVerticles to balance the load between them. [1]

Incoming events (e.g. connecting clients) are usually forwarded to verticles by event loops (see Figure 3.11). An event loop takes care of the event queue and forwards new events to a verticle, if it is not busy. This pattern is well-known from asynchronous programming.

Figure 3.11: Vertx - Event loop [1]

The event loop is meant to forward a great number of events to verticles in a short period of time. Verticles, however, must not block the event loop for too long, because otherwise the event loop would lose its purpose. If long-lasting or blocking code has to be performed, the code is executed on a worker thread instead of an event loop thread.

In Vert.x every event loop is attached to a single thread. Because each verticle is assigned to one event loop only, it processes the events single-threaded and therefore no thread coordinating code is needed. Usually when a new Vertx instance is created, N event loop threads are created, where N is the number of CPU threads times two. As an example, if the CPU has 4 cores and 8 hyper threads, then the Vertx instance will create 16 event loop threads.
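The rule above can be written down as a one-line calculation. The function name defaultEventLoopThreads is hypothetical and only restates the rule from the text; it does not query Vert.x itself.

```kotlin
// Rule from the text: N event loop threads = 2 x number of CPU threads.
fun defaultEventLoopThreads(cpuThreads: Int): Int = 2 * cpuThreads

fun main() {
    // Example from the text: 4 cores with 8 hyper threads.
    println(defaultEventLoopThreads(8))

    // Number of hardware threads the JVM reports on the current machine.
    val cpuThreads = Runtime.getRuntime().availableProcessors()
    println(defaultEventLoopThreads(cpuThreads))
}
```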

3.2.2 Example

When working with Vert.x, it is necessary to create a Vertx instance (Listing 17). The Vertx instance is the control point of every operation concerning Vert.x, for example, deploying new verticles.

Listing 17: Creating Vertx instance

val vertx = Vertx.vertx()

For this example, a simple TCP server is created which communicates with the clients (see Listing 18). A ServerVerticle is defined, which creates a NetServer that is listening on port 54321. In line 6 a connection handler is supplied to the NetServer. Every newly connected client is handled by this lambda function. It receives a NetSocket as argument, for which further handlers can be specified. In line 7, a data handler is specified for handling incoming data. In line 10 an exception handler is specified which handles possible exceptions. In line 13 a close handler is specified, where some final code can be executed when the connection is closed.

Listing 18: Creating the NetServer

 1 class ServerVerticle : AbstractVerticle() {
 2     override fun start() {
 3         val netServerOptions = NetServerOptions()
 4         netServerOptions.port = 54321
 5         val netServer = vertx.createNetServer(netServerOptions)
 6         netServer.connectHandler { socket: NetSocket ->
 7             socket.handler { buffer ->
 8                 println("Received data: $buffer")
 9             }
10             socket.exceptionHandler { exception ->
11                 println("Exception: ${exception.message}")
12             }
13             socket.closeHandler {
14                 println("Connection closed")
15             }
16         }
17         netServer.listen()
18     }
19 }

When more processing power is needed, additional ServerVerticles can be deployed (Listing 19). New verticles are assigned to other event loop threads, which leads to multi-threaded processing. In this way systems can scale up easily.

Listing 19: Deploying a ServerVerticle

vertx.deployVerticle(ServerVerticle())

4 Mail Theory

4.1 SMTP

SMTP (Simple Mail Transfer Protocol) is a communication protocol for sending mails. The first version was defined in RFC 821 [15] in 1982. The current specification is RFC 5321 [13], published in 2008. It covers the Extended Simple Mail Transfer Protocol (ESMTP), which adds various extensions to SMTP that are used for improved security and added functionality.

An example client-server conversation according to ESMTP is shown in Figure 4.1. In this conversation one mail is delivered and afterwards the conversation is closed.

Figure 4.1: Example - SMTP Protocol [17]

The communication between client and server (Figure 4.1) works as follows:

1. The client connects to the server.

2. The server replies with code 220 “Service ready”. The client can continue.

3. The client sends an EHLO command, telling the server its name. In RFC 821 there was just a HELO command, but the extensions required a new command (EHLO = Extended Hello).

4. The server replies with code 250 “Server greeting client”. Here the server can add additional settings that the server provides, like “8BITMIME” data transfer or max file size “SIZE”. Multiline responses have a dash (“-”) instead of a space between status code and text.

5. The client announces the sender address with the command “MAIL FROM:<...>”.

6. The server replies with code “250 OK”

7. The client announces a recipient address with the command “RCPT TO: <...>”.

8. Again the server replies with “250 OK”. The recipient command can be sent multiple times, and the server has to confirm each one.

9. The client announces that it will send data with the “DATA” command.

10. The server replies with “354 start mail input”, telling the client that it is ready to receive the mail content.

11. The client now sends all the data, where each line is sent separately and terminated by CRLF. The data part ends with a line containing only a dot (“.”). The format of the data part is defined by RFC 5322 and RFC 2045-2049 (see 4.2 Internet Message Format and 4.3 MIME Mail).

12. The server replies with code “250 OK”.

13. The client ends the session with the command “QUIT”.

14. The server closes the channel with the response “221 closing channel”.
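The reply format used in the steps above (a three-digit status code, followed by a dash for continuation lines or a space for the last line) can be parsed as sketched below. SmtpReply and parseReplyLine are hypothetical names for this illustration and the host name in the example is made up. The sketch is not taken from the thesis implementation, which acts as the server side of the conversation.

```kotlin
// Hypothetical value type for one reply line.
data class SmtpReply(val code: Int, val isLast: Boolean, val text: String)

// Parses e.g. "250-8BITMIME" (more lines follow) or "250 OK" (last line of the reply).
// Returns null if the line does not start with a three-digit status code.
fun parseReplyLine(line: String): SmtpReply? {
    val codeText = line.take(3)
    if (codeText.length < 3 || !codeText.all { it in '0'..'9' }) return null
    // A dash after the code marks a continuation line, a space (or nothing) the last line.
    val separator = line.getOrNull(3)
    if (separator != null && separator != ' ' && separator != '-') return null
    return SmtpReply(codeText.toInt(), isLast = separator != '-', text = line.drop(4))
}

fun main() {
    val ehloResponse = listOf(
        "250-mail.example.org greets client",
        "250-8BITMIME",
        "250 SIZE 10240000"
    )
    ehloResponse.map(::parseReplyLine).forEach(::println)
}
```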

4.2 Internet Message Format

The Internet Message Format (IMF) defines what the content of a mail has to look like. The standard is described in RFC 5322 [14] of the year 2008. It is an updated version of the original RFC 822 [16] of the year 1982.

A simple mail message consists of two building blocks, a header and a body. The header contains important information about the mail, e.g. the sender, the receiver, the date and how the body is structured. The body contains the content of the mail. A simplified visualization of a mail is shown in Figure 4.2.

Figure 4.2: Simple mail visualization

A simple mail according to the original RFC 822 standard could consist of the lines visible in Figure 4.3. The header block contains the minimum header fields (Date, From, To). Then an empty line follows, indicating the end of the header. The body contains simple ASCII characters only.

Figure 4.3: Example - RFC 822

The RFC 822 standard of 1982 was developed when only text messages were relevant. Therefore, the body of these messages was intended to deliver US-ASCII characters only. As other file types became more important, the requirement arose that emails can carry attachments of any file type as well. Therefore, the RFC 2045-2049 standards of 1996 introduced new elements and structures. RFC 5322 of 2008 is the current revision of the base message format.

4.3 MIME Mail

Multipurpose Internet Mail Extensions (MIME), specified in RFC 2045-2049, were developed in 1996. They introduced new elements for the Internet Message Format that did not interfere with the existing message format, but enabled other character sets and multimedia files to be sent.

The standard defines additional header fields, where the most important ones are:

• MIME-Version: The MIME version header field that declares the actual MIME version.

• Content-Type: The content type field specifies the media type of the data in the body of a message.

• Content-Transfer-Encoding: This field describes how the data in the body of a message is encoded.

4.3.1 MIME Structure

A MIME mail is identified by its MIME-Version header field, which is set in the main header. Usually the Content-Type field is also set and helps to decide how the body block has to be processed [7]. The Content-Type field can carry different media types. Some important ones are:

• Multipart: The body of the mail contains multiple blocks where each block again has a header and a body.

  – multipart/mixed: The blocks do not relate to each other, but do have a certain order.
  – multipart/alternative: Each of the blocks contains the same content, but in different formats.
  – multipart/related: The blocks relate to each other and are only useful when they are combined.

• Text: The body of the mail contains only text.

  – text/plain: The body is plain text.
  – text/html: The body is HTML text.

• Image: The body of the mail contains image data.

  – image/png
  – image/jpeg

However, there are a lot more types that can be set as the Content-Type field.

To present an example, Figure 4.4 shows a main block with Content-Type "multipart/mixed", which has two sub blocks. One block has Content-Type "text/plain" and one has Content-Type "image/png".

Figure 4.4: MIME - Multipart

To show what such an email looks like, the raw email text corresponding to Figure 4.4 is printed in Figure 4.5. The brackets on the right side visualize the main "multipart/mixed" block, which refers to the top-most block in Figure 4.4. The brackets on the left visualize the bottom two blocks, "text/plain" and "image/png". These two blocks are located within the body block of the "multipart/mixed" block. The blocks are separated by a boundary ID, which is defined in the Content-Type of the main block. The boundary ID can be any sequence of characters, but it has to be unique.

Figure 4.5: MIME - Multipart code
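To make the role of the boundary ID concrete, the following sketch contains a small "multipart/mixed" mail as a list of lines and splits its body into the two body parts. The addresses, the boundary ID "frontier42" and the shortened image data are made up for this illustration, and splitParts is a hypothetical helper, not part of the thesis implementation.

```kotlin
// Hypothetical sample mail: main header, then two body parts separated by the boundary ID.
val sampleMail = listOf(
    "From: alice@example.org",
    "To: bob@example.org",
    "MIME-Version: 1.0",
    "Content-Type: multipart/mixed; boundary=\"frontier42\"",
    "",
    "--frontier42",
    "Content-Type: text/plain",
    "",
    "Hello Bob, the picture is attached.",
    "--frontier42",
    "Content-Type: image/png",
    "Content-Transfer-Encoding: base64",
    "",
    "iVBORw0KGgo=",          // shortened placeholder, not a complete image
    "--frontier42--"
)

// Collects the lines of every body part; lines outside of body parts are skipped.
fun splitParts(lines: List<String>, boundary: String): List<List<String>> {
    val parts = mutableListOf<MutableList<String>>()
    var current: MutableList<String>? = null
    for (line in lines) {
        if (line == "--$boundary") {            // opening delimiter: a new body part starts
            val part = mutableListOf<String>()
            parts.add(part)
            current = part
        } else if (line == "--$boundary--") {   // closing delimiter: the multipart ends
            current = null
        } else {
            current?.add(line)
        }
    }
    return parts
}

fun main() {
    splitParts(sampleMail, "frontier42").forEachIndexed { index, part ->
        println("part ${index + 1}: ${part.first()} (${part.size} lines)")
    }
}
```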

With the multipart type any kind of mail tree can be built. A complex example is shown in Figure 4.6. The main header has Content-Type "multipart/mixed" and there are multiple sub-blocks in its body. The first block is a "multipart/alternative" block, the second and third ones are attachments, one PDF file and one PNG image. The "multipart/alternative" block means that there are multiple blocks that provide the same content in different formats. The first block is the plain text, the second one is a "multipart/related" block. It contains one block that provides the HTML code and one block that provides a PNG image which the HTML block refers to.

Thus, this mail contains two attachments, a PDF file and an image, and it has a text and an HTML representation of the content. The receiving mail program can decide which presentation of the content it prefers. Usually the alternatives are ordered from top to bottom, where the top one has no formatting and the bottom one has the most formatting.

Figure 4.6: MIME - Complex

5 Architecture

The architecture had to be designed in a way that the reactive concept and non-blocking mechanisms are utilized throughout each component of the system. It was also required that the mails can be accessed on the level of raw lines, but also on the level of a block structure as shown in 4.3 MIME Mail. In Figure 5.1 the architecture of the mailing system is shown. It visualizes the individual parts of the system and how the data flows through them.

The process is initiated by a client connecting to the server, which represents an incoming mail (Server In). A running NetServer verticle (implemented with Vert.x - see 3.2 Eclipse Vert.x) handles incoming connections. The mail is received in buffers of bytes. Each buffer can hold a sequence of bytes.

The SMTP In component splits up these sequences of bytes into lines ending with CRLF (Carriage Return + Line Feed). Important data is extracted and further processed. When the DATA part of SMTP (see 4.1 SMTP) is received, a Flowable (see 3.1 ReactiveX) is used to forward each line by events to the MailParser.

The MailParser is responsible for parsing the DATA part of the SMTP communication. Therefore, it has to parse the actual mail content (see 4.3 MIME Mail). The parsing process is done line-by-line. The lines are analyzed with the help of a state machine. Whenever a new line is emitted by the Flowable, the MailParser analyzes it and stores the current state. If a new part is completely received and parsed, it is stored in a MimeBlock instance, which is then further emitted in the outgoing Flowable.

The MailStream class processes the Flowable. It builds the interface to access all different mail parts. Its implementation mainly focuses on processing the Flowable and extracting all the different information. However, it is also capable of modifying mails, for example, changing the header of a mail.

The central MailStream interface is used by different services. The MailStorageService, for example, uses the interface to store the different mail parts. It stores the complete mail as a file of bytes on the file storage. It also stores a simple HTML representation of the mail on the file storage, as well as all the attachments in their specified file types. In addition, it stores important mail meta information in the database for quick access and for locating the actual mail on the file storage.

The next step, where the MailStream interface is used, is the MailSendingService. It is, as the name suggests, responsible for sending mails. For example, if automatic mail forward is enabled, the mail has to be forwarded. Therefore the header has to be replaced, which is done by the MailStream interface. Then the modified mail can be sent via SMTP Out.

For the SMTP Out implementation, the Vert.x mail client is used. However, the original Vert.x mail client implementation is not reactive. Therefore a fork of the GitHub library was necessary, where the SMTP Out protocol was modified to send the mail lines in a reactive way.

This is the usual process of receiving a mail. Figure 5.1, however, also shows that a mail can be processed by reading a file from the operating system (OS). Yet another way to process a mail would be to use the MailStreamFactory to build a new mail based on some input data.

Figure 5.1: Architecture

6 Implementation

In this chapter the different components of the system as described in Chapter 5 are explained in detail.

6.1 SMTP In

As explained in Chapter 5, the SMTP In component is responsible for converting the received bytes into mail lines. The protocol is implemented in the class SmtpSessionNew. Before a mail can be received through SMTP, a server has to be started to which the client can connect. This process is described in Section 3.2.2. For each new client a new SmtpSessionNew instance is allocated. The existing Vertx instance and the NetSocket instance of the client are passed to the SmtpSessionNew instance. The initializer of SmtpSessionNew registers the handlers to the NetSocket (see Listing 20). It also sends a welcome message to the client (line 7) which is the first message of the SMTP transaction. The welcome message contains the status code 220 and an arbitrary text message.

Listing 20: SmtpSessionNew - initializer

1 init {
2     socket.handler(this::handleBuffer)
3     socket.exceptionHandler {
4         log.error("SmtpSessionNew - Error when reading from NetSocket stream! ${it.message}")
5     }
6     socket.endHandler { subject.onComplete() }
7     sendWelcome()
8 }

All further actions are usually triggered by the handler handleBuffer which receives the bytes from the client. The handleBuffer method buffers all bytes until a CRLF (Carriage Return + Line Feed) is found, because that represents the end of a line. The concatenated line is then treated further by the handleLine method.

Listing 21: SmtpSessionNew - handleBuffer

fun handleBuffer(buffer: Buffer) {
    try {
        // r remembers whether the previous byte was a carriage return (R)
        var r = false
        for (i in 0 until buffer.length()) {
            val b = buffer.getByte(i)
            curBuffer.appendByte(b)
            // reject overlong lines to protect the server
            if (curBuffer.length() > 10000) {
                sendFatalError(500, "line too long, are you DOSing me?")
                return
            }
            // a line feed (N) directly after a carriage return completes a line
            if (b == N && r) {
                handleLine(curBuffer)
                curBuffer = initBuffer()
            }
            r = b == R
        }
    } catch (ex: Exception) {
        ex.printStackTrace()
        log.error("handleBuffer: ${ex.message}", ex)
        socket.close()
    }
}
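The line splitting can also be sketched independently of Vert.x. The class LineSplitter below is a hypothetical stand-alone variant that works on plain byte arrays instead of Vert.x Buffer objects. As a deliberate difference to Listing 21, it keeps the carriage-return flag as a field, so that a CRLF which is split across two network chunks is still recognised.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical stand-alone splitter; not the thesis implementation.
class LineSplitter(private val maxLineLength: Int = 10_000) {
    private val current = ByteArrayOutputStream()
    private var previousWasCr = false

    // Feeds one network chunk and returns every line completed by it (CRLF included).
    fun feed(chunk: ByteArray): List<String> {
        val lines = mutableListOf<String>()
        for (b in chunk) {
            current.write(b.toInt())
            require(current.size() <= maxLineLength) { "line too long" }
            // A line feed directly after a carriage return completes a line.
            if (b == LF && previousWasCr) {
                lines.add(String(current.toByteArray(), Charsets.US_ASCII))
                current.reset()
            }
            previousWasCr = b == CR
        }
        return lines
    }

    private companion object {
        const val CR: Byte = 13 // '\r'
        const val LF: Byte = 10 // '\n'
    }
}

fun main() {
    val splitter = LineSplitter()
    // The CRLF of the first line is split across two chunks.
    println(splitter.feed("EHLO client\r".toByteArray(Charsets.US_ASCII)).map { it.trimEnd() })
    println(splitter.feed("\nMAIL FROM:<a@example.org>\r\nRC".toByteArray(Charsets.US_ASCII)).map { it.trimEnd() })
}
```

The incomplete rest of the second chunk ("RC") stays in the internal buffer until the remaining bytes of that line arrive.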

Depending on the state in which the session currently is, not all actions are allowed at any time. The handleLine method defines what actions are allowed in which state and executes the proper one (see Listing 22). For example, if the session is in state EHLO, only the commands MAIL (= MAIL FROM) or QUIT are allowed. If the client sends anything other than that, the server sends back an error code.

Listing 22: SmtpSessionNew - handleLine

private fun handleLine(buffer: Buffer) {
    when (state) {
        States.UNINITIALIZED -> handleCommand(buffer, "quit")
        States.INITIALIZED -> handleCommand(buffer, "ehlo", "helo", "quit")
        States.EHLO -> handleCommand(buffer, "mail", "quit")
        States.MAIL -> handleCommand(buffer, "rcpt", "quit")
        States.RCPT -> handleCommand(buffer, "rcpt", "data", "quit")
        States.DATA -> handleActualData(buffer)
    }
}
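The same rules can be expressed as a lookup table, which makes the allowed commands per state easy to test in isolation. The following is an illustrative alternative and not the thesis code: the state names mirror Listing 22, while allowedCommands and isAllowed are hypothetical names, and the extraction of the command word from the line is an assumption about how handleCommand might compare commands.

```kotlin
// State names as in Listing 22.
enum class States { UNINITIALIZED, INITIALIZED, EHLO, MAIL, RCPT, DATA }

// Commands that are accepted in each state (DATA is handled separately).
val allowedCommands: Map<States, Set<String>> = mapOf(
    States.UNINITIALIZED to setOf("quit"),
    States.INITIALIZED to setOf("ehlo", "helo", "quit"),
    States.EHLO to setOf("mail", "quit"),
    States.MAIL to setOf("rcpt", "quit"),
    States.RCPT to setOf("rcpt", "data", "quit")
)

// Returns true if the command word at the start of the line is allowed in the given state.
fun isAllowed(state: States, line: String): Boolean {
    // In state DATA every line is mail content, not a command.
    if (state == States.DATA) return true
    val command = line.trim().substringBefore(' ').lowercase()
    return command in allowedCommands.getValue(state)
}

fun main() {
    println(isAllowed(States.EHLO, "MAIL FROM:<a@example.org>\r\n")) // true
    println(isAllowed(States.EHLO, "RCPT TO:<b@example.org>\r\n"))   // false
}
```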

When the DATA part of the SMTP conversation is reached, the ReactiveX processor (see 3.1.3 Processor) starts to play an important role. Every line which is received within the DATA state is added to a Processor called "lineStream" (see line 8 in Listing 23). Before the DATA part starts, an instance of the MailParser class is created and subscribes to the lineStream. Each line which is emitted to the lineStream processor is immediately passed to the MailParser, where it is analyzed and parsed.

The DATA part ends if a line with just a dot is received. Then an onComplete event is triggered on the lineStream processor (line 3). The MailParser knows that the mail input has ended and can finish parsing the mail. The state of the SmtpSessionNew class is reset to States.EHLO, because further mails could follow. Then an OK message is sent back to the client and the buffers are reset.

Listing 23: SmtpSessionNew - handleActualData

 1 private fun handleActualData(buffer: Buffer) {
 2     if (dataEnd.equals(buffer)) {
 3         lineStream.onComplete()
 4         state = States.EHLO
 5         sendOK()
 6         initBuffers()
 7     } else {
 8         lineStream.onNext(buffer)
 9     }
10 }

6.2 MailParser

The MailParser is responsible for parsing the DATA part of the SMTP conversation, i.e. the mail content. The MailParser uses some other classes for the parsing process. Let’s cover these classes first before explaining the parsing process.

6.2.1 Block hierarchy

A hierarchy of Block classes (Figure 6.1) was needed to store the raw and parsed data, because mails are built out of blocks.

A Block is the abstract class of the block classes. It contains a property type of type BlockType, a property from and to of type Int, which describe from which line to which line of the complete mail the block reaches, and it has a boolean flag isMainHeader, which indicates if the block is the main header block. The BlockType is an enumeration with the values: undefined, html, text, inline, attachment.

A MimeBlock represents a few lines of the mail like a header of a mail or a body, for example. The lines are stored in an IReactiveStorage instance, which is the key class for storing elements in the reactive mail processing system. It also stores a parsed header instance if the block is a header block, otherwise the field is null. Instances of this class are the result of the MailParser class (compare Figure 5.1).

A DoubleMimeBlock represents a block containing a header and a body block. This class refers to the representation in Figure 4.2. It combines two separated blocks to one connected block. The DoubleMimeBlock enables access to decoded body blocks, because only with the suitable header block, the encoded body can be decoded.

Figure 6.1: Class diagram - Block hierarchy

6.2.2 ReactiveStorage

Every email consists of many lines, and the lines are grouped into blocks. The observers which are interested in the mail data want to subscribe to certain mail information and want to be notified by an event when it becomes available. However, the moment when an observer subscribes can vary, and later subscribers also need to receive all the items (e.g. the lines), even if some have already been emitted (cold observable). Therefore a storage solution was needed which can store data items, but also reactively emit them by events. It is called ReactiveStorage. When an observer subscribes, it receives a concatenated Flowable of already stored items from the ArrayList and future items from the Processor.

The IReactiveStorage interface (see Figure 6.2) defines all the methods that a "reactive storage" has to master. It is generic, because it is used to store mail lines (Strings) as well as blocks (MimeBlocks).

Figure 6.2: Class diagram - ReactiveStorage

The underlying implementation is realized with an ArrayList in the class ReactiveStorageWithList. When a new instance of ReactiveStorageWithList is created, a new ArrayList and a MulticastProcessor are initialized. The MulticastProcessor allows multiple Observers to subscribe to it. In the init block (line 7-14 of Listing 24), the processor is started. An observer is subscribed to the processor which stores every item that is emitted to the processor in the ArrayList. In that way the ArrayList always contains all items that were emitted by the processor.

When the add method of ReactiveStorageWithList is called, it increments the counter and emits a new element to the processor (line 16-21 of Listing 24). The processor then adds the element to the ArrayList. However, this is only done, if the processor has not already completed.

When the elements of the ReactiveStorageWithList are required, the getAll method is called (line 23-28 of Listing 24). It creates a new Flowable by concatenating the ArrayList with the MulticastProcessor (line 26). In that way, the already emitted elements, which are stored in the ArrayList, are combined with the not yet received elements, which will be emitted by the processor in the future.

The defer operator is necessary to make sure that every subscriber of the flowable which is returned from the getAll method really receives all lines. Without the defer operator every subscriber would receive the same flowable as the first subscriber. This means that every subscriber would receive the lines that were in the ArrayList when the first subscriber subscribed, plus the lines that are newly emitted by the processor. However, if the processor has emitted lines between the first subscription and the current subscription, the current subscriber would miss these lines. With the defer operator, every subscriber receives a new flowable which is created just before the subscription. Therefore it contains the most current data.

Listing 24: ReactiveStorageWithList

 1 class ReactiveStorageWithList<T> : IReactiveStorage<T> {
 2     private var list: ArrayList<T>? = ArrayList()
 3     private val proc: MulticastProcessor<T> = MulticastProcessor.create()
 4     private val disposable: Disposable
 5     private var counter = 0
 6
 7     init {
 8         proc.start()
 9         disposable = proc.subscribe({
10             list!!.add(it)
11         }, {
12             throw it
13         })
14     }
15
16     override fun add(element: T) {
17         if (!proc.hasComplete()) {
18             counter++
19             proc.onNext(element)
20         }
21     }
22
23     override fun getAll(): Flowable<T> {
24         return Flowable.defer {
25             val iterable = if (list == null) emptyList<T>() else ArrayList(list)
26             Flowable.fromIterable(iterable).concatWith(proc)
27         }
28     }
29 }
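The idea of replaying stored items and then delivering future items can also be shown without RxJava. ReplayStorage below is a hypothetical, single-threaded analogue of ReactiveStorageWithList that uses plain callbacks instead of a Flowable. It omits backpressure, error handling and thread safety, so it is a sketch of the concept rather than a replacement.

```kotlin
// Hypothetical, single-threaded analogue of ReactiveStorageWithList.
class ReplayStorage<T> {
    private val stored = mutableListOf<T>()
    private val observers = mutableListOf<(T) -> Unit>()
    private var completed = false

    // Stores the element and forwards it to all current observers.
    fun add(element: T) {
        if (completed) return // corresponds to the hasComplete() check in Listing 24
        stored.add(element)
        observers.forEach { it(element) }
    }

    fun complete() { completed = true }

    // Replays the stored elements first, then registers for future elements.
    fun subscribe(observer: (T) -> Unit) {
        stored.forEach(observer)
        if (!completed) observers.add(observer)
    }
}

fun main() {
    val storage = ReplayStorage<String>()
    val early = mutableListOf<String>()
    val late = mutableListOf<String>()

    storage.subscribe { early.add(it) }
    storage.add("line 1")
    storage.add("line 2")
    storage.subscribe { late.add(it) } // subscribes after two lines were stored
    storage.add("line 3")

    println(early) // [line 1, line 2, line 3]
    println(late)  // [line 1, line 2, line 3]
}
```

Because the stored elements are read at subscription time, the late subscriber sees the same sequence as the early one, which is the effect that defer provides in Listing 24.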

6.2.3 MimeLines

The class MimeLines was developed to manage the adding of blocks to the reactive storage more easily. It uses an instance of IMailStorage. However, it does not store String items, but rather MimeBlock items. Each block is added by the appendBlock method. With the methods getLines or getBlocks one can access the stored lines or blocks. It also takes care of the correct indices of the blocks (from and to property of each block).

Figure 6.3: Class diagram - MimeLines

6.2.4 Parsing Process

Now that the classes are explained, let’s address the MailParser and its parsing process. The goal of the MailParser is to incrementally parse the lines of the mail content (line-by-line) and emit intermediate parsing results (MimeBlocks) to the MailStream via a stream of MimeBlocks (Flowable<MimeBlock>).

The parsing process is started by the parse method of the MailParser class (see Listing 25). A stream of lines (Flowable<String>) is passed as input data. An observer which handles incoming lines subscribes to the stream. Each line is handled by the handleState method which parses the line depending on the current state. The current state (curState) as well as an instance of MimeLines and an instance of MimeBlock are also allocated (line 1-3). The curBlock variable keeps track of the currently processed block and the lines variable manages all the blocks.

Listing 25: MailParser - parse

 1 private var curBlock: MimeBlock = MimeBlock()
 2 private val lines: MimeLines = MimeLines()
 3 private var curState: MultipartState = MultipartState.START_MESSAGE
 4
 5 fun parse(stream: Flowable<String>): Flowable<MimeBlock> {
 6     stream.subscribe({ line ->
 7         ...
 8         handleState(line)
 9         ...
10     }, {
11         logger?.error("MailParser.kt - parse(): Error while listening to mail stream.\n${it.message}\n${it.stackTrace}")
12     }, {
13         curBlock.onComplete()
14         lines.onComplete()
15     })
16     return lines.getBlocks()
17 }

The handleState method takes care of all the different mail states a MIME mail can have. Depending on the current mail state, a proper method for parsing the line is executed. Listing 26 shows the method implemented with the when control structure of Kotlin, which is similar to the Java switch statement.

Listing 26: MailParser - handleState

private fun handleState(line: String) {
    when (curState) {
        MultipartState.START_MESSAGE -> startMessage(line)
        MultipartState.START_MULTIPART -> startMultipart(line)
        MultipartState.PREAMBLE -> preamble(line)
        MultipartState.START_BODYPART -> startBodyPart(line)
        MultipartState.START_HEADER -> startHeader(line)
        MultipartState.FIELD -> parseField(line)
        MultipartState.END_HEADER -> endHeader(line)
        MultipartState.BODY -> body(line)
        MultipartState.END_BODYPART -> endBodyPart(line)
        MultipartState.EPILOGUE -> epilogue(line)
        MultipartState.END_MULTIPART -> endMultipart(line)
        MultipartState.END_MESSAGE -> endMessage(line)
    }
}

Each of the methods that is called in the handleState method of Listing 26, takes care of a single state of the MIME mail structure (4.3 MIME Mail). The state diagram that is used to process the MIME mail states is shown in Figure 6.4.

The MailParser starts in state START_MESSAGE. When the first line is received, the state switches to START_HEADER. Then a new MimeBlock is created and added to the MimeLines instance. The state switches to state FIELD. The following received lines are added to the header MimeBlock. When an empty line is received, the end of the header is recognised and the state switches to END_HEADER. Then the header is parsed, because the next decision is based on the Content-Type field (see 4.3.1 MIME Structure) of the header.

If the Content-Type field is any other type than "multipart/..." (see 4.3.1 MIME Structure), then the state switches to state BODY and a new MimeBlock is created and added to the MimeLines instance. All following lines are added to the MimeBlock until the mail reaches its end, where it switches to state END_MESSAGE.

However, if the Content-Type field is a "multipart/..." type, a multipart mail is recognised and the MailParser switches to state START_MULTIPART. In this state a new MimeBlock is created and added to the MimeLines instance. Then the PREAMBLE state is reached, which can contain explanatory information. Usually this block is ignored, because it does not contain relevant information. It can also be an empty block, if no preamble is defined in the mail. As soon as a line with a boundary ID is recognised (line starting with "--"), the MailParser switches to state START_BODYPART and creates a new MimeBlock. Then the header and its fields are read and stored, which is the same procedure as at the beginning of the state diagram. When the state END_HEADER is reached, a new decision has to be made. If the currently parsed header is of type "multipart/...", the state switches to START_MULTIPART again, because a nested multipart was recognised. Otherwise the END_BODYPART state is reached. The next decision depends on the boundary ID. If the line contains a starting boundary ID (line starting with "--"), a sibling body part is recognised and the state switches to START_BODYPART. Otherwise, if an ending boundary ID is recognised (line starting and ending with "--"), no more siblings will follow and the state switches to EPILOGUE. The epilogue can be empty like the preamble and is to be ignored as well. The state then switches to END_MULTIPART. If there are still lines to be emitted, the mail tree (see Figure 4.6) goes one level upwards and enters the END_BODYPART state again. If no more lines follow, the mail has ended and the END_MESSAGE state is reached.

Figure 6.4: State diagram - MIME
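The single-level core of this state diagram can be illustrated without any reactive plumbing. The following Kotlin sketch is an illustration only and not the MailParser implementation: the enum PartState and the function traceStates are hypothetical names, nested multiparts are left out, and the boundary string is assumed to be known from the Content-Type header, whereas the description above only looks at the leading and trailing "--".

```kotlin
// Simplified, single-level stand-in for the multipart states described in the text.
enum class PartState { PREAMBLE, PART_HEADER, PART_BODY, EPILOGUE }

// Returns the state the sketch parser is in after consuming each line.
fun traceStates(lines: List<String>, boundary: String): List<PartState> {
    val start = "--$boundary"    // delimiter that opens a body part
    val end = "--$boundary--"    // delimiter that closes the multipart
    var state = PartState.PREAMBLE
    return lines.map { line ->
        state = when {
            state == PartState.EPILOGUE -> PartState.EPILOGUE   // everything after the end is epilogue
            line == end -> PartState.EPILOGUE                   // ending boundary: no more siblings
            line == start -> PartState.PART_HEADER              // starting boundary: new body part
            state == PartState.PART_HEADER && line.isEmpty() -> PartState.PART_BODY // blank line ends the header
            else -> state                                       // ordinary content line
        }
        state
    }
}
```

Feeding the function a preamble line, two body parts and an epilogue line yields the sequence PREAMBLE, PART_HEADER, PART_BODY, PART_HEADER, PART_BODY, EPILOGUE (with repeated states for the content lines in between), which corresponds to one pass through the lower half of the state diagram.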

As an example, Listing 27 shows the startMultipart function. If the current block is empty (line 4), e.g. because of an empty preamble block, there is no need to add a new block to the MimeLines instance. If it is not empty, its onComplete method is called, which notifies the observers of the block that it has completed. A new MimeBlock is created and assigned to curBlock. The new block is then added to the MimeLines instance, which in the background triggers an onNext event with the block. Every observer of the flowable returned by the method getBlocks (see 6.2.3 MimeLines) receives this event. When lines are added to the new block, every observer of the flowable returned by the method getLines receives the onNext events with these lines. Finally the current state changes to PREAMBLE and the currently processed line is passed on to the handleState method.

Listing 27: MailParser - startMultipart

     1  private fun startMultipart(line: String) {
     2      logger?.info("startMultipart()")
     3
     4      if (curBlock.getLength() != 0) {
     5          curBlock.onComplete()
     6          curBlock = MimeBlock()
     7          lines.appendBlock(curBlock)
     8      }
     9      curState = MultipartState.PREAMBLE
    10      handleState(line)
    11  }

6.3 MailStream

The MailStream class is the central interface for accessing and modifying mail data. It operates on the Flowable which it receives from the MailParser (see Figure 5.1). The job of the MailStream class is to extract the individual mail parts from the block stream and to modify these blocks. It implements the IMailStream interface, which defines all the methods to access and modify mail data (see Figure 6.5).

Figure 6.5: Class diagram - IMailStream

The implementation of the MailStream class operates solely on a Flowable<MimeBlock> that it receives as constructor argument. Its methods internally use a converted Flowable<Block> type. Block is a superclass of the MimeBlock class and the DoubleMimeBlock class (header and body). The Flowable<MimeBlock> type therefore has to be converted, which is done by the getBlocks() method (see Listing 28). The conversion from two single blocks to one double block is possible because header and body block are always emitted directly after each other. Therefore, when a header block is found, it can be combined with the following body block into a double block.

The scan operator applies an accumulator function to each emitted item together with the previously computed result, emits the new result and passes it on as input for the next item. In the getBlocks method, the scan operator emits pairs of blocks whose two components are nullable (in Kotlin, a question mark after a type marks it as nullable). The input arguments of the accumulator function are the pair, which represents the last emitted result, and the block, which represents the current block. The pair argument can take three possible forms:

1. Pair(block, null): The previous block was not a header block. The current block is either a header block, in which case Pair(null, block) is emitted, or a non-header block, in which case Pair(block, null) is emitted.

2. Pair(null, block): The previous block was a header block. Therefore the current block is the body block and a double block must be emitted (Pair(block, block)).

3. Pair(block, block): The previous block was a double block. Therefore the current block is a single block again, because a double block is always followed by a single block.

The attached filter operators are important as well. The first filter operator removes the starting Pair(null, null), which is emitted as the initial value of the scan operator. The second filter operator removes the pairs that contain only a header block (Pair(null, block)). The map operator at the end maps the Pair items to either MimeBlock or DoubleMimeBlock items.

Listing 28: MailStream - getBlocks

    override fun getBlocks(): Flowable<Block> {
        val blocks = blocks.scan(Pair<MimeBlock?, MimeBlock?>(null, null), { pair, block ->
            if (pair.second == null) { // last block was no header block
                if (block.type == BlockType.undefined) {
                    // current block is no header block -> emit Pair(block, null)
                    Pair(block, null)
                } else {
                    // current block is header block -> emit Pair(null, block)
                    Pair(null, block)
                }
            } else {
                if (pair.first == null) // last block was header block
                    Pair(pair.second, block) // emit Pair(header, body) now
                else // last block was double block
                    Pair(block, null) // emit single block now
            }
        }).filter { pair: Pair<MimeBlock?, MimeBlock?> ->
            !(pair.first == null && pair.second == null) // filter out Pair(null, null)
        }.filter { pair: Pair<MimeBlock?, MimeBlock?> ->
            !(pair.first == null && pair.second != null) // filter out Pair(null, block)
        }.map { pair: Pair<MimeBlock?, MimeBlock?> ->
            if (pair.second == null)
                pair.first!!
            else
                DoubleMimeBlock(pair.first!!, pair.second!!)
        }
        return blocks
    }
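The pairing logic can also be tried out on a plain list. The following sketch is a minimal stand-in under stated assumptions: it uses the Kotlin standard library function runningFold in place of the RxJava scan operator, and the class Blk and the function pairHeaderWithBody are hypothetical names that replace MimeBlock and getBlocks.

```kotlin
// Hypothetical stand-in for a MimeBlock: a name plus a flag telling whether it is a header block.
data class Blk(val name: String, val isHeader: Boolean)

// Mirrors the three cases of the accumulator: single block, pending header, complete (header, body) pair.
fun pairHeaderWithBody(blocks: List<Blk>): List<String> =
    blocks
        .runningFold(Pair<Blk?, Blk?>(null, null)) { pair, block ->
            when {
                // previous result was Pair(x, null) or the initial Pair(null, null)
                pair.second == null ->
                    if (block.isHeader) Pair(null, block) else Pair(block, null)
                // previous result was Pair(null, header): the current block is its body
                pair.first == null -> Pair(pair.second, block)
                // previous result was a complete (header, body) pair
                else -> Pair(block, null)
            }
        }
        .filter { it.first != null } // drops Pair(null, null) and Pair(null, header)
        .map { (first, second) ->
            if (second == null) first!!.name else "${first!!.name}+${second.name}"
        }
```

For the input preamble, h1 (header), b1, sep, h2 (header), b2 the function returns "preamble", "h1+b1", "sep", "h2+b2": each header is held back for one step and then emitted together with its body.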

Most of the remaining functions of the MailStream class make use of the getBlocks function. A simple example is the getText function (see Listing 29), which extracts the text block from the Flowable<Block> returned by the getBlocks function and returns it as a Maybe<DoubleMimeBlock>. The return type is a Maybe because the text block might not exist.

The function filters the stream of blocks for any DoubleMimeBlock, which has a property block.type that is equal to BlockType.text. It then takes the first one, if there were multiple, and casts the one to a DoubleMimeBlock.

Listing 29: MailStream - getText

    override fun getText(): Maybe<DoubleMimeBlock> {
        return getBlocks()
            .filter { block -> block is DoubleMimeBlock && block.type == BlockType.text }
            .firstElement()
            .cast(DoubleMimeBlock::class.java)
    }

6.4 MailStreamFactory

The MailStreamFactory class was implemented to create new mails in an easy way. The mail parts can be passed as parameters (see Figure 6.6) and the mail structure is built dynamically. For simple mails with just a text or HTML part, another possibility is to retrieve a cached MailStream instance, e.g. with getSampleTextMailStream, and to swap only the header and text part. This is efficient, because the parsing is done only once and the MailStream instance is then cached.

Figure 6.6: Class diagram - MailStreamFactory

An example of how to create a mail with the MailStreamFactory is shown in Listing 30. In the example, the from address, the to address and the subject of the mail are set. Additionally the text and html fields are set, which leads to an alternative representation ("multipart/alternative") of these two contents.

Listing 30: MailStreamFactory - createMail

    val mail2 = MailStreamFactory.createMail(
        from = "[email protected]",
        to = "[email protected]",
        subject = "Mail creation",
        text = "Hallo Erika!\nDas ist ein MailStreamFactory Beispiel.",
        html = "\n\nHallo Erika!\nDas ist ein MailStreamFactory Beispiel.\n\n"
    )

The output of the example is shown in Listing 31. As the output shows, the header fields in lines 1-3 are added dynamically, including a unique Message-ID. A unique boundary ID is generated as well to separate the multipart blocks from each other. If many multipart levels are added to a mail, e.g. when text, attachments and HTML with related attachments are set in the createMail method, the nesting of the multipart blocks is done automatically.

Listing 31: MailStreamFactory - Output of createMail

     1  MIME-Version: 1.0
     2  Message-ID:
     3  Date: Mon, 15 Jun 2020 09:36:30 +0200 (CEST)
     4  Subject: Mail creation
     5  From: [email protected]
     6  To: [email protected]
     7  Content-Type: multipart/alternative;
     8   boundary="=--vertx_mail_876563773_1592206590543_4"
     9
    10  --=--vertx_mail_876563773_1592206590543_4
    11  Content-Type: text/plain
    12  Content-Transfer-Encoding: 7bit
    13
    14  Hallo Erika!
    15  Das ist ein MailStreamFactory Beispiel.
    16
    17  --=--vertx_mail_876563773_1592206590543_4
    18  Content-Type: text/html
    19  Content-Transfer-Encoding: 7bit
    20
    21
    22
    23  Hallo Erika!
    24  Das ist ein MailStreamFactory Beispiel.
    25
    26
    27
    28  --=--vertx_mail_876563773_1592206590543_4--

6.5 Processing Steps

The processing steps describe how the classes from Section 6.1 to Section 6.4 are used in the Swilox system. They are mainly used in the MailStorageService and the MailSendingService of the Swilox system.

The essential use cases are:

• Receive mail

• Store mail

• Send mail

• Forward mail

• Reply mail

• Create mail

In the following the use cases for receiving and forwarding mails are explained.

6.5.1 Receive mail

When the mail is received through SMTP (see Section 6.1) and the DATA part is reached, the receiveMail method of the MailStorageService is called. The initialized MailStream instance and some other arguments are passed to the receiveMail method. Listing 32 shows a simplified version of the receiveMail method. It heavily uses the IMailStream instance for accessing mail data.

In line 4 a MailMeta object is created which will hold important email information. In line 7 every line of the mail is requested from the MailStream instance to store a copy of the mail on the server. Line 10 calls a method that accesses the text or HTML part of the MailStream and creates a viewable HTML structure, which is used for the representation on the smartphone. Lines 13-14 request all the attachments of the mail and store them as files on the server. In line 17 an even shorter HTML view (at most 256 characters) is generated from the MailStream instance for preview purposes.

In line 20 the storage of the MailMeta object is started. First, important header information is retrieved from the MailStream instance and stored in the MailMeta object. Then the preview HTML and information about the attachments are stored in the MailMeta object. Afterwards the filled MailMeta object is saved in the CosmosDB, which is the database used by Swilox. When the MailMeta object has been saved successfully, the mail is automatically forwarded to the external mail address, provided that the automatic forwarding function has been explicitly enabled. At the end a notification is sent to the Swilox app that a new mail has been received.

All the actions are connected with the andThen operator. This operator first subscribes to the previous observable and when the observable is completed, it subscribes to the next observable. The resulting observables, however, are not needed. Only the success or failure state of them is important, which is why they are converted to completables by using the ignoreElements operator. This operator does not emit the individual values, but does only emit the error or complete event. Therefore, if an error occurs within the sequence, the complete sequence fails and emits the error signal. However, if every operation in the sequence completes successfully, the complete signal is emitted.
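The "run in order, fail as a whole" behaviour of such a chain can be illustrated with a synchronous stand-in. The sketch below does not use RxJava and is not how andThen is implemented: the type alias Step and the function runInOrder are hypothetical, and a step signals failure simply by throwing an exception.

```kotlin
// Hypothetical stand-in for a Completable: a step that either finishes normally or throws.
typealias Step = () -> Unit

// Runs the steps in order. The first failure aborts the remaining steps and becomes the
// result of the whole chain; the result is a success only if every step completes.
fun runInOrder(steps: List<Step>): Result<Unit> = runCatching {
    steps.forEach { step -> step() }
}
```

As in the chain described above, the individual values produced by the steps are irrelevant here; only the overall success or failure is reported to the caller.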

All the returned values of lines 7, 10, 14 and 20 are completables, which means they emit either an error or a complete event. To determine whether the receiveMail method was successful, all of these completables have to be combined into a single completable. This is done in lines 42 and 43, where all completables are merged into a single one, which is then the return value of the receiveMail method.

Listing 32: MailStorageService - receiveMail

     1  private fun receiveMail(..., mailStream: IMailStream, ...): Completable {
     2      return Completable.defer {
     3          // 1.) create MailMeta -----
     4          val mailMeta = ...
     5
     6          // 2.) store complete mail (strings) -----
     7          val storeMailCompletable = mailBlob.saveBlocksGZ(blobName, getMailBlobPathOrig(..., mailStream.getFullMail().map { Buffer.buffer(it) })
     8
     9          // 3.) store viewHtml -----
    10          val storeViewHtmlCompletable = storeViewHtmlCompletable(mailStream, ...)
    11
    12          // 4.) store attachments -----
    13          val attachments = mailStream.getAttachments()
    14          val storeAttachmentsCompletable = storeAttachments(..., attachments)
    15
    16          // 5.) short view html -----
    17          val shortViewHtml = getShortViewHtml(mailStream)
    18
    19          // 6.) store mailMeta -----
    20          val storeMailMetaCompletable = mailStream.getHeader().map { header ->
    21              // 6.1.) store header specific data in mailMeta -----
    22              ...
    23          }.ignoreElement().andThen(shortViewHtml).map { short ->
    24              // 6.2.) store shortViewHtml in mailMeta -----
    25              ...
    26          }.ignoreElement().andThen(attachments).map { attBlock ->
    27              // 6.3.) store attachments in mailMeta -----
    28              ...
    29          }.ignoreElements().andThen(Single.defer {
    30              // 6.4.) store mailMeta
    31              cosmosDb.createEntity(mailMeta)
    32          }).doOnSuccess { newMeta ->
    33              // 6.5.) auto-forward
    34              forwardIfAuto(..., mailStream).subscribe({})
    35          }.ignoreElement()
    36          .andThen(Single.defer {
    37              // 6.7.) send notification to app
    38              ...
    39          }).ignoreElement()
    40
    41          // 7.) subscribing to all completables at once (concurrent processing)
    42          val comps: Publisher<Completable> = Flowable.fromArray(storeMailCompletable, storeViewHtmlCompletable, storeAttachmentsCompletable, storeMailMetaCompletable)
    43          Completable.merge(comps)
    44      }
    45  }

To test whether the receiveMail method works in a reactive way, a sample mail (see Figure 6.7) was sent to a running server. Log statements were added to the code to check which action is performed at which point in time. The sample mail is a simple text mail with two attachments, a PDF file and a PNG image.

Figure 6.7: Mail example for receiving

The logs of the mail receiving process are shown in Listing 33. Each log line follows the pattern "timestamp: message", where the timestamp is given in milliseconds. Log lines with an "onNext(..)" message are the events in which single lines are emitted by the SMTP flowable. Red lines are event logs, yellow lines are header lines and green lines are body lines.

First, two events are printed, showing that the mail receiving process and the mail storing process to the server is started. When storing the complete mail, each line is sent to the server as soon as it arrives.

The lines 3-11 are the header lines of the main block (see top block in Figure 6.7) and lines 13 - 41 are the body lines which contain the other blocks. Lines 12, 19, 29 and 41 are the boundary IDs which separate the nested blocks.

Lines 13-15 are the header lines of the "text/plain" block. Directly after the header lines, an event is logged (line 16) which says that the creation of the ViewHtml has started. This event is logged because a "text/plain" block was recognised. Lines 17-18 are the body lines of the "text/plain" block.

Lines 20-23 are the header lines of the "application/pdf" block, which is the PDF attachment. After the header, an event is logged in line 24. It tells us that the storage process of the first attachment has started, because an attachment was recognised. Lines 25-28 show an abbreviated version of the body lines of the attachment, which is encoded in Base64.

The last block is a PNG image. Lines 30-35 show the header lines of the "image/png" block. Then again an event for the start of an attachment storing process is shown in line 36. Lines 37-40 represent the Base64 encoded body lines of the "image/png" block.

After the mail has ended with the closing boundary ID in line 41, some concluding event logs are printed, which report that the storage of the ViewHtml and of the attachments has finished successfully. The storage process of the full mail is completed as well. At the end the mail receiving process is done.

Listing 33: MailStorageService - Logs of receive mail

     1  1587204332782: start receiving mail
     2  1587204332786: start storing mail in BlobStorage
     3  1587204332840: onNext(To: [email protected])
     4  1587204332840: onNext(From: Swiloxtest )
     5  1587204332840: onNext(Subject: Demo)
     6  1587204332840: onNext(Message-ID: <[email protected]>)
     7  1587204332840: onNext(Date: Sat, 18 Apr 2020 12:05:30 +0200)
     8  1587204332840: onNext(MIME-Version: 1.0)
     9  1587204332840: onNext(Content-Type: multipart/mixed;)
    10  1587204332840: onNext( boundary="------9654E0342ADFDD935DB2F3FE")
    11  1587204332840: onNext()
    12  1587204332842: onNext(------9654E0342ADFDD935DB2F3FE)
    13  1587204332842: onNext(Content-Type: text/plain; charset=utf-8; format=flowed)
    14  1587204332842: onNext(Content-Transfer-Encoding: 7bit)
    15  1587204332842: onNext()
    16  1587204332843: start creating ViewHtml
    17  1587204332843: onNext(Demo Mail)
    18  1587204332843: onNext()
    19  1587204332843: onNext(------9654E0342ADFDD935DB2F3FE)
    20  1587204332843: onNext(Content-Type: application/pdf; name="Mail Data Structure.pdf")
    21  1587204332843: onNext(Content-Transfer-Encoding: base64)
    22  1587204332843: onNext(Content-Disposition: attachment)
    23  1587204332843: onNext()
    24  1587204332844: start storing attachment 1
    25  1587204332844: onNext(JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9DcmVhdG9yIChDaHJvbWl1b)
    26  ...
    27  1587204332913: onNext(MDAwMDAgbiAKdHJhaWxlcgo8PC9TaXplIDE0Ci9Sb290IDkgMCBSCi9Jb)
    28  1587204332913: onNext(c3RhcnR4cmVmCjI4ODc5CiUlRU9G)
    29  1587204332913: onNext(------9654E0342ADFDD935DB2F3FE)
    30  1587204332916: onNext(Content-Type: image/png;)
    31  1587204332916: onNext( name="Unbenannt.png")
    32  1587204332916: onNext(Content-Transfer-Encoding: base64)
    33  1587204332916: onNext(Content-Disposition: attachment;)
    34  1587204332916: onNext( filename="Unbenannt.png")
    35  1587204332916: onNext()
    36  1587204332917: start storing attachment 2
    37  1587204332917: onNext(TWFyIDI0LCAyMDIwIDA4OjUzOjU5LjA3OCBjb20uc3d5cHluLm1lc3NhZ)
    38  ...
    39  1587204332940: onNext(b3JlLm5ldC5pbXBsLkNvbm5lY3Rpb25CYXNlClNFVkVSRTogQ29ubmVjd)
    40  1587204332940: onNext(dXQKCg==)
    41  1587204332940: onNext(------9654E0342ADFDD935DB2F3FE--)
    42  1587204333116: storing ViewHtml successful
    43  1587204333197: storing attachments successful
    44  1587204333213: mail stored completely
    45  1587204333213: receiving mail done

These event logs demonstrate how reactive processing works. All the different events (e.g. storing the mail, creating the ViewHtml, storing attachments, ...) have been started before all the mail lines were received at the "SMTP In" component. Therefore there is less waiting/blocking time, because mail data can be parsed and further processed, while the mail is received.
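The timing argument can be made concrete by converting the logged timestamps into offsets from the first log line. The following sketch assumes the "timestamp: message" layout of Listing 33; LogEntry, parseLogLine and offsetsFromStart are hypothetical helper names and not part of the Swilox code.

```kotlin
data class LogEntry(val timestampMs: Long, val message: String)

// Splits a log line at the first ": " into its millisecond timestamp and its message.
fun parseLogLine(line: String): LogEntry {
    val separator = line.indexOf(": ")
    require(separator > 0) { "not a 'timestamp: message' line: $line" }
    return LogEntry(line.substring(0, separator).toLong(), line.substring(separator + 2))
}

// Returns (milliseconds since the first log line, message) for every line.
fun offsetsFromStart(lines: List<String>): List<Pair<Long, String>> {
    val entries = lines.map(::parseLogLine)
    val start = entries.first().timestampMs
    return entries.map { (it.timestampMs - start) to it.message }
}
```

Applied to Listing 33, the creation of the ViewHtml starts 61 ms and the storage of the first attachment 62 ms after the start of the receiving process, while the closing boundary line only arrives after 158 ms and the whole process finishes after 431 ms.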

6.5.2 Forward mail

The implementation for forwarding mails is used to forward an existing mail to another mail address. This happens either when the user manually forwards an existing mail, or when automatic mail forwarding is enabled and a received mail is immediately forwarded to the predefined external mail address (see Section 2.3).

A simplified version of the implementation is shown in Listing 34. The forwardMailMeta method is called either when an existing mail is manually forwarded or when automatic mail forwarding is enabled. In the case of a manual forward of an existing mail, the mailStreamInput parameter is null, because the mail is loaded from the file storage. However, if this method is reached from an automatic mail forward, the mail is just arriving, which means that the MailStream is still available and is passed in the mailStreamInput parameter. Therefore, the if-branch in lines 2-6 distinguishes these two cases. In lines 8-11 a public mail address for the sender mail address is either newly generated or, if it has been used already, loaded from the server, and then stored in a Single type. This is done because the mail addresses are anonymized and all mail traffic should run through the Swilox network. In lines 13-19 the result of the Single object is retrieved, which contains the generated mail address. In this event handler the new mail header fields for the forwarded mail are set. At the end of the forwardMailMeta method, a completable is returned which indicates whether the forwarding was successful.

Listing 34: MailStorageService - forwardMail

     1  override fun forwardMailMeta(owner: Entity, mailMeta: Entity, email: String, mailStreamInput: IMailStream?): Completable {
     2      val mailStream =
     3          if (mailStreamInput != null)
     4              mailStreamInput
     5          else
     6              loadMailFromBlobStorage(owner, mailMeta)
     7      // fetch From address from MailStream and load/generate public address for it
     8      val singleFrom = mailStream.getHeader().flatMapSingle { header ->
     9          val fromAddress = header.parsedHeader!!.getFromList()
    10          getOrCreateMailPublishedAddressForwardTo(mailMeta, fromAddress)
    11      }
    12
    13      return singleFrom.map { fromAddress ->
    14          val publicMailFrom =
    15              Optional.of("Swilox Forward<${fromAddress.content!!.publicEmail!!}>")
    16          val modifiedMailStream =
    17              mailStream.setHeaderFields(from = publicMailFrom, to = Optional.of(email))
    18          mailSendingService.sendMessage(from = publicMailFrom.get(), to = email, mailStream = modifiedMailStream).second
    19      }.ignoreElement()
    20  }

7 Extended Libraries

This chapter explains the extension of the external library Vertx-Mail-Client, which is used to send out emails. Recalling the architecture diagram in Figure 5.1, the Vertx-Mail-Client is used in the SMTP Out component.

7.1 Vertx-Mail-Client

The Vertx-Mail-Client is the Vert.x SMTP client for sending mails. It was developed to send SMTP mails asynchronously. A handler must be specified, which is called when the sending process is done. However, before the sending process can be started, the complete mail has to be available. In the MailStream approach, in contrast, the mail is received and processed in parts (lines and blocks). Therefore the mail sending process should work in parts as well. That is why the Vertx-Mail-Client repository on GitHub [18] was forked to implement a new sending process.

In Figure 7.1 a simplified representation of the class diagram of the original Vertx-Mail-Client is shown. During a mail sending process, a MailClient instance is created. Then the sendMail method is called which receives a MailMessage instance as parameter. This MailMessage instance contains the complete mail. The sendMail method further creates an SMTPSendMail instance, where the actual sending process is implemented, and calls its start method.

Figure 7.1: Vertx-Mail-Client - Original

The new implementation works similarly. Compared to the original implementation, there are now two additional types that inherit from the MailMessage class (see Figure 7.2): the BasicMailMessage for the original implementation and the RxMailMessage for the MailStream implementation. Because of these two new types, the MailClient interface now has two different sendMail methods, each of which creates a different SMTPSendMail instance. One creates the BasicSMTPSendMail, which works with the BasicMailMessage type, and one creates the RxSMTPSendMail, which works with the RxMailMessage type. In this way, the original BasicMailMessage as well as the new RxMailMessage can be sent through SMTP Out.

Figure 7.2: Vertx-Mail-Client - New implementation

The mail sending processes in BasicSMTPSendMail and RxSMTPSendMail are similar, but differ in some important points. The first difference is the start method. The BasicSMTPSendMail implementation checks whether the mail exceeds the maximum mail size that the SMTP connection allows. In the RxSMTPSendMail implementation this check is skipped, because at the time the sending process starts, the mail might not be completely loaded and its size therefore cannot be checked. Our approach is to assume that the mail is within the maximum size. Otherwise the server will send an error code to which we can react.

Afterwards the SMTP protocol is treated equally in both implementations. Both classes send the required information, like From address and To address, combined with the appropriate commands, e.g. MAIL FROM, RCPT TO, ... (see Section 4.1), and receive status codes back from the server.

The DATA part of SMTP is where the two implementations differ again. In the BasicSMTPSendMail implementation the complete mail is stored as one single String that contains all mail lines. This String is then split into multiple lines, each ending with CRLF, and each line is sent individually. However, the complete mail has to exist in this String before sending. In the RxSMTPSendMail class the complete mail is delivered by a Flowable, where each String already contains a single line. An observer subscribes to this Flowable and receives the lines through onNext events. Each line is then sent to the server. In this implementation, the complete mail does not have to exist before sending.
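The difference between the two DATA strategies can be sketched with plain Kotlin. This is an illustration under stated assumptions and not the Vertx-Mail-Client code: sendWholeMail, sendLineByLine and the sendLine callback are hypothetical names, and a pull-based Sequence stands in for the push-based Flowable.

```kotlin
// Batch variant: the whole mail must exist as one String before the first line can be sent.
fun sendWholeMail(mail: String, sendLine: (String) -> Unit) {
    mail.removeSuffix("\r\n")          // avoid an empty trailing element after the last CRLF
        .split("\r\n")
        .forEach { sendLine(it + "\r\n") }
}

// Streaming variant: every line is sent as soon as the source produces it.
fun sendLineByLine(lines: Sequence<String>, sendLine: (String) -> Unit) {
    lines.forEach { sendLine(it + "\r\n") }
}
```

With a lazily produced sequence, producing and sending alternate line by line, so the first line can leave the server before the last line has even been created. In the batch variant all lines necessarily exist before the first call to sendLine.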

8 Tests and Measurements

In this chapter the tests are explained that were made to prove the efficiency of the implementation and to compare it with existing solutions. The first section 8.1 Reactivity Test is a test which checks the reactivity of the parsing process. This means that some mail parts are fully processed in certain points in time that can be tested. The second section 8.2 Memory Test checks the amount of memory that is used by the implementation and compares it with an existing library. Section 8.3 Connections Test tests the amount of possible concurrent connections that can be established to the server without any load. A classic thread-based server is compared to the Vert.x server. Section 8.4 Load Test tests a heavy load scenario where lots of clients send mails to different server configurations. The goal of this test was to check how many client mails can be handled.

The tests were executed with the following settings:

• Computer: HP 15-db1204ng

• CPU: AMD Ryzen 7 3700U

• RAM: 16 GB

• OS: Ubuntu 19.04

• Java version: OpenJDK 12.0.2 (build 12.0.2+9-Ubuntu-119.04)

8.1 Reactivity Test

The class ReactivityTest is a unit test that verifies the reactiveness of the MailStream implementation. It tests the point in time at which certain elements are emitted by the MailStream class.

First a TestScheduler is created, which is used to advance the time manually. Then the test mail is read line by line from a file to simulate input arriving over SMTP. The returned Flowable is stored in the variable fileFlowable. In lines 3-7 of Listing 35 the fileFlowable is combined with an interval flowable, which emits a sequential number every 10 milliseconds. The combination is done with the Flowable.zip operation. The zip operation waits until each of the two flowables has emitted one element. Then a combined value, which is created by the BiFunction lambda, is emitted. Therefore, the emission of mail lines can be controlled manually by using the function advanceTimeTo or advanceTimeBy of the TestScheduler.

Listing 35: ReactivityTest - Init

    1  val testScheduler = TestScheduler()
    2  val fileFlowable: Flowable<String> = readFile("./src/test/resources/m1005.txt")
    3  val flowableUnderTest = Flowable.zip(
    4      Flowable.interval(10, TimeUnit.MILLISECONDS, testScheduler),
    5      fileFlowable,
    6      BiFunction({ i: Long, line: String -> line })
    7  )
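The pacing effect of zipping the lines with an interval source can be modelled in virtual time. The sketch below does not use the TestScheduler; linesEmittedAt is a hypothetical helper that assumes the n-th tick fires at n times the period, so that one line is released per tick.

```kotlin
// Virtual-time model of the zip: tick n fires at n * periodMs and releases the n-th line.
fun linesEmittedAt(timeMs: Long, periodMs: Long, lines: List<String>): List<String> {
    val ticks = generateSequence(1L) { it + 1 }      // tick numbers 1, 2, 3, ...
        .takeWhile { it * periodMs <= timeMs }        // only ticks that have fired by timeMs
    // zip ends with the shorter input, so no more lines than ticks (or file lines) are emitted
    return ticks.zip(lines.asSequence()) { _, line -> line }.toList()
}
```

With a period of 10 ms, advancing the virtual time to 110 ms releases the first 11 lines and advancing it to 220 ms releases the first 22 lines, which are the two checkpoints used in the assertions of this test.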

Listing 36 shows that the mail parsing process is initialized and the MailStream instance is created. This means that the mail parsing process is indeed started, but without advancing the time (= emitting lines), the MailParser will do nothing.

Listing 36: ReactivityTest - MailStream

    val blocks = MailParser.parse(flowableUnderTest)
    val mailStream = MailStream.create(blocks)

The next task is to initialize the TestObserver instances that will be used to check the MailStream outputs (see Listing 37). The TestObserver class provides the required assert methods. For each MailStream method a suitable test observer has to be created.

Listing 37: ReactivityTest - Init observers

    // initialize test observers
    val fullMailTestObserver = TestObserver<String>()
    val bodyTestObserver = TestObserver<String>()
    val headerLinesTestObserver = TestObserver<String>()
    val textHeaderLinesTestObserver = TestObserver<String>()
    val textLinesTestObserver = TestObserver<String>()
    val htmlHeaderLinesTestObserver = TestObserver<String>()
    val htmlLinesTestObserver = TestObserver<String>()
    val htmlAttachmentTestObserver = TestObserver()
    val attachmentTestObserver = TestObserver()

Then in Listing 38 the TestObservers are subscribed to the MailStream flowables.

Listing 38: ReactivityTest - Subscribe test observers

    mailStream.getFullMail().toObservable().subscribe(fullMailTestObserver)
    mailStream.getBody().toObservable().subscribe(bodyTestObserver)
    mailStream.getHeader().subscribe {
        it.getLines().toObservable().subscribe(headerLinesTestObserver)
    }
    mailStream.getText().subscribe {
        it.header.getLines().toObservable().subscribe(textHeaderLinesTestObserver)
        it.body.getLines().toObservable().subscribe(textLinesTestObserver)
    }
    mailStream.getHtml().subscribe {
        it.header.getLines().toObservable().subscribe(htmlHeaderLinesTestObserver)
        it.body.getLines().toObservable().subscribe(htmlLinesTestObserver)
    }
    mailStream.getAttachments().toObservable().subscribe(attachmentTestObserver)
    mailStream.getRelatedAttachments().toObservable().subscribe(htmlAttachmentTestObserver)

The rest of the test consists of asserting values and manually advancing the time with the TestScheduler. Some examples are shown in Listing 39.

The first block checks that no values have been emitted yet. The second block (lines 12-22) advances to time 110 ms, at which point 11 lines have been emitted to the MailParser. Because the first 11 lines are the header lines, only the fullMailTestObserver and the headerLinesTestObserver have received exactly 11 emitted lines. The third block (lines 24-35) advances the time to 220 ms, so that the first 22 lines have been emitted. These lines contain the 11 header lines of the main header and 11 body lines. The 11 body lines contain a nested sub-block, which is a "text/plain" block. Therefore the textHeaderLinesTestObserver has received three lines (the header of the text block). The first body line of the text block has been emitted as well, which is asserted by the textLinesTestObserver.

The purpose of this test is to check that the MailStream class parses and emits the different lines and blocks immediately when they are available. This means that the observers already receive parsed lines and blocks, before the complete mail is read. This proves that the MailStream class operates in a reactive way.

Listing 39: ReactivityTest - Assertions

     1  // 0 - start of test
     2  fullMailTestObserver.assertNoValues()
     3  bodyTestObserver.assertNoValues()
     4  headerLinesTestObserver.assertNoValues()
     5  textHeaderLinesTestObserver.assertNoValues()
     6  textLinesTestObserver.assertNoValues()
     7  htmlHeaderLinesTestObserver.assertNoValues()
     8  htmlLinesTestObserver.assertNoValues()
     9  htmlAttachmentTestObserver.assertNoValues()
    10  attachmentTestObserver.assertNoValues()
    11
    12  // 11 - header done
    13  testScheduler.advanceTimeTo(110, TimeUnit.MILLISECONDS)
    14  fullMailTestObserver.assertValueCount(11)
    15  bodyTestObserver.assertNoValues()
    16  headerLinesTestObserver.assertValueCount(11)
    17  textHeaderLinesTestObserver.assertNoValues()
    18  textLinesTestObserver.assertNoValues()
    19  htmlHeaderLinesTestObserver.assertNoValues()
    20  htmlLinesTestObserver.assertNoValues()
    21  htmlAttachmentTestObserver.assertNoValues()
    22  attachmentTestObserver.assertNoValues()
    23
    24  // 22 - first line of text part
    25  testScheduler.advanceTimeTo(220, TimeUnit.MILLISECONDS)
    26  fullMailTestObserver.assertValueCount(22) // header parsed and first line is read
    27  bodyTestObserver.assertValueCount(11)
    28  headerLinesTestObserver.assertValueCount(11)
    29  textHeaderLinesTestObserver.assertValueCount(3)
    30  textLinesTestObserver.assertValueCount(1)
    31  textLinesTestObserver.assertValue("[blue ball]\n")
    32  htmlHeaderLinesTestObserver.assertNoValues()
    33  htmlLinesTestObserver.assertNoValues()
    34  htmlAttachmentTestObserver.assertNoValues()
    35  attachmentTestObserver.assertNoValues()

8.2 Memory Test

The memory test measures the number of mails that can be held in memory. The test defines a memory space for the JVM (Java Virtual Machine). A mail is read and parsed multiple times. Each mail instance is stored in an ArrayList so that it does not get removed by the garbage collector. The number of stored mails is counted until the JVM runs out of memory.

The test is run three times. The first run uses an existing Maven library called Email MIME Parser from the group tech.blueglacier. The other two runs use the MailParser and MailStream classes. In the second run, the MailStream instance with the parsed mail is simply stored in an ArrayList. In the third run, the MailStream instance is also stored in an ArrayList, but the text part and the HTML part of the mail are deleted from the MailStream instance to free memory. The deletion has a real-world purpose: while a mail is being received, the first mail parts can be deleted as soon as they have been processed, even if the mail is not yet fully loaded. This saves memory, so more mails can be handled concurrently.

The example mail is a real-world advertisement mail from a webshop. The mail contains a text part and an HTML part and its size is about 110kB. The tests are executed with a JVM heap space of 16MB (VM Options: -Xmx16M).
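The listings in this section call a helper named printMemory whose body is not shown. The following is a minimal sketch of what such a helper could look like. The function names usedHeapBytes and memoryLine as well as the output format are assumptions made for illustration; only the java.lang.Runtime calls are standard JDK API.

```kotlin
// Hypothetical helpers: the thesis listings only show calls to printMemory.

// Heap currently in use: memory claimed by the JVM minus the free part of it.
fun usedHeapBytes(rt: Runtime): Long = rt.totalMemory() - rt.freeMemory()

// Builds one progress line with the used heap and the heap limit, both in kB.
fun memoryLine(rt: Runtime, label: String): String {
    val usedKb = usedHeapBytes(rt) / 1024
    val maxKb = rt.maxMemory() / 1024 // upper bound, set by -Xmx
    return "$label: used=${usedKb}kB max=${maxKb}kB"
}

// Prints the progress line; the signature matches the calls in the listings.
fun printMemory(rt: Runtime, label: String) = println(memoryLine(rt, label))
```

Called once per loop iteration, such a helper shows the used heap approaching the limit given by -Xmx16M until the OutOfMemoryError is thrown.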

8.2.1 Blueglacier MIME Parser

When the mails are parsed with the Email MIME Parser from tech.blueglacier (see Listing 40), only 18 mail instances can be stored in memory before the "OutOfMemoryError: Java heap space" is thrown.

Listing 40: MemoryTest - Blueglacier MIME Parser

```kotlin
fun testNormalMail() {
    // The list keeps a reference to every parsed mail, so none can be garbage collected.
    val list = ArrayList<Any>()
    val rt = Runtime.getRuntime()
    var i = 0
    while (true) {
        val bytes = File("/home/christoph/Documents/Mails/advertisement.eml").readBytes()
        val mail = Mail.parse(io.vertx.reactivex.core.buffer.Buffer(Buffer.buffer(bytes)))

        list.add(mail)
        i++

        printMemory(rt, "$i: After parse")
    }
}
```

8.2.2 MailParser and MailStream

When the test is executed with the MailParser and MailStream classes without deleting the text and HTML parts (see Listing 41), 50 mails can be held in memory before the OutOfMemoryError is thrown.

Listing 41: MemoryTest - MailStream without deletion

```kotlin
fun testNormalMailWithoutDelete() {
    // The list keeps a reference to every MailStream, so none can be garbage collected.
    val list = ArrayList<Any>()
    val rt = Runtime.getRuntime()
    var i = 0
    while (true) {
        val fileFlowable = readFile("/home/christoph/Documents/Mails/advertisement.eml")
        val blocks = MailParser.parse(fileFlowable)
        val mailStream = MailStream.create(blocks)

        list.add(mailStream)
        i++

        printMemory(rt, "$i: After parse")
    }
}
```

8.2.3 MailParser and MailStream with deletion

The same test with the MailParser and MailStream classes is executed again, but this time the text and HTML parts are deleted after parsing (see Listing 42). The 16MB heap was now able to hold 328 mails in memory before the OutOfMemoryError occurred.

These are test data, of course, and in this test almost all parts of the mail (except for the header) were deleted after parsing. In a real-world scenario the effect would be smaller. Nevertheless, a lot of memory can be saved when certain parts of the mails are deleted after they have been processed.

Listing 42: MemoryTest - MailStream with deletion

```kotlin
fun testNormalMailWithDelete() {
    // The MailStream itself stays referenced; only its large parts are released below.
    val list = ArrayList<Any>()
    val rt = Runtime.getRuntime()
    var i = 0
    while (true) {
        val fileFlowable = readFile("/home/christoph/Documents/Mails/advertisement.eml")
        val blocks = MailParser.parse(fileFlowable)
        val mailStream = MailStream.create(blocks)
        list.add(mailStream)

        // Drop the HTML and text parts to free the memory they occupy.
        mailStream.deleteHtml()
        mailStream.deleteText()
        i++

        printMemory(rt, "$i: After parse")
    }
}
```

8.2.4 Results

The tests were also executed with different mail sizes and JVM heap spaces. Table 8.1 shows the number of mail instances that could be held in memory.

| Approach | Heap size 128MB, mail size 14MB | Heap size 16MB, mail size 110kB | Heap size 16MB, mail size 40kB |
|---|---|---|---|
| Blueglacier | 2-3 | 18 | 129 |
| MailStream | 3 | 50 | 157 |
| MailStream with delete | > 500 | 328 | 240 |

Table 8.1: MemoryTest - Possible mail instances in memory

For the first column of Table 8.1, the heap space had to be increased to 128MB, because the base memory usage of the JVM together with a 14MB mail already exceeded the predefined heap size of 16MB, so not a single mail could be loaded into memory. The 14MB mail consists of a text part and two large attachments.

Comparing these results, the MailStream approach clearly needs less memory than the existing Blueglacier library, even without deletion. With deletion, the improvement is even more significant. How much MailStream with deletion gains depends on the mail size: if large mail parts can be deleted, more space is freed. The difference is smaller for small mails, but the bigger the mails get, the more memory can be saved.
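To make the comparison concrete, the improvement can be expressed as a factor relative to the Blueglacier baseline. The sketch below uses the mail counts from the two 16MB columns of Table 8.1; the type MemoryRow and the functions factor and report are hypothetical names introduced only for this calculation.

```kotlin
import java.util.Locale

// Mail counts copied from the two 16MB columns of Table 8.1.
data class MemoryRow(val mailSizeKb: Int, val blueglacier: Int, val mailStream: Int, val withDelete: Int)

val memoryRows = listOf(
    MemoryRow(110, 18, 50, 328),
    MemoryRow(40, 129, 157, 240)
)

// Factor by which an approach holds more mails than the Blueglacier baseline.
fun factor(count: Int, baseline: Int): Double = count.toDouble() / baseline

// One summary line per mail size; Locale.ROOT keeps the decimal point stable.
fun report(): List<String> = memoryRows.map { r ->
    "%dkB: MailStream x%.2f, with delete x%.2f".format(
        Locale.ROOT,
        r.mailSizeKb,
        factor(r.mailStream, r.blueglacier),
        factor(r.withDelete, r.blueglacier)
    )
}
```

For the 110kB mail this gives factors of about 2.78 and 18.22, for the 40kB mail about 1.22 and 1.86, which matches the observation that larger mails benefit more.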

8.3 Connections Test

The connections test was designed to check how many clients can be connected to the server concurrently before it crashes. To assess the efficiency of the new Vert.x server implementation, it was tested against a classic server architecture.

The classic server structure is defined as a server thread waiting for clients to connect. Each connected client socket is passed to a new thread, where it is handled. Therefore, if a large number of clients connect, many threads are created, and these consume a large amount of memory.

The Vert.x server implementation works differently. When a Vertx instance is created, two thread pools are created: an event loop thread pool and a worker thread pool. By default, the event loop pool contains twice as many threads as there are CPU threads, and the worker pool contains 20 threads. Verticles, which are chunks of code, can be deployed on event loop threads. Here the server is such a verticle and operates on an event loop thread, so a separate server verticle can be started for each CPU thread. Each server verticle listens for clients. When clients connect to the server, they are handled by the same event loop thread as the server verticle. The client-server communication must not block the event loop, however. This is the golden rule of Vert.x: nothing may block the event loop. If the communication requires a heavy task, the task must be passed to a worker thread. Event loop threads only dispatch events to verticles and then move on to the next event. Because the event loop threads are never blocked, many clients can be handled by just a few threads, and because only a few threads are used, memory usage is low.
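The division of labour between event loop and worker threads can be illustrated without Vert.x. The following sketch is only an analogy built from plain JDK executors and does not use the Vert.x API: a single-threaded executor stands in for an event loop, a small fixed pool stands in for the worker pool, and the function name offloadDemo is hypothetical.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Returns the names of the thread that did the heavy work and of the thread
// that received the result.
fun offloadDemo(): List<String> {
    // Stand-in for one event loop thread: it must never block.
    val eventLoop = Executors.newSingleThreadExecutor { r -> Thread(r, "event-loop") }
    // Stand-in for the worker pool: blocking work is allowed here.
    val workers = Executors.newFixedThreadPool(2) { r -> Thread(r, "worker") }
    try {
        val done = CompletableFuture<List<String>>()
        // The event loop receives an event and immediately hands the heavy part over.
        eventLoop.execute {
            workers.execute {
                Thread.sleep(50) // simulated blocking task
                val workerName = Thread.currentThread().name
                // The result is posted back to the event loop for further handling.
                eventLoop.execute {
                    done.complete(listOf(workerName, Thread.currentThread().name))
                }
            }
        }
        return done.get(5, TimeUnit.SECONDS)
    } finally {
        workers.shutdown()
        eventLoop.shutdown()
    }
}
```

While the worker sleeps, the event loop thread is free to process other events, which is the property the golden rule protects.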

To test the advantages and disadvantages of both servers, a client and a server program were implemented for both the classic thread-based server and the new Vert.x server. To test the possible number of concurrent connections, the client opens a socket to the server and waits. The connections are not closed. Within a loop, up to 10000 clients connect to the server until it is overloaded.

8.3.1 Classic server

As mentioned before, a classic server is considered to be a server thread that listens for clients on a certain port. Each connecting client is moved to a newly created thread which deals with the client. Listing 43 shows a ServerSocket listening for clients on port 25252. Every connecting client is passed to a SocketHandler instance, which inherits from the class Thread.

Listing 43: ConnectionsTest - Server accepting clients

```kotlin
val server = ServerSocket(25252)
println("Server listening on port 25252...")
while (true) {
    // accept() blocks until the next client connects
    val socket = server.accept()
    // every client gets its own thread
    val socketHandler = SocketHandler(socket)
    socketHandler.start()
}
```

The SocketHandler class sends out a "Hello" message to the client and tries to receive messages from the client. The messages are written to the console. No other operations are used for this test.

Listing 44: ConnectionsTest - SocketHandler

```kotlin
import java.net.Socket

// Handles one client connection on its own thread.
class SocketHandler(val socket: Socket) : Thread() {
    override fun run() {
        val output = socket.getOutputStream().bufferedWriter()
        val input = socket.getInputStream().bufferedReader()

        output.write("220 Hello from Server\r\n")
        output.flush()
        while (true) {
            // readLine() returns null once the client has closed the connection
            val line = input.readLine() ?: break
            println("client: $line")
            if (line.startsWith("EHLO")) {
                output.write("250 Accepted\r\n")
                output.flush()
            }
        }
    }
}
```

A single client that connects to the server is implemented as shown in Listing 45. This code is called 10000 times in a loop.

Listing 45: ConnectionsTest - Client code

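```kotlin
import java.net.Socket

fun handleSocket() {
    // The socket is deliberately not closed so that the connection stays open.
    val s = Socket("localhost", 25252)
    // One reader is reused; a second BufferedReader could miss data buffered by the first.
    val reader = s.getInputStream().bufferedReader()
    println(reader.readLine())
    val writer = s.getOutputStream().bufferedWriter()
    writer.write("message: client to server\r\n")
    writer.flush()
    println(reader.readLine())
}
```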

The problem with this server architecture is that every new thread consumes additional memory, so the server eventually runs out of memory.

In this test, the memory of the JVM is manually limited to 128MB (VM option: -Xmx128M). Then 10000 clients try to connect to this server. Every client is assigned a separate thread. After 2493 clients had connected, the server threw an OutOfMemoryError, because it ran out of memory.

The test was then repeated with 64MB and 256MB to show the differences (see Table 8.2). With 64MB of memory, 1194 clients were able to connect. With 256MB of memory, 5156 clients were able to connect to the server before the OutOfMemoryError was thrown again.

| Memory | Connected clients |
|---|---|
| 64MB | 1194 |
| 128MB | 2493 |
| 256MB | 5156 |

Table 8.2: Classic Server - Maximum connected clients

The maximum number of connected clients increased by approximately a factor of 2 whenever the memory was doubled. The number of possible connections therefore grows linearly with the available memory, which makes memory a major limiting factor for classic servers.
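The linear relationship can be checked by dividing the heap size by the number of connected clients. The sketch below uses the values from Table 8.2; the names classicResults, kbPerClient and classicKbPerClient are hypothetical. Note that the quotient is an average that also contains the base heap usage of the server, so it is an upper bound for the cost of one connection rather than an exact figure.

```kotlin
// Heap size in MB paired with the number of connected clients, copied from Table 8.2.
val classicResults = listOf(64 to 1194, 128 to 2493, 256 to 5156)

// Average heap budget per connection in kB (1MB = 1024kB).
fun kbPerClient(heapMb: Int, clients: Int): Double = heapMb * 1024.0 / clients

// One value per row of Table 8.2.
fun classicKbPerClient(): List<Double> =
    classicResults.map { (heapMb, clients) -> kbPerClient(heapMb, clients) }
```

The three quotients are roughly 54.9, 52.6 and 50.8 kB per client. A nearly constant quotient is what linear growth of connections with memory looks like.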

8.3.2 Vert.x server

The test for the Vert.x server is implemented in a custom verticle called SimpleServer. In the start method of this verticle, a NetServer instance is created. The NetServer instance is assigned to an event loop thread, which gives it processing time at intervals to accept clients. It must not block the event loop, however. A connect handler is registered with the server to handle connecting clients. For each connected client several handlers are registered: a data handler for all messages and data exchanged between client and server, an exception handler for exceptions that might occur, and an end handler, which signals the end of a connection. Each client is handled by the same event loop thread as the server verticle, so many clients are handled by a single thread. This is not a problem, because the clients must not block the event loop, which leads to quick response times. Because this server architecture uses only a few threads, memory is not an issue.

This server implementation is located within a verticle, the SimpleServer verticle. If the load requires it, multiple server verticles can be started. Each verticle is started on another event loop thread, so that all CPU cores can be utilized properly. The Vert.x server implementation can therefore be scaled up easily.

Listing 46 shows the implementation of a simple Vert.x server that listens on port 25252.

Listing 46: ConnectionsTest - Vert.x Server

```kotlin
class SimpleServer : AbstractVerticle() {
    override fun start() {
        super.start()

        val serverConf = io.vertx.core.net.NetServerOptions()
        serverConf.port = 25252
        val server = vertx.createNetServer(serverConf)

        // Called on the event loop for every client that connects.
        server.connectHandler { socket ->
            // data handler: receives everything the client sends
            socket.handler { buffer ->
                println("client: $buffer")
            }
            socket.exceptionHandler {
                println("socket exception")
                it.printStackTrace()
            }
            socket.endHandler {
                println("client disconnected")
            }
            socket.write("220 Hello from Server\r\n")
            socket.end()
        }
        server.listen()
    }
}
```

The implementation of the clients that connect to the Vert.x server is similar to that of the clients for the classic server (see Listing 45).

The test conditions are the same as for the classic server. Only one server verticle is started, the JVM heap space is limited to 128MB and 10000 clients try to connect to the server. With this architecture, however, all 10000 clients were able to connect to the server without any problem. This already shows how efficiently the Vert.x server operates.

A second iteration was executed to push the server to its limits. This time 50000 clients connect to the server, which still has 128MB of heap space. In this way, it can be tested how much RAM a single connection really costs. The second iteration was able to handle 28225 connections, but then many exceptions were thrown. However, there was no OutOfMemoryError, so another system limitation had to be the cause of these exceptions.

Then the same test was repeated with a heap space of 256MB, but again 28225 clients connected successfully. Therefore memory is not the issue.

In the next iteration, more server verticles were used to rule out that a single server verticle utilized the CPU too little. The new settings were: 256MB heap space, 4 server verticles and 50000 connecting clients. But again only 28225 clients were able to connect, so CPU power was not the issue either.

Another iteration was made to check for network problems. The range of ports available for connections was set to 15000 to 60999 (which results in 46000 possible ports) with the following Linux command:

    sudo sysctl net.ipv4.ip_local_port_range="15000 60999"

In a new test iteration the number of successful client connections rose to 45991. The limiting factor was therefore the number of ports available for concurrent connections. The default range is 32768 to 60999 (60999 - 32768 + 1 = 28232 ports), which explains the previous results of 28225 connected clients.
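The number of ports in such a range follows directly from its two bounds, since both bounds are usable. The sketch below parses a value in the "low high" form accepted by net.ipv4.ip_local_port_range and counts the ports; the function names parsePortRange and portCount are hypothetical.

```kotlin
// Parses a value in the "low high" form used by net.ipv4.ip_local_port_range.
fun parsePortRange(value: String): IntRange {
    val (low, high) = value.trim().split(Regex("\\s+")).map { it.toInt() }
    return low..high
}

// Number of ports in an inclusive range: both bounds count.
fun portCount(range: IntRange): Int = range.last - range.first + 1
```

For "15000 60999" this yields 46000 ports and for "32768 60999" it yields 28232 ports. The measured maxima of 45991 and 28225 connections lie slightly below these values, presumably because a few ports were already in use by other processes.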

It was still unclear, however, what the maximum number of clients is that the Vert.x server can handle. The test was therefore executed again with the following settings: 4 server verticles, 46000 possible network ports, 64MB of heap space and 50000 connecting clients. This time the server reached its memory limit: 22106 clients connected successfully.

A second iteration with the same settings, but with 96MB of heap space, resulted in 34586 successfully connected clients. To estimate how much RAM a single client connection actually needs, the difference in heap space between the two runs (96MB - 64MB = 32MB) was divided by the difference in connected clients (34586 - 22106 = 12480). 32MB (= 32768kB) divided by 12480 clients gives about 2.63 kB per client. One client connection therefore requires about 2.63 kB of memory.
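The calculation above is a difference quotient: the extra heap divided by the extra clients it made possible. Using differences removes the base memory usage of the server from the estimate. A minimal sketch, with the hypothetical names Run and kbPerAdditionalClient:

```kotlin
// One test run: configured heap size and number of successfully connected clients.
data class Run(val heapMb: Int, val connectedClients: Int)

// Extra heap (in kB) divided by the extra clients that this heap made possible.
fun kbPerAdditionalClient(small: Run, large: Run): Double {
    val extraKb = (large.heapMb - small.heapMb) * 1024.0
    val extraClients = large.connectedClients - small.connectedClients
    return extraKb / extraClients
}
```

Calling kbPerAdditionalClient(Run(64, 22106), Run(96, 34586)) reproduces the value of about 2.63 kB per client.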

Summary of test iterations

| Iteration | Memory (MB) | Verticles | Possible ports | Connecting clients | Connected clients |
|---|---|---|---|---|---|
| 1 | 128 | 1 | 28232 | 10000 | 10000 |
| 2 | 128 | 1 | 28232 | 50000 | 28225 |
| 3 | 256 | 1 | 28232 | 50000 | 28225 |
| 4 | 256 | 4 | 28232 | 50000 | 28225 |
| 5 | 256 | 4 | 46000 | 50000 | 45991 |
| 6 | 64 | 4 | 46000 | 50000 | 22106 |
| 7 | 96 | 4 | 46000 | 50000 | 34586 |

Table 8.3: Vert.x Server - Maximum connected clients

8.4 Load Test

For the load test, a simple version of the SMTP protocol was implemented in the Vert.x client and Vert.x server programs. The classic thread-based server is no longer considered in the load test. To generate the load, a certain number of clients connect to the server and send mails to it. A batching mechanism was introduced in the client program to control the load (see Listing 48). It uses a batch number and a batch delay, which means that every x milliseconds y clients connect to the server and send a mail (x = batch delay, y = batch number). Additionally, a semaphore was introduced in the server program (see Listing 47) to control the CPU load. This semaphore keeps client mails from being processed beyond the DATA command of the SMTP protocol. For example, when the semaphore is set to 20, only 20 client mails are processed concurrently by the server. All other clients wait in a state before the DATA command. Whenever a client mail has been processed successfully, the semaphore counter is incremented, which frees a slot for another client mail.

Server side

The SimpleSemaphore class is an implementation of a semaphore on the server side. It allows multiple client mails to be processed simultaneously up to a predefined number. The class has two instance variables, cur and waiting. The variable cur stores how many additional mails may currently start processing. The variable waiting stores a list of lambda functions that still need to be executed, but are waiting at the moment.

When the run method of the semaphore is called, a lambda function is passed as parameter. If there is still enough capacity in the semaphore, which means cur > 0, then cur is decremented and the lambda function is immediately executed. Otherwise the lambda function is added to the waiting list.

Whenever a client mail has been processed successfully, the done function of the semaphore is called and the variable cur is incremented. If there are still functions in the waiting list, the variable cur is decremented again and the lambda function at position 0 of the waiting list is removed and executed next.

Listing 47: Load test - Semaphore

```kotlin
class SimpleSemaphore(val num: Int) {
    private var cur = num                               // free slots
    private var waiting = mutableListOf<() -> Unit>()   // tasks waiting for a slot

    // Runs r immediately if a slot is free, otherwise queues it.
    fun run(r: () -> Unit) {
        if (cur <= 0) {
            waiting.add(r)
        } else {
            cur--
            r()
        }
    }

    // Releases a slot and hands it to the oldest waiting task, if there is one.
    fun done() {
        cur++
        if (waiting.isNotEmpty()) {
            cur--
            waiting.removeAt(0)()
        }
    }
}
```
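The behaviour of the semaphore can be followed with a small trace. The sketch below restates the semaphore in compact form so that it runs on its own, and then starts three "mails" on a semaphore with two slots. The function semaphoreTrace and the log texts are hypothetical and only serve the illustration.

```kotlin
// Compact restatement of the semaphore described above.
class SimpleSemaphore(val num: Int) {
    private var cur = num
    private val waiting = mutableListOf<() -> Unit>()

    fun run(r: () -> Unit) {
        if (cur <= 0) waiting.add(r) else { cur--; r() }
    }

    fun done() {
        cur++
        if (waiting.isNotEmpty()) {
            cur--
            waiting.removeAt(0)()
        }
    }
}

// Starts three mails on a semaphore with two slots and records the order of events.
fun semaphoreTrace(): List<String> {
    val log = mutableListOf<String>()
    val semaphore = SimpleSemaphore(2)
    for (mail in 1..3) {
        semaphore.run { log.add("start $mail") } // mail 3 finds no free slot and waits
    }
    log.add("mail 1 done")
    semaphore.done() // frees a slot, which the waiting mail 3 takes immediately
    return log
}
```

The trace shows that the third mail only starts after the first one has reported completion. Because the class is not synchronized, the sketch assumes that run and done are always called from the same thread, as is the case for handlers running on one event loop.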

Client side

On the client side the number of concurrently connecting clients is controlled by a batchNumber and a batchDelay. The batchNumber determines how many clients connect to the server in one batch iteration, and the batchDelay determines how long the program waits between two batch iterations.

The method startClients starts a certain number of clients in batches. First, one batch of clients starts connecting to the server, where batchNumber is the batch size. Then the program waits for the number of milliseconds predefined in the variable batchDelay. After that, the method is called again with the same batch and max parameters, but with a new start parameter.

In the method connectOneClient a client connects to the server. As soon as the connection is established, a mail is sent to the server via the SMTP protocol. This is done by the CustomSmtpClient object that is created in connectOneClient in Listing 48. When the mail has been delivered successfully, the connection is closed.

Listing 48: Load test - Client side

```kotlin
class SmtpClient : AbstractVerticle() {
    private val HOST = "localhost"
    private val max = 10000        // total number of clients to start
    private val batchNumber = 10   // clients started per batch
    private val batchDelay = 100L  // pause between two batches in ms

    override fun start() {
        super.start()
        startClients(1, batchNumber, max)
    }

    // Starts one batch of clients and schedules the next batch via a timer.
    private fun startClients(start: Int, batch: Int, max: Int) {
        for (i in 1..batch) {
            connectOneClient(i + start)
        }
        val newStart = start + batch
        if (newStart < max) {
            vertx.setTimer(batchDelay) {
                startClients(newStart, batch, max)
            }
        }
    }

    // Connects a single client; the SMTP dialogue is handled by CustomSmtpClient.
    private fun connectOneClient(clientNr: Int) {
        val config = NetClientOptions()
        config.setConnectTimeout(1000 * 60 * 10) // 10 minutes
        val netClient = vertx.createNetClient(config)
        netClient.connect(25252, HOST) { asyncResult ->
            if (asyncResult.succeeded()) {
                val socket = asyncResult.result()
                // stat is declared outside this listing
                CustomSmtpClient(socket, clientNr, stat, vertx)
            } else {
                asyncResult.cause().printStackTrace()
            }
        }
    }
}
```
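The timing that results from the batching mechanism can be computed in advance. The sketch below derives the schedule of batches from the three parameters; the type Batch and the function batchSchedule are hypothetical names, and the client numbering from 1 to max is a simplification for illustration.

```kotlin
// One batch: the time offset at which it starts and the client numbers it contains.
data class Batch(val startMs: Long, val clients: IntRange)

// Schedule that results from starting `batch` clients every `delayMs` milliseconds
// until `max` clients have been started.
fun batchSchedule(max: Int, batch: Int, delayMs: Long): List<Batch> =
    (1..max step batch).mapIndexed { index, first ->
        Batch(index * delayMs, first..minOf(first + batch - 1, max))
    }
```

With the values from Listing 48 (max = 10000, batchNumber = 10, batchDelay = 100) the schedule contains 1000 batches and the last batch starts after 99.9 seconds. A complete run with these settings therefore cannot finish in much less than 100 seconds, regardless of how fast the server is.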

8.4.1 Localhost

First the load test was executed on the local machine, without real-world network communication. The server and client programs ran on the same machine. The mails sent from the clients to the server are not processed any further; the mail data is discarded afterwards.

For the test the server was configured with a JVM heap space of 1GB (-Xmx1G). The event loop size (eventLoopPoolSize=4) and the number of server verticles (-instances 4) were both set to 4, which means that each server verticle had its own event loop thread to handle clients. Each client sent a mail of 300kB to the server. The batch number was set to 10, which means that every X milliseconds 10 clients try to connect to the server and send a mail (X = batch delay).

• Heap space: 1GB (-Xmx1G)
• EventLoopPoolSize: 4 (eventLoopPoolSize=4)
• Verticles: 4 (-instances 4)
• File size: 300kB
• Batch number: 10 clients

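| Batch delay (ms) | Semaphore | Delivered mails |
|---|---|---|
| 100 | No | 8947 |
| 100 | 100 | 9267 |
| 100 | 50 | 7610 |
| 100 | 5 | 10000 |
| 10 | No | 5583 |
| 10 | 100 | 6005 |
| 10 | 50 | 6900 |
| 10 | 5 | 10000 |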

Table 8.4: Load test - Localhost

The first column Batch delay (ms) of Table 8.4 describes the batch delay, which is the amount of time that the client program waits until the next clients start to connect to the server. The column Semaphore describes the use of the semaphore technique which, in this context, limits the amount of client mails that are processed by the server concurrently. The value "No" means unlimited mails can be processed concurrently, "100" means at most 100 mails concurrently and so on. The third column Delivered mails describes how many client mails were successfully delivered.

The results with a batch delay of 100ms are not very clear, because the values jump around for the different semaphore counters. With a batch delay of 10ms, however, a clear trend is visible: the more CPU power is used for processing the mails, the fewer clients can be handled. The reason is that the server is overloaded and therefore refuses newly connecting clients. When the semaphore counter is set to 5, only 5 mails are processed per verticle at a time, which leaves more CPU power for handling new connections.

Errors encountered

• IOException: Connection reset by peer. The client aborted the connection because it had been waiting for too long.

• AnnotatedConnectException: Connection timed out. The connection attempt failed; the connection could not be established.

Conclusion

The less CPU power the server has available for accepting client sockets, the more errors appear on the client side, because the server cannot handle more clients at that point in time. Clients that wait for too long then receive either a "Connection reset by peer" or a "Connection timed out" exception.

Nevertheless, the results achieved by the Vert.x server are impressive. Up to 10000 client mails can be handled in a reasonable amount of time by a server that has only 1GB of heap space.

8.4.2 Server to Server

A more realistic test places the client on one server and the actual mail server on another one. Only then does the communication involve real network transfers with network delays. In this test the client is located on a dedicated server and the mail server on a Microsoft Azure server. The setup is still not perfect, because all clients connect to the mail server from one machine. It has to suffice, though, because the resources for a more thorough test were not available.

The test proceeds as follows. The clients connect to the server and send a mail to it. Two test cases are covered: either the mail is sent to the server and the mail data is discarded, or it is sent to the server, parsed with the MailStream class and kept in memory. In either test case, 10000 clients connect to the server and send a mail. Three different mail sizes were tested. The Vert.x mail server was tested on three different Microsoft Azure servers, a weak one (B1ms), a medium one (B2s) and a strong one (DS3) [20].

Mini Server - B1ms

The server:

• Memory: 2GB
• CPU: 1 core (Intel Broadwell E5-2673 v4 2.3 GHz or Intel Haswell E5-2673 v3 2.4 GHz)

The settings:

• Heap space: 1.5GB (-Xmx1500M)
• Interval (batch number/batch delay): 10 clients / 100ms (= 100 connections per second)

Table 8.5 shows the test results without parsing. This means the mails are just received by the server, but not further processed. The received mails are discarded.

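| EventLoopSize + Verticles | Mail size (kB) | Delivered mails | Errors | Time used (ms) |
|---|---|---|---|---|
| 1 | 10 | 10000 | 0 | 110606 |
| 1 | 50 | 9514 | 486 | 179288 |
| 1 | 300 | 5674 | 4326 | 240933 |
| 2 | 10 | 10000 | 0 | 107815 |
| 2 | 50 | 9965 | 35 | 115795 |
| 2 | 300 | 5709 | 4291 | 252111 |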

Table 8.5: Load Test on Mini Server (B1ms)

According to the results in Table 8.5, the bigger the transferred mails are, the fewer mails can be delivered successfully. Receiving the mail data increases the workload on the server, so fewer clients can be served. Increasing the number of event loop threads and verticles did not improve the result by much. When no errors occur, 10000 clients can be handled in about 110 seconds. If exceptions are thrown, however, the processing time can more than double. Problematic connections and exception handling therefore take a lot of time.
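Throughput and error rate make the rows of Table 8.5 easier to compare. The sketch below derives both from one table row; the type LoadResult and the functions mailsPerSecond and errorRatePercent are hypothetical names introduced for this calculation.

```kotlin
// One row of Table 8.5: mail size in kB, delivered mails, errors, elapsed time in ms.
data class LoadResult(val mailSizeKb: Int, val delivered: Int, val errors: Int, val timeMs: Long)

// Successfully delivered mails per second of wall-clock time.
fun mailsPerSecond(r: LoadResult): Double = r.delivered * 1000.0 / r.timeMs

// Share of all attempts (delivered plus failed) that ended in an error, in percent.
fun errorRatePercent(r: LoadResult): Double = 100.0 * r.errors / (r.delivered + r.errors)
```

For the first row (10kB mails, one event loop thread) this gives about 90.4 mails per second with no errors. For the 300kB row with one event loop thread it gives about 23.6 mails per second and an error rate of 43.26 percent.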

Small Server - B2s

The server:

• Memory: 4GB
• CPU: 2 cores (Intel Broadwell E5-2673 v4 2.3 GHz or Intel Haswell E5-2673 v3 2.4 GHz)

The settings:

• Interval (batch number/batch delay): 10 clients / 100ms (= 100 connections per second)
• EventLoopSize: 4
• Verticles: 4

Table 8.6 shows the test results without parsing. This means the mails are just received by the server, but not further processed. The received mails are discarded.

| Mail size (kB) | Heap size (GB) | Delivered mails | Errors | Time used (ms) |
|---|---|---|---|---|
| 10 | 2 | 10000 | 0 | 110237 |
| 300 | 2 | 9682 | 318 | 188202 |
| 300 | 3 | 9969 | 31 | 171117 |
| 300 | 3.5 | 9784 | 216 | 189742 |

Table 8.6: Load Test on Small Server (B2s) without Parsing

Table 8.7 shows the test results with parsing. The received mails are parsed by the MailStream class and kept in memory.

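| Mail size (kB) | Heap size (GB) | Delivered mails | Errors | Time used (ms) |
|---|---|---|---|---|
| 10 | 2 | 10000 | 0 | 109495 |
| 50 | 2 | 10000 | 0 | 108145 |
| 50 | 3 | 10000 | 0 | 111288 |
| 50 | 3.5 | 10000 | 0 | 108881 |
| 300 | 2 | 9085 | 915 | 190020 |
| 300 | 3 | 9378 | 622 | 185752 |
| 300 | 3.5 | OutOfMemory | OutOfMemory | OutOfMemory |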

Table 8.7: Load Test on Small Server (B2s) with Parsing

The OutOfMemoryError occurred because the total RAM of the system ran out. The base RAM usage of the system together with the 3.5GB used by the server was too much for the 4GB of total RAM. Apart from that, the results clearly show that parsing the mails with the MailStream class increased the load on the server. It led to fewer successfully delivered mails, because the server was not able to accept the client connections in time.

When a semaphore with a counter of 5 is used to limit the number of concurrently processed mails per verticle, the results change (see Table 8.8). The received mails are parsed by the MailStream class and kept in memory.

| Mail size (kB) | Heap size (GB) | Delivered mails | Errors | Time used (ms) |
|---|---|---|---|---|
| 300 | 1 | 10000 | 0 | 170071 |

Table 8.8: Load Test on Small Server (B2s) with Parsing and Semaphore

Here the processing power and memory are used mainly for accepting client connections rather than for parsing mails. All clients can therefore connect first, and their mails are processed afterwards. However, the mails take longer to be processed.

Strong Server - DS3

The server:

• Memory: 14GB
• CPU: 4 cores (Intel Xeon 8171M 2.1 GHz or Intel Xeon E5-2673 v4 2.3 GHz or Intel Xeon E5-2673 v3 2.4 GHz)

The settings:

• Interval (batch number/batch delay): 10 clients / 100ms (= 100 connections per second)
• Mail size: 300kB
• Heap space: 2GB

Table 8.9 shows the test results without parsing. This means the mails are just received by the server, but not further processed. The received mails are discarded. In this test only the 300kB mails were tested, because the strong server was able to handle them with ease. Therefore smaller mails can be handled as well.

| EventLoopSize | Verticles | Delivered mails | Errors | Time used (ms) |
|---|---|---|---|---|
| 4 | 4 | 10000 | 0 | 126980 |
| 8 | 8 | 10000 | 0 | 123000 |

Table 8.9: Load Test on Strong Server (DS3)

When these results are compared with the 300kB results of the small server B2s in Table 8.6, it can be concluded that better CPU performance and a higher core count, together with more event loop threads and verticles, lead to a larger number of concurrently handled mails.

9 Conclusion

This thesis presented an approach to a reactive and event-based mail processing server based on ReactiveX. Reactive programming means asynchronous event processing. For an asynchronous task, a handler can be registered which is called when the computation is completed. Thus, reactive task handling is asynchronous and non-blocking. ReactiveX is a library and API that provides reactive streams as an approach for reactive programming. With reactive streams, event-based, asynchronous computations can be defined in a functional way as chains of function applications.

The thesis was written in collaboration with the start-up Swilox, which provides a system that simplifies the registration and login services of webshops as well as the communication between shop and customer. This communication required a special mailing system which looks like a messenger to the user, but operates with mails in the background. The mailing system had to be implemented in a reactive way, which is why the handling of the SMTP protocol and of the mail file format had to be reworked. The receiving, parsing, storing, editing and sending of mails had to be re-implemented in the reactive style as well.

The implementation of the mail server starts with the SMTP In component, which receives incoming mails as bytes and combines them into single lines. Each received line is added to a reactive stream, which is used by all components of the mail server. The lines flow through the reactive stream to the Mail Parser component, where they are parsed and combined into logical blocks that contain accessible mail information. The blocks are still unstructured, however, which is why they flow on through the reactive stream to the MailStream component. The MailStream component uses the stream of blocks to build a structure which allows easy access to all mail information. Components like the Mail Forward and Mail Sender components then access the mail information through the MailStream to modify the mail and forward it to other servers, while the Mail Storage component uses the MailStream to store the mail on the file server. Because all components make use of reactive streams, lines and blocks can flow through the whole system as soon as they have been processed.

The results show a significant performance improvement compared to a conventional solution. A first experiment compared the memory usage of the reactive mail approach with that of an existing mail library. In the test, incoming mails were parsed and kept in memory. With a 110kB mail and 16MB of heap space, the reactive approach held 50 mails in memory compared to 18 with the other library, and considerably more when processed mail parts were deleted. A second experiment tested how many clients were able to connect to the server and keep the connection alive. With 64MB of heap space, the classic server approach could handle only about 1200 clients, whereas the reactive server approach was able to handle about 22 000 clients. Another experiment implemented a more realistic scenario, in which clients sent average-sized mails to the server and the server accepted and parsed these mails. In this test a small server with only 2 CPU cores and 4GB of memory was able to handle 10 000 clients.

In general, the performance has been improved, because the mail server no longer blocks to wait for mails to be fully loaded, before being further processed. This saves a lot of time, where a thread can handle events of other clients instead of waiting in a blocking state. Therefore, the reactive mail server was able to fulfill the performance requirements of the Swilox system, which is why it has been integrated in the production version in the meantime.

To sum up, it was a very interesting project. It addresses common issues, because every user and every company uses a mail system. Additionally, the implementation of this thesis is actually used in the Swilox system, which means it will be further developed and maintained by Swilox. It was also very appealing to me that this thesis was implemented with modern technologies, so I was able to work with new languages and frameworks.

References

[1] A gentle guide to asynchronous programming with Eclipse Vert.x for Java developers - Vert.x. https://vertx.io/docs/guide-for-java-devs/ (visited on 06/09/2020).
[2] BackpressureStrategy (RxJava Javadoc 2.2.19). http://reactivex.io/RxJava/2.x/javadoc/io/reactivex/BackpressureStrategy.html (visited on 06/04/2020).
[3] Completable (RxJava Javadoc 2.2.19). http://reactivex.io/RxJava/2.x/javadoc/io/reactivex/Completable.html (visited on 06/20/2020).
[4] Eclipse Vert.x. https://vertx.io/ (visited on 06/09/2020).
[5] Kotlin. https://kotlinlang.org/ (visited on 08/12/2020).
[6] Maybe (RxJava Javadoc 2.2.19). http://reactivex.io/RxJava/javadoc/io/reactivex/Maybe.html (visited on 06/14/2020).
[7] Multipurpose Internet Mail Extensions - Wikipedia. https://de.wikipedia.org/wiki/Multipurpose_Internet_Mail_Extensions (visited on 06/06/2020).
[8] Netty project. https://netty.io/ (visited on 08/12/2020).
[9] Reactive Extensions - Wikipedia. https://en.wikipedia.org/wiki/Reactive_extensions (visited on 06/08/2020).
[10] ReactiveX. http://reactivex.io/ (visited on 06/04/2020).
[11] ReactiveX - Intro. http://reactivex.io/intro.html (visited on 06/04/2020).
[12] ReactiveX - Map operator. http://reactivex.io/documentation/operators/map.html (visited on 06/05/2020).
[13] RFC 5321 - Simple Mail Transfer Protocol. https://tools.ietf.org/html/rfc5321 (visited on 08/12/2020).
[14] RFC 5322 - Internet Message Format. https://tools.ietf.org/html/rfc5322 (visited on 06/06/2020).
[15] RFC 821 - Simple Mail Transfer Protocol. https://tools.ietf.org/html/rfc821 (visited on 08/12/2020).
[16] RFC 822 - STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES. https://tools.ietf.org/html/rfc822 (visited on 08/12/2020).
[17] Simple Mail Transfer Protocol - Wikipedia. https://de.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol (visited on 06/06/2020).
[18] vert-x3/vertx-mail-client. https://github.com/vert-x3/vertx-mail-client (visited on 06/24/2020).
[19] Vert.x Core Manual - Vert.x. https://vertx.io/docs/vertx-core/java/ (visited on 06/09/2020).
[20] Virtuelle Windows-Computer - Microsoft Azure. https://azure.microsoft.com/de-de/pricing/details/virtual-machines/windows/ (visited on 07/08/2020).

List of Figures

1.1 Stream-based processing of mails (with ReactiveX)
2.1 Mail and messaging system
2.2 Use case - Mail from shop to user (in Swilox App)
2.3 Use case - Reply to mail (in Swilox App)
2.4 Use case - Forward mail manually from App
2.5 Use case - Forward mail automatically when receiving a mail
2.6 Use case - Reply to mail from external mail server
3.1 Observer-Observable
3.2 Observable - Success
3.3 Observable - Error
3.4 Processor
3.5 Maybe
3.6 Completable
3.7 Map - Operator
3.8 Filter - Operator
3.9 Scan - Operator
3.10 Chaining Operators
3.11 Vertx - Event loop [1]
4.1 Example - SMTP Protocol [17]
4.2 Simple mail visualization
4.3 Example - RFC 822
4.4 MIME - Multipart
4.5 MIME - Multipart code
4.6 MIME - Complex
5.1 Architecture
6.1 Class diagram - Block hierarchy
6.2 Class diagram - ReactiveStorage
6.3 Class diagram - MimeLines
6.4 State diagram - MIME
6.5 Class diagram - IMailStream
6.6 Class diagram - MailStreamFactory
6.7 Mail example for receiving
7.1 Vertx-Mail-Client - Original
7.2 Vertx-Mail-Client - New implementation

List of Tables

3.1 Positioning of Observables [11]
3.2 Comparison - Iterable vs. Observable [11]
8.1 MemoryTest - Possible mail instances in memory
8.2 Classic Server - Maximum connected clients
8.3 Vert.x Server - Maximum connected clients
8.4 Load test - Localhost
8.5 Load Test on Mini Server (B1ms)
8.6 Load Test on Small Server (B2s) without Parsing
8.7 Load Test on Small Server (B2s) with Parsing
8.8 Load Test on Small Server (B2s) with Parsing and Semaphore
8.9 Load Test on Strong Server (DS3)

List of Listings

1 Observer subscribing to Observable - Lambda
2 getData() function
3 Observer subscribing to observable - anonymous class
4 Console output
5 Processor example
6 Processor - Output
7 Maybe example
8 Completable example
9 Map - Example
10 Map - Output
11 Filter - Example
12 Filter - Output
13 Scan - Example
14 Scan - Output
15 Chaining Operators - Example
16 Chaining Operators - Console output
17 Creating Vertx instance
18 Creating the NetServer
19 Deploying a ServerVerticle
20 SmtpSessionNew - initializer
21 SmtpSessionNew - handleBuffer
22 SmtpSessionNew - handleLine
23 SmtpSessionNew - handleActualData
24 ReactiveStorageWithList
25 MailParser - parse
26 MailParser - handleState
27 MailParser - startMultipart
28 MailStream - getBlocks
29 MailStream - getText
30 MailStreamFactory - createMail
31 MailStreamFactory - Output of createMail
32 MailStorageService - receiveMail
33 MailStorageService - Logs of receive mail
34 MailStorageService - forwardMail
35 ReactivityTest - Init
36 ReactivityTest - MailStream
37 ReactivityTest - Init observers
38 ReactivityTest - Subscribe test observers
39 ReactivityTest - Assertions
40 MemoryTest - Blueglacier MIME Parser
41 MemoryTest - MailStream without deletion
42 MemoryTest - MailStream with deletion
43 ConnectionsTest - Server accepting clients
44 ConnectionsTest - SocketHandler
45 ConnectionsTest - Client code
46 ConnectionsTest - Vert.x Server
47 Load test - Semaphore
48 Load test - Client side