Masaryk University Faculty of Informatics

Library for Handling Asynchronous Events in C++

Bachelor’s Thesis

Branislav Ševc

Brno, Spring 2019

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Branislav Ševc

Advisor: Mgr. Jan Mrázek

Acknowledgements

I would like to thank my supervisor, Jan Mrázek, for his valuable guidance and highly appreciated help with avoiding death traps during the design and implementation of this project.

Abstract

The subject of this work is the design, implementation and documentation of a library for the C++ programming language which aims to simplify asynchronous event-driven programming. The Reactive Blocks Library (RBL) allows users to define a program as a graph of interconnected function blocks which control message flow inside the graph. The benefits include decoupling of the program’s logic from the method of its execution, with emphasis on increased readability of the program’s logical behavior through understanding of its reactive components. This thesis focuses on explaining the programming model, then proceeds to defend design choices and to outline the implementation layers. A brief analysis of current competing solutions and their comparison with RBL, along with overhead benchmarks of RBL’s abstractions over a pure C++ approach, is included.

Keywords

C++ Programming Language, Inversion of Control, Event-driven Programming, Asynchronous Programming, Declarative Programming, Functional Programming

Contents

Introduction
    Motivation
    Thesis Structure

1 Programming Concepts
    1.1 Imperative and Procedural Programming Paradigm
    1.2 Inversion of Control
    1.3 Event-driven Programming Paradigm
    1.4 Declarative Programming Paradigm
    1.5 Functional Programming Paradigm

2 Design
    2.1 Operation as a Block
    2.2 Program as a Graph of Blocks
    2.3 Events
    2.4 Synchronous and Asynchronous Blocks
    2.5 Nested Graph Composition
    2.6 Syntactic Sugars

3 Implementation
    3.1 Core Functionality
    3.2 Built-in Blocks
    3.3 Executor Blocks
    3.4 Algorithm Blocks
    3.5 Introspection Layer
    3.6 Expression Layer

4 Evaluation
    4.1 Use Case Comparison
        4.1.1 Boost.Asio
        4.1.2 RxCpp
        4.1.3 Intel® Threading Building Blocks
        4.1.4 Coroutines
    4.2 Performance Analysis
        4.2.1 Overhead Analysis
        4.2.2 Performance Benchmarks
    4.3 Debugging

5 Conclusion
    5.1 Future Work

A Asynchronous API Model
    A.1 Asynchronous Operations
    A.2 Boost.Asio – An Asynchronous API Example

B Technical Details
    B.1 Build requirements
    B.2 Third-party libraries
    B.3 Project Structure

Bibliography

Introduction

Most of today’s computer systems and their software have to react to some form of external changes that do not have an exactly defined time of occurrence. The practice of handling these events falls into the field of event-driven programming (see 1.3).

Notices of external events can originate from the system’s hardware (I/O devices, network, timers, etc.) or from software that exposes its application programming interface (API) as a set of asynchronous functions (see A.1). An example of such software is an operating system and its time, peripheral I/O (e.g. networking) and task scheduling related API1.

Modern programming languages, including C++, provide support for asynchronous code to some extent. C++’s implementation mainly comprises built-in language primitives for multi-threaded programming and associated synchronization [2, Chapter 41]. Despite not being viewed as a conservative and lower-level language, C++ still lacks the standard concepts and utilities (as of the C++17 standard2) which would provide higher-level abstractions for more convenient event-driven programming.

In recent years, frameworks and libraries, e.g. Reactive Extensions (Rx) [3], which aim to simplify event-driven programming, have started to appear in various higher-level languages. While a few of such libraries, including Rx, have also been written for C++ (elaborated in 4.1.2), we have decided to design and implement our own library with increased generalization and transparency over existing solutions. We have given it the name Reactive Blocks Library, or RBL.

Motivation

The aim of this work is to implement a C++ library to support general-purpose event-driven programming with a higher level of abstraction than the plain C++17 standard provides. The library should make

1. E.g. POSIX aio (7), POSIX timer_create (2) [1]. 2. This should change with the arrival of C++20 and its coroutines, see 4.1.4.

asynchronous event-driven programming easier in terms of code size, readability, and expressiveness. We would like to be able to write programs based on asynchronous event handling with an emphasis on code clarity in terms of event consequences. Functional and declarative programming concepts can aid us to express the behavior of an event-driven program in a cleaner and informatively more condensed way.

The usual approach to event handling is done via IoC (see 1.2) and its most straightforward application – callback functions, as in A.1. To understand a program written as a set of callbacks, the user has to thoroughly read the associated logic and piece together the information about the structure of execution of these callbacks. The purpose of RBL is to separate this information from the semantics of individual functions or operations. That is, by segregating the logic among function blocks, which are in turn loosely connected to form a graph of consequent operations with the desired collective behavior. A code with callbacks may look like this:

void callback_1() { ...; asynchronous_operation_2(callback_2); ...; }

void callback_2() { ...; asynchronous_operation_3(callback_3); ...; }

void callback_3() { ...; asynchronous_operation_2(callback_2); ...; }

As you can also see in Listing A.1, it may not be clear what the flow of execution is, especially in larger programs with multiple asynchronous branches, chained operations, and cycles.

Our library should capture the dependencies of operations more explicitly. Since the ellipses (...) could represent many statements, the information about event consequences is more densely contained within a syntax like so (considering the ... parts may be grouped into operations as well):

...;
asynchronous_operation_1 -> asynchronous_operation_2;
...;
asynchronous_operation_2 -> asynchronous_operation_3;
...;
asynchronous_operation_3 -> asynchronous_operation_2;
...;

From this notation, visualizing the dependencies is straightforward:

Figure 1: Operation Consequences

Since standard exceptions that represent errors during an operation propagate in the same direction as return values, error handling has to be partly reworked for conformance with the IoC paradigm.

Additional goals are therefore:

• the ability to formulate handling of exceptional cases in a similar way as the processing of data;
• basic introspection and debugging support, which is typically weakened by each abstraction;
• extensibility and modularity, i.e. having different logical parts, or layers.

Thesis Structure

Chapter 1 – Programming Concepts familiarizes us with the concepts forming the building ground of this thesis.
Chapter 2 – Design explains the design choices and the programming paradigm that RBL introduces.
Chapter 3 – Implementation outlines RBL’s API layer by layer.
Chapter 4 – Evaluation compares RBL against the existing solutions and analyzes the benefits and disadvantages of RBL from both convenience and performance perspectives.
Appendix A – Asynchronous API Model describes the traditional interface of an asynchronous API in C++.
Appendix B – Technical Details summarizes the structure and technologies used by the project.

1 Programming Concepts

To become fluent with RBL, one has to understand the concepts upon which it stands. RBL shifts asynchronous programming from the traditional, procedural approach to the sphere of declarative and functional programming, which is more suitable for event-driven programming. All of this is done thanks to the Inversion of Control (IoC) principle. RBL encompasses all these concepts in an interesting way. This section further explains the terms and their roles in RBL.

1.1 Imperative and Procedural Programming Paradigm

Perhaps the oldest programming paradigm is writing a program as a list of sequentially executed commands that manipulate the state of a program and its acting environment. The imperative programming paradigm remains the most popular way to program. This is because of its closeness to the hardware that performs the computations (in terms of instructions), which allows us to achieve maximum performance. Procedural programming is a form of imperative programming, with commands structured into functions that define behavior. The programmer has complete control over the program’s execution flow and potentially the entire state of the program. That person is the only one who dictates whether and when certain actions happen.

In terms of RBL: RBL diverts from these paradigms but does not want to abolish them. Imperative code fragments should still be used for implementing synchronous operations that are further logically indivisible, or if doing so is the only efficient way. It is, after all, the only way to perform the underlying fundamental operations in C++. RBL therefore provides minimal adapters to its interface for describing operations as classic functions – callbacks containing imperative code.

Said interface is based on the ideas listed further.

1.2 Inversion of Control

Inversion of Control (IoC) [5] is a broader programming principle, which, in its most general form, liberates the client system (or a programmer) from the responsibility for the program’s execution mechanisms. The control is usually transferred to an execution managing entity, also known as a framework. Under this paradigm, the client code supplies the implementation of a program’s logic to the framework as a set of functions or objects. The framework then uses this additional behavior when it is necessary.

In terms of RBL: RBL implements the IoC model, but it does not manage asynchronous execution by itself. To complete RBL into a fully functioning model of asynchronous execution, we wrap the API of an existing asynchronous framework in RBL’s structures. We have chosen to create a proof of concept with one such API – Boost.Asio (see A.2). IoC, in the scope of RBL, means that a function, which is normally called from a site that requires its return value for given arguments, is now called the other way around. The function is invoked by the presence of its arguments (information source) to produce its output value, which is an information source for other functions or tasks.
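The following plain C++ sketch (not RBL code) contrasts the two calling directions; all names are illustrative only:

#include <functional>
#include <iostream>

// Ordinary control flow: the caller asks the function for its result.
int square(int x) { return x * x; }

// Inverted control flow: the function is driven by the arrival of its
// argument and pushes the result to a handler supplied in advance.
void square_inverted(int x, std::function<void(int)> const& handler)
{
    handler(x * x);
}

int main()
{
    int r = square(3);                    // caller-driven: pull the value
    std::cout << r << '\n';

    square_inverted(3, [](int value) {    // argument-driven (IoC): the value
        std::cout << value << '\n';       // is pushed to the handler
    });
}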

1.3 Event-driven Programming Paradigm

Imperative programming alone is certainly not the best way of handling events with no exact time of occurrence. A programmer has to frequently check the state of the program and its environment to detect changes that should initiate further actions. Event-driven programming transfers this responsibility from the programmer to the event handling system. Apart from creating a specialized event-driven language, such a system can be constructed in any higher-level language with an API under the IoC principle – allowing the implementation of abstract interfaces, or callback functions. The underlying mechanism remains based on an internal event loop, adaptation to an existing asynchronous API, or even manually written interrupt routines.


In terms of RBL: The execution of the user’s code is guided by events. External events (acquired from the wrapped asynchronous API) initiate action inside a program. The behavior is then further specified using the same principle – as a chain of event handling callbacks generating and reacting to (now application-internal) events.

1.4 Declarative Programming Paradigm

The declarative programming paradigm is a tool for building logical structures specifying a program’s behavior, without explicitly stating the execution steps. The semantics of execution are implicit, and the implementations may vary diametrically, as long as they produce identical results.

In terms of RBL: RBL further utilizes the declarative approach to describe connections between event handlers as a dependency graph. These declarations become the essence of an RBL program. All the necessary information about a program written with RBL is contained within the declared connection graph – its function (vertex) types and topology. Only once the program’s graph has been constructed can it be executed. The execution, however, uses a fixed set of rules.

1.5 Functional Programming Paradigm

The last kind of programming which has found its use in RBL is a form of declarative programming – functional programming. It is based upon building structures that represent computations. These structures are treated as objects and can be composed as such to form more complex computations and whole programs.

In terms of RBL: RBL does not implement the functional paradigm in its pure sense. RBL’s graph consists of functions that are allowed to have an internal state that persists between calls. RBL, like C++ itself, contains only some elements of functional programming. It supports the construction of functional-like data transformations, e.g. to map, filter, or reduce values. The functional paradigm is visible mainly around algorithm blocks (see 3.4).

2 Design

At its core, RBL takes the IoC paradigm, which is inherent for asynchronous operations (i.e. functions with callback parameters), a step further. RBL also builds all synchronous operations in the IoC way. The motivation behind this is to make no difference between the way of writing synchronous and asynchronous operations. This implies more uniform code and easily interchangeable operations of distinct semantics. Since the IoC concept needs to know the structure of operations, we need to devise the computation structure and its elements.

2.1 Operation as a Block

We begin by designing a representative for each synchronous or asynchronous function – an object called block. A block possesses the same characteristics as a function – its behavior, inputs and (potentially more than one) output. The inputs and outputs have fixed event types. Because N-ary functions and functions with multiple independent outputs require additional concepts (see 2.5), let us continue with the simple case – a unary function. Its body is invoked at the place where its result is required:

auto return_value = function(argument);

With IoC, the same semantics of function are hidden inside a block. The interface is what differs:

auto block = rbl::make_block(function);
block(argument);

The function can still be invoked using the same syntax, but we no longer use the return value as the immediate result (as with asynchronous functions). The output value can only be acquired under the IoC concept, that is, by providing a result handler, which it should call. Not by coincidence, the handler is another block.


But how do we inform block of its presence? By declaring communication connections between blocks:

auto handler = rbl::make_block(handler_function);
connect(block, handler);

Likewise, block can become a handler of another operation:

auto source = rbl::make_block(source_function);
connect(source, block);

This model allows blocks to represent operations that are more universal than the mathematical definition of a function. Multiple output ports are allowed, and each output port may produce multiple events during the block’s invocation.
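Putting the fragments above together, a minimal end-to-end sketch might look as follows; it reuses only the calls already shown (rbl::make_block, connect and the call operator), while the umbrella header name is an assumption:

#include <iostream>
// #include <rbl/rbl.hpp>   // assumed umbrella header of the library

int square(int x) { return x * x; }

int main()
{
    // blocks wrapping ordinary functions
    auto source  = rbl::make_block(square);
    auto handler = rbl::make_block([](int value) {
        std::cout << "result: " << value << '\n';
    });

    // declare the event flow: source's output feeds handler's input
    connect(source, handler);

    // invoking the source pushes its result through the connection
    source(7);   // prints "result: 49"
}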

2.1.1 Termination

Blocks, as objects, may have a limited lifetime. Termination is the act of block removal from the program, which can have various reasons and significant effects (see 3.1). Considering termination policies as part of a block’s semantics, we can employ the view of a block as a transformation of input event sequences into output event sequences. The sequences can be either infinite or finite – ended by termination. This becomes useful in understanding algorithm blocks (see 3.4).

2.2 Program as a Graph of Blocks

Connections represent information (i.e. event) flow between operations. Each connection exists between one block, considered as a source of the events, and the other, their recipient. Together, blocks and connections form a directed computational graph of a program, as vertexes and edges. The graph persists through its use and may be modified by adding or removing blocks and connections at the program’s run-time. RBL achieves this without a central authority, i.e. an entity that would manage the graph. Connections are part of the block’s internal state. RBL further introduces an enhancement – the output of each block can be connected to multiple blocks’ inputs and vice versa. Thus, one


operation can launch multiple following ones, and a block may be used as a handler of multiple events. Our program is indeed only a declaration of vertexes and edges:

// vertexes
block_1_t block_1;
block_2_t block_2;
block_3_t block_3;

// edges
connect(block_1, block_2);
connect(block_1, block_3);
connect(block_2, block_3);

For apparent reasons, RBL introduces an abstraction layer for simplifying the syntax and condensing the information about the connection topology (see 2.6).

2.3 Events

Events are the information-representing objects, which flow through established block graphs. An event can represent a value, a valueless occurrence or an error. RBL uses a more general term for the objects in its implementation – messages (see 3.1).

2.4 Synchronous and Asynchronous Blocks

With the rules established above, there is no syntactical difference between the usage of synchronous and asynchronous blocks. Such information is only contained within the type of operation, which the block represents. Synchronous blocks produce events only as immediate reactions to the input events during their invocation. Asynchronous blocks initiate asynchronous operations with minimal blocking and the production of the output events is in the hands of the underlying asynchronous API.


This conceptual distinction is most visible when we employ the block graph perspective. We can see that the graph consists of communication chains – directed paths in the graph.

2.4.1 Synchronous Subgraphs

Multiple connections of the same block introduce branches – directed trees. A directed subgraph which is delimited by asynchronous blocks is called a synchronous subgraph. In other words, a synchronous subgraph does not involve any asynchronous operations, and it is executed in a manner that blocks the initiating block. This should not be an issue – there will always be a portion of code that is executed synchronously. Quite the opposite is true; in order to minimize the overhead of asynchronous execution scheduling, programmers should strive to write their programs with the least amount of asynchronicity, while still being able to produce the desired behavior. With multiple connections, a block sends the event to each recipient in a blocking, iterative manner, in order of their connection. This results in depth-first execution. Given that execution of one branch is not an obstacle to executing another branch in due time, this is a perfectly valid construction. If one branch should take a long time to execute and incur unacceptable delay on others, it should be either ordered near the end of the list of connections (registered among the last ones) or delimited by an execution deferring asynchronous block (i.e. an executor, see 3.3). Consider the following block graph:

Figure 2.1: Synchronous Subgraph


A is the initiating block, C1 is an asynchronous block. D1 and D2 therefore do not belong to the synchronous subgraph. Given the connections being created in the top-down order and each block producing exactly one event for each input event, the invocation order is: (A), B1, C1, C2, B2, C1 (again), C3, D3, D3 (again).

Note that X and Y belong to the subgraph, because synchronous communication may also be present in the backward manner (see 3.1). A synchronous subgraph may be executed by at most one event source at a time, so as not to impose a thread-safety requirement upon blocks. Implicitly parallel execution would also be harder to grasp and use with confidence.
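The depth-first ordering can be reproduced with a small sketch (only rbl::make_block and connect from earlier are used; block and value types are illustrative):

// Each block prints its name, so the console reveals the invocation order.
auto A  = rbl::make_block([](int v) { std::cout << "A\n";  return v; });
auto B1 = rbl::make_block([](int v) { std::cout << "B1\n"; return v; });
auto B2 = rbl::make_block([](int)   { std::cout << "B2\n"; });
auto C2 = rbl::make_block([](int)   { std::cout << "C2\n"; });

// Recipients are invoked in the order their connections were registered.
connect(A, B1);    // first branch of A
connect(A, B2);    // second branch of A
connect(B1, C2);   // child of B1 – visited before backtracking to B2

A(0);              // synchronous, depth-first: prints A, B1, C2, B2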

2.4.2 Cycles

A cycle in the block graph is a result of expressing dependency of an operation upon its previous results.

Synchronous cycles

In general, RBL does not allow synchronous cycles – cycles consisting entirely of synchronous blocks. This would also lead to the imposition of the reentrancy1 condition upon the participating blocks, which would be a nuisance to implement, or even impossible, given a block’s semantics and internal state. Instead, we can design a special synchronous block, which will make all synchronous cycles containing this type of block valid, under special semantics (see 3.2).

Asynchronous cycles

Cycles containing asynchronous blocks are easier to imagine, and no extra support is required from RBL. The synchronous execution stops at each asynchronous block and recursion does not occur. The only things the programmer has to be certain of (also in the synchronous case) are:

• the termination condition, i.e. a situation which breaks the asynchronous loop;

1. Reentrant functions are those with well-defined recursive execution in the same thread of execution.


• multiplicative (exponential) growth of the number of events in the loop, originating from consecutive branching and merging of event paths.

2.5 Nested Graph Composition

A graph of blocks grows with the increased complexity of a program. In such cases, it is desired to add more structure information – divide the graph into subgraphs with a related purpose. RBL thus allows wrapping a subgraph of a program in a composite block, or vice-versa, it allows implementing a block in terms of a nested block graph. Such a block is called a group. It is only an RBL concept of how a block-aggregating class should look. In practice, it is essentially a user-defined class, which holds its blocks as private attributes and creates the connections at its construction. A group shall expose the inputs and outputs of the internal subgraph as its own inputs and outputs via member attributes/functions:

connect(source_1, add_numbers.input_1);
connect(source_2, add_numbers.input_2);
connect(add_numbers.output, handler);

A group does not necessarily have to be implemented in terms of an internal subgraph. It is a mandatory element when building blocks with more than one input and/or output. Such blocks have their internals hooked to the exposed I/O blocks, and the behavior is implemented imperatively. Because input events arrive at different times, a group with multiple inputs usually keeps an internal state to collect the data required for producing an output. An example of such blocks are combiners (see 3.4). Lastly, groups may figure in declarations in a hybrid way:

connect(source_1, zip_events.input_1);
connect(source_2, zip_events.input_2);
connect(zip_events, handler);

Notice that the last connection uses the group block directly, because we can unambiguously identify the one output it has, the same way as handler has one input. The details are further elaborated in 3.1.

2.6 Syntactic Sugars

Writing programs by declaring blocks and connections in the verbose manner above is, in fact, worse than the existing approaches. Since our goal is to simplify programming, RBL has to pull out a secret weapon – expression templates. Expression templates are a C++ idiom for implementing class templates which represent computations at compile time [6, Chapter 27]. All the information required to build the computation is embedded within an expression template’s instantiated type. The expression classes are usually composed so that an outer expression template is instantiated with the types of its nested expressions. RBL uses this technique to implement expressions which build the underlying block graph around the provided arguments. The simplest expression is a chain expression, constructible with operator>>, as with the chain function. Let us consider the following example (A, B and C being blocks):

connect(A, B);
connect(A, C);
connect(B, C);

Such a structure may also be constructed like this:

A >> B >> C;
A >> C;

Which is equivalent to its more verbose version:

chain(A, B, C); // same as chain(chain(A, B), C);
chain(A, C);

With both forms above, we have established a minimalist declarative syntax. We are able to specify an arbitrarily long chain of connections in one statement. RBL provides some other expressions for general use, partly as a proof of concept and an example. Most of these expressions also generate additional blocks and branches in their produced graph. Expression templates are implemented in a manner which allows the user to create potentially domain-specific expressions by re-using the existing expression API of RBL. The implemented expression templates are explained more thoroughly in 3.6.

3 Implementation

The implementation of RBL is divided into the following logically layered parts:

• core – low-level, object-oriented block API; • concrete blocks – built-in, executor and algorithm blocks; • expression template layer – syntactic sugars for block graph declaration; • introspection layer – an additional layer to core, for debugging and visualization.

3.1 Core Functionality

The core layer of RBL implements its variant of IoC using block objects. A block, however, is a broader term at the API level, as explained in the following section.

3.1.1 Blocks

All RBL blocks are classes derived from rbl::block::block_base. The base class defines the following internal states, which are common for all blocks:

• uninitialized – the block has not been connected yet, it does not have any effect on an RBL program;
• running – the block is connected and has the potential for receiving and sending events;
• terminated – the block is in the process of destruction, no more events will be received nor produced;
• invalid – the block has been invalidated by move semantics [7, §11.3.4.2], it does not have any effect on the RBL program.


Publisher and Subscriber Blocks

The atomic block objects are divided into two connectible counterparts: a publisher and a subscriber. Publisher is a block which is a source of events (e.g. a periodic event generator). It is derived from the rbl::block::publisher_base class template, parametrized by the type of events it produces. Subscriber is a block which is a recipient of events (e.g. a standard output writer). It is derived from the rbl::block::subscriber_base class template, parametrized by the type of events it accepts.

These two classes implement the connection and event sending mechanisms. They are abstract and further extensible via inheritance.

Connection

Connections represent potential event paths between blocks and can be formed, at run-time, only between publisher and subscriber instances of the same event type. Each publisher holds a list of references to its subscribers and vice-versa. Different matching (derived) block types can therefore be connected to their counterpart, without any distinction on the other side.

Termination

Termination of a block can be a result of:

• an explicit request from inside/outside the block,
• termination of all of a subscriber’s publishers,
• termination of all of a publisher’s subscribers.

The last two scenarios implement a strategy for automatic block graph cleanup. When a block terminates, it notifies all its connected blocks of this event, and they remove it from their connection lists. The process can continue recursively, resulting in a part of a synchronous subgraph being destructed at once.


RBL allows termination of blocks which are currently being executed from the same thread (e.g. a subscriber terminates its publisher upon receiving an event from it). The publisher and subscriber base classes implement a mechanism to register disconnections (and connections) without immediate effect, so as to avoid iterator invalidation of the underlying list of block references [7, §22.3.11.5/1, 3].

Listing 3.1: Block Termination

if block is not currently entered
    enter the block
    add/remove the other block
    if the list of block counterparts is empty
        terminate
    leave the block
else
    stage the other block for addition/removal

The staged changes are applied after each iteration through the list of connected blocks. The iteration has in all cases only a communication purpose (see 3.1).

Transformer Blocks

Transformer is a block that is both a subscriber and a publisher, i.e. a class derived from both. Transformers represent operations with a single input and a single output. A transformer may appear to be in one state as a subscriber and in another as a publisher. This is the case with asynchronous transformers – executors (see 3.3). The transformer_base base class represents transformers, resolving the ambiguities created by the questionable, although intentional, multiple (non-virtual and indirect) inheritance of the rbl::block::block_base class [2, Section 21.3.6]. Namely, it implements termination as termination of its subscriber and publisher parts – in that order – and disambiguates between the operator() overloads, using those of the underlying subscriber.


Figure 3.1: Block Class Diagram

Group Blocks

RBL does not allow inheriting from more than one subscriber or publisher class at once. For the purpose of multi-input and multi-output operations, it utilizes object composition. A group is any block derived from the group class template, parametrized by the type of the base block. The base block is the type from which the group derives and which provides its implicit semantics. Therefore, a group can be a subscriber, publisher, transformer or none of those. If not provided, the base block will be taken as block_base and its subblocks grouped using only composition.


Here is an example of a strictly composed group (i.e. implicitly neither subscriber nor publisher):

connect(input_1, group.input_1);
connect(input_2, group.input_2);
connect(group.output_1, output_1);
connect(group.output_2, output_2);

If a group has only one output (publisher), we can make our class derive from group and use it as:

connect(input_1, group.input_1);
connect(input_2, group.input_2);
connect(group, output);

The same method applies to a single input. A block with a single input and a single output shall not be implemented as a group derived from group. This class only serves for introspection purposes. A group instantiated with block_base is an exceptional case, where the base class provides the necessary block semantics for the group type, which cannot be acquired in any other way.

Lifetime of Blocks

It is forbidden for an RBL block to be destructed in the running state. A block has to be uninitialized, terminated or invalidated beforehand. Not obeying this rule will lead to undefined behavior1. A group shall terminate all its inner blocks as part of its own termination process and vice-versa.

Dynamic Blocks

Blocks can be allocated dynamically, although the meaning is more restricted in RBL. A dynamically allocated block is a block created via the rbl::dynamic or rbl::dynamic_ptr functions. The result is a reference, or an rbl::dynamic_block_ptr (RBL’s variant of std::shared_ptr). The created block is in both cases a self-owning block, meaning that it is not destructed at least until it terminates. This (shared) ownership is further prolonged for the lifetime of the last living external user-held rbl::dynamic_block_ptr instance.

1. Because of leaving invalid pointers in the block lists of the connected blocks.
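A small usage sketch (rbl::dynamic, rbl::dynamic_ptr and rbl::dynamic_block_ptr are named above; whether the factories take a ready-made block, as assumed here, or constructor arguments is not shown):

// A self-owning block: it outlives this scope, at least until it terminates
// (e.g. when its last publisher terminates and the cleanup propagates to it).
auto& logger = rbl::dynamic(rbl::subscriber([](int value) {
    std::cout << value << '\n';
}));
connect(source, logger);

// Alternatively, keep explicit shared ownership alongside the block itself.
auto shared = rbl::dynamic_ptr(rbl::subscriber([](int) { /* ... */ }));
connect(source, *shared);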


3.1.2 Messages

Communication between connected blocks works in both directions and is implemented via message sending. A block always multicasts its message to the connected blocks in the order in which the connections occurred.

Forward Messages

A publisher can send three types of forward messages, according to their informative purpose:

• Data message – rbl::data_message – transmits data,
• Error message – rbl::error_message – transmits errors,
• Termination message – rbl::termination_message – transmits termination signals.

Backward Messages

A subscriber can send a backward message of one type to its publishers:

• Backward termination message – rbl::backward_termination_message – transmits backward termination signals.

Data Messages

A data message holds a value of the given type. The message may be used to transmit data only between a publisher and a subscriber of the same data type. A data message may be valueless; RBL provides the empty rbl::event class for this. From now on, we will refer to rbl::data_message<rbl::event> simply as events.

Error Messages

An error message transmits exceptions held by std::exception_ptr or std::shared_ptr for platforms without support of standard exceptions (the -fno-exceptions compiler switch). rbl::exception is a mandatory base class for all exceptions that can be transferred this way.


Termination Messages

A forward/backward termination message holds a reference to its originating publisher/subscriber. Based on this information, the receiving side knows which block to unregister from its list of publishers/subscribers.

Messages are handled by overriding virtual receive member function overloads of the publisher_base and subscriber_base classes, e.g.:

Listing 3.2: Squaring Transformer

class squarer : public transformer_base
{
    void receive(data_message const& message) override
    {
        // send squared value
        this->send(message.get() * message.get());
    }

    void receive(error_message const& message) override
    {
        // forward (default behavior, in fact)
        this->send(message);
    }
};

The above behavior can be specified in a simpler way than with inheritance, see 3.2.

Message Containers

Since messages often cannot be processed by blocks instantly, for example while waiting for an additional message from the same or another input, they have to be stored as part of the blocks’ internal state. RBL provides basic message containers in its core, which it later uses to implement the shipped concrete general-purpose blocks. Message containers provide a unified, first-in, first-out interface for storing all three types of forward messages:

• insertion,
• observation – the type of the top-most message, the number of stored messages,
• top-most message destructive/non-destructive access (by message type, by polling – callbacks),


• top-most message discarding without reading,
• discarding the given number of future messages.

Message Slot

A message slot is a storage unit that can hold at most one message at a time. An incoming message overwrites the stored one (if any). The class templates are:

• rbl::message_slot – includes ignoring of future messages;
• rbl::message_slot_basic – does not include ignoring of future messages (which requires an internal counter), as a space optimization.

Message Ring

A message ring is a circular buffer with a capacity specified beforehand. The internal buffer, implemented as a sequence of message slots, can possibly be dynamically allocated. When full, an incoming message overwrites the oldest message in the ring. The corresponding class templates are:

• rbl::message_ring,
• rbl::message_ring_static.

Message Queue

Message queues are dynamically sized containers which, in contrast to slots and rings, store their messages in separate queue structures for each type. The reason behind this is space optimization. The above containers store their messages in a std::variant instance, the size of which is at least the size of the biggest type it can hold. Since data messages can have arbitrary size, we did not want to incur the same cost on storing the small, fixed-size error and termination messages, however less frequent they may be. A message queue class has to preserve the relative order of messages as they were inserted. This is done by assigning an internal identification number (ID) to each message. The ID is later used as a deciding factor in choosing from which internal queue the next message should be accessed.
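The ordering mechanism can be illustrated with a simplified, non-RBL sketch: each message type gets its own queue, every stored entry carries a global sequence number, and the entry with the smallest number is served next.

#include <cstdint>
#include <queue>
#include <string>
#include <utility>

// Simplified illustration of ID-based ordering, not RBL's actual classes.
struct two_type_queue
{
    std::uint64_t next_id = 0;
    std::queue<std::pair<std::uint64_t, int>>         data;    // "data messages"
    std::queue<std::pair<std::uint64_t, std::string>> errors;  // "error messages"

    void push_data(int v)          { data.push({next_id++, v}); }
    void push_error(std::string e) { errors.push({next_id++, std::move(e)}); }

    // The queue whose front entry has the smaller ID is accessed first,
    // preserving the relative insertion order across both queues.
    bool data_is_next() const
    {
        if (errors.empty()) return true;
        if (data.empty())   return false;
        return data.front().first < errors.front().first;
    }
};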


The queue variants are:

• rbl::message_queue – for data, error and termination messages, not thread-safe;
• rbl::message_queue_dataless – for error and termination messages, not thread-safe;
• rbl::message_queue_locked – thread-safe variant of rbl::message_queue;
• rbl::message_queue_dataless_locked – thread-safe variant of rbl::message_queue_dataless.

3.2 Built-in Blocks

RBL has a collection of built-in concrete blocks, the behavior of which can be modified or supplied by (template) arguments (flags/functors), instead of more verbose class inheritance.

3.2.1 Convenience Blocks

A user can construct concrete subscriber and transformer blocks from callable objects or simple functions:

auto sub = rbl::subscriber([](auto value){ std::cout << value << std::endl; });

int power(int value){ return value * value; }

auto tran = rbl::transformer(power);

If the power function threw an exception, it would be caught and sent as an error message. The type of the block can even be deduced from the type of the callable argument (equivalent to the previous example):

auto sub = rbl::make_block([](int value){});
auto tran = rbl::make_block([](int value){ return value; });


In addition, there exists a block which simplifies the implementation of a transformer’s body with a nested graph:

auto tran = rbl::composite(
    [](rbl::passive_subscriber& sink, input_type const& value)
    {
        auto start = rbl::once(value);
        start >> ... >> sink;
        start();
    });

The body is executed (i.e. the subgraph is re-constructed) for each input message. The main use case is the parametrization of multiple temporary blocks according to an input value. The sink parameter represents an internal subscriber of the composite block, which in turn directly forwards the messages as output messages of the composite block itself.

3.2.2 Identity Block

identity can serve two purposes. It can be used as a connection node between multiple blocks. Secondly, it can perform a conversion from the input type to the output type, which can differ.

3.2.3 Constant Blocks

The constant block transforms each input event message to an output data message of a given value or an output error message with a given exception. The once block is a version of constant which terminates immediately after sending the first message.

3.2.4 Termination Signaler

The block terminated sends out an event message just before its termination. The connected subscribers can react to such an event in the regular way.

3.2.5 Error Handlers

By default, each transformer block implements the manipulation of error messages by forwarding them. If an error message reaches a


subscriber (or a transformer without subscribers) without a handling strategy at the end of the connection chain, the program is terminated. There are three types of dedicated blocks that manipulate the flow of error messages:

• catch_errors – invokes a user-provided handler if an incoming error message contains an exception matching a given type, otherwise forwards all messages;
• ignore_errors – prevents an error message from being forwarded if its exception type matches a given type, otherwise forwards all messages;
• capture_errors – transforms matching exceptions received as error messages into exceptions sent as data messages; input data messages are ignored.

The following example shows how the control flow can be expressed in the same way for exceptions as for data, by injecting the error handling related blocks into the graph:

Listing 3.3: Error Manipulation

produce_or_fail >> ignore_errors() >> on_success;
produce_or_fail >> capture_errors() >> print_trying_again_message;
produce_or_fail >> capture_errors() >> produce_or_fail;

ignore_errors and capture_errors represent two disjunctive paths to be taken. On failure, a diagnostic message is printed and the cycle for another attempt is entered.

3.2.6 Collect Block

As you might have noticed in Listing 3.3, if produce_or_fail is synchronous, we have constructed an invalid synchronous cycle (see 2.4). The collect block is the only synchronous block that can be used to "break" such cycles.


The block is reentrant, with the following behavior:

Listing 3.4: Collect Block

if block is not currently entered
    enter the block
    propagate the event further, as normally
    foreach event in the input queue
        propagate the event
        pop the event from the queue
    leave the block
else
    push the event to the input queue
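For instance, the retry cycle from Listing 3.3 can be kept valid even with a synchronous produce_or_fail by inserting a collect block into the cycle (a sketch; how a collect instance is created is an assumption, the other names come from Listing 3.3):

auto retry = rbl::collect();   // assumed factory for the collect block

produce_or_fail >> ignore_errors() >> on_success;
produce_or_fail >> capture_errors() >> print_trying_again_message;
produce_or_fail >> capture_errors() >> retry >> produce_or_fail;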

3.2.7 Switch Blocks

RBL provides group blocks which have the transformation semantics of identity, but only accept/send messages from/to the currently active subscriber/publisher. Activation of the next running subscriber/publisher is instigated by the termination of the currently active one. The switch blocks terminate when there are no further subscribers/publishers to switch to. These blocks are useful for implementing the concatenation expression (see 3.6). The classes are:

• switch_outputs – sequentially chooses the active publisher,
• switch_inputs – sequentially chooses the active subscriber.

3.2.8 State Interfacing Blocks

Lastly, a part of the built-in blocks consists of blocks for access and modification of external variables. They exist mainly to support interaction with imperatively written code, a complete rewriting of which into RBL code might be undesirable. The blocks include:

• get – reads the associated variable and sends its value immediately upon receiving an input event message;
• set – assigns each data message’s value to the associated variable upon receiving it;


• read – reads values from a forward iterator range, includes overloads for reading from standard input streams;
• write – writes values to an output/forward iterator range, includes overloads for writing to standard output streams;
• insert, push_front and push_back – insert values into a standard-compliant container.
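A small sketch of how these blocks might be combined (the block names are from the list above; their exact namespaces and constructor arguments are assumptions):

std::vector<int> results;
int threshold = 10;

auto read_threshold = rbl::get(threshold);     // on each input event, send the variable's value
auto store          = rbl::push_back(results); // append every received value to the container

trigger >> read_threshold >> store;            // `trigger` is some event-producing block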

3.3 Executor Blocks

Executors are transformers which transfer the execution from one synchronous subgraph to another or change the context of execution for the remaining subgraph in another way.

3.3.1 Asynchronous Blocks

Among the most common executors should be the ones which directly interact with an asynchronous API. RBL shows how to implement these blocks on top of Boost.Asio’s contexts and I/O objects (TCP sockets). They are similar to the above state interfacing blocks in that they modify the state of the associated I/O context/object, which are objects external to the block graph. The I/O context operating blocks are (contained within the rbl::asio::block namespace):

• executor – issues a post/dispatch/defer call to a Boost.Asio io_context according to the policy; the internal completion handler then sends the input message stored alongside it further;
• strand_executor – issues a post/dispatch/defer call to a Boost.Asio io_context::strand;
• priority_executor – issues a post/dispatch/defer call to a custom-made rbl::asio::priority_context;
• delay_executor – delays the re-sending of the input message via an io_context and boost::asio::basic_waitable_timer objects.


delay_executor has three possible policies:

• all – delays all messages,
• first – ignores messages while there is a waiting message (inspired by RxCpp’s debounce operator),
• last – a new message cancels the previous waiting message.

The other executors listed above treat the I/O context simply as a queue (see A.2), which makes it possible to transform the depth-first execution strategy of what would previously be a synchronous subgraph into a breadth-first pattern.

Figure 3.2: Controlled Breadth-first Execution

In the above graph, the prepended executors change the execution order from B, D, E, C, F, G to B, C, D, E, F, G. The priority executor uses rbl::asio::priority_context, which is an adapter for prioritized execution on top of a standard io_context. Each executor has an associated constant priority, which determines the relative order of execution between executors. As a result, if A in the above example generated more than one message in a row and the executors were each prioritized with a different value (consider the order B, D, F, C, E, G), the execution pattern would be B, B, D, D, C, F, C, F, E, E, G, G.
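A sketch of such wiring (the Boost.Asio io_context calls are standard; the constructor argument and default policy of rbl::asio::block::executor are assumptions):

boost::asio::io_context ctx;

// Assumed: an executor block is bound to an io_context at construction and
// re-posts each incoming message into it instead of forwarding it directly.
rbl::asio::block::executor defer_to_b{ctx};
rbl::asio::block::executor defer_to_c{ctx};

A >> defer_to_b >> B;   // each branch of A is first posted to the context,
A >> defer_to_c >> C;   // so the branches run breadth-first once ctx runs

ctx.run();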


The blocks for asynchronous TCP operations (see A.2) are (contained within the rbl::asio::tcp::block namespace):

• connect – sends an asynchronous TCP connection request to the endpoint given as the input message, produces a connected TCP socket;

• accept – asynchronously accepts a TCP connection request upon receiving an input event message, produces a connected TCP socket;

• resolve – asynchronously resolves a query into a TCP endpoint using Boost.Asio’s mechanisms;

• read/write – issues an asynchronous read/write operation from/to the referenced socket, can be either formatted (using Boost.Serialization), or raw (binary).

One limitation of Boost.Asio sockets is that they can process at most one asynchronous operation at a time in the way that is meaningful for us. This is because, internally, the object can partition an asynchronous operation into smaller chunks (to be written, for example), and these operations would become interleaved with those of another asynchronous operation. RBL handles this problem by sequencing requests after the completion of previous ones.
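For illustration, a client-side chain over these blocks might be declared roughly as follows (a sketch only; the constructor parameters of the TCP blocks and the query format are assumptions):

boost::asio::io_context ctx;
namespace tcp = rbl::asio::tcp::block;

tcp::resolve resolver{ctx};    // assumed: bound to the io_context
tcp::connect connector{ctx};

auto start = rbl::once(std::string{"example.com:80"});  // query to resolve

start >> resolver                 // query    -> TCP endpoint
      >> connector                // endpoint -> connected TCP socket
      >> [](auto&& socket) { /* hand the socket to read/write blocks */ };

start();      // kick off the chain
ctx.run();    // drive the asynchronous operations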

3.3.2 Thread Block

The rbl::thread block transfers the execution of its input messages to its internally managed thread, with the use of a buffering message queue. The thread can be started and joined explicitly, as well as automatically – running from the first connection until the block’s termination. A potential hazard is that the internal queue may become overloaded if messages are produced faster than they are consumed. There is currently no support to detect this incident and handle it gracefully, but such exceptional behavior may be implemented with relative ease.
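A sketch of moving part of a graph onto the managed thread (only rbl::thread and constructs shown earlier are used; automatic start on the first connection is assumed):

rbl::thread worker;   // owns its thread and a buffering message queue

// Everything connected behind `worker` executes on the managed thread;
// the producer side only enqueues the message and returns immediately.
producer >> worker >> [](int value) {
    std::cout << "processed on worker thread: " << value << '\n';
};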


3.3.3 Lock Block

The rbl::lock block can manage either an external mutex2, or its own. The mutex is held during the forwarding of each message and released before returning control to the invoking publisher. In RBL’s terms, the block is required as an entry point to a shared synchronous subgraph connected to concurrently executed subgraphs (e.g. multiple rbl::thread blocks). An unlocking block counterpart does not exist. Because of how synchronous subgraphs work (see 2.4), we would not want to leave an exclusively held subgraph before returning to the effective locking block. The block states might become disrupted before backtracking or immersing into another branch. Therefore, a locked subgraph may be escaped by a message only through asynchronous or thread blocks.

3.3.4 Termination

Executors only operate in the forward direction (apart from the lock block). Backward termination messages from an executor’s subscribers do not propagate past the executor in a synchronous, nor an asynchronous manner. That would violate the assumptions of the synchronous subgraph model we have established earlier. As a result, a program may attempt to use a terminated executor. It is at this point that the source subgraph becomes notified of the termination. To avoid misconceptions, terminating an executor from either side (not manually) does not cancel its possible pending asynchronous effects; the termination is enqueued afterwards and happens as the last action.

Figure 3.3: Executor Termination

When A terminates, the subscriber side of the executor terminates immediately, but the publisher’s side and B are sequenced after all pending effects. If no effects are pending, the termination of B is also immediate.

2. A mutex is a synchronization primitive used to protect shared data from being simultaneously accessed by multiple threads [2, Section 42.3.1].


When B terminates, the publisher side of the executor terminates immediately, but the subscriber’s side and A are not affected. Later, when A attempts to communicate a data or an error message, the propagation of the withheld backward termination message is resumed. The original message from A does not have an effect in this case. This is an inherent shortcoming of executors trying to implement the same semantics as synchronous blocks. Nevertheless, there should be no visible behavioral difference in a program because of this, other than the possible performance of unnecessary calculations in the source subgraph.

3.4 Algorithm Blocks

As a continuation of built-in blocks, RBL provides basic concrete blocks for building algorithms. The implemented algorithms are largely RBL variants of algorithms that can be found in the C++ standard library. The RBL algorithms represent computations on inputs (messages) distributed in time, as opposed to algorithms distributed in space3. The latter need to have the whole input data available before starting, the former work by processing inputs one by one. RBL implements mostly online algorithms4 because of their natural fitness for this case. RBL implements these algorithms in categories which share general semantics, and the exact behavior is usually specified by user-provided functions, as with convenience blocks (see 3.2). Moreover, some blocks implement multiple policies for their behavior, which are again selected by parametrization.

3.4.1 Mappers

Mappers (essentially transformers) are the simplest of algorithm blocks – they transform the values from the input sequence to values of the output sequences in a one-to-one relation. Currently, the only concrete mappers are the variants of standard clamping functions – clamp_min, clamp_max and clamp.

3. We have been partly inspired also by RxCpp (see 4.1.2).
4. An online algorithm is an algorithm which can compute its output data sequentially from its input data, without needing the whole input to be available before producing a part of its output.

3.4.2 Filters

Filters are blocks which do not transform messages, but decide whether to forward or discard them. There are several versions:

• filter_if/filter_if_not – accept or ignore values based on a user-provided predicate;
• filter/filter_not – accept or ignore values based on equality to a given value;
• filter_if_consecutive/filter_if_not_consecutive – accept or ignore data messages based on a user-provided binary predicate, called with two consecutive values (filter_unique_consecutive is a concrete example);
• filter_unique and filter_unique_unordered – remember all previous unique values and forward only the unique ones; the variants differ by the internal storage they use – std::set or std::unordered_set.
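For example (block names from the list above; the exact factory signatures are assumptions):

// keep only even values
auto evens   = rbl::filter_if([](int value) { return value % 2 == 0; });

// drop a particular sentinel value
auto no_zero = rbl::filter_not(0);

source >> evens >> no_zero >> [](int value) { std::cout << value << '\n'; };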

3.4.3 Accumulators

Accumulators progressively calculate one output value for the whole input sequence. Accumulators may be constructed with a user-provided function which, based on the current state and an input value, changes the value of the state. The type of the state value may differ from both the input and output value types, for maximum genericity5. The state type is transformed to the output type, again, with a user-provided function6. Accumulators can be constructed with two policies: total and partial. The partial policy sends each intermediate result, while total sends only the final result. If the input sequence was empty and the block was terminated, the initial value is sent, if provided, otherwise no output is generated.

5. For example, this is useful for the average accumulator, since it holds both the sum and the count of the values received so far.
6. In the case of the average accumulator, this function performs the division of the state’s two components.
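The average accumulator mentioned in the footnotes might then be assembled like this (a sketch; the factory name, parameter order and the policy tag are assumptions – only the state/fold/finish structure follows the text above):

struct sum_count { double sum = 0.0; int count = 0; };

auto average = rbl::accumulate(
    sum_count{},                                  // initial state
    [](sum_count state, double value) {           // state x input -> new state
        return sum_count{state.sum + value, state.count + 1};
    },
    [](sum_count const& state) {                  // state -> output value
        return state.count ? state.sum / state.count : 0.0;
    },
    rbl::policy::total);                          // send only the final result

samples >> average >> [](double avg) { std::cout << "average: " << avg << '\n'; };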


3.4.4 Combiners

Combiners are algorithm blocks for processing multiple input sequences in order to generate a single output sequence. They wrap a user-defined function with an arbitrary number of parameters. There are numerous policies available:

• all – all input values have to be combined (they are stored in input queues, and an output is generated when all inputs are available); an error message propagates further and discards all messages which would otherwise become combined with a data message at its place, including future error messages;
• ordered – input values are accepted strictly in the sequential order, messages from inactive input sequences are ignored; an error message propagates further and resets the currently active input to the first one;
• first – the input values are collected and processed when all of them are present; subsequent messages from the same input sequence are ignored until an output value is generated;
• last – the input values are collected and processed when all of them are present; subsequent messages from the same input sequence overwrite the stored ones;
• first_partial – a variant of first which sends all intermediate combinations;
• last_partial – a variant of last which sends all intermediate combinations.
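A two-input combiner might then be declared as follows (a sketch; the factory name, the way a policy is selected and the input member names are assumptions – the .input_1/.input_2 style follows Section 2.5):

// Combine a value with a weight whenever both inputs have produced one (all policy).
auto weighted = rbl::combine<rbl::policy::all>(
    [](double value, double weight) { return value * weight; });

values  >> weighted.input_1;
weights >> weighted.input_2;
weighted >> [](double v) { std::cout << v << '\n'; };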

3.4.5 Quantifiers

Quantifier blocks are similar to accumulators, but they are limited to producing exactly one boolean output value, signaling whether the input sequence has matched the given predicate in combination with the quantification assertion (existential or universal). As the quantification result may become known before the end of the input sequence, the block may (correctly) terminate prematurely. Quantifiers work with predicates applied on single or consecutive elements. The latter


case allows checking for consecutive value uniqueness and monotonicity of value sequences.

3.4.6 Generators

Generators are a more general category of algorithm blocks which produce larger or informatively denser output sequences than the input ones they accept. The generate block outputs a value on an incoming input event and internally generates its successor via a user-provided function. The repeat block sends each input message the specified number of times, immediately after receiving it. The unpack block expects a standard forward-iterable container or a homogeneous std::tuple as its input value type, and sends their individual elements one by one.

3.4.7 Other Algorithms

There are numerous uncategorized blocks, such as a sliding window (the inversion of pack), a discretely delayed block (by a number of messages to wait), a block for unzipping a sequence of tuples into multiple sequences of individual elements (unzip), a block for lexicographical comparison of two input sequences and, lastly, common set operations. The set operations accept two sorted input sequences (according to a specifiable comparator) to create one sorted output sequence. Their variants have been adapted from the standard library – union (and merge – preserving duplicates), intersection, difference and symmetric difference [2, Section 32.6.3].

3.5 Introspection Layer

The IoC paradigm, accompanied by asynchronicity, severely degrades debugging options. This is hoped to be compensated by the introspection layer, which currently consists of logging and visualization support. Introspection can be disabled at compile time, incurring no cost on the program’s run-time.


3.5.1 Logging

RBL logs the information about its internal proceedings on multiple verbosity levels, specifiable via command-line flags:

• operations (-v1), e.g. a socket requested to send data,
• state changes (-v2), e.g. a block has been terminated,
• communication events (-v3), e.g. a block is sending a value (calling its subscribers’ receive member functions),
• internal events (-v4), e.g. a block has been removed from the list of connections.

3.5.2 Visualization

The other, more prominent feature, which does not have an analogy in pure C++, is the ability to visualize the program’s control flow. The output has the form of a static graph, as it was at the point of the user’s request. The graph generation is not an atomic and thread-safe operation; therefore, the block graph must not be undergoing modification at the same time. Visualization is performed by an rbl::intro::visualizer object bound to a standard output stream. The visualizer allows the user to customize the subject and options of visualization before they decide to invoke the write member function. This writes the program’s block graph to the output stream, usually an output file stream, in the DOT format. The DOT format can be visualized externally, using GraphViz [8]. The following information is visualized:

• generated (default) or user-defined block names;
• block class types (possibly simplified);
• block traits, i.e. whether they are dynamically allocated and/or executors;
• individual input/output ports of group blocks and their names;
• direction, data types and relative order of connections.


Graph scopes serve during visualization as bounds of the visualized subgraph. A user can declare a (nested) scope hierarchy and request visualization of a specific scope they are interested in. Only blocks that were created during the time the requested scope was on the scope stack (active) will be visualized, including its child scopes. Otherwise, scopes do not play any role in the program’s logic.

The following graphic shows two nested scopes, a dynamic (dashed) block, an executor (filled) block, a block (group) with multiple inputs, etc. The group block uses the default identifier, generated from the object’s address, other blocks have been explicitly named by the user according to their functions.

The graph has been generated from the contrived code in example/intro/visualization_basic.cpp.

Figure 3.4: Block Graph Visualization

3.6 Expression Template Layer

Expression templates (or just RBL expressions) are used to stamp out distinct block graphs (see 2.6). Expressions can be looked at as building blocks of a custom C++-internal language. Once we have determined what our atomic expressions are (terminal symbols), we can continue building compound expressions (nonterminal symbols) around them. Each expression is derived from expression<Input, Output> for common semantics, where Input/Output is the input/output data type. An expression may, therefore, have at most one input and one output, which are directly connected to the underlying blocks. The type of a compound expression is automatically deduced from its nested subexpressions. The purpose of each expression is to generate its block subgraph and further represent its input and output blocks for use in enclosing expressions.

3.6.1 Block Expression

In our case, block is the single atomic expression. It is represented by block_expression or temporary_block_expression. The former masks named (lvalue) blocks in expressions and the latter the anonymous (rvalue) ones. With some additional techniques, blocks can be used in compound expressions directly and be converted to block expressions under the hood. Moreover, functions which are compatible with convenience blocks (see 3.2) implicitly construct the corresponding convenience block, which is then fed to the aforementioned step. As a result, non-RBL callables (lambdas, simple functions) may also appear in expressions directly:

publisher() >> [](int value){ return value * value; }
            >> [](int value){ std::cout << value << std::endl; };

3.6.2 Chain Expression

The chain expression is the simplest compound expression. Its purpose is to create connections between its nested expressions. A chain can potentially spawn a hidden identity block (see 3.2), which is used to convert messages between two adjacent blocks whose message types are convertible, but not identical.


Compound expressions are variadic7 and automatically flattened to simpler types, in favor of optimizations (smaller block graphs) and shorter compile error messages (in cases of invalid template instantiation [6, Chapter 9.4]). The overloads with two operands are represented via overloaded binary operators, so the following statements are equivalent:

chain(A, B, C);
chain(chain(A, B), C);
chain(A, chain(B, C));
A >> B >> C;
(A >> B) >> C;
A >> (B >> C);

Read as: A triggers (sends messages to) B, which triggers C

3.6.3 Complex Expressions

There are three kinds of trickier expressions implemented in RBL. They are complex in that their underlying subgraph is composed of:

• the subgraphs of the nested expressions (not different from chains),

• a prepended hidden block with one input and multiple output connections,

• an appended hidden block with one output and multiple input connections.

The types of the hidden blocks are automatically deduced from the types of the subexpressions. The blocks, along with the deduction rules, are the only things that differ between the following expressions.

7. Meaning they can be instantiated with a variable number of distinct types or deduced-from values [6, Chapter 4].


The topology remains the same in all cases, i.e.:

Figure 3.5: Complex Expression Graph (A: source, B: 1st operand, C: 2nd operand, D: target)

Disjunction Expression

The disjunction expression (operator |) has identity as both of its hidden block types. Because of this, each input message sent to the generated graph will be synchronously sent to each subexpression's subgraph. Output messages from all subexpressions' subgraphs are collected and sent as output messages of the generated graph.

A >> (B | C) >> D;

Read as: A triggers B and C, which both trigger D (D may be triggered twice as often as the invocations from A).

In practice, the disjunction expression can be used to remove the repetition from Listing 3.3, yielding the equivalent:

produce_or_fail >> ( ( ignore_errors()  >> on_success )
                   | ( capture_errors() >> ( print_trying_again_message
                                            | produce_or_fail ) ) );

Conjunction Expression

The conjunction expression (operator & or &&) uses a pair of unzip (see 3.4) and zip hidden blocks. The input and output types are tuples.


Operator & uses a zip combiner (see 3.4) with the all policy, while && uses the ordered policy.

A >> (B & C) >> D;

Read as: A triggers B and C with the individual tuple elements, and their output values are combined into an output tuple that triggers D.

This construction has a potential usage in the MapReduce pattern (splitting the work to be done on each message among a static number of worker blocks/graphs), with some additional data transformations:

input >> map_value_to_tuple_of_n_elements
      >> (worker_1 & ... & worker_n)
      >> combine_tuple_of_n_elements_to_value
      >> output;

Conjunction expressions are implicitly flattened as well, which now affects the input and output types. What would previously be std::tuple<std::tuple<A, B>, C> becomes std::tuple<A, B, C>. To prevent this behavior in case it is unwanted, there is the no_fold function, which wraps the compound expression we would like to keep intact. The function works for other compound expressions as well, although there it brings no semantic difference.

Concatenation Expression

The concatenation expression (operator +) spawns a hidden complementary switch block pair (see 3.2). As a result, an input message is only forwarded to the active expression's subgraph, and only the output messages of the active subgraph are forwarded as output messages of the generated graph8.

A >> (B + C) >> D;

Read as: A triggers only B while B is running, then switches to triggering only C; D is triggered only by B while B is running, then switches to being triggered only by C.

8. Since a subgraph may appear terminated on one side and running on the other, the two switches may happen to consider subgraphs of different expressions as the active one.


The practical usage can be seen in the scenario of handling a message sequence with transient strategies, e.g. using take block variants:

input >> ( (take_10 >> strategy_1)
         + (strategy_2 >> take_while_less_than_42)
         + (take_until_consecutive_equal >> strategy_3)
         + final_strategy
         ) >> output;

Expression Committing

Committing is the act of expressions taking effect – generating their block graphs. Because an expression may undergo further expansions, this is not done immediately at the point of construction. The chain expression is the only one which commits itself automatically at destruction (i.e. at the end of the statement). Committing is recursive for all subexpressions, regardless of their types. An expression may be committed from the input and/or output side. This supports lazy block graph generation:

A >> (B | C);

In the above example, the disjunction expression only instantiates its hidden input block and its connections because the output of the expression was never requested.

Expression Capturing

Potential unnamed blocks and created hidden blocks have to be stored somewhere. In the above cases, they would be individually dynamically allocated, which could result in not using the cache in a coherent way9. To fix this, there is the following construction, which produces expressions holding blocks with automatic storage duration (on the stack):

auto expr = rbl::capture(expression);

9. Coherent cache use is the practice of storing related structures (related in terms of access time) close to each other in the virtual address space of a program. It is done to maximize the effect of prefetching from the main memory performed by the CPU.


Capturing an expression moves it into the expr object and commits it. Unfortunately, the information about how an outer expression uses its subexpressions is not reflected in the subexpressions' types. Therefore, they always have to create the space for their hidden blocks, including the unused ones. The extra cost also involves unneeded references to the named blocks, contained within block_expressions. The expressions are already complex enough that we have not addressed these issues with further type modifications via template metaprogramming, but it may be achievable.

4 Evaluation

4.1 Use Case Comparison

In this section, we look at different designs of existing solutions for asynchronous event handling in terms of syntactic and semantic differences. The respective use case examples can be found in the example/comparison directory.

4.1.1 Boost.Asio

Compared to Boost.Asio (see A.2), RBL makes the structure of writing sequences of asynchronous operations more linear and coherent. Without RBL, the flow of operations is segmented into individual functions, which are connected to their initiators as callbacks. In RBL, such a sequence of connections can be formed in one or a few statements, tightly packing the information about consequences in one place. Boost.Asio implements its asynchronous model and a minimal API. It does not attempt to abstract things further, like RBL, leaving synchronous operations to be implemented in the traditional, imperative manner. This is where RBL picks up and continues. Since version 1.54.0, Boost.Asio comes with an integration of Boost's coroutine implementation (see 4.1.4) [9]. While allowing fine-grained control over a program's asynchronous execution involving I/O operations, the code suffers from the same uncertainty about logical consequences in larger programs. We will skip the analysis of this combination, as the results can be extrapolated from the elementary studies. The comparison example is located in the asio.cpp and asio_rbl.cpp source files.

4.1.2 RxCpp

RxCpp is an implementation of Reactive Extensions in C++. Like RBL, it simplifies event-driven programming with the use of IoC, but with abstractions built differently.


RxCpp’s main building entities are observables (event sources) and observers (receivers). Unlike RBL’s blocks, observables directly provide the interface to be extended and composed using procedural- like syntax. Such syntactic environment is generally referred to as Language Integrated Query (LINQ)1. Here is an example of query construction, chained with operator|: auto observable= range(0, 10) | map([](inti){ returni*i; }) | filter([](inti){ returni < 20; }) | reduce( std::vector(), [](std::vectorv, inti){ v.push_back(i); returnv; });

observable | subscribe<std::vector<int>>([](auto v){
    std::copy(v.begin(), v.end(),
              std::ostream_iterator<int>(std::cout, " "));
});

The code transforms a range of integers to their squares, then filters them and collects them into a vector to be printed out. A more sophisticated comparison example can be found in the rx.cpp and rx_rbl.cpp files. An observable can be queried by an observer, potentially creating another observable, similarly to RBL's block chaining. In RBL, we did not aim to mimic the same approach; we have chosen a more explicit (connection-declaring) style instead. RBL's blocks could be hidden away under LINQ-styled expression templates to produce a similar interface. Overall, RxCpp comes with more concise syntax, but RBL is more transparent, thanks to its multiple layers of abstraction and their openness to the user.

4.1.3 Intel® Threading Building Blocks

Intel® Threading Building Blocks (TBB) is a library designed for writing multi-threaded applications [11], which is another approach to asynchronous programming. The library also contains functionality to define programs as a graph of interconnected nodes (the equivalent of

1. LINQ first appeared in the .NET Framework [10].


RBL’s blocks), contained withing the tbb::flow namespace. Between two nodes, the IoC concept applies. TBB primarily focuses on implementing more complex communica- tion protocol, intended mainly for multi-threaded producer-consumer patterns. It places more emphasis on task management and synchro- nization. The protocol, unlike RBL, is bidirectional. It uses switch- ing between push-pull mechanics, while RBL is based only on push mechanics. In the "push" model, communication is initiated by the producer of data, in the "pull" model, it is requested by their consumer. TBB’s nodes implicitly implement this complex set of behavior, which is further specifiable for each node. In RBL, we would most likely implement additional scheduling mechanisms via executor blocks. In addition, TBB’s nodes are controlled by a central authority – a graph object, which can be used as Boost.Asio’s context to some extent, to run and wait for the execution in each of its nodes to finish. In RBL, we have more flexibility, and the possibility to register a termination handler to each block individually. Lastly, TBB is aimed at and recommends nodes with larger gran- ularity. RBL should be able to handle smaller grain size with less overhead, because of its simpler communication protocol. Therefore, TBB is more suitable for building a data flow graph of more complex, long-running tasks, instead of discrete asynchronous operations. See tbb.cpp and tbb_rbl.cpp files for the difference in code.

4.1.4 Coroutines

Coroutines [12] are a relatively new concept for the C++ language. Their main application is also asynchronous programming, however with minimal structural code changes compared to procedurally written synchronous code. Coroutines change the mechanics of function execution without using the IoC principle.

Routines

Normally, a function represents a continuous list of statements, including nested function calls. Inside a thread of execution, each function is executed in a blocking manner without interruption2.

2. With the exception of interrupt routines, of course.

Coroutines

With coroutines, the execution of a function may become discontinuous. A coroutine function can be run as a normal one, but it can also put itself into a suspended state. At that point, the control flow returns to the calling function, but the coroutine's internal state is moved to a separately held, usually dynamically allocated data structure. The caller is provided with a handle to this structure. They can later use it to resume the execution of the associated coroutine from the state in which it was suspended. The side-allocated coroutine state is destroyed when no longer needed – when the coroutine has no more instructions to run. A suspension of a coroutine from within is associated either with a produced output value of the given return type or with waiting for another resource. The mechanism therefore allows coroutines to produce (yield) multiple output values, enabling so-called "generating" functions3 (including infinite ones). Unlike RBL's or RxCpp's IoC approaches, the way of declaring coroutine functions and dependencies remains at the level of simple function declarations and function calls. Also, if an RBL block has multiple values to output, it does so without intermediate suspensions, and value production is guided by the producer, not the consumer.

Standard Proposal

The Coroutines TS4 is expected to become a part of the C++20 standard. The implementation includes reserved keywords to support expressions of asynchronicity:

• co_await – wait for a coroutine,
• co_yield – produce an output value,
• co_return – produce the last output value.
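To illustrate how these keywords read in practice, the following sketch defines a hand-written generator type (the generator name and its members are our own, as the TS does not prescribe a library type) and uses co_yield to produce values lazily; an implicit co_return terminates the coroutine:

#include <coroutine>   // <experimental/coroutine> under the TS
#include <iostream>

// A minimal, hand-written generator; real libraries (e.g. CppCoro) provide richer ones.
template <typename T>
struct generator {
    struct promise_type {
        T current;
        generator get_return_object() {
            return generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(T value) { current = value; return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> handle;
    explicit generator(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~generator() { if (handle) handle.destroy(); }

    bool next() { handle.resume(); return !handle.done(); }  // resume; false when finished
    T value() const { return handle.promise().current; }
};

// A "generating" function: co_yield suspends it after producing each value.
generator<int> squares(int n) {
    for (int i = 0; i < n; ++i)
        co_yield i * i;
}

int main() {
    auto gen = squares(5);
    while (gen.next())
        std::cout << gen.value() << ' ';   // prints 0 1 4 9 16
}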

With this, writing asynchronous code becomes only slightly different, syntactically, from writing synchronous code. Coroutines are

3. These are similar to Python's yield generators.
4. Technical specification ISO/IEC TS 22277:2017.


therefore relatively low-level and effective primitives, while being perfectly usable on their own. We think RBL and other IoC-based libraries can still compete only under the following requirements:

• mild emphasis on performance – coroutines are expected to perform much better, even allowing compilers to perform unusual intrinsic optimizations (avoiding the aforementioned dynamic state allocations, among others);
• strong emphasis on coherence of dependency information – RBL's graph declarations dominate in this area.

Extension

The standard implementation of coroutines is open to user-defined behavior in some places. This is done by providing two counterpart class interfaces – promise (for the callee, not to be confused with std::promise) and awaitable (for the caller). Both classes allow users to insert additional state and behavior, which will be executed at points such as the suspension of execution. The CppCoro5 library makes use of this extensibility to build a slightly higher-level layer over coroutines.

Future Vision

If standardized, coroutines will eventually become the preferred way of asynchronous programming in C++, due to their incorporation into the standard, if nothing else. This will make them common knowledge for a modern C++ programmer, which will promote their popularity even further. The standard implementation, however, requires a substantial amount of work from compiler and library vendors, as the feature is tied to the most basic concepts of C++'s abstract machine. This means that support for coroutines could be delayed for some time before arriving on some platforms. Coincidentally, embedded platforms, a major target for event-driven programming, are not among those that follow the cutting-edge standards. RBL is built with

5. https://github.com/lewissbaker/cppcoro


relatively modest requirements, albeit on the C++17 standard. It is, therefore, ready to be ported with minimal changes to any platform for which a C++17 compiler exists.

Other Implementations of the Coroutine Concept

Apart from the standard library and compiler vendors' implementations, coroutines have been available in other forms to a certain degree, most notably in the Boost.Coroutine2 library [13]. The development of this support library began even before C++11. It implements the same mechanics, although without dedicated language keywords. The functionality comes with classes which are used as function parameters and manipulated inside the functions that become coroutines. The support for suspending a function's execution and saving its state comes from the Boost.Context library [14].

We did not invest the time in a use-case comparison example with coroutines, as there is no definitive standardization yet, and they are conceptually too distant from RBL. The exact fitness would have to be evaluated with a number of examples, many of which could be biased towards one side.

4.2 Performance Analysis

RBL’s implementation of IoC introduces some performance drawbacks. In this section, we will identify the most prominent ones and assess their severity with benchmarks of simple non-RBL versus RBL code.

4.2.1 Overhead Analysis

In the following analysis, we leave individual concrete blocks aside and focus on the common performance factors. Arguably, the most prominent of C++'s optimization features is function call inlining, which enables the compiler to perform further optimizations based on the available internal code model. Because of the dynamic nature of RBL (a program's graph is built at run-time, not compile-time), we lose this capability between blocks.


Static Connection Overhead

The more obvious overhead appears mainly in the form of pointer indirection (which prevented inlining in the first place) and dynamic dispatch [15]. To transmit a message over one connection, an indirection to the dynamically allocated list of block pointers is performed. Then, each pointer in the vector is dereferenced (another indirection) to call the appropriate virtual function.

The first indirection, as well as the dynamic allocation altogether, can be avoided. RBL can be compiled to create a statically sized list of block pointers (std::array) for each block, instead of a std::vector. This should theoretically also result in better cache coherence. The static number of block pointers to store is specified by the user. If a block needs more connections than this number, it will allocate a dynamic list and the whole static list will be unused. If the static size is too large, a large fraction of the designated space may be unused. Since the variation between the static and dynamic container types is implemented with std::variant, a reasonable default value for the static size is sizeof(std::vector<void*>) / sizeof(void*), adding no space overhead except that of std::variant itself6. The number depends on the platform, compiler, and standard library implementation; e.g. on x86-64 GCC with libstdc++, it is a generous 3. Nor should we forget the condition check on each access, which might only make things worse. In our benchmarks, the performance difference between these two approaches was not noticeable, so this should rather be tweaked for each program individually.
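The following sketch illustrates the idea (it is not RBL's actual implementation; the block type and the member names are placeholders):

#include <array>
#include <cstddef>
#include <variant>
#include <vector>

struct block;   // placeholder for RBL's block type

// Default static capacity: reuse the space a std::vector would occupy anyway.
constexpr std::size_t static_capacity =
    sizeof(std::vector<void*>) / sizeof(void*);   // 3 on x86-64 GCC with libstdc++

struct connection_list {
    std::variant<std::array<block*, static_capacity>,   // static storage, used first
                 std::vector<block*>> storage;           // dynamic fallback
    std::size_t count = 0;                               // used only by the static array

    void add(block* b) {
        if (auto* arr = std::get_if<0>(&storage)) {
            if (count < static_capacity) { (*arr)[count++] = b; return; }
            // Overflow: move the existing pointers into a dynamically allocated vector.
            std::vector<block*> vec(arr->begin(), arr->begin() + count);
            vec.push_back(b);
            storage = std::move(vec);
            return;
        }
        std::get<1>(storage).push_back(b);   // already dynamic
    }
};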

We will look at how the implemented inter-block communication impacts the performance in 4.2.2.

Dynamic Connection Overhead

Connections and disconnections are asymptotically costly because each of them searches the underlying associative container in O(n) time, either to avoid duplicate connections or to find the connection to remove. But, assuming that these actions form only a fraction of an RBL

6. std::variant stores the index of the type currently being held.


program compared to the actual communication, and are negligible with the usual low connection branching factor, we will not dwell on this.

Communication Overhead

Messages are passed from a publisher to its subscribers without the use of move semantics: as one message may have multiple destinations, it cannot be invalidated by a move and must be copied instead.

Executable Size

The templates used extensively throughout RBL cause the block logic and internal communication functions to be compiled once for each data type [2, Section 23.2.2]. The binary executable is therefore visibly larger than that of equivalent non-RBL code7. In general, we cannot correlate executable size with the speed of the program. The binary size, however, is an important trait to consider in environments with limited program memory, e.g. embedded systems. By using dynamic dispatch in block graphs, instead of extending the block templates to hold type information about the connected blocks at compile time, we have avoided additional growth of the executable size (code bloat), at the run-time cost analyzed earlier.

Compile-time Overhead

Extensive use of templates in each layer of RBL visibly degrades compilation times, and deep template instantiations require a substantial amount of system memory to compile. This is a well-known drawback of generic C++ [6, Section 23.3]. Mitigation techniques available to the user include precompiled headers [6, Section 9.9], explicit instantiation [6, Section 14.5], and (future) modules [6, Section 17.11].

7. It is still smaller than the RxCpp executable in the comparison example (rx_rbl.cpp compiled to 1.1 MB versus rx.cpp with 1.9 MB).


4.2.2 Performance Benchmarks

Each of the following benchmarks measures the performance of RBL compared to equivalent imperatively written code (see B.3 for technical details). By doing so, we get a picture of the inherent communication overhead. All benchmarks are constructed around the transmission of a given total data size, with message granularity further parametrized on the x-axis, increasing exponentially. There are two y-axes:

• left – the absolute time required to complete the transmission, on a base-10 logarithmic scale;
• right – the efficiency percentage (performance of RBL relative to non-RBL), on a linear scale.

The most basic case in which we were interested was synchronous forward data message transmission between a publisher and a subscriber.

Figure 4.1: Synchronous Benchmarks

As expected, RBL performs worst with smaller messages, compared to non-RBL, because the communication mechanism (indirection) is used more intensively and takes up the majority of the CPU time.


In combination with Boost.Asio's io_context, this cost becomes much less prominent8, and with TCP sockets, barely visible.

Figure 4.2: Asynchronous Benchmarks

8. Note that there are as many as three blocks and two connections in the first case's loop; four blocks and four connections in the second (see execute.cpp and read_write.cpp).


Lastly, the overhead is also acceptable in the case of the producer-consumer pattern with separate threads and a buffering queue. This pattern is not suitable for fine-grained communication in either case, so the values for lower message sizes should be of no interest to us.

Figure 4.3: Asynchronous Benchmarks (thread)

Many more variants of all the aforementioned benchmarks were evaluated, performed on executables compiled with 6.0.0 as well as GCC 8.1.0. The benchmarks were produced on 64-bit Ubuntu 16.04 LTS, running on an Intel® Core™ i5-4210U CPU @ 1.70GHz × 4 with 8 GB of DDR3 1600 MHz RAM.

4.3 Debugging

The introspection layer does not bring the usual debugging options back to the level of code without IoC. We cannot debug IoC code using breakpoints, step execution and subsequent state inspection with the same level of comfort, as the program's logic is scattered among loosely coupled functions. The call stack is dominated by RBL's internal communication functions rather than the user-defined ones. There is large potential for extending the debugging options (see 5.1).

5 Conclusion

In this thesis, we have designed and implemented a C++ library that changes the way event-driven programs are expressed. We have explained the basic building principles, which we later solidified by showing how they can be used to implement concrete blocks for various operations and algorithms. We have shown the power of C++ in terms of being able to design our own declarative sublanguage with the purpose of simplifying the syntax of block graph construction. We have performed a concise comparison between the library and similar existing solutions. The library, built on general concepts, proved able to compete with various, more specialized libraries in terms of usability and performance. Lastly, we have described an upcoming C++ language feature – coroutines – which should, however, significantly narrow down the prominent use cases of our library.

5.1 Future Work

There are potential improvements in each layer of RBL.

Multi-platform support RBL has been developed on the x86-64 architecture and Linux. Certain parts may need to be ported to work on different (embedded) platforms, most notably bare-metal ones1, e.g. the Espressif ESP32 or Atmel AVR32.

Optimizations The core of RBL is the most critical part but has been implemented mainly as a proof of concept. It could benefit from deeper performance analysis and profiling (e.g. using the Callgrind tool) for the user-targeted platforms. The resulting (micro)optimizations could afterward be incorporated into RBL's platform-independent repository.

Additional concrete blocks We have implemented the most common built-in blocks that we found useful while considering use

1. Bare metal is a computing environment in which a program runs without the presence of an operating system.


cases. There may be more concrete blocks worth adding to RBL's base collection.

Additional asynchronous API adaptations RBL is meant to be fitted onto existing asynchronous frameworks other than Boost.Asio. The executors do not fully wrap the functionality of Boost.Asio, either. In the future, Boost.Asio support could be separated out of RBL's base into a stand-alone repository with the same rank as other adaptations. On embedded platforms especially, translating various interrupts into RBL events seems viable.

Additional/reworked expressions The expression template layer demonstrates the power of C++ and the ability to abstract RBL's low-level API. There could be a reason to implement other types of expressions with slightly different semantics, such as an expression for creating cyclic graphs (loops). However, the designed expressions (namely the type deduction rules of complex expressions) may be too complex for an average user. Following the example of the existing expressions, this layer may either be extended or rebuilt into expressions with simpler semantics, perhaps using the LINQ approach.

Improved debugging capabilities We could consider expanding the introspection layer to include run-time, perhaps visual, tools for looking into the process of execution. Such an extension could, at the very least, provide the ability to place breakpoints2 on selected blocks and/or their connections, halting the execution upon observing any events. The breakpoints could either be created via a function call for each block type, or there could be a designated block for this purpose, intended to be connected to the graph at critical places.

2. Breakpoints can be created in code on Linux with a raise(SIGTRAP) call (see RAISE(3) and SIGNAL(7) [1]).

A Asynchronous API Model

A.1 Asynchronous Operations

An asynchronous API consists of functions which initiate operations that usually take a long or unspecified amount of time, but which return control to the caller immediately, regardless of whether the operation has completed or not. The result of an asynchronous operation can either be:

• manually queried by the user in an execution-blocking or non-blocking manner;
• automatically conveyed to the user by calling their callback function.

Synchronous execution would block the caller until completion of the associated operation, making it unfit for communication with an outside environment.

Callback

A callback function (callable object) is provided by the user as an argument to each function of an API which uses IoC. It is executed as soon as the associated operation completes, either with success or with failure. This is the base case of IoC, and at that point, the asynchronous API has to be in control of the program. To be in control means to have at least one thread designated to the execution of the managing code, typically in a loop.
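The general shape of such an API, reduced to a single hypothetical function (no real library is implied), looks as follows:

#include <functional>
#include <iostream>
#include <string>

// Hypothetical asynchronous API function: it initiates the operation and returns
// immediately; the callback runs later, from the API's event-processing loop.
void async_read_line(std::function<void(const std::string&)> on_complete);

void example()
{
    async_read_line([](const std::string& line){
        std::cout << "received: " << line << '\n';   // runs upon completion
    });
    // ... the caller is free to continue with other work here ...
}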

A.2 Boost.Asio – An Asynchronous API Example

Since RBL serves only as a transformation of programming techniques, it has to look for asynchronous execution scheduling and synchronization mechanisms elsewhere. The intention behind RBL is for it to be fitted onto existing asynchronous APIs. One such API can be found in the well-known and well-received Boost C++ Libraries collection, under the name Boost.Asio [16]. While being designed primarily as an asynchronous networking support library for C++, the execution


model of the Boost.Asio library allows more general use, suitable even outside the domain of networking and asynchronous I/O. This section is devoted to explaining the relevant portions of the Boost.Asio library, for which RBL and some of the use case examples of this thesis have been constructed.

A.2.1 io_context

The model revolves around an execution-managing entity – io_context. It is an object which accepts asynchronous requests from the user, enqueues the operations or forwards the requests to the operating system, and calls the appropriate completion handlers. The handlers are provided in the form of callbacks as arguments of each asynchronous operation. The callbacks are then stored in the io_context's internal queue for processing. At some point in the program, the user has to delegate the responsibility for control flow to the created io_context instance in order to execute the enqueued asynchronous operations. This is done by calling the io_context::run function. The run function blocks while the registered operations are being executed. It is perfectly valid, and a common practice, to chain asynchronous operations together. In other words, a completion handler of an asynchronous operation can register additional asynchronous operations for execution, possibly within the same io_context::run call. This even allows asynchronous cyclic dependencies between operations.

A.2.2 Execution Options

The registered handlers can appear in two states: pending and ready. Pending handlers are those for which the associated asynchronous operation has not completed yet. Ready handlers are the remaining ones. Registering an asynchronous operation is roughly equivalent to:

1. registering a ready handler that launches the operation,
2. registering a pending handler (if any) as a completion handler of the operation.


Overall, the io_context class provides these functions for greater control over the execution:

• run_one – executes at most one ready handler, including its completion handler, in a blocking fashion;
• run – executes all pending handlers and their completion handlers, blocks until completion, and sets the io_context object to a stopped state afterwards;
• poll_one – executes at most one ready handler and returns immediately afterwards;
• poll – executes all ready handlers;
• stop – stops the event processing loop; no more handlers will be executed;
• restart – prepares a stopped io_context instance for a repeated run call.

A.2.3 Asynchronous Operations

An asynchronous Boost.Asio operation can be one of the following:

• I/O operations,
• time-delayed execution of a callback,
• deferred execution of a callback.

I/O Operations

Input/output manipulation operations are the only type among the ones above with visible side effects, which are their sole purpose.

Creating a context and an I/O object The following code declares an io_context and a TCP socket as an I/O object instance associated with it:

boost::asio::io_context context;
boost::asio::ip::tcp::socket socket(context, endpoint);

endpoint is a TCP socket identifier (an IP address and a port number).


Asynchronous reading This is how an asynchronous read operation to a buffer called inbound_data is requested:

boost::asio::async_read(socket, inbound_data,
    [](boost::system::error_code ec, size_t length){
        if (ec) { /* an error has occurred */ }
    });

The socket and inbound_data objects have to outlive the whole asynchronous operation. The callback function is called with the return status of the operation and the number of bytes that have been transferred.

Asynchronous writing This is the analogous data writing code:

boost::asio::async_write(socket, outbound_data,
    [](boost::system::error_code ec, size_t length){
        if (ec) { /* an error has occurred */ }
    });

Boost.Asio’s sockets also provide an interface for issuing a read/write operation of multiple data buffers as one asynchronous operation (scatter-gather I/O), or for manipulation of the sockets as with I/O streams of characters. This is a theme beyond the purpose of RBL.

Execution We can launch the previously registered handlers:

context.run();

The run call blocks until both the read and write operations finish. In case we wish to execute the operations synchronously in relation to each other, we can write:

context.run_one();
context.run_one();

The first run_one call blocks until reading finishes, the second one blocks until writing finishes.


If we would like to initiate the operations without blocking and do some other work, we can write:

context.poll_one();
context.poll_one();
// other work...
context.run();

The first poll_one call initiates an asynchronous read, the second a write operation.

Time-delayed Execution

Boost.Asio allows registering callbacks that react not to an I/O event, but to an event caused entirely by the passing of time. The library provides timer classes as a supplement to the I/O objects. The following code launches a user-provided callback function with a custom time delay:

boost::asio::io_context context;
boost::asio::deadline_timer timer(context);
timer.expires_from_now(delay);

timer.async_wait([](boost::system::error_code ec){
    if (ec) { /* the timer has been cancelled */ }
    else    { /* run the delayed code */ }
});

context.run();

delay specifies the duration to wait, relative to the expires_from_now call. As with sockets, the timer object should outlive the delay period. Otherwise, the operation will be canceled, and the handler will be called with an error code signifying failure. The associated handler will only be called in the context of the run function, which blocks until the completion of the delayed handler.

Deferred Execution

Lastly, it is possible to treat the io_context in a more direct way, as a queue of handlers (tasks) to be executed. A user can enqueue custom handlers to be invoked in a specific order, relative to other handlers.


This is achieved using the post function and its relatives – defer and dispatch:

Listing A.1: Deferred Execution

boost::asio::io_context context;

boost::asio::post(context, [](){ /* 1. */ });

boost::asio::post(context, [&context](){          // 2.
    boost::asio::post(context, [](){ /* 4. */ });  // 3.
});

context.run();

The code portions will be executed in increasing order, as marked in the comments. The last post call represents a chained asynchronous operation.

A.2.4 Implicit Synchronization

The API of Boost.Asio is thread-safe, and it provides a very useful guarantee that a context's handlers are only called from within the processing functions (run, run_one, poll, poll_one). These functions can be called from multiple threads at once. This might lead to synchronization problems between handlers, which Boost.Asio resolves with the introduction of execution strands.

Strand

A strand is another kind of Boost.Asio I/O object, which holds a reference to an io_context instance. It can substitute for the io_context instance in function calls such as post. The handlers that are registered through a strand are guaranteed to be run sequentially, without overlapping, even while the associated io_context is being executed from multiple threads.
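A minimal sketch of this guarantee follows (the handler bodies are placeholders):

#include <boost/asio.hpp>
#include <thread>

int main()
{
    boost::asio::io_context context;
    boost::asio::io_context::strand strand(context);

    // Handlers posted through the strand never overlap with each other,
    // even though the context is run from two threads below.
    for (int i = 0; i < 10; ++i)
        boost::asio::post(strand, [i](){ /* serialized work for step i */ });

    std::thread worker([&context](){ context.run(); });
    context.run();
    worker.join();
}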

B Technical Details

RBL is an open-source project under the Mozilla Public License 2.0. Its code is managed via the Git version control system. A clone of the repository is currently available at https://gitlab.fi.muni.cz/xsevc/rbl.

B.1 Build requirements

The project uses CMake, a build system generation tool1. The minimum acceptable version is 3.6. RBL requires C++17 standard support from the compiler. Some parts require the system's multithreading support, such as POSIX Threads on UNIX-like operating systems. Other parts depend upon some of the Boost C++ Libraries, specifically Boost.Asio, Boost.Serialization and Boost.TypeIndex. A few compilation switches can be specified for a build, as described in README.md. For the purpose of continuous integration and an easier development environment setup, a Docker2 image with the necessary prerequisites has been created and is available at https://gitlab.fi.muni.cz/xsevc/rbl-docker.

B.2 Third-party libraries

B.2.1 Boost.Serialization

The Boost.Serialization library implements data serialization and deserialization mechanisms in a more uniform way than the C++ I/O streams. Under the same syntax, it is possible to write and read data (built-in types, std::string and user-defined types) with the guarantee that the data obtained back will be the same. This is not the case

1. https://cmake.org/
2. Docker is open-source software for operating-system-level virtualization of isolated environments (images and containers) directly interfacing with the host's kernel. Docker images represent a lightweight hierarchical packaging structure, useful for software deployment.


for I/O streams, e.g. formatted extraction of a std::string (originally containing whitespace) stops at the first whitespace [7, §21.3.3.4/(1.3)]; the remedy is syntactically different from extracting non-problematic types. The library is currently used only in RBL's TCP formatted reading and writing blocks (Section 3.3).
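As a small illustration of the guarantee mentioned above, the following sketch round-trips a whitespace-containing std::string through a text archive (standalone code, not taken from RBL):

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/string.hpp>
#include <cassert>
#include <sstream>
#include <string>

int main()
{
    std::stringstream stream;
    {
        boost::archive::text_oarchive out(stream);
        std::string original = "hello world";   // contains whitespace
        out << original;
    }
    {
        boost::archive::text_iarchive in(stream);
        std::string restored;
        in >> restored;
        assert(restored == "hello world");      // the whitespace survives the round trip
    }
}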

B.2.2 Boost.TypeIndex

Boost.TypeIndex provides a portable API for getting static or run-time (RTTI3) C++ type information about objects. RBL uses it to produce human-readable C++ type names, which are part of RBL's visualized block graphs.

B.2.3 Loguru

Loguru (https://github.com/emilk/loguru) is a lightweight C++ logging library. We have chosen it for its relative simplicity and robustness, advertised performance, modern impression and ongoing development. The main features we were looking for were modest: the ability to select a verbosity level and to view timestamps, as well as thread identifiers, in the output. The library is not required when logging is disabled.

B.2.4 Catch2

Catch2 (https://github.com/catchorg/Catch2) is a minimalistic unit test framework, in which RBL's unit tests are written. The library consists of macros that are wrapped around user-provided code blocks representing full test cases, test case sections or asserted conditions. The framework is given control at the entry point of the program (the main function) to run all the defined test cases automatically. Additional behavioral and test-case filtering options can be supplied as command-line arguments.
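A minimal test file of the shape described above might look as follows (the tested subject is hypothetical and not taken from RBL's test suite):

#define CATCH_CONFIG_MAIN   // let Catch2 supply main() and run the test cases
#include <catch2/catch.hpp>

#include <vector>

TEST_CASE("values pass through unchanged")   // a full test case
{
    std::vector<int> received;

    SECTION("a single value")                 // a test case section
    {
        received.push_back(42);               // stand-in for sending through a block
        REQUIRE(received.size() == 1);        // asserted conditions
        REQUIRE(received.front() == 42);
    }
}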

3. Run-time type information (RTTI) is a feature of C++ that allows introspection of the dynamic object types, i.e. objects of types determined at run-time [2, Chapter 22].

B.3 Project Structure

The project’s repository is organized into the subdirectories the fol- lowing way:

• bench, bench/results – code and results of performance benchmarks (Section 4.2.2, B.3);
• doc/draft, doc/html – basic documentation in the form of a brief draft of each layer, and source code documentation;
• example – code of basic usage and comparison examples;
• include/rbl – header files;
• src – source files;
• test – unit tests;
• third-party – third-party libraries;
• utils – header files of general convenience functions.

B.3.1 Utilities library

The source code located in the utils directory is a collection of our own convenience functions, which we found a need for during the development of RBL. These functions were too general to be placed alongside the code that implements RBL's specialized concepts. The utilities extend the C++ standard library's support of:

• generic run-time and compile-time tuple manipulation,
• functional programming,
• Resource Acquisition Is Initialization (RAII) applications,
• type traits,
• working with variadic template parameter packs.


B.3.2 Source-code Structure

Due to the high amount of generic template code, RBL largely takes the form of a header library. There is only a small portion of non-generic code that is compiled into a library. RBL's source code is logically divided into parts (also folders), each of which focuses on implementing one RBL feature or concept. These parts are:

• core – Core Functionality – RBL's object-oriented block concept and communication;
• builtin – Built-in Blocks – convenience blocks instantiable with user-provided functions as their behavior, error handling and other concrete blocks;
• exec – Executor Blocks – blocks implementing RBL's executor concept;
• algo – Algorithm Blocks – blocks implementing algorithms operating on sequences of messages;
• intro – Introspection Layer – optional support for run-time logging and visualization of an RBL program;
• expr – Expression Template Layer – syntactic simplification of block graph creation.

All of RBL's source code is located inside the rbl namespace and its inner namespaces. Namespaces named detail, or containing detail in their names, are not to be used by the user. The namespace hierarchy does not copy the logical and directory structure. All block classes are contained within a block namespace, which can be either rbl::block or appear further down the namespace structure, e.g. rbl::asio::block. The blocks are recommended to be instantiated via their creation functions, which are usually named the same (or similarly). These functions are located next to the corresponding block namespace. RBL is forced to use this pattern because class template argument deduction rules are weaker than those of normal functions4. Furthermore,

4. Apparently, partial class template argument deduction (essential for explicitly specifying a block’s input/output type while deducing other types) is not supported [17, Template argument deduction for class templates].


there can be several creation functions (or overloads) dedicated to constructing one block in different ways. RBL's core implementation contains several run-time assertions that can identify invalid constructions before undefined behavior can take place.
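The limitation can be illustrated with a hypothetical block template (the names below are ours, not RBL's): class template argument deduction cannot deduce only some of the arguments, whereas a creation function can fix the input/output types explicitly while deducing the function type:

#include <utility>

// Hypothetical block: transforms Input values into Output values via Fn.
template <typename Input, typename Output, typename Fn>
class transform_block {
public:
    explicit transform_block(Fn fn) : fn_(std::move(fn)) {}
private:
    Fn fn_;
};

// Creation function: Input/Output are specified, Fn is deduced from the argument.
template <typename Input, typename Output, typename Fn>
auto transform(Fn fn)
{
    return transform_block<Input, Output, Fn>(std::move(fn));
}

// transform_block<int, int>([](int x){ return x + 1; });    // ill-formed: the class
//                                                           // template cannot be
//                                                           // partially specified
auto blk = transform<int, int>([](int x){ return x + 1; });  // OK: Fn deduced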

B.3.3 Documentation RBL’s sources are briefly documented via Doxygen5 comments. The implementation details are omitted from the documentation, as well as obvious parameter meanings (e.g. source object for copy constructor) or return values (e.g. member access functions).

B.3.4 Unit Tests

As is expected from larger projects, and from support libraries even more so, we would like to have a tangible assurance of correctness. As the main method, we have adopted unit testing, which is suitable for testing decoupled parts, which RBL blocks certainly are. RBL contains unit tests, although they only moderately cover the functionality. The behavior of all blocks is tested by interaction with various input messages. Complete instantiation (compile-time) tests and tests for value semantics are currently lacking. Basic instantiation validity tests of expressions are only present in the examples. The core layer is tested most thoroughly. Other layers are tested via contrived block graphs, which usually contain built-in error handling or message counting blocks to reuse what RBL provides.

B.3.5 Performance Benchmarks

The values in Section 4.2.2 are acquired as the mean of 10 measurements for each parametrization. The compiled executables produce data tables saved in .csv files. These can then be plotted into graph images by the gnuplot6 program. For this, there is an automated Bash script called plot.bash that is copied on each build to the same directory as the .csv files (the RBL_BENCHMARKS_OUTPUT_DIR CMake option).

5. http://www.doxygen.nl/
6. http://www.gnuplot.info/

Bibliography

1. KERRISK, Michael. Linux Programmer's Manual [online]. 2019 [visited on 2019-04-30]. Available from: http://man7.org/linux/man-pages/index.html.
2. STROUSTRUP, Bjarne. The C++ Programming Language. 4th ed. Addison-Wesley Professional, 2013. ISBN 0321563840, 9780321563842.
3. CAMPBELL, Lee. Introduction to Rx [online]. 2012 [visited on 2019-04-30]. Available from: http://introtorx.com/.
4. GABBRIELLI, Maurizio; MARTINI, Simone. Programming Languages: Principles and Paradigms. 1st ed. Springer Publishing Company, Incorporated, 2010. ISBN 1848829132, 9781848829138.
5. FOWLER, Martin. InversionOfControl [online]. 2005 [visited on 2019-04-30]. Available from: https://martinfowler.com/bliki/InversionOfControl.html.
6. VANDEVOORDE, David; JOSUTTIS, Nicolai M.; GREGOR, Douglas. C++ Templates: The Complete Guide. 2nd ed. Addison-Wesley Professional, 2017. ISBN 0321714121, 9780321714121.
7. ISO. ISO/IEC 14882:2017 Information technology — Programming languages — C++. Fifth edition. 2017. Available also from: https://www.iso.org/standard/68564.html.
8. GANSNER, Emden R.; NORTH, Stephen C. An open graph visualization system and its applications to software engineering. SOFTWARE – PRACTICE AND EXPERIENCE. 2000, vol. 30, no. 11.
9. SCHÄLING, Boris. Boost.Asio Coroutines [online]. 2019 [visited on 2019-04-30]. Available from: https://theboostcpplibraries.com/boost.asio-coroutines.
10. MICROSOFT DOCS. LINQ (Language-Integrated Query) [online]. 2017 [visited on 2019-04-30]. Available from: https://docs.microsoft.com/en-us/previous-versions/bb397926(v=vs.140).


11. INTEL CORPORATION. Intel® Threading Building Blocks Documentation [online]. 2018 [visited on 2019-04-30]. Available from: https://software.intel.com/en-us/node/506211.
12. BAKER, Lewis. Coroutine Theory [online]. 2017 [visited on 2019-04-30]. Available from: https://lewissbaker.github.io/2017/09/25/coroutine-theory.
13. KOWALKE, Oliver. Boost.Coroutine2 [online]. 2014 [visited on 2019-04-30]. Available from: https://www.boost.org/doc/libs/1_70_0/libs/coroutine2/doc/html/coroutine2/overview.html.
14. KOWALKE, Oliver. Boost.Context [online]. 2014 [visited on 2019-04-30]. Available from: https://www.boost.org/doc/libs/1_70_0/libs/context/doc/html/context/overview.html.
15. BENDERSKY, Eli. The cost of dynamic (virtual calls) vs. static (CRTP) dispatch in C++ [online]. 2013 [visited on 2019-04-30]. Available from: https://eli.thegreenplace.net/2013/12/05/the-cost-of-dynamic-virtual-calls-vs-static-crtp-dispatch-in-c.
16. TORJO, John. Boost.Asio C++ Network Programming. Packt Publishing, 2013. ISBN 9781782163268.
17. BALLO, Botond. Trip Report: C++ Standards Meeting in Oulu, June 2016 [online]. 2016 [visited on 2019-04-30]. Available from: https://botondballo.wordpress.com/2016/07/06/trip-report-c-standards-meeting-in-oulu-june-2016/.
