Linköping University | Department of Computer and Information Science | Master’s thesis, 30 ECTS | Datateknik 2021 | LIU-IDA/LITH-EX-A--2021/023--SE

Holistic view on alternative programming languages for Radio Access Network applications in cloud and embedded deployments

En helhetsvy över alternativa programmeringsspråk för Radio Access Network-applikationer i moln- och inbyggda system

Anton Orö
Rasmus Karlbäck

Supervisor: John Tinnerholm
Examiner: Christoph Kessler

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Upphovsrätt

Detta dokument hålls tillgängligt på Internet - eller dess framtida ersättare - under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår. Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och administrativ art. Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sammanhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende eller egenart. För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Anton Orö, Rasmus Karlbäck

Abstract

With the emergence of cloud based solutions, companies such as Ericsson AB have started investigating different means of modernizing current implementations of software systems. With many new programming languages emerging, such as Rust and Go, investigating the suitability of these languages compared to C++ can be seen as a part of this modernization process. There are many important aspects to consider when investigating the suitability of new programming languages, and this thesis makes an attempt at considering most of them. Therefore both performance, which is a common metric, and development efficiency, which is a less common metric, were combined to provide a holistic view. Performance was defined as CPU usage, maximum memory usage, processing time per sequence and latency at runtime, which was measured on both x86 and ARM based hardware. Development efficiency was defined as the combination of the productivity metric, the maintainability index metric and the cognitive complexity metric. Combining these two metrics resulted in two general guidelines: if the application is constantly under change and performance is not critical, Go should be the language of choice. If instead performance is critical, C++ should be the language of choice. Overall, when choosing a suitable programming language, one needs to weigh development efficiency against performance to make a decision.

Acknowledgments

We would like to thank our excellent supervisor at LiU, John Tinnerholm, who helped provide us with valuable insights and feedback throughout the thesis. We also want to thank the supervisors at Ericsson who provided invaluable help when identifying the characteristics and continually helped move the work forward. Further, we extend our gratitude to the respondents of our survey; without your help the results would have been incomplete.

Contents

Abstract iii

Acknowledgments iv

Contents v

List of Figures viii

List of Tables x

Acronyms 1

1 Introduction 2
1.1 Motivation 2
1.2 Aim 3
1.3 Research questions 3
1.4 Delimitations 4
1.5 Overview 4

2 Theory 6
2.1 Radio access network 6
2.2 Threads, concurrency and parallelism 7
2.3 Green threads 7
2.4 Inter-process communication 8
2.5 The programming language C++ 9
2.5.1 Concurrency 9
2.5.2 Compiler 9
2.5.3 Memory management 10
2.6 Go 10
2.6.1 Concurrency 10
2.6.2 Compiler 11
2.6.3 Memory management 11
2.6.4 Goroutines 11
2.6.5 Go Runtime system 12
2.7 Rust 12
2.7.1 Concurrency 13
2.7.2 Compiler 13
2.7.3 Memory management 14
2.8 Related work 14
2.8.1 Measuring performance differences for programming languages 14
2.8.2 Evaluating development efficiency differences for programming languages 15
2.8.3 Comparing compilers 15
2.8.4 Advantages and disadvantages of C-RAN 16
2.8.5 Energy consumption evaluation of programming languages 16
2.9 Soft metrics 17
2.9.1 Productivity 17
2.9.2 Maintainability 17
2.9.3 Understandability 18

3 Background 20
3.1 Ericsson 20
3.2 System requirements 20
3.3 System description 21
3.3.1 Test driver 21
3.3.2 Application 22

4 Method 25
4.1 Pre study 25
4.1.1 Identifying the characteristics 25
4.1.2 Hardware 28
4.1.3 Choosing compiler 29
4.2 Implementation 29
4.2.1 Detailed description of the test driver 30
4.2.2 Detailed description of the IPC 30
4.2.3 Detailed description of the simplified application 30
4.3 Performance comparison 31
4.3.1 Sequence test 32
4.3.2 Latency test 33
4.3.3 Encode/decode test 34
4.4 Development efficiency 34

5 Results 36
5.1 Performance comparison 36
5.1.1 Sequence test 36
5.1.2 Latency test 60
5.1.3 Encode/decode test 62
5.2 Development efficiency 65
5.2.1 Productivity 65
5.2.2 Maintainability 66
5.2.3 Understandability 67

6 Discussion 68
6.1 Results 68
6.1.1 Performance comparison 68
6.1.2 Development efficiency 71
6.2 Method 72
6.2.1 Replicability 72
6.2.2 Reliability 72
6.2.3 Validity 72
6.2.4 Pre study 73
6.2.5 Implementation 74
6.2.6 Performance comparison 75
6.2.7 Development efficiency 76
6.2.8 Understandability 78
6.2.9 Literature 78
6.3 The authors’ experiences 79
6.4 The work in a wider context 80

7 Conclusion and future work 81
7.1 Conclusion 81
7.2 Future work 82

Bibliography 83

A Appendix 88
A.1 Time estimation survey 88
A.2 Time estimation survey results 94
A.3 Source code 96
A.3.1 C++ application source code 96
A.3.2 Go application source code 107
A.3.3 Rust application source code 113
A.3.4 GPB proto source code 119
A.3.5 C++ test driver source code 120

List of Figures

2.1 Visualization of difference between concurrent and parallel execution 7
2.2 Go runtime system overview (Adapted from [47]) 12

3.1 Test driver overview 22
3.2 Application overview 24

4.1 Simplified Test Driver 27
4.2 Simplified Application Overview 28
4.3 Sequence message flow between test driver and application 33

5.1 BB2.5 CPU usage for sequence test with 10 sequences and 2 000 iterations 37
5.2 BB2.0 CPU usage for sequence test with 10 sequences and 2 000 iterations 37
5.3 BB3.0 CPU usage for sequence test with 10 sequences and 6 000 iterations 38
5.4 Dell R630 CPU usage for sequence test with 10 sequences and 6 000 iterations 38
5.5 Dell R6525 CPU usage for sequence test with 10 sequences and 6 000 iterations 39
5.6 BB2.5 CPU usage for sequence test with 100 sequences and 1 000 iterations 39
5.7 BB2.0 CPU usage for sequence test with 100 sequences and 1 000 iterations 40
5.8 BB3.0 CPU usage for sequence test with 100 sequences and 5 000 iterations 40
5.9 Dell R630 CPU usage for sequence test with 100 sequences and 5 000 iterations 41
5.10 Dell R6525 CPU usage for sequence test with 100 sequences and 5 000 iterations 41
5.11 BB2.5 CPU usage for sequence test with 1 000 sequences and 400 iterations 42
5.12 BB2.0 CPU usage for sequence test with 1 000 sequences and 400 iterations 42
5.13 BB3.0 CPU usage for sequence test with 1 000 sequences and 1 500 iterations 43
5.14 Dell R630 CPU usage for sequence test with 1 000 sequences and 1 500 iterations 43
5.15 Dell R6525 CPU usage for sequence test with 1 000 sequences and 1 500 iterations 44
5.16 BB2.5 CPU usage for sequence test with 10 000 sequences and 25 iterations 44
5.17 BB2.0 CPU usage for sequence test with 10 000 sequences and 25 iterations 45
5.18 BB3.0 CPU usage for sequence test with 10 000 sequences and 100 iterations 45
5.19 Dell R630 CPU usage for sequence test with 10 000 sequences and 100 iterations 46
5.20 Dell R6525 CPU usage for sequence test with 10 000 sequences and 100 iterations 46
5.21 BB2.5 Time per sequence for sequence test with 10 sequences and 2 000 iterations 47
5.22 BB2.0 Time per sequence for sequence test with 10 sequences and 2 000 iterations 47
5.23 BB3.0 Time per sequence for sequence test with 10 sequences and 6 000 iterations 48
5.24 Dell R630 Time per sequence for sequence test with 10 sequences and 6 000 iterations 48
5.25 Dell R6525 Time per sequence for sequence test with 10 sequences and 6 000 iterations 49
5.26 BB2.5 Time per sequence for sequence test with 100 sequences and 1 000 iterations 49
5.27 BB2.0 Time per sequence for sequence test with 100 sequences and 1 000 iterations 50
5.28 BB3.0 Time per sequence for sequence test with 100 sequences and 5 000 iterations 50
5.29 Dell R630 Time per sequence for sequence test with 100 sequences and 5 000 iterations 51
5.30 Dell R6525 Time per sequence for sequence test with 100 sequences and 5 000 iterations 51
5.31 BB2.5 Time per sequence for sequence test with 1 000 sequences and 400 iterations 52
5.32 BB2.0 Time per sequence for sequence test with 1 000 sequences and 400 iterations 52
5.33 BB3.0 Time per sequence for sequence test with 1 000 sequences and 1 500 iterations 53
5.34 Dell R630 Time per sequence for sequence test with 1 000 sequences and 1 500 iterations 53
5.35 Dell R6525 Time per sequence for sequence test with 1 000 sequences and 1 500 iterations 54
5.36 BB2.5 Time per sequence for sequence test with 10 000 sequences and 25 iterations 54
5.37 BB2.0 Time per sequence for sequence test with 10 000 sequences and 25 iterations 55
5.38 BB3.0 Time per sequence for sequence test with 10 000 sequences and 100 iterations 55
5.39 Dell R630 Time per sequence for sequence test with 10 000 sequences and 100 iterations 56
5.40 Dell R6525 Time per sequence for sequence test with 10 000 sequences and 100 iterations 56
5.41 Memory usage for sequence test for C++, Go and Rust on BB2.0 57
5.42 Memory usage for sequence test for C++, Go and Rust on BB2.5 58
5.43 Memory usage for sequence test for C++, Go and Rust on BB3.0 58
5.44 Memory usage for sequence test for C++, Go and Rust on Dell R630 59
5.45 Memory usage for sequence test for C++, Go and Rust on Dell R6525 59
5.46 BB2.5 Latency in latency test for C++, Rust and Go 60
5.47 BB2.0 Latency in latency test for C++, Rust and Go 60
5.48 BB3.0 Latency in latency test for C++, Rust and Go 61
5.49 Dell R630 Latency in latency test for C++, Rust and Go 61
5.50 Dell R6525 Latency in latency test for C++, Rust and Go 62
5.51 BB2.5 Encode/decode time per sequence in encode/decode test for C++, Rust and Go 63
5.52 BB2.0 Encode/decode time per sequence in encode/decode test for C++, Rust and Go 63
5.53 BB3.0 Encode/decode time per sequence in encode/decode test for C++, Rust and Go 64
5.54 Dell R630 Encode/decode time per sequence in encode/decode test for C++, Rust and Go 64
5.55 Dell R6525 Encode/decode time per sequence in encode/decode test for C++, Rust and Go 65

List of Tables

4.1 Available hardware 29
4.2 Compiler configurations 29
4.3 Sequence messages 32
4.4 Sequence test scenarios 33

5.1 Combined score development efficiency 65
5.2 Weighted averages of expert estimated implementation time 66
5.3 Average execution time of the application in C++, Rust and Go for 1 iteration of 10 000 sequences for all hardware configurations 66
5.4 Data for the productivity metric with r = 50000 66
5.5 Data for the productivity metric with r = 100000 66
5.6 Data for the productivity metric with r = 200000 66
5.7 Maintainability index 67
5.8 Cognitive complexity 67

Acronyms

AP Atomic Procedure.

AST Abstract Syntax Tree.

C-RAN Cloud Radio Access Network.

COTS Commercial off-the-shelf.

CP Composite Procedure.

GCC GNU Compiler Collection.

GPB Google Protocol Buffer.

ICC Intel C++ Compiler.

IEEE Institute of Electrical and Electronics Engineers.

IPC Inter-Process Communication.

RAII Resource Acquisition Is Initialization.

RAN Radio Access Network.

TCP Transmission Control Protocol.

UDP User Datagram Protocol.

UE User Equipment.

UEs User Equipments.

1 Introduction

This chapter gives an introduction to the problem and the context of the problem that this thesis aims to address. It also presents the research questions that this thesis will answer.

1.1 Motivation

Mobile networks are used all over the world and are the cornerstone of the networked society. To support the vast amount and diversity of data expected in future networks, Ericsson AB (Ericsson) is developing products to drive and support the networked society. This leads to the need for investigation and development of algorithms, architecture, tools etc. to support the increase of data and Massive Internet of Things1 for Radio Access Network (RAN). For a more detailed explanation of RAN, see Section 2.1.

Previously, RAN applications needed dedicated hardware for optimal performance and functionality, which led to higher costs due to the fact that Commercial off-the-shelf (COTS) hardware could not be used [21]. But with the emergence of cloud environments and container technology like Docker and Kubernetes, RAN applications can now be deployed on COTS hardware, an approach called Cloud Radio Access Network (C-RAN) [22]. Since C-RAN is a new technology, there is a need to investigate how to implement software in order to improve efficiency and take advantage of the capabilities enabled by the new technology. The software needs to be able to run on different hardware and still be performant, meaning that hardware is another aspect that needs to be included in the investigation.

C++ is undoubtedly one of the most used programming languages for embedded applications. RAN applications at Ericsson are also developed in C++, with the main reason being performance. There are many studies that compare different aspects such as development efficiency, CPU and memory usage for popular programming languages like Rust, Go and C++ [24, 23, 45, 28]. Furthermore, investigations in different ways of load-balancing and/or optimizing resource allocation for C-RAN [40, 52, 53, 15] exist as well. These kinds of comparisons have a limited value for a RAN application, which by nature is heavy on messaging between multiple services using asynchronous decoupled communication patterns. Another aspect that is often overlooked when designing a system is how easy the system is to observe

1https://www.ericsson.com/en/reports-and-papers/ericsson-technology-review/articles/key-technology- choices-for-optimal-massive-iot-devices

and troubleshoot when needed. Due to the lack of comparisons specifically about performance or development efficiency of different programming languages for RAN applications, there is a need for further evaluation to determine the best suited programming language with these aspects in mind.

Another reason for comparing C++ with Rust and Go is that C++ was introduced in the ’80s, a time when 128 kB - 32 MB of RAM and 8-40 MHz single-core CPUs were the standard for computers (with larger systems having more powerful components, but nothing close to today’s standards); for example, the DEC VAX 8600 was released in 1984 with 32 MB of RAM and 12.5 MHz CPU speed [8]. During the subsequent years, Moore’s law was upheld by increasing single-core clock speeds on the CPU and increased transistor counts on a single CPU core [25], meaning one could more or less expect the same program to run twice as fast when upgrading to a new computer every other year. Lately, the industry has focused on adding more cores instead, meaning that programmers have to adapt their programs to the increased core count to see improvements. Modern programming languages have also seen some improvements in regards to more efficient utilization of the CPU for concurrent programs that run on a single core. Since RAN applications at Ericsson are asynchronous and single-threaded by nature, they scale horizontally to a multi-core or cloud based architecture by spawning more instances of the program. Therefore, investigating modern languages that better facilitate efficient development of performant programs that utilize the full capabilities of single-threaded programs in multi-core architectures is highly needed for companies that plan to expand into a cloud environment.

One could also argue that the landscape of modern software development has changed into a more feature heavy focus, meaning that companies need to patch, fix bugs and provide new features at an increasingly rapid pace in order to keep up with competition. Because C++ is commonly regarded as one of the hardest, or even the hardest, mainstream programming languages, companies like Ericsson have realised that they cannot focus only on the performance of their products; they also have to consider things like maintainability, productivity and understandability. Not to mention the fact that young aspiring programmers might be more intrigued by developing in a more intuitive programming language with a gentle learning curve.

1.2 Aim

The aim of this thesis is to provide a holistic view on the differences between the programming languages C++, Rust and Go in RAN applications. To achieve this, the aim is divided into two parts: performance and development efficiency. To investigate performance in the three programming languages, a benchmark for a typical high intensity RAN use case is to be implemented in C++, Rust and Go. The purpose of the benchmark is that it should be able to give a clear picture of the performance differences of the RAN application implemented in the chosen languages. This will be accomplished by first identifying the key RAN fundamental characteristics, followed by implementing a trivial version of the real system that still adheres to the identified characteristics, and then measuring performance metrics for each programming language implementation.

To investigate development efficiency, the three metrics productivity, maintainability, and understandability will be measured. These metrics should provide a more complete evaluation of the suitability of a programming language, compared to using only performance metrics. To accomplish this, this thesis will discuss why development efficiency metrics are important and provide relevant data for evaluating these in the different languages.

1.3 Research questions

The research questions presented here are the basis for the work done in this thesis:


1. Which of the programming languages C++, Go or Rust is best suited for developing a RAN application for a cloud and embedded deployment, with regards to performance?

2. Which of the programming languages C++, Go or Rust is best suited for developing a RAN application for a cloud and embedded deployment, with regards to development efficiency?

3. Do different types of embedded and COTS hardware affect the choice of programming language and the performance of the RAN application?

1.4 Delimitations

The first delimitation is the number of programming languages compared, since this depends on the time it takes to create the application in the different languages. The second delimitation concerns the related work: certain conclusions are mainly based on extrapolation of existing studies in contexts other than the one studied here. When it comes to the choice of programming languages to compare, C++, Rust and Go were requested by Ericsson. For example, Erlang could have been an interesting and suitable candidate for comparison as well, but this comparison has already been done internally at Ericsson and was therefore excluded from this thesis. Since holistic is a broad term, this thesis delimits the meaning of the word to a comparison of three different languages in regards to both performance and development efficiency, as well as looking at both cloud and embedded hardware. In the context of this thesis, performance is delimited to only include CPU usage, memory usage, processing time per sequence and latency at runtime, and development efficiency is delimited to only include productivity, maintainability, and understandability (definitions for development efficiency metrics are given in Section 2.9).

1.5 Overview

In this section the structure of the thesis is presented and an overview of each chapter is given. All chapters were written together by the authors, except for Section 2.6 about Go which was written by Anton Orö and Section 2.7 which was written by Rasmus Karlbäck.

Background

This chapter focuses on presenting a brief introduction to Ericsson as a company, as well as an overview of the currently implemented RAN system at Ericsson that is the foundation of the research conducted in this thesis.

Theory

In this chapter, the theory related to the thesis is presented, as well as a discussion about different related works.

Method

This chapter is focused on presenting the approach to solving the aim of the thesis. This was done in four main phases: a pre study phase, an implementation phase, a phase for comparing performance and a phase for comparing the development efficiency.

Results

This chapter is focused on the results obtained. The results are presented in the same order as the work that was carried out.


Discussion

In this chapter the results and the method employed to get the results are discussed. There is also a section that provides insights about the authors’ personal experiences as well as a discussion about the work in a wider context.

Conclusion and future work

This chapter provides answers to the research questions as well as suggestions for future research.

Appendix

The appendix contains the source code used in the project as well as the survey and its results.

2 Theory

In this chapter, the theory related to the thesis is presented. In Section 2.1, RAN is described. In Sections 2.2, 2.3 and 2.4 the programming concepts used in the thesis are presented. Sections 2.5, 2.6 and 2.7 present background on the programming languages C++, Go and Rust. Finally, in Section 2.8 the related works are presented and discussed.

2.1 Radio access network

This section is based entirely on the article by Checko et al. [13]. A base station consists of two main components, a baseband processing module and a radio functionalities module. Base stations are used to receive and transmit data to remote devices, such as mobile phones, which is done via antennas mounted on the base station. Equipment held by end-users that utilizes base stations is called User Equipment (UE).

So far, there have been three main iterations of how the base station architecture works. The first step, the "traditional cellular" or Radio Access Network (RAN), has multiple base stations. Every single base station processes and transmits its own signal out to the core network (through something called the backhaul), since the components are integrated inside the base station. This means that all the equipment needed had to be on-site, such as backup power, air conditioning and backhaul transmission equipment. This was mostly popular for 1G and 2G networks and meant that each base station always needs to be able to handle a maximum load, meaning a lot of its potential is unused most of the time.

When 3G was later deployed, the base stations began to have a Remote Radio Head (RRH), meaning that the base station was separated from its signal processing unit while the radio unit stayed at the base station. This signal processing unit is called a Baseband Unit (BBU) or Data Unit (DU). This meant that the BBU could be placed up to 40 km from the base station, most likely connected via an optical fiber. This in turn means that the BBU could be placed in a location that is easier to manage, making maintenance and cooling much cheaper. One could also connect multiple RRHs to a single BBU, making redundancy less common.

Lastly, there is the cloud base station architecture, which is not yet used. By centralizing the BBU and creating a so called "BBU pool", one can connect many RRHs to one BBU pool. A BBU pool is a cluster of multiple virtual BBUs, which can be deployed on COTS hardware. By doing this, one can scale the amount of BBU processing power needed in a very efficient way and reduce overhead costs.


2.2 Threads, concurrency and parallelism

It is important to differentiate between the different concepts of parallelism and concurrency, since these terms are often confused with each other and can have different meanings. The following quote by Breshears will be used as the definition of these two concepts:

"A system is said to be concurrent if it can support two or more actions in progress at the same time. A system is said to be parallel if it can support two or more actions executing simultaneously." [11, p. 3]

The difference between concurrency and parallelism is visualized in Figure 2.1.

Figure 2.1: Visualization of difference between concurrent and parallel execution

In order to understand concurrency, one needs to understand what a process and a thread are first. A process is the entity that is run when launching a program, and its default behaviour is that it does not communicate with other processes or access their data. Inter-Process Communication (IPC) is needed to communicate between processes, and a detailed description of the different IPC mechanisms can be seen in Section 2.4. POSIX threads (pthreads) is a standard API [30] and is implemented on many different operating systems such as Linux and macOS. A thread can be seen as a set of executable instructions that can be scheduled by the operating system [7], and a thread is part of a process. Pthreads, which are kernel-level threads, are most commonly scheduled preemptively, meaning that the scheduler can interrupt a running thread and allow another thread to run instead. This can be used when a thread is waiting for something, such as a blocking I/O operation: another thread can then be scheduled instead to utilize the CPU better [46]. A process can have one or several threads associated with it, and having multiple threads that can be scheduled simultaneously is what provides thread-level parallelism in a program. Threads associated with the same process share the same resources, so a modification of a shared resource by one thread within a process can be seen by the other threads. This might lead to race conditions, which is when two or more threads are trying to modify the same resource at the same time, leading to undefined behaviour. Since processes have their own unique memory by default [46], threads within one process do not share memory with a thread in another process, unless they use IPC to share data.
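To make the relationship between threads, shared memory and race conditions concrete, the following minimal C++ sketch (not taken from the thesis implementation; all names are illustrative) spawns several kernel-level threads that increment a shared counter, using a mutex so that only one thread modifies the shared resource at a time:

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    long counter = 0;
    std::mutex counter_mutex;  // protects counter from concurrent modification

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&counter, &counter_mutex] {
            for (int j = 0; j < 100000; ++j) {
                // the lock is released automatically at the end of each loop iteration
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // wait for all threads to finish
    }
    std::cout << "counter = " << counter << '\n';  // always 400000 with the lock
    return 0;
}

Without the lock_guard, increments from different threads could interleave and the final value would be unpredictable, which is exactly the race condition described above. On Linux, std::thread is typically implemented on top of pthreads, so the program is compiled with -pthread.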

2.3 Green threads

When referring to user-level threads and light-weight threads, the term green threads will be used as a collective name in this thesis. Sometimes, coroutines can be included in the term green threads, but in this thesis a decision was made to separate them. To differentiate between coroutines and green threads, the explanation given by Kowalke [35] will be used. The main difference, according to the author, is that coroutines do not have a scheduler: when a coroutine yields, it passes control directly back to its caller. Meanwhile, when a green thread yields, it passes control to the scheduler, which hands control over to the next green thread in line implicitly. Green threads are similar to pthreads with two main differences, with the first being that green threads are mapped to a single pthread, effectively

making it single-threaded [51]. The second difference is that green threads are not scheduled by the operating system; they are instead scheduled by the runtime system associated with the program. This means that languages with runtimes that lack native support for green threads (for example Rust and C++) need to implement a runtime which supports management of green threads. Green threads are cooperatively scheduled and provide concurrency by multiplexing the green threads onto the pthread, thus interleaving execution at distinct and safe points. Cooperative scheduling means that green threads are not interrupted by the scheduler; instead, the threads explicitly yield control to other threads at specific points, and these points are usually when waiting for synchronization or for a non-blocking operation, such as non-blocking I/O or asynchronous communication. Similar to pthreads, green threads have their own stack, allocated on the heap of the process to enable context-switching, which is faster in terms of CPU cycles compared to kernel-level thread switching [10] and uses less memory. However, the single-threaded nature of green threads means that multi-core processors are not fully utilized, and green threads are therefore better suited for embedded applications, unless horizontal scaling is possible. If a green thread uses blocking operations, such as blocking I/O or synchronization, then the entire pthread is blocked, thus blocking all other green threads from running on that pthread. There exist several other names for user-level threads other than green threads, and most programming languages have their own implementation of green threads. Examples of implementations of green threads are goroutines1 in Go, Boost.Fiber2 in C++ and tokio::task3 in Rust.
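As an illustration of cooperatively scheduled green threads, the sketch below uses the Boost.Fiber library mentioned above. It is a minimal example, assuming Boost.Fiber is installed and linked (for example with -lboost_fiber -lboost_context); the fiber names are purely illustrative. Both fibers are multiplexed onto the single thread that runs main(), handing control back to the scheduler at each explicit yield:

#include <boost/fiber/all.hpp>
#include <iostream>

int main() {
    // Two fibers (green threads) interleaved on one pthread.
    boost::fibers::fiber producer([] {
        for (int i = 0; i < 3; ++i) {
            std::cout << "producer step " << i << '\n';
            boost::this_fiber::yield();  // cooperatively hand control to the scheduler
        }
    });
    boost::fibers::fiber consumer([] {
        for (int i = 0; i < 3; ++i) {
            std::cout << "consumer step " << i << '\n';
            boost::this_fiber::yield();
        }
    });
    producer.join();
    consumer.join();
    return 0;
}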

2.4 Inter-process communication

This section is based entirely on the guide about Inter-Process Communication (IPC) by Kalin [31]. IPC is the name of a mechanism that enables communication between different processes or, more accurately, between threads of different processes. For Linux, the main IPC mechanisms are signals, sockets, message queues, shared files, shared memory and pipes.

Communication through signals can be achieved by interrupting an already executing program with a signal that the application should respond to. For instance, a parent process may signal to one of its child processes (created by a fork() call) that it should terminate. The child can then block or handle the signal, depending on whether a signal handler for that specific signal has been implemented.

There are two variants of sockets: network sockets and Unix domain sockets. Network sockets allow communication between processes that are on different hosts, using for example Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) as the communication protocol. Unix domain sockets are used for communication between processes that are on the same host. Socket communication can be either unidirectional or bidirectional.

Pipes are used for First In - First Out (FIFO) communication between processes using a channel, which has a write-end and a read-end. A pipe can either be named or unnamed. A named pipe is created with a specific name by the writer and accessed through that same name when reading from that pipe. Named pipes can be used for communication between processes on different hosts, with the possibility of having multiple writers and readers, and exist until all processes connected to them unlink from the pipe. Unnamed pipes can only be used for communication between a parent and its child process, only enable one-way communication, and exist only while the parent or child uses them, or until they are closed, whichever comes first.

1https://golang.org/doc/faq#goroutines 2https://www.boost.org/doc/libs/1_68_0/libs/fiber/doc/html/fiber/overview.html 3https://docs.rs/tokio/0.2.4/tokio/task/index.html


Message queues are similar to pipes. However, data does not need to be read in FIFO order, but can be read in any order, depending on the identifier of the message.

Shared storage is a basic form of IPC, meaning that one process writes to a file and another process reads from it. However, there is a risk of race conditions if the processes were to try to read and write to the same file at the same time. This is solved by using locks, where a writer blocks other writers and readers when trying to write to a file. There can be multiple readers for the same file, so readers share the same lock, and when one or more readers are locking a file, it cannot be written to.

Shared memory for Linux comes in two variants: POSIX and the legacy System V API. POSIX is the newer of the two and is still under development, so depending on what kernel version is used there might be some portability issues. Shared memory shares the same principles for write and read access as the shared storage. This approach is also prone to race conditions, if two processes or threads are trying to access the same shared resource simultaneously. There are two types of semaphores commonly used: binary semaphores and counting semaphores. Counting semaphores are used when there is an upper limit for shared resources. A binary semaphore is also called a mutex.
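As a small illustration of one of the IPC mechanisms above, the following sketch (illustrative only, not part of the thesis system) creates an unnamed pipe on Linux and uses it for one-way communication from a parent process to a forked child:

#include <cstring>
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int fds[2];  // fds[0]: read end, fds[1]: write end
    if (pipe(fds) == -1) {
        return 1;
    }
    pid_t pid = fork();
    if (pid == 0) {                      // child process: reads from the pipe
        close(fds[1]);                   // close the unused write end
        char buffer[64] = {0};
        ssize_t n = read(fds[0], buffer, sizeof(buffer) - 1);
        if (n > 0) {
            std::cout << "child received: " << buffer << '\n';
        }
        close(fds[0]);
        return 0;
    }
    // parent process: writes to the pipe
    close(fds[0]);                       // close the unused read end
    const char* message = "hello over the pipe";
    write(fds[1], message, std::strlen(message));
    close(fds[1]);
    waitpid(pid, nullptr, 0);            // wait for the child to finish
    return 0;
}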

2.5 The programming language C++

C++ was created by Bjarne Stroustrup, with the aim of being the next step of the programming language C. This entailed adding object-oriented support to the C language, as well as the notion of classes and strong type checking. C++ was originally commercialized in 1985 and has had several updates since its release, with additions such as multiple inheritance, regular expression support, new syntax and new compilers [2]. C++ is a compiled language with support for data abstraction, object-oriented programming, concurrency as well as low-level programming, and is today one of the most used programming languages [61].

2.5.1 Concurrency

When C++ was originally launched it did not include features for concurrency. However, with the release of C++11, built-in support for concurrency was added to the standard library. This concurrency support consists of giving programmers the ability to spawn and handle new threads and make them execute tasks. The threads launched share address space, meaning they can communicate through shared memory, see Section 2.4. Therefore, users that are using these threads need to be aware of and utilize locks or similar mechanisms to avoid data races [48]. Features were also added that enable programmers to write programs that can, for example, wait for resources to be available [63]. With the recent release of C++20, constructs called coroutines and functions related to them were added4. Support for green threads is provided by libraries such as the Boost.Fiber library, which can be used for concurrent single-threaded execution with a cooperative scheduler. A Boost.Fiber fiber works like the quintessential green thread; see the definition given in Section 2.2.
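A minimal sketch of the standard-library concurrency support described above is given below. It uses std::async and std::future from C++11 to launch a task and later wait for its result; the function name is illustrative and the example is not taken from the thesis implementation:

#include <future>
#include <iostream>

// A task that could represent waiting for some resource or a longer computation.
int compute_answer() {
    return 42;
}

int main() {
    // Launch the task asynchronously; it may run on another thread.
    std::future<int> result = std::async(std::launch::async, compute_answer);

    // The calling thread can do other work here, then block until the value is ready.
    std::cout << "answer = " << result.get() << '\n';
    return 0;
}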

2.5.2 Compiler

C++ has many different compilers, for example Visual C++5, Clang6 and GCC7. Different front-end compilers utilize different back-end compilers; an example of a back-end compiler is LLVM, which is used by Clang. Much like the C programming language, C++ has support for separate compilation [50]. This can be used for organizing the

4https://en.cppreference.com/w/cpp/language/coroutines 5https://docs.microsoft.com/en-us/cpp/?view=msvc-160 6https://clang.llvm.org/ 7https://gcc.gnu.org/

code into smaller fragments, with a header file describing the interface or class contents, and an implementation file containing the implementation.
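A small sketch of this split is shown below, using a hypothetical Counter class (the file names and class are illustrative only). The header declares the interface, while the implementation file can be compiled on its own (for example with g++ -c counter.cpp) and linked into the final program:

// counter.h -- the interface, included by any translation unit that uses Counter
#ifndef COUNTER_H
#define COUNTER_H

class Counter {
public:
    void increment();
    int value() const;
private:
    int count_ = 0;
};

#endif

// counter.cpp -- the implementation, compiled separately from its users
#include "counter.h"

void Counter::increment() { count_ += 1; }
int Counter::value() const { return count_; }

Any file that needs Counter only includes counter.h, and only counter.cpp has to be recompiled when the implementation changes, which is the main benefit of separate compilation.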

2.5.3 Memory management

C++ has what is called manual memory allocation, meaning that the programmer is responsible for allocating heap memory. Memory allocated also needs to be deallocated manually in C++ [49]. This is done by using destructors and "delete" operators, for example. The recommended way to make memory management easier in C++ is to use a design pattern called Resource Acquisition Is Initialization (RAII)8 [9]. RAII describes the life-cycle of a resource, saying that it has to be acquired before use and that the resource is destructed after use. This can be handled automatically by using C++ constructs such as smart pointers9, or with more low-level solutions such as manually deallocating memory.
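The following minimal sketch (illustrative only) contrasts RAII-style management, where a smart pointer and a file stream release their resources automatically at the end of the scope, with the manual new/delete counterpart:

#include <fstream>
#include <iostream>
#include <memory>

int main() {
    {
        // Heap allocation owned by a smart pointer: freed automatically
        // when ptr goes out of scope, without an explicit delete.
        std::unique_ptr<int> ptr = std::make_unique<int>(42);
        std::cout << *ptr << '\n';

        // RAII for a non-memory resource: the file is closed by the destructor.
        std::ofstream log_file("example.log");
        log_file << "resource released at end of scope\n";
    }   // both the int and the file handle are released here

    // Manual counterpart: the programmer is responsible for the delete.
    int* raw = new int(7);
    std::cout << *raw << '\n';
    delete raw;
    return 0;
}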

2.6 Go

Go was introduced in November 2009 and has been under development since. It was created by Robert Griesemer, Rob Pike and Ken Thompson, who worked for Google, and the reason for creating Go was their frustration with existing programming languages [55]. The main goal for Go was to create a language that could be efficient when compiling and executing, as well as effective, expressive and reliable when writing the code [19]. Since C++ was developed in the ’80s it did not have access to the same amount of memory, cores and other modern components that are present today. Therefore, C++ was focused on a single block of memory and a single processor, meaning that utilizing modern hardware components required a bit of effort. The motivation behind Go was a language that could efficiently and easily utilize modern components natively and therefore fill the same space that C++ did in the 1980s [14].

The resulting language is a statically typed language that has automatic garbage collection and can be described as a language heavily influenced by C++, but with emphasis on safety and simplicity [19]. From C++, Go inherited its expression syntax, control-flow statements, basic data types, call-by-value parameter passing, pointers and also the compilation into machine code. It does however have an unusual approach to object-oriented programming, since objects cannot be created per se; instead, structs that contain the data linked to the "object" have to be created. Go is a general-purpose language and can be used for everything from small purpose-built software to larger integrated systems. Companies like Google and Netflix use Go in many different projects, such as computing recommendations10 for Netflix users and dl.google.com11 for serving Google downloads.

The downside of Go is that at the time of writing it does not support generics, meaning for example some repetition of code and increased complexity when testing [54]. Another downside is the handling of objects in Go. Objects are, as mentioned before, treated differently, which can lead to some confusion or problems for new developers.

2.6.1 Concurrency

The Go language has built-in support for concurrency and the constructs are called goroutines (see Section 2.6.4) [20]. The usage of goroutines means that the Go process runs them on a smaller set of operating-system threads, which is very similar to how Erlang handles concurrency. Simply put, a goroutine is a function call that completes in an asynchronous manner. The Go compiler can either use a single thread and use different timer signals to switch between these threads, or

8https://en.cppreference.com/w/cpp/language/raii 9https://en.cppreference.com/book/intro/smart_pointers 10https://github.com/Netflix/rend 11https://github.com/golang/go/wiki/GoUsers#united-states

the compiler can spawn an OS thread for each goroutine. The threads communicate through channels, which enables non-blocking communication of their current state.
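A minimal Go sketch of goroutines communicating over a channel is shown below; it is illustrative only (the worker function and messages are made up for this example) and is not taken from the thesis implementation:

package main

import "fmt"

// worker runs as a goroutine and reports its result on the channel.
func worker(id int, results chan<- string) {
    results <- fmt.Sprintf("worker %d done", id)
}

func main() {
    results := make(chan string)

    // Spawn three goroutines; each is an asynchronous function call.
    for i := 1; i <= 3; i++ {
        go worker(i, results)
    }

    // Receive one message per goroutine; the channel synchronizes the exchange.
    for i := 0; i < 3; i++ {
        fmt.Println(<-results)
    }
}

The goroutines are scheduled by the Go runtime onto operating-system threads, and the channel is the only point of communication between them, so no explicit locking is needed in this sketch.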

2.6.2 Compiler

The default compiler for Go is called Gc and is written in Go, based on the Plan 9 loader [56]. Gc used to be written in C, but was converted to Go so that Go could be self-hosting. However, users can decide to use another front-end compiler called Gccgo instead, which is written in C++ and couples to the standard GCC backend.

2.6.3 Memory management

The language Go can be more CPU intensive than languages like C++ and Rust since it has automatic garbage collection [27]. The collector is used to remove unused heap elements, meaning it can free up memory, leading to less memory (RAM) consumption, but at the cost of CPU performance.

2.6.4 Goroutines

This section is based on the research by Deshpande et al. [18]. For the purpose of this thesis there is no need to go in-depth about the channels; only the scheduler and goroutines themselves are interesting.

Goroutines are managed through three different structures, and the most important task of these is to keep track of their stack and their current state. These three structs are called the G struct (goroutine), the M struct (OS-thread/machine) and the P struct (Processor). The G struct represents the goroutine itself, and its most important responsibility is to keep track of the stack and the state of the routine, as well as references to the code it is supposed to run. The M struct is the runtime system representation of an OS-level thread. It mainly keeps track of all G’s (both running and waiting in a queue), its own cache and current locks being held. Finally, there is the P struct, which contains two different G queues, one for G’s waiting to run (i.e. ready to be picked up by M’s) and one for free G’s. There is also a queue of M’s that are idle, as well as a global scheduler queue which contains G’s. All M’s that are spawned are assigned to some P; an M cannot execute code without being assigned to a P. If all the M’s are making blocking calls or are being run, the M queue in the P is empty. G’s can spawn more M’s if needed, up to a user-set maximum given by the variable GOMAXPROCS. At any time at most one G can be run per M and only one M can run per P. The reason for having P’s is, for instance, that if an M makes a system call, that M is "detached" from its P while doing the system call, and the P can handle its next M. But as soon as the detached M is finished with its system call and wants to run the remainder of the code, it needs to be assigned to a P first. Figure 2.2 shows three P’s running G’s on M’s in parallel. Green means they are running, red means they are waiting. This is a simplistic view, since there might be several thousands of G’s and multiple detached M’s.


Figure 2.2: Go runtime system overview (Adapted from [47])

Now the scheduler needs to assign G’s to the different M’s in a fast way. The scheduler does this by simply finding a G that is runnable in the P queue or the global queue and then executing it. The way this works is with work-stealing. If a G is created or changes state to runnable, it becomes available in the local queue of ready G’s in the P. When a P finishes executing one of its G’s, it tries to take one G from its local queue, but if this queue is empty it will instead steal half of another P’s local G queue. Which P it steals from is chosen at random.

2.6.5 Go Runtime system

Since Go has high-level support for automatic garbage collection, goroutines and channels, it needs to have an effective runtime system infrastructure [18]. In Go, every call that the user-written code makes to the operating system goes through the runtime system layer, so that it can effectively handle things like scheduling and garbage collection. However, the most important work performed by the runtime system is arguably to keep track of and handle the scheduling of the goroutines.

2.7 Rust

Rust was created by Graydon Hoare and started as a side-project in 2006 [59]; the first stable release came out in 2015 and the language has since seen regular releases every six weeks [58]. The three main goals of Rust are performance, reliability and productivity. The language has support for object-oriented programming, is statically typed, guarantees thread and memory safety, provides memory management enforced by RAII instead of using a garbage collector [43] and provides integration with other programming languages [60].

12 2.7. Rust

2.7.1 Concurrency

One of the major goals of Rust is to provide safe and efficient concurrency mechanisms [33, Chapter 16]. This is achieved by ownership and type-checking, meaning that most errors related to concurrency will be detected at compile-time instead of at runtime. The consequence of this is that subtle bugs, which would otherwise be missed by some compilers and only be found in specific runtime scenarios, can be found and corrected at an early stage. Rust achieves concurrency by letting the programmer implement multi-threading. For the threads to be able to communicate with each other, the Rust standard library provides two ways to do so: message passing using channels and through shared memory. Channels are split into a receiver and a transmitter, where there can be multiple transmitters (i.e. multiple threads producing data) but only one receiver (i.e. one thread consuming the data). The receiver can be either blocking or non-blocking. When it is blocking, the thread will just wait for incoming messages on the channel without executing any other code during the wait, and if it is non-blocking it will periodically check if there are any messages on the channel and perform other work in the meantime. Rust handles safe concurrency when it comes to channels by enforcing ownership rules, which means that when a message has been sent from a transmitter to a receiver, the message can no longer be accessed from the transmitter. This check is done at compile-time, so an incorrect reference to a message that is no longer owned by the current thread will produce an error before the program is even run, meaning that the incorrect code cannot be run [33, Chapter 16]. The Rust version of green threads is called tokio::task (task), which uses async/await syntax combined with the Tokio runtime12. Tasks correspond to the definition of green threads given in Section 2.2.
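The sketch below illustrates the channel model described above using std::sync::mpsc from the Rust standard library: several transmitting threads and a single receiver that blocks until messages arrive. It is a minimal, illustrative example rather than code from the thesis:

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // Multiple transmitters: each thread gets its own clone of the sender.
    for id in 0..3 {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(format!("message from thread {}", id)).unwrap();
        });
    }
    drop(tx); // drop the original sender so the receiver knows when all senders are gone

    // Single receiver: the iteration blocks on each message and ends
    // once every transmitter has been dropped.
    for received in rx {
        println!("{}", received);
    }
}

Because each message is moved into the channel when sent, the sending thread can no longer use it afterwards; the compiler enforces this ownership rule at compile-time, as described above.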

2.7.2 Compiler

This section is based on the Rust Compiler developer guide [57]. The Rust compiler (rustc) is self-compiling: it is written in Rust and uses LLVM as its backend. The compilation process is started when the user invokes the rustc command on Rust source code, with optional command line arguments. Thereafter, the source code is converted into token streams by a low-level lexer, which are then passed to a high-level lexer that validates the tokens and performs interning on them. Interning is a technique where, during compilation, many values are allocated in something called an arena. Values in the arena are immutable and can be compared at a low cost by comparing pointers, which is useful for preventing duplicate values, for example types or identical strings. Interning is used for optimizing memory allocation and performance by only allocating memory once, instead of several times. After the lexers are done, the token streams are forwarded to the parser, which creates an Abstract Syntax Tree (AST) and does additional validation. The AST is then converted to a High-Level Intermediate Representation (HIR), which in turn is converted to a Mid-Level Intermediate Representation (MIR) and a Typed High-Level Intermediate Representation (THIR). HIR is used for type inference, THIR for pattern and exhaustiveness checking and MIR for borrow checking. Exhaustiveness checking is used to validate match-expressions; a match-expression is valid if all possible branches are covered, otherwise it is not exhaustive. Match-expressions that are not exhaustive will throw a compile-time error. The MIR is optimized and monomorphized (copying generic code and replacing type parameters with concrete types) before moving on to code generation, which entails converting the MIR to an LLVM Intermediate Representation and passing it to the LLVM backend to generate machine code. The last step of the compilation process is linking the binaries and libraries to create the final binary file that can be executed.
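As a small illustration of the exhaustiveness checking mentioned above, the following Rust sketch (with an illustrative Direction enum) compiles only because every variant is covered by the match; removing an arm, without adding a catch-all pattern, would produce a compile-time error:

// Every variant of Direction must be handled for the match to be exhaustive.
enum Direction {
    Up,
    Down,
    Left,
    Right,
}

fn describe(d: Direction) -> &'static str {
    match d {
        Direction::Up => "up",
        Direction::Down => "down",
        Direction::Left => "left",
        Direction::Right => "right",
    }
}

fn main() {
    println!("{}", describe(Direction::Left));
}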

12https://github.com/tokio-rs/tokio


2.7.3 Memory management

To guarantee memory safety, Rust uses ownership, which is a key feature of the language. The three rules of ownership in Rust are: every value in Rust has an owner, a value can only have one current owner, and when the owner is no longer relevant, the value is dropped [33, Chapter 4]. What this means is that memory allocation and deallocation are done automatically when an owner goes out of scope. The owner is in charge of deallocating all of its values, by calling the Drop method implemented by its values, which clears the allocated memory on the heap. Checks for violations of ownership rules are done at compile-time, meaning that potential memory leaks and violations will result in a compile-time error [33, Chapter 4].
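The following minimal Rust sketch (the type and names are illustrative) shows ownership in practice: a value is dropped, and its Drop implementation runs, as soon as its owner goes out of scope, and after a move the original binding can no longer be used:

struct Connection {
    name: String,
}

// Drop is called automatically when the owner goes out of scope,
// which is where the resource would be released.
impl Drop for Connection {
    fn drop(&mut self) {
        println!("dropping {}", self.name);
    }
}

fn main() {
    let first = Connection { name: String::from("first") };

    {
        let second = Connection { name: String::from("second") };
        println!("inner scope ends, {} is dropped next", second.name);
    } // `second` goes out of scope here: its Drop runs and the memory is freed

    // Ownership moves: after this line `first` can no longer be used,
    // and the compiler would reject any further access to it.
    let moved = first;
    println!("{} now owns the value", moved.name);
} // `moved` goes out of scope and is dropped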

2.8 Related work

This section focuses on previous work in related areas, such as performance comparisons, C-RAN investigations and different metrics for measuring soft qualities of software.

2.8.1 Measuring performance differences for programming languages

Measuring the performance of different programming languages has been relevant for almost as long as there have been multiple programming languages. One of the most basic, but also most important and relevant, ways to compare programming languages is by benchmarking: using the same type of program and measuring aspects such as CPU usage, processing efficiency and memory usage. When comparing the performance of different programming languages, there need to exist one or several concrete problems or use cases that should be implemented in the different languages. The same type of programming mechanisms should be employed when creating the solution to the problem, e.g. the solution should use concurrency. Serfass and Tang [45] did a comparison of Go and C++ Threading Building Blocks (TBB), which is a library that enables tasks to be run on multiple cores in parallel. However, this comparison was done on a very specific use case and problem, so the result that C++ TBB has better performance than Go is not generally applicable to other problems. For instance, the performance of the C++ TBB library is not relevant to the performance of the C++ boost::fibers library, meaning it cannot directly be applied in this thesis. However, the same general approach for answering the research questions will be used, namely: identify the use case, identify what type of problem it poses and see if it translates into a similar but more simplified problem, propose a solution for the simplified problem and then implement a solution.

In the paper by Memeti et al. [38], the authors describe a way of comparing four different programming frameworks (OpenMP, OpenCL, OpenACC, CUDA) in regards to programming productivity, performance, and energy consumption. The way of measuring the programming productivity, by comparing how many lines of code are needed to provide multi-threaded support for an application, is very usable for this thesis. This method, or some variation of it, can be applied when evaluating development efficiency of different programming languages. The way of keeping the same hardware and using an external tool with little overhead for measuring performance of the programs will also be useful. However, a different tool than the one used in the paper by Memeti et al. is needed for this thesis, since it is a library for C++ and cannot be used for other programming languages.

The paper by Sachs et al. [44] focuses on constructing a benchmark for Message-oriented Middleware (MOM), with a focus on using real-life data instead of synthesized data. Also, their aim is to provide a benchmarking program suitable for the entire MOM server and its functionalities, instead of isolating specific parts, which had been done in previous works according to the authors. The benchmark the authors produced is called SPECjms2007, which can use custom workloads in order to test either the MOM application as a whole, or selected components. Given the fact that a RAN application is similar in nature to a MOM application,

it would be appropriate to use real-life data from the actual system when implementing the RAN application. However, this would make the scope of the thesis too large, and instead synthesized data will be used.

2.8.2 Evaluating development efficiency differences for programming languages

The article by Ardito et al. [5] compared the verbosity, understandability, organization, complexity and maintainability of Rust to C, C++, Python, JavaScript and TypeScript. The metrics used in the comparison were lines of code, number of methods, number of arguments, number of exit points, Cyclomatic Complexity, Cognitive Complexity, Halstead metrics and Maintainability Index. The source code used in the comparison consisted of nine algorithms commonly used when benchmarking performance of programming languages and was taken from the Energy-Languages github repository13. The result of this comparison was that code written in Rust is more readable, more organized, less verbose and equally maintainable compared to C and C++. Compared to Python, JavaScript and TypeScript, Rust had lower maintainability. The authors also point out that Rust had the lowest Cognitive Complexity of all the investigated languages, indicating that the understandability of the Rust source code was better than that of the other languages. From this article, it is reasonable to assume that the development efficiency compared in this thesis will provide a better score for Rust compared to C++. However, it does not give an indication of how Go will score compared to either Rust or C++. Additionally, some of the metrics used for determining the softer aspects of programming languages will be included in this thesis.

Constanza et al. [17] had a more informal approach when comparing the ease of programming in C++, Go and Java. The authors had to decide which language would be more suitable for their sequencing tool elPrep, where memory management and manipulation of large amounts of data are important. Therefore, languages with support for safe reference counting and concurrent, parallel garbage collection were chosen to be compared. The authors’ approach to deciding which language to use was to implement a nontrivial subset of elPrep and then benchmark these different implementations. From these performance benchmarks they settled on Go. But since the tool will get regular updates and needs maintenance, they also realised that softer aspects, such as how difficult it is to implement new features, need to be investigated as well. To determine which language was the easiest to implement new features in, they simply compared the different challenges encountered and the solutions to said challenges. Their conclusion from this was that C++ required the most development effort, since there are more options that need to be compared (allocate on stack/heap, deciding on memory managers etc.). Go and Java were found to be comparable, where some features were easier in Go, and some in Java. From this article we can see that it is important to consider softer aspects when deciding on a programming language; however, a more formal approach would be useful since the conclusions by Constanza et al. are quite subjective.

2.8.3 Comparing compilers

In the paper by Sagonas et al. [34] the authors compare three different backend compilers for the Erlang language. The compilers compared were ErLLVM, High Performance Erlang (HiPE) and BEAM (Björn's Erlang Abstract Machine). The motivation behind their comparison was that the authors had developed the new backend compiler ErLLVM, which builds on LLVM, and wanted to see how it compared to the existing compilers. The paper describes in detail how they modified LLVM to create ErLLVM and how ErLLVM works. From the performance comparison the conclusion was drawn that the new ErLLVM was significantly faster than BEAM and achieves performance similar to existing HiPE implementations. One could also see a difference in compile time between the different compilers.

13https://github.com/greensoftwarelab/Energy-Languages


The paper by Machado et al. [36] compared the performance of the compilers GNU Compiler Collection (GCC) and Intel C++ Compiler (ICC), with different optimisation flags and on two different multi-core architectures. The algorithm chosen for the performance evaluation was the Embarrassingly Parallel (EP) kernel [6]. The results of the study were that, with the GCC compiler, the most optimized sequential execution time was 778 seconds on the Intel architecture and 997 seconds on the AMD architecture. For the ICC compiler, the optimized sequential execution time decreased to 525 seconds on Intel and 776 seconds on AMD. This represents roughly a 33% decrease for the Intel architecture and roughly a 23% decrease for the AMD architecture. For multi-core execution, the EP kernel was implemented in Pthreads, C++11, Cilk Plus, OpenMP and TBB, and the change in execution time varied greatly between the different multithreaded implementations. The papers by Sagonas et al. [34] and Machado et al. [36] show that different compilers and compiler configurations can have an impact on runtime performance, which needs to be investigated and compared when doing similar studies and will therefore be considered for this thesis. Although suitable, an investigation of different compilers will not be done in this thesis, since the time investment of doing such an investigation would be too large. For example, if we chose to investigate just three different compilers (excluding different configurations of these) in each language on each hardware, this would result in 45 unique combinations to be tested and compared.

2.8.4 Advantages and disadvantages of C-RAN

One advantage of C-RAN is that it is more cost efficient than its predecessors, since multiple BBUs can be grouped in a co-located BBU Pool, thus lowering energy consumption [13]. Also, the co-location of BBUs combined with virtualization of BBUs gives the ability to adjust to changes in user demand, which further decreases energy consumption and improves scalability [42]. The work by Tang et al. [52] also presents the advantage of reduced cost in a C-RAN solution compared to traditional solutions. They found that, according to simulations, the performance and cost are more favourable than with previous methods. The disadvantages of the C-RAN solution are that one needs to have a BBU pool nearby, otherwise the transportation cost will be high. There is also a need for a large amount of bandwidth in order to transport the necessary information to the BBU pool, since it can be connected to up to 1 000 base stations [13]. One also needs to make sure that the BBU pool is reliable and that the cells in the pool are optimally assigned, to make the savings worth it.

2.8.5 Energy consumption evaluation of programming languages

The paper by Pereira et al. [41] investigated whether there is a correlation between execution time, energy consumption and peak memory usage. The authors wanted to investigate if programs with shorter execution time also consume less energy. Ten different benchmarking programs, implemented in twenty-seven programming languages, were compared in terms of execution time, energy consumption and peak memory usage. The conclusion of the paper was that a faster language is not always more energy efficient. Looking only at the languages that are also compared in this thesis, and considering all three aspects, the conclusion according to their data is that Go is the best language. But for embedded applications, execution time and energy usage might be more relevant than peak memory usage. Therefore, taking only execution time and energy usage into account, Rust is the best choice, followed by C++.


2.9 Soft metrics

When considering the scale of the code base of companies like Ericsson, qualities like performance are no longer the only important metrics when measuring whether a programming language is suitable for them to use or not. Companies should also consider the "softer" aspects such as readability, usability etc., since for example the readability of a program or language correlates with its maintainability. Therefore, for the purpose of this thesis, the focus will be on productivity, maintainability and understandability.

2.9.1 Productivity

For the context of this thesis, the definition of productivity provided by the Institute of Electrical and Electronics Engineers as the "ratio of work product to work effort" [29] will be used. A method for measuring productivity is presented by Schreiber et al. [32], where the authors suggest a mathematical expression for the productivity of a programming language compared to a "base language", as seen in Equation 2.4. P denotes the problem of interest, subscript 0 denotes the base program and subscript L denotes program L (the program written in the programming language to compare). The ease of use, or relative power, of the language ρL is given by Equation 2.1,

ρL = I(P0) / I(PL)    (2.1)

where I(P0) is the implementation time of the base program and I(PL) is the implementation time of program L.

The performance of the language, or the efficiency eL, is given by Equation 2.2,

eL = E(P0) / E(PL)    (2.2)

where E(P0) is the average execution time per run for the base program and E(PL) is the average execution time per run for program L. The problem-dependent parameter X is given by Equation 2.3,

X = r · E(PL) / I(PL)    (2.3)

where r is a weighting factor, specific to the current problem, that reflects how important it is to minimize execution time compared to implementation time, E(PL) is the average execution time per run and I(PL) is the implementation time. The productivity of a language compared to the base language is given by Equation 2.4.

productivity = (ρL + eL · X) / (1 + X)    (2.4)

The authors suggest that these metrics provide a good estimation of how the productivity can be measured on a specific problem implemented in different languages.
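To make the relationship between the equations concrete, the following minimal Go sketch computes the productivity score of Equation 2.4 from Equations 2.1-2.3. The numbers used are purely hypothetical and the function name is ours; this is an illustration, not part of the thesis tooling.

package main

import "fmt"

// productivity combines Equations 2.1-2.4: rho = I(P0)/I(PL), e = E(P0)/E(PL),
// X = r*E(PL)/I(PL) and productivity = (rho + e*X) / (1 + X).
func productivity(i0, iL, e0, eL, r float64) float64 {
	rho := i0 / iL   // relative power of the language (Eq. 2.1)
	e := e0 / eL     // relative efficiency (Eq. 2.2)
	x := r * eL / iL // problem-dependent parameter (Eq. 2.3)
	return (rho + e*x) / (1 + x) // Eq. 2.4
}

func main() {
	// Hypothetical values: the base program takes 60 000 s to implement and
	// 1.0 s per run; program L takes 48 000 s and 1.6 s; weighting factor r = 100 000.
	fmt.Printf("productivity: %.2f\n", productivity(60000, 48000, 1.0, 1.6, 100000))
}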

2.9.2 Maintainability For the context of this thesis the first (1) definition of maintainability provided by IEEE as the “ease with which a software system or component can be modified to change or add capabilities, correct faults or defects, improve performance or other attributes, or adapt to a changed environment” [29, p. 258] will be used.


Coleman et al. presented in 1994 a polynomial assessment model for measuring maintainability of code [16]. The formula they presented as maintainability index was:

Maintainability = 171 − 5.2 · ln(aveVol) − 0.23 · aveV(g′) − 16.2 · ln(aveLOC) + 50 · sin(√(2.46 · perCM))    (2.5)

where aveVol is the average Halstead volume metric, aveV(g′) is the average cyclomatic complexity, aveLOC is the average lines of code and finally perCM is the percentage of comments in the code. The Halstead volume metric [26] V can be calculated as

V = N · log2(η)    (2.6)

where N = N1 + N2 and η = η1 + η2. N1 is the total number of operators, N2 the total number of operands, η1 the number of distinct operators and η2 the number of distinct operands. The Halstead volume can be interpreted as a metric for how much information a reader of the code has to ingest in order to understand the code. Cyclomatic complexity [37] can be calculated using the control flow graph of the code (where each node is a basic block). The metric was first introduced in 1976 by Thomas McCabe. The metric can, for example, accurately predict the number of test cases that will be needed to test the code. In essence, the cyclomatic complexity metric M can be determined as:

M = E − N + 2P    (2.7)

where E is the number of edges of the graph, N is the number of nodes of the graph and P is the number of connected components. A very simple example of this would be a program that only has one if-statement. This code would only have two paths, either the if-condition is true or false, meaning the cyclomatic complexity would be two. In this thesis, the version of the maintainability index presented by Visual Studio will be used instead14. Their version normalizes the value to lie between 0 and 100 instead of the original −∞ to 171. Visual Studio also removed the comments part, which is not of any interest in this thesis either. The formula used is:

Maintainability index = MAX(0, (171 − 5.2 · ln(Vol) − 0.23 · V(g′) − 16.2 · ln(LOC)) · 100/171)    (2.8)
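As an illustration of Equation 2.8, the small Go function below (our own sketch, not part of the thesis tooling) computes the normalized maintainability index from the three sub-metrics.

package metrics

import "math"

// maintainabilityIndex implements the Visual Studio variant in Equation 2.8:
// the raw index is scaled to the range 0-100 and clamped at 0.
func maintainabilityIndex(halsteadVolume, cyclomaticComplexity, linesOfCode float64) float64 {
	raw := 171 -
		5.2*math.Log(halsteadVolume) - // natural log of the Halstead volume (Eq. 2.6)
		0.23*cyclomaticComplexity - // cyclomatic complexity (Eq. 2.7)
		16.2*math.Log(linesOfCode) // natural log of the lines of code
	return math.Max(0, raw*100/171)
}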

2.9.3 Understandability

For the context of this thesis, the definition of understandability provided by IEEE as the "ease with which a system can be comprehended at both the system-organizational and detailed-statement levels" [29, p. 485] will be used. This thesis will focus on the detailed-statement level. As previously stated, metrics like understandability are increasingly important for companies involved in software development. However, these metrics are hard to measure and therefore new methods are being developed to this day. One example of such a metric is Cognitive Complexity, which was presented in 2017 by a company called SonarSource [12]. One could argue that this is in some ways a modernization and/or extension of cyclomatic complexity, since cyclomatic complexity, for example, does not measure more modern structures such as try/catch. This method was developed as a way of measuring the understandability of code [12], unlike cyclomatic complexity which mainly focuses on maintainability and testability. One important part of the cognitive complexity metric is that it does not rely on mathematical models to assess the code; instead it uses assessment rules that represent the mental effort of understanding code.

14https://docs.microsoft.com/en-us/visualstudio/code-quality/code-metrics-maintainability-index-range-and-meaning?view=vs-2019


Since this was only recently published by a company, one could question the validity and correctness of the method. This was investigated in a study by Muñoz Barón, Wyrich, and Wagner [39], who concluded that cognitive complexity is a promising metric for measuring the quality goal understandability. They found that cognitive complexity correlated with the time it took developers to understand code, as well as with a combined measure of comprehension time and correctness. Cognitive complexity is based on three simple rules [12]:

• Ignore the structures that allow multiple statements to be readably shorthanded into one.

• Increment (add one) for each break in the linear flow of the code. Examples are GOTO, recursive methods and break/continue.

• Increment when flow-breaking structures are nested. Example of this would be a two- dimensional nested loop, which would increase the score to two (i.e. one per level), and an If-statement inside the nested loop would increment once more to a three.

There are four different types of increments in cognitive complexity: nesting, structural, fundamental and hybrid. Nesting increments occur when there are nested control-flow structures, for example an if-statement within a lambda function. Nesting increments do not directly contribute to the cognitive complexity score, but instead make other increments within a nested structure more expensive. Using lambda functions does not increment the total score but increases the nesting level by one, so an if-statement within a lambda would give +2 (due to the nesting increment) instead of +1 if it were not nested. Structural increments are control-flow structures within a nested statement that also increase the nesting increment, for example an if-statement within an if-statement. Fundamental increments are used for statements that are not nested, such as binary expressions or methods within a recursion cycle. Thus, the use of extensive recursion will yield a high score for cognitive complexity. Hybrid increments are used for control-flow structures that are not nested but increase the nesting value. Examples of hybrid increments are else-if statements and else statements [12].
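As a small illustration of these rules (our own sketch, scored by hand following the description above), consider the Go function below: the anonymous function adds no increment of its own but raises the nesting level, the if-statement inside it therefore costs two, and the top-level loop costs one, giving a total cognitive complexity of three.

package example

import "fmt"

// printPositives has a cognitive complexity of 3 under the rules described above.
func printPositives(items []int) {
	show := func(v int) { // anonymous function: no increment, but nesting +1
		if v > 0 { // structural +1, plus nesting penalty +1 => +2
			fmt.Println(v)
		}
	}
	for _, v := range items { // loop at the top level: structural +1
		show(v)
	}
}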

3 Background

This chapter focuses on presenting a brief introduction to Ericsson as a company, as well as an overview of the currently implemented RAN system at Ericsson that is the foundation of the research conducted in this thesis. The system is divided into two parts, a test driver and an application. By studying the given system, the goal was to identify the key characteristics of the RAN application in order to create a simplified version of the system with the same characteristics that requires less effort to test and implement in other languages. In this chapter, the term system under test (SUT) will be used when referring to the system implemented by the authors of this thesis.

3.1 Ericsson

Ericsson is a Swedish networking and telecommunications equipment company with 100 824 employees (as of December 2020). Ericsson was founded in 1876 by Lars Magnus Ericsson as a telegraph repair company. Today, Ericsson is leading the implementation of 5G technologies and holds more than 57 000 patents. Ericsson has offices all around the world, in about 180 countries and their products are used world-wide [1].

3.2 System requirements

The research in this thesis is done on behalf of Ericsson. Therefore, Ericsson had a list of requirements for the SUT which were:

1. The characteristics identified in the original RAN system at Ericsson shall be the same as in the SUT.

2. The communication between the test driver and SUT shall use some kind of IPC, and the test driver should preferably spread the start of new sequences in time to mimic telecommunication applications.

3. The SUT shall use green threads (see Section 2.3) with context switching.

4. The SUT shall be single-kernel-threaded and use green threads for concurrency.


5. Encode/decode messages sent via IPC using Google Protocol Buffers1.

6. Optional: Use XCM (see Section 3.3) as an IPC and wrap the XCM in the different programming languages, in order to investigate the difficulty in wrapping these.

7. Optional: Use ASN.12 for encode/decode of the binary blob part of one IPC message (i.e. ASN.1 inside a Google Protocol Buffer encoded message). This is to establish how easy it is for Go and Rust to interface with third-party libraries for ASN.1 encode/decode.

3.3 System description

This section describes the original RAN system at Ericsson, which was to be simplified and implemented in C++, Go and Rust. This is later used to identify the characteristics of a RAN application, see Section 4.1.1. The system is comprised of two main components, a test driver and the RAN application (normally running in the base stations). The test driver and the application are written in C++ and run on different threads. To communicate between the test driver thread and the application thread, XCM is used. XCM3 is an API for IPC, written in C and developed by Ericsson, that enables the use of different transport protocols with a unified interface. The test driver simulates multiple User Equipments (UEs) trying to connect and send messages at the same time to the application. Therefore, the application needs to simulate a data unit, where the equipment is registered and the message is handled. The application is a complex state machine that handles prioritization of the different messages as well as routing between running Composite Procedures (CPs are explained in Section 3.3.2).

3.3.1 Test driver

The test driver is responsible for simulating a use case of multiple UEs trying to send messages. The execution flow of the test driver overview seen in Figure 3.1 is as follows:

1. The main thread starts several sequences (which can be seen as users trying to access something). Each sequence is registered in the router, with a unique ID, message filter and user filter.

2. For each function in the sequence, a call is made to the Service Helper where the functions are executed, generating an encoded message that is sent to the application. All sequences are within Boost fibers [10], and since the communication is asynchronous, the Boost fiber yields as soon as the message has been sent. This means that the next sequence can be scheduled while the other is waiting for a response from the application. For a detailed explanation of Boost fibers, see Section 2.5.1.

3. The application processes the message and returns an event to the Service Helper, which decodes it and forwards it to the router. Here a queue is built up while the router forwards the message to the correct sequence. This is done by looping over all active Boost fibers and asking them if they want to accept the message. This request is based on the message filters and user filters that are passed in when the sequence is registered at the router, and these are forwarded so that the information is preserved.

4. Once the router finds the correct sequence the message is sent to that sequence and the next function can be started.

5. Once all functions in a sequence have received their data the fiber will terminate and deregister from the router.

1https://developers.google.com/protocol-buffers 2https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx 3https://github.com/Ericsson/xcm


6. Once all fibers have terminated, the test driver is done and exits.

This is a simplified view of how the test driver works; within the actual driver there are multiple minor events also happening, but for the identification of the characteristics these were deemed irrelevant.

[Diagram: the single-threaded C++ test driver, containing the Service Helper (with encode/decode), the Router and Sequences 0-N running as green threads, exchanging incoming and outgoing messages with the Application as Google Protocol Buffers via XCM.]

Figure 3.1: Test driver overview

3.3.2 Application

Figure 3.2 demonstrates how a RAN application can look. The key components of the application are the Composite Procedure (CP) and the Atomic Procedure (AP). A CP corresponds to a function that contains code and APs.

An AP is an asynchronous call that can, for example, request data (from the test driver in this case). An AP makes the current CP yield its execution while waiting for the response. This means multiple CPs can run concurrently within the application. The figure does not represent an exact copy of a RAN application at Ericsson, since these are confidential and have a great number of features that are not relevant for this thesis. The system depicted is boiled down in order to demonstrate the key fundamental characteristics identified from the Ericsson code. The execution flow of a RAN application can look as follows:

1. An event loop is started on a Boost fiber inside the application thread. This event loop waits for something to be received or something to be sent. However, it does not block.

2. A message is received from the test driver to the application thread.

3. The message is decoded and sent to the router.

4. The router is a linked list of handlers (for example, a message can be "start procedure 2") and each handler is asked whether it wants the message until one accepts. If no handler accepts the message, the router instead sends the information directly to an AP (an asynchronous call from within the application that is currently waiting for some data).

5. The state of the CP is updated and depending on the new state a choice is made. It can for example be to start a new CP or kill it.

6. An AP requests/sends data to an outside source (test driver) via the encoder. If it needs to wait for a response, the AP yields the current boost fiber until woken by the router.


[Diagram: the application thread with a main green thread running the message handler (event loop) with decode/encode and routing, a procedure spawner, and 0...n procedure green threads, each holding the state of its procedure, its APs and an event producer; incoming and outgoing messages are exchanged with the Test Driver as Google Protocol Buffers via XCM.]

Figure 3.2: Application overview

4 Method

This chapter is focused on presenting the approach to solving the aim of the thesis. This was done in four main phases: a pre study phase (see Section 4.1), an implementation phase (see Section 4.2), a phase for comparing performance (see Section 4.3) and a phase for comparing development efficiency (see Section 4.4). The purpose of the pre study phase was to establish the characteristics of the RAN system provided by Ericsson, in order to create a simplified version of the RAN system that could be implemented in C++, Go and Rust. These implementations were used to compare performance and development efficiency. During the pre study it was also determined what hardware to use and compare, as well as which compiler configurations to use. During the implementation phase, the algorithms derived from the pre study phase were implemented in the different languages, as well as the communication between processes. When an implementation that reflected the original system's characteristics was in place, the phases for comparing performance and development efficiency were started. The purpose of these phases was to extract data that could be compared and analyzed.

4.1 Pre study

Since no related work was found about the characteristics of RAN applications, the main characteristics of such programs had to be derived from studies of the code itself, complemented by discussions with experts at Ericsson. This was done by identifying the different components involved in these applications and then discussing the importance and main responsibility of said components.

4.1.1 Identifying the characteristics

In order to identify the characteristics, a small test case of the main code was investigated. The code execution path was followed and key components were identified. From this, an overview figure was created and discussed with experts at Ericsson until it was considered correct. The resulting figures can be seen in Section 3.3.2, Figure 3.1 and Figure 3.2. These figures were slightly simplified versions of the system, but it was agreed together with the experts that the figures captured the key characteristics. From these figures it was agreed that some parts of the system could be removed or further simplified, since they were

overcomplicated for the case that was studied. Other parts could be simulated using methods that were easier to implement but essentially did the same thing. The simplifications and the reasons for doing them were:

• Did not use the XCM API since this would only test how difficult it would be to create a wrapper for this API, and was deemed to be irrelevant for the purpose of the research questions of this thesis. Instead, a Unix domain socket with TCP was used.

• Encode/decode was removed from test driver and application, since they only produce a constant delay which will be measured in a separate test case.

• Communication is now a simple string over a TCP stream. The test driver sends which CP to start (a 1-9 integer) and the ID for that CP. The application acknowledges when it has completed a CP, wants more data or has exited (a 1-9 integer) and what ID that action corresponds to. There is no need to send fully encoded messages, since this only introduces a constant time delay.

• There are no boost fibers in the test driver sequences, since we do not care about the performance of the test driver as long as it behaves the same way for each application.

• The message filters and user filters in the test driver were removed, since they are not needed for simple sequences and we do not have boost fibers anymore. This also led to the combination of the service helper and router.

• The event-producer was removed since simple strings are sent without encoding/decoding, so the APs can do it directly.

• Overall, each step in the application was made to be the bare minimum, meaning for example that the procedure spawner is a simple switch-case and the router is just a list of structures.

The RAN application characteristics found were:

1. Client-server architecture

2. The application handles multiple messages concurrently

3. Communication is possible between running processes (via TCP)

4. The communication is asynchronous

5. The application is concurrent via green threads

6. The application is single-threaded

From the identification, an overview of the resulting test driver and application was created and can be seen in Figure 4.1 and Figure 4.2. The overall application can be described as a message-heavy, asynchronous, concurrent, single-threaded application.


[Diagram: the simplified single-threaded C++ test driver with a combined Service Helper/Router and Sequences 0-N, communicating with the Application over a TCP stream carrying simple string instructions.]

Figure 4.1: Simplified Test Driver


[Diagram: the simplified application thread with a main green thread running the message handler (event loop) and routing, a procedure spawner, and 0...n procedure green threads, each holding the state of its procedure and its APs; messages are exchanged with the Test Driver over a TCP stream carrying simple string instructions.]

Figure 4.2: Simplified Application Overview

4.1.2 Hardware

Since RAN applications need to be able to run on different kinds of COTS hardware, there was a need to run the tests on different hardware with different architectures. The hardware used by Ericsson and available for the project was three different COTS x86-based computers and two different ARM-based computers.

For all hardware, hardware multithreading was disabled. The programs ran in Docker containers on the Dell R630 and Dell R6525 hardware. The available hardware can be seen in Table 4.1.

Table 4.1: Available hardware

Name | CPU | RAM | Linux version
BB2.0 | ARM A15 12 core @1.4GHz | 6GB DDR3 @1866 MT/s | WindRiver linux 18
BB2.5 | ARM A57 12 core @1.4GHz | 6GB DDR3 @1866 MT/s | WindRiver linux 18
BB3.0 | Intel Snowridge 16 core @2.20 GHz | 8GB DDR4 @2933 MT/s | WindRiver linux 18
Dell R630 | 2 X Intel Xeon E5-2620 v3 6 core @2.40GHz | 256GB DDR4 @1866 MT/s | Ubuntu 20.04 LTS
Dell R6525 | 2 X AMD EPYC 7742 64 core @3.40GHz | 1TB DDR4 @3200 MT/s | Ubuntu 20.04 LTS

4.1.3 Choosing compiler

From the related work conducted (see Section 2.8.3) it was clear that different compiler configurations can have different performance impacts. However, finding the optimal compiler for each language on each platform would entail a large time investment, so it was decided that the recommended release configurations would be used (for Go1, for Rust2, for C++3). Another decision was to use the latest versions of the languages and the compilers available at Ericsson at the time of testing. The chosen compiler configurations can be seen in Table 4.2.

Table 4.2: Compiler configurations

Platform | Language | Compiler | Build command
ARM | C++ | arm g++ 8.2.0 | arm-wrs-linux-g++ -O3 -s -DNDEBUG -std=c++17 -levent -lboost_fiber -lboost_context -lboost_filesystem
ARM | Go | Gc 1.15.6 (bundled) | env GOOS=linux GOARCH=arm go build
ARM | Rust | rustc 1.50.0 | cargo build --release --target armv7-unknown-linux-musleabihf
x86 | C++ | g++ 10.2.0 | g++ -O3 -s -DNDEBUG -std=c++17 -levent -lboost_fiber -lboost_context -lboost_filesystem
x86 | Go | Gc 1.15.6 (bundled) | go build
x86 | Rust | rustc 1.50.0 | cargo build --release

4.2 Implementation

When the characteristics of the RAN application had been identified, the implementation phase began. During the implementation, an effort was made to utilize the different strengths of each language and to stay away from using exactly the same implementation in each language if built-in features could be used instead. The implementation phase was comprised of three parts:

• Developing the test driver

• Implementing the IPC

• Implementing the application in the different languages

1https://golang.org/doc/tutorial/compile-install 2https://doc.rust-lang.org/book/ch14-01-release-profiles.html 3https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html


4.2.1 Detailed description of the test driver

The real-world scenario for the test driver is to simulate that sequences are sent to the application sparsely: at most ~1000 sequences might be connected to the application concurrently, and they request resources only when doing actual work, for example streaming, calling or browsing a web page, as well as on setup and teardown. In other words, most of the time the connected devices are idle. Since the performance of the test driver is not the focus, the driver is single-threaded and does not use green threads.

Algorithm 1: Test driver
  Initialize the socket;
  Initialize the event loop;
  Initialize timer to fire send timeout event every millisecond;
  while Event loop running do
    foreach connection in incoming connections do
      Add a listener for read/write on the socket file descriptor;
      Add send timer in the event loop;
    end
    if Socket file descriptor readable then
      Message <- Read message;
      if Message is done then
        Increment message index for sequence with corresponding ID;
        Mark sequence as not running;
        if No more messages then
          Mark sequence as done;
      else if Data requested then
        Send back data;  // The data requested is just the message bounced back
    if Event send timer is ready then
      if All sequences finished then
        Send exit to application;
        Exit event loop;
      else
        foreach Not running and not done sequence do
          Send next message to application;
          Set sequence to running;
        end
  end

The final source code for the test driver can be seen in Appendix A.3, listing A.12.

4.2.2 Detailed description of the IPC

The chosen IPC was a Unix domain socket with TCP as the underlying transport protocol. Since multiple messages could be sent and received, delimiters were implemented to differentiate between messages. The start of a message was indicated by a \r character and the end of a message was indicated by a \n character. Only the first character in the message was used for determining which CP to route to, and the subsequent characters are the ID of the sequence to route to. For example, "\r111\n" would route message 1 to the sequence with ID 11.
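As a minimal sketch of this framing (our own illustration, not the thesis implementation; the function name is ours), a received frame can be split into the CP selector and the sequence ID as follows.

package ipc

import "fmt"

// parseFrame splits one "\r...\n" delimited message into the CP selector
// (the first character) and the sequence ID (the remaining characters).
func parseFrame(frame string) (cp byte, id string, err error) {
	if len(frame) < 3 || frame[0] != '\r' || frame[len(frame)-1] != '\n' {
		return 0, "", fmt.Errorf("malformed frame %q", frame)
	}
	body := frame[1 : len(frame)-1] // strip the \r and \n delimiters
	return body[0], body[1:], nil   // "\r111\n" -> cp '1', id "11"
}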

4.2.3 Detailed description of the simplified application

The main idea of the application is to have as many green threads as possible running and yielding at one time. The application contains three actions and one main event loop.


Simply put, the main event loop listens for incoming messages on the socket and then calls the corresponding CP based on the message's first character.

Algorithm 2: Setup (CP)
  Initiate sequence object and add to global list;
  Send data request to test driver;
  while Data not received do
    Wait for data;  // Asynchronous call, not blocking main thread
  end
  Send done to test driver;

Algorithm 3: CP3 (CP)
  Sleep current green thread for 10 ms;
  Send done to test driver;

Algorithm 4: Teardown (CP)
  Delete sequence with corresponding ID from global list;
  If needed, deallocate sequence object to free memory;
  Send done to test driver;

Algorithm 5: Main
  Initialize the socket;
  Connect to test driver;
  Initialize the event loop;
  Initialize timer to fire timeout event every millisecond;
  Start event loop in a green thread;
  while Event loop do
    if Socket file descriptor readable then
      Message <- Read message;
      Do the CP corresponding to the first character in the message;  // If starting a new CP, do it in a green thread
    if Event timeout timer is ready then  // Only needed in C++, since it has an explicit event loop
      Yield event loop green thread;  // To make sure other green threads can run
  end

The final source code for the applications can be seen in Appendix A.3, listings A.2-A.10.
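The full implementations are in the appendix, but the stripped-down Go sketch below (ours, with hypothetical message values) illustrates the core idea of Algorithms 2-5: each incoming message is handled in its own green thread, while GOMAXPROCS = 1 keeps the whole application on a single kernel thread.

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // single kernel thread; goroutines act as green threads

	messages := []byte{'1', '3', '9'} // hypothetical setup, sleep and teardown messages
	done := make(chan byte)

	for _, m := range messages {
		go func(msg byte) { // one green thread per CP, as in Algorithm 5
			if msg == '3' {
				time.Sleep(10 * time.Millisecond) // CP3 sleeps its green thread (Algorithm 3)
			}
			done <- msg // report completion, standing in for "send done to test driver"
		}(m)
	}
	for range messages {
		fmt.Printf("CP %c done\n", <-done)
	}
}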

4.3 Performance comparison

The main goal of the performance comparison was to get a good picture of the eventual performance loss or gain that would come from implementing a program with the same characteristics and functionality in C++, Rust and Go. Based on the authors' general knowledge of these programming languages, the hypothesis was that there would be a performance loss when switching from C++ to Rust or Go. Of the three languages, C++ is by far the most mature and is well known to be used when performance is critical. To properly compare the different languages based on performance, the decision was made to keep the test driver the same throughout the project (see Figure 4.1) and then replace the application part with implementations written in the different languages. Another decision was to keep the communication between the test driver and the applications the same as well. Therefore the test driver is a simplified version of an already existing Ericsson test driver written in C++ (see Section 3.3.1), and the communication is done using TCP.


The interesting part of the system to test and extract data from is the application, since it is the actual representation of the RAN application. For the application, the critical parts are the latency of the communication, the time to handle a GPB package and the resources used to handle multiple messages concurrently. For these three parts, three different tests were created. The sequence test evaluates how the different implementations handle multiple messages concurrently, the latency test evaluates the latency of the test driver-application communication, and the encode/decode test evaluates the time to handle a GPB. These three tests were all performed for each combination of programming language and hardware. To satisfy the requirement of the application being single-kernel-threaded, some modifications were needed for Rust and Go. C++ is natively single-threaded, so no changes had to be made. For Go, we set GOMAXPROCS=1 to limit the number of CPU cores used to one. For Rust, this was enforced by using the current_thread4 option when initializing the Tokio runtime.

4.3.1 Sequence test

To test the performance of handling multiple messages concurrently in the application itself, the test driver sends messages from sequences to the application. These sequences are comprised of three messages in a row, where each message corresponds to some action in the application. All sequences are identical; they start with a setup message, then a sleep message, and end with a teardown message. The sequence test is the main test that best represents the use case of an actual RAN application (according to experts at Ericsson) and is based on the findings of the pre study in Figure 3.2. The messages that can be sent from the test driver are seen in Table 4.3.

Table 4.3: Sequence messages

Message | Response
1 | a setup message that starts a green thread, creates an AP that yields when waiting for a response, and then sends back that it is finished. See Algorithm 2.
3 | a CP that sleeps its green thread for 10 ms and then sends a message when it is done. See Algorithm 3.
9 | a teardown message that clears memory allocated by the corresponding CP. See Algorithm 4.

The application sends a response (a 1 followed by the ID of the sequence) when done with the message, to inform the test driver that it is ready for the next message in the sequence. For example, if a single sequence is started by the test driver, the flow of messages can be seen in the sequence diagram in Figure 4.3. The sequences are composed in this manner to test many context switches, since a context switch happens each time a green thread sleeps or yields, and to test keeping many green threads in memory at the same time. This is measured by monitoring the CPU and memory usage of the application. The time it took to complete each sequence in the different programming languages was also measured. To see how well the application handles an increasing amount of sequences, the test driver ran four different scenarios. The first scenario was to start 10 sequences concurrently, and then wait for all of them to complete. The remaining scenarios were with 100, 1 000 and 10 000 sequences respectively. Each scenario was run a different number of repetitions in order to collect sufficient amounts of data, see Table 4.4. To run the different applications on the different hardware, a custom in-house program at Ericsson was used.

4https://docs.rs/tokio/0.3.3/tokio/attr.main.html#current-thread-runtime


[Sequence diagram between the Test Driver and the Application: Send 111, Send 211, Respond 211, Respond 111, Send 311, Respond 111, Send 911, Respond 111.]

Figure 4.3: Sequence message flow between test driver and application

This program enabled automatic execution of scripts and binary files on multiple hardware from a remote location, but limited the execution time of applications to around 100 seconds, which led to using different numbers of iterations for the ARM and x86_64 architectures. Top5 was used to collect data about CPU usage at a rate of once per second, and memory usage was collected with the Linux time -v6 command. For the CPU data, the first, second and third quartiles along with the min and max values were calculated from the top data. For memory usage, the maximum memory allocated was collected from the time output. To measure the time for each sequence to finish, the C++ class std::chrono::high_resolution_clock7 was used. The measurements were taken from the time a sequence was started in the test driver to the time the application responded that it was finished with the last message of the sequence. For the processing time per sequence, the first, second and third quartiles along with the min and max values were calculated from the times measured in the test driver.

Table 4.4: Sequence test scenarios

Platform | Scenario | Amount of sequences | Iterations
x86_64 | 1 | 10 000 | 100
ARM | 1 | 10 000 | 25
x86_64 | 2 | 1 000 | 1 500
ARM | 2 | 1 000 | 400
x86_64 | 3 | 100 | 5 000
ARM | 3 | 100 | 1 000
x86_64 | 4 | 10 | 6 000
ARM | 4 | 10 | 2 000

4.3.2 Latency test

To test the performance of the IPC of the implementations in C++, Rust and Go, a latency test was devised. The test was a simple "PING"/"PONG" request sequence.

5https://man7.org/linux/man-pages/man1/top.1.html 6https://man7.org/linux/man-pages/man1/time.1.html 7https://en.cppreference.com/w/cpp/chrono/high_resolution_clock

A "PING" is sent from the test driver and the application instantly responds with "PONG". The times were measured from the time of sending in the test driver to the time of receiving in the test driver, and the test was run 1 500 000 times on x86 hardware and 500 000 times on ARM hardware. The time was measured using the C++ high_resolution_clock class in the std::chrono library8 and was stored in a list. From the collected time data, the data for the box plots was retrieved by calculating the first, second and third quartiles along with the min and max values.
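The measurement itself was done in the C++ test driver, but the round-trip pattern is simple; the Go sketch below shows the equivalent idea for illustration only (conn is assumed to be an already established connection to the application).

package latency

import (
	"io"
	"net"
	"time"
)

// measureRoundTrip sends "PING", waits for the 4-byte "PONG" reply and
// returns the elapsed wall-clock time for one round trip.
func measureRoundTrip(conn net.Conn) (time.Duration, error) {
	reply := make([]byte, 4)
	start := time.Now()
	if _, err := conn.Write([]byte("PING")); err != nil {
		return 0, err
	}
	if _, err := io.ReadFull(conn, reply); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}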

4.3.3 Encode/decode test

Another test to measure the performance of the application was designed to measure the performance of handling GPB. The test driver sends a "start" message and the application then starts a test sequence. This sequence consists of measuring the time it takes to encode and then decode a 110-byte GPB package, 1 500 000 times on x86 hardware and 500 000 times on ARM hardware. The time was measured from the start of encoding until the decoding had finished, and was then sent back to the test driver and stored in a list. The time was measured using std::chrono in C++, std::time9 in Rust and time10 in Go. From the collected time data, the data for the box plots was retrieved by calculating the first, second and third quartiles along with the min and max values.
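A sketch of one measured iteration in Go is shown below, for illustration only: pb.TestMessage stands in for a hypothetical generated GPB type that serializes to roughly 110 bytes, and proto refers to the google.golang.org/protobuf/proto package.

package gpbtest

import (
	"time"

	"google.golang.org/protobuf/proto"

	pb "example.com/gpbtest/pb" // hypothetical package containing the generated message type
)

// timeEncodeDecode measures one encode followed by one decode of msg,
// mirroring a single iteration of the encode/decode test.
func timeEncodeDecode(msg *pb.TestMessage) (time.Duration, error) {
	start := time.Now()
	data, err := proto.Marshal(msg) // encode
	if err != nil {
		return 0, err
	}
	var decoded pb.TestMessage
	if err := proto.Unmarshal(data, &decoded); err != nil { // decode
		return 0, err
	}
	return time.Since(start), nil
}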

4.4 Development efficiency

The purpose of the soft values investigation was to compare and research the differences in the quality attributes maintainability, understandability and productivity. For large companies that need to update and provide new features at a more rapid pace than ever, it is important to investigate these types of metrics as well. One could argue that if the performance loss is not too great, things like maintainability, understandability and productivity play a more important role than performance. For example, if the average CPU usage increases from 10% to 15% but features take half the time to implement compared to another programming language, then the obvious choice, given that performance is not extremely critical, should be the language with better productivity and worse performance. It is, however, hard to estimate these metrics, given that many are opinion based or knowledge based. Our hypothesis is that especially Go will have a major benefit in most metrics related to development efficiency, based on our beliefs and general knowledge gathered during this project. The metrics evaluated were deemed interesting after doing the informal pre study and are related to the related work presented in Section 2.9. These were productivity, maintainability index and cognitive complexity. These metrics were measured on the code used for the sequence test, since this is the most representative use case. The source code for the sequence test can be seen in Appendix A.3, listing A.2. To measure productivity, Equation 2.4 was used. However, the decision was made, which also follows the recommendation of the authors [32], that I, which is a measurement of the implementation time, needs to be estimated instead of measured. Since neither of the authors of this thesis were experts in any of the languages, the decision was made to consult experts in order to get a proper estimation of the implementation time. This will only yield an approximation of I, and the variable Î will be used to symbolize this approximation. To extract the data for time estimations by experts, we created a survey where we asked the experts to estimate how long it would take to create an example application. The survey can be seen in Appendix A.1. A weighted average was used to calculate the estimated implementation times from the experts' answers.

8https://en.cppreference.com/w/cpp/chrono 9https://doc.rust-lang.org/std/time/index.html 10https://golang.org/pkg/time/

This was calculated by weighing each expert's self-estimated skill, on a scale of 1 to 10, together with their time estimate. Only respondents who gave time estimations for C++ and at least one more language were included in the weighted average. The equation used for the weighted average WA was:

WA = ( Σn=1..num answers skilln · estimated timen ) / ( Σn=1..num answers skilln )    (4.1)
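For clarity, the weighted average in Equation 4.1 amounts to the following small Go helper (ours, for illustration only), where each time estimate is weighted by the respondent's self-assessed skill.

package survey

// weightedAverage implements Equation 4.1: each estimate is weighted by the
// respondent's self-estimated skill (1-10) and the result is the weighted mean.
func weightedAverage(skills, estimates []float64) float64 {
	var num, den float64
	for i := range skills {
		num += skills[i] * estimates[i]
		den += skills[i]
	}
	return num / den
}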

To measure execution time, an average of the medians across all hardware was taken from the results of 10 000 sequences in the sequence test. The time was measured using the time11 command in Linux. C++ was chosen as the base language to compare against (i.e. P0), since it is the current standard language at Ericsson. This means that the score for the productivity metric for C++ will be 1, and Rust and Go will get a score relative to C++, where a score less than 1 indicates lower productivity than C++ and a score higher than 1 indicates higher productivity. To measure the sub-metrics for the maintainability index, we used different tools for the different languages and calculated the actual maintainability index score from the sub-metrics using Equation 2.8. For C++ and Rust we used rust-code-analysis [4] to extract the data for the three sub-metrics. Since this tool was not yet capable of extracting data from Go programs, three different tools for Go were used. We adapted the go-complexity-analysis tool12 to extract the Halstead volume metric. A second tool, gocloc13, was used to extract the lines of code, and a third tool, gocyclo14, was used to extract the cyclomatic complexity. Finally, understandability was used to provide an understanding of how well programmers could understand the code. The understandability of a program was measured using cognitive complexity and is determined by applying the three simple rules stated in Section 2.9.3 when following the code execution. To measure this, two different tools were used: for C++ and Rust, the rust-code-analysis tool [4], and for Go, the gocognit tool15. To be able to compare the development efficiency between C++, Rust and Go, an attempt was made to create an equation that combines the three metrics into a single score. Since productivity is a variable in the equation, the resulting equation will also be relative to the base language, which in this case is C++. The resulting equation for development efficiency can be seen in Equation 4.2. A higher score than the base language indicates that the compared language has better relative development efficiency. The productivity metric could be removed from the equation to get an absolute development efficiency score.

Development Efficiency = (Maintainability Index · Productivity) / Cognitive Complexity    (4.2)

11https://man7.org/linux/man-pages/man1/time.1.html 12https://github.com/shoooooman/go-complexity-analysis 13https://github.com/hhatto/gocloc 14https://github.com/fzipp/gocyclo 15https://github.com/uudashr/gocognit

5 Results

This chapter is focused on the results obtained. The results are presented in the same order as the work that was carried out, meaning that first the results of the performance comparisons are presented in Section 5.1, and lastly the results of the development efficiency comparison are presented in Section 5.2.

5.1 Performance comparison

In this section the results of the different performance metrics are presented. The tests were designed to cover as much key functionality as possible in order to provide an overall view of the different languages. The figures in this section show how each programming language performed on each of the different computers, and the value explicitly shown in the box plots is the median. The lower whisker indicates the minimum value and the upper whisker indicates the maximum value. The top of the box is the third quartile of the data and the bottom of the box is the first quartile of the data.

5.1.1 Sequence test

In this section the results of the sequence test are presented. This is the test that is most similar to a complete use case, and it focuses on keeping many green threads in memory at the same time, as well as making asynchronous calls from the application to the test driver.

CPU usage measurements

In this section the results of the CPU usage measurements for each language on each hardware configuration are shown. The CPU usage shown was measured for the entire duration of the test.

[Box plots; y-axis: CPU usage (%) for C++, Rust and Go.]

Figure 5.1: BB2.5 CPU usage for sequence test with 10 sequences and 2 000 iterations

Figure 5.2: BB2.0 CPU usage for sequence test with 10 sequences and 2 000 iterations

Figure 5.3: BB3.0 CPU usage for sequence test with 10 sequences and 6 000 iterations

Figure 5.4: Dell R630 CPU usage for sequence test with 10 sequences and 6 000 iterations

Figure 5.5: Dell R6525 CPU usage for sequence test with 10 sequences and 6 000 iterations

Figure 5.6: BB2.5 CPU usage for sequence test with 100 sequences and 1 000 iterations

Figure 5.7: BB2.0 CPU usage for sequence test with 100 sequences and 1 000 iterations

Figure 5.8: BB3.0 CPU usage for sequence test with 100 sequences and 5 000 iterations

Figure 5.9: Dell R630 CPU usage for sequence test with 100 sequences and 5 000 iterations

Figure 5.10: Dell R6525 CPU usage for sequence test with 100 sequences and 5 000 iterations

Figure 5.11: BB2.5 CPU usage for sequence test with 1 000 sequences and 400 iterations

Figure 5.12: BB2.0 CPU usage for sequence test with 1 000 sequences and 400 iterations

Figure 5.13: BB3.0 CPU usage for sequence test with 1 000 sequences and 1 500 iterations

Figure 5.14: Dell R630 CPU usage for sequence test with 1 000 sequences and 1 500 iterations

Figure 5.15: Dell R6525 CPU usage for sequence test with 1 000 sequences and 1 500 iterations

Figure 5.16: BB2.5 CPU usage for sequence test with 10 000 sequences and 25 iterations

Figure 5.17: BB2.0 CPU usage for sequence test with 10 000 sequences and 25 iterations

Figure 5.18: BB3.0 CPU usage for sequence test with 10 000 sequences and 100 iterations

Figure 5.19: Dell R630 CPU usage for sequence test with 10 000 sequences and 100 iterations

Figure 5.20: Dell R6525 CPU usage for sequence test with 10 000 sequences and 100 iterations

Time measurements

In this section the processing time per sequence on each hardware setup for each language is presented. The y-axis is logarithmic since the spread of some data points was high.

[Box plots; y-axis: processing time per sequence (µs), logarithmic scale.]

Figure 5.21: BB2.5 Time per sequence for sequence test with 10 sequences and 2 000 iterations

Figure 5.22: BB2.0 Time per sequence for sequence test with 10 sequences and 2 000 iterations

Figure 5.23: BB3.0 Time per sequence for sequence test with 10 sequences and 6 000 iterations

Figure 5.24: Dell R630 Time per sequence for sequence test with 10 sequences and 6 000 iterations

Figure 5.25: Dell R6525 Time per sequence for sequence test with 10 sequences and 6 000 iterations

Figure 5.26: BB2.5 Time per sequence for sequence test with 100 sequences and 1 000 iterations

Figure 5.27: BB2.0 Time per sequence for sequence test with 100 sequences and 1 000 iterations

Figure 5.28: BB3.0 Time per sequence for sequence test with 100 sequences and 5 000 iterations

Figure 5.29: Dell R630 Time per sequence for sequence test with 100 sequences and 5 000 iterations

Figure 5.30: Dell R6525 Time per sequence for sequence test with 100 sequences and 5 000 iterations

Figure 5.31: BB2.5 Time per sequence for sequence test with 1 000 sequences and 400 iterations

Figure 5.32: BB2.0 Time per sequence for sequence test with 1 000 sequences and 400 iterations

Figure 5.33: BB3.0 Time per sequence for sequence test with 1 000 sequences and 1 500 iterations

Figure 5.34: Dell R630 Time per sequence for sequence test with 1 000 sequences and 1 500 iterations

Figure 5.35: Dell R6525 Time per sequence for sequence test with 1 000 sequences and 1 500 iterations

Figure 5.36: BB2.5 Time per sequence for sequence test with 10 000 sequences and 25 iterations

Figure 5.37: BB2.0 Time per sequence for sequence test with 10 000 sequences and 25 iterations

Figure 5.38: BB3.0 Time per sequence for sequence test with 10 000 sequences and 100 iterations

Figure 5.39: Dell R630 Time per sequence for sequence test with 10 000 sequences and 100 iterations

Figure 5.40: Dell R6525 Time per sequence for sequence test with 10 000 sequences and 100 iterations

Memory measurements

In this section the results of the memory measurements are shown. The figures show the memory usage during the sequence test for each language on each hardware configuration.

The memory usage shown is based on the maximum memory allocated by the application during the test.

Figure 5.41: Memory usage for sequence test for C++, Go and Rust on BB2.0


Figure 5.42: Memory usage for sequence test for C++, Go and Rust on BB2.5

Figure 5.43: Memory usage for sequence test for C++, Go and Rust on BB3.0


Figure 5.44: Memory usage for sequence test for C++, Go and Rust on Dell R630

Figure 5.45: Memory usage for sequence test for C++, Go and Rust on Dell R6525


5.1.2 Latency test

In this section the results of the latency test are presented. This test is designed to measure latency differences between the C++, Rust and Go communication libraries. The graphs show how each language performed on each of the different hardware configurations. The y-axis is logarithmic since the spread of some data points was high.

[Box plots; y-axis: latency (µs), logarithmic scale.]

Figure 5.46: BB2.5 Latency in latency test for C++, Rust and Go

Figure 5.47: BB2.0 Latency in latency test for C++, Rust and Go

Figure 5.48: BB3.0 Latency in latency test for C++, Rust and Go

Figure 5.49: Dell R630 Latency in latency test for C++, Rust and Go

Figure 5.50: Dell R6525 Latency in latency test for C++, Rust and Go

5.1.3 Encode/decode test

In this section the results of the encode/decode test are presented. The time measures how long it takes to encode and then decode a GPB message. This test is designed to measure time differences between the C++, Rust and Go GPB libraries. The y-axis is logarithmic since the spread of some data points was high.

62 5.1. Performance comparison

106

105 time (ns)

14,078 104 6,250 6,820

C++ Rust Go

Figure 5.51: BB2.5 Encode/decode time per sequence in encode/decode test for C++, Rust and Go

106

105 time (ns)

20,281

4 10 8,657 9,359

C++ Rust Go

Figure 5.52: BB2.0 Encode/decode time per sequence in encode/decode test for C++, Rust and Go

63 5.1. Performance comparison

105 time (ns)

104

4,902 3,392 2,469

C++ Rust Go

Figure 5.53: BB3.0 Encode/decode time per sequence in encode/decode test for C++, Rust and Go

Figure 5.54: Dell R630 Encode/decode time per sequence in encode/decode test for C++, Rust and Go


Figure 5.55: Dell R6525 Encode/decode time per sequence in encode/decode test for C++, Rust and Go

5.2 Development efficiency

In this section the results related to development efficiency are presented. The results are divided into three parts: productivity, maintainability and understandability. These are measured on the sequence application for each language. Using Equation 4.2 together with data from Tables 5.5 (with r = 100 000 in Equation 2.3), 5.7 and 5.8, the resulting relative development efficiency score can be seen in Table 5.1, where a higher score means better development efficiency. In parentheses is the normalized score for each language relative to the C++ score.

Table 5.1: Combined development efficiency score

Language   Productivity   Maintainability Index   Cognitive Complexity   Development Efficiency (Normalized)
C++        1              16.21                   20                     0.81 (1)
Rust       0.98           19.44                   16                     1.19 (1.47)
Go         1.08           27.06                   13                     2.25 (2.74)
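For reference, the scores in Table 5.1 can be reproduced, up to rounding, by multiplying the productivity score by the maintainability index and dividing by the cognitive complexity. The sketch below assumes that this is the form of Equation 4.2 and is intended purely as an illustration of how the combined score behaves.

    package main

    import "fmt"

    // developmentEfficiency is an illustrative sketch that assumes Equation 4.2
    // combines the sub-metrics as productivity * MI / CC; this reproduces the
    // values in Table 5.1 up to rounding.
    func developmentEfficiency(productivity, maintainabilityIndex, cognitiveComplexity float64) float64 {
        return productivity * maintainabilityIndex / cognitiveComplexity
    }

    func main() {
        cpp := developmentEfficiency(1.00, 16.21, 20)    // ~0.81
        rust := developmentEfficiency(0.98, 19.44, 16)   // ~1.19
        golang := developmentEfficiency(1.08, 27.06, 13) // ~2.25
        fmt.Printf("C++ %.2f  Rust %.2f (%.2f)  Go %.2f (%.2f)\n",
            cpp, rust, rust/cpp, golang, golang/cpp)
    }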

5.2.1 Productivity

This section shows the results related to the productivity measurements. The data displayed in Table 5.2 are the weighted averages of the implementation time in C++, Rust and Go, calculated from Equation 4.1 together with responses from the survey in A.2. The survey was answered by 8 experts: two answered for both Rust and C++, two answered for both Go and C++, two answered for all three languages and two answered for only one language. The spread of time estimations for Go was 1-30 hours, for Rust 3-24 hours and for C++ 4-60 hours. Î(P0)-Go and Î(P0)-Rust denote the weighted averages for C++ among the respondents who also gave a time estimation for Go and Rust, respectively.


Table 5.2: Weighted averages of expert estimated implementation time

Î(P0)-Go (seconds)   Î(P0)-Rust (seconds)   Î(PGo) (seconds)   Î(PRust) (seconds)
119 160              60 120                 48 420             46 944

Table 5.3: Average execution time of the application in C++, Rust and Go for 1 iteration of 10 000 sequences for all hardware configurations

Language   Execution time (seconds)
C++        1.0446
Rust       1.2098
Go         1.5956

Using data from Tables 5.2 and 5.3 together with Equations 2.4, 2.3, 2.1 and 2.2, Tables 5.4 to 5.6 were created. Note that C++ is not present in Tables 5.4 to 5.6, since it is the base language and therefore has a constant productivity value of 1.

Table 5.4: Data for the productivity metric with r = 50 000

Variable       Rust      Go
E(PL)          1.2098    1.5956
Î(PL)          46944     48420
X              1.29      1.65
ρL             1.28      2.46
eL             0.63      0.65
productivity   1.05      1.34

Table 5.5: Data for the productivity metric with r = 100 000

Variable       Rust      Go
E(PL)          1.2098    1.5956
Î(PL)          46944     48420
X              2.58      3.30
ρL             1.28      2.46
eL             0.86      0.65
productivity   0.98      1.08

Table 5.6: Data for the productivity metric with r = 200 000

Variable       Rust      Go
E(PL)          1.2098    1.5956
Î(PL)          46944     48420
X              5.15      6.59
ρL             1.28      2.46
eL             0.86      0.65
productivity   0.93      0.89
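The productivity values in Tables 5.4 to 5.6 are consistent with reading the metric as the total cost of the baseline C++ solution (estimated implementation time plus r executions) divided by the same cost for the compared language. The sketch below assumes that reading; it is not a reproduction of Equations 2.1 to 2.4, and the intermediate quantities X, ρL and eL are omitted.

    package main

    import "fmt"

    // productivity assumes the metric reduces to
    // (I(P0) + r*E(P0)) / (I(PL) + r*E(PL)), i.e. the ratio of the total cost of
    // implementing and running the C++ baseline r times to the same cost for
    // language L. This reproduces Tables 5.4 to 5.6 up to rounding.
    func productivity(i0, e0, iL, eL, r float64) float64 {
        return (i0 + r*e0) / (iL + r*eL)
    }

    func main() {
        // Implementation and execution times in seconds, from Tables 5.2 and 5.3.
        fmt.Printf("Rust, r = 100 000: %.2f\n", productivity(60120, 1.0446, 46944, 1.2098, 100000))  // ~0.98
        fmt.Printf("Go,   r = 100 000: %.2f\n", productivity(119160, 1.0446, 48420, 1.5956, 100000)) // ~1.08
        fmt.Printf("Go,   r = 200 000: %.2f\n", productivity(119160, 1.0446, 48420, 1.5956, 200000)) // ~0.89
    }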

5.2.2 Maintainability

In this section the maintainability index score for each implementation is presented. The score is based on the implementation of the application in each language. A score of less than or equal to 10 is regarded as bad, a score between 10 and 20 is regarded as moderate and everything above 20 is regarded as good1.

Table 5.7: Maintainability index

Language   Halstead Volume   Cyclomatic Complexity   Lines of Code   Maintainability index
C++        7049.86           47                      207             16.21
Rust       6050.53           34                      186             19.44
Go         2890.80           22                      125             27.06
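The values in Table 5.7 are consistent with the Visual Studio-normalized variant of the classic maintainability index, MI = max(0, (171 - 5.2*ln(HV) - 0.23*CC - 16.2*ln(LOC)) * 100 / 171). The sketch below assumes that this is the formula used by the tools and is included only to make the computation concrete.

    package main

    import (
        "fmt"
        "math"
    )

    // maintainabilityIndex assumes the Visual Studio-normalized variant of the
    // classic maintainability index; it reproduces Table 5.7 up to rounding.
    func maintainabilityIndex(halsteadVolume, cyclomaticComplexity, linesOfCode float64) float64 {
        raw := 171 - 5.2*math.Log(halsteadVolume) - 0.23*cyclomaticComplexity - 16.2*math.Log(linesOfCode)
        return math.Max(0, raw*100/171)
    }

    func main() {
        fmt.Printf("C++:  %.2f\n", maintainabilityIndex(7049.86, 47, 207)) // ~16.21
        fmt.Printf("Rust: %.2f\n", maintainabilityIndex(6050.53, 34, 186)) // ~19.44
        fmt.Printf("Go:   %.2f\n", maintainabilityIndex(2890.80, 22, 125)) // ~27.06
    }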

5.2.3 Understandability

In this section the results for measuring the understandability of the different implementations are presented. This is measured using the cognitive complexity score and presented in Table 5.8. The lower the score, the easier a program is to understand.

Table 5.8: Cognitive complexity

Language   Cognitive complexity
C++        20
Rust       16
Go         13

1https://docs.microsoft.com/en-us/visualstudio/code-quality/code-metrics-maintainability-index-range-and-meaning?view=vs-2019

6 Discussion

In this chapter the results (Section 6.1) and the method employed (Section 6.2) to get the results are discussed. There is also a section that provides insights about the authors’ personal experiences (Section 6.3) as well as the work in a wider context (Section 6.4), where ethical, societal and environmental aspects are discussed.

6.1 Results

In this section, the results gathered in Chapter 5 are analyzed and discussed.

6.1.1 Performance comparison

In this section the results of the different performance metrics will be discussed.

Sequence test

As previously mentioned in Section 4.3.1, the purpose of the sequence test was to simulate a real use case for the different applications. The sequence test results are therefore the most telling of how the different programming languages would behave in an actual implementation at Ericsson. The results show the performance differences on each computer for each language, meaning that one can easily discern which programming language is best suited for the hardware that is available. The values discussed in this section are the median values, since they are the most representative values to compare.

Sequence test CPU usage measurements

The results from the tests that measured CPU usage (seen in Section 5.1.1) are in line with our predictions and relate well to the theory. Looking at Figures 5.1-5.5, we see that for 10 sequences there is only a 3 to 8 percentage point difference between the languages. However, Go uses approximately 100% more CPU than both Rust and C++, most likely because of the overhead introduced by automatic garbage collection and a more complex runtime system. When the number of sequences increases to 100, as seen in Figures 5.6-5.10, the difference in percentage points increases substantially compared to 10 sequences. The CPU usage for C++ is still low, with an increase of 1 to 7 percentage points across the different hardware, while for Rust and Go it increases by 12 to 37 and 12 to 38 percentage points respectively. The pattern for 1 000 sequences, seen in Figures 5.11-5.15, stays mostly the same as for 100 sequences. On the slowest hardware (BB2.0), the upper limit of 100% CPU usage is almost reached for both Rust and Go, while the CPU usage for C++ peaks at 54%. For 10 000 sequences, seen in Figures 5.16-5.20, Go and Rust reach around 100% CPU usage on multiple hardware configurations and the CPU usage for C++ increases by 22 to 53 percentage points. 10 000 sequences is however not a representative test of a real use case; it was designed as a stress test to measure processing time per sequence rather than CPU usage. It is quite clear that C++ is the superior language if low CPU usage is important. What was unexpected when measuring the CPU usage of the different languages was that Go could use more than 100% CPU, even though the program is restricted to one core. This is most likely because Go does not consider using another core to be using more than one core in total while the original core is blocked in some system call; technically, Go is only occupying one core at a time. This could however skew some results, and Go behaves unpredictably in this regard, since the usage was only above 100% some of the time. This could be quite problematic for companies that want tight control over what and how many cores a program uses, since there is no apparent way of restricting this.
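As a hedged illustration of the restriction discussed above (this snippet is not taken from the measured application): GOMAXPROCS limits how many cores execute Go code simultaneously, but an OS thread blocked in a system call can still be parked on another core, which matches the behaviour observed with top. A hard limit would have to be enforced at the OS level, for example by pinning the whole process with taskset on Linux.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // Limit the Go scheduler to one logical processor. Goroutines running Go
        // code are now confined to one core, but threads blocked in system calls
        // may still be scheduled by the OS on other cores.
        prev := runtime.GOMAXPROCS(1)
        fmt.Printf("GOMAXPROCS lowered from %d to 1\n", prev)
    }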

Sequence test time measurements

The results for the processing time per sequence of the different programming languages are more unpredictable. For the sequence tests comprised of just 10 and 100 sequences, seen in Figures 5.21-5.30, both Rust and Go are consistently faster than (or equal to) C++ on all hardware (the difference increases slightly on ARM hardware), and they are more or less equally fast as each other. When the number of sequences is increased to 1 000, as seen in Figures 5.31-5.35, C++ is instead consistently faster than the other languages on all hardware. The difference between Rust and Go also increases, with Rust being the faster of the two. Finally, when examining the results for 10 000 sequences in Figures 5.36-5.40, the difference between the languages increases further on some hardware, while on other hardware it decreases. On ARM based hardware, Figures 5.36 and 5.37 illustrate that the difference between C++ and the other languages increases: Rust is 14% to 44% slower than C++ and Go is 72% to 120% slower than C++. The most unpredictable results are found on the x86 based hardware with 10 000 sequences. The results in Figure 5.38 suggest that the languages are more or less equally fast, with Go being the fastest; on this hardware Go is 7% faster than Rust and 4% faster than C++. Figure 5.39 instead suggests that Go is a lot faster than both C++ and Rust, 34% faster than C++ and 12% faster than Rust. In the last figure for 10 000 sequences, Figure 5.40, the results are the opposite: C++ is the fastest and Go the slowest, with C++ 9% faster than Rust and 22% faster than Go. This could be a result of many different things. Our main theory is that when increasing the number of sequences, the program builds up a queue of green threads, which for 10 000 sequences would be about 10 000 green threads long. When the green threads have finished waiting (either for data or because they have finished sleeping) they will all be queued up at approximately the same time, so the scheduler has to schedule the threads efficiently. On the ARM computers we see that C++ is the only language that stays well below 100% CPU usage, while Rust and Go both use about 100%, suggesting that the Rust and Go schedulers need more CPU power to schedule properly. If the schedulers have more CPU power, as on the x86 hardware, they are more efficient than the C++ scheduler and will therefore be faster. A secondary theory is that C++ is always bottlenecked by the speed of the RAM. One can clearly see that C++ never uses 100% CPU, unlike the other languages, and uses the most memory (see the CPU and memory measurements in Section 5.1.1), which could explain why the hardware with slower RAM performs worse with C++. For example, since the Dell R630 has the slowest memory of the x86 hardware, this is where C++ performs worst. However, this theory is based on very general knowledge of computers and would need to be properly investigated before any conclusions can be made. When we on the other hand have fewer sequences, 10 or 100, the bottleneck is most likely in the event loop instead, while waiting for work. Since C++ does not have native support for this in the same way as Go or Rust, this could be a reason why it is a bit slower for smaller numbers of sequences.
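To make the queue-of-green-threads theory concrete, the toy sketch below (not part of the measured application) spawns a large number of goroutines that each sleep for roughly 10 ms; they all become runnable at about the same time, leaving the scheduler to drain a long run queue on a single core.

    package main

    import (
        "fmt"
        "runtime"
        "sync"
        "time"
    )

    func main() {
        runtime.GOMAXPROCS(1) // mimic the single-core restriction used in the tests

        const n = 10000
        var wg sync.WaitGroup
        start := time.Now()
        for i := 0; i < n; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                time.Sleep(10 * time.Millisecond) // all goroutines wake up at roughly the same time
            }()
        }
        wg.Wait()
        fmt.Printf("drained %d goroutines in %v\n", n, time.Since(start))
    }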

Sequence test memory measurements

The results for memory usage in Figures 5.41-5.45 are not in line with our predictions. We see that C++ has the highest memory usage, especially for larger numbers of sequences, and Rust has the lowest memory usage. What is surprising with these measurements is how little memory Rust uses. The main theory for this is that the initial stack size for the Rust green threads could be the smallest, or that their stacks adapt dynamically faster than those of C++ or Go. This would mean that if a programmer is concerned with memory consumption, and no changes are made to the initial stack size of green threads, Rust is the best option. The second theory is that because the green threads are only alive for approximately 10 milliseconds, during which they are mostly idle, we believe they do not have time to change their dynamic stack size. The memory consumption for Rust on the ARM based hardware does however look suspiciously low compared to the x86 based hardware, and we have no explanation for this behavior. Since the applications that were run on ARM and x86 were compiled from the same sources and the memory usage was measured in the same way, we have to assume the data is correct. The third theory is that the data is wrong or that something is wrong with the implemented application. The most logical outcome based on the theory would be that all programs use approximately the same amount of memory, since all green thread stacks work approximately the same. However, the Rust compiler strictly enforces different rules, for example ownership, which means that RAII is enforced and no unnecessary memory is allocated; this is not the case for C++, where mistakes in following RAII are more difficult to find. If the third theory is true, a more experienced programmer would not have committed these mistakes in C++ and Go when creating the programs, and the results could potentially be completely different. The last theory for why the memory usage differs between ARM and x86 is the number of iterations that the applications ran. Since x86 could handle more iterations, the probability of memory spikes was larger than on ARM, meaning that the maximum allocated memory could have been higher on x86.

Latency test

Since the different programming languages had different approaches to handling incoming and outgoing messages, which in turn could lead to different latencies when sending and receiving messages, we believed this was an important test. Overall, there are huge differences between the largest and smallest latencies, between 10 µs and 10 000 µs, as seen in Figures 5.46-5.50. Since latency depends on many variables, such as high load on the computers and/or the connection, this is logical and expected. Looking at the results, there is no discernible difference between the programming languages, with the exception of the maximum times for Go. The overall latency of all languages is, more or less, the same on each hardware configuration. Our theory for why Go has much higher spikes in the latency test is that it is the only language with a garbage collector, and when the collector runs it blocks all other work until it is done. The main thing to notice is that Go has higher peaks in latency, meaning that some users may notice a longer response time in some cases. Also, the overall latency of all languages decreases on faster hardware, which is intuitive.
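If the garbage collector is indeed behind the latency spikes, Go does expose a knob for trading memory for fewer collections. The snippet below is a hedged illustration of that knob; it was not used in the tests, and whether it would actually flatten the spikes would need to be measured.

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // Raise the GC target from the default 100% heap growth to 400%, so
        // collections run less often at the cost of a larger heap. The same
        // effect can be achieved with the GOGC environment variable.
        old := debug.SetGCPercent(400)
        fmt.Printf("GC target raised from %d%% to 400%%\n", old)
    }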


Encode/decode test

One important feature for Ericsson was to see the performance of the different programming languages when encoding and decoding GPB packages. The results from the measurements seen in Figures 5.51-5.55 were both surprising and predictable at the same time. The reason for this is that encoding and decoding is basically just number crunching, which is an area that C++ and Rust should dominate. One could however expect that since Go was developed by Google and GPB was also developed by Google, the handling of GPB packages in Go would be optimized. However, from our results it is clear that Go falls far behind the other programming languages: Go was approximately twice as slow as the competition on all hardware. Since we are measuring in nanoseconds, the difference is negligible in most real world scenarios.
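For illustration, the Go side of such a measurement could look like the sketch below. The thesis application used a custom GPB message of roughly 110 bytes; here a well-known protobuf type is substituted so the example is self-contained, which also means the absolute numbers are not comparable to Figures 5.51-5.55.

    package main

    import (
        "fmt"
        "time"

        "google.golang.org/protobuf/proto"
        "google.golang.org/protobuf/types/known/timestamppb"
    )

    func main() {
        msg := timestamppb.Now() // stand-in for the ~110-byte message used in the tests

        start := time.Now()
        encoded, err := proto.Marshal(msg) // encode
        if err != nil {
            panic(err)
        }
        decoded := &timestamppb.Timestamp{}
        if err := proto.Unmarshal(encoded, decoded); err != nil { // decode
            panic(err)
        }
        fmt.Printf("encode+decode of %d bytes took %v\n", len(encoded), time.Since(start))
    }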

6.1.2 Development efficiency

In this section the results of the different development efficiency metrics are discussed. In Table 5.1 the development efficiency score for each of the three languages is presented. From these results it is clear that Go has the best score by far, 174% better than C++. Rust does not have quite as good a score as Go, but is still 47% better than C++. This can be explained by observing the three separate metrics in Table 5.1, where Go has a better score than C++ in all three categories. These results are in line with our expectations and experiences, since Go was, in our opinion, by far the easiest of the three languages. However, depending on how one chooses the variable r, i.e. the importance of execution time versus implementation time, we can see from the productivity metric that sooner or later C++ would obtain a greater score than the other languages. This is exemplified by the results in Table 5.6, where r is set to 200 000.

Productivity

Since we wanted to ensure that implementation time and execution time had an equal impact on the productivity of the language, and there was approximately a factor 10^5 difference between implementation time and execution time, we calculated the productivity for different values of r. r was chosen to be 50 000, 100 000 and 200 000 to see its impact, where an r of 100 000 approximately corresponds to the execution time being equally important as the implementation time. The results for the three choices of r can be seen in Tables 5.4 to 5.6. With r = 50 000, Go has a 34% higher productivity than C++ while Rust only has a 5% higher score. Increasing r to 100 000, the productivity of Go decreases to 8% better than C++, while Rust is actually 2% worse than C++. Finally, when r is 200 000, both Rust and Go have a lower productivity score than C++, meaning that C++ is more productive when the importance of execution time is increased. It can also be noted that the productivity of Go decreases faster than that of Rust when r is increased. In other words, the less focus is put on execution time, the more Go gains, and the more focus is put on execution time, the more C++ gains. Consequently, if only productivity is considered when choosing a programming language it is very hard to motivate using Rust.

Maintainability Index

As predicted, Table 5.7 shows that Go had the best maintainability index score of 27.06, while C++ had the worst score of 16.21, with Rust in between at 19.44. The reason for this is that C++ and Rust require a lot of boilerplate code for many of the different actions. This gives programmers more control, but at the cost of increased complexity. As an example, in order to read from a socket in C++ a for loop with a nested if had to be implemented, while in Go a single call to a standard library function was sufficient (see the sketch after this paragraph). While Go might do the same thing "under the hood" as the loop implemented in C++, this is nothing that the programmers see or have to implement. We believe that the abstractions provided by the Go language were the main reason for its high maintainability index score, which is reflected in its lower score in all three sub-metrics comprising the maintainability index. As usual, we find Rust in the middle, since the language provides a middle ground of giving programmers control while still having support from standard libraries. This can be beneficial for developers, since they get the choice of using built-in functionality or implementing their own.
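The sketch below illustrates the difference in boilerplate; it is not the code from the thesis implementations, and the address is made up. In Go, a single standard-library call reads a fixed-size message, hiding the retry loop that had to be written by hand in C++.

    package main

    import (
        "fmt"
        "io"
        "net"
    )

    func main() {
        conn, err := net.Dial("tcp", "127.0.0.1:9000") // illustrative address
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        buf := make([]byte, 110)
        // io.ReadFull keeps reading until the buffer is full or an error occurs.
        if _, err := io.ReadFull(conn, buf); err != nil {
            panic(err)
        }
        fmt.Printf("read %d bytes\n", len(buf))
    }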

Understandability

From Table 5.8 we see that Go has the lowest score and is therefore the easiest to understand. Based on personal opinions, we agree with the scoring of Go but believe that C++ and Rust should be closer together. Overall, when reading the code for the C++ program and the Rust program, we think that Rust is more difficult to grasp than C++. Where C++ has complex data structures, Rust instead has ownership, lifetime rules and scopes. We do however realise that these opinions might be skewed, since we have previous experience with C++ and not with Rust. Also, Rust introduced ownership and lifetime rules, which are new concepts for us and therefore substantially decreased its understandability in our eyes.

6.2 Method

This section discusses the methods used to gather the results presented in Chapter 5. This includes a discussion of the replicability, reliability and validity of the study.

6.2.1 Replicability

If someone were to try to replicate the results, there are some aspects that need to be kept in mind. When following the pseudocode in Section 4.2, it is unlikely that the implementation would be identical to ours. For example, there is more than one way to implement an event loop or poll a socket, etc. This in turn could impact both the performance and development efficiency metrics, since the implementations might not be identical to ours. Furthermore, a more experienced developer could have better knowledge of built-in functionality and how to utilize it efficiently, as well as better knowledge of the coding conventions of the languages, meaning that they might produce better code. Better code will impact the performance metrics and most likely the development efficiency metric as well.

6.2.2 Reliability

We believe the results are quite reliable. However, implementing the applications again via the pseudocode would mean that the resulting applications would most likely not be identical to the ones implemented for this thesis, which would lead to different results compared to the results presented here. If one were to implement identical applications, or even use the source code (see Appendix A.3) implemented by us, the results would be the same. The results of the survey are not necessarily repeatable, given that even if one were to contact the same experts, they might not give the same answers in the future. This would mean that the results of the productivity metric would be different, and in turn the development efficiency.

6.2.3 Validity

Given the fact that top and time are built-in functionalities in the Linux OS, we are confident that they measured the expected data correctly. We can also be confident that the execution times measured with the C++ library std::chrono::high_resolution_clock are correct, since it is a well-established library. The validity of the development efficiency metric, on the other hand, is not proven. This is a metric put together for the purpose of getting a value to compare the development efficiency of the three languages. Both the sub-metrics that are included and the equation used to create the development efficiency score were created for the sole purpose of getting comparable scores. However, it has not been tested whether the theoretical scores calculated actually correlate with the real development efficiency of the programming languages, and this needs to be validated.

6.2.4 Pre study

In this section, the results of the pre study will be discussed.

Identifying characteristics

The first part of the results is the identified characteristics of a RAN application. These characteristics are what we based the rest of the work on, so it was crucial that they were identified correctly. The approach to identifying them was to conduct a pre study in which we analysed an already existing RAN application to find its key fundamental characteristics. Once key components and features had been identified, we verified these characteristics with experts at Ericsson until we reached an agreement. For this part of the work a lot of compromises had to be made; the most important of these compromises can be seen in Section 4.1.1. Not all of these compromises were agreed upon together with the supervisors at Ericsson, since they wanted features which we deemed provided no value to the research questions. For example, they wanted us to use the XCM API to facilitate the communication between processes, but since XCM essentially just sets up a TCP connection anyway, and the documentation for XCM was difficult to understand, we deemed using this API unnecessary. One could argue that by making compromises like this, the final application that this thesis is based upon is not an actual RAN application. We agree that it is not a fully functioning RAN application, since so many simplifications had to be made to keep the scope of the thesis manageable. However, we believe that the simplified application captures the fundamental characteristics of a real RAN system, albeit on a more conceptual level.

Hardware

As presented in Table 4.1 we ended up with five different hardware configurations, two ARM based and three x86 based. By using these computers we got a good spread of different CPU speeds, different RAM speeds and amounts, and different architectures. However, we did not have the possibility to change the configuration of any of the hardware, meaning that we could not pinpoint exactly which component had the greatest impact on the results. Optimally, we would have run the tests on one hardware configuration, and then changed one component at a time and re-run the tests. For example, using the BB2.5 hardware as a baseline, one could change the RAM speed and amount to pinpoint the impact of RAM. The same approach could be used for CPUs, meaning that the configuration should be the same apart from the CPU. We also believe that it would have been interesting to have more comparable components on the ARM and x86 based hardware, since RAM speed and amount, CPU core count and CPU clock speed differ greatly. The fact that the program ran in Docker containers on two of the hardware configurations was good, since it made sure that the programs could also scale via containers in a cloud environment. If more time and resources were available, it would be interesting to run the programs both inside a container and outside, and then compare the results to see the impact of containerization. Another obvious change would be to investigate even more hardware with a better spread.


Compiler

From the results it was concluded that the default and latest compilers available at Ericsson would be used for each language. However, from the informal literature study one can draw the conclusion that different compilers can mean a great deal for the performance of different programming languages on different hardware. Therefore, in our opinion, a comprehensive investigation of different compiler configurations on different hardware should have been conducted, but because of the effort needed to do so and the lack of time, this was excluded. Also, the latest compilers available at Ericsson were in some cases a few versions behind what was publicly available, and some performance differences could therefore be attributed to this.

6.2.5 Implementation

Since we are not experts in any of the languages, and even beginners in Rust and Go, there are multiple things to criticize regarding the implementation. Before starting the implementation we decided on an approach where we focused on giving each language an equal opportunity. Therefore, based on Figures 4.1 and 4.2, we created pseudocode for us to follow during implementation. Along with the pseudocode, we decided to follow the general guidelines found online, in both documentation and community forums, and write the code with as many recommended solutions as possible. This way we hopefully utilized the languages' built-in support for the different parts of the code, gave each language as representative a solution as possible, and stayed away from using the exact same solution in each language. We aimed to provide the best solution in each language, instead of trying to make the code as identical as possible between the languages. This choice affected the different metrics: the development efficiency was most likely more representative of the different languages, but might be less representative in terms of a strict performance comparison. If we had implemented the applications exactly the same in all languages, then the development efficiency metrics would most likely be the same and the performance metrics would give different results. Nevertheless, we believe that those results would be uninteresting, since they would not represent a real world scenario where a programmer most likely would, and should, utilize the strengths of each language when implementing the solution. Because we are beginners in both Rust and Go, we realise that we might not have implemented the applications in these languages in an optimal manner, and the results might therefore be skewed in C++'s favor. Although an effort was made to minimize this, multiple implementation details could probably have been handled better in terms of both performance and development efficiency metrics. If we were to do this study again, an alternative would have been to give experts in the different languages the pseudocode and figures and have them implement the solutions. This way the solutions might better utilize the differences in the languages and provide more accurate results. However, by doing this, information that Ericsson expressed was valuable to them would be lost, namely our personal opinions of how the different languages are perceived by beginners. There is also the problem of finding experts that are equally competent in their respective languages, since this is difficult to measure. A decision was made early on in the implementation phase to ignore handling faults, a decision that was criticized by one of the experts who answered the survey. The expert criticized this decision since, in their opinion, this is an area where for example Rust really shines and takes much less time to implement than the other languages. An extension to this thesis would therefore be to implement the handling of faults in all languages to see if the development efficiency metrics would differ.


6.2.6 Performance comparison

In this section we discuss the method used to extract results regarding the different performance metrics. The critical parts that were identified during the pre study were the latency of the communication and the time to handle GPB packages, as well as the amount of resources needed for an actual use case of the RAN application, i.e. the sequence test. Therefore we believed that the sequence test, latency test and encode/decode test would give good coverage of the performance differences of the programming languages that were investigated. All time measurements in the three tests were handled by the test driver. This decision was made because we wanted to make sure that all solutions had their times measured in exactly the same way. If the different programming languages measured the time within the applications, it would increase the complexity (skewing the development efficiency and performance metrics), and we would not be able to make sure that the times were measured in exactly the same way. One aspect to consider is that we actually needed to restrict both Rust and Go to use only one core. It might have been interesting to use their default configurations when doing the performance comparison. However, this would be in conflict with one of the identified characteristics and was therefore not tested. Since Go and Rust natively scale to more than one core, they automatically scale with load and could therefore, in theory, be used instead of current single-threaded applications. For example, instead of starting a new application to handle higher loads, one lets the application scale natively.

Sequence test

The sequence test was the most representative test of a real use case for a RAN application and therefore most metrics were extracted from it. A decision was made to only use three different messages in a sequence that the test driver sent to the application. One alternative was to have more messages with other consequences (for example making green threads sleep longer or shorter), but we came to the conclusion that this would not test anything beyond what the three messages seen in Table 4.3 already did. The reason is that since we are using just one CPU core and want to have as many context switches as possible combined with as many green threads in memory as possible, it does not matter whether we have different messages or the same ones; the scheduler does the same work. During the creation of the pseudocode in Section 4.2, we considered adding more messages that wrote to and read from a file and also communicated with a database et cetera, but we opted against this since it would test context switches in the same manner as the already existing messages and increase the complexity of the application. Originally, we wanted to have the same number of iterations for ARM and x86 hardware, but due to the limited execution time in the Ericsson test environment, this was not possible. However, we do not believe that this had a significant impact on the results; it only meant that we got fewer data points for CPU usage on ARM based hardware. The argument for this insignificant impact is that the spread of data points for CPU usage is small, and the number of data points is relatively high. We also wanted to increase the number of both sequences and iterations further, but because of the same time restraint this was not possible either. A larger number of iterations and/or sequences could have provided more insight into how the applications written in Rust, Go and C++ scale. For example, the time results currently suggest that Go catches up to or even surpasses the other languages for higher numbers of sequences on x86 hardware and instead falls behind on ARM hardware. We were unable to properly test this theory because of the constraint on execution time. For the sequence tests we measured CPU usage with top and memory usage with the time command. We opted for using top since Ericsson had a built-in test bench that provided data when using top. But the data for memory usage from top was unreasonable, so we switched the memory usage measurements to time instead. CPU usage and memory usage could have been measured within the application, using the languages' built-in functionality for doing so. However, this might have skewed the results due to different implementations of measuring CPU and memory usage, and therefore we decided to use top and time to get more comparable data.

Latency test

The measurements in the latency test consist of the time to read and send a package. This means that the time measured is not only the time to send something over the network combined with the time for the application to react, but the latency of the application as a whole. This could therefore instead be called the round trip delay time, since that is closer to what is actually measured. It is however not entirely accurate, since there is probably a slight delay between receiving and sending in the application because we actually read the message on the socket. In hindsight we should instead have sent a message back as soon as the application noticed a change on the socket, since this would minimize the delay inside the application while still including the time for the event loop to notice a change on the socket.
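A round trip delay measurement of this kind could, in principle, be driven as in the sketch below; this is not the thesis test driver, and the address and payload are made up. The timer is started before the request is written and stopped when the application's reply has been read, so it includes both network time and the time for the application's event loop to react.

    package main

    import (
        "fmt"
        "net"
        "time"
    )

    func main() {
        conn, err := net.Dial("tcp", "127.0.0.1:9000") // illustrative address
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        request := []byte("ping") // illustrative payload
        reply := make([]byte, 4)

        start := time.Now()
        if _, err := conn.Write(request); err != nil { // send
            panic(err)
        }
        if _, err := conn.Read(reply); err != nil { // wait for the application's response
            panic(err)
        }
        fmt.Printf("round trip delay: %v\n", time.Since(start))
    }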

Encode/decode test

The GPB package that was encoded and decoded in this test had a fixed size of 110 bytes. What could have been done differently is varying the size of the GPB package to detect whether the size has an impact on the results. For instance, Go might have performed better on GPB packages of smaller or larger sizes, which is something that would have been considered if we were to do this study again. Another aspect that would be interesting to investigate is different methods of packaging messages. For example, ASN.1 is another encoding used at Ericsson, which could be compared to GPB. One can however assume that the results would be somewhat similar, assuming that the information is still handled more or less the same by the application.

6.2.7 Development efficiency

This section is focused on discussing the different metrics used to measure development efficiency. During the informal literature study we could not find any related work that had done a similar comparison. Our intuitive image of what development efficiency means is that the code should be fast to produce and easy to read, understand and maintain. We wanted to choose metrics that, when combined, adhered to our image of development efficiency and provided a holistic view of the programming languages Rust, C++ and Go; therefore we chose productivity, maintainability and understandability. Other quality attributes, such as reliability and security, could have been incorporated in the development efficiency metric, but these attributes did not add value to our image of development efficiency. For other purposes, however, they could definitely be worthwhile to compare. To measure maintainability index and cognitive complexity we were forced to use different tools for different languages, since we did not find a universal tool for all three languages. Therefore it is important to keep in mind that the implementations of these different tools might not be the same, and can therefore produce somewhat different results. This is also important to keep in mind when replicating the results in Section 5.2, since some tools might give different results. We were unable to validate the different tools' implementations, to see if they differed, due to the great effort needed to do so. As a measurement for development efficiency we decided to combine productivity, maintainability index and cognitive complexity into a single score. The idea of the score is that a better value in any of the included metrics increases the score for development efficiency. Because we included productivity, we get a score that is relative to the base language, i.e. C++ in our case. One improvement that could be made to this score is to include weights for each individual metric. It is however difficult to estimate the importance of each individual component, and we believe that their scores indicate their weight well enough to get a holistic view of the languages and a score that can be compared.

Productivity

The reason for choosing the productivity metric is that it gives an estimation of the potential productivity difference when investigating the suitability of different programming languages for a given problem. This fits well with what we interpreted as valuable for Ericsson and provides value as part of the development efficiency metric. Companies like Ericsson could use a metric like productivity to get an idea of the costs and benefits of changing from language A to language B. However, since the metric depends on the execution time and implementation time of the solution in the programming languages to compare, the actual solutions have to be implemented to get the data. This is very impractical in most situations, since one might want to make a decision about the most beneficial language to choose before actually implementing the problem in another language. Furthermore, the value of the productivity metric for one problem will most likely not be the same for another problem, since the implementation time and execution time will differ. This means that the productivity metric is not transferable to other problem domains, which makes its value in practice questionable. In theory, however, it should give a good indication of work product to work effort for a specific problem. Another downside of the productivity metric is the values r, I(P0) and I(PL). The literature gives no good indication of how to choose r, with the only consideration being that r should be larger for programs that are to be run many times without changes to the source code. This means that great consideration must be taken before choosing, since it can substantially impact the productivity score, which is illustrated in Section 5.2.1 with varying values of r. If execution time is more important than implementation time, it is still difficult to determine whether r should be set to 10 or 100. The problem with the I values arises if, instead of actually implementing the solution, one wants to estimate this value. It is difficult to give a proper estimation of implementation time, which can be seen from the results of the survey in Appendix A.2, where different respondents gave vastly different estimations. Therefore, the decision was taken to use weighted averages when calculating the estimated times. For example, one person answered that the implementation would take 1 hour in Go and 6 hours in C++, while another answered 30 hours in Go and 60 hours in C++. Calculating an average of just the estimated times would therefore have yielded inaccurate time estimations. By instead focusing on the persons who could give an answer about multiple languages, we knew they could more accurately weigh their skill level together with the differences between the languages and give more proper time estimations. Another decision was made to disregard the number of years of experience, with the argument that a person with 0 years of experience can be equally capable as a person with 10 years of experience, given that they have solved the same type of problem and have sufficient knowledge of the programming language. Since it was not required to give an estimated implementation time in any language, we believe that people who gave an estimation for multiple languages, despite short experience in a language, could still gauge the differences and give comparable estimations. This was also the reason for only including persons who gave time estimations in C++ and at least one more language, since a single person can provide comparable estimations because they are guaranteed to have interpreted the problem identically. If we were to extract the data for implementation time again, group interviews would have been conducted. Conducting group interviews would have enabled us to make sure that everyone understood the problem in the same way, and the group of experts could then have reached a consensus on the most accurate implementation time. One could also argue that more experts should have been part of the survey, preferably with an overall higher skill level, to get a more accurate score.

Maintainability

We chose to incorporate maintainability because all industrial code will most likely at some point need to be changed, either to correct bugs or to add new features. If the code is easy to maintain, then the cost of making these changes will be lower than if the code is hard to maintain, and it is therefore important to take this aspect into consideration. Maintainability index was chosen since it is a widely used metric; for example, the IDE Visual Studio1 has support for calculating it. Maintainability index uses three different sub-metrics and therefore covers multiple aspects of what makes code maintainable. It can be argued that since the metric was first introduced in 1994, and the sub-metrics even earlier than that, it is not quite as relevant as more recent metrics. On the other hand, since these metrics have been used for more than 25 years, and are still being used, they are both well tested and still relevant.

6.2.8 Understandability

Understandability was chosen because, when we encountered the cognitive complexity metric during the pre study, it seemed relevant to this study. At larger companies with large code bases it is inevitable that multiple people will work with the same code and therefore need to understand it. New employees and consultants are common examples of people who will start to create value faster if the code base is easy to understand. In our opinion, cognitive complexity and maintainability index are closely related in the sense that they benefit from each other. We believe that this is intuitive, since code that is easier to understand is probably easier to maintain and vice versa. One aspect of cognitive complexity is that it was presented very recently compared to the other metrics. We therefore wanted to validate that this metric works as intended, and the paper by Muñoz Barón, Wyrich, and Wagner [39] was used as validation. However, this was the only paper we could find that validates cognitive complexity, which is probably due to the metric still being quite new. What we could find were existing tools and support in IDEs to calculate this metric, along with positive discussions on forums and blog posts, which indicates that the metric can provide value.
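As a small illustration of how the metric accrues (annotated according to our understanding of the SonarSource rules: one increment per control-flow structure plus one per level of nesting; this toy function is not taken from the thesis implementations):

    package example

    // sumPositive is a toy function annotated with cognitive complexity increments.
    func sumPositive(sequences [][]int) int {
        total := 0
        for _, seq := range sequences { // +1 (loop)
            for _, v := range seq { // +2 (loop, nested one level)
                if v > 0 { // +3 (if, nested two levels)
                    total += v
                }
            }
        }
        return total // cognitive complexity of this function: 6
    }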

6.2.9 Literature

As previously mentioned in this thesis, we were not able to find any previous work directly related to what we wanted to do. Therefore we had to study different approaches from loosely related work and create something new ourselves based on this. We did however find multiple papers that were peer-reviewed and published, as well as books. Since this thesis was mostly practical, it was concluded that a formal literature review would serve little to no purpose in finding answers to the research questions. Instead the focus was on finding related work that presented methods of extracting valid information about how to compare two or more languages against each other, both in terms of performance and development efficiency. We also found sources that explained RAN, which created value when identifying the characteristics and, mainly, when creating a good use case for the tests conducted. A few of the sources are more unofficial, such as blog posts and forums, but they are either written by reliable authors or contain information that cannot be found anywhere else. For example, for the programming language Go, it was hard to find published sources for much of the information about the language; instead we were forced to turn to blogs and forums. These sources were however referenced by the language's encyclopedia itself, and were therefore deemed reliable. Another example is the guide to IPC communication by Kalin [31], which is the foundation for the entire inter-process communication section in the theory chapter. The source is not peer-reviewed but the author was deemed to be legitimate since he has written 7 books in the computer science area. Overall, it would have been valuable to have more nuanced sources. For example, the sections about the different programming languages in Chapter 2 are more or less based purely on the languages' own websites, with no sources that actually criticise them. Papers that criticise or discuss the different sub-metrics involved in the development efficiency metric could also have been valuable.

1https://docs.microsoft.com/en-us/visualstudio/code-quality/code-metrics-values

6.3 The authors’ experiences

During our discussions with the experts at Ericsson, they expressed an interest in other soft values that we were not able to include in the thesis, for example the perceived difficulty of getting acquainted with new languages like Go and Rust compared to C++. These values either could not be investigated in a formal manner, or it was difficult to find a formal way of doing so. Therefore we want to share some of our personal experiences from implementing the applications in the three programming languages involved in this thesis. One major aspect that differentiated Rust and Go from C++ was troubleshooting, in the cases where the official online manuals for the different languages had to be used. Rust and Go had very modern looking pages, with highlighted URLs and intuitive navigation. The reference pages for C++, on the other hand, were difficult to maneuver, unintuitive, and felt dated. Where C++ came out ahead was overall searchability for issues. Since C++ has been around for a lot longer than the other languages, and is more widely used, most if not all of the issues we encountered were easy to search for, and answers could be found in different forums. We also felt that the learning curves for Rust and C++ were very different from that of Go. Even though we had previous experience with C++, the learning curve still felt steep, and the C++ implementation was the one we spent the most time on. There are many aspects of the language that are unclear to us (for example cross-compilation) and difficult to use, and as previously stated it is widely regarded as one of the hardest languages. As for Rust, we believe it to be the hardest and most unique language of the three. The many programming concepts introduced in the small amount of code we wrote, for example ownership and lifetimes, give a hint of the complexity of this language. These were concepts we had never encountered before, and we are still unsure whether we have implemented the solution in a correct manner, which also gives a hint of how difficult the language is to troubleshoot. On the other hand, after spending just a couple of hours familiarizing ourselves with the Go syntax and goroutines, we felt we could quickly produce code that worked as intended and was very understandable. Since the code was very understandable it was also much easier to troubleshoot. Since code was executed on both x86 and ARM based hardware in this thesis, we were forced to cross-compile to both platforms. It was quite cumbersome to cross-compile C++, since all the libraries that were used had to be cross-compiled as well. Rust was easier than C++, but still problematic: the libraries were cross-compiled automatically when included, provided the Rust compiler had been configured properly. Go was however by far the easiest to cross-compile, since all we had to do was tell the Go compiler to compile for another architecture, for example by setting GOARCH=arm and GOOS=linux. This is something to consider as well when deciding programming languages for projects, since the overhead in time and effort of including more libraries in C++ is substantial compared to Go. Overall, from the perspective of two master's students, Go is a language with a low learning curve that is fun to program in and has good enough performance for most scenarios.


6.4 The work in a wider context

When it comes to the ethical and societal aspects related to this work, there are some aspects that could be relevant and need to be considered. In theory, switching to a programming language that is more performance efficient could lead to lower energy consumption [41] or better utilization of the current hardware, meaning that less hardware is needed. This would have a positive societal effect, since it is better for the environment. In terms of development efficiency, it could have an impact on the total time required to develop and maintain a product. Higher efficiency would mean lower production time, lower production costs, and might decrease the number of employees needed to develop products. This could mean that fewer developers get hired and increase unemployment rates amongst engineers, which is negative from a societal standpoint but beneficial from a company standpoint. However, it could also mean that companies keep employment rates at the same level but instead focus on expanding their portfolio and offering a broader range of products and services to both existing and new customers. Furthermore, stress might be reduced for developers if they can get more work done in a shorter time, which would improve their overall health. Also, higher development efficiency, in the sense that it is easier to start working with a new code base, might encourage companies to hire more newly graduated developers. This helps people with no prior working experience to get their first job, which is commonly the hardest. Looking at the cloudification of RAN applications, there are some aspects that can be considered. For example, hardware will be in a centralized location, which potentially means better hardware utilization due to containerization and less overall energy consumption due to, for instance, more efficient cooling and less required hardware [62]. Another environmentally beneficial aspect of centralization is the reduced emissions from transportation when doing maintenance and installation. However, this could lead to fewer work opportunities for the people in charge of maintenance. Hypothetically, centralization could pose a larger security risk than having multiple different sites, since there would only be one location to attack or tamper with instead of several different locations spread over a large area. Moving to a cloud environment could also increase the security risks for collocated users [3], which is an aspect to consider before cloudification.

7 Conclusion and future work

The purpose of this thesis was to provide a holistic view of alternative programming languages to C++ for cloud and embedded deployments. To provide a holistic view, the areas of performance and development efficiency were investigated and compared for C++, Go and Rust. Performance was measured using a typical high intensity RAN use case where CPU usage, memory usage, processing time per sequence and latency at runtime were measured and compared. Development efficiency was measured by comparing extracted data from the implementations in each language related to the three sub-metrics productivity, maintainability and understandability. From this aim, three research questions acted as the foundation of this thesis, see Section 1.3. They are focused on comparing the performance and development efficiency of a RAN application written in C++, Rust and Go for different hardware configurations.

7.1 Conclusion

The answers to research questions 1 and 3 are interleaved. Depending on the use case, different languages are suitable. If the use case is approximately 10 concurrent users, then the choice of language does not matter much due to the small performance differences. If the use case is instead 100 users and slow ARM based hardware, then the choice should be C++. Different combinations of hardware and numbers of concurrent users have different suitable languages. Overall, however, if the hardware is based on the ARM architecture, then one should choose C++, since it has the lowest CPU usage, despite having the highest processing time per sequence compared to Rust and Go. On x86 based hardware one has to weigh the importance of CPU usage against processing time per sequence, since overall C++ has lower CPU usage and in most cases comparable processing time compared to Rust and Go. Rust and Go instead have a lower processing time and much higher CPU usage compared to C++. With regard to latency, the choice of programming language does not result in any noticeable difference; what is clear is that slower hardware increases the latency of the application. The language impact on handling GPB packages is extremely low in terms of actual time, although the relative performance of Go is significantly lower than both C++ and Rust.

The answer to research question 2 is clear. One can see that Go outperforms the competing languages in all areas except for execution time. This gives Go a much better development efficiency score, indicating that it is a more understandable, more maintainable and more productive language than Rust and C++. If one would like to make a decision based on all the variables investigated in this thesis, the decision becomes more difficult. To be able to choose a programming language for a RAN application, it must be thoroughly evaluated what aspects are important to the use case. If the application is constantly under change or feature heavy and performance is not critical, Go should be the language of choice. If instead performance is critical, C++ should be the language of choice. Based on the results, it is hard to motivate a scenario where Rust would be best suited, since it has worse performance than C++ and lower development efficiency than Go.

7.2 Future work

Given that few investigations like the one conducted in this thesis could be found, this work should be considered a first attempt at creating a theoretical framework for comparing programming languages' suitability for developing RAN applications. Therefore there are many aspects that could be improved or done differently, and most of them have been discussed or mentioned previously. We will nevertheless mention a few of them again, along with some additional aspects. Future research could conduct group interviews instead of surveys to get more accurate time estimations, perform the sequence test on different configurations of hardware to pinpoint the performance impact of specific components, and focus more on implementing a Minimum Viable Product instead of a Proof of Concept, i.e. with proper error handling. Another possible improvement would be to have experts actually implement the application and measure the implementation time for the productivity metric, instead of using an estimated time. Furthermore, more aspects of development efficiency could be included, such as testability and security. In this thesis we only touched upon the cloud aspects of the programming languages, and an improvement would therefore be to investigate performance in a cloud environment more closely, for example by comparing multiple single-core instances of a C++ RAN application to a natively scaling multi-core implementation of a Rust or Go RAN application. In the ideal case, the RAN application would not be a simplified version of the actual system. Instead, an actual RAN application would be implemented in Rust and Go to get even more comparable results.


A Appendix

A.1 Time estimation survey

In this appendix the survey used to obtain time estimations from experts is shown. The survey describes an example program for which the experts were asked to estimate how long it would take to implement. The experts could give an estimate for each of the three languages, and they were also asked to rate their perceived skill level in Go, Rust and C++.
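To make the task that the respondents were asked to estimate more concrete, the minimal Go sketch below illustrates the kind of single-threaded, green-thread message dispatch the survey describes. It is an illustration only, not taken from the survey or from the implementations in Appendix A.3, and every name in it (handle, msgType, seqID) is chosen for this sketch.

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// handle processes one message on its own goroutine (green thread).
// The message codes follow the survey: 1 = setup, 3 = sleep, 9 = teardown.
func handle(msgType int, seqID int, wg *sync.WaitGroup) {
    defer wg.Done()
    switch msgType {
    case 1:
        fmt.Printf("setup handled for sequence %d\n", seqID)
    case 3:
        // Sleeping suspends only this goroutine; other goroutines keep
        // running on the same OS thread.
        time.Sleep(10 * time.Millisecond)
        fmt.Printf("sleep handled for sequence %d\n", seqID)
    case 9:
        fmt.Printf("teardown handled for sequence %d\n", seqID)
    }
}

func main() {
    // Pin the Go scheduler to a single OS thread, as the survey task requires.
    runtime.GOMAXPROCS(1)

    var wg sync.WaitGroup
    for seqID := 11; seqID <= 13; seqID++ { // a few example sequence IDs
        for _, msgType := range []int{1, 3, 9} {
            wg.Add(1)
            go handle(msgType, seqID, &wg) // one green thread per message
        }
    }
    wg.Wait()
}

In the actual survey task the messages arrive one at a time over a TCP socket and each reply is written back to the test driver; the full single-threaded implementations measured in the thesis are reproduced in Appendix A.3.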


Time estimation survey

You are to implement a program that communicates with a test driver via a TCP socket. The test driver will send up to 10 000 sequences, each consisting of 3 messages, to the program. The program must handle incoming messages concurrently on a single thread using green threads. Your program must notify the test driver when a message has been handled by communicating back to the test driver which message, and for which sequence, it has just handled. The program must also be able to request data from the test driver asynchronously. Since the program is single-threaded, all communication must be asynchronous so as not to block the handling of other messages in the meantime. The program terminates when the message "exit" is sent from the test driver. When you study the problem and give a time estimate, ignore things such as error handling and covering every possible scenario; think instead "do the simplest thing that could possibly work".

Even if you have no experience with green threads, you should be able to estimate how long it would take to get up to speed with them; they can be likened to libraries for multi-threading, though with some differences.

Program to implement:
● The program must be single-threaded.
● TCP communication via sockets (only one server and one client).
● Receive 3 different types of messages (the messages contain which procedure is to be started, as well as the ID of the sequence that "sent" the message), where each message gets its own green thread (e.g. Tokio::task in Rust, goroutines in Go, Boost::fibers in C++). When a message has been handled, the program must notify the "right" sequence in the test driver.

The messages to be received and handled in the program are the following:
○ Message 1: Setup, perform an "AP" (i.e. send an asynchronous message to the test driver requesting data) and then wait for the response without blocking the main thread.
○ Message 2: Sleep, a simple message that only puts its green thread to sleep, without blocking the main thread, and wakes up after roughly 10 ms.
○ Message 3: Teardown, clears all allocated memory related to the sequence (should that be needed). Memory may have been allocated to handle routing of data, or for other reasons.
● The test driver does not need to be implemented.

The message flow for one sequence should look roughly as in Figure 1.

The first digit sent from the test driver is always the type of message (1 = setup, 3 = sleep, 9 = teardown) and the following digits are the ID of the sequence. The program sends a message (2 = request data) to request data from the test driver. Figure 1 illustrates an example of the communication between the test driver and the program for a sequence with ID=11. The program must handle up to 10 000 active green threads, i.e. IDs up to 10 000. An overview of the system is shown in Figure 2. "State of procedure" in Figure 2 can, for example, be whether a green thread is running/waiting.

* Required


Figure 1


Figure 2


1. Do you have experience with green threads? *

Mark only one oval.

Yes

No

Other:

2. Have you ever solved a problem similar to the one above before? *

Mark only one oval.

Yes

No

Other:

3. How many years of experience do you have with development in Rust? (Write 0 if you have no experience) *

4. How good would you estimate that you are at development in Rust? *

Mark only one oval.

1 2 3 4 5 6 7 8 9 10

Never developed in the language before / Expert

5. How many hours would you estimate it would take to implement the above program in Rust?

6. How many years of experience do you have with development in Go? (Write 0 if you have no experience) *

7. How good would you estimate that you are at development in Go? *

Mark only one oval.

1 2 3 4 5 6 7 8 9 10

Never developed in the language before / Expert

8. How many hours would you estimate it would take to implement the above program in Go?

9. How many years of experience do you have with development in C++? (Write 0 if you have no experience) *

10. How good would you estimate that you are at development in C++? *

Mark only one oval.

1 2 3 4 5 6 7 8 9 10

Never developed in the language before / Expert

11. How many hours would you estimate it would take to implement the above program in C++?

12. Other thoughts/reflections. This can, for example, be whether you felt the task description was lacking so that various assumptions had to be made, or similar.



A.2 Time estimation survey results

In this appendix the results of the survey used to obtain time estimations from experts are shown.

The columns of the response table are, in order: experience with green threads (Yes/No), whether a similar problem had been solved before (Yes/No), and then, for each of Rust, Go and C++, the years of experience, the self-assessed skill level (1-10) and the estimated implementation time in hours. Empty cells were left blank in the form. The answer rows were:

Yes Yes 0 1 0 1 20 9 20
Yes Yes 0 1 1 6 30 5 7 60
Yes Yes 0 3 6 7 9 1 0 3 6
Yes Yes 3 6 3 0 2 9 7 4
Yes No 1.5-2 8 4 <0.5 2 <0.5 3
Yes Yes 0 1 - 2 2 16 30 8 24
Yes Yes 1 5 24 0 1 8 8 32
Yes Yes 5 9 16 0 3 16 0 2 16

The free-text comments given in response to question 12 (translated from Swedish) were:

● "I am a bit unsure about the definition of green threads, but in the time estimate above I have assumed that goroutines, as well as some boost lib for concurrency for the C++ implementation, are OK to use."
● "It took a while to understand the sequence diagram (I was perhaps a bit too eager to get there and read the preceding text somewhat carelessly). Are the last two Response 111 wrong? Should they be Response 311 and Response 911, respectively? Perhaps the estimate should also be split in two: the one that already exists, and, given that that sequence is already implemented, how long it would take to extend the implementation with one more "AP" (AP may need to be explained to those who do not have a PDTC background, e.g. Mattias Åkervik)."
● "For a relatively small program (PoC) the time estimate will vary quite a lot depending on how familiar the person is with the tech and any frameworks (e.g. boost for cpp and tokio for rust). I have assumed that the person in question has worked with all of this before."
● "Unfortunately I cannot say that I find any part of the problem that clearly separates the different problems, but then I unfortunately do not know Go and C++ well enough to compare them properly. Since all the languages (as far as I know) have libraries/support both for green threads and for channels for communication, I see it as the complex parts being handled by those libs, and the remaining complexity being to handle thread state and inter-thread communication, something that will look very similar if channels are used for it."
● "I have made the assumption that setup is always run before any other instruction and that "invalid" instructions therefore do not need to be handled. Furthermore, I have assumed that the language is not new to the programmer, i.e. that they already know the language and do not need to spend time on that as well."

A.3 Source code

In this appendix all the source code for each language implementation of the application as well as the test drivers is shown.

A.3.1 C++ application source code

In this appendix the source code for the C++ application is presented.

Listing A.1: Header file for sequence test application C++

// Created by eaasrkr on 2021-01-22.

#ifndef MASTERTHESIS_SEQUENCE_H
#define MASTERTHESIS_SEQUENCE_H

#include <chrono>
#include <vector>

struct Sequence {
    // Note: int elements and high_resolution_clock are assumed here; this
    // matches how the test driver in Listing A.12 uses the struct.
    std::vector<int> sequence;
    int id;
    int current_message = 0;
    bool running = false;
    bool done = false;
    std::chrono::time_point<std::chrono::high_resolution_clock> time_of_send;
};

#endif // MASTERTHESIS_SEQUENCE_H

Listing A.2: Sequence test application C++ # include # include < c s t d l i b > # include # include # include # include # include < p o l l . h> # include # include # include # include # include using namespace std ; enum Incoming {SETUP = ’1’, RESPONSE = ’2’, CP3 = ’3’, CP4 = ’4’, TEARDOWN = ’ 9 ’ } ; s t r u c t routing_data { i n t sequence_id {};


bool received_data = f a l s e ; boost :: fibers ::mutex io_mutex; boost:: fibers :: condition_variable condition; }; s t r u c t routing_data_list_mutex { boost:: fibers ::mutex list_mutex; std :: vector routing_data_list; }; routing_data_list_mutex rdata_list ; void setup ( i n t seq_id , s t r u c t bufferevent * bev ) { auto * rdata = new routing_data ; rdata−>sequence_id = seq_id; rdata_list . routing_data_list .push_back(rdata); std::string request = "\r2" + std::to_string(seq_id) + "\n"; bufferevent_write(bev, request.c_str() , request.size());

std ::unique_lock< boost:: fibers ::mutex > lk( rdata−>io_mutex); while (!rdata−>received_data) rdata−>condition.wait(lk) ;

std:: string done = "\r1" + std:: to_string(rdata−>sequence_id) + "\n" ; bufferevent_write(bev, done.c_str() , done.size()); } void cp3 ( i n t seq_id , s t r u c t bufferevent * bev ) { boost :: this_fiber :: sleep_for(std ::chrono:: milliseconds(10)); std::string done = "\r1" + std::to_string(seq_id) + "\n"; bufferevent_write(bev, done.c_str() , done.size()); }

void init_boost_fiber( void ( * f ) ( int , s t r u c t bufferevent * ), i n t i , s t r u c t bufferevent * bev ) { boost::fibers::fiber fib(boost::fibers::launch::dispatch, f, i, bev ) ; fib.detach() ; } void teardown ( i n t seq_id , s t r u c t bufferevent * bev ) { for ( i n t i = 0; i < rdata_list.routing_data_list.size(); i++) { i f (rdata_list .routing_data_list[i]−>sequence_id == seq_id) { delete (rdata_list.routing_data_list[i]) ; rdata_list .routing_data_list.erase( rdata_list.routing_data_list.begin() + i); std::string done = "\r1" + std::to_string(seq_id) + "\n "; bufferevent_write(bev, done.c_str() , done.size()); break ; }


} } void handle_response( i n t seq_id ) { for (routing_data * data: rdata_list.routing_data_list) { i f (data−>sequence_id == seq_id) { std::unique_lock< boost:: fibers ::mutex > lk( data−> io_mutex ) ; data−>received_data = true ; data−>condition .notify_one () ; return ; } } } void handle_message(std :: string message, s t r u c t bufferevent * bev ) { i f (message == "exit") { std::cout << "Client exiting ...\n"; auto * base = bufferevent_get_base(bev); event_base_loopexit(base , nullptr); }

switch (message.at(0)) { case SETUP : init_boost_fiber(setup , std :: stoi(message.substr(1) ) , bev ) ; break ; case RESPONSE : handle_response(std :: stoi(message.substr(1))) ; break ; case CP3 : init_boost_fiber(cp3, std :: stoi(message.substr(1)) , bev ) ; break ; case TEARDOWN: init_boost_fiber(teardown, std :: stoi(message.substr ( 1 ) ) , bev ) ; break ; default : break ;

}

}

s t a t i c void set_tcp_no_delay(evutil_socket_t fd) { i n t one = 1 ; setsockopt ( fd , IPPROTO_TCP , TCP_NODELAY, &one , sizeof one ) ; evutil_make_socket_nonblocking(fd) ;


}

s t a t i c void boost_timeoutcb(evutil_socket_t fd, short what , void * arg ) { boost:: this_fiber :: yield(); } string not_finished_string = ""; bool start_read = f a l s e ; void readcb ( s t r u c t bufferevent * bev , void * arg ) { char tmp [ 1 2 8 ] ; s i z e _ t n ; for (;;){ string message = not_finished_string; n = bufferevent_read(bev, tmp, sizeof ( tmp ) ) ; i f ( n <= 0) { break ; }

for ( i n t i = 0; i

else i f (start_read) { message += tmp[i ]; } } not_finished_string = message; } }

s t a t i c void eventcb ( s t r u c t bufferevent * bev , short events , void * ptr ) { i f ( events & BEV_EVENT_CONNECTED) { evutil_socket_t fd = bufferevent_getfd(bev); set_tcp_no_delay(fd) ; } else i f ( events & BEV_EVENT_ERROR) { printf ("NOT Connected\n" ) ; } }


i n t main ( i n t argc , char ** argv ) { boost :: fibers :: use_scheduling_algorithm() ; s t r u c t event_base * base ; s t r u c t sockaddr_in sin {}; s t r u c t event * boost_evtimeout ; s t r u c t timeval boost_timeout{}; i n t port = 8080;

boost_timeout.tv_sec = 0; boost_timeout.tv_usec = 1000;

base = event_base_new() ;

boost_evtimeout = event_new(base , −1, EV_PERSIST, boost_timeoutcb , nullptr); event_add(boost_evtimeout , &boost_timeout) ;

memset(&sin , 0, sizeof ( sin ) ) ; sin.sin_family = AF_INET; sin.sin_addr.s_addr = inet_addr("127.0.0.1"); sin.sin_port = htons(port);

s t r u c t bufferevent * bev=bufferevent_socket_new( base , −1 , BEV_OPT_CLOSE_ON_FREE) ;

bufferevent_setcb(bev, readcb, nullptr , eventcb, nullptr); bufferevent_enable ( bev , EV_READ|EV_WRITE) ;

bufferevent_socket_connect(bev ,( s t r u c t sockaddr * )&sin , sizeof ( sin ) ) ;

boost :: fibers :: fiber mainFiber(event_base_dispatch , base); mainFiber. join () ;

bufferevent_free(bev) ; event_free(boost_evtimeout) ; event_base_free(base) ; return 0 ; }

Listing A.3: Latency test application C++ # include # include < c s t d l i b > # include # include # include


# include < p o l l . h> # include # include # include # include # include

void handle_message(std :: string message, s t r u c t bufferevent * bev ) { i f (message == "exit") { std::cout << "exiting...\n"; auto * base = bufferevent_get_base(bev); event_base_loopexit(base , nullptr); } else i f (message == "reset") { std::cout << "resetting...\n";

} } using namespace std ; int64_t total_bytes_read = 0; int64_t total_messages_read = 0; s t a t i c void set_tcp_no_delay(evutil_socket_t fd) { i n t one = 1 ; setsockopt ( fd , IPPROTO_TCP , TCP_NODELAY, &one , sizeof one ) ; evutil_make_socket_nonblocking(fd) ; }

void readcb_ping_pong( s t r u c t bufferevent * bev , void * arg ) { char tmp [ 1 2 8 ] ; s i z e _ t n ; for (;;){ n = bufferevent_read(bev, tmp, sizeof ( tmp ) ) ; i f ( n <= 0) break ; / * No more data. * /

total_bytes_read += n;

std :: string message; for ( i n t i=0; i


i f (isalnum(tmp[i])) { message += tmp[i ]; } } } }

s t a t i c void eventcb ( s t r u c t bufferevent * bev , short events , void * ptr ) { i f ( events & BEV_EVENT_CONNECTED) { evutil_socket_t fd = bufferevent_getfd(bev); set_tcp_no_delay(fd) ; } else i f ( events & BEV_EVENT_ERROR) { printf ("NOT Connected\n" ) ; } }

i n t main ( i n t argc , char ** argv ) { s t r u c t event_base * base ; s t r u c t sockaddr_in sin {};

i n t port = 8080;

base = event_base_new() ; i f ( ! base ) { puts("Couldn’t open event base " ) ; return 1 ; }

memset(&sin , 0, sizeof ( sin ) ) ; sin.sin_family = AF_INET; //sin.sin_addr.s_addr = inet_addr("10.120.207.7"); sin.sin_addr.s_addr = inet_addr("127.0.0.1"); sin.sin_port = htons(port);

s t r u c t bufferevent * bev = bufferevent_socket_new( base , −1 , BEV_OPT_CLOSE_ON_FREE) ;

bufferevent_setcb(bev, readcb_ping_pong , nullptr , eventcb , n u l l p t r ) ; bufferevent_enable ( bev , EV_READ|EV_WRITE) ;

i f (bufferevent_socket_connect(bev ,( s t r u c t sockaddr * )&sin , sizeof ( sin ) ) < 0) {


/ * Error starting connection * / bufferevent_free(bev) ; puts("error connect " ) ; return −1; }

event_base_dispatch(base) ;

bufferevent_free(bev) ; event_base_free(base) ;

printf ("%zd t o t a l bytes read\n" , total_bytes_read); printf ("%zd t o t a l messages read\n" , total_messages_read) ; printf("%.3f average messages s i z e \n" , ( double )total_bytes_read / total_messages_read);

return 0 ; }

Listing A.4: Encode/decode test application C++ # include # include < c s t d l i b > # include # include # include # include # include # include # include # include # include "adressbook.pb.h"

void handle_gpb( const std :: string& body, s t r u c t bufferevent * bev ) { auto start = std ::chrono:: high_resolution_clock ::now() ; gpb:: AddressBook addressBook; addressBook. ParseFromString(body) ; addressBook. SerializeAsString () ; auto end = std ::chrono:: high_resolution_clock ::now() ; auto duration = std ::chrono:: duration_cast(end − start).count(); std::string response = "\r" + std::to_string(duration) + ’\a’; bufferevent_write(bev, response.c_str() , response.size()); } void handle_message( const std :: string& message, s t r u c t bufferevent * bev ) { i f (message == "exit") { std::cout << "exiting...\n"; auto * base = bufferevent_get_base(bev); event_base_loopexit(base , nullptr);


} else i f (message == "reset") { std::cout << "resetting...\n";

} else { handle_gpb(message. substr(1) , bev) ; } } using namespace std ; int64_t total_bytes_read = 0; int64_t total_messages_read = 0; s t a t i c void set_tcp_no_delay(evutil_socket_t fd) { i n t one = 1 ; setsockopt ( fd , IPPROTO_TCP , TCP_NODELAY, &one , sizeof one ) ; evutil_make_socket_nonblocking(fd) ; } void readcb_gpb( s t r u c t bufferevent * bev , void * arg ) { char tmp[65534]; s i z e _ t n ; for (;;){ n = bufferevent_read(bev, tmp, sizeof ( tmp ) ) ; i f ( n <= 0) break ; / * No more data. * /

total_bytes_read += n; std :: string message; bool start_reading = f a l s e ; for ( i n t i=0; i

s t a t i c void eventcb ( s t r u c t bufferevent * bev , short events , void * ptr ) { i f ( events & BEV_EVENT_CONNECTED) {


evutil_socket_t fd = bufferevent_getfd(bev); set_tcp_no_delay(fd) ; } else i f ( events & BEV_EVENT_ERROR) { printf ("NOT Connected\n" ) ; } }

i n t main ( i n t argc , char ** argv ) {

s t r u c t event_base * base ; s t r u c t sockaddr_in sin {};

i n t port = 8080;

base = event_base_new() ; i f ( ! base ) { puts("Couldn’t open event base " ) ; return 1 ; }

memset(&sin , 0, sizeof ( sin ) ) ; sin.sin_family = AF_INET; //sin.sin_addr.s_addr = inet_addr("10.120.207.7"); sin.sin_addr.s_addr = inet_addr("127.0.0.1"); sin.sin_port = htons(port);

s t r u c t bufferevent * bev = bufferevent_socket_new( base , −1 , BEV_OPT_CLOSE_ON_FREE) ;

bufferevent_setcb(bev, readcb_gpb, nullptr , eventcb , nullptr); bufferevent_enable ( bev , EV_READ|EV_WRITE) ;

i f (bufferevent_socket_connect(bev ,( s t r u c t sockaddr * )&sin , sizeof ( sin ) ) < 0) { / * Error starting connection * / bufferevent_free(bev) ; puts("error connect " ) ; return −1; } event_base_dispatch(base) ;

bufferevent_free(bev) ; event_base_free(base) ;

printf ("%zd t o t a l bytes read\n" , total_bytes_read); printf ("%zd t o t a l messages read\n" , total_messages_read) ; printf("%.3f average messages s i z e \n" , ( double )total_bytes_read / total_messages_read);


return 0 ; }


A.3.2 Go application source code

In this appendix the source code for the Go application is presented.

Listing A.5: Sequence test application Go package main import ( " bufio " " fmt " " io " " net " " runtime " " s t r i n g s " " sync " " time " " unicode " ) const ( SETUP = ’ 1 ’ RESPONSE = ’ 2 ’ CP3 = ’ 3 ’ TEARDOWN = ’ 9 ’ ) var socket * net . TCPConn type routingData s t r u c t { sequenceId s t r i n g receivedData * bool sync . Mutex cond * sync . Cond } type d a t a L i s t s t r u c t { routingDataList [] * routingData sync . Mutex } var SharedList = new( d a t a L i s t ) func tryRead (wg * sync.WaitGroup) { connbuf := bufio.NewReader(socket) for { msg, err := connbuf.ReadString( ’\n’) s := strings.Split(msg, "\r") msg = strings .TrimFunc(strings .TrimSpace(s[ len ( s ) −1]) , func ( r rune ) bool { return !unicode.IsGraphic(r) }) switch e r r {


case n i l : handleMessage(msg, wg) case io . EOF : fmt. Println("connection closed " ) return default : return }

} } func setup(seqId s t r i n g ){ data := &routingData{sequenceId:seqId , receivedData: new( bool )} * data.receivedData = f a l s e SharedList .Lock() SharedList.routingDataList = append (SharedList. routingDataList , data) SharedList .Unlock() data.cond = sync.NewCond(data) _, _ = socket.Write([] byte ("\r2" + seqId + "\n"))

data.cond.L.Lock() for ! * data.receivedData { data.cond.Wait() } data.cond.L.Unlock()

_, _ = socket.Write([] byte ("\r1" + seqId + "\n"))

} func handleResponse(seqId s t r i n g ){ for i := range SharedList . routingDataList{ i f SharedList.routingDataList[i ].sequenceId == seqId { * SharedList. routingDataList[i ]. receivedData = true SharedList.routingDataList[i ].cond.Signal() return } } } func cp3 ( seqId s t r i n g ){ time.Sleep(10 * time. Millisecond) _, _ = socket.Write([] byte ("\r1" + seqId + "\n")) } func teardown(seqId s t r i n g ){ for index, value := range SharedList . routingDataList{ i f value.sequenceId == seqId {


SharedList.routingDataList[index] = SharedList . routingDataList[ len (SharedList. routingDataList)−1] SharedList . routingDataList[ len (SharedList. routingDataList)−1] = n i l SharedList.routingDataList = SharedList. routingDataList [: len (SharedList. routingDataList)−1] break } } _, _ = socket.Write([] byte ("\r1" + seqId + "\n"))

} func handleMessage(message string , wg* sync.WaitGroup) {

i f strings.Contains(message, "exit") { wg. Done ( ) return } else i f strings.Contains(message, "reset") { runtime .GC() return }

sequenceId := message[1:] switch message[0] { case SETUP : go setup(sequenceId) case RESPONSE : go handleResponse(sequenceId) case CP3 : go cp3(sequenceId) case TEARDOWN: go teardown(sequenceId) default : fmt. Println("badness: " + message ) } } func mainLoop() { var wg sync .WaitGroup wg.Add( 1 ) go tryRead(&wg) wg. Wait ( ) fmt.Println("Client exiting ...") } func main ( ) { runtime .GOMAXPROCS( 1 ) addr, _ := net.ResolveTCPAddr("tcp", "127.0.0.1:8080") socket , _ = net.DialTCP("tcp", nil , addr )


_ = socket.SetNoDelay( true ) defer socket.Close() mainLoop ( ) }

Listing A.6: Latency test application Go

package main

import (
    "bufio"
    "fmt"
    "io"
    "net"
    "runtime"
    "strings"
    "sync"
    "unicode"
)

var socket *net.TCPConn

func mainLoop() {
    var wg sync.WaitGroup
    wg.Add(1)
    go pingPongLoop(&wg)
    wg.Wait()
}

func pingPongLoop(wg *sync.WaitGroup) {
    // connect to server
    connbuf := bufio.NewReader(socket)
    for {
        msg, err := connbuf.ReadString('\n')
        s := strings.Split(msg, "\r")
        msg = s[len(s)-1]
        msg = strings.TrimSpace(msg)
        msg = strings.TrimFunc(msg, func(r rune) bool {
            return !unicode.IsGraphic(r)
        })
        switch err {
        case nil:
            if msg == "PING" {
                _, _ = socket.Write([]byte("\r" + "PONG" + "\n"))
            }
        case io.EOF:
            fmt.Println("connection closed")
            wg.Done()
            return
        default:
            fmt.Printf("error: %v\n", err)
            wg.Done()
            return
        }
    }
}

func main() {
    addr, _ := net.ResolveTCPAddr("tcp", "127.0.0.1:8080")
    socket, _ = net.DialTCP("tcp", nil, addr)
    _ = socket.SetNoDelay(true)
    defer socket.Close()
    runtime.GOMAXPROCS(1)
    mainLoop()
}

Listing A.7: Encode/decode test application Go package main import ( " ./ gpb " " bufio " " fmt " "github .com/golang/protobuf/proto" " io " " net " " runtime " " strconv " " s t r i n g s " " sync " " time " )

var socket * net . TCPConn func tryRead (wg * sync.WaitGroup) { connbuf := bufio.NewReader(socket) for { msg, err := connbuf.ReadString(’$’) s := strings.Split(msg, "\r8") msg = s [ len ( s ) −1]

switch e r r { case n i l : i f len (msg) != 0 { start := time.Now() addressBook := gpb.AddressBook{} _ = proto.Unmarshal([] byte (msg) , & addressBook) _, _ = proto.Marshal(&addressBook) duration := time.Since(start).Nanoseconds()


_, _ = socket.Write([] byte ( "\r " + strconv.FormatInt(duration , 10) + "$")) } case io . EOF : fmt. Println("connection closed " ) wg. Done ( ) return default : fmt.Printf("error: %v\n" , e r r ) wg. Done ( ) return } } } func mainLoop() { var wg sync .WaitGroup wg.Add( 1 ) go tryRead(&wg) wg. Wait ( ) } func main ( ) { addr, _ := net.ResolveTCPAddr("tcp", "127.0.0.1:8080") socket , _ = net.DialTCP("tcp", nil , addr ) _ = socket.SetNoDelay( true ) defer socket.Close() runtime .GOMAXPROCS( 1 ) mainLoop ( ) }


A.3.3 Rust application source code

In this appendix the source code for the Rust application is presented.

Listing A.8: Sequence test application Rust use tokio::io::{ BufWriter , BufReader , AsyncWriteExt , AsyncReadExt}; use tokio::net:: TcpStream ; use tokio ::sync:: Mutex ; use std::sync:: Arc ; use tokio ::sync:: Notify; use tokio : : time :: Duration ; use std :: process :: e x i t ; use tokio :: net :: tcp ::{ OwnedWriteHalf, OwnedReadHalf}; use bytes :: BytesMut; use tokio : : io ;

#[macro_use] extern crate lazy_static ;

#[tokio:: main(flavor="current_thread")] async fn main()−> io :: Result < ( )> {

l e t socket = TcpStream :: connect("127.0.0.1:8080").await?; socket .set_nodelay( true )?;

l e t (rd, wr) = socket.into_split(); l e t wr = BufWriter : : new(wr) ; l e t rd = BufReader : : new( rd ) ;

// Write data in the background l e t wr = Arc : : new( Mutex ::new(wr)) ; tokio ::spawn(async move { event_loop(rd, wr).await.expect("event loop f a i l e d " ) ; } ) . await ? ;

p r i n t l n ! ( " e x i t i n g badly " ) ;

Ok(()) } async fn event_loop(mut rd : BufReader, wr : Arc>>)−> io :: Result < ( )> {

l e t mut start_read = f a l s e ; l e t mut incomplete_string = String ::from(""); loop { l e t mut buf = BytesMut:: with_capacity(128); l e t n = rd.read_buf(&mut buf).await?; l e t mut message = incomplete_string.clone() ; for i in 0 . . n { i f buf [ i ] as char == ’\r’ && message.is_empty() { start_read = true ;


} else i f buf [ i ] as char == ’\n’ && !message.is_empty() { start_read = f a l s e ; handle_message(message.clone() , wr.clone() ) ; incomplete_string = String ::from(""); message = String ::from(""); } else i f start_read { message.push(buf[ i ] as char ) } } i f !message.is_empty() { incomplete_string = message.clone() ; } tokio :: task ::yield_now() .await; } } s t r u c t RoutingData { sequence_id : String , received_data : Arc } s t r u c t RoutingDataList { d a t a _ l i s t : Vec> } impl RoutingDataList { fn new() −> RoutingDataList {RoutingDataList { data_list: Vec :: new ( ) } } fn i n s e r t (&mut self , data: RoutingData) { s e l f . data_list .push( Box ::new(data)) ; } } lazy_static! { s t a t i c r e f ROUTING_DATA_LST: std : : sync : : Mutex = std::sync:: Mutex ::new(RoutingDataList ::new() ) ; } async fn setup ( id : String , wr : Arc>>) { l e t n o t i f y = Arc ::new(Notify ::new() ) ; l e t rd = RoutingData { sequence_id : id . clone ( ) , received_data: notify.clone() , }; { l e t mut guard = ROUTING_DATA_LST. lock () .unwrap() ; guard. insert(rd); }


{ l e t mut writable = wr. lock ( ) . await ; l e t buf = String ::from("\r2") + &*id + "\n" ; writable. write_all(buf.as_ref()).await.expect("writing f a i l e d " ) ; writable.flush() .await.expect("flushing f a i l e d " ) ;

}

notify.notified() .await;

{ l e t mut writable = wr. lock ( ) . await ; l e t buf = String ::from("\r1") + &*id + "\n" ;

writable. write_all(buf.as_ref()).await.expect("writing f a i l e d " ) ; writable.flush() .await.expect("flushing f a i l e d " ) ;

} } async fn handle_response( id : String ){ l e t guard = ROUTING_DATA_LST. lock () .unwrap() ; for data in &guard.data_list { i f data.sequence_id == id { data.received_data.notify_one() ; return ; } } } async fn cp3 ( id : String , wr : Arc>>) { tokio : : time : : sleep ( Duration :: from_millis(10)).await; { l e t mut writable = wr. lock ( ) . await ; l e t buf = String ::from("\r1") + &*id + "\n" ; writable. write_all(buf.as_ref()).await.expect("writing f a i l e d " ) ; writable.flush() .await.expect("flushing f a i l e d " ) ; } } async fn teardown ( id : String , wr : Arc>>) { { l e t mut guard = ROUTING_DATA_LST. lock () .unwrap() ; for i in 0..guard.data_list.len() { i f guard. data_list .get(i).unwrap() .sequence_id == id { guard. data_list .remove(i); break ; }


} } { l e t mut writable = wr. lock ( ) . await ; l e t buf = String ::from("\r1") + &*id + "\n" ; writable. write_all(buf.as_ref()).await.expect("writing f a i l e d " ) ; writable.flush() .await.expect("flushing f a i l e d " ) ; } } fn handle_message(message : String , wr : Arc>>) { i f message == "exit" { p r i n t l n ! ("exiting"); e x i t ( 0 ) ; //Exit eventloop, figure it out } else i f message == "reset" { // println!("resetting"); } l e t id = String ::from(&message[1..]) ; match message.chars() .next() .unwrap() { ’1 ’ => { tokio ::spawn(async move { setup ( id .clone() .parse() .unwrap() , wr).await }); }, ’2’ => {tokio ::spawn(async move { handle_response( id .clone() .parse() .unwrap() ).await }); }, ’3 ’ =>{ tokio ::spawn(async move { cp3 ( id .clone() .parse() .unwrap() , wr).await }); }, ’9 ’ => { tokio ::spawn(async move { teardown ( id .clone() .parse() .unwrap() , wr).await }); },

_ => { } } }

Listing A.9: Latency test application Rust

use tokio::io::{self, AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

#[tokio::main(flavor = "current_thread")]
async fn main() -> io::Result<()> {
    let socket = TcpStream::connect("127.0.0.1:8080").await?;
    let (mut rd, mut wr) = io::split(socket);

    let mut buf = vec![0; 128];

    loop {
        // Read the next PING from the test driver.
        let n = rd.read(&mut buf).await?;

        if n == 0 {
            break;
        }

        // Reply immediately so the driver can measure the round-trip latency.
        wr.write(b"\rPONG\n").await?;
    }

    Ok(())
}

Listing A.10: Encode/decode test application Rust mod adressbook ; use tokio::io::{ self , AsyncReadExt, AsyncWriteExt , BufWriter , BufReader }; use tokio::net:: TcpStream ; use std : : s t r ; use tokio :: net :: tcp ::OwnedWriteHalf; use protobuf ::{ Message}; use c r a t e :: adressbook :: AddressBook; use bytes :: BytesMut; use std :: process :: e x i t ; use std : : time :: Instant ;

#[tokio:: main(flavor="current_thread")] async fn main() −> io:: Result < ( )> { l e t socket = TcpStream :: connect("127.0.0.1:8080").await?; socket .set_nodelay( true )?;

l e t (rd, wr) = socket.into_split(); l e t mut wr = BufWriter : : new(wr) ; l e t mut rd = BufReader : : new( rd ) ;

l e t mut start_read = f a l s e ;

loop { l e t mut buf = BytesMut:: with_capacity(65534); l e t n = rd.read_buf(&mut buf).await?; l e t mut message= vec! []; for i in 0 . . n { i f buf [ i ] as char == ’\ r ’ { start_read = true ; } else i f buf [ i ] as char == ’$’ && !message.is_empty() { start_read = f a l s e ; i f s t r :: from_utf8(&* message).unwrap() == "exit" {


e x i t ( 0 ) ; } handle_gpb(message , &mut wr) . await ; break ; } else i f start_read { message.push(buf[ i ]) } }

} } async fn handle_gpb(message : Vec , wr : &mut BufWriter < OwnedWriteHalf>) { l e t now = Instant : : now ( ) ; l e t address :AddressBook = Message :: parse_from_bytes(&message [1..]) .unwrap() ; l e t _encode = s t r :: from_utf8(&* address. write_to_bytes () .unwrap () ) .unwrap() ; l e t buf = String ::from("\r") + &*now.elapsed() .as_nanos() . to_string() + "$"; wr. write_all(buf.as_ref()).await.expect("writing f a i l e d " ) ; wr.flush() .await.expect("flushing f a i l e d " ) ;

}


A.3.4 GPB proto source code

In this appendix the source code for the GPB proto is presented.

Listing A.11: Prototype file for GPB encode/decode

syntax = "proto3";

import "google/protobuf/timestamp.proto";

package gpb;

message Person {
    string name = 1;
    int32 id = 2;  // Unique ID number for this person.
    string email = 3;

    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    message PhoneNumber {
        string number = 1;
        PhoneType type = 2;
    }

    repeated PhoneNumber phones = 4;
    google.protobuf.Timestamp last_updated = 5;
}

// Our address book file is just one of these.
message AddressBook {
    repeated Person people = 1;
}


A.3.5 C++ test driver source code

In this appendix the source code for the C++ test driver is presented.

Listing A.12: Sequence test driver C++ # include # include # include # include //sockaddr , socklen_t # include # include # include < s t d i o . h> # include "sequence.h" # include < p o l l . h> # include # include # include # include # include # include # include # include # include using namespace std ; i n t completed_sequences = 0; enum Procedure {SETUP = 1 , CP3 = 3 , CP4 = 4 , GPB = ’ 8 ’ , TEARDOWN = 9 } ; vector registered_sequences ; i n t iterations = 0; i n t number_of_sequences = 20; i n t curr_iteration = 1; vector times ; void setup_sequences() { for ( i n t i= 11; i <= number_of_sequences; i++) { vector sequence = {SETUP , CP3 , TEARDOWN} ; auto * seq = new Sequence ; seq−>sequence = sequence; seq −>id = i ; seq−>time_of_send = chrono :: high_resolution_clock ::now() ; registered_sequences .push_back( * seq ) ; } } void handle_response( s t r u c t bufferevent * bev, string response) {

char type = response.at(0); i n t seq_id = stoi(response.substr(1));


string temp; switch ( type ) { case ’ 1 ’ : // Message done for ( auto &seq: registered_sequences) { i f (seq.id == seq_id) { seq. current_message++; seq.running = f a l s e ; i f (seq.current_message == seq.sequence.size ()) { times .push_back(( double ) chrono : : duration_cast< chrono :: milliseconds >(chrono :: high_resolution_clock ::now() − seq.time_of_send).count()); completed_sequences++; seq . done = true ; } } } break ; case ’ 2 ’ : // Request data temp = ’\r’ + response + ’\n’; bufferevent_write(bev, temp.c_str() , temp.size()); break ; default : cout << "badness 1 0 0 : " << response << endl; } } void set_tcp_no_delay(evutil_socket_t fd) { i n t one = 1 ; setsockopt ( fd , IPPROTO_TCP , TCP_NODELAY, &one , sizeof one ) ; } void signal_cb(evutil_socket_t fd, short what , void * arg ) { auto * base = ( s t r u c t event_base * ) arg ; printf("stop\n");

event_base_loopexit(base , nullptr); } void handle_sequence_times() { std:: sort(times.begin() , times.end()); long max ; long min ; long median ; long Q1 ; long Q3 ; i f (times.size()%2 == 0) { // Even array size


median = times[times.size()/2]; Q1 = times[times.size()/4]; Q3 = times[times.size() * 3 / 4 ] ; } else { median = (times[floor(times.size()/2)] + times[ceil(times. size()/2)])/2; Q1 = (times[floor(times.size()/4)] + times[ceil(times.size ( ) /4) ] ) /2; Q3 = (times[floor(times.size() *3/4)] + times[ceil(times. s i z e ( ) *3/4) ] ) /2; } max = times[times.size() −1]; min = times[0];

cout << "max TIME : " << max << endl ; cout << "min TIME : " << min << endl; cout << "median TIME : " << median << endl; cout << " q1 TIME : " << Q1 << endl; cout << " q3 TIME : " << Q3 << endl; cout << "Number of times : " << times.size() << endl; } bool e x i t _ s e n t = f a l s e ; void send_cb(evutil_socket_t fd, short what , void * arg ) { auto * bev = ( s t r u c t bufferevent * ) arg ; i f (completed_sequences == registered_sequences.size()) { i f (exit_sent) { return ; } i f (curr_iteration == iterations) { handle_sequence_times () ; auto * base = bufferevent_get_base(bev); string exit = "\rexit\n"; bufferevent_write(bev, exit.c_str() , exit.size()); e x i t _ s e n t = true ; s t r u c t timeval one_sec{}; one_sec.tv_sec = 0; one_sec.tv_usec = 500; cout << "Server exiting ...\n"; event_base_loopexit(base , &one_sec) ; } else { registered_sequences. clear () ; setup_sequences() ; completed_sequences = 0; curr_iteration++; string reset = "\rreset\n"; bufferevent_write(bev, reset.c_str() , reset.size()); }

}

for (Sequence &seq: registered_sequences) { i f ((!seq.running) && (!seq.done)) {


string s1 = to_string(seq.sequence[seq.current_message ]); string s2 = to_string(seq.id); string s = ’\r’ + s1 + s2 + ’\n’; bufferevent_write(bev, s.c_str() , s.size()); seq.running = true ; } } } string not_finished_string = ""; bool start_read = f a l s e ; void echo_read_cb( s t r u c t bufferevent * bev , void * ctx ) { char tmp [ 1 2 8 ] ; s i z e _ t n ; vector strings_read; for (;;){ string message = not_finished_string; n = bufferevent_read(bev, tmp, sizeof ( tmp ) ) ; i f ( n <= 0) { break ; / * No more data. * / }

for ( i n t i = 0; i

i f ((isalnum(tmp[i])) && (start_read)) { message += tmp[i ]; } } not_finished_string = message; } for ( const string& message: strings_read) { handle_response(bev, message) ; } strings_read.clear() ; } void echo_event_cb( s t r u c t bufferevent * bev , short events , void * ctx ) { s t r u c t evbuffer * output = bufferevent_get_output(bev); size_t remain = evbuffer_get_length(output); i f ( events & BEV_EVENT_ERROR) { perror("Error from bufferevent");


} i f ( events & (BEV_EVENT_EOF | BEV_EVENT_ERROR) ) { cout << "completed: " << completed_sequences << "\n"; cout << " a l l boys : " << registered_sequences.size() << "\n" ; printf("closing , remain %zd\n" , remain) ; bufferevent_free(bev) ; } } void accept_conn_cb( s t r u c t evconnlistener * l i s t e n e r , evutil_socket_t fd, s t r u c t sockaddr * address , i n t socklen , void * ctx ) { s t r u c t event * send_evtimeout ; s t r u c t timeval timeout{}; evutil_make_socket_nonblocking(fd) ; / * We got a new connection! Set up a bufferevent for it. * / s t r u c t event_base * base = evconnlistener_get_base(listener); s t r u c t bufferevent * bev = bufferevent_socket_new( base , fd , BEV_OPT_CLOSE_ON_FREE) ; set_tcp_no_delay(fd) ; bufferevent_setcb(bev, echo_read_cb , nullptr , echo_event_cb , n u l l p t r ) ;

bufferevent_enable ( bev , EV_READ|EV_WRITE) ;

timeout.tv_sec = 0; timeout.tv_usec = 1000; send_evtimeout = event_new(base , −1, EV_PERSIST, send_cb, bev); event_add(send_evtimeout , &timeout) ; } i n t main ( i n t argc , char ** argv ) { i f ( argc != 2) { cout << " no arguments provided!" << endl; e x i t ( 0 ) ; } iterations = stoi(argv[1]); cout << "Number of sequences : " << number_of_sequences − 10 << endl ; setup_sequences() ; s t r u c t event_base * base ; s t r u c t evconnlistener * l i s t e n e r ; s t r u c t sockaddr_in sin {}; s t r u c t event * evstop ;

i n t port = 8080;

signal(SIGPIPE, SIG_IGN) ;


base = event_base_new() ; i f ( ! base ) { puts("Couldn’t open event base " ) ; return 1 ; }

evstop = evsignal_new(base, SIGHUP, signal_cb , base); evsignal_add(evstop , NULL) ;

/ * Clear the sockaddr before using it, in case there are extra ** platform−specific fields that can mess us up. * / memset(&sin , 0, sizeof ( sin ) ) ; / * This is an INET address * / sin.sin_family = AF_INET; / * Listen on 0.0.0.0 * / sin.sin_addr.s_addr = htonl(0); / * Listen on the given port. * / sin.sin_port = htons(port); listener = evconnlistener_new_bind(base , accept_conn_cb , nullptr , LEV_OPT_CLOSE_ON_FREE| LEV_OPT_REUSEABLE, −1 , ( s t r u c t sockaddr * )&sin , sizeof ( sin ) ) ; i f (!listener) { perror("Couldn’t c r e a t e listener"); return 1 ; }

event_base_dispatch(base) ;

evconnlistener_free(listener); event_free(evstop) ; event_base_free(base) ; return 0 ; }

Listing A.13: Latency test driver C++ # include # include # include # include //sockaddr , socklen_t # include # include # include # include # include # include # include # include # include # include # include


# include # include # include using namespace std ; auto tot_time_start = chrono:: high_resolution_clock ::now() ; auto tot_time_end = chrono:: high_resolution_clock ::now() ; auto time_of_send = chrono:: high_resolution_clock ::now() ; auto time_of_receive = chrono:: high_resolution_clock ::now() ; vector times ; i n t ping_pong_iterations = 0; void handle_ping_pong_times() { cout << "Total time : " << ( double ) chrono :: duration_cast (tot_time_end − tot_time_start).count() << " ms" << endl ; std:: sort(times.begin() , times.end()); long max ; long min ; long median ; long Q1 ; long Q3 ; i f (times.size()%2 == 0) { // Even array size median = times[times.size()/2]; Q1 = times[times.size()/4]; Q3 = times[times.size() * 3 / 4 ] ; } else { median = (times[floor(times.size()/2)] + times[ceil(times. size()/2)])/2; Q1 = (times[floor(times.size()/4)] + times[ceil(times.size ( ) /4) ] ) /2; Q3 = (times[floor(times.size() *3/4)] + times[ceil(times. s i z e ( ) *3/4) ] ) /2; } max = times[times.size() −1]; min = times[0];

cout << "max TIME : " << max << endl ; cout << "min TIME : " << min << endl; cout << "median TIME : " << median << endl; cout << " q1 TIME : " << Q1 << endl; cout << " q3 TIME : " << Q3 << endl; cout << "Number of times : " << times.size() << endl; }

void ping_pong_cb( s t r u c t bufferevent * bev , void * ctx ) { char tmp [ 6 4 ] ; s i z e _ t n ; i f (ping_pong_iterations == 0) { tot_time_end = chrono:: high_resolution_clock ::now() ;


handle_ping_pong_times () ; auto * base = bufferevent_get_base(bev); string exit = "\rexit\n"; bufferevent_write(bev, exit.c_str() , exit.size()); s t r u c t timeval one_sec{}; one_sec.tv_sec = 0; one_sec.tv_usec = 100; cout << "Server exiting ...\n"; event_base_loopexit(base , &one_sec) ; } for (;;){ string message; n = bufferevent_read(bev, tmp, sizeof ( tmp ) ) ; i f ( n <= 0) { break ; / * No more data. * / } for ( i n t i = 0; i (time_of_receive − time_of_send).count()); ping_pong_iterations −−; string ping = "\rPING\n"; bufferevent_write(bev, ping.c_str() , ping.size()); time_of_send = chrono:: high_resolution_clock ::now() ; break ; } } } void set_tcp_no_delay(evutil_socket_t fd) { i n t one = 1 ; setsockopt ( fd , IPPROTO_TCP , TCP_NODELAY, &one , sizeof one ) ; } void signal_cb(evutil_socket_t fd, short what , void * arg ) { auto * base = ( s t r u c t event_base * ) arg ; printf("stop\n");

event_base_loopexit(base , nullptr); } void echo_event_cb( s t r u c t bufferevent * bev , short events , void * ctx )


{ s t r u c t evbuffer * output = bufferevent_get_output(bev); size_t remain = evbuffer_get_length(output); i f ( events & BEV_EVENT_ERROR) { perror("Error from bufferevent"); } i f ( events & (BEV_EVENT_EOF | BEV_EVENT_ERROR) ) { printf("closing , remain %zd\n" , remain) ; bufferevent_free(bev) ; } }

void accept_conn_cb_ping_pong( s t r u c t evconnlistener * l i s t e n e r , evutil_socket_t fd, s t r u c t sockaddr * address , i n t socklen , void * ctx ) { cout << "New connection\n" ; evutil_make_socket_nonblocking(fd) ; / * We got a new connection! Set up a bufferevent for it. * / s t r u c t event_base * base = evconnlistener_get_base(listener); s t r u c t bufferevent * bev = bufferevent_socket_new( base , fd , BEV_OPT_CLOSE_ON_FREE) ; set_tcp_no_delay(fd) ; bufferevent_setcb(bev, ping_pong_cb, nullptr , echo_event_cb , n u l l p t r ) ; bufferevent_enable ( bev , EV_READ|EV_WRITE) ; bufferevent_write(bev, "\rPING\n" , sizeof ("\rPING\n") ) ; time_of_send = chrono:: high_resolution_clock ::now() ; tot_time_start = chrono:: high_resolution_clock ::now() ;

} i n t main ( i n t argc , char ** argv ) { i f ( argc != 2) { cout << " no arguments provided!" << endl; e x i t ( 0 ) ; } ping_pong_iterations = stoi(argv[1]) ; s t r u c t event_base * base ; s t r u c t evconnlistener * l i s t e n e r ; s t r u c t sockaddr_in sin {}; s t r u c t event * evstop ;

i n t port = 8080;

signal(SIGPIPE, SIG_IGN) ;

base = event_base_new() ; i f ( ! base ) { puts("Couldn’t open event base " ) ; return 1 ;


}

evstop = evsignal_new(base, SIGHUP, signal_cb , base); evsignal_add(evstop , NULL) ;

/ * Clear the sockaddr before using it, in case there are extra ** platform−specific fields that can mess us up. * / memset(&sin , 0, sizeof ( sin ) ) ; / * This is an INET address * / sin.sin_family = AF_INET; / * Listen on 0.0.0.0 * / sin.sin_addr.s_addr = htonl(0); / * Listen on the given port. * / sin.sin_port = htons(port); listener = evconnlistener_new_bind(base , accept_conn_cb_ping_pong , nullptr , LEV_OPT_CLOSE_ON_FREE| LEV_OPT_REUSEABLE, −1 , ( s t r u c t sockaddr * )&sin , sizeof ( sin ) ) ; i f (!listener) { perror("Couldn’t c r e a t e listener"); return 1 ; }

    event_base_dispatch(base);

    evconnlistener_free(listener);
    event_free(evstop);
    event_base_free(base);
    return 0;
}
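The latency driver above expects a peer that answers every "\rPING\n" frame and stops once it sees the final "\rexit\n" message. For reference only, the following is a minimal blocking-socket client sketch that could exercise the driver locally. It is not part of the thesis code: the reply text "\rPONG\n", the use of port 8080 on localhost and the plain read/write loop are assumptions made for illustration.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main()
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    // Disable Nagle's algorithm, as the server does, so small frames are not delayed.
    int one = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);             // assumed: same port as the driver
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (sockaddr *)&addr, sizeof addr) < 0) { perror("connect"); return 1; }

    char buf[64];
    const char pong[] = "\rPONG\n";
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf - 1);
        if (n <= 0) break;                   // connection closed or read error
        buf[n] = '\0';
        if (strstr(buf, "exit")) break;      // server signalled end of the run
        write(fd, pong, sizeof pong - 1);    // answer each PING with a PONG frame
    }
    close(fd);
    return 0;
}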

Listing A.14: Encode/decode test driver C++

// The include directives lost their header names in the extracted listing;
// the set below is assumed from what the code uses.
#include <event2/bufferevent.h>
#include <event2/buffer.h>
#include <event2/listener.h>
#include <event2/event.h>
#include <sys/socket.h>   // sockaddr, socklen_t
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <chrono>
#include <cmath>

# include "adressbook.pb.h" using namespace std ; vector times ; i n t ping_pong_iterations = 0; void handle_ping_pong_times() { std:: sort(times.begin() , times.end()); long max ; long min ; long median ; long Q1 ; long Q3 ; i f (times.size()%2 == 0) { // Even array size median = times[times.size()/2]; Q1 = times[times.size()/4]; Q3 = times[times.size() * 3 / 4 ] ; } else { median = (times[floor(times.size()/2)] + times[ceil(times. size()/2)])/2; Q1 = (times[floor(times.size()/4)] + times[ceil(times.size ( ) /4) ] ) /2; Q3 = (times[floor(times.size() *3/4)] + times[ceil(times. s i z e ( ) *3/4) ] ) /2; } max = times[times.size() −1]; min = times[0];

cout << "max TIME : " << max << endl ; cout << "min TIME : " << min << endl; cout << "median TIME : " << median << endl; cout << " q1 TIME : " << Q1 << endl; cout << " q3 TIME : " << Q3 << endl; cout << "Number of times : " << times.size() << endl; } void send_gpb ( s t r u c t bufferevent * bev ) { gpb:: AddressBook addressBook; gpb:: Person * anton = addressBook.add_people() ; anton−>set_id (1) ; anton−>set_name( "Anton Oro " ) ; anton−>set_email("anton. oro@ericsson .com") ; gpb :: Person :: PhoneNumber *num = anton−>add_phones() ; num−>set_number( "1337" ) ; num−>set_type (gpb :: Person : :WORK) ; gpb:: Person * rasmus = addressBook.add_people() ; rasmus−>set_id (2) ; rasmus−>set_name( "Rasmus Karlback") ; rasmus−>set_email("rasmus. karlback@ericsson .com") ; gpb :: Person :: PhoneNumber *num1 = rasmus−>add_phones() ; num1−>set_number( "1337" ) ;

    num1->set_type(gpb::Person::WORK);
    string data = "\r8" + addressBook.SerializeAsString() + '$';
    bufferevent_write(bev, data.c_str(), data.size());
}

bool exit_sent = false;

void gpb_read_cb(struct bufferevent *bev, void *ctx)
{
    char tmp[1024];
    size_t n;
    if (exit_sent) {
        return;
    }
    if (ping_pong_iterations == 0) {
        auto *base = bufferevent_get_base(bev);
        string exit = "\rexit$";
        bufferevent_write(bev, exit.c_str(), exit.size());
        exit_sent = true;
        struct timeval one_sec{};
        one_sec.tv_sec = 0;
        one_sec.tv_usec = 100;
        cout << "Server exiting ...\n";
        handle_ping_pong_times();
        event_base_loopexit(base, &one_sec);
    }
    for (;;) {
        string message;
        n = bufferevent_read(bev, tmp, sizeof(tmp));
        if (n <= 0) {
            break; /* No more data. */
        }
        bool start_read = false;
        // NOTE: the body of this loop was partly lost in the extracted
        // listing; it is assumed to copy the characters between the '\r'
        // and '$' framing bytes into `message`.
        for (int i = 0; i < n; i++) {
            if (tmp[i] == '\r') {
                start_read = true;
            } else if (tmp[i] == '$') {
                start_read = false;
            } else if (start_read) {
                message += tmp[i];
            }
        }
        if (message.length() > 0) {
            times.push_back(stoi(message));
            ping_pong_iterations--;
            send_gpb(bev);
            break;
        }
    }
}

void set_tcp_no_delay(evutil_socket_t fd)
{
    int one = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

void signal_cb(evutil_socket_t fd, short what, void *arg)
{
    auto *base = (struct event_base *)arg;
    printf("stop\n");
    event_base_loopexit(base, nullptr);
}

void echo_event_cb(struct bufferevent *bev, short events, void *ctx)
{
    struct evbuffer *output = bufferevent_get_output(bev);
    size_t remain = evbuffer_get_length(output);
    if (events & BEV_EVENT_ERROR) {
        perror("Error from bufferevent");
    }
    if (events & (BEV_EVENT_EOF | BEV_EVENT_ERROR)) {
        printf("closing, remain %zd\n", remain);
        bufferevent_free(bev);
    }
}

void accept_conn_cb_gpb(struct evconnlistener *listener, evutil_socket_t fd,
                        struct sockaddr *address, int socklen, void *ctx)
{
    cout << "New connection\n";
    evutil_make_socket_nonblocking(fd);
    /* We got a new connection! Set up a bufferevent for it. */
    struct event_base *base = evconnlistener_get_base(listener);
    struct bufferevent *bev = bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
    set_tcp_no_delay(fd);
    bufferevent_setcb(bev, gpb_read_cb, nullptr, echo_event_cb, nullptr);
    bufferevent_enable(bev, EV_READ | EV_WRITE);
    send_gpb(bev);
    ping_pong_iterations--;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        cout << "no arguments provided!" << endl;
        exit(0);
    }
    ping_pong_iterations = stoi(argv[1]);
    struct event_base *base;
    struct evconnlistener *listener;
    struct sockaddr_in sin{};
    struct event *evstop;

    int port = 8080;

    signal(SIGPIPE, SIG_IGN);

    base = event_base_new();
    if (!base) {
        puts("Couldn't open event base");
        return 1;
    }

    evstop = evsignal_new(base, SIGHUP, signal_cb, base);
    evsignal_add(evstop, NULL);

    /* Clear the sockaddr before using it, in case there are extra
    ** platform-specific fields that can mess us up. */
    memset(&sin, 0, sizeof(sin));
    /* This is an INET address */
    sin.sin_family = AF_INET;
    /* Listen on 0.0.0.0 */
    sin.sin_addr.s_addr = htonl(0);
    /* Listen on the given port. */
    sin.sin_port = htons(port);

    listener = evconnlistener_new_bind(base, accept_conn_cb_gpb, nullptr,
        LEV_OPT_CLOSE_ON_FREE | LEV_OPT_REUSEABLE, -1,
        (struct sockaddr *)&sin, sizeof(sin));
    if (!listener) {
        perror("Couldn't create listener");
        return 1;
    }

    event_base_dispatch(base);

    evconnlistener_free(listener);
    event_free(evstop);
    event_base_free(base);

    return 0;
}
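Listing A.14 only shows the encoding side of the Google Protocol Buffers test (send_gpb builds and serializes the address book). As an illustration of the corresponding decode step, the sketch below performs an encode/decode round trip with the same generated header "adressbook.pb.h". It is not part of the thesis code; the program structure and the printed output format are assumptions made purely for illustration.

#include <iostream>
#include <string>
#include "adressbook.pb.h"

int main()
{
    // Encode: build a small address book, similar to send_gpb() above.
    gpb::AddressBook book;
    gpb::Person *p = book.add_people();
    p->set_id(1);
    p->set_name("Anton Oro");
    p->set_email("anton.oro@ericsson.com");

    std::string payload = book.SerializeAsString();

    // Decode: parse the serialized bytes back into a fresh message and print it.
    gpb::AddressBook decoded;
    if (!decoded.ParseFromString(payload)) {
        std::cerr << "Failed to parse AddressBook\n";
        return 1;
    }
    for (const gpb::Person &person : decoded.people()) {
        std::cout << person.id() << ": " << person.name()
                  << " <" << person.email() << ">\n";
    }
    return 0;
}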
