Design and Evaluation of Ethernet-based E/E-Architectures for Latency- and Safety-critical Applications

Entwurf und Evaluierung Ethernet-basierter E/E-Architekturen für latenz- und sicherheitskritische Anwendungen

Submitted to the Faculty of Engineering of Friedrich-Alexander-Universität Erlangen-Nürnberg for the degree of Dr.-Ing.

submitted by

Fedor Smirnov from Udomlja

Approved as a dissertation by the Faculty of Engineering of Friedrich-Alexander-Universität Erlangen-Nürnberg

Date of the oral examination: 27.09.19

Chair of the doctoral committee: Prof. Dr.-Ing. Reinhard Lerch

Reviewers: Prof. Dr.-Ing. Jürgen Teich, Prof. Dr.-Ing. Michael Glaß, Prof. Dr. phil. nat. Sebastian Steinhorst

Abstract

In recent years, there has been a tremendous number of innovations in car electronics. New infotainment and driver assistance features introduce an ever-increasing amount of data that has to be transmitted via the in-car communication network. With its huge bandwidth advantage over other communication protocols, Ethernet offers an interesting opportunity to meet the increasing bandwidth and latency requirements of modern car communication networks and is generally seen as the most promising solution for future automotive systems. The strict real-time and reliability requirements of modern Advanced Driver Assistance Systems (ADAS) may be addressed by protocol extensions like Ethernet Time-Sensitive Networking (TSN), which offer new mechanisms such as time-triggered traffic or seamlessly redundant message transmission. The complexity of automotive communication networks, regarding both the size of the networks and the number of configuration parameters like the sending period or the priority of messages, will continue to grow and already necessitates design automation today. Yet, existing approaches for automated network design cannot be applied to the design of automotive Ethernet (TSN) networks, as they do not account for their special features such as transmission schedules, virtually isolated subnetworks, redundant transmissions, and, in particular, Ethernet’s lack of real-time and reliability guarantees. In this context, this thesis, for the first time, presents a system-level design approach for automotive Ethernet networks where the multi-dimensional search space created by the many (oftentimes non-linear and possibly conflicting) design objectives from the automotive domain is explored within a Design Space Exploration (DSE) to find not one, but multiple high-quality designs.
This approach enables an automated design and evaluation of Ethernet-based electric/electronic (E/E) architectures, in particular for latency- and safety-critical applications, and is based on contributions from the areas of formal analysis, constraint-based restriction of the search space, and the injection of problem-specific knowledge into the optimization. During network design, the evaluation of design decisions plays an important role, especially for the timing and the reliability of message transmissions within the network. Existing approaches for timing analysis provide safe timing guarantees for strict-priority Ethernet networks, but are not applicable to networks with TSN-specific features like time-triggered traffic or transmission preemption. To cope with these novel network features, this thesis extends existing timing analysis approaches so that the timing of the scheduled traffic and, in particular, the interference imposed on unscheduled traffic are considered. The timing analysis is, moreover, complemented by preprocessing techniques that significantly reduce the time required for the analysis of each network design. While a lot of work can be found on the formal analysis of permanent hardware errors and their impact on the system reliability, the influence of transient errors has, so far, attracted less attention from the scientific community. This thesis contributes to this area by proposing a formal approach for the analysis of transient errors which is specifically tailored to the error-detection mechanism used in automotive networks. The proposed approach combines timing and reliability analysis and demonstrates that temporal redundancy can be used as an effective means to improve transmission reliability. Especially for problems like the optimization of automotive networks, where the search space is huge and the evaluation of a single solution can take considerable amounts of time, excluding infeasible solutions from the evaluation has been shown to significantly accelerate the optimization process. Based on SAT-Decoding, an existing approach for the hybrid optimization of constrained problems, this work contributes constraint systems that formally describe Ethernet networks with overlap-free transmission schedules, message routes that are created with respect to a given Virtual Local Area Network (VLAN) partitioning, and redundant routings without communication loops, respectively. These constraint sets enable an automatic creation of network designs that are valid with respect to application-specific requirements, which is what makes a design optimization of these networks possible in the first place.
Over the years, design and analysis experts have built up a large pool of experience. With the third area of contributions, this work proposes novel means of making parts of this problem-specific knowledge accessible to the optimizer. The thesis contributes Artificial Gene Design (AGD), a novel approach that extends SAT-Decoding and enables the optimizer to directly adjust problem characteristics with a high relevance for the design objectives. The application of AGD is demonstrated using the optimization of redundant routings with respect to the transmission reliability as an example. Furthermore, this thesis shows how topology-specific knowledge can be considered during the formulation of routing constraints to significantly reduce the number of encoding variables, resulting in a smaller search space and a faster convergence towards the (Pareto-)optimal solutions.

Acknowledgments

I would like to express my sincere gratitude to Prof. Dr.-Ing. Jürgen Teich for his constant support, his trust, his encouragement to pursue my research interests, and for providing an excellent research environment. I also want to thank Prof. Dr. phil. nat. Sebastian Steinhorst for agreeing to be the co-examiner of this work, and Prof. Dr.-Ing. Felix Freiling and Prof. Dr.-Ing. Sebastian Sattler for being part of the exam committee. I am grateful to all my colleagues at the Chair for Hardware/Software Co-Design for the great ambiance, help, and intriguing discussions. A large part of this thesis resulted from a doctoral research project in cooperation with the AUDI corporation within its INI.FAU initiative. I am very grateful to the AUDI corporation for the opportunity to work on an industry-relevant research topic and to participate in the research activities of a large corporation. In particular, I would like to thank Felix Reimann for his support, the challenging discussions, and the opportunity to contribute research ideas of my own. Last but not least, I would like to thank Michael Glaß, whose support, advice, and criticism, both during his time as the group leader of the SDA group and after his move to the University of Ulm, were of great importance for this dissertation and for me personally.


Contents

1 Introduction
1.1 Ethernet in Automotive
1.2 The Challenge of Design Complexity
1.3 Contributions and Scope
1.3.1 Timing and Reliability Analysis of Ethernet TSN Networks
1.3.2 Constraining the Design Space
1.3.3 Efficient Design Space Exploration

2 Fundamentals
2.1 Ethernet Technology
2.1.1 Origins and Characteristics
2.1.2 Switched Strict-Priority Ethernet in Automotive
2.1.3 Ethernet TSN
2.2 Multi-Objective Optimization of Embedded Systems
2.2.1 System Model
2.2.2 SAT-Decoding
2.2.3 Synthesis Constraints
2.2.4 Multi-Objective Optimization

3 Formal Timing and Reliability Analysis of Ethernet TSN Networks
3.1 Timing Analysis of Mixed-Criticality TSN Networks
3.1.1 Timing Analysis using the Busy-Period Approach
3.1.2 Timing Analysis of Scheduled TSN Networks
3.1.3 Experimental Results
3.1.4 Related Work
3.2 Reliability Analysis of Ethernet Networks under Transient Transmission Errors
3.2.1 Reliability Model
3.2.2 Reliability Calculation
3.2.3 Reliability/Timing Correlation
3.2.4 Experimental Results
3.2.5 Related Work
3.3 Conclusion

4 Constraints Characterizing Valid Message Routings and Schedules for Ethernet TSN Networks
4.1 Joint Constraint Generation for Routing and Scheduling
4.1.1 System Model
4.1.2 Constraint Formulation
4.1.3 Experimental Results
4.1.4 Related Work
4.2 Constraints for a Message Routing Respecting the VLAN Partitioning
4.2.1 System Model
4.2.2 Routing Constraints
4.2.3 Experimental Results
4.2.4 Related Work
4.3 Constraints for Redundant Message Routing
4.3.1 Introduction
4.3.2 The Link Encoding Approach
4.3.3 The Preprocessing Approach
4.3.4 Related Work
4.4 Conclusions

5 Injection of Objective- and Topology-Specific Knowledge for a Faster Optimization Convergence and a Higher Result Quality
5.1 Artificial Gene Design
5.1.1 Introduction
5.1.2 Formal Reliability Analysis of Ethernet Networks
5.1.3 Optimization Challenges
5.1.4 The Genetic Gap
5.1.5 Artificial Gene Design
5.1.6 Applying AGD to the Reliability Optimization
5.1.7 Experimental Results
5.1.8 Related Work
5.2 Variety-Aware Routing Encoding
5.2.1 Proxy Relations and Proxy Areas
5.2.2 Identification of Proxy Areas
5.2.3 Adaptation of Existing Constraint Systems
5.2.4 Experimental Results
5.2.5 Related Work
5.3 Conclusions

6 Conclusion and Future Work
6.1 Summary
6.2 Directions of Future Work

German Part

Bibliography

Author’s Own Publications

List of Symbols

Acronyms

1 Introduction

On the first of October 1908, the Ford Motor Company started the production of the Ford Model T. This car, which is generally regarded as the first mass-produced automobile that was both reliable and affordable for the middle class, heralded the automotive era, introducing a hitherto unknown level of mobility and freedom to mankind [Cly55]. Since then, cars have undergone a remarkable technological evolution. Most innovations during the first half of the 20th century, like the assembly line, the all-steel body, or the automatic transmission, mainly brought improvements to the production process and new mechanical features. Later, however, the focus of innovation shifted towards electronic functionalities like electronic fuel injection, sophisticated infotainment systems, and the first systems offering driver assistance [CS03]. While the first cars were little more than an engine on wheels with a small number of components, modern cars constitute highly complex heterogeneous systems. Nowadays, even the simplest actions, like opening or closing a window, require a coordinated functional interaction between sensors, actuators, and processing elements, the so-called Electronic Control Units (ECUs) [Rib03]. As these components are both physically and logically distributed within the car, the communication network connecting them has become a vital part of the automobile. With modern wiring harnesses weighing around 23 kilograms and having a length of about 1.2 kilometers [Int15a], this communication network is in itself a highly complex component. Its considerable effect on the weight and the monetary cost of the car, two of the main non-functional design requirements, is, however, not the main reason why the design of the communication network has become a key step in automobile development.
The real reason for this trend is the fact that in a heterogeneous system, where applications like the airbag control or the driver assistance are distributed, the communication network itself becomes safety-critical. Innovative applications for autonomous driving and Advanced Driver Assistance Systems (ADAS) combine safety-criticality with high bandwidth requirements and even further increase the importance of automotive communication networks.


1.1 Ethernet in Automotive

Until the beginning of the 1990s, automotive communication was implemented using point-to-point connections [NS13]. However, with the growing number of ECUs, this solution soon reached its limits, as it caused problems in terms of the cost, weight, complexity, and reliability of the wiring harness. This challenge was then addressed by using shared communication resources, so-called buses. While sharing the same resource for the transmission of messages between many different pairs of end nodes reduces cost and complexity, a set of rules governing the bus access, a so-called communication protocol, is necessary to ensure reliable communication. Bus-based in-vehicle communication is the preferred solution today, and many communication protocols such as Local Interconnect Network (LIN) [Int16], Controller Area Network (CAN) [Int15b], Media Oriented Systems Transport (MOST) [Grz07], or FlexRay [RSM+10] were developed for the automotive domain. However, these protocols offer relatively small bandwidths and are, therefore, unlikely to be used for the implementation of future ADAS applications, which make driving decisions based on Gigabytes of image data that have to be transmitted through the vehicle network. As a consequence of the need to satisfy this hunger for bandwidth while remaining within the strict cost and weight restrictions of the automotive industry, the focus has shifted towards Ethernet, a communication protocol which was developed in an entirely different application domain.

1.2 The Challenge of Design Complexity

Throughout the evolution of automotive communication networks, both the number of communicating devices and the number of messages sent through the network grew constantly. To keep the network design manageable, automotive networks were divided into domains such as engine control, power train, or infotainment. Devices that exchange high numbers of messages were put into the same domain, while the number of messages transmitted between different domains was kept as small as possible. This method made it possible to apply different requirements to different domains (for example, the security requirements of the infotainment domain can be less strict than those of the engine control domain). Furthermore, separating the network into domains reduced the design complexity by distributing the design task among multiple teams of domain experts, each designing the subnetwork for one domain. For a long time, this approach kept the design complexity low enough for the communication network to be designed without additional aid, by applying the experience of the designers together with a set of design rules gathered over the years. However, the key trends in automotive showcase the limitations of manual network design. Not only are the networks rapidly growing in both the number of nodes and the number of messages; the usage of more sophisticated communication protocols with more complex arbitration rules additionally introduces complex dependencies and makes it harder to evaluate the effects of each design decision with respect to a growing number of heterogeneous and oftentimes conflicting design objectives. The trend towards a backbone network, where the different network domains are connected and exchange messages, further complicates the design process. Designing the network by hand, thus, becomes time-consuming and error-prone, and is increasingly unlikely to deliver optimal solutions. Computer-aided approaches for the automation of the design process must be introduced to meet the growing complexity of automotive networks and equip automotive designers for the next era of automotive communication.

1.3 Contributions and Scope

After the theoretical fundamentals are provided in Chapter 2, the contributions of this thesis are detailed in Chapters 3–5. They enable a fully automated and efficient design of automotive networks for both (a) strict-priority automotive Ethernet [Ins14] and (b) Ethernet networks which integrate the novel mechanisms introduced by Ethernet Time-Sensitive Networking (TSN) [Ins15; Ins16; Ins17], with message timing and transmission reliability as two of many design objectives. To enable an automated multi-objective optimization of the network design, this thesis presents (a) approaches for evaluating the reliability and the message transmission timing of individual network configurations, (b) an efficient way to exclude infeasible network designs from consideration, and (c) means to efficiently guide the optimization process. This section briefly outlines the individual contributions.

1.3.1 Timing and Reliability Analysis of Ethernet TSN Networks

The introduction of new safety-critical applications like X-by-wire or ADAS significantly changes the design process of automotive communication networks. Each of these applications requires several sensors, actuators, and ECUs which are separated, both physically and logically. Consequently, safety guarantees must be provided not only for the processing units, but also for the communication network transmitting the safety-critical messages between them. To provide these guarantees, and also to evaluate design decisions concerning, e.g., message routing or prioritization in early design stages, an analysis of both the transmission timing and the transmission reliability is needed. While simulation and testing methods are widely used to analyze the timing characteristics of a system, these methods cannot provide firm timing guarantees, as their results describe observed (typically average-case) behavior instead of the worst case. In automotive, using formal analysis to provide timing guarantees is common practice, and there are several approaches to calculate a safe upper bound for the response time of messages transmitted through communication networks based on strict-priority switched Ethernet [Ins14] or Ethernet Audio Video Bridging (AVB) [Ins09]. However, these approaches cannot provide a holistic timing analysis for traffic in Ethernet TSN networks, as they do not consider the new mechanisms introduced by this standard, in particular the enhancements for scheduled traffic [Ins15]. As a remedy, this work extends the state-of-the-art timing analysis of Ethernet networks [DRE12] so that the timing of the scheduled traffic and, in particular, the interference imposed on the unscheduled traffic are considered. The proposed approach, which is described in [SGR+17a], is integrated into a state-of-the-art timing analysis. Additionally, the work at hand proposes a preprocessing technique that provides a significant speedup for the timing analysis of networks with scheduled traffic, and shows that this technique enables an integration of the timing analysis approach into a Design Space Exploration (DSE), where it is used to optimize the system schedule and other network parameters with respect to the message timing as one of many design objectives. Besides the analysis of the message timing, analyzing the transmission reliability is equally important during the design of safety-critical automotive applications. In the literature, a lot of work can be found on the reliability analysis of systems under the influence of permanent hardware errors [GLR+08; LGT09]. However, the analysis of transient transmission errors and their effect on the system reliability has, so far, not received much attention from the scientific community. To address this issue, the work at hand proposes a formal method for the analysis of the impact that transient transmission errors have on the reliability of automotive communication networks, see [SGR+16].
This method is specifically targeted at the error detection mechanism used in these networks, where the receiver of each message has to receive at least one correct instance of the message during an application-specific time interval, the so-called Diagnostic Test Interval (DTI). To quantify the reliability of systems relying on this time-oriented error detection mechanism, the proposed approach combines reliability analysis with timing analysis techniques. It uses the DTI and the Bit Error Rate (BER) as inputs and provides a safe bound for the transmission reliability of messages. In addition to the analysis, this work proposes to adjust the sending rates of messages as a cost-efficient means to increase the system reliability. Finding the optimal sending rate for safety-critical messages is, however, a non-trivial optimization task: while sending a message more often increases its transmission reliability, it also increases the network load and the interference that the message imposes on other network messages. The proposed approach is, therefore, integrated into a DSE to enable an automated system design where the sending rate is optimized with respect to both the transmission reliability and the message timing. A detailed description of both analysis approaches is provided in Chapter 3. The chapter also contains experiments that highlight the speedup gained by the proposed timing analysis technique and the trade-offs between transmission reliability and message timing that occur during the optimization of the message sending rate.
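The interplay between BER, DTI, and sending rate can be illustrated with a deliberately simplified Python sketch. This is not the analysis of Chapter 3. It assumes (1) independent bit errors, so a frame of n bits is corrupted with probability 1 − (1 − BER)^n; (2) strictly periodic sending with period T; and (3) a known response-time jitter J, for example taken from a timing analysis. Under these assumptions, at least ⌊(DTI − J)/T⌋ transmissions arrive in every window of length DTI. All function names are hypothetical.

```python
import math


def frame_error_probability(ber: float, frame_bits: int) -> float:
    """Probability that a frame of `frame_bits` bits contains at least one bit error.

    Assumes independent bit errors with rate `ber` (0 <= ber < 1).
    1 - (1 - ber)**n is evaluated via log1p/expm1 to stay accurate for tiny BERs.
    """
    return -math.expm1(frame_bits * math.log1p(-ber))


def guaranteed_receptions(dti: float, period: float, jitter: float) -> int:
    """Lower bound on the number of periodic transmissions arriving in ANY window
    of length `dti` if each arrival may vary by up to `jitter` (same time unit)."""
    if dti < jitter:
        return 0
    return math.floor((dti - jitter) / period)


def dti_failure_bound(ber: float, frame_bits: int,
                      dti: float, period: float, jitter: float) -> float:
    """Upper bound on the probability that no error-free frame arrives within a DTI,
    i.e., that all guaranteed transmissions in the window are corrupted."""
    k = guaranteed_receptions(dti, period, jitter)
    return frame_error_probability(ber, frame_bits) ** k  # k == 0 -> 1.0 (no guarantee)


if __name__ == "__main__":
    bits = 1522 * 8  # maximum-size VLAN-tagged frame
    for period_ms in (10, 5):
        bound = dti_failure_bound(1e-12, bits, dti=100, period=period_ms, jitter=2)
        print(f"period {period_ms} ms -> failure bound per DTI: {bound:.3e}")
```

Halving the period from 10 ms to 5 ms raises the guaranteed number of receptions per 100 ms DTI from 9 to 19, which sharply lowers the bound. The price, not modeled here, is the extra load and interference discussed above.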


1.3.2 Constraining the Design Space

Besides the ability to analyze a given network configuration, the ability to optimize the network architecture itself by generating new configurations in terms of the network topology, the message routing, and the configuration of the network resources is vital for the automation of the design process. To enable an efficient design process, the number of network configurations that have to be analyzed in order to find a set of optimal configurations w.r.t. several design objectives, such as monetary cost, reliability, or real-time capability, should be as small as possible. Especially for problems where many combinations of design decisions lead to infeasible solutions, pruning the design space by excluding these infeasible network configurations significantly improves the optimization efficiency. Several approaches found in the literature [JPT+10; AGS+13; LSF14] offer solutions for the task of exploring feasible network configurations in terms of allocation, binding, and routing decisions. However, the specific requirements of the automotive area and the mechanisms of the Ethernet TSN standard introduce new requirements for a feasible network, so that the state-of-the-art approaches for the exclusion of infeasible solutions are no longer applicable. The ports of the switches in TSN networks contain so-called transmission gates. These make it possible to configure port schedules and thereby define time intervals during which only the scheduled high-priority traffic can be transmitted. In combination with a global schedule considering all port schedules in the system, this mechanism enables an interference-free and, therefore, deterministic transmission of safety-critical messages. However, generating a schedule that guarantees that the time intervals configured for the transmission of the messages do not overlap is a challenging task.
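Checking a given schedule for overlaps is simple; generating one is the hard part. The following Python sketch only checks. It uses a hypothetical, simplified data model in which every scheduled message on one egress port has a periodic gate window (offset, length, period) in integer time units. Two windows conflict if any of their occurrences within the hyperperiod (the least common multiple of all periods) intersect. Real IEEE 802.1Qbv gate control lists are configured per traffic class, which this abstraction ignores.

```python
from itertools import combinations
from math import lcm  # Python 3.9+
from typing import Iterator, List, NamedTuple, Tuple


class GateWindow(NamedTuple):
    """Periodic time window in which a port's gate is open for one scheduled message."""
    name: str
    offset: int  # start of the window within its period (e.g., in µs)
    length: int  # duration the gate stays open
    period: int  # repetition period of the window


def occurrences(w: GateWindow, hyperperiod: int) -> Iterator[Tuple[int, int]]:
    """Yield the half-open intervals [start, end) of `w` within one hyperperiod."""
    if w.offset + w.length > w.period:
        raise ValueError(f"window {w.name} does not fit into its period")
    for k in range(hyperperiod // w.period):
        start = k * w.period + w.offset
        yield start, start + w.length


def find_overlaps(windows: List[GateWindow]) -> List[Tuple[str, str]]:
    """Return all pairs of windows on the same egress port whose occurrences overlap."""
    hyperperiod = lcm(*(w.period for w in windows))
    conflicts = []
    for a, b in combinations(windows, 2):
        if any(s1 < e2 and s2 < e1
               for s1, e1 in occurrences(a, hyperperiod)
               for s2, e2 in occurrences(b, hyperperiod)):
            conflicts.append((a.name, b.name))
    return conflicts
```

For example, a hypothetical "camera" window at offset 360 with length 20 and period 1000 collides with a "steer" window at offset 100 with length 50 and period 250, because the second occurrence of "steer" spans [350, 400).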
State-of-the-art approaches for the generation of schedules in distributed systems [Ste10; SAW+14] address the routing and the scheduling problems separately and ignore the distinct interrelations between routing and scheduling decisions. This may result in severe disadvantages during a DSE if, e.g., the routing generated in the first design step leads to an unschedulable system. As a remedy, this work presents a problem encoding that enables a joint generation of the system routing and the schedule, see [SGR+17b]. Experimental results highlight the scalability of the proposed approach and give evidence that it enables an effective schedule optimization. With the introduction of ADAS, which have complete control over the vehicle behavior, security is becoming one of the main issues during the design of automotive communication networks. To increase the security of message transmission, automotive networks are virtually separated into multiple subnetworks, each of which is denoted as a so-called Virtual Local Area Network (VLAN). VLANs were originally developed for local and metropolitan area networks, where they are used to reduce the amount of broadcast traffic. In automotive networks, the goal of the VLAN partitioning is to improve the network security by creating subnetworks with different security requirements. During the design of the automotive network, the VLAN configuration has to be optimized simultaneously with the network topology and the message routings with respect to the many design objectives specific to the automotive domain. State-of-the-art approaches for the VLAN partitioning of the network, such as [RHK99], are, consequently, not applicable, as they target given static networks and consider the single design objective of minimizing the amount of network traffic. To address this issue, the work at hand proposes a constraint set that encodes a message routing which is valid with respect to the VLAN assignment, see [SRT+18a]. While the complex constraint formulation is completely transparent to the designer, it is easily extensible by additional routing constraints and can be used to optimize both the routing and the VLAN configuration of the network. To provide the reliability guarantees necessary for the certification of safety-critical systems, a certain degree of redundancy becomes mandatory. In a communication network with redundant routings for safety-critical messages, the failure of individual links does not always lead to the failure of the implemented application, making the overall system significantly more reliable. However, network topologies that allow redundant routings naturally offer many more design options. The introduction of redundant routings, consequently, makes the network design more complex, further increasing the need for design automation. To enable an automated design of highly reliable automotive communication networks, this work proposes two approaches for the encoding of valid and possibly redundant routings, see [SRT+18b]. One of these approaches relies on an extensive preprocessing step that finds all routing possibilities, while the other approach is based on a constraint set describing valid routings in terms of the activation of individual links.
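The condition that a route must respect the VLAN partitioning can be made concrete with a small Python sketch. It assumes a simplified, hypothetical model in which every link is configured with the set of VLAN IDs it may carry. It then runs a breadth-first search for a shortest route that uses only links admitting the message's VLAN. In the thesis, this validity condition is expressed as constraints for the optimizer rather than solved by a dedicated search, so the sketch shows only what "valid" means.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]  # undirected link between two network nodes


def vlan_route(links: Dict[Link, Set[int]], vlan: int,
               src: str, dst: str) -> Optional[List[str]]:
    """Shortest route (fewest hops) from src to dst using only links that carry `vlan`.

    `links` maps each undirected link to the set of VLAN IDs configured on it.
    Returns the list of nodes on the route, or None if the VLAN does not connect them.
    """
    adjacency: Dict[str, List[str]] = {}
    for (u, v), vlans in links.items():
        if vlan in vlans:  # links outside the message's VLAN are invisible to it
            adjacency.setdefault(u, []).append(v)
            adjacency.setdefault(v, []).append(u)

    predecessor: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:  # breadth-first search
        node = queue.popleft()
        if node == dst:
            route = []
            while node is not None:
                route.append(node)
                node = predecessor[node]
            return route[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in predecessor:
                predecessor[neighbor] = node
                queue.append(neighbor)
    return None
```

In a topology where the direct link SW1–SW2 carries only VLAN 10, a VLAN 20 message from ECU1 to ECU2 has to take the detour SW1–SW3–SW2. A message in a VLAN that is configured on no link gets no route at all.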
A detailed description of the constraint systems mentioned above along with a presentation of experimental results is given in Chapter 4.
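The idea behind a preprocessing-based encoding of redundant routings can be pictured as follows. First enumerate all loop-free routes between a sender and a receiver, then look for pairs of routes that share no link, so that no single link failure interrupts both. The Python sketch below is a simplified illustration with a hypothetical adjacency-list model and link-disjointness as the redundancy criterion. It is not the encoding of [SRT+18b]. Its exhaustive enumeration also shows why such preprocessing becomes expensive on large, meshed topologies.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Adjacency = Dict[str, List[str]]


def loop_free_routes(adj: Adjacency, src: str, dst: str) -> Iterator[List[str]]:
    """Enumerate every route from src to dst that visits no node twice (DFS)."""
    stack = [[src]]
    while stack:
        route = stack.pop()
        node = route[-1]
        if node == dst:
            yield route
            continue
        for neighbor in adj[node]:
            if neighbor not in route:  # forbid communication loops
                stack.append(route + [neighbor])


def links_of(route: List[str]) -> Set[FrozenSet[str]]:
    """Undirected links traversed by a route."""
    return {frozenset(hop) for hop in zip(route, route[1:])}


def link_disjoint_pairs(adj: Adjacency, src: str,
                        dst: str) -> List[Tuple[List[str], List[str]]]:
    """All pairs of routes that share no link, i.e., survive any single link failure."""
    routes = list(loop_free_routes(adj, src, dst))
    return [(a, b) for a, b in combinations(routes, 2)
            if not links_of(a) & links_of(b)]
```

In a four-switch ring with one diagonal link, SW1 and SW4 are connected by four loop-free routes, but only one pair of them (SW1–SW2–SW4 and SW1–SW3–SW4) is link-disjoint.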

1.3.3 Efficient Design Space Exploration

With the ongoing growth of both the size of the communication networks and the number of different configuration parameters, the complexity of optimizing the communication network grows steadily. To enable an optimization of complex communication networks, where a major portion of the design space consists of invalid solutions, the work at hand uses an approach referred to as SAT-Decoding [LGH+07]. SAT-Decoding, which is described in detail in Chapter 2, drastically reduces the number of solutions that have to be evaluated by excluding the infeasible solutions, so that no problem-specific knowledge is required for the DSE. However, in the case of a huge design space, the SAT-Decoding approach may not suffice for an effective optimization, especially for problems with a complicated relation between the design decisions and the design objectives. As a remedy, the work at hand proposes to extend SAT-Decoding by a technique referred to as Artificial Gene Design (AGD), see [SRT+18b]. This technique makes problem-specific knowledge accessible to the Evolutionary Algorithm (EA) used as optimizer by formulating additional constraints and, thereby, introducing additional genes carrying problem-specific information. We demonstrate the application and the effectiveness of the AGD approach by applying it to optimize the reliability of redundant transmissions. To address the scalability problems of constraint-based approaches, we introduce a novel technique for the encoding of valid routings [SPG+19]. There, a lightweight preprocessing algorithm is used to gather topology-specific knowledge and identify so-called proxy areas, in which there is exactly one possible route between any pair of nodes. Contrary to areas with a variety of different routes, proxy areas do not offer any potential for routing optimization. In the proposed approach, an unnecessary increase of the search space is prevented by excluding proxy areas from the routing encoding. The presented experimental results show that exploiting the knowledge about proxy areas provides a significant speedup and yields results of higher quality compared to variety-unaware encoding approaches.

The rest of this work is outlined as follows: Chapter 2 presents the theoretical foundations of this work. Chapters 3–5 detail the contributions of this work: the novel approaches for the analysis of automotive Ethernet (TSN) networks are described in Chapter 3. Chapter 4 presents the constraint systems that were developed for an automatic network optimization. The novel optimization concepts are discussed and evaluated in Chapter 5. Finally, Chapter 6 provides a summary of this thesis and offers an outlook on possible directions of future research.
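The core idea of SAT-Decoding can be sketched in a few lines. The genotype handled by the evolutionary algorithm does not encode a solution directly. Instead, it holds a priority and a preferred phase for every Boolean variable, and a decoder turns it into an assignment that satisfies all constraints. In the original formulation [LGH+07], decoding is delegated to a SAT solver whose branching order is steered by the genotype. The toy Python version below replaces the solver with a brute-force satisfiability check, which is only viable for a handful of variables. The "exactly one route" constraint used in the usage note is a hypothetical example.

```python
from itertools import product
from typing import Dict, List, Sequence

Clause = List[int]  # DIMACS-style literals: +i means x_i is true, -i means x_i is false


def satisfiable(clauses: List[Clause], partial: Dict[int, bool], n_vars: int) -> bool:
    """Brute-force test whether `partial` can be extended to satisfy all clauses.

    Exponential in the number of free variables: a stand-in for a real SAT solver.
    """
    free = [v for v in range(1, n_vars + 1) if v not in partial]
    for values in product((False, True), repeat=len(free)):
        full = {**partial, **dict(zip(free, values))}
        if all(any(full[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False


def sat_decode(clauses: List[Clause], n_vars: int,
               priorities: Sequence[float], phases: Sequence[bool]) -> Dict[int, bool]:
    """Decode a genotype (one priority and one preferred phase per variable) into an
    assignment that satisfies all clauses, provided the clauses are satisfiable.

    Variables are fixed in order of decreasing priority. Each takes its preferred
    phase unless that makes the constraints unsatisfiable; then the opposite phase,
    which must keep them satisfiable, is used instead.
    """
    assignment: Dict[int, bool] = {}
    for var in sorted(range(1, n_vars + 1), key=lambda v: -priorities[v - 1]):
        assignment[var] = phases[var - 1]
        if not satisfiable(clauses, assignment, n_vars):
            assignment[var] = not phases[var - 1]
    return assignment
```

With three candidate routes x1 to x3 and the clauses [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]] (exactly one route), every genotype decodes to a valid choice. With priorities [0.2, 0.9, 0.5] and all phases True, route x2 is selected. The optimizer can therefore vary the genotype freely without ever producing an infeasible phenotype.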
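The following Python sketch shows the kind of topology-specific knowledge behind proxy areas, using repeated leaf pruning as a simple heuristic. Nodes removed by pruning lie in tree-shaped appendices of the topology. Within such an appendix, and from it to the point where it attaches to the rest of the network, there is exactly one loop-free route, so no routing decision needs to be encoded there. This is a hypothetical illustration, not the proxy-area identification algorithm of [SPG+19].

```python
from typing import Dict, List, Set


def tree_shaped_nodes(adj: Dict[str, List[str]]) -> Set[str]:
    """Repeatedly prune leaves (nodes with at most one remaining neighbor).

    Pruned nodes lie in tree-shaped appendices of the undirected topology: inside
    such an appendix, and from it to its attachment point, every route is unique.
    The nodes that remain form the 2-core, the meshed part with alternative routes.
    """
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    leaves = [node for node, d in degree.items() if d <= 1]
    pruned: Set[str] = set()
    while leaves:
        leaf = leaves.pop()
        if leaf in pruned:
            continue
        pruned.add(leaf)
        for neighbor in adj[leaf]:
            if neighbor not in pruned:
                degree[neighbor] -= 1
                if degree[neighbor] == 1:  # neighbor has just become a leaf
                    leaves.append(neighbor)
    return pruned
```

Consider a triangle of switches SW1–SW2–SW3 with a switch SW4 attached to SW3, two ECUs attached to SW4, and a third ECU attached to SW1. Pruning leaves only the triangle, which is the part of the network that actually offers routing alternatives.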


2 Fundamentals

This chapter provides the fundamentals that serve as a basis for the explanations in the later chapters of this work. Section 2.1 focuses on details of the Ethernet technology, introduces the core concepts of both strict-priority Ethernet and the novel mechanisms introduced by the Time-Sensitive Networking (TSN) standard, and provides an overview of the configuration parameters that define the search space during the design of an Ethernet network. Section 2.2 then introduces the optimization approach that is used to explore this search space, with a special emphasis on SAT-Decoding and the system model.

2.1 Ethernet Technology

This section provides an overview of the characteristics of the Ethernet protocol that are relevant to the work at hand, as well as of the terms that will be used in later chapters.

2.1.1 Origins and Characteristics

Ethernet was invented on May 22, 1973 at the Palo Alto Research Center [Spu00]. After being commercially introduced in 1980, Ethernet was first standardized in 1983 as IEEE 802.3 [Ins14]. This family of networking technologies was the first high-speed technology for the connection of personal computers in local-area, metropolitan-area, and wide-area networks. Since its invention, the Ethernet technology has seen an unprecedented success story. The explosive growth in the usage of information-sharing technologies such as the World Wide Web resulted in a global spread of Ethernet. In fact, practically every personal computer in the world nowadays communicates using Ethernet. In current language, the word Ethernet is used as a general term that describes a wide range of networking technologies, and it is frequently applied both to the communication protocol and to Ethernet-specific hardware components such as cabling, connectors, or switches. Over the years, several flavors of Ethernet such as Profinet [PM08], Avionics Full-Duplex Switched Ethernet (AFDX) [FS13], Ethernet Audio Video Bridging (AVB) [Ins09], or Ethernet TSN [Ins15; Ins16; Ins17] were developed for usage in certain application domains. They provide domain-specific extensions and differ in nuances of the protocol, the used hardware, or the arbitration schemes. However, certain traits are typical for the Ethernet technology, namely the frame format, the provided bandwidths, and the possibility to define virtually separated subnetworks.

Frame Format. As in protocols like CAN [Int15b] or FlexRay [RSM+10], messages in Ethernet networks are transmitted in packets. Large messages are divided into multiple packets, which are transmitted individually through the network and reassembled by the receiver of the message. On the physical layer, each packet consists of a 7 Byte preamble followed by a 1 Byte start-of-frame delimiter. The remainder of the packet consists of the actual Ethernet frame, with a size between 64 and 1,522 Bytes, and the 12 Bytes of the interpacket gap. The structure of the Ethernet frame is relevant on the data link layer. The frame is formed by the following fields:

- the fields specifying the source and the destination of the frame (6 Bytes each);
- the 802.1Q tag (often referred to as the Virtual Local Area Network (VLAN) tag), containing information about the message's VLAN and priority;
- the actual payload;
- the 32-bit check sum used for a cyclic redundancy check.

The payload of an Ethernet frame, which also contains the headers and trailers of higher-layer protocols such as the User Datagram Protocol (UDP), ranges between 46 and 1,500 Bytes.
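To make the per-packet overhead concrete, the following Python sketch computes how many bytes a message occupies on the link and how long its transmission takes. It is a minimal sketch that rests on the following assumptions:

- a VLAN-tagged frame with a 2-byte EtherType field (not listed above, but part of the standard header);
- padding to the 64-byte minimum frame size;
- splitting of large messages into frames that each carry at most 1,500 payload bytes.

All function names are hypothetical.

```python
import math

# Physical-layer overhead per packet (bytes), as described in the text.
PREAMBLE = 7
START_OF_FRAME_DELIMITER = 1
INTERPACKET_GAP = 12

# Data-link-layer frame fields (bytes). The 2-byte EtherType is an
# assumption of this sketch; the other sizes follow the text.
DESTINATION = 6
SOURCE = 6
VLAN_TAG = 4
ETHERTYPE = 2
FRAME_CHECK_SEQUENCE = 4

MIN_FRAME = 64
MAX_PAYLOAD = 1500


def frame_bytes(payload: int) -> int:
    """Size of one tagged Ethernet frame, padded to the minimum frame size."""
    if not 0 <= payload <= MAX_PAYLOAD:
        raise ValueError("payload must fit into a single frame")
    header = DESTINATION + SOURCE + VLAN_TAG + ETHERTYPE
    return max(MIN_FRAME, header + payload + FRAME_CHECK_SEQUENCE)


def wire_bytes(payload: int) -> int:
    """Bytes occupied on the link by one packet, including preamble, SFD, and gap."""
    return PREAMBLE + START_OF_FRAME_DELIMITER + frame_bytes(payload) + INTERPACKET_GAP


def message_wire_bytes(message_size: int) -> int:
    """Split a message into frames of at most MAX_PAYLOAD bytes and sum their sizes."""
    n_frames = max(1, math.ceil(message_size / MAX_PAYLOAD))
    last = message_size - MAX_PAYLOAD * (n_frames - 1)
    sizes = [MAX_PAYLOAD] * (n_frames - 1) + [last]
    return sum(wire_bytes(size) for size in sizes)


def transmission_time_us(message_size: int, bandwidth_bps: float) -> float:
    """Time (in microseconds) needed to put the whole message on a link."""
    return message_wire_bytes(message_size) * 8 / bandwidth_bps * 1e6


print(wire_bytes(1500))                   # 1542
print(message_wire_bytes(4000))           # 4126
print(transmission_time_us(1500, 100e6))  # about 123.36 us at 100 Mbit/s
```

The example illustrates that even a full-size frame carries a fixed overhead of 42 bytes on the link. For a payload of a few bytes, the padded minimum frame dominates the link usage.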

Bandwidth. The first version of Ethernet was based on a shared coaxial cable and offered a bandwidth of 2.94 Megabits per second. As it became the dominant networking technology, Ethernet was continuously extended to support longer link distances and new transmission media such as twisted-pair cables or optical fiber. The main focus of the Ethernet extensions, however, was increasing the bandwidth. This was necessary to keep up with the world's demand for networking bandwidth, which doubles every 18 months. In consumer electronics, it is nowadays rare to find anything besides the three predominant Ethernet speeds of 10 Megabits, 100 Megabits, and 1 Gigabit per second. Specialized environments like high-performance computing or data centers, however, already operate with Ethernet speeds of 10 Gigabits per second and are expected to start using 100 Gigabits per second in the near future. Current efforts to develop standards for 400 Gigabits or even one Terabit per second indicate that the increase of Ethernet's bandwidth is far from over [All18].

Virtual Local Area Networks. The Ethernet protocol can be used to configure so-called VLANs within the network. Each individual VLAN is a delimited area of the overall network. In a network with configured VLANs, the VLAN tag of an Ethernet frame can be used to define certain rules for this frame in relation to the VLAN areas. For example, it is possible to make a certain area of the network accessible only to frames with a certain tag. It is even possible to virtually isolate individual areas of the (physically still connected) network from each other by completely preventing frame exchange between them. Please refer to Chapter 4.2 for a more detailed description of the VLAN mechanism and its application in automotive Ethernet networks.
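The isolating effect of VLANs can be illustrated with a small reachability computation. The sketch below rests on three assumptions:

- The topology is the one of Fig. 2.1: G0-G2 attached to S0, G3-G5 attached to S1, and S0 and S1 connected by l3. The assignment of l4-l6 to G3-G5 is read from the figure layout and is therefore an assumption.
- The per-link VLAN membership table is hypothetical.
- A frame is forwarded over a link only if that link belongs to the frame's VLAN.

Real switches configure membership per port, so this link-level model is a simplification.

```python
from collections import deque

# Links of the double-star topology of Fig. 2.1 (assumed assignment).
LINKS = {
    "l0": ("G0", "S0"), "l1": ("G1", "S0"), "l2": ("G2", "S0"),
    "l3": ("S0", "S1"),
    "l4": ("G3", "S1"), "l5": ("G4", "S1"), "l6": ("G5", "S1"),
}

# Hypothetical VLAN membership: VLAN 10 spans the whole network,
# VLAN 20 is confined to the area around S0.
VLAN_MEMBERSHIP = {
    10: set(LINKS),
    20: {"l0", "l1", "l2"},
}


def reachable(source: str, vlan: int) -> set:
    """Nodes that a frame tagged with `vlan` can reach from `source` (BFS)."""
    allowed = VLAN_MEMBERSHIP.get(vlan, set())
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for name, (a, b) in LINKS.items():
            if name not in allowed or node not in (a, b):
                continue
            neighbor = b if node == a else a
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


print(sorted(reachable("G0", 10)))  # all eight nodes of the network
print(sorted(reachable("G0", 20)))  # ['G0', 'G1', 'G2', 'S0']
```

Although all nodes remain physically connected, frames tagged with VLAN 20 never leave the area around S0.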

2.1.2 Switched Strict-Priority Ethernet in Automotive

Automotive Ethernet - Advantages. Ethernet's wide distribution results in high production quantities and low costs for Ethernet hardware such as switches, cabling, or network connectors. Especially since the standardization of unshielded twisted-pair cabling for automotive use [Por18], Ethernet has become a serious competitor to the established automotive communication protocols. With 100 Megabits and 1 Gigabit per second standardized for automotive use, Ethernet offers much higher bandwidth than the established protocols:

- Controller Area Network (CAN): 1 Megabit per second;
- FlexRay: 10 Megabits per second;
- Media Oriented Systems Transport (MOST): 150 Megabits per second, requiring optical fiber.

Ethernet operates with significantly larger frames and, thus, offers much better bandwidth utilization when transmitting large messages. The disadvantage of transmitting small messages with the relatively large Ethernet header can be mitigated by packing multiple smaller messages into one Ethernet frame. All in all, Ethernet's reasonable hardware costs, combined with its ability to transmit large amounts of data, make it a promising communication solution for future automotive applications.
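The effect of packing can be quantified with a rough calculation. The following sketch compares two options for sending a set of small messages:

- one frame per message;
- packing the messages greedily (first-fit decreasing, a standard bin-packing heuristic) into as few frames as possible.

The sketch makes three assumptions:

- a fixed per-frame overhead of 42 bytes on the link (preamble, start-of-frame delimiter, VLAN-tagged header with a 2-byte EtherType, check sum, and interpacket gap);
- a 64-byte minimum frame;
- no per-message sub-header, although a real packing protocol would need one.

The function names are hypothetical.

```python
from typing import List

PER_FRAME_OVERHEAD = 42  # 7 + 1 + 18 + 4 + 12 bytes (assumption, see text)
MIN_WIRE_BYTES = 84      # 64-byte minimum frame + preamble, SFD, and gap
MAX_PAYLOAD = 1500


def wire_bytes(payload: int) -> int:
    """Link bytes of one frame carrying `payload` bytes."""
    return max(MIN_WIRE_BYTES, payload + PER_FRAME_OVERHEAD)


def pack_first_fit_decreasing(sizes: List[int]) -> List[List[int]]:
    """Place messages (largest first) into the first frame that still has room."""
    frames: List[List[int]] = []
    for size in sorted(sizes, reverse=True):
        if size > MAX_PAYLOAD:
            raise ValueError("message must be fragmented first")
        for frame in frames:
            if sum(frame) + size <= MAX_PAYLOAD:
                frame.append(size)
                break
        else:  # no existing frame has room: open a new one
            frames.append([size])
    return frames


def unpacked_bytes(sizes: List[int]) -> int:
    return sum(wire_bytes(s) for s in sizes)


def packed_bytes(sizes: List[int]) -> int:
    return sum(wire_bytes(sum(f)) for f in pack_first_fit_decreasing(sizes))


messages = [8] * 20  # e.g., twenty 8-byte control signals
print(unpacked_bytes(messages))  # 1680
print(packed_bytes(messages))    # 202
```

Packing saves bandwidth at the price of coupling the messages' sending times, which is itself a design decision.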

Automotive Ethernet - Challenges. However, the introduction of Ethernet in the automotive domain also brings new design challenges. In contrast to applications for information exchange between personal computers, automotive applications come with extremely strict real-time and reliability requirements. A user may occasionally accept having to wait for a website to load, or even having to reload it. A message carrying the information concerning the possible activation of an airbag, however, must always be transmitted within the specified time interval with near-certain reliability.

Switched Ethernet Networks. In the automotive domain, Ethernet is used in switched networks consisting of end nodes and Ethernet switches. The end nodes are typically the Electronic Control Units (ECUs) that act as the source(s) and the destination(s) of the messages transmitted through the network. The Ethernet switches neither create nor consume messages and are used for transmission only. The end nodes and the switches are connected by Ethernet links. An example of a switched Ethernet network is depicted in Fig. 2.1. This network consists of the end nodes G0-G5, the Ethernet switches S0 and S1, and the links l0-l6, connected in a double-star topology.

Figure 2.1: Illustration of an example topology of a switched Ethernet network.

In contrast to classical bus-based communication as used, e.g., in CAN networks, each Ethernet link is a dedicated point-to-point connection between two nodes. For example, link l3 in Fig. 2.1 is used exclusively for the transmission of messages between S0 and S1. Each link is a twisted-pair cable connected to the network interfaces of the link's end points. Each network interface consists of an input port and an output port, used to receive data from and transmit data over the connected link, respectively. Together, they realize a full-duplex connection between the link's end points. Each end node is connected to the communication network by its network interface. Ethernet switches, on the other hand, have multiple network interfaces that are connected by the switch fabric. A switch can therefore forward messages from any of its input ports to any of its output ports. A schematic illustration of the components of switch S0 is given in Fig. 2.2. Although each ECU in Fig. 2.1 is connected to just one switch, the switches can forward messages from each ECU to every other ECU in the network.

Switched networks lie between point-to-point networks on the one hand and classic bus-based solutions on the other. They require far fewer links than networks in which all nodes are connected point-to-point and are, consequently, cheaper and lighter. For example, the small network in Fig. 2.1 consists of only 7 point-to-point connections instead of the 15 connections that would be necessary if all six ECUs were mutually connected. Compared to bus-based technologies with one shared medium, switched networks have the advantage of much smaller collision domains. If all ECUs in Fig. 2.1 were connected to one shared transmission medium, each message transmission could collide with any other message transmission, which introduces the need for an arbitration scheme.

Figure 2.2: Schematic illustration of switch S0 from Fig. 2.1. The connections of the links l0-l3 are realized by the network interfaces N0-N3. Each network interface consists of an input and an output port.

Contention between messages is, however, still possible in a switched network. Consider, for example, the case where both G0 and G2 transmit a message to G3. The two messages can pass through the input ports I0 and I2 and through the switch fabric at the same time. They cannot, however, be transmitted over the output port O3 simultaneously. Hence, one of the messages is transmitted first, while the other message is stored in the port's buffer queue until the first transmission is finished. Messages sharing the same output port can, consequently, interfere with each other. This degrades the timing and, in cases where messages are dropped because of a port buffer overflow, even the reliability of the message transmission. The arbitration scheme dictates the order in which messages using the same output port are transmitted. It therefore has an important impact on the performance of the network and is one of the defining features of an Ethernet technology. Ethernet implementations like Profinet [PM08], Ethernet AVB [Ins09], or Ethernet TSN [Ins15; Ins16; Ins17] (presented in Section 2.1.3) differ in their arbitration schemes. The arbitration scheme currently used in the automotive domain is the so-called strict-priority arbitration. In networks using this scheme, a priority level between 0 (lowest) and 7 (highest) is assigned to each message by writing it into its VLAN tag. When two messages sharing the same output port are both ready to be sent, the message with the higher priority is transmitted. Under this arbitration scheme, messages with higher priorities (typically safety-critical control messages) experience less interference and shorter transmission times.

Figure 2.3: Schematic illustration of the output port O3.

Figure 2.3 illustrates an output port of an Ethernet switch using strict-priority arbitration. The output port consists of a classifier, 8 first-in-first-out (FIFO) queues for messages with different priorities, and the strict-priority arbiter. When a message enters the output port, the classifier reads its VLAN tag to determine its priority and writes the message into the corresponding queue. The strict-priority arbiter then picks the oldest message from the non-empty queue with the highest priority and transmits it over the link. Strict-priority arbitration does not use any form of transmission preemption, so a frame is always transmitted as a whole.
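The structure of such an output port (classifier, eight FIFO queues, strict-priority arbiter) can be mirrored in a few lines of Python. This is a minimal model that ignores timing and buffer limits, and the class and method names are hypothetical. `arbitrate()` is meant to be called only once the previous frame has been sent completely, which reflects the non-preemptive behavior described above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Frame:
    name: str
    priority: int  # 0 (lowest) .. 7 (highest), taken from the VLAN tag


class StrictPriorityPort:
    """Output port with a classifier, 8 FIFO queues, and a strict-priority arbiter."""

    def __init__(self) -> None:
        self.queues: List[Deque[Frame]] = [deque() for _ in range(8)]

    def classify(self, frame: Frame) -> None:
        """Classifier: append the frame to the FIFO queue of its priority."""
        self.queues[frame.priority].append(frame)

    def arbitrate(self) -> Optional[Frame]:
        """Arbiter: oldest frame of the highest-priority non-empty queue."""
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None


port = StrictPriorityPort()
for frame in [Frame("a", 2), Frame("b", 7), Frame("c", 2), Frame("d", 5)]:
    port.classify(frame)

order = []
while (frame := port.arbitrate()) is not None:
    order.append(frame.name)
print(order)  # ['b', 'd', 'a', 'c']
```

Frames of equal priority ("a" and "c") keep their arrival order, while higher priorities overtake them.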

2.1.3 Ethernet TSN

Current automotive real-time and reliability requirements can be modeled and verified during the design of Ethernet networks using formal analysis techniques. There is a concern, however, that the existing Ethernet implementations may not be able to cope with the strictness of future requirements, considering trends like Advanced Driver Assistance Systems (ADAS). These requirements are driven by the growing data volume, shrinking response times, and the higher number of safety-critical messages. As a remedy, the Time-Sensitive Networking task group has developed Ethernet TSN, a family of Ethernet standards with extensions specifically aimed at real-time and reliability challenges in the automotive and industrial control domains. Among other things, Ethernet TSN introduces mechanisms for time-triggered transmission of messages [Ins15], frame preemption [Ins16], and seamlessly redundant transmission of messages through the network [Ins17]. These concepts are explained in the following.


Figure 2.4: An example port schedule of a TSN Time-Aware Shaper (TAS), showing the active queues over time t within one port schedule hyper-period (HP). In this schedule, dedicated time slots are created for the transmission of critical messages with the priorities 6 (orange) and 7 (red). Messages with lower priorities can only be transmitted within the remaining green slots according to the strict-priority arbitration scheme.

Time-Triggered Message Transmission. Strict-priority arbitration has the advantage of a relatively easy network configuration and reconfiguration, as the arbitration is performed automatically inside the switches. However, the exact timing behavior of interfering traffic is unknown. The transmission of real-time-critical messages therefore has to be planned for the maximum amount of interference that can possibly be encountered in the switches. Consequently, the short response times dictated by strict deadlines can often be achieved only by substantially over-dimensioning the communication bandwidth.

To address this problem, the TSN standard [Ins15] introduces the so-called Time-Aware Shaper (TAS). In a TSN network, each priority queue of a switch output port is equipped with a transmission gate and can only transmit messages during the time slices in which the gate is open. The state of each gate is dictated by the so-called port schedule. The TAS thus implements a time-triggered transmission of messages, similar to Time-Division Multiple Access (TDMA) communication protocols like FlexRay [RSM+10]. In the work at hand, we assume that the TAS is used to grant critical messages of certain traffic classes exclusive access to the outgoing link during certain time slots of the port schedule. During these slots, only messages from the critical traffic classes are eligible for transmission. Messages from other traffic classes are transmitted during the remaining fraction of the schedule hyper-period, following the strict-priority arbitration scheme. An example of a port schedule following this approach is illustrated in Fig. 2.4.
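A port schedule of the kind shown in Fig. 2.4 can be represented as a cyclic list of time slots, each listing the priority queues whose gates are open. The sketch below is a simplified model with hypothetical slot lengths (in microseconds). It answers only which queues may transmit at a given time and ignores guard bands and frame lengths. Within the open queues, strict-priority arbitration applies as before.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical port schedule: (slot length in us, open gates). Priorities 7
# and 6 get exclusive slots; priorities 0-5 share the remaining time.
PORT_SCHEDULE: List[Tuple[float, FrozenSet[int]]] = [
    (20.0, frozenset({7})),
    (15.0, frozenset({6})),
    (65.0, frozenset(range(6))),
]
HYPER_PERIOD = sum(length for length, _ in PORT_SCHEDULE)


def open_gates(t: float) -> FrozenSet[int]:
    """Priorities whose transmission gate is open at time t (in us)."""
    offset = t % HYPER_PERIOD  # the schedule repeats every hyper-period
    for length, gates in PORT_SCHEDULE:
        if offset < length:
            return gates
        offset -= length
    raise AssertionError("unreachable: offset is smaller than the hyper-period")


def eligible_priority(t: float, backlog: List[int]) -> int:
    """Strict-priority arbitration restricted to the gates open at time t.
    Returns -1 if no backlogged queue may transmit."""
    candidates = [p for p in backlog if p in open_gates(t)]
    return max(candidates, default=-1)


print(open_gates(105.0))                   # frozenset({7}), start of the 2nd cycle
print(eligible_priority(50.0, [7, 3, 1]))  # 3: priority 7 must wait for its slot
```

The second call shows the key difference to plain strict-priority arbitration: a backlogged priority-7 frame does not block lower priorities outside its reserved slot.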


Figure 2.5: (a) No preemption: the transmission of the high-priority message cannot start until the transmission of the previously arrived low-priority message is finished. (b) Preemption: with an active preemption mechanism, the preemptable frame is separated into the initial fragment IF and the continuation fragment CF. The high-priority express message experiences a smaller low-priority interference $\Delta_j^L$ and is transmitted earlier. The transmission of the preemptable frame, in turn, is delayed by the high-priority traffic and by the overhead of transmitting the preemption footer F and the preemption header H.

Frame Preemption. Besides the route that a message follows through a strict-priority Ethernet network, its priority is the main configuration parameter for influencing the amount of worst-case interference it encounters. High-priority messages have shorter response times, as they are the first messages picked for transmission from the priority queues. However, high-priority messages in an Ethernet network cannot preempt the transmission of other messages. As illustrated in the left part of Fig. 2.5, a high-priority message may be delayed by the transmission of a message that arrived slightly earlier, even if that message has a lower priority. This delay is referred to as low-priority interference $\Delta_j^L$.

To strengthen the timing benefit of a higher priority, TSN introduces enhancements for frame preemption [Ins16]. Here, each traffic class is configured either as express or as preemptable traffic. During arbitration, express messages can preempt the transmission of preemptable traffic and delay it until the transmission of the express traffic is completed. The preemption separates the frame of the preemptable message into an initial and a continuation fragment. Before the express frame can be sent, a preemption footer consisting of a 4 Byte check sum is appended to the initial fragment. After the transmission of the express frame, an 8 Byte preemption header is prepended to the continuation fragment of the preemptable frame. As illustrated in the right part of Fig. 2.5, the frame preemption mechanism reduces the worst-case low-priority interference $\Delta_j^L$ encountered by high-priority messages and results in shorter worst-case transmission times.
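The benefit of preemption for the worst-case low-priority interference $\Delta_j^L$ can be estimated with a back-of-the-envelope calculation. The sketch below rests on the following assumptions:

- a 100 Mbit/s link;
- a largest preemptable packet of 1,542 bytes on the link (a 1,522-byte frame plus preamble, start-of-frame delimiter, and interpacket gap);
- a hypothetical upper bound `max_nonpreemptable` on the bytes that must still be sent before a preemption can take effect.

The actual bound follows from the minimum fragment sizes defined in [Ins16] and is not derived here. Only the 4-byte footer and 8-byte header overheads are taken from the text. The sketch ignores the interpacket gap between fragments.

```python
FOOTER_BYTES = 4  # check sum appended to the initial fragment
HEADER_BYTES = 8  # header prepended to the continuation fragment


def to_us(n_bytes: int, bandwidth_bps: float) -> float:
    """Transmission time of n_bytes in microseconds."""
    return n_bytes * 8 / bandwidth_bps * 1e6


def worst_case_blocking_us(largest_frame: int, bandwidth_bps: float,
                           preemption: bool, max_nonpreemptable: int = 0) -> float:
    """Worst-case low-priority interference Delta_j^L of an express frame."""
    if not preemption:
        # The lower-priority frame that has just started must be finished.
        return to_us(largest_frame, bandwidth_bps)
    # Only the non-preemptable remainder plus the footer delays the express frame.
    return to_us(min(largest_frame, max_nonpreemptable + FOOTER_BYTES),
                 bandwidth_bps)


def preemption_overhead_us(n_preemptions: int, bandwidth_bps: float) -> float:
    """Extra transmission time of a preemptable frame preempted n times."""
    return to_us(n_preemptions * (FOOTER_BYTES + HEADER_BYTES), bandwidth_bps)


print(worst_case_blocking_us(1542, 100e6, preemption=False))  # about 123.36
print(worst_case_blocking_us(1542, 100e6, preemption=True,
                             max_nonpreemptable=64))         # about 5.44
print(preemption_overhead_us(2, 100e6))                      # about 1.92
```

Under these assumptions, the express traffic gains more than two orders of magnitude in worst-case blocking. Each preemption costs the preemptable frame only a few microseconds of additional transmission time.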

Scheduled TSN Networks [Ins15]. While TSN-compatible hardware is not yet commercially available¹, it is foreseeable that TSN features will require hardware extensions. Automotive TSN networks are therefore likely to be more expensive than networks based on regular Ethernet. As hardware costs are one of the main non-functional design objectives in the automotive domain, the additional costs of TSN hardware have to be justified by improved network performance. For this reason, the work at hand considers a network configuration in which the timing advantages for the critical traffic are maximized. In these so-called scheduled TSN networks, the TAS and preemption mechanisms are combined to enable interference-free and, therefore, deterministic transmission of the critical traffic:

- All critical messages are assigned the highest priority and are transmitted according to a global schedule.
- The global schedule dictates the configuration of all TSN output ports and is generated based on the routing of each critical message.
- The critical traffic class is configured as express traffic.
- The preemption of a preemptable message is started a certain time interval before the actual arrival of the scheduled critical traffic.

This time interval, the so-called Guard Band (GB), is long enough to complete the transmission of the initial fragment of the preempted message and to append the preemption footer. The critical express message can therefore be transmitted as soon as it arrives at the output port, without suffering the residual low-priority interference and preemption overhead visible in the right part of Fig. 2.5. The GB mechanism thus prevents any interference from the unscheduled low-priority traffic. However, the critical messages all share the same traffic class and can theoretically still interfere with each other.
In a scheduled TSN network, the schedule therefore has to consider the sending time of each critical message and its transmission times on the resources of its route. In this way, interference between critical messages is prevented by construction. A simple example of a global schedule meeting these requirements is illustrated in Fig. 2.6.
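The construction principle of such a global schedule can be sketched with a greedy algorithm. The model below is a strong simplification with hypothetical routes and transmission times:

- Switches are store-and-forward without processing or propagation delay. A message with sending offset $o$ and per-hop transmission time $c$ therefore occupies the $k$-th link of its route during $[o + k c,\ o + (k+1) c)$.
- Each message is assigned the earliest offset at which none of its link slots overlaps a slot that is already reserved.
- The topology loosely follows Fig. 2.6. The link names are invented for illustration.

This is only one way to construct a conflict-free schedule and is not the scheduling approach of this work.

```python
from typing import Dict, List, Tuple

Slot = Tuple[float, float]  # occupied interval [start, end) in microseconds

# Hypothetical routes (link names invented) and transmission time per hop.
CRITICAL_MESSAGES: Dict[str, Tuple[List[str], float]] = {
    "violet": (["G1-S1", "S1-S3", "S3-G4"], 10.0),
    "blue": (["G2-S1", "S1-S3", "S3-G4"], 10.0),
    "yellow": (["G3-S2", "S2-S3", "S3-G4"], 10.0),
}


def overlaps(a: Slot, b: Slot) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def link_slots(route: List[str], offset: float, c: float) -> Dict[str, Slot]:
    """Store-and-forward without switch delay: hop k starts when hop k-1 ends."""
    return {link: (offset + k * c, offset + (k + 1) * c)
            for k, link in enumerate(route)}


def schedule(messages: Dict[str, Tuple[List[str], float]], step: float = 1.0):
    """Give each message the earliest sending offset with conflict-free slots."""
    reserved: Dict[str, List[Slot]] = {}
    offsets: Dict[str, float] = {}
    for name, (route, c) in messages.items():
        offset = 0.0
        while True:
            slots = link_slots(route, offset, c)
            if not any(overlaps(slot, other)
                       for link, slot in slots.items()
                       for other in reserved.get(link, [])):
                break
            offset += step
        for link, slot in slots.items():
            reserved.setdefault(link, []).append(slot)
        offsets[name] = offset
    return offsets, reserved


offsets, reserved = schedule(CRITICAL_MESSAGES)
print(offsets)  # {'violet': 0.0, 'blue': 10.0, 'yellow': 20.0}
```

The shared links S1-S3 and S3-G4 force the later messages to shift their offsets. This is why the routing and the transmission times must enter the schedule.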

¹Several hardware manufacturers claim that their hardware is TSN-ready, but the standardization process is not yet finished.

Figure 2.6: Illustration of a simple scheduled TSN network. Each of the sender ECUs G1, G2, and G3 is sending a critical message (illustrated with violet, blue, and yellow rectangles, respectively) to the receiver ECU G4. The configuration of the port schedules of all nodes in the network accounts for the message routing and the transmission times. It ensures that the time slots used for the transmission of the critical messages do not overlap [SGR+17b].

Seamless Redundancy [Ins17]. When safety-critical applications are distributed among physically separated processing units, links and switches become safety-critical, as their failure leads to the loss of messages and, in consequence, to the failure of the application. Besides the use of more reliable (hardened) components, the reliability of message transmission can be improved by transmitting messages over multiple different and, therefore, redundant routing paths.

In standard Ethernet networks, redundant message transmission can only be achieved at the application level. The processing unit that sends a message has to be configured to pack the same payload into several messages with different VLAN assignments. As detailed in Chapter 4.2, the VLAN assignment can be used to restrict the transmission of a message to certain parts of the network. The VLANs of the resources that the messages pass on their routes then have to be configured such that the messages are forced to take different routes to the destination. Finally, the receiving ECU has to receive all the messages and use the payload for its calculations. This approach, illustrated in Fig. 2.7a, requires a tedious configuration of the network and is highly inefficient. Each of the created messages consumes time on the network interfaces of the ECUs and bandwidth on the used links, even where the messages use the same resources for most of the route.

In TSN networks, redundancy is achieved through the seamless redundancy mechanism illustrated in Fig. 2.7b. TSN switches can create redundant paths by replicating the message frames at the start of the redundant path segments and eliminating the duplicates at their end.
In a TSN network, redundant transmission is therefore efficient, as additional bandwidth is only required on the path segments that are actually redundant. It is also completely transparent to the application, as it depends solely on the configuration of the network switches.
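The two redundancy approaches of Fig. 2.7 differ in how often each link carries the message. The sketch below has two parts:

- It counts link transmissions for the topology of Fig. 2.7. The link names are hypothetical.
- It models duplicate elimination with per-frame sequence numbers, in the spirit of [Ins17], which identifies duplicates by sequence numbers.

The class and function names are illustrative only. The sketch does not reproduce the standard's actual recovery algorithm, such as its bounded history window.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Routes through the topology of Fig. 2.7 (G0 - S0 - {S1, S2} - S3 - G1).
PATH_A = ["G0-S0", "S0-S1", "S1-S3", "S3-G1"]
PATH_B = ["G0-S0", "S0-S2", "S2-S3", "S3-G1"]


def application_level_load() -> Counter:
    """Two independent messages: every link on each path carries its own copy."""
    return Counter(PATH_A) + Counter(PATH_B)


def seamless_redundancy_load() -> Counter:
    """Replication at S0 and elimination at S3: shared links carry one copy."""
    return Counter(set(PATH_A) | set(PATH_B))


class DuplicateEliminator:
    """Forward the first copy of each sequence number, drop later copies."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()

    def accept(self, sequence_number: int) -> bool:
        if sequence_number in self.seen:
            return False
        self.seen.add(sequence_number)
        return True


def eliminate(arrivals: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    eliminator = DuplicateEliminator()
    return [frame for frame in arrivals if eliminator.accept(frame[0])]


print(sum(application_level_load().values()))    # 8 link transmissions
print(sum(seamless_redundancy_load().values()))  # 6 link transmissions
# Copies of frames 1 and 2 arrive at S3 via S1 and S2, partly out of order.
print(eliminate([(1, "via S1"), (1, "via S2"), (2, "via S2"), (2, "via S1")]))
```

Only the shared links G0-S0 and S3-G1 account for the difference here. In larger networks with long shared path segments, the savings grow accordingly.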


Figure 2.7: Illustration of the difference between redundant transmission and the seamless redundancy introduced by the TSN standard, shown for a sender G0 and a receiver G1 connected via the switches S0-S3. (a) Sending multiple redundant messages wastes link bandwidth and introduces additional configuration overhead for the ECUs. (b) With the frame replication performed by S0 and the duplicate elimination done by S3, TSN's seamless redundancy mechanism is transparent to the application and offers bandwidth-efficient transmission redundancy.

2.2 Multi-Objective Optimization of Embedded Systems

Designing an automotive network is a difficult task that involves a large number of individual design decisions. As outlined in Section 2.1, the network configuration of a single message in a strict-priority Ethernet network already includes several choices:

- the message priority;
- the VLAN membership;
- the size of the Ethernet packets;
- the route used for transmitting the message through the network.

The deployment of TSN mechanisms adds further configuration parameters, including the choice of the preemption class, the configuration of the port schedules, and the choice of the switches for frame replication and duplicate elimination. The overall number of these decisions scales with the number of network messages, which can exceed 2,500 in total [Alb+04]. Moreover, the designer has to make several decisions on the network level, including the choice of the network topology, hardware decisions like the choice of the link bandwidth, or the VLAN partitioning. Each of these design decisions impacts various, often conflicting and non-linear, design objectives like real-time capability, reliability, or the monetary cost of the communication network. Given the steady growth of both network size and network complexity, error-prone manual design is unlikely to yield optimal solutions, as finding any feasible solution is already a challenging problem. This creates the need for an automated approach that is capable of performing a Design Space Exploration (DSE). Such an approach efficiently searches the design space spanned by the configuration parameters outlined above and finds not one, but potentially many high-quality or even the best solution(s).

The work at hand aims not only at providing analysis techniques for future Ethernet networks, but also at integrating these techniques into an automated approach for multi-objective optimization. An overview of the overall optimization approach is illustrated in Fig. 2.8. At the beginning, the designer specifies the design problem at hand by delimiting the space of all available design options. The iterative optimization then proceeds by creating one or a set of initial concrete solution(s). This starting point of the exploration can be created either randomly or based on domain-specific knowledge. Each solution is evaluated with respect to a set of design objectives and then modified to create a better set of solutions. Each concrete solution is obtained by picking a single point from within the design space. However, especially for complicated optimization problems, many combinations of design decisions may lead to solutions that are infeasible with respect to a set of constraints. For example, it is impossible to analyze the timing of a message in a network design where the message's source node has no connection to the destination node. Processing such useless solutions wastes time and substantially reduces the effectiveness of the optimization. As a remedy, so-called repair mechanisms can be used, which check the feasibility of solutions and alter their structure if they are infeasible.
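The loop of Fig. 2.8 (create, repair, evaluate, modify) can be sketched in a few lines. The following is a minimal, mutation-only evolutionary loop that keeps an archive of non-dominated solutions. It is not the optimizer used in this work. The toy problem is entirely hypothetical:

- selecting links, each with a cost and a latency value;
- a constraint requiring at least two selected links;
- a simple repair function that switches on the cheapest missing links.

```python
import random
from typing import Dict, Tuple

Solution = Tuple[int, ...]
Objectives = Tuple[float, ...]

COSTS = [3, 1, 4, 1, 5, 9]      # hypothetical cost per link
LATENCIES = [2, 6, 1, 7, 3, 1]  # hypothetical latency per link
MIN_LINKS = 2                   # constraint: select at least two links


def repair(s: Solution) -> Solution:
    """Make a solution feasible by switching on the cheapest unselected links."""
    bits = list(s)
    for i in sorted(range(len(bits)), key=lambda j: COSTS[j]):
        if sum(bits) >= MIN_LINKS:
            break
        bits[i] = 1
    return tuple(bits)


def evaluate(s: Solution) -> Objectives:
    """Two conflicting objectives to minimize: total cost and worst latency."""
    chosen = [i for i, bit in enumerate(s) if bit]
    return (sum(COSTS[i] for i in chosen), max(LATENCIES[i] for i in chosen))


def dominates(a: Objectives, b: Objectives) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(archive: Dict[Solution, Objectives]) -> Dict[Solution, Objectives]:
    return {s: f for s, f in archive.items()
            if not any(dominates(g, f) for g in archive.values())}


def explore(generations: int = 30, population_size: int = 10, seed: int = 1):
    rng = random.Random(seed)
    n = len(COSTS)
    population = [repair(tuple(rng.randint(0, 1) for _ in range(n)))
                  for _ in range(population_size)]             # initial solutions
    archive: Dict[Solution, Objectives] = {}
    for _ in range(generations):
        for s in population:                                   # evaluation
            archive[s] = evaluate(s)
        archive = non_dominated(archive)                       # best trade-offs
        parents = list(archive)
        population = []
        for _ in range(population_size):                       # modification
            child = list(rng.choice(parents))
            child[rng.randrange(n)] ^= 1
            population.append(repair(tuple(child)))            # repair
    for s in population:
        archive[s] = evaluate(s)
    return non_dominated(archive)


for solution, objectives in sorted(explore().items(), key=lambda item: item[1]):
    print(solution, objectives)
```

Because every solution passes through `repair`, the evaluation never sees an infeasible design. The result is a set of cost-latency trade-offs rather than a single optimum.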
This section details the implementation of two key components of the optimization approach used in the work at hand. Section 2.2.1 introduces the system model that is used to model both the entire design space and the concrete solutions. Section 2.2.2 presents SAT-Decoding, a powerful repair mechanism that is combined with an Evolutionary Algorithm (EA) optimizer so that only solutions satisfying a given set of Pseudo-Boolean constraints are considered. The constraint set describing a valid, i.e., feasible, solution of the network design problem is then introduced in Section 2.2.3.
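The general principle behind decoding-based repair can be illustrated before the details follow. The genotype does not encode a solution directly. Instead, it steers a constraint solver, which can only return feasible solutions. The following is a strongly simplified sketch under our own assumptions:

- Pseudo-Boolean constraints of the form $\sum_i a_i x_i \geq b$;
- a plain backtracking search instead of a real SAT solver;
- a genotype consisting of a variable order and a preferred value (phase) per variable.

All names, including the example variables, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# A Pseudo-Boolean constraint sum(a_i * x_i) >= b, stored as ({var: a_i}, b).
Constraint = Tuple[Dict[str, int], int]


def still_satisfiable(constraint: Constraint, assignment: Dict[str, int]) -> bool:
    """Can some completion of the partial assignment still meet the constraint?"""
    coefficients, bound = constraint
    best = 0
    for var, a in coefficients.items():
        if var in assignment:
            best += a * assignment[var]
        else:
            best += max(a, 0)  # pick the value that increases the sum most
    return best >= bound


def decode(order: Sequence[str], phase: Dict[str, int],
           constraints: List[Constraint]) -> Optional[Dict[str, int]]:
    """Backtracking search that assigns variables in genotype order, trying the
    preferred phase first; any returned assignment satisfies all constraints."""
    assignment: Dict[str, int] = {}

    def search(k: int) -> bool:
        if k == len(order):
            return True
        var = order[k]
        for value in (phase[var], 1 - phase[var]):
            assignment[var] = value
            if all(still_satisfiable(c, assignment) for c in constraints) \
                    and search(k + 1):
                return True
            del assignment[var]
        return False

    return dict(assignment) if search(0) else None


# Hypothetical example: process P1 must be bound to exactly one of G1 and G2.
constraints = [
    ({"P1_on_G1": 1, "P1_on_G2": 1}, 1),    # at least one binding
    ({"P1_on_G1": -1, "P1_on_G2": -1}, -1),  # at most one binding
]
print(decode(["P1_on_G1", "P1_on_G2"], {"P1_on_G1": 1, "P1_on_G2": 1}, constraints))
# {'P1_on_G1': 1, 'P1_on_G2': 0}
print(decode(["P1_on_G1", "P1_on_G2"], {"P1_on_G1": 0, "P1_on_G2": 0}, constraints))
# {'P1_on_G1': 0, 'P1_on_G2': 1}
```

Different genotypes lead to different feasible bindings. Where a preferred phase would violate a constraint, the search overrides it instead of producing an infeasible solution.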

Figure 2.8: Illustration of the overall approach used for the DSE of automotive networks. Within the optimization loop, concrete solutions created from the entire design space are modified by a Genetic Algorithm, optionally repaired by SAT-Decoding, and evaluated by evaluators (timing, reliability, etc.).

2.2.1 System Model

As shown in [BTT98] and [Tei12], approaches for system synthesis and DSE require a model as a basis for the multi-objective optimization process. This so-called specification has to capture all system properties relevant to the design objectives as well as all design decisions relevant to the design of automotive networks. This work uses the specification model that was first presented in [BTT98] and extended in [LSG+09; LSF14; GTL+17]. It follows the well-known Y-chart approach [BGJ+97; KDV+97; KNR+00; BTT98]. In [GTL+17], a specification is defined as follows:

Specification [GTL+17]: "A system model termed specification is given in a graph-based fashion that distinguishes between application (modeled as an application graph) and an architecture (modeled as an architecture graph). The relation between application and architecture — indicating each possible binding of a task of the application for execution on a resource of the architecture — is modeled by means of a set of mapping edges."

In the following paragraphs, we use an example automotive application to show how the specification concept is used to model automotive networks and introduce the notation that is used in the work at hand.

Example Automotive System. We use an ADAS for pedestrian detection as an example to illustrate the different parts of the system model. The system under design shall be able to record the current camera image, process the image data, and, depending on the result of the processing, reduce the throttle and activate the brakes of the automobile.


Figure 2.9: An application graph $G_A$ and an architecture graph $G_R$.

Application Graph. The application graph $G_A(N_T, E_d)$ (first introduced in [BTT98], where it is referred to as problem graph) models the application tasks and their dependencies. These are to be mapped onto a network architecture under design. The graph consists of task nodes $N_T$ and directed dependency edges $E_d$. Each task node $T \in N_T = N_P \cup N_C$ is either a process $P \in N_P$ or a message $C \in N_C$. Each process $P \in N_P$ is used to model a computational task of the designed system like reading a sensor value, processing input data, or triggering an actuator action. Each message $C \in N_C$ models data dependencies between processes that generate data and processes that consume data. Each dependency $d \in E_d$ connects a process and a message. The process nodes that generate data are connected to the message node by an incoming dependency edge, while the processes that consume the data are connected to the message by an outgoing dependency edge.

Example. The graph $G_A$ on the left side of Fig. 2.9 models the application of the pedestrian detection system. The processes of acquiring the image, processing the data, and controlling the brake and the throttle are modeled as the process nodes P0, P1, P2, and P3. The image processing requires the image data. This is modeled by the message node C0, which is connected to the image acquisition and the image processing nodes by the dependencies d0 and d1. Both the brake and the throttle processes depend on the result of the image processing operation, which is modeled by the message C1 and the dependencies d2, d3, and d4. Messages which, like C1 in this example, have multiple receiver processes make it possible to model multicast communication. In a multicast, we refer to the transmission of the message between the source and one of its destinations as a communication flow. In the example illustrated in Fig. 2.9, the multicast of C1 consists of two communication flows.

Architecture Graph. The network under design is modeled by a graph called architecture graph [BTT98]. The architecture graph $G_R(N_R, E_l)$ consists of a set of resource nodes $N_R$, which are connected by link edges $E_l$. It models all possible options for the implementation of the application that shall be considered during the DSE. Each resource $R \in N_R = N_G \cup N_S$ is either an ECU $G \in N_G$ or a switch $S \in N_S$. Each link $l \in E_l$ connecting two resources specifies the ability of these two resources to bi-directionally exchange messages [LSF14].

Example. The graph GR on the right side of Fig. 2.9 models an Ethernet network that can be used for the design of the pedestrian detection system. The network consists of the processing ECUs G1 and G2 and the ECUs attached to the camera sensor, the brake, and the throttle (G0, G3, and G4, respectively). The ECUs are connected to the communication network formed by the switches S1 and S2 by the links l0 to l5.

Mappings. Each mapping edge $m = (P, G) \in E_M$ indicates that a process $P \in N_P$ can be implemented on a resource $G \in N_G$.

Example. Figure 2.10 illustrates a sample set of mapping possibilities of the processes of the pedestrian detection application onto the architecture. The processes P0 (acquiring the image), P2 (controlling the brake), and P3 (controlling the throttle) can only be implemented on the resources G0 (camera), G3 (brake), and G4 (throttle), respectively. The process P1 (data processing) has two mapping edges, as it can be implemented on both processors G1 and G2.
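The application graph and the mapping edges of the pedestrian-detection example can be written down as plain data. The following sketch encodes $G_A$ and $E_M$ as Python dictionaries. The encoding and the helper names are hypothetical. The helpers derive the communication flows of each message and count the possible bindings (one mapping edge per process).

```python
from math import prod
from typing import Dict, List, Tuple

# Application graph G_A: processes, messages, and directed dependency edges.
PROCESSES = ["P0", "P1", "P2", "P3"]
MESSAGES = ["C0", "C1"]
DEPENDENCIES: Dict[str, Tuple[str, str]] = {
    "d0": ("P0", "C0"), "d1": ("C0", "P1"),
    "d2": ("P1", "C1"), "d3": ("C1", "P2"), "d4": ("C1", "P3"),
}

# Mapping edges E_M: process -> resources it may be bound to (Fig. 2.10).
MAPPINGS: Dict[str, List[str]] = {
    "P0": ["G0"], "P1": ["G1", "G2"], "P2": ["G3"], "P3": ["G4"],
}


def senders(message: str) -> List[str]:
    """Processes connected to the message by an incoming dependency edge."""
    return [src for src, dst in DEPENDENCIES.values() if dst == message]


def receivers(message: str) -> List[str]:
    """Processes connected to the message by an outgoing dependency edge."""
    return [dst for src, dst in DEPENDENCIES.values() if src == message]


def communication_flows(message: str) -> List[Tuple[str, str]]:
    """One flow per (sender, receiver) pair; a multicast has several flows."""
    return [(s, r) for s in senders(message) for r in receivers(message)]


print(communication_flows("C1"))                            # [('P1', 'P2'), ('P1', 'P3')]
print(prod(len(options) for options in MAPPINGS.values()))  # 2 possible bindings
```

The multicast C1 yields two communication flows, matching the example above. With only P1 having two mapping options, the example has two possible bindings.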

Routing. Explicit mapping edges can considerably reduce the design space of process mappings by explicitly defining the resources that can be used to implement certain application tasks. In principle, a similar restriction could be applied to the messages of the application graph by explicitly defining the links and resources that can be used for their transmission. In the work at hand, however, we assume that each message can potentially be routed over each link and each resource of the architecture graph.

Implementation Graph. During the DSE, the specification modeling the entire design space is used to generate concrete solutions for the design problem. In the system model used in this work, this step is referred to as system-level synthesis [BTT98; GTL+17]. It is defined as follows:



Figure 2.10: Mapping edges mi describing the mapping options of the processes of the application to the ECUs of the target architecture.

System-Level Synthesis [GTL+17]: “System-level synthesis derives an implementation from a given specification by means of an allocation of resources, a binding of process tasks to allocated resources, a routing of communication tasks on a tree of allocated resources, and a schedule of tasks.”

In the work at hand, the individual steps of the system-level synthesis are implemented as follows:

Allocation. During the allocation, a subset of the architecture graph $G_R$ is chosen to determine the hardware architecture design. In the following, an allocation is defined implicitly as those resources of $G_R$ that are used to execute at least one process or to transmit at least one message. Hence, the allocation is implicitly defined by the binding and the routing as explained next.

Binding. The binding of processes is achieved by choosing exactly one mapping edge for each process. This edge identifies the resource that will implement the process, the so-called binding target of the process.

Routing. Finally, the routing of each message C must be determined. This is represented by the routing graph $G_R^C$, a subgraph of the architecture graph $G_R$ with directed edges. The routing graph is a tree that has the binding target of the source task of the message as its root and the binding target(s) of the destination task(s) as its leaf (leaves), thereby satisfying the data dependencies.

Figure 2.11: An example implementation generated from the specification illustrated in Fig. 2.9.

Schedule. In this work, we assume that all process tasks have a strictly periodic character and that all ECUs use strict-priority arbitration, so that an explicit generation of a schedule for the process tasks is not necessary. The generation of the transmission schedules that are necessary for TSN networks is detailed in Section 4.1.

Example. Figure 2.11 illustrates an example implementation that has been generated from the specification in Fig. 2.10. The processes P0, P2, and P3 have only one mapping option, so that the mappings m0, m3, and m4 are active in all implementations. In the illustrated implementation, the process P1 is bound onto ECU G1. The messages C0 and C1 are routed over the highlighted routing graphs to satisfy the data dependencies of their source and destination tasks. The link l2 and the ECU G2 are neither used as binding targets nor do they route messages. Consequently, they are not allocated as part of the implementation.
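The implicit definition of the allocation can be illustrated with a small Python sketch that derives the allocated resources and links from a binding and a set of routing graphs. The text does not list the end points of links l0 to l5, so the topology below is a hypothetical one chosen to be consistent with the example (G2 is attached to the network only via l2); C1 is assumed to be a multicast message from P1 to P2 and P3. All names and the function `derive_allocation` are illustrative.

```python
# Hypothetical link end points for the example network (not given in the text).
LINKS = {
    "l0": ("G0", "S1"), "l1": ("G1", "S1"), "l2": ("G2", "S1"),
    "l3": ("S1", "S2"), "l4": ("S2", "G3"), "l5": ("S2", "G4"),
}
RESOURCES = {"G0", "G1", "G2", "G3", "G4", "S1", "S2"}

# Binding: one chosen mapping target per process (P1 bound onto G1, as in the example).
binding = {"P0": "G0", "P1": "G1", "P2": "G3", "P3": "G4"}

# Routing graphs as directed hops (link, from, to); C1 is assumed multicast to P2 and P3.
routing = {
    "C0": [("l0", "G0", "S1"), ("l1", "S1", "G1")],
    "C1": [("l1", "G1", "S1"), ("l3", "S1", "S2"),
           ("l4", "S2", "G3"), ("l5", "S2", "G4")],
}

def derive_allocation(binding, routing, links):
    """Allocation = binding targets plus all resources and links used by any routing."""
    used_resources = set(binding.values())
    used_links = set()
    for hops in routing.values():
        for link, src, dst in hops:
            # Each hop must traverse an existing link between its two end points.
            assert set(links[link]) == {src, dst}, f"{link} does not connect {src}-{dst}"
            used_links.add(link)
            used_resources.update((src, dst))
    return used_resources, used_links

alloc_resources, alloc_links = derive_allocation(binding, routing, LINKS)
print(sorted(RESOURCES - alloc_resources), sorted(set(LINKS) - alloc_links))
# ['G2'] ['l2']
```

As in the example, the ECU G2 and the link l2 are neither binding targets nor part of any routing graph and therefore remain unallocated.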


2.2.2 SAT-Decoding

The simple implementation example illustrated in Fig. 2.11 is generated from the specification in Fig. 2.10 by making the allocation, binding, and routing decisions. Within each of these decisions, a certain subset of the specification is chosen to become part of the implementation: the allocation chooses a subset of the architecture, the binding chooses a subset of the mappings, and the routing graph of each message is again a subset of the specification architecture. While the number of possible subsets of specification elements grows exponentially with the number of elements and is, thus, very high even for small specifications, only a small fraction of these subsets actually form valid implementations in terms of a set of so-called synthesis constraints, see Section 2.2.3. The additional domain-specific implementation restrictions introduced in Chapter 4 further narrow down the number of valid solutions.

At the same time, the optimization of communication networks offers a solution space of considerable size, even if only valid solutions are considered. The optimization is even more challenging for automotive networks, which have to be optimized with respect to many, oftentimes conflicting and non-linear design objectives such as the link load or the transmission timing. An approach for automated design, therefore, should (a) exclude invalid solutions from being evaluated, (b) efficiently search the space of valid solutions, and (c) find not only one, but a set of optimal solutions w.r.t. multiple, often conflicting design objectives. Overall, finding the optimal implementation and, in particular, the optimal network design is, thus, a constrained combinatorial problem. Such problems are typically addressed by extending a metaheuristic optimizer with constraint-handling techniques [Coe02].
Two main concepts exist for handling constraints: penalty functions [ARC+04; SC97] and repair strategies [ZT99]. For problems like implementation synthesis, where the set of valid solutions is considerably smaller than the entire design space, penalty-based approaches generally exhibit slow convergence towards the optimum. For this reason, the work at hand uses a repair-strategy-based approach referred to as SAT-Decoding [LGH+07] for the optimization of the network design. This hybrid optimization technique combines a Pseudo-Boolean (PB) solver with an EA to exclude infeasible solutions from the evaluation during the DSE. Figure 2.12 provides an overview of the principles of SAT-Decoding.

Figure 2.12: SAT-Decoding-based optimization loop: 1. Constraint formulation. 2. Solver initialization. 3. Constraint resolution. 4. Decoding and evaluation of valid solutions. 5. Exploration of the solving strategy search space by an EA. 6. Configuration of the solving strategy [SRT+18b].

In SAT-Decoding, infeasible solutions are excluded from the evaluation by means of a set of PB constraints. These constraints encode all valid solutions and are formulated over variables that symbolically encode all relevant design decisions. A binary variable v could, for example, encode the decision whether message C0 from the specification depicted in Fig. 2.10 is routed over link l0 or not. Formulating the constraints and using them to initialize the constraint solver constitute the first and second steps of the optimization (Steps 1 and 2 in Fig. 2.12). Solving these constraints in the third step provides variable assignments that, by construction, encode valid implementations. For example, in the encoded implementation, the assignment of the aforementioned variable v to 1 indicates that message C0 is routed over link l0. In the fourth step, the valid variable assignment is decoded into an implementation that is then evaluated with respect to the design objectives to be optimized, such as reliability, timing, or monetary cost. The result of the evaluation is used as the fitness value of the implementation. The fitness value enables an EA to pick the individuals that perform best with respect to the design objectives. By applying evolutionary variation operators such as crossover or mutation, the genotypes of these best individuals are then used to create the genotypes of the individuals for the next optimization iteration (Step 5 in Fig. 2.12). However, contrary to typical EA-based optimization approaches, the EA in SAT-Decoding does not directly explore the search space of implementation design decisions. In [GTL+17], this key idea is formulated as follows:

Key Idea of SAT-Decoding [GTL+17]: "In SAT-Decoding, instead of varying the implementation directly, the metaheuristic varies the branching strategy of the backtracking-based solver. This way, only feasible implementations are obtained and are evaluated during design space exploration."

Each genome of the genotype used by the EA in SAT-Decoding is associated with an encoding variable used by the constraint solver. Instead of encoding the actual design decisions, the genomes dictate the solution strategy for the PB variables associated with these decisions. For example, setting the genome associated with the variable v to 1 does not automatically imply that message C0 is routed over link l0 in the implementation associated with the genotype. Instead, it implies that the constraint solver will first search for solutions where v is set to 1. If no such solution exists, the solver backtracks and delivers a feasible solution where v is set to 0. Consequently, the values of the genomes created by the EA describe the constraint-solving strategy of the solver and not the actual design decisions made in the implementations that the solver creates. Nevertheless, each genotype encodes a specific branching strategy, i.e., the order in which the solver assigns the variables and the phases (the first value the solver assigns to a variable) of all variables. This creates an unambiguous link between the genotype and the solution found by the solver. The SAT-Decoding approach, thus, makes it possible to evaluate only valid solutions during the DSE, significantly accelerating the convergence towards the set of implementations that are Pareto-optimal w.r.t. a set of design objectives. SAT-Decoding does not require any insight into the calculation of the design objectives and introduces a clear separation of concerns between ensuring implementation feasibility on the one hand and optimization on the other. Consequently, any approach implementing SAT-Decoding can easily be extended towards new or altered design objectives and/or feasibility constraints.
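The decoding step can be sketched in a few lines of Python. The sketch below is a deliberately naive backtracking search and not the conflict-driven PB solving performed by SAT4J in this thesis: a genotype is modeled as a variable order plus a preferred phase per variable, and the solver tries the preferred phase first and backtracks on conflict. The toy constraint a + b = 1 and the function names `may_hold` and `decode` are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# A PB constraint as ({variable: coefficient}, operator, right-hand side).
Constraint = Tuple[Dict[str, int], str, int]

def may_hold(c: Constraint, partial: Dict[str, int]) -> bool:
    """Interval check: can constraint c still be satisfied by completing `partial`?
    Exact once all variables of c are assigned."""
    terms, op, rhs = c
    fixed = sum(k * partial[v] for v, k in terms.items() if v in partial)
    free = [k for v, k in terms.items() if v not in partial]
    lo = fixed + sum(min(0, k) for k in free)  # smallest reachable left-hand side
    hi = fixed + sum(max(0, k) for k in free)  # largest reachable left-hand side
    if op == "=":
        return lo <= rhs <= hi
    if op == ">=":
        return hi >= rhs
    if op == "<=":
        return lo <= rhs
    raise ValueError(f"unsupported operator {op!r}")

def decode(constraints: List[Constraint], order: List[str],
           phase: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Assign variables in the genotype's order, trying each preferred phase first."""
    def search(i: int, partial: Dict[str, int]) -> Optional[Dict[str, int]]:
        if i == len(order):
            return dict(partial)
        var = order[i]
        for value in (phase[var], 1 - phase[var]):
            partial[var] = value
            if all(may_hold(c, partial) for c in constraints):
                result = search(i + 1, partial)
                if result is not None:
                    return result
            del partial[var]  # backtrack
        return None
    return search(0, {})

constraints = [({"a": 1, "b": 1}, "=", 1)]              # a + b = 1
genotype_phase = {"a": 1, "b": 1}                       # both genes prefer the value 1
print(decode(constraints, ["a", "b"], genotype_phase))  # {'a': 1, 'b': 0}
print(decode(constraints, ["b", "a"], genotype_phase))  # {'b': 1, 'a': 0}
```

Although both genes prefer the value 1, the decoded assignments are always feasible, and a different variable order yields a different, yet deterministic, solution. This mirrors the unambiguous link between genotype and implementation described above.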

Implementation of SAT-Decoding

Most experiments presented in this thesis rely on an implementation of the SAT-Decoding approach. Unless stated otherwise, SAT-Decoding is implemented with the open-source design space exploration framework OPENDSE [RLG+18], which uses the evolutionary algorithm NSGA-II [DPA+02] and the SAT solver SAT4J [LP10]. Both NSGA-II and SAT4J are provided by the open-source optimization framework OPT4J [LGR+11].

2.2.3 Synthesis Constraints

Formulating the constraint set that any valid implementation must fulfill is an essential part of the SAT-Decoding approach. The constraints that enable an efficient automated design of automotive Ethernet networks constitute a large part of the contributions of this work and are presented in Chapter 4. As a starting point, this section presents the basic constraint set that encodes an implementation with a valid binding, a valid allocation, and a valid routing. The constraints for binding and allocation were first presented in [LSG+09], while the routing constraints originate from [LSF14]. All constraint sets presented in this work are formulated as so-called Pseudo-Boolean (PB) constraints. Each PB constraint consists of a sum of terms, a relational operator, and a constant right-hand side. Each term is either a constant value or a binary variable with a constant coefficient. All constant values are integers. A solution of the constraint set is an assignment of the binary variables that satisfies all constraints.
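A PB constraint as defined above can be represented directly as data. The following Python sketch shows one possible representation, assuming that constant terms have already been moved to the right-hand side; the class name `PBConstraint` and the variable names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class PBConstraint:
    """sum(coefficient * variable) <op> rhs over binary variables.

    Constant terms are assumed to be folded into the integer right-hand side.
    """
    terms: Tuple[Tuple[int, str], ...]  # (integer coefficient, variable name)
    op: str                             # "<=", ">=", or "="
    rhs: int

    def holds(self, assignment: Dict[str, int]) -> bool:
        """Evaluate the constraint for a complete 0/1 assignment."""
        lhs = sum(coef * assignment[var] for coef, var in self.terms)
        if self.op == "<=":
            return lhs <= self.rhs
        if self.op == ">=":
            return lhs >= self.rhs
        if self.op == "=":
            return lhs == self.rhs
        raise ValueError(f"unsupported operator {self.op!r}")

# G1 - m_P1_G1 >= 0: if the (hypothetical) mapping of P1 onto G1 is active, G1 is allocated.
c = PBConstraint(terms=((1, "G1"), (-1, "m_P1_G1")), op=">=", rhs=0)
print(c.holds({"G1": 1, "m_P1_G1": 1}), c.holds({"G1": 0, "m_P1_G1": 1}))  # True False
```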


Valid Binding

Depending on the design problem at hand, there are different options to define a valid binding. The basic constraint set relies on two assumptions concerning the binding. On the one hand, each process in the given application graph is considered to be non-optional, so that an implementation is only valid if at least one mapping edge is activated for each process, deciding on which resource it will be implemented. On the other hand, multiple bindings of the same process are not considered in this work. With these assumptions, a valid binding is described by stating that each process has to be bound exactly once. This is done by the following constraint:

$$\forall P \in N_P: \quad \sum_{m=(P,G) \in E_M} m = 1 \qquad (2.1)$$

where m is an encoding variable which is set to 1 iff the mapping m is activated.
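For the pedestrian detection example, Eq. (2.1) yields one constraint per process. The Python sketch below groups the mapping variables by process and checks candidate assignments; the edge names m0 to m4 and the targets assigned to them are assumed for illustration and need not coincide with the labels in Fig. 2.10.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def binding_constraints(mapping_edges: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group mapping variables per process; Eq. (2.1) requires each group to sum to 1."""
    by_process: Dict[str, List[str]] = defaultdict(list)
    for m, (process, _resource) in mapping_edges.items():
        by_process[process].append(m)
    return dict(by_process)

def binding_valid(constraints: Dict[str, List[str]], assignment: Dict[str, int]) -> bool:
    """True iff exactly one mapping edge is activated for every process."""
    return all(sum(assignment[m] for m in ms) == 1 for ms in constraints.values())

# Hypothetical names and targets for the mapping edges of the example.
EDGES = {"m0": ("P0", "G0"), "m1": ("P1", "G2"), "m2": ("P1", "G1"),
         "m3": ("P2", "G3"), "m4": ("P3", "G4")}
cons = binding_constraints(EDGES)
valid = {"m0": 1, "m1": 0, "m2": 1, "m3": 1, "m4": 1}
print(cons["P1"])                               # ['m1', 'm2']
print(binding_valid(cons, valid))               # True
print(binding_valid(cons, dict(valid, m1=1)))   # False: P1 bound twice
print(binding_valid(cons, dict(valid, m2=0)))   # False: P1 not bound
```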

Valid Allocation

A valid resource allocation has to ensure that the implementation contains all resources that are used as binding targets of the activated mappings and all links and resources that are part of the routings of the messages of the application. At the same time, it must not contain any unused links or resources. These rules are enforced by the following constraints:

$$\forall m = (P,G) \in E_M: \quad G - m \geq 0 \qquad (2.2)$$

$$\forall C \in N_C, R \in N_R: \quad R - C_R \geq 0 \qquad (2.3)$$

$$\forall R \in N_R: \quad R - \sum_{m=(P,R) \in E_M} m - \sum_{C \in N_C} C_R \leq 0 \qquad (2.4)$$

$$\forall C \in N_C, l \in E_l, \{R, \tilde{R}\} \subseteq N_R, R \neq \tilde{R}: \quad l - C_{l=(R,\tilde{R})} \geq 0 \qquad (2.5)$$

$$\forall l = (R,\tilde{R}) \in E_l: \quad l - \sum_{C \in N_C} C_{l=(R,\tilde{R})} - \sum_{C \in N_C} C_{l=(\tilde{R},R)} \leq 0 \qquad (2.6)$$

where G is an encoding variable that is set to 1 iff the ECU G is allocated in the implementation architecture. Similarly, the encoding variables R and l encode the activation of the resource R and the link l, respectively. The encoding variable $C_R$ is set to 1 iff message C is routed over resource R, and the encoding variable $C_{l=(R,\tilde{R})}$ is set to 1 iff message C is routed from resource R to resource $\tilde{R}$ using the link l. Constraints (2.2)–(2.6) state that an ECU that is the binding target of at least one process has to be allocated, that a resource or a link has to be allocated if it is used for the routing of a message, and that architecture elements that are not used are not allocated.
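As an illustration, the sketch below checks Constraints (2.2)–(2.4) for a fixed assignment of the encoding variables, using plain Python dictionaries in place of solver variables. It covers only the resource-related constraints (the link constraints (2.5) and (2.6) follow the same pattern); the function name `allocation_ok` and the tiny instance are hypothetical.

```python
def allocation_ok(mappings, active, alloc, routed):
    """Check Eqs. (2.2)-(2.4) for fixed 0/1 values of the encoding variables.

    mappings: {m: (P, R)}   mapping edges
    active:   {m: 0/1}      encoding variable m
    alloc:    {R: 0/1}      encoding variable R (resource allocated)
    routed:   {(C, R): 0/1} encoding variable C_R (message C routed over R)
    """
    # (2.2): G - m >= 0, an active mapping forces its target to be allocated.
    if any(active[m] > alloc[r] for m, (_p, r) in mappings.items()):
        return False
    # (2.3): R - C_R >= 0, a resource used by a routing must be allocated.
    if any(v > alloc[r] for (_c, r), v in routed.items()):
        return False
    # (2.4): R - sum(m) - sum(C_R) <= 0, an allocated resource must actually be used.
    for r, a in alloc.items():
        uses = sum(active[m] for m, (_p, t) in mappings.items() if t == r)
        uses += sum(v for (_c, t), v in routed.items() if t == r)
        if a - uses > 0:
            return False
    return True

mappings = {"m1": ("P1", "G2"), "m2": ("P1", "G1")}
active = {"m1": 0, "m2": 1}
routed = {("C0", "S1"): 1, ("C0", "G1"): 1}
print(allocation_ok(mappings, active, {"G1": 1, "G2": 0, "S1": 1}, routed))  # True
print(allocation_ok(mappings, active, {"G1": 1, "G2": 1, "S1": 1}, routed))  # False: G2 unused
print(allocation_ok(mappings, active, {"G1": 1, "G2": 0, "S1": 0}, routed))  # False: S1 not allocated
```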

Valid Routing

The basic constraints presented at this point are used for the encoding of valid non-redundant routings of (multicast) messages, where the routing graph is a tree with the message source as root and the message destination(s) as leaf (leaves). The activation of the elements of the routing graph of a message is encoded by the following constraints, where $N_C^-$ ($N_C^+$) denotes the set of predecessor (successor) processes of message C:

$$\forall C \in N_C, P \in N_C^- \cup N_C^+, m = (P,R) \in E_M: \quad C_R - m \geq 0 \qquad (2.7)$$

$$\forall C \in N_C, \tilde{P} \in N_C^+, P \in N_C^-, \{R,\tilde{R}\} \subseteq N_R, R \neq \tilde{R}, m = (P,R), \tilde{m} = (\tilde{P},R): \quad \sum_{l=(R,\tilde{R}) \in E_l} (C^{\tilde{P}})_{l=(R,\tilde{R})} - m + \tilde{m} \geq 0 \qquad (2.8)$$

$$\forall C \in N_C, P \in N_C^-, m = (P,R) \in E_M, \{R,\tilde{R}\} \subseteq N_R, R \neq \tilde{R}, l = (\tilde{R},R) \in E_l: \quad \sum_{l=(\tilde{R},R) \in E_l} (C^{P})_{l=(\tilde{R},R)} + m \leq 1 \qquad (2.9)$$

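$$\forall C \in N_C, P \in N_C^+, \tilde{P} \in N_C^-, \{R,\tilde{R}\} \subseteq N_R, R \neq \tilde{R}, m = (P,R), \tilde{m} = (\tilde{P},R): \quad \sum_{l=(\tilde{R},R) \in E_l} (C^{P})_{l=(\tilde{R},R)} - m + \tilde{m} \geq 0 \qquad (2.10)$$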

$$\forall C \in N_C, P \in N_C^+, m = (P,R) \in E_M, \{R,\tilde{R}\} \subseteq N_R, R \neq \tilde{R}, l = (R,\tilde{R}) \in E_l: \quad \sum_{l=(R,\tilde{R}) \in E_l} (C^{P})_{l=(R,\tilde{R})} + m \leq 1 \qquad (2.11)$$

Constraint (2.7) states that the binding targets of the message's predecessor and successor processes are always part of its routing graph. Constraints (2.8)–(2.11) state that the binding target of the predecessor (successor) process of the routed communication flow does not have activated in- (out-) edges, while it has to have an active out- (in-) edge, except in the case where the successor (predecessor) of the communication flow is also bound on this target. Hereby, the variable $(C^{P})_{l=(R,\tilde{R})}$ is set to 1 iff the message C is routed towards the binding target of its successor process P using the link l from resource R to resource $\tilde{R}$.

$$\forall C \in N_C, P \in N_C^+, \tilde{P} \in N_C^-, \{R,\tilde{R}\} \subseteq N_R, R \neq \tilde{R}, m = (P,R), \tilde{m} = (\tilde{P},R): \quad \sum_{l=(\tilde{R},R) \in E_l} (C^{P})_{l=(\tilde{R},R)} - \sum_{l=(R,\tilde{R}) \in E_l} (C^{P})_{l=(R,\tilde{R})} - m + \tilde{m} = 0 \qquad (2.12)$$


Constraint (2.12) states that a resource which is not the binding target of the predecessor or the successor of the routed message always has the same number of in- and out-edges for a communication flow.

$$\forall C \in N_C, l = (R,\tilde{R}) \in E_l: \quad \sum_{P \in N_C^+} (C^{P})_{l=(R,\tilde{R})} - C_{l=(R,\tilde{R})} \geq 0 \qquad (2.13)$$

$$\forall C \in N_C, P \in N_C^+, l = (R,\tilde{R}) \in E_l: \quad C_{l=(R,\tilde{R})} - (C^{P})_{l=(R,\tilde{R})} \geq 0 \qquad (2.14)$$

Constraints (2.13) and (2.14) state that a message is routed over a directed link if at least one of its communication flows is routed over the link, and that it is not routed over a directed link if none of its communication flows are routed over the link.

$$\forall C \in N_C, (R,\tilde{R}) \in E_l: \quad C_{l=(R,\tilde{R})} + C_{l=(\tilde{R},R)} \leq 1 \qquad (2.15)$$

$$\forall C \in N_C, (R,\tilde{R}) \in E_l: \quad C_R + C_{\tilde{R}} - 2 \cdot C_{l=(R,\tilde{R})} \geq 0 \qquad (2.16)$$

Constraint (2.15) states that a message cannot be routed in both directions over a link, and constraint (2.16) states that a message that is routed over a link is also routed over the link's end points.
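The flow-balance idea behind the routing constraints can be checked on a concrete routing. For one communication flow, incoming minus outgoing flow edges must equal 1 at the successor's binding target, -1 at the predecessor's binding target, and 0 at every other resource (both contributions cancel if predecessor and successor share a resource). The Python sketch below evaluates this balance for a fixed routing; the function name, the argument layout, and the resource names are illustrative. Balance alone is a necessary condition only; the remaining constraints, e.g., (2.15), exclude further defects.

```python
def flow_balance_ok(flow_edges, resources, pred_target, succ_target):
    """Check the in/out balance of one communication flow at every resource.

    flow_edges: set of directed (from, to) resource pairs carrying the flow
    pred_target / succ_target: binding targets of the predecessor / successor
    """
    for r in resources:
        inflow = sum(1 for (_u, v) in flow_edges if v == r)
        outflow = sum(1 for (u, _v) in flow_edges if u == r)
        m = int(r == succ_target)        # successor bound on r
        m_tilde = int(r == pred_target)  # predecessor bound on r
        if inflow - outflow != m - m_tilde:
            return False
    return True

resources = {"G0", "G1", "G2", "S1", "S2"}
print(flow_balance_ok({("G0", "S1"), ("S1", "G1")}, resources, "G0", "G1"))  # True
print(flow_balance_ok({("G0", "S1")}, resources, "G0", "G1"))                # False: flow ends at S1
print(flow_balance_ok(set(), resources, "G1", "G1"))                         # True: both on G1
```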

2.2.4 Multi-Objective Optimization

This subsection briefly introduces several concepts that are important in the context of multi-objective optimization. Its first part discusses ways to compare the quality of two concrete solutions of a multi-objective optimization problem and explains the concepts of Pareto dominance and the Pareto front. The second part explains different properties of a multi-objective optimizer that affect the optimization and presents the concept of ε-dominance, which is used as a quality metric for multi-objective optimizers in this work.

Pareto Dominance

For an optimization with an EA, it is essential to be able to determine the best solution(s) from a set of evaluated solutions. With only one objective function, this is trivial. Consider, e.g., the problem of choosing the best out of 17 available cars, where each car is specified by its monetary cost and its performance, and the characteristics of each car are given by the points (crosses and dots) plotted in Fig. 2.13. As long as the only objective is to minimize the cost or to maximize the performance, the optimal solution is given by point A or point B, respectively. Yet, when both objectives are considered at the same time, neither point A nor point B can be considered as the single optimal solution: while point A has lower cost, point B offers better performance. However, this relationship does not apply to all pairs of points. For example, point C is clearly inferior to point A, as it offers lower performance and, at the same time, has a higher cost. In the context of multi-objective optimization, the relation between point A and point C is referred to as Pareto dominance: solution 1 is said to be dominated by solution 2 if solution 2 is better than solution 1 in at least one objective and at least as good as solution 1 in all other objectives. The non-dominated solutions of a multi-objective optimization problem are referred to as Pareto-optimal solutions and constitute the so-called Pareto front. In Fig. 2.13, e.g., the Pareto front is given by the solutions plotted as blue dots, while each solution plotted as a red cross is dominated by at least one Pareto-optimal point. Finding the solutions that constitute the Pareto front of the optimization problem is the goal of multi-objective optimization.

Figure 2.13: Qualitative illustration of Pareto dominance: The plotted points illustrate a set of solutions of a multi-objective problem where the cost is minimized while the performance is maximized. The Pareto-optimal solutions that constitute the Pareto front are plotted as blue dots. The dominated solutions are plotted as red crosses.
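The dominance relation for the car example can be written down directly. The following Python sketch treats each car as a (cost, performance) pair, where cost is minimized and performance is maximized, and filters out all dominated points. The five data points are hypothetical and do not reproduce Fig. 2.13.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost, performance): cost is minimized, performance maximized

def dominates(a: Point, b: Point) -> bool:
    """True iff a is at least as good as b in both objectives and strictly better in one."""
    at_least_as_good = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return at_least_as_good and strictly_better

def pareto_front(points: List[Point]) -> List[Point]:
    """Keep exactly the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

cars = [(10, 3), (20, 8), (15, 2), (12, 5), (25, 8)]  # hypothetical (cost, performance) data
print(pareto_front(cars))  # [(10, 3), (20, 8), (12, 5)]
```

Here (15, 2) is dominated by (10, 3), which is cheaper and faster, and (25, 8) is dominated by (20, 8), which offers the same performance at a lower cost. This quadratic filter is fine for illustration; EAs such as NSGA-II use more elaborate non-dominated sorting.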


Evaluation of Multi-Objective Optimization Approaches

Developing new approaches for the multi-objective optimization of automotive Ethernet networks is one of the main contributions of this thesis. As with the development of any new approach, it is essential to have a quantitative metric that allows several alternative approaches to be compared. This section, therefore, presents the metric and the terminology used in this thesis for evaluating approaches for multi-objective optimization. As detailed above, the goal of multi-objective optimization is to find the true Pareto front, i.e., the set of solutions that are not dominated by any other solution within the search space. Any approach for multi-objective optimization provides a Pareto front as the result of the optimization. However, in the case of heuristic optimizers, such as the EA used in this work, there is (typically) no guarantee that the optimization result corresponds to the true Pareto front. For example, when exploring the space of different cars as described in the previous section, a heuristic optimizer may provide the set of points highlighted with blue circles in Fig. 2.13. With respect to the set of solutions explored by the optimizer, these points constitute the Pareto front. However, they do not correspond to the true Pareto front of the optimization problem. Two characteristics are relevant when evaluating an optimization approach. On the one hand, optimization approach A is better than approach B if the Pareto front provided by approach A (a) is located closer to, or even directly at, the true Pareto front and (b) covers a greater range of Pareto-optimal results than that of approach B. In this case, we say that approach A provides optimization results of higher quality than approach B. On the other hand, approach A is better than approach B if it provides results of similar quality, but requires a shorter optimization time to provide these results.
In this thesis, this relation is expressed by saying that approach A offers faster optimization convergence than approach B. To compare an optimization approach with alternative approaches w.r.t. both the quality of the provided results and the optimization convergence, this thesis observes the so-called ε-dominance [LTD+02] throughout the course of the optimization. Broadly speaking, ε-dominance measures the distance between the Pareto front found by the evaluated approach and a reference Pareto front. For the experiments in this thesis, the reference Pareto front is the set of the best solutions found during the experimental runs of all approaches compared in the respective experiment. Hereby, a lower ε-dominance value indicates a smaller distance to the reference Pareto front and, thus, a higher quality of the found solutions w.r.t. all optimization objectives. The different insights that can be gained from a plot of the ε-dominance when comparing several optimization approaches are demonstrated using the qualitative example in Fig. 2.14. The figure illustrates the course of the ε-dominance over the run time of the optimization experiments for two different optimization approaches, approach A (black) and approach B (blue). The horizontal shift denoted as ∆t0 indicates that approach B took longer until the first evaluated results were available, i.e., that approach B required a longer preprocessing time. A comparison of the solutions in the initial generation shows that approach B generates initial solutions of higher quality than approach A (∆ε0). Similarly, the distances denoted as ∆t and ∆ε indicate that approach B requires a longer optimization time but provides results of higher quality than approach A. Looking at the complete course of the two plots, it is evident that a different approach delivers better results depending on the current phase of the optimization, and that neither approach is clearly superior in terms of convergence speed.

Figure 2.14: Qualitative example of an ε-dominance-based comparison of two multi-objective optimization approaches. While the two approaches display a similar convergence speed, approach A requires a shorter preprocessing phase (∆t0) and a shorter overall run time (∆t), while approach B provides a higher result quality, both in the initial generation (∆ε0) and in the overall result of the optimization (∆ε).
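The text does not spell out how the ε-dominance distance is computed. One common formalization is the additive ε-indicator: the smallest ε by which the found front must be shifted so that every reference point is weakly dominated by some found point. The Python sketch below assumes this additive variant and minimization of all objectives (maximization objectives would be negated, and objectives would typically be normalized first); whether the experiments in this thesis use exactly this variant is an assumption.

```python
from typing import Sequence

Vector = Sequence[float]

def additive_epsilon(approx: Sequence[Vector], reference: Sequence[Vector]) -> float:
    """Additive epsilon indicator for minimization objectives.

    Returns the smallest eps such that every reference point r is weakly dominated by
    some approximation point a after shifting a by eps, i.e. a_i - eps <= r_i for all i.
    """
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in approx)
        for r in reference
    )

reference = [(1, 4), (2, 2), (4, 1)]  # hypothetical reference front
found = [(2, 4), (3, 3), (4, 2)]      # hypothetical front of an evaluated approach
print(additive_epsilon(found, reference))      # 1
print(additive_epsilon(reference, reference))  # 0
```

Evaluating this value for the current front after every generation and plotting it over the run time yields curves of the kind sketched in Fig. 2.14, where lower values indicate a front closer to the reference.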

3 Formal Timing and Reliability Analysis of Ethernet Time-Sensitive Networking (TSN) Networks

This chapter is dedicated to the formal analysis of timing and reliability aspects of automotive Ethernet networks. As illustrated in Fig. 2.8, the evaluation of the design objectives of valid implementations is a crucial part of the optimization loop, as it enables the optimizer to identify the most promising individuals for the further course of the optimization.

Section 3.1 proposes an approach for the timing analysis of mixed-criticality Ethernet networks that utilize the Time-Aware Shaper (TAS) mechanism introduced by the TSN standard. While these networks provide a completely predictable behavior of the scheduled traffic by construction, the timing analysis of critical non-scheduled traffic streams with hard deadlines remains a crucial issue. State-of-the-art timing analysis approaches [TCN00; DRE12; RGS+13; ATE14; TAE+14; TED15; TE16] consider the interference that unscheduled traffic streams impose on each other, but do not provide means to determine the worst-case interference that can be imposed by scheduled traffic without relying on specific restrictions on the shape of the schedule. Applying a timing analysis technique within a Design Space Exploration (DSE) poses an additional complexity challenge, as the occurring interferences have to be calculated within multiple nested fixed-point iterations for each evaluated solution candidate. A computationally expensive interference analysis, consequently, quickly makes the DSE infeasible, as it leads to extremely long optimization times. As a remedy, Section 3.1 proposes (a) an approach to integrate the analysis of the worst-case interference imposed by scheduled traffic on unscheduled messages into state-of-the-art timing analysis approaches and (b) preprocessing techniques that reduce the computation time of this interference calculation by several orders of magnitude without introducing any pessimism.
Section 3.2 proposes an approach for the reliability analysis of transient errors occurring in Ethernet networks. This approach combines methods from timing and reliability analysis and considers special features of automotive Ethernet networks like static network design, multi-hop communication, and significant message jitter. Moreover, the adjustment of message sending rates is proposed as a novel mechanism to increase transmission reliability. In contrast to established design approaches [TMA+05; GLR+08; LGT09; Bir17] that focus mainly on structural redundancy on the task and routing levels, the presented approach is based on temporal redundancy and can serve as an additional cost-efficient way to improve the reliability of communication networks. The contents of this chapter contribute to the overall goal of automated network design by providing evaluators for message timing and transmission reliability, two of the most important non-functional design objectives in the automotive domain [Int11].

3.1 Timing Analysis of Mixed-Criticality TSN Networks¹

As outlined in Section 2.1.3, TSN's TAS can be used to implement scheduled TSN networks, where the port schedules managing the transmission of scheduled messages are configured according to a global schedule. In these networks, scheduled messages do not experience any interference, and their transmission is, thus, deterministic. However, the timing analysis of unscheduled critical traffic remains a crucial task during network design. As Original Equipment Manufacturers (OEMs) develop many different vehicle variants with different sets of messages, it can be expected that only the safety-critical messages common to all variants are transmitted according to a schedule, while variant-exclusive messages are transmitted in an unscheduled way [SWS+16]. Furthermore, to achieve an efficient usage of the bandwidth, only strictly periodic messages should be scheduled, while critical messages with a sporadic character should be transmitted using the high-priority unscheduled traffic classes. Consequently, next-generation automotive networks will contain both critical scheduled messages, which do not encounter any interference, and critical unscheduled messages. To guarantee that unscheduled critical messages are delivered within their deadlines, a formal timing analysis of the network is required. Hereby, the effects of the interference that unscheduled messages impose on each other [TCN00; DRE12; TE16] and of the interference that can occur among high-priority messages sharing the same schedule slots [TED15] can be determined with existing analysis approaches. However, in non-preemptive strict-priority Ethernet networks, scheduled traffic affects the transmission of low-priority messages differently than high-priority traffic does. Existing analysis approaches must, thus, be extended to account for the interference caused by scheduled traffic.

¹ The contents of this section have appeared in part in [SGR+17a].


The formal analysis method presented in this section calculates a safe bound for the interference that scheduled traffic in a TSN network imposes on each unscheduled message. We assume that the scheduled traffic routed through the analyzed port is described by a port schedule. The Hyper-Period (HP) of the port schedule is given as the least common multiple of the periods of the scheduled messages on the analyzed port. This port schedule is transformed into a set of Interference Slots (ISs). The ISs describe the time intervals during which the transmission of frames of unscheduled messages is impossible, either because scheduled frames are being transmitted or as a result of the preemption and/or guard band mechanisms [Ins15; Ins16] used to ensure interference-freedom for scheduled messages. In the existing work on the analysis of TSN networks, the interference imposed on unscheduled traffic by scheduled traffic, the so-called Scheduled Interference (SI), is either completely ignored [TCC+15] or analyzed based on specific restrictions that have to be applied during schedule generation [TED15]. As the generation of a global schedule is already a difficult task, one of the goals of the presented approach is to avoid introducing any restrictions on the shape of the generated schedule, i.e., to make no assumptions about the shape and the distribution of the ISs. Consequently, an exhaustive examination of all arrival points of the unscheduled workload that could possibly lead to the maximum SI is necessary to derive a safe bound. However, in state-of-the-art timing analysis approaches, the worst-case interference that can be imposed during the analyzed time interval has to be (re)calculated very frequently, as this calculation is performed within the innermost loop of multiple nested fixed-point iterations.
Calculating the SI of a (realistic) schedule with a large number of ISs exhaustively, therefore, results in a very long computation time of the analysis, making such an analysis approach practically useless for a DSE. To cope with this issue, we present preprocessing techniques that, prior to the timing analysis, transform the given schedule into a data structure that contains only the information relevant for the worst case and can be used to quickly obtain the worst-case SI for an analyzed time interval of arbitrary length during the timing analysis. With this preprocessing, the SI can be calculated several orders of magnitude faster, so that it can be seamlessly integrated into state-of-the-art timing analysis frameworks and used for an efficient DSE. Section 3.1.1 provides an overview of the basic concepts of formal timing analysis of Ethernet networks and introduces the busy-period analysis [Leh90], one of the most popular analysis approaches in this area. Section 3.1.2 shows how the busy-period analysis can be extended to consider the interference from the scheduled traffic present in TSN networks and proposes preprocessing techniques that significantly reduce the time required for calculating the scheduled interference, so that it can be effectively used as part of a DSE evaluator. Finally, Section 3.1.3 presents experimental results that demonstrate both the potential of TSN networks to reduce the delay of critical messages and the applicability of the presented analysis by performing a timing analysis of an automotive network. The achievable analysis-time speedup and the tightness of the provided timing bounds when applying the proposed preprocessing are further highlighted by analyzing the worst-case interference of a set of synthetic schedules.

3.1.1 Timing Analysis using the Busy-Period Approach

Contrary to the recently introduced TSN mechanisms, the timing analysis of automotive strict-priority networks is a well-researched area. As mentioned in Section 2.1.2, the configuration of these networks does not require the generation of a global transmission schedule, as the arbitration is done dynamically in the output ports of the network switches. However, the time required to transmit a message over a switch cannot be predicted precisely, as it strongly depends on the behavior of other messages in the system. For example, the transmission may be delayed by messages that are currently being transmitted or are buffered in higher-priority queues of the output port. Compared to scheduled networks, the lower configuration and reconfiguration overhead thus comes at the cost of lower determinism of the transmission behavior. To guarantee that safety-critical messages meet their deadlines, network designers have to consider the worst case, in which the messages encounter the maximum amount of interference. This area of network analysis is commonly referred to as formal timing analysis. Multiple approaches for the formal timing analysis of Ethernet networks, e.g., Real-Time Calculus (RTC) [TCN00], can be found in the literature. Without loss of generality, this work focuses on the analysis approach referred to as Compositional Performance Analysis (CPA) [RJE03; HHJ+05; HAE17].

Terminology. The purpose of timing analysis approaches is to calculate the so-called response time of the analyzed event. In the context of this thesis, the response time corresponds to the time required to transmit the analyzed message through the Ethernet network, so the terms response time and transmission time are used interchangeably when discussing timing analysis.² Figure 3.1 illustrates the relationships between several terms that describe the timing of an event. Since the transmission time of a message depends on a varying amount of interference from other messages transmitted through the network, it is given by an interval between the minimum and the maximum transmission time, the Best-Case Response Time (BCRT) and the Worst-Case Response Time (WCRT), respectively, rather than by an exact number.

² For the timing analysis approaches presented in this thesis, we assume that the execution time of each message on each resource transmitting the message is constant and given by the message size and the transmission bandwidth of the respective resource. Consequently, the best-case execution time and the worst-case execution time have equal values.

38 3.1 Timing Analysis of Mixed-Criticality TSN Networks

[Figure 3.1: Time axis t marked with 0, BCRT, ABCRT, OBCRT, OWCRT, AWCRT, and WCRT; the rows simulation, reality, and formal analysis show the observed, actual, and bounded response-time ranges, separated by the gaps ∆ts and ∆ta.]

Figure 3.1: While the OWCRT provided by the simulation underestimates the AWCRT by an unknown time interval ∆ts, the WCRT provided by formal analysis overestimates the AWCRT by the time interval ∆ta and can be used to provide real-time guarantees. To limit the waste of resources due to overspecification, the formal analysis should be tight, i.e., ∆ta should be small.

In reality, the transmission time of a message ranges between the Actual Best-Case Response Time (ABCRT) and the Actual Worst-Case Response Time (AWCRT). To provide real-time guarantees for a safety-critical message with a hard deadline, the AWCRT must be smaller than the deadline for the transmission of the message. Yet, neither the ABCRT nor the AWCRT is known at design time. Repeated simulation of the message transmission provides the Observed Best-Case Response Time (OBCRT) and the Observed Worst-Case Response Time (OWCRT). However, it is quite unlikely that the worst case is observed during the simulation, so the simulation typically underestimates the AWCRT and cannot be used to provide real-time guarantees. Formal timing analysis approaches, on the other hand, provide bounds for the best- and the worst-case response time, i.e., they underestimate the ABCRT and overestimate the AWCRT. While overestimating the AWCRT by the time interval ∆ta enables the designer to guarantee the real-time capability of the communication network, the applied analysis should be as tight, and the overestimation ∆ta as small, as possible. For the sake of brevity, we refer to the bounded BCRT and the bounded WCRT simply as BCRT and WCRT in this thesis.

Compositional Performance Analysis (CPA). The CPA approach is a systematic method for formal timing analysis that is used to verify the timing properties of complex distributed systems with heterogeneous components. In [HAE17], the functionality of CPA is summarized as follows:


Functionality of CPA [HAE17]
“CPA follows a compositional approach which first performs a local component-related timing verification step and then, in a global timing verification step, sets the local verification problems in a system context where inter-component dependencies are considered.”

In the context of analyzing the transmission time of a message in an Ethernet network, the local timing verification step, hereafter referred to as local analysis, corresponds to an analysis that considers the timing impact of the individual communication resources that participate in the transmission of the message. Note that the term communication resource, as defined in [HAE17], refers to all components that provide processing service to the task under analysis. In an Ethernet network, a message is transmitted over Electronic Control Units (ECUs), switches, and links, so all of these components can be considered communication resources. Since this does not comply with the definition of a resource presented in Section 2.2.1, all ECUs, switches, and links involved in the transmission of messages are referred to as timing components in this thesis. A timing component is denoted by $\mathcal{E}$. CPA's global timing verification step, which we refer to as global analysis, focuses on the interconnection of the timing components of the analyzed system. The propagation of timing events is reflected by establishing predecessor-successor relations between timing components, where the termination trace of a predecessor component corresponds to the activation trace of the successor component. In the context of communication networks, the activation (termination) trace describes the set of instants at which frames of a message arrive at (leave) a timing component, and the timing components are connected according to the route of the message. The local analysis on a timing component can be summarized as the calculation of the termination trace of the analyzed event, represented by the so-called output timing model of the component, based on a given activation trace, the input timing model of the component. Each type of component has a specific way in which it generates its output timing model $\nu^o$ from the input timing model $\nu^i$.
In this thesis, the timing influence of a component $\mathcal{E}$ is formalized by the timing function $\tau_{\mathcal{E}}$:

$$\tau_{\mathcal{E}}(\nu^i) = \nu^o \qquad (3.1)$$

The procedure of CPA is depicted in Fig. 3.2. The upper part of the figure illustrates the routing path of the analyzed message, which is routed from the sender G0 to the receiver G1 via the links l0 and l1 and the switch S0. During the global analysis, a timing component is created for each of these architecture elements. The components are then connected by directed edges in the order in which they participate in the message transmission. The purple edge between G0 and l0, e.g., illustrates that the message is transmitted by the ECU G0 directly before it reaches the link l0. Consequently, $\nu^o(C_{G0})$, the output timing model of message C on ECU G0, is identical to $\nu^i(C_{l0})$, the input timing model of message C on link l0. Applying $\tau_{l0}$, the timing function of l0, to the argument $\nu^i(C_{l0})$ generates the output timing model $\nu^o(C_{l0})$, which is at the same time the input timing model of the next component, S0, and so on. The analysis of realistic networks is, of course, more complicated than the simple example in Fig. 3.2. Data dependencies, complex network topologies, and complicated arbitration schemes in the shared resources may result in timing dependencies between different messages transmitted through the network. To account for these dependencies, the local analysis step must typically be applied within a fixed-point iteration.

[Figure 3.2: Top: routing graph G0 → l0 → S0 → l1 → G1. Bottom: chain of timing components G0, l0, S0, l1, G1, where each output timing model equals the input timing model of the successor, from $\nu^o(C_{G0}) = \nu^i(C_{l0})$ to $\nu^o(C_{l1}) = \nu^i(C_{G1})$.]

Figure 3.2: Illustration of the CPA approach. During the global analysis, the routing graph (above) of message C is used to generate the network of timing components (below) that participate in the transmission of the message.
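The propagation along the component chain of Fig. 3.2 can be sketched as a composition of timing functions. The following Python sketch is a minimal illustration under simplifying assumptions: the `TimingModel` class mirrors the tuple of Eq. (3.2), while `propagate` and the constant-delay helper `fixed_delay` are hypothetical names. The sketch deliberately ignores the inter-message dependencies that, in realistic networks, require the surrounding fixed-point iteration.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class TimingModel:
    """nu = (p, j, l, i) as in Eq. (3.2)."""
    period: float
    jitter: float
    delay: float
    inter_arrival: float


TimingFunction = Callable[[TimingModel], TimingModel]


def fixed_delay(d: float, extra_jitter: float = 0.0) -> TimingFunction:
    """Hypothetical timing function: adds a constant delay and an optional jitter increment."""
    return lambda nu: replace(nu, delay=nu.delay + d, jitter=nu.jitter + extra_jitter)


def propagate(route: list[tuple[str, TimingFunction]],
              nu_in: TimingModel) -> dict[str, TimingModel]:
    """Global-analysis sketch: each output model becomes the successor's input model."""
    outputs: dict[str, TimingModel] = {}
    nu = nu_in
    for name, tau in route:
        nu = tau(nu)  # local analysis, Eq. (3.1): tau_E(nu^i) = nu^o
        outputs[name] = nu
    return outputs
```

With the route G0, l0, S0, l1, G1 and hypothetical contributions of 0, 120, 5, 120, and 0 time units, where only S0 adds 50 units of jitter, the receiver G1 ends up with a delay of 245 and a jitter of 50 time units, while the period is passed through unchanged.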

Timing Model. As already mentioned, the activation (termination) trace of a task on a timing component is given by the input (output) timing model. In the work at hand, we assume that both scheduled and unscheduled messages are sent periodically. The timing model $\nu(C,\mathcal{E})$, also written $\nu(C_{\mathcal{E}})$, describes the timing characteristics of message C when it is transmitted by the timing component $\mathcal{E}$ (a link or a resource). It is given by the period $p(C)$, the jitter $j(C,\mathcal{E})$, the delay $l(C,\mathcal{E})$, and the inter-arrival distance $i(C,\mathcal{E})$:

$$\nu(C,\mathcal{E}) = \big(p(C),\ j(C,\mathcal{E}),\ l(C,\mathcal{E}),\ i(C,\mathcal{E})\big) \qquad (3.2)$$

The period p is the constant time interval between two successive points in time at which the sender injects the message into the communication network. The delay l corresponds to the BCRT of the message, i.e., the time interval between the point in time when the message is sent by the sender and the point in time when it arrives at the component $\mathcal{E}$ under minimum interference. The jitter j is the difference between the WCRT (under maximum interference) and the BCRT required by the message to reach component $\mathcal{E}$. The jitter thus captures the possible timing influence of interference that may or may not occur during the transmission of the message. The inter-arrival distance i is the minimum time interval between two consecutive arrivals of the message at the component $\mathcal{E}$. This timing model is used to calculate the arrival function $\eta(C,\mathcal{E},t)$ and the distance function $\delta(C,\mathcal{E},n)$ of the analyzed timing component [Gre93; Ric05]. The values $\eta^-(C,\mathcal{E},t)$ and $\eta^+(C,\mathcal{E},t)$ denote the minimum and the maximum number of frames of message C that can arrive at element $\mathcal{E}$ during a time interval of length t. They are given by Eqs. (3.3) and (3.4):

$$\eta^-(C,\mathcal{E},t) = \max\left(0,\ \left\lfloor \frac{t - j(C,\mathcal{E})}{p(C)} \right\rfloor\right) \qquad (3.3)$$

$$\eta^+(C,\mathcal{E},t) = \min\left(\left\lceil \frac{t + j(C,\mathcal{E})}{p(C)} \right\rceil,\ \left\lceil \frac{t}{i(C,\mathcal{E})} \right\rceil\right) \qquad (3.4)$$

Without jitter, the number of frames that arrive within the time interval t depends only on the period. The minimum number of arriving frames is reduced by jitter but can never be negative (3.3). At the same time, the maximum number of arriving frames grows with the jitter but is bounded by the inter-arrival distance (3.4). The minimum and the maximum distance functions $\delta^-(C,\mathcal{E},n)$ and $\delta^+(C,\mathcal{E},n)$ denote the minimum and the maximum length of the time interval during which n instances of the message C can arrive at the element $\mathcal{E}$. They are calculated using Eqs. (3.5) and (3.6):

$$\delta^-(C,\mathcal{E},n) = \max\big((n-1)\cdot i(C,\mathcal{E}),\ (n-1)\cdot p(C) - j(C,\mathcal{E})\big) \qquad (3.5)$$

$$\delta^+(C,\mathcal{E},n) = (n-1)\cdot p(C) + j(C,\mathcal{E}) \qquad (3.6)$$

In the absence of jitter, the time required to receive n message frames depends only on the message period. The minimum distance for n frames is reduced by jitter and has a lower bound that depends on the inter-arrival distance (3.5). The maximum distance grows with the jitter (3.6).
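Eqs. (3.3) to (3.6) translate directly into code. The sketch below is a plain Python rendering of these formulas; the class and function names are our own, and all quantities are assumed to be expressed in the same time unit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingModel:
    """Timing model nu(C, E) = (p, j, l, i) of Eq. (3.2)."""
    period: float         # p(C)
    jitter: float         # j(C, E)
    delay: float          # l(C, E), the BCRT up to component E
    inter_arrival: float  # i(C, E), minimum distance between two arrivals


def eta_min(m: TimingModel, t: float) -> int:
    """Eq. (3.3): minimum number of frames arriving in an interval of length t."""
    return max(0, math.floor((t - m.jitter) / m.period))


def eta_max(m: TimingModel, t: float) -> int:
    """Eq. (3.4): maximum number of frames, additionally bounded by the inter-arrival distance."""
    return min(math.ceil((t + m.jitter) / m.period), math.ceil(t / m.inter_arrival))


def delta_min(m: TimingModel, n: int) -> float:
    """Eq. (3.5): minimum length of an interval in which n frames can arrive."""
    return max((n - 1) * m.inter_arrival, (n - 1) * m.period - m.jitter)


def delta_max(m: TimingModel, n: int) -> float:
    """Eq. (3.6): maximum length of an interval in which n frames can arrive."""
    return (n - 1) * m.period + m.jitter
```

For a hypothetical model with p = 10, j = 15, and i = 2, at most 2 frames can arrive within 5 time units (the period term binds), but only 1 frame within 1 time unit (the inter-arrival term binds).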

Timing Function of Ethernet Output Ports. While the propagation of timing models through the network of timing components does not depend on the underlying application domain or networking technology, the development of timing functions that accurately represent the network technology under analysis is a key requirement for the application of CPA. In Ethernet networks, the components with the most complicated and most significant timing influence are the output ports of the Ethernet switches, which perform the arbitration between messages. The timing function of a timing component $\mathcal{E}$ captures how the component influences the period, the jitter, the delay, and the inter-arrival distance of the messages it transmits. An Ethernet output port does not influence the transmission period of messages. While its influence on the delay and the inter-arrival distance is simple to calculate, the analysis of its influence on the jitter of a message constitutes the complex part of the timing analysis.

Delay Analysis. The message delay is analyzed by summing up the constant delay contributions of all timing components within $G_{R_C}$, the routing graph of message C. Each delay contribution $\Delta l(\mathcal{E},C)$ is calculated based on the size of the message frame $P(C)$ and the bandwidth of the component $B(\mathcal{E})$:

$$l(C) = \sum_{\mathcal{E} \in G_{R_C}} \Delta l(\mathcal{E},C) \qquad (3.7)$$

$$\Delta l(\mathcal{E},C) = \frac{P(C)}{B(\mathcal{E})} \qquad (3.8)$$

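Inter-Arrival Distance Analysis. In Ethernet networks, the components transmitting a message cannot change the order in which subsequent frames of that message are transmitted. Since the immediate routing predecessor transmits the frames one after another, two consecutive frames cannot arrive at $\mathcal{E}$ closer together than the transmission time of one frame on that predecessor. The inter-arrival distance $i(C,\mathcal{E})$ of message C on component $\mathcal{E}$ is therefore given by the transmission time of C on $\tilde{\mathcal{E}}$, the immediate routing predecessor of $\mathcal{E}$:

$$i(C,\mathcal{E}) = \Delta l(\tilde{\mathcal{E}},C) \qquad (3.9)$$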
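Eqs. (3.7) to (3.9) can be sketched for a linear route as follows. This is an illustration only: the `Component` class, its bandwidth field, and the conversion of the frame size from bytes to bits are assumptions made to keep units explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical timing component (ECU, link, or switch port) with bandwidth B(E) in bit/s."""
    name: str
    bandwidth_bps: float


def hop_delay(comp: Component, frame_bytes: int) -> float:
    """Eq. (3.8): Delta l(E, C) = P(C) / B(E), returned in seconds."""
    return frame_bytes * 8 / comp.bandwidth_bps


def route_delay(route: list[Component], frame_bytes: int) -> float:
    """Eq. (3.7): the delay l(C) is the sum of all per-component contributions."""
    return sum(hop_delay(c, frame_bytes) for c in route)


def inter_arrival(route: list[Component], frame_bytes: int) -> dict[str, float]:
    """Eq. (3.9): i(C, E) equals the transmission time of C on E's routing predecessor."""
    return {cur.name: hop_delay(pred, frame_bytes) for pred, cur in zip(route, route[1:])}
```

For a 1,500-byte frame, Eq. (3.8) gives 120 µs on a 100 Mbit/s link, which is also the transmission time of one scheduled frame in the synthetic case study of Section 3.1.3. A following 1 Gbit/s link adds 12 µs, and its inter-arrival distance is the 120 µs of its predecessor.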

Jitter Analysis. As detailed in Section 2.1, only one message can be transmitted over an output port at a time. The contribution of an Ethernet output port to the jitter of a message is the transmission time of the messages that are sent while the message under analysis remains buffered in the output port. In strict-priority arbitration, whether and how much another message can interfere strongly depends on its priority. For this reason, the overall jitter experienced by the analyzed message is divided into different parts depending on the priority of the interfering messages. Assume that we want to analyze the interference imposed on a frame of message C on the output port processing the messages of the set F. Based on their traffic class, all messages on this port are partitioned into sets relative to C: the low-priority message set $F_L$, the same-priority message set $F_S$, and the high-priority message set $F_H$. The maximum interference $\Delta j(n,\dot t)$ for the n-th frame of the analyzed message, arriving at the time point $\dot t$, is given by:

$$\Delta j(n,\dot t) = \Delta j_L + \Delta j_O + \Delta j_S + \Delta j_H \qquad (3.10)$$

where the term $\Delta j_L$ represents the maximum interference caused by low-priority traffic, $\Delta j_O$ is the transmission time of previous frames of the analyzed message, $\Delta j_S$ is the interference caused by same-priority traffic, and $\Delta j_H$ is the interference caused by high-priority traffic.


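Low-Priority Interference. The worst-case low-priority interference $\Delta j_L$ is the transmission time of the biggest low-priority frame on the analyzed component $\mathcal{E}$ (the output port of the Ethernet switch), since a low-priority frame whose transmission has already started is not interrupted by the analyzed frame:

$$\Delta j_L = \max_{C \in F_L} \Delta l(\mathcal{E},C) \qquad (3.11)$$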

Transmission Time of the Previous Frames. The busy-period analysis used in this work finds the worst-case interference, i.e., the interference of a message arriving at the so-called critical instant. This maximum interference is found by picking the biggest value out of the interferences faced by n subsequent message frames (see [Leh90; DRE12; HAE17]). When calculating the interference faced by the n-th frame, the transmission time $\Delta j_O$ of the previous frames also has to be considered. It is given by:

$$\Delta j_O(n) = (n-1)\cdot \Delta l(\mathcal{E},C) \qquad (3.12)$$

Same-Priority Interference. In the worst case, the analyzed frame is delayed by the maximum amount of same-priority traffic that can arrive before $\dot t$ and still be in the first-in-first-out (FIFO) queue at $\dot t$. The same-priority interference is given by:

$$\Delta j_S(\dot t) = \sum_{C \in F_S} \eta_C^+(\dot t)\cdot \Delta l(\mathcal{E},C) \qquad (3.13)$$

The interference of the low- and the same-priority traffic and the transmission time of the previous frames depend only on $\dot t$ and the level n of the busy period. In the following, the sum of these values is denoted as the fixed interference $\Delta j_f(n,\dot t)$:

$$\Delta j_f(n,\dot t) = \Delta j_L + \Delta j_O(n) + \Delta j_S(\dot t) \qquad (3.14)$$

High-Priority Interference. High-priority messages are stored in a different buffer and are favored during the transmission arbitration. Consequently, all high-priority messages that arrive during $t_b$, the time interval during which the analyzed message is stored within the buffer, are transmitted before the message and contribute to its high-priority interference. At the same time, $t_b$ itself depends on the amount of high-priority traffic that is transmitted before the analyzed message is queued for transmission. This circular dependency is resolved with a fixed-point iteration:

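$$\Delta j_H(t_b) = \min_{t_b \in T_b} \sum_{C \in F_H} \eta_C^+(t_b)\cdot \Delta l(\mathcal{E},C) \qquad (3.15)$$

$$T_b = \left\{ t \in \mathbb{R}_0^+ \;\middle|\; \Delta j_f(n,\dot t) + \Delta j_H(t) \le t \right\} \qquad (3.16)$$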
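The strict-priority jitter analysis of Eqs. (3.10) to (3.16) can be sketched as follows. This is a minimal illustration, not the implementation used in this thesis. The `Flow` class is hypothetical, and $\eta^+$ is simplified by dropping the inter-arrival bound of Eq. (3.4). The smallest element of $T_b$ is found by the usual monotone iteration that starts at $\Delta j_f$ and cannot overshoot the least fixed point because $\eta^+$ is non-decreasing.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical interfering message on the analyzed output port."""
    tx_time: float  # Delta l(E, C), transmission time on the port
    period: float   # p(C)
    jitter: float   # j(C, E)


def eta_plus(f: Flow, t: float) -> int:
    """Eq. (3.4) without the inter-arrival bound (simplification for brevity)."""
    return math.ceil((t + f.jitter) / f.period)


def fixed_interference(n: int, t_dot: float, own_tx: float,
                       low: list[Flow], same: list[Flow]) -> float:
    """Eq. (3.14): Delta j_f(n, t_dot) = Delta j_L + Delta j_O(n) + Delta j_S(t_dot)."""
    dj_l = max((f.tx_time for f in low), default=0.0)         # Eq. (3.11)
    dj_o = (n - 1) * own_tx                                    # Eq. (3.12)
    dj_s = sum(eta_plus(f, t_dot) * f.tx_time for f in same)   # Eq. (3.13)
    return dj_l + dj_o + dj_s


def high_priority_interference(dj_f: float, high: list[Flow],
                               max_iter: int = 10_000) -> float:
    """Eqs. (3.15)/(3.16): high-priority load at the smallest t_b in T_b."""
    t_b = dj_f                        # start below the least fixed point
    for _ in range(max_iter):
        dj_h = sum(eta_plus(f, t_b) * f.tx_time for f in high)
        if dj_f + dj_h <= t_b:        # t_b lies in T_b and is its smallest element
            return dj_h
        t_b = dj_f + dj_h             # monotone step towards the fixed point
    raise RuntimeError("no fixed point found; the port is likely overloaded")
```

For a fixed interference of 5 time units and one high-priority flow with a transmission time of 1 and a period of 4, the iteration visits $t_b = 5$ and $t_b = 7$ and returns a high-priority interference of 2.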


3.1.2 Timing Analysis of Scheduled TSN Networks

This section contains the novel contributions that enable an efficient timing analysis of scheduled TSN networks. First, it is shown how the scheduled interference can be integrated into the busy-period approach. Subsequently, multiple approaches for the calculation of the scheduled interference are proposed and compared to each other.

Analysis Extension

The busy-period analysis as presented in Section 3.1.1 is frequently used for the timing analysis of strict-priority Ethernet networks and also captures the interference that preemptable unscheduled messages impose on each other in scheduled TSN networks. However, scheduled messages in TSN networks constitute a new type of high-priority traffic, so Eq. (3.15) for the calculation of high-priority interference is extended to:

$$\Delta j_H(t_b) = \min_{t_b \in T_b} \left( \sum_{\tilde C \in F_H} \eta_{\tilde C}^+(t_b)\cdot \Delta l(\mathcal{E},\tilde C) + \upsilon^+\big(t_b + \Delta l(\mathcal{E},C)\big) \right) \qquad (3.17)$$

where $\upsilon^+(t)$ denotes the maximum interference that can be imposed by scheduled messages during a time interval of length t. As outlined in Section 2.1.3, all unscheduled messages belong to the preemptable preemption class, while all scheduled messages belong to the express preemption class. Consequently, scheduled messages can interfere with the analyzed message even after it has been queued for transmission. The time interval over which scheduled interference accumulates thus consists of both the time $t_b$ that the message spends in the buffer and its transmission time $\Delta l(\mathcal{E},C)$.
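Extending the iteration with the scheduled term of Eq. (3.17) only changes the per-step workload: the scheduled interference is evaluated over $t_b + \Delta l(\mathcal{E},C)$ rather than over $t_b$. In the sketch below, the function name, the zero-jitter (transmission time, period) representation of high-priority flows, and the toy $\upsilon^+$ used in the usage example are assumptions. In the actual analysis, $\upsilon^+$ would be the scheduled interference derived in the following, and the iteration relies on it being non-decreasing in t.

```python
import math
from typing import Callable


def high_priority_with_scheduled(
    dj_f: float,
    own_tx: float,
    high: list[tuple[float, float]],     # hypothetical (tx_time, period) pairs, zero jitter
    upsilon: Callable[[float], float],   # upsilon^+(t): worst-case scheduled interference
    max_iter: int = 10_000,
) -> float:
    """Sketch of Eqs. (3.16)/(3.17): smallest t_b with dj_f + Delta j_H(t_b) <= t_b."""

    def dj_h(t_b: float) -> float:
        unscheduled = sum(math.ceil(t_b / p) * tx for tx, p in high)
        # Express (scheduled) frames can also preempt C during its own transmission.
        return unscheduled + upsilon(t_b + own_tx)

    t_b = dj_f
    for _ in range(max_iter):
        value = dj_h(t_b)
        if dj_f + value <= t_b:
            return value
        t_b = dj_f + value
    raise RuntimeError("no fixed point found; the port is likely overloaded")
```

With the toy function `upsilon = lambda t: 2 * (t // 10 + 1)`, which conservatively charges 2 time units of scheduled traffic per started 10-unit window, $\Delta j_f = 5$, a transmission time of 1, and one high-priority flow (1, 4), the iteration visits $t_b = 5, 9, 12$ and returns 7.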

Calculation of Scheduled Interference

In the following, we show how the worst-case scheduled interference for a given time interval can be calculated. Scheduled traffic on the analyzed resource is described by a set of Interference Slots (ISs). Each IS is characterized by its start point $\dot t_i^s$ and its end point $\dot t_i^e$ and represents the time during which no unscheduled traffic can be transmitted by the analyzed port. Figure 3.3 shows how a port schedule consisting of 4 port slots (I to IV) is transformed into a set of 3 ISs by prepending the Guard Band (GB) time interval to each port slot and appending a time interval for the Preemption Header (PH) to the end of each IS. Because slots III and IV follow each other closely, they form a single IS (IS 3). Both the GB and the PH time intervals have a fixed length. For details concerning the GB and the preemption mechanisms, please refer to Section 2.1.3 and [Ins16].
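The transformation shown in Fig. 3.3 can be sketched as an interval-merging pass over the port slots. The merge rule is an assumption made for illustration, as the text does not state exactly when slots are combined: a slot joins the current IS if its guard band starts no later than the end of the current IS plus the PH length. Wrap-around at the HP boundary is ignored, and the function name is hypothetical.

```python
def build_interference_slots(
    port_slots: list[tuple[float, float]], gb: float, ph: float
) -> list[tuple[float, float]]:
    """Turn (start, end) port slots into ISs using a hypothetical merge rule."""
    iss: list[tuple[float, float]] = []
    cur_start = cur_end = None
    for start, end in sorted(port_slots):
        gb_start = start - gb  # the guard band precedes every port slot
        if cur_end is not None and gb_start <= cur_end + ph:
            # No room for unscheduled traffic in between: extend the current IS.
            cur_end = max(cur_end, end)
        else:
            if cur_end is not None:
                iss.append((cur_start, cur_end + ph))  # the PH closes every IS
            cur_start, cur_end = gb_start, end
    if cur_end is not None:
        iss.append((cur_start, cur_end + ph))
    return iss
```

For four slots (10, 20), (40, 50), (70, 80), and (83, 90) with GB = 2 and PH = 1, the last two slots merge, similar to slots III and IV in Fig. 3.3, and the result is [(8, 21), (38, 51), (68, 91)].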

[Figure 3.3: A port schedule over one Hyper-Period (HP) with port slots I, II, III, and IV; each slot is preceded by a GB and each IS ends with a PH, yielding IS 1 (GB, I, PH), IS 2 (GB, II, PH), and IS 3 (GB, III, GB, IV, PH).]

Figure 3.3: Creation of the ISs [SGR+17a].

Exhaustive Analysis. Depending on the point in time at which it arrives at the analyzed port, an unscheduled message experiences different amounts of scheduled interference during time intervals of different lengths. Consequently, every possible arrival point within the Hyper-Period (HP) of the port schedule must be considered when searching for the worst-case scheduled interference during a given time interval. Only an arrival at the start point of an IS can lead to the worst-case interference, since in this case the whole IS contributes to the interference before any part of the unscheduled message is transmitted. The worst-case scheduled interference for a given time interval is thus found by examining the interference resulting from an arrival at each of the start points of the ISs. The interference $\upsilon^+(\dot t,t)$ during the time interval t resulting from an arrival at the point $\dot t$ is given by:

$$\upsilon^+(\dot t,t) = \max_{b \in B(t)} \sum_{1 \le i \le b} \left(\dot t_i^{e} - \dot t_i^{s}\right) \qquad (3.18)$$

with

$$B(t) = \left\{ b \in \mathbb{N}^+ \;\middle|\; \dot t \le \dot t_b^{s} \le \dot t + t \right\} \qquad (3.19)$$

where $\dot t_1^s$ and $\dot t_1^e$ are the start and the end points of the first IS after the arrival point, $\dot t_2^s$ and $\dot t_2^e$ are the start and the end points of the second IS after the arrival point, and so on. The worst-case scheduled interference is the biggest value found by applying Eq. (3.18) at the start point of each IS. The time needed to calculate the scheduled interference in this way grows quadratically with the number of ISs, which can become very large in realistic networks. This computation time is especially critical because, during the timing analysis of a communication network, the scheduled interference is calculated inside the high-priority fixed-point iteration (Eq. (3.17)), which itself is nested inside the fixed-point iteration used to resolve the dependencies between the timing components of the network. The computation time of the scheduled interference thus has a considerable influence on the overall computation time of the timing analysis. In what follows, we show how to overcome this problem by preprocessing the system-wide schedule, which is known prior to the timing analysis; this significantly reduces the time needed to calculate the scheduled interference during the timing analysis. We first exploit (a) the fact that the worst-case interference imposed by scheduled traffic during a complete HP of the schedule can be determined in advance, which provides a quick way to determine the so-called static interference $\upsilon_s^+$, i.e., the interference imposed on an unscheduled message during a time interval consisting of complete HPs of the port schedule. We then present (b) three alternative techniques to determine the so-called dynamic interference $\upsilon_d^+$ imposed during the remaining fraction of the HP. The scheduled interference is then given by the sum of $\upsilon_s^+$ and $\upsilon_d^+$.
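A direct reading of Eqs. (3.18) and (3.19) yields the exhaustive search sketched below. For every IS start taken as an arrival point, it adds up the full lengths of all ISs, unrolled over consecutive HPs, whose start lies within the analyzed interval, and it keeps the largest sum. It is a minimal, quadratic-time illustration that assumes the ISs of one HP are given as (start, end) pairs; the function name is our own.

```python
def exhaustive_interference(iss: list[tuple[float, float]], h: float, t: float) -> float:
    """Eqs. (3.18)/(3.19): worst-case scheduled interference in an interval of length t.

    iss holds the (start, end) points of the ISs within one HP of length h;
    the schedule repeats every h time units.
    """
    starts = sorted(iss)
    n = len(starts)
    worst = 0.0
    for first, (arrival, _) in enumerate(starts):   # candidate arrival points: IS starts
        total, k = 0.0, first
        while True:
            s, e = starts[k % n]
            s_abs = s + (k // n) * h                # unroll the periodic schedule
            if s_abs > arrival + t:                 # first IS that starts after the window
                break
            total += e - s                          # an IS starting inside the window counts fully
            k += 1
        worst = max(worst, total)
    return worst
```

For the IS set of the DBA example later in this section (starts 3, 7, and 14 with lengths 3, 2, and 4 TUs, h = 20 TUs), this search returns 5 TUs for t = 5, 7 TUs for t = 10, and 9 TUs for t = 11.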

Static Interference. As the port schedule, which has a HP of length h, is known prior to the timing analysis, $\upsilon_s^+(h)$, the maximum scheduled interference that can occur during one complete HP, can be calculated before the start of the timing analysis by summing up the lengths of all ISs. $\upsilon_s^+$ for the time interval t is then given by:

$$\upsilon_s^+(t) = \left\lfloor \frac{t}{h} \right\rfloor \cdot \upsilon_s^+(h) \qquad (3.20)$$

The first factor is the number of complete HPs contained in the time interval, while the second factor is the maximum scheduled interference in one complete HP. As $\upsilon_s^+(h)$ is determined during the preprocessing step, $\upsilon_s^+(t)$ can be calculated in constant time during the timing analysis. After the static interference that occurs during complete HPs has been considered, the dynamic interference imposed during the remaining time $t_r(t)$ has to be calculated, where

$$t_r(t) = t - \left\lfloor \frac{t}{h} \right\rfloor \cdot h \qquad (3.21)$$
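Both quantities are cheap to compute. The following is a small sketch that assumes the ISs are given as (start, end) pairs. The function name is hypothetical, and in an actual analysis the per-HP sum would be computed once during preprocessing rather than on every call.

```python
import math


def static_interference(iss: list[tuple[float, float]], h: float,
                        t: float) -> tuple[float, float]:
    """Eqs. (3.20)/(3.21): return (upsilon_s^+(t), t_r(t)) for an interval of length t."""
    per_hp = sum(e - s for s, e in iss)  # upsilon_s^+(h): total IS length per HP
    full_hps = math.floor(t / h)         # number of complete HPs contained in t
    return full_hps * per_hp, t - full_hps * h
```

For the IS set of the DBA example below (9 TUs of ISs per HP of 20 TUs), an interval of 30 TUs contains one complete HP. The static interference is therefore 9 TUs, and a remainder of 10 TUs is left for the dynamic interference.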

Dynamic Interference: Naive Approach (NA). In the first approach, denoted as the Naive Approach (NA), we conservatively assume that during the incomplete HP, the scheduled traffic imposes the same amount of interference as during a complete HP, i.e., $\upsilon_d^+(t_r) = \upsilon_s^+(h)$. Similar to the calculation of $\upsilon_s^+$, the NA is based on the precalculated value of $\upsilon_s^+(h)$, so that $\upsilon_d^+$ can be calculated in constant time during the timing analysis. This, however, comes at the cost of a significant overestimation of the scheduled interference. The degree of overestimation grows with larger HPs and smaller values of $t_r$.

Dynamic Interference: Exact Solution (ES). The second approach, denoted as the Exact Solution (ES), calculates $\upsilon_d^+$ with minimum overestimation by applying Eq. (3.18) to find the maximum dynamic interference during the time $t_r$. The ES computes safe and tight values for the worst-case dynamic interference for a given timing model. However, with n ISs, an interference value has to be determined for each of the n candidate points. The ES has a complexity that is quadratic in the number of ISs because, in the worst case, all n ISs contribute to the interference and have to be considered in each of these n calculations. Consequently, the computation time of the ES grows quickly with the number of ISs.


Dynamic Interference: Dominance-based Approach (DBA). The third approach, called the Dominance-based Approach (DBA), requires significantly less computation time than the ES while still providing safe and tight results. In the DBA, the port schedule is preprocessed and transformed into a data structure that (a) only contains information regarding the worst-case interference and (b) enables a quick derivation of the worst-case interference for arbitrary time intervals.

DBA: Interference Table. In the first step of the DBA, the set of ISs is transformed into a so-called interference table. The n ISs are numbered in ascending order, where each IS is characterized by its start and its end. The interference table has n rows and n columns. The entry $a_{ij}$ in row i and column j represents the interference that is imposed on unscheduled traffic arriving at the start of the i-th IS by the first j ISs that it encounters, counting from the i-th IS and wrapping around into the next HP. The entry is characterized by the values $d(a_{ij})$ and $f(a_{ij})$: $f(a_{ij})$ is the overall interference of all ISs between the start of the i-th IS and the end of the j-th encountered IS, and $d(a_{ij})$ is the distance between the start of the i-th IS and the start of the j-th encountered IS. Thus, $d(a_{ij})$ is the minimum time after which an unscheduled message arriving at $\dot t_i^s$ experiences the scheduled interference $f(a_{ij})$. The interference table fully describes the interference imposed on arbitrary amounts of unscheduled traffic arriving at one of the n candidate points. In the next steps, all information that is irrelevant for the worst case is eliminated.

DBA: Entry dominance. Only the so-called dominant entries are relevant for the calculation of the scheduled interference. Entry $a_a$ is dominated by entry $a_b$ if $d(a_b) \le d(a_a) \wedge f(a_b) \ge f(a_a)$, since in this case entry $a_b$ describes a situation in which a bigger or equal scheduled interference is imposed on the unscheduled message during a shorter or equal time interval than in the situation described by entry $a_a$. An entry is dominant if it is not dominated by any other entry in the interference table.

DBA: Reduction. After the creation of the interference table, all non-dominant entries are iteratively eliminated. Each iteration starts by finding the set of entries relevant for the shortest time interval (the entries with the smallest d-value). This set always contains one dominant entry.
All entries dominated by this entry are eliminated from the interference table, while the dominant entry is added to the so-called interference list, which is sorted in ascending order of the d-value.

DBA: Interference List. The interference list contains only the information that is relevant for the worst case. It is used to quickly derive $\upsilon_d^+$ for the given time interval $t_r$:

$$\upsilon_d^+(t_r) = f(p) + \max\big(0,\ t_r - d(p+1) + f(p+1) - f(p)\big) \qquad (3.22)$$

where p denotes the position of the interference list entry with the biggest d-value that is smaller than $t_r$. The first term of the sum is the interference from the complete ISs that contribute to the entry at position p, while the second term provides the worst-case interference imposed by one incomplete IS.
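The three DBA steps can be sketched in Python as follows. This illustrates the procedure as described above and is not the authors' implementation. The table is built with wrap-around into the next HP. The reduction repeatedly picks the entry with the smallest d and, among those, the largest f, and then drops every entry it dominates. Eq. (3.22) is evaluated with a binary search via the standard `bisect` module. The handling of $t_r = 0$ and of the last list position is our own assumption, as Eq. (3.22) does not cover these cases explicitly. For $t_r = 0$, the sketch returns the longest single IS, which is what Eq. (3.18) yields for t = 0.

```python
import bisect


def interference_list(iss: list[tuple[float, float]], h: float) -> list[tuple[float, float]]:
    """Build the n x n interference table and reduce it to the dominant (d, f) entries."""
    starts = sorted(iss)
    n = len(starts)
    table = []
    for i in range(n):                            # row: arrival at the start of IS i
        arrival = starts[i][0]
        f = 0.0
        for j in range(n):                        # column: (j + 1)-th IS encountered
            s, e = starts[(i + j) % n]
            d = s + ((i + j) // n) * h - arrival  # wrap around into the next HP
            f += e - s                            # accumulate full IS lengths
            table.append((d, f))
    ilist = []
    while table:                                  # one dominant entry per iteration
        d_min = min(d for d, _ in table)
        dom = max((a for a in table if a[0] == d_min), key=lambda a: a[1])
        ilist.append(dom)
        # Remove dom and every entry it dominates: d >= d(dom) and f <= f(dom).
        table = [a for a in table if not (a[0] >= dom[0] and a[1] <= dom[1])]
    return ilist


def dynamic_interference(ilist: list[tuple[float, float]], t_r: float) -> float:
    """Eq. (3.22); in practice the d-values would be extracted once, not per call."""
    ds = [d for d, _ in ilist]
    p = bisect.bisect_left(ds, t_r) - 1   # entry with the biggest d smaller than t_r
    if p < 0:                             # t_r == 0 (assumption): longest single IS
        return ilist[0][1]
    if p == len(ilist) - 1:               # no entry p + 1 (assumption): keep f(p)
        return ilist[p][1]
    f_p = ilist[p][1]
    d_next, f_next = ilist[p + 1]
    return f_p + max(0.0, t_r - d_next + f_next - f_p)
```

For the IS set of the example below (starts 3, 7, and 14, lengths 3, 2, and 4, h = 20), `interference_list` reproduces the list derived by hand in the text, [(0, 4), (4, 5), (7, 6), (9, 7), (11, 9)]. For $t_r = 10$, `dynamic_interference` evaluates Eq. (3.22) to 8.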


During the timing analysis, the maximum dynamic interference is computed by (a) finding the right list position p and (b) applying Eq. (3.22). As the interference list is sorted, step (a) can be performed with a binary search in logarithmic time, while step (b) is performed in constant time. The complexity of finding $\upsilon_d^+$ is consequently O(log n), where the number n of ISs is the problem size. Thus, applying the DBA reduces the complexity of the calculation of the scheduled interference from quadratic to logarithmic.

Example

The DBA is demonstrated using an IS set with three ISs and a HP length h = 20 Time Units (TUs). The ISs start at 3, 7, and 14 and have lengths of 3, 2, and 4 TUs, respectively. The generated interference table is shown in Tab. 3.2a, while Tab. 3.2b illustrates its reduction to the interference list. During the first iteration (I.) of the reduction process, the entries relevant for the time interval of 0 TUs are examined. Here, the dominant entry 0|4 is found, while the other two entries are eliminated. After the found entry has been added to the interference list, the second iteration (II.) of the reduction process is started and the dominant entry 4|5 is found. After the fifth iteration (V.), the reduction process is finished. The dominant entries found through the reduction of the interference table are used to generate the interference list: [(0|4), (4|5), (7|6), (9|7), (11|9)]. This interference list completely describes the worst-case scheduled interference that can be imposed during time intervals ranging from 0 to 20 TUs. The worst-case scheduled interference for a specific time interval can be found by applying Eq. (3.22).
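As a worked illustration of how the static and dynamic parts combine, consider an analyzed interval of t = 30 TUs for this example. This is our own arithmetic based on Eqs. (3.20) to (3.22) as given above. The total IS length per HP is 3 + 2 + 4 = 9 TUs, so Eqs. (3.20) and (3.21) yield one complete HP and a remainder of 10 TUs. The biggest d-value in the interference list that is smaller than 10 is 9, so p refers to the entry 9|7 and p + 1 to the entry 11|9:

$$\upsilon_s^+(30) = \left\lfloor \tfrac{30}{20} \right\rfloor \cdot 9 = 9, \qquad t_r(30) = 30 - 1 \cdot 20 = 10$$

$$\upsilon_d^+(10) = 7 + \max(0,\ 10 - 11 + 9 - 7) = 7 + 1 = 8$$

$$\upsilon^+(30) = \upsilon_s^+(30) + \upsilon_d^+(10) = 9 + 8 = 17 \text{ TUs}$$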

3.1.3 Experimental Results

The timing impact of scheduled TSN traffic and the efficiency of the proposed timing analysis technique are demonstrated with two different experiments. The first experiment focuses on a comparison of the preprocessing approaches for the analysis of scheduled interference presented in Section 3.1.2. A synthetic case study is used to evaluate the computational effort and the degree of overestimation of the three preprocessing techniques. In the second experiment, the timing differences between a scheduled TSN network and a network with strict-priority non-preemptive switched Ethernet (IEEE 802.1Q) [Ins14] are investigated by means of a formal timing analysis of an automotive communication network.

Table 3.1: Example for the DBA. Each entry is given as d|f.

(a) Generated interference table.

i \ j    1        2          3
1        0|3      4|5        11|9
2        0|2      7|6        16|9
3        0|4      9|7        13|9

(b) Results of the reduction step (the Roman numeral gives the iteration in which the entry is handled).

i \ j    1        2          3
1        I. 0|3   II. 4|5    V. 11|9
2        I. 0|2   III. 7|6   V. 16|9
3        I. 0|4   IV. 9|7    V. 13|9


[Figure 3.4: Three panels (5 ISs, 15 ISs, 100 ISs), each plotting the computation time overhead of the ES, the DBA, and the NA on a logarithmic axis from 10^0 to 10^3 over the analyzed time interval from 0 to 0.25 s.]

Figure 3.4: Computation times of the three approaches for the calculation of the scheduled interference in relation to the NA: While the DBA, even for large numbers of ISs and long time intervals, never takes more than 10 times longer than the NA, the computation time of the ES can be up to 3,000 times longer. Note that the normalized computation times are given in logarithmic scale [SGR+17a].

50 3.1 Timing Analysis of Mixed-Criticality TSN Networks

The synthetic case study evaluates the DBA by comparing its computational effort and its degree of overestimation to the two other approaches presented in Section 3.1.2. During this experiment, the schedules have an HP length h = 100 ms and contain different numbers of ISs, distributed regularly over the HP. Each IS represents a scheduled message with a frame size of 1,500 Bytes in a 100-MBit-Ethernet network. The results of the first experiment are illustrated in Figs. 3.4 and 3.5. Figure 3.4 shows how the computation time of the calculation of the scheduled interference scales for 5 (top), 15 (middle), and 100 (bottom) ISs depending on the length of the analyzed time interval. The plots show the computation times of the ES (dashed) and the DBA relative to that of the NA, which always requires constant time. The computation times of the DBA and the ES show a similar behavior in the sense that they both grow with the number of ISs and have local minima at the points where the preprocessing-based calculation of the static interference described in Section 3.1.2 reduces the computation time of the scheduled interference calculation (the points around 0.1 s and 0.2 s). However, while the DBA, even for large numbers of ISs, takes at most around 10 times longer than the NA, the ES is up to 3,000 times slower. Both the ES and the DBA provide tight analysis results without any overestimation of the scheduled interference. Figure 3.5 illustrates the degree of overestimation of the NA (dotted) and the DBA compared to the ES when analyzing the scheduled interference of 5 (black), 15 (orange), and 100 (violet) ISs. While the DBA provides tight results without overestimation, the NA overestimates the interference by a factor of up to 100. The overestimation of the NA gets smaller when more ISs contribute to the worst-case interference and when the analyzed time interval contains full HPs.
The results of this experiment give evidence that the DBA calculates the scheduled interference with the same tightness as the ES. At the same time, the DBA is approximately 400 times faster than the ES for a port slot number of 100, which is a realistic value for complex networks with different application-specific message periods. Hence, the DBA provides safe and tight analysis results while offering a tremendous speedup for timing analysis. The automotive network used for the second experiment consists of 32 nodes exchanging 152 messages, including both single- and multicast messages. The first row of Tab. 3.3 contains the average jitter experienced by the messages of this network when using strict-priority non-preemptive switched Ethernet (IEEE 802.1Q) [Ins14]. For the analysis of the scheduled TSN network, an Evolutionary Algorithm (EA) was used to find overlap-free schedules for the messages of the fourth traffic class. The timing of each overlap-free schedule found during the exploration was analyzed to obtain the values in Tab. 3.3. While the analysis of the unscheduled network took 0.20 seconds, the analysis of a scheduled network took 0.23 seconds on average. As expected, the scheduled messages experience no jitter at all, while the jitter values of the other traffic classes substantially differ from those in the unscheduled network. The scheduled interference differs from the unscheduled high-priority interference in two aspects. On the one hand, the arrival points of the scheduled messages have constant offsets to each other, so that no critical instant can occur and the number of messages interfering in the worst case is smaller. Unscheduled messages with the highest priority (traffic class 3 in our case study) benefit from this circumstance. On the other hand, the Guard Band (GB) mechanism in combination with the preemption of the unscheduled traffic leads to a larger interference per interfering scheduled message. For low-priority unscheduled messages, which, in the worst case, also experience the critical instants of unscheduled messages with higher priority, this leads to jitter values larger than those in unscheduled networks.

[Figure 3.5 plot: factor of overestimation (logarithmic scale, 10^0 to 10^2) over the analyzed time interval in s (0 to 0.25) for the NA with 5, 15, and 100 ISs and for the ES and the DBA]

Figure 3.5: Comparison of the factor of overestimation of the three analysis approaches in relation to the ES: While the NA overestimates the interference by a factor of up to 100, the DBA always provides tight results. Note that the factor of overestimation is given in logarithmic scale [SGR+17a].

3.1.4 Related Work

A rich body of work on the timing analysis of switched Ethernet exists in the literature. Formal analysis approaches like Real-Time Calculus (RTC) [TCN00] or Compositional Performance Analysis (CPA) [HHJ+05; DRE12; HAE17] have been proven to provide safe worst-case latency bounds for Ethernet. An analysis approach for IEEE 802.1Q can be found, e.g., in [RGS+13]. To our knowledge, there are currently multiple works presenting approaches for the formal analysis of Ethernet TSN networks: The authors of [TE16] propose an analysis approach that considers the effects of the preemption mechanisms introduced by the TSN standard [Ins16], but does not consider a simultaneous usage of frame preemption and the TAS mechanism. In [TCC+15], the authors assume that the schedule on the output ports of TSN switches consists of just one periodic slot, while the authors of [TED15] assume a schedule that consists of one periodic slot per critical traffic class. During the design of realistic networks with many messages with different application-specific periods, it is very hard to generate overlap-free schedules that follow these restrictions. In contrast to [TCC+15] and [TED15], the applicability of the analysis approach presented in this thesis does not depend on any restrictions regarding the shape of the schedules on the output ports, so it can be used with any schedule.

Table 3.3: Average jitter per traffic class in seconds
Traffic class | 1 | 2 | 3 | 4
IEEE 802.1Q | 3.0598 | 0.0049 | 0.0061 | 1.291·10^−4
TSN min | 3.0601 | 0.0048 | 0.0039 | 0
TSN average | 3.0601 | 0.0048 | 0.0044 | 0
TSN max | 3.0601 | 0.0048 | 0.0053 | 0

52 3.2 Reliability Analysis of Ethernet Networks under Transient Transmission Errors

3.2 Reliability Analysis of Ethernet Networks under Transient Transmission Errors³

Modern automotive applications like Advanced Driver Assistance Systems (ADAS) or X-By-Wire not only have high bandwidth requirements but are also highly safety-relevant. The reliability of message transmission and the real-time capability of the communication network are, consequently, becoming two of the main design objectives. The correct functionality of safety-critical applications can be disrupted not only by deadline misses or permanent hardware failures, but also by transient faults caused, e.g., by electrical noise [vDS18]. To cope with messages that are lost or corrupted by transient faults, messages of safety-critical applications are usually transmitted periodically and not in a request-response fashion. To operate correctly, a receiver of a safety-critical message has to receive at least one message during an application-specific time interval, the so-called diagnostic test interval tD [Int11]. To attain the required reliability level for an automotive system, most works currently found in the literature focus on structural redundancy at both the task level (multiple mapping of tasks) [GLR+08] and the routing level (redundant routing paths for critical messages) [LGT09]. The possibility of temporal redundancy, where multiple messages are sent during one diagnostic test interval to attain a higher resilience against transient errors, has so far been widely neglected in the area of design automation. Yet, increasing the sending rate of messages can serve as an additional cost-efficient way to improve the reliability of communication systems.

³ Major parts of this section have been published in [SGR+16].


However, the determination of an optimal sending rate of a safety-critical message is a non-trivial optimization task. Whereas sending a message more often results in a better transmission reliability of this individual message, it also increases the network load and the interference that this message imposes on messages using the same communication medium, thereby increasing their transmission time and possibly affecting the reliability of the corresponding functionalities. Consequently, the overall reliability of a set of transmitted messages is tightly coupled with their timing behavior, so that, during the design of a communication network, timing characteristics and reliability must be analyzed jointly. This section proposes a formal method for the analysis of the transmission reliability of periodically sent messages in a switched Ethernet network under the influence of transient errors. The proposed approach considers special features of automotive Ethernet networks like static network design, multi-hop communication, and significant message jitter. It assumes a given bit error rate E (obtained, e.g., by Electromagnetic Field (EMF) measurements) and an application-specific diagnostic test interval to calculate a safe lower bound for the reliability of message transmission while considering the timing behavior of the entire communication system. After deriving formulas for the reliability of a message experiencing no interference, two approaches to compute the Mean Time To Failure (MTTF) of a message under the influence of network jitter are presented: The first and simpler approach provides a reliability bound based on the assumption that the worst possible jitter distribution that can occur within one diagnostic test interval occurs in every diagnostic test interval observed over a long time. While this approach provides a safe bound, it also introduces a significant underestimation of the transmission reliability.
The second approach is able to calculate a tighter yet still safe reliability bound by observing a sequence of special time intervals, the so-called dependability periods TD, over a long time. A dependability period is a time interval with a realistic jitter distribution, where a diagnostic test interval tD with the maximal number of delayed messages (the negative burst phase tN) is followed by a recovery phase tR containing the messages that were delayed by the interference in the Ethernet network and therefore did not arrive within tN. Contrary to the approach described in [QBH+14], the presented approach is not optimistic and therefore provides a safe reliability bound. The difficulty of the MTTF calculation of this second approach is that every possible position of a diagnostic test interval inside the dependability period has to be considered. We overcome this difficulty with the concept of dominance intervals, special time intervals where the probability of the occurrence of a detected error depends only on a certain subset of messages inside the dependability period. With this, the proposed analysis method can (a) be coupled with state-of-the-art timing analysis frameworks and (b) be used for the automatic optimization of communication networks. Section 3.2.1 introduces the notion and the mechanisms of reliability in automotive Ethernet networks as well as the error model considered in the work at hand. Section 3.2.2 presents formulas that calculate the reliability of a message which does not experience any interference. Section 3.2.3 then proposes two approaches that extend these calculations to account for the interference that messages in Ethernet networks impose on each other. The benefits of the presented analysis as well as the trade-off between the reliability of an individual message and the timing behavior of the entire system are showcased in the experiments in Section 3.2.4, where the possible sending rates of high-priority messages in a switched Ethernet automotive network are explored to optimize their transmission reliability. Finally, Section 3.2.5 discusses related work.

[Figure 3.6: route from sender G0 via switches S0, S1, and S2 to receiver G1; each hop is subject to the bit error rate E (potential corruption), and a CRC check is performed at each node]

Figure 3.6: A corruption of the message can occur during each communication hop on the route from a sender to a receiver. Each node that is passed during the transmission checks the CRC of the message and drops the message if the CRC is not correct.

3.2.1 Reliability Model

System Model

In this work, a sender is assumed to send a message to a receiver through an Ethernet network. The message is routed over c − 1 switches and, consequently, arrives at the receiver after c communication hops over c + 1 communication nodes (including sender and receiver). During the send process, the sender wraps the payload with the needed headers and footers and adds a Cyclic Redundancy Check (CRC) bit sequence. After this, the message consists of b bits. During each communication hop between two nodes, each bit of the transmitted frame can be corrupted with a certain probability E. The recipient node of every communication hop checks whether the CRC is valid. If the node detects that a frame has been corrupted, the frame is dropped and does not arrive at the receiver. Figure 3.6 demonstrates this model. Here, the system consists of the sender G0, the receiver G1, and 3 Ethernet switches S0–S2 between them (and, consequently, c = 4 communication hops with 5 communication nodes). G0 periodically sends message C, which consists of b bits including headers, footers, and CRC, to G1. The message can be corrupted during each communication hop; at each node, its CRC is checked, and if the node detects that the message has been corrupted, the message is dropped and does not arrive at the receiver.

Error Model

In the following, we focus on the analysis of transient errors. In contrast to permanent errors, transient errors cause a short (single) upset instead of a permanent malfunction of system resources. Environmental radiation or electromagnetic interference can lead to such transient errors. In our model, the probability of transient errors is characterized by the Bit Error Rate (BER) E. During each communication hop, each bit of the transmitted message can be flipped independently with this same probability. The CRC of the transmitted messages is used as the error detection and mitigation mechanism for these errors. In our system model, we assume the use of a standard 32-bit CRC [Koo02]. The receiver in the above system model is a data processing node which periodically receives information provided by the sender node. It expects to receive at least one message during the diagnostic test interval tD, which is a multiple of the sending period p. The event where no messages arrive during the diagnostic test interval is evaluated as an error, and the receiver reacts accordingly⁴. Figure 3.7 illustrates the concepts of the diagnostic test interval tD in combination with temporal redundancy. In the case depicted in Fig. 3.7a, a message is sent with a sending period p equal to the length of the diagnostic test interval. The arrows symbolize message arrivals at the receiver. Error-free communication is assumed if the receiver detects at least one arriving message in each diagnostic test interval. In cases where a message frame is either dropped due to transient errors (symbolized by a red lightning bolt) or delayed by jitter resulting from the interference with other messages (symbolized by the dashed arrows), the receiver observes a diagnostic test interval without any arriving messages. This is considered an error. Figure 3.7b shows the same system in case a message is sent twice during each diagnostic test interval.
While the case where no messages arrive within a diagnostic test interval is still possible, the probability for this event is considerably lower because its precondition is that two subsequent message frames are either lost or delayed.

3.2.2 Reliability Calculation

In the following, a closed-form calculation to derive the long-term reliability of a message without jitter is presented. We first present an analysis based on the assumption that every message corruption is detected. The analysis is then expanded to account for the possibility that an occurring message corruption is not detected by the CRC check.

⁴ The exact reaction to a transmission error depends on the purpose of the respective message and can range from incrementing an error counter to the degradation/deactivation of the respective system functionality.

[Figure 3.7: (a) Default case, (b) Temporal redundancy; message arrivals over time t with sending period p, jitter j, transmission errors E, and diagnostic test interval tD]

Figure 3.7: Illustration of the diagnostic test interval tD and the concept of temporal redundancy. The blue arrows indicate the arrival of message frames at the receiver. Errors can occur if the message frames are either dropped due to corruption (symbolically illustrated by a red lightning bolt) or delayed by the jitter j (illustrated by the dashed arrows) [SGR+16].

Perfect Error Detection

First, we focus on the case where every message corruption during a message transmission is detected. For now, we thus assume that the probability PC to detect a corrupted message equals 1.

Probability Calculation. In case of perfect corruption detection (PC = 1) for any number of corrupted bits, every message corruption during a communication hop is detected by the check on the recipient communication node of the hop, resulting in the drop of the damaged frame. Consequently, the case where a corruption of a message is not detected by any of the communication nodes on the message's transmission route, a so-called residual error, is impossible, and the probability of a residual error PR is 0. In the following, let PE(i) denote the probability that, during one communication hop, a corruption leading to exactly i bit flips occurs. As shown in [RMS07], this probability is given by:

$$P_E(i) = \binom{b}{i} \cdot E^{i} \cdot (1-E)^{b-i} \qquad (3.23)$$

where $\binom{b}{i}$ is the number of all possible placements of i corruptions on b bits, E denotes the bit error rate, and $E^{i} \cdot (1-E)^{b-i}$ is the probability that exactly i bits are flipped while the remaining $b - i$ bits are transmitted correctly.

The probability PS that no bit flip occurs during a message transmission over c communication hops is then given by:

$$P_S = P_E(0)^{c} = (1-E)^{b \cdot c} \qquad (3.24)$$


The receiver is able to detect an error if none of the messages sent during one diagnostic test interval arrives. The probability that one of these messages is dropped along the way equals $1 - P_S$. The probability PD that none of the $n = t_D/p$ messages sent during a diagnostic test interval arrives, thereby causing a detected error, is given by:

$$P_D = (1 - P_S)^{n} \qquad (3.25)$$

Long-term Reliability. To evaluate the long-term characteristics of the communication system, we calculate the Mean Time To Residual Error (MTTRE) and the Mean Time To Detected Error (MTTDE). As the residual error probability PR is zero, the MTTRE is infinitely long. The MTTDE is given by:

$$\mathrm{MTTDE} = \int_{0}^{\infty} (1 - P_D)^{t/t_D}\, dt = \frac{-t_D}{\ln(1 - P_D)} \qquad (3.26)$$

where $(1 - P_D)^{t/t_D}$ is the probability that no detected errors occur within $t/t_D$ diagnostic test intervals.
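Eqs. (3.24) to (3.26) can be evaluated with a short Python sketch. The function names are hypothetical, and the use of `math.log1p` and `math.expm1` is a design choice of this sketch rather than something prescribed by the text: it keeps 1 − PS accurate even for very small bit error rates such as 10^−10. With the parameters of the example discussed later in this section (b = 12,208 bits, c = 2 hops, n = 1.5 messages per interval, tD = 60 ms), the results should agree with Tab. 3.4 up to rounding.

```python
import math


def p_no_flip(E, b, c):
    """P_S, Eq. (3.24): no bit flip in a b-bit frame over c hops."""
    return math.exp(b * c * math.log1p(-E))


def p_detected_error(E, b, c, n):
    """P_D, Eq. (3.25): none of the n frames of a diagnostic test interval arrives."""
    p_lost = -math.expm1(b * c * math.log1p(-E))   # 1 - P_S without cancellation
    return p_lost ** n


def mttde(p_d, t_d):
    """MTTDE, Eq. (3.26): -t_D / ln(1 - P_D)."""
    return -t_d / math.log1p(-p_d)


b, c, n, t_d = 12208, 2, 1.5, 0.060          # 12,208 bits, 2 hops, 60 ms
for E in (1e-4, 1e-10):
    p_d = p_detected_error(E, b, c, n)
    print(f"E={E:g}: P_D={p_d:.4g}, MTTDE={mttde(p_d, t_d):.6g} s")
```

For E = 10^−4 this gives PD ≈ 0.872 and an MTTDE of about 29.15 ms; for E = 10^−10 it gives PD ≈ 3.8·10^−9 and an MTTDE of about 1.57·10^7 s, i.e., roughly 182 days.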

Table 3.4: Results with perfect corruption detection
E | PR | MTTRE | PD | MTTDE
10^−4 | 0 | ∞ | 0.8723 | 29.146 ms
10^−10 | 0 | ∞ | 3.81·10^−9 | 182.02 d

Imperfect Error Detection

The analysis is now expanded to consider the imperfect error detection capabilities of the 32-bit CRC, which is used as the error detection mechanism in Ethernet networks.

Probability Calculation. When assuming imperfect error detection, the case where a message corruption is not detected and a corrupted message arrives at the receiver is no longer impossible. According to [Koo02], the standard 32-bit CRC used in Ethernet networks detects every incorrect message with 3 or fewer bit flips and 99.999999975% of incorrect messages with exactly 4 bit flips (meaning that PC(4), the probability to detect a corruption with exactly 4 bit flips, equals 0.99999999975). To provide a safe reliability bound, we pessimistically assume that a corruption of 5 or more bit flips is never detected ($P_C(5^{+}) = 0$). The probability $P_R^{1}$ that a corruption during a single communication hop leads to a residual error (hereafter referred to as a residual corruption) is therefore given by:

$$P_R^{1} = P_E(4) \cdot \left(1 - P_C(4)\right) + \left(1 - \sum_{0 \le i \le 4} P_E(i)\right) \qquad (3.27)$$

The first summand describes the probability that a corruption with exactly 4 bit flips occurs and that this corruption is not detected, while the second summand contains the probability of a corruption with 5 or more bit flips (as the complement of the probability of fewer than 5 bit flips), which is never detected. The probabilities of the occurrence of a certain number of bit flips can be calculated using Eq. (3.23). With the probability $P_R^{1}$, we can calculate the probability $P_R^{c}$ that a message routed over c hops leads to a residual error. When a residual corruption occurs during a communication hop, the message always stays undetectably corrupted during the remaining communication hops, because it is either transmitted correctly or gets corrupted even further, increasing the number of flipped bits⁵. The probability $P_R^{c}$ can therefore be calculated by summing up the probabilities of the occurrence of a residual corruption during each of the different hops of a message transmission:

$$P_R^{c} = \sum_{0 \le h \le c-1} P_E(0)^{h} \cdot P_R^{1} \qquad (3.28)$$

The possibility of a residual error also has to be considered during the calculation of PD because a residual error, just like a successful message transmission, does not lead to the detection of an error by the receiver:

$$P_D = \left(1 - \left(P_E(0)^{c} + P_R^{c}\right)\right)^{n} \qquad (3.29)$$

Long-term Reliability. As the residual error probability is greater than zero, the mean time to residual error MTTRE is calculated with:

$$\mathrm{MTTRE} = \frac{-t_D}{\ln\left((1 - P_R^{c})^{n}\right)} \qquad (3.30)$$

The MTTDE value is calculated just as in Eq. (3.26), but using the value for PD as calculated in Eq. (3.29).
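The imperfect-detection formulas, Eqs. (3.23) and (3.27) to (3.30), can be sketched in the same way. The function names are hypothetical, and PC(4) = 0.99999999975 is the value from [Koo02] quoted above. One deliberate deviation in form, not in meaning: the second summand of Eq. (3.27), 1 − Σ PE(i) for i ≤ 4, is evaluated here as the direct sum of PE(i) over i ≥ 5, with each term computed in log space via `math.lgamma`. Subtracting a sum of terms close to 1 from 1 loses precision in floating-point arithmetic when E is very small, whereas the direct tail sum does not.

```python
import math

P_C4 = 0.99999999975   # detection probability for exactly 4 bit flips [Koo02]


def p_flips(i, E, b):
    """P_E(i), Eq. (3.23): exactly i bit flips in a b-bit frame on one hop.
    Computed in log space so that large binomial coefficients do not overflow."""
    log_p = (math.lgamma(b + 1) - math.lgamma(i + 1) - math.lgamma(b - i + 1)
             + i * math.log(E) + (b - i) * math.log1p(-E))
    return math.exp(log_p)


def p_residual_hop(E, b, p_c4=P_C4):
    """P_R^1, Eq. (3.27): undetected 4-flip corruption plus any corruption
    with 5 or more flips (never detected by assumption)."""
    tail = sum(p_flips(i, E, b) for i in range(5, b + 1))  # = 1 - sum_{i<=4} P_E(i)
    return p_flips(4, E, b) * (1.0 - p_c4) + tail


def p_residual_route(E, b, c, p_c4=P_C4):
    """P_R^c, Eq. (3.28): the residual corruption happens on hop h + 1
    after h hops without any bit flip."""
    p0 = p_flips(0, E, b)
    return sum(p0 ** h for h in range(c)) * p_residual_hop(E, b, p_c4)


def p_detected_imperfect(E, b, c, n):
    """P_D, Eq. (3.29): none of the n frames arrives either intact or
    undetectably corrupted."""
    p_arrives = p_flips(0, E, b) ** c + p_residual_route(E, b, c)
    return (1.0 - p_arrives) ** n


def mttre(E, b, c, n, t_d):
    """MTTRE, Eq. (3.30): -t_D / ln((1 - P_R^c)^n)."""
    return -t_d / (n * math.log1p(-p_residual_route(E, b, c)))
```

The MTTDE for imperfect detection then follows by passing `p_detected_imperfect(...)` to the MTTDE formula of Eq. (3.26).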

⁵ We are ignoring the very unlikely possibility that a corruption during the later hops corrects a previously flipped bit.

Table 3.5: Results with imperfect corruption detection
E | PR | MTTRE | PD | MTTDE
10^−4 | 0.1620 | 226.22 ms | 0.6507 | 57.04 ms
10^−10 | 2.023·10^−13 | 6270.39 y | 3.81·10^−9 | 182.02 d


Examples. Consider the following situation: During the diagnostic test interval with a length of 60 ms, a sender is sending 1.5 messages on average. Each message has a length of 1,526 Bytes (12,208 bits) and is routed over two communication hops. We calculate with E1 = 10^−4, which, according to [FOF+04], can be used as an estimation of the BER for the Controller Area Network (CAN) protocol in an aggressive environment, and with E2 = 10^−10, the value commonly assumed for automotive Ethernet. Table 3.4 contains the results of the calculation that assumes perfect corruption detection, while the results for the calculation considering imperfect corruption detection are shown in Tab. 3.5. The results show that the CRC as used in standard Ethernet protocols may not suffice to provide an MTTRE value acceptable in the automotive domain, where permissible failure rates for components range between 10 and 100 failures in 10^9 hours [Sch18]. Consequently, additional error detection mechanisms, for example voters in the higher layers, may be necessary to prevent system failures. On the other hand, the example shows that the probability of residual errors is smaller than the probability of a detected error by several orders of magnitude. Consequently, the probability of residual errors can typically be neglected in reliability calculations, as the probability of a detected error constitutes the main reliability bottleneck.
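To relate the MTTRE for E2 from Tab. 3.5 to the failure-rate unit used above, a back-of-the-envelope conversion can be done under two assumptions that are not part of the analysis itself: a constant failure rate (so that the rate is the reciprocal of the mean time) and 1 y = 8,760 h.

$$\lambda_{RE} = \frac{1}{\mathrm{MTTRE}} = \frac{1}{6270.39 \cdot 8760\,\mathrm{h}} \approx \frac{1}{5.49 \cdot 10^{7}\,\mathrm{h}} \approx 18.2 \text{ failures per } 10^{9}\,\mathrm{h}$$

Under these assumptions, the residual errors of this single message alone already fall within the range of 10 to 100 failures per 10^9 hours that is permissible for an entire component.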

3.2.3 Reliability/Timing Correlation

While the calculations in the last section provide a failure probability PD based on the number of communication hops c, the bit error rate E, and the error detection probability, we now take the timing jitter resulting from the network topology and the interference with concurrent messages into account as an additional potential source of safety violations. Until now, we assumed that each message is strictly periodic and that, consequently, the number of messages arriving during a diagnostic test interval is, apart from messages being dropped due to corruption, constant (the situation illustrated in Fig. 3.8a). However, this assumption does not reflect the situation in a switched Ethernet network, where several messages compete for the same switch output port queues and interfere with each other. Messages in these networks can be delayed by jitter resulting from this interference. Hence, the number of messages arriving during a diagnostic test interval tD can vary significantly (Fig. 3.8b), even if no message is dropped due to a detected transmission error. To provide safe reliability guarantees, a formal analysis must, therefore, be based on the worst case resulting from message jitter in the network. In the following, we present a naive approach that delivers a safe but pessimistic reliability bound and a more sophisticated approach that delivers a tighter yet still safe bound. Both approaches assume that the maximal jitter of a message has already been obtained by a timing analysis of the communication network.


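[Figure 3.8: message arrivals over time t with period p in diagnostic test intervals of length tD; (a) No jitter (j = 0, n messages per interval), (b) Maximal jitter (jmax, n− messages per interval)]

Figure 3.8: The impact of jitter on the number of arriving messages within a diagnostic test interval of length tD [SGR+16].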

Note: Only the calculation of PD and MTTDE has to be altered. The values of $P_R^{c}$ and MTTRE depend solely on the total number of messages in the system. Since the total number of messages is not changed by jitter, the calculation of $P_R^{c}$ and MTTRE remains the same as in Section 3.2.2.

Naive Approach (NA)

To calculate a conservative and, therefore, safe reliability bound, we assume that the number of messages arriving within each diagnostic test interval is equal to n−, the minimal number of messages that can arrive within a time interval of length tD given a jitter jmax. Both jmax and n− can be calculated with state-of-the-art timing analysis techniques, e.g., using the busy-period approach briefly introduced in Section 3.1.1. n− is then given by:

$$n^{-} = \eta^{-}(t_D, j_{max}) = \max\left(0, \left\lfloor \frac{t_D - j_{max}}{p} \right\rfloor\right) \qquad (3.31)$$

A very conservative but safe bound for the reliability of the message transmission considering the timing effects in an Ethernet network can therefore be obtained by assuming that only the minimal number n− of messages arrives in every diagnostic test interval within the observation time. PD, the probability that none of the n− messages arrives during such an interval, can be obtained by combining Eq. (3.25) with Eq. (3.31):

$$P_D = (1 - P_S)^{n^{-}} \qquad (3.32)$$

This new value for PD can then be used to calculate the MTTDE according to Eq. (3.26).
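The NA only needs n− from Eq. (3.31). The following minimal sketch assumes that all times are given in one consistent unit and that PS has already been computed, e.g., with Eq. (3.24); the function names and the message parameters in the usage loop are hypothetical. If n− = 0, no arrival is guaranteed, PD becomes 1, and the sketch returns an MTTDE of 0 instead of evaluating ln(0).

```python
import math


def eta_minus(t_d, j_max, p):
    """n^-, Eq. (3.31): minimal number of arrivals in a window of length t_d
    for a message with period p and maximal jitter j_max."""
    return max(0, math.floor((t_d - j_max) / p))


def naive_mttde(p_s, t_d, j_max, p):
    """MTTDE of the NA: Eq. (3.32) inserted into Eq. (3.26)."""
    n_min = eta_minus(t_d, j_max, p)
    p_d = (1.0 - p_s) ** n_min
    if p_d >= 1.0:      # n^- = 0: an interval may stay empty even without errors
        return 0.0
    return -t_d / math.log1p(-p_d)


# Hypothetical message: p = 20 ms, t_D = 60 ms, P_S = 0.5 (times in ms)
for j_max in (0, 10, 25, 45):
    print(j_max, eta_minus(60, j_max, 20), naive_mttde(0.5, 60, j_max, 20))
```

The loop shows the trade-off discussed in this section: every period's worth of jitter removes one guaranteed arrival from each diagnostic test interval and shortens the MTTDE bound accordingly.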

Refined Analysis (RA)

The NA is based on the assumption that the observation time contains only diagnostic test intervals with a minimal number n− of arriving messages. This, however, is an over-approximation of the worst case because jitter can only delay the arrival of messages but cannot reduce the absolute number of messages in the system. Consequently, each test interval with a minimal number of messages, the negative burst phase tN, has to be followed by the recovery phase tR that absorbs the messages that would have arrived during tN if they had not been delayed by jitter. These two phases together form the dependability period TD, which can be extrapolated to get a tighter but still safe worst case for the long-term behavior of the system. An example of a dependability period is shown in Fig. 3.9. The calculation of this tighter reliability bound is performed in three steps. First, the worst-case jitter is used to calculate TD. In the second step, the probability that no detected error occurs inside the dependability period is calculated. Finally, this probability is used to derive a safe MTTDE bound.

[Figure 3.9: time line t with the negative burst phase tN, the recovery phase tR, the next negative burst phase tN, and the dependability period TD]

Figure 3.9: Dependability period with i = 0 [SGR+16].

Calculation of the Dependability Period. The dependability period TD is given by the negative burst phase tN, the recovery phase tR, and the messages contained within these phases.

Negative Burst Phase. As it represents the diagnostic test interval with the minimal number of messages, tN has the length tD and contains n− (Eq. (3.31)) messages.

Recovery Phase. tR is the minimal time interval necessary to restore the preconditions for the start of the next tN. As the implicit condition for the start of tN is that a message without jitter arrives just before the start point, tR ends with the arrival of the first message without jitter. Depending on the inter-arrival distance i of the analyzed message, the length of tR can vary between one (Fig. 3.9) and multiple (Fig. 3.10) message periods p. tR contains both the messages pushed out of tN by jitter and the messages arriving periodically during tR. The number of messages contained inside the recovery phase, ntR, is given by the sum of np and nδ, where np is the number of messages arriving periodically within tR and nδ is the number of messages that would originally have arrived inside tN but were delayed by jitter. nδ is given by the difference between the number of periodic messages within tD and n−:

$$n_{\delta} = \left\lfloor \frac{t_D}{p} \right\rfloor - n^{-} \qquad (3.33)$$


tR can then be calculated by:

$$t_R = \min_{x \in X} \{x \cdot p\}, \text{ with} \qquad (3.34)$$

$$X = \{x \in \mathbb{N}^{+} \mid \delta^{-}(x + n_{\delta}) \le x \cdot p\} \qquad (3.35)$$

Equations (3.34) and (3.35) formulate the condition that tR consists of the minimal number of message periods necessary to absorb both the delayed messages from tN and the messages arriving periodically. The minimal time to absorb n messages is equivalent to the minimum distance between n events defined in [Ric05] and can be calculated by

$$\delta^{-}(n) = \max(i \cdot n,\; p \cdot n - j) \qquad (3.36)$$

where p, j, and i are the period, the jitter, and the inter-arrival distance of the message under consideration. The first message inside tR arrives tl after the end of tN, with

$$t_l = \max\left(0,\; i - \left(t_D - (n^{-} \cdot p + j)\right)\right) \qquad (3.37)$$

Equation (3.35) is then extended to:

$$X = \{x \in \mathbb{N}^{+} \mid \delta^{-}(x + n_{\delta}) + t_l \le x \cdot p\} \qquad (3.38)$$

Calculation of PD. The major difference to the calculation using the NA in Eq. (3.32) is that PD is now calculated for a time interval longer than the diagnostic test interval and that messages in this interval are distributed unequally. For a worst-case analysis of a network where the nodes are not synchronized, all possible positions of tD inside tD have to be considered. This is done by introducing the concept of dominance intervals which are associated with certain message groups. An example for the concept of dominance intervals is shown in Fig. 3.10. The dominance intervals are shown as yellow rectangles. Every dominance interval has an event area, depicted by green arrows inside the rectangles in Fig. 3.10. All messages arriving within the event area are the so-called dominant messages of the corresponding dominance interval. Whether a detected error can be observed within a dominance interval depends only on the successful transmission of its dominant messages. The dominance interval A in Fig. 3.10, e.g., has only the very first message, C1, inside its event area. This means that for the timespan covered by the dominance interval A, only the success of the transmission of C1 decides whether a detected error occurs. C1 has this importance because there is at least one diagnostic test interval that contains only this message (right at the start of the time line in Fig. 3.10). So if C1 does not arrive at the receiver, there is automatically at least one test interval without messages and a detected error occurs. On the other hand, a successful transmission of C1 means that no detected error will occur in any test intervals containing C1. Consequently, while other messages also arrive within the first dominance interval, only C1 is relevant

63 3 Formal Timing and Reliability Analysis of Ethernet TSN Networks

[Figure 3.10: messages C1–C9 (inter-arrival distance i), dominance intervals A–E, and the phases t_N = t_D, t_R, t_D along the time axis.]

Figure 3.10: Illustration of the dominance interval concept for a case with i > 0. With a longer recovery phase, the dominance intervals A–E have to be considered during the reliability calculation. Here, the reliability inside interval A is dictated by the dominant message C1 inside its event area. The reliability of each subsequent interval is then dictated by four dominant messages (messages C2–C5 for interval B, messages C3–C6 for interval C, and so on) [SGR+16].

for the calculation of $P_D$ of the area from the start of the first diagnostic test interval (green arrow) to the end of the last diagnostic test interval (gray arrow) containing C1. The dominance interval B is then dominated by the messages in its event area (C2 to C5), and so on. To calculate $P_D$ of a given dependability period, the sizes of its dominance intervals and the dominant messages contained within them have to be determined. After that, the probabilities that no errors are detected within these intervals, the so-called faultlessness probabilities $P_f$, are combined into the probability that no errors are detected during the entire dependability period.

Determination of the Dominance Intervals. Each dominance interval is characterized by its start point $\dot{t}^{sd}$, the end point of its event area $\dot{t}^{ed}$, and the earliest arrival time of a dominant message $\dot{t}^{dd}$. The event area starts at the start of the dominance interval and has a length of $t_D$, so that $\dot{t}^{ed}$ is calculated by adding $t_D$ to $\dot{t}^{sd}$. There is also a distance of $t_D$ between $\dot{t}^{dd}$ and the end of the dominance interval (gray arrows in Fig. 3.10). This so-called influence area of the current dominance interval is at the same time the event area of the next dominance interval. As the dependability period represents the worst-case jitter distribution, it always starts with an initial dominance interval with a minimal number of dominant messages (interval A in Fig. 3.10). The event area of this initial interval is identical to $t_N$, while its influence area starts at the arrival point of the first message and has a length of $t_D$. The influence area of the first dominance interval is also the event area of the second interval. The second influence area starts at the arrival point of the first message inside the second event area and (like all influence and event areas) has a length of

64 3.2 Reliability Analysis of Ethernet Networks under Transient Transmission Errors

$t_D$. The fact that the influence area of a dominance interval is at the same time the event area of the following interval can be used to iteratively determine all dominance intervals that lie within the dependability period. As soon as the influence area of the currently processed interval exceeds the dependability period, all dominance intervals have been found and the calculation can be stopped.
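The iterative procedure above can be sketched in Python as follows. The sketch assumes that the message arrival times of the worst-case jitter distribution are already known (time 0 marks the start of $t_N$), that the initial event area $[0, t_D]$ is closed, and that every later event area, which coincides with the previous influence area, excludes its left boundary so that the previous first dominant message is not counted twice. These boundary conventions and all names are assumptions made for illustration; the arrival times in the example are hypothetical values chosen to reproduce the pattern of Fig. 3.10.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DominanceInterval:
    t_sd: float            # start of the dominance interval (= start of its event area)
    t_ed: float            # end of the event area, t_sd + t_D
    t_dd: float            # earliest arrival of a dominant message
    t_end: float           # end of the influence area, t_dd + t_D
    dominant: List[float]  # arrival times of the dominant messages


def dominance_intervals(arrivals, t_d, t_dep):
    """Iteratively split the dependability period [0, t_dep] into dominance intervals."""
    arrivals = sorted(arrivals)
    intervals = []
    t_sd, first = 0.0, True
    while True:
        t_ed = t_sd + t_d
        dominant = [a for a in arrivals
                    if (a >= t_sd if first else a > t_sd) and a <= t_ed]
        if not dominant:
            break  # empty event area: a detected error cannot be avoided
        t_dd = dominant[0]
        intervals.append(DominanceInterval(t_sd, t_ed, t_dd, t_dd + t_d, dominant))
        if t_dd + t_d >= t_dep:
            break  # influence area reaches the end of the dependability period
        # The influence area of this interval is the event area of the next one.
        t_sd, first = t_dd, False
    return intervals


arrivals = [9.5, 11, 13.5, 16, 18.5, 21, 23.5, 26, 28.5]  # C1 ... C9
for d in dominance_intervals(arrivals, t_d=10, t_dep=28.5):
    print(d.t_sd, len(d.dominant))
```

With these values, the first interval has a single dominant message and each of the following four intervals has four, mirroring intervals A–E in Fig. 3.10.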

Messages in a dominance interval. Both the arrival time points and the number of the dominant messages inside a dominance interval are needed for the probability calculation. The number of messages inside a time interval within the dependability period can be determined using Eqs. (3.39) and (3.40); the arrival time points are calculated in a similar way. The message number $n$ of a time interval starting at $\dot{t}^{s}$ and ending at $\dot{t}^{e}$ consists of $n(t_N)$, the number of messages arriving inside $t_N$, and $n(t_R)$, the number of messages arriving within $t_R$:

$$n(t_N) = \begin{cases} \eta^{-}\left(\min(t_D, \dot{t}^{e})\right) - \eta^{-}\left(\dot{t}^{s}\right) & \text{for } \dot{t}^{s} \leq t_D \\ 0 & \text{for } \dot{t}^{s} > t_D \end{cases} \tag{3.39}$$

$$n(t_R) = \begin{cases} \eta^{+}\left(t_D - \max(t_D, \dot{t}^{s})\right) - \eta^{+}\left(t_D - \dot{t}^{e}\right) & \text{for } \dot{t}^{e} > t_D \\ 0 & \text{for } \dot{t}^{e} \leq t_D \end{cases} \tag{3.40}$$

Calculation of $P_D$ for the dependability period. The probability $1 - P_D$ that no detected errors occur inside the dependability period can be found by iteratively accumulating the faultlessness probabilities of the subsequent dominance intervals in the dependability period. The $P_D$ inside a single dominance interval containing $n$ messages can be found with Eq. (3.25). The accumulated faultlessness probability $P_f^{FS}$ of two subsequent dominance intervals containing the message sets $F$ and $S$ can be found with:

$$P_f^{FS} = P_f^{F} \cdot P_f^{S} \tag{3.41}$$

where $P_f^{F}$ is the faultlessness probability of the first dominance interval

$$P_f^{F} = 1 - P_D\left(|F|\right) \tag{3.42}$$

and $P_f^{S}$ is the faultlessness probability of the second dominance interval under the condition that there were no detected errors in the first dominance interval:

$$P_f^{S} = 1 - P_I \cdot \left(1 - P_f^{S'}\right) \tag{3.43}$$

$P_I$ is the probability that there are no detected errors within the first dominance interval, but none of the messages lying within the intersection of the first and the second interval are transmitted correctly

$$P_I = \frac{2^{|F \setminus S|} - 1}{2^{|F|} - 1} \tag{3.44}$$

and $P_f^{S'}$ is the probability that at least one of the messages that are contained only in the second dominance interval is transmitted correctly:

$$P_f^{S'} = 1 - P_D\left(|S \setminus F|\right) \tag{3.45}$$

The MTTDE can then be obtained by inserting the calculated PD value and tD into Eq. (3.26).
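Equations (3.41)–(3.45) can be combined into a small helper. The following Python sketch is a minimal illustration, not the thesis implementation: since Eq. (3.25) is not reproduced here, $P_D(n)$ is passed in as a function, and the example uses the hypothetical model $P_D(n) = e^n$ (all $n$ messages lost independently with a per-message loss probability $e$), chosen purely for illustration. Message sets are Python sets of message identifiers, and the first set $F$ is assumed to be non-empty.

```python
from typing import Callable, Set


def p_intersection(F: Set[str], S: Set[str]) -> float:
    r"""P_I from Eq. (3.44): (2^|F \ S| - 1) / (2^|F| - 1); F must be non-empty."""
    return (2 ** len(F - S) - 1) / (2 ** len(F) - 1)


def pair_faultlessness(F: Set[str], S: Set[str],
                       p_d: Callable[[int], float]) -> float:
    """Accumulated faultlessness P_f^{FS} of two subsequent dominance intervals."""
    p_f_first = 1 - p_d(len(F))                    # Eq. (3.42)
    p_f_second_only = 1 - p_d(len(S - F))          # Eq. (3.45)
    p_i = p_intersection(F, S)                     # Eq. (3.44)
    p_f_second = 1 - p_i * (1 - p_f_second_only)   # Eq. (3.43)
    return p_f_first * p_f_second                  # Eq. (3.41)


def p_d_example(n: int, e: float = 1e-3) -> float:
    """Hypothetical stand-in for Eq. (3.25): all n messages are lost independently."""
    return e ** n


# Intervals A and B of Fig. 3.10: {C1} followed by {C2, ..., C5}.
print(pair_faultlessness({"C1"}, {"C2", "C3", "C4", "C5"}, p_d_example))
```

For disjoint message sets, as for intervals A and B, $P_I = 1$ and the result reduces to $(1 - P_D(|F|)) \cdot (1 - P_D(|S|))$.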

3.2.4 Experimental Results

To investigate the influence that the sending rate of a message has on its transmission reliability, the presented reliability analysis approaches are used to explore the sending rates of high-priority messages in a strict-priority, non-preemptive switched Ethernet (IEEE 802.1Q) [Ins14] automotive communication network. The network consists of 32 nodes exchanging 152 single- and multicast messages. The sending rates are explored for 33 high-priority messages (21.7 % of all messages). For the exploration of the sending rates, the presented analysis is coupled with a timing-analysis framework based on the timing analysis approaches presented in [DRE12]. The sending rates of the high-priority messages are explored in the range between n = 1.3 and n = 5.0 messages sent within a diagnostic test interval. Although the calculations for both the NA and the RA approach were performed during the exploration, the time for the reliability analysis was negligible compared to the time for the timing analysis: while the complete analysis of a network implementation took 12 seconds on average, the reliability analysis never took longer than 2 % of the analysis time. Figure 3.11 shows the number of messages reaching the MTTDE goal of $10^6$ seconds. For sending rates between n = 1.3 and n = 2.0, no message achieves this goal. For sending rates between n = 2.0 and n = 4.0 messages per diagnostic test interval, a higher sending rate leads to a higher number of messages reaching the reliability goal, with clearly visible sweet spots where a small increase of the sending rate leads to a considerably higher number of messages reaching the reliability goal (the points around the sending rates of n = 2.1 and n = 2.7).
However, a sending rate higher than n = 4.0 messages per diagnostic test interval decreases the reliability significantly: the interference effects outlined earlier now reduce the number of messages arriving within a diagnostic test interval, so that the number of messages reaching the reliability goal decreases again. Figure 3.12 compares the reliability bounds delivered by the two approaches presented in Section 3.2.3. The RA approach provides reliability bounds that are up to 50 % tighter (up to 25 % on average). The difference is especially significant near sending rates where an additional message is emitted per diagnostic test interval (n = 2.0, n = 3.0, and n = 4.0).


[Figure 3.11 plot: number of reliable messages (0 to 100) versus sending rate n (1 to 5).]

Figure 3.11: Number of messages reaching the MTTDE goal of $10^6$ seconds as a function of the sending rate. Around n = 2.1 and n = 2.7, the number increases. The decrease around n = 4 is caused by increased network interference [SGR+16].

3.2.5 Related Work

The Ethernet communication protocol originates from the area of computer networks and the telecommunication sector. Traditionally, applications from these sectors are rather fault-tolerant, and system designers are mainly concerned with performance features of the network such as network throughput. In areas with stricter reliability demands, like industrial automation, the reliability of Ethernet networks is commonly measured by simulation [DEA10]. However, as outlined in Section 3.1.1, while simulations give a good impression of the average case, they do not suffice to provide the hard guarantees needed during the design of safety-critical systems. For CAN, a communication protocol well established in the automotive sector, approaches for the formal analysis of the communication reliability already exist, see, e.g., [SE09]. In the avionics sector, where the reliability requirements are comparable to those in the automotive sector, formal reliability analysis of the Avionics Full-Duplex Switched Ethernet (AFDX) protocol is also performed during system design [WLH11]. Similarly to the refined approach presented in this work, the authors of [QBH+14] differentiate between the rarely occurring worst-case situation and the long-term system behavior to cope with the over-approximation of the formal timing analysis. However, while the authors there provide an optimistic timing result in combination with an error metric, our refined approach is still based on the worst-case behavior of the system and consequently provides a safe yet tight system reliability bound. The authors of [MTE16] describe how the Automatic Repeat ReQuest (ARQ) protocol [LCM84] can be used to increase the reliability against transient faults.


[Figure 3.12 plot: MTTDE underestimation of the NA relative to the RA approach [%] (curves: avg, max) versus sending rate n (1 to 5).]

Figure 3.12: Maximum and average underestimation of the MTTDE introduced by the NA in relation to the results provided by the RA, as a function of the sending rate n. For certain sending rates, the underestimation of the NA amounts to up to 50 % [SGR+16].

However, in contrast to the system model assumed in this thesis, the ARQ protocol is based on the assumption that the receiver acknowledges every frame it receives, so that the reaction to a dropped message can only occur after the sender notices the missing acknowledgment. The authors of [GPB+19] propose a fault-tolerant Ethernet-based communication subsystem where they, just like us, assume temporally redundant message transmission as a mechanism to increase the reliability against transient faults, without, however, providing an approach to analyze the reliability. To the best of our knowledge, [SGR+16] is the first work on the analysis of the reliability of message transmission in automotive Ethernet that considers the correlation between timing characteristics and transmission reliability.

3.3 Conclusion

This chapter presented approaches for the formal reliability and timing analysis of automotive Ethernet networks. Section 3.1 introduced a formal analysis approach that provides a safe and tight upper bound for the interference that scheduled messages in TSN networks impose on unscheduled traffic. Furthermore, it proposed preprocessing techniques that reduce the computation time of this approach by several orders of magnitude without introducing any additional overestimation of the scheduled interference. Section 3.2 then presented a lightweight formal analysis approach for the determination of the transmission reliability of messages in switched Ethernet networks under

68 3.3 Conclusion

the influence of transient errors. It proposed both a naive approach that underestimates the transmission reliability and a more sophisticated approach that delivers safe reliability bounds which are up to 50 % tighter. The proposed analysis method takes into account the interrelation between the transmission reliability of a single message and the timing behavior of the overall communication network, and it makes it possible to use the adjustment of the sending rate as a novel and cost-efficient means of improving the transmission reliability. The approaches presented in this chapter provide safe guarantees for the timing and the reliability of safety-critical automotive applications and, thus, make an important contribution towards the usage of Ethernet as the main communication technology in automotive networks. Furthermore, the proposed approaches are computationally lightweight and can be seamlessly integrated into state-of-the-art frameworks for timing analysis and design space exploration. In summary, the contributions presented in this chapter constitute an important step towards an automated design of automotive Ethernet networks for safety-critical applications.


4 Constraints Characterizing Valid Message Routings and Schedules for Ethernet Time-Sensitive Networking (TSN) Networks

The overall goal of the work at hand is to enable a fully automated design of automotive Ethernet networks. As detailed in Chapter 2, the Design Space Exploration (DSE) uses Evolutionary Algorithms (EAs) for this purpose. The contributions in the area of formal analysis presented in Chapter 3 enable the implementation of timing and reliability evaluators. The EA uses these evaluators to compare individual design implementations with respect to the given design objectives in order to determine the best implementations, which are then used as the foundation for further optimizations. However, the complex design of communication networks involves an immense number of design decisions which, in certain combinations, result in design implementations that are altogether invalid and cannot be evaluated. For example, neither the timing nor the reliability of a network design can be evaluated if the design does not offer any routes between the sending and the receiving communication nodes. Such invalid solutions may constitute a significant fraction of the design space. To mitigate their negative effect on the optimization efficiency, the work at hand applies the SAT-Decoding approach described in Chapter 2, where the search space explored by the EA is constrained by a set of Pseudo-Boolean (PB) constraints. The contributions presented in this chapter focus on the constraint sets necessary for an efficient optimization of automotive Ethernet TSN networks. To account for the special features of automotive Ethernet networks and the novel mechanisms introduced by the TSN standard, these constraints significantly extend the basic constraint set introduced in Section 2.2.3. As detailed in Section 2.1.3, interference-free and therefore deterministic transmission of critical messages requires not only a valid routing, but also a global schedule for a TSN network.
The current approaches found in literature enable the generation of network designs with a valid routing and a valid schedule [Ste10; SLC13]. However,

71 4 Constraints Characterizing Valid Message Routings and Schedules for Ethernet TSN Networks

these approaches generate the routing and the schedule in two entirely separate design steps, thereby ignoring and/or not exploiting the distinct interrelations between routing and scheduling decisions. The contribution presented in Section 4.1 is a joint constraint set for routing and scheduling that can be used to generate networks with valid routings and global schedules. In contrast to a sequential generation of routing and scheduling, the presented approach can be integrated into SAT-Decoding and enables a concurrent optimization of the routing and scheduling with respect to objectives such as minimizing the amount of scheduled interference introduced in Section 3.1.1. Future automotive applications like Car-To-Car or Car-To-X differ from the applications found in current automobiles not only in their stricter reliability and timing requirements. In these safety-critical applications, driving decisions are made based on data generated outside of the car, e.g., by other cars or smart city infrastructure. While the significant increase of available data enables a higher quality of service, relying on external data also has a substantial potential for abuse. Consequently, security aspects also play a major role in the development of future automotive networks. One promising approach to address the security challenges in Ethernet networks is the division of the communication networks into so-called Virtual Local Area Networks (VLANs), i.e., subnetworks that are isolated at the data link layer (OSI layer 2). Yet, finding the optimal VLAN partitioning is a challenging task. The automation of the VLAN partitioning is a well-researched problem in the domain of local or metropolitan area networks.
However, the approaches used there are hardly applicable to the design of automotive networks, as they mainly focus on reducing the amount of broadcast traffic [RHK99; LL10] and cannot capture the many design objectives of automotive networks, like the message timing or the link load, that are affected by the VLAN partitioning. As a remedy, Section 4.2 proposes an approach based on a set of PB constraints to generate a message routing that is feasible with respect to the VLAN-related routing restrictions in automotive networks. This constraint set can be seamlessly integrated into a DSE based on SAT-Decoding and, thus, makes it possible to optimize not only the VLAN partitioning, but also other routing-related objectives. To cope with the strict reliability requirements of safety-critical Advanced Driver Assistance Systems (ADAS), the upcoming TSN standard [Ins17] introduces mechanisms that enable transmission redundancy at any switch or end node (see Section 2.1.3). However, it is up to the designer to decide at which points and for which messages to activate transmission redundancy. The high number of infeasible solutions within the design space is a problem that cannot be addressed by the constraint set presented in Section 2.2.3, as this constraint set is limited to the generation of non-redundant message routings. As a remedy, Section 4.3 proposes two different approaches for encoding valid redundant routings. Like the other constraint sets presented in the work at hand, both encoding approaches can be integrated into SAT-Decoding and used for multi-objective optimization.

72 4.1 Joint Constraint Generation for Routing and Scheduling

4.1 Joint Constraint Generation for Routing and Scheduling¹

Besides a greater bandwidth and strict guarantees for the transmission reliability, innovative automotive applications like X-by-Wire or ADAS require very fast or deterministic communication between the resources of the automotive communication network. As detailed in Section 2.1.3, the TSN standard addresses these requirements by introducing features that enable time-triggered transmission of critical traffic. While a message assigned to the new traffic class of scheduled messages does not experience any interference from unscheduled traffic by construction, its transmission can still be delayed by other scheduled messages. The interference among scheduled messages can, however, be prevented by a network configuration referred to as a scheduled TSN network. As detailed in Section 2.1.3, all scheduled messages in such a network are sent according to a global schedule. To guarantee that the transmission slots of scheduled messages never overlap, this schedule has to consider the size and the routing path of each scheduled message, the transmission bandwidth of each network link, and the sending behavior of each sender in the network. While the usage of scheduled TSN networks offers strict timing guarantees, it also makes the network design more challenging, as network implementations without a correct global schedule are invalid. The constraint set presented in Section 2.2.3, consequently, has to be extended to exclude these invalid solutions from the search space. State-of-the-art design approaches for time-triggered communication networks, such as [Ste10] or [SLC13], handle the routing and the scheduling problems separately. In the first step, a constraint set is used to generate a valid routing for each message. After this, a set of schedule constraints is used to generate a valid schedule for the scheduled messages with the fixed routing defined in the previous step.
This approach ignores the distinct interrelations between routing and scheduling and can, therefore, result in serious disadvantages during the DSE, e.g., if the routings generated in the first step yield an unschedulable system in the second step. The separate generation and optimization of the routings and the system schedule in these design approaches also significantly limits the global optimization potential, as routing decisions are taken without any knowledge about the schedule, while the variety of possible scheduling decisions may be seriously restricted by the previously determined fixed routing. In this section, we introduce a set of PB constraints that ensures both a valid routing and a valid global schedule of the scheduled messages in the network. This constraint set can be used to generate a valid mixed-criticality communication network with scheduled traffic in a single constraint resolution step. In the following, schedule-specific extensions of the system model introduced in Section 2.2.1 are first presented in Section 4.1.1. Subsequently, the constraint set for the joint consideration of schedule and routing is presented in Section 4.1.2. Section 4.1.3 then details the results of experiments where the presented approach is compared with approaches based on a sequential determination of routing and scheduling. Finally, a summary of the related work on this topic is given in Section 4.1.4.

¹ Major parts of this section have been published in [SGR+17b].

4.1.1 System Model

In this section, we use the system model presented in Section 2.2.1 and the network configuration presented in Section 2.1.3. A global schedule is generated by defining a so-called global offset $o_C$ for each scheduled message $C$. Here, $o_C$ defines the time interval between a global start point and the point in time when the scheduled message $C$ is sent into the network for the first time. In combination with the routing of the message and the message period, the global offset uniquely specifies the port slots during which the frames of the message are transmitted by the output ports of the network nodes. For example, the global schedule of the scheduled TSN network illustrated in Fig. 2.6 guarantees interference-freedom for the three scheduled messages by defining global offsets of 0, 1, and 2 time units for the turquoise, the violet, and the yellow message, respectively. Assigning global offsets that guarantee interference-freedom for all scheduled messages is the main purpose of the constraint set presented in this section.

4.1.2 Constraint Formulation

This section introduces the constraint extensions needed for the joint generation of routing and scheduling of a mixed-criticality network with scheduled traffic. We start by introducing a constraint set that relies on a given routing of the scheduled messages to generate a valid schedule. This initial constraint set is then extended so that it can be combined with the constraints presented in Section 2.2.3 and used for the generation of valid routings and a valid schedule in a single step.

Schedule Constraints for a Fixed Routing

We first focus on a constraint set to generate valid schedules after a valid routing has already been determined. In [Ste10] and [SLC13], the authors present scheduling approaches where the schedule is generated by formulating Satisfiability Modulo Theories (SMT) constraints for each pair of potentially overlapping transmission slots of two frames. While we use a similar idea, we have to reformulate these constraints as PB constraints.

General Idea. As explained in Sections 2.1.3 and 4.1.1, a system schedule is valid iff it ensures that the port slots of scheduled messages do not overlap. The port slot of a message frame starts at the point in time when the frame arrives at the link and ends when the transmission of the frame is finished. Two port slots therefore do not overlap if the slot of the frame that arrives first ends before the start of the slot of the frame that arrives second. Two messages $C$ and $\tilde{C}$ do not interfere on a link $l$ if the following conditions hold for each frame pair $(f_C, f_{\tilde{C}})$ in the cross product $F_C \times F_{\tilde{C}}$, where $f_C$ denotes a frame of message $C$ and $F_C$ denotes the set of all frames of message $C$ within the Hyper-Period (HP) $h$ of the considered message pair:

$$\forall \{C, \tilde{C}\} \subseteq N_C,\ C \neq \tilde{C},\ f_C^i \in F_C,\ f_{\tilde{C}}^j \in F_{\tilde{C}}:$$

$$\left(\dot{t}_f^{\tilde{C},l} + t_f^{\tilde{C}_j,l}\right) - \left(\dot{t}_f^{C,l} + t_f^{C_i,l} + t_T^{C,l}\right) + \left(O_{f_C^i, f_{\tilde{C}}^j} \cdot A_{f_C^i, f_{\tilde{C}}^j}\right) \geq 0 \tag{4.1}$$

$$\left(\dot{t}_f^{C,l} + t_f^{C_i,l}\right) - \left(\dot{t}_f^{\tilde{C},l} + t_f^{\tilde{C}_j,l} + t_T^{\tilde{C},l}\right) + \left(\neg O_{f_C^i, f_{\tilde{C}}^j} \cdot A_{f_{\tilde{C}}^j, f_C^i}\right) \geq 0 \tag{4.2}$$

In Eqs. (4.1) and (4.2), $t_T^{C,l}$ is the transmission time of a frame of message $C$ on the link $l$, and $\dot{t}_f^{C,l}$ is the point in time when the first frame within the HP of message $C$ arrives at the link $l$. The arrival point of the $i$-th frame of the message $C$ within the HP is then given by adding $t_f^{C_i,l} = (i-1) \cdot p$ to $\dot{t}_f^{C,l}$.

The first term in Eqs. (4.1) and (4.2) denotes the start point of the slot of the frame that arrives later. The second term is the point in time when the transmission of the frame that arrives earlier is finished. As long as the later frame arrives after the end point of the earlier time slot, i.e., as long as the difference between the start of the later slot and the end of the earlier slot is non-negative, the two time slots do not overlap and the two frames do not interfere with each other. Equation (4.1) formulates the constraint for the case that the frame of message $C$ arrives before the frame of message $\tilde{C}$, while Eq. (4.2) addresses the case that the frame of message $\tilde{C}$ arrives before the frame of message $C$. As these two situations contradict each other, we use the third term of the equations, the so-called activation term, to make sure that only one of these constraints is activated, while the other one is trivially fulfilled. If the binary variable $O_{f_C^i, f_{\tilde{C}}^j}$ of the activation term is set to 1, the constant number $A$ is added to the sum. As long as $A$ is bigger than the second term of the equation, the corresponding constraint is automatically fulfilled.
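To make the disjunction encoded by Eqs. (4.1) and (4.2) concrete, the following Python sketch checks a candidate schedule directly instead of generating PB constraints: for every frame pair within the hyper-period, at least one of the two orderings must leave a non-negative gap, which is exactly the choice that the binary variable $O$ makes in the PB formulation. The sketch assumes integer time units (so that the hyper-period can be computed with `math.lcm`, available since Python 3.9) and a fixed routing, so that the first-frame arrival time $\dot{t}_f^{C,l}$ on the link is known. Like Eqs. (4.1) and (4.2), it only compares frames within one hyper-period. The function name and parameters are illustrative.

```python
from math import lcm


def link_schedule_is_valid(t_first_a: int, p_a: int, t_t_a: int,
                           t_first_b: int, p_b: int, t_t_b: int) -> bool:
    """Check that no frame pair of two messages overlaps on one link.

    t_first_*: arrival of the first frame at the link (t-dot_f^{C,l})
    p_*:       message period
    t_t_*:     transmission time on the link (t_T^{C,l})
    """
    hp = lcm(p_a, p_b)  # hyper-period of the message pair
    for i in range(hp // p_a):             # zero-based frame index of message a
        start_a = t_first_a + i * p_a
        for j in range(hp // p_b):         # zero-based frame index of message b
            start_b = t_first_b + j * p_b
            a_before_b = start_b - (start_a + t_t_a) >= 0  # Eq. (4.1) satisfied
            b_before_a = start_a - (start_b + t_t_b) >= 0  # Eq. (4.2) satisfied
            if not (a_before_b or b_before_a):
                return False  # the two port slots overlap
    return True


# Hypothetical message pair on one link: periods 4 and 8, one time unit each.
print(link_schedule_is_valid(0, 4, 1, 1, 8, 1))  # True
print(link_schedule_is_valid(0, 4, 1, 0, 8, 1))  # False
```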

Start and End Points of the Time Slots. The point in time when the first frame within the HP of message $C$ arrives at link $l$, $\dot{t}_f^{C,l}$, is given by the sum of the global offset $o_C$ of the message and the so-called run offset $r_{C,l}$:

$$\dot{t}_f^{C,l} = o_C + r_{C,l} \tag{4.3}$$

The global offset $o_C$ is a value between 0 and $p_C$, the period of the message $C$. We introduce the following equation for the binary encoding of the global offset:

$$o_C = \sum_{1 \leq i \leq g} \left(G_C^i \cdot p_C \cdot 0.5^i\right) \tag{4.4}$$


Here, $g$ is the number of binary variables $G_C^i$ used for the offset encoding of each message, the so-called offset granularity. A higher granularity results in a bigger search space at the cost of longer execution times of the DSE, but may enable a denser schedule of the scheduled messages. The run offset $r_{C,l}$ is given by the link delay that frames of message $C$ accumulate on the way from the sender to the link $l$:
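As a small worked example of Eq. (4.4), the following Python sketch decodes a bit vector $(G_C^1, \dots, G_C^g)$ into a global offset and lists all offsets reachable for a given granularity. It illustrates that the encoding yields the $2^g$ equidistant values $k \cdot p_C / 2^g$ for $k = 0, \dots, 2^g - 1$, so a higher granularity refines the grid of possible offsets. The function names are hypothetical; `Fraction` is used only to keep the values exact, whereas the PB formulation itself needs integers, as discussed in the note on integer values in this section.

```python
from fractions import Fraction
from itertools import product


def global_offset(bits, period):
    """Eq. (4.4): o_C = sum over i of G_C^i * p_C * 0.5^i, with i starting at 1."""
    return sum(g * period * Fraction(1, 2) ** i
               for i, g in enumerate(bits, start=1))


def reachable_offsets(granularity, period):
    """All offsets that can be encoded with `granularity` binary variables."""
    return sorted(global_offset(bits, period)
                  for bits in product((0, 1), repeat=granularity))


print(global_offset((1, 0, 1), period=8))  # 5
print(reachable_offsets(2, period=8))
```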

$$r_{C,l} = \sum_{l^- \in E_l^-} t_T^{C,l^-} \tag{4.5}$$

Here, $E_l^-$ is the set of links that the frame passes before arriving at link $l$. With a given routing, the run offset is a constant value.

Big Number A. The constant number $A$ in Eqs. (4.1) and (4.2) must be bigger than the maximal value of the subtracted term in the formulated constraints. For example, a safe bound for the big number in Eq. (4.1) is given by:

$$A_{f_C^i, f_{\tilde{C}}^j} = p_C + t_f^{C_i,l} + r_{C,l} \tag{4.6}$$

Note: As we are using PB constraints for the problem formulation, only integer values may be used in the generated constraints. However, the run offset $r_C$ results from the sum of the transmission times of a message $C$ over the network components. Run offsets in real networks may have continuous values that cannot be directly expressed using a combination of integer constants and binary variables. During constraint formulation, these values are transformed into integers by shifting the decimal point to the right by a certain number of positions and then cutting off the remaining fractional part. For certain, especially small, numbers, like the link delays on a Gigabit link, this transformation may lead to a loss of information. To make sure this loss of information does not result in invalid constraints, a safe correction must be made in these cases: if they suffer from information loss, all values contributing to the calculation of the end point of the earlier slot are incremented by one, while all values used for the calculation of the start point of the later slot are decremented by one. With this, the constraints describe the stricter situation where the first frame arrives slightly later, while the second frame arrives slightly earlier. Any solution satisfying the constraints built from the corrected run offset values is guaranteed to also result in an overlap-free positioning of the frame slots in the real system, as the first frame will arrive slightly earlier and the second frame will arrive slightly later than in the situation described by the constraints.
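The conservative integer conversion described in the note can be sketched as follows. This Python sketch assumes that the delay values are given as decimal strings (to avoid binary floating-point artifacts) and scales them by a hypothetical number of decimal digits. It truncates the remainder and applies the ±1 correction only when truncation actually lost information. The `role` parameter, a name introduced here, distinguishes values contributing to the end point of the earlier slot from those contributing to the start point of the later slot.

```python
from decimal import Decimal


def to_constraint_int(value: str, digits: int, role: str) -> int:
    """Scale a delay to an integer for a PB constraint, conservatively.

    role == "earlier_end":  value contributes to the end of the earlier slot
    role == "later_start":  value contributes to the start of the later slot
    """
    scaled = Decimal(value).scaleb(digits)  # shift the decimal point right
    truncated = int(scaled)                 # cut off the remaining part
    if truncated == scaled:
        return truncated                    # no information lost
    if role == "earlier_end":
        return truncated + 1                # earlier slot assumed to end later
    if role == "later_start":
        return truncated - 1                # later slot assumed to start earlier
    raise ValueError(f"unknown role: {role}")


print(to_constraint_int("0.672", 2, "earlier_end"))  # 68
print(to_constraint_int("0.672", 2, "later_start"))  # 66
print(to_constraint_int("1.5", 1, "later_start"))    # 15
```

For positive values, truncation alone already yields a lower bound, so the decrement is slightly more conservative than strictly necessary; the sketch follows the correction exactly as stated in the note.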

Constraints for Routing and Scheduling

If the schedule and the routing constraints are to be formulated jointly, the constraints introduced in Section 4.1.2 can no longer be used. On the one hand, as the routings are not fixed, the links used by the messages are not known, so that each scheduled message may potentially interfere with any other scheduled message in case a link is shared. On the other hand, the run offset is no longer a constant value because, depending on the routing, the message may pass a different set of links before arriving at the currently considered link. In the following, we show how to reformulate Eqs. (4.1) and (4.2) to account for the unknown message routing. Note that the following constraints have to be formulated for each pair of scheduled messages $(C, \tilde{C})$ on each link $l$ that their frames can possibly pass:

$$\forall \{C, \tilde{C}\} \subseteq N_C,\ C \neq \tilde{C},\ l \in E_l,\ f_C^i \in F_C,\ f_{\tilde{C}}^j \in F_{\tilde{C}}:$$

$$\left(o_{\tilde{C}} + \tilde{r}_{\tilde{C},l}\right) - \left(o_C + \tilde{r}_{C,l} + t_T^{C,l}\right) + \left(\tilde{O}_{f_C^i, f_{\tilde{C}}^j} \cdot \tilde{A}_{f_C^i, f_{\tilde{C}}^j}\right) \geq 0 \tag{4.7}$$

$$\left(o_C + \tilde{r}_{C,l}\right) - \left(o_{\tilde{C}} + \tilde{r}_{\tilde{C},l} + t_T^{\tilde{C},l}\right) + \left(\tilde{O}_{f_{\tilde{C}}^j, f_C^i} \cdot \tilde{A}_{f_{\tilde{C}}^j, f_C^i}\right) \geq 0 \tag{4.8}$$

The three terms in Eqs. (4.7) and (4.8) have the same purpose as in Eqs. (4.1) and (4.2): the first term defines the start point of the later slot, the second term is the end point of the earlier slot, and the third term ensures that the constraint is only activated when necessary. While the global offset term $o$ is generated in the same way as before, the run offset $\tilde{r}$, the activation variable $\tilde{O}$, and the big number $\tilde{A}$ have to be generated differently due to the unknown message routings, as detailed in the following.

Activation Variable. While the binary variable O used in the activation term of the Eqs. (4.1) and (4.2) only contains information about the order of the two frames, the binary variable Oe in Eqs. (4.7) and (4.8) also contains information whether both messages of the considered message pair are routed over link l. In case that at least one of the messages is not routed over l, the messages cannot interfere on this link and both the Eq. (4.7) and Eq. (4.8) are trivially fulfilled. Two frames of the messages C and Ce can only interfere with each other on link l if both messages are routed over link l in the same direction (from resource R to resource R˜). With the routing encoding introduced in Section 2.2.3, the routing variables C C l=(R,Re) and el=(R,Re) are then both set to 1. The value of the activation variables for the case of unknown routing is, conse- quently, given by the following logic terms:

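$$\tilde{O}_{f_C^i, f_{\tilde{C}}^j} = \neg\left(C_{l=(R,\tilde{R})} \wedge \tilde{C}_{l=(R,\tilde{R})} \wedge O_{f_C^i, f_{\tilde{C}}^j}\right) \quad (4.9)$$

$$\tilde{O}_{f_{\tilde{C}}^j, f_C^i} = \neg\left(C_{l=(R,\tilde{R})} \wedge \tilde{C}_{l=(R,\tilde{R})} \wedge \neg O_{f_C^i, f_{\tilde{C}}^j}\right) \quad (4.10)$$

The NAND relation can be enforced by the following PB constraints:

$$C_{l=(R,\tilde{R})} + \tilde{C}_{l=(R,\tilde{R})} + O_{f_C^i, f_{\tilde{C}}^j} - \neg\tilde{O}_{f_C^i, f_{\tilde{C}}^j} \leq 2 \quad (4.11)$$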

77 4 Constraints Characterizing Valid Message Routings and Schedules for Ethernet TSN Networks

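$$\neg\tilde{O}_{f_C^i, f_{\tilde{C}}^j} - O_{f_C^i, f_{\tilde{C}}^j} \leq 0 \quad (4.12)$$

$$\neg\tilde{O}_{f_C^i, f_{\tilde{C}}^j} - C_{l=(R,\tilde{R})} \leq 0 \quad (4.13)$$

$$\neg\tilde{O}_{f_C^i, f_{\tilde{C}}^j} - \tilde{C}_{l=(R,\tilde{R})} \leq 0 \quad (4.14)$$

Constraints for the activation variable $\tilde{O}_{f_{\tilde{C}}^j, f_C^i}$ follow the same scheme, with $O_{f_C^i, f_{\tilde{C}}^j}$ replaced by its negation $\neg O_{f_C^i, f_{\tilde{C}}^j}$.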
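The PB encoding involves only four binary variables, so it can be checked exhaustively. The Python sketch below does this; its function names are hypothetical. It enumerates all assignments of $C_{l}$, $\tilde{C}_{l}$, and $O_{f_C^i, f_{\tilde{C}}^j}$. It then confirms that Eqs. (4.11) to (4.14) admit exactly the value of $\tilde{O}_{f_C^i, f_{\tilde{C}}^j}$ prescribed by Eq. (4.9).

```python
from itertools import product


def pb_constraints_hold(c_l, ct_l, o_ij, o_tilde):
    """Eqs. (4.11)-(4.14) for one frame pair on link l; all arguments are 0/1.

    Hypothetical helper for illustration only.
    """
    not_o_tilde = 1 - o_tilde
    return (c_l + ct_l + o_ij - not_o_tilde <= 2      # (4.11)
            and not_o_tilde - o_ij <= 0               # (4.12)
            and not_o_tilde - c_l <= 0                # (4.13)
            and not_o_tilde - ct_l <= 0)              # (4.14)


def feasible_o_tilde(c_l, ct_l, o_ij):
    """All values of O~ that satisfy the PB constraints for the given inputs."""
    return [ot for ot in (0, 1) if pb_constraints_hold(c_l, ct_l, o_ij, ot)]


# Exhaustive check against Eq. (4.9): O~ = NOT(C_l AND C~_l AND O)
for c_l, ct_l, o_ij in product((0, 1), repeat=3):
    nand = 0 if (c_l and ct_l and o_ij) else 1
    assert feasible_o_tilde(c_l, ct_l, o_ij) == [nand]
print("Eqs. (4.11)-(4.14) encode Eq. (4.9)")
```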

Run Offset. With an unknown routing, the run offset of message C is given by:

$$\tilde{r}_{C,\tilde{l}} = \sum_{l \in E_l \setminus \{\tilde{l}\}} \left(L^C_{l,\tilde{l}} \cdot t_T^{C,l}\right) \quad (4.15)$$

Here, $E_l$ is the set of all links, and $L^C_{l,\tilde{l}}$ is the so-called link order variable. This binary variable is set to 1 if frames of message $C$ pass link $l$ before arriving at link $\tilde{l}$. In all other cases it is set to 0, i.e., when the frames pass $l$ after passing $\tilde{l}$ or are not routed over $l$ at all.

To make sure that the link order variables are set properly, we introduce the following constraints, which are formulated for each scheduled message in the network. For each pair of links $l$ and $\tilde{l}$ that can be passed by the scheduled message $C$, we formulate the following constraints:

$$\forall l \in E_l,\; l = (R,\tilde{R}):$$
$$L^C_{l,\tilde{l}} - C_{l=(R,\tilde{R})} \leq 0 \quad (4.16)$$
$$L^C_{l,\tilde{l}} + L^C_{\tilde{l},l} \leq 1 \quad (4.17)$$

Equation (4.16) ensures that the link order variable is always set to 0 when the message is not routed over the potentially preceding link. Equation (4.17) enforces a strict order direction: if link $l$ is passed before link $\tilde{l}$, link $\tilde{l}$ cannot precede link $l$, and vice versa. Additionally, for each resource $R$ in the network that can be passed by message $C$, we formulate the following constraints for each link pair consisting of an in-link $l'$ coming from a resource $R'$ and an out-link $\tilde{l}$ going to a resource $\tilde{R}$:

$$C_{l'=(R',R)} + C_{\tilde{l}=(R,\tilde{R})} - L^C_{l',\tilde{l}} \leq 1 \quad (4.18)$$
$$L^C_{l',\tilde{l}} - C_{l'=(R',R)} \leq 0 \quad (4.19)$$
$$L^C_{l',\tilde{l}} - C_{\tilde{l}=(R,\tilde{R})} \leq 0 \quad (4.20)$$

Equations (4.18) to (4.20) enforce the condition that the input link to a resource always precedes the output link of the resource, as long as both links are part of the routing of message $C$.
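For a fixed route, the link order variables, and hence Eq. (4.15), can be evaluated directly. The Python sketch below assumes a loop-free unicast route, given as an ordered list of directed links, and known per-link transmission times. This representation and the names are illustrative assumptions, not part of the encoding. In the actual constraint set, the route is unknown and the solver determines the variables under Eqs. (4.16) to (4.20).

```python
def run_offset(route, tx_time, l_tilde):
    """Eq. (4.15): run offset of a message on link l_tilde (hypothetical helper).

    route: ordered list of the directed links (src, dst) passed by the message.
    tx_time: dict mapping each link to the transmission time t_T^{C,l}.
    Links that are not on the route have link order variable 0 and are skipped.
    """
    position = {link: i for i, link in enumerate(route)}
    if l_tilde not in position:
        raise ValueError("the message is not routed over l_tilde")
    # L^C_{l, l_tilde} = 1 iff l is passed before l_tilde
    return sum(tx_time[l] for l in route if position[l] < position[l_tilde])


route = [("E0", "S0"), ("S0", "S1"), ("S1", "E1")]
tx_time = {link: 120 for link in route}            # e.g., microseconds per hop
print(run_offset(route, tx_time, ("S1", "E1")))    # 240
```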


To make sure that the order relation between links that do not share resources is correct, we finally introduce the following constraint that is added for every possible combination of three directed links l, l′ and el that can be passed by message C:

$$L^C_{l,l'} + L^C_{l',\tilde{l}} - L^C_{l,\tilde{l}} \leq 1 \quad (4.21)$$

This transitivity constraint states that if link el is preceded by link l′ and at the same time, link l′ is preceded by link l, then link el is also preceded by link l.
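The Python sketch below checks all of Eqs. (4.16) to (4.21) against the link order variables induced by a concrete route. It assumes a loop-free unicast route on a small line topology with links in both directions. The topology, the names, and the dictionary encoding are hypothetical and serve only as illustration. The sketch shows that a correct order is accepted. It also shows that breaking either the in-link/out-link order or transitivity is rejected.

```python
from itertools import permutations


def induced_link_order(route, all_links):
    """Link order variables L^C_{a,b} induced by a loop-free unicast route (hypothetical)."""
    pos = {l: i for i, l in enumerate(route)}
    return {(a, b): int(a in pos and b in pos and pos[a] < pos[b])
            for a, b in permutations(all_links, 2)}


def link_order_constraints_hold(route, all_links, L):
    """Check Eqs. (4.16)-(4.21) for the link order variables L of one message."""
    c = {l: int(l in route) for l in all_links}            # routing variables C_l
    for a, b in permutations(all_links, 2):
        if L[a, b] - c[a] > 0 or L[a, b] + L[b, a] > 1:    # (4.16), (4.17)
            return False
        if a[1] == b[0]:                                   # a enters, b leaves resource a[1]
            if c[a] + c[b] - L[a, b] > 1:                  # (4.18)
                return False
            if L[a, b] - c[a] > 0 or L[a, b] - c[b] > 0:   # (4.19), (4.20)
                return False
    for a, b, d in permutations(all_links, 3):
        if L[a, b] + L[b, d] - L[a, d] > 1:                # (4.21) transitivity
            return False
    return True


nodes = ["E0", "S0", "S1", "E1"]
forward = list(zip(nodes, nodes[1:]))
all_links = forward + [(v, u) for u, v in forward]         # both directions
route = [("E0", "S0"), ("S0", "S1"), ("S1", "E1")]
L = induced_link_order(route, all_links)
print(link_order_constraints_hold(route, all_links, L))    # True
L[("E0", "S0"), ("S0", "S1")] = 0                          # break the in/out-link order
print(link_order_constraints_hold(route, all_links, L))    # False, violates (4.18)
```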

Big Number $\tilde{A}$. The exact value of the run offset is not known during constraint formulation. We therefore calculate a safe big number $\tilde{A}$ as the sum of all link delays plus the period of the later message.

Multicasts and Messages with Different Periods. Throughout the explanation of the combined routing and scheduling constraints, we made two simplifying assumptions: all scheduled messages (a) have the same period and (b) are sent as unicast messages. These simplifications do not limit the applicability of the presented approach and were made solely for the sake of brevity. When the constraint set is formulated for messages with different periods, Eqs. (4.7) and (4.8) have to be formulated for each pair of interfering frames within the HP of the two interfering messages. This is done in the same way as for the case with fixed routing described in Section 4.1.2. Multicast messages may be handled using an approach similar to the one introduced in Section 2.2.3: instead of the routing and scheduling of messages, we consider the routing and scheduling of communication flows. Constraints are formulated for the possible interference of each communication flow with each communication flow of the other messages. All communication flows of the same message share the same global offset variables and do not compete with each other for links.
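The HP determines how many instances of Eqs. (4.7) and (4.8) a message pair needs. The Python sketch below gives a rough count, under the assumption that the constraint pair is instantiated for every combination of frames of the two messages within their HP. Any pruning performed in Section 4.1.2 is not reproduced here. Periods are assumed to be positive integers in a common time unit, and the function name is hypothetical. `math.lcm` requires Python 3.9 or later.

```python
from math import lcm


def frame_pairs_in_hp(period_a, period_b):
    """Index pairs (i, j) of frames f_A^i, f_B^j within the hyperperiod (HP).

    Hypothetical helper; periods are positive integers in a common time unit.
    """
    hp = lcm(period_a, period_b)
    return [(i, j) for i in range(hp // period_a) for j in range(hp // period_b)]


# Periods of 2 ms and 3 ms: HP = 6 ms, 3 frames x 2 frames = 6 pairs
print(frame_pairs_in_hp(2, 3))  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

With unknown routing, this count applies per link that both messages could pass, so mutually prime periods can inflate the constraint set considerably.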

4.1.3 Experimental Results

We evaluate the applicability and the performance of our approach by performing two case studies.

Performance of the proposed approach

In SAT-Decoding, the constraint set encoding valid solutions has to be solved for every solution considered during the DSE. The time required for constraint resolution therefore has a great impact on the overall run time of the optimization. In our first experiment, we investigate how the schedule granularity g, the link number, and the network topology affect the solving time of the proposed constraint set. We also compare it with a state-of-the-art design approach for time-triggered systems. For this, we perform an optimization run without any objectives, so that no time is spent on the analysis of an implementation. We then measure the average time needed for solving the constraints of the first 100 individuals. Throughout the first case study, all messages have a payload of 1,500 bytes and a period of 0.01 seconds, and they are transmitted over 100 Mbit/s Ethernet links. In this experiment, the architectures consist of two Electronic Control Units (ECUs), a sender and a receiver, which are connected by a network of switches and links. We consider the three different topologies illustrated in Fig. 4.1 and vary the hop number c to measure the constraint solving time for architectures with different link numbers. Figure 4.2 shows how the average solving time of the constraint set detailed in Section 4.1.2 scales with three parameters: the number of messages n for the different architecture topologies (top), the chosen granularity g of the message offset (middle), and the hop number c between the sender and the receiver (bottom). Given the super-linear growth of the solving time, opportunities to reduce the search space should always be taken. Examples are a message-specific offset granularity (the discretization of the sending offset, which results from the granularity, should not exceed the precision of the used hardware) or the restriction of a message routing to certain parts of the network. Coupling our approach with incremental solving approaches similar to the ones presented in [Ste10] and [SLC13] also has the potential to further reduce the solving time.

[Figure 4.1 panels: 1 path, 2 paths, 3 paths]
Figure 4.1: Architecture topology (hop number c = 3) used for the experiments measuring the solving time of the proposed constraint set.
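The raw load of the first case study can be computed directly, as a back-of-the-envelope check that is not part of the approach. The Python sketch below counts only the 1,500-byte payload and ignores Ethernet overhead (preamble, header, FCS, interframe gap), so it underestimates the real link occupancy. Under this assumption, 65 messages would still occupy only about 78% of a 0.01 s period on a single path. This is consistent with the text, which attributes the schedulability limit at n = 65 to the conservative integer transformation rather than to raw link capacity. The constants and function names are illustrative.

```python
PAYLOAD_BYTES = 1500   # payload per message in the first case study
LINK_MBIT_S = 100      # 100 Mbit/s Ethernet links
PERIOD_US = 10_000     # 0.01 s message period


def payload_tx_time_us(payload_bytes, link_mbit_s):
    """Serialization time of the payload alone in microseconds (bits / (Mbit/s)).

    Hypothetical helper; Ethernet framing overhead is deliberately ignored.
    """
    return payload_bytes * 8 / link_mbit_s


def path_utilization(n_messages):
    """Fraction of one period occupied on a single link by n messages."""
    return n_messages * payload_tx_time_us(PAYLOAD_BYTES, LINK_MBIT_S) / PERIOD_US


print(payload_tx_time_us(PAYLOAD_BYTES, LINK_MBIT_S))                  # 120.0
print(round(path_utilization(60), 3), round(path_utilization(65), 3))  # 0.72 0.78
```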
To evaluate our approach (hereafter referred to as the 1-step approach), we implemented a 2-step synthesis flow that follows the state-of-the-art design approach for time-triggered systems [Ste10; SAW+14]. There, the routing constraints proposed in [LSF14] are generated at the start of the optimization. During the optimization run, these constraints are used to create individuals describing valid routings. In the second step, constraints based on Eqs. (4.1) and (4.2) in Section 4.1.2 are generated for each individual. Solving this individual constraint set then provides a valid schedule.

[Figure 4.2 panels: solving time [s] over the number of messages n (1, 2, and 3 paths, c = 2); over the granularity g (1 path, n = 60, c = 2); over the hop number c (1 path, n = 60, g = 7)]
Figure 4.2: Scaling of the average solving time per individual of the proposed 1-step approach. The solving time grows exponentially with the number of messages n (top) and the granularity g (middle) and grows linearly with the hop number c (bottom). For all scaling dimensions n, g, and c, the solving time increases significantly faster for higher path numbers (plotted only for the graph at the top) [SGR+17b].

Tables 4.1 and 4.2 list the time needed to create one implementation, consisting of a routing and a schedule, for a given set of messages. The routing and scheduling constraints used in the proposed 1-step approach are generated and preprocessed once at the beginning. Consequently, the creation of one implementation during the optimization involves only solving these constraints. In the 2-step approach, however, the generation of the scheduling constraints also contributes to the time required for the creation of each implementation. During our experiments, we additionally examined how the preprocessing of the scheduling constraints of the 2-step approach affects the time needed per individual.

Table 4.1 contains the times needed to create an individual for specifications where n = 30 messages are transmitted over two communication hops (c = 2). The specifications have one, two, or three possible routing paths and an offset granularity of g = 7. This is not a very hard scheduling problem, as 30 messages can easily be scheduled on a single routing path. The table records the minimal, the maximal, and the average solving time per individual, as well as the average deviation. The experimental results show that the 2-step approach performs much better in cases with multiple routing options. The reason is that the routing step of the 2-step approach distributes the scheduled messages among the different routing paths. The scheduling constraints are then generated for easy scheduling situations in which fewer than 30 messages have to be scheduled per path. Architectures with more routing alternatives result in easier scheduling problems and, thus, shorter solving times. The 1-step approach, on the other hand, considers all possible interferences of the messages within the given architecture. It therefore has significantly longer solving times, which grow with the number of routing options. Interestingly, the solving times in the case with a single routing possibility are nearly identical. This can be explained by the preprocessing step that the 1-step approach performs once at the start of the optimization. In the 2-step approach, however, the constraint preprocessing has to be performed for each individual, so that it can even increase the time needed per individual.

Table 4.1: Solving times in seconds (30 messages)

        |         1 path          |         2 paths         |         3 paths
        | 2-step  2-step*  1-step | 2-step  2-step*  1-step | 2-step  2-step*  1-step
min     | 0.071   0.239    0.070  | 0.028   0.094    0.746  | 0.018   0.048    1.291
max     | 1.126   0.674    1.147  | 1.967   0.532    5.173  | 0.112   0.140    11.644
avg     | 0.315   0.359    0.298  | 0.348   0.178    1.891  | 0.029   0.064    3.737
dev     | 0.668   0.261    0.904  | 1.383   0.608    0.674  | 0.527   0.315    0.781

Table 4.2 contains the solving times for the case where n = 60 messages are transmitted over two communication hops (c = 2) with an offset granularity of g = 7. This scheduling problem is much harder. In fact, because of the conservative integer transformation outlined in Section 4.1.2, 65 messages are already not schedulable on a single routing path. This is also why the black plot in Fig. 4.2 stops at n = 60. The 1-step approach considers scheduling and routing jointly. It is therefore still able to find valid solutions within reasonable times and with a small variation of the solving times. The 2-step approach is still much faster in the cases where the schedule-unaware routing generation happens to provide a favorable distribution of messages over the routing paths. However, an unfavorable message distribution results in very hard scheduling problems that require very long solving times. During the experiments, we used a timeout of 5 minutes for the SAT solver used for scheduling in the 2-step approach. With the timeout disabled, we observed cases where the solving had not finished after one hour. During a real optimization, an unfavorable routing can result in unschedulable message distributions. This is especially critical because it leads to very long solving times that end with a timeout or a contradiction. Surprisingly, the 2-step approach performs much worse than the 1-step approach for a hard scheduling problem with one possible routing path, where it frequently runs into timeouts. As a new SAT solver is created in each iteration, it always starts the solving process with a more or less random order of the search variables. For hard scheduling problems, this results in very long solving times.
In the 1-step approach, we use a single constraint set and, therefore, a single SAT solver throughout the entire optimization run. Starting with the very first individual, the solver learns clauses and adjusts the variable order. This makes a worst-case situation with long solving times less probable with each processed individual.

Table 4.2: Solving times in seconds (60 messages)

        |         1 path          |         2 paths         |         3 paths
        | 2-step  2-step*  1-step | 2-step  2-step*  1-step | 2-step  2-step*  1-step
min     | 9.131   3.451    0.866  | 0.300   0.560    5.516  | 0.115   0.345    7.933
max     | 300.6   301.5    23.54  | 300.4   301.1    63.44  | 300.3   301.1    138.8
avg     | 265.4   246.1    6.864  | 100.3   84.73    18.20  | 3.933   6.552    36.78
dev     | 0.332   0.398    0.817  | 1.386   1.547    1.092  | 7.573   6.418    1.134

Network Optimization

Besides the run-time advantages, a joint constraint set makes it possible to apply the SAT-Decoding approach presented in Section 2.2.2. In our second case study, SAT-Decoding is used for a multi-objective optimization of a mixed-criticality automotive network, including routing and scheduling. This network consists of 13 ECUs exchanging n = 91 messages over 4 switches. There are 27 time-triggered high-criticality messages, which are transmitted over up to 3 communication hops. As some messages are transmitted as multicast messages, there is a total of 37 scheduled message flows. They have periods of 0.5, 0.1, 0.8, 0.03, and 1.0 seconds. The network contains both 100 Mbit/s and Gigabit Ethernet links. The offset granularity is set to g = 12.

We optimize the routing and scheduling of this network with respect to two design objectives. The first design objective is to minimize the worst-case interference that the high-criticality traffic imposes on unscheduled traffic. For this, we use an evaluator based on the analysis presented in Section 3.1. The worst-case interference on unscheduled traffic tends to decrease when the slots for scheduled traffic are distributed across the HP instead of being grouped together into larger blocks. The second design objective is to minimize the number of port slots that have to be configured in the memory of the TSN output ports. This number may be smaller than the number of time slots for high-criticality frames, because two consecutive slots are merged into one if their distance is smaller than the Guard Band (GB) interval. By minimizing the number of slots that have to be configured, we reduce the memory requirements and, thus, the monetary costs of the used TSN switches. Considered independently, the first objective would distribute the port slots evenly, whereas the second would cluster them together. This introduces the need for a multi-objective optimization that shows the trade-offs between the two objectives.

We perform an optimization run over 1,000 generations, with 25 new individuals created and evaluated in each generation. For a single individual, the decoding takes ≈2.1 seconds and the evaluation takes ≈1 second on average. The total exploration time was ≈21.5 hours, which matches 25,000 individuals at ≈3.1 seconds each. Table 4.3 lists the results of the optimization. Compared to the values found within the initial generation, the DSE finds implementations that reduce the scheduled interference by up to ≈28% (from 9·10^-5 s to 6.51·10^-5 s) and the number of configured slots by up to ≈4% (from 360 to 346). Furthermore, the optimization yields 5 Pareto-optimal solutions. As expected, a smaller number of configured slots results in a higher scheduled interference.

Table 4.3: Results of a DSE with joint routing and schedule determination and two objectives to be optimized

Individual   Number of port slots   Scheduled interference [s]
Initial      360                    9·10^-5
Pareto 1     346                    7.23·10^-5
Pareto 2     347                    6.63·10^-5
Pareto 3     348                    6.62·10^-5
Pareto 4     349                    6.59·10^-5
Pareto 5     350                    6.51·10^-5
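The second objective counts the port slots that remain after merging. The Python sketch below illustrates that merging rule. It assumes that slots are given as (start, end) intervals within one HP on a single output port, and it ignores wrap-around at the HP boundary. The function name and the interval representation are hypothetical and do not describe the evaluator used in the experiments.

```python
def merge_port_slots(slots, guard_band):
    """Merge consecutive (start, end) slots whose gap is smaller than the guard band.

    Hypothetical helper; returns the port slots that would have to be configured.
    """
    merged = []
    for start, end in sorted(slots):
        if merged and start - merged[-1][1] < guard_band:
            # gap too small for a guard band: extend the previous port slot
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


slots = [(0, 10), (12, 20), (40, 50)]
print(merge_port_slots(slots, guard_band=5))       # [(0, 20), (40, 50)]
print(len(merge_port_slots(slots, guard_band=1)))  # 3
```

The example shows where the tension between the two objectives comes from. Clustering slots closer than the guard band lowers the slot count. Spreading them out tends to lower the interference on unscheduled traffic.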

4.1.4 Related Work

Since the introduction of the FlexRay bus, time-triggered systems have been increasingly used in the automotive industry. With a given system schedule, these systems offer a very high level of determinism at run time. However, generating this schedule at design time is a challenging task. In [LGT+09], the authors demonstrate how a schedule for a FlexRay system can be generated using a set of Integer Linear Programming (ILP) constraints. In [MFH+05] and [BHP+08], the authors present constraint sets that yield a valid allocation and schedule for systems with multiple processors, where tasks can be mapped to different resources. However, they assume one-hop communication over a single bus, so these constraints are not suitable for solving the routing problem in multi-hop switched Ethernet networks. In [Ste10], the authors introduce a set of SMT constraints for the generation of schedules for time-triggered multi-hop networks. In contrast, [SLC13] proposes a methodology that resolves the conflicts between given subschedules and thereby generates the global system schedule. Both works, however, assume a given fixed routing of the scheduled messages and are limited to finding valid schedules, without the possibility of optimization. To the best of our knowledge, [SGR+17b] is the first work that enables the joint generation of a valid schedule and a valid routing in a multi-hop automotive communication network. At the same time, it can be used for the multi-objective optimization of both the routings and the schedules.

4.2 Constraints for a Message Routing Respecting the VLAN Partitioning2

Ethernet is not only an attractive option to satisfy the huge bandwidth requirements of upcoming safety-critical applications. It is also very well suited for the implementation of new network topologies such as a backbone architecture that connects all domains. Yet, in spite of these advantages, the Ethernet protocol was designed mainly with a focus on configurability, scalability, and low cost. It therefore requires careful consideration of other aspects such as real-time capability, reliability, and, in particular, security. The already high security requirements of safety-critical applications become even stricter because many of them rely on data generated outside of the car. Examples are Car-to-Car, Car-to-X, and over-the-air update applications. As a result, additional security mechanisms become mandatory for automotive Ethernet networks [KSM13].

2 Major parts of this section have been published in [SRT+18a].


The mechanism of VLANs is frequently used in local and metropolitan area networks and also seems interesting as a security mechanism in the automotive context. Here, the communication network is separated into subnetworks, where each end node is assigned to one or multiple VLANs [GSZ+07]. When an end node sends a message, the VLAN of the end node is written into the VLAN tag of the message frame. The message can then only be sent to end nodes that are assigned to the VLAN matching its VLAN tag. Consequently, a resource (source) can only send a message to another resource (destination) if both resources are assigned to the same VLAN. If a message has to be sent to a resource in another VLAN, a VLAN transmission is required. Only VLAN routers can perform it, i.e., resources that are assigned to both the source's and the destination's VLAN. During the VLAN transmission, the VLAN router uses higher-layer protocols to determine whether the message is allowed to enter the destination VLAN. If so, the router rewrites the VLAN tag of the message and transmits it into the destination VLAN. This mechanism separates the communication between different VLANs of the network. At the same time, it still allows information exchange across VLANs when higher-layer security protocols permit it. Automotive communication networks are traditionally already separated into domains with different security requirements, such as power train, chassis, or infotainment. Separating an Ethernet backbone topology into different VLANs offers three opportunities here: (a) overcoming the physical separation of current domain networks to use communication bandwidth more efficiently, (b) virtually mimicking the traditional separation to ease the deployment and integration of existing ECUs and their applications, and (c) allowing a controlled information exchange across different VLANs for security reasons [HMV+13].
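The membership rule described above can be stated compactly. The Python sketch below models each end node by the set of VLANs it is assigned to. It ignores the higher-layer admission check that a real VLAN router performs. The names and the example assignment, which mirrors the situation of Fig. 4.4b, are hypothetical.

```python
def can_send_directly(vlans_src, vlans_dst):
    """Hypothetical helper: two end nodes can exchange frames without a
    VLAN transmission iff they share at least one VLAN."""
    return bool(vlans_src & vlans_dst)


def candidate_vlan_routers(assignment, v_src, v_dst):
    """End nodes assigned to both VLANs, i.e., able to perform v_src -> v_dst.

    The higher-layer admission check of a real VLAN router is not modeled.
    """
    return sorted(node for node, vlans in assignment.items() if {v_src, v_dst} <= vlans)


assignment = {"G0": {"v0"}, "G1": {"v0", "v1"}, "G2": {"v1"}}
print(can_send_directly(assignment["G0"], assignment["G2"]))  # False
print(candidate_vlan_routers(assignment, "v0", "v1"))         # ['G1']
```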
However, the introduction of VLANs significantly increases the complexity of the network design. It is known from local and metropolitan area networks that an unfavorable VLAN configuration of the end nodes of the network, the so-called VLAN partitioning, severely degrades network performance [GSZ+07; KSS+09; SSK+10]. Yet, the approaches used for the automated partitioning of municipal or local area networks mainly focus on minimizing the amount of broadcast traffic. They are, thus, hardly applicable to the multi-objective optimization of automotive networks. As a remedy, this section proposes a set of routing constraints for the creation of routings that are valid with respect to a given VLAN partitioning. These constraints can be integrated into SAT-Decoding, which enables the optimization of the VLAN partitioning with respect to arbitrary optimization objectives. The presented experiments highlight the optimization potential of the proposed approach. They also investigate the impact of the VLAN partitioning on vital network characteristics such as transmission timing or link load. The remainder of this section is organized as follows: Section 4.2.1 shows how the system model presented in Section 2.2.1 is extended by a model of VLANs. Section 4.2.2 gives a detailed introduction of the proposed constraint set. Experimental results are discussed in Section 4.2.3, while Section 4.2.4 presents the related work on VLANs.

4.2.1 System Model

Network Configuration

The Ethernet networks considered in this section are divided into VLANs. Each ECU is assigned to at least one VLAN and can only directly exchange messages with ECUs that are assigned to the same VLAN. ECUs which are assigned to multiple VLANs are able to transmit a message between two of these VLANs by rewriting the VLAN tag of the message. These special ECUs are referred to as VLAN routers and the rewriting of the VLAN tag of a message is referred to as a VLAN transmission. If an ECU (the message source) has to send a message to an ECU (the message destination) in a different VLAN, this message is (I) sent from the source to a VLAN router that is assigned to the same VLAN as the source, (II) transmitted to the VLAN of the destination by the VLAN router, and (III) sent from the VLAN router to the destination. On its way from the source to the destination, a message may have to be transmitted through multiple VLAN pairs by multiple VLAN routers.

VLAN Model

The contribution presented in this section builds on the system model presented in Section 2.2.1 and addresses a DSE that explores the VLAN partitioning of the ECUs. The designer describes the VLAN search space by providing a set of possible VLAN configurations for each ECU. Each of these configurations assigns the ECU to one or multiple VLANs. During the creation of each implementation, one of the VLAN configuration options specified by the designer is picked for each ECU in the system, thereby partitioning the network into different VLANs.

Example

Figures 4.3 and 4.4 show how VLANs are considered in the system model used throughout this work. Figure 4.3 illustrates the specification graph. Here, the architecture graph consists of the ECUs G0–G2, which are connected to the switch S. G0 and G2 model ECUs that are not able to transmit messages between different VLANs, so they have to be assigned to either VLAN v0 or VLAN v1. G1, in contrast, models an ECU that can be configured as a VLAN router, so it can also be assigned to both v0 and v1 at the same time. In this simple example, the application graph consists of the processes P0 and P1. The message C0 captures the fact that P1 is data-dependent on P0. The two mapping edges state that the processes P0 and P1 must be implemented on G0 and G2, respectively.
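The size of the VLAN search space in this example can be seen by enumerating it. The Python sketch below lists every VLAN partitioning of the specification in Fig. 4.3, choosing one configuration per ECU. This is only meant to illustrate the search space. The DSE does not enumerate partitionings explicitly; it explores them via SAT-Decoding. The dictionary encoding of the configuration options is a hypothetical representation.

```python
from itertools import product

# VLAN configuration options per ECU, as in the specification of Fig. 4.3
# (hypothetical encoding for illustration).
options = {
    "G0": [frozenset({"v0"}), frozenset({"v1"})],
    "G1": [frozenset({"v0"}), frozenset({"v1"}), frozenset({"v0", "v1"})],
    "G2": [frozenset({"v0"}), frozenset({"v1"})],
}


def vlan_partitionings(options):
    """Yield every VLAN partitioning: one configuration chosen per ECU."""
    ecus = sorted(options)
    for choice in product(*(options[e] for e in ecus)):
        yield dict(zip(ecus, choice))


partitionings = list(vlan_partitionings(options))
print(len(partitionings))  # 12
# Partitionings that separate the sender G0 and the receiver G2 need a VLAN transmission
print(sum(1 for p in partitionings if not p["G0"] & p["G2"]))  # 6
```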


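[Figure 4.3: process P0 mapped to G0 with options (v0) or (v1); message C0 via switch S; G1 with options (v0), (v1), or (v0,v1); process P1 mapped to G2 with options (v0) or (v1)]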

Figure 4.3: Example of a specification for the exploration of VLAN partitionings. While both G0 and G2 can be assigned to either v0 or v1, G1 can also be configured as a VLAN router with membership in both VLANs.

Figures 4.4a and 4.4b depict two implementations that can be generated from the specification in Fig. 4.3. The processes P0 and P1 are bound to the resources G0 and G2, respectively. Figure 4.4a illustrates the case where both G0 and G2 are placed in VLAN v0. As the sender and the receiver resources are in the same VLAN, no VLAN transmission is required for message C0, and the message is routed directly from G0 to G2. G1 is not allocated, since it is neither a binding target of a process nor a resource used for message transmission. In the case illustrated in Fig. 4.4b, G0 and G2 are placed in different VLANs. Here, a message from G0 to G2 requires a VLAN transmission, so a valid implementation must contain at least one VLAN router resource (G1 in this case). Because of the necessary VLAN transmission, the application graph must also be altered. It receives the VLAN transmission process PR and the additional message C0′, which represents the message that the VLAN router creates by rewriting the VLAN tag of C0. PR is bound to the VLAN router resource G1, and C0′ is assigned to its own routing graph.

4.2.2 Routing Constraints

The constraint set presented in Section 2.2.3 characterizes the space of valid resource allocations and valid task bindings. However, the message routing provided by this encoding does not consider the VLAN assignments of the network resources and is not valid when a VLAN partitioning is used. In what follows, we significantly extend the basic encoding so that the VLAN assignments of the source and the destination as well as the potentially required VLAN transmission(s) are taken into account.


[Figure 4.4a: P0 on G0 (v0); C0 routed via switch S; P1 on G2 (v0); G1 not allocated]

(a) In the case where both G0 and G2 are assigned to the same VLAN, no VLAN transmission is necessary and C0 can be routed directly.

[Figure 4.4b: P0 on G0 (v0); C0 routed via switch S to the VLAN router G1 (v0,v1), which runs PR; C0′ delivered to P1 on G2 (v1)]

(b) Assigning G0 and G2 to different VLANs implies the allocation of the VLAN router G1. Additionally, C0 must now be transmitted from v0 to v1. The VLAN transmission is modeled by the process PR, which is bound to G1. The dashed line highlights the part of the application graph that reflects the necessity of the VLAN transmission.

Figure 4.4: Example of two implementations generated from the specification in Fig. 4.3.

The constraint presentation in this subsection is organized in three parts. The first part contains the constraints that describe the necessity for VLAN transmissions. This necessity depends on the VLAN configurations of the binding targets of the source and the destination tasks of the messages. In the second part, we present constraints that encode valid routings. Finally, the constraints in the third part of this subsection encode an allocation that is consistent with the previously made mapping and routing decisions.

Encoding the Necessity for VLAN transmissions

The proposed constraint set does not make any assumptions about the mapping of the processes or the VLAN configuration of the ECUs. Before formulating constraints for correct message routings, we first introduce variables expressing the VLAN transmissions that are chosen for the messages.

VLAN Configuration of ECUs. As stated in Section 4.2.1, the VLAN search space is described by defining a set of possible VLAN configurations $K_G$ for each ECU $G$. In a valid implementation, exactly one VLAN configuration $k_G \in K_G$ needs to be chosen for each ECU:

$$\forall G \in N_G: \quad \sum_{k_G \in K_G} k_G = 1 \quad (4.22)$$

where $k_G$ is an encoding variable that is set to 1 iff the VLAN configuration $k$ is chosen for ECU $G$. The following constraints ensure that each ECU is assigned to each VLAN of the chosen configuration and to no other VLAN:

$$\forall G \in N_G,\; k_G \in K_G,\; v \in k_G,\; \tilde{v} \notin k_G:$$
$$v_G - k_G \geq 0 \quad (4.23)$$
$$\neg\tilde{v}_G - k_G \geq 0 \quad (4.24)$$

Constraints (4.23) and (4.24) thus ensure that the ECU is assigned to exactly the VLANs of the chosen VLAN configuration. The encoding variable $v_G$ is set to 1 iff ECU $G$ is assigned to VLAN $v$.
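The following Python sketch enumerates, for a single ECU, all assignments of the configuration variables $k_G$ and the membership variables $v_G$ that satisfy Eqs. (4.22) to (4.24). It uses the options of G1 from Fig. 4.3. The helper name and the list/dict encoding are hypothetical. The result shows that each feasible choice of configuration fixes the VLAN memberships completely.

```python
from itertools import product


def config_constraints_hold(configs, k, v_g, vlans):
    """Eqs. (4.22)-(4.24) for one ECU (hypothetical helper).

    configs: list of VLAN configurations (sets) of the ECU,
    k: 0/1 choice per configuration, v_g: dict VLAN -> 0/1 membership v_G.
    """
    if sum(k) != 1:                                            # (4.22)
        return False
    for k_val, config in zip(k, configs):
        for v in vlans:
            if v in config and v_g[v] - k_val < 0:             # (4.23)
                return False
            if v not in config and (1 - v_g[v]) - k_val < 0:   # (4.24)
                return False
    return True


vlans = ["v0", "v1"]
configs = [{"v0"}, {"v1"}, {"v0", "v1"}]    # options of G1 in Fig. 4.3
solutions = [(k, v_g) for k in product((0, 1), repeat=3)
             for v_g in product((0, 1), repeat=2)
             if config_constraints_hold(configs, k, dict(zip(vlans, v_g)), vlans)]
print(solutions)
# [((0, 0, 1), (1, 1)), ((0, 1, 0), (0, 1)), ((1, 0, 0), (1, 0))]
```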

VLAN Assignment of the Process Tasks. The source and the destination processes of the messages in the network are associated with the VLANs of their binding targets:

$$\forall G \in N_G,\; P \in N_P,\; m = (P, G) \in E_M,\; v \in V:$$
$$m + v_G - v_P \leq 1 \quad (4.25)$$
$$m + \neg v_G - \neg v_P \leq 1 \quad (4.26)$$

where $v_P$ is an encoding variable that is set to 1 iff process $P$ is associated with VLAN $v$.
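A truth-table check makes the effect of Eqs. (4.25) and (4.26) explicit. The Python sketch below evaluates both constraints for one binding $m$ and one VLAN $v$; the helper name is hypothetical. If the binding is active ($m = 1$), $v_P$ is forced to equal $v_G$. If it is inactive, both values of $v_P$ remain feasible. In that case, presumably, the binding that is actually chosen for the process determines $v_P$.

```python
from itertools import product


def process_vlan_feasible(m, v_g, v_p):
    """Eqs. (4.25) and (4.26) for one binding m = (P, G) and one VLAN v (all 0/1).

    Hypothetical helper for illustration only.
    """
    return (m + v_g - v_p <= 1                    # (4.25): bound and G in v -> P in v
            and m + (1 - v_g) - (1 - v_p) <= 1)   # (4.26): bound and G not in v -> P not in v


for m, v_g in product((0, 1), repeat=2):
    print(m, v_g, [v_p for v_p in (0, 1) if process_vlan_feasible(m, v_g, v_p)])
# 0 0 [0, 1]
# 0 1 [0, 1]
# 1 0 [0]
# 1 1 [1]
```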


VLAN Assignment of the Communication Flows. The source and the destination VLAN of a communication flow are directly dictated by the VLANs of its source and its destination task:

$$\forall C \in N_C,\; \tilde{P} \in N_P^+(C),\; P \in N_P^-(C),\; v \in V,\; \tilde{v} \in V:$$
$$\sum_{v, \tilde{v} \in V} C^{\tilde{P}}_{v,\tilde{v}} = 1 \quad (4.27)$$
$$C^{\tilde{P}}_{v,\tilde{v}} - v_P \leq 0 \quad (4.28)$$
$$C^{\tilde{P}}_{v,\tilde{v}} - \tilde{v}_{\tilde{P}} \leq 0 \quad (4.29)$$

Here, $N_P^-(C)$ and $N_P^+(C)$ denote the sets of the predecessor and the successor processes of message $C$, respectively. $C^{\tilde{P}}_{v,\tilde{v}}$ is an encoding variable that is set to 1 iff two conditions hold for the communication flow of message $C$ to its successor process $\tilde{P}$: its source is in VLAN $v$, and its destination is in VLAN $\tilde{v}$. Eq. (4.27) states that exactly one combination of a source and a destination VLAN has to be chosen for each communication flow. Eqs. (4.28) and (4.29) state that a communication flow can only be assigned to a VLAN combination if the source and the destination tasks have the corresponding VLAN assignment.

Preventing unnecessary VLAN Transmissions. VLAN transmissions should only occur if the messages cannot be routed directly through a single VLAN. For example, the VLAN partitioning in Fig. 4.5(a) should always result in the direct routing of the message. It should not result in, e.g., a VLAN transmission from v0 to v1 followed by a transmission from v1 to v0. To prevent any unnecessary VLAN transmissions, we add the following constraint to the constraint set:

$$\forall C \in N_C,\; \tilde{P} \in N_P^+(C),\; P \in N_P^-(C),\; v \in V:$$
$$v_P + v_{\tilde{P}} - \sum_{\tilde{v} \in V} C^{\tilde{P}}_{\tilde{v},\tilde{v}} \leq 1 \quad (4.30)$$

This constraint states that if the source and the destination task are assigned to the same VLAN, the communication flow must use one of the options without a VLAN transmission.
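The Python sketch below shows how Eqs. (4.27) to (4.30) interact for a single communication flow. It enumerates every one-hot choice of $C^{\tilde{P}}_{v,\tilde{v}}$, which covers exactly the solutions of Eq. (4.27). It keeps only the choices that satisfy Eqs. (4.28) to (4.30), given the VLAN sets of the source and the destination task. The helper name and the set-based inputs are hypothetical. As expected, a shared VLAN forces a same-VLAN option.

```python
from itertools import product


def feasible_flow_pairs(vlans, src_vlans, dst_vlans):
    """(v, v_tilde) pairs a communication flow may choose under Eqs. (4.27)-(4.30).

    Hypothetical helper; src_vlans / dst_vlans are the VLANs of the source and
    destination task (v_P = 1 and v~_P~ = 1, respectively).
    """
    feasible = []
    pairs = list(product(vlans, repeat=2))
    for choice in pairs:
        c = {pair: int(pair == choice) for pair in pairs}    # one-hot, satisfies (4.27)
        ok = all(c[v, vt] - int(v in src_vlans) <= 0         # (4.28)
                 and c[v, vt] - int(vt in dst_vlans) <= 0    # (4.29)
                 for v, vt in pairs)
        ok = ok and all(int(v in src_vlans) + int(v in dst_vlans)
                        - sum(c[w, w] for w in vlans) <= 1   # (4.30)
                        for v in vlans)
        if ok:
            feasible.append(choice)
    return feasible


print(feasible_flow_pairs(["v0", "v1"], {"v0"}, {"v1"}))        # [('v0', 'v1')]
print(feasible_flow_pairs(["v0", "v1"], {"v0", "v1"}, {"v0"}))  # [('v0', 'v0')]
```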

VLAN Router Sequences of the Communication Flows. To encode the VLAN transmission of the communication flows, we first find all possibilities to transmit messages between each pair of VLANs. The possibilities to transmit a message from VLAN $v$ to VLAN $\tilde{v}$ are expressed as so-called VLAN router sequences and constitute the set $Q_{v,\tilde{v}}$. Each VLAN router sequence $q_{v,\tilde{v}} \in Q_{v,\tilde{v}}$ contains a sequence of $N_r(q_{v,\tilde{v}})$ VLAN routers. Each VLAN router $G' \in q_{v,\tilde{v}}$ performs the VLAN transmission from $\bar{v}$ to $v'$, denoted as $G' \xrightarrow{\bar{v},v'}$.
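The Python sketch below checks whether a candidate list of hops forms a VLAN router sequence from $v$ to $\tilde{v}$. It assumes a hypothetical representation in which each hop $G' \xrightarrow{\bar{v},v'}$ is written as a tuple (router, v_from, v_to). Each router is given with the VLAN set of its router configuration. The representation and names are illustrative and not taken from the encoding.

```python
def is_router_sequence(seq, v, v_tilde, membership):
    """Check a candidate VLAN router sequence for a transmission from v to v_tilde.

    Hypothetical representation: seq is a list of hops (router, v_from, v_to),
    read as router --(v_from, v_to)-->; membership maps router -> set of VLANs.
    """
    if not seq or seq[0][1] != v or seq[-1][2] != v_tilde:
        return False
    for router, v_from, v_to in seq:
        # a router can only rewrite between two different VLANs it belongs to
        if v_from == v_to or not {v_from, v_to} <= membership[router]:
            return False
    # each hop must start in the VLAN where the previous hop ended
    return all(prev[2] == nxt[1] for prev, nxt in zip(seq, seq[1:]))


membership = {"G1": {"v0", "v1"}, "G2": {"v0", "v1"}}   # router options in Fig. 4.5(a)
print(is_router_sequence([("G1", "v0", "v1")], "v0", "v1", membership))  # True
print(is_router_sequence([("G1", "v0", "v2")], "v0", "v2", membership))  # False
```

This check alone accepts the round trip of Fig. 4.5(c), [("G1", "v0", "v1"), ("G2", "v1", "v0")] from v0 to v0. Such unnecessary transmissions are excluded by Eq. (4.30), not by the sequence definition itself.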



Figure 4.5: Illustration of an unnecessary VLAN transmission: if, according to the specification given in (a), the sender G0 and the receiver G3 are placed in the same VLAN v0, the message must be transmitted without VLAN transmissions (b), instead of a VLAN transmission from v0 to v1 with a subsequent transmission from v1 to v0 (c).

The first step to find all VLAN router sequences in a specification is the construction of the so-called VLAN transmission graph. This graph consists of nodes $z \in Z$, which are connected by undirected edges $y \in Y$. Each node $z \in Z$ represents a possible configuration of an ECU in which the ECU would be a VLAN router, i.e., a configuration where the ECU is assigned to more than one VLAN. An undirected edge $y \in Y$ is added between each pair of VLAN routers that can communicate, i.e., each pair of nodes that (a) represent different resources and (b) share at least one VLAN:

$$Z = \{\, z = (G, k_G) \mid G \in N_G,\ k_G \in K_G,\ |k_G| > 1 \,\} \tag{4.31}$$
$$Y = \{\, y = (z, \tilde{z}) \mid G(z) \neq G(\tilde{z}),\ |k_G(z) \cap k_G(\tilde{z})| \geq 1 \,\}$$

Each path in the VLAN transmission graph corresponds to a valid VLAN router sequence. The set of the VLAN router sequences $Q_{v,\tilde{v}}$ is found by traversing the graph starting from each vertex assigned to $v$. The search is stopped when a vertex assigned to $\tilde{v}$ is reached. To avoid unnecessary VLAN transmissions, we use a distance parameter and apply an iterative search. During each iteration, the sequences that can perform the desired transmission within a number of VLAN transmission steps equal to the distance plus 1 are found. After this, the vertices used in these sequences are removed from the graph and the distance is incremented. For example, VLAN routers capable of a direct VLAN transmission from $v$ to $\tilde{v}$ are found during the first iteration and then removed, so that they cannot contribute to an unnecessarily complex VLAN transmission sequence in later iterations.

Example: Tab. 4.4, Tab. 4.5, and Fig. 4.6 illustrate how the VLAN router sequences are found by traversing the VLAN transmission graph. Assume that in the given network, the ECUs G0–G5 used as VLAN routers are configured with the VLAN assignments given in Tab. 4.4. Figure 4.6 shows the VLAN transmission graph created from this assignment (0), as well as the iterative search (I)–(IV) for possible router sequences that enable a transmission between the VLAN v1 and the VLAN v5. During the first iteration of the graph traversal, only the nodes within a distance of 0 steps of the start nodes z0 and z1, which are assigned to the start VLAN v1, are considered. Here, z0 is identified as a router capable of a VLAN transmission from v1 to v5 within 0 + 1 = 1 transmission steps. The corresponding router sequence $q'_{v_1,v_5}$ is recorded and the node z0 is removed from the graph, so that it is not considered for unnecessarily complicated router sequences in later iterations. The search distance is increased in each of the subsequent iterations (II)–(IV). After finding two additional router sequences, $q''_{v_1,v_5}$ and $q'''_{v_1,v_5}$, the search terminates, since none of the remaining nodes is assigned to the target VLAN v5. Table 4.5 summarizes the search process and formally describes the router sequences found during each iteration.

Having found all possible VLAN router sequences, we formulate the following constraints, which ensure the choice of a valid VLAN router sequence:

$$\forall C \in N_C,\ \tilde{P} \in N_P^+(C),\ \{v, \tilde{v}\} \subseteq V,\ G \in N_G^r(q_{v,\tilde{v}}),\ G \xrightarrow{q_{v,\tilde{v}}} \bar{v},v':$$

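$$-C^{\tilde{P}}_{v,\tilde{v}} + \sum_{q_{v,\tilde{v}} \in Q_{v,\tilde{v}}} q^{C,\tilde{P}}_{v,\tilde{v}} = 0 \tag{4.32}$$

$$\bar{v}_G - q^{C,\tilde{P}}_{v,\tilde{v}} \geq 0 \tag{4.33}$$
$$v'_G - q^{C,\tilde{P}}_{v,\tilde{v}} \geq 0 \tag{4.34}$$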

Table 4.4: Exemplary VLAN assignment of 6 VLAN routers.

| Router | VLAN assignment |
|---|---|
| G0 | {v1, v3, v5} |
| G1 | {v1, v2} |
| G2 | {v2, v3} |
| G3 | {v3, v5} |
| G4 | {v3, v4} |
| G5 | {v4, v5} |


[Figure 4.6 panels: (0) VLAN transmission graph with nodes z0 {v1, v3, v5}, z1 {v1, v2}, z2 {v2, v3}, z3 {v3, v5}, z4 {v3, v4}, z5 {v4, v5}; (I)–(IV) search iterations with distance 0, 1, 2, and 3.]

Figure 4.6: Illustration of the iterative search for possible VLAN router sequences that enable a VLAN transmission from VLAN v1 to VLAN v5, according to the VLAN assignment given in Tab. 4.4.

where $q^{C,\tilde{P}}_{v,\tilde{v}}$ is an encoding variable that is set to 1 iff the router sequence $q_{v,\tilde{v}}$ is chosen for the VLAN transmission of the communication flow transmitting message $C$ to the binding target of process $\tilde{P}$. The constraint in Eq. (4.32) states that exactly one VLAN router sequence has to be chosen for the VLAN transmission of the communication flow from $v$ to $\tilde{v}$ if the flow is assigned to this VLAN combination, and none otherwise. The constraint Eqs. (4.33) and (4.34) state that the VLAN router sequence variable must not be active unless the VLAN routers that it contains have the VLAN configurations necessary to perform the VLAN transmission described by the VLAN router sequence.

Table 4.5: Summary of the iterative search used to find all VLAN router sequences that enable a VLAN transmission from v1 to v5 in the VLAN transmission graph based on the VLAN assignments given in Tab. 4.4 and illustrated in Fig. 4.6.

| Iteration | Distance | Found router sequence |
|---|---|---|
| (0) | - | - |
| (I) | 0 | $q'_{v_1,v_5} = \{(G_0 \xrightarrow{q'_{v_1,v_5}} v_1,v_5)\}$ |
| (II) | 1 | - |
| (III) | 2 | $q''_{v_1,v_5} = \{(G_1 \xrightarrow{q''_{v_1,v_5}} v_1,v_2),\ (G_2 \xrightarrow{q''_{v_1,v_5}} v_2,v_3),\ (G_3 \xrightarrow{q''_{v_1,v_5}} v_3,v_5)\}$ |
| (IV) | 3 | $q'''_{v_1,v_5} = \{(G_1 \xrightarrow{q'''_{v_1,v_5}} v_1,v_2),\ (G_2 \xrightarrow{q'''_{v_1,v_5}} v_2,v_3),\ (G_4 \xrightarrow{q'''_{v_1,v_5}} v_3,v_4),\ (G_5 \xrightarrow{q'''_{v_1,v_5}} v_4,v_5)\}$ |
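The graph construction of Eq. (4.31) and the iterative, distance-bounded search can be sketched as follows for the example of Tab. 4.4. This is an illustration under two assumptions, not the implementation used in this work. First, each ECU has exactly one candidate configuration, so a node is identified by the ECU name. Second, after each iteration only the last router of every sequence found is removed. This is the reading that reproduces Tab. 4.5, because $q'''_{v_1,v_5}$ reuses G1 and G2 from $q''_{v_1,v_5}$. All function names are hypothetical.

```python
from itertools import combinations, product

# Tab. 4.4: VLAN configuration k_G of each potential VLAN router (one configuration each)
ROUTERS = {
    "G0": frozenset({"v1", "v3", "v5"}),
    "G1": frozenset({"v1", "v2"}),
    "G2": frozenset({"v2", "v3"}),
    "G3": frozenset({"v3", "v5"}),
    "G4": frozenset({"v3", "v4"}),
    "G5": frozenset({"v4", "v5"}),
}


def transmission_graph(routers):
    """Eq. (4.31): nodes are routers with >1 VLAN, edges join routers sharing a VLAN."""
    nodes = {g for g, vlans in routers.items() if len(vlans) > 1}
    edges = {frozenset(pair) for pair in combinations(sorted(nodes), 2)
             if routers[pair[0]] & routers[pair[1]]}
    return nodes, edges


def find_router_sequences(routers, src, dst):
    """Iterative, distance-bounded search for VLAN router sequences from src to dst."""
    remaining, edges = transmission_graph(routers)

    def neighbors(g):
        return sorted(h for h in remaining if frozenset((g, h)) in edges)

    def paths(length):
        # simple paths with `length` routers that start in the source VLAN
        stack = [[g] for g in sorted(remaining) if src in routers[g]]
        while stack:
            path = stack.pop()
            if len(path) == length:
                yield path
                continue
            stack.extend(path + [h] for h in neighbors(path[-1]) if h not in path)

    def sequences(path):
        # one shared VLAN per pair of consecutive routers; no VLAN is visited twice
        shared = [sorted(routers[a] & routers[b]) for a, b in zip(path, path[1:])]
        for middle in product(*shared):
            chain = [src, *middle, dst]
            if len(set(chain)) == len(chain):
                # (router, from VLAN, to VLAN) for every router on the path
                yield tuple((g, chain[i], chain[i + 1]) for i, g in enumerate(path))

    found, distance = [], 0
    while distance < len(routers) and any(dst in routers[g] for g in remaining):
        new = [seq for path in paths(distance + 1) if dst in routers[path[-1]]
               for seq in sequences(path)]
        found.extend(new)
        remaining -= {seq[-1][0] for seq in new}  # drop the routers that reached dst
        distance += 1
    return found
```

For v1 to v5, the search returns the three sequences of Tab. 4.5 in the order of iterations (I), (III), and (IV). Iteration (II) finds nothing, because G2 is not assigned to v5.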

Encoding the Routing Variables

A valid routing of a message is described by formulating the conditions for the activation of the edges in its routing graph.

Source Constraints. The following constraints are formulated for each resource that can be used as the source of the message, i.e., a potential binding target of its source task:

In-Edges. If the resource is the source of the message, none of its in-edges may be active in the routing graph of the message:

$$\forall C \in N_C,\ P \in N_P^-(C),\ m = (P,G) \in E_M,\ G \in N_G,\ R \in X_G,\ v \in V: \quad m + \sum_{R,v} C_{l=(R,G),v} \leq 1 \tag{4.35}$$

where $X_R$ denotes the neighbors of resource $R$, i.e., the set of resources that $R$ is connected to by a link. The encoding variable $C_{l=(R,\tilde{R}),v}$ is set to 1 iff the message $C$ is routed over the directed link $l$ from $R$ to $\tilde{R}$ through the VLAN $v$. The variable $C^{P}_{l=(R,\tilde{R}),v}$ has a similar meaning for the communication flow to the process $P$.

Out-Edges. The following constraints are formulated for the out-edges of a potential source resource (a potential binding target of the source task of the message):

$$\forall C \in N_C,\ P \in N_P^-(C),\ \tilde{P} \in N_P^+(C),\ m = (P,G) \in E_M,\ \tilde{m} = (\tilde{P},G) \in E_M,\ G \in N_G,\ R \in X_G,\ \{v,\tilde{v}\} \subseteq V:$$
$$m - \tilde{m} - \sum_{R,v} C^{\tilde{P}}_{l=(G,R),v} \leq 0 \tag{4.36}$$
$$\neg m + C^{\tilde{P}}_{v,\tilde{v}} - \sum_{R} C^{\tilde{P}}_{l=(G,R),v} \geq 0 \tag{4.37}$$

A source resource must have an activated out-edge for each communication flow unless both the source and the destination task of the flow are bound to the resource (4.36). Additionally, the activated edge must lead to the source VLAN of the communication flow (4.37).

Destination Constraints. The constraints for the edges of the potential destination resources are complementary to the source constraints:


Out-Edges. A destination of a communication flow must not have activated out-edges for it:

$$\forall C \in N_C,\ \tilde{P} \in N_P^+(C),\ G \in N_G,\ \tilde{m} = (\tilde{P},G) \in E_M,\ R \in X_G,\ v \in V: \quad \tilde{m} + \sum_{R,v} C^{\tilde{P}}_{l=(G,R),v} \leq 1 \tag{4.38}$$

In-Edges. A destination of a communication flow must have at least one activated in-edge for the flow unless it is also the source (4.39). The activated edge must come from the destination VLAN of the communication flow (4.40):

$$\forall C \in N_C,\ P \in N_P^-(C),\ \tilde{P} \in N_P^+(C),\ m = (P,G) \in E_M,\ \tilde{m} = (\tilde{P},G) \in E_M,\ G \in N_G,\ R \in X_G,\ \{v,\tilde{v}\} \subseteq V:$$
$$\tilde{m} - m - \sum_{R,\tilde{v}} C^{\tilde{P}}_{l=(R,G),\tilde{v}} \leq 0 \tag{4.39}$$
$$\neg \tilde{m} + C^{\tilde{P}}_{v,\tilde{v}} - \sum_{R} C^{\tilde{P}}_{l=(R,G),\tilde{v}} \geq 0 \tag{4.40}$$

VLAN Router Constraints. The VLAN router constraints ensure that the VLAN routers are included into the routing if necessary.

In-Edges. The constraint Eqs. (4.41) and (4.42) encode the activation of the in-edges of resources that can be used as VLAN routers for the current communication flow:

$$\forall C \in N_C,\ \tilde{P} \in N_P^+(C),\ \tilde{m} = (\tilde{P},G) \in E_M,\ G \in N_G,\ R \in X_G,\ \{v,\tilde{v},\bar{v},v'\} \subseteq V,\ v \neq \tilde{v},\ \bar{v} \neq v',$$
$$Q_{\bar{v},v'} = \{\, q_{\bar{v},v'} \mid G \in N_G^r(q_{\bar{v},v'}),\ G \xrightarrow{q_{\bar{v},v'}} v,\tilde{v} \,\}:$$
$$\tilde{m} + \sum_{q_{\bar{v},v'} \in Q_{\bar{v},v'}} q^{C,\tilde{P}}_{\bar{v},v'} - \sum_{R} C^{\tilde{P}}_{l=(R,G),v} \geq 0 \tag{4.41}$$
$$\forall q_{\bar{v},v'} \in Q_{\bar{v},v'}: \quad -q^{C,\tilde{P}}_{\bar{v},v'} + \sum_{R} C^{\tilde{P}}_{l=(R,G),v} \geq 0 \tag{4.42}$$

An incoming edge of a potential VLAN router $G$ may only be active in $v$ if $G$ is the binding target of the destination task or if, according to the chosen VLAN router sequence $q_{\bar{v},v'}$, $G$ is used to transmit the message from $v$ to $\tilde{v}$.

Out-Edges. The constraints in Eqs. (4.43) and (4.44) describe the activation conditions of the out-edges of potential VLAN routers:

$$\forall C \in N_C,\ P \in N_P^-(C),\ \tilde{P} \in N_P^+(C),\ Q_{\bar{v},v'} = \{\, q_{\bar{v},v'} \mid G \in N_G^r(q_{\bar{v},v'}),\ G \xrightarrow{q_{\bar{v},v'}} v,\tilde{v} \,\},\ m = (P,G) \in E_M,\ G \in N_G,\ R \in X_G,$$
$$\{v,\tilde{v},\bar{v},v'\} \subseteq V,\ v \neq \tilde{v},\ \bar{v} \neq v':$$
$$m + \sum_{q_{\bar{v},v'} \in Q_{\bar{v},v'}} q^{C,\tilde{P}}_{\bar{v},v'} - \sum_{R} C^{\tilde{P}}_{l=(G,R),\tilde{v}} \geq 0 \tag{4.43}$$
$$\forall q_{\bar{v},v'} \in Q_{\bar{v},v'}: \quad -q^{C,\tilde{P}}_{\bar{v},v'} + \sum_{R} C^{\tilde{P}}_{l=(G,R),\tilde{v}} \geq 0 \tag{4.44}$$

An outgoing edge may only be active in $\tilde{v}$ if the potential VLAN router is the source or if it is used to transmit the message into $\tilde{v}$.

Switch Constraints. If a communication flow is not routed over a switch, the switch must not have any activated edges for this communication flow. A switch is never the destination or the source of a communication flow, and it can never be used to transmit a message from one VLAN to another. Consequently, if a communication flow is routed over a switch, the switch must have exactly one activated in-edge and exactly one activated out-edge, both in the same VLAN. These conditions are captured by the following constraints:

$$\forall C \in N_C,\ \tilde{P} \in N_P^+(C),\ S \in N_S,\ R \in X_S,\ v \in V:$$
$$\sum_{R} C^{\tilde{P}}_{l=(R,S),v} \leq 1 \tag{4.45}$$
$$\sum_{R} C^{\tilde{P}}_{l=(S,R),v} \leq 1 \tag{4.46}$$
$$\sum_{R} C^{\tilde{P}}_{l=(R,S),v} - \sum_{R} C^{\tilde{P}}_{l=(S,R),v} = 0 \tag{4.47}$$

These constraints state that each switch may have at most one activated in-edge (4.45) and at most one activated out-edge (4.46) per communication flow in a VLAN. At the same time, the numbers of activated in- and out-edges of the same communication flow in the same VLAN have to be equal (4.47). This only leaves the cases where the switch has either no activated edges or exactly one in-edge and one out-edge in the same VLAN.
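The following sketch checks Eqs. (4.45) to (4.47) for one communication flow. It is useful, for example, when validating a decoded routing outside the solver. The active flow variables are assumed to be given as `(R, R~, v)` triples. The triple encoding and the function name are hypothetical.

```python
from collections import Counter


def switch_constraints_hold(active_links, switches):
    """Check Eqs. (4.45)-(4.47) for the active links of one communication flow.

    active_links: set of (R, R~, v) triples whose variable C^{P~}_{l=(R,R~),v} is 1.
    """
    in_edges = Counter((dst, v) for _, dst, v in active_links)
    out_edges = Counter((src, v) for src, _, v in active_links)
    vlans = {v for _, _, v in active_links}
    for s in switches:
        for v in vlans:
            n_in, n_out = in_edges[(s, v)], out_edges[(s, v)]
            if n_in > 1 or n_out > 1:   # Eqs. (4.45), (4.46)
                return False
            if n_in - n_out != 0:       # Eq. (4.47): a switch forwards within one VLAN
                return False
    return True
```

A flow that enters switch S0 in v0 and leaves it in v1 violates Eq. (4.47), because the switch would have to perform a VLAN transmission.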

Preventing Cycles. The following constraints ensure that, inside one VLAN, a resource can have only one activated in-link used by a message (4.48) and that a link can only be used in one direction by a message in a VLAN (4.49):

$$\forall C \in N_C,\ R \in N_R,\ \tilde{R} \in X_R,\ v \in V:$$
$$\sum_{\tilde{R}} C_{l=(\tilde{R},R),v} \leq 1 \tag{4.48}$$
$$C_{l=(\tilde{R},R),v} + C_{l=(R,\tilde{R}),v} \leq 1 \tag{4.49}$$

Consequently, there can be no cycles, and different communication flows have to follow different paths after a forking.
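A matching sketch for Eqs. (4.48) and (4.49) operates on the message-level link variables. It uses the same hypothetical `(R, R~, v)` triple encoding as a stand-in for the activated variables $C_{l=(R,\tilde{R}),v}$.

```python
from collections import Counter


def cycle_constraints_hold(message_links):
    """Check Eqs. (4.48) and (4.49) for the links used by one message.

    message_links: set of (R, R~, v) triples whose variable C_{l=(R,R~),v} is 1.
    """
    in_links = Counter((dst, v) for _, dst, v in message_links)
    if any(count > 1 for count in in_links.values()):  # Eq. (4.48)
        return False
    # Eq. (4.49): a link must not be used in both directions within one VLAN
    return not any((dst, src, v) in message_links for src, dst, v in message_links)
```

A multicast fork at a switch, with one in-link and two out-links in the same VLAN, is accepted. Two in-links into the same resource, or a link used back and forth, are rejected.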


Encoding the Variables for the Resource and the Link Allocation

Hierarchy Constraints. The following constraints express the relation between the usage of links by a message and the usage of links by the communication flows of the message:

$$\forall C \in N_C,\ \tilde{P} \in N_P^+(C),\ R \in N_R,\ \tilde{R} \in X_R,\ v \in V:$$
$$C_{l=(R,\tilde{R}),v} - C^{\tilde{P}}_{l=(R,\tilde{R}),v} \geq 0 \tag{4.50}$$
$$\sum_{\tilde{P}} C^{\tilde{P}}_{l=(R,\tilde{R}),v} - C_{l=(R,\tilde{R}),v} \geq 0 \tag{4.51}$$

A message uses a link in a VLAN if at least one of its communication flows uses the link in the VLAN (4.50). If none of its communication flows uses the link in the VLAN, the message as a whole does not use the link in the VLAN either (4.51).

$$\forall C \in N_C,\ R \in N_R,\ \tilde{R} \in X_R,\ v \in V:$$
$$C_{l=(R,\tilde{R})} - C_{l=(R,\tilde{R}),v} \geq 0 \tag{4.52}$$
$$\sum_{v} C_{l=(R,\tilde{R}),v} - C_{l=(R,\tilde{R})} \geq 0 \tag{4.53}$$

A message uses a link if it uses the link in at least one VLAN (4.52). A message does not use a link at all if it does not use it in any VLAN (4.53).
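Each of the two constraint pairs encodes a logical OR. The upper-level variable is forced to 1 if any lower-level variable is 1, and to 0 otherwise. The exhaustive check below makes this explicit for Eqs. (4.50) and (4.51) with three flows. Eqs. (4.52) and (4.53) have the same structure, with the VLAN-specific variables in the role of the flow variables. The function names are hypothetical.

```python
from itertools import product


def flow_to_message_ok(flow_bits, msg_bit):
    """Eqs. (4.50)/(4.51) for one directed link and one VLAN.

    flow_bits: values of C^{P~}_{l=(R,R~),v} for all flows of the message
    msg_bit:   value of C_{l=(R,R~),v}
    """
    each_flow_implies_msg = all(msg_bit - f >= 0 for f in flow_bits)  # Eq. (4.50)
    msg_needs_a_flow = sum(flow_bits) - msg_bit >= 0                  # Eq. (4.51)
    return each_flow_implies_msg and msg_needs_a_flow


def admissible_message_bits(flow_bits):
    """All message-level values that the two constraints allow for given flow values."""
    return [m for m in (0, 1) if flow_to_message_ok(flow_bits, m)]


# Exhaustive check for a message with three flows: the only admissible value is the OR
for bits in product((0, 1), repeat=3):
    assert admissible_message_bits(bits) == [int(any(bits))]
```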

Allocation Constraints. The allocation decisions are not affected by the introduction of a VLAN partitioning. Consequently, the allocation can be implemented with the constraints introduced in Section 2.2.3.

Decoding

In order to be used as part of the SAT-Decoding approach presented in Section 2.2.2, the above constraint set is automatically generated for the given specification at the beginning of the DSE. During the optimization, the SAT solver solves the constraint set with the strategy given by the genotype of the individual and returns a set of activated variables that encodes a specific problem solution. These variables are decoded into an implementation that can be directly used to evaluate the objective functions. The first step of the decoding is the creation of the implementation architecture. This is done by allocating each resource and each link whose encoding variable is activated. In contrast to the basic decoding detailed in Section 2.2.2, the approach presented in this section provides not only an altered architecture and routing, but also the necessary alteration of the application graph, where a VLAN transmission task is created for each VLAN transmission that is activated in the implementation. We also create an individual message task for each VLAN that the message passes, because in a real system, a different message is created during the VLAN transmission, so that it has to be considered separately during the evaluation of, e.g., the message timing or the link loads. For example, message C0 in Fig. 4.4b represents the message that is transmitted through the VLAN v0, while C0′ represents the message that is transmitted through the VLAN v1.
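The alteration of the application graph can be pictured as splitting the decoded route of a flow at every VLAN change. The sketch below is a hypothetical helper, not the decoder of this work. It takes an ordered list of `(R, R~, v)` hops and returns one message copy per VLAN, named with primes as in C0 and C0′. It also returns one VLAN transmission task per VLAN change, attributed to the router that sends the first hop in the new VLAN.

```python
def split_by_vlan(message, hops):
    """Split the decoded route of one flow into per-VLAN message copies.

    message: name of the original message, e.g., "C0"
    hops:    ordered list of (R, R~, v) triples along the route (hypothetical input)
    Returns the hops of each message copy and the VLAN transmissions (router, from, to).
    """
    segments, transmissions = {}, []
    current_vlan, name = None, message
    for hop in hops:
        sender, _, vlan = hop
        if current_vlan is not None and vlan != current_vlan:
            transmissions.append((sender, current_vlan, vlan))  # VLAN transmission task
            name += "'"                                           # new message copy, e.g., C0'
        current_vlan = vlan
        segments.setdefault(name, []).append(hop)
    return segments, transmissions
```

For a route G0 → S0 → G1 in v0, followed by G1 → S1 → G2 in v1, the helper yields the copies C0 and C0' with two hops each, and a single VLAN transmission performed by G1 from v0 to v1.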

Solving Times

As detailed in Section 2.2.2, the SAT solver solves the presented constraint set for each implementation considered during the DSE. Consequently, the solving time of the constraint set has a significant impact on the run time and the performance of the overall exploration. In the worst case, the solving time of a SAT solver scales exponentially with the number of variables in the underlying constraint set. Compared to the basic constraint set presented in Section 2.2.3, the constraint set presented in this section introduces a significantly higher number of encoding variables. Instead of one variable for the transmission of each communication flow over each directed link, the introduction of VLANs necessitates one variable for the transmission of each communication flow over each directed link in each VLAN, as well as one variable for each possible router sequence. Considering the VLAN partitioning, hence, increases the number of link encoding variables by a factor that scales linearly with the number of VLANs and additionally requires a number of router sequence variables that scales quadratically with the number of possible VLAN routers. This overhead reflects the growing size of the design space compared to the case without VLAN consideration, and our experimental results give evidence that it still enables short exploration times for realistic use cases. Nevertheless, the exponential growth of the solving time may become a problem for big systems with a large number of VLANs and potential router resources. Here, additional measures targeted at an improvement of the solving performance, e.g., incremental solving [Ste10; SLC13] or a user-defined restriction of the possible routings [MAS+18], may become necessary.
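The variable growth described above can be expressed as a back-of-the-envelope count. The sketch assumes one link variable per flow and directed link without VLANs. With VLANs, it assumes one link variable per flow, directed link, and VLAN, plus a given number of router-sequence variables per flow. All numbers in the usage lines are hypothetical and do not come from the case study.

```python
def routing_variable_counts(flows, directed_links, vlans, sequences_per_flow):
    """Number of routing variables without and with VLAN partitioning.

    sequences_per_flow: candidate router-sequence variables q per flow (hypothetical input).
    """
    without_vlans = flows * directed_links                  # one C^{P~}_l per flow and link
    with_vlans = flows * (directed_links * vlans + sequences_per_flow)
    return without_vlans, with_vlans


# Hypothetical sizes: 10 flows, 26 directed links
print(routing_variable_counts(10, 26, 1, 0))   # (260, 260): one VLAN adds nothing
print(routing_variable_counts(10, 26, 4, 12))  # (260, 1160)
```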

Routing Rule Customization

Note that during the DSE, the complex process of constraint generation is performed automatically and transparently to the designer, who simply supplies the specification. At the same time, the proposed constraint set allows a very strong customization of the routing rules. Simply by setting the appropriate routing variables to one or zero, the designer can enforce or forbid that a message is routed through a certain VLAN, routed over a certain link, or transmitted to a VLAN using a certain VLAN router. This allows for a strong customization of the design space exploration that can be used to meet application-specific requirements.


4.2.3 Experimental Results

We use two case studies to evaluate the performance and the optimization potential of our approach and to investigate the impact of VLAN partitioning on the performance of automotive networks.

Experimental Setup

Characteristics of the Network. For our experiments, we use an automotive use case given by a communication network and an application. The network consists of 12 ECUs connected by two Ethernet switches in a double-star topology and contains both 100 Mbit/s and 1 Gbit/s links. Four of the ECUs can be configured as VLAN routers. The application consists of 105 processes communicating through 64 messages, both single- and multicast. All messages are transmitted periodically. The periods of the messages range between 0.00022 and 300.0 seconds, while the payloads of the messages range between 10 and 60,000,000 Bytes. The messages in the application at hand are divided into three different traffic classes: Safety-critical messages with short deadlines form the high-priority traffic and are configured with the highest priority (4 in our case). The medium-priority traffic is made up of messages that are not safety-critical, yet have defined deadlines. For example, a message transmitting audio information for the infotainment system falls into this category, as it is not safety-critical, but should satisfy its deadline for a good user experience. In our experiments, medium-priority messages are transmitted with priority 2. The last traffic class is the best-effort traffic. These sporadic messages are transmitted with the lowest priority (0) and do not have to fulfill any deadlines. For the calculation of their deadline ratio, we conservatively use their minimal inter-arrival distance as the deadline.

VLAN Exploration. During our experiments, we investigate the performance of our approach for different magnitudes of the VLAN partitioning search space. We measure the magnitude of the VLAN partitioning search space by the number of possible VLANs for an ECU that is not used as a VLAN router. In the case of one possible VLAN, the VLAN configuration of both the VLAN routers and the non-router resources is fixed, so that we only explore the different VLAN transmission possibilities. In the case of two possible VLANs, each non-router resource can be assigned to exactly one of the two possible VLANs, while each potential VLAN router can be assigned any non-empty subset of the two VLANs (three possible assignments, like in the specification in Fig. 4.3). With more than one possible VLAN, not only the VLAN transmission possibilities but also the VLAN configurations of the ECUs are explored during the DSE.
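The number of VLAN configurations grows quickly with the number of possible VLANs. The sketch enumerates the non-empty VLAN subsets of a potential router ($2^n - 1$ for $n$ VLANs, i.e., 3 for two VLANs and 15 for four VLANs). It then multiplies this count with the single-VLAN choices of the non-router ECUs. The split into 8 non-router ECUs and 4 potential routers used in the checks is an assumption. The count ignores the router-sequence and routing choices explored on top of it.

```python
from itertools import combinations


def router_configurations(vlans):
    """All non-empty VLAN subsets that a potential VLAN router can be assigned to."""
    return [frozenset(c) for r in range(1, len(vlans) + 1) for c in combinations(vlans, r)]


def vlan_search_space(num_vlans, non_routers, routers):
    """Number of VLAN configurations: one VLAN per non-router ECU, a subset per router."""
    vlans = [f"v{i}" for i in range(num_vlans)]
    return num_vlans ** non_routers * len(router_configurations(vlans)) ** routers
```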


Constraint-Solving Times

In the first experiment, we measure the solving times of the proposed constraint set during a DSE. For this, we perform an optimization run over 1,000 generations without any objectives, so that no time is required for the analysis of an implementation, and measure the average time needed for solving the constraints of one implementation. Besides the scalability, we investigate the time needed for preprocessing the constraints and the resulting improvement of the solving time.

Solving-Time Scalability. The results of the first experiment are illustrated in Fig. 4.7. The proposed constraint set is applied to search spaces of different magnitude, both with (black bar) and without (red bar) preprocessing. As expected and as detailed in Section 4.2.2, the solving time of the constraint set grows exponentially with an increasing number of possible VLANs. However, even with four possible VLANs (where each of the 4 VLAN routers can be configured with one of 15 different VLAN configurations), the solving time of the constraint set, including the decoding process, is below one second. The evaluation of typical automotive objectives like, e.g., timing typically takes seconds or even minutes. The results of our first experiment therefore show that our approach is well suited for an application in this domain, as the timing analysis remains the bottleneck during the processing of each implementation. Nevertheless, the exponential growth of the solving time highlights the potential need for additional mechanisms that enable a faster solving process.

Constraint Preprocessing and VLAN Symmetries. Constraint preprocessing is a technique that is frequently applied when working with SAT solvers. Here, each of the encoded variables is checked: if the constraint set is only solvable for a certain assignment of the variable, the variable can be removed and replaced by this constant value. Each variable check is performed by fixing the variable and then trying to solve the constraint set. As the constraint set has to be solved at least once for each processed variable, the preprocessing time scales with the number of encoding variables. Nevertheless, preprocessing the constraints can lead to much smaller variable numbers and significantly increase the solving speed during the exploration.

Table 4.6: Preprocessing time depending on the number of possible VLANs.

| Number of possible VLANs | Preprocessing time [s] |
|---|---|
| 1 | 0.57 |
| 2 | 1.096 |
| 3 | 1.56 · 10^5 |
| 4 | 2.47 · 10^6 |

[Figure 4.7: bar chart of the constraint solving time [s] (scale 10^-2) over the number of possible VLANs (1 to 4), with and without preprocessing.]

Figure 4.7: Average solving time of the constraints introduced in Section 4.2.2 depending on the number of possible VLANs. The solving time grows exponentially but is reasonable for realistic VLAN numbers. Preprocessing the constraints provides only minor solving time advantages [SRT+18a].

Table 4.6 lists the time needed for the preprocessing of the proposed constraint set. Notably, the preprocessing takes extremely long, especially for bigger search spaces, while offering only minor advantages for the constraint solving time. While the preprocessing time scales exponentially with the number of encoded variables, the small performance gain suggests that the proposed constraint set covers a very big search space where only a small number of variables can be fixed to one value. Consequently, constraint preprocessing does not offer an effective way to reduce the number of encoding variables of the proposed constraint set.

Excluding so-called VLAN symmetries could result in a smaller number of encoding variables and increase the effectiveness of constraint preprocessing. When exploring the separation of a network into subnetworks, only the actual separation of the resources is important. For example, in the specification in Fig. 4.3, the two implementations where both G1 and G2 are assigned to either v1 or v2 are identical from a practical point of view, as in both cases, the resources are in the same VLAN. Similarly, the implementation where G1 is in v1 and G2 is in v2 is symmetrical to the implementation where G1 is in v2 and G2 is in v1, as the resources are in different VLANs in both cases. In its current state, our constraint set is unaware of these symmetries. Consequently, we are evaluating the same VLAN partitioning multiple times, thereby reducing the optimization potential of the proposed approach.
The symmetries may also cause the long run time and the poor solving-time gain of the constraint preprocessing observed in our first experiment, since a bigger variable space is investigated during the preprocessing. At the same time, all variable assignments leading to symmetrical implementations are valid, so that the preprocessing cannot exclude any of them. We assume that removing these symmetries, e.g., by applying a technique similar to the one used by the authors of [SWG+17], can further increase the optimization potential of our approach.
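The symmetry described above can be detected by renaming the VLANs in order of first use. After this renaming, symmetric partitions map to the same canonical form. The sketch only illustrates the notion of a VLAN symmetry on the Fig. 4.3 example. It is a simple relabeling, not the technique of [SWG+17]. It also ignores VLAN routers, and its names are hypothetical. An evaluator could, for instance, use the canonical form as a cache key to avoid re-evaluating an already seen partitioning.

```python
def canonical_partition(assignment):
    """Rename VLANs in order of first use so that symmetric partitions coincide.

    assignment: ECU -> VLAN for non-router ECUs (routers are ignored in this sketch).
    """
    renaming, canonical = {}, {}
    for ecu in sorted(assignment):
        vlan = assignment[ecu]
        renaming.setdefault(vlan, f"V{len(renaming)}")  # next unused canonical name
        canonical[ecu] = renaming[vlan]
    return canonical
```

With G1 and G2 from Fig. 4.3, the four possible assignments collapse into two canonical partitions: "same VLAN" and "different VLANs".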

Network Optimization

In the second experiment, we use our approach for a multi-objective optimization. We use the VLAN sizes, the message timing, and the link loads as optimization objectives.

VLAN-Size Objective. Minimizing the size of the VLANs, i.e., the number of end nodes assigned to the VLANs, has an interesting impact on security: if a malicious attacker gains control over an ECU, the attacker can directly communicate with all ECUs in the same VLAN. Consequently, a system where only a small number of ECUs share the same VLAN may be considered more secure, as the consequences of one ECU being compromised are in general less critical. Following this logic, our first objective during the exploration is to minimize the size of the created VLANs. We use two evaluators, which evaluate the maximal and the average number of resources placed in the same VLAN. Implementations with smaller maximal and average VLAN sizes are considered better. Note: Providing a security evaluation that takes all relevant network characteristics like the accessibility or the resilience of ECUs into account and addresses different threat scenarios is out of the scope of this work. The VLAN-size objective should be seen as a mere placeholder that is used for the DSE experiment and that should be replaced by an in-depth security evaluator in productive design.
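A minimal sketch of the two VLAN-size evaluators is shown below. It assumes that a VLAN router counts towards every VLAN it is assigned to. The input format and the function name are hypothetical.

```python
from collections import Counter


def vlan_size_objectives(assignment):
    """Maximal and average number of end nodes per VLAN (both to be minimized).

    assignment: ECU -> set of VLANs; a VLAN router counts towards every VLAN it is in
    (an assumption of this sketch).
    """
    sizes = Counter(v for vlans in assignment.values() for v in vlans)
    return max(sizes.values()), sum(sizes.values()) / len(sizes)
```

For three end nodes split 2:1 over v0 and v1, plus one router in both VLANs, the sketch reports a maximal VLAN size of 3 and an average of 2.5.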

Message Timing Objective. The VLAN partitioning directly affects the message routing and, consequently, also the time needed for the transmission of messages. Our second objective is to optimize the timing of the message transmission with respect to application-specific deadlines. We calculate the deadline ratio for each message. The deadline ratio $\xi(C)$ of message $C$ is given by:

$$\xi(C) = \frac{l(C) + j(C)}{u(C)} \tag{4.54}$$

where $u(C)$ is the deadline of the message, i.e., the maximal time that may pass between the sending of the message and its arrival at the destination. As outlined in Section 3.1.1, $l$ is the latency of the message, i.e., the minimal time for the transmission, while $j$ is the difference between the worst- and the best-case transmission time, the so-called jitter. The sum of the latency and the jitter yields the worst-case transmission time of a message. Messages that fulfill their deadlines have a deadline ratio of at most 1, while the deadline ratio of messages missing their deadlines is greater than 1. By minimizing the average deadline ratio of the network, we therefore improve the timing of the message transmissions. To calculate the latency and the jitter of the messages in the network, we use an evaluator based on the analysis approach detailed in Section 3.1.1. We extend this approach to account for the timing impact of the VLAN transmission processes. As the deadline is defined between the sending of the message and its arrival at the destination, a potential VLAN transmission and the transmission to and from the VLAN router resource contribute to the latency and the jitter of a message. As the time to rewrite the VLAN tag does not depend on the frame size, we assume that the VLAN router resources require a constant time for the VLAN transmission of each frame of each message. Each message that requires a VLAN transmission experiences an additional latency based on the number of its frames. It also experiences additional jitter based on the number of frames of other messages that are processed by the same VLAN router and may arrive earlier. If a process task is bound to a VLAN router, it also experiences additional jitter, as it may be delayed by VLAN transmission tasks, which have the highest priority on the VLAN router resource.
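Eq. (4.54) and the constant per-frame router delay can be combined in a few lines. The sketch is a simplified stand-in for the extended timing analysis. `t_frame` is a hypothetical constant, and interference is reduced to counting frames, as described above. All times are integers in microseconds to keep the arithmetic exact.

```python
def deadline_ratio(latency, jitter, deadline):
    """Eq. (4.54): xi(C) = (l(C) + j(C)) / u(C); xi <= 1 means the deadline is met."""
    return (latency + jitter) / deadline


def with_vlan_router(latency, jitter, own_frames, earlier_frames, t_frame):
    """Add a constant per-frame VLAN transmission time t_frame (hypothetical value).

    own_frames:     frames of the message itself -> extra latency
    earlier_frames: frames of other messages at the same router that may arrive
                    earlier -> extra jitter
    """
    return latency + own_frames * t_frame, jitter + earlier_frames * t_frame


# Hypothetical message: l = 200 us, j = 100 us, u = 500 us, 2 own and 3 competing frames
l, j = with_vlan_router(200, 100, 2, 3, 5)
print(l, j, deadline_ratio(l, j, 500))  # 210 115 0.65
```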

Link Load Objective. To further investigate the correlation between the message timing and the VLAN partitioning, we use the link load of the Ethernet links as the third optimization objective. The link load of a directed link is obtained by dividing the number of bits that are transmitted over the link according to the chosen message routings by the bandwidth of the link. During the exploration, we minimize the average and the maximal link load of the links in the network.
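The sketch below interprets the link load as the data rate of all messages routed over a directed link, i.e., payload bits per period, divided by the link bandwidth. This interpretation, the omission of frame headers and inter-frame gaps, and all names are assumptions. Each message is counted once per link, matching the message-level link variables of Eqs. (4.50) to (4.53).

```python
def link_loads(messages, bandwidth):
    """Per-link load plus maximal and average load over all directed links.

    messages:  list of (payload_bits, period_s, links); links are the directed links
               used by the message (C_l = 1)
    bandwidth: dict mapping each directed link to its bandwidth in bit/s
    Frame headers and inter-frame gaps are ignored (an assumption of this sketch).
    """
    rate = {link: 0.0 for link in bandwidth}
    for payload_bits, period_s, links in messages:
        for link in links:
            rate[link] += payload_bits / period_s  # bit/s that the message adds
    loads = {link: rate[link] / bandwidth[link] for link in bandwidth}
    return loads, max(loads.values()), sum(loads.values()) / len(loads)
```

For two 100 Mbit/s links carrying 10 Mbit/s and 40 Mbit/s, respectively, the maximal load is 0.4 and the average load is 0.25.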

Multiple Conflicting Objectives. When considered independently, the objectives used in our second experiment would either create VLANs of minimal size or put all ECUs into the same VLAN to minimize the need for VLAN transmissions. This conflict necessitates a multi-objective optimization that reveals the trade-offs between the objectives.

Optimization Results. We performed an optimization over 1,000 generations with 25 new individuals in each generation. The timing analysis, which took about 20 seconds per individual, was the main performance bottleneck. We explored systems with up to 4 possible VLANs. Figures 4.8a–4.8c illustrate the 70 Pareto-optimal points found during the exploration. The optimization yields Pareto points that all satisfy their deadlines and at the same time have significantly smaller link loads. Among the objective values of the Pareto-optimal solutions, we see the same trends as in the archive overview. Implementations with bigger VLAN sizes tend to have smaller deadline ratios and smaller link loads. We also see the expected correlation between low link loads and short times for the message transmission. However, the outlined correlations are trends rather than rules, as several Pareto-optimal points behave differently. This highlights the complex interrelations between the VLAN partitioning, the timing, and the link load, emphasizing the need for an automated design approach.

[Figure 4.8 panels: (a) VLAN size and timing: average deadline ratio over maximal VLAN size; (b) VLAN size and link load: average link load over maximal VLAN size; (c) Timing and link load: average link load over average deadline ratio.]

Figure 4.8: Among the Pareto-optimal points found during the exploration, there are three trends: (a) smaller VLAN sizes tend to have longer transmission timings and (b) higher link loads, while (c) smaller link loads typically lead to better timings [SRT+18a].
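The Pareto-optimal points in Fig. 4.8 are the non-dominated implementations under minimization of all objectives. The following sketch shows the standard dominance filter. It is not the archive implementation of the DSE framework. The objective vectors in the check (maximal VLAN size, average deadline ratio, average link load) are made-up values in the ranges of Fig. 4.8.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A point with a larger VLAN size can still be Pareto-optimal if it has a better deadline ratio or link load. This is why the trends in Fig. 4.8 are not strict rules.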

Timing Impact of VLAN Partitioning. Figure 4.9 illustrates the spread of the average deadline ratio and the average link load across the individuals found in the archive throughout the exploration. The boxplots capture the interquartile range of the objectives, while the whiskers denote the range between the minimum and the maximum values seen throughout the exploration. As expected, the transmission time of the messages in the network strongly depends on the VLAN configuration of the ECUs. The timing is affected not only by the fact that the VLAN transmissions themselves consume time, but also by the choice of the resources that perform the VLAN transmission operations. If many messages are processed by the same VLAN router, all of them have to be sent to this resource, which leads to a high load on the links of the VLAN router and results in a higher jitter for these messages. In the initial generations, the archive contains individuals with VLAN partitionings that lead to high link loads and long transmission times, so that the deadlines are not met. Throughout the exploration, the proposed approach quickly optimizes both the timing and the link loads. In fact, all individuals found in the archive during the late phases of the optimization meet their deadlines, so that the overall median (red line) of the deadline ratio lies below 1.0. Implementations with a stricter VLAN separation tend to have longer transmission times and higher link loads, which results in wider distributions of the objectives for smaller VLAN sizes.

[Plot: boxplots of the average deadline ratio ξ (left) and the average link load (right) over the maximal VLAN size]

Figure 4.9: Networks with smaller VLANs tend to have longer message transmission times, higher deadline ratios, and higher link loads, especially at the links adjacent to the router resources. The application of the proposed optimization approach finds implementations with a significantly reduced link load and without deadline violations (with ξ < 1) [SRT+18a].

Figures 4.10 and 4.11 illustrate the average deadline ratios of messages in the optimized solutions and offer additional insight into the correlations between the VLAN partitioning and the message timing. Figure 4.10 depicts the average deadline ratios for messages of different traffic types (configured with different Ethernet priorities). As expected, a higher message priority results in less interference by other messages and, therefore, in a lower deadline ratio. The experimental results show that the VLAN partitioning does not affect the preferential arbitration of high-priority traffic and that the proposed approach can be effectively used to find VLAN configurations where the messages satisfy their deadlines. Figure 4.11 shows that messages transmitted between different VLANs have significantly higher deadline ratios than messages transmitted within the same VLAN. A message transmission between two ECUs in different VLANs takes longer, as the message is first transmitted to the VLAN router and then to the destination instead of being transmitted directly. In addition, Ethernet links near the VLAN routers carry an especially high load, as they are used by all messages that require a VLAN transmission. Consequently, a message transmitted between two different VLANs experiences higher interference from other messages.
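The detour of inter-VLAN messages via the VLAN router, and the resulting load concentration on the router links, can be made concrete with a small sketch. It assumes a hypothetical topology (ECU0 to ECU2, switches SW0 and SW1, a single VLAN router R), fewest-hop routing, and switches that forward any frame; it only counts hops and per-link usage and does not model the priority arbitration or the timing analysis used in the experiments.

```python
from collections import Counter, deque

def shortest_path(adj, src, dst):
    """Breadth-first search returning a fewest-hop path (assumes dst is reachable)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

def vlan_route(adj, vlan, router, src, dst):
    """Same VLAN: direct path. Different VLANs: detour via the (hypothetical) router."""
    if vlan[src] == vlan[dst]:
        return shortest_path(adj, src, dst)
    to_router = shortest_path(adj, src, router)
    return to_router + shortest_path(adj, router, dst)[1:]

def link_loads(adj, vlan, router, messages):
    """Number of messages crossing each directed link."""
    load = Counter()
    for src, dst in messages:
        path = vlan_route(adj, vlan, router, src, dst)
        load.update(zip(path, path[1:]))
    return load

# Hypothetical topology: two switches, three ECUs, VLAN router R attached to SW1.
adj = {
    "ECU0": ["SW0"], "ECU1": ["SW0"], "ECU2": ["SW1"], "R": ["SW1"],
    "SW0": ["ECU0", "ECU1", "SW1"], "SW1": ["SW0", "ECU2", "R"],
}
vlan = {"ECU0": 1, "ECU1": 2, "ECU2": 1}
print(vlan_route(adj, vlan, "R", "ECU0", "ECU1"))
# ['ECU0', 'SW0', 'SW1', 'R', 'SW1', 'SW0', 'ECU1']
```

Although ECU0 and ECU1 share a switch, the inter-VLAN message needs six hops instead of two and occupies the SW0–SW1 link twice, which illustrates why links adjacent to the router accumulate load.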

[Plot: average deadline ratio ξ over the message priority; left panel: priorities 0, 2, and 4; right panel: priorities 2 and 4 on a finer scale]

Figure 4.10: Average deadline ratio ξ for different types of traffic (priorities 0, 2, and 4). The best-effort traffic shows significantly higher deadline ratios. While both the medium- and the high-priority messages satisfy their deadlines, messages with the highest priority experience less interference [SRT+18a].

Optimization Potential of the SAT-Decoding Approach. To demonstrate the optimization potential of the proposed approach, we compare two optimization configurations. The first configuration implements the proposed SAT-Decoding-based approach (detailed in Section 2.2.2), where the solution strategy of the SAT solver is optimized by means of an EA. The second configuration, referred to as random solving, imitates optimization approaches where the constraint solver is not steered but used only to obtain valid problem solutions [Ste10; NDR16; MAS+18]. Here, the SAT solver solves the proposed constraint set using a random solving strategy for the creation of each individual. Each individual is evaluated with respect to the optimization objectives, and the set of Pareto-dominant individuals is updated after each iteration. As detailed in Section 2.2.4, we quantify the optimization quality by measuring the ε-dominance. For each configuration, we measure the average ε-dominance over 10 optimization runs with 500 iterations and 25 new individuals per iteration. Figure 4.12 illustrates the results of the experiment. The dashed black line and the solid red line show the ε-dominance of the random solving approach and the SAT-Decoding-based approach, respectively. The results clearly indicate that the SAT-Decoding-based approach delivers results of higher quality at all stages of the exploration and finds solutions that the less structured exploration strategy fails to obtain.
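Section 2.2.4 defines the ε-dominance measure used above; since that definition is not repeated here, the following sketch assumes the common additive ε-indicator for minimization objectives, computed against a reference set (for example, the best non-dominated points found across all runs). The function names and example points are hypothetical, and any objective normalization is omitted. Lower values mean the archive lies closer to the reference front; 0 means it covers it.

```python
from statistics import mean
from typing import Sequence

Point = Sequence[float]

def additive_epsilon(approx: Sequence[Point], reference: Sequence[Point]) -> float:
    """Additive epsilon indicator (minimization): the smallest eps such that every
    reference point r is weakly dominated by some archive point a shifted by -eps."""
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in approx)
        for r in reference
    )

def average_epsilon(runs: Sequence[Sequence[Point]], reference: Sequence[Point]) -> float:
    """Mean indicator value over several independent optimization runs."""
    return mean(additive_epsilon(archive, reference) for archive in runs)

reference = [(0.0, 1.0), (1.0, 0.0)]
print(additive_epsilon(reference, reference))     # 0.0: the archive covers the reference
print(additive_epsilon([(0.5, 1.5)], reference))  # 1.5
```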

[Plot: average deadline ratio ξ for messages within the same VLAN and between different VLANs]

Figure 4.11: Messages that are transmitted between ECUs in different VLANs have significantly longer worst-case transmission times and higher deadline ratios [SRT+18a].

4.2.4 Related Work

As the number of possible VLAN partitions grows exponentially with the number of network nodes, the VLAN partitioning in local and metropolitan area networks is typically automated. In these networks, the VLAN mechanism is mainly used to establish broadcast domains [RHK99; LL10]. A good VLAN partitioning groups nodes that exchange a large number of messages into the same VLAN. The messages are then sent as broadcasts, as the VLAN partitioning makes sure that these messages are only transmitted within their respective VLANs. The existing algorithms for the automatic VLAN configuration of the end nodes, e.g., the algorithm proposed by Rooney et al. [RHK99], target the minimization of the broadcast traffic in the network as the sole design objective. Automotive networks, however, differ from local or metropolitan area networks in many key aspects: As minimizing the weight and the monetary costs of the network hardware is one of the main design objectives, the network bandwidth is limited, so that messages are sent over statically configured routes from source to destination instead of being transmitted as broadcasts. Consequently, the VLAN mechanism is not used to limit the amount of broadcast traffic but serves security purposes: its main goal is to restrict the transmission of messages between end node groups with different security levels (e.g., between end nodes from the infotainment and the driver assistance domains). Similarly to the idea proposed by Daryabar et al. [DDN+11], such a transmission is then only possible if it is explicitly allowed by higher-layer protocols. Existing approaches for an automatic VLAN partitioning of the network are hardly applicable in the automotive context, as the goal here is not to minimize the amount of broadcast traffic, but to simultaneously optimize the VLAN partitioning and the message routings with respect to many, oftentimes non-linear and conflicting design objectives specific to the automotive domain, like the transmission timing, the link load, or the transmission reliability.
To the best of our knowledge, the method presented in this section is the first approach to derive routings that are valid with respect to a given VLAN partitioning and can be used for a multi-objective optimization of the VLAN partitioning of an automotive network.

[Plot: ε-dominance over run time [s] for Random Solving and SAT-Decoding]

Figure 4.12: The measurement of the ε-dominance illustrates the gain in optimization potential provided by the application of the SAT-Decoding approach [SRT+18a].

4.3 Constraints for Redundant Message Routing³

4.3.1 Introduction

As already outlined in Section 3.2, the introduction of ADAS, where cars make safety-critical decisions based on messages transmitted through their communication network, results in strict reliability requirements. To provide the necessary reliability guarantees, Section 3.2 presented an analysis of transient errors, where messages transmitted over well-functioning hardware are corrupted. Of course, the possibility of permanent hardware errors, like broken links or defective switches, also has to be considered during the network design. In fact, guaranteeing that safety-critical messages are transmitted even in the case of hardware failures is one of the mandatory steps of the design process, and providing this guarantee requires a certain degree of transmission redundancy. As outlined in Section 2.1.3, the TSN standard [Ins17] introduces mechanisms for frame replication and duplicate elimination and empowers the designer to create networks with a level of redundancy specifically tailored to the reliability requirements of the application at hand. However, this mechanism also results in two main challenges during network design. On the one hand, similarly to the case of non-redundant routings, a large portion of the search space is made up of infeasible solutions, which, if not excluded, would significantly decrease the optimization efficiency. On the other hand, the increasing size of automotive communication networks, the large number of transmitted messages, and the growing number of valid routings between two end nodes result in a tremendous expansion of the solution space, so that even an optimizer considering only the valid solutions may struggle to find high-quality solutions. These optimization difficulties within vast solution spaces are addressed in the next chapter of this work.

³ Major parts of this section have been published in [SRT+18b].

In this section, we address the problem of infeasible solutions by proposing two different constraint sets for the creation of valid and possibly redundant routings. In the first approach, a set of PB variables is used to formulate constraints describing the conditions for valid routings. Like in the constraint set presented in Section 2.2.3, these variables symbolically encode the allocation, the mapping, and the routing decisions for every link and every resource in the network. The second approach relies on a preprocessing phase, where all valid routings are found. The optimization then takes place by solving a much simpler constraint set that merely states that one of the valid routings found during the preprocessing has to be picked for every message. The remainder of this section is organized as follows: Section 4.3.2 introduces the constraint set based on the explicit encoding of each architecture element, i.e., each link and each resource. Section 4.3.3 presents the encoding approach based on preprocessing, where each variable encodes an entire routing graph. Finally, Section 4.3.4 discusses the related work on routing optimization. The results of the experiments investigating the optimization convergence and the result quality of the two approaches are presented in Section 5.1.7.

4.3.2 The Link Encoding Approach

In this subsection, we present an approach for the encoding of possibly redundant routings that we refer to as the link encoding approach. In this approach, we alter a part of the basic constraint set presented in Section 2.2.3 to enable redundant routings and introduce additional constraints that prevent communication cycles.

Relaxing the Basic Constraint Set. The basic constraint set presented in Section 2.2.3 encodes non-redundant routings. The constraints stating that no resource in the routing graph may have more than one active in-link effectively prevent communication cycles. However, they also prevent routing branches from merging, which makes redundant routings impossible. To enable redundant routings, the basic constraint set must, therefore, be relaxed by removing the constraints defined by Eqs. (2.9), (2.11), and (2.12).

Preventing Cycles. Relaxing the basic constraint set by allowing resources to have multiple in- and/or out-links enables the creation of redundant routings, like the one illustrated in Fig. 4.13a. However, the relaxed constraint set also enables the creation of invalid routings. While the routing graph illustrated in Fig. 4.13b does satisfy the relaxed constraint set, it also contains the routing cycle formed by the links l2–l5 and is, therefore, invalid.

[Plot: two routing graphs over the nodes G0, S0–S3, and G1 with the links l0–l7]

Figure 4.13: Two routing graphs satisfying the relaxed constraint set: (a) allowing multiple in-edges enables the creation of redundant routings ... (b) ... as well as routing cycles.

A cycle in the routing describes a situation where the message can pass the same resource more than once. To revisit a resource, the message must leave it via an out-link before reentering through an in-link. Consequently, a cycle can only occur if an in-link of a resource is used after an out-link of the same resource. Hence, a routing whose links can be assigned a consistent relative order, with every in-link of a resource ordered before its out-links, is free of cycles by definition. The constraint set encoding valid, cycle-free, and possibly redundant routings is thus created by adding the constraints described by Eqs. (4.16)–(4.21), introduced in Section 4.1, to the relaxed constraint set. Instead of extracting the order information from an already valid routing, these constraints are now used to enforce a link order, thereby preventing cycles.
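The link-order argument above can be checked procedurally. The following sketch is not the PB encoding of Eqs. (4.16)–(4.21); it is a hypothetical helper that uses Kahn's topological sort on the resources to either produce a link order in which every in-link of a resource precedes its out-links or report that no such order exists, i.e., that the routing contains a cycle. The two example routings only mimic Fig. 4.13; their exact links are assumed.

```python
from collections import defaultdict, deque

def link_order(links):
    """Order the links of a routing so that every in-link of a resource precedes all
    of its out-links (Kahn's algorithm on the resources). Returns None if the
    routing contains a cycle, i.e., if no such order exists."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in links:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    rank = {}
    while queue:
        node = queue.popleft()
        rank[node] = len(rank)
        for nxt in succ[node]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    if len(rank) < len(nodes):
        return None  # resources on a cycle never reach in-degree 0
    # Sorting by the rank of the source resource places in-links before out-links.
    return sorted(links, key=lambda link: (rank[link[0]], rank[link[1]]))

# Hypothetical routings in the spirit of Fig. 4.13 (exact links assumed).
redundant = [("G0", "S0"), ("G0", "S1"), ("S0", "S3"), ("S1", "S2"), ("S3", "G1"), ("S2", "G1")]
cyclic = [("G0", "S0"), ("S0", "S1"), ("S1", "S2"), ("S2", "S3"), ("S3", "S0"), ("S3", "G1")]
print(link_order(redundant) is not None, link_order(cyclic))  # True None
```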

4.3.3 The Preprocessing Approach

The so-called preprocessing approach offers an alternative way of creating valid and possibly redundant routings. Instead of encoding each potential element of the routing graph of a message, the preprocessing approach encodes the activation of an entire routing graph with a single variable. Thus, the two main difficulties during the creation of valid routings, namely connecting the sender to the receiver(s) and preventing cycles, are relocated into a preprocessing phase, and a much simpler constraint set can be used for the routing encoding. The preprocessing starts by finding all routings without redundancies between each pair of resources that can act as sender and receiver using a Depth-First Search (DFS). The feasible redundant unicast routings are then created by merging all possible combinations of the non-redundant routings, thereby excluding routings with cycles from the result set. Multicast routings are created in a similar process by merging all combinations of the valid redundant routings connecting the sender to each of the receivers and checking the results for cycles. After the preprocessing, each message $C \in N_C$ is mapped onto the set $B_C$, which contains all possible routing graphs for the message $C$.

Note: For very large architecture graphs with a high connectivity, calculating all possible routing paths may result in very long preprocessing times. However, automotive architectures tend to be rather sparse, so that the preprocessing times there are comparable to the time that the link encoding approach requires for the formulation of its more complicated constraints.

For the preprocessing approach, the constraints given by Eqs. (2.1) and (2.2) are adopted from the basic constraint set in Section 2.2.3. The valid choice of a routing path for message $C$ is encoded with the following constraints:

$$\forall C \in N_C,\ P \in N^-(C),\ \tilde{P} \in N^+(C),\ m = (P, G) \in E_M,\ \tilde{m} = (\tilde{P}, \tilde{G}) \in E_M:$$

$$\sum_{b \in B_C} C_b = 1 \tag{4.55}$$

$$m + \tilde{m} - \sum_{b \in B_C^{(G,\tilde{G})}} C_b \le 2 \tag{4.56}$$

$$\neg\tilde{m} + \sum_{b \in B_C^{(G,\tilde{G})}} C_b \le 1 \tag{4.57}$$

$$\neg m + \sum_{b \in B_C^{(G,\tilde{G})}} C_b \le 1 \tag{4.58}$$

where the variable $C_b$ is set to 1 iff message $C$ is routed using the routing $b$, and $B_C^{(G,\tilde{G})} \subseteq B_C$ is the set of all routings of message $C$ with $G$ as source and $\tilde{G}$ as destination. The constraint in Eq. (4.55) states that exactly one routing path $b$ must be chosen for each message $C$ from the set $B_C$, which, as detailed above, contains all routing paths of message $C$ (including the paths with redundancies). The constraint in Eq. (4.56) makes sure that the routing for the message is chosen in accordance with the mappings chosen for its source and destination tasks. The constraints in Eqs. (4.57) and (4.58) address the case where a resource is not picked as the mapping target of the destination (source) task and make sure that no routing that contains this resource as leaf (root) node can be activated. Additionally, the preprocessing approach requires the following constraints for a correct allocation of links and resources:

$$\forall C \in N_C,\ b \in B_C,\ l \in b: \quad C_b - l \le 0 \tag{4.59}$$

$$\forall l \in E_L: \quad \sum_{C \in N_C} \sum_{b \in B_C:\, l \in b} C_b - l \ge 0 \tag{4.60}$$

$$\forall C \in N_C,\ b \in B_C,\ R \in b: \quad C_b - R \le 0 \tag{4.61}$$

$$\forall R \in N_R: \quad \sum_{C \in N_C} \sum_{b \in B_C:\, R \in b} C_b - R \ge 0 \tag{4.62}$$
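The preprocessing phase that produces the sets $B_C$ can be sketched for the unicast case as follows. The helper names and the small bidirectional topology are hypothetical; the sketch enumerates all simple sender-to-receiver paths with an iterative DFS, merges every non-empty combination of them into a candidate routing graph, and keeps only the cycle-free results (checked with Kahn's algorithm). Its exponential cost in the number of paths reflects the caveat about highly connected architectures noted above.

```python
from itertools import combinations

def simple_paths(adj, src, dst):
    """All loop-free paths from src to dst (depth-first search), each as a tuple of links."""
    paths, stack = [], [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(tuple(zip(path, path[1:])))
            continue
        for nxt in adj.get(node, ()):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return paths

def is_acyclic(links):
    """Kahn's algorithm: the routing graph is cycle-free iff every node can be
    removed in an order where it has no remaining in-links."""
    nodes = {n for link in links for n in link}
    indeg = {n: 0 for n in nodes}
    for _, v in links:
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for u, v in links:
            if u == node:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
    return removed == len(nodes)

def unicast_routings(adj, src, dst):
    """Merge every non-empty combination of simple paths; keep the cycle-free graphs."""
    paths = simple_paths(adj, src, dst)
    routings = set()
    for k in range(1, len(paths) + 1):
        for combo in combinations(paths, k):
            merged = frozenset(link for path in combo for link in path)
            if is_acyclic(merged):
                routings.add(merged)
    return routings

# Hypothetical bidirectional topology: G0 and G1 attached to S0 and S1, with S0-S1 linked.
adj = {"G0": ["S0", "S1"], "S0": ["G0", "S1", "G1"],
       "S1": ["G0", "S0", "G1"], "G1": ["S0", "S1"]}
print(len(simple_paths(adj, "G0", "G1")), len(unicast_routings(adj, "G0", "G1")))  # 4 11
```

Combinations that use the S0–S1 link in both directions form a cycle and are discarded, so only 11 of the 15 path combinations survive as candidate routings.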

Experimental Results. Please refer to Section 5.1 in the next chapter for an in-depth discussion of the optimization performance and the experimental evaluation of the proposed approaches for redundant routing.

4.3.4 Related Work

Besides redundant message transmission, there are numerous approaches to increase the reliability against hardware failures. The predominant approach is hardware redundancy, where several instances of the same task are bound onto different resources to compensate for a possible resource failure. Various works present approaches for a reliability optimization based on hardware redundancy [JKH05; TMA+05; XLK+07; GLR+08]. The authors of [RGL+08] optimize the voter placement, while the approach presented in [LGT09] exploits inherent data redundancy. Another approach is the so-called hardening of individual resources, i.e., a resource upgrade that makes the failure of the respective resource less likely, thereby increasing the reliability of the overall system. The authors of [KZ12; AGK+14; KRG+14; AVG+16] present approaches to find the optimal hardening candidates by introducing the concept of so-called importance measures, which quantify the reliability impact of the individual resources. Yet, none of the above approaches addresses redundant routings as a mechanism to increase system reliability. Several approaches for the encoding of valid routings can be found in the literature. The encoding strategy used in [LSG+09] is hop-based. While the hop-based approach is well suited for finding valid routings in multi-hop networks, it is outperformed by link-based approaches such as [LSF14; GRG+14], as it requires a larger number of constraints and variables. Both [NDR16] and [SGR+17b] deal with the optimization of routings in TSN networks. Yet, in both works, the focus is on the interrelation between the schedule of the time-triggered traffic and the message routing. Redundant routings are not considered, so that the presented constraints solve a much easier problem.
In addition, the approach in [NDR16] relies on an ILP that is tailored to the optimization of the schedule and, therefore, cannot be used for the optimization of other objectives or for a multi-objective optimization. To the best of our knowledge, [SRT+18b] is the first work that proposes an approach for the generation of redundant routings and a way to use such an approach for an effective optimization of the reliability as one of multiple design objectives.


4.4 Conclusions

In this chapter, we proposed multiple constraint-based approaches for encoding valid configurations of networks that use the novel TSN features from the areas of timing, security, and reliability. The presented approaches can be integrated into SAT-Decoding and can, consequently, be used not only for the creation of feasible network designs, but also for an automatic optimization of the global schedule, the VLAN partitioning, and the transmission reliability together with arbitrary other design objectives. The presented approach for the generation of a global TSN schedule enables the creation of a valid message routing and a valid schedule in a single step. In contrast to existing approaches, the proposed joint constraint set for routing and scheduling can be used throughout the entire optimization run and makes it possible to exploit constraint preprocessing and the learning mechanisms of modern SAT solvers. The proposed approach for the optimization of the VLAN partitioning provides, to the best of our knowledge, the first constraint set encoding message routings that are valid with respect to a given VLAN partitioning of the communication network. It is fully transparent to the designer and automatically generates valid routings while altering the application graph and the task bindings in accordance with the chosen routing. Finally, we presented two different approaches for the encoding of possibly redundant message routings. These approaches create message routings in accordance with TSN’s seamless redundancy mechanism [Ins17] and enable an automatic optimization of redundant message routes.

5 Injection of Objective- and Topology-Specific Knowledge for a Faster Optimization Convergence and a Higher Result Quality

Chapter 4 presented multiple constraint sets encoding communication networks which are valid with respect to design-specific requirements (such as having a valid global schedule or a routing that respects the Virtual Local Area Network (VLAN) partitioning). In combination with the SAT-Decoding approach, described in Section 2.2.2, such a constraint set enables an optimization where the search space of the optimizer is reduced from the space of all possible design decisions to the space of all valid problem solutions. SAT-Decoding, like many constraint-based approaches, relies on constraint systems that typically focus solely on (a) excluding all infeasible solutions and (b) not excluding any feasible solutions, as this may lead to a loss of (Pareto-)optimal solutions. The integration of such highly generic constraint systems into optimization approaches like SAT-Decoding enables an optimization that is applicable to a wide range of different problems and objectives and can be easily extended. However, this generality also introduces a strong abstraction from the concrete problem, such that many problem-specific traits, concerning both the shape of the search space and the functionality of the design objectives, are not reflected in the encoding. Being unable to exploit the vast amount of problem-specific knowledge accumulated by domain experts can seriously limit both the convergence speed and the quality of the optimization results, or even make the optimization infeasible for large problem sizes or complex design objectives. This chapter, therefore, investigates possibilities to inject problem-specific knowledge into the optimization of Ethernet-based communication networks. In Section 5.1, we study the problem of reliability optimization of redundant routings, introduced in Section 4.3.
The presented experiments provide evidence that the huge size of the design space, in combination with the complex relation between the routing and the system-level reliability, results in an optimization problem where the SAT-Decoding approach is not able to provide high-quality solutions, even for small problem sizes. As a remedy, we present an extension of SAT-Decoding that we refer to as Artificial Gene Design (AGD). In AGD, additional constraints are added to the constraint set used in SAT-Decoding. These constraints encode objective-specific variables that represent implementation traits of high importance for the current design objectives. By varying the genes associated with these variables, the so-called Artificial Genes (AGs), the Evolutionary Algorithm (EA) used in SAT-Decoding is able to directly adjust the implementation traits that are relevant for the design objectives. AGD, thus, enables the injection of objective-specific knowledge into the optimization. We apply the AGD approach to the reliability optimization of redundant routings. By encoding AGs for (a) the criticality of network links and (b) the number of redundant paths of messages, we inject the knowledge that a higher number of redundant paths and a lower number of critical links improve the system reliability. The presented experiments show that the usage of AGD enables an optimization that provides results of significantly higher quality. While AGD is based on the encoding of additional constraints, the approach presented in Section 5.2 uses problem-specific knowledge to significantly reduce the number of constraints necessary to encode the search space of a routing optimization problem. The approach is based on a lightweight algorithm that identifies so-called proxy areas in the given network, i.e., areas with exactly one possible route between each pair of nodes. In contrast to areas with a variety of different routes, proxy areas do not offer any potential for routing optimization.
Including proxy areas in the routing encoding, therefore, unnecessarily increases the routing search space. In the second section of this chapter, we show how knowledge about the partitioning of the network into variety areas on the one hand and proxy areas on the other hand can be exploited to enable a significantly faster optimization and, in some cases, a higher quality of the optimization results.

5.1 Artificial Gene Design¹

5.1.1 Introduction

In the last chapter, Section 4.3 presented two different formulations for the encoding of valid redundant routings. While both approaches are correct, in that they cover the complete search space of valid routings, the experimental results (see Section 5.1.7) show that, when integrated into the SAT-Decoding-based optimization loop according to Fig. 2.12 in Section 2.2.2, the optimization fails to find solutions with optimal reliability in both cases, even for small problem sizes. In this section, we address this and similar problems by proposing an extension of the SAT-Decoding approach.

¹ This section extends the concepts presented in [SRT+18b].

After introducing the approach used for the formal reliability analysis of Ethernet networks in Section 5.1.2, we examine the concrete difficulties of the reliability optimization problem, in particular the interrelation between the implementation structure and the reliability objective, in Section 5.1.3. Section 5.1.4 then generalizes this concrete problem: because of the so-called genetic gap, the optimizer is not able to adjust implementation characteristics that are highly relevant for the objective. Section 5.1.5 presents a remedy for this problem, namely the AGD approach, where additional, design-objective-specific genes are created to significantly increase the impact that the decisions made by the optimizer have on the design objectives. Finally, we revisit the reliability optimization problem in Section 5.1.6 and solve it by applying the AGD approach and encoding genes for the link criticality and the number of redundant paths.

5.1.2 Formal Reliability Analysis of Ethernet Networks

In automotive systems, reliability is, besides timing, one of the main concerns that require guarantees. Each component of a communication network has a certain probability of failure. In automotive systems, where the communication network is used for the transmission of safety-critical messages, such a failure can have catastrophic consequences. The ability to analyze the likelihoods of events leading to catastrophic failures of the system is, therefore, essential. In the work at hand, we analyze the impact of permanent hardware errors by applying an analysis technique based on fault trees, where the probability of a system failure is formally calculated from the failure probabilities of its components. For the reliability evaluation, we use the failure model presented in [GLR+08]. The system is assumed to be functional iff (a) each application task is bound onto a functional resource and (b) each message has at least one correct route consisting of functional resources and links. As reliability concepts such as redundant binding or the hardening of processor resources are out of the scope of this work, the message routes constitute the only reliability-relevant degree of freedom during the following optimization.
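The functionality criterion above can be evaluated exactly for small examples by brute force. The following sketch is a stand-in for, not a reproduction of, the fault-tree analysis used in this work: it assumes independent component failures, represents each requirement (a task binding or a message) as a list of alternative element sets (routes), and enumerates all 2^n component states. The function name and the failure probabilities are hypothetical. Enumerating the joint states, instead of multiplying per-message reliabilities, correctly accounts for components shared between messages.

```python
from itertools import product

def system_reliability(requirements, p_fail):
    """Probability that every requirement is met, where a requirement is a list of
    alternative element sets (routes) and is met if at least one set fully works.
    Components fail independently; all 2**n component states are enumerated."""
    elements = sorted({e for req in requirements for route in req for e in route})
    total = 0.0
    for states in product((True, False), repeat=len(elements)):
        up = dict(zip(elements, states))
        prob = 1.0
        for e in elements:
            prob *= (1.0 - p_fail[e]) if up[e] else p_fail[e]
        if all(any(all(up[e] for e in route) for route in req) for req in requirements):
            total += prob
    return total

# Hypothetical example: a task bound to ECU0 and a message routed redundantly over l0 or l1.
p = {"ECU0": 0.01, "l0": 0.1, "l1": 0.1}
print(round(system_reliability([[["ECU0"]], [["l0"], ["l1"]]], p), 4))  # 0.9801
```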

5.1.3 Optimization Challenges

When integrated into SAT-Decoding and used for the optimization of reliability with respect to permanent hardware failures, both approaches for the encoding of redundant routings yield a non-redundant message routing (with the shortest possible message routes) as the exploration result. Although a redundant message transmission generally offers higher reliability, the SAT-Decoding-based optimization with the constraints presented in Section 4.3 fails to find the reliability-optimal solution. There are two reasons for this observation:


[Plot: three routing graphs over the nodes G0, S0–S3, and G1: (a) long non-redundant routing, (b) short non-redundant routing, (c) redundant routing]

Figure 5.1: Short non-redundant routings contain a smaller number of critical elements and are, thus, more reliable than longer routings. Redundant routings offer the highest reliability.

On the one hand, the step from non-redundant to redundant routings significantly increases the size of the search space formed by all valid solutions. Naturally, finding the optimum without getting stuck in local optima is much harder in a bigger search space. On the other hand, having the reliability of the communication system as an optimization goal makes the optimization difficult, as this design objective has an especially intricate interrelation with the structure of the evaluated implementation. The two design directions leading to a more reliable transmission of a single message are illustrated in Fig. 5.1. For non-redundant routings, where the failure of any link or resource automatically leads to the failure of the message transmission, reducing the number of potential points of failure improves the reliability. Shorter routings (Fig. 5.1b), therefore, offer a better transmission reliability than unnecessarily long message routings (Fig. 5.1a). A redundant transmission (Fig. 5.1c), of course, offers even better reliability by enabling a successful transmission even in cases where the elements of one of the redundant routes fail. During the optimization of the system-level reliability, solutions where all messages are routed using the shortest possible non-redundant routes, thus, constitute local optima. Indeed, leaving these local optima is very hard, as it requires changing the routing of all messages in a single optimization step: routing only a subset of the messages redundantly does not improve the reliability, as the critical links of the messages that remain non-redundant still lead to a system failure.
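The three routing variants of Fig. 5.1 can be compared with a short calculation. The sketch below assumes hypothetical element lists (links and switches; the end nodes G0 and G1 are treated as reliable) and a uniform, independent failure probability per element. An element is critical if it lies on every alternative path, and for element-disjoint paths the reliability follows the closed form $1 - \prod_{k}\bigl(1 - (1-p)^{|path_k|}\bigr)$.

```python
from math import prod

def critical_elements(paths):
    """Elements contained in every alternative path: each is a single point of failure."""
    return set.intersection(*(set(path) for path in paths))

def disjoint_paths_reliability(paths, p):
    """Reliability of element-disjoint alternative paths whose elements fail
    independently with probability p: 1 - prod(1 - (1 - p)**len(path))."""
    seen = set()
    for path in paths:
        if seen & set(path):
            raise ValueError("paths must be element-disjoint")
        seen |= set(path)
    return 1 - prod(1 - (1 - p) ** len(path) for path in paths)

# Hypothetical element lists (links and switches) for the three cases of Fig. 5.1.
long_route = ["G0-S1", "S1", "S1-S0", "S0", "S0-S3", "S3", "S3-G1"]
short_route = ["G0-S0", "S0", "S0-S3", "S3", "S3-G1"]
second_route = ["G0-S1", "S1", "S1-S2", "S2", "S2-G1"]
for name, paths in [("long", [long_route]), ("short", [short_route]),
                    ("redundant", [short_route, second_route])]:
    print(name, len(critical_elements(paths)), round(disjoint_paths_reliability(paths, 0.01), 4))
```

The long route has seven critical elements and the short one five, while the redundant routing has none, which mirrors the ordering of the three variants described above.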


5.1.4 The Genetic Gap

For the explanations that follow, we use a terminology where the progress threshold θ denotes the minimal number of genetic changes that must be applied to an individual in the population to obtain an improvement with respect to the design objective. The modification rate M designates the number of genetic changes that the variation operator of the used optimizer can apply to a parent individual during the creation of an offspring. Naturally, the condition for an improvement of the design objective is M ≥ θ.

As detailed in Section 2.2.2, the optimizer used in SAT-Decoding controls the course of the optimization by setting the solving strategy that the SAT solver uses for the constraint resolution. The optimizer hereby modifies the genes of the processed individuals. The genes encode the order in which the variables are set by the solver and the so-called phase, i.e., the first value that is assigned to a variable during the constraint resolution. As the SAT solver may have to backtrack to create feasible solutions, the variable assignment returned by the SAT solver may differ from the phase values specified in the genotype. The phases provided by the optimizer can, thus, be considered as an encoding of preferences concerning certain implementation characteristics rather than actual design decisions.

The choice of the variables used for the encoding of valid solutions may have significant implications for the optimization process. The number of used variables not only determines the time needed for constraint resolution; the chosen variables also affect the structure of the genotypes and the implementation preferences that can be encoded therein, as well as the number of genetic modifications required for a specific change of the implementation. Figure 5.2 illustrates the course of the transmission reliability optimization using the link encoding approach (see Section 4.3.2) for an exemplary network with three messages.
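The decoding step described earlier in this subsection, where the genes fix the variable order and the phase, can be illustrated with a toy decoder. This is a minimal sketch under strong simplifications: it uses CNF clauses with DIMACS-style literals and plain depth-first backtracking without unit propagation or clause learning, so it is not the PB solver used in SAT-Decoding. The function name and the one-hot example (three hypothetical routing variables of which exactly one must be active) are assumptions for illustration.

```python
def decode(clauses, order, phase):
    """Toy SAT-Decoding step: depth-first search that branches on the variables in
    the genotype's order and tries each variable's phase first. Clauses use
    DIMACS-style literals (3 means x3 is true, -3 means x3 is false).
    Returns a satisfying assignment or None."""
    assignment = {}

    def violated():
        # a clause is violated once all of its literals are assigned false
        return any(all(abs(lit) in assignment and assignment[abs(lit)] != (lit > 0)
                       for lit in clause) for clause in clauses)

    def search(i):
        if violated():
            return False
        if i == len(order):
            return True
        var = order[i]
        for value in (phase[var], not phase[var]):  # preferred phase first
            assignment[var] = value
            if search(i + 1):
                return True
            del assignment[var]  # backtrack
        return False

    return dict(assignment) if search(0) else None

# Hypothetical encoding: exactly one of three routing variables x1, x2, x3 is active.
one_hot = [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
all_true = {1: True, 2: True, 3: True}
all_false = {1: False, 2: False, 3: False}
print(decode(one_hot, [1, 2, 3], all_true))   # {1: True, 2: False, 3: False}
print(decode(one_hot, [3, 1, 2], all_true))   # {3: True, 1: False, 2: False}
print(decode(one_hot, [1, 2, 3], all_false))  # {1: False, 2: False, 3: True}
```

Changing only the order gene selects a different routing variable, and in the last call the decoded value of x3 contradicts its phase gene because the solver has to backtrack, so the phases act as preferences rather than decisions.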
At the beginning of the optimization process (point A), the random creation of individuals typically leads to a situation where (a) most links in the architecture are used by at least one message and are, therefore, allocated as part of the implementation architecture, and (b) most message routes contain critical links, i.e., links whose failure automatically disconnects the sender and the receiver in the respective routing graph². The implementations created during the early stages of the optimization, hence, have a very low transmission reliability, as (a) they contain a lot of components (links and switches) that can fail and (b) most of these components are vital for the transmission route of at least one message and, therefore, constitute single points of failure. An example of such an implementation, referred to as implementation A in what follows, is illustrated above point A in Fig. 5.2. Although implementation A and similar implementations offer poor transmission reliability, their reliability can be easily improved by the optimizer, since a large

² Note that, by this definition, routing graphs without redundancies consist of critical links only.

119 5 Injection of Objective- and Topology-Specific Knowledge for a Faster Optimization Convergence and a Higher Result Quality

[Figure 5.2 graphic: MTTF plotted over genetic distance for implementations A, B, and C; steps with M ≥ θ lead from A to B, while the genetic gap (M ≪ θ) separates B from C]

Figure 5.2: Illustration of the genetic gap during the transmission-reliability optimization of a network with three messages (orange, blue, and green). Note that in the modeled network, any pair of resources is connected by at most one link. For the sake of clarity and in contrast to other visualizations in this work, the transmission over a link is shown with an extra edge for each message (even in cases where the same link is used for the transmission of different messages).

number of links is used by only a small number of messages. For example, altering implementation A by routing the blue message over link l2 instead of links l0 and l1 improves the overall transmission reliability, as it removes two links that constitute single points of failure from the architecture. Indeed, an implementation where all messages use the same route, given by the shortest non-redundant path between sender and receiver (illustrated above point B in Fig. 5.2), contains fewer single points of failure than implementation A and, consequently, has a higher transmission reliability. Furthermore, implementation B can be easily created by the iterative optimization process, because the exclusion of a single point of failure (a) can be achieved by rerouting a small number of messages, i.e., requires a small number of genetic modifications, and (b) immediately results in an improvement w.r.t. the transmission reliability. In other words, the course of the optimization between points A and B in Fig. 5.2 is characterized by a low progress threshold θ, which does not exceed the modification rate M of the optimizer (M ≥ θ). After the optimizer has created an implementation with short non-redundant routes, further optimization is, however, significantly more difficult. At this point, the transmission reliability can be

further increased by routing the messages redundantly, hereby further reducing the number of critical links. Yet, since a component remains critical as long as it is used non-redundantly by at least one message, an improvement of the system reliability can only be achieved by rerouting all messages at once (see the difference between point B and point C in Fig. 5.2). In this case, the progress threshold corresponds to the large number of genetic changes necessary to modify the routings of all messages and exceeds the modification rate of the optimizer (M ≪ θ). The optimizer, consequently, cannot breach this so-called genetic gap and stagnates in the local optimum without redundancies.
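The following Python sketch illustrates why the step from B to C is a gap rather than a slope. It assumes, hypothetically, that all three messages of Fig. 5.2 are sent from G0 to G1 over a network with two disjoint G0–G1 paths (topology and link names chosen for illustration, not read off the figure), and it counts the single points of failure, i.e., links that are critical for at least one message. Making the messages redundant one at a time leaves that count unchanged until the last message is rerouted, so no intermediate step is rewarded.

```python
from collections import defaultdict

# Hypothetical topology (assumption): two disjoint paths between G0 and G1.
LINKS = {
    "l0": ("G0", "S0"), "l1": ("S0", "S3"), "l2": ("S3", "G1"),
    "l3": ("G0", "S1"), "l4": ("S1", "S2"), "l5": ("S2", "G1"),
}


def connected(link_names, src, dst):
    """Depth-first search over the given links; True if dst is reachable from src."""
    adj = defaultdict(set)
    for name in link_names:
        a, b = LINKS[name]
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return dst in seen


def critical_links(route, src="G0", dst="G1"):
    """Links of a routing graph whose failure disconnects sender and receiver."""
    return {l for l in route if not connected(route - {l}, src, dst)}


def single_points_of_failure(routes):
    """A link stays a single point of failure while any message depends on it."""
    spof = set()
    for route in routes.values():
        spof |= critical_links(route)
    return spof


SHORT = {"l0", "l1", "l2"}                # implementation B: one short path
REDUNDANT = SHORT | {"l3", "l4", "l5"}    # implementation C: two disjoint paths

routes = {"orange": SHORT, "blue": SHORT, "green": SHORT}
counts = [len(single_points_of_failure(routes))]
for message in ["orange", "blue", "green"]:
    routes[message] = REDUNDANT           # reroute one message at a time
    counts.append(len(single_points_of_failure(routes)))
print(counts)  # [3, 3, 3, 0]
```

Only the third rerouting changes the count, which mirrors the large progress threshold between points B and C.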

5.1.5 Artificial Gene Design

The genetic gap problem, where the modification rate of an optimizer does not suffice to overcome the progress threshold, is not specific to reliability optimization. It can appear during any optimization with a large design space (large genotypes) and complex interrelations between the genotype and the design objectives, so it is likely to appear in many other optimization problems in the automotive and other domains. Worse, unlike slow optimization convergence, it cannot be solved by adding more computation power: the genetic gap must be overcome within a single variation step, so additional optimization iterations do not help. Increasing the modification rate of the optimizer, for example by allowing it to use more aggressive mutations, is also unlikely to solve the problem. On the one hand, making too many genetic changes between two subsequent generations makes it harder for the optimizer to learn patterns between the genome and the objective functions. In fact, with a sufficiently high modification rate, the whole optimization degenerates to a random search. On the other hand, simply increasing the number of changed genes does not suffice to solve complicated optimization problems. Consider the step between point B and point C in Fig. 5.2. To make the step from non-redundant to redundant routings, it is not enough to change a large number of genes, because only a change to a very specific gene assignment leads to redundant routings and, consequently, to a reliability improvement. Since these optimization difficulties cannot be solved by increasing the modification rate, the progress threshold must be lowered by increasing the impact of individual gene changes on the objective function.
For this purpose, the constraint set based on variables encoding the implementation structure (binding, allocation, routing), hereafter referred to as the Structural Variables (SVs), is extended by constraints that encode so-called Artificial Variables (AVs). Each AV, if activated, enforces a certain assignment of a set of SVs and has no effect if it is deactivated. The artificial genes (AGs) that are created based on the AVs, thus, act as control switches for the optimizer and enable it to enforce a (problem- or objective-specific) assignment of a (potentially large) set of SVs by modifying a small number of genes. In the context of SAT-Decoding, introducing AVs adds new genes to the genotype

which encode additional, objective-specific preferences for implementations with certain characteristics that are likely to improve the quality of the optimization results by steering the exploration towards promising areas of the design space. The AGD approach, thus, makes problem- and objective-specific knowledge accessible to the optimizer.
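The switch semantics of an AV can be sketched in CNF, assuming the AV is linked to the SVs by plain implication clauses. The function names and the variable numbering below are hypothetical, and this is not necessarily how the constraint generator of SAT-Decoding formulates AVs. Flipping the single phase gene of AV 5 decides whether three SVs are pinned at once.

```python
from itertools import product


def artificial_variable_clauses(av, enforced):
    """CNF clauses (DIMACS-style integer literals) that turn AV `av` into a switch.

    `enforced` maps a structural-variable index to the value the AV imposes.
    Each clause reads (not av) or literal, i.e., av -> literal; when av is
    false, every clause is already satisfied and the SVs stay unconstrained.
    """
    return [[-av, sv if value else -sv] for sv, value in sorted(enforced.items())]


def satisfied(clauses, assignment):
    """assignment maps each variable index to a bool."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


def count_sv_models(clauses, av, svs, av_value):
    """Number of SV assignments compatible with the clauses for a fixed AV value."""
    count = 0
    for values in product([False, True], repeat=len(svs)):
        assignment = dict(zip(svs, values))
        assignment[av] = av_value
        count += satisfied(clauses, assignment)
    return count


# Hypothetical example: SVs 1-4, AV 5 enforces x1 = 1, x2 = 0, x3 = 1.
clauses = artificial_variable_clauses(5, {1: True, 2: False, 3: True})
print(clauses)                                            # [[-5, 1], [-5, -2], [-5, 3]]
print(count_sv_models(clauses, 5, [1, 2, 3, 4], False))   # 16: AV off, no effect
print(count_sv_models(clauses, 5, [1, 2, 3, 4], True))    # 2: only x4 remains free
```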

5.1.6 Applying AGD to the Reliability Optimization

The reliability optimization of redundant message routings constitutes a good opportunity for the application of the AGD approach. Based on our experimental observations, without objective-specific encoding extensions, the optimization cannot overcome the genetic gap and gets stuck in the local optimum formed by the non-redundant solution. At the same time, the reliability problem is well understood: it is known that the number of redundant paths and the number of links constituting single points of failure have a significant influence on the system reliability. In the work at hand, we enable an exploration of the parts of the design space containing the solutions with redundant routings by applying the AGD approach with two types of AVs. The AVs encode the number of redundant paths available to a message and the criticality of network links. A link is hereby considered critical if it constitutes a single point of failure, i.e., if its failure directly leads to the failure of the system. By encoding these variables, we inject reliability-specific problem knowledge into the optimization and extend the genotype to include genes that encode preferences for implementations where messages have a certain number of redundant paths or where certain links are not critical.

Encoding the Path Number

The number of redundant paths is encoded by the variables $f_{CP}^{n}$. The variable $f_{CP}^{n}$ is hereby set to 1 iff the communication flow of message C is routed to the binding target of the process P over exactly n redundant paths. These variables are encoded by the following constraints, which hold $\forall C \in N_C,\ P \in N_P^{+}(C)$:

$$\sum_{n=n_{min}}^{n_{max}} f_{CP}^{n} = 1 \tag{5.1}$$

$$\sum_{R \in N_R} C^{P}_{R} \;-\; \sum_{l=(\bar{R},\tilde{R}) \in E_l} \left( C^{P}_{l=(\bar{R},\tilde{R})} + C^{P}_{l=(\tilde{R},\bar{R})} \right) \;+\; \sum_{n=n_{min}}^{n_{max}} (n-1) \cdot f_{CP}^{n} = 1 \tag{5.2}$$

While the constraint in Eq. (5.1) states that a path number (within the bounds $n_{min}$ and $n_{max}$ provided by the designer) has to be chosen for each communication flow, the constraint in Eq. (5.2) enforces the chosen path number during the creation of the routings. The constraint in Eq. (5.2) is based on the fact that the difference between the number of nodes and the number of links in a routing with a single path equals 1 and decreases by 1 with each additional path added to the routing. For example, if both $n_{max}$ and $n_{min}$ are set to 2, the rightmost term of Eq. (5.2) evaluates to 1 and the constraint enforces a difference of 0, which occurs only for routings with exactly two redundant paths.
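The counting argument behind Eq. (5.2) can be checked on the routing of Fig. 5.3. The sketch below assumes that each used link is counted once (i.e., used in one direction only) and that the redundant paths share only their end nodes; for such connected routings, n − 1 equals the number of independent cycles. The function names and the node/link sets are illustrative assumptions, with the topology read off Fig. 5.3.

```python
def path_number(nodes, links):
    """Number of redundant paths implied by Eq. (5.2).

    Eq. (5.2) requires |nodes| - |links| + (n - 1) = 1, hence n = |links| - |nodes| + 2.
    """
    return len(links) - len(nodes) + 2


def satisfies_eq_5_2(nodes, links, n):
    """Check Eq. (5.2) for a routing when f^n_CP = 1 for exactly this n."""
    return len(nodes) - len(links) + (n - 1) == 1


# Routing of Fig. 5.3 (topology read off the figure, an assumption):
nodes = {"G0", "S0", "S3", "S1", "S2", "G1"}
links = {("G0", "S0"), ("S0", "S3"), ("S3", "G1"),   # reference path l0-l2
         ("G0", "S1"), ("S1", "S2"), ("S2", "G1")}   # second path l3-l5
print(path_number(nodes, links))                                             # 2
print(satisfies_eq_5_2(nodes, links, 1), satisfies_eq_5_2(nodes, links, 2))  # False True

# Only the left path: 4 nodes and 3 links form a single path.
print(path_number({"G0", "S0", "S3", "G1"},
                  {("G0", "S0"), ("S0", "S3"), ("S3", "G1")}))               # 1
```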

Encoding the Link Criticality

The variables $h_l$ and $h_l^{CP}$ inject knowledge about link criticality into the optimization. Hereby, the variable $h_l$ ($h_l^{CP}$) is set to 0 if the failure of link l does not lead to the failure of the system (of the transmission of message C to the binding target of process P). The link criticality variables are encoded by the following constraints, which hold $\forall C \in N_C,\ P \in N_P^{+}(C),\ R \in N_R,\ \tilde{R} \in N_R$:

$$C^{P}_{l=(R,\tilde{R})} - C'^{P}_{l=(R,\tilde{R})} \geq 0 \tag{5.3}$$

$$C^{P}_{l=(R,\tilde{R})} + C'^{P}_{l=(R,\tilde{R})} - h_l^{CP} \leq 1 \tag{5.4}$$

$$C^{P}_{l=(R,\tilde{R})} + \neg C'^{P}_{l=(R,\tilde{R})} - \neg h_l^{CP} \leq 1 \tag{5.5}$$

$$h_l - h_l^{CP} \geq 0 \tag{5.6}$$

These constraints are based on the idea that, after defining the actual, possibly redundant path for the transmission of a message, the criticality of a link can be defined in relation to a non-redundant reference path connecting the source to the destination. We use the constraints defined in Section 2.2.3 to encode the $C'^{P}_{l=(R,\tilde{R})}$ variables describing the reference path. The reference path has to be built over links already in use by the actual path (5.3). A link is critical for a communication flow if it is on both the actual and the reference path (5.4), while it is not critical if it is on the actual path only (5.5). A link is considered entirely uncritical if it is not critical for any communication flow (5.6).

Example: The idea used for the encoding of link criticality is illustrated in Fig. 5.3 and Tab. 5.1. In the implementation illustrated in Fig. 5.3, the message under consideration is routed from the ECU G0 to the ECU G1 over a route consisting of the links l0–l5. The encoded reference path is illustrated with dashed arrows and consists of the links l0–l2. Table 5.1 lists the variable assignment that corresponds to the implementation in Fig. 5.3. Note that, while none of the links used in the actual routing of the message is critical, only the variables encoding the criticality of links which are not on the reference path (l3–l5) are set to 0. The link criticality variables encoded by the reference-path-based constraints (Eqs. (5.3)–(5.6)), consequently, do not fully comply with the notion of link criticality. Link criticality variables that fully reflect whether a link in a message route is expendable can be encoded with a constraint set based on link order. However, our experimental results have shown that a link-order-based encoding of link criticality does not provide any optimization


[Figure 5.3 graphic: ECUs G0 and G1, switches S0 to S3, links l0 to l5]

Figure 5.3: The encoding of link criticality requires the encoding of a non-redundant reference route (dashed arrows) in addition to the actual, possibly redundant message route (solid arrows).

benefits compared to the reference-path-based encoding, while it requires a larger number of encoding variables and results in significantly longer constraint resolution times³.
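A small Python check of Eqs. (5.3)–(5.5) reproduces Table 5.1 by enumerating which values of $h_l^{CP}$ each link admits, with the negation ¬x written as 1 − x. It also shows that the constraints leave $h_l^{CP}$ unconstrained for links that the flow does not use at all, so the solver may set them either way. The names below are illustrative.

```python
def feasible_h(c, c_ref):
    """Values of h^CP_l that satisfy Eqs. (5.3)-(5.5) for given 0/1 values of C and C'."""
    if c - c_ref < 0:  # Eq. (5.3): the reference path may only use links of the actual path
        return set()
    return {
        h for h in (0, 1)
        if c + c_ref - h <= 1                 # Eq. (5.4): on both paths -> critical
        and c + (1 - c_ref) - (1 - h) <= 1    # Eq. (5.5), negation written as 1 - x
    }


# Table 5.1: actual route l0-l5, reference path l0-l2.
actual = {f"l{i}": 1 for i in range(6)}
reference = {link: int(link in {"l0", "l1", "l2"}) for link in actual}
for link in actual:
    print(link, actual[link], reference[link], feasible_h(actual[link], reference[link]))
# l0-l2 admit only h = 1 and l3-l5 only h = 0, although no link of this
# two-path route is truly critical.
```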

Applying AGD for the Preprocessing Approach

Compared to the link encoding approach, the encoding of the AGs for the path number and the link criticality is much simpler for the preprocessing approach (see Section 4.3.3), as it relies on information gathered during the preprocessing phase. There, each valid routing is analyzed to determine its path number and the criticality of its links. The AGs are then encoded by formulating constraints that bind them to the activation of the corresponding routing paths. For example, the link criticality

³ In the work at hand, both the constraint equations of the link-order-based criticality encoding and the presentation of the comparison experiments are omitted for the sake of brevity.

Table 5.1: The assignment of the link criticality encoding variables for the implementation illustrated in Fig. 5.3.

link    $C^{P}_{l=(R,\tilde{R})}$    $C'^{P}_{l=(R,\tilde{R})}$    $h_l^{CP}$
l0      1                            1                             1
l1      1                            1                             1
l2      1                            1                             1
l3      1                            0                             0
l4      1                            0                             0
l5      1                            0                             0


[Figure 5.4 graphic: (a) Automotive, (b) Synthetic]

Figure 5.4: The architectures used for the experiments [SRT+18b].

variable of a certain link is activated iff at least one of the routing paths where the link is critical is active.
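Assuming each preprocessed routing is selected by a Boolean activation variable, the "iff" above is a standard OR-equivalence and can be emitted as CNF clauses as in the following sketch. Variable numbering and helper names are hypothetical. If no routing makes a link critical, the last clause degenerates to a unit clause that forces the AG to 0.

```python
def criticality_clauses(critical_by_routing, h_vars):
    """CNF for h_l <-> OR{r : link l is critical in routing r}, per link.

    critical_by_routing: routing-activation variable -> set of links that are
    critical in that routing (assumed to be collected during preprocessing).
    h_vars: link name -> variable index of the link's criticality AG.
    """
    clauses = []
    for link, h in sorted(h_vars.items()):
        rs = sorted(r for r, crit in critical_by_routing.items() if link in crit)
        clauses += [[-r, h] for r in rs]  # an active routing makes the link critical
        clauses.append([-h] + rs)         # a critical link needs such an active routing
    return clauses


# Hypothetical preprocessed routings of one flow between G0 and G1:
# r1 = left path only, r2 = both disjoint paths, r3 = right path only.
critical = {1: {"l0", "l1", "l2"}, 2: set(), 3: {"l3", "l4", "l5"}}
print(criticality_clauses(critical, {"l0": 10, "l3": 13}))
# [[-1, 10], [-10, 1], [-3, 13], [-13, 3]]
```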

5.1.7 Experimental Results

We now demonstrate how the quality of the optimization results changes when the SAT-Decoding approach is enhanced with the AGD mechanisms detailed in the previous section. We use an automotive application and two distinct architectures to perform a reliability optimization as well as a multi-objective optimization where we maximize the reliability while concurrently minimizing the number of allocated links.

Application and Architecture

For our experiments, we consider a use case from the automotive domain: the application contains 64 messages that are sent via both unicast and multicast. We consider all tasks in the given application as safety-relevant. We evaluate the proposed approach on two different architectures: (a) a classic automotive architecture (Fig. 5.4a) with relatively few components and communication paths and (b) a synthetic automotive architecture (Fig. 5.4b) that offers more possibilities for both routings and redundancies and, consequently, poses a significantly harder reliability optimization problem.

Optimization Objectives

System Reliability. In the following experiments, the system reliability is quantified by the so-called Mean Time To Failure (MTTF). The MTTF specifies the average time until a fault, as defined in Section 5.1.2, occurs in the considered network. For the calculation of the MTTF, we use the JReliability framework [GRL+14], which implements the approach from [GLR+08] to calculate the MTTF of the system based on given failure rates of the individual system components. The overall system reliability, $MTTF_S$, is one of the primary optimization objectives (denoted with O) in all experiments:

$$O_1 = MTTF_S \tag{5.7}$$
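JReliability evaluates arbitrary system structures [GRL+14]; the sketch below does not reproduce that framework. It only illustrates, for the textbook case of components with constant failure rates, why redundant routings raise the MTTF: for a series system, MTTF = 1/Σλᵢ, and for two disjoint series paths in parallel, the reliability function R(t) can be integrated in closed form. The failure rate, the path length, and the decision to ignore switches are hypothetical.

```python
import math


def mttf_series(rates):
    """Series system with constant failure rates: R(t) = exp(-sum(rates) * t)."""
    return 1.0 / math.fsum(rates)


def mttf_two_disjoint_paths(rates_a, rates_b):
    """Two disjoint series paths in parallel:
    R(t) = e^{-La t} + e^{-Lb t} - e^{-(La + Lb) t}, integrated over [0, inf)."""
    la, lb = math.fsum(rates_a), math.fsum(rates_b)
    return 1.0 / la + 1.0 / lb - 1.0 / (la + lb)


lam = 1e-6                 # hypothetical failure rate per link and time unit
path = [lam] * 3           # three links per path, switches ignored (assumption)
single = mttf_series(path)
redundant = mttf_two_disjoint_paths(path, path)
print(round(single), round(redundant))  # 333333 500000
```

With identical paths, the redundant routing raises the MTTF by a factor of 1.5 in this simplified model.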


Number of allocated links. Our second primary optimization objective is to minimize the number of allocated links:

$$O_2 = \sum_{l \in E_l} R(l) \tag{5.8}$$

where R(l) is a function that returns 1 if link l is allocated in the considered implementation and returns 0 otherwise.

Secondary Objectives. In order to lower the selection pressure towards the local optimum without routing redundancies and to enable the optimization to find a more diverse set of solutions, we introduce three secondary objectives $\wr_1$–$\wr_3$ that correlate with the system reliability $MTTF_S$. In our experiments, all secondary objectives are maximized.

Average individual MTTF. A high reliability for the transmission of each individual message is a precondition for a high reliability of the communication network. We therefore calculate the individual $MTTF_{C_P}$ value for each communication flow $C_P$ (i.e., the MTTF of a system where the application does not contain any other communication flows) and use the average $\overline{MTTF_{C_P}}$ of these values as a secondary optimization objective:

$$\wr_1 = \overline{MTTF_{C_P}} \tag{5.9}$$

Number of uncritical links per communication flow. A communication flow is routed in a more reliable fashion if its routing contains uncritical links. A link is uncritical if, after its failure, the routing graph of the communication flow still contains a path from source to destination. The sum of the links that are uncritical for the individual communication flows is used as the second secondary objective:

$$\wr_2 = \sum_{l \in E_l,\ C \in N_C,\ P \in N_P^{+}(C)} \mathcal{C}(l, C_P) \tag{5.10}$$

where $\mathcal{C}(l, C_P)$ is a function that returns 1 if link l is not critical for the communication flow $C_P$ and returns 0 otherwise.

Number of globally uncritical links. The third secondary objective is to maximize the number of globally uncritical links, i.e., links that are uncritical for all communication flows:

$$\wr_3 = \sum_{l \in E_l} \mathcal{C}(l) \tag{5.11}$$

where $\mathcal{C}(l)$ is a function that returns 1 if link l is not critical for any communication flow and returns 0 otherwise.


Note: It is important to highlight the difference between the usage of link criticality within secondary design objectives on the one hand and its integration into the AGD approach on the other. Both measures are used to achieve a faster convergence towards a more diverse Pareto front and affect the optimization phase denoted as step 5 in Fig. 2.12. However, while the addition of an optimization objective affects the parent selection, integrating a system characteristic as an AG enables the variation operator of the EA to directly alter objective-specific implementation preferences encoded in the genotype. For example, in the case of link criticality, AGD enables the EA to generate a genome that encodes the preference for an implementation where a certain link is not critical.

Results

Optimizing the Reliability as Primary Objective. Figures 5.5 and 5.6 illustrate the results of the routing optimization of the automotive (Fig. 5.4a) and the synthetic (Fig. 5.4b) architectures with the objectives $O_1$, $\wr_1$, $\wr_2$, and $\wr_3$ as detailed above. The plots show the ε-dominance (see Section 2.2.4), averaged over 20 optimization runs, over the run time of the experiment when using just the constraint sets of the Link-Encoding approach (LE) (black) and the Preprocessing approach (PP) (green), which are described in Section 4.3. The other plots illustrate the course of the ε-dominance when the AGD approach is applied to the link-encoding approach (LE-AGD) (red) and the preprocessing approach (PP-AGD) (blue) by encoding both the link criticality and the number of routing paths as AGs (Section 5.1.6). The LE approach exploits no redundancies at all and shows no or only a slight convergence towards the reference Pareto front. In the LE approach, the genotype consists solely of structural genes encoding the preferences for the allocation of each link and the routing of each message over each link. Therefore, overcoming the local optimum without redundancies is extremely unlikely, as the EA would have to alter a large number of genes within a single variation step. The PP approach, on the other hand, is able to explore the reference Pareto front in the case of the automotive architecture and provides a Pareto front with greater diversity than the LE approach in the case of the synthetic architecture. This can be explained by the fact that, in the PP approach, the genes encode a preference for entire routing graphs of the messages. Consequently, the step between two genotypes A and B, where genotype A encodes a preference for an implementation without redundancies, while genotype B encodes a preference for an implementation with redundancies, requires a smaller number of genetic changes and is more probable than in the case of the LE approach.
Both the LE-AGD and the PP-AGD approaches use genotypes containing genes that encode preferences for certain numbers of redundant paths and the criticality of links, implementation characteristics with a huge impact on the system reliability. With these approaches, the smallest number of genetic changes is required to create

implementations with a high number of redundant routings. While the necessity to generate additional constraints encoding additional variables results in a longer run time, the AGD-based approaches exhibit the strongest convergence towards the reference Pareto front and provide results of the highest quality.

Concurrent Optimization of Reliability and the Link Number. In our second experiment, we again optimize the message routings for the architectures depicted in Fig. 5.4. This time, however, the optimization is conducted with the link number as an additional primary design objective, so that we optimize the objectives $O_1$, $O_2$, $\wr_1$, $\wr_2$, and $\wr_3$, as defined in Section 5.1.7. The results of this experiment are illustrated in Figs. 5.7 and 5.8. The LE approach still cannot overcome the solution without redundancies and shows no convergence for the automotive architecture. However, with the number of allocated links as an additional objective, we see a significant quality difference between the results provided by both preprocessing-based approaches and the results provided by the LE-AGD approach. In order to understand this difference, we investigate the Pareto fronts yielded by the PP-AGD and the LE-AGD approaches as a result of a multi-objective optimization over 5,000 generations (Figs. 5.9 and 5.10). These Pareto fronts show that the weaker ε-dominance of the preprocessing-based approaches, which is clearly visible in Figs. 5.7 and 5.8, can be explained by the fact that the LE-AGD approach provides a Pareto front with a significantly higher diversity with respect to the number of allocated links. This observation can, once again, be explained by the preferences encoded within the genotypes used by the different approaches. The preprocessing-based approaches do not encode a preference for the allocation of individual links, so that a higher number of genetic changes is required to find solutions with a smaller number of allocated links. When minimizing the link number, the allocation of individual links becomes an objective-relevant implementation trait, so that the LE approach, which encodes the preference for the allocation of individual links, exhibits a stronger convergence than in our first experiment.
The LE-AGD approach uses a genotype that encodes preferences for link allocation, link criticality, and path number, and provides results of the highest quality.

[Plot: ε-dominance over run time in s (0 to 300) for LE, LE-AGD, PP, and PP-AGD]

Figure 5.5: Automotive architecture (Fig. 5.4a). Even in this simple case, the LE approach does not manage to overcome the local optimum without redundancies, while the PP and both AGD approaches quickly converge towards the front of Pareto-optimal solutions [SRT+18b].

[Plot: ε-dominance over run time in s (0 to 900) for LE, LE-AGD, PP, and PP-AGD]

Figure 5.6: Synthetic architecture (Fig. 5.4b). The LE approach cannot create solutions with redundant routings and, thus, hardly converges towards the reference Pareto front. The PP approach exhibits a stronger convergence and provides results of higher quality, especially when applied with AGD. The LE-AGD approach provides results of the highest quality [SRT+18b].

[Plot: ε-dominance over run time in s (0 to 300) for LE, LE-AGD, PP, and PP-AGD]

Figure 5.7: Automotive architecture (Fig. 5.4a). While the simple LE approach still does not find solutions with redundant routings, there is now a clear difference between the result quality of both PP approaches on the one hand and the LE-AGD approach on the other hand [SRT+18b].

[Plot: ε-dominance over run time in s (0 to 800) for LE, LE-AGD, PP, and PP-AGD]

Figure 5.8: Synthetic architecture (Fig. 5.4b). The difference in the result quality between the preprocessing-based approaches and the LE-AGD approach is bigger in the case of the more complex synthetic architecture [SRT+18b].

[Plot: Pareto fronts of LE-AGD and PP-AGD, $O_2$ over $O_1$ in s]

Figure 5.9: Automotive architecture (Fig. 5.4a). While both approaches are equally good at finding solutions with a higher system reliability (objective $O_1$), the PP-AGD approach fails to find solutions with smaller numbers of allocated links (objective $O_2$) and, therefore, displays a worse ε-dominance (Fig. 5.7) than the LE-AGD approach [SRT+18b].

[Plot: Pareto fronts of LE-AGD and PP-AGD, $O_2$ over $O_1$ in s]

Figure 5.10: Synthetic architecture (Fig. 5.4b). As in the experiment from Fig. 5.6, the LE-AGD approach is slightly better at finding solutions with a high system reliability (objective $O_1$). Its large advantage with respect to the ε-dominance is caused by its superior ability to find solutions with smaller numbers of allocated links (objective $O_2$) [SRT+18b].

5.1.8 Related Work

In the area of evolutionary algorithms, it is known that the choice of the genetic encoding has a strong influence on both the convergence and the result quality of the optimization. In [RWB+04] and [PB17], a situation similar to the previously described genetic gap, where solutions with similar fitness levels have a large number of genetic differences, is referred to as a Hamming cliff. Numerous works like [Sav97; Whi99; GWE+05] provide evidence that encodings with a smaller Hamming distance, which in this special case refers to the number of genetic variations necessary to transform the genotype of an individual into the genotype of an individual with a similar or better fitness level, significantly improve the result quality and the convergence of the optimization. Yet there is, to the best of our knowledge, no general solution besides the usage of an encoding specifically tailored to a certain problem and the current set of design objectives. With the AGD approach, we, for the first time, propose a technique which allows lowering the Hamming distance of the genetic encoding. The injection of problem-specific knowledge is done by adding new SAT constraints and does not require in-depth knowledge about the genetic encoding used in SAT-Decoding. In contrast to existing work, the application of AGD does not depend on the usage of a custom-tailored encoding. Instead, the existing genotype is extended with AGs, which act as genetic switches for certain problem-specific traits. Interestingly, recent findings from the field of biology show that a mechanism similar to the genetic switches used in AGD also occurs in nature. In [KLG+18], the authors present evidence that in certain fish species, complex characteristics such as the color pattern can depend on a single gene that serves as a molecular switch when activated and switches entire groups of other genes on or off. It is assumed that the ability to achieve large changes in phenotype with the alteration of fewer genes improves the ability of the fish population to quickly adapt to changing conditions. The fact that a similar mechanism has prevailed in the competitive setting of natural selection makes AGD all the more promising for an application in the area of multi-objective optimization.
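The textbook illustration of a Hamming cliff is an integer-valued gene in plain binary encoding, where neighboring values can differ in many bits; the binary-reflected Gray code is the classic encoding that removes this particular cliff. The sketch is only meant to make the term concrete. AGD addresses the analogous problem in a different way, by extending the genotype rather than replacing the encoding.

```python
def gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


# Neighboring values 7 and 8 in plain binary: 0111 vs. 1000.
print(hamming(7, 8))                # 4: a Hamming cliff
print(hamming(gray(7), gray(8)))    # 1: 0100 vs. 1100
# In Gray code, every pair of consecutive integers differs in exactly one bit.
print(all(hamming(gray(i), gray(i + 1)) == 1 for i in range(255)))  # True
```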

5.2 Variety-Aware Routing Encoding⁴

As outlined in Chapter 2, constraint sets describing valid embedded systems typically encode a valid binding of application tasks onto resources, a valid allocation of resources, and a valid routing of messages that are transmitted between resources executing data-dependent tasks [BTT98]. The constraint solving effort typically scales exponentially with the number of encoding variables and has an immense impact on the efficiency and even the feasibility of the overall optimization [RG18]. Except for the most trivial network topologies, the routing constraints are the most complex part of the constraint set and introduce the largest share of encoding variables. The majority of state-of-the-art routing encodings fall into one of the following two classes: (a) so-called Route Preprocessing (RP) approaches rely on a preprocessing phase to find all possible routes connecting each pair of nodes [LPS16; GZP+17]. During the optimization, a message is then routed by picking one of the routes connecting its source to its destination. Alternatively, (b) so-called Componential Assembly

⁴ Major parts of this section have been published in [SPG+19].

132 5.2 Variety-Aware Routing Encoding

(CA) approaches encode an activation variable for the routing of each message over each network component, e.g., a link or a node, and formulate constraints for the assembly of valid routes [ABC+13; NDR16; SDT+17; MAS+18]. The resolution of these constraints provides an assignment of the activation variables, and a valid route is created by assembling the activated components. As shown in [GRG+14], the strengths and weaknesses of RP and CA approaches are somewhat complementary. CA approaches encode the usage of each network component, regardless of the network topology. This not only introduces a high number of encoding variables, especially in the case of complex routing behaviors like multicasts [LSF14] or redundant transmissions [SRT+18b], but also results in an unnecessarily complex description of the optimization search space. RP approaches, on the other hand, use the preprocessing phase to acquire total knowledge of the topology and provide the optimizer with a compact description of the actual search space. However, their need for an exhaustive analysis of all routing options limits the applicability of RP approaches to sparsely-connected topologies where the number of possible routes is small, even for large networks. For dense topologies, the encoding overhead of CA approaches is more than compensated by their superior scalability. Furthermore, as seen in Section 5.1, the fine-grained encoding of routing decisions not only enables the formulation of additional constraints for, e.g., link capacity or the mutual exclusion of components, but also results in a much better performance when optimizing objectives, such as monetary cost or reliability, which are strongly influenced by individual components of the routes. CA and RP approaches excel when processing densely- and sparsely-connected graphs, respectively. In this section, we introduce an optimization approach that combines the strengths of the two strategies.
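The complementary scaling can be made concrete with a toy count for one message and one source–destination pair. The sketch assumes that an RP encoding introduces one variable per enumerated loop-free route and that a CA encoding introduces one variable per directed link; both are simplifications of the cited encodings, and the ring and complete graphs are hypothetical extremes of sparse and dense connectivity.

```python
def count_simple_paths(adj, src, dst):
    """Enumerate all loop-free routes from src to dst, as an RP preprocessing step would."""
    def dfs(node, visited):
        if node == dst:
            return 1
        return sum(dfs(nxt, visited | {nxt}) for nxt in adj[node] if nxt not in visited)
    return dfs(src, {src})


def ring(n):
    """Sparse topology: n nodes in a cycle."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def complete(n):
    """Dense topology: every pair of nodes is connected."""
    return {i: set(range(n)) - {i} for i in range(n)}


def num_links(adj):
    """Number of undirected links in an adjacency-set representation."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


for n in (5, 8):
    for name, topo, dst in (("ring", ring(n), n // 2), ("complete", complete(n), 1)):
        rp = count_simple_paths(topo, 0, dst)   # RP: one candidate route per path
        ca = 2 * num_links(topo)                # CA: one variable per directed link
        print(f"n={n} {name}: RP routes={rp}, CA variables={ca}")
```

For the ring, the route count stays at 2 while the CA variables grow with the number of links; for the complete graph with 8 nodes, the RP approach must enumerate 1957 routes for a single node pair, against 56 CA variables.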
Real-life networks rarely fall completely into one extreme connectivity category; rather, they are compositions of several subgraphs that are either sparse or dense. In the proposed approach, the given network is, therefore, divided into two types of areas: (a) proxy areas, where there is exactly one routing possibility between each pair of nodes, and (b) areas with a variety of different routes connecting each node pair, the so-called variety areas. The proposed approach exploits the fact that excluding proxy areas from the optimization does not limit the search space: since any potential for routing optimization is based on a variety of different routing options, any valid route can be unambiguously described by specifying its segments within the variety areas. The simple network in Fig. 5.11, e.g., consists of one variety area (left) and one proxy area (right). To connect resource R0 to resource R4, the architecture offers three distinct routes (I, II, and III), which differ from each other only in the variety area of the network while using exactly the same path in the proxy area. All routes between R0 and R4 can, therefore, be uniquely specified by their route segment within the variety area between R0 and R3. In addition to a lightweight algorithm that identifies the proxy and variety areas of a network, the work at hand proposes two different ways to integrate the proxy concept

133 5 Injection of Objective- and Topology-Specific Knowledge for a Faster Optimization Convergence and a Higher Result Quality

I R1 R4 R0 II R3 R5 III R2 R6

Figure 5.11: Resource network consisting of a variety area (yellow/dotted line) and a proxy area (green/dashed line). All differences between the routes connecting resource R0 to resource R4 (I, II, and III) occur within the variety area [SPG+19]. into any existing CA approach. The obtained encoding is tailored to the topology, combines the efficiency of RP with the scalability and extensibility of CA approaches, and can be applied to any application and any architecture. Case studies from the automotive and the many-core domain give evidence that an optimization based on the proxy concept is up to 8 times faster and yields optimization results of equal or higher quality than variety-unaware approaches like [LSF14; GRG+14; SRT+18b]. The remainder of this section is outlined as follows: Sections 5.2.1 and 5.2.2 detail the general concept of proxy regions and the algorithm used to identify proxy regions in a given network, respectively. Section 5.2.3 then explains how the proxy concept can be integrated into any approach based on the encoding of individual network components, while Section 5.2.5 provides an overview of related work.

5.2.1 Proxy Relations and Proxy Areas

CA approaches encode, for each link and message, the decision whether the link is used in the route of the message. While this strategy ensures that no route is excluded from the search space, it introduces unnecessary encoding variables for certain networks. Consider, e.g., link l0 on the left side of Fig. 5.12. This link connects resource R0 to the rest of the network via resource R4. Since R0 is accessible solely through l0, l0 is necessarily used in each and every route to/from R0. Thus, encoding a decision variable for the inclusion of l0 in routes to/from R0 provides no added optimization value, as each route starting/ending at R0 (referred to as a proxy slave) can be uniquely specified using a corresponding route starting/ending at R4 (referred to as the proxy master of R0). Extending the concept of proxy relations to larger network areas provides an even more compact routing encoding. Consider, e.g., resources R6–R12 in Fig. 5.12. In the terminology used in this thesis, we summarize these resources and the links between them as a so-called proxy area with R6 as the proxy master of the entire area. Between each pair of resources within a proxy area, there exists exactly one possible route. In particular, there is exactly one possible route between any resource and the proxy master of the area. Consequently, any connection between a resource outside the proxy area and a resource within the area consists of an external route that connects the outside resource to the proxy master and an internal route connecting the proxy master to the proxy slave inside the proxy area. Hereby, only the external route can be established in multiple different ways (using different sets of links) and is, therefore, relevant for routing optimization. In contrast, there is only one possible way to create the internal route. Links within proxy areas, therefore, provide no benefit for the routing optimization and can be excluded from the routing encoding.

134 5.2 Variety-Aware Routing Encoding

Figure 5.12 (panels: Initialization, Iteration 1, Iteration 2, Iteration 3; resources R0–R12, link l0): The proxy areas (green/dashed line) are identified by iteratively establishing transitive proxy relations between resource pairs. The variety area (yellow/dotted line) encompasses all proxy masters (blue glow) and is reduced in each iteration [SPG+19].

5.2.2 Identification of Proxy Areas

Proxy areas within a given network are identified using an iterative algorithm. This algorithm generates a map of resources to their respective proxy masters (where a proxy master is mapped to itself). Initially, every resource is registered into a list of potential masters. Over the course of several iterations, the algorithm I) examines every resource in the list, II) identifies proxy slaves, i.e., resources with only one neighbor that is still registered as a potential master, III) updates their map entry with this sole neighbor as their proxy master, and IV) eliminates them from the list of masters. Proxy relations are transitive. Thus, if R is identified as the master of R̃, R automatically becomes the master of all slave resources of R̃. The algorithm terminates when no new proxy slaves are identified during an iteration. Figure 5.12 illustrates the functionality of the algorithm. Each subsequent iteration identifies new proxy slaves, expands the known proxy areas, and shrinks the variety area. In the example, no new proxy slaves are found in the fourth iteration, so that the algorithm terminates. Apart from the proxy masters R2–R6, which form the variety area of the network, all resources are then located inside proxy areas.
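The identification steps I) to IV) can be sketched in Python as follows. This is a minimal sketch assuming an undirected adjacency-list representation of the resource network; the function name, the example network `EXAMPLE_ADJ`, and the tie-breaking rule that keeps two mutually adjacent candidates from demoting each other are assumptions for illustration, not details taken from this thesis.

```python
from typing import Dict, List


def identify_proxy_masters(adj: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical helper: map every resource to its proxy master;
    proxy masters map to themselves."""
    master = {r: r for r in adj}      # initially, every resource is its own master
    potential = set(adj)              # list of potential masters
    while True:
        demoted: Dict[str, str] = {}  # proxy slaves of this iteration -> sole neighbour
        for r in sorted(potential):   # I) examine every resource in the list
            nbrs = [n for n in adj[r] if n in potential]
            # II) a proxy slave has exactly one neighbour that is still a potential
            # master; the second test (an assumption) prevents two mutually adjacent
            # candidates from demoting each other in the same iteration.
            if len(nbrs) == 1 and nbrs[0] not in demoted:
                demoted[r] = nbrs[0]
        if not demoted:               # no new proxy slaves: terminate
            return master
        for slave, new_master in demoted.items():
            potential.discard(slave)  # IV) remove the slave from the list of masters
            for r, m in master.items():
                if m == slave:        # III) + transitivity: the slave and all of its
                    master[r] = new_master  # own slaves now belong to the new master


# Hypothetical example network: a ring R0-R1-R2-R3 with a small tree
# (R4, R5, R6) hanging off R3 and a single leaf R7 attached to R0.
EXAMPLE_ADJ = {
    "R0": ["R1", "R3", "R7"],
    "R1": ["R0", "R2"],
    "R2": ["R1", "R3"],
    "R3": ["R2", "R0", "R4"],
    "R4": ["R3", "R5", "R6"],
    "R5": ["R4"],
    "R6": ["R4"],
    "R7": ["R0"],
}
```

For this hypothetical network, the first iteration demotes R5 and R6 (to R4) and R7 (to R0); the second demotes R4 and, by transitivity, hands R5 and R6 over to R3; the third finds no new slaves, so R0 to R3 form the variety area.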

5.2.3 Adaptation of Existing Constraint Systems

Exploiting the concept of proxy relations reduces the number of encoding variables and the constraint resolution effort by excluding network areas without routing variety. Following the steps presented in this subsection, this concept can be integrated into any routing optimization based on the encoding of individual network components. We first detail how existing constraint sets can be adapted to encode the route segments within variety areas only. The second part of this subsection then proposes two approaches to create the internal routes within the proxy areas.

Routing Encoding in Variety Areas

CA encodings built on the assumption of fixed source and destination resources that are known prior to the constraint formulation, e.g., [ABC+13] or [MAS+18], do not require any adaptation of the constraints. The impact area of these encodings can be limited by using the proxy masters where the message enters/leaves the variety area, instead of the proxy slaves actually sending/receiving the message, as the start/end points of the encoded route. The adaptation of approaches where the source and the destination of the message transmission are not known during the constraint formulation, such as [LSF14] or [SRT+18b], in which routing and task mapping are optimized jointly, requires a slightly higher overhead. Here, the transmission end points are implicitly encoded as the binding targets of the source and the destination tasks. For these cases, we propose to explicitly encode variables for the start and the end points of the encoded route. The existing routing encoding is adapted by inserting these end-point variables into any constraint that relates to the start or the end point of the route.

We use $C^S_R$ and $C^D_R$ variables to encode the end points of the encoded route. Hereby, the $C^S_R$/$C^D_R$ variable is set to 1 iff resource $R$ is the start/end point of the route of message $C$. These variables are encoded by stating that resource $R$ is the start of the route of message $C$ if the source task of $C$ is bound onto $R$ or onto one of its proxy slaves (5.12) and is not the start otherwise (5.13). The $C^D_R$ variables are encoded analogously.

$$\forall C \in N_C,\ \tilde{P} \in N_C^-,\ \tilde{R} \in N_R,\ \tilde{m} = (\tilde{P}, \tilde{R}) \in E_M,\ R = M(\tilde{R}) \in N_R^V: \quad \tilde{m} - C^S_R \le 0 \tag{5.12}$$

$$\forall C \in N_C,\ R \in N_R^V: \quad C^S_R - \sum_{\tilde{P} \in N_C^-,\ \tilde{m} = (\tilde{P}, \tilde{R}) \in E_M:\ M(\tilde{R}) = R} \tilde{m} \le 0 \tag{5.13}$$

$N_C^-$ and $N_C^+$ denote the predecessor and successor tasks of message $C$, respectively. $N_R^V$ designates all resources within variety areas. Function $M: N_R \to N_R^V$ returns the proxy master of a resource, as determined by the algorithm from Section 5.2.2.
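The semantics of (5.12) and (5.13) can be checked on concrete values with the following sketch. It assumes a single message C, a given list of resources hosting its source task, and a proxy map as produced by the identification step; all names (`start_point_values`, `start_constraints_hold`, `MASTER`, `VARIETY`) are hypothetical, and in the actual encoding these relations are solver constraints rather than Python checks.

```python
from typing import Dict, Iterable


def start_point_values(variety: Iterable[str], source_hosts: Iterable[str],
                       master: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical helper: C^S_R for one message C, i.e., 1 iff a source task
    of C is bound onto R or onto one of R's proxy slaves."""
    hosts = list(source_hosts)
    return {R: int(any(master[h] == R for h in hosts)) for R in variety}


def start_constraints_hold(c_s: Dict[str, int], source_hosts: Iterable[str],
                           master: Dict[str, str]) -> bool:
    """Hypothetical checker for (5.12) and (5.13) with all bindings fixed."""
    hosts = list(source_hosts)  # resources R~ with an active binding m~ = (P~, R~)
    # (5.12): m~ - C^S_{M(R~)} <= 0 for every active source binding
    ok_512 = all(1 - c_s[master[h]] <= 0 for h in hosts)
    # (5.13): C^S_R - (sum of m~ with M(R~) = R) <= 0 for every variety resource R
    ok_513 = all(c_s[R] - sum(master[h] == R for h in hosts) <= 0 for R in c_s)
    return ok_512 and ok_513


# Hypothetical proxy map of a small network (R4 to R6 are slaves of R3, R7 of R0).
MASTER = {"R0": "R0", "R1": "R1", "R2": "R2", "R3": "R3",
          "R4": "R3", "R5": "R3", "R6": "R3", "R7": "R0"}
VARIETY = ["R0", "R1", "R2", "R3"]
```

With the bindings fixed, the two constraints admit exactly one assignment: $C^S_R = 1$ only for the proxy master of the source host (here R3 for a source task bound onto R5) and 0 for every other variety resource.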

Route Creation in Proxy Areas

We propose two different approaches to create the routes within proxy areas.

Exclusive Approach. In the first approach, referred to as the exclusive approach, we ignore proxy areas during the route encoding. The resolution of the routing constraints, adapted as described above for the variety areas, yields the route segments that connect the proxy masters of the network. The unique internal routes that connect proxy masters to the actual source and destination resources are added in a post-processing step. The obtained complete routing graphs are used for the evaluation of the objective functions, e.g., costs, timing, or reliability.
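The post-processing step of the exclusive approach can be sketched as follows. The sketch assumes that the identification step additionally records, for every proxy slave, the neighbor it was demoted to (a hypothetical `parent` map; the text above only specifies the map to proxy masters). Because a proxy area contains exactly one route between any two of its resources, following these pointers yields the unique internal route. The function names are hypothetical.

```python
from typing import Dict, List


def path_to_master(resource: str, parent: Dict[str, str]) -> List[str]:
    """Unique internal path from a resource up to its proxy master.
    `parent` (hypothetical) maps each proxy slave to the neighbour it was
    demoted to; proxy masters have no entry."""
    path = [resource]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def complete_route(variety_route: List[str], src: str, dst: str,
                   parent: Dict[str, str]) -> List[str]:
    """Hypothetical helper: attach the internal routes to a route segment
    that connects the proxy masters of src and dst."""
    up_src, up_dst = path_to_master(src, parent), path_to_master(dst, parent)
    if up_src[-1] != variety_route[0] or up_dst[-1] != variety_route[-1]:
        raise ValueError("route segment must start/end at the proxy masters")
    if len(variety_route) == 1:
        # Same proxy area: drop the shared upper part of both tree paths, so the
        # route turns at the closest common resource instead of the master.
        while len(up_src) > 1 and len(up_dst) > 1 and up_src[-2] == up_dst[-2]:
            up_src.pop()
            up_dst.pop()
        return up_src + up_dst[-2::-1]
    # Different proxy areas: slave ... master + variety segment + master ... slave.
    return up_src[:-1] + variety_route + up_dst[-2::-1]
```

For instance, with the hypothetical pointers `{"R4": "R3", "R5": "R4", "R6": "R4", "R7": "R0"}`, the variety segment `["R0", "R3"]` between sender R7 and receiver R6 is completed to `["R7", "R0", "R3", "R4", "R6"]`.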

Compact Approach. The exclusive approach offers the biggest reduction of the number of encoding variables and, therefore, the biggest reduction of the constraint resolution effort. For certain problems, however, ignoring proxy areas may have a negative impact on the optimization convergence, as it limits the ability to formulate additional constraints regarding, e.g., the capacity of the links within these parts of the network. We address these cases with a second approach for the creation of route segments within proxy areas. In this so-called compact approach, the activation of internal links is encoded with a constraint set tailored to the conditions found within proxy areas. By exploiting the fact that neither routing cycles nor redundant route segments are possible within proxy areas, the compact approach requires only a small number of constraints that are formulated based on already existing variables that describe task mapping and component activation. The compact approach is implemented by formulating constraints (5.14)–(5.17) for each resource $R$ within a proxy area. They ensure that the source process of a message may only be mapped onto a resource inside a proxy area if the resource is the binding target of a destination process or has at least one activated out-link (5.14). An in-link of a resource may only be active if the resource is the binding target of a destination process or has at least one activated out-link (5.16). Analogous constraints apply to the binding of destination processes (5.15) and the activation of out-links (5.17). The encoding variable $C_{l=(R,\tilde{R})}$ is set to 1 iff message $C$ is routed over link $l$ in the direction from resource $R$ to resource $\tilde{R}$. Note that these constraints are formulated with the assumption of routing optimizations like [LSF14] or [SRT+18b], where the end points of the routes are not fixed.⁵

$$\forall C \in N_C,\ \tilde{P} \in N_C^-,\ \tilde{m} = (\tilde{P}, R) \in E_M: \quad \tilde{m} - \sum_{P \in N_C^+,\ m = (P, R) \in E_M} m - \sum_{l = (R, \tilde{R}) \in E_L} C_l \le 0 \tag{5.14}$$

$$\forall C \in N_C,\ P \in N_C^+,\ m = (P, R) \in E_M: \quad m - \sum_{\tilde{P} \in N_C^-,\ \tilde{m} = (\tilde{P}, R) \in E_M} \tilde{m} - \sum_{\tilde{l} = (\tilde{R}, R) \in E_L} C_{\tilde{l}} \le 0 \tag{5.15}$$

$$\forall C \in N_C,\ \tilde{l} = (\tilde{R}, R) \in E_L: \quad C_{\tilde{l}} - \sum_{P \in N_C^+,\ m = (P, R) \in E_M} m - \sum_{l = (R, R') \in E_L} C_l \le 0 \tag{5.16}$$

$$\forall C \in N_C,\ l = (R, \tilde{R}) \in E_L: \quad C_l - \sum_{\tilde{P} \in N_C^-,\ \tilde{m} = (\tilde{P}, R) \in E_M} \tilde{m} - \sum_{\tilde{l} = (R', R) \in E_L} C_{\tilde{l}} \le 0 \tag{5.17}$$
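As a sanity check of what (5.14)–(5.17) enforce, the following sketch evaluates the four inequalities for one message at a single proxy-area resource, given concrete task bindings and a set of activated directed links. It is an illustrative checker with hypothetical names, not the SAT encoding itself. Since every active binding or link variable contributes one inequality of the same form, each constraint reduces to a test on the counts below.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]  # directed link (R, R~), active iff C_l = 1


def compact_constraints_hold(R: str, src_hosts: Iterable[str],
                             dst_hosts: Iterable[str],
                             active_links: Set[Link]) -> bool:
    """Hypothetical checker for (5.14)-(5.17) of one message at resource R."""
    src_hosts, dst_hosts = list(src_hosts), list(dst_hosts)
    n_src = src_hosts.count(R)                           # sum of m~ = (P~, R)
    n_dst = dst_hosts.count(R)                           # sum of m = (P, R)
    n_out = sum(1 for (u, _) in active_links if u == R)  # sum of C_l, l = (R, .)
    n_in = sum(1 for (_, v) in active_links if v == R)   # sum of C_l~, l~ = (., R)
    return all([
        n_src == 0 or 1 - n_dst - n_out <= 0,  # (5.14) source needs sink or out-link
        n_dst == 0 or 1 - n_src - n_in <= 0,   # (5.15) sink needs source or in-link
        n_in == 0 or 1 - n_dst - n_out <= 0,   # (5.16) in-link needs sink or out-link
        n_out == 0 or 1 - n_src - n_in <= 0,   # (5.17) out-link needs source or in-link
    ])
```

For a hypothetical route R5, R4, R3, R0, R7 (source bound onto R5, destination onto R7), all proxy-area resources pass the check; adding a dangling link from R4 to R6, which hosts no destination and has no out-link, violates (5.16) at R6.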

5.2.4 Experimental Results

We perform several experiments to evaluate the impact of variety-awareness on the run time and the result quality of routing optimization approaches. The presented case studies originate from two distinct embedded system domains. In all experiments, we investigate how the run time and the result quality, in terms of the ε-dominance (see Section 2.2.4), of a variety-unaware baseline approach change when it is extended with the exclusive or the compact implementation of the proxy concept presented in Section 5.2.3. Each optimization run comprises 1,000 generations with a population size of 100 individuals and 25 new individuals in every generation.

Automotive Case Study

With upcoming safety-critical Advanced Driver Assistance Systems (ADAS) applications, a certain degree of redundancy becomes mandatory for automotive communication networks. However, it seems improbable that redundancy will be considered at every possible point. Rather, it is more likely that the high cost pressure and the necessity to reuse network designs of previous car generations will force designers to consider redundancy only for a few critical components (e.g., links which are especially vulnerable or important). In our first case study, we try to emulate this design situation. Similar to the experiments performed in Section 5.1.7, the message routing of an automotive application is optimized with respect to transmission reliability and the number of allocated links. The tasks of the application exchange 64 safety-critical messages, which are sent as both unicasts and multicasts. For the architecture, we consider the double-star network topology illustrated in Fig. 5.14. In our case study, the two stars are connected by three switches. The connections between the two stars offer a possibility for redundant transmission. The connection between each ECU and its immediate switch offers no redundancy, so that each star establishes one proxy area. For this experiment, we use the redundant routing encoding presented in Section 4.3. This encoding is used as the baseline and is compared to optimization runs where it is extended by identifying proxy areas and implementing the compact and the exclusive approach as detailed in Section 5.2.3.

⁵ The constraint adaptation for the simpler case with known route end points ([ABC+13], [MAS+18]) is trivial and, therefore, omitted.

Figure 5.13 (panels: 8, 24, 40, and 56 ECUs; x-axis: run time [s]; y-axis: ε-dominance; curves: baseline, compact, exclusive): Average ε-dominance of 40 optimization runs for the automotive case study (Fig. 5.14). Exploiting proxy areas results in a reduction of the optimization time required for a fixed number of iterations. This reduction scales with the network size.

Figure 5.14: Double-star topology with 24 ECUs connected by 3 switches [SPG+19].

The results are plotted in Fig. 5.13. The four plots show the average ε-dominance of 40 optimization runs performed for each of the networks with different numbers of Electronic Control Units (ECUs), equally distributed between the two stars. With respect to convergence to the reference Pareto front, all approaches perform similarly well. For each approach, there are some optimization runs where the optimal solution was not found, so that the average values of the final results are slightly different. However, the differences are very small and no single approach significantly outperforms the others. The biggest difference occurs in the case of the most complicated problem with the biggest network. Here, the faster convergence of the variety-aware approaches compared to the baseline could be explained by the fact that reducing the number of encoding variables by exploiting the knowledge about proxy areas simplifies the problem for the optimizer. The time for the identification of proxy areas is negligible in comparison to the time for the constraint formulation, so that the run time of the three approaches is nearly identical for the smallest network. For the bigger networks, the variety-aware approaches offer a reduction of the run time needed for a fixed number of iterations that scales with the complexity of the problem. The baseline approach encodes the activation of every individual link, even though a big part of the network consists of the two proxy areas (stars) which do not offer any routing variety. Furthermore, the expensive constraints that prevent routing cycles are formulated not only for the variety area formed by the switches, but also for the proxy areas, where they have no effect.
Consequently, the baseline approach requires more time for the constraint formulation and preprocessing (visible as the initial time offset of the black graph) as well as for the constraint resolution (resulting in a higher overall run time). All in all, the proposed variety-aware approaches significantly outperform the baseline, as they offer results of similar or higher quality while requiring up to three times shorter run times.
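A rough feel for the encoding reduction in this case study can be obtained by counting link variables. The sketch below assumes a hypothetical wiring of the double star (each of the two star switches `SA` and `SB` linked to each of the three coupling switches `X0` to `X2`, ECUs split evenly between the stars) and one binary variable per message and directed link. The actual encoding of Section 4.3 contains further variables and constraints, so the numbers only indicate the trend; all function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]


def double_star(n_ecus: int) -> List[Link]:
    """Hypothetical double-star wiring: two star switches, each linked to all
    three coupling switches; the ECUs are split evenly between the stars."""
    links = [(s, f"X{i}") for s in ("SA", "SB") for i in range(3)]
    links += [(f"ECU{k}", "SA" if k < n_ecus // 2 else "SB") for k in range(n_ecus)]
    return links


def variety_resources(links: List[Link]) -> Set[str]:
    """Proxy masters left after iteratively demoting proxy slaves."""
    adj: Dict[str, Set[str]] = {}
    for u, v in links:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    potential = set(adj)
    while True:
        slaves: Set[str] = set()
        for r in sorted(potential):
            nbrs = adj[r] & potential
            if len(nbrs) == 1 and not (nbrs & slaves):  # single remaining neighbour
                slaves.add(r)
        if not slaves:
            return potential
        potential -= slaves


def link_variables(links: List[Link], n_messages: int,
                   variety: Optional[Set[str]] = None) -> int:
    """Assumed count: one variable per message and link direction; with
    `variety`, only links between variety resources are encoded."""
    if variety is not None:
        links = [(u, v) for u, v in links if u in variety and v in variety]
    return 2 * len(links) * n_messages


if __name__ == "__main__":
    for n in (8, 24, 40, 56):
        links = double_star(n)
        print(n, link_variables(links, 64), link_variables(links, 64, variety_resources(links)))
```

Under these assumptions and with 64 messages, the full link encoding grows from 1,792 (8 ECUs) to 7,936 (56 ECUs) variables, whereas an encoding restricted to the variety area stays at 768, because only the six links between the star switches and the coupling switches remain.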

Many-core Case Study

Over the past decade, many-core architectures have emerged as an interesting option to meet the increasing processing power demand of modern applications. In these architectures, multiple, oftentimes heterogeneous, processors are placed on the same chip and connected by a communication network, referred to as a Network-on-Chip (NoC). Tiled many-core architectures are even more heterogeneous. They consist of multiple processor tiles which are connected by the NoC in a grid fashion. Each tile consists of multiple resources, including processors and memories, interconnected via a bus. A part of such an architecture is illustrated in Fig. 5.15. While these distributed heterogeneous processor networks offer great flexibility, finding the optimal mapping of the application tasks onto the processors of the chip is a considerable challenge.

140 5.2 Variety-Aware Routing Encoding

Figure 5.15 (tiles with cores Core1 to CoreN, memory, and a network interface with RX/TX, each attached to a router R): A heterogeneous tiled many-core architecture [SPG+19].

Moreover, recent research [WGW+14] shows that a deterministic routing approach, such as X-Y routing, which is commonly used in the many-core domain, may render numerous solutions infeasible due to the violation of link capacities, thus limiting the number and the quality of feasible solutions. Exploring all possible routes may, therefore, significantly increase the quality of found solutions. This type of optimization is the focus of our second case study. In these experiments, the goal is to find implementations of the telecommunication application from the E3S benchmark suite [Dic10] which are optimal with respect to energy consumption, makespan, and the number of allocated processors. In order to optimize the non-redundant message routes, we have reimplemented the approach presented in [LSF14]. This approach is used as the baseline for the many-core experiments and is extended with the compact and the exclusive proxy strategies. We perform the optimization experiments for 3 × 3- and 4 × 4-tile architectures.
For both architecture types, we consider the cases where the bandwidth of each inter-tile link is quantized into 6 (hard) or 10 (relaxed) equal shares, hereafter referred to as link capacity, that can be utilized by the routed messages. This link capacity is enforced by additional link capacity constraints. In case of the exclusive approach, which disregards links inside proxy areas, an additional evaluator, namely a capacity evaluator, is used to check the feasibility of implementations in terms of respecting link capacities inside proxy areas.

Figure 5.16 (panels: 3 × 3 relaxed, 3 × 3 hard, 4 × 4 relaxed, 4 × 4 hard; x-axis: run time [s]; y-axis: ε-dominance; curves: baseline, compact, exclusive): Average ε-dominance of 10 optimization runs for the many-core case study. Variety-aware approaches provide a faster convergence towards Pareto fronts in closer proximity to the reference front. The advantage in convergence speed scales with the complexity of the optimization problem.

The results are illustrated in Fig. 5.16. As each processor tile can be seen as a proxy area, using a routing encoding with variety-awareness has a strong impact on the optimization. For a fixed number of iterations, the compact approach always requires a shorter run time than the baseline and offers results of equal or better quality in terms of convergence towards the reference Pareto front. The speed difference between these approaches depends on the difficulty of the constraint resolution. With a link capacity of 10 (relaxed), the 4 × 4 architecture offers a very high number of feasible solutions, so that the constraint resolution is easy and the speedup offered by the proxy approaches is moderate. However, in case of a more challenging problem, such as a smaller architecture or stricter capacity constraints, the speedup grows. Indeed, for the 3 × 3 architecture with a capacity of 6 (hard), the compact approach is nearly 8 times faster than the baseline. Comparing the exclusive and the compact approach offers another interesting insight. With a capacity of 10, finding feasible routes is relatively easy, so that the exclusive approach can create a sufficiently big population of feasible solutions and outperform the other approaches in both run time and convergence towards the reference Pareto front. However, with a capacity of 6, creating feasible solutions is much harder. Here, the exclusive approach wastes a large share of the optimization time creating solutions with capacity violations on links in the proxy areas, which are then rejected by the capacity evaluator. The exclusive approach, therefore, delivers results of poor quality. In contrast, the compact approach is aware of every link in the architecture and offers the possibility to encode constraints restricting their usage. With these link capacity constraints, it is able to explore only feasible solutions without capacity violations and delivers Pareto fronts in close proximity to the reference front while requiring up to 8 times shorter run times than the baseline.
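The following sketch illustrates both the deterministic X-Y routing mentioned above and a capacity check of the kind a capacity evaluator performs. It assumes, hypothetically, a 2D mesh whose tiles are addressed by (x, y) coordinates and that each message occupies exactly one of the `capacity` equal shares on every directed link of its route; the function names are illustrative, and this is not the evaluator used in the experiments.

```python
from collections import Counter
from typing import Dict, List, Tuple

Tile = Tuple[int, int]
Link = Tuple[Tile, Tile]


def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """Deterministic X-Y routing on a 2D mesh: first along X, then along Y."""
    (x, y), (dx, dy) = src, dst
    hops = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops


def capacity_violations(routes: List[List[Tile]], capacity: int) -> Dict[Link, int]:
    """Hypothetical capacity check: directed links whose number of routed
    messages exceeds the capacity (one share per message, an assumption)."""
    load: Counter = Counter()
    for hops in routes:
        load.update(zip(hops, hops[1:]))  # every consecutive hop pair is one link
    return {link: n for link, n in load.items() if n > capacity}
```

Seven messages from tile (0, 0) to tile (2, 1) all follow the same X-Y route and overload its three links at a capacity of 6, whereas rerouting a single message via (0, 1) and (1, 1) removes every violation. A deterministic router cannot produce such an alternative, which is why exploring all possible routes can recover feasible solutions.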

5.2.5 Related Work

A large body of research exists on the optimization of routings during the design of embedded systems. An introduction to the general problem and the different routing algorithms is given in [WH00]. The authors of [GRG+14] provide a detailed performance comparison between Route Preprocessing (RP) and Componential Assembly (CA) approaches and show that RP approaches can outperform CA approaches in sufficiently sparse networks. Yet, in general, RP approaches are not considered to be a good option, as enumerating all routes is NP-hard. To overcome the scalability problem, RP approaches often limit the search space. For instance, the authors of [LPS16] consider only the K shortest paths, while the RP heuristic presented in [GZP+17] is targeted at reusing already allocated links. The majority of existing routing optimization approaches can, however, be viewed as CA approaches. A link-based encoding for Avionics Full-Duplex Switched Ethernet (AFDX) routings is presented in [ABC+13]. The authors of [LSG+09] go a step further and present a constraint set where the routing is optimized together with the mapping and the allocation of an embedded system. Indeed, the easy adaptation to specific design goals is an additional advantage of CA approaches. For example, several works present constraint sets for a joint optimization of routing and transmission schedule [NDR16; MAS+18; SDT+17]. Yet, to the best of our knowledge, existing CA approaches formulate the constraints regardless of the topology. In this work, we, for the first time, show how each of these approaches can be improved by restricting their routing optimization space to the variety areas of the network.

5.3 Conclusions

This chapter introduced means for the injection of objective- and topology-specific knowledge into the network optimization with the goal of a faster optimization convergence to results of higher quality. The first section of this chapter investigated the optimization problems occurring during the reliability optimization of redundant routings and identified the so-called genetic gap (a situation where too many genetic modifications are necessary to reach a better solution, so that the optimization stagnates in a local optimum) as the main cause for the optimization difficulties. As a remedy, we proposed the AGD approach, an extension for SAT-Decoding where problem-specific knowledge is made accessible to the optimizer by extending the used genotype with a set of genes encoding preferences for solutions with features that are highly relevant for the design objectives. The AGD approach was then demonstrated by applying it to the reliability optimization problem and encoding genes for the number of redundant paths and the criticality of the links. The second section proposed a novel strategy for routing optimization which automatically exploits the knowledge about so-called proxy areas in a given network, i.e., regions that do not offer any routing variety. We presented a lightweight algorithm that identifies proxy areas in a given network, proposed two strategies to exploit this knowledge during the optimization, and showed how the presented strategy can be integrated into existing routing encodings.

144 6 Conclusion and Future Work

Upcoming automotive applications necessitate not only high amounts of communication bandwidth, but also strict real-time and reliability guarantees for the message transmission. While its high bandwidth, wide distribution, and real-time- and reliability-specific extensions make the Ethernet Time-Sensitive Networking (TSN) protocol [Ins15; Ins16; Ins17] the natural candidate to address this challenge, the complexity of the design objectives and the size of the communication networks found in the automotive domain call for design automation. This work proposes a methodology for an automated design of Ethernet TSN networks, with a particular focus on the timing and reliability requirements found in the automotive domain. The following section summarizes the key contributions of this thesis and provides perspectives on future work.

6.1 Summary

This work enables a fully automated design and optimization of Ethernet (TSN) networks. To enable a global optimization of system implementations with respect to many and possibly conflicting design objectives, the presented approach is integrated into a framework [LGR+11; RLG+18], which uses a graph-based system model [BTT98; LSG+09] and the SAT-Decoding approach [LGH+07] for the multi-objective optimization. The contributions of this work focus on the challenges of this integration by providing (a) efficient analysis approaches for the timing and reliability of message transmission, (b) constraint sets to characterize the search space of solutions that are valid in the context of automotive communication networks, and (c) techniques to address scalability problems and complex design objectives by exploiting problem-specific knowledge. After an introduction to the topic in Chapter 1, the fundamentals are presented in Chapter 2. Chapters 3–5 then detail the contributions of the thesis. The evaluation of concrete network designs is a crucial part of the optimization process. Chapter 3 presents the evaluation techniques proposed in this thesis. In automotive communication networks, timing is one of the most important network evaluation criteria. The TSN standard [Ins15] introduces mechanisms which enable an interference-free and, therefore, deterministic transmission of critical messages. However, a formal timing analysis remains an important part of the network design, as it is necessary to ensure that the unscheduled messages satisfy their deadlines. While state-of-the-art techniques for formal timing analysis like [DRE12] are capable of providing safe bounds for the interference that unscheduled messages impose on each other, they do not consider the interference imposed by scheduled traffic. As a remedy, Section 3.1 presents an approach (originally proposed in [SGR+17a]) for the formal analysis of the worst-case interference that time-triggered traffic can impose on each unscheduled message in the network. We show how this approach can be integrated into state-of-the-art frameworks for formal timing analysis. Furthermore, the section proposes preprocessing approaches that significantly reduce the computation effort necessary for the evaluation of a considered network design, so that the proposed analysis can be integrated into a Design Space Exploration (DSE) and used for an optimization of the communication network with the message transmission timing as one of multiple design objectives.

A high reliability of message transmission in the presence of errors is another important objective during the design of communication networks. While the evaluation of reliability in the presence of permanent errors is a well-researched topic (see, e.g., [GLR+08; LGT09]), the reliability in the presence of transient errors has, so far, been mostly neglected by the scientific community. In Section 3.2, this thesis makes a first step to close this gap by providing a formal analysis (originally proposed in [SGR+16]) for the worst-case impact that transient errors have on the transmission reliability of messages in communication networks.
This analysis method considers the special features of Ethernet-based automotive networks like static network design, multi-hop communication and, in particular, the fault model where receivers expect to receive a certain number of messages within a specific time interval, the so-called diagnostic test interval [Int11]. The presented analysis combines techniques from the areas of timing and reliability analysis and calculates a safe bound for the transmission reliability of messages which considers faults caused both by the loss of messages due to transmission corruptions and by messages that are delayed by network interference and do not arrive in time. Section 3.2, furthermore, introduces the variation of message sending rates as a novel and cost-efficient way to increase the transmission reliability of messages in automotive networks. The provided reliability analysis can be seamlessly combined with state-of-the-art timing analysis techniques and is computationally cheap, so that it can be integrated into DSE frameworks and used for an automated reliability optimization of automotive communication networks.

The automatic design of communication networks constitutes a particularly challenging optimization problem. Hereby, one of the difficulties is the fact that, while there are a lot of different ways to choose a subset of links from the given network topology, only a small fraction of these subsets constitute valid routings. To cope with this issue, the optimization approach proposed in this thesis is based on a hybrid optimization technique referred to as SAT-Decoding [LGH+07]. There, the usage of a SAT solver as a powerful repair mechanism makes it possible to restrict the evaluation space of an Evolutionary Algorithm (EA) to solutions that satisfy a given constraint set. In Chapter 4, this thesis provides constraint sets that describe routings that are valid with respect to TSN- and automotive-specific conditions, namely scheduled message transmission, Virtual Local Area Network (VLAN) partitioning of the communication network, and the possibility for redundant message routes.

To ensure an interference-free message transmission, Ethernet TSN networks require global transmission schedules. Existing approaches for the generation of transmission schedules like [Ste10] or [SLC13] assume a given message routing and, consequently, make the routing and the scheduling decisions in two completely separated design steps, ignoring the distinct interrelations between routing and scheduling. To enable a joint optimization of routing and scheduling, Section 4.1 proposes a novel constraint set (originally proposed in [SGR+17b]) where the routing and the scheduling constraints are formulated jointly. The experimental results presented in this section give evidence that, compared to existing approaches for the generation of transmission schedules, a joint formulation of routing and scheduling constraints does not impose any overhead in cases with only one routing option and requires a smaller computation effort for the constraint resolution in cases of hard optimization problems. To address the security challenges introduced by novel applications such as Car-2-X, automotive networks require additional security mechanisms. Section 4.2 proposes a so-called VLAN partitioning of the communication network as a novel security mechanism. Hereby, the network is partitioned into multiple virtually separated subnetworks, so-called VLANs.
Messages between end nodes in the same VLAN can be transmitted without restriction. A message that is to be transmitted between nodes in different VLANs, however, must be routed over so-called VLAN routers, special resources that apply higher-layer security protocols. From the area of local and metropolitan area networks, it is known that the VLAN partitioning of a network has a significant impact on various performance parameters. However, existing approaches for the optimization of the VLAN partitioning, such as [RHK99], are tailored towards minimizing the amount of broadcast traffic. They cannot be used for a multi-objective optimization of the numerous design objectives found in the automotive domain. As a remedy, Section 4.2 presents a constraint set (originally proposed in [SRT+18a]) that characterizes routings that are valid with respect to a given VLAN partitioning of the network. We demonstrate how the proposed constraint set can be integrated into a multi-objective optimization to optimize the VLAN partitioning of an automotive communication network.

In addition to the increased security requirements, innovative Advanced Driver Assistance Systems (ADAS) are also characterized by strict reliability requirements. A certain degree of hardware redundancy, i.e., redundant route segments, therefore becomes mandatory for automotive communication networks. To facilitate the implementation of redundant transmission, the TSN standard [Ins17] introduces mechanisms for frame replication and duplicate elimination. Yet, considering redundancy during network design results in a significant expansion of the design space and makes automated approaches for routing optimization even more important. Existing approaches such as [LSG+09; GRG+14; LSF14] provide encodings that characterize valid message routes. However, they do not consider transmission redundancies and cannot be used for the optimization of TSN routings. As a remedy, Section 4.3 presents two approaches for the generation of redundant routings (originally presented in [SRT+18b]). Similar to the approach presented in [GRG+14], the first approach is based on a preprocessing phase in which all redundant routings are found with a Depth-First Search (DFS). During the DSE, a routing is then created by choosing one of the routing options generated during preprocessing. The second approach, on the other hand, applies a methodology similar to [LSF14] and encodes a binary variable for every link. During the DSE, valid assignments of these variables are then created by resolving a constraint set that encodes all valid routings.

Networks keep growing, and the interrelation between design decisions and design objectives keeps becoming more complex. Under these conditions, efficient evaluators and a correct constraint description of the solution space may not be sufficient to obtain (Pareto-)optimal solutions. Domain experts have accumulated a large amount of problem-specific knowledge over the last decades. Approaches that make this knowledge accessible to the optimizer therefore seem especially promising. Chapter 5 proposes two approaches for injecting problem-specific knowledge into a SAT-Decoding-based optimization of automotive communication networks. In Section 5.1, a novel enhancement of the SAT-Decoding approach, referred to as Artificial Gene Design (AGD), is introduced.
In AGD, the genetic representation of problem solutions is extended with so-called Artificial Genes (AGs). These genes encode preferences for solutions with traits that are especially relevant for the design objectives. If activated, AGs act as genetic switches and affect whole groups of other genes. This helps to overcome a problem similar to the Hamming Cliff [RWB+04], the so-called Genetic Gap, where only varying a large number of genes at once results in an improvement with respect to the objectives. After describing the AGD approach and the Genetic Gap problem, Section 5.1 demonstrates how AGD can significantly improve the quality of the solutions obtained during the reliability optimization of redundant routings. The application of the AGD approach is, however, not limited to the optimization of communication networks: the approach can be used to inject problem-specific knowledge into any SAT-Decoding-based optimization.

Section 5.2 presents another approach in which problem-specific knowledge is injected into the optimization process. Here, we propose a novel approach for the encoding of valid routings, which can be applied to any routing optimization. Before the routing constraints are generated, a lightweight preprocessing algorithm identifies so-called proxy areas, i.e., network areas where there is exactly one route between each pair of nodes. Unlike areas with a variety of different routings, proxy areas do not offer any benefit for the routing exploration and can, therefore, be excluded from the DSE. Section 5.2 presents two approaches for the generation of variety-aware routing constraint sets. The exclusive approach completely ignores the proxy areas of the network and offers the greatest reduction of the time required for constraint resolution. In the compact approach, routings within the proxy areas are encoded with a set of constraints formulated with respect to the conditions found within proxy areas. The compact approach therefore enables additional constraints that describe the routes through the proxy areas, while still reducing the time for constraint resolution. Information about proxy areas is gathered during the preprocessing step and used during constraint generation. This injects topology-specific information into the optimization, reduces the optimization run time, and, in some cases, provides results of higher quality.
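As a rough illustration of the kind of lightweight preprocessing described above, the following Python sketch marks the tree-shaped parts of an undirected topology. It repeatedly prunes nodes of degree at most one, which yields the nodes outside the graph's 2-core. Within such a tree-shaped part, there is exactly one route between any two of its nodes. This is a simplified stand-in chosen for illustration, not the proxy-area detection algorithm of Section 5.2. For example, it does not recognize single-route regions inside the remaining core. The function name and node names are hypothetical.

```python
from collections import defaultdict, deque


def tree_shaped_nodes(links):
    """Return the nodes that lie outside the 2-core of an undirected topology.

    Nodes of degree <= 1 are pruned repeatedly; every pruned node belongs to a
    tree-shaped part of the network in which routes are unique.
    """
    neighbors = defaultdict(set)
    for u, v in links:
        neighbors[u].add(v)
        neighbors[v].add(u)

    degree = {node: len(adj) for node, adj in neighbors.items()}
    queue = deque(node for node, d in degree.items() if d <= 1)
    pruned = set()
    while queue:
        node = queue.popleft()
        if node in pruned:
            continue
        pruned.add(node)
        for other in neighbors[node]:
            if other not in pruned:
                degree[other] -= 1
                # A node whose degree drops to one has become a new leaf.
                if degree[other] == 1:
                    queue.append(other)
    return pruned


# Hypothetical topology: a ring of three switches with attached ECUs.
links = [("s1", "s2"), ("s2", "s3"), ("s3", "s1"),
         ("s1", "e1"), ("s2", "e2"), ("s3", "s4"), ("s4", "e3")]
print(sorted(tree_shaped_nodes(links)))  # ['e1', 'e2', 'e3', 's4']
```

In this simplified view, only the remaining switch ring s1, s2, s3 offers alternative routes and would stay relevant for the routing exploration.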

6.2 Directions of Future Work

This last section discusses several possible directions of future research based on the contributions presented in the work at hand. During the preparation of this thesis, the TSN standard was still being finalized, so no TSN-capable hardware was available. Consequently, the provided experimental results were obtained from computer models rather than real hardware. Performing the experiments with real communication networks would be the natural next step to evaluate the proposed optimization approach and could provide further insights for future research.

Providing a holistic reliability analysis for automotive communication networks would be another interesting research opportunity. Combining existing techniques for the analysis of permanent hardware failures with the analysis approach presented in this thesis is a challenging task. However, expressing the two sorts of reliability in a single design objective may enable a fully automated reliability optimization of automotive communication networks with respect to both permanent and transient errors.

Among several possible extensions of the system model, the introduction of so-called operating modes is one of the most promising. Each operating mode defines a group of tasks that is active under specific conditions and may be inactive otherwise. Exploiting knowledge about the mutual exclusion of different operating modes would enable a safe timing analysis that delivers tighter bounds on the worst-case timing while reducing the hardware costs of the system. At the same time, operating modes could be used to model innovative applications such as Car-2-Car or Car-2-X and make them accessible for automated optimization.

Adapting the presented network optimization approaches to the optimization of security aspects is another interesting research opportunity. In a first step, the approach for an automated VLAN partitioning presented in Section 4.2 could be refined by implementing a proper security evaluator. Moreover, the proposed usage of VLANs closely resembles the usage of firewalls in communication networks [Lyn00]. It may therefore be possible to adapt the proposed approach for the VLAN partitioning to the optimization of the firewall configuration of computer networks.

German Part

Entwurf und Evaluierung Ethernet-basierter E/E-Architekturen für latenz- und sicherheitskritische Anwendungen


Summary

In recent years, numerous Advanced Driver Assistance Systems (ADAS) have been introduced in the automotive domain. They significantly increase driving comfort and, above all, driving safety, and are regarded as forerunners of automated driving. These systems are based on a large number of sensors, processors, and actuators that are distributed both logically and spatially within the vehicle and have to exchange considerable amounts of data [Rib03]. Modern driver assistance systems therefore pose an enormous challenge for the in-vehicle network: they not only have strict real-time and reliability requirements but also require high communication bandwidths. Communication protocols established in the automotive domain, such as CAN [Int15b] or FlexRay [RSM+10], are well suited for transmitting safety-critical real-time traffic. However, they do not offer enough bandwidth for the realization of innovative ADAS. The Ethernet protocol [Ins14] is considered one of the most promising candidates to meet the requirements of future automotive communication networks. This protocol was originally developed for use in local area networks. The bandwidth supported by Ethernet is regularly extended by new standards, and due to its wide adoption, the protocol is also characterized by low hardware costs. Because Ethernet does not natively offer hard real-time and reliability guarantees, the new Ethernet TSN standard is currently being developed. It extends the Ethernet protocol with new mechanisms for temporally deterministic [Ins15; Ins16] and redundant data transmission [Ins17].
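To make the idea of redundant transmission [Ins17] more concrete, the following Python sketch shows sequence-number-based duplicate elimination at a receiver. The first copy of each frame is accepted, and later copies arriving over a redundant path are discarded. It is a simplified illustration under stated assumptions and not an implementation of the recovery functions specified in IEEE 802.1CB. The assumptions are that sequence numbers never wrap around and that the history window has a fixed length, with 16 chosen arbitrarily. The class and method names are hypothetical.

```python
class DuplicateEliminator:
    """Accept the first copy of every sequence number, drop later copies.

    Simplifying assumptions: sequence numbers never wrap around, and frames
    older than `history` positions behind the newest accepted frame are
    treated as stale and dropped.
    """

    def __init__(self, history=16):
        self.history = history
        self.newest = None
        self.seen = set()

    def accept(self, seq):
        if self.newest is not None and seq <= self.newest - self.history:
            return False  # too old to be tracked: treat as stale
        if seq in self.seen:
            return False  # duplicate received over the redundant path
        self.seen.add(seq)
        if self.newest is None or seq > self.newest:
            self.newest = seq
            # Forget sequence numbers that have left the history window.
            self.seen = {s for s in self.seen if s > self.newest - self.history}
        return True


# Copies of frames 1-4 arriving over two paths in interleaved order.
arrivals = [1, 1, 2, 3, 2, 3, 4, 4]
receiver = DuplicateEliminator()
print([seq for seq in arrivals if receiver.accept(seq)])  # [1, 2, 3, 4]
```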
The mechanisms of Ethernet TSN seem promising for providing the real-time and reliability guarantees required in the automotive domain. At the same time, they introduce a large number of adjustable parameters that have to be configured during the design of the vehicle network. Designing the in-vehicle network is already a very complex process today, due to the continuous growth of both the network size and the number of messages. These parameters make it even more difficult, so that the design can hardly be done manually anymore and can only be managed by automating the design process.

Some techniques exist for the automated design of conventional Ethernet networks. However, these design approaches cannot be used for the design and optimization of automotive Ethernet (TSN) networks. They do not account for the mechanisms found there, such as time-triggered message transmission, virtually isolated subnetworks, redundant transmission and, in particular, the lack of real-time and reliability guarantees.

In this context, this thesis presents, to the best of our knowledge, the first system-level design approach for automotive Ethernet networks. The design objectives are possibly non-linear and mutually conflicting, and they span a multi-dimensional solution search space. This approach explores that space within a Design Space Exploration (DSE) in order to find not just a single design, but multiple high-quality design alternatives. The proposed approach enables an automated design and evaluation of Ethernet-based electrical/electronic (E/E) architectures, in particular for latency- and safety-critical applications. To implement this approach, this thesis developed new scientific findings in three areas: formal analysis, rule-based restriction of the search space, and the injection of problem-specific knowledge. These are described in Chapters 3–5, while Chapters 1 and 2 contain the introduction and the fundamentals.

Many of the new driver assistance systems are distributed systems. During vehicle development, it is therefore important to give real-time and reliability guarantees not only for the involved processors but also for the communication network that transmits their messages. Providing these guarantees requires approaches for analyzing the real-time capability and the reliability of the transmission. Such approaches also make it possible to assess individual design decisions, e.g., regarding message routing or the prioritization of messages, already in early design phases. Simulation and test methods are often used to investigate the timing properties of a system, but they cannot provide safe real-time guarantees. In the automotive industry, it is common practice to prove the real-time capability of the developed products by means of formal analysis methods. For communication networks that use the conventional Ethernet protocol [Ins14] or Ethernet Audio Video Bridging (AVB) [Ins09], several approaches exist for the formal calculation of an upper bound on the transmission time of messages. However, these approaches cannot be used for a holistic analysis of Ethernet TSN networks, as they do not account for the mechanisms for time-triggered message transmission introduced as part of this standard. This thesis shows how state-of-the-art approaches for the formal timing analysis of Ethernet networks [DRE12] can be extended to capture two effects analytically. The first is the timing behavior of scheduled traffic. The second, in particular, is the extent to which this traffic can delay the transmission of unscheduled traffic. The proposed approach was published in [SGR+17a] and is described in Section 3.1. Besides integrating this approach into a state-of-the-art analysis framework, this section presents several preprocessing techniques that significantly reduce the time required for the timing analysis of Ethernet TSN networks. These techniques make it possible to integrate the presented timing analysis into a design space exploration. There, it can be used to optimize the timing behavior of the messages as one of many objectives.

For the development of safety-critical automotive applications, the analysis of transmission reliability is just as important as the analysis of the timing behavior of messages. The scientific literature contains many works on the reliability analysis of systems under the influence of permanent hardware faults [GLR+08; LGT09]. The analysis of transient transmission errors and their effects on system reliability, however, has so far received little attention in research. Section 3.2 of this thesis presents a formal method, published in [SGR+16], for analyzing the effects of transient transmission errors on the reliability of in-vehicle communication networks. This method was developed specifically for the fault detection mechanism used in these networks. Under this mechanism, the receiver of each message must receive at least one instance of that message within an application-specific time interval, the so-called Diagnostic Test Interval (DTI). To quantify the reliability of systems based on this time-oriented fault detection mechanism, the proposed approach combines analysis techniques from the areas of reliability analysis and timing analysis.
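The interplay of timing and reliability in the DTI-based fault model described above can be illustrated with a deliberately simplified Python sketch. It rests on the following assumptions:

- Bit errors are independent with a given BER, and a frame is lost if any of its bits is corrupted.
- Messages are sent strictly periodically with period T.
- The worst-case response time R is known.
- A violation occurs only if none of the instances guaranteed to arrive within a window of length DTI gets through intact.

Under these assumptions, at least floor((DTI - R) / T) instances are released late enough to arrive inside the window and early enough to arrive before it closes. The function names and the numbers are hypothetical. This sketch is not the analysis of Section 3.2.

```python
import math


def frame_loss_probability(ber, frame_bits):
    """Probability that at least one of `frame_bits` independent bits is corrupted."""
    # 1 - (1 - ber)**frame_bits, computed in a numerically stable way.
    return -math.expm1(frame_bits * math.log1p(-ber))


def guaranteed_instances(period, dti, wcrt):
    """Lower bound on instances released and delivered inside any DTI window."""
    if dti < wcrt:
        return 0
    return math.floor((dti - wcrt) / period)


def dti_violation_bound(ber, frame_bits, period, dti, wcrt):
    """Upper bound on the probability that no intact instance arrives in a DTI window."""
    n = guaranteed_instances(period, dti, wcrt)
    # With n == 0 the result is 1.0: no instance is guaranteed to arrive in time.
    return frame_loss_probability(ber, frame_bits) ** n


# Hypothetical message: 1000-bit frame, BER 1e-7, DTI of 50 ms, WCRT of 5 ms.
for period in (20, 10):
    print(period, guaranteed_instances(period, 50, 5),
          dti_violation_bound(1e-7, 1000, period, 50, 5))
```

Halving the period doubles the number of guaranteed instances here and sharply lowers the bound. In a real network, however, the extra load also increases the response times of other messages, and that trade-off is what the sending-rate optimization has to balance.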
In the proposed approach, a safe bound on the transmission reliability of messages is calculated from a given DTI and a known bit error rate (BER).

Besides the analysis method, this thesis presents a novel and cost-efficient means of increasing system reliability: the optimization of message sending rates. Finding the optimal sending rate of safety-critical messages is a complex optimization task. Sending a message more frequently increases its transmission reliability. At the same time, it increases the interference of this message with other messages in the network, which in turn reduces the transmission reliability of the overall system. The proposed analysis method is therefore integrated into a DSE to enable an automated system design in which the sending rate is optimized with respect to both the transmission reliability and the timing behavior of the messages.

To automate the design process, the exploration must be able to generate new network designs. These designs need a feasible network topology, a feasible message routing, and a correct configuration of the network resources. The scientific literature offers several approaches for generating network designs that are feasible with respect to resource allocation, task binding, and message routing [JPT+10; AGS+13; LSF14]. However, the specific requirements of automotive networks and the new mechanisms of the TSN standard impose new conditions on feasible network designs. The existing approaches are therefore no longer sufficient to exclude infeasible designs from the DSE.

The ports of switches in TSN networks contain so-called transmission gates. With these, a schedule can be configured at a port that defines time intervals in which only high-priority, scheduled traffic can be transmitted over the port. This mechanism can be combined with a global schedule that takes the local schedules of all ports in the network into account. Together, they enable an interference-free and thus deterministic transmission of safety-critical messages. Creating a global schedule that ensures that the time intervals configured for the transmission of messages do not overlap is, however, a challenging task. Existing approaches for generating valid designs of distributed systems [Ste10; SAW+14] treat the generation of the routing and the generation of the transmission schedule separately. In a DSE, this procedure can cause considerable disadvantages, for example, if the routing generated in the first step leads to a system for which no valid schedule can be generated. Section 4.1 of this thesis presents an approach that avoids this problem.
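The requirement that the time windows reserved for scheduled traffic must not overlap can be checked for a single egress port with a short Python sketch. It makes two assumptions. First, times are integers, for example in microseconds. Second, windows are strictly periodic and given as (offset, duration, period) with offset + duration <= period. The sketch expands all windows over the hyperperiod, i.e., the least common multiple of the periods, and checks them for overlaps. It only verifies a given schedule. It does not generate one and does not reflect the joint routing and scheduling encoding of Section 4.1. The function and message names are hypothetical.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def port_schedule_is_conflict_free(windows):
    """Check that periodic gate windows on one port never overlap.

    `windows` maps a message name to (offset, duration, period). Each window is
    assumed to fit into its period, i.e. offset + duration <= period.
    """
    hyper = hyperperiod(period for _, _, period in windows.values())
    slots = []
    for name, (offset, duration, period) in windows.items():
        assert 0 <= offset and offset + duration <= period, name
        for k in range(hyper // period):
            start = offset + k * period
            slots.append((start, start + duration, name))
    slots.sort()
    # Once sorted by start time, any overlap shows up between neighbors.
    return all(nxt[0] >= cur[1] for cur, nxt in zip(slots, slots[1:]))


schedule = {"m1": (0, 2, 10), "m2": (2, 3, 20)}
print(port_schedule_is_conflict_free(schedule))  # True
schedule["m3"] = (11, 2, 20)  # overlaps the second instance of m1 (t = 10 to 12)
print(port_schedule_is_conflict_free(schedule))  # False
```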
The approach of Section 4.1, first published in [SGR+17b], is based on a novel encoding of the problem in which a valid routing is generated together with a matching valid schedule.

Advanced driver assistance systems are being introduced that have full control over the vehicle behavior. As a result, the security of data transmission, in terms of confidentiality, integrity, and availability of the data, is becoming an increasingly important topic in the design of in-vehicle networks. In the so-called VLAN mechanism, the overall network is virtually divided into multiple subnetworks, each referred to as a Virtual Local Area Network (VLAN). This mechanism was originally developed for local and metropolitan area networks to reduce the amount of broadcast traffic. It can also be used to establish multiple subnetworks with different security requirements in an automotive network, which increases the overall security of message transmission. When designing such a network, the network topology, the message routing, and the partitioning of the network into VLANs must be optimized jointly. Only then are the resulting solutions Pareto-optimal with respect to the many design objectives from the automotive domain. Existing approaches for the automatic partitioning of an Ethernet network into VLANs, such as [RHK99], are not applicable here. They aim exclusively at reducing the amount of broadcast traffic and cannot consider any other objectives. Section 4.2 of this thesis presents an approach for the automatic generation of message routings that are valid with respect to a given VLAN partitioning. This approach was first published in [SRT+18a] and makes it possible to optimize the message routing and the VLAN partitioning of the network with respect to arbitrary objectives.

The certification of distributed, safety-critical driver assistance systems requires reliability guarantees. To provide them, a certain degree of redundancy is also necessary in message transmission. Consider a communication network with redundant routings for safety-critical messages. There, the failure of individual links does not always cause the failure of the implemented application, which makes the overall system considerably more reliable. However, considering redundant routings greatly expands the design space and makes the design process considerably more complex, so automating the design process becomes even more important. Section 4.3 of this thesis presents two approaches for generating valid and possibly redundant message routings, which enable the automated design of highly reliable vehicle networks (first published in [SRT+18b]). One of these approaches is based on an extensive preprocessing step that determines all routing options. The other approach explicitly encodes, for each message and each link, whether the link is used for routing that message.
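The first of these approaches, which enumerates routing options in a preprocessing step, can be sketched in Python as follows. A depth-first search lists all simple paths between two nodes. Pairs of paths that share no link are then kept as candidate redundant routings. The sketch assumes an undirected topology given as an adjacency dictionary with hypothetical switch names. It uses full link-disjointness as the redundancy criterion; the actual redundancy conditions and encoding of Section 4.3 may differ. Note that the number of simple paths can grow exponentially with the network size, so this kind of enumeration quickly becomes expensive.

```python
def simple_paths(adjacency, source, target):
    """Enumerate all loop-free paths from source to target with a DFS."""
    paths = []

    def dfs(node, path):
        if node == target:
            paths.append(list(path))
            return
        for nxt in sorted(adjacency[node]):
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(source, [source])
    return paths


def links_of(path):
    """Undirected links traversed by a path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def redundant_routings(adjacency, source, target):
    """All pairs of simple paths that share no link."""
    paths = simple_paths(adjacency, source, target)
    return [(p, q)
            for i, p in enumerate(paths)
            for q in paths[i + 1:]
            if links_of(p).isdisjoint(links_of(q))]


# Hypothetical ring of four switches with a chord between s2 and s4.
ring = {"s1": {"s2", "s4"}, "s2": {"s1", "s3", "s4"},
        "s3": {"s2", "s4"}, "s4": {"s1", "s3", "s2"}}
for primary, backup in redundant_routings(ring, "s1", "s3"):
    print(primary, backup)  # ['s1', 's2', 's3'] ['s1', 's4', 's3']
```

Of the four simple paths between s1 and s3, only the two routes around the ring are link-disjoint. The paths that use the chord share a link with every other path.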
Communication networks keep growing, and the number of configuration parameters keeps increasing. As a result, the optimization of network designs is becoming ever more complex. In complex communication networks, a large part of the search space consists of invalid solutions. For optimizing such networks, this thesis uses an approach referred to as SAT-Decoding [LGH+07]. This approach, described in detail in Section 2.2.2, can drastically reduce the evaluation space by excluding infeasible solutions, so that the DSE delivers high-quality results even without problem-specific knowledge. In complex cases, however, the evaluation space may remain large even after the reduction to valid solutions. In such cases, SAT-Decoding alone may not be sufficient to obtain high-quality optimization results, especially for problems with complicated relations between design decisions and design objectives.

For these cases, this thesis presents an extension of the SAT-Decoding approach, the so-called Artificial Gene Design (AGD). This approach, published in [SRT+18b], is presented in Section 5.1. It is based on the formulation of additional rules for feasible solutions. These rules are used to extend the genotype of the Evolutionary Algorithm with genes that encode objective-specific information. This thesis demonstrates the application of AGD and the resulting improvement of the optimization results on the example of the reliability optimization of redundant message routings.

Finally, Section 5.2 of this thesis presents a novel approach to improve the scalability of rule-based approaches for routing optimization. In this approach, a preprocessing algorithm identifies so-called proxy areas in the given network. Within a proxy area, there is exactly one route between each pair of resources. Unlike areas with a variety of different routes, proxy areas offer no benefit for routing optimization and can therefore be excluded from the encoding of the routings. This considerably shortens the optimization time and yields results of equal or better quality.

Chapter 6 concludes the thesis with a brief summary and an outlook on future research topics.

Bibliography

[ABC+13] Ahmad Al Sheikh, Olivier Brun, Maxime Chéramy, and Pierre-Emmanuel Hladik. Optimal design of virtual links in AFDX networks. Real-Time Systems, 49(3):308–336, 2013.
[AGK+14] Hananeh Aliee, Michael Glaß, Faramarz Khosravi, and Jürgen Teich. An efficient technique for computing importance measures in automatic design of dependable embedded systems. In Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, page 3. ACM, 2014.
[AGS+13] Benjamin Andres, Martin Gebser, Torsten Schaub, Christian Haubelt, Felix Reimann, and Michael Glaß. Symbolic System Synthesis Using Answer Set Programming. In International Conference on Logic Programming and Nonmonotonic Reasoning, pages 79–91. Springer, 2013.
[Alb+04] Amos Albert et al. Comparison of event-triggered and time-triggered concepts with regard to distributed control systems. Embedded world, 2004:235–252, 2004.
[All18] Ethernet Alliance. 2019 Ethernet Roadmap. 2018. https://ethernetalliance.org/the-2019-ethernet-roadmap/.
[ARC+04] Arturo Hernández Aguirre, Salvador Botello Rionda, Carlos A Coello Coello, Giovanni Lizárraga Lizárraga, and Efrén Mezura Montes. Handling constraints using multiobjective optimization concepts. International Journal for Numerical Methods in Engineering, 59(15):1989–2017, 2004.
[ATE14] Philip Axer, Daniel Thiele, and Rolf Ernst. Formal timing analysis of automatic repeat request for switched real-time networks. In Industrial Embedded Systems (SIES), 2014 9th IEEE International Symposium on, pages 78–87. IEEE, 2014.

[AVG+16] Hananeh Aliee, Stefan Vitzethum, Michael Glaß, Jürgen Teich, and Emanuele Borgonovo. Guiding genetic algorithms using importance measures for reliable design of embedded systems. In Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2016 IEEE International Symposium on, pages 53–56. IEEE, 2016.
[BGJ+97] Felice Balarin, Paolo Giusto, Attila Jurecska, Michael Chiodo, Harry Hsieh, Claudio Passerone, Ellen Sentovich, Luciano Lavagno, Bassam Tabbara, Alberto Sangiovanni-Vincentelli, et al. Hardware-software co-design of embedded systems: the POLIS approach. Springer Science & Business Media, 1997.
[BHP+08] Frédéric Boniol, Pierre-Emmanuel Hladik, Claire Pagetti, Frédéric Aspro, and Victor Jégu. A framework for distributing real-time functions. In International Conference on Formal Modeling and Analysis of Timed Systems, pages 155–169, 2008.
[Bir17] Alessandro Birolini. Reliability engineering: theory and practice. Springer, 2017.
[BTT98] Tobias Blickle, Jürgen Teich, and Lothar Thiele. System-level synthesis using evolutionary algorithms. Design Automation for Embedded Systems:23–58, 1998.
[Cly55] Floyd Clymer. Henry’s Wonderful Model T. 1908-1927. Bonanza Books, 1955.
[Coe02] Carlos A Coello Coello. Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Computer methods in applied mechanics and engineering, 191(11-12):1245–1287, 2002.
[CS03] George Constable and Bob Sommervill. A Century of Innovation. Twenty Engineering Achievements that Transformed our Lives. Joseph Henry Press, 2003.
[DDN+11] Farid Daryabar, Ali Dehghantanha, Farhood Norouzi, and Farbod Mahmoodi. Analysis of virtual honeynet and VLAN-based virtual networks. In Humanities, Science & Engineering Research (SHUSER), 2011 International Symposium on, pages 73–77. IEEE, 2011.
[DEA10] Ramez M Daoud, Hany M ElSayed, and Hassanein H Amer. Performance and Reliability of Fault-Tolerant Ethernet Networked Control Systems. INTECH Open Access Publisher, 2010.
[Dic10] Robert Dick. Embedded system synthesis benchmarks suite (E3S). 2010. URL: http://ziyang.eecs.umich.edu/~dickrp/e3sdd/.

[DPA+02] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation:182–197, 2002.
[DRE12] Jonas Diemer, Jonas Rox, and Rolf Ernst. Modeling of Ethernet AVB networks for worst-case timing analysis. MATHMOD, 2012.
[FOF+04] Joaquin Ferreira, Arnaldo Oliveira, Pedro Fonseca, and José Fonseca. An experiment to assess bit error rate in CAN. In Proceedings of 3rd International Workshop of Real-Time Networks (RTN2004), pages 15–18, 2004.
[FS13] Christian M. Fuchs and Stefan Schneele. The evolution of avionics networks from ARINC 429 to AFDX. In 2013.
[GLR+08] Michael Glaß, Martin Lukasiewycz, Felix Reimann, Christian Haubelt, and Jürgen Teich. Symbolic Reliability Analysis and Optimization of ECU Networks. In Proceedings of Design, Automation and Test in Europe, pages 158–163, Munich, Germany. IEEE Computer Society, March 2008.
[GPB+19] David Gessner, Julian Proenza, Manuel Barranco, and Alberto Ballesteros. A fault-tolerant Ethernet for hard real-time adaptive systems. IEEE Transactions on Industrial Informatics, 2019.
[Gre93] Klaus Gresser. An event model for deadline verification of hard real-time systems. In Fifth Euromicro Workshop on Real-Time Systems, pages 118–123. IEEE, 1993.
[GRG+14] Sebastian Graf, Felix Reimann, Michael Glaß, and Jürgen Teich. Towards scalable symbolic routing for multi-objective networked embedded system design and optimization. In Proceedings of CODES+ISSS, 2014.
[GRL+14] Michael Glaß, Felix Reimann, Martin Lukasiewycz, and Faramarz Khosravi. JReliability - The Java-based Reliability Library, 2014.
[Grz07] Andreas Grzemba. MOST. Das Multimedia-Bussystem für den Einsatz im Automobil. Franzis, 2007.
[GSZ+07] Prashant Garimella, Yu-Wei Eric Sung, Nan Zhang, and Sanjay Rao. Characterizing VLAN usage in an operational network. In Proceedings of the 2007 SIGCOMM workshop on Internet network management, pages 305–306. ACM, 2007.

[GTL+17] Michael Glaß, Jürgen Teich, Martin Lukasiewycz, and Felix Reimann. Hybrid optimization techniques for system-level design space exploration. In Handbook of Hardware/Software Codesign. Soonhoi Ha and Jürgen Teich, editors. Springer Netherlands, Dordrecht, 2017, pages 217–246.
[GWE+05] David Greiner, Gabriel Winter, José M Emperador, and Blas Galván. Gray coding in evolutionary multicriteria optimization: application in frame structural optimum design. In International Conference on Evolutionary Multi-Criterion Optimization, pages 576–591. Springer, 2005.
[GZP+17] Voica Gavrilut, Bahram Zarrin, Paul Pop, and Soheil Samii. Fault-tolerant topology and routing synthesis for IEEE time-sensitive networking. In Proceedings of RTNS, 2017.
[HAE17] Robin Hofmann, Leonie Ahrendts, and Rolf Ernst. CPA - compositional performance analysis. In Handbook of Hardware/Software Codesign. Soonhoi Ha and Jürgen Teich, editors. Springer Netherlands, Dordrecht, 2017, pages 721–751.
[HHJ+05] Rafik Henia, Arne Hamann, Marek Jersak, Razvan Racu, Kai Richter, and Rolf Ernst. System level performance analysis - the SymTA/S approach. In Proceedings of the Computers and Digital Techniques, volume 152 of number 2, pages 148–166, 2005.
[HMV+13] Peter Hank, Steffen Müller, Ovidiu Vermesan, and Jeroen Van Den Keybus. Automotive Ethernet: in-vehicle networking and smart mobility. In Design, Automation and Test in Europe (DATE), 2013.
[Ins09] Institute of Electrical and Electronics Engineers. IEEE Standard for Local and Metropolitan Area Networks – Audio Video Bridging (AVB) Systems. Standard, Institute of Electrical and Electronics Engineers, New York, USA, December 2009.
[Ins14] Institute of Electrical and Electronics Engineers. IEEE Standard for Local and metropolitan area networks – Bridges and Bridged Networks. Standard, Institute of Electrical and Electronics Engineers, New York, USA, December 2014.
[Ins15] Institute of Electrical and Electronics Engineers. IEEE Standard for Local and metropolitan area networks – Bridges and Bridged Networks – Amendment 25: Enhancements for Scheduled Traffic. Standard, Institute of Electrical and Electronics Engineers, New York, USA, December 2015.

[Ins16] Institute of Electrical and Electronics Engineers. IEEE Standard for Local and metropolitan area networks – Bridges and Bridged Networks – Amendment 26: Frame Preemption. Standard, Institute of Electrical and Electronics Engineers, New York, USA, December 2016.
[Ins17] Institute of Electrical and Electronics Engineers. IEEE Standard for Local and metropolitan area networks – Frame Replication and Elimination for Reliability. Standard, Institute of Electrical and Electronics Engineers, New York, USA, September 2017.
[Int11] International Organization for Standardization. Road vehicles - Functional Safety - Part 1-9. Standard, International Organization for Standardization, Geneva, CH, December 2011.
[Int15a] International Cablemakers Federation. ICF News. Trends in Automotive Wiring. International Cablemakers Federation, 2015.
[Int15b] International Organization for Standardization. Road vehicles – Controller area network (CAN). Standard, International Organization for Standardization, Geneva, CH, December 2015.
[Int16] International Organization for Standardization. Road vehicles – Local Interconnect Network (LIN). Standard, International Organization for Standardization, Geneva, CH, December 2016.
[JKH05] Arshad Jhumka, Stephan Klaus, and Sorin A Huss. A dependability-driven system-level design approach for embedded systems. In Design, Automation and Test in Europe, 2005. Proceedings, pages 372–377. IEEE, 2005.
[JPT+10] Zai Jian Jia, Andy D Pimentel, Mark Thompson, Tomás Bautista, and Antonio Núñez. NASA: a generic infrastructure for system-level MPSoC design space exploration. In Embedded Systems for Real-Time Multimedia (ESTIMedia), 2010 8th IEEE Workshop on, pages 41–50. IEEE, 2010.
[KDV+97] Bart Kienhuis, Ed Deprettere, Kees Vissers, and Pieter Van Der Wolf. An approach for quantitative analysis of application-specific dataflow architectures. In Application-Specific Systems, Architectures and Processors, 1997. Proceedings., IEEE International Conference on, pages 338–349. IEEE, 1997.
[KLG+18] Claudius F Kratochwil, Yipeng Liang, Jan Gerwin, Joost M Woltering, Sabine Urban, Frederico Henning, Gonzalo Machado-Schiaffino, C Darrin Hulsey, and Axel Meyer. Agouti-related peptide 2 facilitates convergent evolution of stripe patterns across cichlid fish radiations. Science, 362(6413):457–460, 2018.

[KNR+00] Kurt Keutzer, A Richard Newton, Jan M Rabaey, and Alberto Sangiovanni-Vincentelli. System-level design: orthogonalization of concerns and platform-based design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(12):1523–1543, 2000.
[Koo02] Philip Koopman. 32-bit cyclic redundancy codes for internet applications. In Dependable Systems and Networks, 2002. DSN 2002. Proceedings. International Conference on, pages 459–468. IEEE, 2002.
[KRG+14] Faramarz Khosravi, Felix Reimann, Michael Glaß, and Jürgen Teich. Multi-objective local-search optimization using reliability importance measuring. In Proceedings of the 51st Annual Design Automation Conference, pages 1–6. ACM, 2014.
[KSM13] Timo Kiravuo, Mikko Sarela, and Jukka Manner. A survey of Ethernet LAN security. IEEE Communications Surveys & Tutorials, 2013.
[KSS+09] Sunil D Krothapalli, Xin Sun, Yu-Wei E Sung, Suan Aik Yeo, and Sanjay G Rao. A toolkit for automating and visualizing VLAN configuration. In Proceedings of the 2nd ACM workshop on Assurable and usable security configuration, pages 63–70. ACM, 2009.
[KZ12] Way Kuo and Xiaoyan Zhu. Some recent advances on importance measures in reliability. IEEE Transactions on Reliability, 61(2):344–360, 2012.
[LCM84] Shu Lin, Daniel J Costello, and Michael J Miller. Automatic repeat-request error-control schemes. IEEE Communications Magazine, 1984.
[Leh90] John P Lehoczky. Fixed priority scheduling of periodic task sets with arbitrary deadlines. In Real-Time Systems Symposium, 1990. Proceedings., 11th, pages 201–209. IEEE, 1990.
[LGH+07] Martin Lukasiewycz, Michael Glaß, Christian Haubelt, and Jürgen Teich. SAT-decoding in evolutionary algorithms for discrete constrained optimization problems. In Evolutionary Computation, 2007. CEC 2007. IEEE Congress on, pages 935–942. IEEE, 2007.
[LGR+11] Martin Lukasiewycz, Michael Glaß, Felix Reimann, and Jürgen Teich. Opt4J - A Modular Framework for Meta-heuristic Optimization. In Proceedings of the Genetic and Evolutionary Computing Conference (GECCO 2011), pages 1723–1730, 2011.
[LGT+09] Martin Lukasiewycz, Michael Glaß, Jürgen Teich, and Paul Milbredt. FlexRay schedule optimization of the static segment. In Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis, pages 363–372, 2009.

[LGT09] Martin Lukasiewycz, Michael Glaß, and Jürgen Teich. Exploiting Data-Redundancy in Reliability-Aware Networked Embedded System Design. In Proceedings of the 7th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pages 229–238, 2009.
[LL10] Jiajia Liu and Wuwen Lai. Security analysis of VLAN-based virtual desktop infrastructure. In Educational and Network Technology (ICENT), 2010 International Conference on, pages 301–304. IEEE, 2010.
[LP10] Daniel Le Berre and Anne Parrain. The SAT4J library, release 2.2, system description. Journal on Satisfiability, Boolean Modeling and Computation:59–64, 2010.
[LPS16] Sune Mølgaard Laursen, Paul Pop, and Wilfried Steiner. Routing optimization of AVB streams in TSN networks. ACM SIGBED Review, 13(4):43–48, 2016.
[LSF14] Martin Lukasiewycz, Shanker Shreejith, and Suhaib A Fahmy. System simulation and optimization using reconfigurable hardware. In International Symposium on Integrated Circuits (ISIC), pages 468–471, 2014.
[LSG+09] Martin Lukasiewycz, Martin Streubühr, Michael Glaß, Christian Haubelt, and Jürgen Teich. Combined system synthesis and communication architecture exploration for MPSoCs. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 472–477, 2009.
[LTD+02] Marco Laumanns, Lothar Thiele, Kalyanmoy Deb, and Eckart Zitzler. Combining Convergence and Diversity in Evolutionary Multiobjective Optimization. Evolutionary Computation, 10(3):263–282, 2002.
[Lyn00] H Merrill Lynch. Firewall fundamentals. Information Systems Security, 9(5):1–11, 2000.
[MAS+18] Rouhollah Mahfuzi, Amir Aminifar, Soheil Samii, Ahmed Rezine, Petru Eles, and Zebo Peng. Stability-aware integrated routing and scheduling for control applications in Ethernet networks. In Design, Automation and Test in Europe (DATE), number EPFL-CONF-232889, 2018.
[MFH+05] Alexander Metzner, Martin Franzle, Christian Herde, and Ingo Stierand. Scheduling distributed real-time systems by satisfiability checking. In 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pages 409–415, 2005.
[MTE16] Mischa Möstl, Daniel Thiele, and Rolf Ernst. Towards Fail-Operational Ethernet Based in-vehicle Networks. In Proceedings of the 53rd Annual Design Automation Conference, page 53. ACM, 2016.

[NDR16] Naresh Ganesh Nayak, Frank Dürr, and Kurt Rothermel. Time-sensitive Software-defined Network (TSSDN) for Real-time Applications. In Proceedings of the 24th International Conference on Real-Time Networks and Systems. ACM, 2016.
[NS13] Nicolas Navet and Françoise Simonot-Lion. In-vehicle communication networks-a historical perspective and review. Technical report, University of Luxembourg, 2013.
[PB17] Alain Pétrowski and Sana Ben-Hamida. Evolutionary Algorithms. John Wiley & Sons, 2017.
[PM08] Raimond Pigan and Mark Metter. Automating with PROFINET. Industrial communication based on Industrial Ethernet. Publicis Publishing, 2008.
[Por18] Donovan Porter. 100BASE-T1 Ethernet: the evolution of automotive networking, 2018.
[QBH+14] Sophie Quinton, Torsten T Bone, Julien Hennig, Moritz Neukirchner, Mircea Negrean, and Rolf Ernst. Typical worst case response-time analysis and its use in automotive network design. In Proceedings of the 51st Annual Design Automation Conference, pages 1–6. ACM, 2014.
[RG18] Valentina Richthammer and Michael Glaß. On search-space restriction for design space exploration of multi-/many-core systems. In Proceedings of MBMV, 2018.
[RGL+08] Felix Reimann, Michael Glaß, Martin Lukasiewycz, Joachim Keinert, Christian Haubelt, and Jürgen Teich. Symbolic voter placement for dependability-aware system synthesis. In Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis, pages 237–242. ACM, 2008.
[RGS+13] Felix Reimann, Sebastian Graf, Fabian Streit, Michael Glaß, and Jürgen Teich. Timing analysis of Ethernet AVB-based automotive E/E architectures. In Proceedings of the Emerging Technologies & Factory Automation (ETFA), pages 1–8, 2013.
[RHK99] Sean Rooney, Christian Hörtnagl, and Jens Krause. Automatic VLAN creation based on on-line measurement. ACM SIGCOMM Computer Communication Review, 29(3):50–57, 1999.
[Rib03] William B Ribbens. Understanding Automotive Electronics. Sams Understanding Series. Newnes, 2003.
[Ric05] Kai Richter. Compositional Scheduling Analysis Using Standard Event Models. Dissertation, University of Braunschweig, 2005. URL: http://d-nb.info/976951754/34.

[RJE03] Kai Richter, Marek Jersak, and Rolf Ernst. A formal approach to MPSoC performance verification. Computer, 36(4):60–67, 2003.
[RLG+18] Felix Reimann, Martin Lukasiewycz, Michael Glaß, and Fedor Smirnov. OpenDSE – open design space exploration framework, 2018. URL: http://opendse.sourceforge.net/.
[RMS07] Mahmood Rahmani, B Muller-Rathgeber, and Eckehard Steinbach. Error detection capabilities of automotive network technologies and Ethernet-a comparative study. In Intelligent Vehicles Symposium, 2007 IEEE, pages 674–679. IEEE, 2007.
[RSM+10] Richard Regler, Jörg Schlinkheider, Markus Maier, Reinhard Prechler, Eduard Berger, and Leo Pröll. Intelligent electrics / electronics architecture. ATZextra worldwide, 15(11):246–251, 2010.
[RWB+04] Jonathan Rowe, Darrell Whitley, Laura Barbulescu, and Jean-Paul Watson. Properties of gray and binary representations. Evolutionary Computation, 12(1):47–76, 2004.
[Sav97] Carla Savage. A survey of combinatorial gray codes. SIAM Review, 39(4):605–629, 1997.
[SAW+14] Florian Sagstetter, Sidharta Andalam, Peter Waszecki, Martin Lukasiewycz, Hauke Stähle, Samarjit Chakraborty, and Alois Knoll. Schedule integration framework for time-triggered automotive architectures. In Proceedings of the Annual Design Automation Conference (DAC), pages 1–6, 2014.
[SC97] Alice E Smith and David W Coit. Penalty functions. Handbook on Evolutionary Computation, pages C, 5:1–6, 1997.
[Sch18] Georg Schildbach. On the application of ISO 26262 in control design for automated vehicles. arXiv preprint arXiv:1804.04349, 2018.
[SDT+17] Eike Schweissguth, Peter Danielis, Dirk Timmermann, Helge Parzyjegla, and Gero Mühl. ILP-based joint routing and scheduling for time-triggered networks. In Proceedings of RTNS, 2017.
[SE09] Maurice Sebastian and Rolf Ernst. Reliability analysis of single bus communication with real-time requirements. In Dependable Computing, 2009. PRDC’09. 15th IEEE Pacific Rim International Symposium on, pages 3–10. IEEE, 2009.
[SLC13] Florian Sagstetter, Martin Lukasiewycz, and Samarjit Chakraborty. Schedule integration for time-triggered systems. In Design Automation Conference (ASP-DAC), Asia and South Pacific, pages 53–58, 2013.
[Spu00] Charles E. Spurgeon. Ethernet. The Definitive Guide. O’Reilly Media, 2000.

[SSK+10] Xin Sun, Yu-Wei Sung, Sunil D Krothapalli, and Sanjay G Rao. A systematic approach for evolving VLAN designs. In INFOCOM, 2010 Proceedings IEEE, pages 1–9. IEEE, 2010.
[Ste10] Wilfried Steiner. An evaluation of SMT-based schedule synthesis for time-triggered multi-hop networks. In Real-Time Systems Symposium (RTSS), pages 375–384, 2010.
[SWG+17] Tobias Schwarzer, Andreas Weichslgartner, Michael Glaß, Stefan Wildermann, Peter Brand, and Jürgen Teich. Symmetry-eliminating design space exploration for hybrid application mapping on many-core architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2017.
[SWS+16] Florian Sagstetter, Peter Waszecki, Sebastian Steinhorst, Martin Lukasiewycz, and Samarjit Chakraborty. Multischedule synthesis for variant management in automotive time-triggered systems. Transactions on Computer-Aided Design of Integrated Circuits and Systems:637–650, 2016.
[TAE+14] Daniel Thiele, Philip Axer, Rolf Ernst, and Jan R Seyler. Improving formal timing analysis of switched Ethernet by exploiting traffic stream correlations. In Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, page 15. ACM, 2014.
[TCC+15] Sivakumar Thangamuthu, Nicola Concer, Pieter JL Cuijpers, and Johan J Lukkien. Analysis of Ethernet-switch traffic shapers for in-vehicle networking applications. In Proceedings of the Design, Automation & Test in Europe (DATE), pages 55–60, 2015.
[TCN00] Lothar Thiele, Samarjit Chakraborty, and Martin Naedele. Real-time calculus for scheduling hard real-time systems. In Proceedings of the International Symposium on Circuits and Systems (ISCAS), pages 101–104, 2000.
[TE16] Daniel Thiele and Rolf Ernst. Formal worst-case performance analysis of time-sensitive Ethernet with frame preemption. In 2016 IEEE 21st International Conference on Emerging Technologies and Factory Automation (ETFA), pages 1–9. IEEE, 2016.
[TED15] Daniel Thiele, Rolf Ernst, and Jonas Diemer. Formal worst-case timing analysis of Ethernet TSN’s time-aware and peristaltic shapers. In Proceedings of the Vehicular Networking Conference (VNC), pages 251–258, 2015.
[Tei12] Jürgen Teich. Hardware/Software Co-Design: Past, Present, and Predicting the Future. Proceedings of the IEEE, 100:1411–1430, 2012.

[TMA+05] Suleyman Tosun, Nazanin Mansouri, Ercument Arvas, Mahmut Kandemir, and Yuan Xie. Reliability-centric high-level synthesis. In Proceedings of the conference on Design, Automation and Test in Europe-Volume 2, pages 1258–1263. IEEE Computer Society, 2005.
[vDS18] Luc van Dijk and Günter Sporer. Functional safety for automotive Ethernet networks. Journal of Traffic and Transportation Engineering, 6(4):176–182, 2018.
[WGW+14] Andreas Weichslgartner, Deepak Gangadharan, Stefan Wildermann, Michael Glaß, and Jürgen Teich. DAARM: design-time application analysis and run-time mapping for predictable execution in many-core systems. In Proceedings of CODES+ISSS, 2014.
[WH00] Bin Wang and Jennifer C Hou. Multicast routing and its QoS extension: problems, algorithms, and protocols. IEEE Network, 14(1):22–36, 2000.
[Whi99] Darrell Whitley. A free lunch proof for gray versus binary encodings. In Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation-Volume 1, pages 726–733. Morgan Kaufmann Publishers Inc., 1999.
[WLH11] Chenhu Wang, Jian Li, and Fei Hu. Fault tree synthesis for an avionic network. In Transportation, Mechanical, and Electrical Engineering (TMEE), 2011 International Conference on, pages 155–159. IEEE, 2011.
[XLK+07] Yuan Xie, Lin Li, Mahmut Kandemir, Narayanan Vijaykrishnan, and Mary Jane Irwin. Reliability-aware co-synthesis for embedded systems. The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 49(1):87–99, 2007.
[ZT99] Eckart Zitzler and Lothar Thiele. Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Transactions on Evolutionary Computation, 3(4):257–271, 1999.

Author’s Own Publications

[PSW+19] Behnaz Pourmohseni, Fedor Smirnov, Stefan Wildermann, and Jürgen Teich. Isolation-Aware Timing Analysis and Design Space Exploration for Predictable and Composable Many-Core Systems. In 31st Euromicro Conference on Real-Time Systems (ECRTS’19), July 9–12, 2019. Contributions: Introduces a novel timing analysis method that is applicable to a composable many-core system integrating multiple inter-application isolation schemes in combination, as well as a novel DSE approach which is capable of optimizing the choice of the isolation scheme per resource.
[SGR+16] Fedor Smirnov, Michael Glaß, Felix Reimann, and Jürgen Teich. Formal reliability analysis of switched Ethernet automotive networks under transient transmission errors. In Design Automation Conference (DAC), 2016 53rd ACM/EDAC/IEEE, pages 1–6. ACM, 2016. Contributions: Presents a formal analysis approach for the reliability against transient transmission errors in Ethernet networks. Proposes an optimization of the message sending rates as a mechanism to improve transmission reliability.
[SGR+17a] Fedor Smirnov, Michael Glaß, Felix Reimann, and Jürgen Teich. Formal timing analysis of non-scheduled traffic in automotive scheduled TSN networks. In 2017 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1643–1646. IEEE, 2017. Contributions: Proposes a formal analysis approach for the worst-case interference that the scheduled traffic in TSN networks can impose on unscheduled traffic. The analysis speedup provided by the presented preprocessing techniques enables a schedule optimization as part of a DSE.
[SGR+17b] Fedor Smirnov, Michael Glaß, Felix Reimann, and Jürgen Teich. Optimizing message routing and scheduling in automotive mixed-criticality time-triggered networks. In Proceedings of the 54th Annual Design Automation Conference 2017, page 48. ACM, 2017. Contributions: Proposes a constraint set that enables a joint generation of routing and scheduling for Ethernet TSN networks, unleashing the potential for an automatic optimization of routing and scheduling.
[SPG+19] Fedor Smirnov, Behnaz Pourmohseni, Michael Glaß, and Jürgen Teich. Variety-Aware Routing Encoding for Efficient Design Space Exploration of Automotive Communication Networks. In Proceedings of the 5th International Conference on Vehicle Technology and Intelligent Transport Systems (2019). SCITEPRESS, May 3–5, 2019. Contributions: Introduces a preprocessing algorithm that gathers knowledge about the network topology and proposes a novel routing encoding approach that exploits this knowledge to reduce the optimization run time and provide results of higher quality.
[SRT+18a] Fedor Smirnov, Felix Reimann, Jürgen Teich, and Michael Glaß. Automatic optimization of the VLAN partitioning in automotive communication networks. ACM Transactions on Design Automation of Electronic Systems (TODAES), 24(1):9, 2018. Contributions: Proposes VLAN partitioning as a security mechanism for automotive networks. Presents a constraint set encoding routings which are valid with respect to a given VLAN assignment, enabling an automatic optimization of the VLAN partitioning.
[SRT+18b] Fedor Smirnov, Felix Reimann, Jürgen Teich, Zhao Han, and Michael Glaß. Automatic Optimization of Redundant Message Routings in Automotive Networks. In Proceedings of 21st International Workshop on Software and Compilers for Embedded Systems (SCOPES 2018) (Sankt Goar). ACM, May 28–30, 2018. Contributions: Presents a constraint set encoding possibly redundant routings. Proposes AGD as an extension to SAT-Decoding that makes objective-specific traits accessible to the optimizer.

172 List of Symbols

$a_{ij}$ the interference table entry in the i-th row and the j-th column ...... 48
$A$ constant factor used for the activation and the deactivation of the schedule constraints ...... 75
$b$ the bit number of a message frame ...... 55
$B(\mathcal{E})$ the bandwidth of the routing element $\mathcal{E}$ ...... 43
$B_C$ the set of all possible routing graphs of the message $C$ ...... 112
$c$ the number of communication hops ...... 55
$\mathcal{C}(l, C^P)$ function that returns 1 if link $l$ is not critical for the communication flow $C^P$ and returns 0 otherwise ...... 126
$\mathcal{C}(l)$ function that returns 1 if link $l$ is not critical for any communication and returns 0 otherwise ...... 126
$C$ a message node in the application graph ...... 22
$C_b$ encoding variable: (1) if the message $C$ is routed over the routing graph $b$, (0) otherwise ...... 112
$C_R$ encoding variable: (1) if resource $R$ is used for the routing of message $C$ in the implementation, (0) otherwise ...... 29
$C_{l=(R,\tilde{R})}$ encoding variable: (1) if link $l$ is used for the routing of message $C$ in the direction from resource $R$ to resource $\tilde{R}$ in the implementation, (0) otherwise ...... 29
$C^P_{l=(R,\tilde{R})}$ encoding variable: (1) if link $l$ is used for the routing of the communication flow delivering message $C$ to the binding target of process $P$ in the direction from resource $R$ to resource $\tilde{R}$ in the implementation, (0) otherwise ...... 30
$C^P_{v,\tilde{v}}$ encoding variable: (1) if the source of the communication flow $C^P$ of message $C$ to process $P$ is in the VLAN $v$, while its destination is in the VLAN $\tilde{v}$, (0) otherwise ...... 91

$C_{l=(R,\tilde{R}),v}$ encoding variable: (1) if the message $C$ is routed over the directed link $l$ from $R$ to $\tilde{R}$ through the VLAN $v$, (0) otherwise ...... 95
$C^P_{l=(R,\tilde{R}),v}$ encoding variable: (1) if the communication flow of message $C$ to the process $P$ is routed over the directed link $l$ from $R$ to $\tilde{R}$ through the VLAN $v$, (0) otherwise ...... 95
$d$ a dependency edge in the application graph ...... 22
$\delta(C,\mathcal{E},n)$ the distance function of message $C$ at the element $\mathcal{E}$ for $n$ events ...... 42
$d(a_{ij})$ the distance between the start point of the j-th and the i-th interference slot ...... 48
$\xi(C)$ the deadline ratio of message $C$ ...... 103
$\mathcal{E}$ an element (timing component) of the architecture graph, i.e., a resource or a link ...... 40
$E$ the bit error rate ...... 54
$E_d$ the set containing all dependency edges of the application graph ...... 22
$E_l$ the set containing all link edges of the architecture graph ...... 23
$f(a_{ij})$ the overall interference by all interference slots between the start of the i-th and the end of the j-th slot ...... 48
$f_C$ a frame of the message $C$ ...... 75
$F_C$ the set containing all frames of message $C$ ...... 75
$f^n_{C^P}$ encoding variable: (1) if the communication flow of the message $C$ to the binding target of the task $T$ is routed over $n$ redundant paths, (0) otherwise ...... 122
$g$ the number of binary variables used to encode the global offset of messages ...... 76
$G$ an ECU node in the architecture graph ...... 23
$G$ encoding variable: (1) if the ECU $G$ is activated in the implementation architecture, (0) otherwise ...... 29
$G_A$ the application graph ...... 22
$G_R$ the architecture graph ...... 23
$G_{R_C}$ the routing graph of the message $C$ ...... 24
$h$ the length of the schedule hyper-period ...... 47
$h_l$ encoding variable: (1) if the failure of link $l$ leads to a system failure, (0) otherwise ...... 123

$h_l^{C^P}$ encoding variable: (1) if the failure of link $l$ leads to a failed transmission of message $C$ to the binding target of process $P$, (0) otherwise ...... 123
$i(C,\mathcal{E})$ inter-arrival distance of the message $C$ at the element $\mathcal{E}$ ...... 41
$j(C,\mathcal{E})$ jitter of the message $C$ at the element $\mathcal{E}$ ...... 41
$k_G$ a VLAN configuration of ECU $G$ ...... 90
$k_G$ encoding variable: (1) if the VLAN configuration $k$ is chosen for ECU $G$, (0) otherwise ...... 90
$K_G$ the set of possible VLAN configurations of ECU $G$ ...... 90
$l$ a link edge in the architecture graph ...... 23
$l$ encoding variable: (1) if the link $l$ is activated in the implementation architecture, (0) otherwise ...... 29
$L^C_{l,\tilde{l}}$ encoding variable: (1) if message $C$ is routed over link $l$ first and then over link $\tilde{l}$, (0) otherwise ...... 78
$l(C,\mathcal{E})$ delay of the message $C$ at the element $\mathcal{E}$ ...... 41
$\Delta l(\mathcal{E},C)$ the delay contribution of the element $\mathcal{E}$ when transmitting message $C$ ...... 43
$m$ a mapping edge in the specification ...... 23
$m$ encoding variable: (1) if mapping $m$ is activated in the implementation, (0) otherwise ...... 29
$M$ the number of genes changed by the optimizer between two subsequent generations ...... 119
MTTDE the mean time to detected error ...... 58
MTTF the mean time to failure ...... 54
$\mathrm{MTTF}_S$ mean time to failure of the considered system ...... 125
MTTRE the mean time to residual error ...... 58
$\eta(C,\mathcal{E},t)$ the arrival function of message $C$ at the element $\mathcal{E}$ during the time interval $t$ ...... 42
$n^-$ the minimal number of arriving events ...... 61
$N_C$ the set containing all message nodes of the application graph ...... 22
$N_G$ a set containing all ECU nodes of the architecture graph ...... 23
$N_r(q_{v,\tilde{v}})$ the router resources used in the router sequence $q_{v,\tilde{v}}$ ...... 91

$N_P$ the set containing all process nodes of the application graph ...... 22
$N_P^-(C)$ the set of the predecessor processes of the message $C$ ...... 91
$N_P^+(C)$ the set of the successor processes of the message $C$ ...... 91
$N_R$ the set containing all resource nodes of the architecture graph ...... 23
$N_S$ a set containing all switch nodes of the architecture graph ...... 23
$N_T$ the set containing all task nodes of the application graph ...... 22
$O_{f_C^i, f_{\tilde{C}}^j}$ encoding variable: (1) if the i-th frame of message $C$ arrives before the j-th frame of message $\tilde{C}$, (0) otherwise ...... 75
$o_C$ the global offset of the scheduled message $C$ ...... 74
$\theta$ the minimal number of genetic changes required for an improvement with respect to the design objective ...... 119
$O$ primary optimization objective ...... 125
$\tilde{O}$ secondary optimization objective ...... 126
$P(C)$ the frame size of the message $C$ ...... 43
$p(C)$ period of the message $C$ ...... 41
$P$ a process node in the application graph ...... 22
$P_C$ the probability for the corruption detection ...... 57
$P_D$ the probability of a detected error ...... 58
$P_E$ the probability for a corruption leading to i bit flips ...... 57
$P_f$ the faultlessness probability of a time interval ...... 64
$P_R$ the probability for a residual error ...... 57
$P_S$ the probability that no bit flips occur during the transmission over $c$ hops ...... 57
$q_{v,\tilde{v}}$ a VLAN router sequence that connects the VLAN $v$ to the VLAN $\tilde{v}$ ...... 91
$Q_{v,\tilde{v}}$ the set of all VLAN router sequences that connect the VLAN $v$ to the VLAN $\tilde{v}$ ...... 91
$q^{C^P}_{v,\tilde{v}}$ encoding variable: (1) if the router sequence $q_{v,\tilde{v}}$ is chosen for the VLAN transmission of the communication flow transmitting message $C$ to the binding target of process $P$, (0) otherwise ...... 94
$r_{C,l}$ the run offset that the message $C$ accumulates before reaching the link $l$ ...... 75

$R(l)$ a function that returns 1 if link $l$ is allocated in the considered implementation and returns 0 otherwise ...... 126
$R$ a resource node in the architecture graph ...... 23
$R$ encoding variable: (1) if the resource $R$ is activated in the implementation architecture, (0) otherwise ...... 29
$S$ a switch node in the architecture graph ...... 23
$\tau_{\mathcal{E}}$ the timing function of partial component PC ...... 40
$t$ a time interval ...... 42
$t_D$ the diagnostic test interval ...... 53
$t_D$ the dependability period ...... 54
$t_N$ the negative burst phase ...... 54
$t_R$ the recovery phase ...... 54
$t^{i}_{f_C,l}$ the time interval between the arrival of the first and the i-th frame of message $C$ at the link $l$ ...... 75
$t_T^{C,l}$ the transmission time of message $C$ on link $l$ ...... 75
$\dot{t}$ a certain point in time ...... 43
$\dot{t}_s$ the start of a time interval ...... 65
$\dot{t}_e$ the end of a time interval ...... 65
$\dot{t}_{sd}$ the start of the dominance interval ...... 64
$\dot{t}_{ed}$ the end of the event area ...... 64
$\dot{t}_{dd}$ the point when the first dominant message arrives ...... 64
$\dot{t}_f^{C,l}$ the point in time when the first frame of the message $C$ arrives at the link $l$ ...... 75
$\dot{t}_{si}$ the start of an interference slot ...... 45
$\dot{t}_{ei}$ the end of an interference slot ...... 45
$T$ a task node in the application graph ...... 22
$u(C)$ deadline of the message $C$ ...... 103
$\nu(C,\mathcal{E})$ timing model of the message $C$ at the element $\mathcal{E}$ ...... 41
$\upsilon^+(t)$ the worst-case scheduled interference that can occur within the time interval $t$ ...... 45

$v_G$ encoding variable: (1) if the ECU $G$ is a member of VLAN $v$, (0) otherwise ...... 90
$v_P$ encoding variable: (1) if the process $P$ is associated with the VLAN $v$, (0) otherwise ...... 90
$X_R$ the set of neighbors of resource $R$, i.e., the resources connected to $R$ by a link ...... 95
$y$ an edge of the VLAN transmission graph ...... 92
$Y$ the set of all edges of the VLAN transmission graph ...... 92
$z$ a node of the VLAN transmission graph ...... 92
$Z$ the set of all nodes of the VLAN transmission graph ...... 92

178 Acronyms

ABCRT Actual Best-Case Response Time ...... 39
ADAS Advanced Driver Assistance Systems ...... iii, 1, 14, 53, 72, 138, 147
AFDX Avionics Full-Duplex Switched Ethernet ...... 9, 67, 143
AG Artificial Gene ...... 116, 148
AGD Artificial Gene Design ...... iv, 7, 116, 148
ARQ Automatic Repeat ReQuest ...... 67
AV Artificial Variable ...... 121
AVB Audio Video Bridging ...... 4, 9
AWCRT Actual Worst-Case Response Time ...... 39

BCRT Best-Case Response Time ...... 38
BER Bit Error Rate ...... 4, 56

CA Componential Assembly ...... 132
CAN Controller Area Network ...... 2, 11, 60
CPA Compositional Performance Analysis ...... 38
CRC Cyclic Redundancy Check ...... 55

DBA Dominance-based Approach ...... 48
DFS Depth-First-Search ...... 111, 148
DSE Design Space Exploration ...... iii, 4, 20, 35, 71, 146
DTI Diagnostic Test Interval ...... 4

E/E electric/electronic ...... iii
EA Evolutionary Algorithm ...... 7, 20, 51, 71, 116, 147
ECU Electronic Control Unit ...... 1, 11, 40, 80, 123, 140
EMF Electromagnetic Field ...... 54
ES Exact Solution ...... 47

FIFO first-in-first-out ...... 14, 44

GB Guard Band ...... 17, 45, 84

HP Hyper-Period ...... 37, 75

ILP Integer Linear Programming ...... 85
IS Interference Slot ...... 45, 46
ISs Interference Slots ...... 37

LE Link-Encoding approach ...... 127
LE-AGD Link-Encoding approach in combination with AGD ...... 127, 128, 129, 130
LIN Local Interconnect Network ...... 2

MOST Media Oriented Systems Transport ...... 2, 11
MTTF Mean Time To Failure ...... 125

NA Naive Approach ...... 47
NoC Network-on-Chip ...... 141

OBCRT Observed Best-Case Response Time ...... 39
OEM Original Equipment Manufacturer ...... 36
OWCRT Observed Worst-Case Response Time ...... 39

PB Pseudo-Boolean ...... 26, 71
PH Preemption Header ...... 45

PP Preprocessing approach ...... 127
PP-AGD Preprocessing approach in combination with AGD ...... 127, 128

RA Refined Analysis ...... 61
RP Route Preprocessing ...... 132
RTC Real-Time Calculus ...... 38

SI Scheduled Interference ...... 37
SMT Satisfiability Modulo Theories ...... 74
SV Structural Variable ...... 121

TAS Time-Aware Shaper ...... 15, 35
TDMA Time-Division Multiple Access ...... 15
TSN Time-Sensitive Networking ...... iii, 3, 9, 35, 71, 114, 145
TU Time Unit ...... 49

UDP User Datagram Protocol ...... 10

VLAN Virtual Local Area Network ...... iv, 5, 10, 72, 115, 147

WCRT Worst-Case Response Time ...... 38

Curriculum Vitæ

Fedor Smirnov was born in 1989 in Udomlja in the Russian Federation. After his family moved to Germany in 2000, Fedor grew up and went to school in Erlangen, where he earned his Abitur (university-entrance diploma) at the Emmy-Noether Gymnasium in 2008. He started his studies in 2008 and received his Bachelor of Science degree in Mechatronics from the Friedrich-Alexander University of Erlangen-Nürnberg, Germany, in 2011. Since receiving his Master of Science degree in Mechatronics in 2014, Fedor has been working as a scientific researcher and Ph.D. student at the Chair of Hardware/Software Co-Design in the Department of Computer Science at the Friedrich-Alexander University Erlangen-Nürnberg (FAU) under the supervision of Prof. Dr.-Ing. Jürgen Teich. During the first three years of his doctoral studies, Fedor was involved in an INI.FAU cooperation project between FAU and AUDI AG. Fedor has served as a reviewer for several international conferences and journals. His main research interests include multi-objective design space exploration of embedded, in particular automotive, systems and the optimization of constrained combinatorial problems.
