Deadlock Detection in Distributed Databases

Total Page:16

File Type:pdf, Size:1020Kb

Deadlock Detection in Distributed Databases Deadlock Detection in Distributed Databases EDGAR KNAPP Department of Computer Sciences, University of Texas at Austin, Austin, Texas 78712 The problem of deadlock detection in distributed systems has undergone extensive study. An important application relates to distributed database systems. A uniform model in which published algorithms can be cast is given, and the fundamental principles on which distributed deadlock detection schemes are based are presented. These principles represent mechanisms for developing distributed algorithms in general and deadlock detection schemes in particular. In addition, a hierarchy of deadlock models is presented; each model is characterized by the restrictions that are imposed upon the form resource requests can assume. The hierarchy includes the well-known models of resource and communication deadlock. Algorithms are classified according to both the underlying principles and the generality of resource requests they permit. A number of algorithms are discussed in detail, and their complexity in terms of the number of messages employed is compared. The point is made that correctness proofs for such algorithms using operational arguments are cumbersome and error prone and, therefore, that only completely formal proofs are sufficient for demonstrating correctness. Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems-distributed applications; distributed databases; network operating systems; D.4.1 [Operating Systems]: Process Management-concurrency; deadlocks; synchronization; D.4.7 [Operating Systems]: Organization and Design-distributed systems; H.2.4 [Database Management]: Systems-distributed systems; transaction processing General Terms: Algorithms Additional Key Words and Phrases: Deadlock detection, deadlock models, distributed deadlocks INTRODUCTION selection of a so-called victim whose roll- back or abortion breaks the deadlock, and Deadlock detection is an important prob- finally, deadlock resolution itself. This pa- lem in database systems (DBSs), and much per is concerned only with the aspect of attention has been devoted to it in the deadlock detection. Recent developments research community. Generally speaking, a in the area of distributed deadlock detec- deadlock situation is the possible result of tion algorithms are surveyed, with a special competition for resources, such as multiple emphasis on their relation to distributed database transactions requesting exclusive DBSs. The paper introduces a uniform access to data items. framework for the discussion of these al- The deadlock problem has several inter- gorithms. The abstraction achieved this esting components. Among these are dead- way allows us to talk about the algorithms lock prevention, deadlock avoidance, and- in terms of the underlying theoretical con- in connection with deadlock detection-the cepts, instead of just giving a phenomeno- Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1988 ACM 0360-0300/87/1200-0303 $1.50 ACM Computing Surveys, Vol. 19, No. 4, December 1987 304 l Edgar Knapp CONTENTS developed. Section 2 gives a systematic classification of the models of deadlock as they appear in database applications. A hierarchy of models that gives rise to one INTRODUCTION way of classifying most of the distributed 1. THE DEADLOCK PROBLEM deadlock detection procedures found in the 1.1 A Brief Introduction to Concurrency Control literature is introduced. Another classifi- 1.2 Deadlock in Centralized Systems cation, focusing on the theoretical princi- 1.3 Deadlock in Distributed Databases 1.4 The Database Model ples underlying the work in distributed 1.5 A Specification of the Deadlock Problem deadlock detection schemes, is given in Sec- 1.6 Centralized versus Distributed Deadlock tion 3. A survey of a number of algorithms Detection can be found in Section 4, with examples 2. MODELS OF DEADLOCK 2.1 One-Resource Model from each of the classes introduced in the 2.2 AND Model two previous sections. In the final section 2.3 OR Model the relative merits of the algorithms pre- 2.4 AND-OR Model sented are discussed. The references con- 2.5 (;) Model tain an exhaustive list on the work done in 2.6 Unrestricted Model 3. CLASSES OF DISTRIBUTED DEADLOCK the field of distributed deadlock detection DETECTION ALGORITHMS after 1980. Earlier papers included consti- 3.1 Path-Pushing Algorithms tute “classical articles” related to the 3.2 Edge-Chasing Algorithms subject. 3.3 Diffusing Computations 3.4 Global State Detection 4. A SURVEY OF SELECTED ALGORITHMS 4.1 Obermarck’s Path-Pushing Algorithm 1. THE DEADLOCK PROBLEM 4.2 Mitchell and Merritt’s Algorithm for the Single-Resource Model 1.1 A Brief Introduction to Concurrency 4.3 Chandy and Misra’s Algorithm Control for the AND Model 4.4 Chandy, Misra, and Haas’s Algorithm The deadlock problem in DBSs is part of for the OR Model the area of concurrency control.’ Concur- 4.5 Hermann and Chandy’s Algorithm for the AND-OR Model rency control deals with the problem of 4.6 Brscha and Toueg’s Algorithm coordinating the actions of processes that for the (;) Model operate in parallel, access shared data, and 5. DISCUSSION therefore potentially interfere with each ACKNOWLEDGMENTS other. The object of study is an abstraction REFERENCES BIBLIOGRAPHY (model) of many different types of infor- mation systems. The main component of this model is the transaction. Informally, a transaction is an execution of a program that accesses a shared database. In our logical description of the workings of the model transactions are characterized by a algorithms (cf. Elmagarmid [1986]). sequence of operations, for example, R(x) The paper is organized as follows. Sec- denoting the operation of reading some tion 1 focuses on the relationship between data item x from the database and W(X) the deadlock problem and DBSs. For the standing for the operation of assigning a benefit of those readers not familiar with new value to data item x in the database. the necessary database terminology, a brief When two or more transactions execute outline of the relevant concepts is given. concurrently, their database operations ex- For a more thorough treatment of this ma- ecute in an interleaved fashion. That is, terial, the reader is referred to recent operations from one transaction may exe- texts on concurrency control, for example, cute in between two operations of another Bernstein et al. [1987] and Papadimitriou transaction. This interleaving can cause [1987]. Next, a database model is pre- sented, and a specification of the deadlock 1 Part of Section 1.1 follows the introductory chapter detection problem in terms of this model is of Bernstein et al. 11987 1. ACM Computing Surveys, Vol. 19, No. 4, December 1987 Deadlock Detection in Distributed Databases 305 transactions to behave incorrectly or inter- Hence both transactions are blocked, wait- fere, thereby leading to an inconsistent ing for each other: a deadlock situation. database state. Informally, deadlock in a DBS can be The part of the DBS that controls the defined as “a situation in which each trans- relative order in which database operations action in a set of transactions is blocked requested by transactions execute is called waiting for another transaction in the set, the scheduler. The scheduler determines and therefore none will become unblocked the interleaving of database operations unless there is external intervention” (cf. such that the consistency of the database Bernstein et al. [1987]). is preserved. A particular such interleaving Even though some concurrency control of database operations is called a schedule. protocols are provably deadlock free (e.g., As an example, consider the schedule conservative 2PL,3 tree locking), most given below involving two transactions tl known protocols are vulnerable to dead- and t2 and two entities x and y (the notation lock. We next look at a number of other follows Bernstein et al. [1987] and Papa- ways in which deadlock can arise in a DBS. dimitriou [ 19871): Let us consider the case of multiversion schedulers, where each write operation on t1: R(x) W(Y) some data item produces a new version of tz : R(Y) W(x) this item, and each read operation of an This schedule formalizes the following se- item is mapped to some version of this item quence of database operations: transaction that was written previously. In the proto- tl first reads database item x, then t2 reads col for a multiversion-view-serializability y, followed by tl writing y, and finally t2 (MV-VSR) scheduler, the last step of a writing x. transaction is treated in a special way to ensure that at most one uncommitted 1.2 Deadlock in Centralized Systems version exists for any data item in the database. If such a scheduler is given the There are many different known strategies schedule of the previous example, the for schedulers for solving the problem of following happens: concurrent accesses without compromising database consistency. A detailed discussion tl reads x, of these strategies is beyond the scope of t:! reads y, this paper. The interested reader is referred tl waits for t2 to finish (commit), to Bernstein et al. [1987] and Papadimi- t2 waits for tl to finish (commit), triou [ 1987 1. The most popular of these strategies is and, again, the result is deadlock. so-called locking. Locking is the strategy of Lock conversion is a concurrency control reserving access rights (locks) that prevent technique that allows upgrading of a lock other transactions from obtaining certain to a stronger lock type, such as converting other (conflicting) locks. a read lock on a data item into a write lock As an example, consider a protocol called on the same item.
Recommended publications
  • An Aggregate Computing Approach to Self-Stabilizing Leader Election
    An Aggregate Computing Approach to Self-Stabilizing Leader Election Yuanqiu Mo Jacob Beal Soura Dasgupta University of Iowa Raytheon BBN Technologies University of Iowa⇤ Iowa City, Iowa 52242 Cambridge, MA, USA 02138 Iowa City, Iowa 52242 Email: [email protected] Email: [email protected] Email: [email protected] Abstract—Leader election is one of the core coordination 4" problems of distributed systems, and has been addressed in 3" 1" many different ways suitable for different classes of systems. It is unclear, however, whether existing methods will be effective 7" 1" for resilient device coordination in open, complex, networked 3" distributed systems like smart cities, tactical networks, personal 3" 0" networks and the Internet of Things (IoT). Aggregate computing 2" provides a layered approach to developing such systems, in which resilience is provided by a layer comprising a set of (a) G block (b) C block (c) T block adaptive algorithms whose compositions have been shown to cover a large class of coordination activities. In this paper, Fig. 1. Illustration of three basis block operators: (a) information-spreading (G), (b) information aggregation (C), and (c) temporary state (T) we show how a feedback interconnection of these basis set algorithms can perform distributed leader election resilient to device topology and position changes. We also characterize a key design parameter that defines some important performance a separation of concerns into multiple abstraction layers, much attributes: Too large a value impairs resilience to loss of existing like the OSI model for communication [2], factoring the leaders, while too small a value leads to multiple leaders.
    [Show full text]
  • A Theory of Fault Recovery for Component-Based Models⋆
    A Theory of Fault Recovery for Component-based Models? Borzoo Bonakdarpour1, Marius Bozga2, and Gregor G¨ossler3 1 School of Computer Science, University of Waterloo Email: [email protected] 2 VERIMAG/CNRS, Gieres, France Email: [email protected] 3 INRIA-Grenoble, Montbonnot, France Email: [email protected] Abstract. This paper introduces a theory of fault recovery for component- based models. We specify a model in terms of a set of atomic components incrementally composed and synchronized by a set of glue operators. We define what it means for such models to provide a recovery mechanism, so that the model converges to its normal behavior in the presence of faults (e.g., in self-stabilizing systems). We present a sufficient condition for in- crementally composing components to obtain models that provide fault recovery. We identify corrector components whose presence in a model is essential to guarantee recovery after the occurrence of faults. We also for- malize component-based models that effectively separate recovery from functional concerns. We also show that any model that provides fault recovery can be transformed into an equivalent model, where functional and recovery tasks are modularized in different components. Keywords: Fault-tolerance; Transformation; Separation of concerns; BIP 1 Introduction Fault-tolerance has always been an active line of research in design and imple- mentation of dependable systems. Intuitively, tolerating faults involves providing a system with the means to handle unexpected defects, so that the system meets its specification even in the presence of faults. In this context, the notion of spec- ification may vary depending upon the guarantees that the system must deliver in the presence of faults.
    [Show full text]
  • Deadlock: Why Does It Happen? CS 537 Andrea C
    UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department Deadlock: Why does it happen? CS 537 Andrea C. Arpaci-Dusseau Introduction to Operating Systems Remzi H. Arpaci-Dusseau Informal: Every entity is waiting for resource held by another entity; none release until it gets what it is Deadlock waiting for Questions answered in this lecture: What are the four necessary conditions for deadlock? How can deadlock be prevented? How can deadlock be avoided? How can deadlock be detected and recovered from? Deadlock Example Deadlock Example Two threads access two shared variables, A and B int A, B; Variable A is protected by lock x, variable B by lock y lock_t x, y; How to add lock and unlock statements? Thread 1 Thread 2 int A, B; lock(x); lock(y); A += 10; B += 10; lock(y); lock(x); Thread 1 Thread 2 B += 20; A += 20; A += 10; B += 10; A += B; A += B; B += 20; A += 20; unlock(y); unlock(x); A += B; A += B; A += 30; B += 30; A += 30; B += 30; unlock(x); unlock(y); What can go wrong?? 1 Representing Deadlock Conditions for Deadlock Two common ways of representing deadlock Mutual exclusion • Vertices: • Resource can not be shared – Threads (or processes) in system – Resources (anything of value, including locks and semaphores) • Requests are delayed until resource is released • Edges: Indicate thread is waiting for the other Hold-and-wait Wait-For Graph Resource-Allocation Graph • Thread holds one resource while waits for another No preemption “waiting for” wants y held by • Resources are released voluntarily after completion T1 T2 T1 T2 Circular
    [Show full text]
  • The Dining Philosophers Problem Cache Memory
    The Dining Philosophers Problem Cache Memory 254 The dining philosophers problem: definition It is an artificial problem widely used to illustrate the problems linked to resource sharing in concurrent programming. The problem is usually described as follows. • A given number of philosopher are seated at a round table. • Each of the philosophers shares his time between two activities: thinking and eating. • To think, a philosopher does not need any resources; to eat he needs two pieces of silverware. 255 • However, the table is set in a very peculiar way: between every pair of adjacent plates, there is only one fork. • A philosopher being clumsy, he needs two forks to eat: the one on his right and the one on his left. • It is thus impossible for a philosopher to eat at the same time as one of his neighbors: the forks are a shared resource for which the philosophers are competing. • The problem is to organize access to these shared resources in such a way that everything proceeds smoothly. 256 The dining philosophers problem: illustration f4 P4 f0 P3 f3 P0 P2 P1 f1 f2 257 The dining philosophers problem: a first solution • This first solution uses a semaphore to model each fork. • Taking a fork is then done by executing a operation wait on the semaphore, which suspends the process if the fork is not available. • Freeing a fork is naturally done with a signal operation. 258 /* Definitions and global initializations */ #define N = ? /* number of philosophers */ semaphore fork[N]; /* semaphores modeling the forks */ int j; for (j=0, j < N, j++) fork[j]=1; Each philosopher (0 to N-1) corresponds to a process executing the following procedure, where i is the number of the philosopher.
    [Show full text]
  • CSC 553 Operating Systems Multiple Processes
    CSC 553 Operating Systems Lecture 4 - Concurrency: Mutual Exclusion and Synchronization Multiple Processes • Operating System design is concerned with the management of processes and threads: • Multiprogramming • Multiprocessing • Distributed Processing Concurrency Arises in Three Different Contexts: • Multiple Applications – invented to allow processing time to be shared among active applications • Structured Applications – extension of modular design and structured programming • Operating System Structure – OS themselves implemented as a set of processes or threads Key Terms Related to Concurrency Principles of Concurrency • Interleaving and overlapping • can be viewed as examples of concurrent processing • both present the same problems • Uniprocessor – the relative speed of execution of processes cannot be predicted • depends on activities of other processes • the way the OS handles interrupts • scheduling policies of the OS Difficulties of Concurrency • Sharing of global resources • Difficult for the OS to manage the allocation of resources optimally • Difficult to locate programming errors as results are not deterministic and reproducible Race Condition • Occurs when multiple processes or threads read and write data items • The final result depends on the order of execution – the “loser” of the race is the process that updates last and will determine the final value of the variable Operating System Concerns • Design and management issues raised by the existence of concurrency: • The OS must: – be able to keep track of various processes
    [Show full text]
  • Supervision 1: Semaphores, Generalised Producer-Consumer, and Priorities
    Concurrent and Distributed Systems - 2015–2016 Supervision 1: Semaphores, generalised producer-consumer, and priorities Q0 Semaphores (a) Counting semaphores are initialised to a value — 0, 1, or some arbitrary n. For each case, list one situation in which that initialisation would make sense. (b) Write down two fragments of pseudo-code, to be run in two different threads, that experience deadlock as a result of poor use of mutual exclusion. (c) Deadlock is not limited to mutual exclusion; it can occur any time its preconditions (especially hold-and-wait, cyclic dependence) occur. Describe a situation in which two threads making use of semaphores for condition synchronisation (e.g., in producer-consumer) can deadlock. int buffer[N]; int in = 0, out = 0; spaces = new Semaphore(N); items = new Semaphore(0); guard = new Semaphore(1); // for mutual exclusion // producer threads while(true) { item = produce(); wait(spaces); wait(guard); buffer[in] = item; in = (in + 1) % N; signal(guard); signal(items); } // consumer threads while(true) { wait(items); wait(guard); item = buffer[out]; out =(out+1) % N; signal(guard); signal(spaces); consume(item); } Figure 1: Pseudo-code for a producer-consumer queue using semaphores. 1 (d) In Figure 1, items and spaces are used for condition synchronisation, and guard is used for mutual exclusion. Why will this implementation become unsafe in the presence of multiple consumer threads or multiple producer threads, if we remove guard? (e) Semaphores are introduced in part to improve efficiency under contention around critical sections by preferring blocking to spinning. Describe a situation in which this might not be the case; more generally, under what circumstances will semaphores hurt, rather than help, performance? (f) The implementation of semaphores themselves depends on two classes of operations: in- crement/decrement of an integer, and blocking/waking up threads.
    [Show full text]
  • Cs8603-Distributed Systems Syllabus Unit 3 Distributed Mutex & Deadlock Distributed Mutual Exclusion Algorithms: Introducti
    CS8603-DISTRIBUTED SYSTEMS SYLLABUS UNIT 3 DISTRIBUTED MUTEX & DEADLOCK Distributed mutual exclusion algorithms: Introduction – Preliminaries – Lamport‘s algorithm –Ricart-Agrawala algorithm – Maekawa‘s algorithm – Suzuki–Kasami‘s broadcast algorithm. Deadlock detection in distributed systems: Introduction – System model – Preliminaries –Models of deadlocks – Knapp‘s classification – Algorithms for the single resource model, the AND model and the OR Model DISTRIBUTED MUTUAL EXCLUSION ALGORITHMS: INTRODUCTION INTRODUCTION MUTUAL EXCLUSION : Mutual exclusion is a fundamental problem in distributed computing systems. Mutual exclusion ensures that concurrent access of processes to a shared resource or data is serialized, that is, executed in a mutually exclusive manner. Mutual exclusion in a distributed system states that only one process is allowed to execute the critical section (CS) at any given time. In a distributed system, shared variables (semaphores) or a local kernel cannot be used to implement mutual exclusion. There are three basic approaches for implementing distributed mutual exclusion: 1. Token-based approach. 2. Non-token-based approach. 3. Quorum-based approach. In the token-based approach, a unique token (also known as the PRIVILEGE message) is shared among the sites. A site is allowed to enter its CS if it possesses the token and it continues to hold the token until the execution of the CS is over. Mutual exclusion is ensured because the token is unique. In the non-token-based approach, two or more successive rounds of messages are exchanged among the sites to determine which site will enter the CS next. Mutual exclusion is enforced because the assertion becomes true only at one site at any given time.
    [Show full text]
  • Q1. Multiple Producers and Consumers
    CS39002: Operating Systems Lab. Assignment 5 Floating date: 29/2/2016 Due date: 14/3/2016 Q1. Multiple Producers and Consumers Problem Definition: You have to implement a system which ensures synchronisation in a producer-consumer scenario. You also have to demonstrate deadlock condition and provide solutions for avoiding deadlock. In this system a main process creates 5 producer processes and 5 consumer processes who share 2 resources (queues). The producer's job is to generate a piece of data, put it into the queue and repeat. At the same time, the consumer process consumes the data i.e., removes it from the queue. In the implementation, you are asked to ensure synchronization and mutual exclusion. For instance, the producer should be stopped if the buffer is full and that the consumer should be blocked if the buffer is empty. You also have to enforce mutual exclusion while the processes are trying to acquire the resources. Manager (manager.c): It is the main process that creates the producers and consumer processes (5 each). After that it periodically checks whether the system is in deadlock. Deadlock refers to a specific condition when two or more processes are each waiting for another to release a resource, or more than two processes are waiting for resources in a circular chain. Implementation : The manager process (manager.c) does the following : i. It creates a file matrix.txt which holds a matrix with 2 rows (number of resources) and 10 columns (ID of producer and consumer processes). Each entry (i, j) of that matrix can have three values: ● 0 => process i has not requested for queue j or released queue j ● 1 => process i requested for queue j ● 2 => process i acquired lock of queue j.
    [Show full text]
  • Deadlock-Free Oblivious Routing for Arbitrary Topologies
    Deadlock-Free Oblivious Routing for Arbitrary Topologies Jens Domke Torsten Hoefler Wolfgang E. Nagel Center for Information Services and Blue Waters Directorate Center for Information Services and High Performance Computing National Center for Supercomputing Applications High Performance Computing Technische Universitat¨ Dresden University of Illinois at Urbana-Champaign Technische Universitat¨ Dresden Dresden, Germany Urbana, IL 61801, USA Dresden, Germany [email protected] [email protected] [email protected] Abstract—Efficient deadlock-free routing strategies are cru- network performance without inclusion of the routing al- cial to the performance of large-scale computing systems. There gorithm or the application communication pattern. In reg- are many methods but it remains a challenge to achieve lowest ular operation these values can hardly be achieved due latency and highest bandwidth for irregular or unstructured high-performance networks. We investigate a novel routing to network congestion. The largest gap between real and strategy based on the single-source-shortest-path routing al- idealized performance is often in bisection bandwidth which gorithm and extend it to use virtual channels to guarantee by its definition only considers the topology. The effective deadlock-freedom. We show that this algorithm achieves min- bisection bandwidth [2] is the average bandwidth for routing imal latency and high bandwidth with only a low number messages between random perfect matchings of endpoints of virtual channels and can be implemented in practice. We demonstrate that the problem of finding the minimal number (also known as permutation routing) through the network of virtual channels needed to route a general network deadlock- and thus considers the routing algorithm.
    [Show full text]
  • Scalable Deadlock Detection for Concurrent Programs
    Sherlock: Scalable Deadlock Detection for Concurrent Programs Mahdi Eslamimehr Jens Palsberg UCLA, University of California, Los Angeles, USA {mahdi,palsberg}@cs.ucla.edu ABSTRACT One possible schedule of the program lets the first thread We present a new technique to find real deadlocks in con- acquire the lock of A and lets the other thread acquire the current programs that use locks. For 4.5 million lines of lock of B. Now the program is deadlocked: the first thread Java, our technique found almost twice as many real dead- waits for the lock of B, while the second thread waits for locks as four previous techniques combined. Among those, the lock of A. 33 deadlocks happened after more than one million com- Usually a deadlock is a bug and programmers should avoid putation steps, including 27 new deadlocks. We first use a deadlocks. However, programmers may make mistakes so we known technique to find 1275 deadlock candidates and then have a bug-finding problem: provide tool support to find as we determine that 146 of them are real deadlocks. Our tech- many deadlocks as possible in a given program. nique combines previous work on concolic execution with a Researchers have developed many techniques to help find new constraint-based approach that iteratively drives an ex- deadlocks. Some require program annotations that typically ecution towards a deadlock candidate. must be supplied by a programmer; examples include [18, 47, 6, 54, 66, 60, 42, 23, 35]. Other techniques work with unannotated programs and thus they are easier to use. In Categories and Subject Descriptors this paper we focus on techniques that work with unanno- D.2.5 Software Engineering [Testing and Debugging] tated Java programs.
    [Show full text]
  • Separation of Concerns in Multi-Language Specifications
    INFORMATICA, 2002, Vol. 13, No. 3, 255–274 255 2002 Institute of Mathematics and Informatics, Vilnius Separation of Concerns in Multi-language Specifications Robertas DAMAŠEVICIUS,ˇ Vytautas ŠTUIKYS Software Engineering Department, Kaunas University of Technology Student¸u 50, 3031 Kaunas, Lithuania e-mail: [email protected], [email protected] Received: April 2002 Abstract. We present an analysis of the separation of concerns in multi-language design and multi- language specifications. The basis for our analysis is the paradigm of the multi-dimensional sepa- ration of concerns, which claims that multiple dimensions of concerns in a design should be imple- mented independently. Multi-language specifications are specifications where different concerns of a design are implemented using separate languages as follows. (1) Target language(s) implement domain functionality. (2) External (or scripting, meta-) language(s) implement generalisation of the repetitive design features, introduce variations, and integrate components into a design. We present case studies and experimental results for the application of the multi-language specifications in hardware design. Key words: multi-language design, separation of concerns, meta-programming, scripting, hardwa- re design. 1. Introduction Today the high-level design of a hardware (HW) system is usually based either on the usage of the pure HW description language (HDL) (e.g., VHDL, Verilog), or the algo- rithmic one (e.g., C++, SystemC), or both. The emerging problems are similar for both, the HW and software (SW) domains. For example, the increasing complexity of SW and HW designs urges the designers and researchers to seek for the new design methodolo- gies.
    [Show full text]
  • Routing on the Channel Dependency Graph
    TECHNISCHE UNIVERSITÄT DRESDEN FAKULTÄT INFORMATIK INSTITUT FÜR TECHNISCHE INFORMATIK PROFESSUR FÜR RECHNERARCHITEKTUR Routing on the Channel Dependency Graph: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks Dissertation zur Erlangung des akademischen Grades Doktor rerum naturalium (Dr. rer. nat.) vorgelegt von Name: Domke Vorname: Jens geboren am: 12.09.1984 in: Bad Muskau Tag der Einreichung: 30.03.2017 Tag der Verteidigung: 16.06.2017 Gutachter: Prof. Dr. rer. nat. Wolfgang E. Nagel, TU Dresden, Germany Professor Tor Skeie, PhD, University of Oslo, Norway Multicast loops are bad since the same multicast packet will go around and around, inevitably creating a black hole that will destroy the Earth in a fiery conflagration. — OpenSM Source Code iii Abstract In the pursuit for ever-increasing compute power, and with Moore’s law slowly coming to an end, high- performance computing started to scale-out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology- aware routing algorithms become increasingly inapplicable, due to irregular topologies, which either are irregular by design, or most often a result of hardware failures. Exchanging faulty network components potentially requires whole system downtime further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today’s networks and the accompanying steady decrease of the mean time between failures.
    [Show full text]