2.2 Adaptive Routing Algorithms and Router Design 20

https://theses.gla.ac.uk/ Theses Digitisation: https://www.gla.ac.uk/myglasgow/research/enlighten/theses/digitisation/ This is a digitised version of the original print thesis. Copyright and moral rights for this work are retained by the author A copy can be downloaded for personal non-commercial research or study, without prior permission or charge This work cannot be reproduced or quoted extensively from without first obtaining permission in writing from the author The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the author When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given Enlighten: Theses https://theses.gla.ac.uk/ [email protected] Performance Evaluation of Distributed Crossbar Switch Hypermesh Sarnia Loucif Dissertation Submitted for the Degree of Doctor of Philosophy to the Faculty of Science, Glasgow University. ©Sarnia Loucif, May 1999. ProQuest Number: 10391444 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a com plete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. uest ProQuest 10391444 Published by ProQuest LLO (2017). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States C ode Microform Edition © ProQuest LLO. ProQuest LLO. 789 East Eisenhower Parkway P.Q. Box 1346 Ann Arbor, Ml 48106- 1346 I GW SGOw" p "ER.uyj »^iRY Abstract The interconnection network is one of the most crucial components in any multicomputer as it greatly influences the overall system performance. Several recent studies have suggested that hypergraph networks, such as the Distributed Crossbar Switch Hypermesh (DCSH), exhibit superior topological and performance characteristics over many traditional graph networks, e.g. k-ary 77-cubes. Previous work on the DCSH has focused on issues related to implementation and performance comparisons with existing networks. These comparisons have so far been confined to deterministic routing and unicast (one-to-one) communication. Using analytical models validated through simulation experiments, this thesis extends that analysis to include adaptive routing and broadcast communication. The study concentrates on wormhole switching, which has been widely adopted in practical multicomputers, thanks to its low buffering requirement and the reduced dependence of latency on distance under low traffic. Adaptive routing has recently been proposed as a means of improving network performance, but while the comparative evaluation of adaptive and deterministic routing has been widely reported in the literature, the focus has been on graph networks. The first part of this thesis deals with adaptive routing, developing an analytical model to measure latency in the DCSH, and which is used throughout the rest of the work for performance comparisons. Also, an investigation of different routing algoritlims in this network is Abstract presented. Conventional k-my n-cubes have been the underlying topology of contemporary multicomputers, but it is only recently that adaptive routing has been incorporated into such systems. The thesis studies the relative performance merits of the DCSH and A:-ary 77-cubes under adaptive routing strategy. The analysis takes into consideration real-world factors, such as router complexity and bandwidth constraints imposed by implementation teclmology. However, in any network, the routing of unicast messages is not the only factor in traffic control. In many situations (for example, parallel iterative algorithms, memory update and invalidation procedures in shared memory systems, global notification of network errors), there is a significant requirement for broadcast traffic. The DCSH, by virtue of its use of hypergraph links, can implement broadcast operations particularly efficiently. The second part of the thesis examines how the DCSH and A:-ary 77-cube performance is affected by the presence of a broadcast traffic component. In general, these studies demonstrate that because of their relatively high diameter, k-my 77-cubes perfdrm poorly when message lengths are short. This is consistent with earlier more simplistic analyses which led to the proposal for the express-cube, an enhancement of the basic A:-ary 77-cube structure, which provides additional express channels, allowing messages to bypass groups of nodes along their paths. The final part of the thesis investigates whether this “partial bypassing” can compete with the “total bypassing” capability provided inherently by the DCSH topology. Dedication There is no doubt, in my mind, that this work would not have Been completed without the help of Allah (SWT) and memory of my father. My only regret is that he is not with us to see that one of his greatest wish has come true. This thesis is dedicated to my father whose encouragement, during his life time and continuous reminder, after his death enabled me to pull through the low' times during the course of this research. Ill Acknowledgement I would like to express my grateful thanks to Dr. Lewis Mackenzie for his great help thi'oughout the course of this work. My great thanks are also due to Dr. Mohamed Ould- Khaoua. They both provided me with continuous guidance, encouragement, and valuable feedback since the early stages of this thesis. I would also like to thank Professor D.C. Gilles and Dr. J. Jeacocke for providing help and advice any time I asked. I am particularly grateful to my mother, sisters, sister-in-law, brothers, brothers-in-law, and especially adorable nephews, for their unconditional love and moral support. I was fortunate enough to make the acquaintances of Mrs. Nasreen Ahmad and her family, who have become and will remain my second family in Glasgow. I thank them for their support. Finally, I am indebted to the Algerian Government and British Council for the financial support and the opportunity they gave me to study for this postgraduate degree. IV Contents 1 Introduction 1 1.1 Interconnection Networks and Their Features 3 1.1.1 Topology 3 1.1.2 Switching Method 5 1.1.3 Routing Algorithms 7 1.1.4 Traffic Distribution 10 1.1.5 Implementation Constraints 11 1.2 Distributed Crossbar Switch Hypermesh 13 1.3 Motivations 16 1.4 Outline of the Thesis 18 2 A Performance Model of Adaptive Routing in the DCSH 19 2.1 Introduction 19 2.2 Adaptive Routing Algorithms and Router Design 20 2.2.1 Adaptive Routing Algoritluns 21 2.2.2 Router Design 23 2.3 Duato’s Algorithm in the DCSH 24 2.4 Analysis 27 Contents 2.4.1 The Model 29 "■7 2.4.2 Validation 36 2.4.3 Impact of Virtual Chamiels on the DCSH Performance 40 I 2.5 Conclusion 41 1 3 Evaluation of Routing Algorithms in the DCSH 43 3.1 Introduction 43 3.2 Outline of the Models 44 1- 3.2.1 The Hop-based Algoritlun 45 3.2.2 The Deterministic Algorithm 47 : 3.2.3 Validation of the Models 52 3.3 Performance under Uniform Traffic 55 3.4 Performance under Non-uniform Traffic 58 3.5 Conclusion 63 4 Performance of the DCSH and /r-ary «-cubes with Adaptive Routing 65 7 4.1 Introduction 65 j: 4.2 Analysis 66 /S; ■ 4.2.1 Outline of the Model 67 1 4.2.2 Model Validation 73 4.3 Comparison Results 76 4.4 Conclusion 83 V 5 Performance of the DCSH and A-ary «-cubes in the ; Presence of Broadcast Traffic 84 1 VI Contents__________________________________________________________________ 5.1 Introduction 84 5.2 Broadcast Implementation Schemes 85 5.3 Simulation Model 88 5.4 Performance Results 90 5.5 Conclusion 96 6 The ^‘Express” Channel Concept in the DCSH and A:-ary «-cubes 97 6.1 Introduction 97 6.2 Express-cubes 98 6.3 Analytical Models 100 6.3.1 The Express-cube Model 100 6.3.2 The DCSH Model 111 6.3.3 Validation of the Models 113 6.4 Performance Comparison 117 6.4.1 Results for VLSI Implementation 118 6.4.2 Results for Multiple-chip Implementation 123 6.5 Conclusion 125 7 Conclusions and Future Directions 126 References 132 v ii List of Figures 1.1 A 2-dimensional hypermesh (&=4). 4 1.2 Example of messages sharing the physical channel bandwidth. 7 1.3 Example of a deadlock situation caused by four messages. 8 1.4 Structure of a cluster in the DCSH (^ 4 ). 14 2.1 DCSH router structure with deterministic routing. 25 2.2 An example of a deadlocked configuration in the DCSH. 26 2.3 DCSH router structure with adaptive routing. 28 2.4 Transition diagram to compute the occupancy probabilities of virtual channels in the DCSH 33 2.5 Latency predicted by the model against simulation results in 2D-DCSH, A=0. a) K=2, b) L-4. 38 2.6 Latency predicted by the model against simulation results in 2D-DCSH, A=2. a) V=2, b) L=4. 39 2.7 Impact of virtual channels on the DCSH performance. 41 3.1 Diagram of a message path in the network. 48 3.2 Validation results of the hop-based algorithm model, K=2. vin List o f Figures a) A=0, b) Df=2. 52 3.3 Validation results of deterministic algoritlim model, F=2. a) Dt—0, b) Z)/=2. 53 3.4 Validation results of deterministic routing model, V-4. a) D(~0, b) Z)f=2. 54 3.5 Comparison results of deterministic and adaptive routing algorithms under uniform traffic, a) =32 flits, b) =128 flits.

2.2 Adaptive Routing Algorithms and Router Design 20

Adaptive Parallelism for Coupled, Multithreaded Message-Passing Programs Samuel K

Detecting False Sharing Efficiently and Effectively

Leaper: a Learned Prefetcher for Cache Invalidation in LSM-Tree Based Storage Engines

CSC 256/456: Operating Systems

Improving Performance of the Distributed File System Using Speculative Read Algorithm and Support-Based Replacement Technique

Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Tunes Bang Bang (My Baby Shot Me Down) Nancy Sinatra (Kill Bill Volume 1 Soundtrack)

Efficient Synchronization Mechanisms for Scalable GPU Architectures

CIS 501 Computer Architecture This Unit: Shared Memory

Download Slides As

A Survey of Cache Coherence Schemes for Multiprocessors

In-Network Cache Coherence

Invisispec: Making Speculative Execution Invisible in the Cache Hierarchy