Information to Users

INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6” x 9” black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

UMI
A Bell & Howell Information Company
300 North Zeeb Road, Ann Arbor MI 48106-1346 USA
313/761-4700 800/521-0600

Designing Efficient Communication Subsystems for Distributed Shared Memory (DSM) Systems

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Donglai Dai, B.S., M.S.

The Ohio State University
1999

Dissertation Committee:
Prof. Dhabaleswar K. Panda, Adviser
Prof. Ponnuswamy Sadayappan
Prof. Wu-chi Feng

Approved by
Adviser
Department of Computer and Information Science

UMI Number: 9919854

Copyright 1999 by Dai, Donglai. All rights reserved.

UMI Microform 9919854. Copyright 1999, by UMI Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

UMI
300 North Zeeb Road
Ann Arbor, MI 48103

© Copyright by Donglai Dai 1999

ABSTRACT

Programmability has proved to be the biggest obstacle to ubiquitous use of scalable high performance computing systems. The emerging distributed shared memory (DSM) systems, broadly classified into hardware DSM systems and software DSM systems, provide good programmability and scalability. The key to materializing the potential of DSM systems is to ensure low latency for various remote memory and synchronization operations. In this thesis we make four important contributions towards building efficient communication subsystems for DSM systems. First, we categorize various types of network contention and evaluate their impact on the performance of DSM systems. We show that network contention can affect DSM system performance significantly. Next, we develop a parameterized analytical model for estimating the performance of a DSM system by characterizing its key components and their interactions. This model provides system architects with a fast and economical method for identifying the bottlenecks in existing or future DSM systems. Based on this model and detailed simulations, a set of network design guidelines is established. Third, we propose two new designs to improve the efficiency of node-network interfaces (NNI): (i) a pipelined NNI supporting cut-through delivery and partial cache-filling and (ii) a novel block correlated FIFO strategy and its implementation exploiting multiple paths in interconnects. These designs can significantly reduce remote memory access latencies and the complexity of NNI.
Finally, we propose two kinds of architectural enhancements to the networks in DSM systems: (i) multidestination messaging mechanisms for reducing invalidation overhead for full-map cache coherence schemes and limited directory schemes, and (ii) unbalanced network designs exploiting the different characteristics of request and reply traffic. The effectiveness of these new designs and enhancements has been evaluated extensively using a simulation-based testbed and benchmark applications. The experimental results demonstrate that the overall performance of current and future DSM systems can be improved significantly by using these novel designs and enhancements in the communication subsystems.

Dedicated to my parents, Yuan Zhong Dai and Hai Yan Huang, for their love and faith

ACKNOWLEDGMENTS

I would like to thank my advisor, Prof. D. K. Panda, for his help, guidance, and friendship over the last four years. He has spent an immense amount of time and energy in developing and polishing my research, writing, and presentation skills. His patience and dedication have transformed me from the diffident graduate student I was to what I am now.

I would like to thank my mother and father for their unquestioned love, faith, and guidance over the years. It is impossible to do justice to the encouragement that they have given me to continue my education. I thank my sister for her love and encouragement. I thank my wife Wendy for her love, faith, and support. I thank my daughter Allison for making my life so joyful.

I would like to thank current and previous members of the PAC group, Prof. P. Sadayappan, Prof. T. Page, Prof. W.-C. Feng, and other members of the NOW group for the valuable suggestions, feedback, and help. I thank Profs. Mike M.-T. Liu, Steve T.-H. Lai, Phil Krueger, C.-H. Huang, Anish Arora, Neelam Soundarajan, and D. N. Jayasimha for inspiring discussions, help, and friendship.
I thank Debashis Basak for invaluable advice and help during the initial years of my Ph.D. I thank Rajeev Sivaram of IBM for help with Chapter 2, Mike Galles of SGI for help with Chapter 4, and Prof. Jose Duato of Politec. de Valencia Univ. for inspiration on Chapter 5.

I thank the great friends I have made at Ohio State: Debashis Basak, Rajeev Sivaram, Ram Kesavan, Sri Subramanian, Mohammad Banikazemi, Scott King, Matt Jacunski, Darius Buntinas, Pete Ware, Vijay Moorthy, Sandeep Gupta, Shiv Kaushik, Gopal Dommety, Sandeep Kulkarni, Ravi Prakash, Mangesh Ghiware, Jayanthi Sampathkumar, Sandeep Prabhu, Rohit Goyal, Manoj Pillai, Steve and Marika Fridella, Bobby Vandalore, Pradeep Chordia, Po-Wen Shih, Chao-Hui Wu, Jun Xu, Junfeng He, Ming Liu, Cho-yu Chiang, and Min-te Sun. I thank my teachers and friends from high school, Xian Jiaotong University, and Florida Atlantic University.

I would like to thank Drs. Craig B. Stunkel, Ashwini Nanda, Bulent Abali, Micheal Maged, Micky Tsao, Doug Joseph, and Micheal Rosenfield for their help and friendship during my summer internship at IBM T. J. Watson Research Center. I would like to thank the National Science Foundation for providing funds for my advisor to hire me as a Research Associate, and thank the Department of Computer and Information Science and The Ohio State University for financial support and a great environment during my stay here.

Last, but most importantly, I thank God for leading me through all the highs and lows during my PhD study and life in general.

VITA

June 5th, 1963 .......... Born - Wenzhou, Zhejiang, China
July 1985 .......... B.S., Computer Science and Engg., Xian Jiaotong University.
Fall 1985 - Spring 1988 .......... University Graduate Fellow, Xian Jiaotong University.
June 1988 .......... M.S., Computer Science and Engg., Xian Jiaotong University.
June 1988 - April 1990 .......... Instructor and Research Staff Member, Xian Jiaotong University.
Summer 1990 - Spring 1991 .......... Graduate Teaching Assistant, Florida Atlantic University.
June 1991 - January 1993 .......... Testing Programmer, IBM Boca Raton Division.
Fall 1993 - Winter 1994 .......... Graduate Research Associate, The Ohio State University.
March 1994 .......... M.S., Computer and Info. Science, The Ohio State University.
Spring 1994 - Winter 1999 .......... Graduate Teaching/Research Associate, The Ohio State University.
Summer 1996 .......... Research Intern, IBM T. J. Watson Research Center.

PUBLICATIONS

Research Publications

D. Dai and D. K. Panda. “Exploiting the Benefits of Multiple-Path Network in DSM Systems: Architectural Alternatives and Performance Evaluation.” IEEE Transactions on Computers, Special Issue on Cache Memory and Related Problems, pp. 236-244, Vol. 48, No. 2, February 1999.

F. Silla, M. P. Malumbres, J. Duato, D. Dai, and D. K. Panda. “Impact of Adaptivity on the Behavior of Networks of Workstations under Bursty Traffic.” Proceedings of the 27th International Conference for Parallel Processing, pp. 88-95, August 1998.

D. Dai and D. K. Panda. “Evaluating Pipelined Node-Network Interface Designs for DSM Systems.” Technical Report OSU-CISRC-8/98-TR36, The Ohio State University, August 1998.

D. Dai and D. K. Panda. “How Much Does Network Contention Affect Distributed Shared Memory Performance?” Proceedings of the 26th International Conference for Parallel Processing, pp. 454-461, August 1997.

D. Dai and D. K. Panda. “How Can We Design Better Networks
