Digital Technical Journal, Number 5, September 1987: VAXcluster Systems

VAXcluster Systems
Digital Technical Journal
Digital Equipment Corporation
Number 5, September 1987

Editorial Staff
Editor: Richard W. Beane

Production Staff
Production Editor: Jane C. Blake
Designer: Charlotte Bell
Interactive Page Makeup: Terry Reed

Advisory Board
Samuel H. Fuller, Chairman
Robert M. Glorioso
John W. McCredie
Mahendra R. Patel
F. Grant Saviers
William D. Strecker

The Digital Technical Journal is published by Digital Equipment Corporation, 77 Reed Road, Hudson, Massachusetts 01749.

Changes of address should be sent to Digital Equipment Corporation, attention: Media Response Manager, 444 Whitney Street, NR02-1/J5, Northboro, MA 01532-2599.

Comments on the content of any paper are welcomed. Write to the editor at Mail Stop HL02-3/K11 at the published-by address. Comments can also be sent on the ENET to RDVAX::BEANE or on the ARPANET to BEANE%RDVAX.DEC@DECWRL.

Copyright © 1987 Digital Equipment Corporation. Copying without fee is permitted provided that such copies are made for use in educational institutions by faculty members and are not distributed for commercial advantage. Abstracting with credit of Digital Equipment Corporation's authorship is permitted. Requests for other copies for a fee may be made to the Digital Press of Digital Equipment Corporation. All rights reserved.

The information in this journal is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.
ISBN 1-55558-004-1
Documentation Number EY-8258E-DP

The following are trademarks of Digital Equipment Corporation: CI, DEC, DECnet, DECnet-VAX, DECsystem-10, DECSYSTEM-20, Digital Network Architecture (DNA), Digital Storage Architecture (DSA), the Digital logo, HSC, Local Area VAXcluster, MicroVAX, MicroVAX II, MicroVAX 2000, Q-bus, RMS-11, SA482, UNIBUS, VAX, VAX-11/750, VAX-11/780, VAX-11/782, VAX-11/785, VAX 8600, VAX 8650, VAX 8700, VAX 8974, VAX 8978, VAXcluster, VAXstation, VAXstation II, VAXstation II/GPX, VAXstation 2000, VMS, VT, VT220.

IBM is a registered trademark of International Business Machines, Inc. Intel is a trademark of Intel Corporation. Lightspeed is a trademark of Lightspeed Computers, Inc.

Cover Design
VAXcluster systems are featured in this issue. The central connection between the elements in a cluster is called the Star Coupler. Our star-filled cover evokes the thousands of VAXcluster systems now operating worldwide. The image was created using the Lightspeed System. The cover was designed by Barbara Grzeslo and Tim Roberts of the Graphic Design Department.

Book production was done by Educational Services Media Communications Group in Bedford, MA.

Contents
VAXcluster Systems

The VAXcluster Concept: An Overview of a Distributed System
Nancy P. Kronenberg, Henry M. Levy, William D. Strecker, and Richard J. Merewood

The System Communication Architecture
Darrell J. Duffy

The VAX/VMS Distributed Lock Manager
William E. Snaman, Jr. and David W. Thiel

The Design and Implementation of a Distributed File System
Andrew C. Goldstein

Local Area VAXcluster Systems
Michael S. Fox and John A. Ywoskus

VAXcluster Availability Modeling
Edward E. Balkovich, Prashant Bhabhalia, William R. Dunnington, and Thomas F. Weyant

System Level Performance of VAX 8974 and 8978 Systems
Daeil Park, Rekha D. Von Ehren, Tzyh-jong Wang, and Nii N. Quaynor

CI Bus Arbitration Performance in a VAXcluster System
Xi-ren Cao, Nii N. Quaynor, and Fernando C. Colon Osorio

Editor's Introduction
Richard W. Beane, Editor

VAXcluster systems are closely coupled configurations of VAX CPUs and storage devices. The VAX CPU at any node can communicate with the processor and storage devices at any other node in the cluster. The interconnects and software used to activate this unique concept allow data transfers at up to 70 megabits per second between nodes. This issue of the Digital Technical Journal contains papers about some of the key hardware and software features in these systems, as well as some measures of their performance. Since several organizations within Digital are responsible for various VAXcluster features, these papers are contributed by engineers from a wide spectrum of engineering groups.

Since the VAXcluster concept spans such a range of technologies, the first paper is an overview explaining generally how these systems work. Nancy Kronenberg, Hank Levy, Bill Strecker, and Richard Merewood describe the architecture, the storage control, the VMS software alterations, and the multitude of activities that control access to the storage devices.

The System Communication Architecture, described by Darrell Duffy, is the structure that allows the nodes in a VAXcluster system to cooperate. This relatively simple framework governs the sharing of data between resources at the nodes and binds together applications that run on different VAX CPUs.

Additional features were needed in the VMS software to accommodate accessing disks on multiple systems. The distributed lock manager, described by Sandy Snaman and Dave Thiel, provides the synchronization needed to accomplish transparent data transfers between cluster members. Other changes were also needed to broaden the file functions performed by the VMS software. Andy Goldstein relates some alternative ways to expand those functions and how the QIO processor was extended to synchronize file accesses. The resulting system of locks and queues provides a consistent sequence for managing distributed files.

The next paper, by Mike Fox and John Ywoskus, describes the extension of the VAXcluster concept to systems connected with an Ethernet. These Local Area VAXcluster systems use special software to provide functions needed by clusters but not provided by Ethernet software. Thus, MicroVAX II and other small VAX systems can be clustered to yield significant amounts of processing power.

The last three papers deal with performance aspects of VAXcluster systems. The paper by Ed Balkovich, Prashant Bhabhalia, Dick Dunnington, and Tom Weyant discusses the results of a VAXcluster model that demonstrates how redundancy improves availability. Then, Dale Park, Rekha Von Ehren, T-J. Wang, and Nii Quaynor describe two models they developed to measure the performance of VAX 8974 and 8978 systems. These models, based on benchmarks run in different environments, use a VAX 8700 CPU for a baseline comparison.

The final paper relates the results of a model to measure the characteristics of the CI bus. Xi-ren Cao, Nii Quaynor, and Fernando Colon Osorio describe how their model measures the performance of the arbitration algorithm in this bus. They suggest some interesting schemes to improve utilization and reduce response time.

Biographies

Edward E. Balkovich
Ed Balkovich is the manager of VAXcluster System Engineering, which addresses issues of VAXcluster performance, availability, and architecture for High Performance Systems. He was Digital's associate director of Project Athena at M.I.T. and is an Adjunct Associate Professor at Brandeis University. Before joining Digital in 1981, Ed was a faculty member at the University of Connecticut. He earned his B.A. degree (1968) from the University of California at Berkeley, and his M.S. (1971) and Ph.D. (1976) degrees from the University of California at Santa Barbara. He is a member of the ACM and IEEE.

Prashant Bhabhalia
A principal engineer in VAXcluster Systems Engineering, Prashant Bhabhalia develops and interprets reliability and availability models. Earlier, he was a program manager in Computer Systems Manufacturing and a senior engineer in GIA Manufacturing. Before joining Digital in 1980, Prashant was an industrial engineer at Norton Company and Gits Plastic Corporation. He holds an M.S.I.E. degree (1974) from the Polytechnic Institute of Brooklyn and a B.S.M.E. degree (1972) from the M.S. University in India. Prashant is a senior member of I.I.E.

Xi-Ren Cao
As a principal software engineer in the High Performance Systems and Clusters Group, Xi-Ren Cao models and evaluates VAXcluster configurations. Before joining Digital in 1986, he was a research fellow at Harvard University. Xi-Ren has published over 20 technical papers on performance evaluation, simulation, stochastic systems, queuing networks, and control theory, and has co-authored a book, "Perturbation Analysis of Discrete Event Systems," to be published in 1988. He received his Ph.D. degree from Harvard University in 1984 and is a member of IEEE.

Fernando C. Colon Osorio
Fernando Colon Osorio graduated from the University of Puerto Rico (B.S.E.E., 1970) and the University of Massachusetts (M.S., Ph.D., 1976). Joining Digital in 1976, he helped design the PDP-11/60 and PDP-11/74 systems and managed the LAN group in Corporate Research. Fernando also managed the overall design verification for the VAX 8600 project. In High Performance Systems, he now manages the systems research and advanced development group, responsible for VAXclusters, fault tolerance, advanced architectures, and performance analyses. He was Associate Editor of the IEEE Transactions on Computers and is the co-author of "Engineering Intelligent Systems."

Darrell J. Duffy
As a consulting software engineer, Darrell Duffy works on the network architecture for VAXcluster systems. On previous projects, he led the development of operating systems for parallel processors and wrote software for the Local Area Terminal protocol. Darrell helped to develop DECnet software after joining Digital in 1977. He received a B.S. in computer science from West Virginia University in 1972 and worked at the University of Florida. Darrell and three other Digital engineers have applied for a patent on the LAT protocol.
