Netwo;king Products

Digital TechnicalJournal Digital Equipment Corporation

Number 3

September I 986 Contents

8 Foreword William R. Johnson, Jr.

New Products 10 Digital Network Architecture Overview Anthony G. Lauck, David R. Oran, and Radia J. Perlman

2 5 PerformanceAn alysis andModeling of Digital's Networking Architecture Raj Jain and William R. Hawe

35 The DECnetjSNA Gateway Product-A Case Study in Cross Vendor Networking John P:.. �orency, David Poner, Richard P. Pitkin, and David R. Oran ._ 54 The Extended Architecture and LANBridge 100 William R. Hawe, Mark F. Kempf, and Alan). Kirby

7 3 Terminal Servers on Local Area Networks Bruce E. Mann, Colin Strutt, and Mark F. Kempf

88 The DECnet-VAXProduct -A n IntegratedAp proach to Networking Paul R. Beck and James A. Krycka

100 The DECnet-ULTRIXSoftware John Forecast, James L. Jackson, and Jeffrey A. Schriesheim

108 The DECnet-DOS System Peter 0. Mierswa, David). Mitton, and Ma�ha L. Spence

117 The Evolution of Network Management Products Nancy R. La Pelle, Mark). Seger, and Mark W. Sylor

129 The NMCCjDECnet Monitor Design Mark W. Sylor

1 Editor's Introduction

The paper by Bill Hawe, Mark Kempf, and AI Kirby reports how studies of potential new broad­ band products led to the development of the Extended LAN Architecture. The design of the LANBridge 100, the first product incorporating that architecture, is described, along with the trade-offs made to achieve high performance. The speed of communication between terminals and systems depends on how they are connected. Bruce Mann, Colin Strutt, and Mark Kempf explain how they developed the LAT protocol to connect terminals to hosts on an Ethernet. The Ethernet Richard W. Beane , the DECserver 100, and the Editor DECserver 200 all use this new protocol. The next three papers describe how DNA was This third issue features papers about Digital's net­ incorporated into three different operating sys­ working products. Digital was an early advocate of tems. The first paper, by Paul Beck and Jim Krycka, distributed interactive computing, a concept explains how the DNA principles were built into allowing systems resources to be shared among the VAXJVMS system. The authors describe how many users over a network. Just as two persons transparency was achieved by a tight coupling frorri different cultures have problems communi· between the VMS software and the DECnet struc­ eating, however, so do computers with different ture. The software is Digital's second oper­ designs. Some standard set of rules is needed to ating system for its VAX computers. In the second allow successful interaction. paper, John Forecast, Jim Jackson, and Jeff The Digital Network Architecture (DNA) is the Schriesheim describe how they blended DNA into set of rules that defines how Digital's products the ULTRIX software. Several unique tools were communicate over a network. Being flexible, this developed to avoid changes to existing DNA imple­ architecture allows many ways for design ·groups to mentations. The DNA architecture has also been implement the DNA rules into various DECnet incorporated into the MS-DOS system in the products. DECnet-DOS product. The third paper, by Peter The first paper discusses the DNA structure and Mierswa, Dave Mitton, and Marty Spence, describes how it has evolved. Tony Lauck, Dave Oran, and how they built communication services into Radia Perlman describe DNA's design goals and the MS-DOS's background by writing new code and new functions supported in its four development borrowing existing code from the DECnet-ULTRIX phases. The tasks performed by the eight DNA lay­ software. ers are explained, with particular emphasis on the The final two papers discuss an important aspect network management and layers. of any network: its management. Nancy La Pelle, To achieve high performance, models and simu­ Mark Seger, and Mark Sylor discuss how network lations were used to the DNA structure. The management is built into many diverse DECnet paper by Raj Jain and Bill Hawe relates some case products. They describe Digital's common man­ studies, one for each layer, that resulted in faster agement architecture and the need to meld the communication. These models helped to optimize management of voice and data networks. The how data packets.are handled by simulating differ­ NMCCJDECnet Monitor controls a DECnet network ent traffic patterns. from a central location. Mark Sylor relates how this Although the DNA and SNA architectures are monitor functions, describing its database struc­ quite different, they can communicate through the ture and reports for the network manager. The DECnetjSNA Gateway product. John Morency, monitor's analysis techniques to identify real-time Dave Porter, Richard Pitkin, and Dave Oran problems are especially interesting. describe how the gateway's design accomplishes I thank John Adams, Andrea Finger, and Walt this communication. The authors describe the Ronsicki for their help in preparing this issue. components in each architecture and how mes­ sages are structured. _2)� 0�

2 Biographies

Paul R. Beck As a consulting software engineer, Paul Beck is currently the architect and was project leader for the DECnet-VAX product. He designed recovery mechanisms for high-availability software in the VMS group and was the network software architect for the DECdataway product. Before coming to Digital in 1977, he worked as a senior systems analyst at Applied Data Research, Inc. Paul earneda B.E.S. degree (1969) from Johns Hopkins University and an M.S.E.E. degree (1970) from Stanford University. He is a member of Tau Beta Pi and Eta Kappa Nu.

John Forecast John Forecast received his B.A. degree from the University of Lancaster in 1971 and his Ph.D. degree from the University of Essex in 1975. Joining Digital in the United Kingdom in 1974, he later moved to the United States to join the newly formed group that developed the DECnet-RSX Phase 2 products. John held various positions within this group through the development of DECnet Phase IV, then worked on the DECnet­ ULTRIX project. He is currently a consulting software engineer irt the Local Area Systems Group.

William R. Hawe A consulting engineer, Bill Hawe manages the Dis­ tributed Systems Architecture Group. He is designing new LAN interconnect system architectures and integrating ISO standards intj:> DNA. Bill helped to develop the Extended LANArchitect ure. For Corporate Research, he worked on the Ethernet design with Xerox and Corporations and analyzed the performances of new communications technologies. Before joining Digital in 1980, Bill taught networking and electrical engin�ering at Southeastern Massachusetts University, where he earned his B.S.E.E.:and M.S.E.E. degrees. Bill is a member of the IEEE 802 Local Networks Standards Committee.

James L. Jackson In 1976, Jim Jackson came to Digital after receiving a B.A.Sc . (E�E.) degree (1973) from the University of Waterloo and an M.Eng.Sc. (E.E.jC.S.) degree (1976) from the University of Queensland. He ' contributed to several projects since DECnet Phase 11 was conceived. Jim was project leaderjmanager for DECnet-ULTRIX V1.0 and was developer and project leader for multiple versions of DECnet-RSX and DECnet-IAS. He .also supervised a release of the DECnet Router and s01ire advanced develop· ment work. Currently, Jim is a software development manager working on distributed systems services projects. He is a member, of the IEEE.

3 Biographies

RajJain Raj Jain graduated from A.P.S. University (B.E., 1972), the Indian Institute of Science (M.E., 1974), and (Ph.D., 1978) . Joining Digital in 1978, he worked on performance modeling of VAXcluster systems, and Ethernet and other DNA protocols. During a one-year sabbatical at M.I.T., Raj taught a graduate course on modeling techniques. As a consult­ ing engineer, he is now engaged in performance modeling for distributed systems and networking architectures. A member of ACM and senior member of IEEE, Raj has written over 15 papers on performan�e analyses and is writ­ ing a textbook on performance analysis techniques.

Mark F. Kempf Mark Kempf is currently involved in planning Digital's next generation of interconnect products. A consulting engineer, he was the project manager for advanced development of the LANBridge 1 00 and DECserver 100. Coming to Digital in 1979, Mark worked on software fo r a DECnet front end and one of Digital's first implementations of Ethernet. Ear­ lier, he worked at Standard Oil of Indiana on real-time process control sys­ tems. Mark earned a B.S. degree from Northwestern University in 1972. He holds three patents, including one in bridge technology.

Alan J. Kirby As the manager of the Communications and Distributed Sys­ tems Advanced Development Group, Alan Kirby has managed and partici­ pated in the development of products such as the DECserver 1 00 and the LANBridge 1 00. Before joining Digital in 1981, Alan was the manager of net­ work development at National CSS, Inc., where he helped to design a large­ scale network. He received a B.S. degree (1974) in com­ puter science from Worcester Polytechnic Institute and an M.S. degree (1979) at the Polytechnic Institute of New York. Alan is a member of the IEEE.

James A. Krycka A principal software engineer, Jim Krycka is a supervi­ sor in the VMS Development Group and project leader of the VMS Batch/ Print facility. Joining Digital in 1972. as a software specialist, he provided technical support for PDP-1 1 and PDP-8 systems. In the VMS group, Jim designed and implemented the remote file access portion of the DECnet­ VAX software . He also helped to design the data access protocol and repre­ sented Digital on the ANSI committee working on ISO networking standards. Jim earned a B.S. degree (1970) in computer science from Michigan State University.

Nan cy R. La Pelle As a software engineering manager, Nancy La Pelle oversees the development of management software for network products. She chaired the task force to specify preliminary network management requirements for DNA Phase V. Joining Digital in 1977 as a senior software engineer, Nancy later worked in Software Services and Customer Services Systems Engineering. Earlier, she performed systems analysis and program­ ming for several companies. She earneda B.A. degree ( 1966) in French from the University of Pennsylvania and an M.A. degree ( 1968) in linguistics from Cornell University and studied for a Ph.D.· degree at M.I.T.

4 Anthony G. Lauck Tony Lauck is a corporate consulting engineer and group manager for architecture and advanced development of distributed systems. He headed the development of Phases II, III, and N of the DNA architecture. A member of the task force that led toithe development of Ethernet, Tony also led the effort to standardize it. His group developed con­ cepts and built prototypes for the LANBridge 100 and ;DECserver products. Before joining Digital in 1974, Tony worked for Autex� Inc., and the Smith­ sonian Astrophysical Observatory. He earned a B.A. degree ( 1964) in math- ' ematics from Harvard University. I I i Bruce E. Mann A consulting engineer, Bruce Mann[ is now studying the application of Digital architectures to commercial on•'line transaction pro­ cessing. He wrote the IAT architecture, creating its l first prototypes and products. An early contributor to Ethernet projects, Bd.ce helped to design the system interfaces. In 1978 he used networking to momate engine tests at the Volkswagon Research Laboratories. Before joini Jg Digital in 1976, he designed medical computer systems at the Harvard Mdedical School. Bruce earned a B.S.E.E. degree in 1971 from Cornell Univ rsity and with three other engineers has applied for a patent on the IAT pttotocol. I Peter 0. Mierswa A consulting engineer, Peter Mie wa is project leader for the DECnet-DOS and DECnet-Rainbow products. H lis currently studying the integration of OSI protocols into DECnet implemecltations. With Digital since 1977, Peter was a software specialist supporting sales,h then a networks consultant in the Large Computer Group. He was the project leader for the DATATRIEVE-20 system. Peter received a B.S.C.S. degree (Cum Laude) from S.U.N.Y. at Stony Brook, where his faculty named him its best graduating stu­ dent. He worked for S.U.N.Y. as a systems programmdr after graduation.

David J. Mitton Educated at the University of Michigan (B.S. Computer Engineering, 1977), Dave Mitton joined Digital after graduation. He first worked as a software engineer on communications microcode, developing firmware for microprocessors. Later, on the DECnet-RSX development team, Dave designed file access and transfer utilities. He was also the corporate architect for the data access protocol. After organizing the DECnet-DOS pro­ ject, Dave was that product's principal designer and developer of the inter­ nals architecture and implementation. He is currently a principal engineer.

john P. Morency John Morency is a consulting engineer currently defin­ ing future communication server architectures and developing simulation models for IBM interconnect products. He was a principal contributor to the DECnetjSNA Gateway. In other positions John performed worldwide technical support for DECnet, IBM, and X.25 products and supported sales of networking products to the banking and insurance industries. Prior to joining Digital in 1978, John worked at IBM Corporation and at General Electric Company. He earned a B.S. degree (Magna Cum Laude) in mathe­ matics and computer science from the University of New Hampshire in 1974.

5 Biographies

David R. Oran Dave Oran is a network architect working on the DNA naming service. He also worked on the SNA Gateway architecture and supported customers with large networks. Dave represents Digital on the ANSI and ISO committees for the OSI network layer. Before coming to Digital in 1976, he designed a nationwide network for the largest bank in Mexico and programmed at NASA. Earning a B.A. degree (1970) in English and physics from Haverford College, Dave is a member of ACM and was vice chairman of the Ninth Data Communications Symposium in 1985.

Radia J. Perlman Radia Perlman is a consulting engineer responsible for the specification of the protocols and algorithms in DNA's routing layer. On the LANBridge 100 project, she designed the spanning-tree algorithm. Before joining Digital in 1980, Radia was the network archi­ tect on the ARPA Packet Network at BB&N, Inc. She earned her S.B. (1973) and S.M. (1976) degrees from M.I.T. , both in mathematics. Radia is on the editorial board of Computer Ne tworks and ISDN Sy stems and has published numerous papers on networks.

Richard P. Pitkin As a principal engineer and project leader, Richard Pitkin was a senior contributor to the DECnetjSNA Gateway and VMSjSNA projects. His major work was in product development, testing, and per­ formance analysis. Currently, Richard is assessing the IEEE 802.5 token ring standard. In previous wo rk, he was a principal software specialist involved in worldwide technical support for IBM interconnect products. Before coming to Digital in 1979, Richard supported large timesharing systems for the State of Massachusetts. He earned his B.S. degree in math­ ematics from the University of Massachusetts, Boston.

David Porter Dave Porter joined Digital after earning his B.Sc. (Hons) degree with first-class honours in 1977 from Leeds University. In the U.K. , he designed X.25 interconnect products, and on U.S. assign­ ment, he worked on the SNA Gateway V1.0. Returning to the U.K., Dave developed a product connecting to IBM's DISOSS system. Currently, back in the U.S., he is a principal software engineer working on IBM inter­ connect products. Dave specified the architecture for the SNA Gateway Access Protocol V2.0 and worked on gateway management and security.

Jeffrey A. Schriesheim Educated at S. U.N.Y. at Binghamton, Jeff Schriesheim came to Digital in 1976 after earning B.A. (Fine Arts, 1970) and M.S. (Systems Design, 1975) degrees. He worked as a design engi­ neer, supervisor, and consultant on DECnet Phases II, III, and IV on soft­ ware supporting the RSX, lAS, PRO, and ULTRIX systems. Jeff also con­ tributed to the design of Ethernet products, including the DECnet Router and various terminal servers. Currently a consulting engineer, Jeff is working on extending DECnet services. He is a member of the DECnet Review Group.

6 I

Mark J. Seger In 1975, Mark Seger joined Digital after receiving a B.S.Ch.E. degree in 1972 and an M.S. degree in computer science in 1975 from the University of Connecticut. He first worked: on software applica­ tions for internal systems, including one that became the DEChealth sys· tern. Moving to the Industries 'Group, Mark helped to develop PjFM, a PBX and facilities management application. He is cur· rently working on concepts for managing networ¥, defining the next generation of network management products in the: Networks and Com- munications Group. I[ I I Martha L. Spence tb Marty Spence is the software I anager of the group that developed the DECnet-DOS, DECnet-RSX, and DECnet-ULTRIX prod- ucts. She was a project leader on the DECnet-DOS, ECnetjE (RSTS) , and PRO jDECnet products. Marty was also a supervisor in Distributed Systems 9I Computer Services. Prior to joining Digital in 197l she worked at GTE Sylvania and IBM Corporation and taught mathematics at the University of Notre Dame. Marty received her B.S. degree (1965) from the University of Illinois and her M.S. degree (1968) from Notre Dame, both in mathe- matics. She is a member of ACM. I

CoHn Strutt Joi.Ung Digital in 1980, Colin Strutt L project leader on several communications products, including DECnet-IAS and the Ethernet 1 Terminal Server. He is currently a consulting engineer responsible for product strategy of the terminal server fa mily. Coli� is the LAT architect and a member of the DECnet Review Group. FortbI erly, he worked for

British Airways , specializing in network support aridI operating systems. Colin received his B.A. (Hons) degree in 1972 an4 his Ph.D. degree in 1978, both from Essex University. He is a member of the British Com- puter Society and the ACM. '

Mark W. Sylor Mark Sylor is a principal software engineer currently working on the management architecture for a distributed computing sys­ tem. He was the principal designer and a developmc;nt supervisor for the NMCCfDECnet Monitor project. Mark also worked on analyzing perfor­ mance on the TOPS-20 system and on DECnet netWorks. He worked at GTE-AE for four years before joining Digital in 1979. Mark earned a B.A. degree at S. U.N .Y at Geneseo in 1971 and an M.S. degree at the University of Notre Dame in 1975, both in mathematics.

7 Foreword

' William R. Johnson, Jr. set of products confined to the RSX family of Vice President, operating systems, having limited functionality, Distributed Systems and poor performance and maintainability. Although creating serious problems in the field, DECnet Phase I also forced us to make a complete reappraisal of what it meant to be in the network business. As a result of this painful, yet valuable, Phase I experience, network spe­ cialists with direct communication to Engineer­ ing were placed in the field. And a strong archi­ tectural process, managed by Tony Lauck, and a certification and verification process were forged to ensure that products conformed to the DECnet architecture. The result was DECnet Phase II, the first set of DECnet products that ran on multiple operating systems. DECnet Phase II provided more user functionality, much better During the 1970s, the concept of the mini.com­ maintainability, and a much more robust net­ puter changed from a small computing engine work architecture than DECnet Phase I. How­ with minimal software to an effective, efficient ever, DECnet Phase II was still only usefulfor general-purpose computer. While this change small networks since it did not support routing. occurred, the need to exchange information And performance remained a problem, so that among these computers became progres· Digital was still not viewed in the industry as a sively greater. In most cases information was networking leader. exchanged using magnetic tape or, in the case of That recognition came with DECnet Phase III. many DEC computers, DECtape. In the early sev· Phase III provided adaptive routing, making pos­ enties, data communications was in its infancy; sible the building of networks with over 200 the only widely used communications protocol nodes. Additional user functionality was pro­ was 2780 BISYNC from IBM Corporation, a pro­ vided with the virtual terminal capability and the tocol for remote job entry. At this time, Digital release of the first SNA-interconnect product. was becoming very successful at a new kind of Although difficult to accept now, a network of computing, called interactive computing. It over 200 nodes was considered very large in became clear that we needed a flexible way to 1980; there were severe reservations about its interconnect Digital's systems, giving our cus­ ability to be managed. Yet DECnet Phase III was tomers the ability to share resources among these a major achievement. It was supported on seven machines. operating systems and three hardware families: As a result of this realization, a small group of the 36-bit, 32-bit, and 16-bit CPUs. DECnet people was asked to specify a network architec­ Phase III was also quite reliable. Our experience ture. That architecture was intended to work with the Phase III family of products was so good across multiple operating systems and to sup­ that many of the field controls erected to deal port multiple functions-file transfer, remote with Phase I and Phase II problems were resource access, and virtual terminals -and removed. multiple communication technologies- leased All our problems, however, were not solved lines, X.25, and dial-up networks. As it turned yet. The cost of the network links with DECnet out, that task was well beyond Digital's ability Phase III was still excessive. Thus the major to complete; for that matter, it was well beyond objective of DECnet Phase IV was to reduce this the existing state of the art. Therefore, DECnet cost by supporting a low-cost, high-performance Phase I fell far short of the ambitious goals set for multipoint interconnect: the Ethernet. Since it. The reality of DECnet Phase I proved to be a reducing the cost of networking was likely to

8 i increase the sizes of networks, we thought it was many cases DECnet architectsI and developers also important to eliminate the Phase III restric­ played a verysignificant part in ensuring that the tion of 256 nodes. Therefore, the architectural standard was technically exFellent. Today, Digi­ restriction for Phase IV was increased to 64,000 tal is recognized as a leader inI OSI because of our nodes, although the practical limit was about work in standards and the OSI products we are half that number. At that time, there were serious currently shipping. Had it npt been for the expe­ debates about whether anyone would ever build rience we gained from DECrietI development, it is such a large network. We now know that Digi­ quite likely that the OSI activity would be much tal's own internal network will be that large by further behind. ! the end of June 1987. By the standards of the cdmputer industry, ten Using the Ethernet concept was a major under­ years is a verylong time. Ye.t many of the people taking for Digital. The only way to get an inter­ working on the DECnet products today were connect that had both low cost and high speed working on them ten years ago. Most of the was to use a standard. Since no such standard authors who wrote the papers in this issue of existed, we had to create one. As a company, we the Digital Te chnical jo urnal were involved had little experience in generating standards; with DECnet Phase III; all of them were involved many people confidently predicted that such an with Ethernet. The DECnet architecture, as we effort would fail. Through hard work and persis­ know it today, is the result of many people work­ tence, however, we succeeded in that Ethernet ing together, trying to solve a problem that for standardization effort. Today, Ethernet is both. many years was imperfectly understood. Even the accepted market leader and, as IEEE 802.3, today, it seems barely credible. The papers con­ the only approved standard for local area tained within represent the work of many peo­ networks. ple, not just these authors. These papers describe I Shipments of DECnet Phase IV began in 1983. an environment that continues to evolve at a

Almost immediately, customers started to rapid rate. That environment1 is now fundamen- migrate from their old point-to-point networks to tally altering the way peop e use computers. the new multipoint Ethernet. By this time, an �i installed base of over 10,000 DECnet nodes existed in the field. Digital was adding to that base at the rate of 3,000 per year. Since the announcement of DECnet Phase IV, the growth in DECnet systems has been extremely rapid. From July 1986 to June 1987, Digitafwill ship over 30,000 DECnet licenses. By the end of that period, there will be over 100,000 computers running the DECnet system. Clearly, the DECnet concept has been both a technical and commer­ cial success. From a difficult and problem-filled start, it has evolved to become the standard by which other peer-to-peer networking products are measured. The concepts embodied in the DECnet architecture have been incorporated in many international standards. Much of that stan­ dards work has been done by Digital's DECnet architects and implementers. Digital has played a major role in the develop­ ment of the OSI protocols, which will over time become the international standard for network­ ing. Small working groups performed much of the technical work to develop these protocols. In

9 Anthony G. Lauck David R. Oran I Radia J. Perlman

A Digital Network Architecture Overview

TheDi gital Network Architecture (DNA) defi nes th e fu nctions, structure, interfaces, and protocols used in DECnet computer networks. These net­ works can be constructedfr om both local area and wide area communi­ cation technologies. Although evolving th rough jour phases in tenyears, th e DNA design goals have remained constant. Each phase bas supported new technologies and app lications, yet bas retained backward compati­ bility. Phase W contains the latest architectural design. The DNA func­ tions are described with emp hasisplaced on the relationships between layers and bow they cooperate to eliminate duplicate tasks. DNA's future evolution is discussed, showing Digital's commitment to the op en archi­ tecture principle.

Design Goals of the Architecture these trade-offs point out the need to use differ­ A small number of design goals have guided the ing facilities in tandem. evolution of the DNA architecture through its initial version, Phase I, to its current version, Wide Range of Topologies Phase N. These goals are described below. Cost-effective computer networks have many dif­ ferent configurations. Those differences reflect Co mmon Us er In terfa ce the location of computer systems, the availability A single interface to a computer network should of communications facilities, the application support a broad range of applications, isolating traffic patterns, and the performance require­ them from the details of network configuration ments. These configurations usually change as a and communications technology. This isolation network grows, yet the results may not always be allows a network to accommodate new applica­ optimum for changing traffic patterns. The tions as they are developed, sharing communica­ actual network configuration may differ from the tions facilities with existing applications. Thus intended configuration due to failures. A net­ networks can expand to adapt to new communi­ work must accommodate these changing config­ cations technologies without adversely affecting urations, or topologies, to provide a uniform ser­ those existing applications. vice to its users.

Wide Ra nge of Communications Available Service Fa cilities A computer network must provide an avail­ To be cost effective, computer networks must able communications service that its users support a wide range of communications facili­ can depend on to run their applications. This ties with a variety of cost, performance, and dis­ requirement implies that networks must detect tance trade-offs. For example, an Ethernet local and recover from failures and isolate and bypass area network (LAN) can economically support faults. Single failures should minimally affect the data communication in a building or on a cam­ network's operation. pus at a data rate of 10 million bits per second. Leased lines, on the other hand, are currently Extensible economical at a data rate of 9600 bits per second A computer network must be able to evolve to but over thousands of miles. Since communica­ adapt to new technologies. Older and newer tion resources should be shared among users, computer systems and communications facilities

10 Digital Tecbnical]ournal No . 3 September 1986 New Products

must be able to coexist in the same network. upper layers. Upper layers should rely on the ser­ Thus the network architecture must adjust to vices provided by lower la ers without having new technologies, some of which were not envi­ the detailed knowledge of� how they actually sioned when the architecture was originally provide those services. developed. Address Computer Sy stems Un iformly Easily Implemented It should be possible to communicate between The network architecture must be implemented computer systems no matter where they are on a range of computer systems, from small per­ located in the network. This communication sonal computers to superminicomputer or can be done if nodes are as$igned addresses that mainframe systems. The architecture must be can be used anywhere in tll'e network to specify implemented over a range of communication the as either a sour�e or destination of hardware and be cost effective so that either messages. small or large networks can be constructed. This need implies that the architecture must permit Implement Fu nctions at the Highest 1 simple implementations as well as more com­ Practical Level I plex ones to conform to the needs of individual When a function is implemented at a high level, computer systems and network configurations. it gains the use of lower-level functions, thus simplifying the implementation of the higher­ Cost Effective level function. If a function were implemented Implementations of the DNA architecture should at a low level, it might haye to duplicate func­ be cost effective compared to the alternative of tions already provided at some intermediate ·1 an application-specific network architecture. level. This attribute will encourage the use of a com­ mon network architecture, with the resulting Use Dy namic Adap ta�ion economies of scale. The configuration of a computer network will I constantly change as the network expands and as Design Principles its components fail and are testored to service. It In addition to the goals described above, the is highly desirable for the nFtwork to adapt to its development of DNA has been guided by a current configuration without manual interven­ number of important design principles. We tion. This adaptability makes the network easier chose these principles in concert with the to install, easier to operate, ' and more resistant to goals and with the benefit of experience in failures. research networks. Such networks include the ARPA, National Physical Laboratory, and Use Stable, Robust Algorithms Cyclades networks. These design principles are A computer network is a large, complex system. described below. Of course, several general Like any such system, a �etwork may exhibit design principles, such as simplicity and modu­ unanticipated and undesirableI behavior. When larity, also guided the development of the DNA designing critical algorithms, network relia- architecture. bility and availability mustI not be compromised in favor of small impro�ements in average I Distribute Fu nctions performance. , Functions should be distributed among the computer systems in a network to avoid Evolution of DNA andDECnet single points of failure and encourage parallel Products operation. In the ten years since it was first announced, DNA has had four major phases, each bringing new Us e a Hierarchically Layered Structure capabilities to the DECnet family. These capabil­ Functions should be divided into layers to factor ities have increased the rapge of computer sys­ architecture complexity into easily understood tems implementing the architecture, the number pieces and to facilitate the architecture's evolu­ of applications and communications technolo­ tion. Lower layers should provide their defined gies supported, and the si�e and complexity of services without concerning themselves with network topologies. In addition to functional II I Digital Te cbntcalJournal 11 No. 3 September 1986 ------A Digital Network Architecture Overview

improvements, the DNA protocols have been face on any DECnet products. Implementations enhanced to improve network performance and of Phase III and Phase IV still suppon applica­ robustness. tions programs written and debugged on Phase II. Upward compatibility of the user interface Phase I became another DNA goal. Phase I was the initial phase of DECnet products. First announced in 1975 and delivered in 1976, Phase Ill they supponed only the RSX family of operating Phase III was announced and the first products systems for the PDP-11 computers. Phase I prod­ . were delivered in 1980. Phase III added the ucts supponed program-to-program communica­ capability of "route-through" to DECnet tions and file transfer between directly con­ networks. Two systems could communicate nected computer systems. Synchronous, asyn­ if there was a path between them consisting chronous, and parallel communications devices of communications links and zero or more were all supponed. intermediate "routing" systems. Initially, Phase III networks were artificially restricted Phase II to 32 nodes and a maximum network path Phase II was announced in 1977 and the first length between any two systems of 6 "hops. " products were delivered in 1978. Phase II These restrictions were quickly lifted, extended the capabilities of Phase I to a wider however, and network sizes of up to 255 nodes range of computer systems, including all the became practical. Buffer size limits and operating systems for the PDP-11, TOPS-I 0, routing message formats imposed this 255-node TOPS-20, and VAXjVMS operating systems. The limit. communication capabilities were the same as Phase III also added the capability to distribute Phase I: point-to-point communication using network management. The installation, configu­ synchronous, asyncl}ronous, and parallel com­ ration, operation, and maintenance of a network munications devices. could now be done from one or more systems The architecture needed several changes to serving as management nodes in the network. adapt to a heterogeneous collection of computer Thus network usage trends could be gathered for systems. The most significant change was made planning network expansions and reconfigura­ to the user interface. While retaining process-to­ tions. Moreover, systems without mass storage process communication, user-interface functions could be loaded or dumped over the network, had to be modified to conform to the diverse and loopback tests could be run to exercise the needs of timesharing and real-time operating sys­ network and isolate faults. tems implemented on three different computer architectures. These modifications included Phase IV accommodating systems that worked on both a Phase IV was announced in 1982 and the first stream and a message model of IjO. Algorithms products were delivered in 1984. Phase IV for controlling message flows between computer expanded the range of communications facilities systems were also revised. These revisions to include Ethernet IANs and X.25-based packet corresponded with the differing needs of real­ networks. Addressing was expanded to 16 bits time computer systems, with their statically and structured hierarchically. A large network assigned buffer memories, and time-sharing could be divided into separate "areas" to sim­ systems, with their more dynamic memory­ plifyrouting calculation. An area is a contiguous management policies. Finally, the file transfer ponion of a network. A network could now con­ protocol was extended to accommodate the sist of up to 62 areas with up to 1,023 nodes per needs of differing file systems. This extension area. Networks of thousands of nodes were now provided the translation between dissimilar practical for the first time. A virtual terminal systems so that files could be moved between capability was also provided, making possible them. remote terminal access across the network. Gate­ In Phase II, the user interface had to be way functions were defined to map between SNA changed to adjust to a wider range of computer and DNA networks and to provide transparent systems. Since Phase II, however, there have access of X.25 networks through a DECnet been no incompatible changes to the user inter- network.

12 Digital TechnicalJournal No. 3 September 1986 New Products

Overview of th e Ph ase W DNA DNA provides a precise ,specification of the Arch itecture protocols between layers. This precision is nec­ The DNA structure, being hierarchically layered, essary to ensure that separate DNA implementa­ supports communication between adjacent lay­ tions can intemperate. In some cases the details ers within a single system by using architec­ of protocol operation are left to the imple­ turally defined interfaces.1 In addition, commu­ menters. If incompatible dhoices are possible, nication takes place between computer systems the standard specifies rules for negotiation to select compatible options.: The DNA specifica­ by exchanging messages following architec­ � turally defined protocols. A protocol specifica­ tions include model proto ol implementations tion defines the messages to be exchanged and with which all valid implementations· must com- the procedures for sending and receiving those municate correctly. messages. Protocols in the DNA arqhitecture are strictly From an architecture perspective, a computer layered. These are peer protocols that define the network is divided into a set of computer sys­ relationships between modules in the same layer tems, each system being divided into layers. Each of two computer systems. 'the protocols in each layer contains one or more protocol modules. layer operate independently, with communica­ Figure 1 illustrates the case of a two-node tion between layers taking b1ace only across the network in which communication between defined interfaces. This structure allows the modules takes place vertically, using interfaces. independent replacement br addition of proto­ In other systems, communication between cols in each layer as the arthitecture evolves. It modules can take place horizontally, using should be pointed out tha strict layering is an it protocols. architectural, not necess rily an implementa­ The DNA architecture specifies the functional tion, concept. Experien{ e in implementing interfaces between layers. These interfaces DNA has shown that sign�ficant performance define the services each layer provides to higher improvements are possible by collapsing the implementation of severalllayers. However, this layers and thus firmly partitions the architecture ' into layers. The interfaces defined by the stan­ performance improvement is obtained at the cost f dard are abstract specifications, concentrating on of reduced modularity and lexibility of that par­ the functions to be provided, not on the form of ticular implementation. the interface implementation. The details of Figure 2 defines the DNA layers and the inter­ interface realization are left to the implementers faces between them. Each layer is discussed of each DECnet product. These details typically below in terms of its functions, interfaces, and depend on the structure and conventions estab­ protocols. lished in each . The user layer contains user-defined functions such as applications programs. Only one func tion is specified by DNA for this layer: the net­: work control program, or NCP. This is a network PROTOCOLS management module that implements the MODULES • MODULES LAYER n •I I network command language specified by I I I I I liNTER FACES I INTERFACES DNA network management, thus providing a user interface to DNA network management PROTOCOLS functions. Three interfaces to lower DNA layers are also provided. Application programs in the user PROTOCOLS layer can directly access the session layer for program-to-program communication. They can also use the application layer to gain access PROTOCOLS to common network services, such as remote file access. Finally, they can access the network management layer. Tha� access allows pro­ NODE A NODE B grammers to write applications that enhance the basic network management capabilities provided Figure 1 Basic DNA Structure by NCP.

Digital Tecbnlcal]ournal 13 No. 3 September 1986 A Digital Ne twork Architecture Overview

Three protocols are defined for the network management layer: the network information and USER MODULES I J control exchange (NICE), the event logger, and + the maintenance operations protocols. NETWORK MANAGEMENT MODULES NICE sends commands and responses between r=l I peer network management modules in two � nodes. The functions controlled include the fol­ NETWORK APPLICATION MODULES �I J lowing: � • Loading and dumping remote systems SESSION CONTROL MODULES F=--1 • Changing and examining network parameters � J END COMMUNICATION MODULES • Examining network counters and events that I indicate how the network is performing � ROUTING MODULES • Testing links at both the and session � J control levels DATA LINK MODULES • Setting and displaying the states of nodes and I ! lines PHYSICAL LINK MODULES NICE is a simple request-response protocol that J uses the session control interface to permit net­ work-wide management control. The event logger protocol sends significant -CLIENT ====- NETWORK MANAGEMENT events from the nodes in which they are detected to nodes in which these events can be logged. Examples of such events include lines coming up Figure 2 DNA Layers and Interfaces or going down, error counters reaching thresh­ old values, and nodes becoming unreachable. Events can be filtered at the source node, making Network Ma nagement Layer it possible to control the amount of network traf­ The network management layer provides decen­ fic generated by event logging. Like other net­ tralized management for a DECnet computer net­ work management functions, this filtering can be work.2 The network management modules controlled remotely using the NICE protocol. within a node are responsible for two functions. Event logging uses the session control interface First, they coordinate the management of that to permit network-wide event logging. Events particular node; second, they communicate with are sent to "logging sinks," which can be con­ peer management modules in remote nodes as sole printers, disk files, or special applications needed to accomplish decentralized manage­ programs. . ment. The network management module uses the The maintenance operations protocol (MOP) services of three lower layers to provide its func­ performs loop back tests at the data link level, tions. The network applications layer is used to controls unattended systems remotely, and provide common network services, such as down-line loads or up-line dumps computer sys­ remote file access. The session control layer is tems having no mass storage.3 Like most data link used to communicate with peer entities as protocols, MOP uses sequence numbers, timers, needed for decentralized management. The mod­ and acknowledgements to detect and recover ule has a special interface to the data link layer so from data link failures. Unlike other DNA proto­ that simple management functions can be pro­ cols, however, MOP is designed for extreme sim­ vided between adjacent nodes. These functions plicity rather than performance. This design include remote restarting, down-line and up-line makes it possible to implement MOP in small loading, and loopback testing. Intermediate pro­ ROMs. MOP has also been implemented in com­ tocol layers are bypassed for these functions; munications devices and bootstrap ROMs in such layers can be economically implemented in computer systems. Because MOP uses the data ROMs.

14 Digital Tec bnical]ournal No. 3 September 1986 New Products

i i link layer directly, its operation is restricted to Network Applications: Layer adjacent nodes. A system to be down-line loaded The network application l�yer provides generic must be directly connected to its load server. services to the user and network management That load server might be controlled by an NCP layers.5 These services include remote file located elsewhere in a network (using NICE) . access and transfer, remote interactive terminal Moreover, the server might be reading the file access, and gateway access to non-DNA systems.6 containing the load image from yet another Modules in this layer operare independently and node (using the data access protocol, or DAP, in asynchronously. A single DECnet node may sup­ the application layer) . port many different network applications mod­ In examining Figure 2, note the arrows on the ules, which communicate using many different left side that connect the network management protocols. This layer supports modules supplied layer with each lower layer. Each arrow is a con­ by both Digital and users. As new network appli­ trol path used by network management to coor­ cations are developed, they can easily be added dinate the activity of each layer in a node. Each to this layer. One application layer protocol is lower layer of the DNA structure specifies a net­ described below to illustrate a typical operation work management interface that defines the net­ of the applications layer.1 work management functions provided by that DAP provides remote file access and transfer. layer. Two cooperating application layer modules A basic principle followed in designing DNA exchange DAP messages using the DNA session network management was to perform manage­ coiurol service: the user :module at the node ment functions at.the highest level possible. For requests the file operation, and the server mod­ example, management functions use the regular ule acts on the user's behaif at the remote node. applications layer service for accessing files Figure 3 depicts those services. containing management information. These These applications layer modules operate functions can also communicate using the nor­ under the control of either"a utility program or a mal services of the session layer. The alternative user program residing i� the user layer. For to this principle would have been to use some example, a file transfer o�eration might be ini­ special-purpose mechanism to achieve the man­ tiated by a utility program. In some DECnet agement functions, thus adding unwarranted implementations, such as the DECnet-VMS sys­ complexity to the architecture and to its imple­ tem, remote file operations are initiated by nor­ mentations. Out of practical necessity, however, mal VMS user programs using VMS file system the direct use of the data link layer by DNA man­ calls. File naming conventions will determine agement represents a partial deviation fr om this whether or not a local or remote· file operation design principle. Implementing all the lower is implemented.7 layers of DNA in ROM microcode was deemed to In a typical DAP protocol dialogue, the first be uneconomical. Although the sizes of low-cost message exchange involves configuration mes­ ROM memories have expanded in the last ten sages providing information about the operating years, fixing all the DECnet layers into ROM and file systems. Those �essages are followed remains undesirable. Changes to add new func­ by attribute messages that supply information tions or correct implementation or architectural about the file. An access-t1 equest message typi- bugs would simply require too many costly cally fo llows to open a p�icular file. A control hardware updates. message then sets up th'e data stream. After Another basic design principle fo llowed in file transfer has completed, access-complete designing DNA network management was to first messages will terminate the data stream. With specify primitive functions, then make them these messages, either an entire file can be available to network managers or to specialized transferred at one time or· portions of a file can applications programs. The goal was to have a be transferred, either randomly, sequentially, or simple, flexible structure implemented in all indexed. DECnet nodes while still providing the opportu­ nity fo r dedicating computing resources to the management of large networks. 4

Digital TechnicalJo urnal 15 No. 3 September 1986 A Digital Ne twork Architecture Overview

LOCAL NODE REMOTE NODE

USER LAYER THE ACCESSING PROGRAM: •NF"i' UTILITY •A VAX/VMS COMMAND • A USER-WRITTEN PROGRAM NETWORK APPLICATION LAYER NETWORK APPLICATION LAYER ·------· .------t DAP - SPEAKING ROUTINES: DAP - SPEAKING SERVER: • NFARs, OR ROUTINES IN THE • FILE ACCESS LISTENER LOCAL FILE SYSTEM (E.G.,RMS) SESSION CONTROL LAYER SESSION CONTROL LAYER END COMMUNICATION LAYER END COMMUNICATION LAYER ROUTING LAYER ROUTING LAYEA DATA LINK LAYER DATA LINK LAYER

DAP - DATA ACCESS PROTOCOL FCS - FILE CONTROL SYSTEM RMS - RECORD MANAGEMENT SERVICES NFARs - NETWORK FILE ACCESS ROUTINES NFT - NETWORK FILE TRANSFER UTILITY

Fig ure 3 File Tra nsfe r across a Network

DAP 's principal des ign problem was accom­ Session Control Layer modating the needs of diverse file systems. It was The sess ion control layer resides directly above necessary to define a mapping between the fea­ the end communications layer.8 The session con­ tures and functions of each different system. This tr ol layer provides system-dependent, process-to­ definition was not always easy to make. Some sys­ process communication functions for processes tems had differing capabilities (for example, residing in the us er, network management, and some supported index files); others had differing network application layers. These functions means of providing similar capabilities (for bridge the gap between the pure communication example, stream or record structures for text functions provided by the end communications files). Moreover, it was very important for file layer and the functions required by processes transfers between like systems to operate at max­ running under an operating system. The commu­ imum efficiency and to be completely transpar­ nication service provided by the session control ent. For example, it should be possible to copy a layer is connection oriented: an initiating pro­ file from one VMS system to another and still cess requests a connection to a destination pro­ retain exactly the same bit patterns in the copy. cess. The session control layer manages these Two design approaches were studied to achieve connections. Once a connection is established, these capabilities: using a common, or canoni­ data flows between the processes without further cal, file format in protocol messages; and per­ intervention by the session control layer, using forming needed translations. The canonical for­ the facilities provided by the end communica­ mat was rejected because it was not transparent tions layer. or efficient enough in the homogeneous case. When establishing a connection, the higher The second approach, in which translation is layer specifies th e destination process in two performed at the client DAP protocol module, parts: first, by destination node, then, by process was adopted. within destination node. Destination nodes are

16 Digital Te cbnicalJournal No. 3 Septe mber 1986 New Products

specified by a six-character node name. Each ses­ tion typically identifies the requesting user or sion control module contains a local copy of a requested account and, op�ionally, a password. node database that maps between the node names and the 16-bit node addresses used in the End Communications Layer end communications and routing layers. This The end communications layer, residing immedi­ node database is set up under the control of net­ ately above the routing layer, provides a stan­ work management and can be updated in a dardized communication service used by the decentralized fashion. higher layers of the DNA software.9 The end com­ Different operating systems employ different munications layer provides a reliable, sequen­ conventions to identify their processes. There­ tial, connection-oriented Service to the session fore, selecting a specific process in the destina­ control layer. The form�r layer isolates the tion system depends on the particular operating higher layers from any transient errors or system being used. However, the DNA session reordering of data introduced by lower layers. It control layer provides a mechanism for specify­ also provides a function, enabling ing processes generically by their fu nction, using multiple connections to be established between an object-type field. Thus the session control pairs of nodes or between a node and multiple architecture specifies a mapping between nodes. These connections are called logical reserved object-type field values and specific links. upper-layer protocol modules. For example, a The network services pr9tocol (NSP) provides specific object-type code is reserved todesignate the logical link service to the session layer, the process or processes implementing the exchanging protocol messages using the routing server end of the DAP fi le access protocol. This layer. The originating session control module code frees most network usage from having to asks the local NSP module to set up a logical link know the details of process addressing by the to a remote session control module. If the remote operating system. node is accessible via the routing layer and has The session control module at the destination resources to support an additional connection, system will either map an incoming connection the remote NSP module will indicate its desire to request onto an existing active process, activate a connect to the remote session layer. The NSP process, or create a new process, whichever is module will transfer session control protocol appropriate. For example, consider two possible data, such as user identifiers, passwords, and the implementations of the DAP server process. One DECnet object type. The remote session control implementation is multithreaded and supports module can either accept or reject the link. Only multiple simultaneous connections. When the when the logical link is finally established can it first connection is requested, the process would support the flow of data. be activated. Subsequent requests would map The connection management algorithms and onto the existing process. The second implemen­ protocol mechanisms of NSP are designed to tation is not multithreaded and supports only a ensure that data on each logical link flows inde­ single simultaneous connection. Each time a pendently fr om data on every other logical link. connection request is received, the connection This independence takes two forms. First, data must be mapped onto a new process, which may received on each connection will never be need to be activated or created. Whether pro­ mixed with data fr om any other connection, even cesses are activiated or created depends on oper­ one between the same two nodes and processes. ating system conventions and reflects the costs This restriction enables higher-layer protocols to of creating processes and keeping processes establish initial synchronization and to reestab­ dormant. lish synchronization if one computer has The session control layer provides one other crashed. Second, if data flow must be blocked on fu nction: it validates incoming connecting one connection (for exarhple, because there is requests using access control information pro­ nothing to send or no buffers are available to vided by the requesting session control module. receive) , data can still flow on other connec­ The details of this validation information depend tions. This data flow takes place even if memory, on the access control mechanisms provided by processing, and communications resources are the destination operating system. This informa- shared between the two connections.

Digital Technicaljo urnal 17 No. 3 September 1986 A Digital Ne twork Architecture Overview

Data flow on a logical link can be modeled by The Phase II version of NSP ran right on top of a pair of message queues in each direction. One a data link protocol, the Digital Data Communi­ queue in each direction handles the transmission cations Message Protocol (DDC MP). The DDC MP of "normal data" between higher-layer protoc ol protocol provided a reliable point-to-point com­ modules; that pair is used by all higher-level pro­ munic ations service, rendering unnecessary tocols. The other queue pair handles transmis­ NSP's use of timers to detect lower layer failures. sion of occasional shon "interrupt" messages; The Phase III version of NSP was designed to run this pair is used by some higher-level protocols. on top of the routing layer. This version included For example, the virtual terminal protocol uses a timer capability to detect and recover from interrupt messages to transmit interrupt com­ routing layer failures, suc h as the loss of a mes­ mands, suc h as those generated when a user sage when an alternate route must be selected enters the command CTRL-Y to a VMS system. On following a node or link failure. eac h queue, data flows on eac h queue indepen­ Only minor changes were made to NSP in dently of the other queue. Data is transferred Phase IV. Two changes improved the prot oc ol when the requesting session control module pro­ performance by reducing the number of control vides a message to be transmitted and the receiv­ messages exchanged to perform flow-control and ing session control module indicates its willing­ error-recovery functions. First, the protocol mes­ ness to receive, by providing a buffer for sage formats and procedures were allowed to example. NSP uses protocol messages and flow combine control messages with each other and control algorithms to ensure an orderly data flow with data messages. Second, provision was made on logical links. An orderly flow takes place even for selectively delaying acknowledgement mes­ if limited by the ability of the sources to provide sages, making it possible to send many data mes­ data, the network to transmit data, or the destina­ sages for eac h acknowledgement. In a typical tion to receive data. implementation of NSP, these changes make it In providing the reliable logical link service; possible for more than 90 percent of the mes­ NSP must exist in a hostile environment. In par­ sages transmitted by NSP to be data messages. ticular, NSP must operate correctly when the Reducing the number of messages exchanged routing layer occasionally loses, reorders, or improves the throuhput of DECnet implementa­ duplicates messages. Moreover, NSP must deal tions on Ethernet LANs by reducing the CPU time with potential confusion created by computers' needed to generate, transmit, rec eive, and crashing at one or both ends of the logical link; dec ode control messages. Reducing the number this is the problem of "half-open connec­ of messages decreases the common-c arrier tions. " 10· 11 NSP deals with these problems by charges when running NSP over X. 25 public data assigning logical link identifiers to each logical networks, whic h charge for each packet. link and by assigning sequence numbers to each data or flow-control message sent on each Routing Layer link. Timers are used to detect errors and initiate The routing layer provides a network-wide mes­ retries of operations: Should excessive retries sage delivery service. 12 This layer accepts mes­ appear to be required, NSP will repon this prob­ sages from the end communic ations layer in a lem to the session control layer, whic h source node and forwards the packets, possibly can dec ide to break the connection or continue through intermediate nodes, to a destination retrying.9 node. The routing layer implements a datagram NSP has evolved with eac h phase of DNA. In service, whic h delivers pac ketS on a best-effon ·· Phase II, the NSP protocol was revised to allow basis.n The routing layer makes no absolute dynamic sharing of message buffers between log­ guarantees against packets being lost, dupli­ ic al links. This capability, called optimistic flow cated, or delivered out of order. Suc h guarantees control, must deal with the delay between the are made by the end communications layer. To . time that data is requested and the time that data provide this network-wide service, the routing arrives. For example, during this delay on one layer calculates routes, using them to forward logical link, data on other links might arrive, pac kets. In the forwarding process, the routing thus consuming all the available NSP buffers. layer must attempt to avoid or at least control any The NSP protocol was designed to handle this congestion that results from overloading the net­ case correctly without deadlock. work with excessive traffic. The routing layer

18 Digital Tec bnicalJournal No. 3 September 1986 New Products

also ensures that packets do not wander around and destinations, including alternative paths to for too long before being delivered. Such "old" handle failures. These calculations rapidly packets might confuse the end communications become impractical as netWorks grow beyond a layer. few nodes. Figure 4, depicting a small DECnet The routing layer determines routes by using network, illustrates how routes are chosen using an adaptive, distributed algorithm that responds link costs. to changing network configurations. Routes The routing layer forwards packets based on between all sources and destinations are auto­ a uniform addressing scheme. Each node in a matically calculated or recalculated by the rout­ DECnet network is assigned a unique address, ing layer whenever new nodes or links are added used by the routing algorithms to calculate to or removed from the network. These changes routes. These addresses indicate each packet's to the network topology can either be planned source and destination and guide the forwarding by the network operators or result from the decision. A uniform addressing scheme allows unplanned failures and recoveries of network the higher layers and network applications to nodes and links. treat the network as a uniform resource. In Routes are calculated on the basis of link costs, Phase IV,addresses are 16 bits long and have two which typically are inversely proportional to link components: a 6-bit area field and a 1 O-bit node speed. The route to each destination node is address. As mentioned above , a network can be along a path having the minimum total path cost, divided into a maximum ,of 62 areas of up to which is the sum of the link costs along the path. 1,023 nodes. Routes are calculated at two Link costs can be set either by network managers, levels for each area: Level I routes carry traffic if desired, or by using default values. This fe a­ within the area; Level II routes carry traffic ture, called adaptive routing, is a key aspect of between areas. This hierarchical scheme makes it the DNA software, making it very easy to operate possible to build very large DECnet networks large DECnet networks. Without adaptive rout­ yet minimizes the memory, communications, ing, network operators and users would have to and processing requirements of the routing calculate paths between all pairs of data sources algorithm.

®, @ ... !J) � CIRCUIT COSTS IN INCREASING ORDER

NODE A WANTS TO SEND A PACKET TO NODE D. THERE ARE THREE POSSIBLE PATHS. PATH PATH COST PATH LENGTH

A to B, B to C, C to D ®+®+®�r 3 HOPS A to B, B to D ®+!J)�g 2 HOPS A to B, B to F, F to E, E to D ®+ ®+ ®+®� 13 4 HOPS

'7 IS THE LOWEST PATH COST; NODE A THEREFORE ROUTES THE PACKET TO NODE D VIA THIS PATH.

Figure 4 Routing Paths and Costs

Digital Te cbnical]ournal 19 No. 3 September 1986 ------A Digital Ne twork Architecture Overview

Ro ute calculatio n is done in a distributed Ethernet, the end no des send directly to the des­ fashion. Two typesof no des are defined in DNA: tination no de. End no des fo llow a simple proce­ end nodes, and ro uting no des. End no des have dure to determine which path to fo llow. If a only a single attachment to a network; there­ ro uter is present and the end no des do no t know fo re, they do not need to calculate routes or fo r­ abo ut a particular destination, they forward their ward packets on behalf of other no des. On the packets to the router. If no ro uter exists or if they other hand, ro uting no des support multiple links know a particular destination is on the Ethern et, and fo rward traffic on behalf of other nodes; they send their packets directly to the destina­ therefore, routing no des must calculate routes. tion address, using 48-bit Ethernet addresses Route calculation is performed using three major derived fro m the 16 -bit destination no de components: address. End no des learn that particular nodes are on • An initialization sublayer that determines their Ethernet by receiving packets directly from which links interco nnect with which no des those no des or by being informed by a router. • A decision process at each routing node that This approach was chosen to reduce the memory calculates routes to all destinations (within and overhead in end no des while still permitting one area for Level I ro uting or to each area multiple Ethernet LANs to reside in one DNA fo r Level II routing ) area. The alternate approach was to limit each DNA area to a single Ethernet, which wo uld have • An update process at each ro uting no de by limited the size of Phase IV networks. which ro uting nodes exchange information A DECnet network, like a co mplex network of abo ut their ro utes roads, is subject to co ngestion sho uld it be over­ The routing algorithm runs whenever the ini­ loaded. The routing layer incorporates several tialization sublayer at a routing node detects a design decisions to reduce the potential for con­ lo cal to po logy change. It also runs periodically gestion, to prevent lo cal congestion fro m spread­ to ensure that ro utes throug ho ut the network are ing globally, and to minimize the impact of con­ co rrect. This ro uting algorithm, ro bust and self­ gestion on network performance. To minimize stabilizing, recovers automatically fro m corrup­ congestion, traffic sho uld be kept out of con­ tion occurring in routing databases stored in gested po rtions of the network. To accomplish routing nodes or from any number of simulta­ that, each node restricts the number of buffers neous topological changes.12 available to traffic originating at a no de, thereby The routing layer supports a varietyof commu­ giving priority to traffic transiting the no de. nications facilities for communicating between Two design decisions help to prevent the adjacent nodes. A co mplete path from a so urce global spread of local co ng estion. The first deci­ no de to a destination no de can use a mixture of sio n was to keep ro uting as a function of the net­ link types. Three main types are supported: ded­ wo rk to pology, not of the netwo rk load. The sec­ icated links using the DDCMP protocol, X.25 ond decision was to handle co ngestio n by packet-switched networks, and Ethernet LA Ns. allowing packets to be discarded at a no de when X.25 packet-switched networks are treated by an output queue has filled, instead of slowing the routing layer as a co llection of po int-to�point down input to the no de. This second decision virtual circuits; hence, these networks function minimizes the impact of traffic flowing through similarly to DDCMP point-to-po int links. End the no de that does no t need the congested link. no des have a particularly simple task on these· The discard policy also prevents buffer deadlock, typesof links since end no des make no decision which occurred in early research netwo rks, by when sending a packet out; they simply send it preventing circular buffer waiting conditions. on the link to the adj acent node. The performance impact of co ng estion is mini­ End no de ro uting is somewhat mo re complex mized by this policy fo r limited buffer sharing on Ethernet LANs since each station can send between co ng ested links.14 to any other station on the Ethern et; therefore, Perhaps the most important decision made in the end no des attached to an Ethernet must make designing the DNA ro uting layer was to provide a a ro uting decision. When sending to no des "best-effort" delivery service instead of a "reli­ remote fro m their Ethernet, the end no des must able" service. This decision wa s made for a vari­ send to a ro uter. When sending to no des on their ety of reasons. First, implementing functions at

20 Digital TecbnU:aiJournal No. 3 September 1986 New Products

the highest practical level suggested that deliv­ allows the receiver to determine the beginnings ery guarantees should be provided by the end and ends of messages. Incoming bits are assem­ communications layer. In that way, reliable com­ bled into bytes by the communications hard­ munication could be provided to user-written ware, using startjstop bits fo r asynchronous links programs . As we have seen, reliable delivery is and synchronization characters fo r synchronous easily performed by the NSP protocol, involving links. only the cooperation of the two communicating The DDCMP protocol uses a 16-bit cyclic nodes. redundancy check (CRC-16) to detect errors in Second, providing reliable delivery in the headers or user data. On half-duplex or multi­ routing layer would have been quite difficult point channels, DDCMP executes link allocation since it would require synchronizing state infor­ procedures to ensure that two or more stations mation at many nodes. These nodes include all do not conflict in their use of the channel. These those on the path between the communicating techniques are based on polling in which one systems and possibly others that might be or station extends permission to the other to trans­ might have been used, since routes must change mit. DDCMP uses timers and sequence numbers when the network topology changes. Further to detect and recover from lost messages; it also complicating this problem is the fact that any prevents the process of error recovery from cre­ state information in intermediate nodes would ating duplicates. The routing layer uses the error be lost following a crash. detection-and-retry capability of DDCMP to Third, providing reliable delivery would have verify that links between nodes are operational complicated the congestion control problem and and to synchronize the operation of the routing required complex algorithms to avoid buffe r protocols. deadlock. Fourth, a best-effort delivery service The X.25 specification developed by the was a "least common denominator" among data International Telegraph and Consul­ link protocols then in use or being developed. tative Committee (CCITT) defines an interface Ethernet provides such a data link service. When between a packet-switched network, such as a Ethernet was added to the DNA architecture in public data network provided by a common Phase IV, its best-effort delivery service was a carrier, and data terminal equipment, such as a perfect match to the DNA routing layer. DECnet node.16•17 The service provided by packet-switched networks is a virtual circuit Data Link Layer service in which connections, called virtual The data link layer creates a communications circuits, are established between pairs of nodes. path between adjacent nodes. This layer frames DNA supports these virtual circuits for use messages for transmission on the channel con­ by two functions: the DNA routing layer, and necting the nodes, checks the integrity of special applications, such as communicating received messages, manages the use of channel with communication services built on top of resources, and, when required, ensures the X.25 and offered by the common carriers. integrity and proper sequence of transmitted DNA defines procedures for allocating X.25 data. Currently, there are three protocols resid­ virtual circuits between these two functions ing in the DNA data link layer: DDCMP, X.25, and for providing access to X.25 networks by and Ethernet. DNA nodes not directly connected to an X.25 DDCMP operates over synchronous or asyn­ interface. chronous communications links.15 It can operate Routing uses X.25 virtual circuits in much the in point-to-point configurations or in multipoint same way it uses point-to-point links. A single configurations in which communication takes X.25 virtual circuit can carry databetween many place between a control station and each of sev­ different nodes, and virtual circuits are used in eral tributary stations. DDCMP messages are tandem with DDCMP links and Ethernet IANs. framed as sequences of bytes, beginning with a The Ethernet IAN provides a communications single control byte indicating the message's start­ facility for high-speed communication among ing point and type (e.g., data or control). While computers located within a moderately sized DDCMP control messages have a fixed length, geographic area, such as a building or a campus. data messages have variable lengths, indicated by This IAN includes a data link layer and a physical the length field. On reception, this encoding layer, which can send data at a rate of 1 0 million

Digital Te cbnical]ournal 2 1 No. 3 September 1986 A Digital Ne twork Architecture Overoiew

bits per second. The Ethernethas a maximum sta­ assigned in blocks to all vendors who manufac­ tion separation of 2.5 kilometers with a maxi­ ture , thus permitting different propri­ mum of 1024 stations. A shielded etary and public protocols to coexist in a single is used as the physical medium. Ethernet also Ethernet station. Ethernet frames also contain a uses a branching, nonrooted tree topology. The 32-bit CRC to ensure that frames received in Ethernet IAN technology was jointly developed error are detected and discarded. by Digital Equipment Corporation, Intel Corpo­ Since the Ethernet physical channel can trans­ ration, and Xerox Corporation. 18 The Ethernet mit data only from one station at a time, the Eth­ specification, with minimal changes, has subse­ ernet data link protocol must allocate the single quently been standardized by the IEEE 802 Local channel among all the stations. This allocation is Area Networks Committee as IEEE Standard accomplished by the technique of CSMA/CD 802.3.19 (carrier sense multiple access with collision The Ethernet data link protocol provides a detection) . In this contention-based protocol, best-effort delivery service. Messages, called stations "listen" before transmitting (carrier frames, are transmitted over the physical chatmel sense) and defer their transmissions to other sta­ in a broadcast fashion. Stations are assigned tions already transmitting. 48-bit addresses, and each fr ame contains a Should several stations begin transmitting source address and a destination address. A frame simultaneously, a collision will occur, prevent­ can be addressed to an individual station or to a ing correct reception of any transmission. In this group of stations, using a 48-bit group address case the physical channel hardware in each col­ (called a multicast address in Ethernet terminol­ liding station will detect the collision (collision ogy) . A special multicast address, consisting of detection) and each station will reschedule 1 's, is used to denote the set of all stations on an transmission after a randomly selected delay. To Ethernet and is typically used for maintenance ensure efficient, stable operation of the network purposes. In the DNA architecture, the multicast under both low- and high-load conditions, this capability is used for network configuration pur­ random delay is adjusted on subsequent colli­ poses by the routing and network management sions by the back-off algorithm. This causes each layers. For example, a multicast address, speci­ station to reduce the load presented to the net­ fied by the architecture, is defined in the routing work under overload conditions. Studies have layer specification as the set of all routing nodes shown that this procedure provides good perfor­ on an Ethernet IAN. mance (low delay and high throughput) over a End nodes advenise their availability to rout­ range of Ethernet configurations and loads.20•21 ing nodes by periodically "hello" These studies have also shown that the proce­ messages to the multicast address. The large dure allocates resources fairly between compet­ 48-bit address space permits a unique address .to ing stations and operates stably under high-load be assigned to each Ethernet station when it is and overload conditions. manufactured. That address space permits sta­ Digital recently introduced the concept of tions to be plugged in to a IAN and operate with­ bridges and extended IANs as a means to extend out having addresses assigned manually. MOP the physical extent, number of stations, and uses this address when down-line loading com­ throughput capabilities of a single LAN. 22 puter systems, such as server systems with no Extended IANs operate transparently to higher­ mass storage. The 48-bit address can be used to level protocols, such as the.DNA protocols. Thus, select the correct program and parameters to be although not a part of the DNA architecture, loaded into the node, such as the 16-bit DECnet extended LANs - such as those built from node address. Ethernets and LANBridge 1 00s (Ethernet-to­ The Ethernet data link protocol fr ames mes­ Ethernet) - can be components of a DECnet sages using the propenies of the Manchester cod­ network. ing scheme employed by the physical channel to mark the beginning and end of each frame. In Physical Link Layer addition to source and destination addresses, The physical link layer transmits bits of informa­ frames employ a 16-bit protocol type field to tion between adjacent nodes. The functions in identify the higher-level protocol carried in this layer include encoding and decoding signals the frame. The protocol type field values are on the connecting channel, performing clock

22 Digital TecbnlcalJournal No. 3 September 1986 New Products

rec overy of received signals, and interfacing low-cost computer sy stems allows individuals to the communications channel to any processor own network nodes rather than sharing a single and memory used to implement higher-level timesharing system. Second, certain applica­ protocol functions. Implementations of this tions, such as network mail, need to operate layer encompass hardware interface devices across whole organizations. This breadth makes a and device drivers in operating systems, as single company-wide network, rather than sepa­ well as communications hardware suc h as rate independent networks, highly desirable. , transceivers, and the physical channels Indeed, there is even a need for networks that themselves. span multiple organizations, adding further to Protocols for the physical layer are rudimen­ the problems of sc ale, complicating network tary, emphasizing the specification of electric al management requirements, and creating new interfacing parameters. No special physical layer problems of net work security. DNA will have to specifications have been developed for DNA. evolve to adapt to muc h larger and more diverse Instead, it relies on industry standards for the networks. physical layer, thereby ensuring that DECnet As computer networks have beco me larger, products can operate over available communica­ users have developed increasing requirements tions technologies and infrastructures. Physical for networks that interconnect computer systems layer standards supported by DNA for wide area from multiple vendors. The International Stan­ networks include the EIA RS-232C and RS-423 dards Organization (ISO) has been developing specifications, and the CCITT V.24 and X.25 standards for suc h networks through their Open Systems program (OSI) . The OSl Level 1 specific ations. Physical layer stan­ reference model defines a network architecture dards supported by the DNA architecture for similar in many respec ts to that of DNA. 25 Most LANs include two baseband implementations major computer vendors, �ncluding Digital, have of ThinWire Ethernet, the original18 and the announced their support tor OSI and are begin­ thinwire2 3 specifications, and a broadband24 ning to deliver OSI network products. Digital has implementation. announced its strategy to incorporate OSI proto­ Future Directions fo r the DNA cols into its networking products, integrating Architecture them into the DNA arc hitecture. Future versions of the DNA arc hitec ture will correspond For ten years, DNA has evolved in four main to a mixture of standardized and proprietary dimensions: network applications, communica­ protocols. tions technologies, network size and scale, and diversity of supported computer systems. This References ability to evolve independentl y along four DECnet Digital Ne twork Architecture dimensions has proven to be one of the principal 1. (Phase IV) General Description benefits of the architecture. It is reasonable to (Bed­ assume that evolution along these lines will con­ ford: Digital Equipment Corporation, tinue. Local area and wide area communications Order No. AA-149A-TC, 1982) . technologies continue to evolve, typically resul t­ 2. DECnet Digital Ne twork Architecture ing in higher communications data rates. New (Phase IV) Network Management Func­ applications for computer networks and new tional Sp ecification (Bedford: Digital applications protocols will also continue to Equipment Corporation, Order No. evolve. The DNA architecture will continue to AA-X4 37A-TK, 1983). accommodate these trends. DECnet Digital Network Architecture The 16-bit node addresses used by DNA Phase 3. (Phase IV) Ma intenance Op erations IV currently limit the size of DECnet networks. · Fu nctional Sp ecification (Bedford: Digi­ Digital's own internal DECnet networkis nearing tal Equipment Corporation, Order No. the limits of this address space; over 10, 000 AA-X436A-TK, 1983). nodes are currently registered. Clearly, the archi­ tecture must be extended to support more 4. N. La Pelle, M. Seger, and M. Sylor, "The nodes. From our experience with this network, Evolution of Network Management Prod­ there are two separate reasons why networks uc ts," Digital Technical jo urnal (Sep­ continue to grow rapidly. First, the availability of tember 1986, this issue): 117-128.

Digital TecbnicalJournal 23 No. 3 September 1986 A Digital Ne twork Architecture Overview

5. DECnet Digital Network Architecture Equipment Corporation, Order No. Data Access Protocol (DAP) Fu nctional AA-K175A-TK, 1984) . Sp ecification (Bedford: Digital Equipment 16. CCITT Recommendation X. 25, CCITT Corporation, Order No. AA-K177A-TK, Ye llow Book , vol. VIII.2 (Geneva: lnterna- 1983). tional Telecommunications Union, 1981). 6. J. Morency, D. Porter, R. Pitkin, and D. 17. Standard ISO 8208 fo r Information Pro- Oran , ''The DECnetjSNA Gateway cessing Sy stems, "X.25 Packet Level Proto- Product - A Case Study in Cross Vendor col for Data Terminal Equipment," Inter- Networking,'' Digital Technical jo urnal national Standards Organization ( 198 5) . (September 1986, this issue) : 35-53. 18. Th e Ethernet: A Local Area Network, Data 7. P. Beck and J. Krycka, "The DECnet-VAX Link Layer and Physical Layer Sp ecifica- Product -An Integrated Approach to Net- tions, Version 2. 0 (Digital Equipment Cor- working," Digital Te chnical jo urnal (Sep- poration, Intel Corporation, and Xerox tember 1986, this issue) : 88-99. Corporation, Order No. AA-K759B-TK, 8. DEC net Digital Ne twork Architecture Ses- 1982). sion Control Layer Fu nctional Sp ecifica- 19. IEEE Project 802 Local Area Net- tion (Bedford: Digital Equipment Corpora- work Standards , "IEEE Standard 802.3 tion, Order No. AA-K 182A-TK, 1980) . CSMA/CD Access Method and Physical 9. DECnet Digital Network Architecture Layer Specifications," Approved IEEE Stan- (Phase IV) NSP Functional Sp ecification dard 802.3-1985, ISO/DIS 8802/3 Ouly (Bedford: Digital Equipment Corporation, 1983). Order No. AA-X439A-TK, 1983) . 20. J. Schoch and J. Hupp, "Measured Perfor- 10. C. Sunshine and Y. Dalal, "Connection mance of an Ethernet Local Area Network,'' Management in Transport Protocols," Communications of the ACM, vol . 23, Computer Networks , vol . 2 (December no. 12 (December 1980) : 711-721. 1978): 454-473. 21. F. Tobagi and B. Hunt, "Performance Analy- 11. W. Lai, "Protocol Traps in Computer Net- sis of Carrier Sense Multiple Access with works -A Catalog,'' IEEE Tra nsactions Collision Detection," Computer Net- on Communication , vol . COM-30, no. 6 works , vol . 4, no. 5 (October/November oune 1982): 1434-1449. 1980): 245-259.

12. DECnet Digital Network Architecture 22. W. Hawe, M. Kempf, and A. Kirby, "The (Phase IV) Routing Layer Fu nctional Extended Local Area Network Architecture Sp ecification (Bedford: Digital Equipment and LANBridge 100," Digital Te chnical Corporation, Order No. AA-X43 5A-TK, jo urnal (September 1986, this issue) : 1983). 54-72. 13. L. Pouzin, "Presentation and Major Design 23. IEEE Project 802 Local Area Networks , Aspects of the Cyclades Computer Net- "Medium Attachment Unit and Baseband .work," Th ird IEEEjA CM Data Communi- Medium Specifications, 1 OBASE2," IEEE cation Sy mposium (November 1973): P802.3/83j0.21E, Section 10. 80-87. 24. IEEE Project 802 Local Area Networks , 14. R. Jain and W. Hawe, "Performance Analy- "Broadband Medium Attachment Unit and sis and Modeling of Digital's Networking Broadband Medium Specifications," IEEEw Architecture," Digital Technical jo urnal P802.3j84j0.46B, Section 11. (September 1986, this issue) : 25-34. 25. Standard ISC 7498 fo r Information Pro- 15. DECnet Digital Network Architecture cessing Systems , "Open International Stan- Digital Data Communications Message dards Organization ( 1982). Protocol (DDCMP) Fu nctional Sp ecifi- cation, Ve rsion 4. 1. 0 (Bedford: Digital

24 . Digital TechnicalJo urnal No. 3 September 1986 Raj]ain William R. Hawe I

Perfonmance Anarys� and Modeling of Digital,s Ne tworking Architecture

Digital has some of the highest performing networking products in the industry today. Transfe r rates of 3.2 megabits per second and higher have beenmeasured on an Ethernet. These high sp eeds resultfr om care­ fulperf ormance analyses and planning at all stages of the development cycle.A set of case studiesiUustra tes these analyses. Thesestudies include performance modeling fo r adapter placement in the physical layer; buffering in the data link layer; path sp litting in thenetwork layer; cross­ channel piggybacking, timeout and congestion algorithms in the transportlayer; and file transfer and terminal communications in the application layer. Completing the paper are studies on network traffic measurements and workload characterization.

Performance analysis is an integral pan of the Each of those organizations has a team of architectural design and implementation of net­ performance analysts who ensure that the best works at Digital Equipment Corporation. This alternatives are chosen at each stage. The sales deliberate strategy has helped to make us the organization also measures product performance industry leader in networking products. Some of and develops capacity planning and perfor­ these products have the highest performance mance-tuning tools. The field support organiza­ available today. Task-to-task transfer rates of tions monitor performance at customer sites and more than 3.2 megabits (Mb) per second have feed the information back to the development been measured on an Ethernetlocal area network organizations. They then improve the product (LAN) connecting two MicroVAX II systems.1 through revisions, field changes, and updated This paper describes a number of case studies models. ' that illustrate the analyses done to improve the To conduct performance studies, we use ana­ performance of Digital's network products. lytical modeling, simulation, and the taking of These analyses are ongoing; they are planned for appropriate measurements. Which of those tech­ every stage in the life cycles of products. The niques to use depends upon the product devel­ design life cycle of a product consists of the fol­ opment stage and the time available to do the lowing stages: conceptualization, prototyping, study. Queuing theory, including operational marketing research, development, sale, and field analysis, is extensively used in analytical model­ support. Each stage takes place in a different ing. 2·3 Simulation models are usually developed organization within Digital. A research organiza­ to solve specific problems4•5•6 or often they are tion usually conceives an idea for a new product. solved via queuing network solvers. Measure­ An advanced development team then develops ments of operational characteristics are taken of the architectural specification and builds a pro­ the system as well as the workload, using both totype to demonstrate the feasibility of the idea. software and hardware monitors. Traffic mea­ In tum, the marketing organization decides if the surements are taken on Digital's own networks, product can be sold and how competitive it will as well as on those at customers' sites.1•7 Tools be. If they decide that the idea should become a for capacity planning, monitoring, and model­ product, the development organization will per­ ing are also used by the teams doing these per� form that task. formance analyses.8·9 Sometimes we have to

Digital TecbnicalJournal 25 No. 3 September 1986 Performance Analysis and Modeling of Digital's Networking Architecture develop new performance metrics10 or statistical the adapters should be placed only at those computation algorithms.11 marks. That spacing was determined from a This paper presents the diversity of the perfor­ model that simulated many different r:andom ' mance analysis techniques used to ensure that placements of a given number of adapters on the our networking products operate at high effi­ cable and determined the worst-case reflection. ciencies. Many performance studies of our prod­ The simulation model showed that the worst ucts have been published; we do not intend to case occurs when approximately 1 00 adapters reproduce them here. We have selected a repre­ are placed on the marked cable. With 100 nodes, sentative group of unpublished case studies to the reflectedvoltage exceeded 10 percent of the illustrate the diversity of our approach to perfor­ true signal in only 24 of the 10,000 configura­ mance improvement. One typical problem from tions that were simulated. In fact, the maximum each of the key layers of the networking architec­ reflection observed for any placement was 12.1 ture will be discussed. A discussion of workload percent, well below the 25 percent noise characterization and traffic analysis will close allowance. the paper. It is easy to see why the 1 00-adapters case performs worse than other cases with both Physical Layer Peiformance more and fewer adapters. With the cable marked We conducted many performance studies within at 2.5-meter intervals, a single Ethernet seg­ Digital to help set the parameters of the 10 Mb­ ment (500 meters) can accommodate up to per-second EthernetLAN . This is the same Ether­ 200 adapters. When the number of adapters is net that, with certain modifications, we pro­ small, their reflectionswill be too small to cause posed for standardization and was later adopted any problem. On the other hand, if the number as the IEEE 802.3 standard. The two most inter­ of adapters is close to the maximum of 200, the esting problems in the physical layer design are reflections from neighboring adapters will tend clock synchronization (phase lock loop versus to cancel each other out. counter) and the placement of adapters on the The cable marking alone' is no guarantee Ethernet cable. We describe the latter problem against experiencing reflection problems. Given and the proposed solution below. this or any other marking guideline, it is still pos­ At each adapter, some fraction of the incoming sible to position adapters so that the reflections signal is reflected back along the cable. If reinforce. This happens if the adapters are adapters are placed in close proximity, their placed A/2 apart, A being the wavelength at reflections may reinforce each other and inter­ which transmissions are taking place. For exam­ fere with the signal. ple, for a 1 0-MHz signal traveling at a speed of The adapter designers had specified that a total 234 meters per microsecond, A is 23.4 meters, noise level of 25 percent of the true signal level the speed divided by the frequency. Hence, if the was an acceptable limit. Since half of this noise adapters are placed approximately 11.7 meters normally comes from other noise sources apart, their reflections will reinforce. (sparks, radiation, etc.), the reflected voltage must be less than 12. 5 percent of the signal Data Link Layer Performance level. A number of our studies about the performance The cumulative reflection is actually strongest of the data link layer in Ethernet have already at the itself because of the attenua­ been published.12•13 M. Marathe compared five tion of the signal and its reflectionas they propa­ back-off algorithms and concluded that none was gate through the cable. Since the transmitter is significantly better than the binary exponential not adversely affected by the reflection, how­ back-off algorithm.12 This simulation-based anal­ ever, adapters placed next to the transmitter are ysis also showed that the number of retries the most sensitive to reflection problems. There­ should be increased from the original 8 to 16. fore, those adapters were the best candidates for Response times at the user level have also been analyzing problems caused by reflections. studied. Such studies show that a 10 Mb-per-sec­ It is essential to maintain some minimum ond Ethernetcan support up to several thousand separation between adapters. To assist network timesharing users.14 A capacity planning tool was installers, the Ethernet cable is marked a:t developed to study the system-level performance 2.5-meter intervals; the specifications state that for any given configuration.6 The performance

26 Digital TecbnicalJournal No. 3 September 1986 New Products

studies of extended LANs are discussed in this packets in the buffers on the adapter, without issue of the Digital Te chnical ]o urnal.1 5•16 using an y backplane bus to receive The case study described here uses a very sim­ them. However, to receive the packets, the host ple analytical model to assist in designing an must copy them to buffers elsewhere in memory Ethernet adapter. with a programmed move: Three common approaches used for interfac­ In Case C the packets flow in real-time into in g machines to a LAN are depicted in Figure 1 . buffers in the host memory. The backplane is iso­ Case A represents the approach used in the lated from the channel with a first-in, first-out Digital UNIBUS Ethernet Adapter (DEUNA) (FIFO) control point. This approach reduces the product. Case B represents the approach used in overhead on the host, as well as that on the the Ethernet adapters made by 3COM Corpora­ adapter processors. In this case, however, exces­ tion for UNIBUS an d Q-bus systems. Case C rep­ sive DMA-request latencies may cause overflow resents the approach used in adapters like the in the FIFO control; hence a packet may be lost Digital Q-bus Ethernet Adapter (DEQ NA) pro­ when received. When packets are transmitted, duct. Each approach has certain advantages and these latencies may cause an underflow in the disadvantages. FIFO con trol; hence a packet may be aborted. In Case A the packets received are first buff­ The performance of the DEUNA adapter is sen­ ered on the adapter, then moved via direct mem­ sitive to two factors: the number of receive ory access to buffers in the host's memory. Pack­ buffers in the adapter, and the number of words ets to be transmitted follow the same path, to be transferred per UNIBUS capture. The num­ except in reverse. The throughput limit of the ber of receive buffers chosen affects the packet device is limited by how quickly it can move loss rate in the adapter. The number of words packets between the adapter and the host' s transferred affects the response to disks an d other buffers. devices on the UNIBUS system. Transferring too In Case B the packets received are also buff­ many words per UNIBUS capture may cause a ered on the adapter. In this case, however, the disk to experience "data lates, " in dicating that packet buffer memory is dual ported, with the the disk could not get the bus for data transfer host side being mapped in to the address space of within the required time. the host. This scheme allows the host to examine We wanted to know if the number of buffers chosen by the designers of the DEUNA adapter would cause these problems to occur. BACKPLANE HOST We used a simple analytical model to deter­ LAN ADAPTER BUS MEMORY mine the packet loss rate an d a simulation model to determine the wordsper UNIBUS capture. The •• simulations showed that the DEUNA adapter A L _r-;;l should transfer only on e word per UNIBUS cap­ � ture. The an alytical model is described here. The packet-arrival process is assumed to fol­ low a "bulk Poisson" distribution in which bursts of packets arrive at a rate of A. bursts per B second. The number of packets per burst is assumed to be geometrically distributed with a mean B. The burst size is thus described by the COMMON ADDRESS SPACE formula (k I) Pr (k packets in burst) =(1- 1/B) X B - , k � 1

For mathematical con venience we assume that all service times are exponentially distributed: this assumes that packet lengths are exponen­ tially distributed, with a mean of L words per packet. The UNIBUS bandwidth is approximately Figure 1 Th ree Ways of Organizing Buffe rs 800,000 words per second. A fraction of that ban dwidth, J.L words per second, is available for

DigiUU TecbnicalJournal 27 No. 3 September 1986 ------Performance Analysis and Modeling of Digital's Networking Architecture

the transfers between the DEUNA buffer and memory. The rest of the bandwidth is used up by transfers betWeen the disk and memory, and by other devices on the UNIBUS system. 0.04 If the DEUNA buffer has a capacity fo r not B = BURST SIZE more than N packets, then any packet arriving at i:: 0.03 a full buffer will be lost. Let us first calculate the ::::; probability P ( n), 0 < n < N, that there are n ii5 � packets (including the one, if any, in servi�e) a? 0.02 queued at the DEUNA adapter. The distribution 11. (/) (/) of the number of packets in the queue has a rela­ 0 tively simple form: ...J 0.01 N P(n) = (1 - P)/(1 - p X a ) for n = 0 and

and ing case study illustrates how simple analytical a =l-(1 -p)jB models were used to demonstrate that path split­ ting can significantly improve a network's per­ formance. If B = 1, the distribution of the arrival process reduces to an ordinary Poisson distribution, and Assume there are M packet sources in a net­ P( n) reduces to the classical solution of a space­ work and that the ith source has a rate of L X A1 i M limited M/M/ 1 queue. packets per second, for 1 < < and some The packet loss probability is real constant L. (We increase or decrease all traf­ fic by varying L). Assume further that there P(loss) = P(N) X (B - 1 + p)jp are N paths in the network and that the jth Here, P(N) is the probability that the DEUNA path has a speed of 1-LJ packets per second, for adapter is totally full. Notice that the loss proba­ 1 0. In this for transfers between the DEUNA adapter and case all source i packets are sent on path j. memory. Curves for other arrival rates and avail­ 2. Equiprobable Splitting - For each packet able bandwidths can be similarly plotted. The from source i, select a path} E S1 with proba­ curves show clearly that the designers' choice of bility 1 j1 S1 1. In this case the packet is sent on 1 3 receive buffers will result in a loss rate of less path j and successive paths are chosen inde­ than one percent, even with a UNIBUS system pendently. that is relatively heavily loaded. For a large enough overall load factor L, the Network Layer Per:formance mean waiting time per packet under equiproba­ The concept of path splitting was introduced in ble splitting will be much less than it would be Phase IV of the Digital Network Architecture under no splitting. This can be proven by show­ (DNA) . In earlier DNA versions, the routers main­ ing that, with no splitting, there exists a possible tained only one path to each destination even if set of path assignments in which at least one path several paths of equal cost existed. The follow- will saturate before any path saturates under

28 Digital TecbnlcaiJournal No. 3 September 1986 New Products

equiprobable splitting. Since the mean waiting time on a saturated path is infinite, the average waiting time of all sources over all paths will 8 SERVICE RATE = 1 include an infinite term and therefore will also be infinite. The performance impact of path splitting can be seen from the following example. Assume the simple configuration of two and three lines shown in Figure 3. 1 has access to paths 1 and 2; sender 2 to paths 2 and 3. Each sender selects either accessible path with equal probability. Without path splitting, both sources might select the same path (path 2) with 0 �--��--��--�----�----� probability 1/4, or select separate paths with 0.0 0.1 0.2 0.3 0.4 0.5 probability 3/4 . The mean waiting time (assum­ ARRIVAL RATE ing M/M/1 servers) is NOTE: ALL TIMES ARE NORMALIZED BY = � X)) + X)) Wn 3/4 X (1/( - 1/4 X (1/(�-2 X THE PACKET SERVICE TIME

Figure 4 Mean Wa iting Time .JJI.)-;- PATH 1 X ·PATH 2 With equiprobable path splitting, the input rate .JJI.)-;- to paths 1 and 3 is X/2 and the rate to path 2 is X. �PATH 3 The probability of a packet following path 1, 2, L___:_j� JII)-;;- or 3 is 1j4, 1/2, and 1/4 respectively. The mean waiting time is (a) Two Sources and Three Paths Ws = (1/4 + 1/4) X (1/(JL-Xji)) + 1/2 X (1/(JL-X)) The values of the mean waiting time both with and without splitting, Ws and Wn respectively, 87 .JJI.)-;-! � are illustrated in Figure 4. Obs erve that satura­ IIO-;­ tion occurs much earlier when there is no II I splitting. Er-� Ii� PROBABILITY 3/4 PROBABILITY 1/4 Another advantage of path splitting is that it : makes traffic less bursty. Bursty traffic presents a serious problem in performance control, both in (b) No Splitting average performance and in predictability. With bursts, the mean waiting time can greatly increase; in fact, if there is an average B packets per burst, then the mean waiting time will be about B times that value predicted by an identi­ cally loaded M/M/ 1 queueY The waiting time variance will similarly increase, since the first and last packets in a burst will experience mark­ edly different waiting times. The overall perfor­ mance is very difficult to control or guarantee in such a situation. Path splitting has a major advantage in this sit­ (c) Equiprobable Sp litting uation because it breaks up the bursts, sending each packet over a different path. The perfor­ Figure 3 Two Senders Tra nsmitting on mance of a network with bursty traffic is thus Three Lines appreciably improved. In fact, if there are more

Digital Te cbnicaljournal 29 No. 3 September 1986 Performance Analysis and Modeling of Digital's Ne tworking Architecture

paths usable by a source than there are packets One key lesson we learned from the timeout per burst, then burstiness will have little effect algorithm research was that a timeout is also an on either the mean or the variance of waiting indicator of congestion in the network. There­ time. On the other hand, only two or three alter­ fore, not only should the source retransmit the native equiprobable paths are enough to packet on a timeout, but it should also take decrease the bursty-packet waiting time from action to reduce future input into the network. one-half to two-thirds for the first hop. The There is a timeout-based congestion control pol­ packet bursts will tend to spread apart as they icy called CUTE (congestion control using time­ propagate, so that the improvement in subse­ outs at the end-to-end layer) that manages these quent hops will be somewhat less. actions.19 Among the new features of DNA Phase IV are TransportLa yer Performance cross-channel piggybacking, acknowledgment Several studies have been published on the per­ withholding, and larger flow-control windows. formance of the transport layer in the DNA struc­ These features were introduced as the result of a ture. 18·19·20·21 One of the published studies is on study that concluded that straightforward termi­ timeout algorithms. We found that under sus­ nal communication over a DECnet network tained loss, all adaptive timeout algorithms would be slow. This conclusion lead eventually either diverge or converge to values lower than to the development of a new local area transport the actual round-trip delay. 18 If an algorithm protocol, called LAT, for terminal commun­ converges to a low value, it may cause frequent ications. These enhancements were also added unnecessary retransmissions, sometimes leading to the DNA transport protocol . This study is to network congestion. Therefore, divergence is described below. preferable in the sense that the retransmissions In the DNA structure, each transport connec­ are delayed. tion has two subchannels: one for the user, and one for control. The user subchannel carries user data and their acknowledgments, called acks. HOST TERMINAL NODE NODE The control subchannel is used for flow-control packets and their acks. Protocol verification can APPLICATION TRANSPORT TRANSPORT APPLICATION LAYER LAYER LAYER LAYER be easily achieved if the two subchannels are CREDIT = 1 independent so that information on one channel is not sent on the other. In studying terminal communications over a IAN, we discovered that each terminal read took eight transport protocol data units (TPDUs) , as shown in Figure 5. Each unit consists of two application level packets: a read request, and a data response. Each packet requires a link service packet from the respective receiver; this service packet permits the sender to send one packet. The remaining four units are transport level acks for these four packets. Given the CPU time required per packet, we computed that communication for remote termi­ nals takes four times as much CPU time as that for local terminals. Therefore, our goal was to Figure 5 Eight Transport Level Pa ckets improve performance by a factor of four. We pro­ ceeded in three ways to solve this problem. First, we modified application programs to utilize larger flow-control windows; seco·n d, we searched for ways to reduce the num ber of pack: 'ets per I/0 o eration; third, we tried to reduce the CPU timep required pe r packet. The first goal was achieved by multibuffering, discussed later in the section "Application Layer Perform ance."

30 Digital Te cbnical]ournal No. 3 September 1986 New Products

The second goal was achieved by munication performance issues. The new LAT protocol has been designed to provide efficient • Cross-channel piggybacking-This tech­ terminal communication. This protocol and its nique allows transport control acks to be pig­ performance are described in this issue of the gybacked on normal data packets or acks. Digital Te chnicaljourna/ .22 In this section, we • Delayed acks -The receiver can delay an ack will describe some performance· issues in file for a small interval. This delay increases the transfer. prob ability of the ack being piggybacked on File transfer. in DNA takes place via a network the next data packet. ob ject called a file access listener (FAL); which in turn uses an application level protocol called • Ack withholding -The receiver does not the disk access protocol (DAP). Measurements of acknowledge every packet, particularly if an initial version of FAL revealed that the remote expecting more packets from the source. The file transfer took an excessively long elapsed source can explicitly tell the destination to time. A subsequent analysis showed that the sin­ withhold sending an ack by setting a "No Ack gle-block "send-and-wait" protocol used by FAL Required" bit in a packet. was responsible for that excessive ti me. The • No flow control -This option allows flow local FAL waited for the remote write operation control to be disabled for those applications to finish before sending the next block. Thus the operating in request-response mode and thus advantage of larger flow-control windows · having a flow-control mechanism at the appli­ offered by the transport protocols were ignored cation level. by the application software.:The suggested reme­ dies were to allow multiblocking and multi- • Multiple credits per link service packet ­ · buffering. Credits are not sent as soon as each buffer Multibuffering consists of allowing several becomes available. Unless the outstanding buffer writes to proceed simultaneously, an credits are very low, a link service packet is action similar to the window mechanism used at sent only when a reasonable number of buffers the transport layer. Multibuffering allows paral­ becomes available. lel operations at the source and destination To achieve the third goal, reducing the CPU nodes and at the link, thus considerably reducing time per packet, we used a hardware monitor to the elapsed time and enhancing throughput. measure the time spent in various routines. We Experiments have shown that there is consider­ found that in a single-hop loopb ack experiment, able gain in throughput as the buffering level only one third of the CPU time at the source was increases from one to two. Further increases do attributable to DECnet protocol routines. The result in better performance, but the amount of remainder was associated with the driver for the gain is smaller. line adapter; operating system functions, such as Multiblocking consists of sending more than buffer handling and scheduling; and miscella­ one block per FAL write, which decreases CPU neous overheads associated with periodic events, time and the di sk rotational latency (the time such as timers, status updates. Of the time spent spent in waiting for the disk to come under the in the DECnet protocol, 30 percent was spent in heads at the start of each write). As with multi­ counter updates and statistics collection. Simi­ buffering, the elapsed time is considerably larly, 21 percent of the time spent in the link reduced and the throughput is enhanced. driver was used in a two-instruction loop that implemented a small delay. The net result of Wo rkload Characterization and modifying these routines and im plementing the Traffic Analysis architectural changes mentioned above is that The results of a performance analysis study we achieved our target of improving the perfor­ depend very heavily on the workload used. To mance by a factor of four. keep up with continuously changing load char­ acteristics, we regularly conduct system and net­ Application Layer Performance work workload measurements. Workload charac­ The three key network applications are file trans­ terization studies enable us not only to use the fer, mail, and remote terminal communications. correct workload for our analysis but also to Earlier, we discussed some of the terminal com- implement our products mo re efficiently.

Digital TecbnicalJournal 31 No . 3 September 1986 ------Performance Analysis and Modeling of Digital's Ne tworking Architecture

A study of the sy stem usage behavior at six dif­ would choose alternativeH, which performs bet­ ferent universities showed that a significant por­ ter than L under heavy load but worse under light tion (about 30 percent) of the user's time is load. Our view of this choice is quite different. spent in editing.23 This conclusion led us to use We feel that, while high performance at heavy our text editor (EDT) as the key user level bench­ load is important, it should not be obtained at mark for network performance studies. The study the cost of significantly lower performance at results also led to the transport layer perfor­ normal, light load levels. Therefore, the choice mance improvements discussed earlier. between L an d H would also depen d upon the A study of network traffic at M.I.T. showed that performance of Hat low loads. the packets exhibit a "source locality."7 That is, given a packet going from A to 8, the probability is very high that the next packet on the link will be going either from A to 8 or from 8 to A. These observations helped us improve our packet-for­ warding algorithm in bridges. The forwarding w () decision is cached for use with the packets arriv­ z < ing next. A two-entry cache has been found to ::iE a: 0 produce a hit rate of 60 percent, resulting in sig­ u. a: nificant savings in table lookup. w c.. The principal cause of source locality is the increasing size of data objects being transported over computer networks. The sizes of data objects have grown faster than packet sizes have. H (HEAVY) Packet siz es have generally been limited by the buffer sizes and by the need to be compatible LOAD with older versions of network protocols. Trans­ fer of a graphic screen could involve data trans­ Figure 6 Preferred Alternatives fers of around two million bits. This in crease in information size ·means that most communica­ tions involve a train of packets, not just one Traffic monitoring is also used to study packet. The commonly used Poisson arrival the performance of networking architectures. model is a special case of the train modelJ Table 1 shows a breakdown of the DECnet traffic The two major components of a networking during the normal working hours at the same workload are the packet size distribution and engineering facility. All values represent the the interarrival time distribution. ]. Shoch and average of several 15-minute sampling intervals. ]. Hupp made the classic measurements of these The maximum and minimum values observed components for Ethernet traffic. 24 Their tests during the monitoring period give an idea of the have been repeated many times at many places, large variability. The DECnet traffic typically including Digital. The bimodal nature of the accoun ts for 86 percent of the total packets packet size distribution and the bursty nature of at this facility. The routing overhead is very low the arrivals are now well accepted facts; we will (5 percent). The protocol overhead comes not elaborate further on them. mostly from the end communication layer (ECL), The utilization of networks is gen erally very which provides error con trol (acks) , flow con­ low. Measurements of Ethern et traffic at one of trol, sequencing, an d connection man agemen t. our software en gineering facilities with 50 to 60 Forty-four percent of DECnet packets an d 39 per­ active VAX nodes during normal working hours cent of DECnet bytes are user-transmitted data. showed that the maximum utilization during any Thus the ECL overhead is approximately one 15-minute period was only 4 percent. Although packet per user packet, which is low considerin g higher momentary peaks are certainly possible, that most ECL connections are of short duration the key observation, confirmed by other studies (one file transfer of a few blocks) . Furthermore, as well, is that the network utilization is nor­ the results of this study confirm that we actually mally very low. While comparing two alterna­ did reduce the transport packets per application tives, say H and L in Figure 6, some analysts level packet by 50 percent.

32 Digital TecbnicalJournal No. 3 September 1986 New Products

Table 1 DECnet Packet Statistics

Average Maximum Minimum

DECnet packets (percent of total packets) 86 99 60

DECnet bytes (percent of total bytes) 68 99 32

Routing packets Percent of DECnet packets 4 52 1 Percent of DECnet bytes 5 66 2

Transport (ECL) packets Percent of DECnet packets 96 99 48 Percent of DECnet bytes 95 98 34

ECL data packets Percent of DECnet packets 44 51 3 Percent of DECnet bytes 60 81 4

User transmitted data Percent of DECnet bytes 39 68 2 Percent of ECL data bytes 65 84 50

lntraEthernet ECL data packets Percent of DECnet packets 79 91 34 Percent of DECnet bytes 79 93 24

The table also shows that, typically, 80 per­ Acknowledgments cent of all packets and bytes are used in intranet­ The studies reported in this paper were done work commu nication. That is, only 20 percent of over a period of time by many different people, the observed traffic originated from or was des­ some of whom are no loqger with Digital. We tined for a node not in the facility. would also like to acknowledge Dah-Ming Chiu (DEUNA bu ffers) and Stan Amway (traffic mea­ Summary surements) . We would like to express ou r grati­ Performance analysis is an integral part of the tude to them and other analysts for allowing us to design and implementation of network architec­ include their results in this paper. tures at Digital. Analytical, simulation, and mea­ su rement techniqu es are used at every stage of a References network product's life cycle. This conscious 1. W. Hawe, "Technqlogy Implications in effort has made Digital the indu stry leader in net­ LAN Workload Ch�racterization," Pro­ working. ceedings of the 1985 International Over the past decade, the link speeds have Wo rkshop on Wo rkload Characteriza­ increased by two orders of magnitude; however, tion of Computer Sy stems and Co m­ the performance at the user application level has puter Ne tworks (October 1985) : 111- not increased in proportion, mainly because of 130. high protocol processing overhead. The key to 2. K. Ramakrishnan and W. Hawe, "Perfor­ produ cing high performance networks in the mance of an Extended LAN for Image fu ture, therefore, lies in reducing the processor Applications," Proceedings of the Fifth overhead. International Ph oenix Conference on We have described a number of case studies Co mputer Co mmunications (March that have resulted in higher performance for the 1986): 314-320. Digital Networking Architecture. This perfor­ mance increase has come about by reducing the 3. M. Marathe and S. Kumar, "Analytical nu mber of packets, simplifying the packet pro­ Models for an Ethernet-like Local Area cessing, and implementing protocols efficiently. Network Link, " Proceedings of SIGMET­ RICS 'Bl (1981) : 205-215.

Digital Tecbnlcaljournal 33 No . 3 September 1986 ------Performance Analysis and Modeling of Digital's Ne tworking Architecture

4. I. Chlamtac and R. Jain, "Building a Simula- tion Us ers Group Eighteenth Meeting tion Model for Efficient Design and Perfor- (CPEUG '82) (October 1982): 375-389. man ce Analysis of Loca1 Area Networks, " 15. W. Hawe, M. Kempf, and A. Kirby "The Simulation (February 1984): 55-66. Extended Local Area Network Architecture 5. R. Jain, "Using Simulation to Design a Com- and LANBridge 100, " Digital Te chnical puter Network Congestion Control Proto- jo urnal (September 1986, this issue): col, " Proceedings of the Sixteenth Annual 54-72 0 Modeling and Sim ulation Conference 16. W. Hawe, A. Kirby, an d B. Stewart, "Trans- (April 1985): 987-993. parent Interconnection of Local Networks 6. J. Spirn, J. Chien, an d W. Hawe, "Bursty with Bridges, " jo urnal of Telecommunica- Traffic Local Area Network Modeling," tions Networks , vol. 2, no. 2 (September IEEE jo urnal on Selected Areas in Commu- 1984): 117-130. nications , vol. SA C-2, no. 1 (Jan uary 17. J. Spirn, "Network Modeling with Bursty 1984): 250-258. Traffic and Finite Buffer Space, " Proceed- 7. R. Jain an d S. Routhier, "Packet Trains: ings of the ACM Computer Ne twork Perjor- Measurements an d a New Model for Com- mance Sy mposium (April 1982): 21-28. puter Network Traffic, " IEEE jo urnal on 18 . R. Jain, "Divergence of Timeout Algorithms Sp ecial Areas in Communications (Forth- for Packet Retran smissions, " Proceedings coming, September 1986). of the Fifth International Phoenix Confe r- ence on Computer Communications 8. N. La Pelle, M. Segar, an d M. Sylor, "The (March 1986): 174 -179. Evolution of Netw ork Man agement Prod- ucts, " Digital Te chnical jo urnal (Septem- 19. R. Jain, "A Timeout Based Congestion Con - ber 1986, this issue): 117-128 . trol Scheme for Window Flow Controlled Networks, " IEEE jo urnal on Sp ecial Areas 9. M. Sylor, "The NMCCjDECnet Monitor in Communications (Forthcoming, Oc to- Design,'' Digital Technical jo urnal (Sep- ber 1986). tember 1986, this issue): 129-141. 20. K. Ramakrishn an, "Analysis of a Dynamic 10. R.Jain , D. Chiu, an d W. Hawe, "A Quan tita- Win dow Congestion Control Protocol in tive Measure of Fairness an d Discrimination Heterogeneous Environments Including for Resource Allocation in Shared Com- Satellite Links," Proceedings of the Com- puter Systems, " Digital Equipment Corpo- puter Ne tworking Sy mposium (Forthcom- ration Internal Research Report, TR-30 1 ing, November 1986). (1984). 21. D. Chiu, "Simple Models of Packet Arrival 2 11. R. Jain and I. Chlamtac, "The P Algorithm Control," Digital Equipment Corporation for Dyn amic Calculation of Quantiles -and InternalTechn ical Report, TR-326 (1984). Histograms without Storing Observations,'' Communications of the ACM, vol. 28 , 22. B. Mann, C. Strutt, and M. Kempf, "Termi- no. 10 (October 1985): 1076-1085. nal Servers on Ethernet Local Area Net- works," Digital Te chnical jo urnal (Sep- 12. M. Marathe, "Design Analysis of a Local tember 1986, this issue): 73-87. Area Network, " Proceedings of the Com- puter Ne tworking Sy mposium (December 23. R. Jain and R. Turner, "Workload Charac- Pro- 1980): 67-81. terization Using Image Accounting,'' ceedings of the Computer Performance 13. W. Hawe and M. Marathe, "Performance of Evaluation Us ers Group Eighteenth a Simulated Programming Environment, '' Meeting (CPEUG '82) (October 1982): Proceedings of Electro '82 (1982): 111-120. 21/4/1-8. 24 . J. Shoch an d]. Hupp, "Performance of the 14 . M. Marathe and W. Hawe, "Predictin g Eth- Ethernet Local Network, '' Communica- ern et Capacity-A Case Study, " Proceed- tions of the ACM, vol. 23, no. 12 (Decem- ings of the Computer Performance Evalua- ber 1980): 711-721.

34 Digital Te chnical]ournal No . 3 September 1986 john P. Morency David Porter Richard P. Pitkin David R. Oran The DECnet/SNA Gateway Product A Case Study in Cross Ve ndor Ne tworking

Connecting Digital's network products with those from IBM Corporation has been a problem, since the network architectures diifer. SNA has a hierarchicalnode structure and a subset architecture supporting logical unit types. In contrast, the DNA architecture has a symmetric peer-to­ peer structure, with all nodes free to communicate. The DECnetjSNA Gateway product allows components from both networks to communi­ cate. Its architecture enables cross-network connections and sp ecifi es how messages will be structured. A significant fe ature is network man­ agement to measure the performances of components. The software has three servers thatfa cilitate the flowof data across the gateway.

Recent technological trends in the computer of these studies have tended to vary based upon industry have rapidly brought networks to be the organizational need; hO\yever, the fo llowing equivalent of systems. It is true that computer common questions can be iidentified1: networks have long been used to realize dis­ ' • What fu nctions will be provided to the end tributed applications. Until recently, however, users? such networks generally represented isolated pockets of computing power within organiza­ • Should those functions be equally accessible tions. Now, increased pressures to both reduce by users in all the interconnected subnet­ costs and achieve greater organizational produc­ works? tivity have stimulated the drive for more effec­ • What security constraints must be in effect to tive network integration. In the future, compa­ prevent network resources from being com­ nies will be establishing single information promised? "utilities" that will allow end users to access needed resources without regard to their physi­ • What level of transparency can be provided to cal locations. end users so that acc�ss to new network Many issues must be resolved to create this sin­ resources is accomplishfd via existing mecha- gle, integrated structure. One of the most signifi­ nisms? f cant, addressed in this paper, is how to deal with • What level of network protocol compati­ multi-vendor computing equipment within a sin­ bility will be required to allow effective inter­ gle organization. Here, the problem is not only working between any two arbitrary subnet­ one of establishing common communications works? protocols, but also integrating that support into end-user computing to minimize the disruption • What levels of performance capability will be of existing services. Clearly, the scope of this required? problem increases as a function of .thenumber of • What "political" considerations have to be incumbent vendors.

taken into account when interconnecting· sub- Network Interconnection Issues networks? The interconnection of systems into a network • How can the combined network be most has been the subject of many studies. The goals effectively managed?

Digital Te chnicalJo urnal 35 No. 3 September 1986 · The DECnet/SNA Gateway Product

• How effectively can component fault isolation Th e Advent of Op en Sy stems be accoryplished? Fortunately for this organization (and many of ' • How will resource utilization be cross- the world's major corporations fit into this cate­ charged? gory) , the international standards process is beginning to provide a framework to solve this • How effectively can the combined network problem. Substantive definition is now under­ migrate to new technologies? way of the services and protocols at each layer of The most important prerequisite for answering the Open Systems Interconnect (OSI) model, these questions is a clear understanding of end­ shown in Figure 1. Common services spanning a user needs at all times. Failure to understand multitude of vendor equipment will begin to be those needs can result in a significant expendi­ realized by the end of the present decade. In ture of effort to solve the wrong problem, usually some cases, subsets of OSI services are being uti­ at the wrong time. lized much now by major users to bring about standards in particular application areas. Two Possible Solutions to These prominent examples are General Motors with its Questions Manufacturing Automation Protocol (MAP) for Two primary approaches to answering these factory applications, and Boeing Computer Ser­ questions are possible. First, an organization vices with its Technical Office Protocol (TOP) could take upon itself the effort of building cus­ for the office environment. tom hardware, software, and procedures to effect These efforts are significant and over time are the desired solution. Unfortunately, this certain to create a much more open environ­ approach is usually an enormous task with draw­ ment. However, what types of solutions are backs in terms of cost, time, maintainability, and available now for organizations whose applica­ SyStem migration, to name but a few. Some orga­ tions do not fit these examples and who cannot nizations have done it, however, with varying wait for full OSI implementations? Such organi­ degrees of success. zations are generally faced with either building The second approach is to use standard prod­ a custom solution or making use of vendor-sup­ ucts as the means to the desired end. This plied solutions. Two prominent examples of approach that can take several forms: the second approach are the Systems Network Architecture (SNA) from IBM Corporation and 1. An organization could acquire computing the Digital Network Architecture (DNA) from equipment from only one vendor. While cer­ Digital Equipment Corporation. The next few tainly limiting the intercommunication risk, sections introduce the key properties of these this approach has potential drawbacks in terms of flexibility and cost-effectiveness. Furthermore, it creates the risky situation of

a business's having only a single supplier for APPLICATION (LAYER 7) a key organizational resource. 2. An organization could limit purchases of PRESENTATION (LAYER 6) equipment to several vendors. While offer­ ing better flexibility and cost control, this SESSION (LAYER 5) approach can complicate an interworking strategy unless the ability of the equipment TRANSPORT (LAYER 4) from different vendors to operate together is carefully scrutinized. NETWORK (LAYER 3)

Neither approach, however, is satisfactory for the DATA LINK (LAYER 2) organization that owns equipment from more than three or fo ur vendors and does not wish to PHYSICAL (LAYER 1) incur the risk and expense of building custom solutions. This organization must depend on some external communications standard that is Figure 1 Op en Sy stems Interconnect Model supported by all the equipment that it intends to acquire.

36 Digital TecbnlcalJournal No. 3 September 1986 New Products

architectures and discuss some key consider­ ations to be addressed when interconnecting the TRANSACTION LAYER

two. ' PRESENTATION SERVICES LAYER Systems Network Architecture - An Overview DATA FLOW CONTROL LAYER SNA has been in existence since 197 4. It is TRANSMISSION CONTROL LAYER defined by IBM Corporation as '' ...a total description of the logical structure, formats, pro­ PATH CONTROL LAYER tocols and operational sequences for transmit­ ting information units through and controlling DATA LINK LAYER the configuration and operation of networks. "2 As such, SNA is an all-embracing network PHYSICAL LAYER architecture, implemented in products from mainframes to personal computers. Current esti­ mates are that from 30,000 to 40,000 network i Figure 2 Layers of SNA nodes in the world today operate some · form of SNA interface. The layered structure of SNA is shown in Fig­ SNA also makes heavy use of communica­ ure 2. One property of SNA that makes it differ­ tions front-end processors that typically are ent from most existing network architectures is either IBM 3705- or 3725-class machines. its hierarchical node structure and subset archi­ These processors are classed as physical unit tecture. It is also unique in that it accommodates type 4 (PU_T4). They typi�ally perform all the particular function types (known as logical unit classic front-end tasks, such as line polling, data types) . link handling, message unit routing, flow con­ trol, and error recovery and notification. The SNA Node Typ es PU_T4 fu nction is generally implemented in the Shown in Figure 3 are both a sample SNA topol­ Advanced Communication FunctionjNetwork

ogy and an illustration of all possible node types Control Program (ACFjNCP)I software . Given that can logically coexist within the network. current SNA definitions, it is not possible to sup- The physical unit type 5 (PU_T5), or host node, port an SSCP fu nction on a PU-T4. is the functionally richest node. It is typically based on System-370 architecture and contains a component known as the system services control point (SSCP) . SSCP is responsible for much of HOST NODE . the control and management of up to all of an (PU_T5) SNA network and usually contains the primary application subsystems. Application subsystems are usually complex application programs which support both inter­ I I FRONT END active terminal access along with value-added NODE functions, such as transaction processing or gen­ (PU_T4) ; I eral timesharing. These subsystems include the Customer Information Control System (CICS) , the Information Management System (IMS) , and CLUSTER I CLUSTER the Time Sharing Option (TSO) program prod­ TERMINAL CONTROLLER CONTROLLER NODE ucts, all of which network users usually need to NODE NODE (PU_T1) access. The implementation of SNA on a host (PU_T2) (PU_T2) node is typically split between the primary appli­ cation subsystems and the Advanced Communi­ cation FunctionjVirtual Telecommunications Fig ure 3 SNA Node Types Access Method (ACFjVTAM) program, which implements the SSCP function.

Digital TecbntcalJournm 37 No. 3 September 1986 Th e DECnetjSNA Gateway Product

From an architecture standpoint, both the Logical unit type 0 is unique by virtue of its PU_T5 and PU_T4 nodes have a fairly high "nondefinition" (that is, it can be defined by a degree of intelligence and possibly some mass user to implement any form of desired program­ storage. Thus their associated SNA definitions are to-program function). Logical-unit type 6.2 is fairly rich in function and rather complex in defi­ significantly different from the other LU types. nition. On the other hand, such devices as the Its definition changes the semantics of an LU IBM 3274 information-display control unit and from that of a network port to that of a dis­ the 37 7 6 remote job entry workstation are tributed operating system. LU6.2 is used mainly assumed to be limited in both intelligence and for transaction processing; it is a primary indica­ storage. These operations are detailed in the tion of the future direction of SNA.3 physical unit type 2 definition. The SNA node definition for this class of device is more limited An Exa mple of LU-to-LU in that both node and end-user communications Communication operate in the slave mode of a masterjslave rela­ This section discusses briefly the LU-to-LU com­ tionship. munication within an SNA network. 4 Start with The final node type in SNA terminology is the simple SNA topology as shown in Figure 4. typically associated with single-unit, limited­ Assume that an end user attached to a 3270 dis­ function terminals, such as the 3767 communi­ play station cluster controller node B wishes to cations terminal and the 32 71 model-1 1 display enter into an SNA session, or communication dia­ unit. This type is known as physical unit type 1. logue, with a CICS subsystem executing on host It was much more prominent in the early days of node A. For this session to begin, one side must SNA when the architecture was more oriented initiate the request. Typically, that is done at the toward terminal-mainframe communication than display station via either an unformatted log-on is the case today. Given current technology or a formatted SNA initiate-self message. This trends, it is likely that this particular node type message is issued by the logical unit services will become increasingly de-emphasized, except manager at the cluster controller in response to perhaps as a migration mechanism for the inter­ connection of pre-SNA devices or non-IBM equipment. HOST SYSTEM Program-to-Program Communication within an SNA Network CICS SUBSYSTEM Interprogram communications within an SNA network are realized via an architectural compo­ nent known as a logical unit (LU). The LU can be envisioned as a port from which an application program can obtain the services of an SNA net­ work. SNA communications through a logical FRONT END PROCESSOR unit are managed via an entity known as a logical SNAt unit services man ager. This entity is responsible SESSION for interfacing end-user communications requests into the SNA network. � Logical units are further classified by the type 3270 3270 of layered function the application programs CLUSTER CLUSTER CONTROLLER CONTROLLER choose to realize. There are specific logical unit types that are pr edefined by IBM Corporation to correspond to particular layered functions that "standardize" the use of SNA capabilities in real­ izing mainstream usages. Most logical unit types predefine terminal- and printer-to-host program DISPLAYS DISPLAYS functions; these LUs are called types 1, 2, 3, 4 and 7. The exception to this classification is in the definitions of logical unit types 0 and 6.2. Fig ure 4 Sample SNA Topology

38 Digital TecbnicaiJournal No . 3 September 1986 New Products

some display-specific user action. Such an action expanded to include areas such as DEC-to-non­ could be pressing the system request (SYSREQ) DEC communications, particularly terminal-host key and typing some log-on text. The request access, access to non-Digital systems over X.25- action is then directed to the SSCP in the host based public data networks (PDN), and access to node, which in this example is the controlling the resources of SNA-based hosts. The structure SSCP for both the cluster controller and the CICS of the DNA architecture is illustrated in Figure 6. subsystem. Unlike the somewhat hierarchical SNA struc­ After receiving the log-on request, SSCP ture, the DNA structure is symmetric .and peer­ informs the CICS subsystem that a user at a par­ to-peer, with any two processes being free to ! ticular LU in the network wishes to communi­ cate. Ifthe subsystem chooses to accept the con­ CICS CLUSTER CONTROLLER nection, it does so via a request that causes an

SNA message unit, known as a "bind," to flow UNFORMATTED LOGON/ over the network to the destination LU. A bind INITIATE-SELF indicates the willingness of the subsystem to BIND =.:c..:.::....______communicate with the terminal and includes a list of session parameters to which both sides are expected to conform. If these parameters are -DATA EXCHANGE - acceptable to the display management logic in ' I the cluster controller, this logic then requests the logical unit services manager to transmit an SNA "positive" response to the bind message. At UNFORMATTED LOGOFF/ TERMINATE-SELF this point both sides are ready to begin exchang­ ing useful data. UNBIND Useful data is exchanged by both partners using protocol sequences defined by the logical­ Figure 5 CICS- Cluster Controller Session unit type 2 definitions (the SNA logical unit type Exchange defined for 3270-to-host program communica­ tion) . The exchange continues until one partner (typically the user at the display) decides to ter­ minate communication. Termination is generally t accomplished via the transmission of either an USER LAYER unformatted log-off or a formatted terminate-self NETWORK MANAGEMENT LAYER message from the control unit to the SSCP. Upon receiving this request, the SSCP logic informs NETWORK APPLICATION LAYER CICS that the user wishes to terminate communi­ cations. At that point CICS requests that an SNA SESSION CONTROL LAYER unbind message be sent to the control unit to ter­ minate the session properly and to deallocate all END COMMUNICATIO� LAYER associated resources. This generic protocol i exchange is illustrated in Figure 5. ROUTING LAYER Digital Network Architecture - DATA LINK LAYER An Overview

DNA (announced in 1975) has been in existence PHYSICAL LA YEA almost as long as SNA and is implemented on approximately the same number of network nodes. DNA was originally conceived as a means Figure 6 Layers of DNA to facilitate DEC-to-DEC communication in applications areas such as program-to-program communication, remote file transfer and access, remote terminal access, and down-line loading of diskless systems. DNA's scope has been

Digittil Tecbnical]ournal 39 No. 3 September 1986 The DECnetjSNA Gateway Product communicate, provided that naming and security cept was rejected, however, due to daunting constraints have been satisfied. DNA has only maintenance and support considerations. Based two node types: end and routing. The nature of on these factors, we adopted the gateway application process communication is deter­ approach. mined by using either Digital-defined protocols Building this gateway was a tricky business at the network application layer or protocols because the two architectures differ not only in defined by end users. A communication instance their detailed protocol specifications but also in between two partners is called a logical link, the services provided at layer boundaries. with partners being identified via a network Despite superficial similarities (both architec­ node name (associated with a network-unique tures have seven layers, both support mesh node address) and a particular object type topologies, and so forth) , SNA and DNA are about identifier. 5 as dissimilar as can be. For example, at the network layer, DNA rout­ Interconnection Issues between ing provides a connectionless service using an DNA and SNA Networks adaptive routing algorithm; conversely, SNA An interconnection between DNA and SNA net­ path control provides a connection-oriented ser­ works involves a number of the questions raised vice using quasi-fixed routing. At the transport earlier in this paper about network interconnec­ layer, the DNA structure uses a standard, symmet­ tion. Some effective answers to these questions ric, three-way handshake to establish connec­ are provided by a product called the DECnetj tion; whereas SNA uses an asymmetric, three­ SNA Gateway. This product was developed by party negotiation. At the session layer, the DNA Digital Equipment Corporation to address these architecture provides simple process-binding many issues. Other issues, such as the nature of and access-control fu nctions; whereas SNA hardware interconnection used, the extent to provides complex data-phase services, such which common services are shared, the architec­ as chains, brackets, and multiple acknowledg­ tural interface of each network (and its associ­ ment schemes. At the application layer, the dif­ ated users) to the other, the cross-network certi­ ferences are even more pronounced. A central fication of products, and installation factors, DNA application service is file transfer and were all addressed as pan of the development of access, which SNA does not support at all. this product.6·7 Conversely, a widely used SNA application ser­ The remaining sections of this paper address vice is remote job entry, for which DNA has no the DECnetjSNA Gateway from two perspectives. counterpart. The first examines the architectural philosophy and protocols upon which the gateway is based. Possible Gateway Architectures The second examines the actual components of There were three possible architectural the gateway in terms of both the hardware pack­ approaches to this gap. First, a protocol aging and software modules used to give the translation gateway could be used to find a suffi­ desired degree of interconnection. In both sec­ ciently similar pair (or pairs) of protocols and tions attention is given to the question of how provide a deterministic mapping between their effectively the design approach addressed many messages. Second, a one-way encapsulation gate­ of the aforesaid issues. way could be used to select one protocol of one architecture and encapsulate all higher-layer Th e Need fo r a DNAjSNA Gateway protocols of the other architecture. Third, a two­ A fundamental decision in attempting to bridge way, or mutual, encapsulation gateway could be the gap between two different network architec­ used to operate as two one-way gateways to carry tures is whether to simply incorporate one archi­ the higher-layer protocols of each architecture tecture into the other or to employ some form of over the lower-layer protocols of the other. gateway. In the case of a DNAjSNA bridge, repro­ Now, protocol translation gateways have the ducing all of SNA into each DNA implementation (at least theoretical) advantage of providing was not feasible, given SNA's size and complex­ transparent communication between two incom­ ity. On the other hand, implementing the DNA patible network architectures. They do that by architecture into the confines of an SNA node hiding the differences inside the gateway box. was an attractive technical alternative. This con- Our initial attempts at designing an SNA gateway

40 Digital TecbnlcalJournal No. 3 September I986 New Products

centered around this model. We abandoned it, • To allow for the easy incorporation of however, because the two architectures pro­ DECnetjSNA functions into other Digital prod­ vided such different services that, even if the ucts, thus providing our Customers with inte­ protocols could be mapped, both user interfaces grated solutions would have to be altered, perhaps radically. The • To enable the modular development of func­ third alternative, a mutual-encapsulation gate­ tional pieces so they can be re-used in several way, suffered from these same maintenance and different products support difficulties; since DNA higher-layer pro­

• tocols would have to be built to run on SNA To provide an architectural1 base that is com- nodes. moo between the network interconnect (gate- Therefore, we chose a one-way encapsulation way box) and single-system interconnect gateway because it appeared relatively easy to products implement the SNA higher-level protocols • To provide a structure that can accommodate within a DNA network. Moreover, this solution future developments in both hardware and did not require any DECnet-specific software or software, without negating existing invest- hardware components to be introduced into the 1 ments SNA environment. One-way encapsulation operates at the trans­ None of the preceding objectives are surprising; port layer. The SNA notion of a "session" is the architecture defines a workable segmenta­ mapped onto the DNA notion of a logical link. tion of the SNA function that we can and do use This mapping is accomplished by carrying all in product development. The SNA product archi­ SNA session data, including the connection estab­ tecture is the master architecture; we have also lishment messages, over to the data phase of the developed other architectural specifications that DNA logical link. Choosing this mapping meant prescribe specific aspects1 of individual prod­ that SNA-oriented applications on DNA nodes ucts. The SNA gateway-access and gateway-man­ could be written as though they used directly the agement architectures are described later in this services of the transmission control layer of SNA. paper. This has worked quite well in practice since we Layered Structure of the Architecture have implemented many mainstream SNA appli­ I cation protocols successfully through the gate­ The SNA product architecture distinguishes five way, including 3270 data-stream and DISOSS separate layers in a product. In descending access. The main disadvantage is that each new order, these layers are as follows : SNA application protocol requires a complete • The functional layer implementation on the end-user's DNA node before an application can be run in the DNA • The SNA interface layer i universe. • The common network layer In some cases the need to have application­ specific code on the same node can be avoided • The data link layer by building a "server," an approach described • The physical layer later in this paper. In an architectural sense, the server is just one user of the encapsulation gate­ The layers are shown in Figure 7. way mechanism described above. In a particular product, the layers of the archi­ Digital's SNA Product Architecture tecture can be physically distributed in a number of different ways . Currently, we use two distinct The SNA product architecture has been devel­ distributions, called the gateway access model oped by Digital to provide a fr amework within and the server model. The important difference . which our network products can be designed. between the two models is how much of a The most important objectives for this architec­ product's function is physically located in the ture are gateway node. ! • To promote and support the idea of the In the gateway access model, the functional DECnetjSNA product set as a fa mily of prod­ and SNA interface layers are contained in a ucts, with each product being a part that fits process that executes in the end-user's node, well into the whole whereas the common network layer and lower

Digital Te chnicalJo urnal 41 No. 3 September 1986 Th e DECnetjSNA Gateway Product

end-user function. The translation from SNA pre­ sentation protocols to DECnet presentation USER 3270 DIA/ FUNCTIONAL PROG T.E. DCA LAYER media and formats takes place in this layer. The functional layer can contain programs supplied either by customers or by Digital's application software groups, as well as entities that are part

LUO LU2 LU6.2 of DECnetjSNA products. Two such products I J 1 SNA are described later: the DECnetjSNA VMS EXTENDED MODE INTERFACE LAYER DISOSS document exchange facility (DDXF) , BASIC MODE and the DECnetjSNA VMS remote job entry J j (RJE) . SNA Interfa ce Layer STANDARD COMMON NET INTERFACE The SNA interface layer provides access into the COMMON GATEWAY SINGLE SYSTEM SNA network. Three different access levels of NETWORK ACCESS METHOD ACCESS METHOD I LAYER interface are offered: a basic interface, which is COMMON SESSION CONTROL AND very close to that offered by the common net­ PATH CONTROL FUNCTIONS work layer; a so-called extended interface, which offers generic support fo r the data flow l control and transmission control protocols; and STANDARD DATA LINK INTERFACE several LU-mode interfa ces, each of which DATA LINK LAYER implements a particular logical unit type (ses­ I ETC. sion type) . The reason for the existence of three different levels of access is to some extent historical. That

STANDARD DRIVER INTERFACE is, we fi rst gained design experience using the PHYSICAL LAYER basic level of interfa ce and then were able to ETC. abstract and isolate the functions comprising the higher levels. The boundary between the functional and the SNA interfa ce layers is somewhat indistinct due TO SNA NETWORK (3725 FRONT END) to the different levels of interface offered by the latter. Nevertheless, in any particular case the structural distinction is a useful one, provid­ Figure Layers of Digital's SNA Product 7 ing as it does a standard model for program Architecture structure. The choice of which interface level to use layers execute in the gateway node. The trans­ involves a compromise between ease of use and port mechanism by which the SNA interface flexibility in manipulating the SNA protocol. layer communicates with the common network It is analogous in some ways to the choice of layer is defined in the section SNA Gateway whether to program in assembly language or in Access Architecture. a higher-level language. The LU-mode interfa ces In the server model, all five layers execute in offer specialized, easier-to-use interfa ces; the gateway node (more properly called a server the lower-level interfaces allow greater control node in this role) . The way in which the end­ over protocol operation at the cost of additional user gains access to the server depends on the programming. server itself; the SNA product architecture does The SNA interface layer is the lowest that is not specify that way. Existing DNA protocols are publicly accessible. Several programming inter­ used where appropriate. face products (basic mode, LUO, LU6.2 and 3270 data stream) are available and form part of Functional Layer this layer. In fact, the definition of these prod- The functional layer is the highest in the SNA . ucts was derived from the definition of the SNA product architecture and implements the actual product architecture.

42 Digital Tecbnical]ournal No. 3 September 1986 New Products

Common Ne twork Layer SNA gateway access protocol, or GAP, and is The common network layer provides routing and depicted in Figure 8. GAP is a fa irly straightfor­ multiplexing functions between the da ta link ward DNA application la yer protocol. It ma kes la yer below and the SNA interface layer above. use of the features provided, by the DNA session Data units received are routed to the entity (in control and lower la yers, in pa rticu1ar, flow con­ the SNA interface layer) that owns the SNA ses­ trol, error control, and message segmentation sion to which the data units belong. Data units and reassembly. Hence GAPineed not contain any sent are routed to the appropria te data link layer. such mechanisms itself. This layer implements the SNA path-control The SNA access module is! part of the SNA inter­ protocols and some - but not all - of the fa ce la yer, implementing what was referred to transmission-control protocols, including the above as the ba sic level of interface. The SNA common session control. The PU and LU ser­ ga teway module runs as a se pa ra te process in the vices management functions are also pa rt of this ga teway system, in which the module uses ser­ layer. vices that provide it with the functions of pa th control and.common session control. Data Link Layer A brief example is presen ted in the following The data link layer provides error-free transmis­ section in lieu of listing the specific operations sion of da ta units over a physical link. Current of GAPin detail. This example illustrates some of implementations of the data link la yer support the message flows that take place between the only the SDLC secondary sta tion mode. However, SNA access and ga teway modules. the structure could allow the addition of other data link protocols (such as X. 25) in the future. Examp le of Message !flows This la yer corresponds exactly to the da ta link This example describes the( exchanges needed to layer in both the DNA and SNA architectures. establish a session and to transfer data, with the session being terminated by the IBM application. Physica! Layer Figure 9 illustrates the actions tha t ta ke place. In The physical la yer provides the means to control the following description, the "user" is a higher­ and use the physical connections that transmit level entity that utilizes the SNA gateway service. data between systems. This la yer ca n support a In the context of the SNA product architecture, wide variety of device types and interface stan­ such a user will in fact be pa rt of the SNA inter­ da rds, and can be implemented in va rious hard­ fa ce la yer. wa rejsoftware mixtures. This la yer corresponds To initiate the connection, the user program exactly to the physical la yer in both the DNA and issues a connect call. Included in the parameters SNA architectures. of this call are the name of the ga teway node, the secondary LU (SLU) address to be used, the name Digital's SNA Gateway Access of the primary LU (PLU) to be the session pa rt­ Architecture ner, and sundry other SNA parameters required The SNA ga teway access architecture prescribes · for the connection. 1 The SNA access module hen allocates internal the tra nsport mechanism that allows SNA inter· iI fa ce la yer modules in a DECnet host node to ga in resources and es tablishes a DECnet logical link access to common network la yer modules in an to the SNA gateway module in the specified gate­ SNA gateway node. This mechanism is at the way node. In turn, the SNA gateway allocates heart of the gateway access model for product resources for the session and wa its. design. Hence it is used implicitly by most of the The SNA access module then transmits a GAP current DECnetjSNA product set. connect message to the SNA gateway module. The SNA ga teway module allocates the requested Overview of the Architectural SLU address and transmits an SNA initiate-self Structure message to the SSCP, which informs the PLU via Two modules are needed to implement the SNA an SNA control-initiate message. ga teway service: the SNA access module and the Eventually, the PLU transmits a bind to the SNA ga tewa y module. The two modules commu­ ga teway. The bind is forwarded to the SNA access · nicate by means of a protocol that operates over module as a bind-data mes1>ageand ultimately to a DECnet logical link. The protocol is termed the the user program in resporyse to a read-bind call. I

Digital Tecbnical]ournal 43 No. 3 September 1986 The DECnetjSNA Gateway Product

The user program then agrees to the session by At some point, the PLU will terminate the ses­ issuing an accept message, which causes the SNA sion by sending an unbind message to the gate­ access module to send a bind-accept message to way. At that point the SNA gateway module dis­ the SNA gateway module. The gateway module connects the logical link with the SNA access then acknowledges the bind (that is, transmits a module, supplying an appropriate reason code in positive response), and the LUs are now consid­ the disconnect message. The user program will ered to be in session. read this reason code and issue a close-call to The user program can now exchange data cause the SNA access module to deallocate its messages with the IBM application. To effect resources. an exchange, the program uses the transmit-calls and receive-calls functions of the SNA access Relationship between Products module. Note that higher-level protocol initial­ and Architecture ization may well be needed before true end­ The SNA product architecture allows consider­ user data exchange can begin. Such details, able freedom with respect to the distribution of however, are not known to the SNA access and functions when a particular product is being SNA gateway modules. During this data trans­ designed. Various amounts of a product can be fer phase, the SNA gateway module operates located in the DECnetjSNA Gateway, whatever is as a simple message switch, passing data to appropriate for the design. Digital' s current and from the SNA access module without products conform to either the gateway access interpretation. model or the server model.

USER MODULE DECnet SYS TEM SNA ACCESS I MODULE

SNA GATEWAY ACCESS DEC net PROTOCOL(OPERATED NETWORKI OVER A DNA LOGICAL ( ) LINK)

SNA GATEWAY MODULE

DECnet SYSTEM CONTAINING SNA I GATEWAY PROTOCOL EMULATION MODULE SNA/SDLC DATA LINK CONTROL MODULE

SNA ( NETWORK )

Figure 8 Structure of Digital's SNA Gateway

44 Digital Tecbnicaljournal No. 3 September 1986 New Products

The DECnetjSNA VMS DISOSS document increased use of gateway resources and the intro­ exchange fa cility (DDXF) is an example of the duction of product-specific; support in the gate­ gateway access model. DDXF allows documents way. Whether those trade,offs are acceptable to be transferred between VAXJVMS systems and depends on the product. 1 IBM DISOSS. On the other hand, the DECnetjSNA VMS remote job entry (RJE) is an example of the DDXF as an Exa mp le of the Gateway server model. RJE is a traditional remote batch Access Model · workstation emulator that allows VA XjVMS users The IBM DISOSS system use� three important IBM to submit jobs to an IBM host for processing and architectures: to have the job output returned to the VAXJVMS system. • The Document Content Architecture (DCA) , Building a product to the gateway model which prescribes the format and content of allows the greatest use of common code mod­ documents ules. The product designer need concernhim self • The Document Intercnange Architecture only with the functional layer and, if no standard (DIA) , which defines the !format and protocols interface module is available, with the SNA inter­ used to transfer documents between office sys­ fa ce layer . Also, no product-specific software has tems to be included in the gateway node. Building a product to conform to the server • The logical unit type 6.2 (LU6.2), which model can re move some of the processing describes the SNA session protocols used to from the host node. The cost, however, will be implement the communications function

CONNECT � CONTROL f' f' POINT --'---=�---i � INITIATE-SELF L����� J�� coNTROL-,Nmm j BIND I

BIND-DATA

SEQUENCE I BIND-ACCEPT - OF EVENTS DATA

DATA

j _I DATA

DATA

UNBIND I

ABORT

PR,MARY USER LOGICAL PROGRAM GA'EWAY UNIT J GAP OVER A DNA LOGICAL L SNA SESSION LINK J PROTOCOLS l Figure 9 Gateway Access Message Fl ow

Digital Te cbnicalJournal 45 No . 3 September 1986 Th e DECnetjSNA Gateway Product

OUTPUT FROM INPUT JOB PRINT/PUNCH FILES STREAMS DECnet SYSTEM

DECnet �R JOB WORKSTATION I F1 LE � A � ' cE oPERA oR � ����;� ______-__ -__ __ ------__- - ______------

( DECnet NETWORK ) j RJE IC SERVER I DEC net I SYS TEM SNA o c NTAINING PROTOCOL GAT EWAY EMULATOR

SNA/SDLC DATA LINK CONTROL

Figure 10 Structure of Remote jo b Entry Fa cility

The functional layer for DDXF handles both the R]E as an Example of the Server Model DIA protocols and the conve rsion to and from The functional and SNA interface layers for RJ E DCA document formats. The fu nctional layer both appear in the RJ E server process, which exe­ uses the LU6 .2 component of the SNA interface cutes in the gateway. Figure 10 depicts the struc­ layer. Internally, the LU6.2 interface uses the ser­ ture of RJ E. The functional layer converts data to vices of the extended-mode interface, which in and from the fo rmats used by SNA remote job turn is supported by the basic-mode interface. entry. This layer uses the DECnet remote file The basic mode of the SNA interface layer must in access facility to read and write files located on turn communicate with the common network the DECnet host node. The SNA interface layer layer; the latter is located in the DECnetjSNA implements the LUI protocol for remote work­ Gateway node, and communication takes place stations. The RJ E SNA interface layer uses the using the SNA gateway access mechanism dis­ common interface layer software in the gateway cussed earlier. node, as does the SNA gateway module. The common network layer uses the services The end user communicates with the RJ E of the SDLC module in the data link layer. In server by using one of two command interfaces turn, this layer uses whichever physical layer (workstation operator or unprivileged user) resi­ module is appropriate to the communications dent on the DECnet host node. These interface hardware. programs use a private protocol, operated over a DECnet logical link, to pass commands to the server .

46 Digital Te cbnicaJJournaJ No . 3 September 1986 New Products

Digital's SNA Gateway Management ity. The manageable entities are the line, the cir­ Architecture cuit, the PU, the LU, the access name, and the The problem of network management is proba­ server. bly the greatest impediment to effective inter­ Not all entities possess all the above proper­ vendor networking. By network management we ties. For example, the access name is a fairly mean configuration, performance monitoring, static entity possessing only a name and some and fault diagnosis. associated parameters. However, the access In the DECnetjSNA Gateway product, we name has no state, no counters, and generates

chose to partition the management problem into no events. In contrast, a:I circuit has a name, three parts: management of the DECnet compo­ parameters, and a state anq also maintains coun- nents, management of the SNA protocol compo­ ters and generates events. ( · nents, and management of individual servers. The line and circuit entities correspond to the This division mirrored both the logical and identically named entitie� of DECnet manage­ physical structure of the gateway and made the ment. The line entity represents the physical management problem tractable. layer, the circuit entity th� data link layer. The We did not attempt to implement a manage­ PU and LU entities represent the SNA physical ment gateway. It is not possible for a network and logical units respectively. These have no manager on a Digital system to display or modify direct counterparts in DECnet management. operational parameters of the SNA network. Nor There are certain similarities between the PU is it possible for the manager of an SNA network and DNA' s ECL, and the LU and DNA's session to display or modify parameters of the DECnet (both of which are subsu�ed into the manage­ network or of the DECnetjSNA Gateway itself. ment entity called executor node). (It is possible, however, for either manager to An access name entity has no direct counter­ log in remotely to a system in the other net­ pan in the IBM environment. This entity is sim­ work and use its management utilities. A brief ply a shorthand form for specifying any parame­ description of this capability is described in the ters required to establish a session between the

section Remote Access.) DECnet and SNA networks!I A user process is able

to specify. an access nameI instead of providing Management of DECnet Components the individual connection parameters. A server The management of the DECnet components of entity is one that represen some son of service an SNA gateway node is performed according to provided to users. Two e tsamples of application the DNA network management model, using servers in the SNA gateway� node itself are the standard DECnet management utilities from a SNA gateway module and the RJ E server module. host DECnet node. (This paper does not dis­ In a sense, the view of the gateway that results cuss DECnet management. See reference 8, the from these entity definitions is a simplification DEC net DNA Phase IV Ne twork Management of the gateway architecture. This simplified Functional Sp ecification , for more details.) · view is desirable because it lessens the effort needed to understand and hence to manage the Gateway Management Entities gateway. This idea of having a simpl ified model For the SNA components of the gateway, we of the network for the purpose of management her;e define a model that is similar in a general is common to both SNA and DNA. sense to th e model described in the DECnet DNA Phase IV Ne twork Management Func­ Configuration Ma nagement tional Sp ecification. Our model treats the SNA Configuration and oper,a ting parameters for software as a number of named entities, each of the SNA components in the gateway are set which is the focal point fo r some management or modified by using a[ management utility operation. An entity typically ha s pa rametric running on the DECnet qostl node. This utility information that can be read and perhaps writ­ communicates with the SNA network manage­ ten and maintains current state information that ment listener in the gateway node. can be read. An entity also maintains a set of The general command syntax is derived from counters that can be read andjor zeroed and that of DECnet management but has been sends event messages to a central logging facile changed to accommodate the details of SNA

Digital Tecbnlcal]ournal 47 No. 3 Septe mber 1986 The DECnetjSNA Gateway Product operation. A few examples of the syntax are may be implemented makes it difficult to given as follows: include sufficient support for all servers . Thus if SET LINE DSV-0 DUPLEX FULL necessary, each server implements its own man­ agement utility. SET CIRCUIT SDLC-0 ADDRESS 40 STATION 1D OOOOODEC Remote Access SET ACCESS NAME CICS CIRCUIT SNA-0 Although neither the DECnet network manager APPLICATION CICS6 LU LIST 1-16 nor the SNA network manager can directly con­ trol the other, it is possible to use remote termi­ In these examples, LINE DSV-0, CIRCUIT nal access mechanisms to effect some degree of SDLC-0, and AC CESS NAME CICS identify the indirect control. The manager must log in to a entities to which each SET command is to apply. system in the "other" network, and hence The rest of each SET command string is a become a user of that network, in order to gain sequence of parameter names, each followed by access to the network management programs. the value to be set. Thus ADDRESS 40 indicates The DECnetjSNA VMS 3270 terminal emulator that the address parameter (the SDLC secondary (TE) allows the DECnet network manager to log station address) is to be set to the value hexa­ on to IBM applications, including the SNA net­ decimal 40. work management utilities, such as NCCF. He Performance Mo nitoring can thus control the SNA network to the extent SNA gateway management maintains counters, that he has the privilege to do so. associated with various entities, which record The DECnetjSNA distributed host command statistics. These include the amount of data trans­ facility (DHCF) allows the SNA network manager mitted, the number of CRC errors occurring on a to log on to a VMS system through the gateway. particular data link, any buffer availability prob­ Once logged in with sufficient privilege, he can lems, and so forth. By examining these counters issue NCP commands and hence control the periodically, the DECnet network manager can DECnet network. see how well the various components of the gate­ DECnet/SNA Gateway Components way are performing and whether or not any prob­ lems need to be investigated. The DECnetjSNA Gateway provides a protocol­ handling interface between SNA and DECnet net­ Fa ult Diagnosis works. The gateway, introduced in 1982, was the There is considerable overlap between perfor­ first product to provide remote functions over a mance monitoring and fault diagnosis. Fre­ network from a closed server system. The gate­ quently, poor performance is the first indication way consists of several software components; of a fault; thus counters can be viewed as fault­ Figure 11 provides an overview of its major diagnosis aids. Event-logging messages are also parts. The base software is the RSX-1 1S operating useful in diagnosing faults; for example, fr e­ system with a communications supervisor called quent circuit-down events could indicate hard­ the CommExec. These two components provide ware problems. Event messages are generated by the environment fo r both the DECnet-RSX soft­ various software components in the gateway and ware and the RSXjSNA protocol emulator. are sent to the DECnet host. Usually the events are displayed on the operator console of the RSX- 11S, Communications Executive, host; they may also be collected in a file for latei: and DECnet-RSX Software analysis. We chose the RSX-1 1S software for the fo llowing Gateway management also supports com­ reasons: mands that allow loopback testing to be per­ • The RSX-11S operating system was well docu­ formed. The system can isolate failing hardware mented and provided a good development components by using loopback at various levels. base by means of the RSX- 11M operating sys­ Management of Servers tem, a well-tested one. Server management is perhaps the least well • The RSX- 1 1 S system, being a memory-only sys­ defined area of the SNA gateway management tem, required no expensive peripherals that architecture. The range of different servers that would add to the cost of the gateway.

48 Digital Tecbnlcaljournal No. 3 September 1986 New Products

• Th e RSX-l lS system could be down-line munications executive can be considered a dedi­ loaded. This was proven and established cated communications subsystem. technology. RSXjSNA Protocol Emulator • RSX-l lS had host support for system images. The RSXjSNA protocol emulator (PE), an exist­ Thus the RSX-l lS system provided a sound start­ ing product, provided a starting point for the ing point. Furthermore, th e RSX communica­ IBM SNA network connection requi red by the tions executive and DECnet-RSX products also gateway. Moreover, the PE sed an early version provided a known base that worked well with f of the CommExec. We were able to reduce the the host facilities provided in Digital's operating engineering effort required to build the gateway systems. from a complete design of basic network func­ RSX Co mmunications Executive9 tions to an upgrade of the RSXjSNA PE th at Th e communications executive is a group of soft­ would work in the DECnet Phase IV environ­ ware modules that create an environment within ment. Updating the RSXjSNA PE was an easier which data communication software can execute task than doing a complete design because the in cooperation with an operating system. Tai­ existing RSXjSNA product was stable and its lim­ lored to the needs of th e communications soft­ itations were clearly unde�stood. ware, th is special environment shields data com­ For example, th e RSXjSNA PE required munications programs from involvement with support for the SNA message "unbind-bind the internal mechanisms of the host operating forthcoming," used with IBM TSO sessions. system. Just as the operating system supervises Th e RSXjSNA PE also needed to support the execution of user programs on the computer, the pacing and segmen�ation of messages. so the communications executive supervises the Pacing is th e IBM SNA method of flow l execution of data communications software. control; it allows the network to regulate Together with the software it manages, the com- buffer usage on an individual session basis.

PDP-11/24 SYSTEM

RSX-1 1S OPERATING SYSTEM

COMMUNICATIONS EXECUTIVE

-. SNA INITIAL LOAD PROGRAM (SNAIL)

IBM SNA SNA GATEWAY ACCESS SERVER (GAS) DECnet NETWORK PROTOCOL NETWORK EMULATOR DECnet-RSX

REMOTE JOB SERVER (RJSRV)

HOST COMMAND FACILITY SERVER (HCFSRV)

Fig ure 11 DECnetjSNA Components

Digital Tecbnical]ournal 49 No. 3 Septem ber 1986 Th e DECnetjSNA Gateway Product

Segmentation allows user code to send and units that the circuit can handle, an identifica­ receive buffers larger than the buffers used at the tion value used in dial-up configurations, and data link level. other items in the SNA network.

Philosophy behind the Gateway • The third SET command defines a shorthand name, SET ACCESS, fo r a number of SNA net­ In building the gateway, we wanted a system that work connection items. This command defines could be developed completely within Digital's a list of LUs, a circuit name, and an IBM appli­ engineering groups. We also wanted a system cation . The user can specify a single access that would allow the values chosen by our engi­ name rather than all the needed parameters. neers to be overwritten by customers with their own rietwork values. This capability required SNAIL is an RSX-privileged task that deciphers that data structures be allocated dynamically dur­ the RSXjSNA PE data structures and reads the ing the initial gateway startup. configuration file on the DECnet host node. SNAIL parses the commands from the configura­ Us er Level Components tion fi le, traces the RSXjSNA PE data structures, The user level components are as fo llows: and then places the configuration information in the correct location in the data structures. In the • The initial load program (SNAIL) , which man­ case of LU databases and access names, each ages SNA configuration with only a DECnet structure is allocated and linked into the existing interface database. • The gateway access server (GAS) , which Part of the SNAIL code detects errors during switches messages from one network to the the command parsing of the configuration file other records . Not having a console, the gateway needs a means of reporting errors, and DECnet • The remote job server (RJSRV) , which pro­ event messages supply that means. The number vides IBM remote job submission functions and contents of each line in error are merged • The host command facility server (HCFSRV) , into a line of text that is sent to the DECnet host which provides IBM terminals with a method node. of reaching DECnet-VAX hosts After loading the gateway, the DECnet network manager must check that the software and infor­ SNA IL , the In itial Load Program mation from the configuration fi le have been We chose to accommodate customers' network . loaded correctly. This checking is done by moni· values to the gateway during initialization by toring the DECnet event messages that appear on means of a plain text file (configuration file) . the host. These messages provide status informa­ Such files are common; to all of Digital's operat­ tion on successful steps and error info rmation fo r ing systems, and network routines are provided · failed steps. · that can read these files. The text in the configuration files for the SNA Gateway Access Server (GAS) network is divided into three areas based on the GASprovides support fo r the gateway access pro­ three SET commands used to configure the gate­ tocols (GAP). GAS is basically a message way. (A SET command in the SNA gateway is sim­ sw itcher that receives messages from the IBM ilar to the one used for DECnet NCP.) SNA network session and sends them along the correct DECnet logical links. GAS also takes mes­ • The first SET command is used to determine sages received from DECnet logical links and the type of signal control being used. sends them to the correct IBM SNA sessions. A This SET LINE command allows two choices: DECnet logical link and an IBM SNA session are full duplex, meaning the. modem leads are associated with one another at connection time. held "high"; and half duplex, meaning the Connections into the IBM applications are always modem leads are varied according to the send­ initiated from the DECnet side of the gateway. ing and receiving rules. GAS concerns itself only with the SNA bind • The second SET command defines the circuit message because it determines the buffe r size parameters. This command sets up the data that the gateway will receive from and transmit link station address, the number of SNA logical to the IBM SNA network. These sizes are allo-

50 Digital Te cbnicalJournal No. 3 September 1986 I I New Products I I cated from a common buffer pool, and each side Host Command Fa cillty Server of the connection has a maximum allocation (HCFSRV) limit. The buffers are allocated until the ! HCFSRV is a program lying11 midway . in complex- allocation exceeds the maximum allowed. ity between GAS and RJSR:V. HCFSRV performs This method provides the best allocation of some SNA protocols fo� the sessions that buffer memory, but it does not guarantee a have been established. It d,iffers, however, from fixed number of sessions since a single buffer the other servers in that the IBM application can be allocated that exceeds the allocation initiates the connection. Af1 ter the IBM applica- limit for a session. Therefore, the 32 sessions tion session has been eStablished, HCFSRV that the gateway documentation discusses receives the VMS host nameI and establishes a only occur if the allocation limits are not DECnet logical link with that node. HCFSRV exceeded by the sessions. The server must per­ S then continues to provide I NA protocol support form protocol work only at the start and end of after the session to the VMS host has been estab- the session. lished. This server can ha dle multiple sessions . dI from the IBM network, but the number of ses- Remote jo b Server (R]SRV) sions is limited by the arriount of buffer space RJ SRV is by far the most complicated program in available. r the gateway. RJ SRV supports multiple SNA Ii remote job entry workstations. Each workstation Th e Gateway Hardw4re contains a DECnet control link, multiple The DECnetjSNA Gateway software runs on two IBM sessions, and multiple network files. [ hardware configurations': a PDP-1 1 j24 with This handling of many different linkages has RX02 disks, DMRlls for the DECnet connec­ almost transformed the server from a simple tion, and DUPl ls for the sNA connection; and message switcher to a "micro-operating" 1( the Digital Ethernet Communications Server system, since it performs these activities in (DECSA) . DECSA is the ne ork equivalent of a real time. This micro-operating system provides rw communications controller, such as the DZ 11, a scheduler for events, common termina­ I • DMF32, or DUPl l. A serv(lr is a shared resource tion routines, and common buffer allocation for the hosts in an Ethernet andjor wide area methods. I networks connected to an Ethernet. The se er As with GAS, a DECnet host program initiates 1 � performs specific communications functions for the connection with RJ SRV, thus establishing , I these hosts. The hardware components are the workstation connection. The number of packaged in a freestandin , table-top unit with workstations and the sessions per workstation g self-contained power and cooling; it can operate are limited only by the available memory in in an office environment o in a computer room. the gateway. Because of this limitation, i-I At start-up, the unit performs a brief self-test. each workstation is treated as an RSX pro­ the appropriate se er software is down­ gram logical address space (PLAS) region. Ses­ Then nt line loaded from a Phase W DECnet host on the sions can then be allocated from the PLAS • I same Ethernet, and the unit begins operations as region. DECnetjSNA Gateway. The messages from the DECnet control link a i are parsed, and the actions taken vary depending on the current workstation state. At the same Summary I time, IBM SNA sessions may be active, receiving We ·have enumerated the! many diverse issues printer or punch records, or transmitting reader that need to be addressedI as part of a network records. The server provides all the SNA proto­ interconnection process. ThisI process is, to say cols for transmission control (TC) , data flow the least, a complex one. An effective network control (DFC) , and fu nction management head­ interconnection scheme Ican result only from ers (FMH) . In addition, RJ SRV provides support an effective architectural! and implementation for SNA character strings (SCS) and LU I com­ process. 1· pression. These facilities and the permutations Numerous aspects of cross-network intercon­ of different states make R) SRV rich in functional­ nect must be considered i'f the final result is to ity and fairly complex in terms of its internal meet the end-user's ne,eds. The following structure. aspects should be considered. I

Digital Te cbnicalJournal 51 No. 3 September 1986 The DECnetjSNA Gateway Product

• One must clearly understand the properties of Acknowledgments all architectures to be interconnected to deter­ The authors would like to acknowledge the con­ mine the most effective level of interconnec­ tributions of the following members of Digital's tion between them from a base services stand­ IBM Interconnect Engineering team, with­ point. out whose diligent effort much of our suc­ cess would not have been possible: Scott David­ • In implementing the interconnect, one must son, Dave Garrod, Bob Fleming, and Ladan take consistent approaches that take into Pooroshani of the Littleton team; Bob Ellis from account both the turnkey functions to be Colorado Springs; Richard Benwell, Carol Charl­ implemented as well as end-user requirements ton, and Chris Chapman of the Reading, U.K., concerningthose functions. (For example, is it team; and Craig Dudley of Systems and Commu­ , effective to split functions across multiple sys­ nications Sciences. Their leadership efforts have tems, and if so, what are the benefits?) truly resulted in leadership products. A modular approach that uses effectively both hardware and software "building blocks" is also References important for reliability, maintainability, and 1. V. Cerf and P. Kirstein, "Issues in Packet­ reusability considerations. Thus it is as important Network Interconnection,'' Proceedings to provide a modular implementation consisting of the IEEE , vol. 66, no. 11 (November of known, proven software segments as it is to 1978) : 1386-1408. use a framework that allows "mixing and match­ ing" pieces to facilitate the development of new 2. Sy stems Network Architecture: Te chni­ functions for various base systems. Once a struc­ cal Overview (Armonk: IBM Corporation, ture has been defined, the turnkey functions Order No. GC30-3073, March 1982). themselves must be of a bidirectional nature, 3. Systems Network Architecture: Tra nsac­ allowing users in one environment equal access tion Programmer's Reference Manual to the resources 'of the other (provided, of fo r Logical Un it Type 6. 2 (Armonk: IBM course, they are authorized to do so) . Corporation, Order No. GC30-3084, May Coincident with all this functionality is the 1983). need to manage it effectively, either from a cen­ tralized point in the network or at the distributed 4. Systems Network Architecture: Sessions points closest to the actual work being done. An between Logical Un its (Armonk: IBM Cor­ interconnect structure must be chosen that poration, Order No. GC20-1868, April allows the continuing use of existing mecha­ 1981). nisms with convenient "hooks" to access other 5. Digital Network Architecture Phase IV environments, if needs so dictate. Finally, the General Description (Maynard: Digital approach chosen must be flexible enough to Equipment Corporation', Order No. AA­ allow existing structures to migrate conveniently N149A-TC, May 1982). to more cost-effective technologies as they become available, all without disrupting the user 6. J. Morency, "The SNA Gateway-The Foun­ interface. The preservation of existing user dation for the Information Bridge," Pro­ investment must always be a key concern. ceedings of INTERFA CE '83 (March All these goals were met in the existing 1983): 146-154. DECnetjSNA Gateway product set. Our approach 7. ]. Morency and R. Flakes, "Gateways: A is the.result of a carefully considered structure, Vital Link to SNA Network Environments,'' not of an ad-hoc collection of functionality. That Data Communications Oanuary 1984): structure facilitates the rapid development of 159-166. new functionality today and preserves existing application investments for the increasingly dis­ 8. DECnet Digital Network Architecture tributed processing world of tomorrow. We Phase IV Network Management Func­ these products to be key components of tional Sp ecification (Maynard: Digital the network that eventually becomes the Equipment Corporation, Order No. AA­ system. · X437A-TK, December 1983).

52 Digital TecbnlcalJournal No. 3 September 1986 New Products

9. J. Forecast, J. Jackson and J. Schriesheim, ''Communications Executive Implements Computer Networks," Computer Design (November 1980): 71-75. Other References R. Bradley, "Interconnection Draws DEC, IBM Networks Closer," Data Communications (May 1985): 241-248. D. Korf, "New Ways of Communicating with IBM: A User View," Proceedings of INTERFA CE '85 (March 1985) 241-248. J. Martin, Design and Strategy fo r Distributed Processing , (Englewood Cliffs : Prentice Hall, Inc., 1981). Sy stems Ne twork Architecture Format and Pro­ toea/ References Manual: Architectural Logic (Armonk: IBM Corporation, Order No. SC30- 3112-2, November 1980).

Digital Technicaljo urnal 53 No. 3 September 1986 WiUiam R. Hawe Mark F. Kempf AlanJ. Kirby I

The Extended Local Area Ne twork Architecture and LANBridge 100

A study was conducted to identify thewide variety of application needs and environmentsfo r broadband local area networks. This study con­ cluded that no single local area network in isolationwas capable of com­ pletely solving the broad range of networking problems of interest. The proj ect team investigated alternative ways to providesolutions to these problems, including various local network technologies and interconnec­ tion schemes. From this investigation the team developed the Extended LANAr chitecture, capable of incorporating a variety of LANtechnologies. Using this architecture, the team designed a high-performance implemen­ tation of an Ethernet-to-Ethernet bridge, which led directly to the !AN­ Bridge 100, a product satisfying the originalgoals.

In early 1982, Digital's Networks and Communi· needed. There was a clear need to have LAN cations Group in conjunction with Corporate products that could intemperate, at least at the Research initiated an advanced development network level . effort called the Broadband Project. The project's Third, the project team agreed that some IAN original goal was to recommend which broad­ applications would require significantly more band products should be implemented during physical extent than could be offered with either the next two years and which technologies the baseband or broadband Ethernet products. should be contained in those products. There Therefore, some means of offering a greater were several motivations behind this goal. extent was required. As will be shown later, the First, there was significant uncertainty with­ results of the Broadband Project were very differ­ in computer companies, including Digital, ent from the ones that had been anticipated about the most appropriate physical medium for when it was initiated. local area network (IAN) products. At that time, Digital had - and still has - a strong commit­ Defining the Problem ment to the Ethernet concept using baseband The project team first proceeded to investigate coaxial cable. It was clear that while most appli­ the user environments in which these networks cations were served very well by an Ethernet would be utilized. There were three types of using baseband coaxial cable, some applications environments ofconcernto the project: the busi­ were better served by other media, such as CATV , ness office, the university campus, and the fac­ fiber-optic, or twisted-pair cables. The increas­ tory. Clearly, assumptions about these environ­ ing number of installations using private broad­ ments were not mutually exclusive, but the band technology, with its moderate bandwidth, names evoke the problems to be solved in each led the team to focus on this technology. one. The next step was to gather more input on Second, the DECOM broadband Ethernet customers' requirements, applications, and media access unit and related products were physical environments. under development within Digital at that time. Some information had already been collected Therefore, an effective mechanism for intercon­ by team members on previous visits to customer necting broadband and baseband products was sites, including a heavy-manufacturing facility

54 Digital Tecbnk4ljournal No. 3 September 1986 l

New Products

and a university campus. This information than the previous model. The traffic for this belped the team to construct a refined set of model is also bursty. FurtI hermore, the project questions to be asked on visits to other cus­ team thought that, as workstations and personal tomers. Subsequently, the team visited two more computers became commbn, this class of traffic universities and several commerCial sites where would soon become muth more widespread continuous process monitoring and control, and than terminal-to-compute traffic. research were performed. The team also exam­ The real-time environment� is characterized by ined one of Digital's sites that represented an a large . number of devict!I s (thousands) whose extensive office environment. requirements to communicate are quite hierar- The team discovered several generalized types chically structured. The applied load for a real­ of message traffic that were characteristic of the time environment is more accurately modeled by applications studied. These types were terminal­ deterministic arrivals. Mdreover, most applica­ to-computer, computer-to-computer, and real­ tions in this environment �xpect the variance of time traffic.1 Unfortunately, most customers the access latency to be lrlw in the IAN . were unable to deliver actual network loads and The team next defined hominal environments traffic matrices for their environments. There­ for an office, a campus, ana a factory. These defi­ fore, the team had to derive models for those nitions are summarized in Table 1. generalized types of traffic, using previous mea­ In these definitions, harshl and benign environ­ surements of internal workloads and some edu­ ments refer to the envirorlmental characteristics cated assumptions. These models were subse­ in which the IAN needs to operate. For example, quently used to evaluate several architectures in a harsh environment oneI might expect a wide offered by the team as candidates to meet the range of operating temperluures or the presence project's goals. of strong electromagnetic fields. The environmental model for each traffi c type Added to the definitionsl were. a number of I shows particular characteristics. The terminal­ facts that customers stressed or that were of to-computer model has a large number of termi­ general use to the projec . These facts were as nals, all needing access to a small or moderate fo llows: � number of host computers. Although the aggre­ . . . • Many customers had a variety of standard and gate throughput is small, the traffic is bursty. In nonstandard higher-le I protocols .running addition , the cost to connect each terminal �d on their LANs. Clearl , any solution had to device to the network must be small (i.e., not take those existing protocols into account. large compared with the cost of an inexpensive � terminal). • Despite using nonsta dard protocols, cus- J1

Computer-to-computer traffi c needs full logi­ tomers generally impleI mented their LANs cal connectivity and has higher throughput (up with subsystems comp�iant with a standard, to several megabits per second per computer) such as one of the IEEE 802 'tandards. I Table 1 Definitions of Environments I

Number of Physical Frequency of Environment Extent Stations Environments I Station Movement Office Less than Less than Benign I Occasional 3 kilometers 130

Campus Less than Less than · Benign within Possibly frequent 25 kilometers 10,000 a building Harsh between buildings

Factory Less than Less than Harsh Rare 8 kilometers 2200

Digital TecbnlcalJournal 55 No. 3 September 1986 Th e Extended Local Area Ne twork Architecture and LANBridge 100

• 2 In addition to the valid technological and 10 ...Jw environmental reasons fo r choosing a particu­ <( (.) lar LAN technology, some customers had (/) (!) based their choices upon faulty assumptions. 0 ::::!. This was particularly noted in discussions on (/) Cl 10 the delay variance of token-based systems in z 0 e � (.) 16 various normal recovery modes. w (/) e�a :J • The importance of performance-monitoring · ...J � and serviceability features were emphasized � >- almost universally by customers. <( ...J w At this point it was clear that the original project Cl z <( goal of investigating only broadband technology w was too narrow. Using broadband technology � 1 a·' 0 25 50 75 100 alone could not satisfythe broad requirements of the environments identified by the team . There­ PERCENT OF OFFERED LOAD fore, the team expanded its scope to encompass the larger problem of providing a wide variety of Figure 1 Local Area Network Performance services (terminal-to-computer, computer-to­ computer, and real-time) in the three environ­ ments (office, campus, and fa ctory) . Carrier Sense Multiple Access (CSMA) It was also clear that there were two funda­ By eliminating the collision detection capability mental approaches to provi ding those services. of CSMA/CD, one can build a LAN whose trans­ First, the team could attempt to develop a LAN mission rate scales well with distance . However, architecture, or enhance an existing one, that the obvious benefits of collision detection will · could cope with the wide range of nodes, dis­ be lost. Without collision detection, the delay tances , media, performance, and cost con­ variance experienced by applications tends to straints . Second, the team could attempt to increase because CSMA relies on more-frequent develop a mechanism for interconnecting the higher-layer protocol time-outs. To compensate various LAN technologies. for this problem, the transmission rate could be increased sufficiently to reduce the probability LAN Te cbnology Alternatives of collision. However, this action would impose · The team decided first to evaluate a variety of significant cost penalties on the end stations, suitable media access methods. Each alternative which would need faster logic and would experi­ and the conclusions reached by the team are ence more difficult transmission problems. summarized below. Carrier Sense Multiple Access with Pa rtial Carrier Sense Multiple Access with Collision Collision Detection (CSMA ±CD) Detection (CSMA jCDj 2 Either the physical extent or the transmission This was the alternative most fam iliar to the team rate of a CSMA/CD network could be extended so members, since Digital was currently building that collisions would be detected only if the col­ products utilizing CSMAfCD for both baseband liding stations were sufficiently close or if the and broadband media. The performance of packets were sufficiently large . Such a scheme CSMA/CD does not degrade rapidly as a fu nctio would have good throughput-delay characteris­ of the number of connected nodes (see Figure tics if the physical extent were small; however, 1). However, its extent (maximum signal propa­ degraded performance would result if the physi­ gation path length), transmission rate, and mini­ cal extent were large. Unfortunately, this scheme mum packet size are not independent because of yields a significant delay variance because the fi nite propagation delays .3·4 Therefore , to relative locations of the colliding stations and

increase the physical extent of CSMA/CD LAN, the size of the colliding packets now affect the the minimum packet size or the transmission rate layer, either the media access control (MAC) or or both must be decreased to ensure that there transport, at which collision recovery is per­ are no undetected collisions. formed.

56 Digital Te cbnicaiJournal No. 3 September 1986 New Products

To ken Passing Bus5 Frequency Division Multiplexed (FDM) Bus The characteristics of the token-bus access The characteristics of an FDM bus are somewhat method scale reasonably well with physical similar to those of the TDM bus. The FDM bus has extent (see Figure 1), but poorly with the num­ an additional degree of freedom in that it could ber of nodes.3 This situation is complementary to have slots of different bandwidths. The problem that of CSMAjCD. The sensitivity of a token bus with the FDM bus, however, is logical connectiv­ to the number of nodes makes it unsuitable for a ity. To have fu ll connectivity, each node must single LANwith many nodes. The token bus, like monitor each frequency band for messages des­ CSMAjCD, is well suited to implementation on a tined for that node. In practice, this monitoring CA1V-like cabie plant. is prohibitively expensive.1As an alternative, one could apply a reservation' system to either the To ken Ring6 TDM or FDM buses. The dharacteristics of such The performance characteristics of the IEEE an approach, however, arejmuch better suited to

802.5 token ring are somewhat similar to those a connection-oriented ser\riI ce, such as voice or of the token bus. However, an IEEE 802.5 token video, rather than one . with bursts of data. I ring station will not reissue a token until the pre­ I. viously transmitted frame has circulated com­ Hy brid of FDM and CSM1fCD pletely around the ring. This characteristic A hybrid scheme utilizing multiple slow-speed makes the ring more sensitive than a token bus to (approximately 1 million bits per second, or increasing physical extent. Moreover, a token Mbps) CSMAjCD channels, each in its own ring cannot be applied directly to a branching­ 6-MHz band, is another specific alternative that tree physical topology, such as the one in a was considered. Without increasing the mini­ CA1V-like cable plant. mum packet size used in an Ethernet, each CSMA/CD channel can span an extent of approx­ Slotted Ring imately 30 kilometers. Multiple CSMAjCD chan­ The design tradeoffs made in most slotted rings nels could be used to increase the aggregate result in small slots, usually less than 20 bytes. capacity of the network. Unfortunately, logical Therefore, it is important to minimize the slot connectivity cannot be achieved without some overhead, such as source and destination mechanism for switching packets between these addresses and error detection fi elds. Such opera­ channels. Furthermore, th� bandwidth available tions are usually associated with connection­ to any station is limited to a rate of 1 Mbps. Since oriented services, such as voice transmission. In there is no industry standard for a 1-Mbps, 6-MHz slotted ring networks, mechanisms are often CSMAjCD LAN, selecting;this approach would present to impose a measure of "fairness" in the make necessary an attempt to standardize it. network. Those mechanisms make it difficult for These evaluations con}rinced the team that an individual station to acquire a significant frac­ none of these access methods sufficed for build­ tion of the instantaneous transmission rate. Such ing a single LAN capable bf successfully operat­ networks are often inadequate for handling the ing in all dimensions of interest to the project. bursty traffic expected in the environments of Not one of these alternatives was capable of interest. directly employing all the types of media that the customers wished to utilize. Furthermore, any Time Division Multiplexed (TDM) Bus choice was constrained , by the desire for an The principal disadvantages of using a TDM access method with a defined standard having structure are that the number of time slots is the appropriate parameters. The project team fixed, and each time slot is assigned to only one would have to find a way to interconnect at least station. Thus, with a large number of stations, a subset of the standard LANs if the project were even with low network utilization, the mean to be successful. waiting time is large. Furthermore, since the bus is allocated in tum to each station, the maximum LANIn terconnection Alternatives throughput of any station is limited to the data The team next investigated a variety of intercon­ transmitted in that station's slot. The TDM bus is nection methods, each of which had certain well suited to isochronous traffic, such as voice advantages and drawbacks. or video.

Digital TecbnlcalJournal 57 No . 3 September 1986 Th e Extended Local Area Ne twork Architecture and LANBridge 100

DECnet Router LAN to which node Q is connected. Bridges make The architecture fo r DECnet Phase IV+ could be use of data link layer addresses to make forward­ used to create a DECnet network for the inter­ ing decisions. A bridge receives all frames from a connection. Such a network would be fu lly capa­ particular LAN and then decides, based upon the ble of handling the number of nodes needed in destination address in the MAC header, whether any of the three environments of interest. In fa ct to forward each frame. this was quite an attractive alternative. One A collection of LANs interconnected by bridges could choose the data links in such a network to is referred to as an extended IAN. In general, optimize the cost and performance. For exam­ bridges may be used to connect LANs of different ple, Ethemets placed in local areas could offer types, as shown in Figure 2. Therefore, this alter­ good response at low and moderate network native can successfully utill.ze diverse LAN tech­ loads. Then a token bus, with its capability to nologies, if appropriate, to optimize some func­ handle high utilizations and large extents, could tion (e.g., low cost, high performance, or a be used as a backbone to interconnect the combination of these) . Furthermore, a bridge routers. Those would in tum connect to the Eth­ appears to be merely another station on each LAN emets. The sensitivity of the token bus to large to which the bridge is connected. Therefore, numbers of nodes would be minimized since the multiple IANs, each fu lly configured, can be only nodes on the token bus would be the connected to eliminate their practical con­ routers. Unfortunately, not all customer nodes straints on distance, number of nodes, media, use the DECnet routing protocol, making this and utilization. alternative useful for only a subset of the nodes Based on the reasoning above, the team in a network. selected the bridge alternative as the one best suited to realize the expanded goals of the pro­ Central Switch ject. Thus the team began to develop an architec­ To complete the logical connectivity of a net­ tural specification for extended LANs and work composed of multiple LANs, the team con­ bridges. The team also began to develop a work­ sidered an architecture organized around a cen­ ing breadboard model that eventually led to the tral switch element. Conceptually, the switch IANBridge 100 product. The architecture that could be connected to all LANs in the network was evolved fo r extended IANs and bridges is and then selectively forward packets to the LAN described in the next section. with the destination end station. This alternative has most of the advantages of the DECnet router solution discussed above. Normally, however, all end stations need the same routing protocol. To avoid this problem the switch must either sup­ port a variety of routing protocols (and translate among them) or somehow perform its switching task in a way transparent to the end stations. A single switch of sufficient capacity and reliabil­ ity to do either task was likely to be fairly com­ plex to design and manufacture_ It would also need to scale in a cost-effective manner for a wide range of networks.

Bridge 7 •8•9 lOCAL AREA NETWORKS A bridge, or MAC layerrelay, is a device connect­ � ing two or more LANs so that a node on one LAN may communicate with a node on another' just as Figure 2 Bridged Network Configuration if they were on the same IAN. In operation, a bridge is a store-and-forward switch that isolates traffi c to only those IANs on which the traffic must appear. For example, in Figure 2, traffic between nodes X and Y would not appear on the

58 Digital Tecbnical]ournal No . 3 September 1986 New Products

Advantages of Bridges The ExtendedLAN Ar chitecture Bridges used to connect IANs have several useful General Goals properties: An ideal extended IAN should possess a nuinber • Traffic Filtering - Bridges isolate each IAN of characteristics that translate into design goals from traffic that does not have to traverse that for the architecture. These design goals are as IAN. For example, in Figure 2, traffic between fo llows: nodes A and B is not sent on the IANs to which P and Q are connected. Because of this fi lter­ • Minimize Traffic - The primary traffic on the ing, the load on a given IAN can be reduced, individual LANs should be generated by the thus decreasing the delays experienced by all user stations. Traffi c due to complex routing users on the extended IAN. algorithms should be eliminated or at least minimized. • Increased Physical Extent - IANs are limited in physical extent (at least in a practical • No Duplicates - The· bridges should not sense) by either propagation delay or signal cause duplicate frames to be delivered to the attenuation and distortion. Being a store­ destinations during nor�al operation.

and-forward device, a bridge forwards frames • Sequentiality - The col:nbinationI of LANs and after having gained access to the appropriate bridges should not permute the frame order- IAN via the normal access method. In this ing as transmitted by thb source station. · way the extended LAN can cover a larger I extent than an individual IAN. The penalty for • High Performance --+ The extended LAN this coverage is a small store-and-forward should preserve the cl!taracteristics of high delay. throughput and low del�y that users expect in LAN environments. In !practice, this means • Increased Maximum Number of Stations - that the bridges shoulp be able to process Because of either physical layer limitations or frames at the maximum rate they can be stability and delay considerations, most IAN received. Since LANs operate in the multi­ architectures have a practical limit on the megabit-per-second range, fulfilling this goal number of stations on a single IAN. Since the requires a fa st switching operation. bridge contends for access to the IAN as a sin­ gle station, one bridge may "represent" many • Frame Lifetime Limit - Frames should not be nodes on another IAN or an extended IAN. allowed to exist in the extended IAN for an unbounded time. Some higher-layer protocols • Use of Different Physical Layers - Some IAN may operate poorly if frames are unduly architectures support a variety of physical delayed. This fact is especially true for proto­ media (baseband and broadband coaxial cols used for interactive applications. These cables, and cable) that cannot be protocols depend on the low delay character­ directly connected at the physical layer. istics of a LAN. An example is the Local Area Bridges allow these media to coexist in the Transport (LAT) protocol.10 same extended IAN. • Low Error Rate - LANS typically have a low • Interconnection of Dissimilar IANs - IANs of effective bit error rate. �igher-layer protocols different architectures are typically intercon­ are often designed to take advantage of this nected with routers or gateways. Often these I low rate, which allows �hem to operate mol"e devices are complex with only moderate efficiently since they c�n assume that errors throughput, not an appropriate situation for a I are infrequent. Extendetl IANs should not sig- LAN environment. It is possible to build a nificantly increase this �rror rate . bridge connecting dissimilar LANs (within I constraints discussed in the section ."Perfor­ • Low Congestion Loss -;Individual LANs min- mance Considerations"). For example, such a imize congestion by employing access control bridge would allow stations on an IEEE 802.3 schemes that prevent excessive traffic from (CSMAjCD) IAN to send frames to stations on entering the LAN. Extended LANs are more IEEE 802.4 (token bus) or IEEE 802.5 (token vulnerable to congestion loss since the ring) LANs. 2•5•6 bridges may be forced to drop frames when

Dtgital Te cbnical]ournal 59 No. 3 September 1986 The Extended Local Area Network Architecture and LANBridge 100

the ones queued to be transmitted match the should be possible to simply plug the IANs available buffers. This phenomenon should be into a bridge , then apply ac power to the minimized by designing the bridges properly bridge for the network to operate. No com­ so that they are not bottlenecks. It will also be mands from a network manager should be minimized by configuring . (placement and siz­ required to achieve normal operation. ing) the extended IAN properly. • Growing from a single-segment LAN to an • Generalized Topology - To increase the extended IAN should be accomplished with­ availability of the extended IAN, it would be out prior planning. The architecture must useful to allow arbitrary of provide simple, efficient mechanisms (net­ IANs by means of bridges. This interconnec­ work management, etc.) to manage growth tion allows duplicate bridges and IANs to be easily. This goal means that the owner of a IAN configured in parallel, thus increasing does not have to anticipate that he will, at availability. some time in the future, install bridges and more IANs to build an extended IAN. Thus his Sp ecific Goals IAN can become an extended IAN without his Although the ideal goals described above served having to plan the ultimate configuration of as a fr amework for development, other specific the extended LAN when the first LAN is goals were formulated for the architecture installed. This fact is important since experi­ itse1f. I,7,B,9 ence has shown that networks never grow according to the plans made at the outset. • No modifications should be needed to stations that adhere to the existing IEEE 802 stan­ • Predictable, stable performance, including dards. 2·5·6 Therefore, the extended IAN will be predictable route selection for a given topol­ transparent to the end stations. This goal sim­ ogy, should be provided under normal and plifies the end-station hardware and software fa ilure modes. In addition, to make diagnosis, designs. maintenance, and management of the network easier, the routing algorithm should be deter­ • The interconnection of all IEEE 802 MAC pro­ ministic. For example , it should compute a tocols must be accommodated. given topology based on the current state of • Automatic recovery fr om state changes in the the network . If that state changes , the

extended LAN , including LAN , bridge, and algorithm will compute a new topology. If the end-station fa ilures, must be accomplished. new state now changes back to the original state, the algorithm should produce the origi­ • Connectionless and connection-oriented nal topology, not a completely new one. With­ IEEE 802.2 logical link control (LLC) proto­ out this feature it is very difficult to reproduce cols should be supported efficiently. The fa ilure scenarios when diagnosing the net­ extended IAN should also be independent of work or to plan the network predictably. all higher-layer protocols. Such independence is needed to suppon the diverse set of proto- • No overhead should be required in the end · cols that wi ll exist. ll, I2, I3 stations to communicate with stations on the same or different IANs. • A bridge should not require explicit notifica­ tion of station location by the end stations or a • No overhead should be imposed on stations as management entity. The bridge should learn a penalty for communicating with many pan­ automatically of the stations' locations with­ ners (such as file servers or gateways to other out communicating with other bridges. (The extended IANs) . bridges do communicate with each other to • End-station and bridge MAC addresses can be maintain the logical tree topology. This com­ assigned with any policy (global, local, flat, munication is independent of the activities of hierarchical, etc.) desired by the users. The end stations.) architecture should require only that each end • Management intervention should not be station and bridge have a unique address

required to make the network operational; the within the extended LAN . If addresses are bridges should autoconfigure. For example, it assigned globally, then the extended LAN

60 Digital Tecbnical]ournal No. 3 September 1986 New Products

should have the added advantage of requiring be forwarded on each MAC entity except the one no management intervention when previously that received the frame. disjoint extended IANs are merged. Frames with group addresses (i.e., multicast addresses) are always fo rwarded on each MAC • A general mesh topology including backup entity since the bridge has; no way of knowing bridges and IANs should be supponed trans­ which end stations should receive the frames parent to the end stations for increased reli­ that are addressed to gr ups. This concept, ability and availability. d called protocol regionalization, can effectively • End-to-end data integrity should be provided limit the propagation of these messages through across the extended IAN for all normal and the extended IAN, thus allowing cenain appli­ multicast/broadcast frames when the MACs cation protocols to be confined to various are the same type. This capability holds across regions.14 This confinement is done for reasons any connected subset of the topology that is of performance, management convenience, and the same MAC type. Thus the frames are not privacy. · I modified and the MAC frame check sequence The table is simply a cac�e of station address­ (FCS) has end-to-end significance for the to-MAC entity associations! for stations that are extended IAN. communicating. As with any caching scheme, the problem of stale data exists. Therefore, the Architectural Overview table entries are aged out on a time scale that is We used the goals above to develop a unique long enough to minimize. overhead, yet short architecture for the extended IAN concept. This enough to capture station rp.ovements. architecture is described in this section. Follow­ The algorithm learns the location of end sta­ ing that description are sections on bridge per­ tions dynamicallyand assumesI that few of them formance and resources. These are two impor­ simply receive traffic withoutI ever sending tant, but often neglected, topics that should be replies. If the station location is not known, then considered when specifyingan architecture. Per­ fr ames directed to it are forwarded on all MAC fo rmance is especially critical to the proper entities. Our experience shows that in a typical functioning of products that are eventually built operation only one frame from a station is to use the architecture. For the fi rst time, perfor­ required for most, if not all, bridges in the mance analyses were included as an integral pan extended IAN tolearn the s�tion's location. Typ­ of the architectural specification. ical higher-layer protocols more than satisfy this The algorithm used in the bridge is very sim­ requirement. ) ple. The algorithm maintains an association The initialization phase qf a bridge is specified ' between the end-station MAC address and the in a panicular fa shion. The bridge is powered on MAC entity on the bridge through which that sta­ and then passively observes·the traffic on its MAC tion has been observed. The associations are entities for a number of seconds. During this stored in a table, also called a forwarding data­ time the bridge accumulates associations in its base, in the bridge. The bridge maintains that forwarding database, after which it comes on line table by observing traffic on the IANs to which and begins fo rwarding op�rations. This initial the bridge is attached, operating in "promiscu­ passive learning period prdvents the bridge from ous" mode. In this mode the bridge monitors all unduly flooding the exten�ed IAN with frames frames that appear on the IAN. For each frame destined to stations it hasn't yet heard from. As received, the bridge notes the source MAC with all parameters in the algorithm, the dura­ address and the MAC entity on which the frame tion of this learning period during power-up is was seen. The bridge also searches in the table not critical. It should simply be long enough to for the destination MAC address. If that address is witness frames from a large percentage of the found, the frame will be forwarded on the MAC active stations. entity indicated in the table. Of course, if the As specified so far, the algorithm will not mod­ indicated MAC entity is the same one that ify frames as they are passed through a bridge received it, the frame will be dropped since the between IANs of the samej type. This restriction destination is known to be on that "side" of the provides the additional benefit of end-to-end FCS bridge. If no association is fo und, the frame will coverage fo r normal and broadcast frames within

Digital TecbntcaiJournal 61 No. 3 September 1986 The Extended Local Area Network Architecture and LANBridge 100

the extended LAN. When forwarding between • The computed topology converges in a maxi­ dissimilar MACs, the LLC protocol data units mum of twice the round-trip delay across the (PDU) are extracted from MAC data frames and extended LAN. forwarded on the next MAC . An FCS for that MAC • The computed topology is deterministic, is computed normally by the bridge. Bridges of meaning that it can be calculated deterministi­ this type should guard against errors in memory cally by the network designer. or on datapaths. Also note that the IEEE 802.5 token ring may require byte reordering, which, • Bridges implementing this algorithm can however, can be dealt with at the controller coexist with simpler bridges not implement­ interface. ing it. Loops will still be broken, provided Because the bridge does not modify frames, that no loop exists composed solely of bridges there is an inherent mechanism for loop detec­ that do not implement the algorithm. tion encoded into them. Conventional net­ • Duplicate packets are not generated when work layer routing algorithms detect loops redundant bridges are used for backup. with hop counts or other frame lifetime (age) controls, losing transparency in the pro­ • An effectively unlimited number of bridges is cess.ll· 12• 1 3 The Extended LAN Architecture supported. The only practical limit is the per­ restricts the logical topology to a tree that pre­ formance characteristics one wishes to have vents loops from occurring, while preserving for the extended LAN. We size the extent of transparency. the network based on the delay and through­ It is desirable to maintain proper operation put characteristics it achieves, not on arbitrary of the extended LAN if it is misconfigured. restrictions. An example based on models Therefore, we designed an algorithm to auto­ developed of the IAT protocol is given later in matically and transparently transform a general this paper. mesh topology into a spanning tree, thus • No a priori knowledge of the topology is preventing packet looping.15 This algorithm required. also allows redundant bridges and links to be • used as backups, thus increasing the availability Optional user-defined "primary" routes or of the network. Availability is very important "backup" bridges are permitted; otherwise, since the extended LAN is quite of the basis fo r the routing is automatic. much, if not all, of the communications. This Performance Considerations algorithm was implemented in the LANBridge 100. The performance of an extended LAN is deter­ The spanning tree algorithm imparts the fol­ mined by a number of design parameters, includ­ ing lowing characteristics to an extended LAN: the expected capacity of the backbone and subnets, the overall system capacity, the applied • A spanning, acyclic subset of a general mesh load, apd the frame loss rates. The designer must topology is maintained. be concerned not only with providing adequate performance for current usage but also with • A very small, bounded amount of memory allowing future growth . . per bridge is required, independent of the Ideally, a system is designed to be sufficiently total number of LANs or the total number of robust to accommodate changes in its user popu­ bridges. lation as well as its characteristics. For example, • A very small, bounded amount of communica­ studies have been conducted to measure user tions bandwidth is required on each LAN , workloads in a program development environ­ independent of the total number of LANs or ment.16 The study results could be used directly the total number of bridges. to estimate the applied load due to some number of those users. To do that, however, requires that • Lost messages are tolerated and the broadcast the designer also model all the protocol layers nature of multiaccess LANs is utilized effi­ involved in transferring this information across ciently. the extended LAN. • Participation by the end stations is not It is important to size the capacity of an required. extended LAN. Given the characteristics of user

62 Digital Tecbnical]ournal No. 3 September 1986 New Products

demands, capacity is expressed in terms of the the resources of the subnets will likely be under­ number of users supported. The difficulty with utilized. The additional capacity of the subnet using this number as the independent variable is can then be used only for local traffi c. This calcu­ that the designer must account for the resource lation defines the limits for the system with consumption from all layers of the protocol. That respect to the ratio of local ;traffic to total traffi c · is generally hard to do . that is possible. Another difficulty is that the performance Using the above principles we developed requirements may vary for different higher-level a capacity-sizing methodplogy for extended protocols. Some may be delay sensitive , For LANs . The diameter of ahI extended LAN is example, terminal access protocols that return sized in the following fa shion. The average echoes end to end are quite sensitive to delay. one-way delay across the longest path cannot Other terminal access protocols that allow local exceed 10 milliseconds, chosen as the delay editing and echoing are not so delay sensitive. budget based on analyses of higher-layer File transfer protocols are not sensitive to the protocols that ate delay sensitive . One such pro­ · delay but require high throughput. Therefore, tocol is the LAT protocol used for terminal to determine the capacity ofan extended LAN, access. 10 the .designer must investigate both delay and A detailed simulation of that protocol was used throughput as applied to the requirements of to study different configurations and values of particular protocols and applications that use the an average delay budget. The 1 0-millisecond LAN. delay budget allowed for �ariance in the delay Certain LANs constrain the configuration, and kept the protocol operation in the normal owing to either physical layer limitations (such states (without timeouts, e�c.) . In addition, the as the distance over which the line drivers can operating point must be sei: so that none of the operate) or the interaction between the access links in the extended LANI run at greater than method of the data link layer and the propagation 90 percent utilization . (Note that this utilization delay. For example, Ethernet places a limit on may occur at an offered lo:id of much less than the maximum number of repeaters between 90 percent.) On an Ethernet this limit occurs for any two communicating stations. l7·18 This con­ offered loads of anywhere from 4 5 percent to straint assures that the propagation delay bud­ 90 percent utilization. The difference between get, which is assumed by the access method pro­ the utilization and the offered load is the over­ tocol, will not be exceeded in any configuration. head on the link. On an Ethernet this difference In an extended LAN, the designer may also wish includes delays caused by collisions; on token to constrain the configuration based on the rings it includes delays for token passing and the performance expectations of higher-layer pro­ like. tocols. For example, he may require that there This methodology assures, that the component be no more than a certain number of bridges links in an extended LAN: are all running in between two stations that use a delay-sensitive stable operating regions, andI that the delay is protocol. In general, the constraints are more similar to that on a single LAN. The fulfillment complex when an extended LAN is config­ of these conditions is impo ant so that the per- ' ured with dissimilar LANs since the individual formance expectations of h�gher-lah yer protocols LANs may provide diffe rent delay/throughput are still met. Depending on the type of LANs used · characteristics. in an extended LAN , the number of bridges ' Another problem when determining capacity allowed (in series) will be different. Token­ is estimating the amounts of traffi c remaining access LANs often have higher average delays local to a subhet and leaving that subn�t. The than Ethernet LANs. These delays could consume worst case occurs when all traffic must be for­ some of the delay budget, which averages 1 0 warded from a subnet through one or more levels milliseconds. In the case of all Ethernetlin ks, the of backbone, thus creating the largest demand on number of bridges allowed in series is some­ the resources of the backbone. One way the where between seven and nine. Further discus­ designer can handle this situation is to assume sion of the performance aspects of extended that all the locally generated traffic must also be LANs may be found in the aper "Performance · carried by the backbone. Increasing the load will Analysis and Modeling of Digitalp 's Networking then define the system saturation point at which Architecture. "19 11

Digital TecbnicalJournal 63 No. 3 September 1986 Th e Extended Local Area Ne twork Architecture and LANBridge 100

Bridge Resources 1. Upon reception, owing to the lack of receive The major resources of concern in a bridge are buffers the buffering required to store and forward 2. After reception, owing to queuing for the frames , the table space for the forwarding forward ing process database, and the CPU cycles to execute the algorithm. Note that CPU cycles are also 3. After the forwarding process, because of con­ required to perform network management. Typ i­ gestion on the outbound LAN cally, any bridge implementation must guarantee Proper bridge design can solve the first two that network management commands are eventu­ ally executed . For example, suppose a bridge sources of congestion. The third problem, how­ was heavily loaded because of a slow outbound ever, is a general one for bridges, routers, and 20 IAN. A network manager wanting to disconnect any store-and-forward device. There are several that bridge may be unable to do so if all received ways that the bridge designer can address this frames are being drop ed because of buffer con­ problem. We first make a general observation gestion. Therefore , p one important aspect of about the required service rate of the service implementing a network management architec­ centers in a queuing network. Steady-state con­ ture is that some amount of buffering must be gestion at the forwarding process can be avoided preallocated to handle those messages . More­ completely if the network can always make for­ over, scheduling must be accomplished so that warding decisions faster than the summation of the network management process in the bridge is the interarrival times of the smallest frames guaranteed to make progress . This guarantee is a across all the inbound LANs. The forwarding matter of correctness and therefore should be database must be consulted for each frame on stated in any effort to make the architecture a which a forwarding decision is made. There are standard. many ways to do that very efficiently. Buffers are also required to hold frames while The table discussed earlier is really only a they are waiting to be either processed or fo r­ cache of station address-to-MAC entity associa­ warded. As depicted in Figure 3, bridge can be tions; a search of that table is required to locate modeled as a queuing system in which the ser­ an entry. If the table is ordered, then a binary vice centers represent the forwarding process search can locate the entry in question. There are and the outbound LANs. Congestion can occur at other alternative search methods, such as seg­ three places: mented hashing. The implementation of this pro-

BRIDGE ,------�------� I I BUFFERS I I I I

...... L-L...L..I, I T I I . DISCARD I I L------j

Fig ure 3 The Tw o Port Bridge Resource Model

64 Digital TecbnicalJournal No. 3 September 1986 New Products

cess is one of the key aspects of the bridge tech­ is available to maintain stability in the queue nology. This facet is covered later in the paper in (particularly during transient bursts of frames) . the discussion of the technology used in the IAN­ Frames will have to be moved (out of the con­ Bridge 100 product. troller buffers or shared m�mory) into another

A final point with respect to caching is in buffer to queue for the forw�I rding process. With order. Further enhancements in performance can respect to bridge delay, th�s time must also be be obtained by recognizing something about the included in the forwarding process. With respect nature of the traffic on IANs. Extensive measure­ to bridge throughput, the bottleneck server will ments on token rings, Ethernets, etc. have uncov­ determine the peak. ered several important facts. These are related to Therefore, the only place any congestion will the nature of higher-layer protocol and applica­ occur in these bridges is at the outbound IAN. tion operation. One is that, given that a frame This congestion will occur if that LAN is not from station S and station D has just been fast enough for the volurrle of traffic it must observed on the LAN , the probability that the carry. This problem is an iss�e of IAN speed, not next frame observed is either from D to S or also bridge speed. The philosophy is to design from S to D is very high.21 Thus, if the bridge bridges so that they will not be bottlenecks. keeps the last few associations it has obtained Most of these comments apply to any routing from the database, it is very likely that the next algorithm and hold true whether a table or a

frame will use one of those associations. Keep­ frame must be searched. AQ.d· they hold true for ing them further reduces table access rates. It all the MACs. amounts to a two-level cache .

With these observations we now focus on con­ Effe ct of Bridges on EtI hernet Links gestion at the receive or transmit buffers. Con­ Bridges have several effects' on the performance gestion at the receive buffers can- be avoided of CSMAfCD IANs. One effect is due to the filter­ through proper machine organization. For exam­ ing fu nction that prevents traffic from entering a ple, a bridge using separate controllers for each subnet that it need not traverse . Recall that IAN, each controller having its own local buffer­ bridges operate above the data link MAC layer as ing, will have to assure that sufficient buffering shown in Figure 4. Preventing this traffic flow

APPLICATION APPLICATION LAYER LAYER

NETWORK NETWORK LAYER I LAYER I ------·------j ______------BRIDGE

BRIDGE FUNCTIONS

DATA LINK DATA LINK DATA LINK DATA LINK LAYER LAYER LAYER LAYER i

PHYSICAL PHYSICAL PHYSICAL I PHYSICAL LAYER LAYER LAYER LAYER PHYSICAL MEDIUM � p PHYSICAL MEDIUM

Figure 4 Bridges and Data Links

Digital Technicaljournal 65 No. 3 September 1986 Th e Extended Local Area Ne twork Architecture and LANBridge 100

reduces the applied load on the LAN , thus In the following sections, the design goals and improving performance for the local users. principles of the IANBridge 100 are discussed, Another effect is more subtle. Consider a along with the trade-offs that had to be made in CSMAfCD system with an extent of D meters and the design. N stations distributed uniformly over that extent. Without using bridges, all N stations have to Design Goals share the resources of that one LAN extended The design of the IANBridge 100 was guided by over D meters. The delay and capacity are deter­ one primary principle: The bridge characteristics mined by the applied load, as described above . If should not cause the performance of the added in the center of the system, the Ethernet extended network to degrade . Network conges­ will be partitioned into two Ethernets, each with tion could cause such a degradation, but not the N/2 stations . Thus the collision windows on bridge. Therefore, the bridge had to have suffi ­ each partition have been cut in half. The smaller cient processing power to receive any possible collision windows cause less bandwidth to be stream of incoming traffic and make the correct wasted per collision. The net effect on the sys­ forwarding decisions. If some or all traffic is to tem is not only to reduce the load applied to a be forwarded, the brid e must queue it for for­ given Ethernet (through filtering) , but also to warding. If the outboundg Ethernet is congested, improve the overall efficiency or capacity of that however, it may be impossible to fo rward the Ethernet since the extent it must cover is smaller. packets and some or all may eventually have to In effect, the Ethernet gets more efficient as the be discarded. This is a problem of network uti­ load applied to it is reduced. Given these factors, lization, however, and not bridge design. along with performance information characteriz­ Although a bridge must discard packets during ing the behavior of the Ethernets under load, the periods of prolonged congestion, it should not bridge designer can investigate the performance discard traffic during periods of transient conges­ of bridged networks using these IANs.4,22 tion. Therefore, the bridge must provide suffi­ The remaining section of this paper discusses cient buffering to prevent packet loss during the the IANBridge 100, which is an implementation traffic peaks occurring in any properly operating of the Extended IAN Architecture . IAN. When the individual Ethernets are operat­ ing close to saturation, any IAN 's performance Development of the LANB ridge 100 will be generally unsatisfactory regardless of The bridge architecture was deve loped in paral­ the presence or performance of bridges. When a lel with the first implementation of that architec­ LAN is operating in the range in wh ich good ture . An Ethernet-to-Ethernet bridge, was performance can be expected, transient conges­ designed as a prototype to demonstrate the use­ tion may still occur. In this case, packet loss fulness and practicality of the bridge concept. in bridges can be avo ided with a bounded This was called the "Brooklyn Bridge." The pro­ amount of buffer memory, an amount that can be totype hardware and software were operated in a predicted. laboratory environment. After some operating Since bridges can be installed in series, the for a time, the prototype was installedin Tewks­ issue of store-and-forward latency becomes bury, Massachusetts, between an Ethernet and important. In varying degrees, higher-level pro­ Digital's Engineering Network (ENet) . This pro­ tocols are intolerant of delay; therefore, store­ totype led to a full-scale product deve lopment and-forward delay must be kept to a minimum. project, called Janus, that resulted in the IAN­ Ideally, this delay should be equal to the packet Bridge 100. reception time; however, some additional time is The prototype incorporated some, but not all, needed to make the forwarding decision. Even if of the higher-level reliability and availability fea­ the decision process were partially overlapped tures of the final LANBridge 100. Many of the with the packet reception, this decision could final features were incorporated by the product not be made until after the frame check development groups as the final product design sequence (FCS) had been received at the end of evo lved. Another feature added in deve lopment the packet. The nature of the applications in was a fiber-optic Ethernet extension that allows which bridges were expected to be used led us networks to be extended farther than is possible to choose 1 00 microseconds as the maximum with the Ethernet cable alone . latency for minimum-sized packets . Of course,

66 Digital Tec bnicalJournal No . 3 September 1986 New Products

i longer packets will have a greater store-and-for­ repeaters, and with routers. Jo be successful, the ward delay, proportional to their length. LANBridge 100 had to be perceived as having a A bridge must be able to store information cost/performance advanta�e relative to those about the location of the stations attached to its other options. I LAN. If insufficient room exists to store all the stations in the station database, the bridge must Design Principles I forward messages for any station that did not fit. Designing the LANBridge 1 00I hardware started This situation leads to inefficiency in the opera­ with the premise that a! general-purpose tion of the LAN, since traffic may be forwarded microprocessor is often th� most cost-effective unnecessarily. Based on trends in network appli­ method for implementing s�veral complex func­ cations , we judged it unlikely that a single tions in a single system. Microprocessors are flex­ extended LANwo uld grow to over 8000 simulta­ ible and economical because the same set of neously active stations within the lifetime of this hardware logic can be used to perform many dif­ product. Therefore, the storage limit was set at ferent functions under program control . For any 8000 station addresses. given technology, however, 'they are rarely as fast In addition to providing a low packet loss rate as dedicated logic. Therefore, the bridge was and efficient filtering, a bridge should not con­ designed to implement as many functions as pos­ tribute significantly to the data error rate of the sible in microcode; special hardware logic was extended LAN. Even more importantly, an unde­ used only for time-critical functions. tected bit error corrupting a packet while it is in With a sufficiently fast microprocessor, apply­ a bridge should be detected at the destination ing the principles above to the LANBridge 100 station. This detection can be ensured to a high requirements resulted in the high-level block probability by forwarding the received cyclic diagram shown in Figure 5. This diagram is use, redundancy check (CRC) instead of generating a ful for understanding the general principles of new CRC in the bridge. bridge operation and represents the first pass at It is essential that a bridge be reliable and the LANBridge 100 design. available, since it is as important in an extended Of course, this design: was based on the LANas in the Ethernet cable itself. Therefore, the hypothesis that some available microprocessor LANBridge 100 had to power up and operate cor­ was fast enough to do all the required work in rectly without the intervention of any person or the allotted time. In reality, �orne hardware assist other machine. Since a bridge is important to the was required. Moreover, the memory sizes and proper operation of an extended LAN, a mecha­ the implementations of th�I Ethernet interfaces nism to ensure high network availability was also had not been considered. [fhese complicating required. Parallel standby bridges provided that factors are now examined ip more detail, along mechanism. The bridges also had to be able to with the trade-offs that were required. 1 detect and correct for many unworkable configu­ I Processor and Suppor( Logic rations, such as looping topologies, that might I result from installation errors. A preliminary performance analysis of the Although a bridge must operate without inter­ bridge design in Figure 5 showed that all bridge vention, a network manager should be able to functions, except associating network addresses observe parameters and counters associated with with forwarding inform�tion, could be handled the bridge's operation. He should also be able to by several available microprocessors. Assuming alter some of those parameters. A centrally that this exception could be. performed by exter­ located bridge is an ideal place from which to nal logic, it was possible to consider other price observe activity in order to isolate faults and and performance requirements and select the gather information. That information can then be most suitable processor. used to make decisions about changes and In this design, the microprocessor is directly enhancements to the particular network configu­ involved in making forwarding decisions, but ration. with hardware assistance. It also coordinates the Of course, all these goals should be met with a packet-forwarding process, although actual data cost as low as possible. Although a bridge pro­ movement is the responsibility of the Ethernet vides many valuable features, it nevertheless interface logic. Thus the microprocessor has competes with single Ethernet cables, with some stringent real-time requirements. Further-

Digital Te chnicaljo urnal 67 No . 3 September 1986 The Extended Local Area Ne twork Architecture and LANBridge 100

MICRO- PROCESSOR

l ETHERNETI PROGRAM PACKET ETHERNET ETHERNET MEMORY ADDRESS M MEMORY ME ORY INTERFACE INTERFACE

ETHERNET LAN ETHERNET LAN

Figure 5 High-level Block Diagram

more , it must have additional time available to but only at considerably greater expense. Since a perform other functi(;)ns, such as updating timers VLSI implementation was clearly the most attrac­ and running the spanning-tree algorithm to cor­ tive option, we explored a number of alternative rect possible faulty network configurations. At sources for it. power-up or system reset, the microprocessor Data integrity is one of the more important must verify via diagnostic code that the entire considerations in designing a bridge . In particu­ system is operating correctly. lar, the bridge should not cause undetectable The design of the real-time code paths was data errors in a packet delivered to a destination fairly straightforward. It was written in a detailed station. This injunction implies that either the outline form for a generic processor. That packet memory in the bridge must have a very . allowed us to understand the requirements and low probability of error or the original CRC gen­ select a processor with enough power. From erated by the source station must be fo rwarded · this design , it was quite clear that a high­ with the packet. If the original CRC travels performance microprocessor was required. The through a bridge with the packet, then any 10-MHz MC68000 chip fr om Motorola, Inc., was packet memory errors will be detected as trans­ chosen based on its available power and attrac­ mission errors at the destination station. tive price. The only available chip set that allowed pack­ In this design, the microprocessor has a private ets to be transmitted without a recalculated and memory. Thus instruction and local-data access appended CRC was one made by Advanced Micro will not conflictwith packet data flowing to and Devices Corporation. This chip set was called from the two Ethernet interfaces. Some of this LANCE(Local Area Network Controller for Ether­ memory is ROM, · which contains all the code net) . Although other considerations were impor­ needed by the bridge to be fu lly functional on tant, this very important capability was the power-up. The bridge also contains RAM,used as deciding factor in our selection process. a writeable data area. There is a small amount of nonvolatile RAM (NVRAM) , which stores system­ Network Address Look- up specific parameters that must survive power fail­ The network address look-up mechanism is one ures and be available on the next system start-up. of the most interesting aspects of the LAN­ Bridge 100 design. Upon receiving a packet, the Ethernet Interface bridge must locate the information associated The Ethernet interface is a complex function that with its destination address so that a forwarding is implemented most economically in VLSI. The decision can be made. In addition, the source interface can be implemented at the board level , address must be added to the database unless it

68 Digital Tecbnicaljournal No . 3 September 1986 New Products

has already been added. Therefore , two 48-bit not be easily used in parallel to form a 16-bit or CAM. .: network addresses must be located for each a 48-bit wide I packet received. It cannot be assumed that the The only feasible alternative remaining was to 48-bit source and destination addresses fo und in employ a hardware-assisted search using eco­ various packets have any known relationship to nomical, commercially avaHable memories fo r each other. On the other hand, the addresses are data storage. Binary search w:as cb.<>sen as the likely to occur in groups because each of the var­ search algorithm. This search technique is fast ious equipment manufacturers has been assigned since it requires at most log( n) probes, where n a block of addresses. The look-up function must is the number of entries in the table. Unfortu­ occur quickly,-since only a small ponion of the nately, the table must be kept soned in numeric time available for processing each packet can be order. That is not a severe disadvantage, how­ devoted to this. one function. ever, since the table can be soned in place with­ There are several possible techniques for look­ out interrupting search operations. ing up network addresses. One straightforward In the IANBridge 100, tl;ie search function is approach is based on a software search per­ performed entirely in hardWare at the request of formed by the microprocessor. With this tech­ the .microprocessor. The microprocessor loads nique, the microprocessor fetches a network the search hardware, or binary search engine, address from a packet and then searches the data­ with network addresses fetched from a packet. base. Even with efficient search algorithms and The enginethen runs in parallel while the pro­ the fastest available microprocessor, however, cessor,does other work. Aft�r 3.9 microseconds, [ this technique is much too slow, especially the microprocessor logic · returns to read the when.the database is filled to capacity. results of the ·search. A second.•approach, also based on software, is Although-searching is a hardware function, the to use the source or destination address as an microprocessor uses software to order the table index into a: table. This technique has the advan­ of network addresses. Reordering must be done tage of.being fast; yet, it is quite impractical , only when new stations are, added or when inac­ since the table length would be almost 280 bil­ tive stations are removed. these events happen lion (248) entries. relatively infrequently, and analysis and experi­ A third solution is hashing, which might be fast ence have shown that software is fast enough not enough in software and could also be easily to hinder operations. If there are several changes implemented in hardware. In this technique, in a shon time · (for example, during the initial 48-bit addresses are transformed by an arithmetic learning period) , they are cached and added, at function to a hash address with a smaller maxi­ lower priority, to the search table. I mum value of the address . For example, if the maximum hash value were .216, a direct Packet Memory Size table look-up could be ;performed, using the Upon determining that a packet received on one hash address as an index. The disadvantage of Ethernet should be forwarded to another Ether­ hashing is that the distribution of-network net, the IANBridge 1 00 must queue the packet addresses is not known a pFiod; therefore, many for transmission. Since Ethernet is a shared chan­ network addresses could translate to the same nel, the bridge must provide buffering to store hash address. This duplication could result all packets that might be queued while the Ether­ in either unnecessary forwarding or incorrect net is busy with traffic from other users. The filteri�g. amount of buffer memory must be large enough 'T hus all three software solutions were unus­ to avoid excessive packet loss resulting from able. It became clear that some type of hardware buffer exhaustion, yet sm�ll enough to be cost assist was required. The most attractive hardware effective. 1 assist from the -standpoint of speed and ease of Over the long term, if the average traffic gener­ use was content addressable memory (CAM) . ated by a bridge and other users on a single Ether­ Unfortunately, the available CAMs were best net exceeds the total capacity, only an infinite suited for use in cache memory applications, amount of memory will prevent packet/ loss. In since they are small and faster than needed (thus this case latency will inc�ease without bound. ,more expensive) . These CAMs also do not scale This situation is rather uninteresting fr om the .wt Hrin width; for example, 8-bit wide CAMscan- \Standpoint of bridge design. The system user will

DJglltll•Tecbnicoljournal 69 No. 3 ·sept ember 1986 The Extended Local Area Ne twork Architecture and LANBridge 100 have exceeded not only the capabilities of any Pa cket Memory Performance realizable bridge, but of the underlying network The packet memory system in the bridge was technology as well. However, there is the more designed to handle worst-case conditions with­ interesting question of the magnitude and dura­ out allowing overruns, underruns, or processor tion of traffic transients in real networks, and the memory · cycle starvation. The worst case occurs amount of memory required to avoid packet loss when both Ethernet interfaces are continuously during those transients. receiving but not forwarding Ethernet packets of The size of the bridge memorycan be bounded the minimum size (64 bytes). The packet source if one notes that most high-level protocols built and destination addresses are examined only on a datagram service like Ethernetexpect tiinely when a packet is received. Therefore, if the pack­ packet delivery. If the delivery of a packet is ets were forwarded, fewer memorycy cles would delayed excessively, these protocols treat the be required. If the packets were longer, the packet like a lost datagram and retransmit it. In number of memory cycles needed to deal with these circumstances forwarding the original buffer descriptors would be reduced, since copy has the undesirable effe ct of generating more data would be contained in each buffer. multiple copies, thus increasing network con­ Under worst-case conditions, the memory must gestion. One way to avoid this problem is to transfer approximately 108 16-bit words per employ the concept of maximum packet life­ minimum packet time (63.3 microseconds) . time. This concept is enforced by the bridge and When the required memory refresh cycles c:in be used to place an upper limit on bridge are also included, one memory cycle takes memory requirements. Unfortunately, even a 580 nanoseconds. rather short packet lifetime requires large The packet memory system must also provide amounts of memory. For example, a lifetime of low-latency microprocessor access so that valu­ two seconds requires five megabytes of memory able CPU cycles are not lost because of packet (lOMB/second X 2 seconds X 2 directions -:- 8) . memory wait states. Since the LANCE chips are Although declining costs may make memories of burst-transfer direct memory access (DMA) this size practical in the future, packet lifetime devices, any shared bus system would have an could not be used to size memory for this inherently unacceptable latency. Therefore, after product. some analysis, a four-port memory system was Another way to size bridge memory is to simu­ chosen: a port for each LANCE, a port for the late the behavior of the system under various CPU, and a port for the refresh operation. The workloads. The LANBridge 1 00 can be modeled memory is fully buffered so that the RAMs them­ rather easily if one notes that the bridge takes less selves cycle every 300 nanoseconds, even time in deciding to discard or fo rward a packet though the LANCE has a 600-nanosecond mini­ than the packet takes to transmit. In this case the mum cycle time, and the CPU has a 400-nanosec­ fo rwarding operation will not be a bottleneck, ond minimum cycle. and the receive buffers will never hold more than one packet. The transmit buffers, however , The Final LANB ridge 100 Design will be emptied at a variable rate depending on A high-level block diagram of the final !.AN­ the load on the Ethernet. Bridge 100 design is shown in Figure 6. Its logic We considered fo ur different workloads: the is quite similar to the original prototype design first based on an existing timesharing environ­ in Figure 5. The primary change was the addition ment, the second on a file transfer application, of hardware to assist in locating forwarding infor­ the third on a process control system, and the mation. In addition, the single bus has been fourth on a hybrid of office and process control . divided into several buses to increase the total The results showed that 16KB of memory (in bandwidth of the system. each direction) was inadequate. Increasing the memory size beyond 64KB gained very little in Summary terms of performance . Therefore, since 64KB The LANBridge 1 00 is the first product to imple­ memories were readily available, a 16-bit mem­ ment Digital's Extended LAN Architecture. With ory bus provided the required 128KB in a conve­ this product, customers may easily expand nient and cost-effective manner. beyond the confines of a single Ethernet to an

70 Digital Tecbnical]ournal No. 3 Sept ember 1986 New Products

MC6BOOO MICROPROCESSOR I I BINARY I MEMORYI ROM SEARCH LANCE LANCE CONTROL CONTROL

RAM

ETHERNET PACKET ADDRESS I NVRAM RAM RAM

ETHERNET ' r LAN i ETHERNET LAN

Figure 6 High-level Block Diagram with MC68000 j I i extended network with as many as 8000 active The origins of the LANBridge 100 and the stations. Up to eight Ethernets may be intercon­ Extended IAN Architecture may be traced to a nected in series (22,400 meters of cable) . Aggre­ project whose original goals only vaguely recog­ gate bandwidth will increase proportionally with nized the need for such a mechanism. Both the every added Ethernet, and usable bandwidth will product and architecture are the result of gather­ also increase substantially in many application ing and analyzing customer requirements, fo l­ environments. lowed by applying innovative design techniques. The Extended IAN Architecture itself is now a key part of Digital's networking strategy. Acknowledgments As exemplified in the LANBridge 100, this The authors would like to acknowledge the other architecture permits substantial expansion in members of the original p;roject team. In addi­ the physical limits of a given IAN technology. tion, the work of Bob Shelley and Tony Robillard These physical dimensions include geographic was instrumental in the de�ign and construction extent, number of stations, and aggregate band­ of the bridge breadboard�. Tony Lauck, Radia width. Bridges implementing the Extended Perlman, George Varghese,! Mike Soha, as well as 0 LAN Architecture also provide increased the the entire LANBridge 1 I 0 development team availability as a result of the spanning tree provided invaluable additions and insight to the algorithm and the standby operation mode. The architectural process. transparent operation of high-performance bridges enhances significantly the capabilities References and services offered by both Digital and non­ 1. B. Stewart and W. Hawe, "Local Area Net­ Digital equipment. work Applications," Telecommunica­ The Extended IAN Architecture also provides a tions (September, 1984): 96f-96u. unifying mechanism in networks composed of 2. IEEE Project 802 Local Area Network multiple homogeneous or heterogeneous IANs. Standards , "IEEE Standard 802.3 CSMA/ The bridge concept may be extended to inter­ CD Access Method and Physical Layer connect IANs with different physical layers, such Specifications," Approved IEEE Standard as baseband and broadband Ethernet. Within cer­ 802.3-1985, ISO/DIS 8802/3 (July tain constraints, it may also be used to intercon­ ' 1983). ' nect dissimilar IANs.

Dtgttal TecbntcalJournal 71 No. 3 September 1986 The Extended Local Area Ne twork Architecture and LANBridge 100

3. W. Bux, "Local-Area Subnetworks: A Per- Op en Sy stem Interconnection (Decem- formance Comparison," IEEE Transac- ber 1983): 1388-1393. tions on Communications, COM-29 14. W. Hawe and G. Varghese, "Extended (10) (October 1981): 1465-1473. Local Area Network Management Princi- 4. W. Hawe and M. Marathe, "Predicting Eth- pies," Digital Equipment Corporation ernet Capacity-A Case Study," Proceed- technical paper ,submitted to IEEE 802 ings of the Eighteenth Computer Perfo r- Standards Committee, IEEE Document mance Evaluation Us ers Group Con- Number IEEE-802.85.1.98 (October 31, fe rence (October 1982): 375-388. 1984). 5. IEEE Project 802 Local Area Network 15. R. Perlman, "An Algorithm for Distributed Standards , "IEEE Standard 802.4 Token Computations of a Spanning Tree in an Passing Bus Access Method and Physical Extended LAN ," Digital Equipment Cor- Layer Specifications," Approved IEEE poration technical paper submitted to Standard 802.4-1985, ISO/DIS 8802/4 IEEE 802 Standards Committee, IEEE Doc- (December 1985). ument Number IEEE-802.85.1.97 (Octo- ber 31, 1984). 6. IEEE Project 802 Local Area Network R. Jain and R. Turner, "Workload Charac- Standards , "IEEE Standard 802.5 Token 16. terization Using Image Accounting," Pro- Ring Access Method and Physical Layer ceedings of the Eigteenth Computer Per- Specifications,'' Approved IEEE Standard fo rmance Evaluation Us ers Group Con- 802.5-1985, ISO/DIS 8802/5 (December fe rence (October 1982): 111-120. 1985). 17. The Ethernet: A Local Area Network, 7. B. Stewart, W. Hawe , and A. Kirby, "Local Data Link Layer and Physical Layer Area Network Connection,'' Telecommu- Sp ecification, Ve rs ion 2. 0 (Digital Equip- nications (April 1984): 54-66. ment Corporation, Intel Corporation, and 8. W. Hawe, A. Kirby, and B. Stewart, "Trans- Xerox Corporation, Order No. AA-K759B- parent Interconnection of Local Networks TK, November 1982). with Bridges,'' jo urnal of Telecommuni- 18. R.M. Metcalf and D.R. Boggs, "Ethernet: cations Networks , vol. 3, no. 2 (Summer, Distributed Packet Switching for Local 1984): 116-130. :. ,, Computer Networks," Communications 9. W. Hawe, A. Kirby, and A. Utuck, "An of the ACM, vol. 19 Ouly 1976): 395- Architecture for Transparently Intercon- 403. necting IEEE 802 Local Area Networks," 19. R. Jain and �· Hawe, ''Performance Analy- Digital Equipment Corporation technical sis and Modeling of Digital's Networking paper submitted to IEEE 802 Standards Architecture," Digital Te chnicaljournal Committee, IEEE Document Number (September 1986, this issue) : 25-34. IEEE-802.85.1 .96 (October 31, 1984). 20. M. Gerla and L. Kleinrock, "Flow Control: 10. B. Mann, C. Strutt, and M. Kempf, "Termi- A Comparative Survey," IEEE Tra nsac- nal Servers on Ethernet Local Area Net- tions on Communications, vol. COM- works," Digital Te chnical jo urnal (Sep- 28, no. 4 (1980): 553-574. tember 1986, this issue) : 73-87. 21. R. Jain and S. Routhier, "Packet Trains: 11. DNA Phase IV Routing Layer Sp ecifica- Measurements and a New Model for Com- tion, Ve rsion 2. 0. 0, (Maynard: Digital puter Network Traffic," IEEE jo urnal on Equipment Corporation, Order No. AA- Sp ecial Areas in Communications X435A-TK, 1982). (Forthcoming, September 1986) . 12. Tra nsport Protocols (Rochester: 22. F. Tobagi and V. Hunt, "Performance :Xerox Corporation, Order No. XSIS. Analysis of Carrier Sense Multiple Access 028112, 1981). with Collision Detection," Computer 13. R. Call on, "Internetwork Protocol," Pro- Networks , vol. 4, no. 5 (OctoberjNovem- ceedings of the IEEE, Sp ecial Issue on ber 1980): 245-259.

72 Digital TecbntcalJournal No . 3 September 1986 BruceE. Mann [ Colin Strutt !Mark F. Kempf I

Te rminal Servers on Ethernet Local Area Networks

Digital's terminal servers provide flexible, cost-effe ctive connections between terminals andhost systems in a local area network (LAN) . The product develop ers tried several approaches befo re develop ing t�e Local Area Transport(L AT) protocol asthe basis fo r all terminal servers. The LAT architecture supports connections to multiple hosts over: a high� bandwidth EthernetLAN. LAT establishes a single virtual drcuit 'between a terminal server and each host, and individual sessions are multiplexed over a virtual drcuit. A uniquedirecto ryservice permits terminalservers to be configured automatically, learning about hosts as they become available. The latest implementations support mixed-vendor environ­ ments and Digital's major op erating systems.

The Original Problem ucts available today; thar is, backplane multi­ In 1981, Digital faced the task of designing a plexers on the CPUs are switched to the termi­ method for connecting a few hundred "dumb nals. The problems with! this approach were terminals" and printers to a VAXcluster system. excessive cost, the lack ofl Digital technology in If, as in the past, the terminals were connected to this product area, and po9r availability. a single computer, then many of the advantages Because of these complexity and cost factors, of clustering would be negated. Instead, it was the original CI-Mercury project was replaced proposed that terminals be connected to a with one called Pluto. This project envisioned "front-end" terminal server shared by all mem­ using an Ethernet as the interconnect, thus bers of the cluster. This front end would then lowering the attachment cost dramatically. allow more flexible connections. A user termi­ This server was based on a PDP-11 central pro­ nal, fo r example, could connect to any processor cessor, and we chose a variant of the RSX- 11S in the VAXcluster group, rather than directly operating system for the initial kernel software. connecting to just one. Our goal was to migrate The lower-layer communiCations protocols used our existing installed terminal base gracefully between Pluto and the VAx cluster nodes were from single-processor attachments to VAXcluster the DECnet protocols, sucbessfully used in other systems. products. The original effort to provide this server was We believed that Pluto could be cost effective called the CI-Mercury project by our develop­ in large installations; how�jver, its . initial cost was ment groups. We aimed to attach this terminal too high to be competitive in smaller configura­ server directly to the high-speed cluster inter­ tions. This cost factor was especially important as connect, called the CI, so that the server func­ Ethernet became an integral part of Digital's tioned as a switch. However, the cost of this strategy. With Ethernet, it became practical and scheme proved to be excessive. (The cost for the cost effective to distribute small terminal servers interface to the CI itself was about $20,000.) throughout an office environment rather than Moreover, a connection to the CI would have concentrating all termina� interfaces in a large, resulted in a server that could conn�ct only to centrally located server. Therefore, in late 1981,

nodes in a single cluster. work began on an eight-lineI terminal server, the We also studied other vendors' switch offer­ primary goals being low 'cost and high perfor- ings as front-end terminal switches. These prod­ mance. Internally, this pnhect was dubbed Pluto ucts function much as do the dataswitch prod- Junior, later called Poseidbn.

Digital Te cbnicaljournal 73 No . 3 September !986 Te rminal Servers on Ethernet Local Area Ne tworks

Late in 1983, significant problems were The Development of LAT encountered in the design ofthe Pluto and Posei­ In late 1981, the prototype of the original LAT don terminal servers. The CTERM protocol, a server was developed on a VT 1 0 3 terminal new design of a layered DECnet protocol off­ server, which contained a small Q-bus backplane loading character-processing overhead from the with a PDP-1 1/23 system and an Ethernet con­ host to the terminal server, proved to be more troller. (An Ethernet controller made by 3COM complex than anticipated. Measurements of mes­ Corporation was used since Digital had no Ether­ sage-processing overhead and estimates of the net products available at that time.) This early overhead in the DECnet-VAX software showed work involved quantifying the maximum charac­ that CPU consumption in the host system would be a problem for keystroke editors. Existing stud­ ter-echo delay that a person could comfortably ies showed that terminals were used in keystroke tolerate. We learned that an experienced touch modes, rather than command-line modes, more typist encounters difficulties when the echo time than fifty percent of the time. Moreover, the exceeds 100 milliseconds. By extrapolating from Pluto server itself was experiencing severe per­ this fact, we deemed that the network and CPU formance problems. For example, CPU satura­ efficiency of the entire LAT subsystem should be tion occurred when running less than six termi­ dramatically improved. The approach was to nals at 9600 baud, even when the terminal "procrastinate" for up to 80 milliseconds after interfaces used direct memory access (DMA) . characters were received from the terminals at Finally, a number of issues, not considered each server. This delay had the very desirable during the requirements phase, became more effect of reducing the number of messages pro­ apparent: cessed by the Ethernet, the host systems, and the • How could a VAXcluster system be viewed as terminal servers. (Eighty milliseconds is imple­ a single system rather than as individually mentable as a multiple of either the 60-Hz line­ addressable nodes? frequency clock common in the United States or • How could the terminal load be balanced the 50-Hz line-frequency clock common in across nodes in the VAXcluster system? Europe and other countries.) • How could the management of the terminal In early 1982, we created a VMS driver servers be automated? (LTDRIVER) using a dedicated Ethernet controller to support the LAT server prototype. Thus the use of the CTERM protocol for terminal By April 1982, log-in to a VMS system from servers in both Pluto and Poseidon was halted. a server was achieved; about two weeks later, (In fact, the Pluto project with an RSX kernel the performance relative to the then current was used successfully as the basis for a number of multiplexer, the DZ-11, was measured. The different servers in the Ethernet Communica­ LAT connection was easily able to outperform tions Server, or DECSA, family, including the DECnet Router, DECnet RouterjX.25 Gateway, the DZ-11 (a programmed-interrupt controller) and DECnetjSNA Gateway products. The same under a wide variety of loads. Under many hardware base, though with a completely rewrit­ loads, the LAT connection was shown to outper­ ten software kernel, formed the basis for the final form the DMF-32 (one of a number of DMA Ethernet Terminal Server.) 1 controllers) . However, the original task still remained; In early summer 1982, we converted therefore, an alternative solution was proposed, LTDRIVER to the shared Ethernet port driver. based upon work done using a new architecture This conversion allowed a single Ethernet con­ called local area transport (LAT) . The LAT solu­ troller to be used simultaneously for LAT soft­ tion involved three essential components that ware, and DECnet and other communications were 'unique to that architecture: software. Unfortunately, this change yielded a • A new transport and naming architecture to significant performance degradation. At this replace the DNA routing, transport, and ses­ time, however, the VMS Development Group was sion layers designing a lower-level program interface to the • A new operating system for the terminal server Ethernet driver that would allow system-level VMS usage of the Ethernet. Currently, this inter­ • A new "port" driver for the terminal driver of face is used to implement VAXcluster support via the VMS operating system the Ethernet.

74 Digital TechnicalJo urnal No . 3 September 1986 New Products

i By late 1982, we decided to include both IAT environments, the added fJatures of server tech­ and CTERM support in the Pluto terminal server, nology are not necessarily perceived as adding but only IAT support in Poseidon. In addition, value; then cost becomes the sole factor for com­ the original code from the prototype Vf1 03 ter­ parison. Digital's servers are at a disadvantage in minal server was migrated to a UNIBUS PDP- 11' this situation because they offer features that cost system; this code was called I.AT-11. more. Digital must pursue � dual path to develop By early 1983, a significant number of VMS servers for some applications and to maintain and developers were using the prototype I.AT-11 expand backplane terminal interfaces for others. servers. This software was maintained by the IAT As noted earlier, we knew that the Ethernet developers. It was important that the software Terminal Server could compete effectively on worked reliably since the VMS developers were cost alone for large numbers of terminals; for using it in developing the VAXcluster software. smaller configurations, however, it could com­ As noted earlier, the original development pete only on the basis of gr�ater functionality. Its team for the CTERM terminal server on Pluto fixed cost is relatively high, although the incre­ experienced a number of problems. Therefore, mental cost for each terminal added is low. Thus in early 1984, a new terminal server was imple­ we started to design a low-cost terminal server. mented on Pluto, based on the I.AT- 11 code and The first decision we made was an important not on the RSX software. This new server, con­ one: the product wouldi be a local terminal taining software only from IAT , was referred to server and nothing more. !Telephone data lines internally as Plato. usually terminate inside computer rooms. There­ The prototype I.AT- 11 code was developed fore , Pluto, which is suited to computer room into a product to run on version 3. 7 of the VMS configurations, already filled the need for a ter­ system. This product became available in July minal server with modem control capabilities. 1984, somewhat before VMS VAXcluster support Poseidon was specifically designed to be dis­ appeared in VAXfVMS version 4.0. One month tributed along an Ethernet throughout an office later, the EthernetTerminal Server, the product environment, near the at�ached terminals. Of name for the Pluto terminal server, became avail­ course, multiple Poseidons could also be used in able. The risk of having the VAXcluster offering wiring closets and computer rooms. adversely affected by an unproven terminal We also believed that Pluto already provided a server was limited by releasing it with the earlier hardware base for other communication server version of the VMS system. Thus we took advan­ applications; therefore, P9seidon need not sup­ tage of extensive "free" testing from over 1000 port applications other thanI - terminal serving. internal users. Although often desirable (rom the standpoint of In March 1985, the DECserver 100, the pro­ the company's total product set, generality is duct name for the Poseidon terminal server, was also the archenemy of low cost. Hardware that released. The DECserver 100 implementation serves many functions also has capabilities that was radically different from the other terminal are unused in some applications. Those unused servers. capabilities represent a co�t from whkh no bene­ fit is derived when an i�olated application is 1 DECserver 100 viewed. Although the Ethernet Terminal Server and On the other hand, hardware designed for a I.AT- 1 1 products provided the benefits of server­ particular application can optimize cost and per­ based terminal interconnect, they did not fully formance by eliminating any unnecessary capa­ implement Digital's ter';Ilinal server strategy. For bilities. The Ethernet Terminal Server and DEC­ server technology to b�come pervasive, it must server 100 illustrate both ;ends of this spectrum. compete with other terminal connection meth­ The hardware base for the former functions in a ods on the basis of cost alone. In cluster and number of general roles related to communica­ multi-host systems, servers provide necessary tions, such as the DECnet Router or DECnetjSNA and desirable added functions. Therefore, they Gateway products. Consequently, this product should be compared with other connection has a high entry cost, but a low incremental cost methods by assigning some value to the addi­ as each terminal is adde

Digital Te cbnical]ournal 75 No. 3 September 1986 Te rminal Servers on Ethernet Local Area Networks

A second equally important decision was made high-volume, and therefore inexpensive, compo­ early in the project: the product managers nents where possible - a decision driven par­ defined and then enforced a very aggressive cost tially by the desire to shorten the design time. go:'ll in terms of dollars per connection. That goal Mter these decisions, work started in earnest. was set in two passes. In the first, the engineers One of the most important issues was making did a preliminary cost analysis, taking into sure there was enough processing power. Since account competitive pressures and currently we had confined the problem to a specific appli­ available technology. In the second, the product cation, we could size the processing require­ managers decided the original goal was too high, ments quite accurately. Pluto had to deal with lowered it, and then challenged the engineers to many potential applications and an expandable meet it. This challenge gave the engineers every number of terminals, Poseidon with exactly one incentive to squeeze cost out of the design . application and eight terminals. Pluto has one Although some cost reductions seemed quite main processor with assist processors added as insignificant and not worth the effort, in the end terminals are added; Poseidon did not have to the old adage of "watch the pennies and i:he dol­ expand and needed only one processor ifit had lars will watch themselves" proved to be true. sufficient power. At this time, several extremely The insistence on meeting the cost goal also pre­ powerful 16-bit processors became generally vented us from adding "bells and whistles," with available. We evaluated them, including some their associated costs and complexity, to the from Digital as well as other vendors. Since requirements list as the project progressed. Poseidon would not be programmed by cus­ Starting system design, we immediately faced tomers, the extensive PDP- 11 and VAX instruc­ an inescapable trade-off in the design options. In tion sets were not really needed. We decided the ideal case, the cost per terminal to connect a finally to use the Motorola 68000 chip, which single isolated terminal should be the same as was the lowest cost, most readily available cost per terminal to connect, say, 16 terminals. microprocessor with sufficient power. That is, the cost steps should be uniform as ter­ As the design progressed, we considered every minals are added to the system. Unfortunately, possible cost reduction option. For example, the some of the costs in a server system are essen­ dynamic RAMs are reshed ref by software since tially fixed. For example, the power and packag­ sufficient processing time exists to do that; the ing costs are approximately the same whether a cost of refresh hardware could thus be elimi­ server accommodates one terminal or four. nated. Chips were selected to perform multiple These fixed costs result in a relatively large ini­ functions whenever possible. For example, the tial cost step, followed by smaller steps as termi­ terminal interface (UART) chips have integral nals are added, followed by another large step timers used to control the software refresh, the when an additional server is added. We realized timer interrupt, and the watchdog timer. Essen­ that a compromise was needed between step size tially, the interrupt logic uses very little external and the potential for amortizing fixed cost over logic to tum around the interrupt priority level several terminals. As the design progressed, we to generate the vector address. decided that eight terminals per server provided Thus the design resulted in an extremely low­ an acceptable step size that allowed us to meet cost, fixed-function terminal server, the DEC the cost-per-line goal. server 100, which has proven to be, by far, the Work started on the hardware design with most popular member of Digital's terminal a clear cost goal, but with no preconceived server fa mily. Figure 1 depicts the initial LAT · requirements for the implementation. It seemed product. fairly obvious that an eight-line server could be built on a single printed circuit board. Since The LATAr chitecture there is a substantial expense simply in connect­ ing multiple boards, we decided very early that Th e LAT Protocol directly incorporating any pieces of existing One initial goal of the LAT architecture. was to products was too expensive. The server would connect terminals to host systems using the Eth­ be a single board designed from scratch, ernet as a data link. Even today, LAT is•still used although we were free to borrow design ideas primarily for connecting terminals to hosts. from other products. We also decided to use only However, its application hasspread to connect-

76 Digital Tecbnicaljournal No . 3 September 1986 New Products

VAXcluster SYSTEM

VAX/VMS HOST

ETHERNET DECserver 100 TERMINAL SERVER

TERMINALS TERMINALS DIAL-IN MODEMS

Figure 1 Initial LA T Product ing other asynchronous devices, such as printers The LAT protocol mak�s certain simplifying or links to hosts other than those directly con­ assumptions: nected to an Ethernet. • Communication is local to a single logical Eth­ The goals of the LAT protocol are as follows: ernet (possibly connected by repeaters and • To permit dumb terminals to be connected to bridges); thus no routing capability is multiple hosts required.

• • To be a transparent character transport mecha­ Communication is inherently asymmetric, nism (implying that character echo must be which simplifies connection management and performed by the host and not by a server) permits straightforwa!rd host implementa- tions. . . • To support a high-bandwidth LAN technology • . (specifically the Ethernet) The bandwidth of the Ethernet ( 10 megabits per second) is much gI reater than the band­ • To use a fixed maximum bandwidth that is width needed for a given terminal (e.g., 9,600 much less than the total LAN bandwidth, bits per second), so that a timer-based proto­ which should be used in a fair and predictable col is appropriate. manner The normal model of dumb terminal usage is • To be an efficient data link protocol, relative one of low-speed data entry, say a: few characters to the higher-layer DECnet protocols, such as per second, and higher-speed display in bursts of CTERM operating in a LAN environment several hundred characters at a time, taking sev­ eral seconds to display. In addition, a user is usu­ • To provide for low CPU loads and memory use ally sitting at his terminal while a program oper­ on the host system at the expense of higher ates at the host. LAT takes advantage of this CPU and memory utilization on the terminal asymmetrical relationship. Also, the terminal servers connection normally takeI� place at the explicit request ofthe user rather than of the host system. • To allow for simple terminal server imple­ 1 mentations, which means low-cost and high­ LAT also takes advantage of this asymmetric performance hardware implementations aspect. The server does not c�mmunicate characters • To permit automatic configuration so that, for to a host system as they are entered by the user; example, servers can determine, without man­ rather, it collects characters and periodically ual intervention, the names and addresses of transmits them to the host. The time interval of hosts on the Ethernet this period, the "circuit timer," is quite short -

Digital Te cbntcaljournal 77 No. 3 September 1986 Terminal Servers on Ethernet Local Area Networks

typically 80 milliseconds . With many users have ignored errors that may occur in message connected, a host is interrupted much less often delivery, and we have assumed message delivery by gathering together all the characters typed even when there is no data to send. We will by those users and sending them as a single examine the implications of these cases shortly. message. The protocol as defined is, in effect, a request­ . The lAT protocol is divided into two distinct response one. Such a protocol has the character­ layers, the virtual circuit layer and the slot layer. istic that only one data link buffer need be allo­ cated at each end of the virtual circuit. This fact Virtual Circuit Layer can be important for hosts that need to support The virtual circuit layer establishes and main­ large numbers of virtual circuits without dedicat­ tains an error-free communications path (a ing large quantities of buffer space to that task. virtual circuit) between two nodes, typically a The termination of a virtual circuit can occur terminal server and a host, that wish to commu­ from either end; under normal conditions, how­ nicate. The connection is initiated by one end of ever, the master usually initiates the closing. the communications path and operates under the The IAT protocol defines three messages at control of the initiator. However, the circuit can the virtual circuit layer: the start, run, and stop be terminated by either end. Typically, the vir­ messages. Thus for a typical virtual circuit, we tUal circuit connection is initiated when the first might see the exchange of messages depicted in terminal user requests a connection to a host sys­ Figure 2 (again, making the stated simplifying tem to which no virtual circuit yet exists. The assumptions) . initiator of the virtual circuit is referred to as the Knowing the built-in limits on maximum mes­ "master node," the other end as the "slave sage size and the rate at which IAT messages are node." Thus the terminal server is normally the exchanged, we determined that the maximum master and the host the slave. amount of data that can be transferred across any The establishment of a virtual circuit connec­ virtual circuit is just under 150,000 bits per sec­ tion requires a single message exchange. Infor­ ond in each direction. (In fact, the IAT protocol mation such as protocol versions, message sizes, defines a method for increasing the available and node names are included in these messages. bandwidth fo r a virtual circuit by using multiple data link messages. To date, there has been no Simplified View of Vi rtual Circuit Op eration TIME We start with a simplified explanation of the vir­ (MILLISECONDS) MASTER SLAVE tual circuit operation. Once established, the data exchange occurs as fo llows: 0 START 5 START • Every 80 milliseconds, the master sends to the slave a message containing any data that must 10 RUN be sent. 15 RUN

• On receiving this message , the slave processes 90 RUN any data in that message and sends back a reply containing any data waiting to be sent in 95 RUN that direction. 170 RUN

• On receiving this reply, the master processes 175 RUN any data that was in the message.

• Eighty milliseconds after one message was n RUN sent, the next message is seiu fr om the master. n+5 RUN The message round-trip time is typically less n+10 STOP than 10 milliseconds. This operation is timer driven on the master, the terminal server, and event driven (by message receipt) on the slave, Figure 2 Ex change of Messages the host. The operation is simplified because we

78 Digital Tecbnical]ournal No . 3 September 1986 New Products

need to implement this feature .) Once this max­ each end of the virtual circuit can agree to acqui­ imum has been reached, terminal users will esce for a time; a circuit in this mode is called experience a degradation of service, shared "balanced." Once balanced, if no data needs to equally among them. As shown later, the mini­ be sent for a long time, the master will eventu­ mum message exchange rate may be much less ally send and the slave wiU then respond with than one exchange every circuit timer, due to single run messages. Thus! each end knows the optimizations in the IAT protocol. other is still alive. This action is called a "keep­ alive" function, which tak�s place every 20 sec- Removing the Simplifications onds by default. I So far, our view of the virtual circuit protocol has If data becomes availabl� when the circuit is been constrained by two major simplifications. balanced, then either end must be permitted to These two are concerned with errors that can "unbalance" the circuit. If the master wishes to occur on the Ethernet, and with a mechanism for send data, then this unbalancing operation is no reducing traffic on the Ethernet when there ) different from any normal run message that the is no data to be sent. The following discussion master may send. However, if the slave wishes to explains how the consequences of these simplifi­ send data, then it must send an "unsolicited" run cations are taken into account. message that is not explicitly solicited by the The IAT protocol, being based on Ethernet, master. As with any other run message, the presumes that the majority of packets will be unsolicited message is sequenced and must be transmitted and received without errors. These acknowledged by the masterI before the slave is errors can be due either to corruption of data permitted to send another :run message. (detected by CRC checking) or to buffering Thus by allowing virtual circuits to be bal­ problems at the destination node. To account fo r anced when there is no data to be sent, the IAT any errors that do occur, a sequence number protocol uses much less Ethernet bandwidth and must be assigned to each virtual circuit message, allows a corresponding reduction in the loading and an acknowledgment of that message must be of the CPU host. made. No extra messages need be sent since the The virtual circuit layer provides reliable com­ sequence number and acknowledgment fields munication between a pair of nodes. It also pro­ are contained within the normal message for­ vides a datapath that is bidirectional, sequential, mats. However, there is no negative acknowledg­ timely, and error free. All users desiring to com­ ment defined by the IAT protocol for reasons of municate over that path are multiplexed over the simplicity and the low error rates experienced same virtual circuit, cons�quently lowering the

on Ethernet IANs. CPU cost per user on the hbI st. This multiplexing The Ethernet communications medium is function is the responsibility of the slot layer. inherently very reliable. Therefore, whenever a ! message is unacknowledged within the SO-mil­ Th e Slot Layer lisecond period before the next message is sent, The slot layer establishes user sessions, transfers the cause will normally be due to either a heavy data bidirectionally, and multiplexes and demul­ CPU load on the host or a host crash. To avoid tiplexes sessions over virtual circuits. In this compounding the problem of a transient over­ context a session can be envisioned as a connec­ load on the host CPU, IAT specifies that mes­ tion from one user's terminal to one host system. sages are not retransmitted every 80 millis.ec­ In the simplified case, a terminal user first onds. Rather, they are retransmitted only identifies the computer system with which he when they have not been acknowledged within desires to communicate. Aivirtual circuit is then approximately one second. A given message will established - if one doe� not already exist - be retransmitted a certain number of times; after from the terminal server t� the chosen host sys­ that, the conclusion can be drawn that either the tem. A session is then established on top of the ' host has crashed or the communications con­ virtual circuit. The service access point at the trollers or medium have failed. This number is host would normally be represented as a virtual known as the "retransmit limit." terminal port into the host operating system. If we can reduce the data sent over an idle vir­ Thus the user would perceive the virtual termi­ tual circuit, the CPU load of the host will be nal as being directly connected to the host reduced in tum. !ATemplo ys a scheme whereby system. For example, on the VMS system, the

Digital Te cbnical]ournal 79 No . 3 September 1986

I !!! Te rminal Servers on Ethernet Local Area Ne tworks

LOGINOUT fu nction can be run to allow the user desires to send some data, that end must have a to log in and continue with the normal interac­ credit outstanding. Typical implementations tive use of the system. normally keep two credits outstanding at any At the slot layer, data is passed to the virtual time. Thus each end of a session must be pre­ circuit layer as "slots," which are addressed pared to receive up to 510 bytes of data. A credit units of data . A number of different types of slots is not reissued until all the data contained in have been defined. Each session has a unique slot the data slot that used the credit has been con­ number on the virtual circuit to aid in the multi­ sumed .. That is, all the data must have been either plexing and demultiplexing of sessions over vir­ displayed on the terminal or read by the host tual circuits. Slots are only sent over virtual cir­ application. cuit run messages. Because slots all share the The initial credit allocation is passed in a start underlying virtual circuit, no explicit error slot. The slot header will contain a field for detection and correction need be performed by passing credits to the other end of the session; the slot layer. that field is non-zero when credits are being The establishment of a session is accomplished extended. In this way it is possible to send a using one of the assigned slot types called a start data-A slot with no data but with the credit field slot. As with the start message (which causes the non-zero. Such a slot does not itself consume a creation of a virtual circuit) , the session estab­ credit since it is presumed to take no additional lishment occurs with a single start slot exchange. buffering at the slot layer at the other end to pro­ First, the master sends a start slot requesting a cess the slot. connection to the slave . If the slave is able to There are three additional slot types defined accept the connection, it replies with a start slot; for the slot layer. The first, the data-B slot, com­ if not, due perhaps to lack of resources, the slave municates the following information: may reject the connection with a reject slot con­ • The physical port characteristics, such as baud taining an appropriate reason code. During ses­ rate (e .g., 9600 baud) , character size (e.g., 7 sion establishment, various parameters are nego­ or 8 bits) , and parity (e.g., none, odd, even) tiated, one being the maximum quantity of data that may be sent in a single data slot. This quan­ • The session characteristics, such as whether tity can be different in each direction, the largest the ANSI flow-control characters (XOFFI being 255 bytes. XON) should be treated as data or flow-con­ As noted earlier, the virtual circuit layer pro­ trol messages vides an error-free, bidirectional datapath • The in-band signaling of break conditions or between two nodes. The slot layer takes advan­ signaling errors (parity or framing errors) tage of this condition and passes data in each direction independently, mirroring the opera­ The data-B slot is subject to the same credit tion of a terminal as a fu ll-duplex device. Owing mechanism as the data-A slot and indeed shares to the mismatch of speed between terminal and the same credits. host, some flow-control mechanism is needed to The next slot type, the attention slot, is not prevent one end from overloading the other. subject to credits and is used for out-of-band sig­ (This mechanism is independent of the flowcon­ naling. This slot is currently used only for an trol required between the terminal server and abort-output operation; for example, discarding the terminal itself. That control is normally han­ any output waiting to be sent to the terminal dled by using the ANSI flow-control characters when a cancel-output ("O) character is typed. XON and XOFF.) A session may be terminated by either end via The LAT protocol defines a credit-based flow­ the final slot type, the stop-slot. Typically, the control scheme at the slot layer. In this control stop slot is sent by the host system after the user scheme, the receiver must give permission to a logs out of the system. transmitter to send each data unit, contain­ ing one or a collection of bytes. Data may be Directory Service exchanged in units of up to 255 bytes in a One goal of the LAT protocol is to permit the slot type called a data-A slot. The sending of a automatic configuration of the LAN. The impor­ data-A slot (if it contains any data at all) uses tant information that needs to be disseminated up a single "credit." If one end of a session throughout the LAN is the name of each service

80 Digital Te chnlcalJournal No . 3 September 1986 New Products

that may be used. Rather than requiring that each the nodes in a VAXcluster system. The users need terminal server possess this information a priori, not be aware of the current configuration of the LAT provides a mechanism that permits each cluster in order to form a connection. server to "learn" about the configuration. By carefully managing the service advertise­ To accomplish this learning process, an addi­ ments, the server makes the service directories tional message type is used, the "service adver­ reflect the current service list and their associ­ tisement." This message is multicast from each ated ratings. If a server fails �o hear from a service slave node to all master nodes and gives the provider for some period, t?e server can assume

names of all services that the slave node is cur­ that the service provider hasI failed, or crashed. rently offering. (A multicast message is a single The server can then remove the service from its message addressed to and received by multiple directory of available servi�es. nodes.) An advertisement is transmitted periodi­ Note that this multicast naming service is also cally, typically every 60 seconds. Thus on start­ asymmetric; the master nodes do not send multi­ up, a server can "listen" for service advertise­ cast advertisements to the slave nodes. A recent ments and build a directoryof available services. addition to the LAT protocol allows a slave to uti­ This directory can then be presented to the user, lize a different multicast message to determine if on demand, enabling him to choose whichever a given node name exists on the LAN . This tech­ services he wants from those available when a nique is used so that host sy�tems can find termi­ connection to a host system is desired. nal servers (in order to solibt connections from Service names, the names used to gain access their ports, described later) I by knowing only the to the appropriate service access points, are not name, not the specific Ethernet address, of the limited to the name of the node on which the ser­ server. vice is offered. Indeed, there is no restriction Some details of this naming service deserve that any node may offer just one single service. further discussion. For example, the LAT "load­ Instead, LATallo ws a given node to offer multi­ balancing" and "fail-over" features are most ple services. often associated with VAXcluster systems. How­ One common use for multiple service names is ever, although they enhance Digital's VAXcluster in a VAXcluster environment. Here the cluster offering, these LAT features are independent manager can choose to offer as a service a name of it. i � representing the logical name of the cluster, in "Equivalent services" m I y also be offered by addition to (or instead of) each individual node multiple nodes using the directory service. Con- name. When a user requests a connection to the sider services that are net\vork based, such as service name representing the cluster, the termi­ videotext and dial-out modems. With an Ethernet nal server can select one of the available nodes. LAN, many independent nodes might offer such In this case all nodes offering the same service services; typically, however, users can access the will be presumed to be offering identical capa­ service only through nodes on which they have bilities to the user. accounts. If a user's system is down, he is denied To assist the terminal server in choosing a access to the service, even though the service node, the service nodes provide a "rating" asso­ remains available on other �nodes. For example, ciated with each service offered. The rating is a consider a videotext-basbd service, such as numeric value from 0 to 255 that represents LIVE_WIRE (an in-house j electronic bulletin some measure of the resources available to apply board) , that can be offered by many independent to that service. For example, the current VMS LAT host systems. If a LAT user connects to LTDRIVER implementation takes into account LIVE_WIRE, the terminal server software will the most recent CPU idle time, the CPU type, the detect that the service is offered from multiple amount of memory, and the number of remain­ sources. The software will then make a connec­ ing interactive job slots. VMS LTDRIVER also tion to the source believed to be currently offer­ allows the system manager to specify a rating. ing the best level of service. If that service The terminal server can then choose, at any should fail (i.e., stops sending Ethernet LATmes­ instant, the node that offers a requested service sages) , the terminal server s1bftware will automat­ with the highest rating and use that node as ically reconnect the user toi an alternate provider the one to which to form the connection. This of the same service if one1 exists; this action is choice ensures that the load can be shared among known as fa il-over.

Digital TecbnicalJounaal 81 No. 3 September 1986 Te rminal Servers on Ethernet Local Area Networks

Future versions of Digital's LAT products may connections, or 32,000 terminal servers, each make more extensive use of the LAT service capa­ with up to 255 sessions. This large number rep­ bility. That would make it possible to install resents a. significant cost advantage, especially applications that are accessible to the extended considering that Ethernet controllers are stan­ LAN but not to the . A form of dard options on many of Digital's processors. In nondiscretionary access control is implicit in this case the host-processor terminal connection this design. cost then becomes negligible, making back­ LAT group codes can be used to partition an plane-oriented terminal switches much less Ethernet logically when the number of nodes attractive . This cost advantage improves as the gets large . By large, we mean more than I 00 ser­ size of the system increases. Table I compares vices. Having more than 20 services or so means the requirements of LAT with those of a data­ that a server display with one line per service switch fo r diffe rent numbers of terminals and will no longer fit on a terminal display without hosts. scrolling. Some additional advantages afforded by using LAT are as follows: Product Implications of the • Multisession capability, not offered by data­ LA T Architecture switches Al though not originally conceived as a dis­ • Simplified installation and management tributed terminal switch, an Ethernet can be used (especially where users and computer systems effectively in that role if combined with the ter­ are often added or moved around) minal server products. This fact remains true even when the Ethernet and host system are run­ • Higher availability due to the lack of any sin­ ning other protocols simultaneously, such as gle point of system fa ilure DECnet and VAXcluster systems based on Ether­ • Simplified, incremental expansion and net. Our experience has shown that a single ded­ migration capabilities inherent in Digital's icated Ethernet segment, without bridges, can extended LAN architecture , utilizing bridges easily support several thousand concurrent users. LAT Performance Functioning as a distributed terminal switch LAT performance is measured in terms of CPU in the Digital computing environment, LAT load per user, which decreases as the number of offers significant advantages over dataswitches users performing terminal IjO increases. Thus and backplane multiplexers. The most promi­ LATperf ormance increases with increasing CPU nent of these advantages is that any terminal loads. Under light loads, LAT uses a relatively server user can connect to any host system . large amount of CPU resources. This is under­ "Blocking" connections to host systems (more standable if the cost of processing an Ethernet accurately called "port contention") is not an packet containing a single character is compared issue because host-system ports are logical, not with the cost of servicing a single DZ- I I charac­ physical. A VAX(VMS system is limited by the ter interrupt. As more data is exchanged, how­ LAT architecture to about 6 million simultaneous ever, the number of messages exchanged does

Table 1 A Comparison of Host Connections for LAT and Dataswitch

Number of Terminals, Number of Hosts LAT Requirements Dataswitch Requirements

8 terminals 8 server connections 8 terminal connections 1 host 1 Ethernet adapter 8 host connections

64 terminals 64 server connections 64 terminal connections 8 hosts 8 Ethernet adapters 512 host connections

512 terminals 5 ·1 2 server connections 512 terminal connections 16 hosts 16 Ethernet adapters 4096 host connections

82 Digital Tecbnical]ournal No. 3 September 1986 New Products

soles for management or system-level debugg­ ing. Moreover, reverse LAT can also be used to provide shared access to a pool of dial-out modems. f

Implementations and �Applications

0 <( 0 Th e Original Implementations ...J ::J Digital's original terminal server family had a.. u three members: IAT-11, the Ethernet Terminal Server, and introduced in March 1985, the DECserver 100. These LAT products support interactive terminal users. The products use the unique naming capabilities of LAT (service names, load-balancing, fail-over, and autoconfig­ uration) and feature multi�ession support and 1/0 LOAD complete application transparency. The servers implement an easy-to-learri user interface that allows users to change parameters, view avail­ Figure 3 Host CPU Loading fo r LAT able services, and connect and disconnect from and DZ-11 these services. In addition, the same user inter­ face allows a local manager to control the opera­ not increase. Instead, the number of characters tion of the server and ports. The DECserver 100 per message increases and the overhead cost of and the Ethernet Terminal Server also implement processing the message is amortized over a larger a remote console feature that allows remote man­ number of characters. Figure 3 shows these agement from the server by; using a convenient, relationships. centrally located host system. The performance of DMA backplane multi­ The IAT-1 1 product, unlike the other two ter­ plexers (such as the DMF-32 or DHU-1 1) falls minal servers, is a software product. It was origi­ between the two curves. Thus LAT is less effi­ nally sold to enable users with PDP- 11 systems cient than backplane multiplexers under light that were no longer being u:sed for general com­ terminal loads and more efficient under loads puting facilities to take adVantage of the server operating with more concurrent terminals. technology, but without incurring any initial By essentially emulating the RS232 and RS423 hardware investment. The software ran on some interfaces, IAT is able to provide a "single-sys­ of the older UNIBUS PDP- 11 systems, using tem view" in environments that include both 124KB of memory, up to eight DZ- 1 1 multiplex­ Digital's and other manufacturers' systems. A ers, and a DEUNA Ethernet 'controller. The soft­ "reverse" IAT server can be used to "front end" ware was loaded either via ttte Ethernetor from a the equipment of other vendors (a process called local disk. IAT- 11 offered 1a user interface and non-IAT host support) . These reverse-IAT serv­ capabilities similar to thos� on the original ver­ ers attach to the backplane multiplexers of the sion of the Ethernet Terminal Server and could non-IAT host systems. The servers offer service in connect up to 64 users toi the Ethernet. Being the same way Digital's host systems do over the based on PDP-11 technology, servers using Ethernet: by multicasting. Terminal server users IAT- 11 would normally be located in computer need not be aware of the details of this topology. room environments. For example, a developer debugging a communi­ The Ethernet Terminal Serveruses the Ethernet cation product between a VAXfVMS system and Communications Server {DECSA) hardware one from Prime Corporation could log in on both shown in Figure 4. This is a special-purpose systems simultaneously using the terminal PDP- 11/24 system with 5;1 2KB of memory, a server's multisession capability. The developer DEUNA UNIBUS-to-Ethernet controHer, and two could then switch between sessions with a single protocol assist modules (PAM) . PAMs are intelli­ keystroke . Reverse LAT can also be used to gent microprocessor-contrdlled interfaces based provide shared remote access to processor con- on the AMD 2 90 1 from Adv�nced Micro Devices,

Digital Te cbntcal]ournal 83. No. 3 September 1986 Te rminal Servers on Ethernet Local Area Networks

Internally, the DECserver 100 is radically dif­ ferent from the other two members of the terminal server family, yet still retains the same external characteristics. The DECserver 100 is a low-cost terminal server capable of connecting eight asynchronous ASCII terminals to an Ether­ net using the LAT protocol. This server is a very compact unit and can be located in a computer room, a communications closet, or in an office environment. The server has no modem control. Modem control is implemented using an 8-MHz

Motorola 68000 chip, with 128KB of RAM , and 512 bytes of nonvolatile RAM (NVRAM). Like the Ethernet Terminal Server software, the DEC­ server 100 software is down-line loaded from a Fig ure 4 Communications Server DECnet load host.

Inc. Each PAM interface connects up to eight line Extensions to the Original cards, each of which is a dual RS232C interface Implementations with fu ll modem-control capability. The server The initial implementations of the LAT protocol also has a console boot terminator (CBT) module were on the terminal servers described above for self-test code, bootstrap code, and remote and on VAX/VMS host systems. The servers console support. The Ethernet Terminal Server implemented only the master end of the offe rs a user interface similar to that on the DEC­ LAT protocol, whereas the hosts implemented server 100. Using the LAT protocol, the server the slave end . Follow-on implementations can connect up to 32 terminals (either locally or have added similar support for additional remotely via modems) to the Ethernet. The Eth­ host systems: the MicroVMS, RSX- 11M-PLUS, ernet Terminal Server can be located in a com­ MicroRSX, ULTRIX-32, ULTRIX-32m, TOPS- 10, puter room environment or a communications and TOPS-20 systems. closet. The software is always down-line loaded Each system implementation offers access to into the unit from a DECnet load host across the the command interpreter as the service access Ethernet. point. Figure 5 illustrates this support.

VAX/VMS RSX1 1 -M-PLUS ULTRI X-32 TOPS-20 SYSTEM SYSTEM SYSTEM SYSTEM

MicroVMS MicroRSX TOPS-10 SYSTEM SYSTEM SYSTEM

I I I

ETHERNET DECserverI 100 TERMINALI SERVER \\ //... II \\ DIAL-IN TERMINALS TERMINALS MODEMS

Fig ure 5 Additional LA T Host Support

84 Digital Tecbnicaljournal No. 3 September 1986 New Products

NON-LAT HOSTS

ETHERNET VAX/VMS TERMINAL HOST SERVER

DECserver 100

TERMINALS TERMINALS DIAL-IN PERSONAL MODEMS COMPUTERS

Figure 6 Ethernet Config ured as a Service Node

' Version 2_0 of the Ethernet Terminal Server, The third category is service fo r personal com- released in August 1985, added the reverse-IAT puters (PC) . They can be c I nnected to terminal implementation, permitting a server to offer servers and run in either ofb the terminal emula- additional services to which terminal users tion modes. Each PC thus acts as though it were ' can connect_ This implementation permits a dumb terminal. A PC can also run in file trans­ sessions to be created within the box as we ll fer mode when connected to another PC via .the as across the network, thus fo rming a switch same , or another, terminal server. Figure 6 illus­ style of operation in a single server. The types of trates the terminal server as a service node . services that may be offered by the terminal Subsequent versions of the Ethernet Ter­ server can be grouped into the fo llowing three minal Server, the DECserver 100, and the VMS categories. LTDRIVER software all permit asynchronous The first category is connections to non-IAT printers to connect to terlninat servers. These hosts. In this mode , the server acts as the Ether­ versions also allow print qu¢ues to be directed to net connection for systems (typically not made the printers from hosts. The IAT protocol has by Digital) that cannot themselves offer IAT ser­ been enhanced so that th� connection mecha­ vices on the Ethernet. Asynchronous ASCII ports nism remains under the c j.ntrol of the terminal on these systems are connected to a terminal server (for the reasons of tiefficiency mentioned server. Terminal users on the same or diffe rent previously) . That enhancement allows a host to terminal servers can connect to the service "solicit" a connection from a port on a terminal offered. They can then communicate with the server. Once the connection has been made, data non-IAT host as though it were connected to the transfer can occur as in the normal interactive Ethernet. terminal case, except that the printer output is The second category is service for dial-out under the direction of a VMS print symbiont. It is modems. Terminal users can connect to a port in possible, with these implementations, to direct a pool of dial-out modems. The users can then the queues from multiple systems to a single use the appropriate ASCII protocol to create a printer or bank of printe s being offered as a dialed connection and then access the remote common service. When a r connection request is system via its own dial-in port. made while the printer is I eing used by another �I

Digital Tec hntcaljournal 85 No . 3 September 1986 Te rminal Servers on Ethernet Local Area Ne tworks

system, the connection request can be queued. applications programs could initiate connections This queuing provides a basic mechanism for to terminals (or other asynchronous devices) sharing printers among multiple systems. located within the LAN . Some of Digital's personal computers now implement the master end of the IAT protocol DECserver 200 and can operate as simple single-session terminal The DECserver 1 00 interconnects terminals in servers. These servers are implemented as part of an office environment at a very low price. Soon the DECnet-DOS and ProjDECnet releases and after it was announced, it became clear that allow the PC to emulate a terminal connected to modem-controlled lines and connections to non­ a terminal server. Combining this feature with rAT host systems should also be priced just the servers that offer services, a PC user can con­ as low. nect to any PC that is connected to a terminal Thus the DECserver 200 project was initi­ server fo r file transfer applications, to a dial-out ated to produce a new server based on the DEC­ modem, or to a non-LAT host system. Data server 1 00 design, but with modem control capa­ integrity is provided "end-to-end" in PC-based bilities. Moreover, this product had to meet the implementations due to the lack of , original cost goals of the DECserver 1 00. This or simihir, wiring. Figure 7 shows the connec­ project involved a redesign of the printed circuit tions to asynchronous printers and IAT from per­ board, yet retained the same system architecture. sonal computers. A faster version (1 0 MHz) of the same MC68000 Within the IAT environment, the service name microprocessor was used, and memory was offered by a host system does not always have to increased from 128KB to 384KB of RAM and represent the command interpreter on a given from 512 bytes to 2KB of NVRAM. This increase system, though this is by fa r the most common allowed room for the implementation of modem use today. Instead, a service name could repre­ control software and support for non-IAT hosts sent an application program, which might be run (i.e., reverse-rAT capabilities) . The increase also automatically when a connection request is allowed a larger service directory database to be made. Alternatively, using the solicited-connec­ stored and an enhanced on-line help capability tion mechanism currently employed for printers, to be added.

NON-LA T HOSTS

ETHERNET VAX/VMS TERMINAL HOST SERVER

ETHERNET DECnet-DOS DECserver 100 PRO/DECnet TERMINAL SERVER .

TERMINALS ASYNCHRONOUS TERMINALS DIAL-IN ASYNCHRONOUS PRINTERS MODEMS PRINTERS

Figure 7 Asynchronous Printers and LA Ton PCs

86 Digital Tecbnicaljournal No . 3 September 1986 I 1 New Products

• ,. .. - ·., 1_ -. -- ::_

Figure 8 DECserver 200

Another fe ature of the DECserver 200 takes competing independently for the same resource . advantage of the new DECconnect cabling The first "solicitation" is serviced by a connec­ scheme, allowing coQnections to be made using tion, and subsequent requests are queued on a DEC423 wiring. This fe ature allows communica­ first-in, first-out basis. tions at up to 19.2 Kbau.d over cable that is nei­ On a particular terminal s�rvI er, all devices that ther twisted pair nor shielded, for relatively long are logically connected to the same host system distances of up to 1000 feet. Figure 8 shows the share messages both to and from that host . DECserver 200 hardware. Within each message, each user's data is con­ tained within slots. This �ultiplexing, in con­ Summary junction with the delay timer, reduces further Unlike other existing packet-oriented transport the number of messages e:?cchanged. For exam­ layer architectures, the LAT transport layer ple, as more users log in to a host system, the implements asymmetric connection manage­ number of messages exchanged remains con­ ment, asymmetric data flows , and timer-based stant at approximately 12 per second in each message exchanges. direction, even as the lengths of the messages The most unusual innovation of the IAT archi­ increase. tecture is the use of multicasting as a presenta­ The DECserver 100 and · DECserver 200 are tion level naming service . On Ethernet, packets low-cost implementations of the IAT architec­ are normally addressed to the adapter of a . ture, allowing terminals and other asynchronous specific system. However, the Ethernet specifica­ devices to be configured in a flexible and cost­ tion describes a fo rm of logical addressing called effective manner in a LAN.. multicast addressing. In this scheme a packet addressed to a multicast address is received Acknowledgments , nearly simultaneously by many independent sys­ Over the years , a large number of people have tems. LAT uses these messages to completely contributed to the architecture and products configure the topology automatically. This described in this paper. The authors acknowl­ action means that installing a terminal server is as edge , with gratitude, all this work. In addition, a simple as plugging it into the Ethernet and wait- number of people have taken the time to review · ing for services to be advertised. this paper and have made many helpful sugges­ Asymmetric connection management consider­ tions; to these. people we a.lso! extend our thanks. ably simplifies the complexity of the protocol in which terminal servers initiate connections to References host systems. If a host system wa nts to connect to 1 . J. Morency. et al ., ''The: DECnetjSNA Gateway I a terminal server, that connection must be solic­ Product -A Case St;udy in Cross Vendor ited from the terminal server. This protocol Networking,'' Digit l Te chnical jo urnal solves the problem of having many host systems (September 1986, this� issue) : 35-53.

Digital Technicaljournal 87 No . 3 September 1986 Paul R. Beck ja mes A. Krycka

The DE Cnet- VAX Product An In tegrated App roach to Networking

Ea rly DECnet implementations were completely layered above th e ser­ vices of the op erating system. This loose bonding of network products to the op erating system resulted fr om separate development effo rts. From its inception in 1976, the VMS op erating system integrated networkingfunc­ tions adhering to the Digital Network Architecture (DNA). The DECnet­ VAX product is th e DECnet implementation most tightly coupled with its parent op erating system. Thisproduct provides an unprecedenteddegree of transparency fo r network applications while remaining true to the DNA strategy. Transparency is achieved by providing access to network capabilities through system services, record management services, and the standard1/0 statements of high-level languages.

When the fi rst VAX processor and its VMS operat­ layered product, DECnet-VAX was designed and ing system were designed a decade ago, the implemented by the same group that developed DECnet architecture was in its second major the VMS software. This product was designed phase. Several of Digital's major operating sys­ from the beginning to be a coherent pan of the tems had already implemented DECnet Phase II. VMS system . Its components are maintained with Therefore, a major goal of the VMS Development the VMS source code and compiled as part of Group was to provide networking capabilities each VMS base level . This decision to integrate with the initial release of that group's product. the DECnet-VAX development into the overall Both the VAX architecture and the VMS operat­ VMS project was instrumental in achieving the ing system were completely new designs. How­ levels of integration and transparency found in ever, the VMS system shares a common heritage today's product. with the RSX- llM operating system. Some of the In designing DECnet-VAX, a completely inte­ utilities in the first few VMS releases were actu­ grated approach to networking was taken to ally images of their RSX- llM equivalents run­ achieve the following goals: ning in compatibility mode. That was not the • A high degree of transparency at many levels, case with the DECnet-VAX product, the network allowing remote services to fu nction in a way product in the VMS system. that appears local to the system Previously, DECnet implementations had been • add-ons to their host operating systems, which The utilization of unique fe atures in the VA X predated the development of the DECnet archi­ hardware and VMS software tecture. The VMS system, on the other hand, was • High performance and efficiency designed after the DECnet architecture had been • Ease of implementation of network well established. The VMS architects recognized applications that including networking capabilities was vital to their system's success in the future. Thus they To build adequate DECnet capabilities into the decided to integrate those capabilities smoothly VMS system, a model to view network fu nctions into the operating system itself rather than to had to be developed. This model had to provide layer the architecture on top. Although sold as a answers to a number of strategic questions. How

88 Digital Tecbnical]ournal No . 3 September 1986 New Products

could the network name space be built on the integrating networking capa;bilities and evolving local name space of an individual node? How a highly transparent user ip.terface to network . would network functions be accessed from with­ I servtces. 1 in the operating system itself? To what extent One fu ndamental concept is that the VMS sys­ should a user be aware that he is specifyinga net­ tem treats network operatio s as a natural exten­ work function rather than a local one? sion of local 1/0 operationhs. The DECnet-VAX This paper will describe how the design of implementation, fr om the session layer (provid­ the VMS system facilitates the integration of net­ ing logical link services) down to the physical working capabilities. The DECnet-VAX product device layer, is modeled after the file access takes advantage of this design to provide net­ primitives of the VMS file system. Both network working services that are fa ithful to the DNA and file system operations use the assign channel philosophy while still tailored to the unique (ASSIGN) , queue 1/0 (QIO) , and deassign chan­ VA XfVMS environment. nel (DASSGN) system serviCes as their program­ ming interface, and make u e of the same subset Fo undations of the DECnet- VAX of QIO functions. Both oph$ ittions divide their Product work between higher level fu nctions requiring a The foundation of all networking applications process context to provide a large address space is the ability of a program on one system to ex­ and 1/0 handled through appropriate device change data with a program running on another drivers. At this level of abstraction, the program­ system. In the DECnet architecture this capabil­ ming steps required to engage in task-to-task ity is called task-to-task communication. It is the communication are quite similar to those needed backbone upon which a wide range of VMS net­ to access a local file. working facilities are built. These facilities Another design decisioq having a profound include effect on the style of inte!rface to remote file

• Remote file access and virtual terminal access was to integrate the record management VMS! suppon services (RMS) into the system. RMS is used for all common file access operations by the • Layered product extensions, such as dis­ operatingsyste m as well as by most VMS utilities. tributed mail and remote database applica­ These services provided a platform from wh ich tions to develop a common interfa ce for both local • Applications that rely heavily on file access and remote file access, as well as task-to-task and task-to-task capabilities, developed by communication. The DE�net-VAX developers users and third-pany companies achieved transparent remot� fileaccess by incor­ porating the data access protocol (DAP) modules • Distributed network management operations in RMS to communicate wi�h a remote file access The DECnet-VAX implementation had to sat­ listener (FAL) . For local filJ access, RMS uses the isfy the needs of both end users and application QIO interface to the file system. For remote file developers. Therefore , its main goals were to access, RMS uses the QIO interface to create a provide remote file access and task-to-task com­ logical link to the FAL server program through munication capabilities that would be easy to the session layer of the network. FAL then learn and use, functionally complete, and acces­ accesses the file byacting as a local user of RMS sible through the standard VMS 1/0 interfaces. on its system. The use of RMS is illustrated con­ Providing a high degree of transparency for ceptually in Figure 1. network activities was the key to achieving these The definition of a V S fi le specification goals. For example, transparency at the file level was extended to include �the node name with a means that accessing a file on a remote node is provision for an optional a cess control stririgto conceptually the same as accessing the file on pass authorization information to the remote sys­ the local system. That access should not require te m. The syntax of a node �specifier is one of the the use of diffe rent comma�ds or any changes to following: application programs. nodename:: The VMS design was influenced by several important concepts that laid the foundation for nodename"username password account''::

Digital Te cbnical]ournal 89 No . 3 September 1986 Th e DECnet- VA X Product - An Integrated Approach to Ne tworking

LOCAL NODE

VMS DISK FILE DISK '---- DRIVER SERVICES r--- FILE LO ACCESS c REQUEST RMS SERVICES

REM

NETWORK COMMUNICATIONS - L SERVICES DRIVER

NETWORK I FILE ACCESS COMMUNICATIONS NETWORK FAL REQUEST RMS I ... DRIVER r-- SERVICES '---- SERVER SERVICES

REMOTE NODE

Figure 1 RMS Interfa ce to Local and Remote Files

Moreover, the concept of a quoted file specifica­ tions using RMS can employ the network trans­ tion was introduced to allow file name informa­ parently to tion on a non-VMS system (one not adhering to • Access sequential, relative , and indexed the parsing rules specified for VMS files) to be represented. In addition, two special forms of (ISAM) files quoted string were adopted to specify the target • Utilize different access methods (sequential, entity in task-to-task communication. Thus the random by relative record number, relative by syntax of a file specification fo r network access key, and record file address) can be one of the following: • Operate in either record or block mode nodespec::device:[directory]file.type;version • Communicate with a network task as though nodespec::' 'foreign-file-specifier' ' reading and writing to a sequential file nodespec::"TASK= taskspec" RMS is used throughout the VMS system by the nodespec::"n=" Digital Command Language (DCL) interpreter, VMS utilities, and the run-time library routines where the latter two forms are used to identify a network task by name or object number. supporting high-level languages. As a result, Another important early design decision was to transparent remote fi le access and task-to-task provide fu ll access to remote fi les, beyond communication are available at the fo llowing remote file transfer and manipulation functions, interface levels: DCL commands, high-level lan­ through RMS. Currently, almost every RMS fu nc­ guage r;o statements, RMS services, and I/O­ tion can be performed over the network on a related system services. Figure 2 illustrates these remote VAX/VMS system. Thus most applica- relationships.

90 Digital Tecb nicalJournal No . 3 September 1986 New Products

I

PROGRAM

TO LOCAL DIGITAL RECORD QUEUE OR COMMAND MANAGEMENT 1/0 NETWORK LANGUAGE UTILITY SERVICES INTERFACE ! H 1-- DEVICE

RUN TIME HIGH-LEVEL LIBRARY LANGUAGE H �

I Figure 2 Interfa ce Levels fo r the VM S Sy stem I I All DECnet implementations provide task-to­ DECnet- VAX Building Blocks task communication so that an application pro­ Network Primitives gram can exchange data with another program DECnet-VAX provides task-to-task communica­ running on a remote system. In the VMS environ­ tions between diffe rent nodes within a network. ment, task-to-task communication can be per­ layered network applicati

Digital TecbnicaljourtUll 91 No . 3 September 1986 Th e DECnet- VA X Product - An Integrated Approach to Networking

Table 1 Relationships between RMS and DECnet-VAX

File System RMS Programming DECnet-VAX QIO Function Service Language Operation Operation

10$--ACC ESS $OPEN Open file Initiate or accept logical link

IO$_DEACCESS $CLOSE Close file Disconnect logical link

IO$_READVBLK $GET Read from file Read data across logical link

10$-WRJTEVBLK $PUT Write to file Transmit data across logical link

• The network ancillary control process , nection. This mapping causes the packet switch NETACP, which handles those functions that network (PSN) to act as though it were a DECnet require a process context in which to execute data link, thus allowing DECnet nodes connected to the same PSN (but not to each other) to com­ NETDRIVER processes all QIO requests fo r the municate using DECnet protocols. These proto­ network device. Network QIOs generally fall cols in turn permit any applications layered on into one of two categories: logical link traffi c, or network management requests. NETDRIVER DECnet to function correctly between the DEC­ net nodes. forwards to NETACP any request for logical link start-up or shutdown (e.g., the IO LACCESS QIO DECnet-VAX provides two other components, used to create a logical link) , or for net­ called the network control process (NCP) and work management fu nctions. On the other the network management listener (NML) , which hand, NETDRIVER handles logical link transmit work together to provide local and remote net­ 1 and receive requests (IQ $_WRITEVBLK and work management capabilities. NCP provides IOLREADVBLK) by itself. the user interface to network management fu nc­ NETDRIVER also contains the bulk of the rout­ tions. Network management is the process of ing layer of the DECnet-VAX software. Using controlling those parameters that allow the information provided by NETACP, NETDRIVER various components of a network to fu nction can determine the optimal circuit on which to efficiently. These parameters reside in two sepa­ send any received packet whose destination is rate databases: a permanent database that estab­ another node. By giving high priority to both lishes the default parameter values upon node logical link and routing-forwarding traffic, start-up, and a volatile database that contains NETDRIVER eliminates the overhead of invoking the current parameter values in a functioning a process to perform these high-throughput network. functions. Both the volatile and permanent databases can NETACP defines and provides access to the be accessed through NCP by using a common volatile network database, which is the working user interface . NCP in turn passes the parsed copy of the permanent network database. The requests to NML fo r actual processing. NCP then volatile database is allocated from NETACP's vir­ fo rmats the results returned by NML for the user. tual address space. NETACP also controls the NML is a server whose function is to perform state transitions of data links, the routing layer, network management operations on behalf of and logical links. Both the start-up and shutdown some client. NML receives its commands in a of logical links are handled in NETACP, which protocol called NICE, either from a local NCP also creates the process to receive an incoming copy or over a logical link fr om another node logical link having no declared network task. (usually, but not necessarily, fr om that node's In addition, NETACP provides support for Dig­ NCP). NML then returns the results via the same ital's X.25 packet switch network product, VAX NICE protocol. PSI. Through "data link mapping," NETACP NML is also the agent that owns and main­ makes it possible to map the fu nctions normally tains the permanent database. Upon receiving a provided by a data link driver onto an X.25 con- request for a permanent database operation, NML

92 Digital Tec bnlcaljournal No . 3 September 1986 New Products

will access the appropriate file using normal object number. The second ilre network "tasks ," RMS calls. For operations on the volatile data­ addressed by node and task name. Network tasks base, NML will issue a QIO to NETDRIVER, are actually a special case of network objects, which in turn forwards the request to NETACP, with object number zero reserved for identifying where the request is honored. network tasks. If a logical link specifies object number zero, it also supplies a task name identi­ Data Link Drivers fying a particular network task on the target There is a separate device driver for each com­ node. � munications device that can be used by DECnet­ In the VMS system, execu ion of each program image takes place within the context of a pro­ VAX . In most cases these communications cess. One process will ty�ically run multiple devices can be used by non-DECnet applications I images serially during its lifetime. A flexible as we ll. The same device driver is used to I mechanism was needed to associate a request for provide the data link interface to NETDRIVER, as a network object or task �ith the right process ' well as the QIO interface for user-written appli­ running the right image . cations not using the DECnet software . DECnet-VAX has three mechanisms to identify The data link drivers supply the needed sup� the execution entity that will be associated with port for the variety of lower level protocols pro­ a network object or task. In the fi rst , an image vided in the DECnet-VAX product. These proto­ registers itself in the network database as the cols include the synchronous and asynchronous specified object or task. TheI image then waits fo r Digital Data Communications Message Protocol (or initiates) connections with other programs in (DDCMP) , Ethernet, IEEE 802, and Systems the network. I Communications Architecture (SCA) for commu­ The second mechanism I involves creating an nicating across a VAXcluster communications entry in a local database tliat identifies network interface (CI) . objects. The database information specifies either a command procedhre or an executable File Access Listener image to be run in response to a request to con­ The File Access Listener is the component of RMS nect to the specified object,. Upon receiving such that is activated to service a request for access to a connect request, NETACP will create a process in a specified account that executes the com­ the local file system by a remote node . As such, FAL is an extension to RMS on the remote system. mand procedure or image . Setting up an entry in FAL uses RMS se rvi ces to access local files and the object database provipes the flexibility to b DAP to send data back to the requesting node . specify account informati n and privileges fo r the object when it is activ£ted. The third mechanism isl a catchall. This acti- VAX /VMS Environment in the ' Neh.IJork Kernel vates a command procedure to serve as a network task in the absence of e�ther an entry in the The underlying structure of DECnet-VAX was object database or a nontransparent declaration designed around the special environment pro­ of the network task by a running image . Upon vided by the VMS system. The manner in which receiving a connect request for a network task network programs are created and the environ­ not identified by either of the first two methods, ment ·in which they run are governed more by DECnet-VAX assumes that a command procedure the design of the surrounding operating system resides in the default dir�ctory of the account than by the network architecture. Some impor­ specified with the reque;st. DECnet-VAX then tant aspects of this design are the way network creates a process to execute this command pro­ objects are identified and activated and the use of cedure. For example, up�n receiving a connect VMS command procedures. This use provides a request for a task called NETISK, undeclared by simple, transparent mechanism for creating a any running image , DECne�-VAX will assume that network task. a command procedure tailed NETTSK.COM The DECnet architecture defines two classes of resides in the default direttory. execution entities that have addresses within a The DECnet-VAX software also causes a logical network and with which network communica­ name, SYS $NET, to be created for this process. tions can be established. The first are called net­ SYS $NET translates to a data structure containing work "objects," identified by node address and connection information .for this logical link.

Digital TecbntcaiJournal 93 No. 3 September 1986 Th e DECnet- VAX Pr oduct - An Integrated Approach to Ne tworking

The command procedure can either activate an Tra nsparent and Nontransparent image that accepts the connection specified by Modes SYS $NET or complete the connection directly DECnet-VAX provides two mechanisms to estab­ from DCL by opening a channel using the logical lish network communications between applica­ name. tions on different nodes. In the nontransparent There are both advantages and disadvantages to mode, an image executes a network call to de­ the way in which DECnet-VAX uses separate pro­ clare itself as a particular network task by cesses for diffe rent network objects. Each VMS name or object number. To use this mode a pro­ process carries its own protection, defined by grammer must have a thorough understanding of the authorization parameters of the account network primitives. However, the mode pro­ specified for the process and enforced by aspects vides greater flexibility than the transparent of the VAX architecture . Therefore , using a dif­ mode . That is, the same image can support multi­ fe rent process for each logical link is a conve­ ple logical links concurrently and can even act as nient way to maintain security and provide multiple network tasks. accounting information . It is also a simple means The transparent mode provides a very simple to keep separate the context of each logical link. means to establish a correlation between a For example, FAL must maintain the file protec­ network task and a process. In this mode, the tion appropriate for each user accessing files image being executed need not even be aware over the network. A FAL process handles only one that it is operating as a network task. Upon creat­ logical link (for one file access) at a ti me. To ing a process to handle an incoming logical link handle more would require FAL to completely for an undeclared ne twork object or task, and securely replicate the security context of NETACP creates the logical name SYS $NET. each user for each logical link concurrently This name contains the network control block­ maintained. That is a formidable task in the VMS needed to complete the connection with the system, with its complex set of authorization originator of the link. Performing a normal privileges. RMS OPEN on this logical name will start a chain The primary disadvantage of maintaining dif­ of events culminating in the establishment of the fe rent processes for different security profiles is logical link: reduced performance. This reduction results • The RMS OPEN operation issues an ASSIGN from having to create a new process when a log­ followed by a QIO IO$_AC CESS to confirm ical link is established. This disadvantage can be the connection. offset by using large timeout constants fo r each account's NETSERVER processes and FAL logical • The RMS GET and PUT operations translate to link caching, both of which are described later. QIO IOLWRITEVBLK and IOLREADVBLK calls to the same network device . ASSIGN and DASSGN Sy stem Services • The network device translates to network The ASSIGN system service creates a channel to a transmit and receive operations. specified device. This service provides transpar­ ent access to the DECnet-VAX software by recog­ The standard logical names SYS SINPUT and nizing when the device specification includes a SYS $0UTPUT can be assigned to the logical node name . ASSIGN then issues an IO$_ACCESS name SYS $NET before an image is activated. This QIO fu nction to NETDRIVER on behalf of the action will allow programs originally written caller to establish a logical link transparently. In to perform 1/0 from eithe r te rminal or disk this way, simply supplying a network task speci­ devices to act like distributed applications in a fier in place of a device name will create a logi­ network. The programs require absolutely no cal link when a channel is assigned. rewriting. The internal data structures associated with At this point the various layers of the design the channel are defined to resemble an open provide an environment in which network com­ file on the channel. As a result, when called to munications can proceed without any specific deassign the channel, the DASSGN system network calls issued by the programmer. A sim­ service wi ll issue a QIO IO $_DEACCESS re­ ple example, using DCL command procedures, quest to close the file. The logical link is then wi ll demonstrate the nontransparent mode of disconnected. network communication.

Digital Te cbnicalJournal 94 ' No. 3 September 1986 New Products

1. At a node called SOURCE::, the command Throughout this example, only two network fu nctions were performed at the application $ TYPE TA RGET::"O=TIME" level: the use of a network destination name in is entered fr om a process running under the TYPE fu nction, and the reference to the account USER. SYS $NET logical name in the TIME command procedure. 2. The TYPE image issues ari RMS OPEN to TA RGET::"O=TIME". DECnet-VAX FeaturesI fo r the 3. The OPEN service issues an ASSIGN re­ VMS Environment / . quest on the string, which results in a QIO Besides implementing tho�e fu nctions defined IO $_A.CCESS being issued to DECnet-VAX . fo r all DECnet implementations, the DECnet­ VAX product supplies added-value features 4. A connect request for network task TIME is designed for the VMS environment. These exten­ routed to node TA RGET, where the DECnet­ sions to the architecture enhance the way VAX software creates a process to run a com­ DECnet-VAX blends into the VMS system. Several mand procedure called TIME.COM. DECnet­ examples are illustrated in the following VAX then creates the logical name SYS $NET, · paragraphs. whose translation contains a network task specifier identifyingthe source of the logical Proxy Log- in link. One traditional problem withr password-based 5. TIME .COM issues the commands access control is making the required password available to all users needing access to a re­ $ DEFINE SYS $0UTPUT SYS$NET stricted resource . If the user membership needs $ SHOW TIME to change (e.g., if someone changes jobs and 6. DCL issues an RMS OPEN using the logical thus no longer has the right to access the name SYS $0UTPUT for its output. OPEN in resource) , a new password , which must be com­ turn issues an ASSIGN. When one logical municated to all current members, is required.

name points to another, the names are trans­ To address this problem, tbeI concept of "proxy lated in an iterative fashion until no fu rther log-in" was added to the DECnet-VAX software in translation is required. Since the logical 1983.2 name SYS $0UTPUT points to the logical With proxy log-in, each1 node maintains a name SYS $NET, the latter translation is used database of those network users having proxy by ASSIGN. ASSIGN finds the network task access to specific accountS on the local system. specifier and issues a QIO 10 $-ACCESS The database is used t() provide a one-to­ request to DECnet-VA X. The formation of one mapping between the user, identified as the logical link is then completed. The NODE::USERNAME, and the target proxy TYPE image at the source node now issues account. For example, take the case of the arrival an RMS GET, which then translates to a of a logical link request having no explicit access QIO IO $_READVBLK request on the net­ control information from � user whose name is in work channel. the database. In this case he process created by NETACP to handle the logicalt link will be run 7. The time is sent as a string by an RMS using the authorization context of the proxy PUT operation, which then issues a account. This mechanism allows members to be QIO IO$_WRITEVBLK request on the chan­ added to or deleted from a particular proxy nel established fo r SYS $0UTPUT. Since this account without their a priori knowledge of the is a network channel, the QIO is handled by account. DECnet-VAX and passed across the logical link to the source node. Cluster Alias Addresf 8. At the source, the data satisfies the QIO DECnet-VAX nodes can dperate on VAXcluster IO $_READVBLK, which in turn satisfies the systems. A cluster is a loo ely coupled, multiple RMS GET, allowing the TYPE image to dis­ processor network featuring$ fu ll sharing of disk play the time sent from the target node. storage and common user !environments on each

Digital Te chnicalJournal 95 No. 3 September 1986 The DEC net- VA X Product - An In tegra ted Approach to Ne tworking

node. Each member node within a cluster can he is using to a DECnet line. (This conversion be directly addressed from any other node in requires that access be from a PC using a termi­ the network. At times, however, it is also nal emulation package , such as SET HOSTjDTE very convenient to treat the cluster as a single under the VMS software, and that the PC can run DECnet node . Among other advantages , this the DECnet software .) capability makes it possible for mail to be sent to Mter logging in via the terminal .emulator to an users with accounts in the VAXcluster sys tem account on a routing node, the user directs the without knowing which member nodes are VMS system on the routing node to switch the ter­ active . minal line to DECnet use. The VMS system sends Associating the cluster with a DECnet address an escape sequence to the terminal emulator on is accomplished by supplying each node in the the PC. Recognizing the sequence, the emulator cluster with a second address, an "alias," repre­ converts the line to a DECnet link at the local senting the cluster. Each router in the cluster (at end. Meanwhile, the code in the router converts least one is required) adds the alias address to the line at that end to DECnet use. The design of the routing vector transmitted to other routers the VMS terminal driver makes possible this con­ in the network. That makes the routing vector version. The terminal driver separates its fu nc­ appear to be the optimal path to the alias tions between class drivers (implementing address. As a result, the rest of the network can­ higher-level fu nctions) and port drivers (inter­ not distinguish the cluster alias from the address facing with the hardware devices) . The DDCMP of a physical node . This approach has an advan­ asynchronous device support in the DECnet-VAX tage in that it requires no unique support in product is implemented as a class driver. That other systems and no modifications to the DEC­ makes it possible to switch dynamically between net architecture. DECnet and terminal use on a particular device As it is routed through the network , a message simply by switching class drivers on the same with the alias address will eventually arrive at a port driver. router within the VA Xcluster system. The router When both ends have switched to DECnet use, will recognize the destination address as its own the normal routing layer initialization takes alias and select a node within the cluster to place. Some additional checks happen during receive the message. The selection process is routing initialization on dynamic lines to ensure based on a weighted, round-robin algorithm. The that the node that just switched the line is per­ end communications layer within the router is mitted to do that by the router. These checks capable of identifying which node is associated give to the system manager on the router the with each logical link. Therefore, once a connec­ opportunity to control which nodes should be tion has been established, subsequent messages permitted to connect to his system. will always be routed to the correct node within the cluster. Performance Issues As DECnet-VAX has evolved, continuing effo rts Dy namic Asynchronous Connections have been made to improve its performance. Many personal computers, ranging fr om IBM PCs These efforts have run the gamut fr om restructur­ to MicroVAX workstations, are now capable of ing the basic modules to including support running the DECnet software over asynchronous aimed at improving specific areas of perfor­ lines. Thus has arisen the need for a more secure mance. The remaining sections discuss some of and easily managed mechanism fo r setting up ter­ the areas that have yielded the greatest perfor­ minal lines to be used as DECnet communica­ mance increases. tions lines. Ordinarily, one terminal line must be dedi­ Ne twork Drivers and Ancillary cated to DECnet use for each asynchronous line Control Processes needed. When those terminal lines are not being Wherever possible, those fu nctions having the used for DECnet purposes, they cannot be used greatest effect on performance have been imple­ as normal terminal lines. To solve this problem, mented in NETDRIVER. There they can be exe­ DECnet-VAX introduced, in 1985, dynamic asyn­ cuted at high priority without changing the pro­ chronous connections which allow an interac­ cess context, which would be required for tive user to dynamically convert the terminal line functions executed in NETAC P.

96 Digital Technical]ournal No . 3 September 1986 New Products

I I I Those functions that occur more infrequently rently idle. When a new 'ogical link request have been implemented in NETACP, which pro­ specifyingthe same account is received, NETACP vides the necessary process context. These func­ will forward the request to an appropriate idle tions include state changes and others that NETSERVER process instead of creating a new require a process context to allow access to a process to handle the request. On a busy system, large pageable database or system service. These this action can trim seconds off the start-up time include logical link creation and deletion, net­ for the logical link. To pr¢vent the problem of work management functions operating on the the local system filling up with NETSERVER volatile database, and routing table maintenance. processes that no one nee�s to talk to, an idle NETSERVER will time out ahd delete itself aftera Ne twork Server Processes certain amount of time. Transferring large files across a network or In 1986, FAL was extendedI to include the accessing many remote files can consume con­ capability to serially prodess multiple logical siderable resources on a remote system. Thus the links (and as a result, multiple files) . This addi­ provision of accurate accounting information for tion yielded a significant improvement in overall remote file access operations is quite desirable. throughput for file transfer activity, especially This need led to the implementation of FAL as a for wild-card operations. single-threaded server. That server runs in the context of a process logged in to the remote sys­ Window-based Congestion Control tem on an account that is accessible to the initia­ In a large network, data packets must be routed tor of the fileaccess request. through several nodes bef�re reaching their des­ RMS, being procedure based, does not know tinations. Congestion in ani intervening node can whether or not an application program intends severely decrease the throughput of all logical 1 to access additional files via the same account on links using that path. Con�inuing to send more the remote system. Originally, the DAP imple­ data through a congested n<;>de only makes things mentation in RMS was designed to terminate the worse. The systems at the ends of the logical link logical link with FAL upon closing a file or finish­ have no knowledge of which path is being used; ing a file search sequence. Consequently, for therefore, they have no direct way of knowing example, a wild-card operation transferring n where congestion may be occurring in the net­ files using the COPY command results in the work. This problem was addressed through the invocation of a total of n + 1 FAL processes. implementation of a win�ow-based "back-off" One process performs the RMS search sequence, scheme designed to detect the presence of con­ each of the others transfers in serial fashion each gestion somewhere along lthe path. The rate at file that is found. Unfortunately, this approach which data is sent will be reduced until the significantly reduced overall throughput, espe­ effects of the congestion ate no longer seen. I cially when a large number of small files were i being transferred. Node Database Structure The primary disadvantage of using separate Digital Equipment Corporation has a very large processes for individual network tasks lies in the internal communications network. As that net­ overhead required to create the process. The work grew in size, its volatile node database increasing complexity of authorization and pro­ became a performance bottleneck. Searches tection mechanisms within the VMS system has through the database to locate a particular entry increased the start-up time during process cre­ by name were causing ex�essive paging. Noting ation. This increase is experienced by users that the node database isi frequently accessed activating network tasks on other nodes as an using either the node name or the address as a increase in response time. search key, it was decide� that the speed of a Support for network server processes was look-up should be the sa�e for either type of introduced in 1983 to solve the overhead prob­ search. To accomplish th�t, the node database lem. A NETSERVER process can handle serially was augmented by two balanced binary search many logical links that require the same account trees, one keying off the node address, the other on the server node. NETACP maintains a list of off the node name. 3 Each entry in each tree con­ those NETSERVER processes that have been tains a pointer to the node database entry refer­ started for particular accounts but are now cur- enced by that entry. It also has a separate pointer

Digital Te cbnical]ournal 97 No . 3 September 1986

- The DECnet- VAX Product - An Integrated Approach to Networking to the node in the companion tree, making it pos­ Solving this problem was easy on a routing sible to parse the node database by either name node since it could determine that the destina­ or address. tion node was exactly one "hop" distant on the Another problem with the volatile node data­ same Ethernet. On a nonrouting node, however, base developed in 1984 as our internal network this information was not readily available. grew to over 2,500 nodes: the size of the data­ Fortunately, the NSP protocols establish the base caused the NETACP process to exceed its buffer size as the smaller of those offered by the paging file quota. That quota was increased to two parties involved. Furthermore, a nonrouting accommodate 5,000 nodes, a limit exceeded less node maintains (in its routing layer) a cached list than one year later. At that point the best solution of the nodes residing on the same Ethernet. That was to reduce the number of pages required by list enables the nonrouting node to address pack­ the node database rather than to continue taking ets to other nodes directly without passing the a larger portion of the paging file. packets through a routing node. This action per­ In a very large network, most nodes are repre­ mits a nonrouting node to always offer the use of sented merely as names and addresses. Most of 1500-byte buffers when it initiates a logical link the other node parameters, such as routing ini­ request on an active Ethernet circuit. When the tialization passwords, are generally used only for request arrives at the target node, the cache there a small subset of the total node population. With will correctly reflect whether or not the source that in mind an optimization that "walked" the node is on the same Ethernet. If so, the node can binary trees was built into the search routines. either offer the larger buffers or demand the nor­ Any node with only its name and address defined mal buffer size. is completely represented by the binary tree entries; so no database entry is allocated. When a Summary tree search locates an entry with no associated The DECnet-VAX product makes possible the database entry, the name and address information provision of a comprehensive set of networking from the tree entries will be used to initialize a capabilities that are compatible with implemen­ template database entry to return to the caller. In tations in other operating systems. DECnet-VAX a node database with 7,000 entries, this opti­ does this while integrating a high proportion mization resulted in a reduction of almost 2,500 of those capabilities into the heart of the operat­ pages in the paging file, which cut NETACP's ing system. That integration supplies services total page fileutilization almost in half. that make local and remote operations appear indistinguishable. Buffe r Size Op timization This integration was achieved by anticipating The DECnet architecture does not allow the seg­ the need for integrated networking capabilities mentation and reassembly of data packets at the from the start. The necessary "hooks" were pro­ data link layer. The architecture requires that the vided in a sufficiently general fashion to allow buffer size used by the NSP layer (the transport the continued development and expansion of layer) must be small enough to be handled by the networking product. This design allowed any data link in the network. Traditionally, this DECnet Phase II to be included with the early has meant using 576-byte buffers in the NSP releases of the VMS software, which did not have layer. to change as DECnet-VAX progressed through The 576-byte buffers limited the network's Phases III and IV. Similarly, as the VMS system performance when Ethernet, with its 1500-byte itself evolved to support multinode VAXcluster data link buffers, was supported. The NSP layer networks, DECnet-VAX was able to provide clus­ was still constrained to use the smaller buffers, ter addressing through its coupling with the since lower capacity data links existed in the net­ operating system. work. It was recognized that performance could As both the VMS operating system and DNA be improved between nodes on the same Ether­ continue to evolve, the design of the DECnet­ net by using the larger Ethemet buffers. In this VAX software will permit it to follow both, case there was no chance of the packets being providing transparent networking capabilities routed through a node that could not handle for users of VMS. them. The problem was to recognize when this optimization could be safely used.

98 Digital TecbntcalJournal No. 3 September 1986 New Products

Acknowledgments Any discussion about the basic design of the VMS software should acknowledge the contribu­ tions of its primary architects David Cutler and Richard Hustvedt. The design and implementa­ tion of DECnet-VAX owe much to the work of Scott Davis, Alan Eldridge, and Tim Halvorsen. References

1. N. La Pelle, M. Seger, and M. Sylor, "The Evolution of Network Management Prod­ ucts," Digital Tecbnietiljo urnal (Septem­ ber 1986, this issue): 117-128. 2. P. Karger, "Security in DECnet: Authentica­ tion and Discretionary Access Control ,'' Digi­ tal Equipment Corporation Internal Techni­ cal Report, DEC TR-121. 3. D. Knuth, The Art of Computer Program­ ming , Volume 3 (Reading: Addison Wesley, 1973).

Digital Te cbnical]ournal 99 No. 3 September 1986

• john Fo recast james L. jackson jeffr ey A. Schriesheim I

The DECnet- ULTRIX Software

The UL TRIX system is the second op erating system app roved by Digital fo r its VAX processors. Incorporating the Digital Networking Architec­ ture (DNA) capabilities into this software was importantto supportdis­ tributed applications. A key constraint was that no changes should be required to existing DNA protocols or DECnet implementations. The 4.2BSD socket interfa ce wasexpanded to support the DECnet protocols and a unique object sp awnerwas created to simplify writing new servers. A network management structure incorporating DECnet's database con­ cept also hadto be built. TheDECnet- UL TRIXso ftware isthe first product implementing the DNA strategy on any variant of the software.

Project Goals Project Constraints The DECnet-ULTRIX software is Digital's first In planning the DECnet-ULTRIX design, we product to be layered on the ULTRIX-32 software wanted to clearly identify our constraints at the and is a key part of our ULTRIX strategy. One outset of the project. Thus we would have a con­ major reason for developing DECnet-ULTRIX was sistent and well conceived framework for making to bring the ULTRIX system into Digital's com­ design decisions. puting environment. We believe that our cus­ We decided that the software should require tomers will better meet their computing needs no changes to the currently available DNA proto­ by being able to use the VMS and ULTRIX operat­ cols and DECnet implementations. If problems ing systems together. Such a mixture of systems with other DECnet products were uncovered by requires communications mechanisms that the DECnet-ULTRIX software, those problems are easy to use and manage, yet provide high would be solved. This decision was made so that throughput. These mechanisms make possible the software could be completely compatible the transportation of existing applications from with the large base of existing DECnet networks VMS systems to ULTRIX systems. Thus new dis­ without requiring the upgrading or patching of tributed applications can be built by taking software fo r any system. Our goal was to have advantage of the strengths of each system. ULTRIX systems simply "plug" into existing DECnet-ULTRIX Version 1.0 provides file networks, thus adding new capabilities for our transfer, remote terminal access, mail, network customers. management, and user programming interfaces. All features of the DECnet programming inter­ All these fu nctions are completely compatible face had to be provided even though some would with all current implementations of the Digital never be used by many customers. A DECnet­ Network Architecture (DNA) . The DECnet­ ULTRIX user should be able to write programs to ULTRIX software also makes possible a large communicate with any existing DECnet applica­ number of other options, such as support for ter­ tion program on any type of DECnet system. minal servers, protocol gateways developed by These features include passing optional data Digital, layered applications, and management with connection establishment and dissolution, tools. As we migrate the DECnet protocols passing access control information on a connect toward the Open Systems Interconnect (OSI) request, and rejecting a requested connection protocols, the DECnet-ULTRIX software will while supplying a reason code. provide the means for ULTRIX systems to com­ We decided that the DECnet-ULTRIX software municate with those of other vendors. should be culturally compatible with the UNIX

100 Digital TecbnlcalJournal No . 3 September 1986 New Products

programming environment and other networking DECnet system requires op(ions having no equiv­ implementations on the ULTRIX system. Such alent in the IPC socket interface . compatibility required that it be easy to port We also had to find ways to present those applications that used other protocols, such as options to users without extensively modifying the transmission control protocol (TCP) and the the IPC routines in the ULTRIX kernel. Modify­ (IP) to use the DECnet system. ing the kernel's IPC code would require changes Also, it should be possible to write applications to other protocol implementations and reduce that would run over the DECnet and other proto­ our ability to port the DECnet code to other cols at the same time. We felt that the DECnet­ 4.2BSD-basedsy stems. In pmicular, a way had to ULTRIX software should perform at least as well be found to allow a server process to reject a as the TCPjiP implementation. requested connection. We also had to support DECnet's ability to pass user-supplied data with Significant Design Decisions connect, accept, or reject operations. The IPC For several weeks at the project's start we exam­ interface in 4.2BSD provides no means for pro­ ined alternatives for the basic design. Two major grams to pass data or acce�s control information ones were considered for the basis of the net­ within a connection request. Therefore, no work environment. The first was to extend the means existed for a program to decide that a "socket" interface from the ULTRIX system, requested connection should be rejected. This which had been developed as pan of Berkeley limitation was not acceptable for a DECnet 4. 2BSD for the Defense Advanced Research Pro­ implementation because certain application ject Agency (DARPA) TCPjiP project. A socket is level protocols in the DNA structure depend on an addressable end point of communications connection data for versi�n coordination and within a process, directing data to a similar access control. socket in another process. This socket interface Another weakness found in the existing had many of the functions we needed, although 4.2BSO mechanisms was in the area of network some additions would be required. It had a disad­ management. One of DECnet's strengths is its vantage in that the socket environment would management and control functions provided by be difficult to port to another variant of the network management tools across nodes within a UNIX software, should we eventually decide to network. 2·3 Using a single command interface do that. called the network command program (NCP), a The second alternative was to build a version user may examine and ch'ange the state of key of a communications executive on the ULTRIX parameters on any system within his network. In software that would isolate the protocol modules addition, he may examine counters describing from depending on the operating system.1 This network activity and errors. Many conditions approach had been used successfully in another occurring on systems within a network can product set and had the primary advantage of trigger the generation of events or notification making more of the implementation portable. messages. These messag�s can be directed to For example, this alternative would make it eas­ consoles, files, and programs anywhere in the ier for us to port the DECnet-ULTRIX software to network. the UNIX System V software. To implement these network management fea­ Our final decision was to implement the tures, we had to add program-level access to DECnet-ULTRIX software with the first alterna­ change parameters and to read counters and tive, using the 4.2BSD interprocess communica­ other information kept in the ULTRIX kernel. tions (IPC) mechanisms. This alternative pro­ The network device controlI needed an especially vided the most compatible interface with other large number of changes. :Berkeley 4.2BSD pro- protocols and took advantage of the services vides a very limited set of controls over network already provided by the IPC code in the ULTRIX interfaces. These controls are insufficient to sup­ kernel. We knew that the socket interface would port the functions of DECnet network manage­ have to evolve to support other protocols, such ment. In particular, the DECnet software has to as ISO transport, and that we could provide some be able to turnint erfaces on and off, gather coun­ leadership in managing its evolution. In the short ters kept in the device, a�d enable and disable · term we would provide extensions since the multicast addresses,

Digital Te cbntcaljournal 101 No . 3 September 1986 The DECnet, ULTRIX Software

Components of the DECnet-ULTRIX We had to provide many supporting routines Software in addition to the DECnet protocol code linked Programming Interfa ce into the ULTRIX kernel.Those routines are mod­ ules archived in the standard C-language library As mentioned earlier, we decided to base the at DECnet installation time. They provide access programming interface in the DECnet-ULTRIX to the DECnet node and object databases, code on the 4.2BSD interprocess communica­ address-conversion routines, and several routines tions facilities, which are modeled on the socket providing a simplifiedprogr amming interface to interface. Operations on sockets are similar to the kernel socket routines. operations performed on "logical units" or "file descriptors," which direct IjO operations to a Kernel Changes file or another device. Programs make systems Our goal was to minimize the number of kernel calls to create, bind names to, connect to, send changes required to support the DECnet system. and receive data over, and destroy sockets. All the new fu nctions that reject connections and The IPC interface in 4. 2BSD is designed so pass data or access control information on con­ that sockets exist in specified communications nection requests were implemented using the domains. Sockets within a domain share common existing "setsockopt" (set socket option) and properties, such as their naming scheme, and "getsockopt" (get socket option) system calls. may communicate only with other sockets in the We modified the ULTRIX kernel to allow those same domain. To implement the DECnet-UtTRIX calls to be dispatched to domain-dependent software, we had to add the "DECnet" communi­ code. That was something the 4.2BSD designers cations domain to the existing Internetand UNIX had documented but not fu lly implemented. We domains. The basis of this support was the addi­ also increased from 112 to 1024 bytes the maxi­ tion of new modules implementing the DECnet mum amount of data that could be passed across protocols (at OSI levels 2, 3, and 4). These mod­ those interfaces. In that way we could accommo­ ules would be linked into the ULTRIX kernel date passing all the access control information in when the DECnet domain was installed. They a single request. allow programs to create sockets using the DEC­ We found several bugs in the kernel support net network services (NSP), routing, and Ether­ for this type of socket. Therefore, this DECnet net data link protocols to communicate. implementation appears to be the first network­ Within the DECnet domain, two types of sock­ ing domain to support sequenced packets. All ets are provided: stream, .and sequenced packet. these bugs were fixed in version 1.2 of the Stream sockets provide a bidirectional, reliable, ULTRIX-32 software. and flow-controlled stream of data between two Most of the kernel changes were made in the processes. Sequenced packet sockets provide network device drivers. The initial release of the these same features while preserving the mes­ DECnet-ULTRIX softwarewas to act as a nonrout­ sage boundaries of the data as presented to: the ing node on the Ethernet. Therefore, we were sending socket interface. All existing DECnet concerned only with the Digital Ethernet inter­ applications protocols use the sequenced packet faces and the DEUNA and DEQNA network interface because the message boundaries are adapters. As written, the drivers for those devices used to indicate the lengths of data within mes­ supported only the internet protocol (IP) used sages. The stream socket interface was provided by TCP for routing. They had explicit informa­ to facilitate porting applications from the other tion about the IP protocol types coded into the UNIX communications domains to the DECnet device interrupt routines. We also wanted to add domain. In stream sockets,, the data flowing support for additional protocols (e.g., the Local through the stream must be self describing. In Area Transport, IAT, and maintenance operations that way the applications programs using the protocol, MOP) at a later date. Therefore, we stream know how long each data element is with­ added kernel routines that could be. called from out relying on message boundaries. Data delivery any Ethernet driver. Those routines:dispatch to is based on buffering and flow control consider­ domain-dependent routines when a message is to ations rather than preserving information about be transmitted or a new message is received. We the way the sender presented the data to the also added a number of I/0 control () func­ stream. tions: to those drivers. Those functions allow

102 Digital TecbnlcalJour.nal No. 3 September 1986'. New Products

changes to the physical address of the hardware user account. Once the process is executing, the interface, enable and disable the reception of serversimply needs to read and write from stan­ multicast messages, and control more extensive dard input and output, set ;up by the spawner to support for device counters than had previously be directed to the created �ocket. been present. Network Management Object Sp awner The user interface to network management is We thought that one area could be greatly provided via NCP, which on the ULTRIX system improved over the standard socket support in the accepts the same command syntax as that on all 4.2BSD standard: the invocation of server, or other DECnet systems. NCP communicates with "daemon," processes. When a client program the network management listener (NML) , both connects to a server program, software on the on the local system and on remote systems, to specified node has to decode the address and execute management commands. Local com­ inform the correct target program that it has a mands cause NCP to con;tmunicate with NML pending connection. Calls from the 4.2BSD using a UNIX ''pipe''; remote ,commands are exe­ socket kernel support that software in a way cuted through DECnet sockets. NML controls the requiring all possible destination processes to be management databases, implemented mostly as running and listening for connections. This sup­ files with some parameters stored in the kernel port has several bad effects. First, each of those and accessed through a special DECnet socket servers consumes memory and slots in the pro­ interface. cess table. Second, writing a new server process The access methods and file organization fo r is more difficult since each process has to issue DECnet databases are quite different from those multiple system and library calls to receive and provided by the 4.2BSD TCPfiP implementation. bind its address to a socket. The TCPfiP databases are organized as a set of To solve these problems on the DECnet­ files constructed and modified using any stan­ ULTRIX software, we implemented an "object dard text editor. Those files contain the host spawner," which creates a socket to which the name-to-address mapping and the service name­ process binds a special address. That address to-address mapping. Program access to those informs the DECnet code in the kernel that files is supported only for read operations. This the spawner should be given the connection limitation was unacceptable for the DECnet data­ requests for which no other process has declared bases, which require full read and write access an interest. With this mechanism the existing using the NCPfNML programs. NCP supports model of serverprocess is still supported and can commands to add and modify entries for many be used as desired. A process may choose to cre­ DECnet entities. Moreover; the DECnet databases ate its own socket and listen for connections. It must support networks cbntaining many thou­ does.that if it wants to handle multiple sockets sand of nodes. per process or to decrease the connection pro­ We explored several alternative ways of struc­ cessing time by the time required to create a new turing the databases to provide such program process and execute a file. access to support write operations. Eventually Using the DECnet object spawner greatly sim­ we chose a file format organized as a simple plifies the writing of a new server and provides sequential binary file. Reading an element from several useful services. A new .server has to be the database involves first allocating enough vir­ defined in the DECnet object database by using tual memory for the entire file. Then the file is NCP. Defining a server involves specifying its read into virtual memory,' and a linear search is address and the file that should be executed performed for the desiredI element. Writing an when a connection for the server arrives. Addi­ element into the file involves reading the entire tional parameters indicate the type of socket that file into the allocated virtual memory. Then the should be created for the server (stream or file is .searched for the :position of the new sequenced packet) and the default user account element, which is written to the file. Finally to run under if no access control information is the remainder of the existing portion of the supplied by the '.client process. The spawner file is written into its new position in the file. authenticates ever.y.:t:hing for the server and exe­ While this "brute force" method was not partic­ cutes the process in the context of the specified ularly elegant, its performance and reliability

Digital Tecbnlcal]ournal 103 No. 3 September 1986 Th e DECnet- ULTRIX Software have proven to be very acceptable, even with drm - for DECnet remove files (as in the extremely large databases. Other methods rely­ UNIX "rm" command) ing on indexed and hashed file access proved to dcat - for DECnet type fi les (as in the UNIX be far more complicated than their marginal per­ "cat" command) formance benefits warranted. One idea we examined during the develop­ We decided to provide only command-level ment of the DECnet-ULTRIX software was to access to DAP file operations to shorten the build a new network management database that development time for the DECnet-ULTRIX pro­ could include entries from TCPfiP, DECnet, and ject. The d commands were implemented using a other protocols. This idea was abandoned for two set of file access routines with a calling conven-­ reasons. First, we fo und that existing calls to the tion very similar to the normal C-standard 1/0 TCPflP database routines did not contain all the routines (stdio). We decided not to make the necessary parameters required to support net­ d routines available to customers. We felt that work addresses of more than one format. Since network file access should be transparently avail­ one project goal was to leave the existing net­ able to all programs, not just those using a spe­ work programming interfaces unchanged, this cial set of 1/0 routines. Making the routines idea made impossible the adding of new parame­ available would require extensive changes to the ters to those function calls. Second, we felt that ULTRIX kernel, something not possible given creating new routines that were to be linked into our tight development schedule. customers' programs would require a signifi­ The server side of the DAP implementation is a cantly different database format, thus requiring file access listener (FAL) program that is invoked the relinking of existing TCPflP applications. using the standard DECnet object-spawning We deemed this relinking to be unacceptable. mechanism to handle user requests. FAL is a What we did do, however, was to ensure that straightforward program that can sometimes fall the DECnet and Internet database routines were short of what users expect it to do. In fact the compatible in their naming and calling conven­ biggest challenge we faced in implementing the tions. In that way a later release could change DAP protocol on the ULTRIX system was to meet both sets of routines to be "stubs" that called the expectations of both ULTRIX and remote into a common base of supporting routines. We operating system users concerning what consti­ intend to explore this concept further when we tutes reasonable behavior. The DAP protocol has adopt a name server-based mechanism for storing many options, but each DAP implementation certain network management information. incorporates a slightly different dialect. These slight differences exist because each operating File Tra nsfe rs system's file operations are different. Each sys­ In a DECnet system, file access operations are tem must map its own way of performing those performed using the data access protocol (DAP) . operations into the DAP operations. Writing each File access uses a clientjserver model in which side, the client and the server, of a new DAP the client program contacts a server program to implementation presents different problems. accomplish some task specified by a user. DAP These problems are exacerbated when one supports most of the common file system opera­ requires also that no .existing DAP implementa­ tions, such as reading, writing, deleting, and list­ tion be changed to work with the new imple­ ing the names of files. For the first release of the mentation. The client side of DAP drives the file DECnet-ULTRIX software, we decided that DAP operations; the server side is passive, performing client operations would be implemented using a only the operations requested. Most problems new set of file operation commands called the d occur because the ULTRIX system has no (for DECnet) commands. They are similar in con­ enforced record structure within its files, while cept to the 4.2BSD rep command for remote most of Digital's other operating systems perform copy. The d commands are as follows: their file access using a record orientation. The ULTRIX client code, as implemented in the dcp - for DECnet copy files (as in the UNIX d commands, cannot simply request other DAP "cp" command) servers to supply data in ULTRIX record format dis - for DECnet list fi le directory informa­ (stream). Instead, the d commands must inter­ tion (as in the UNIX "Is" command) pret the record formats of those other operating

104 Digital Te chnicalJo urnal No. 3 September 1986 New Products

systems and convert data to and from the fo rmat remotely logged in. We did, however, encounter that ULTRIX users expect. some problems trying to use this capability. Ac hieving this capability involved adding The CTERM protocol exports terminal I/0 knowledge to the commands by · programming requests from the host to � server, which exe­ several different record fo rmats and attributes. In cutes them, thus reducing the host processing most cases the data is automatically converted load. The pseudoterminal! interface provides for the user into the fo rm desired. For cases in transparent buffering ber een the controlling which that is undesirable, a means is provided program and the programsJ. controlled. In that for the user to bypass that conversion. The way the controlling program never knows when ULTRIX FAL program must also perform data another program has issued a read request; there­ conversions even though DAP, as a server, has no fo re, the controlling program cannot know when such responsibility. FAL is forced to convert the to ship a read request to the server. Fortunately, ULTRIX format stream into the appropriate vari­ the protocol supports a notification function that able length format files of the other operating the server sends to the host !if a user types a char­ system so that it need not be modified to work acter and there is no outst nding read request. with the DECnet-ULTRIX code. Using this function we al�lowed the server to issue a "pseudoread" request when the first Remote Te rminal Access character is typed. Usually, the request is for a In our planning for the initial release of the fu ll line of input, thus allowing the server to per­ DECnet-ULTRIX software , we decided not to form character interrupt processing and local include remote terminal access because it character editing. required too much development time . Once the Using the remote termin l access protocol, a basic networking code ran in the kernel, how­ terminal can connect logically� to a remote sys­ ever, we easily modified the 4.2BSD remote ter­ tem having very different �ontrol conventions, minal access programs rlogin and rlogind to run such as control characters and line terminators. over a DECnet system. Those programs provide For this reason the server program ( dlogin) dis­ ULTRIX-to-ULTRIX terminal access. Later, with ables all special character processing by the local minor changes to the protocol , we used the terminal driver. The program then processes modified rlogind (now called dtermd) to adver­ each character individually to perform any func­ tise to other DECnet systems that it could com­ tions requested by the host �ystem. A two-charac­ municate with the TOPS-20 remote terminal pro­ ter sequence (by default a filde [ -] followed by tocol. Using this mechanism we provided access a carriage return) is reserved to allow entry to a to the ULTRIX system from non-ULTRIX DECnet local command mode (the fi rst character may be systems that had previously implemented sup­ changed by a command line switch) . This local port for the TOPS-20 software . This capability command mode provides access to the shell on proved so useful that we decided to include full the local system. This mode also provides com­ remote terminal support in DECnet-ULTRIX mands to log in the terminal session to a fi le and Version 1.0. to suspend or terminate the current remote ter­ In the DECnet system, remote terminal access minal session. operations are currently performed using the command terminal (CTERM) protocol. Remote Mail jI terminal access uses a hostjserver model in Of all the fu nctions in the bECnet-ULTRIX soft- which the server (which controls the physical ware, mail was the easiest to implement. The terminal) contacts the host to request access to mail system included with the ULTRIX system the remote system. Once that connection has was already quite sophisticated. This mail system been established, the host controls the terminal supports multiple mail protocols and address through the CTERM protocol. fo rmats with a central mail program named send­ The ULTRIX system supports a "pseudotermi­ mail. Sendmail is driven by a configuration file nal" driver that allows a program to control that can be tailored on eath ULTRIX system to other programs through what appears to be a define new address format and mail-forwarding normal terminal interface. This control allows rules. We decided to add �� upport for the most the daemon program ( dlogind) on the host to common DECnet mail protocol, called mail- 11, provide a standard interface to users who are supplied with DECnet-VAX and other DECnet

Digital Tecbnlcal]ournal 105 No. 3 September 1986 i I J. The DECnet- ULTRIX Software

systems. Supporting the mail- 11 protocol was a many of our development ideas with people simple matter of writing a mailer program adher­ from Digital's UNIX Engineering Group. We ing to the sendmail interface and speaking the decided our first project should be to implement mail- 11 protocol . Once this program was writ­ an Ethernet end node using the DNA Phase IV ten, we modified the sendmail configuration file protocols. This implementation would include to handle DECnet mail addresses properly so that support for a programmable user interface, mail, the DECnet mailer would be invoked- when nec­ and file transfer. Our decision to add a remote essary. Only a few minor problems were encoun­ terminal capability was made later. Experience tered dealing with different ways to parse mail with other DECnet implementations had shown addresses and different comments contained in us that these fu nctions would be both necessary mail addresses between� VMS systems and the and sufficient to satisfy the majority of most ULTRIX sendmail program. users' needs. With this new capability, ULTRIX systems can We built prototypes whenever possible to get now act as mail gateways between DECnet net­ fu nctions working quickly. These prototypes works and any other type of mail network sup­ tested the viability of the interfaces and various ported by UNIX systems. implementation approaches. Often we explored several diffe rent designs before choosing one DEC net- UL TRIX Performance that worked best as a prototype. One original goal for the DECnet-ULTRIX soft­ These shortcuts can be very valuable, provided ware was to provide a level of performance simi­ they are fo llowed by a thorough review of the lar to that of the TCPjiP domain. In general, we work done. On this project this method worked met this goal. Both rep and dcp transfer files at very well. The ULTRIX system ran as a DECnet approximately the same rate, and while rep is end node in late summer 1984, afterwhich time slightly fa ster,. it requires a larger percentage of several UNIX utilities were quickly converted to the available CPU time. use the DECnet software. The following measurements were taken Our past experience helped us to gauge the between two VA X- 11/780 systems on a private amount of work required in each development network: area. That experience allowed us to start work early on network management since previous dcp average file 51 kilobytes (KB) implementations had shown that area to be one tranfer rate per second of the largest bodies of work. Throughout the rep average file 51KB project we succeeded in keeping work on each transfer rate per second component from being blocked by dependencies on other components. With tight project man­ ftp average file 27KB agement we put the DECnet-ULTRIX software transfer rate per second into field test less than one year after the project DECnet maximum 1200 kilobits (Kb) began. data transfer rate per second The entire product was written in the C pro­ gramming language; a large amount of code Project Management was later transported to other projects, notably to The DECnet-ULTRIX project began in early DECnet-DOS. Our experience transporting the 1984. The Berkeley 4.2 version of the UNIX implementation improved the quality of software had just been selected as the basis for the code since many components were tested Digital's UNIX software fo r the VAX system. Ver­ using additional interfaces and different code sion 1.0 of the ULTRIX system was well on its reviewers. way to completion. Our task was to define the DECnet-ULTRIX project, build a project team, Summary and deliver a Phase IV implementation for the The DECnet-ULTRIX project provided many ULTRIX system in the shortest possible time. challenges. The most constant one was how to At the start, each team member had a lot of build a product that appeared similar to other DECnet and software development experience, Digital products, yet acted like a natural exten­ but very little UNIX expertise. As we learnedthe sion to the UNIX base upon which the product intricacies of the UNIX software, we discussed was bu ilt. The compromises required to meet

106 Digital Te cbnicalJournal No. 3 September 1986 New Products

that challenge fo rced us to address many areas, References from command formats to the structure of the 1. J. Forecast, J. Jackson, J. Schriesheim, "Com­ written documentation. munications Executive Implements Com­ The DECnet-ULTRIX project met all its puter Networks," Computer Design (Nov­ ' goals for functionality, performance, and sched­ ember 1980): 71-75. ule. The completed product was delivered I to Digital's Software Distribution Center only 2 .. N. LaPelle, M. Segar, arid M. Sylor, "The Evo- 16 months after the project's inception. The lution of Network Management Products," Digital Te chnical jo urnal (S eptember DECnet-ULTRIX software will be followed by i other releases, thus adding functions and follow­ 1986, this issue): 17�128. ing the migration of the DNA strategy to Phase V. 3. M. Sylor, "The NMCCjDECnet Monitor Since the ULTRIX system supports TCPjiP, the Design,'' Digital Technical jo urnal (Sep­ addition of DECnet has provided a natural base tember 1986, this issue): 129-1 41. for a DECnet-to-TCPjiP gateway. While not being the primary focus for this product, the essential fu nctions required for a gateway are now present. This fact is significant because TCP/IP represents a de facto standard fo r com­ munications protocols in the UNIX community. The DECnet-ULTRIX product is thus able to provide a level of integration for the UNIX prod­ ucts of other vendors into Digital's computing environment. Future standards in all areas of OSI will provide a better degree of integration for the DECnet system and the UNIX community. Until they are widely implemented, however, DECnet capabilities on the ULTRIX system provide a valuable bridge between the two environments. Acknowledgmen ts The authors wish to acknowledge the help of the members of the DECnet-ULTRIX development team, who maintained a high degree of energy and enthusiasm throughout the project. These members were Kim Buxton, Ed Ferris, Karen Gillin, and Bill Spencer. Thanks are also due to Steve Seufert, Faye Allen, Marie Rowntree, Terri Buckley, Pat Nelson, and those individuals in the ULTRIX Engineering Group who worked with us throughout the project.

Digital TechnicalJournal 107 No. 3 September 1986 Peter 0. Mierswa DavidJ. Mitton I Martha L. Sp ence

The DE Cnet-DOS Sy stem

The DECnet-DOS system is an implementation of th e Digital Ne twork Architecture standard fo r both Digital's Rainbow personal computers and those of IBM Corporation. This system provides all the services asso­ ciated with a DECnet implementation. These include a choice of commu­ nication technologies, adaptive path routing over complex topologies, and network monitoring and management. DECnet-DOS also supp orts task-to-task programming, remote file transfe r and access, remote termi­ nal services, and network mail services. Those tasks are all performed on a fa mily of low-speed, small-memoryprocessors.

Over the past few years the low cost and avail­ ten by hundreds of different companies. And ability of appl ications for personal computers because of the volatile nature of the PC business, (PC) have been enticing an increasing number of this product had to provide a wide range of basic businesses to acquire them. At first, each PC user network services and layered applications. Since worked with his own computing resources in a products for PCs were being rapidly introduced, stand-alone manner. Eventually, however, these our goal was to design, implement, and test this users fo und they wanted to share programs, data, product in a fairly short time period. and messages with each other. They also wanted It was clear to the project team that the only to take advantage of the databases, processing way to meet the time-to-market goal was to fo l­ power, and larger applications on the large com­ low one strategy. First, the project had to adhere puter systems in their companies. strictly to the DNA architecture, thus eliminating Within Digital Equipment Corporation, there any temptation to implement new and unique is an engineering group responsible for the protocols not supported by other existing imple­ implementation of DECnet software on Digital's mentations. Second, the project had to borrow small systems. This group believed that a DECnet software freely from other products that had implementation for personal computers could been or were being developed. easily satisfy these users' desire for data and pro­ This paper presents a unique model for the gram sharing. In 1984, this gro�p initiated a pro­ rapid development of a specific product by uti­ ject to implement the Digital Network Architec­ lizing work done on many other projects. The

ture (DNA) on personal computers that used · the combination of original software and borrowed MS-DOS operating system. code, all within the framework of DNA, allowed The implementation of DECnet software, a us to introduce the DECnet-DOS system in a rela­ mature, layered communications architecture, tively short time period. The overall problems on the personal computers of both Digital and we encountered and how we solved them are IBM Corporation presented a number of interest­ first presented within the context of general ing problems. The team had to work with asyn­ issues. Then each layer of the architecture is chronous and Ethernet communications con­ used as a springboard from which to discuss par­ trollers and a number of different, relatively slow ticular problems encountered in that layer. processors, all built by other companies. They had to work within the confines of the MS-DOS General Development Issues system, a small operating system with few system services capable of supporting multiple commu­ Coding Style and Standards nication tasks in the background. Moreover, the The time-to-market goal dictated that both cod­ resulting product had to be compatible with ing and debugging had to be done as quickly as thousands of application programs already writ- possible. We immediately agreed upon using a

108 Digital Tecbnical]ournal No. 3 September 1986 New Products

1 higher-level lang�age and fairly strict coding sity of Texas Health Science Center, San Antonio, standards to shorten our development time. We to implement the fi le transfer utility. They had also had to initiate a search for code fragments expertise developing an in-house DECnet imple­ already existing within Digital that were also mentation on another small system. required in our product. We felt that incorporat­ ing related code could also greatly reduce both Background Ta sk Design the development and debugging time. The MS-DOS system is a single-task operating sys­ The MS-DOS environment is quite similar to tem; no servicesare provid�d to support the con­ the UNIX environment. Many C compilers based current execution of two tasks. This fact raised a ! on MS-DOS machines offer libraries similar to problem because a number of the requirements the one on the UNIX system. Therefore, it was for the DECnet-DOS system! demand the concur­ quite fortuitous that the DECnet-ULTRIX system, rent operation of some network tasks with the Digital's DECnet implementation for the ULTRIX user's current application. Such tasks include the system, had just been completed. It was written .transmission and reception of periodic routing almost entirely in the C programming language. and line confidence messages, and the concur­ We felt that some of the DECnet-ULTRIX code rent operation of multiple connections. More­ could be used successfully in DECnet-DOS. Our over, a particular constraint was that application strategy was to do the fo llowing tasks: programs have to run as wr�tten, without coding changes, while the network! is active. We simply • Write as much code as possible in C. Do not could not require that the t1 housands of applica- preclude the use of assembler language tions currently on the market for MS-DOS-based if required to access devices or services systems be changed. unavailable in C or to reduce execution time To solve this problem, we devised a scheme where necessary. that would allow network tasks to run either at • Use common coding and style practices for periodic intervals or from an interrupt from a all code. communications controller. This design envi­ sioned that network tasks w�re interruptable and • Adopt the DECnet-ULTRIX programming could run in the background completely trans­ interface. The programmer's access to net­ parent to the application pr�gram running in the work services is not part of the architecture foreground. This scheme I had to be designed but is specific to the operating system and quickly and work the first ! time for the project the DECnet implementation. team to complete the product on schedule. • Port code from the DECnet-ULTRIX software Unfortunately, the design of task scheduling in whenever it is applicable and easier than the DECnet-ULTRIX software was incompatible writing original code. with our scheme. Therefore, that portion of the DECnet-ULTRIX code could not be ported to • Include trace facilities in the basic driver solve this problem. However, the interrupt archi­ and all utilities as part of the design. tecture of the PDP- 11 syst.em and those of the Intel processors in the target machines are very Training Issues similar. Therefore, the intetrupt design from the At the start of this project, few engineers in DECnet-RSX software that l runs on the PDP- 11 Digital's Network and Communications Group system could be used for the, interrupt function had extensive experience with MS-DOS internals in DECnet-DOS. or C programming. To prevent a certain delay of The DECnet-RSX CPU scheduler uses a set of several months while people were trained, the work queues in which request packets called project team decided to pursue two avenues of communication control blocks (CCB) are external assistance: temporary in-house consul­ queued for processing. Any data buffers associ­ tants, and external engineering. Three consul­ ated with the requests are pointed to by fields tants with MS-DOS and C programming back­ within the CCBs. ; grounds were employed, being gradually re­ For example, if a messag� is received from the placed by Digital employees as they completed communications controllet, a CCB will be cre­ their training. An arrangement was made with the ated by the device driver f�r the controller. The Computing Resources Department of the Univer- received data is placed in a. buffer pointed to by

Digital TechnkalJournal 109 No. 3 September 1986 The DECnet-DOS System

that CCB, which will be placed in the work 3. Allocates a receive buffer queue of the routing layer. This layer, when run, 4. Copies the message from the Ethernet will process the buffer pointed to by the CCB. If controller to the receive buffer fu rther processing is necessary, this layer will place the CCB in the work queue of another net­ 5. Resets the controller to receive another work process. message This scheme would work perfectly well if the CPU scheduler were designed to scan the work 6. Calls a subroutine to insert a CCB point­ queues both at periodic intervals and after con­ ing to the message onto the work queue in troller interrupts. Then the scheduler would dis­ the background network process patch the network tasks to empty the work The Ethernet interrupt controller then dismisses queues. And all these actions must happen so the interrupt, and control returns to the sched­ that they are transparent to the current fore­ uler. The scheduler now enables interrupts and ground application. scans the work queues for additional work. To fulfill those requirements, we designed a The CCB containing the received message is memory-resident scheduler that performs all found on the work queues and the routing layer those actions by using a technique called inter­ is called to completely process the message. rupt shelling. To shell an interrupt, the sched­ When all work queues with immediate work are uler first records the address of the current empty, the scheduler finally returns to the origi­ interrupt handler, then replaces it with the nally interrupted code in the user's application scheduler's own address. Thus when the inter­ program. rupt occurs, the CPU state and the interrupt The second example deals with handling an return address are saved, and the scheduler, interrupt from the clock. In this case exactly the instead of the original interrupt handler, is called same code path as the one in the first example is directly. followed. The clock interrupts and the scheduler Upon entry for an interrupt, the scheduler gains control with interrupts disabled. It saves saves the current context of the system and simu­ the return address and state and dispatches to lates an interrupt to the original interrupt han­ the "real" clock interrupt handler. This handler dler. When the interrupt processing completes, will update the date and time and dismiss the the interrupt handler will return to the sched­ uler - not the foreground task. Therefore, the interrupt. Control now returns to the scheduler, scheduler now gains control. The interrupt pro­ which enables the interrupts, scans the timer cessing is now complete and the time-critical queues, and dispatches any process whose timer processing has finished. The scheduler can now has expired. When all such processes have enable interrupts and examine all work queues been completed, the scheduler returns to the for tasks that need to be run. After all tasks have originally interrupted code in the application been run, the scheduler finally returns to the program. · interrupted foreground task. Using the scheduler to handle interrupts and Two examples will help to make this process context switching allows network processing to more clear. be performed in the background while an MS­ First, consider an interrupt from an Ethernet DOS application is running in the foreground. controller that signals the successful reception of The interrupt shell ensures that a minimum a message from some other node in the network. amount of code runs with interrupts disabled. When the controller causes the interrupt, the The background process scheduling ensures scheduler gains control with interrupts disabled. there is no network performance loss due to a The scheduler saves the return address and state pause between message receipt and message pro­ and dispatches to the "real" interrupt handler of cessing. the Ethernet controller. The interrupt handler performs the following series of actions: Overview of the DNA Archite�ture Table 1 lists the layers of the ISO model for data 1. Analyzes the interrupt communications, along with the corresponding 2. Determines that a message has been DNA layers and the appropriate DECnet-DOS received components within each layer.

110 Digital TecbnicaiJournal No . 3 September 1986 New Products

Table 1 Data Communication Layers

ISO Layer DNA Layer DECnet-DOS Qomponents I I Application User/network management Programming library Job spawner I Network control program (NCP) Network test utility (NTU)

Presentation Network application Network file transfer (NFT) Virtual terminal service (SETHOST) Virtual disk/printer service (NDU) Network mail (MAIL) File access server (FAL)

Session Session control SESSION

Transport End-to-end communication NSP

Network Routing ROUTING I I Data link Data link Asynchronous :DATAI LINK Ethernet DATA LINK

Physical link Physical link Asynchronous controllers Ethernet controllers

Data Link Services After calculating the bytes per second at 9600 baud and the instru�tions per second on Asynchronous Data Link Layer the lower-speed PCs, we r�aiized that very few The Digital Network Architecture standard speci- _ instructions could be exetuted between each fies a protocol providing a reliable data commu­ received character. In this �I ase the advantages. of nications path between two processors over coding in a higher-level !language were out- synchronous and asynchronous serial communi­ weighed by other considerations. After carefully cation lines. This protocol is the Digital Data recoding the interrupt handler for received char­ Communications Message Protocol (DDCMP) . acters in assembler language, we reduced but did The asynchronous data link layer provides not eliminate the character loss. Using debug DDCMP protocol processing and device driver tracing of the interrupt stack, we discovered that support for the asynchronous controllers con­ the PC BIOS code handling the clock interrupts tained.in the PCs. could leave the interrupt; system disabled for We fo und that no existing software could be long periods. Changing this clock interrupt code borrowed for the DDCMP protocol modules. solved the character-loss problem. However, existing DDCMP software programs I from other products were used as models to con­ Ethernet Data Link Layer I struct our own modules. We also had to design The Ethernetdata link layer provides buffer man- and code all device drivers fo r the various asyn­ agement services, transmfts and receives mes­ chronous controllers. At first we were not sure sages, and dispatches the received messages exactly how the asynchronous controller chip based upon their protocol types. The goal for and the interrupt controller chip worked DECnet-DOS included support fo r a number of together. Reading the specifications from the Ethernet controllers. Unfortunately, the code for chip manufacturers along with the documenta­ the device drivers is often the hardest to design tion from the makers of the controller boards and debug. resolved any questions we had. The code we Our search for existing code led us to two sep­ then developed worked properly at lower arate engineering groups within Digital. Both speeds; at 9600 baud, however, we found that groups had already written ldevice drivers for PC­ characters were being lost during reception. based Ethernet controllers.! We decided to use a

Digital Techntcal]ournal Ill No. 3 September 1986 The DECnet-DOS Sy stem

data link layer for buffer management and were solved by using the algorithms from other received-message dispatching that was common implementations. to these drivers. We also borrowed several other device drivers, all having consistent calling Session Layer Services sequences. As a result, our team had to write the The session layer, where user requests are code for only one device driver; the code for the checked and dispatched for processing, is other device drivers and the data link layer was highly dependent on the type of operating sys­ provided to us complete and partially tested. tem. The single-task environment of the MS-DOS system provides no process context identifica­ Network Layer Services (Routing) tion or integrity assurance for the user. As a The modules that perform routing functions result, we could not use the traditional design for have well defined inputs and outputs that are a DECnet session layer, in which logical link almost entirely independent of the operating sys­ ownership is known by a process code assigned tem type. Messages arriving from the NSP layer by the system. must be passed to lower layers, depending upon To design a new session layer, we chose to information stored in the routing database. Mes­ make the logical links into system-wide entities sages also arrive from the data link layer and must and retain all information about those links in be passed to higher layers, depending upon the background network process. In that way the information stored in the routing database. identifiers for logical links would be unique Since the code is relatively independent of the across the entire system. This solution ensures system, we were able to use the routing modules the integrity of the logical link database even if a from the DECnet-ULTRIX system with very few user program creates a logical link and then changes. exits. One side effect of this design is that an application can create a logical link, exit, and Transport Layer Services (NSP) then be run again later to access the existing log­ The NSP layer is the one in which logical links ical link. This effect allows the SETIIOST appli­ are created, maintained, and destroyed. Link cation to interrupt a virtual terminal session, maintenance includes all the timing and retrans­ return the user to the MS-DOS system to perform mission of messages necessary to maintain logi­ local tasks, and then resume the terminal session cal link integrity. NSP also segments large user later in the same state in which the session was buffers into smaller network buffers and ensures interrupted. that they are reassembled correctly. The actual session interface that we provided We studied the feasibility of porting the NSP was modeled after the UNIX 4.2BSD network modules from the DECnet-ULTRIX system to the socket interface as implemented in the DECnet­ DECnet-DOS system. However, the differences in ULTRIX system. Since the MS-DOS code is similar memory management and process scheduling to the UNIX code, we felt the interface was the made the conversion appear too costly. There­ most appropriate model to use. Unfortunately, fore, we re'Yrote the modules in the NSP layer of we could not use any of the UNIX code in this the DECnet-ULTRIX software, but retained the area, so we had to write our own implementa­ same names and functions. tion. By providing the same interface, however, This code was the most difficult to develop. we could easily share network application pro­ Manipulating dynamic memory, buffers, and tim­ grams with the DECnet-ULTRIX project. ing for multiple internal tasks is very specific to the operating system. Code from other imple­ Presentation Layer Services mentations to perform these functions was not veryhelpf ul. Ne twork File Transfe r In addition, our initial device drivers for asyn­ The network fi le transfer utility (NFf) provides chronous and Ethernet connections often los't file access services between the PC and remote characters or messages. Since NSP maintains the nodes. NFf supports a number of activities. integrity of each logical link, the retransmission • Files can be copied in both directions. algorithms had to be complete and correct very early for both low- and high-speed failures. • Remote files can be displayed on the PC's These difficult problems, like many others, console.

112 Dtgital Te cbntcaljournal No. 3 September 1986 New Products

• Listings of remote directories can also be products to provide remote terminal support: displayed. CTERM and LAT . CTERM is layered onto the DECnet software and provides remote terminal • Remote files can be deleted. support to any node in a DECnet network. The • Remote files can be queued to be printed or LATprotocol is independent of the DECnet soft­ executed at the remote node. ware and provides remote t�rminal support only among the nodes on a single Ethernet. The filesto be accessed can be specified using I We constructed the SET�OST utility entirely wild cards and file lists. out of existing code, the common parser being In addition to these services, NFf is responsi­ used to process SETHOST c'ommands. The soft­ ble for reformatting data if it is copied to or from ware to emulate the VT l 00 terminal was a remote system with a different file system for­ obtained from another engineering group within mat. To be a DECnet implementation, NFfhad to pass strict certification tests that ensured its com­ Digital . The handling code for the CTERM proto­ patibility with all other DECnet implementa­ col was ported from the DECnet-ULTRIX tions. Passing those tests was our single biggest software with very few changes. The han­ hurdle in this area. dling code fo r the LAT protocol was obtained The project team again decided to use existing from still another engineering group. All net­ work 1/0 is done using the�. programming inter- designs and code for NFT, the common parser · and common message processer being used to face library. parse NFf commands. We wrote the data format­ ting and protocol modules using DECnet-RSX Virtual Disk and Printer NFfas a guide. Network 1/0 was done using the Virtual device services support disk and printer programming interface library, which provides devices that are located at remote nodes yet programmers with network access. appear to be local to an application program. Using the NFf implementation in the DECnet­ Our goal was to provide this:service in a transpar­ RSX software proved to be a wise idea. That ent manner so that no chanies had to be made to implementation had been in use for many application programs or to the MS-DOS system. years; therefore, its algorithms and design Our final design for these services was quite were well tested. The DECnet-DOS NFT imp­ simple. The services are provided by two compo­ lementation was so successful that it was one of nents: the network device utility (NDU), and the the first applications to run in house during virtual device driver. The NDU accepts com­ development. mands from the user to establish either a virtual The file access listener (FAL) provides the disk volume or a virtual printer device at a same services as NFf but runs on the PC to give remote node. The logical ;link is made to the other network nodes access to the PC files. The remote system by NDU, and/the logical link ID is FAL utility was begun very late in the project. As passed to the virtual deviqe driver resident in a result we were able to port the completed DEC­ memory.This device driver �s written to conform net-ULTRIX FAL to the MS-DOS system, a task to the standards for MS-DOS device drivers. It completed in under two weeks. This gave the loads at system boot time by the standard MS­ project team enough time to add a number of DOS-loadable driver technique and accepts stan­ attractive optional fe atures, such as supporting dard device 1/0 requests from the MS-DOS fi le simultaneous multiple connections. system. The driver then executes these requests by performing the equivale.nt data access proto­ Virtual Terminal Service col sequences on the logical link established by The SETHOST utility allows the keyboard and NDU. '! screen of a PC to emulate a Vfl00 terminal con­ The data access protocol 'chosen was the same nected directly to a remote DECnet node. To do one used for file transfer. This choice allowed us that, this utility must provide not only emulation to use existing DECnet implementations on support for the keyboard and screen but also pro­ larger systems for the virtual device support. tocol-handling support for the remote terminal Thus the need to design and implement a spe­ protocol used to communicate with the node. cialized protocol for a numb�r of different oper­ Two protocols are currently used on Digital's ating systems was eliminated.

Digital Te cbnicaljournal 113 No . 3 September 1986 Tbe DECnet-DOS Sy stem

NDU and the virtual device drivers were built this restriction, our PC mail service automati­ from three subcomponents: cally tells the receiver of mail fro m the PC which large system should be sent the replies. • A common parser and common message pro­ cessor (described in the next section) Application Layer Services • A small library of subroutines that create remote files and open them using the data Application Program Command access protocol (These subroutines were Parsing taken directly from NFT.) The DECnet-DOS product would contain as many as ten different application programs, including • The programming interface library one each fo r file access, virtual terminal support, virtual device support, and mail services; and Ne twork Mail two fo r network management. Our requirements Digital provides network-wide mail services fo r a called for common, easily translatable messages number of its systems. These utilities allow users and common command parsing, inCluding char­ to compose messages directly within the mail acter delete, line and character recall, and abbre­ utility or to use an editor or pre-existing text file viation. These standards were important because as the source . Text can be sent to one or more it would be very costly to have ten diffe rent users on a multitude of systems, even using a dis­ engineers coding in ten different ways . Such an tribution list fro m another text file . approach, without standards, could have intro­ The DECnet-DOS mail utility was completed in duced differences in the applications, which a short time by combining the common parser would be difficult for the end user to learn and with the mail utility from the DECnet-ULTRIX remember. software. The unique requirements of a small To avoid this problem we designed and imple­ system did present some problems. In a DECnet mented a common parser and a common message network, node addressing is done by numeric and help processor. The common parser is addresses. However, users often prefer to use driven by a parsing command file that supports names, which can be more easily remembered. abbreviation, character delete, and line recall. To facilitate that use, each node maintains a data­ The common message processor is also driven by base that maps names to addresses. Now, such a message fi le and performs fast look-ups of text a database for a large network would be too strings and blocks. This design greatly reduced large for a small PC to keep on disk or search the time to code and debug the utility programs quickly. Yet the PC users may want to send mail and made their translation to fo reign languages to other users anywhere in such a network. For fa irly straightforward. example, Digital has an in-house network with over 1 0, 000 nodes, any one of which can be Network Ma nagement addressed by any other. To solve this problem we The network management architecture in the implemented the mail-forwarding feature of the DNA architecture re quires access to current mail protocol. That feature allows a PC user to parameters, ·counters, and statistics, all kept in send a message to an unknown node by asking a the volatile database. It also needs the parame­ known node to forward the message. Thus the PC ters, kept in the permanent database, that will be database keeps a much smaller list of known in effect the next time the network software is nodes, which is considerably more manageable. started. Data in both databases should be accessi­ The project team originally had a requirement ble from either the local node or a remote node. for receipt and storage of network mail on the However, the single-threaded restriction that local disks of the PC. Our investigations showed; the MS-DOS system imposes on file 1/0 makes however, that the engineering cost of developing access to the network management databases a background task that could record the received very difficult. This restriction not only affects the mail on disk was very high. mail service, as explained earlier, but prevents Since 1/0 in the MS-DOS system is single DECnet-DOS from providing remote access to threaded, the network software, running in the local network management databases. It also pre­ background, cannot perform 1/0 while an appli­ vents the background network software from cation program is performing it. To overcome accessing disk files. Thus the project team felt

114 Digital TecbnicalJournal ·No. 3 September 1986 New Products

that the cost of providing remote access to local applications and services. DECnet-DOS provides network management data was too high in terms an assembler language interface, a C program­ of resident memory usage and design complex­ ming subroutine library, and transparent MS-DOS ity. Therefore, the DNA architects granted an file access services. exception to the DNA requirement that remote An assembler languag� programmer can nodes have access to counters and parameters perform basic task-to-task �erI vices by filling a local to PCs. data structure with control information, then Solving the problem of database access by the issuing a software interrupt that is serviced by background network task, however, was essen­ the DECnet process resident in memory. Such tial to the success of our project. Therefore, we services include logical link creation and adopted the following design. On its first run, destruction, message transmission and reception, the network software performs as a foreground and status and control . task. As such the software first performs initial­ A C language programmer can access the same ization, then shifts to the background. During services through a subroutine library. We chose initialization, the network software, running in to make this interface c9mpatible with the the foreground, can safely perform disk 1/0. At DECnet-ULTRIX programmer interface. This this time it reads the permanent database, which choice decreased the development costs of a establishes the parameters necessary for running number of our application modules by making the network, such as buffer counts and sizes. The the DECnet-ULTRIX applications more portable. volatile database is kept in memory in the back­ It also allowed the project team to begin to ground network task. The network control pro­ develop MS-DOS-specific applications under the gram (NCP) queries the background network ULTRIX system. Our customers would benefit task when access to the volatile database is from the same advantages. i required. Similarly, NCP performs disk 1/0 to Using these assembler laq.guage or C services, the permanent database when its access is however, requires a good uhderstanding of logi­ required. cal link management and networking concepts. Node name-to-address translations, usually Many applications need only one simple connec­ performed by resident DECnet code, were tion· to access a file or program on another sys­ assigned to the application layer in the DECnet­ tem. Unfortunately, existing applications often

DOS software. This shift overcame the back­ cannot be easily rewritten ·to take advantage of ground disk 1/0 restriction. Also, even though network calls. remote nodes cannot access our local DECnet­ To solve this problem, we designed services DOS databases, they can perform loopback tests for transparent network aJcess. These services to the DECnet-DOS node. For Ethernet configura­ make network access possible for the thousands tions, remote nodes can query the data link Jayer of PC applications already written and make the for identification information and data link error development of new network applications eas­ and traffic counters. ier. Two tasks, one for remote file access and one NCP and the network test utility (NTU) were for remote program communication, are first written using the common parsing package and loaded into memory. These tasks then take con­ the programming interface library. The aCtion trol of the software interrupt used by all pro­ routines performing network management and grams for services offered by the MS-DOS system. testing could not be ported from other imple­ Each interrupt made by !an applications pro­ mentations because of the MS-DOS restrictions. gram requesting an MS-DOS service is examined. As a result, creating the routines to perform net­ If the request is a file OPEN or CREATE , the file work management consumed a large part of our specification is also examined. File specifica­ development effort. tions that begin with a double backslash are not valid MS-DOS fi le names and instead signal a Ne twork Programming Services request for network servic�s. All other MS-DOS

All DECnet implementations make the basic service requests are passedI on to the operating program-to-program communication services system for processing. The '!'intercepted" service available to programmers through some sort of requests are processed by :the memory-resident or subroutine library. That allows tasks that mimic the actions of the MS-DOS sys­ programmers to develop their own network tem. Thus an application to display a file on the

Digital Te cbnical]ournal 115 No . 3 Sep tember ·J986 Th e DECnet-DOS Sy stem

console can access either a local file with the General References specification "local.fil" or a remote fi le at DECnet Digital Ne twork Architecture (Phase another node. IV) General Description (Maynard: Digital Although a wide range of programmer services Equipment Corporation, Order No. AA-N 149A­ was offered to single application programs, the TC, 1982). single-tasking nature of the MS-DOS system made it difficult to run a PC offering multiple services. J. Forecast, J. Jackson, and J. Schriesheim, Digital Te ch­ To solve this problem, we added a service to the "The DECnet-ULTRIX Software," nical jo urnal session layer . That service allows a single appli­ (September 1986, this issue): cation to receive a request for any other service . 100-107. Using this capability, we wrote a program called J. Forecast, J. Jackson, and J. Schriesheim, "Com­ the job spawner, which receives all requests for munications Executive Implements Computer service. For each request, the job spawner Networks," Computer Design (November accesses a database for the name of the applica­ 1980): 71-75. tion that must be run to service that particular request. Upon finding the application, the job S. Leffler, W. Joy, and R. Fabry, "A 4.2BSD Inter­ spawner runs it to completion and then waits fo r process Communication Primer," University of the next request. CaliforniaTechnical Report, Berkeley, California (1983) . Summary S. Leffler, W. Joy, and R. Fabry, "4.2BSD Net­ This implementation . of DECnet-DOS provides a working Implementation Notes ," University of wide range of reliable communications services California Technical Report, Berkeley, California between personal computers and larger systems. (1983) . The development of this software was successfu l and completed close to schedule, made possible _ by

• Strict adherence to a proven communications architecture

• Porting existing designs, algorithms, and code fr om other software projects

• Strict, independent certification and perfor- mance test procedures

An adherence to company-wide architectures · also ensures that future communication tech­ nologies, both hardware and software, can be easily integrated with the existing DECnet architecture.

116 Digital TecbnicalJournal No . 3 September 1986 Nancy R. La PeUe jMarkJ. Seger b:fark W. Sy lor I

The Evolution of Ne twork Alanag�ent Produc�

The management of data networks bas evolved at Digital since 1978, although the managementof voice networks bas been a more recent phe· nomenon. Digital's first data network management products m�naged networks of DECnet nodes. Our capabilities now include the management of diverse data network components by meansof several different prod· ucts described in this paper. The integration of these data network man· agement capabilities through a common architecture, user interfaces, management databases, and protocols is a major short-term g�al. The integration of voice and data network management is a much longer­ term goal. The voice managementproduct presented here wiU be 'part of thatfu ture integration.

The size and complexity of networks have been across all DECnet produCts.1 That program growing at an accelerating rate. For example, would allow a network mahager to control the over the last ten years the size of Digital's inter­ operation and configuration of the network by nal network has grown from a few communicat­ manipulating operational parameters. The pro­ ing systems to over 10,000 computer nodes, dis­ gram would also allow him to observe how well tributed throughout 250 sites in 37 countries. the network operated by providing current status We are currently adding about a hundred new information, and network traffic and error data. systems per week to this private network. This I rapid growth has led to a need for more-sophisti­ Basic Management Capabilities fo r the cated network management capabilities to con­ Wh ole Network trol such networks. This paper describes the As other architectures and protocols emerged, changing needs of network management, how new products, such as X.2,5 and SNA gateways, Digital's products and capabilities have evolved and local area network (LAN) bridges, needed to meet those needs, and some directions for the same capabilities to b� managed as did the future evolution. DECnet products. This requirement brought Traditionally, networks have come in two cate­ about the fi rst major evolutionary trend in net­ gories: data and voice. Digital supports many work management. It became clear that DNA had data network architectures, including BISYNCH, to be extended to accommodate the management SNA; X.25, Ethernet, and OSI. However, the of connections to these non-DECnet products. primary one supported is the Digital Network Ii tO Architecture, or DNA, which defines our DECnet Adding Intelligence · Management products. Functio ns The second evolutionarytrend was driven by the The Network Evolution increased size and complexity of DECnet net­ Some basic network management capabilities works and the difficulty in finding qualified peo­ were added to the DNA architecture early in its ple to manage them and the: systems within them. evolution (1978) . These capabilities included This trend led to the need for more-intelligent the manual on-line observation and control of network management functionsto support a cen­ both local and remote network nodes. Included tralized staff dedicated to managing the network. in the DNA design was a network control pro­ To partially meet this need, we developed a pro­ gram that was to be implemented consistently duct that automates the rponitoring of traffic I I Digital TechnicalJo urnal 117 No . 3 September 1986 The Evolution of Ne twork Management Products

and other events in the network. This product other node, subject only to access control. Each also contains many event evaluation functions, node also provided access to its own manage­ which produce statistics made available through ment fu nctions and data for its own local users. both interactive and hard-copy reports. A common user-interface program, called the Other products have also added intelligence to network control program (NCP) , is imple­ the basic network control capabilities of the mented across all DECnet products. This pro­ early DNA architecture . These products perform gram is not simply a remote console interface to IAN testing and diagnosis, and X.25 accounting network products. In addition, it allows remote and enhanced protocol-tracing. access to network management functions via a published, proprietary protocol. Vo ice Ne twork Integra tion The character of Digital 's networks has An evolutionary trend similar to that in data net­ changed significantly over the last ten years. works was also happening in voice networks . They now have components that are neither Vo ice network users were becoming more DECnet nodes nor peers, such as IAN bridges, sophisticated, requesting capabilities similar to gateways, and other servers of various kinds. The those seen in data networks. For example, like approach we used to integrate the management their data network counterparts, voice network of these products was based on the DNA manage­ managers need the ability to control, optimize, ment strategy. That is, the level of function and configure, and monitor the network. In addition the general presentation to the user were similar to collecting management data, users also to those originally in DNA network management. requested its processing to provide management A number of initial strategies were tried to information. extend the architecture in various ways to At the present time, telecommunications net­ include the management of these products. work management has evolved beyond the scope For X.25 connections, our initial strategy was of the DNA design. Because of this rapid advance, to extend the arch itecture by adding X.25- product strategies have been adopted fo r tele­ specific capabilities. Unfortunately, this strategy communications management that identify a tightly coupled the management of the X.25 number of directions data network management product to that of the DECnet product by adding products could pursue in the future . Specifi­ X. 25 management to NCP. That coupling made cally, we are expanding our management archi­ upgrading the management implementation for tecture to allow fo r the inclusion of additional either product more difficult since it created network components. We also see the need to interdependencies between product develop­ integrate management user interfaces and infor­ ment schedules. We have found these interde­ mation with other network applications. This pendencies to be difficult to manage with only integration will support all the business-data and these two products. This strategy would be com­ resource-management needs of users. pletely unworkable if we pursued it fo r the man­ The remainder of this paper covers the evolu­ agement of all the different network products. tion of network management in more detail as it Nothing would come to market if we had to coor­ relates to the development of specific manage­ dinate the development and release schedules ment products. The fo llowing section discusses for dozens of interrelated items. data network management as a distributed For SNA gateways, our temporary strategy was application that provides operational control to derive a parallel management architecture for and observation of a variety of data network SNA, based on the DNA management capabilities. products. Wh ile decoupling the two architectures and implementations, this strategy only postponed Management as a Distributed the integration of network management for SNA Application gateways. It also resulted in multiple manage­ Network management within Digital Equipment ment protocols and user interfaces, although the Corporation began as the management of net­ parallels between these were obvious. We could works of DECnet nodes. These networks con­ readily see that this approach would not work sisted of peer computer nodes with peer manage­ well in the long term as the number of products ment capabilities; that is, each node had remote to be managed increased. We also knew that inte­ access to the management capabilities of every gration of network management for all our net-

118 Digital TecbnicalJournal No . 3 September 1986 New Products

work products was very desirable to customers be centralized, in which one management station and would allow us to eliminate duplication on provides access to the entire network; dis­ development projects. tributed to a few management stations; or fully For IAN bridges, our strategy was to use a non­ distributed to all nodes, as ·access is in DECnet proprietary protocol based on the evo lving IEEE NCP. In any case remote accrss fr om one node to 80 2 .1 management protocol. This strategy was the management functions ahd data from another the first step in an evolution toward the adoption node necessitates designin and implementing �t of emerging international standards fo r network the management applica 1 ion itself as a dis- management. 2 Since the development of manage­ tributed function. The architecture for this dis- ment protocol standards had q.ot been com­ tributed management must specify a set of pleted, using such protocols amounted to distributed software and ,database elements working with prototypes. However, the need that will be needed to manage diverse network to evolve the strategy in this direction was components. clear. NCP and other similar products developed for Distributed Manage ent Architecture SNA gateway and IAN bridge management pro­ and Application Elem�mtrrt, s vide the network manager with various simple The basic elements that must be included in a fu nctions. Among these are configuration con­ distributed management architecture and in the trol , low-level testing, and snapshot views of applications developed within it are state info rmation and various traffic and error • counters. A user interface While integrating these products, both the • A management agent in e�ch component to be architecture and the subsequent implementation managed, providing remote access to its man- strategy must be enhanced to allow the indepen­ • I agement funcuons and dataI dent development of management capabilities I • for these diverse components. We are currently A communications mechanism between the working on such enhancements, as discussed in node running the user irtterface and the agent the section "Future Developments." software modules in the network components Other factors were also highlighted while we being managed developed management capabilities for bridges, • A management database · . gateways, and servers. These more recent addi­ I • tions to our network product set do not always A set of simple manag�ment fu nctions on provide local access to their own management which more intelligent functions can be capabilities or allow the initiation of manage­ layered ! ment access to other network components. Many Figure 1 illustrates the relationships between have no locitlly connected device to support a these elements. local management user interface. To allow net­ work management for these components, some Th e Us er Interfa ce type of remote management access from another The user interface and software to invoke man­ station that does provide a management user agement functions genedlly reside on one or interface is essential. more management stations i (more if distributed, Remote access is also the key for managing as with NCP; or one if centralized). The user DECnet nodes. Without this access, no manager interface is the key to developing an integrated could see more than his own local node informa­ management application from a network man­ tion, which is not suffi cient to diagnose and ager's perspective. In developing new network solve overall network problems. Without remote products, we have extended the command lan­ access, service personnel would have to be sent guage syntax developed for DECnet management to each node location that was experiencing the to X.25 connections and SNA gateway products, problem in order to collect management infor­ as well as to bridge and se�er products currently I mation. This problem results in networks that are under development. Thus 1a common command much more expensive to service. style, resembling English s�ntences as closely as Remote access to the management application possible, has been used to manage our expand­ can be provided in a number of ways. Access can ing set of network products.

Digital Te cbnlcaiJournnl 119 No. 3 September 1986 Th e Evolution of Ne twork Management Products

COMMUNICATION OF MANAGEMENT FUNCTIONS, DATA NETWORK MANAGEMENT STATION OVER NETWORK COMPONENTS

PROTOCOL

MANAGEMENT AGENT NODE / DATABASE

USER MANAGEMENT MANAGEMENT COMMAND GATEWAY I-- INTERFACE FUNCTIONS AGENT DATABASE

.

. I I .

MANAGEMENT AGENT BRIDGE DATABASE

l _../ r'-- MANAGEMENT DATABASE

Fig ure 1 Distributed Ma nagement Applicatio n Elements

A unified user interface giving management remotely. These agents perform the management access to all these network products from a sin­ actions issued by users by way of the user inter­ gle program would have been extremely desir­ face. The agents also maintain the operational able. Such an interface, however, could not be management data needed to support the manage- provided for the first release of all of them. A uni­ - ·ment application and provide remote access to fi ed interface would have to allow fo r the identi­ this data maintained by the component itself. fication of all products (DECnet, X.25, SNA, An agent must be addressable across the net­ bridges, etc.) to which the desired management work and must respond to a set of fu nction functionwou ld apply. The architecture had not requests that provide adequate management yet addressed this problem of selecting among capabilities for each particular type of compo­ multiple products to be managed, although it nent. Some components (e .g., hardware commu­ had been extended for X.25 connections. Thus nications controllers connecting computers to the X.25 connections can be managed via the network media) may have extremely simple standard NCP interface. However, the SNA gate­ agents. Complex programmable network compo­ way has its own SNA control program bundled nents, like bridges and servers, have agents that into the SNA products; and LAN bridges have may respond to many more functions.3 Thus their own management software, the remote these components may have a lot of the dis­ bridge management software (RBMS) , sold as a tributed management application residing in the separate product. The network architecture is software or firmware of the agent. currently being extended to incorporate this task Consistency of implementation of manage­ of selecting among multiple products. ment function across agents of diverse types is extremely important. 4 If the implementation of Ma nagement Agents fu nctions in the agents of different network NetWork components must contain management products is inconsistent, · then these products agents before the components can be managed will not provide a uniform set of fu nctions to the

120 Digital Tecbntcal]ournal No. 3 September 1986 New Products

users. Furthermore, the same management func­ aged components themselve , including compo­ tion might result in different actions for different nent identification, state infb�rmation, traffic and network products. Inconsistency of this type error counts, and other operational parameters. A results in user confusion and dissatisfaction. permanent copy of the operational parameters must be maintained in nonvolatile storage to Management Pro tocols recover from human, computer, and network A management protocol is the vehicle for com­ fa ilures. This copy could exist either in non­ municating management functions and data volatile RAM in the compom�nt, on local external between the user interface and the management storage, or on a file at the l:emote management ' agent. The NICE protocol was developed for this station. If the copy exists at the management sta­ purpose within standard DECnet components. tion, then the component must depend on that NICE was later extended to include X.25 gate­ station for correct operation unless the data is way management and was also used as the base also stored locally. for adding extensions to manage DECnetjSNA Other data files that might be contained in a gateways. As mentioned earlier, for non-DECnet management database inclure the fo llowing: servers and bridges, we are tracking the evolu­ • Loadable software image 'files for both opera­ tion of a nonproprietary standard management tional and diagnostic im�ges associated with protocol. particular network components The communication mechanism between the user interface and the management agent must • Event log files in which notifications of signif­ icant events for network :components are col· also provide reliable transmission and end-to-end ' integrity of management fu nction requests and lected I management data. These requirements have been • Reference data files that,] though not essential satisfied by the data link and end communication for component operation, give information (OSI transport) layers of the Digital Network relevant to the physical identification, loca­ Architecture for DECnet links. These layers have tion, and servicing of network components also been used in the management of the gate­ ways between DECnet and X.25 or SNA networks • Management directory since the layers are available on all components The management directqry contains informa­ implementing the DECnet standards. tion needed to identify and locate data relevant For LAN bridges and some other non-DECnet to a particular component and to gain access to server implementations, other transport mecha­ netwOrk addresses fo r the components them­ nisms have been used with the management pro­ selves. Those addresses are needed for the pur­ tocol. In the case of bridges, no transport mecha" pose of remote management access. nism is implemented at the bridge; simple To provide reliable access to essential direc­ datagram-based management is used. For bridge tory information, the directory data must either management, the node running RBMS must be part of a network-wide :directory (or naming assume total responsibility for the reliable trans­ service) or be kept at the anagement station of 1 mission and integrity of management messages the component. The otherd. files, however, could on the LAN . The component being managed reside at the management station or on external assumes no mutual responsibility for these mes­ storage at the component, if such storage exists. sages. Based on product constraints, the respon­ (It does not exist for many components · like sibility fo r the reliable communication of man­ bridges, gateways, and some servers.) The direc­ agement information can thus be distributed in a tory can be used to access idata no matter where variety of ways . Therefore, we must extend the it resides. The directory !allows management network management architecture to specify a data distribution trade-off to be made to meet i ' standard solution to this problem for non­ the needs of the network� • products themselves DECnet products. as we ll as those of the management station software. Ma nagement Database The database needed to support the management Ma nagement Fu nctions application can be distributed in a number of Simple management fu nctions are network con­ I ways . Some data must be maintained by the man- trol functions involving interactions with net- 1 t

Digital Technicaljournal 121 No . 3 September 1986 The Evolution of Ne twork Management Products work components in which few or no analyses products to be managed by one or a few manage­ are made of the data or test results obtained. ment stations. Therefore, an expensive manage­ These interactions include ment station would be preferable to many expen­ sive network products. One could make the • Collecting or modifying management data decision to put much of the complexity into the maintained by components management station software if the product • Requesting management actions to occur at requirements warrant this move; a more intelli­ components (enable, disable, test, etc.) gent agent generally results in less traffic. How­ ever, such an agent may also require more over­ • Loading operational code or parameters into head in the component, thus affecting the components performance of the component itself. • Collecting notifications of significant events The amount of the management application to from components be distributed to the management agent also affects the design of intelligent management Access security to these operations is provided fu nctions in the management station software. through a database of authorized network man­ For example, an agent able to log events at set­ agers and their access rights. In this way different table time intervals automatically provides the levels of access rights can be granted to different data needed for analyses by intelligent manage­ usersfor certain fu nctions (e.g., privileges might ment functions. An agent not providing these be granted to collect management data but not functions must be polled for this information by modify it from a remote node) . that software, which means more code in the We are currently extending the four simple management station software and more traffi c on network-control functions across our network the network. The fo llowing sections describe product set. More-intelligent management fu nc­ tions are being devised to automate the collec­ two products that provide some of these more tion of management information and to provide intelligent management functions in the manage· analyses of data and test results. Such analyses ment station software. would yield information about network perfor­ mance, fa ult management, and accounting. The DECnet Monitor These more intelligent functions are being intro­ Digital's monitoring product automates the col­ duced in items such as the DECnet monitor and lection and analysis of network management PfFM products, described later, and ETHERnim, data. This section discusses how the monitor the LAN testing and diagnosis software. The evolved to include automated fu nctions and selection of simple and intelligent management enhanced management databases, management fu nctions to be performed must be accommo­ protocols, and user interfaces. dated by an integrated user interfa ce, just as that interface must allow for the selection of compo­ Evolution of the Monitor nents to be managed. The evolution in Digital's How to distribute the management responsibili­ network products toward this level of integration ties is a key question in developing a network has just begun. management strategy. One answer is to distribute this responsibility to the manager at each compo­ Amount of the Management nent. That was done with DECnet network Application to Be Distributed management, as we described earlier. Another Certain distribution criteria affect the design of answer is to centralize network management network components relative to. their manage­ responsibility at one or a few nodes . ment capabilities.5 These criteria concern com­ By 1979, Digital's internal network was large ponent pricing, component performance, and enough to require the centralization of its man­ network performance. Clearly, the cost of devel­ agement tasks. The group responsible for central oping sophisticated, intelligent management management developed a tool to monitor agents, such as those providing complex the network. This tool, called Observer, was threshold and statistical analyses, will be released in December 1982 and ran ona PDP- 1 1 reflected in the price of the network product. system under the RSX- 11M software. Eventually Customers will normally purchase many network the size of the network and the subsequent mon-

122 Digital TecbnicalJournal No. 3 September 1986 New Products

; i itoring activities outgrew the capabilities of this Short, Medium, and L�ng Te rm system. The lack of memory address space and Problems : CPU speed. prevented adding new features, and Network managers have short-, medium-, and the forms-based user interface became quite long-term needs for data and for analyses of that cumbersome . Thus a new monitor develop­ data. Over a short time period (hours or min­ ment project was initiated, which culminated utes) , the network manager is concerned with in the DECoct monitor, based on the VAXjVMS detecting and solving critical network problems 1 software .6 or fa ilures. Such fa ilures might cause the com- ' The DECoct monitor provides the capability pany's network application Ito be unavailable to to centralize network management, automating support its business needs fo r an indeterminate many of the intelligent monitoring fu nctions amount of time. To detect problems, the network · discussed in the last section. It also enhances manager needs intelligent analyses of the most the databases, protocols, and user interfaces recent state of the network.. provided by NCP, thus allowing the user to Network problems often:arise from multiple monitor the state of his network more effec­ component failures. For ex�mple, in a network. tively. I that makes use of alternate ]paths and automatic Centralized Ma nagement fail-over (such as provided' by the DECoct soft­ ware) , the fa ilure of two or more circuits In centralized management, a centrally located can partition the. network. Tb is fa ilure could also organization (for example, corporate head­ disrupt a business application that depends on quarters) assumes responsibility for the man­ the network as a resource. Such a partition will agement of key parts of the network. A central not occur if only one failure takes place . On group usually has the expertise and resources I to deal with the more difficult problems of the other hand, a single fa qure can be detected network management, which require people in several locations. For e*mple, if a point-to­ whose skills are scarce and expensive. Thus point communication line fails, that failure these people are more effective as a cen­ will be recorded at each end of the circuit. Thus tral, and therefore shareable, resource. Their the network manager needs coordinated informa­ scarcity means that they need easy-to-use tools tion fro m all points noting the fa ilure if he or to make them productive . There are many she is not to be confused by multiple indications aspects to ease-of-use; for the monitor, it means of the same problem. Th� monitor evaluates providing users wi th information in a form these communications failures, sorting out that is easy to understand. Of course, easy-to­ duplicate indications and ignoring redundant use programs often require more computer information. resources (disk space , memory, CPU time) , Error and traffic statistics are a key means of but the overall cost of network management detecting problems in the network. Many of can be lowered if the tools make people more these statistics come from the counters built into efficient. the DECoct software. Single counter readings, however, are not immediat�ly meaningful. To be Centralized Database useful, counters must be �ampled at the begin­ Network problems typically span many nodes. A ning and end of an interval, and their difference system manager at a component may see the can then be used to compute important statistics. symptoms of a problem but not have the informa­ For example, dividing the difference in counter tion needed to fully understand it. Only the net­ readings by the length of ,the time period will work manager can access all the information yield the rate (such as traffic or error rate) per needed to diagnose and solve problems that interval. The DECoct mohitor computes both affe ct the whole network. To do this diagnosis these statistics as well as niany others. from one location, the network manager must In a medium time period (hours or days) , the gather and store data in a historical database at network manager must note developing trends his central site . This centralized database is an that may predict incipient problems that might immensely valuable resource in managing a net­ be prevented. Certainly, increasing error rates or work, operating even when some of the systems traffic levels could signal developing problems. being managed are not. The DECoct monitor provides displays to signal I

Digital Tec bnlcaljournal 123 No . 3 September 1986 Th e Evolution of Ne twork Ma nagement Products such trends, for example, an on-line histogram of logical map) . Yet the command syntax and pre­ traffic or errors over the past week for a specified sentation of information for the more basic capa­ line. bilities have been derived fr om those in the Over a long time period (weeks or months) , original DNA network control program. the network manager is concernedwith planning Network managers have different styles of issues, such as how to configure the network so using monitoring. The DECnet monitor supports that its performance and reliability meet the three usage styles: batch, interactive, and alarm. users' needs. Information about the current net­ In batch usage, reports are automatically gen­ work, its performance and reliability, and the erated at set intervals. This style, oriented to workload presented to it is invaluable. This medium- and long-term planning needs, is used information is all available fr om the DECnet most often to produce summaries of network monitor via traffi c and error reports over speci­ management information. fied long-term intervals for specific network Interactive usage is driven by user commands. components. The DECnet monitor's interactive user interface adds many new commands and dynamically Function Distribution updated displays to the static displays available A key to the design of the DECnet monitor was to from NCP. These capabilities provide automated balance the functions that could be distributed monitoring capabilities on line. With these most advantageously with those that could I?e added fu nctions, it was impossible to keep the centralized. The advantages of centralized infor­ monitor's syntax identical to that of NCP's; how­ mation were described earlier. However, the ever, there is a definite family resemblance. monitor's design had to incorporate the flexibil­ Interactive usage is oriented toward short- and ity and other advantages of the existing dis­ medium-term management needs. tributed network management features built into In alarm usage, the monitor initiates a signal to the DECnet software. indicate a problem to the user (for example, As discussed earlier, each DECnet system has a turning red the symbol of a system on a topology management agent that maintains data about that map) . Alarms are oriented toward short-term system and makes it available to a network man­ problem solving. ager at a remote location. That composite body of data must be collected, analyzed, and stored in Design of the DECnet Monitor some central database, after which it can be dis­ The most important decision we made in design­ tributed to the system managers. The DECnet ing the DECnet monitor was to limit the product monitor gathers this data at user-specified inter­ to monitoring; we did not attempt to control the vals using the standard DNA management proto­ network as well. Network managers really col, NICE, originally used only by NCP. The data needed more help in monitoring, since NCP's collected concerns counter values, and status capabilities in this area were primitive, although and operational parameters for the lines, cir­ its control capabilities were perfectly sufficient. cuits, and nodes in the network, including those Including both monitoring and control was sim­ for DECnet, X.25, and SNA connections. ply too much to attempt in a single development The central database supports remote shared effort. access fr om users. The DECnet monitor provides With these general requirements in mind, the each network manager with a separate presenta­ DECnet monitor was designed to have the fol­ tion interface to the shared data. Data distribu­ lowing five functions: tion is accomplished via an enhanced manage­ • Collect management data from the network ment protocol, an extension to NICE needed to components support the additional functionsprovided by the DECnet monitor. • Store the management data in a central location Us age Styles • Distribute data to users Ease of use through human engineering was a major goal of the DECnet monitor, reflected in • Evaluate the collected data into meaningful the graphical presentation of information that is information (statistics, configuration descrip­ difficult to express in other ways (e .g., the topo- tion, etc.)

124 Digital Te chnicalJo urnal No . 3 September 1986 New Products

• Present that information to users in easy-to-use This early product was intended to be a stand­ formats (color graphics, topology maps, alone, inexpensive, and easy-to-operate product. histograms, tables, lists, fo rms , and batch The processors initially chosen were the PDP- 1 1 reports) family. This choice gave end users the ability to . choose independently from; a variety of proces­ Many of the lessons learned from designing sors, disks, and memory configurations. Thus management software , such as the DECnet moni­ they could tailor their systems to fit the require- tor for data networks, also apply to voice net­ ' ments imposed by their data vo lumes. work management. To further extend this pr duct's functionality, we added Digital's databasdq query language and Te lecommunications Network Management report writer, the DATATRIEVE system. This addi­ tion gave TELEPRO's developers the ability to Digital's earliest telecommunications manage­ build a powerfu l set of standard reports in a ment product provided simple cost-allocation small amount of time. With this set, users could and traffic-management reports. This capability create their own custom,built reports. As a has evolved to allow users to track costs and result, the early product fi t well with the needs expenses and to generate billing invoices. Users of the then current market for telecommunica­ can now capture historical data and perform pre­ tions management software. diction analyses on it. When this early product was introduced, many The products developed are based on an office competing systems were based on personal com­ automation system that also provides generic puters, which did not have the same extended application-generation tools. Thus users can now functionality as the PDP-11 i system. These small integrate their telecommunications management computers simply could no provide the storage fu nctions with the rest of their business commu­ capacity or processing pow�er required by large nication needs. This integration means that customers. Their applic tions needed large reports, which are really just document files, can tables fo r accurate call pr�icing, and reports of now, after processing, be annotated and shared varying detail with information stored as low as (via electronic mail) With people from other the call record level. Many PC-based products departments. tried various ways to get around these limita: tions. Some approximated call prices (by region Evolution of Vo ice Ma nagement rather than explicit area codejexchange) ; others The TELEPRO product, introduced in 1981, created summary dataon the fly (thereby not sav­ was Digital's first entry into the field of voice ing the raw data) ; still others used a combination networks. This early telecommunications of the two . All these attempts meant that the product was designed to collect station message required information might not be available if a detail recorder (SMDR) information from user wanted to create his o reports. PBX systems and to generate basic cost account­ ;wn ing reports . These reports included roll-ups Meeting Ma rket. Needs I of telephone charges for various subgroups, Around 1984, because of the breakup of AT&T such as departments, cost centers, and the like. Corporation, the requirem nts fo r voice manage­ There were also reports providing network ment began to expand ra� idly, beyond TELE­ traffic infor mation, such as trunk usage and PRO 's capabilities. Owners,p of large PBX systems call cost distribution. In the latter case, infor­ were now reselling services, mation was reported on a per-trunk basis and for example, to landlords of buildings, universi­ subdivided by times of day over a monthly ties, and even hospitals. Unfortunately, TELEPRO period. For example, a manager could see was limited to tracking expenses rather than gen­ how many calls on a particular tieline exceeded erating invoices. Thus the P/FM (PBX fa cilities one minute, how many two minutes, etc., management) product was created to provide for each day of the month. This information this additional functionality. gave the manager better analyses of the perfor­ As with the DECnet monitor, we soon realized

mance of the network, helping him to make the limitations of the PDP-' 11 architecture for better decisions about proper network con­ supporting this additional functionality. To solve figuration. this problem, we first debded to convert the

. I I i Digttal Tec bnicaljournal 125 No . 3 September 1986 The Evolution of Ne twork Management Pr,oducts basic functions of the early product into the Evolution of the PjFM Software richer VAX architecture. We then added an To meet these expanding needs, we redesigned invoicing capability and built a common user PjFM with a new architecture that allowed cer­ interface to yield the fi nal product. tain areas to be easily modified in the develop­ Besides tracking telecommunications infor­ ment process, as well as in the field. In essence, mation, P jFM can track charges for nontele­ th,is new architecture would provide PjFM 's communications items, such as monthly parking basic set of fu nctions. As needs arose for addi­ fe es, office cleaning, and equipment rentals in a tional fu nctions, experienced customers or facility environment. These capabilities allow a Digital's Software Services personnel could write landlord of a building to present tenants with sin­ the necessary code. The resulting product would gle invoices that charge not only for telephone have more fu nctionality and be better integrated usage, but also for virtually . anything he has and therefore easier to use. Furthermore, by defined as a billable entity. being tailored to fit the VA XjVMS environment, Unlike the early product, used primarily the product's performance would improve as by telecommunications managers experienced well. with computers, the PjFM product is aimed at a As companies throughout the country ex­ more business-oriented market. Therefore, it panded their telecommunications needs, we saw needed an extra degree of friendliness to be a continual flow of new product requirements. It successful. End users can choose a wide variety became obvious that a more formal strategy fo r of processors from the VAX fa mily, which pro­ changes was needed to keep up with the con­ vides developers with an excellent base operat­ stantly changing market. To satisfy this need, we ing system on which to add future functionality. decided to have frequent PjFM releases (e .g., the Not only do users have access to bigger data­ next two releases came out about a year apart) . bases, but developers have all the VAX layered This schedule would allow us to add new func­ products with which to construct their applica­ tionality to the base product, such as supporting tions. One such product is the ALL-IN-I system. changes in FCC regulations or implementing spe­ Although presented primarily as an office cial requirements from customers. Most impor­ automation product, this system provides a tantly, it allowed us to offer in the base product very user-friendly environment for entering data some of the custom programs written by Soft­ and paging through forms. When building PjFM, ware Services. These programs expanded the

. we took the office automation. forms software basic PjFM functionality while removing the from the ALL-IN-I system and borrowed the burden on external groups to support custom remainder fo r the base software. The resulting code. product has the ease of use associated with offi ce ·Although customers could then get the func­ automation, although it is dearly not office tionality they really needed, the overall cost of automation in the electronic mail or calendar operation was still a problem. A customer could sense. either run PjFM on a stand-alone VAX-11/730 To get the PjFM product into the marketplace system or share a larger VAX CPU (1 1/750 or quickly, we designed and wrote the first version 11j780) with other applications. This latter of the software in a fairly short time. This version choice could reduce the capability of the cus­ consisted of two separate and distinct subsystems tomer's primary processor . In essence, PjFM's (expense tracking and invoicing) , even though processing needs for large organizations re­ the user saw only one system interfa ce. Unfortu­ quired more hardware than many users were nately, in some cases information had to be willing to pay fo r. entered twice to produce different reports using This problem was quite effectively solved by that information. Furthermore, the TELEPRO Digital's new low- and high-end VA X processors. algorithms were based on a limited program size The MicroVAX II system provides a cheaper and imposed by the original 16-bit PDP-1 1 architec­ more powerful low-end processor, while the ture. Those parts of PjFM based on this architec­ VA X 8000 series can, when clustered, provide ture were already at their limits. It was clear the configurations with mainframe capabilities . expanding market for this product wo uld PjFM can run efficiently on a MicroVAX II system quickly supersede PjFM's ability to go beyond its but now at a considerably reduced cost fr om the original goals. former arrangement. If a customer wants to run

126 Digital Te cbnicalJournal No. 3 September 1986 New Products

i I on a bigger VAX system, there is now much more ment now emerging. DevelopingI products like CPU power available fo r the primary tasks since the DECnet monitor and PfFM are the fi rst step in the PjFM software takes a correspondingly this direction. However, th�y are only the begin­ smaller slice. In fact, it's quite fe asible to ning of a long-term commitment to produce an run this software on the same hardware with the integrated management arthitecture and man­ ALL-IN- I system. The advantage here is that when agement software based on an integrated model. running P jFM, the user is presented with the Future versions of existing products will evolve same type of user interface down to the individ­ within the framework of this model. ual keystrokes. In fact , with simple modifica­ Our proprietary management architecture is tions, a user could actually take PjFM reports and being extended beyond the range of the DECnet mail them to ALL-IN -I users, thus closing the software and toward international standards loop between the telecommunications and MIS as they evolve . This architecture will further departments. integrate the management of non-DECnet prod­ ucts, such as Ethernet bridges and SNA gateways, Pa rallels between Vo ice and with the existing integrated management capa­ Data Networks bilities of DECnet and X.25 products. This inte­ There are some interesting parallels between gration wi ll extend remote :access to the current managing voice networks and data networks. management functions of all products (e.g., Both management schemes can potentially simple network control fun ions for all servers, require tremendous amounts of storage, and and monitoring for bridbes , gateways , and I manipulating that data can require a lot of CPU servers) . ·� power as we ll. It also appears that no matter While extending our management architec­ what capabilities are provided in the base pack­ ture, we are developing a more loosely coupled age , customers always have additional require­ management software desi n that will ease the ments. The capability has always existed to build addition of new network productsk and manage­ complex systems to analyze massive amounts of ment functions. This design will include a uni­ data . However, they have been too expensive for fied user interface across network products and all but a few companies. Hardware capabilities network management functions and will provide have now become large enough and cheap a choice of interface styles (e.g., command line, enough to allow the fu ll integration of both types fo rms, graphics) . The design will also allow of network management on a single system. users to customize their so�are in several ways. The computers in the new generation of VAX A customer should be able to purchase as little or systems, especially the MicroVAX II system, as much network manageqtent capability as he provide the right amount of CPU power at the desires. Customers want to �elect which network right time. As regards software customization, we products are to be managt;d and which higher­ can accommodate the need for custom-built level management functions are needed. Their products by creating base software that will sup­ goal is to tailor the managehtent application soft­ port that strategy. With PjFM, the ALL-IN-I sys­ ware appropriately for th�ir network environ­ tem was used as the base on which customer­ ments. Customers' or Digftal's field personnel designed applications could easily be added. should be easily able to add customized software That is a case study of how management applica­ enhancements, such as special reports and new tions can fit into a more global structure. As our intelligent management functions. network management architecture evolves, it Besides extending the architecture and devel­ will define the global structure to be used for oping an integrated software design, we are eval­ integrating and customizing the software for uating the market requirements for the addition both data and voice management applications. of more intelligent management fu nctions and management for emerging technologies. Part of Future Developments this evaluation is understanding the ISDN stan­ The lessons learned from deve loping the prod­ dards and fu ture products based on those stan­ ucts described above are helping us to plan the dards. We need to determine how and when the continued evolution of Digital's network man­ requirements for integrated voice and data net­ agement effort. This effort will address the man­ works and network management will appear in agement needs for the complex network environ- the customer environme!nt. We also want to

Digital Te chnicalJournal I I27 September I No. 3 1986 II The Evolution of Ne twork Management Products

understand customers' needs for the integration • Customers should be able to tailor the man­ of network management with system and appli­ agement application software appropriately cation management. for their network environments. Throughout its evolution, network manage­ • Network management is a distributed applica­ ment has become increasingly essential to cus­ tion that should be integrated into the overall tomers whose businesses depend on the opera­ system environment in support of users' tion of their networks. One problem has been businesses. that network management fu nctions have high requirements for processor power and database Acknowledgments storage. However, since processing power is The authors express their appreciation to Jim becoming cheaper, customers can now take Critser, Stan Goldfarb, Bernard Harris, Bill Key­ advantage of smaller, less expensive, yet more wo rth, John Morency, Louise Potter, and Donna powerful processors to fu lfill these needs. The Ritter for their careful review and many sugges­ primary evolutionary trend for network manage­ tions that have enhanced the ideas presented in ment has always been to make people mon; effi­ this paper. cient. The affordability of increased processor power will contribute enormously to Digital's References ability to provide integrated, extensible, and more-intelligent management functions. The 1. DECnet Digital Network Architecture availability of these functions will make people (Phase IV) Network Management Fu nc­ more efficient and effective in the future. tional Sp ecification (Maynard : Digital Equipment Corporation, Order No. AA­ Conclusion X437A-TK, 1983). Some important goals and guidelines have 2. ]. Heffernan and D. Ritter, "Remote Bridge emerged from the evolutionary process Management," DECUS NE Twords Newslet­ described in this paper. They will serve as a ter (1986). guide for fu ture network management develop­ ment in Digital Equipment Corporation . 3. N. La Pelle and K. Chapman, "Building Blocks for Remote LAN System Manage­ • New products to be used in DECnet networks ment," FOCjL AN85 Proceedings (Septem­ should incorporate basic network manage­ ber 1985): 137- 146. ment when those products are introduced. 4. D. Thompson, "A Management Standard for • Remote access to management functions is Local Area Networks," IEEE Fo urth Inter­ needed to support both decentralized and national Conference on Computers and centralized management. Communications (March 1985): 390- 396 .

• An integrated management architecture is 5. N. La Pelle and K. Chapman, "Distribution of desirable, yet_ it must allow actual product the Management Function in LAN Systems," implementations that are not tightly coupled. Second Annual ACM No rtheast Regional • Commonality in management user interfaces, Conference Proceedings (October 1985) : databases, protocols, and fu nctions reduces 250-267. complexity, makes the products easier to use, 6. M. Sylor, "The NMCCjDECnet Monitor and reduces the duplication of development Design,'' Digital Te chnical journal (Sep­ resources. tember 1986, this issue) : 129,-141. • More intelligent and automated data-analysis and evaluation fu nctions are needed to fa cili­ tate the network manager's job. These fu nc­ tions should address the network management requirements of all network products.

• The distinction between voice and data net­ works is becoming less distinct, and network management must consider both.

128 Digital Te cbnical]ournol No. 3 September 1986 Mark W. Sy lor 1

The NM CC/DECnet Monitor Design

The NMCCjDECnet Monitor system allows the monitoring of a .DECnet network. Using the monitor at a central point allows the network man­ ager to control the op eration of the network. To be effective, �e needs infonnation about the network's current configuration, state, I perfo r­ mance, and errors. The monitor maintains and interprets a dat�base of network infonnation, which is presented clearly and concisely to the user through interactive graphics and other techniques. The interpreta­ tion and evaluation techniques analyze situations that may be problems and alertthe user to them in a real-time op eration.

Network management can be described as a con­ Requirements fo r a NetworkMana ger trol and feedback loop like the one shown in Fig­ A network monitor like the NMCC system must ure 1. In this loop, information is gathered from meet many requirements.: The most imponant the network by the monitor fu nction and pre­ ones to consider in designing such a product are sented to the network manager. He then decides described as follows: : if the situation in the network is satisfactory or I • A not. If not, the manager can initiate some control Multiple managers - network may have multiple network mam1gers, people who all action - perhaps issue a correction, gather fur­ access the monitor simultaneously. The moni­ ther information, or perform a test. The control tor must allow performance data and calcula­ loop feedback cycle is "Look, Think, Act." tion programs to be shared among those man­ It's clear that one key to network management agers, even though they will typically be is the manager's having available the information asking for different types of information. he needs to make control decisions. In DECnet networks, the NMCCjDECnet Monitor system, or • Multiple styles of usage - Network managers NMCC, can provide this information at one cen­ use monitors for different purposes; hence, tral point. they have different styles of usage. The five styles of usage that are encountered are 1. Batch, characterized by the automatic pro­ duction of periodid repons MANAGER I 2. Routine, an intedcI tive style wherein. "THINK" monitoring is done · at fix�d time periods (e.g., every morning when the user comes MONITOR "LOOK" "ACT" CONTROL to work) 3. Browse, an interactive style wherein mon­ itoring is done on a random basis, when NETWORK time is available 4. Alarm, in which a. monitor notifies the user of problems W:hen they are detected Figure 1 Monitor Control Feedback Loop (A notification couldI be to color a system red on a display, print a console message, signal a beeper, et�.)

Digital Tecbnicaljournal 129 No. 3 September 1986 The NM CCjDECnet Monitor Design

5. Operational , in which the manager • Evaluate the data into meaningful information observes a terminal on which information • Present that information to the network man- about the network is continuously dis­ ager and end users upon request played We also decided to support two usage modes: a:.1 The NMCC architecture supports all five usage interactive user interface, which supports the styles. routine, browse, alarm, and operational styles of • Variety of information - The complexity of usage; and a reporting user interface, which sup­ the network is reflected in the variety of info r­ ports the batch usage style. mation that the network's components can These decisions led naturally to the overall present to a monitor. It must collect, store, �MCC design shown in Figure 2. The monitor and analyze configuration, status, perfor­ consists of three major programs: the kernel, the mance, error, and reference information about interactive user interface, and the reports the network. Each component in the network package. can supply information about one or more of The kernel collects data from the components these categories. Moreover, a monitor must in the network and stores that data in an on-line have information to control its own behavior. database. The kernel distributes the stored data both through the NMCC protocol used by the • Real time and history - A monitor must interactive user interface and through the history provide information about current conditions files used by the reports package. Running con­ in the network. Of course, "current" is a rela­ tinuously, the kernel supports parallel activities tive term because changes occur in real time for multiple simultaneous users. as more recent information is gathered. A The interactive user-interface (UI) program monitor must also provide historical data, can be run on demand by the manager or any user needed to compute trends over periods of with proper authorization. This program evalu­ time. Network managers must be able to ates the data and returnsthe subsequent informa- "replay" what occurred in the network, both . tion to the person requesting it. The UI program for long-term reporting and for immediate also manages the operation of the monitor itself. problem solving. The programs in the reports package also eval­ • Ease of use and clarity of presentation - The uate the data, which is presented as hard-copy efficiency of information presentation is very reports. The kernel periodically writes data from important, given that the manager interacts so its on-line database into history files, which are closely with the monitor. Often, graphics are archived copies of the data collected during each the best way to present complex statistical day of operation. information and topological relationships that The design of NMCC separates the kernel, are difficult to display in any other way . which is a management server, from the network manager's workstation, the user interfaces, and • Universality -A typical DECnet network is the reports package. This separation allows the implemented across many diverse computer kernel to be run on one system, while the other hardware and software systems and supports a programs can run on other systems. variety of communications media. Thus a mo�itor must be able to collect and present Common Design Threads information from each and every one of them. Three common threads run through much of the High Level Design of th e design of the NMCCjDECnet Monitor system. NM CC Software These threads involve a data model, a request/ To meet the requirements discussed above, we response operation, and a news function. decided that NMCC had to provide five basic Data Model fu nctions: Early in the design, we focused on modeling the • Collect data from the network data being manipulated by the management functions rather than modeling the functions • Store the data themselves. We felt that the organization of the • Distribute that data to users upon request data was more complex than the fu nctions.

130 Digital Technicaljo urnal No. 3 September 1986 New Products

The data does not change its organization systems and wires are the main components mod­ when passing through the collect, store, and dis­ eled by the monitor, and, since it also has to man­ tribute functions. It may change its form (e.g., age itself, some of the monitor's components are from binary to text) , but that is relatively minor. included as well. In that way, we unified two Within that portion of the monitor, the functions '· separate functions within a single concept. Fig­ can be viewed as simple database actions (i.e., ure 3 shows the component hierarchy that is read a record, wdte a record, etc.). As with a built into the monitor. The hierarchical relation­ database, deciding how to organize the data is ship shown in this Bachman diagram reflects the the most important decision in the design of the naming relationships between the components. system. Each component located below other compo­ We found we could organize the data so that nents in the hierarchy is considered to be part of any record could be identified with three keys: those components. For example, a circuit the component, the informatiOti."type, and the located below a system is part of that system. time of collection. Information Type Components All the component attributes collected and dis­ In a logical sense, components are the various played by the monitor could be viewed as a sin­ pieces of the network that must be represented gle data record. For practi�al reasons, however, in NMCC. Fundamentally, a DECnet network the attributes are distinguished by a number of consists of computer systems and the communi­ different information types'. The Digital Network cations facilities (wires) that join them. Those Architecture (DNA) structure that underlies all

DNA NETWORK MANAGEMENT

.

�-· ·· ...... : ...... , _ . -�· · ( . . . :...... · . NICE • PROTOCOL •

DNA NETWORK MANAGEMENT

Figure ,2 NM CC DECnet Monitor, Top Level Structure

Digital Technicaljournal 131 No. 3 September 1986 Th e NMCCjDE Cnet Monitor Design

Figure 3 Component Hierarchy

DECnet products provides three of these infor­ this case each sample taken is stored with a time mation types: stamp, indicating the time of collection. The local clocks of the systems monitored cannot be • Characteristics parameters that control the used for the time stamps since they are not syn­ behavior of the DECnet network chronized, nor can they be guaranteed to run at • Status parameters that reflect the dynamic the correct rate. Thus NMCC uses its own time state of the DECnet network stamps, calibrated in Universal Coordinate Time (Greenwich Mean Time) , which are generated • Counters that are incremented when an within the kernel. important event occurs (e.g., a data packet is received) Request/R esponse Op eration In addition, reference information provided by Within the data model, only a few simple func­ users and definition information used in naming tions are needed to operate on the data. Those are two more information types. The NMCC sys­ fu nctions create and delete components, read tem stores data from all five types. From that collections of records (defined by their keys) , stored data, NMCC can compute three more write records, and set one or more parameters information types: statistical, topological, and within records. summary information . Each function can be modeled as a request issued by the client software wanting that func- Time of Collection 'tion performed, fo llowed by one or more The monitor collects and stores historical data. responses to that client from the server perform­ This third key, the time of collection, is used to ing the fu nction. This interaction is shown in distinguish historical records. While data always Figure 4. has a value, the monitor can collect only samples of it. News Fu nction By examining the attributes of' the various Once each record has been appropriately time information types, we found that the data itself stamped, it is easy to access historical informa­ could also be classified. For example , parametric tion in the database . To support a real-time oper­ data is fa irly constant over a period of time. ation, changes in the data displayed have to be Rather than store the values found in each sam­ communicated from the kernel to the user inter­ ple together with the time the sample was col­ face. To accomplish that transfer, we defined a lected, we store the values found plus the times special time value called "current." Reading the those values were first and last seen, thus saving database with the time key of current causes the storage space. Counters change much more fre­ data responses to be returned in two phases. In quently than parameters, however, and, in fa ct, the first phase, the most recent data is read fro m more frequently than they can be sampled. In the on-line database. In the second phase, the

132 Digital TecbnicalJournal No. 3 September 1986 r New Products iI

SERVER CLIENT

WAITING FOR SOMETHING TO DO. I WANT TO DO X, AND THE SERVER CAN DO IT, SO ... SEND A REQUEST!

REQUEST FOR X, M IN TRANSIT. GO ON DOING WHATEVER CAN BE � DONE. (SOONER OR LATER, WE DO ALL 7 THAT CAN BE ACCOMPLISHED WITHOUT READ REQUEST. X HAVING BEEN DO�E.) DO X, WHICH MAY INVOLVE ISSUING REQUESTS OF OTHER SERVERS. SEND RESPONSE. WAIT. �NSE�

WAITING FOR SOMETHING TO DO. READ RESPONSE. j CONTINUE WITH PROCESS' ING WHERE WE LEFT OFF.

Figure 4 RequestjResponse In teraction

kernel will return a response whenever a new or The approach also has iI two . disadvantages: I changed value to the data is written to the on­ c. Adding new ways of evaluating the data line database. A response received in this second would result in major changes to the phase is called "news." News can be generated software. by the collection of more up-to-date information or by other managers modifying the database. d. The compute-inten�ive evaluation could not be performed orta separate machine. Data Evaluation 2. The data in the datab e could be stored in An important design choice was in what section � raw form and evaluated in the kernel only of NMCC should the collected data be evaluated. when requested specifically by a user. While This choice was imponant because data evalua­ avoiding the problems discussed in a. and b. tion is a compute-intensive operation. There above , this approach also suffe rs fr om the were three basic choices. disadvantages in c. anq d. 1. Data could be evaluated immediately after it 3. The data could be eval�ated in the user inter' was collected. This approach has two fa ce immediately befote presentation to the advantages: user. a. Processing has to be done only once. That We chose to use the third approach because processing, however, would take place adding new evaluation fu nctions is easy, be­ whether or not any user ever looked at the cause evaluation is perfqrmed only when re­ results. Thus the CPU time spent on com­ quested by a manager, and because the com: putation could be wasted. pute-intensive evaluatio� could be moved to b. The evaluated data would be reduced - a separate machine, a network-management and thus take less space - when stored in workstation. the database. However, a careful analysis found that in most cases the evaluated Kernel data was no smaller than the raw data. Fur­ The major fu nctional sections of the kernel are thermore, in those cases where the data depicted in Figure 5. The system is built in suc­ was reduced, information had been lost. cessive layers around the bean of the kernel, a !

Digital Te cbntcalJournal 133 No. 3 September 1986 Th e NM CCjDECnet Monitor Design

KERNEL INFORMATION MANAGER (KIM)

--

------

Figure 5 NM CC Kernel

physical database that uses Digital's RdBJVMS The logical database is contained within the software . This relational database system was kernel information manager (KIM) , to which chosen because it provides data integrity, its data all requests to read or modify data are made . model is similar to the NMCC data model, and it The actions performed by KIM are atomic, mean­ offered a simple method for handling sets of ing they act as a single unit even though com­ records. posed of more primitive actions. KIM's clients The physical database is contained within a are thus freed from needing detailed knowledge logical database (LDB) system. LDB provides of the transactions. But KIM's most important transaction services and abstracts the operations task is providing a uniform way to request histor­ on the database, thus masking from the rest of the ical and real-time data. This uniformity greatly system the detailed knowledge of how the data­ simplifies the design of all other parts of the base is implemented. The interface to LDB is code. The user interface and reports package asynchronous, allowing the rest of the system to do not need special code to perform historical proceed with other actions while data is read or real-time functions. Instead, they only have from or written to the disk. Because the interface to perform some simple data manipulations; to the RdBJVMS software is synchronous, LDB is KIM handles all the intricacies of detailed implemented as multiple server processes sepa­ processing. Many functions are clustered rate from the kernel. Each server is synchronized around KIM, all of which use it to access their with its database transaction. data.

134 Digital Tecbnical]ournal No. 3 September 1986 New Products

I Data is collected from the netwoPk by the net­ network. When discovercd, they arc added to the work management interface (NMI) , which polls on-line database. l the systems in the network periodically for data. · If allowed to, the database would grow with­ As defined in the DNA architectural specifica­ out bounds with continuohs polling. The data­ tion, which is the formal basis for the DECoct base administration (DBA) software prevents this software, each system in the network stores man­ problem by periodically pUrgingI old data from agement information and accommodates remote the database . ' access to it. u The protocol for accessing this One unique attribute of the data collected data is called NICE, which NMI uses to request · from the DECnet network is its extensibility. status, characteristic, and counter information. Each new implementatiop or upgrade of the

The components for which these types of infor­ DECnet software can defineI new fields in the mation can be collected include the system, the records returned from the polling operation. lines, the circuits, and any other remote node in That is accomplished by a data format (called the network. NICE data blocks) , which is self describing and Counters have a limited range . When they extensible. The kernel preserves this structure reach their maximum values, they latch, and any and also enhances it so thaf all data passed from subsequent events will not be counted. There­ one major fu nction to another is carried in this fore , if NMI detects any counters that have form. already or may soon latch, it can zero their The log file writer (LFW) produces the history values. files that are read to produce reports. At fixed The kernel can poll multiple systems simulta­ periods, LFW writes to a set of files the data col­ neously. The list of systems to poll and the fre­ lected since the last histo fi le was written. quency of polling for each kind of information The NMCC protocol serVerrt (NPS) is responsi­ for each component (twe lve kinds in all, four ble for the kernel's end of the protocol link by components times three types) are all controlled which the UI program communicates with KIM. by the network manager. This control data is In effect, NPS, the NMC� protocol, and the stored in the on-line database. NMCC protocol client (calledI NPC in the user The data collected by NMI is passed to KIM, interface) allow remote access to the data main- which determines if the data is news . If so, KIM tained by KIM. Multiple protocol links can be writes the news to LOB and notifies any user who supported by the kernel , thus allowing multiple has requested to be notified when that particular users to access the data. news arrives. Among the other facts that could be The need for the asynchronous operation of all I discovered from the data collected is that new these fu nctions posed a major design problem for systems, lines, or circuits have been added to the the NMCC development team. Without our going

TASK TASK TASK TASK TASK TASK ... TASK

..,.�, ---'' _,"', /� ", ", 1", /.,. • __ -"<...... _/ ...... /...... _ ' ...... ,...... -;:;_ - _:::.... / .I ...... / --- - I 0 ------! DECnet 1/0 DECnet lfO' J RESOURCE SCHEDULER . RESOURCE RESOURCE \. I I \

.__ ...... VAX/VMS OPERATING SYSTEM I }-' --. I --- TASK TO TASK MESSAGE PASSING RESOURCE . -·-· DEC net LOGICAL LINK TO OTHER PROCESSES

Fig ure 6 Kernel Resource Scheduler

Digital Te chnicalJo urnal 1 3 5 No. 3 September 1986 The NM CCjDECnet Monitor Design into excessive detail, the kernel is structured Data Access Modules as multiple, cooperating tasks running asyn­ Data access, interfacing the UI program with the chronously (e.g., a function could be one or kernel, is fu rther divided into a protocol client, a even one hundred tasks, as is the case with NMI). request manager, and two data managers. The The tasks share resources to which they at times protocol client implements the UI end of the need exclusive access. The tasks must communi­ NMCC protocol. This protocol, being based on a cate with each other, and they must be sched­ DECnet task-to-task logical link, allows remote uled. The software that performs these chores is access by multiple users to the kernel database. called the resource scheduling services (RSS) . The protocol client performs the same services The design of RSS is based on the processjmoni­ as those provided at the KIM interface within the tor structure proposed by C.A.R. Hoare.3 The kernel. This capability "hides" the kernel and relationship between the tasks, RSS, and the mes­ the logical link from the remainder of the UI pro­ sage-passing services that allow communications gram. The basic fu nctions of the protocol link between the tasks is described in Figure.6. The are to connect and disconnect the logical link, main advantage of this approach is that the devel­ and to code and decode the protocol messages. opers writing the tasks did not have to deal with The NMCC protocol supports the reading and the details of interrupts, synchronization, or modification of records in the kernel's on-line scheduling. database. The protocol is an asynchronous, fu ll­ duplex, interleaved request-response protocol. The User Interfac e (NMCCjUI) It is asynchronous so that the UI program does The user interface supports the interactive usage not have to wait for a response to a request, and of the NMCCjDECnet Monitor. As shown in Fig­ fu ll duplex so that requests and responses can ure 7, the UI program has three main parts: data flow simultaneously across the link. The proto­ access, action routines, and presentation. The col is interleaved so that a request can be issued data-access part controls the UI interfaces, and while an earlier request is still outstanding. Thus the presentation part controls the interface to the responses can be returned in a different order kernel and the interface to the network man­ than that in which the original requests were ager's terminal. The action routines, containing issued. (This situation normally happens when the main routine of the UI program, execute user news data arrives while other data is being commands and evaluate data. returned.) The protocol also allows multiple

CONTROL DATA MAIN AND SCREEN MANAGER (COM) PARSER MANAGER

DISPLAY DATA DISPLAY. GRAPHICS MANAGER (DDM) MODIFY AND KERNEL NMCC REQUEST OTHER PRESENTATION TERMINAL SYSTEM PROTOCOL MANAGER MISCELLANEOUS STYLE 1/0 (GKS) CLIENT (RM) ACTION ROUTINES

CACHE A GRAPHICS DEVICE CONTROL LANGUAGE FOR VT240s (ReGIS)

ACTION ------DATA ACCESS----.,. ------PRESENTATION ------<.,. ROUTINES

Figure 7 NMCC Us er Interfa ce

136 Digital TecbnicalJournal No. 3 September 1986 New Product�

I I requests and responses to be transmitted in a database. DDM contains an internal cache of data single message across the logical link. The proto­ that has been recently read from the database . col has proven to be very fast, yet not expensive This cache improves the monitor's performance in terms of CPU time. This fact has allowed us to by decreasing the flow of �ata between the UI see a definite improvement in performance program and the kernel. The UI process also runs I when the kernel and the UI program are run on faster because a user reques� for data can often be separate machines connected by an Ethernet satisfied by a short access to the cache rather than cable. a longer one to the on-line database in the kernel. Layered on top of the protocol client is the The cache is purged accord ing to a least­ request manager (RM) . RM maintains a list of the recently-used algorithm. I.n a read for current outstanding requests so that they can be matched information, the kernel has remembered the I to their responses. RM can be viewed as a set of request so that it can notify the UI program of service routines used by the data managers. The news. Thus the cache purging logic will ask RM interface can be either synchronous (RM to cancel that outstanding request when the data "blocks" until the response is received) or asyn­ is no longer needed. DDM also provides a consis­ chronous (RM notifies the main routine via an tent view of the data in the cache by locking its event flag when a response is received) . The contents so that cache up9ates (read responses code to access the synchronous interface is received by RM) do not change those contents ' simpler for the requesting person to program; while the action routines are reading the cache. this interface also behaves in a way that is more Among other benefits, this logical separation intuitive to the user. On the other hand, the asyn­ allows the display action routines to be quite chronous interface provides better performance simple. and must be used to implement real-time The control data manager (COM) provides the monitoring. action routines with a syn�hronous interface to If the protocol link to the kernel should be dis­ modify data in the databas�. connected (perhaps by a failure in the network) , RM wi ll first wait and then attempt to reconnect Action Routines to the kernel. These actions allow the UI pro­ The action routines control the Ul program and gram to run unattended as a permanent display. provide most of the functionalityI visible to the RM detects and eliminates duplicate read user. The major action modules are the UI main, requests, thus saving CPU time. It also cancels the parser, and the display, !modify, and miscella­ old read requests when they are dropped from neous action routines. UI' main is the highest­ the cache because of "old age." Finally, RM level routine in the program. Figure 8 shows the keeps the pipeline flowing by receiving data at pseudocode of this routine's algorithm. The Ul the interrupt level and buffering that data for program waits for one of two actions to occur: b later use. either the user enters data I n the keyboard, or a The display data manager (DDM) provides the response to a currently outstanding request is action routines with an interface to read from the received from the kernel.

Initialize; current-display:=''N etw ork Summary Current Du rat ion 15 Minutes''; I Loop Wa itFor(user-input DR response-arrived) ; If user-input Then Ge tCommand(command , current-display, new-display> ; Par se(command , current-display, new-disp lay> ; current-display:=new-display; Display(current-display> ; Unt il exi t; Termin a te;

Figure 8 UI Main Pseudocode

Digital Technical jo urnal 137 No. 3 September 1986 Th e NM CCj/JECnet Monitor Design

The parser first performs a syntactical analysis can come either directly from the database or of the command entered by the user. It then dis­ from the evaluation routines, which evaluate patches to the correct action routine, which per­ data stored in the database. The information dis­ fo rms the command. Table 1 lists the commands played can even come from different database si.1pported by the UI program. records, the intent being to display information The parser is context sensitive, meaning that that provides a clear, related viewpoint. the current display on the user's terminal is used The evaluation routines get their data from the to resolve ambiguity wherever possible. For cache in DDM. It's quite possible that the data example, if the user is currently viewing a dis­ may not yet be in the cache (if this is the first play for "Network System BOSTON" and issues time the data has been requested). In that case the command SHOW LINE UNA-0 TRAFFIC, the the evaluation routines have to either present no parser will conclude that the line referred to in data or take some reasonable default. Eventually, the command is part of a system called BOSTON. the data will appear in the cache, at which time The parser accepts abbreviated commands and the response_arrived flag will be set and the allows mosckeywords to be entered in any order. updated data will be placed on the terminal Since the main function of the UI program is to screen . present information to the user, the display com­ This approach was taken because we did not mands are the most important. want to block all actions while waiting for poten­ Each display action routine presents a single tially large amounts of data to be returned. More­ display to the user. The parser determines which over, in a real-time display, we could not predict display is the current one. Since any display can when news would arrive. In practice, this design be changed if new data arrives from the kernel, choice has proven to be sound because the user the correct display action routine is accessed on remains in control. He can issue another com­ each main cycle of the UI program . The current mand or change the current display at any time. display is defined by the three key items men­ We did find that early versions of the software, tioned earlier: the component, the page, and the which sent no indication of progress to the user, time of collection. tended to be confusing since the user was unsure The display action routines select the informa­ if the software had completed its update of the tion to be displayed and then copy it to the pre­ display. Currently, the software indicates when it sentation routines. The information displayed is "working," (i.e., updating the screen). In fu ture versions, clearer indications of progress may be added. Table 1 Commands Supported by Lll During the design we were concerned that

Display Commands evaluating all the information on a display would SHOW display-id consume too much CPU time since it was possi­ NEXT ble that only one item might have changed. We PREVIOUS considered a number of ways to run the evalua­ POP tion routines "in reverse," so that only those PAN direction [dis tance] FIND component-id items that were changed by news data would be MAGNIFY scale-multiplier• re-evaluated. However, we rejected all those COMPRESS scale-multiplier• ways since they were too complex to implement. The problem was that a change in one piece of Modification Commands ADD component-id data would change items shown on many dis­ DELETE component-id plays, only one of which was seen by the user. SET [component-id] item-list An important goal of the UI program was to MOVE component-id X coordinate Y coordinate • make as simple as possible the writing of a dis­ Miscellaneous Commands play action routine. These routines and the HELP topic evaluation software that supports them are the SPAWN DCL-command largest body of code in the monitor. REPORT report-command Many evaluation routines are supported. Some CONNECT kernel are used to compute statistics from the data col­ EXIT lected from the counters. The method used is *Only applies to the Network Map called normalization, which uses averaging and

138 Digital Te cbnicalJournal No . 3 September 1986 New Products

Figure 9 Photo of Computer Screen interpolation techniques to estimate statistics Presentation Modules over any time period. Other routines determine The network manager interacts with NMCC via the current states of portions of the network, either a VT240 terminal or a VT241 terminal. while still others determine the configuration of The software managing that interaction is called the network. the presentation software. It presents a consis­ The modify action routines change the con­ tent structure for output on the screen and for tents in the on-line database. Invoked by the keystrokes entered by the user. The screen is parser, these routines use the services of CDM divided into four areas: an identification area, and are synchronous with respect to the data­ where the current display is shown ; the data base. They were made synchronous to avoid con­ area, where the info rmation is shown; a com­ fusing users whose commands were invalid. If an mand area; and a message �rea. Figure 9 shows a . error message indicating that a command was typical screen format with' the three areas. invalid was displayed long after the user issued The UI program support,s a .number of presen­ the command, he might have proceeded with tation styles. The information on any given page another command and therefore might lose track is best displayed for the Oser in one particular of the cause for the error. style. Some of the styles �upported are forms, The remaining commands supported by the UI tables, histograms, and �maps. Each page is program perform such functions as providing designed to present the data to the user in the help to the user, spawning a subprocess, invok­ most effective manner, given the limitations of ing the reports package, connecting to a different the two terminals supported. We avoided the use kernel in the network, and the all-important of flashing warnings to a!Crt users to problems. EXIT command. Instead, each display presents data so that users

Digital Te chnicalJournal 139 No. 3 September 1986 The NM CCjDECnet Monitor Design can spot problems quickly or observe behavior on the component. Once that is done, the user patterns in an advantageous way. can press a key to issue a command with the The forms may contain text, numbers, meters, pointed-at component as the object of the com­ and color-coded values for text or numbers. mand. For example, a line could be deleted in Meters are graphs of numeric values; they may this way. Each command has an equivalent func­ also show thresholds that, if exceeded, indicate tion key. problems. These thresholds can be preset by the The values in a form may be set by typing the user, although reasonable defaults are provided. new value over the current value. Function keys The network map is a plot showing the topol­ will move the cursor from field to field. ogy of the network. Systems are shown as In the network map, a user can directly manip­ squares and wires as lines connecting systems. ulate the position of a system or wire by pointing (The shape of each line indicates the type of the cursor at the object, pressing the MOVE func­ wire .) The user can also request that each tion key, and "dragging" the component to the component shown on the map be color coded desired location with the arrow keys. These with its status. The map communicates a large actions allow users to create displays that are aes­ amount of information in a comprehensible fash­ thetically pleasing. We did not create algorithms ion to the user. Statistics can be displayed in the that tried to optimize the positions of systems form of histograms, which plot values against and wires on the map; we found that the results time. were usually too crowded or did not reflect how A table is used to display each succeedingly users pictured the network. lower level of the component hierarchy depicted Finally, the terminal I/0 (TIO) software is in Figure 3. Each row in the table describes one responsible for all access to the VT240 and component "owned" by the requested .compo­ VT 241 terminals. TIO uses GKS (a graphics nent. For example, for each system, a table is dis­ package) , SMG (a text I/0 package), and in some played listing all lines known to be <;onnected to cases the ReGIS graphics protocol to format the that system. Each column displays a key summary output or to accept input. Logically, TIO isolates fa ct about the owned component. the remaining software from having to have The raw data collected from the DECnet net­ detailed knowledge of the terminal hardware. work is shown in lists of parameters and counters with their respective values. Reports Package In the case of the map, tables, and lists, there The reports package of the NMCC/DECnet Moni­ may be more information available than can be tor is a separate set of programs run in batch displayed on the screen. Thus the data area pro­ mode to evaluate and present information about vides a "window" onto the data; the user can pan the network in list form. The programs are run in over the available information by manipulating two phases. The fi rst phase is run after the history this window. For tables and the map, the user can files have been written by the kernel. This phase also locate a component with a FIND command, normalizes the collected counters into hourly which moves the window over the component. periods. The second phase is invoked by a user. The map can be scaled by magnifying or com­ This phase extracts data from the summary files pressing the display. and formats the subsequent information into The presentation software supports a direct report listing files, which may be printed. manipulation style, as well as a command syntax The main goal in designing the reports pack­ that uses whole words rather than acronyms. In age was that it be extensible. We knew that each some cases direct manipulation is a more natural user would have his own unique requirements style to the user. 4 However, the commands allow for accessing and manipulating data. Thus the more complex actions to be expressed. For reports provided can be viewed more as samples example, one SHOW command can navigate to a of what can be done than as solutions to every new display unrelated to the current display user's needs. The VAX DATATRIEVE system, a more quickly than can a series of function key report-generation package, can be used to gener­ actions. ate custom reports from the data contained in the Each presentation style can be directly manip-. history files. ulated by a user. In tables and the map, he can The history and summary files used in the point at a component by positioning the cursor reports are simple sequential binary files in a

140 Digital TecbnicalJournal No. 3 September 1986 New Products

fixed format that are accessible through any pro­ References gramming language. j 1. DECnet DIGITAL NeI twork Architecture The first phase of report processing ·is con­ {Phase IV) General Description (May- trolled by a command file, which can be modi­ nard: Digital Equipment Corporation, fied by the user. For example, the user can auto­ Order No. AA-N149A-TC, 1982). matically produce daily reports. 2. DEC net Digital Network Architecture Summary Phase IV Network Ma nagement Func­ tional Sp ecification Networks are complex systems at the leading (Maynard: Digital edge of modern communications engineering. Equipment Corpodtion, Order No. AA­ X437A-TK, 1983). I The NMCCjDECnet Monitor syste m creates a 1 model of a network through a database of infor­ 3. C.A.R. Hoare, "MoQ.itors: An Operating mation that reflects the complex relationships System Structuring Concept," Communi­ between the components of that network. The cations of the ACM, vol. 17, no. 10 challenge in designing this monitor was to (October 1974): 549-557. present this complexity as simply as possible to the network manager. He is ultimately responsi­ 4. B. Shneiderman, "Direct Manipulation: A ble for the quality of service that the network Step Beyond Programming Languages," provides. Computer (August � 983): 57-69. The monitor was designed to be a truly dis­ Other References tributed application. Special monitor software i does not have to reside on each node, yet the R. Rubinstein and H. Hersh, Th e Hu man Fa ctor monitor can collect information about any node (Bedford: Digital Press, 1984). in the network. By separating the I/O-intensive NMCCjDECnet Monitor User's Guide (May­ kernel from the compute-intensive user inter­ nard: Digital Equipment Cqrporati. on, Order No. face, yet allowing them to cooperate in monitor­ AA-EW35A-TE, 1986) . ing the network, the actual monitoring can be divided between two machines. Both can be optimized for the tasks assigned to them. This design is a framework within which many new functions and additional data can fit. As we gain experience with using the monitor, and fe edback on the human engineering of simple presentations of complex data, we are confident that this design can support an evolving manage­ ment system. Acknowledgments The author thanks Jim Critser, Nancy La Pelle, Linsey O'Brien, and Bruce Luhrs for their thoughtful review. Most of all, he thanks the developers of the NMCCjDECnet Monitor, whose long hours and hard effort made possible the realization of this software . Those developers were Peter Burgess, Bill Gist, Matthew Guertin, Robert Merrifield, Dennis Rogers, Arundhati Sankar, Robert Schuchard, Evelyn Wang, and Riaz Zolfonoon.

Digital Te cbnlcaiJournal 141 No . 3 September 1986 Editorial Staff Editor - Richard W. Beane Production Staff Production Editor - Jane C. Blake Designer - Charlotte Bell Interactive Page Makeup -Terry Reed

Advisory Board Samuel H. Fuller, Chairman Robert M. Glorioso John W. McCredie John F. Mucci Mahendra R. Patel F. Grant Saviers William D. Strecker

The Digital Te chnical jo urnal is published by Digital Equipment Corporation, 77 Reed Road, Hudson, Massachusetts 01749. Changes of address should be sent to Digital Equipment Corporation, attention: Media Response Manager, 200 Baker Ave ., CFO I-l/M94, Concord, MA 01742. Comments on the content of any paper are welcomed. Write to the editor at Mail Stop HL0 2-3/Kl l at the published-by address. Comments can also be sent on the ENET to RDVAX::BEANE or on the ARPANET to BEANE%RDVAX.DE C@DECWRL. Copyright © 1986 Digital Equipment Corporation. Copying without fee is permitted provided that such copies are made for use in educational institutions by faculry members and are not distributed for commer­ cial advantage . Abstracting with credit of Digital Equipment Corporation's authorship is permitted. Requests for other copies for a fee may be made to the Digital Press of Digital Equipment Corporation. All rights reserved. The information in this journal is subject to change without notice and should not be construed as a com­ mitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibiliry for any errors that may appear in this document.

ISBN 1-55558-000-9 Documentation Number EY-67 15E-DP The following are trademarks of Digital Equipment Corporation: ALL-IN- I, DATATRIEVE, DDCMP, DEC, DECconnect, DEChealth, DECnet, DECnet-DOS, DECnet Router, DECnet RouterjX .25 Gateway, DECnet-RSX, DECserver, DECnetjSNA Gateway , DECnet-ULTRIX, DECnet-VA X, DEQNA, DEUNA, Digital Network Architecture (DNA) , lAS, LANBridge, the Digital logo, MicroRSX, MicroVAX, MicroVMS, NMCC, NMCCjDECnet Monitor, PDP-8, PDP-11, PjFM, PROjDECnet, Q-bus, Rainbow, ReGIS, RSTS, RSX, RSX- IIM, RSX- IIM-PLUS, RSX- IIS, RX02, TELEPRO, ThinWire, TOPS- 10, TOPS-20, ULTRIX, ULTRIX-32, ULTRIX-32m, VA X, VA X- 11/730, VA X- 11/780, VAXcluster, VMS, VT, VT 100, VTI03, VT240, VT241, UNIBUS AT&T and UNIX are trademarks of American Tele­ phone & Telegraph Company. IBM is a registered trademark of International Business Machines, Inc. Cover Design Intel is a trademark of Intel Corporation. Th is issue fe atures networking products. Our cover depicts Lightspeed is a trademark of Lightspeed Computers, Inc. the veins of a leaf as a visual metaphor fo r the connections in a network. As the leaf grows to support the flow of nutri­ Motorola is a registered trademark of Motorola, Inc. ents, so the local area network expands with extended LANs, MS is a trademark of Microsoft Corporation. ga teways, and terminal servers to support the flow of info r­ Xerox is a registered trademark of Xerox Corporation. mation. Th e image was created using the Lightspeed system. 3COM is a trademark of 3COM Corporation. 68000 is a trademark of Motorola, Inc. 1 he cover was designed by Deborah Fa lck and Eddie Lee of Book production was done by Educational Services the Graphic Design Department. Media Communications Group in Bedford, MA. ISSN 0898-90 1X Printed in USA EY-67 15E-DP Copyright© September 1986 Digital Equipment Corporation