ALL RIGHTS RESERVED
Copyright © 1986 Encore Computer Corporation
257 Cedar Hill Street
Marlboro, MA 01752
(617) 460-0500

This document is the property of Encore Computer Corporation. Encore does not convey herewith any license under its proprietary rights, its patent rights, or under the patent rights of others. This document does not imply a commitment on the part of Encore to build the described products in any form or implementation.

The information in this document is subject to change without notice, and should not be construed as a commitment by Encore Computer Corporation. Encore assumes no responsibility for any errors that may appear in this document.

Annex, Encore Continuum, HostWindow, HostStation, Multimax, Nanobus, Resolution, and UMAX are trademarks of Encore Computer Corporation. ALLY is a trademark of Foundation Computer Systems, Inc., a subsidiary of Encore.

726-01759 Rev A
First Printing May, 1985
Printed in the USA

Contents

PREFACE by C. Gordon Bell

CHAPTER 1 THE SYSTEM
Multimax Hardware Overview  1-1
Multimax Performance Advantage  1-2
Multimax Configurability Advantage  1-4
Multimax Reliability Advantage  1-4
Multimax Software Advantage  1-5
The Encore Computing Continuum  1-5
Preparing for the Future  1-7
The New Step Forward in Computing  1-7

CHAPTER 2 HARDWARE SUMMARY DESCRIPTION
Multimax System Packaging  2-1
The Multimax System Cabinet  2-1
The Multimax Peripheral Cabinet  2-1
Multimax Functional Overview  2-1
Summary Specifications  2-2
The Nanobus  2-3
System Control Card  2-6
Dual Processor Card  2-8
Shared Memory Card  2-9
Ethernet/Mass Storage Card  2-11
Hardware Options  2-12
Mass Storage  2-12
Annex Network Communications Computers  2-13
Gateway Computers  2-14
HostStation 110  2-15
Band Printers  2-16
Cables and Connectors  2-16


CHAPTER 3 SOFTWARE SUMMARY DESCRIPTION

Introduction to UMAX 4.2 and UMAX V  3-1
Major features of UMAX  3-2
UMAX Performance  3-6

CHAPTER 4 PARALLEL PROGRAMMING ON THE MULTIMAX
Opportunities for Applied Parallelism  4-1
Required Support Features  4-2
Types of Parallelism  4-2
Independent Parallelism  4-3
Very Coarse Grained Parallelism  4-3
Coarse Grained Parallelism  4-4
Medium Grained Parallelism  4-6
Parallelizing an Application: An Example  4-12
Fine Grained Parallelism  4-12
Conclusions  4-12

CHAPTER 5 MULTIMAX RELIABILITY AND MAINTAINABILITY
Multimax Self-Test Capabilities  5-2
System Self-Test and Configuration  5-2
DPC and EMC Self-Tests  5-3
SMC Self-Tests  5-4
Annex Self-Tests  5-4
The Console Command Interpreter  5-4
The System Exerciser  5-4
Software Product Reliability  5-4

APPENDIX A THE NS32000 FAMILY PROCESSOR ARCHITECTURE
The Current Multimax Processor  A-1
NS32000 Architecture  A-2
Data Types Supported  A-2
Operators  A-5
Register Set  A-6
Instruction Set  A-9

APPENDIX B UMAX 4.2 COMMAND SUMMARY

General Purpose Utilities  B-1
System Administration Utilities  B-6
User-Contributed Software  B-7
Superseded Software  B-8

APPENDIX C UMAX V COMMAND SUMMARY
General Purpose Utilities  C-1
System Administration Utilities  C-5
TCP/IP Networking Utilities  C-7
Distributed, but Not Supported, Utilities  C-7


APPENDIX D OPTIONAL SOFTWARE PRODUCTS

FORTRAN Utilities  D-1
Pascal Utilities  D-1
EMACS Utilities  D-2

GLOSSARY OF MULTIMAX TERMS

INDEX


The Multi - A New Computer Class by C. Gordon Bell

This document introduces a new computer - the Multimax - which we believe to be the best example of an entirely new class of computing structure - the Multi.

The Multi (for multiple) is an emerging computer class made possible by recent, powerful micros that have the speed and functionality of mid-range superminicomputers. A Multi is scalable, permitting a single computer to be built which spans a performance range, in contrast to computer families implemented from a range of technologies. The Multi is likely to impact traditional micros, minis, mainframes, and even supercomputers.

Multis can be used today - without redesign or reprogramming of applications - because computer systems often operate on many independent processes. With Multis, it is possible to operate on many of these processes in a parallel fashion, each on an independent processor, transparent to the user. Most importantly, the Multi is likely to be the path to the Fifth Generation based on parallel processing.

This Preface briefly summarizes the generic Multi - what it is, why it has come to be, and how it is applied - to better prepare those unfamiliar with this new concept for the Multimax design discussions which follow.

THE MULTI - ITS HISTORICAL AND TECHNOLOGICAL BASIS

Computer systems with multiple processors have existed since the second generation (the Burroughs B5000, a dual symmetrical processor, was introduced in 1961). Most mainframe vendors and some suppliers currently offer systems with up to four processors. However, these structures have been expensive to build - due to the high cost of typical processors - and hence have found application mostly for high-availability computing (e.g., communications, banking, airline reservations).

The modern 32-bit microprocessor's function, performance, size, and negligible cost are creating a new potential for multiprocessors. In addition to 32-bit addressing, these devices provide hardware support for paged virtual memory, as well as complete instruction sets with integer, floating, decimal, and character operations. The result is performance levels comparable to that of mid-range superminis such as the VAX™-11/750.

The Multi is a multiprocessor structure designed to use these new microprocessors to advantage. It employs an extended UNIBUS™-type interconnect, whereby all arithmetic and input/output processor modules can access common memory modules. Cache memories attached to each processor handle approximately 95% of its requests, limiting traffic on the common bus. With these local caches, ten times as many processors can be attached before saturating the common bus.

With proper attention to design of critical elements (e.g., the common bus), large Multis using current-technology micros can outstrip high-end superminis, and even some mainframes, in total performance. This advantage should continue to grow. The performance of MOS and CMOS microprocessors has improved (and is expected to continue to improve) at a 40% per year rate, while TTL and ECL bipolar technologies (on which most traditional minis are based) have shown roughly a 15% per annum improvement.
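To make the compounding concrete (a back-of-the-envelope illustration, not a vendor projection): a 40% versus 15% annual improvement is a relative gain of about 1.40/1.15, or roughly 1.22 per year, so over ten years the microprocessor-based design would pull ahead by approximately 1.22 raised to the tenth power - about a factor of seven.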

Besides a bright performance future in leveraging the MOS microprocessor evolution, the Multi offers several other key advantages.

• Configurability - through modular design, the Multi allows the user to "construct" the desired level of performance or price, without having to choose among a restricted set of computer family members, none of which may provide an exact match for the requirements.

• Availability - The Multi has inherent reliability through redundancy because it is built from as few as four different module types. With appropriate software support, faulty modules which are replicated can be taken out of service - allowing continued operation with minimum downtime.

• Designability and Manufacturability - Because the Multi contains multiple copies of few modules, instead of the many unique boards in a typical minicomputer, it is faster and less expensive to design. Moreover, individual module types are manufactured in larger volumes, producing improvements in manufacturing costs over older technologies.

When compared to traditional uniprocessor designs, the Multi delivers improved performance, price, and price/performance.

APPLYING THE MULTI

Multis will be widely used for many applications because they can provide the most cost-effective computation unless the power of one large processor is required to run a single sequential program. Because of the rapid rate of microprocessor evolution, relatively few applications require single-stream performance greater than that delivered by each of the Multi's processors. This number will continue to shrink.

We can better understand where Multis can be applied by classifying the degrees of parallelism achievable. Grain size is the period between synchronization events for multiple processors or processing elements.

VAX and UNIBUS are trademarks of Digital Equipment Corporation.


Grain Size     Construct for Parallelism                    Synchronization           Encore Computer Structures
                                                            Interval (instructions)   to Support Grain

Fine           Parallelism inherent in single               <20                       Specialized processors (e.g.,
               instruction or data stream                                             systolic or array processors)
                                                                                       added to Multimax

Medium         Parallel processing or multi-tasking         20-200                    Multimax
               within a single process

Coarse         Multiprocessing of concurrent processes      200-2000                  Multimax
               in a multiprogramming environment

Very Coarse    Distributed processing across network        2000-1M                   Multiple Multimaxes, workstations,
               nodes to form single computing                                         and other machines, on Ethernet
               environment

Synchronization is necessary in parallel processing to initialize a task, parcel out work, and merge results. The Multi exploits the coarse- or medium-grain parallelism within an application, not the Fine-Grain, which is the focus of pipelined machine designs. Groups of Multis can interact over networks to implement very coarse granularity.

As all modern operating systems are multiprogrammed, whereby each job in the system is at least a single process, and many support multi-tasking or sub-processes, most current applications are already designed to take advantage of the Multi at the coarse-grain level. Also, when used in a timesharing or batch environment, each processor of a Multi can run a separate job to exploit the parallelism inherent in the work load. The UNIX pipe mechanism allows multiple processes to be used concurrently on behalf of a single user or job.
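As a concrete sketch of this kind of coarse-grained parallelism - two ordinary UNIX processes joined by a pipe, which a multiprocessor scheduler is free to run on different processors - consider the fragment below. It uses only standard 4.2BSD-style calls; nothing in it is specific to Encore, and it is illustrative rather than code taken from UMAX.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2], i, v, sum = 0;

        pipe(fd);                          /* one-way channel between the two stages */

        if (fork() == 0) {                 /* child process: producer stage */
            close(fd[0]);
            for (i = 0; i < 1000; i++)
                write(fd[1], &i, sizeof i);
            close(fd[1]);
            _exit(0);
        }

        close(fd[1]);                      /* parent process: consumer stage */
        while (read(fd[0], &v, sizeof v) == (int)sizeof v)
            sum += v;                      /* both stages can execute concurrently */
        close(fd[0]);
        wait(NULL);
        printf("sum = %d\n", sum);
        return 0;
    }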

The Multi provides a much more efficient multiprogramming engine than the traditional uniprocessor, because the number of context switches (and hence lost time) is dramatically reduced. Additional parallelism at the coarse-grain level can be found in the operating system itself. Execution of operating system code often accounts for 25% or more of available processing time, when file, database, and communications subsystems are included. Changing the operating system internal structure allows multiple, independent system functions to run on independent processors.

When parts of an application can be reprogrammed, Multis realize additional parallelism at the medium-grain level (i.e., parallel processing) by segmenting a problem's data for parallel manipulation by independent processors. This is especially effective on simulation, scientific modeling, and analysis problems (such as matrix operations, linear programming, solving partial differential equations, etc.) which permit data elements to be processed in segments.
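The following fragment sketches that medium-grained pattern: several worker processes each operate on one segment of an array, and a parent merges the partial results. It is a minimal illustration under stated assumptions (pipes carry the partial sums; a real application on a shared-memory multiprocessor could equally deposit them directly in shared memory), not code from a Multimax application.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    #define NWORKERS 4
    #define LEN      100000             /* assumed divisible by NWORKERS */

    static double data[LEN];

    int main(void)
    {
        int fd[NWORKERS][2], w, i;
        double part, total = 0.0;

        for (i = 0; i < LEN; i++)
            data[i] = 1.0;               /* sample data */

        for (w = 0; w < NWORKERS; w++) {
            pipe(fd[w]);
            if (fork() == 0) {           /* worker w handles one segment */
                int lo = w * (LEN / NWORKERS), hi = lo + LEN / NWORKERS;
                part = 0.0;
                for (i = lo; i < hi; i++)
                    part += data[i];
                write(fd[w][1], &part, sizeof part);
                _exit(0);
            }
        }

        for (w = 0; w < NWORKERS; w++) { /* merge step: the synchronization point */
            read(fd[w][0], &part, sizeof part);
            total += part;
        }
        while (wait(NULL) > 0)
            ;
        printf("total = %g\n", total);
        return 0;
    }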

Finer granularity of parallelism is achievable in the framework of the Multi if specialized processors can be installed in the Multi's bus. This is most effective when the algorithms to be used are known a priori, such as in certain signal processing applications.

We believe that multiprocessors, augmented by programmable pipeline (i.e., systolic) and specialized processors for fine-grain parallelism, will cover the widest range of problems of any computing structure.

THE MULTI TOMORROW

With appropriate design - particularly of the common bus - Multis allow long-term performance gains through hardware evolution. As key components of the processor and memory cards improve over time, the computer can be upgraded without replacement in an evolutionary fashion. In addition, larger cache sizes through denser parts, and improved cache management disciplines will permit substantially more processors to be installed without saturating the common bus.

All of this will permit graceful evolution in performance and memory size over a range of one or two orders of magnitude.

The existence of today's cost-effective Multis should greatly accelerate the development of parallel processing for all types of applications. When this is accomplished, the Multi will supplant even the conventional, high-performance uniprocessors.

ENCORE AND THE MULTI

Encore Computer Corporation is committed to building the best of the Multi class, and to furthering its evolution into machines that break the 1000 MIPS barrier. Yet, we remain equally committed to industry standards, such as Ethernet, commodity 32-bit microprocessors, and UNIX, which have helped foster the emergence of these latest examples of the computer art.

The remainder of this document describes the Multimax - a powerful Multi for today, yet unique in its adaptability to the future.

C. Gordon Bell

Chapter 1

The System

Encore Computer Corporation's Multimax system represents the best of a new computer class utilizing multiple processors configured in an expandable high-performance architecture. The system incorporates from 2 to 20 32-bit processors, each capable of executing 0.75 million instructions per second (MIPS), resulting in a relatively linear performance rating of from 1.5 to 15 MIPS. System features include fast shared memory (4 to 32 Mbytes) and configurable I/O capacity (1 to 10 intelligent network and mass storage channels). The microprocessors, memory, and I/O interfaces are coupled across a wide, high-speed main system bus to provide a single computer product that spans a spectrum of performance from that of superminicomputers to that of mainframes.

A central feature of the Multimax architecture is its local area networking capability, which fosters ready communications among systems and economically supports a very large number of terminals. Terminals are interfaced to intelligent Annex Network Communications Computers, which can be added in virtually any number to a Multimax system or to a network of systems. The result is a network-based terminal architecture with a performance range that matches the processing performance range of the Multimax system.

Immediate benefits that accrue from the Multimax multiprocessor and its networking architecture include:

• Processing performance and computing availability beyond that of superminicomputers, at much lower cost
• Scalable performance and configurability
• Reliability through module redundancy and automatic error correction
• Rich software capability, plus easy portability of existing applications
• Ability to handle a wide range of new multiprocessing applications

Building for the future, the Multimax architecture is intended to permit expansion by the 1990's to balanced configurations capable of 1000 MIPS or better.

MULTIMAX HARDWARE OVERVIEW

The Multimax is a modular, expandable multiprocessor system built around a unique, wide, fast bus, the Nanobus, which provides the primary communications pathways between system modules. The backplane in which the Nanobus is embodied provides twenty slots which can be filled by four card types: Dual Processor Cards, Ethernet/Mass Storage Cards, Shared Memory Cards, and the System Control Card. Eleven backplane slots out of the available twenty are dedicated to either Dual Processor Cards or Ethernet/Mass Storage Cards (allowing processing power requirements to be traded off against those for mass storage throughput).


Eight slots are allocated to Shared Memory Cards, and one slot is reserved for the System Control Card.

Each Dual Processor Card contains two 32032 32-bit processors with LSI memory management units and floating point units. Dual Processor Cards provide the general purpose computing power of the Multimax.

The Ethernet/Mass Storage Card provides interfaces to an Ethernet local area network and to mass-storage device controllers via a Small Computer System Interface (SCSI) bus. These interfaces permit all I/O to be offloaded from the primary processors. In situations requiring massive amounts of data storage and transfer bandwidth, multiple Ethernet/Mass Storage Cards can be installed to provide multiple, independent, 1.5 megabyte/second I/O channels.

Shared Memory Cards, each containing 4 megabytes of random access memory (RAM), provide system storage. Filling all eight memory card slots currently creates a total of 32 megabytes of RAM.

The System Control Card supplies system control, diagnostics, console interfacing, and bus coordination.

Except for terminals connected to the two serial I/O ports provided by the Multimax System Control Card, user terminals connect to Annex Network Communications Computers. Annexes connect to the Multimax directly, to one another in daisy-chain fashion, or, via an Ethernet transceiver, to an Ethernet cable shared by Multimaxes and other systems.

MULTIMAX PERFORMANCE ADVANTAGE

Multimax systems provide aggregate performance ranging from that of 2 to that of 20 superminicomputers of the VAX-11/750 class - whether measured in MIPS or standard floating point Fortran Whetstone operations per second. For most applications, Multimax system capacity and processing power is a linear function of the number of available processors, the amount of RAM, the capacity of the mass storage devices, and the data transfer bandwidth. Each increment of processor, memory, or I/O performance comes on a separate printed circuit card.

Tightly-Coupled Multiprocessing

In the tightly-coupled architecture of the Multimax, all processors and programs share access to all of main memory, I/O interfaces, mass storage, and the Ethernet. This feature gives the Multimax the following important characteristics:

• The Multimax operating system is not replicated for each processor - an economy that provides more memory space for programs and data. Shared access to system memory also produces higher effective operating speed, since processors need not pass messages in order to communicate.
• Processors are not dedicated to specific uses or processes. Rather, they are dynamically allocated to whatever processes currently have the highest priority.
• Memory is allocated dynamically to processes, not to processors. The result is more efficient usage of available memory and improved interprocess communication.
• I/O channels can operate anywhere in main memory with no access speed penalty for any section.
• The operating system can distribute the computing workload evenly. Increased loading beyond the saturation point slows down all processes slightly, without making the system inaccessible to any particular process.

These features distinguish the tightly-coupled Multimax multiprocessor from competing closely-coupled, multicomputer designs. Machines in the latter category run each computer in a closed environment consisting of processor, private memory, I/O interface, and separate operating system. Each constituent computer is as isolated as if it were an independent node on a fast network (in some cases, a small shared memory is provided), and overall system operation can never achieve the close coordination and flexibility that characterize the Multimax.


For example, more than one processor cannot efficiently be applied to the same task (such as running the operating system) without large amounts of data being transferred each time a context switch is made. Context switching overhead limits how effectively the pool of processors can be balanced to a rapidly changing task load.

Highest Bandwidth Bus

The ability of the Multimax to support many processors, much memory, and multiple I/O cards derives from the extremely high data transfer speed of the main system bus, the Nanobus. Data paths within the Nanobus can carry 96 bits of new information every 80 nsec, even if previous requests are still in process. The result is a true data transfer bandwidth of 100 Mbytes/second. Since the Nanobus arbitration scheme is optimized for multiprocessing, all system elements have fair access to the bus, even over short time intervals. The Nanobus also allows future use of specialized processors to handle nonstandard data types at very high speeds.

The subsystems on all Nanobus cards are optimized for the multiprocessing environment in which large amounts of data must be exchanged among cards. For example, memory bandwidth matches Nanobus capacity through memory interleaving - a technique that allows contiguous longwords to be stored in up to eight separate memory banks which can be accessed at system clock rates rather than at the slower memory clock rates. This technique allows data to move between processors, I/O devices, and memory at speeds of up to 100 Mbytes per second.

The Nanobus provides features for fast synchronization between processors and I/O devices with negligible effect upon other system activities - a requirement in a high performance multiprocessing environment. Efficient inter-processor communication via memory, and multiprocessor interrupt handling are two such features. Multimax processors are able to share data through memory, taking full advantage of memory caching, which increases access speed for every processor in the system. In addition, the Multimax features a fast, high bandwidth interrupt delivery mechanism able to handle large, high speed bursts of interrupts that can be serviced dynamically by multiple processors.

Fastest System Software

Both operating systems for the Multimax, UMAX 4.2 and UMAX V, deliver the full performance potential of the Multimax through software design techniques unique to large scale parallel processors. Programs running on different processors can get simultaneous access to operating system services from a single, shared copy of UMAX. The operating system is built to deal with many such requests in parallel by supporting multiple, simultaneous streams of control - a technique called multi-threading. To perform efficiently when many processes and users require system services, UMAX employs techniques such as caching in memory commonly referenced data structures that would exact large time penalties to reference on disk. Because the Multimax uses only one copy of the operating system, executing multiple simultaneous processes exacts minimum memory usage and bus loading penalties. Moreover, processes can migrate from processor to processor as the system load changes without the need to move program or data to different locations in memory - which means that dynamic load balancing can be accomplished with much less overhead than might otherwise be required. The flexibility of this software design pays off in greater available power, greater expansion potential, and a much wider range of possible applications.

Most Flexible Terminal Architecture

The terminal architecture of the Multimax ensures that an additional increment of performance is added with each increment of terminals. Each Annex contains an NS32016 processor and 500K bytes of memory. This processing power (roughly 0.6 MIPS) is added to the system for every group of 16 terminals, and is used to offload terminal processing from the Multimax.


Annexes optimize terminal transmission packet size for all connected terminals. They transmit packets as short as a single character or as long as an entire line, depending on the application. The result is that terminals generate many fewer Multimax interrupts than they would if connected to conventional serial interfaces.

MULTIMAX CONFIGURABILITY ADVANTAGE

Traditional computer architectures tend to dictate sweeping changes in operating system and applications software with each upgrade, because new generations of technology are rarely fully compatible with the old. Multimax systems, however, allow users to start out with a minimal configuration and enlarge it in small increments to a very large system without the costly disruptions of converting to new hardware and software architectures. Simply adding more processor, memory, or I/O cards increases the Multimax processing capabilities. And Multimax software - whether operating system or user-developed application - runs in the larger environment as harmoniously as it did in the smaller; there is no need to tailor the software to the new system.

Adding more terminals simply requires adding Annex computers, a virtually unlimited number of which can be geographically distributed on an Ethernet. This configuration flexibility provides several benefits:

• Traditional limits on the number of terminal and printer ports are gone - literally thousands of ports can be provided by Annexes configured on an Ethernet. And each Ethernet/Multimax combination can service several hundred simultaneous users without contention problems.
• Terminal-to-computer distance increases by an order of magnitude beyond that attainable with standard serial cable connections. This eliminates the expense of modems and associated private or public lines.
• Costs associated with large numbers of long cables are greatly reduced.
• Individual Annex terminals can be "bound" to a particular Multimax machine, but they can also be left unallocated. Unallocated terminals are available to any user in need of any Multimax host (or any other host running UNIX 4.2BSD) residing on the Ethernet. In addition, users of unallocated Annex terminals can log in to several hosts and easily switch from host to host at the same terminal.
• Annexes make the network a transparent shared resource system, one in which Multimax processors share control of a wide range of peripheral devices, and system users share access to several Multimaxes, other hosts, and gateways attached to the Ethernet.

Annexes are tailored precisely to intelligent terminal concentration. Each Annex is downloaded at startup with appropriate software from any Multimax via the Ethernet.

MULTIMAX RELIABILITY ADVANTAGE

In addition to parity protection and Error Correcting Code (ECC) data correction at critical points in most data transfers, Multimax systems perform automatic retries on soft error conditions, thus reducing failures within the system and contributing to greater system availability.

The Multimax System Control Card contains a microprocessor and local memory dedicated to running confidence checks at each start-up, as well as to monitoring normal system operation. At restart, it can automatically reconfigure the system around failed, redundant cards, and thus increases system availability. Temperature sensors located in critical areas of the enclosure trigger warnings when fans malfunction or airways are blocked. Processors that malfunction can be deconfigured automatically on system restart; memory banks that accumulate a history of excessive ECC operations - or that manifest any uncorrectable error states - can be taken out of service. Faults are identified on the system console whenever they are discovered; non-fatal faults discovered after the UMAX operating system has been started are made available for logging by the operating system.


Because of this self-diagnosis capability, non-technical personnel can maintain the Multimax. Cards that are removed from service by the diagnostic processor identify themselves by status lights. Consequently, users can correct the problem at the first opportunity to power down the system by merely unplugging the defective card and plugging in a replacement. Since only four major logic card types make up the system's critical components, it is inexpensive to stock replacement spares. In many systems, memory and processor "spares" can be "stored" on line in the Multimax backplane rather than on the shelf. Not only does this procedure reduce downtime when failures occur, it also provides standby processing power for those occasions when it is needed.

MULTIMAX SOFTWARE ADVANTAGE

The programmability and usability of the Multimax is enhanced by its operating system software, UMAX, Encore's implementation of UNIX. Two versions are available, based on the two most popular versions of UNIX - each extended to function optimally in a multiprocessor environment. Versatile networking software based on TCP/IP protocols enables communication between Multimaxes and other systems running UNIX 4.2BSD. These industry standards, as well as such standards as X.25, SNA, and IEEE floating point format, simplify integrating Multimaxes into existing networks and provide flexible means to solve users' immediate processing problems. In addition, they accommodate future growth without loss of investment.

Encore is also integrating innovative software products of its own in such areas as programming languages, software development tools, productivity tools, improved communications protocols, and distributed processing features. The framework for this software is an open architecture that can incorporate new software standards as they arise. Users benefit from leveraging off these standards.

Many existing applications will run (without redesign or reprogramming) in parallel, each on an independent processor, transparent to the user. Encore's operating systems also support parallel processing, in which the user can dedicate multiple processors to a single problem, using shared memory.

THE ENCORE COMPUTING CONTINUUM

The Multimax multiprocessing computer system is one aspect of the Encore Computing Continuum, which provides a true multiprocessing and distributed computing environment that locates computing resources where they are most needed. The Encore Continuum uses tightly-coupled multiprocessing, distributed, intelligent control of peripherals, and clustering of multiple Multimax systems to provide users with a powerful, integrated, and open-ended computing network capable of solving computing needs well into the future. And it works with, not against, existing hardware and industry standards.

The Encore Computing Continuum, diagrammed in Figure 1-1, includes:

• One or more Multimax multiprocessors, the scalable computing resource. Each Multimax system provides an aggregate 1.5 to 15 MIPS performance range, which will increase dramatically as new technology improves the performance of individual microprocessors.
• Future clusters of today's Multimaxes that could achieve performance in the hundreds to thousands of MIPS.
• Annex network communications computers, which put computing power out on the network for intelligent control of peripherals such as terminals and line printers. This approach allows more efficient use of Multimax computing power by reducing its interrupt and character processing load.
• Encore Gateway Computers, for making connections between the Encore Computing Continuum and external computing environments. These gateways enable connections to and from public data networks (X.25) and IBM SNA facilities.


Figure 1-1 The Encore Computing Continuum

Communications take place without consuming valuable host cycles in the translation of foreign protocols. All translations are done in the gateways, which accommodate any computer or workstation in the Encore Continuum. The gateways use password protected ports and access control lists to achieve security. Network monitoring and peak traffic load reports assist the administrator with capacity planning.

• HostStation 110 display servers and a variety of printers and other devices that connect via one or more Annexes to the Ethernet local area network.
• A software environment that offers a choice of operating systems and a wide range of applications and tools. Implementations of both UNIX 4.2BSD and UNIX System V are available for the Multimax. Each gives multiple processors simultaneous access to operating system services for maximum efficiency. In addition, the software environment provides a large set of unified software tools: compilers for software development, database managers and applications-building environments for business, communications capabilities, documentation tools, and printing facilities.
• Interconnection to other computer systems that support TCP/IP networking protocols.

The Encore Continuum Applied

The Continuum applies the greatest range of computing power to the widest range of computing problems. A Multimax can start with just a few processors and grow to 20 processors. It can then be interconnected via Ethernet or other networking techniques to other Multimaxes and even computers from other vendors. Users achieve maximum efficiency for terminal support by inexpensively adding powerful Annex computers onto the network. These additions achieve very efficient, cost-effective timesharing for almost unlimited numbers of users. In addition, Multimax's ability to support large numbers of terminals, combined with its large physical and virtual memory and its high I/O bandwidth, makes it ideal for large database and transaction processing applications.


PREPARING FOR THE FUTURE

Multimax places few restrictions on the type of processors that can be interfaced to the Nanobus. Although the initial design uses National Semiconductor 32032 microprocessors, cards using new or different processor architectures can be accommodated without major redesign. The bandwidth of the Nanobus is broad enough to accept future microprocessors with markedly higher performance.

Similarly, customized expansion buses of almost any design can be connected to the Nanobus merely by constructing an appropriate adapter. The system is tolerant of diverse technologies.

Finally, the Nanobus is already provided with addressing and prioritizing features that will permit multiple systems to be interconnected to form a single very large computing array. The central feature of this array will be that any processor in any participating Multimax will have the same closely-coupled access to the physical resources of the entire array that it enjoys for the resources connected to its own Nanobus. The potential power of such arrays is enormous.

A NEW STEP FORWARD IN COMPUTING

The Multimax architecture defines more than just a new computer: it provides a new strategy of computing, one that is more modular, more flexible, more economical than any of its predecessors.

Multimax unites a broad range of power with the versatility of multiprocessing, the economy of industry standards, and the wide distribution of UNIX-based software. Offering a performance range from supermini to mainframe in one machine with unprecedented low price and high performance, the tightly-coupled multiprocessor ushers in a new phase of computing.

Multimax also provides an architecture with far-reaching implications for coming evolutions in microprocessors, memory chips, I/O buses, and mass storage technologies. Multimax is the only computer that brings the performance and price benefits of the multi to practical applications today, and yet provides a clean growth path to the increasingly parallel computing structures of tomorrow.

Chapter 2
Hardware Summary Description

This chapter provides an overview of the Multimax system hardware: its physical packaging; its modular processor, memory, and input/output components; its peripherals; and its hardware options.

MULTIMAX SYSTEM PACKAGING

A Multimax system consists of two or more cabinets: the System Cabinet, and one or more Peripheral Cabinets, normally attached to the System Cabinet or to one another.

The Multimax System Cabinet

This cabinet (see Figure 2-1) houses all Multimax system components except for mass storage peripherals (which go in Peripheral Cabinets) and communications ports for terminals, line printers, and foreign networks (which are on Annex Network Communications Computers). The System Cabinet front panel contains status indicators as well as power and control switches. Each Multimax System Cabinet provides the following resources:

• System power supplies (battery backup optional)
• Cooling fan
• One 20-slot Nanobus backplane
• Control panel and display
• I/O control interfaces (for disk and tape drives)
• I/O connection panel

The I/O connection panel at the bottom rear of the cabinet provides all cabling connections to Peripheral Cabinets, networks, and external removable disk drives.

The Multimax Peripheral Cabinet

The Peripheral Cabinet (see Figure 2-2) houses mass storage, and can contain one 6250 bpi half-inch tape drive and from one to four 500 Mbyte fixed-disk drives. Available as options are auxiliary peripheral cabinets with room for up to four additional fixed-disk drives and/or an additional tape drive. Printers and removable media disk drives are housed in their own cabinets.

MULTIMAX FUNCTIONAL OVERVIEW

Diagrammed in Figure 2-3 are the functional components of the Multimax - the main system bus; the processor, memory, and input/output cards; the mass storage controllers and peripherals; and the Annex Network Communications Computers for terminals, line printers, and gateways. Items rendered by solid lines are required on all Multimax machines, while items rendered by dotted lines are optional.


Figure 2-1 The Multimax System Cabinet
(Labeled components: centrifugal blower and motor; disk/tape controllers; 20-slot Nanobus backplane with bus bar; 5V, 300A power supplies; ±5V, +12V power supply; battery back-up unit option.)

Summary Specifications (two-cabinet system)

• Size: 60.5" H x 40.5" W x 32" D (154 cm x 103 cm x 81 cm)
• Required front/rear clearance: 22" (56 cm) minimum
• Weight: approximately 1000 lbs (455 kg)
• Power Requirements:
  240 VAC, single-phase, 30A (System Cabinet)
  240 VAC, single-phase, 20A (Peripheral Cabinet)
• Noise Level (A-weighted, at 1 meter, 1.5 meters from the floor):
  Front: less than 67 dB SPL
  Rear: less than 63 dB SPL


• Heat Dissipation:
  Minimum (4 Nanobus cards, 2 controller cards, 1 fixed-disk drive, 1 tape drive): 5550 BTU/hour
  Typical (10 Nanobus cards, 3 controller cards, 4 fixed-disk drives, 1 tape drive, battery backup option): 14,000 BTU/hour
  Maximum (20 Nanobus cards, 5 controller cards, 8 fixed-disk drives [4 in second Peripheral Cabinet], 4 tape drives, battery backup option): 25,200 BTU/hour

Figure 2-2 The Multimax Peripheral Cabinet
(Labeled components: TDC-01 half-inch tape drive; FDD-01 fixed disk drive(s).)

THE NANOBUS

The Nanobus is so named because it is one foot long - approximately the distance traveled by light in one nanosecond. The Nanobus is the industry's fastest bipolar bus, providing a true data transfer rate of 100 Mbytes per second. It can transfer interrupts at high speed and at high burst rates, and supports hardware load levelling of interrupts among processors.


Figure 2-3 Multimax System Functional Diagram
(Shows terminals, printers, and modems attached through Annex network I/O computers; the Multimax System Cabinet with DP, EM, SC, and SM cards; and external disk and half-inch tape drives. DP: Dual Processor, SC: System Control, EM: Ethernet/Mass Storage, SM: Shared Memory.)

Facilitating cache coherency and incorporating high reliability and fair arbitration features, the Nanobus supports internal data rates high enough to accommodate a large variety of future system enhancements.

The 20 slots of the Nanobus backplane accommodate three types of Nanobus cards:

• System Control Card (SCC)  1 slot
• Shared Memory Cards (SMCs)  8 slots
• Bus requester cards (includes Dual Processor Cards [DPCs], Ethernet/Mass Storage Cards [EMCs])  11 slots

Electrical and mechanical support for a fully loaded, 20-card Nanobus comes standard with the Multimax, guaranteeing easy expansion.

Note that some Nanobus cards (the SCC, DPCs, and EMCs) can request use of the address bus but do not respond to requests for data. These cards are generically referred to in the following discussion as requesters. Some cards (SMCs) do not issue address bus requests, but do respond to requests for data from requester cards. These cards are referred to as responders. One card, however, the SCC, shares both characteristics and is therefore called a requester/responder.


Maximizing Bus Performance

The Nanobus possesses high performance throughput and response time characteristics that enable system components to exchange large amounts of data with minimum communication delays. Toward this end, the Nanobus, diagrammed in Figure 2-4, incorporates the following features:

• High speed synchronous operation. Provides up to 12.5 million bus "transactions" per second (all data transfers are synchronized with a 12.5 MHz bus clock).
• Separate parity-protected address and data buses. To assure maximum bandwidth, addresses and data are not multiplexed on the same bus lines. The address bus is 32 bits wide, plus 4 parity bits; the data bus is 64 bits wide, plus 8 parity bits.
• Separate vector bus (14 bits wide). Provides a path for interrupt vector distribution throughout the Multimax. All requesters on the Nanobus can generate interrupts that must be fielded by other requesters. Since these interrupts move across the independent vector bus, they do not interfere with data and address transmissions.
• Separate, parity-protected control bus. This bus consists of many miscellaneous control lines carrying reset, power fail, and clock signals, as well as signals supplying auxiliary information about every data transfer. Since these lines permit additional control information to be passed in each bus cycle, fewer cycles are needed to perform complex operations.
• Pended "deferred response" bus operation. Fullest use of Nanobus cycles is assured by allowing many transactions to occur simultaneously across the bus. After a bus "requester" asks for data, other unrelated requests and responses can occur before the bus "responder" returns that data. Bus protocols facilitate pended operation by tagging address and data with the requester's identity (using control lines).

Figure 2-4 The Nanobus


• Pipelined bus interfaces. Bus interfaces can pipeline multiple bus transaction requests by buffering them at different stages of processing. A memory card, for example, can send data to a requester, while simultaneously accepting an unrelated request for data from another requester. This is a key feature for effectively utilizing the full bandwidth of bus interfaces.
• Processor-memory interlocked operations. The Nanobus assures rapid synchronization among processors through atomic test-and-set protocols among bus requesters and responders. Interlocked read-modify-write bus cycles can overlap one another and other bus transactions without compromising atomicity, and other system activity is not delayed while interlocked operations are taking place.

SYSTEM CONTROL CARD

The System Control Card (SCC) functions as the communications clearinghouse, bus coordinator, and diagnostic center for the Multimax. This card provides the following functions:

  Interface to front panel switches and indicators
  System start-up
  Hardware fault diagnosis
  System Bus arbitration
  Bus timing signal generation
  Interval timers
  Time-of-year clock
  Environmental monitoring
  Console Terminal and Remote Diagnosis interfaces

Figure 2-5 System Control Card Block Diagram
(Blocks include Control logic, Clocks, the Address Bus Arbiter, and the Data Bus Arbiter.)

Figure 2-5 shows the basic functional blocks of the SCC. These blocks are discussed below.

The Diagnostic Processor

Based on an NS32016 microprocessor, the SCC Diagnostic Processor can access up to 128K bytes of ROM (64K bytes standard), 512K bytes of on-board dynamic RAM (128K bytes standard), and 4K bytes of static RAM (always backed up by an on-board battery). It performs system tests and initialization after power-up, provides a time-of-year clock, and supervises the system control panel as well as the system console port and a port designed to allow remote control of the system. The Diagnostic Processor also takes control of the Nanobus and all associated modules when a serious system error occurs. If the error was caused by a failed component on a Nanobus card, the SCC can deny that card access to the Nanobus on the next restart and inform the operating system that the card is inactive.

Shared Memory and Timers

The Shared Memory and Timer facility is accessible by all active modules on the Nanobus but, like Shared Memory Cards, it does not actively initiate bus requests.


This facility contains timers for creating process IDs and timed interrupts. Of particular note is the free-running counter, a 32-bit counter which increments every microsecond and provides a mechanism for precise interval measurement by software. This logic also has 32K bytes of static RAM that is used to exchange commands and acknowledgements between the SCC and active Nanobus modules. The static RAM is supported by the system battery back-up option.

Nanobus Interface

This interface permits the SCC's diagnostic processor to access other Nanobus cards while simultaneously allowing those cards to read and write the memory and registers in the Shared Memory and Timers logic.

System Clocks

The 12.5 MHz master system clock for the Multimax system is distributed from the SCC. All bus clock lines on the Nanobus are driven by this SCC logic.

Address, Data, and Vector Bus Arbitration

Address, data, and vector bus arbitration are accomplished interactively between arbiters on the SCC and the individual Nanobus cards.

The Address Bus Arbiter

The Address Bus Arbiter decides priority on a round-robin basis: once a module is granted access to the address bus, it becomes the lowest priority module, and the next logical module is assigned the highest priority.

The Data Bus Arbiter

The Data Bus Arbiter uses a fixed priority algorithm that gives the SCC the highest priority, followed by any requester/responders in the last two slots of the Nanobus, and Shared Memory Cards. If a write cycle is in progress on the address bus, the data bus always transfers write data in the next bus cycle regardless of other pending data bus requests. Memory modules can assert a special priority request for the data bus if an effort to put read data on the bus is unsuccessful for four clock cycles. This special request suspends address bus arbitration for writes and allows read data to be transferred unless a higher priority responder is also asserting a data bus request.

Vector Bus Arbiter

The vector bus cycle time is 160 nsecs, allowing for low interrupt latency and a high interrupt burst rate. Vector bus arbitration is of two different types - one for gaining access to the vector bus itself, the second for defining which processors sharing the vector bus are "most interruptable." The first type is implemented by the SCC; the second type is, strictly speaking, a function of vector bus architecture and protocol.

In general, the operating system dynamically assigns processors to one of three "interrupt classes," and the interrupt designates the particular class. Interrupt vector arbitration hardware then directs the interrupt to the processor with the lowest interrupt priority within its class.

Environmental Monitors

The SCC monitors all system power supplies except that for battery backup. In addition, it is equipped with two temperature sensors, one near the top and the other near the bottom of the card, and also monitors a temperature sensor on the front panel printed circuit board. By comparing the output of these three sensors, the SCC can provide early warning of problems ranging from excessive ambient temperature to blocked air passages or failed fans within the Multimax.

Potential environmental problems are reported to the operating system, which can log minor variations on the system console or, for severe problems, initiate an orderly system shutdown. This shutdown can be carried all the way to removing system power since the SCC has control of the system AC circuit breaker and can be commanded to cut AC power to the system.
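As an illustration of how system software might use the free-running microsecond counter described above for interval timing, consider the following sketch. The register mapping shown is hypothetical - this summary does not give the counter's address - and unsigned 32-bit subtraction is what keeps the measurement correct across counter wrap-around (which occurs roughly every 71.6 minutes).

    #include <stdint.h>

    /* Hypothetical mapping of the SCC's free-running counter; the real
     * address is not given in this document. */
    #define FREE_RUN_COUNTER (*(volatile uint32_t *)0xFFFF0000)

    /* Elapsed microseconds between two counter samples. Modulo-2^32
     * arithmetic gives the right answer even if the counter wrapped once. */
    uint32_t elapsed_us(uint32_t start, uint32_t now)
    {
        return now - start;
    }

    /* Usage sketch:
     *   uint32_t t0 = FREE_RUN_COUNTER;
     *   ... work being timed ...
     *   uint32_t dt = elapsed_us(t0, FREE_RUN_COUNTER);
     */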


Front Panel and Serial Interface

The SCC provides the interface to the front panel switches and indicators. It also provides four serial ports that are brought to the Multimax I/O panel. Of these four, two serve debugging purposes and two serve as system console ports. One of the latter ports can be connected to a modem and used for remote Encore system diagnosis.

DUAL PROCESSOR CARD

Each Dual Processor Card (DPC) provides two independent 10 MHz NS32032 processors and one shared cache memory. Each processor has a memory management unit (MMU) that enables the generation of 32-bit physical addresses. The DPC is diagrammed in Figure 2-6.

32-Bit Processor

Each National Semiconductor NS32032 on the DPC is a 32-bit processing unit with a full 32-bit data bus to memory. Executing approximately 0.75 million instructions per second (MIPS), the processor has a compactly encoded instruction set that provides efficient and economical support for high level languages. Its fully integrated floating point instruction set, and 13 addressing modes designed for the kinds of accesses compilers generate, make the NS32032 a powerful and efficient processing engine for the Multimax.

Cache Memory

Multimax processors spend most of their time reading instructions and data from memory, executing those instructions, and (much less often) writing new or transformed data back to memory. In a shared memory multiprocessor environment, processors also communicate with one another, mostly through shared memory.

The Multimax decreases memory access time and bus loading by storing frequently referenced instructions and data in a 32K-byte cache of fast static RAM on each DPC. Memory data is stored in this cache whenever either of the two processors on a DPC reads or writes main memory locations, and an index of the addresses of the locations thus stored is kept in the CPU tag memory array. Thereafter, any reference to those locations will access the cache, not main memory. Cache accesses do not incur the processor wait states required for main memory access, nor do they impose any traffic on the Nanobus.

The DPC cache is kept current with relevant changes in main memory (generated by writes from other processors or I/O devices) by means of the bus tag logic, which continuously scans the Nanobus for memory writes involving locally-cached addresses. When such writes are detected, the "valid" bit for that cache address is switched to its invalid state. Later, when an on-board processor next needs data from that cache address, it will recognize that the associated cache entry is now invalid and go to main memory rather than cache for the data. A main memory access automatically updates the entry in cache.

Because the bus tag logic is independent of the CPU tag logic and replicates its data, maintaining cache concurrency through bus monitoring occurs without impacting speed of access to the cache by the processors.

Memory Management

Associated with each processor is an MMU, which provides hardware support for demand-paged virtual memory management. The MMU translates the 24-bit virtual addresses generated by the processor into 32-bit physical addresses.

Associated with each MMU are a Translation Look-aside Buffer (TLB) and an Extended Translation Look-aside Buffer (ETLB) which speed up address translation by providing a quickly accessible cache of the most recently used translations.

Floating Point Unit

Two National Semiconductor 32081 floating point units are standard equipment on all Multimax DPCs. These FPU chips provide 32- and 64-bit IEEE floating point operations at approximately 250,000 floating point operations per second.
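To make the two-tag-store coherence scheme described under Cache Memory concrete, the following sketch models it in C. The line size, tag-store dimensions, and field layout are assumptions chosen for illustration - the document does not specify them - but the essential behavior is the one described above: the bus tag store is a duplicate directory consulted by the snooping logic, so invalidations never steal lookup cycles from the processors.

    #include <stdint.h>

    #define CACHE_LINES 4096          /* assumed: 32K-byte cache of 8-byte lines */

    struct tag_entry {
        uint32_t tag;                 /* high-order physical address bits */
        int      valid;               /* the "valid" bit described in the text */
    };

    /* CPU tag store: consulted by the on-board processors on every access. */
    static struct tag_entry cpu_tags[CACHE_LINES];
    /* Bus tag store: a duplicate directory consulted only by the Nanobus
     * snooping (bus tag) logic. */
    static struct tag_entry bus_tags[CACHE_LINES];

    static unsigned line_index(uint32_t paddr) { return (paddr >> 3) & (CACHE_LINES - 1); }
    static uint32_t line_tag(uint32_t paddr)   { return paddr >> 15; }

    /* Conceptually invoked for every memory write observed on the Nanobus. */
    void snoop_write(uint32_t paddr)
    {
        unsigned i = line_index(paddr);
        if (bus_tags[i].valid && bus_tags[i].tag == line_tag(paddr)) {
            bus_tags[i].valid = 0;    /* mark the line stale */
            cpu_tags[i].valid = 0;    /* next processor reference misses and refetches */
        }
    }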


Figure 2-6 Dual Processor Card Block Diagram
(Blocks include Processors A and B, each containing a processor, MMU, ETLB, and FPU; the shared cache with its CPU tag store, bus tag store, and valid bits; TSE logic; vector FIFOs; diagnostic and slave logic; registers; data and address parity generation and checking; and timing and control logic connected to the Nanobus lines.)

Time Slice End Interrupt Control

The Time Slice End (TSE) logic provides a way for a processor to terminate a compute bound process after a software-selected amount of execution time. The TSE logic can provide an interrupt after from 1 to 2^32 100-nanosecond clock ticks (that is, 100 nanoseconds to approximately 430 seconds). TSE interrupts act directly on the processor and do not pass through the Vector Bus FIFO.

SHARED MEMORY CARD

Each Shared Memory Card (SMC - see Figure 2-7) provides 4 Mbytes of random access memory (RAM) in two independent banks of 256K-bit MOS RAM chips. Each card supports 2-way interleaving between banks and 4-way interleaving between boards - permitting 8-way interleaving on systems that have at least four SMCs. The base address and interleaving characteristics of each memory card are set under software control at system startup.
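The sketch below shows one way an 8-way interleaved physical address could be decomposed into a board, a bank, and an offset. The actual bit assignments are not given in this summary, so the layout is purely illustrative; the point is that eight consecutive 8-byte double longwords land on eight different board/bank pairs, letting a burst start a new transfer on every bus clock while each bank completes its own memory cycle.

    #include <stdint.h>

    struct mem_target {
        unsigned board;               /* which of the four interleaved SMCs */
        unsigned bank;                /* which of the two banks on that SMC */
        uint32_t offset;              /* byte offset within the bank */
    };

    /* Illustrative 8-way interleave decode; the real decode is configured
     * by software at startup and is not documented here. */
    struct mem_target map_interleaved(uint32_t paddr)
    {
        struct mem_target t;
        uint32_t dlw = paddr >> 3;    /* index of the 8-byte double longword */
        t.board  = dlw & 0x3;         /* assumed: low two bits select the board */
        t.bank   = (dlw >> 2) & 0x1;  /* assumed: next bit selects the bank */
        t.offset = (dlw >> 3) * 8;    /* remaining bits locate the data in the bank */
        return t;
    }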


The memory cycle time of each SMC is four Nanobus cycles, or 320 nanoseconds. During this time a given SMC can compose up to eight bytes of data for transfer to the Nanobus or accept and store up to 8 bytes received from the Nanobus. Since the bus architecture allows another interleaved board to begin a new 8-byte memory transfer with each successive bus clock cycle, the four boards involved in 8-way interleaving can transfer double longwords of data (64 bits or 8 bytes) at an aggregate rate of 100 Megabytes per second.

A key feature of the SMC is that any byte in its memory can be used as a multiprocessor "lock," and these locks can be set or reset across the Nanobus using atomic read-modify-write bus cycles. Other processors, when testing the state of a lock, will first read the byte's contents from the SMC into their cache, and subsequently read from the cache until the value of the lock changes. The result is that no load is imposed on the Nanobus or the SMC during the time spent waiting for the lock to change state.
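A minimal sketch of how such a lock byte might be used from C follows. The test_and_set() primitive stands in for whatever interlocked instruction or library routine actually issues the atomic read-modify-write bus cycle - it is an assumed interface, not a documented UMAX call - and the inner loop spins on the cached copy of the byte exactly as the paragraph above describes.

    typedef volatile unsigned char lock_t;    /* any byte of shared memory */

    /* Assumed primitive: atomically set *lk to 1 via an interlocked
     * read-modify-write cycle and return the previous value. */
    extern int test_and_set(lock_t *lk);

    void lock_acquire(lock_t *lk)
    {
        for (;;) {
            if (test_and_set(lk) == 0)        /* got the lock */
                return;
            /* Spin on the locally cached copy; these reads place no load on
             * the Nanobus or the SMC until the holder's release invalidates
             * the cached line. */
            while (*lk != 0)
                ;
        }
    }

    void lock_release(lock_t *lk)
    {
        *lk = 0;    /* ordinary write; snooping logic invalidates waiters' copies */
    }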

Figure 2-7 Shared Memory Card Block Diagram


All data is stored with an Error Correcting Code (ECC). Single bit errors in each 32-bit longword are detected and corrected with each access; double bit errors are detected and reported. In addition, the SMC "sweeps" the entire memory array during refresh and corrects any single bit errors that it encounters. Since a full refresh sweep on each bank occurs approximately once every eight seconds, the likelihood of an uncorrectable double bit error is dramatically reduced. Because of ECC, two of the 256K-bit memory chips on an SMC (one in each bank) could fail completely without impacting system operation.

Each SMC carries a diagnostic processor that checks both of its memory banks at power-up and whenever directed to do so by the system diagnostic processor on the System Control Card. In addition, each memory bank maintains a Control/Status Register (CSR) through which it reports single or double bit errors, or bus parity errors, to the requesting processor. The Parity & ECC logic identifies parity errors in written data so that the requesting processor can automatically retry the write operation.

ETHERNET/MASS STORAGE CARD

The Ethernet/Mass Storage Card (EMC) provides interfaces both to the Ethernet and to the Small Computer Systems Interface (SCSI) bus.

Ethernet is an industry-standard 10 Mbit/sec Local Area Network (LAN) used to connect the Multimax to Annex Network Communications Computers and to other computer systems. Processing nodes with their own Ethernet interfaces are connected by transceiver cables to transceivers, which in turn are connected to an Ethernet coaxial cable that can be as long as 500 meters. The Ethernet interface on the EMC attaches to an Ethernet transceiver port on the Multimax I/O panel.

Up to four disk controllers (supporting up to eight disk drives for upwards of four gigabytes of storage) and one tape controller (supporting up to four tape drives) can be connected via the SCSI bus, either to a single EMC card or, if additional throughput is needed, to multiple EMC cards.

The Nanobus provides 11 card slots for DPCs or EMCs. At least one DPC and one EMC are always installed; the remaining 9 slots can accommodate any mixture of DPCs and EMCs. However, since the System Cabinet provides slots for only four disk controllers and one tape controller, one to three EMCs are all the system usually needs.


Figure 2-8 Ethernet/Mass Storage Card Block Diagram


However, since the System Cabinet provides slots for only four disk controllers and one tape controller, one to three EMCs are all the system usually needs.

The EMC consists primarily of three functional blocks, shown in Figure 2-8. The Nanobus interface logic on this card is shared by the EMC processor, the SCSI control interface, and the LAN control interface. Any of these three functional blocks can become temporary master of the Nanobus interface and thereby gain access to the entire Nanobus memory space.

EMC Control Logic

This portion of the EMC consists of an NS32032 processor equipped with local ROM for program storage, 64K bytes of local RAM for program and data storage, local control and status registers, vectored interrupts, and two windows into main memory.

The EMC processor has three principal tasks. It accepts LAN and SCSI control information from the system; it uses this information to initiate LAN and SCSI interface DMA data transfers; and it monitors the operation of both the SCSI and the LAN control blocks.

SCSI Control Logic

The SCSI control interface consists of a SCSI bus controller, a 512-byte data FIFO, a microprocessor, and a dedicated SCSI direct memory access (DMA) engine. The SCSI bus controller transfers data between the SCSI bus and the SCSI data FIFO under control of the SCSI processor. The SCSI data FIFO can be filled from or transferred to Multimax main memory by the SCSI DMA engine.

LAN Control Logic

The LAN control interface consists of an Ethernet controller, a separate dedicated DMA engine, and 32K bytes of local memory. This memory is used to store transmitted and received data, command and status information, network management statistics, and diagnostic information. Any portion of the LAN memory can be filled from or transferred to Multimax main memory by the LAN DMA engine.

HARDWARE OPTIONS

In addition to the standard Nanobus cards, the following hardware options are available for Multimax machines:

Mass Storage

Disk and tape drives interface to the Nanobus by means of Ethernet/Mass Storage Cards and auxiliary SCSI controller cards located beside the Multimax backplane. External mass storage devices are either housed in their own independent short cabinets (like the removable disk drive) or are mounted in a Multimax Peripheral Cabinet. A Multimax system can have from one to four Peripheral Cabinets, providing up to eight disk drives and four tape drives.

FDD-01 Fixed Disk Drive

The primary Multimax disk drives are random access rotating memory units using fixed Winchester disks as storage media. These units mount in Peripheral Cabinets, which can support up to four disk drives and one optional tape drive per cabinet.

Storage capacity: 516 Mbytes (unformatted); 408 Mbytes (formatted)
Disks/spindle: 7
Data surfaces/spindle: 12
Read/write heads: 24
Number of cylinders: 711
Data transfer rate: 1.819 Mbytes/sec
Seek time (ms): full, 45; average, 20; single track, 5
Average rotational latency: 8.33 ms
Positioning method: Linear voice coil actuator
Size: 10.2" H x 8.5" W x 30" D (25.9 x 21.6 x 76 cm)
Weight: 82 lb (37 kg)

RDD-01 Removable-Disk Drive

This unit is a random access rotating memory using removable disk packs as storage media. Each drive is provided with its own top-loading short cabinet.

Storage capacity: 309.5 Mbytes (unformatted); 252 Mbytes (formatted)
Disks/spindle: 10


Data surfaces/spindle: 19
Read/write heads: 19
Number of cylinders: 808
Data transfer rate: 1.2 Mbytes/sec
Seek time (ms): full, 55; average, 30; single track, 6
Average rotational latency (ms): 8.33
Size: 34" H x 19" W x 34" D (86.4 x 48.3 x 86.4 cm)
Weight: 243 lb (110 kg)

TDC-01 Half-Inch Tape Drive

This unit is a 9-track magnetic tape drive that can be installed in the upper portion of a Peripheral Cabinet.

Reel Diameter: 10.5", 8.5", or 7" (26.67, 21.59, or 17.78 cm)
Recording Mode:
Normal: Phase Encoded, ANSI- and IBM-compatible, 1600 bpi (63 CPmm)
High Density: Group Coded Recording (GCR), 6250 bpi (250 CPmm)
Tape Velocity: 25 ips (start/stop or streaming); 75 ips (streaming only)
Capacity (unformatted; 2400 foot tape): 40 Mbytes @ 1600 bpi; 146 Mbytes @ 6250 bpi (8K byte blocks)
Size: 24" H x 19" W x 13.5" D (61 x 48 x 34 cm)
Weight: 110 lbs (50 kg)

Annex Network Communications Computers

The Annex Network Communications Computer, shown in Figure 2-9, is an intelligent, Ethernet-based device that provides gateway, terminal, and line printer services to the Multimax (and to other UNIX systems). Acting as terminal concentrators, Annexes enhance system performance by offloading burdensome character processing and protocol conversion tasks from the Multimax.

Terminals attach to Annex computers, which can be geographically distributed on a network, and communicate with given hosts via the Annex to which they are connected. Once all available Annex ports have been allocated, additional terminals can be supported simply by connecting additional Annexes to the network.

Versions of Annex exist both for use with IBM hosts employing BSC and SNA 3270 protocols, and for use with public data networks employing X.25 protocols.

The Annex Design

The Annex is a powerful standalone computer containing an NS32016 microprocessor, 512K bytes of local memory, an Ethernet transceiver port, an Ethernet cascade port, 16 serial ports for connecting to terminals or modems, and one parallel printer port. The unit's diskless design permits it to run unattended.

The Annex interfaces either directly to the Multimax Ethernet transceiver port (with no intervening Ethernet cable and transceiver), or through a transceiver to an Ethernet cable (see Figure 2-10).

Annex Cluster Kits allow up to four Annexes to be daisy-chained together. These clusters can be connected either to an Ethernet transceiver drop cable or directly to the transceiver port of a Multimax. The direct connection option permits up to 64 terminals to be located within 50 meters or so of a Multimax and eliminates the need for an Ethernet cable.

Transceiver drop cables can be of any length up to 50 meters. An Ethernet cable can be as long as 500 meters, and transceivers can be installed at 2.5 meter intervals.

The Annex design confers several benefits on the Multimax system:

• Annexes are small, free-standing, user-installable computers that can operate in ordinary office environments using normal office power facilities.

• Because each Annex has its own powerful NS32016 processor, adding Annexes effectively adds to the total processing power of the system. Annexes permit more terminals to be serviced by the system without performance degradation because keyboard interrupts (and corresponding character echoing responsibilities) are not passed to the host with every keystroke.




Figure 2-9 The Annex

• Printers (both low and high speed) can be located where they are needed along the Ethernet, without the need for long and expensive serial or parallel cables and corresponding dedicated interfaces at the processor end.

• Any terminal or modem can access any Multimax or any other 4.2BSD host on the Ethernet.

• Any Annex user can initiate multiple sessions on different hosts and switch conveniently among them.

Annex Summary Specifications

• Size: 14" H x 4.25" W x 18" D (35.6 cm x 10.8 cm x 45.7 cm)
• Weight: 22 lbs (9.9 kg)
• Required rear clearance (for connectors and cables): 6" (15 cm)
• Power Requirements: 120/240 VAC; 1.85/0.9 A

Gateway Computers

The LAN-based Encore system architecture allows the use of gateway computers for Wide Area Network (WAN) interconnection. The gateway takes responsibility for translating between the TCP/IP Ethernet protocol used on the Multimax LAN and the outside protocols to which it acts as interface. By placing these functions in a gateway computer on the LAN, access to the WAN can be shared by all computers on the LAN which support TCP/IP - whether Encore-supplied or not.

The gateways are high performance units, able to accommodate data transfer speeds as high as 56 Kbps. In the current implementation, they are available for interfacing between the Multimax Ethernet on the one hand and either X.25 or IBM 3270 protocols on the other.



Figure 2-10 Alternative Annex Installations

HostStation 110

The HostStation 110 is an advanced graphics terminal that offers unique visual capability for host-based applications. It features a large, high resolution display capable of mixing text and graphics, as well as a detached low profile keyboard and electronics based on the NS32000 family. It supports a wide range of connections, including mice, tablets, and up to four simultaneous hosts. Finally, it offers downline loadable fonts for document preparation and computer-aided publishing applications.

Display:
• full 19" diagonal monitor, P104 non-glare phosphor, tilt and swivel
• 1056 x 864 monochrome bitmapped pixels
• 60 Hz non-interlaced refresh
• up to four different screen partitions, each independently configured

Graphic Screen Partitions: adjustable from 0-1055 pixels horizontally and 0-863 pixels vertically.

Character Screen Partitions: shown in Table 2-1.

Connections:
Ports: 3 serial asynchronous, full duplex - each 110 baud to 38.4 kilobaud


Table 2-1 HostStation 100/110 Character Screen Partitions

Character Mode    Matrix    Pixels    Rows x Columns    Total

Small             4x6       3x5       144 x 264         38016
Ledger            6x8       5x7       108 x 176         19008
Standard          8x12      6x9       72 x 132          9504
Large             12x16     9x13      54 x 88           4752
EM100W            10x20     7x14      43 x 105          4780
EM100N            6x20      5x14      43 x 176          7568

Protocol: ANSI 3.4 ASCII and control; ANSI 3.41 extensions; ANSI 3.64 escape sequences; ReGIS high level graphics with extensions; VDI, GKS
Emulators: DEC VT100, ReGIS, Tektronix 4010 and 4014
Dimensions:
Footprint: 16.5" x 16.5" (42 x 42 cm)
Weight: 68 lbs (31 kg)
Height: 21" (54 cm) at maximum tilt
Keyboard:
Type: Low profile, DIN standard, detached
Layout: 105 keys, programmable functions

Band Printers

• 300, 600, or 900 LPM
• 96 character set options
• Cassette ribbon cartridge
• Membrane touch control panel
Dimensions:
300 LPM: 13" H x 31" W x 27" D (33 x 78 x 67 cm)
600/900 LPM: 44" H x 31" W x 27" D (110 x 78 x 67 cm)
Weight:
300 LPM: 140 lbs (64 kg)
600/900 LPM: 240 lbs (110 kg)
Power:
300 LPM: 120/240 VAC; 400 W (operating); 200 W (standby)
600/900 LPM: 120/240 VAC; 600 W (operating); 200 W (standby)

Cables and Connectors

• Ethernet cable in 117 meter lengths
• Ethernet cable couplers, allowing construction of longer segments by joining cable sections
• Ethernet transceivers, transceiver cables, and terminators
• Miscellaneous data cables for external mass storage devices, terminals, printers, etc.

Chapter 3

Software Summary Description

from the AT&T-licensed UNIX operating system. INTRODUCTION TO UMAX 4.2 and UMAX V UMAX 4.2 is compatible with UNIX 4.2bsd The aggregate speed of the Multimax pro­ (developed at the University of California at cessors, together with the speed of commu­ Berkeley), currently the standard among nication between the processors, memory, and scientific, technical, and academic computer I/O devices, creates the potential for high users. UMAX V is compatible with UNIX System system performance. To be fully realized, V (supported by AT&T), the de facto standard in however, this potential requires system soft­ office and commercial environments ware that manages these physical resources in a fast and economical fashion. This chapter Principal UMAX features include: surveys the key ways in which the system • The key user interfaces, networking software works with the hardware and also capabilities, system services, and utility provides a few glimpses into how the multi­ programs that are found on other UNIX processing system supports simultaneous, 4.2bsd and System V systems exist in high-performance execution of both "sequen­ UMAX 4.2 and UMAX V on the Multimax. tial" and "parallel" applications. • Programs that run successfully on other Two multiuser, multiprogramming operating machines under either version of UNIX systems are available for the Multimax: UMAX generally run without changes under the 4.2 and UMAX V. Both ensure that multiple corresponding version of UMAX on the processors can share all of the basic physical Multimax. resources of the machine at extremely high • The same networking facilities are pro­ speed. All Multimax physical resources vided on both UMAX 4.2 and UMAX V. These (processors, memory, I/O capacity, and bus two operating systems are compatible bandwidth) are available to every application, with one another across local and wide with mechanisms for computation and area networks and are also compatible communication that degrade minimally as with other UNIX-based systems that requirements grow. support the Internet protocol family. In addition, these networking facilities Both UMAX 4.2 and UMAX V promote simple, communicate with Annex Network efficient, flexible, and productive software Communications Computers for terminal, development, as well as high-performance line printer, and gateway support. execution environments for both sequential and parallel programs. Both were derived


MAJOR FEATURES OF UMAX

Both versions of UMAX provide the major features of modern, multiuser, multiprogramming operating systems. Both also include performance enhancements to make effective use of the Multimax architecture.

Multiprogramming and Multiprocessing

On a typical multiuser uniprocessor, the operating system spends much of its time emulating a multiprocessor - that is, it simulates parallel execution of users' programs and system utilities. This is traditionally called multiprogramming, and it means sharing a processor by transparently switching system resources among several programs. Most large computers operate in this way.

Both versions of UMAX on the Multimax provide true multiprocessing. That is, both support truly simultaneous execution of user programs on multiple processors, and do so with far less time-consuming switching of process context than multiprogramming uniprocessors must perform.

The advantage of multiprocessing is that, while multiprogramming appears to give simultaneous attention to several programs at once, multiprocessing actually delivers simultaneous attention to those programs. As a result of multiprocessing, overall Multimax system throughput (in terms of the number of jobs executed per unit time) increases with the number of processors in the system.

Memory Management Support

Both versions of UMAX take full advantage of the Multimax demand-paged virtual memory to provide up to 16 megabytes of virtual address space per process.

Hierarchical File System

UMAX facilitates rapid user access and sharing of data on mass storage devices through a simple naming and protection scheme - a hierarchical file system, which supports directories and data files organized into an arbitrarily large tree structure (see Figure 3-1). The tree is rooted in a directory called "/", which can contain files (for example, "/file1") and directories (for example, "/dir1"). Directories can contain any number of other directories or files, such that "/dir1" can contain both file "/dir1/file2" and directory "/dir1/dir2", which can in turn contain additional directories and files, and so on. The resulting, complete set of directories and files in a system forms a single tree, which can be as deep or wide as circumstances require.

Figure 3-1 UMAX Tree-Structured File System (the root "/" contains dir1, file1, usr, and projects; dir1 contains dir2 and file2; usr contains sally and john; projects contains proj1 and proj2)

The ability to extend the file system's depth or breadth indefinitely provides great flexibility. Typically, a user ("sally" or "john" in Figure 3-1) groups related data files (or directories) in the same directory (or tree branch), to facilitate easy access by both users and programs that manipulate these files. Not only will every user normally maintain a private branch of the overall file system tree for personal files, but members of a project often store project-related files in a separate, well known branch of the overall tree.

UMAX facilitates protection of files by a comprehensive identity-based system, which designates any sub-tree or individual file as shareable for reading, writing, or execution by other selected groups of users (or all users, if required).

Powerful Command Languages ("Shells")

Both versions of UMAX offer a choice of two flexible, easy-to-use command languages. These "shells" are, in effect, high level languages with many of the features found in conventional programming languages. The user typically issues commands to the system by typing them to the shell in a syntax that is essentially the same for all shells:


command options arg1 arg2...

A powerful feature of shells is that they accept lists of commands in previously stored "scripts" whose execution can be governed by flow control, parameter passing and substitution, and other programming language features. Such command scripts are typically invoked in the same way as the command above, but permit the execution of potentially lengthy, complex procedures composed of many commands. This capability is useful both for production purposes and for application prototyping. In addition, the standard shells can be replaced with user-developed programs, enabling the substitution of custom interfaces that are tailored to meet specific needs.

Efficient, Flexible Process Management

UMAX shells offer "job control," or the ability to prescribe the order, timing, and priority of program execution with simple commands. Program execution can be temporarily suspended with job control, to enable a detour to a higher priority activity without losing work in progress. The user also controls whether a program or group of programs should run sequentially, run asynchronously as background processes, or be batched for execution at a later time.

Redirectable Input/Output

UMAX shells support dynamic redirection of process input and output to and from devices, files, and other processes. The user's terminal screen and keyboard are the default devices with which the system exchanges data interactively. However, for the duration of a command, a set of commands, or a shell script, I/O can be directed to or from files (or physical devices such as magnetic tape) by means of the redirection characters "<" and ">". For example, the program grep captures those lines of input that contain a given pattern of characters. The following command will output to file out.file those lines of file in.file that contain the word "December":

grep December in.file > out.file

This feature can be used to capture a program's output in a file instead of having it displayed on the terminal. Moreover, it allows users to arrange for a program to receive consistent input from a file, eliminating the need to type it in repeatedly.

Process I/O can also be directed to or from another process, in a construct known as a pipeline or "pipe," whose symbol is "|". This feature makes the output of one program the input to another, in effect building a daisy chain of programs that can behave like a new, larger program. With a slight modification to the above example, the lines of output containing "December" can also be sorted alphabetically:

grep December in.file | sort > out.file

This ability to form new programs by concatenating previously existing programs is one of the keys to the success of UNIX and its derivatives. The command set effectively becomes a collection of building blocks that can be quickly and easily assembled to perform specialized tasks without costly programming.

High Level Languages

High level languages speed program development and reduce programming costs by making programs more concise and by hiding the low level details of a particular machine environment. Programs written in high level languages are almost always more reliable, easier to debug, and far less costly to enhance or maintain. Because high level code is inherently portable from one computer to another, programs written for other computers can be moved to the Multimax, either with or without being parallelized for better performance. Similarly, non-parallelized programs developed in the rich UMAX environment can be ported back to older computers, such as the VAX.

No single language best serves every purpose. Both versions of UMAX include the C programming language and a full, enriched C program development environment. Both offer optional ANSI standard Fortran-77 and ISO standard Pascal.


The structured and portable C language was created at Bell Labs to improve programmer productivity and replace cumbersome assembly languages for time-critical, system level programs. ANSI standard Fortran-77 is the most common choice for scientific and technical applications. ISO standard Pascal combines the computational ease of Fortran with the structure and rigor of C; strong typing and ease of verification recommend it wherever high reliability or formal validation are needed.

All the high level languages have advanced, highly optimizing compilers that greatly improve on the runtime speed and compactness of earlier programming language systems. Most programs run significantly faster and occupy less main memory after simply being recompiled with an optimizing compiler.

Alongside their respective compilers, each optional language has a full runtime system, including a user-callable runtime library. Pascal includes a complete programming environment with its own debugging, maintenance, and documentation tools. One powerful, symbolic debugger handles both C language programs and Fortran-77.

All the high level languages generate object files using the Common Object File Format (COFF), sponsored by AT&T. Therefore, programmers can mix languages within certain guidelines, coding each module in the most suitable programming language. Programs are highly portable from UMAX 4.2 to UMAX V, generally with no significant changes.

Text Editors

Encore provides the following text editors:

• ed - The original, command driven, interactive line oriented editor created for UNIX users at Bell Labs.
• sed - A stream editor, based on ed, which supports complex modifications of single text files or groups of text files. These modifications can be made by means of arguments in the sed invocation or through predefined scripts.
• ex - An interactive, command driven, line oriented text editor; a companion to the display oriented vi.
• edit - A simplified version of ex for beginning users.
• vi - An interactive, display oriented text editor that causes the terminal screen to act as a window into the file being edited. Changes are immediately reflected on the screen.
• emacs - An optional interactive, display oriented, multi-window editor with dynamically extensible commands.

General Purpose Tools and System Services

Both versions of UMAX include approximately 200 general purpose and user productivity tools, as well as system calls and runtime libraries that perform a wide range of services. A complete list of available tools for UMAX 4.2 and UMAX V appears in Appendices B and C. Some categories are as follows:

• Flexible, powerful document preparation and text processing programs that allow rapid creation of structured text and graphics for output to a variety of hard- and soft-copy devices. Several standard text macro packages are supplied as well.
• A source code control system, known as RCS, helps manage software development and documentation efforts.
• Automatic reconfiguration and automatic bootstrap capabilities that operate without manual intervention, allowing the Multimax to run unattended.

System Administration Facilities

UMAX provides all of the pertinent system administration utilities that prior users of the Berkeley or AT&T UNIX distributions might expect (see Appendices B and C for details). A few of these programs have been enhanced to better meet the requirements of the Multimax. Moreover, several new utilities have been added. Some of the more salient of these new or enhanced utilities are discussed below.


devconfig (new) - A screen oriented, menu driven, interactive program that gathers information about the devices configured into a system and provides reasonable defaults for values that the user does not specify.
• Centralizes functions that other UNIX systems scatter among a host of unrelated, uncoordinated utilities.
• Permits devices to be referenced in the UMAX 4.2 format "/dev/users" rather than the UNIX 4.2 format "/dev/md0g".
• Stores enough device information in the system to allow the Multimax to auto-boot the operating system without operator input.
• Facilitates and automates the creation of "special files" (/dev/xxx).

fsck (enhanced) - Performs the usual consistency checking and interactive repair functions, but with the following enhancements:
• Defaults to block devices; no need to reboot after checking root.
• Checks all partitions simultaneously, in parallel, for faster system restart.
• Eliminates unnecessary checks after normal system shutdowns.
• Improved diagnostics identify faults precisely, in greater detail.

partition (enhanced) - Allows disks to be repartitioned while UMAX is running in multi-user mode. Other features:
• Eliminates the need to rebuild device drivers.
• Allows controller code to be updated in multi-user mode without affecting other data on the disk.
• Straightforward interface provides many default parameters, eliminating tedium and errors.
• Fully integrated with the sysboot and devconfig programs.

sysboot (new) - Allows system managers to direct system startup. Features:
• System disk-building commands include formatting, partitioning, and root file system creation.
• Allows specification of alternate UMAX image and boot parameters.

sysmon (new) - Allows performance monitoring of all major system components, both hardware and software, singly or in combination. Features:
• Gives full control over which components are monitored, in what detail, at what interval, over what period of time, etc.

sysparam (new) - UMAX 4.2 implements many system resource values as parameters that can be specified at boot time. Sysparam sets or alters the values of the boot parameters and can change many system resources without rebuilding the operating system.

Networking Facilities

UMAX provides access to other computer systems using standard Internet (TCP/IP) networking protocols. Also supported is access to other computers on the Ethernet or, via a gateway computer, to computers connected to other networks.

UMAX also has utilities for remote login, file transfer, electronic mail, and remote command execution, both over Ethernet (using the Internet protocol suite) and over dial-up telephone lines (using the UNIX-standard uucp protocols).

Annex Network Communications Computers support distributed terminals and line printers. Distributed X.25 and IBM-3270 gateway service is also available.

Distributed Intelligent Peripheral Control

In general, any serial port on an Annex can access any Multimax system on the Ethernet. Although Multimax system managers can "bind" certain Annex ports to a particular Multimax, nothing in the system architecture prevents any terminal on an Annex from being used with any available Multimax. A terminal can also communicate with non-Encore computers on the network, provided they are running Berkeley 4.2 UNIX or TCP/IP and either the rlogin or the telnet protocol.

A user on a terminal converses with a local Annex "shell." From the shell, the user can request a connection to an available Multimax (or other) host.


The Multimax is notified, the terminal becomes allocated to it, and the Multimax then runs a standard login process.

The Annex also supports multiple simultaneous connections from a single terminal. A user can log in to one Multimax machine, type an attention character, and issue an Annex command to switch his terminal connection to another Multimax machine. This procedure provides the user with direct access to the second machine, precluding the need for routing transmissions through the first machine.

The following commands are available in the Annex control software:

attention - Typing the "break" character causes the Annex to suspend the job in progress and prompt for a command.
call - Makes a connection to a Multimax host and allows normal login to that host.
fg - Returns to any "foreground" job (such as a session on an Ethernet host machine) that has been suspended with the attention command.
help - Displays a brief description of each command.
hosts - Shows the hosts on the network and their current status.
jobs - Lists current jobs for the current port.
kill - Ends a job.
rlogin - Makes a connection to a host using the rlogin protocol.
stats - Displays information about network traffic and Annex performance statistics.
stty - Sets terminal parameters.
telnet - Makes a connection to a host using the telnet protocol.
? - Same as help.

The ALLY Software Development Environment

ALLY is a fourth generation software environment combining "programmerless" application development with terminal-independent, window-oriented screen management - all designed to ease development and maintenance of interactive software systems.

Providing a set of features that are not available in any other high level application development tool, ALLY allows applications to be written 10 to 30 times faster than with procedural languages such as COBOL. A few of its features:

• Runs on almost any computer, under most operating systems, and therefore generates highly portable applications.
• Works with any terminal and is not tied to a specific database system.
• Includes a window oriented task manager that provides windows on standard ASCII terminals.
• Allows interaction with programs written in traditional languages such as COBOL.
• Allows the user interface to be tailored to the application.
• Applications run as fast as similar programs written in conventional third-generation languages.

UMAX PERFORMANCE

UMAX 4.2 and UMAX V are examples of highly parallel applications making the benefits of a large multiprocessor transparently available to the timesharing user. Moreover, the operating systems support many simple techniques that users can employ to explicitly invoke parallel operation beyond that which the system automatically performs (see Chapter 4 for more detail).

The high degree of parallelism in UMAX frees application programmers from concern about operating system bottlenecks. If multiple processes issue simultaneous system calls to execute a parallel algorithm, these calls never interfere with one another except at the finely grained level of access to individual data structures in the kernel.


Symmetrical Multiprocessing

If UMAX were designed to respond to one user job's operating system service request at a time, multiple simultaneous requests would have to be queued and then acted upon one at a time, resulting in delay of execution of the requesting programs.

Such a potentially large system bottleneck has been avoided in UMAX by allowing all processors running programs to perform operating system services on their own, simultaneously. On the Multimax, all processors are equal; all can service interrupts; all can run the operating system; and all can run any user's process. Such symmetry of operation between processors is critical to system performance.

UMAX exploits symmetrical multiprocessing by a technique called multithreading, that is, allowing simultaneous streams of control for all processes through the operating system kernel. The principal difficulty in multithreading an operating system is providing controlled, concurrent access to shared resources. Multithreading requires locking of shared resources and synchronization between processors that try to gain access to these resources on behalf of processes.

UMAX supports multithreaded operation through a set of core primitives (see Figure 3-2) that provide synchronization between UMAX processes and Multimax hardware. This is done with three basic mechanisms:

• Spin locks - execute tight instruction loops until the expected condition occurs. Used only for critical, short-duration events.

• Semaphores - suspend a process until the resource it needs is available.

• Read/write locks - control access to data structures for a single writer or multiple readers. Read/write locks are special semaphores that prohibit write access until all pending reads are complete, and prohibit read access while any write is pending or in progress.

Figure 3-2 UMAX Layering (applications layer; system call access layer; multiprocessor primitives; UMAX subsystem core - scheduling, virtual memory, networking, Annex access, etc.)

The UMAX file system illustrates these mechanisms. When multiple processes need to read (but not modify) a directory concurrently, there is no reason to prevent simultaneous access. Processes performing directory lookup will therefore acquire a read lock while reading the directory. When a process needs to modify the directory structure, perhaps by deleting a file, it acquires the write lock for that directory. Access to other unrelated directories can proceed in parallel.

Scaling Performance to Large Configurations

The Multimax architecture supports large user populations making heavy demands on a very large set of computing resources. For that reason, certain algorithms for manipulating shared data in the operating system are designed to minimize the frequency and duration of accesses (and therefore of mutual exclusion through locking) to these data.

Two techniques enhance data manipulation algorithms: caching and fine-grained locking. Resources that are likely to be used frequently, such as file and directory entries, are cached by UMAX. For resources that are tables, it is appropriate in some cases to lock individual entries rather than the whole table.


In other cases, kernel tables have been divided into subpools of entries, linked together and located by hashing. This minimizes search times for tables that can have huge numbers of entries. It also decreases bottlenecks in accessing table entries, since subpools are locked rather than the whole table.

Terminal Performance on the Annex

The UNIX terminal driver is distributed so that the processing power of Annex computers can be used to reduce terminal character processing overhead.

Annexes are initialized by downline loading the necessary terminal and printer drivers from a Multimax or other host on the network. Annexes can thus assume the major terminal support burdens. This feature greatly improves overall system performance because it avoids much of the interrupt handling and computational overhead incurred by character processing in non-distributed architectures.

The UMAX terminal drivers were designed to work optimally with the unique capabilities of the Annex hardware. A proprietary, message-based protocol between Multimax systems and Annex computers passes and synchronizes the character data and control information in a way that enables most of the terminal driver functions to execute in the Annex. When line editing and echoing of characters can be performed by the Annex, the Multimax receives only well-formed packets rather than individual characters requiring extensive processing. This performance advantage is particularly valuable in large system configurations supporting many terminals.

This architecture also supports gateways. Outgoing and incoming calls to and from X.25 and SNA networks are provided. These capabilities can be extended to other communications domains.

Annex Remote Editing Protocol

For achieving high performance on interactive programs, such as screen editors, Encore offers a library of remote editing routines that run in the Annex but are callable from a Multimax program. Implemented by a proprietary, high performance network protocol between the Multimax and the Annex, these routines are used by Encore's version of emacs to substantially improve editing performance.

The remote editing protocol provides character and line editing operations for data already on the screen. This includes character echoing, cursor positioning, character insertion, and line insertion. The protocol also provides resynchronization with the host when the user moves to another screen or performs some other operation that cannot be performed locally by the Annex. At resynchronization time, the protocol updates the host file (with edits since the last synchronization) by sending a few well-formed packets from the Annex to the Multimax.

The performance advantage of this architecture is that the Annex effectively batches multiple atomic editing operations. There is no load on the host until a resynchronization event occurs. For programs that perform intensive character processing, the remote editing protocol provides a large performance advantage over systems that must handle each keystroke in "raw" mode.

Parallel Utility Programs

Enhanced UMAX versions of standard UNIX utility programs take full advantage of Multimax parallel processing capabilities. For example, the make utility automatically does as much compilation and linking in parallel as possible - and thereby minimizes the time spent rebuilding an application. A parallel implementation of the grep utility (see Chapter 4) is another example of user-transparent acceleration of standard UNIX commands.

Chapter 4

Parallel Programming on the Multimax

OPPORTUNITIES FOR APPLIED PARALLELISM

Parallel processing can be modeled on a uniprocessor, but only with special effort, since the operating system must spend significant amounts of time switching among tasks to simulate parallel execution, and this time must be subtracted from the time available for doing substantive processing. True parallel processing of the sort provided by the Multimax offers substantial benefits to a large number of applications. These can be reduced to the three types discussed below.

Data Partitioning

Many applications have no inherent requirements either for parallel or for serial execution, but nevertheless can run faster by using parallel programming techniques. Operations involving matrices (for example, fast Fourier transforms, solving partial differential equations), searching, and sorting are typical problems for which both parallel and serial algorithms have been developed. For example, the sorting algorithm known as "quicksort" operates by recursively partitioning the file to be sorted, sorting those partitions, and then combining the results. While quicksort can run serially (and most often does), nothing prevents the partitions from being sorted in parallel by several copies of the application program to achieve substantially improved performance.

Functional Partitioning

Many applications are inherently parallel. Simulation problems, realtime process control, and transaction processing, to name just a few, all involve programs that could run simultaneously because they deal with events outside the computer, and real world events often occur simultaneously. These applications can be divided into several different functional units, which can execute in parallel.

Pipelining

A third class of applications that profits from parallel execution is characterized by processing a data stream serially through several functional units, each of which could execute in parallel. This is a technique referred to as "pipelining." For example, UMAX supports several different text formatting packages: the eqn package converts mathematical equations to typesetting commands; tbl generates typesetting commands to draw tables; troff takes as data the typesetting commands, performs page layout, and produces the output to drive a phototypesetter.


A document that contains text, tables, and equations can be processed by running it through these packages in a pipelined fashion to gain the benefit of having the stages all running in parallel.

REQUIRED SUPPORT FEATURES

Parallel programming requires three key support features:

• Multiple threads of control - that is, multiple "execution units" which have separate existence and hence can be executed in parallel. The term process will be used to refer to a thread of control.

• Data communications - the means by which the parallel processes share the data common to the problem being solved.

• Synchronization mechanisms - which permit parallel accesses to shared data to occur in a controlled, orderly, and economical fashion.

A fourth feature, not required but highly desirable, is that the facilities for parallel programming be easy to use.

The third key item above, particularly the economy component, is important to the effective implementation of parallel programming. The higher the runtime cost of a particular synchronization mechanism, the longer the intervals between synchronizations ought to be. An expensive (that is, time consuming) mechanism should have commensurately long periods of computation between synchronization points to minimize the relative cost of the mechanism with respect to the "useful" computation. Conversely, parallel threads of control that must synchronize every few instructions should have an efficient and economical synchronization mechanism.

TYPES OF PARALLELISM

One can characterize parallelism in terms of the length of the period between synchronizing events. This has been done by Bell in the preface to this book, from which Table 4-1, below, is excerpted. This table characterizes parallelism in terms of "grain size," the period between synchronizing events for parallel activities. The sections that follow examine these four types of granularity and describe the corresponding facilities provided by the Multimax hardware and the UMAX operating systems.

The grain-size characterization in Table 4-1 assumes the parallelizing of one application. A fifth category can be added to this table if one chooses to consider the parallelism inherent in the workload generated by multiple users working simultaneously on a system. At the user level, these activities are independent and have little or no synchronization; hence the grain size can be considered infinite. Since such independent but simultaneous activity is a common characteristic of multiuser systems, and is also one for which the Multimax architecture provides substantial benefits, the next section describes the hardware and software support for multiple independent parallel processes.

Table 4-1 Grain Size in Parallel Computing Structures

Grain Size     Construct for Parallelism                        Synchronization Interval (instructions)   Encore Computer Structures to Support Grain

Fine           Parallelism inherent in a single instruction     <20                                        Specialized processors (e.g., systolic or array
               or data stream                                                                              processors) added to Multimax

Medium         Parallel processing or multitasking within       20-200                                     Multimax
               a single process

Coarse         Multiprocessing of concurrent processes in       200-2000                                   Multimax
               a multi-programming environment

Very Coarse    Distributed processing across network nodes      2000-1M                                    Multiple Multimaxes and other vendors'
               to form a single computing environment                                                      machines on Ethernet



Independent Parallelism

Modern operating systems like UMAX support multiple concurrent users who generate a load on the system consisting of many independent processes. On a typical uniprocessor, the operating system spends much of its time simulating the parallel execution of those processes. Traditionally called multiprogramming, this activity requires the sharing of resources such as the processor among the competing processes, with the operating system transparently switching these resources between those processes. The uniprocessor operating system can consume a significant portion of available processor cycles in implementing these context switches.

Software Support for Independent Parallelism

Both UMAX 4.2 and UMAX V provide all the major features of modern, multiuser operating systems. Both offer multiple processes, and both use multiprogramming. However, since the Multimax has more processors, each supports fewer users than would a similarly loaded uniprocessor - and therefore the Multimax performs less context switching. As a result, the overall Multimax system throughput (in terms of user processes executed per unit time) increases with the number of processors in the system.

Maximal concurrency of operation between users' jobs and processors requires that all programs be able to receive operating system services at all times, with minimal delay. If UMAX were designed to respond to the operating system requests of a single user process at a time, multiple requests arising from the multiple users would have to be queued. UMAX avoids such a potentially large system bottleneck by allowing all processors to perform operating system services simultaneously. In other words, UMAX itself is a good example of a parallelized application.

Hardware Support for Independent Parallelism

High performance parallel execution of independent processes makes many demands on system hardware. The Multimax hardware was designed explicitly to meet those demands. It provides:

• Multiple processors.

• Shared memory. Not only do all processors have access to all of memory; caches on each Dual Processor Card also minimize processor access latency to data and instructions, while bus-watching hardware insures data coherency across all caches.

• Bus bandwidth to support full-speed operation of all processors. The Nanobus has a sustained bandwidth of 100 megabytes per second, a speed which easily meets this requirement.

• A fast interrupt vector bus. The Multimax vector bus can handle burst rates of up to 1 million interrupts per second.

• Symmetrical multiprocessing. All processors on the Multimax are equal, all can service interrupts, all can run the operating system, and all can run any user's process. Such symmetrical operation between processors is critical to system performance.

• Fast I/O data transfer. Under UMAX, all processors can initiate I/O, and all processors are available to respond to interrupts that signal the completion of an I/O event.

Very Coarse Grained Parallelism

Very coarse grained parallelism categorizes those parallel applications in which the period between synchronizing events spans several thousand instructions. Distributed processing across the computing nodes in a network is a good example of the application of very coarse grained parallelism. Since the cost of synchronization of network communications is high, there must be long periods of computation between synchronizations.


Software Support for Very Coarse Grained Parallelism

The UMAX operating systems provide a number of system calls to facilitate network communications. Among these calls are:

• socket - establishes an end point for communications, the type of communications (for example, streams of bytes or datagrams), and the protocol to be used for communications.

• listen, accept - establish two-way communications over sockets.

• send, recv, read, write - exchange data between the cooperating processes. Synchronization between the producer of data and the consumer can be automatically provided by UMAX. For example, a process attempting to read from a socket which has no data to be read can be suspended by UMAX and automatically released to synchronize it with the arrival of data.

The UMAX operating systems support the widely used Internet protocol family. Consequently, UMAX provides direct and readily accessible support for very coarse grained parallelism that involves both multiple Multimaxes on a network and other systems supporting the same protocols.
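To make the flow of these calls concrete, the fragment below sketches how two cooperating processes might use them. It is an illustrative sketch only, not UMAX source code: the port number, the message text, and the omission of error checking are assumptions made here for brevity, and the calls shown are the standard 4.2BSD-style socket interfaces named above.

/* Illustrative sketch: a producer and a consumer exchange a byte stream
 * over an Internet-domain socket.  EXAMPLE_PORT is a made-up value. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define EXAMPLE_PORT 5000               /* illustrative port number */

int main(void)
{
    int listener, conn, peer, n;
    struct sockaddr_in addr;
    char buf[64];

    /* socket - establish an end point for stream communications */
    listener = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(EXAMPLE_PORT);
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));

    /* listen, accept - set up two-way communications over the socket */
    listen(listener, 1);

    if (fork() == 0) {                  /* child: the producer */
        peer = socket(AF_INET, SOCK_STREAM, 0);
        connect(peer, (struct sockaddr *)&addr, sizeof(addr));
        write(peer, "one unit of work", 16);    /* send data */
        close(peer);
        exit(0);
    }

    /* parent: the consumer; read blocks until data arrives, so the
     * synchronization described above is supplied automatically */
    conn = accept(listener, NULL, NULL);
    n = read(conn, buf, sizeof(buf));
    if (n > 0)
        write(1, buf, n);               /* pass the data along */
    close(conn);
    close(listener);
    return 0;
}

In a very coarse grained application, the producer would typically run on a different machine on the network, with the destination address filled in accordingly.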

Hardware Support for Very Coarse Grained Parallelism

The primary network supported by the Multimax hardware is the Ethernet, which provides a high-bandwidth channel (10 Mbits/second) over which the software communications protocols can transfer data. Since Ethernet communications are handled by the intelligent Ethernet/Mass Storage Card, the Dual Processor Cards (and the Nanobus) are relieved of much of the time-consuming responsibility of buffering and forming message packets.

In addition to the Internet protocols, Encore gateway computers allow communication with other protocol families such as X.25 and SNA, thereby expanding the range of systems that can participate with the Multimax in very coarse grained parallel processing.

Coarse Grained Parallelism

Coarse grained parallelism identifies those applications in which the interval between synchronization points is an order of magnitude smaller than that of very coarse grained parallelism - that is, on the order of hundreds to thousands of machine instructions. Because communication and synchronization overhead for networks is relatively high, coarse grained parallelism tends to become an issue for operations within one Multimax system.

Applications amenable to the pipelining strategy discussed earlier in this chapter are a particularly appropriate match to the coarse grained parallelism category. Consider the following problem: a user wants to select a set of lines of data from a file, sort those lines based on some ordering criteria, and then print out the results. These needs might be satisfied by a single sort program which attempted to provide as much flexibility as possible. But such a program would be difficult to use (precisely to the extent that it succeeded in being flexible) and even more difficult to create in the first place.

An alternative solution is for the system to provide a set of separate utilities, each of which implements a limited subset of the desired functions. The user can then combine these utilities to get the job done. Unfortunately, in many systems the task of combining the utilities is difficult, and the user is forced into solutions of the form outlined below:

$ redirect the output of the next command into temporary file 1
$ select "patterns" from input-file
$ redirect the output of the next command into temporary file 2
$ sort "sort criteria" temporary file 1
$ send temporary file 2 to the printer

Using a set of smaller, individually simpler programs removes some of the problems of the single program approach.


However, it raises new problems (like ensuring that the temporary files get deleted), and again there is no parallelism available to the user. This second approach is essentially a pipelining solution to the problem, even though its serial implementation via explicitly shared files precludes parallelism.

Software Support for Coarse Grained Parallelism

UMAX provides a superior solution to the above problem, using multiple separate parts to construct the overall solution while removing the complications of putting those pieces together.

UMAX directly supports the notion of a data pipeline between processes, easily enabling the output of one process to be provided as the input for a subsequent process. The pipe system call of UMAX achieves this effect, and is used, for example, by the command language interpreters (the "shells"), discussed in Chapter 3. Using this feature, we can solve the entire data selection problem with:

$ grep "patterns" input-file | sort "sort criteria" | lpr

Grep is one of the UMAX utilities which selects particular lines from a file, and lpr is a utility which sends data to a printer. The "|" character is called a pipe. A process is created for each command in the pipeline (three in the above example), and the pipe character directs the output from the process to the left of the "|" to the input stream of the process on the right of the "|".

This solution overcomes the artificial ordering of events imposed by the previous solution. Moreover, it lets the UMAX software exercise its potential for simultaneously executing separate processes. Since this form of parallelism is easy to achieve, it is extensively exploited by users and has proved to be a valuable and powerful feature.

UMAX directly supports coarse grained parallelism with the following facilities (a brief sketch of their direct use follows the list):

• Multiple control threads. The fork and exec calls create a process by loading its text and initialized data segments from a file, setting up its stack, and zeroing its uninitialized data. The share call declares a portion of the process data space to be shareable by all child processes that are created by this program. After invoking share, the parent process can create child processes, all of which have access to the shared memory space. (The share call as described applies to UMAX 4.2; similar capabilities are provided by the shmctl, shmget, shmat, and shmdt calls under UMAX V. For simplicity, this chapter refers only to the UMAX 4.2 facility.) In addition, fork creates child processes that can run simultaneously on additional processors. Although child processes have access to the designated "shared" data, the stack and local data segments are always private to each process, so that every process can maintain its own independent thread of control.

• Data communications. The pipe system call creates a pipeline. At fork time, a child process inherits this pipeline to set up the data channel. The read/write system calls allow one process to put data into the pipe and the other process to extract that data.

• Synchronization mechanisms. Synchronization is provided automatically in UMAX by the read and write system calls. If a consumer process attempts a read when there is no data in the pipeline, UMAX suspends that process, freeing its processor for other activity. When data arrives, UMAX releases the suspended process and lets it complete the read call. Thus the pipeline user is not obliged to do anything to obtain the synchronization necessary for correct passing of data between the parallel processes.
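The following fragment sketches these facilities in direct use: a parent creates a pipe, forks a child to act as the consumer stage, and feeds it data through write, with read supplying the synchronization. This is a minimal illustration assuming only the standard pipe, fork, read, and write behavior described above; it is not drawn from UMAX itself, and the data written is a stand-in for real work.

/* Sketch of a two-stage pipeline built directly on pipe and fork. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];                      /* fd[0] = read end, fd[1] = write end */
    char line[128];
    int n;

    pipe(fd);                       /* create the data channel */

    if (fork() == 0) {              /* child: the consumer stage */
        close(fd[1]);               /* child only reads */
        while ((n = read(fd[0], line, sizeof(line))) > 0)
            write(1, line, n);      /* e.g., hand the data to the next stage */
        close(fd[0]);
        exit(0);
    }

    /* parent: the producer stage, possibly running on another processor */
    close(fd[0]);                   /* parent only writes */
    strcpy(line, "one record of selected data\n");
    write(fd[1], line, strlen(line));
    close(fd[1]);                   /* end of data releases the consumer */
    wait(NULL);
    return 0;
}

A read issued before any data has been written simply suspends the consumer, exactly as described for pipeline synchronization above.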

Hardware Support for Coarse Grained Parallelism

The hardware support for coarse grained parallelism has essentially been covered in the previous section on independent parallelism. The multiprocessing character of the Multimax is of course of direct benefit to pipelining, since it enables each stage of the pipeline to be executed in parallel. Hence the Multimax can often provide the user with substantially faster pipeline operation than can a uniprocessor system offering the same basic software support.


Medium Grained Parallelism

Applications that can exploit medium grained parallelism exhibit synchronization intervals in the tens to hundreds of instructions range. With this size of synchronization interval, the cost of the synchronization mechanism must be substantially lower than that appropriate to coarse grained parallelism. System calls are likely to be too expensive, both to invoke synchronization and to read or write shared data.

The best solution to these problems is to enable the parallel threads of control to access the same, shared memory. This allows their communications to take place with a minimum of overhead, since any changes to shared memory made by one process are immediately visible to the sharing processes. However, it should be apparent that such sharing cannot take place in an uncontrolled fashion. The sharing processes must synchronize their actions in order to ensure data integrity.

As an example of the need for synchronization, consider a list of items being maintained in shared memory, with two parallel processes needing to extract items from that list. Without synchronization, both processes could erroneously remove the same item from the list for processing.

Mutual exclusion is the fundamental synchronization operation for coordinating parallel accesses to shared data. To understand the operation of mutual exclusion mechanisms, consider a room containing a number of items (representing shared data) that are to be added to or removed by a number of people. The desired effect is achieved by allowing only one person to enter the room at a time, giving that person exclusive access to the items. To support this, one could provide a single door into the room together with a lock. To acquire mutual exclusion, a person would enter the room and lock the door, take an item, and then unlock the door and leave. While the door is locked, no other person can enter the room.

To obtain mutual exclusion, most modern computers provide a locking mechanism that achieves the effect of the locked door. This mechanism is in the form of an uninterruptable machine instruction, called a test-and-set instruction, which uses a word or byte of (shared) memory as a lock controlling access to shared data.

The test-and-set instruction operates as follows: if the lock being operated on by the instruction is clear (indicating not taken), then the instruction sets that lock and returns the fact that the lock was originally clear. From this result, the process can see that it has now acquired that lock and is free to access the shared data the lock was protecting. If the lock is already set when the test-and-set instruction is executed, then some other process has acquired the lock. The "already set" value is returned, indicating that the caller must wait. A process can indicate that it has finished with the shared data by clearing (releasing) the lock.

A variety of synchronization mechanisms and policies can be built from this basic mutual exclusion mechanism. Before we consider these other mechanisms, however, it is worth examining in detail the software and hardware support that UMAX and the Multimax provide for medium grained parallelism.

UMAX Operating System Support for Medium Grained Parallelism

UMAX provides the three key support features for medium grained parallelism in the following form:

• Creation of multiple threads of control. The fork and exec system calls described in the previous section create the parallel processes needed for medium grained parallelism.

• Data communication. The share system call designates a region of the address space of a process as being "shared." Such shared regions are inherited by child processes created by forks.


processes get private copies of all of the code, data, and registers of the creator, except for those sections of the address space which were designated as being shared prior to the fork. Since the creator and the created processes share those portions of the address space (see Figure 4-1), any changes made by one process are immediately visible to the sharing processes. The share call may also be invoked by child processes, creating additional shared segments which are visible only to the child and its progeny. Note that programs can use shared memory directly, without having to resort to convoluted device driver calls, as other systems require.

• Synchronization. Since any memory location can be used as a lock, the UMAX operating system does not need to provide any special system calls to establish synchronization mechanisms for use with medium grained parallelism. Any of the synchronization schemes described here can be easily programmed for efficient runtime execution without operating system intervention. However, if the programmer desires forms of these synchronization mechanisms which block process execution, UMAX provides system calls permitting their construction.

Hardware Support for Medium Grained Parallelism

The multiprocessing architecture of the Multimax provides direct support for medium-grained parallelism. The common memory accessible to all processors directly assists in implementing the shared memory scheme described above, and the caches and their bus-watching logic provide fast coherent access to data.

The Multimax processor and the Nanobus provide a test-and-set instruction to implement the basic mutual exclusion mechanism. The actual NS32032 instruction, termed an interlocked instruction, generates a special Nanobus cycle that reads back the requested byte location (see Figure 4-2). If the specified byte contains zero, the lock is free, and the memory card atomically sets that byte to a pattern of all 1s. If it is not zero, it is assumed

Figure 4-1 Memory Allocation for Parallel Programs
(The figure shows the address space layout - program text, program data, shared segment, unused space, and stack - for the initial process, for the process after dynamic allocation of shared memory, and after two fork calls, when three processes share the text and the shared segment.)
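The share-and-fork pattern illustrated in Figure 4-1 can be sketched in C. The sketch below is illustrative only: this summary does not give the exact signature of the UMAX share system call, so the share() declaration and the error handling shown here are assumptions rather than documented interfaces.

    /* Sketch only: share() is the assumed form of the UMAX system call that
     * designates a region of the address space as shared. Regions marked
     * shared before the forks are inherited by the children, so changes made
     * by any process are immediately visible to the others. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    extern int share(void *addr, unsigned long length);   /* assumed signature */

    #define NWORKERS 3

    static int results[NWORKERS];           /* region to be shared by all processes */

    static int do_work(int id)              /* stand-in for a real computation */
    {
        return id * id;
    }

    int main(void)
    {
        int i;

        /* Designate the region as shared before forking, so the children
         * inherit the shared mapping instead of private copies. */
        if (share(results, sizeof(results)) < 0) {
            perror("share");
            exit(1);
        }

        for (i = 0; i < NWORKERS; i++) {
            if (fork() == 0) {               /* child: one parallel worker */
                results[i] = do_work(i);     /* visible to the parent through the shared region */
                _exit(0);
            }
        }

        for (i = 0; i < NWORKERS; i++)       /* parent: wait for all workers */
            wait(NULL);

        for (i = 0; i < NWORKERS; i++)
            printf("worker %d produced %d\n", i, results[i]);
        return 0;
    }

On a system without a share call, other shared memory facilities can play the same role; the essential point is that the region is designated as shared before the forks occur.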


to have already been taken by another processor and the interrogating process must wait. Since any memory byte can be used in this way, the programmer has millions of locks at his disposal. Moreover, locks can be conveniently imbedded in the related data structures - rather than being separately allocated, as in the case of systems which segregate locks to a specialized locking memory.

Figure 4-2 Hardware Locking Sequence

Synchronization Primitives for Medium Grained Parallelism

The Multimax interlocked instruction can be used to build a variety of well-known synchronization mechanisms, such as:

• Spinlocks
• Semaphores
• Read/Write Locks
• Events
• Barriers
• Monitors

All of these primitives can be implemented by user level code without involving the operating system. They can also be constructed in instrumented forms that gather statistics on locking times and accesses.

Spinlocks

As its name suggests, a spinlock synchronization mechanism is one in which a process that fails to acquire a lock continually loops back, repeating the attempt until the lock becomes free and the lock is acquired (see Figure 4-3). Because the first attempt to take the lock reads it into the onboard cache on the DPC, no Nanobus traffic is generated by processors that are "spinning" waiting for locks to be released.

Figure 4-3 Spinlock Operation

The main advantage of a spinlock is that it minimizes the time required by a process to acquire a lock which is about to be released by another process (at most, the wait time is a couple of instructions). The disadvantage of spinlocks is that a processor must execute continual attempts to acquire the lock instead of doing other useful work. For this reason, spinlocks in their simplest form are most useful for synchronization when the data is locked only briefly.
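A spinlock of the kind just described can be sketched in a few lines of C. The sketch assumes a small helper routine, here called test_and_set, that executes the processor's interlocked test-and-set on the lock byte and returns the byte's previous value; the helper's name and signature are illustrative, not part of UMAX or its libraries.

    /* Minimal spinlock sketch. test_and_set() is assumed to be a short
     * assembly-language routine that issues the interlocked instruction on
     * *lock and returns the previous value: 0 means the lock was free and is
     * now held by the caller, nonzero means it was already held. */
    typedef volatile unsigned char spinlock_t;

    extern unsigned char test_and_set(spinlock_t *lock);   /* assumed helper */

    void spin_lock(spinlock_t *lock)
    {
        while (test_and_set(lock) != 0)
            ;               /* spin: keep retrying until the holder releases it */
    }

    void spin_unlock(spinlock_t *lock)
    {
        *lock = 0;          /* clearing the byte releases the lock */
    }

    /* Typical use around a short critical section:
     *     spin_lock(&list_lock);
     *     ...remove an item from the shared list...
     *     spin_unlock(&list_lock);
     */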


Semaphores

Semaphores are a well-known synchronization mechanism first proposed by Dijkstra in 1965. A semaphore is essentially a lock on which two operations can be performed. These operations are usually given the names P and V (derived from Dutch terms in the original proposal).

The P operation on a semaphore tries to acquire the lock. If the operation succeeds, the process can access the shared data being protected by that lock. If, however, the operation fails because the lock is already held, the process must wait.

The V operation unlocks a semaphore, relinquishing all claim to the lock and thereby freeing any process waiting for that semaphore. The outline of a program using a semaphore is as follows:

    P(semaphore)

    Manipulate shared data

    V(semaphore)

Semaphores differ from spinlocks in that a process that is forced to wait because of an already claimed semaphore relinquishes its processor and is "put to sleep." Thus, in addition to performing as a lock, a semaphore acts as a queue of processes waiting for the semaphore.

The tradeoff between using a semaphore or using a spinlock has to be decided by the programmer, taking into consideration the length of time for which the shared data is locked, the time taken to suspend and resume a process, and the cost of having a processor spinning waiting for a lock.

In its simplest form a semaphore can be binary, with the values 0 (unlocked) or 1 (locked). A more general definition which encompasses the binary semaphore is the counting semaphore. The operation of a counting semaphore is as follows:

P(semaphore)   decrement the value of the semaphore; if the value is now greater than or equal to zero, proceed, else wait.

V(semaphore)   increment the value of the semaphore; if the value is now less than or equal to zero, wake up a pending process (that is, a process that was waiting because of a previous P operation). In either case, the task which executes the V operation may proceed.

For either type of semaphore, the P and V operations are atomic, that is, they are executed indivisibly, and while one process is operating on a semaphore, all other processes are prevented from operating on it. When several processes are waiting on a semaphore which is then unlocked, one of the waiting processes is made ready to run; the others continue to wait. Normally, the process which runs is the first task to have waited; thus semaphores are normally first-in, first-out queues for processes.

Read/Write Locks

Read-write locks are a more specialized form of semaphore and provide the potential for greater parallelism. To understand their behavior, let us reconsider the example of the room containing a number of items and having a single door. If several people wanted to enter the room merely to count the number of items present, they could all enter and count in parallel, since counting never changes the number of items. All of the people would leave the room with the same answer for the number of items; all would present a consistent view of the data. However, if one person wished to add or subtract from the number of items in the room, that person would require exclusive access to the room to prevent inconsistencies from arising.

To implement this behavior, one could replace the single lock on the door with two types of lock, one for those who do not intend to change the room's contents ("read-only" people in computer jargon), and one for those who do (people who will "write" into the room's contents). In this case, the following rules could apply:

• If the door is unlocked, a "reader" or "writer" can enter the room and apply his type of lock to the door.

• If the door has been locked by a reader, other readers can enter the room. Writers,


however, must wait until all readers have left the room.

• If a writer has locked the door, all others must wait until the door is unlocked by that writer.

This is exactly the behavior that processes can achieve by using read-write locks on data structures. The advantage comes from the greater parallelism that can be achieved in the multiple reader case as compared to spinlocks or semaphores, which always impose single-stream access to shared data.

The UMAX file system provides one example of the extensive use of read-write locks. Multiple processes may need to read a directory concurrently. As long as no process tries to change that directory, there is no reason to prevent simultaneous read accesses. Processes performing directory lookup will therefore acquire a read lock before reading the directory. If a process needs to modify the directory structure, perhaps by deleting a file, that process acquires the write lock for the directory. Accesses to other unrelated directories can of course proceed in parallel.

Events

Events are a synchronization mechanism with the following characteristics:

Wait (event)    process waits until the value of the event is non-zero
Post (event)    make the value of the event non-zero
Clear (event)   make the value of the event zero

Like semaphore operations, event operations act indivisibly. When several processes are waiting on an event which is "posted," all of the waiting processes are made ready. Waiting on a posted event has no effect. Posting a posted event also has no effect. Similarly, clearing a cleared event has no effect.

Barriers

A barrier is initialized with some positive integer value before use. When that number of processes has "arrived" at the barrier, they all proceed. Thus, barriers permit a number of processes to wait until they have all reached some common point, then proceed. The only operation performed on a barrier is to wait at it:

Barrier (b)     wait until the other processes have also arrived at this barrier, then proceed.

Monitors

Monitors were introduced in 1974 as an operating system structuring concept. A monitor consists of some critical, shared data together with a set of procedures which are the only means for manipulating that data. The procedures are referred to as the entries to the monitor. Whenever one of these entry procedures is called, mutually exclusive access to the shared data is guaranteed by the monitor's implementation. In effect, a monitor provides P and V operations around the body of the procedure to provide synchronization.

User Level Multitasking Library

So far in the discussion of medium-grain parallelism, the notion of a process, as created and supported by the UMAX operating systems, has formed the basis for the threads of control for parallelism, of which there can be many simultaneously executing. UMAX uses multiprogramming techniques to allocate the potentially large number of processes to the set of available processors. As has been shown, the programming facilities provided by UMAX are a powerful, efficient, and sufficient set of tools for programming applications to make use of medium grained parallelism.

However, there is naturally some cost inherent in creating, suspending, resuming and deleting processes. The UMAX overhead to duplicate the facilities of one process in a forked process cannot be avoided, even though many of those facilities may not be needed or used in a parallelized application. Imagine, for example, an application that naturally partitions into 30 parallel threads of control, each of which simply requires access to shared data.

One simple approach to constructing this parallel application would be to designate the entire data region as shared and then create 30 processes. However, suppose that many of the facilities available to these processes are


never used in this application. Say, for example, that the threads of control have no need for separate address spaces, separate signal handlers, separate sets of file descriptors and the like. Since UMAX cannot know that these facilities are never used, it must incur the overhead of supporting them for all 30 processes. Although this work load would perform much better on the multiprocessing architecture of the Multimax than it would on a uniprocessor, the load unnecessarily burdens the system.
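To make the barrier idea concrete for a case like the 30 parallel threads just described, here is a single-use barrier sketched in C on top of the spin_lock and spin_unlock routines from the earlier spinlock sketch. The names and structure are illustrative only; they are not UMAX or multitasking library primitives.

    /* Illustrative single-use barrier. The barrier_t structure must live in
     * shared memory. Each of 'expected' processes calls barrier_wait() once;
     * no caller returns until all of them have arrived. */
    typedef struct {
        spinlock_t   lock;       /* protects the arrival count */
        volatile int arrived;    /* how many have reached the barrier so far */
        int          expected;   /* how many must arrive before anyone proceeds */
    } barrier_t;

    void barrier_init(barrier_t *b, int expected)
    {
        b->lock = 0;
        b->arrived = 0;
        b->expected = expected;
    }

    void barrier_wait(barrier_t *b)
    {
        spin_lock(&b->lock);
        b->arrived++;            /* record this arrival atomically */
        spin_unlock(&b->lock);

        while (b->arrived < b->expected)
            ;                    /* spin until the last process arrives */
    }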

Also important is the amount of time for which each thread of control will execute. If the number of instructions executed is small, the overhead of process creation and termination may outweigh the advantage to be gained from having a parallel process.

Figure 4-4 Task/Process/Processor Relationships

To address these concerns, Encore has created a multitasking library which implements the notion of a task as a thread of control. A task is a "light-weight" or "cheaper" thread of control than a process; it does not have all of the features of a process, and consequently it has less overhead. A task is analogous to a procedure in a program, except for the important difference that it can execute in parallel with its caller. Like a procedure, a task has arguments passed by its caller. It can have local variables, it has a private stack, and it can access global data that can be shared with other tasks. To draw an analogy with a programming language, a task is to a process as in-line code is to a procedure. For these reasons, a task is less expensive than a process; it has, however, some restrictions.

The task abstraction is separate from the UMAX process; a process can contain many tasks. A task is a unit of activity that shares an address space with other tasks in the process. The number of tasks is independent of the number of processes used to execute those tasks. The Multitasking library allows a user to create a virtually unlimited number of tasks, and to control the number of processes which execute those tasks. Figure 4-4 shows this diagrammatically.

Because tasks provide fewer features than processes, they are less costly to create, delete and multiplex. Also, in general, one can consider creating very large numbers of tasks, while it may be inappropriate to create that many processes.

The features provided by the Multitasking library are a natural extension for parallelism to the model of (serial) computation already well understood by the programmer. The basic procedure-like mechanism of tasks simplifies understanding of what a task is and what it can do. In addition, the routines in the library can be easily incorporated into programs written in C, Fortran or Pascal. The result is that the tasking facilities are very easy to use.
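Because this summary lists the multitasking library's capabilities but not its actual entry points, the following sketch uses invented routine names (task_start, task_wait) purely to show the shape of task-based code; it is not the Encore library interface.

    /* Hypothetical sketch of a task-based decomposition. task_start() and
     * task_wait() are invented stand-ins for whatever the real library
     * provides; the point is the procedure-like feel of a task. */
    #define MAXSLICES 16

    extern int  task_start(void (*func)(void *), void *arg);   /* hypothetical */
    extern void task_wait(int task_id);                        /* hypothetical */

    static int totals[MAXSLICES];          /* global data shared by all tasks */

    static void sum_slice(void *arg)
    {
        int slice = *(int *)arg;           /* arguments are passed by the caller */
        int partial = 0;                   /* locals live on the task's private stack */

        /* ...accumulate a partial result for this slice of the data... */
        totals[slice] = partial;           /* results land in shared global data */
    }

    void parallel_sum(int nslices)         /* assumes nslices <= MAXSLICES */
    {
        int ids[MAXSLICES], args[MAXSLICES], i;

        for (i = 0; i < nslices; i++) {    /* like a procedure call, except the */
            args[i] = i;                   /* callee runs in parallel with us   */
            ids[i] = task_start(sum_slice, &args[i]);
        }
        for (i = 0; i < nslices; i++)
            task_wait(ids[i]);             /* rejoin before using the totals */
    }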


Features of the Multitasking Library

The capabilities provided by the Multitasking library are briefly described below.

• Start a task - creates a "unit of activity" which may execute in parallel with the caller.

• Stop a task - terminates the task.

• Synchronization mechanisms and their operations:
  - spinlocks
  - counting semaphores
  - events
  - barriers

• Add a process - create a new UMAX process to service the pool of available tasks.

• Remove a process - remove a UMAX process.

• Allocate memory - allow tasks to acquire shared memory.

Parallelizing an Application: An Example

There are many possible approaches to developing a parallel application or converting serial applications to parallel ones. To make some of the tradeoffs concrete, let's consider an example.

The UMAX utility, grep, searches a set of files for occurrences of a given pattern of characters. The algorithm for the program is (loosely) as follows:

    for each file in the command line:
      a) open the file
      b) for each line in the file:
           if the pattern occurs on the line, print the line
      c) close the file

There are two obvious loops in the program: the inner loop processes the lines of a file, while the outer loop processes the files whose names appear on the command line. In fact, there is a third loop which processes the characters on each line to determine whether or not the line matches the pattern.

Several approaches to parallelization may be applicable here. First, there is no connection between the processing of one file and the processing of the next. This suggests an ideal data partitioning opportunity. A copy of the grep program could be "forked" for each file on the command line. Or, if the expense of starting and stopping all of those processes seems too great, the list of files could be divided into sub-groups which could each be processed by an independent copy of the grep program.

A second opportunity for parallelization is in the pattern matching itself. If it is determined that the cost of reading in an input line is far exceeded by the cost of matching the pattern to that line, it may be worth replicating the pattern matching function. This would seem to be a good place to use several tasks, rather than several processes. In this decomposition of the application, a variant of pipelining, one process could be reading in input lines, some number of pattern-matching tasks could each be working on a separate line, and one output process could collect the matching lines from the pattern matching tasks and coordinate their display.

In fact, the parallelism of creating multiple processes to work on separate files is already provided in the UMAX grep utility. The speedup resulting from this simple modification of grep is nearly a linear function of the number of processors, although some degradation occurs because of file access overhead. In some test cases, the parallelized grep executes 9 to 10 times faster on a 14-processor Multimax than does the previous serial grep.

Fine Grained Parallelism

The classification of fine grained parallelism encompasses those cases where parallelism can be applied at the level of a single instruction, for example, as provided by special hardware such as systolic or array processors. This granularity of parallelism can be achieved only with special hardware, which does not yet exist for the Multimax. What may be significant to the Multimax user, however, is that the Multimax is capable of supporting cards that include processors of unconventional or exotic design. The Nanobus bandwidth, not to speak of the memory and I/O card architecture, was designed with such additions in mind.

CONCLUSIONS

The importance of parallel programming can only increase as the cost of processors decreases, as physical limitations on the speed achievable by a single processor are reached, and as multiprocessor architectures like the Multimax enable the benefits of multiple processors to be used by programmers. The Multimax hardware and software environment is rich in features to support manageable parallelism at all grain sizes.

Chapter 5

Multimax Reliability and Maintainability

The Multimax system is designed to be both highly reliable in operation and easy to maintain and repair in the event of failure. Its reliability is enhanced not only by extensive monitoring of data transfers via parity and ECC logic, but also by close monitoring of environmental sensors within the system. Because full diagnostics (at both the card and system level) are always available and are automatically invoked on power-up, problems which do occur are easily isolated to a field-replaceable unit, which can generally be installed by non-technical personnel.

Although Multimax is not designed to be "fault tolerant" (that is, it is not built with total redundancy so that no failure in a single component can disrupt system activity for any appreciable time), it is designed to be "fault minimizing." All but the simplest Multimax systems have multiples of almost every Nanobus card type except the System Control Cards. The result is that component failures on processor, memory, or I/O interface cards normally occur in a context that allows other, identical cards to take over the tasks of cards that have failed. The overall system throughput may diminish somewhat if a processor card or memory bank is removed from the Nanobus, but the system can continue to operate until replacement cards are obtained and swapped into the backplane.

One reason for the reliability of the Multimax lies in the simplicity of its design. Apart from cabinets, cables, fans, and external peripherals, there can be only twelve types of complex electronic components in any Multimax system:

• Nanobus Backplane
• Dual Processor Cards
• Shared Memory Cards
• Ethernet/Mass Storage Cards
• System Control Card
• Three +5 Volt Power Supplies
• One +5, -12, +12 Volt Supply
• Battery Backup Unit
• Disk Controller Cards
• Tape Controller Card

The result of this simplicity is that there are fewer individual components, fewer solder joints, fewer interconnections - in short, fewer opportunities for deterioration to set in. Add to this minimalist architecture the fact that many of the Multimax boards are redundant for basic system operation, and you have a system whose high mean time between failures and low mean time to repair combine to produce a system with remarkably high long-term availability.


MULTIMAX SELF-TEST CAPABILITIES

Multimax systems are provided with sensors for AC and DC voltage and for temperature. The System Control Card (SCC) continually monitors these sensors and can initiate an orderly shutdown in response to out-of-limit conditions.

System status is indicated by lamps on the Multimax front panel (see Figure 4-1). Status is also recorded in non-volatile RAM and made available to the UMAX operating system.

System Self-Test and Configuration

One source of the Multimax system's reliability is the fact that each Nanobus card in the backplane has its own intelligence and local storage independent of that on the Dual Processor and Shared Memory Cards. This means that Multimax cards can do buffering, error detecting, and error correcting on their own and thus reduce the flow of data across the Nanobus. It also means that, on command from the SCC at power-up or system reset, each card can test itself and report on its internal state. If any card fails its self-test, the SCC causes it to be logically removed from the Nanobus and notifies the system software that this action has been taken.

System activity at power-up or system reset (front panel Start Mode switch set to Auto) occurs in the following sequence:

1. The SCC begins its own self-test while all other Nanobus cards are held inactive under hard reset.

2. Early in the SCC self-test, the SCC releases the backplane reset line. This action allows the processors on all non-memory cards to begin their own self-tests.

Figure 4-1 Multimax Front Panel


Meanwhile, the bus interface logic on each card is held inactive.

Upon finishing its self-test, the SCC lights the Diagnostic Processor OK lamp on the Multimax front panel. If the self-test detects minor SCC faults that do not prohibit reliable system operation, the SCC lights the Attention Required lamp. If the self-test detects major faults, the SCC lights the System Fault lamp and the system waits for the SCC to be replaced. If the SCC can communicate with the system console, the operator can request more detailed information. (Internal state is also reflected by diagnostic indicators on the rear edge of each Nanobus card.)

3. While non-memory cards are running their individual self-tests, the SCC instructs the Shared Memory Cards (SMCs) to start the memory self-tests. As these tests are passed, the SCC builds a map of available memory for later use.

   Any SMC failing this test is excluded from the memory configuration and is therefore effectively removed from the Nanobus. If sufficient memory remains for system operation, the SCC flags SMC failures by lighting the Attention Required lamp, printing an error message on the system console (if available), and recording the failure for transmission to the UMAX operating system. If insufficient memory remains, the SCC lights the System Fault lamp, prints an error message on the system console (if available), and proceeds to test the remainder of the system.

   In battery backed-up systems returning from power failure, SMCs run no tests that might destroy data. Existing data is instead checked and all single-bit errors are corrected. The existence of double-bit errors indicates that the power failure has exhausted the reserves provided by the battery backup system and that a normal (destructive) test is appropriate.

4. Proceeding one Nanobus card at a time, the SCC enables the bus interface logic on each non-memory card in the backplane. Each card reports self-test status to the SCC. As successful completions are reported, the SCC builds a resource table for later use by the operating system.

   If any card reports self-test failure, the SCC removes that card from the Nanobus, lights the Attention Required or System Fault lamp as appropriate, and records the failure for later use by the operating system.

5. When all cards in the backplane have been dealt with, the SCC initiates and monitors a multi-test sequence designed to generate complex bus traffic. Errors, whether detected by the SCC or reported to the SCC by the participating cards, are displayed on the system console. On multi-test failures, the SCC will attempt to recover by deconfiguring the suspect card or cards and restarting the multi-test sequence.

6. The multi-test sequence takes from three to five minutes, depending on the hardware configuration. If the system passes these tests, the SCC searches for a storage device from which it can load its own startup code. (All activity up to this point has been executed from ROM.) This device will normally be a system disk drive, although magnetic tape is an option.

7. The SCC startup program controls the SCC during normal system operation. The first action performed by this program is to load the software for the EMCs. Then the SCC bootstraps the UMAX operating system.

When this is done, the system is fully operational. The SCC has a record in its non-volatile RAM of the current card configuration, the amount of interleaved and non-interleaved memory, and some additional specifications passed to it by the software after start-up. Even if the hardware configuration has changed since the last power-up, no Nanobus cards require manual strap installation or switch setting, and no dialogue with the operating system is required.

DPC and EMC Self-Tests

DPCs and EMCs run a similar set of initial diagnostics:

• A ROM checksum calculation


• A local processor test

• A series of RAM tests

• Tests specific to the type of card

SMC Self-Tests

The SMC has an 8031 auxiliary processor which acts only when commanded to do so by the SCC.

On SCC command, the SMC runs a quick (approximately 10 second) self-verify test that cycles a limited set of patterns through memory. This test provides a high probability of detecting all memory error conditions. A more comprehensive self-test mode is selected when an appropriate jumper (intended for use by Encore manufacturing or field service) is installed.

Annex Self-Tests

The Annex represents an exceptional case insofar as it is not a card that plugs into the Multimax backplane but is instead an independent device designed to operate at locations remote from the host Multimax system. Nevertheless, it conforms to the power-up patterns of the standard Nanobus cards; that is, when it is powered up, it runs a ROM checksum calculation, a processor check, a series of RAM tests, and a set of tests specific to itself and its Ethernet connection.

THE CONSOLE COMMAND INTERPRETER

If power-up or reset occurs when the Multimax mode switch is in the Manual position, the process is the same as that described for the Autotest power-up until step 5. At this point, however, the SCC displays the console command interpreter prompt:

>>>

The console command interpreter accepts the following commands:

abort      Perform abnormal exit from the console command interpreter.

boot       Boot from a specified device in the boot list (contained in SCC local battery backed-up RAM), or default to the boot list.

bootmode   Accept a bitmask defining the source of images to be used during primary boot.

config     Add a Nanobus board to the current configuration.

deconfig   Remove a Nanobus board from the current configuration.

diag       Run the ROM based diagnostic system.

dump       Dump physical memory starting at a given address.

exit       Perform normal exit from the console command interpreter.

help       Display this list of commands.

rawboot    Boot from the specified device.

THE SYSTEM EXERCISER

If the operator specifies an appropriate bitmask for the bootmode command and then executes rawboot, the Multimax will load the System Exerciser. This provides the ability to run many specific tests that stress the system much more exhaustively than the self-tests run at start-up. The System Exerciser permits tests to run one at a time or automatically in sequence.

SOFTWARE PRODUCT RELIABILITY

Extreme care has gone into insuring the reliability and maintainability of Encore's entire software offering. All diagnostic software, the UMAX 4.2 and V operating systems, networking software, language software, and tools receive ongoing quality control imposed during the development cycle, followed by rigorous quality assurance, and release control as the software products are manufactured.

Encore uses a series of automatic sequences to perform extensive testing of the systems during development and prior to general customer release. These sequences test system components for concurrence with their specifications and documentation. The tests start with the operating system's programmer interface and libraries, and proceed through to


general user-level utilities. A series of system-level stress tests checks multiuser system functions and system-level boundary conditions in a systematic, repeatable manner.

Software manufacturing procedures are automatic and operate from a bill of materials for software components. These procedures assure that the correct revision of every software component is included in every distribution, and that subsequent software distributions are consistent with earlier ones. Encore backs up the distribution system by revision control and software archive systems, also used in the software development process to help assure the integrity of the software being built.

Appendix A
The NS32000 Family Processor Architecture

THE CURRENT MULTIMAX PROCESSOR

When the Multimax system design was begun, three requirements were clear:

• The desired system performance required a fast processor capable of dealing, internally and externally, with a word size of at least 32 bits.

• The projected multiuser, multiprocess applications required available and economical memory mapping and virtual memory mechanisms.

• The anticipated use of high level languages made it desirable that the processor's instruction repertoire correspond in a reasonable way with the requirements of compilers.

At the time that Multimax system design was begun, only one announced processor chip met these requirements - the National Semiconductor 32032. This chip is designed to run with a 10 MHz clock, and has a full 32-bit Arithmetic Logic Unit and register set - as well as a full 32-bit data bus to memory. Moreover, it belongs to the National Semiconductor 32000 family, which has a complement of proven memory management and floating point chips.

In addition, the NS32032 - like the rest of the NS32000 family - has an orthogonal instruction set designed to provide efficient and economical support for high level languages. Some of its powerful features are:

• a compactly encoded, completely symmetrical, dual address instruction set

• nine addressing modes designed for the kinds of accesses that compilers generate

• indexing that is automatically scaled to argument size (1, 2, 4, or 8 bytes), applicable to any addressing mode

• instructions to implement high level language constructs such as case statements, loops, and calls, as well as bit-field and string manipulation

• a fully integrated floating point instruction set, supported by hardware

The above features, among others, led the Multimax designers to adopt the NS32032 as the system's main engines on the Dual Processor Card. This chip, and others in the NS32000 family, are also used as auxiliary processors on other Multimax cards and in the Annex.

Note that, except for the chip's memory mapping and floating point capabilities, compatibility with unique Multimax system design requirements was much less an issue in selecting the NS32000 family than were its architectural characteristics. The Multimax


system tolerates many diverse processor architectures. When future, more powerful processors become available, the Multimax can be upgraded to accommodate them.

The following information is drawn in part from National Semiconductor literature and is included here to provide some insight into the capability of the processing engine that drives the Multimax. The features described below are for the most part transparent to Multimax users.

NS32000 ARCHITECTURE

Data Types Supported

The objects and concepts of a high level language include constants, variables, expressions, and functions, each of which has a particular data type. Its type determines the range of values that a constant, variable, expression, or function can assume in a program.

A data type is said to be supported by a computer if the computer's instruction set either contains operators that directly manipulate the data type or has operators and addressing modes that facilitate such manipulation. Data types directly manipulated by the hardware are called primitive data types. Those supported by the hardware, but not manipulated directly, consist of ordered collections of primitives and are called structured data types.

The NS32000 family supports the following primitive and structured data types:

• primitive data types (see Figure A-1)
  - integer (signed and unsigned)
  - floating point
  - Boolean
  - binary coded decimal (BCD)
  - bit field

• structured data types
  - matrices and arrays
  - records
  - strings
  - stacks

Integer Data Type

The integer data type is used to represent integers - that is, whole numbers without fractional parts. Integers can be signed (negative or positive) or unsigned (positive only). In the NS32000 family, integers are available in three sizes: 8-bit (byte), 16-bit (word), and 32-bit (double word). Signed integers are represented as binary two's complement numbers and have values in the range -2^7 to 2^7-1, -2^15 to 2^15-1, or -2^31 to 2^31-1; unsigned integers have values in the range 0 to 2^8-1, 0 to 2^16-1, or 0 to 2^32-1. When integers are stored in memory, the least significant byte occupies the lowest address and the most significant byte is at the highest address.

Floating Point Data Type

The floating point data type is used to represent real numbers - that is, numbers with fractional parts. Floating point numbers are represented by an encoded version of the familiar scientific notation:

n = s x f x 10^e

where s is the sign of the number, f is the fraction or mantissa, and e is a positive or negative integer exponent. (Figure A-1 shows how these values are represented by fields within the number.) Floating point numbers are available in two sizes: 32-bit (single precision) and 64-bit (double precision). Double precision offers both a larger range (with its larger exponent) and more precision (with its larger mantissa). The NS32000 floating point data type is compatible with the IEEE floating point standard.

Most floating point computations are automatically handled by the NS32081 Floating Point Processor (FPU). A programmer can treat both single and double precision floating point numbers like any other NS32000 family data type and can use any of the NS32000 addressing modes to reference them. In addition, conversion is provided from every integer and floating point format to every other integer and floating point format.
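The integer properties described above can be illustrated with a few lines of generic C (C being one of the languages supported on the Multimax); the code below assumes 8-, 16-, and 32-bit integer types and is not NS32000-specific.

    #include <stdio.h>

    int main(void)
    {
        unsigned long dw = 0x11223344UL;             /* a 32-bit double word */
        unsigned char *p = (unsigned char *)&dw;

        /* Ranges of the three integer sizes (byte, word, double word). */
        printf("byte:        signed -128..127,               unsigned 0..255\n");
        printf("word:        signed -32768..32767,           unsigned 0..65535\n");
        printf("double word: signed -2147483648..2147483647, unsigned 0..4294967295\n");

        /* Least significant byte at the lowest address: on a processor that
         * stores integers this way (as the NS32000 family does), the bytes
         * print as 44 33 22 11. */
        printf("0x11223344 in memory, low address first: %02x %02x %02x %02x\n",
               p[0], p[1], p[2], p[3]);
        return 0;
    }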


Figure A-1 Primitive Data Types
(The figure diagrams the layouts of the primitive types: 8-, 16-, and 32-bit integers; 32-bit and 64-bit floating point formats with sign, exponent, and fraction fields; bit fields of up to 32 bits; and packed BCD formats holding two, four, or eight digits in 8, 16, or 32 bits.)

Other Primitive Data Types (Boolean, Bit Field, BCD Digits)

The Boolean (or logical) data type is a single bit whose value, 1 or 0, represents the two logic values TRUE and FALSE. A Boolean data type has many uses in a program; for example, to save the results of comparisons, to mark special cases, and in general to distinguish between two possible outcomes or conditions. Booleans are represented in the NS32000 family by integers (byte, word, or double word). TRUE is integer 1; FALSE is integer 0.

The bit field data type is different from other primitive data types in that the basic addressable unit is measured in bits instead of bytes. In the NS32000 family, bit fields can be 1 to 32 bits long, and can be located arbitrarily with respect to the beginning of a byte. They are useful when a data structure includes elements of nonstandard lengths, since they allow programs to manipulate fields smaller than a byte.

With the binary-coded decimal (BCD) data type, unsigned decimal integers can be stored in the computer, using four bits for each


decimal digit. The BCD data type is represented by three formats, consisting of two, four, or eight digits. Two BCD digits can be packed into a byte, four to a word, or eight to a double word. Thus one byte can represent the values from 0 to 99, as opposed to 0 to 255 for a normal unsigned 8-bit number. Similarly, a word can represent values in the range 0 to 9999, and a longword can represent values in the range 0 to 99999999. Though BCD requires more bits to represent a large decimal number, this data type has certain advantages over binary numbers, particularly ease of use by humans.

Arrays

An array is a structured data type consisting of a number of components, all of the same data type, such that each data element can be individually identified by an integer index. Arrays represent a basic storage mode for all high level languages.

In Pascal programs, for example, each component of the array is referenced by the array name and an index value giving the component's position in the array. Arrays range from simple one-dimensional groups of vectors to more complex multi-dimensional structures. The elements of an array can be integers, floating point numbers, Boolean variables, characters, or more complex objects built up from these types.

The NS32000 family provides special operators that facilitate calculation of the array index and determination of whether or not the index is outside the limits of the array (see "Block, String, and Array Operators," below). In addition, certain NS32000 addressing modes facilitate quick access to array elements (see "High Level Language Addressing Modes," below).

Records

Records, like arrays, are a structured data type with several components. However, unlike arrays, the components of a record can each be of a different data type. In high level languages, such as Pascal, a component of a record is selected by using both the name of the record variable and the name of the component. Records are usually grouped into large arrays, called files in COBOL, structures in PL/1, and record structures in Pascal.

The NS32000-family addressing modes facilitate quick access to record elements (see "High Level Language Addressing Modes," below).

Strings

A string is an array of integers, all of the same length. The integers may be bytes, words, or longwords. Strings are common data structures in high level languages. For example, strings of ASCII characters (bytes) are commonly used to contain alphanumeric text.

In the NS32000 family, a string is represented by a sequence of integers stored in contiguous memory locations. Special operators facilitate comparison of strings, movement of strings, and searching strings for particular integer values (see "Block, String, and Array Operators," below).

Stacks

A stack is a one-dimensional data structure in which values are entered and removed one item at a time at one end (the top of stack). It consists of a block of memory and a variable called the stack pointer.

Stacks are important data structures in both systems and applications programming. They are used to store return address and status information during subroutine calls and interrupt servicing. Moreover, algorithms for expression evaluation in compilers and interpreters depend on stacks to store intermediate results. Block structured high level languages such as Pascal keep local data and other information on a stack. Procedure parameters in block structured high level languages are usually passed on a stack, and assembly language programs sometimes use this convention as well.

The NS32000 family supports both user and interrupt stacks. Depending on the mode of operation, one of two stack pointers (SP0 or SP1) contains the memory address of the top item on the stack. Instructions allow for explicit manipulation of the stack pointer, and the current stack can be used to hold an operand in almost all NS32000 instructions (see "High Level Language Addressing Modes," below).
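The packed BCD format described earlier (four bits per decimal digit, up to eight digits in a 32-bit double word) can be illustrated with generic C; this shows the representation only, not the NS32000 ADDP/SUBP instructions.

    /* Pack an unsigned decimal value (0..99999999) into eight BCD digits,
     * one digit per 4-bit nibble, least significant digit in the low nibble. */
    unsigned long pack_bcd(unsigned long value)
    {
        unsigned long bcd = 0;
        int shift;

        for (shift = 0; shift < 32; shift += 4) {
            bcd |= (value % 10) << shift;
            value /= 10;
        }
        return bcd;                     /* e.g. 1986 packs to 0x00001986 */
    }

    /* Recover the binary value from a packed BCD double word. */
    unsigned long unpack_bcd(unsigned long bcd)
    {
        unsigned long value = 0;
        int shift;

        for (shift = 28; shift >= 0; shift -= 4)
            value = value * 10 + ((bcd >> shift) & 0xF);
        return value;
    }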


Operators

The NS32000 family architecture provides a complete and comprehensive set of operators for every hardware-recognized primitive data type. In addition, special operators are available that facilitate manipulation of structured data types.

Integer Operators

A large set of arithmetic operators is provided for integer manipulation: addition and subtraction, multiplication and division (with various remainder, rounding, modulus, and result-length options), two's complement, and absolute value. Other operators include:

• Move operators that allow either zero propagation or sign extension.

• Shift operators allowing logical and arithmetic shifts, as well as rotation left or right, both by any amount.

• Logical operators (AND, OR, Exclusive OR, Complement, and Bit Clear) allowing each bit in a data word to be manipulated independently and thus facilitating a wide variety of data manipulation and testing.

• Two BCD arithmetic operations, Add and Subtract, handling up to eight digits at a time.

• Extended Multiply and Divide operators which return a result that is twice the size of the participating operands. (NS32000-family processors are unique among current microprocessors in their ability to calculate a 64-bit result using integer multiplication and division.)

Floating Point (FPU) Operators

The NS32000 family supports 32-bit and 64-bit precision floating-point calculations, as well as 8-, 16-, and 32-bit fixed point calculations. NS32000 processors support floating Add, Subtract, Multiply, Divide, Move, and Compare instructions and allow the conversion of any integer into a corresponding floating point number or vice versa.

These operators are implemented by the FPU and display the same symmetry, addressing modes, and flexibility as the rest of the instruction set. The NS32000 architecture makes all NS32000 addressing modes available to the FPU. Instructions can be register-to-register, memory-to-register, or memory-to-memory.

Logical, Boolean, Bit, and Bit Field Operators

Logical operators treat a data word as an array of bits, and allow each bit to be handled independently. Logical operators include AND, OR, Exclusive OR, and Complement.

Also provided is a special Boolean NOT operator for implementing high level languages which require that TRUE = 1 and FALSE = 0. To simplify the handling of Boolean expressions in compilers, a Conditional Set operator stores a 1 into its single operand if a condition code check is satisfied; if not, it stores a 0.

Bit operators allow convenient handling of individual bits or arbitrarily large bit arrays. Apart from its ability to set, clear, test, or complement any bit in memory or in a register, the NS32000 family has semaphore primitives (test and set, test and clear) for coordination in a multiprocessing or multitasking environment. Also provided is a Convert Bit-Field Pointer operator which converts a byte address and a bit offset into a bit address. This allows a field address to be converted to an integer and thus be passed to a procedure or function - a useful feature for high level languages. A Find First Set operator searches a sequence of bits, either in memory or in a register, and returns the bit number of the first "1" bit detected.

Two Bit Field operators can access bit fields of arbitrary length (up to 32 bits) anywhere in memory, independent of byte alignments. The Extract operator reads a bit field, expands the result to the length specified in the opcode, and then stores the expanded result into another operand. An Insert operator reads an operand of the length specified in the opcode and stores the low-order part into a bit field.
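A portable C illustration of the effect of the Extract and Insert bit field operators, using shifts and masks. On the NS32000 family, EXT and INS do this directly in hardware and can address fields that cross byte boundaries anywhere in memory; the simplified sketch below works within a single 32-bit word and assumes offset + length does not exceed 32.

    /* Read a 'length'-bit field starting at bit 'offset' of a 32-bit word,
     * zero-extended into the result (the effect of the Extract operator). */
    unsigned long extract_field(unsigned long word, int offset, int length)
    {
        unsigned long mask = (length >= 32) ? ~0UL : ((1UL << length) - 1);

        return (word >> offset) & mask;
    }

    /* Store the low 'length' bits of 'value' into the same field of 'word',
     * leaving all other bits untouched (the effect of the Insert operator). */
    unsigned long insert_field(unsigned long word, int offset, int length,
                               unsigned long value)
    {
        unsigned long mask = (length >= 32) ? ~0UL : ((1UL << length) - 1);

        return (word & ~(mask << offset)) | ((value & mask) << offset);
    }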


Block, String, and Array Operators

For the many iterative operations that are required in high level languages, the Block Move and Block Compare operators facilitate efficient generation of compiler code. They are written the same way as the standard memory-to-memory move and comparison instructions, except for the addition of a third displacement operand, which specifies how many elements (bytes, words, or longwords) are to be moved or compared.

Strings of bytes, words, or longwords are easily manipulated with the Move String, Compare String, and Skip operators. To avoid destructive overwriting, move and compare operations can proceed from low addresses to high addresses, or vice versa. These operations can proceed unconditionally, or be terminated when a comparison condition is met (when either a specific value is encountered or when a value is no longer encountered). Also, a string of instructions can be interrupted or aborted, and then restarted where it left off. These string operators are comparable in their power to those available on large minicomputers and mainframes.

Two operators, Check and Index, are provided for array handling. The Check operator determines whether an array index is within bounds. It allows the user to specify both an upper and a lower bound. It also subtracts the lower bound from the value being checked and stores the difference in a register, where it can be used in an Index instruction or an indexed addressing mode.

The array Index operator performs one step of a multidimensional array address calculation. The opcode specifies the length of the second and third operands; the first operand is a general purpose register. The Index operator performs a multiplication and an addition, leaving the result in a register. The result is then used in another Index instruction for the next dimension, or is used in an index addressing mode.
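The address arithmetic that the Check and Index operators accelerate can be written out in C. The function names below are illustrative; they simply restate the operations described above.

    /* One step of a multidimensional array index calculation, as the Index
     * operator performs it: accumulated = accumulated * length + subscript. */
    unsigned long index_step(unsigned long accumulated, unsigned long length,
                             unsigned long subscript)
    {
        return accumulated * length + subscript;
    }

    /* The effect of the Check operator: verify lower <= value <= upper and
     * produce the zero-based offset (value - lower) for use in an Index step.
     * Returns 0 on success, -1 if the subscript is out of bounds. */
    int check_bounds(long value, long lower, long upper, unsigned long *offset)
    {
        if (value < lower || value > upper)
            return -1;
        *offset = (unsigned long)(value - lower);
        return 0;
    }

    /* Example: element a[i][j][k] of an array declared a[DI][DJ][DK] lies at
     * linear offset ((i * DJ) + j) * DK + k, i.e. two Index steps. */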


Register Set

The NS32000 family architecture supports 36 registers grouped into two register sets: 16 general purpose registers and 20 special purpose registers (see Figure A-2). Eight of the general purpose registers are located on the processor; the other eight are located on the FPU. The 20 special purpose registers include eight on the processor, one on the FPU, and 11 on the MMU. Besides storing operands and the results from arithmetic operations, these registers can also be used for the temporary storage of program instructions and control information concerning which instruction is to be executed next.

Processor General Purpose Registers

Internal to the processor are eight 32-bit general purpose registers R0 through R7, which provide fast local storage for the processor. They can be used to store bytes, words, longwords, and double longwords.

All general purpose registers are available to all instructions. A compiler therefore has considerable freedom in its use of registers and can get by with a bare minimum of housekeeping. Since general purpose registers can be used as accumulators, data registers, and address pointers, the machine creates fewer bottlenecks in address calculations than occur with machines that permit only certain registers to serve as address pointers.

Processor Special Purpose Registers

The eight special purpose registers on the processor chip are used for storing address and status information. The MOD register and the Processor Status Register are both 16 bits wide; the other registers are effectively 24 bits wide, though an additional eight bits (which in the current implementation are always set to zero) have been provided to allow for future expansion.

PC: The Program Counter register is a pointer to the first byte of the currently executing instruction. After the instruction is completed, the program counter is incremented to point to the next instruction. Since this register is 24 bits wide, 16 Mbytes of virtual memory can be directly addressed without the need for segmented addresses.

SP0, SP1: The SP0 register points to the lowest address of the last item stored on the Interrupt Stack. This stack is normally used only by the operating system, primarily for temporary data storage and for holding return information (for operating system subroutines and interrupt and trap service routines). The SP1 register points to the lowest address of the last item stored on the user stack. This stack can be used by normal user programs to hold temporary data and subroutine return information.

FP: The Frame Pointer register is used by a procedure to access parameters and local variables on the stack. It is set up when a procedure is entered, and points to the procedure's stack frame, which contains the parameters for the currently executing subroutine and also the volatile (as opposed to static) local variables. The procedure parameters are addressed with positive offsets from the frame pointer; the local variables of the procedure are addressed with negative offsets.

SB: The Static Base register points to the global variables of a software module. All references to a module's data are relative to this register.

INTBASE: The Interrupt Base register holds the address of the dispatch table for interrupts and traps.

MOD: The Module register holds the address of the module descriptor of the currently executing software module.

PSR: The Processor Status Register holds the processor status and control flags. The PSR is 16 bits long, and is divided into two 8-bit halves. The low-order eight bits are accessible to all programs, while the high-order bits are accessible only to programs running in supervisor mode. Among the bits in the PSR are the Carry bit, the Trace bit (which causes a trap to be executed after every instruction), the Mode bit (which is set when the processor is in user mode), the Interrupt Enable bit (which if set causes interrupts to be accepted), and several other bits usable by comparison instructions.


Figure A-2 Register Set
(The figure diagrams the register set: the CPU special purpose registers PC, SB, FP, SP1, SP0, INTBASE, and the 16-bit PSR and MOD registers; the 32-bit general purpose registers R0 through R7; the FPU registers F0 through F7 with the Floating Point Status register; and the MMU registers PTB0, PTB1, EIA, MSR, BPR0, BPR1, BC, SC0, and SC1.)


FPU Registers

The Floating Point Unit registers are located on the FPU slave processors and consist of eight 32-bit registers and a dedicated Floating Point Status Register. The eight floating point registers can each store a single precision operand or half of a double precision operand. To manipulate 64-bit double precision operands, a specified register (n) and the next register (n + 1) are concatenated for the operation. Register n + 1 contains the high-order bits.

The Floating Point Status register (FSR) holds mode control information, error bits, and trap enables. Like the other registers, the FSR is 32 bits wide.

MMU Registers

The memory management architecture uses the following 32-bit dedicated registers to control address translation:

PTB0, 1: The Page Table Base registers are controlled by the operating system and point to the starting location of the translation tables in physical memory. All supervisor mode addresses are translated with the PTB0 register. User mode addresses are translated using this register if the DS bit in the MSR is one; if this bit is zero, the PTB1 register is used.

EIA: The Error/Invalidate Address register is used to invalidate addresses in the translation buffer, a transparent cache of the most recently used page table entries. When an entry in a page table is modified in memory, the copy of it in the translation buffer is deleted by writing the address of the affected virtual page into the EIA register. When a PTB register is modified, all cache entries made using that register are deleted. The EIA is also used to store the address which caused a memory management exception to occur.

MSR: The Memory Status register holds fields that control and examine the memory management status, and is only accessible in supervisor mode.

Other registers in the MMU provide high level software debug facilities during program execution. These include flow tracing with four dedicated registers in the MMU and the trace trap bit in the processor's PSR. If flow tracing is activated (by means of a bit in the MSR), two 32-bit Program Flow registers (PF0 and PF1) always hold the addresses of the last two instructions that were executed out of sequence. The two 16-bit Sequential Count registers (SC0 and SC1) record the number of sequential instruction fetches between each change in program flow. All four of these registers can be cleared by the Load Memory Management Register (LMR) instruction.

Instruction Set

The NS32000 family instruction set includes over 100 basic instruction types, chosen on the basis of a study of the use and frequency of specific instructions in various applications. Special case instructions, which compilers cannot use, have been avoided. The instruction set is further expanded through the use of special slave processors, acting as extensions to the processor.

This instruction set is symmetrical; that is, instructions can be used with any addressing mode, any operand length (byte, word, and longword), and make use of any general purpose register.

The instructions are genuine two-operand instructions, though many use up to five operands. This feature, combined with the consistent and symmetric architecture, reduces program size considerably. The instruction set is summarized in Table A-1 and is discussed in the following text.

Addressing Modes

Information encoded in an instruction specifies the operation to be performed, the type of operands to be manipulated, and the location of these operands. An operand can be located in a register, in the instruction itself (as an immediate operand), or in memory. Instructions can specify the location of their


operands by nine addressing modes. Two modes are used to access operands in registers and in instructions - register mode and immediate mode. The other modes access operands in memory.

Table A-1 NS32032 Instruction Summary

Operation    Description

MOVES
MOVi       Move a value.
MOVQi      Extend and move a 4-bit constant.
MOVMi      Move Multiple: disp bytes.
MOVZBW     Move with zero extension.
MOVZiD     Move with zero extension.
MOVXBW     Move with sign extension.
MOVXiD     Move with sign extension.
ADDR       Move Effective Address.

INTEGER ARITHMETIC
ADDi       Add.
ADDQ       Add 4-bit constant.
ADDCi      Add with carry.
SUBi       Subtract.
SUBCi      Subtract with carry (borrow).
NEGi       Negate (2's complement).
ABSi       Take absolute value.
MULi       Multiply.
QUOi       Divide, rounding toward zero.
REMi       Remainder from QUO.
DIVi       Divide, rounding down.
MODi       Remainder from DIV (Modulus).
MEIi       Multiply to Extended Integer.
DEIi       Divide Extended Integer.

PACKED DECIMAL (BCD)
ADDPi      Add Packed.
SUBPi      Subtract Packed.

INTEGER COMPARISON
CMPi       Compare.
CMPQi      Compare to 4-bit constant.
CMPMi      Compare Multiple: disp bytes.

LOGIC AND BOOLEAN
ANDi       Logical AND.
ORi        Logical OR.
BICi       Clear selected bits.
XORi       Logical Exclusive OR.
COMi       Complement all bits.
NOTi       Boolean complement: LSB only.
Scondi     Save condition code (cond) as a Boolean variable of size i.

SHIFTS
LSHi       Logical Shift, left or right.
ASHi       Arithmetic Shift, left or right.
ROTi       Rotate, left or right.

BITS
TBITi      Test bit.
SBITi      Test and set bit.
SBITIi     Test and set bit, interlocked.
CBITi      Test and clear bit.
IBITi      Test and invert bit.
FFSi       Find first set bit.

BIT FIELDS (not aligned to byte boundaries)
EXTi       Extract bit field (array oriented).
INSi       Insert bit field (array oriented).
EXTSi      Extract bit field (short form).
INSSi      Insert bit field (short form).
CVTP       Convert to Bit Field Pointer.

ARRAYS
CHECKi     Index bounds check.
INDEXi     Recursive indexing step (multi-dimensional arrays).

STRINGS
Options: B (backward), U (until match), W (while match)
MOVSi      Move String 1 to String 2.
MOVST      Move string, translating bytes.
CMPSi      Compare String 1 to String 2.
CMPST      Compare, translating String 1 bytes.
SKPSi      Skip over String 1 entries.
SKPST      Skip, translating bytes for Until/While.

JUMPS AND LINKAGE
JUMP       Jump.
BR         Branch (PC Relative).
Bcond      Conditional branch.
CASEi      Multiway branch.
ACBi       Add 4-bit constant and branch if non-zero.
JSR        Jump to subroutine.
BSR        Branch to subroutine.
CXP        Call external procedure.
CXPD       Call external procedure using descriptor.
SVC        Supervisor Call.
FLAG       Flag Trap.
BPT        Breakpoint Trap.
ENTER      Save registers and allocate stack frame (Enter Procedure).
EXIT       Restore registers and reclaim stack frame (Exit Procedure).
RET        Return from subroutine.
RETT       Return from Trap (privileged).
RETI       Return from interrupt (privileged).

CPU REGISTER MANIPULATION
SAVE       Save General Purpose Registers.
RESTORE    Restore General Purpose Registers.
LPRi       Load Dedicated Register.
SPRi       Store Dedicated Register.
ADJSPi     Adjust Stack Pointer.
BISPSRi    Set selected bits in PSR.
BICPSRi    Clear selected bits in PSR.
SETCFG     Set Configuration Register (privileged).

FLOATING POINT
MOVf       Move a Floating Point value.
MOVLF      Move and shorten a Long value to Standard.
MOVFL      Move and lengthen a Standard value to Long.
MOVif      Convert integer to Standard/Long Floating.
ROUNDfi    Convert to integer by rounding.
TRUNCfi    Convert to integer by truncating, toward zero.
FLOORfi    Convert to largest integer less than or equal to value.
ADDf       Add.
SUBf       Subtract.
MULf       Multiply.
DIVf       Divide.
CMPf       Compare.
NEGf       Negate.
ABSf       Take absolute value.
LFSR       Load FSR.
SFSR       Store FSR.

Notations: i = Integer length suffix (B = Byte, W = Word, D = Double Word); f = Floating Point length suffix (F = Standard Floating, L = Long Floating).


Table A-1 (cont.) NS32032 Instruction Summary

Operation    Description

MEMORY MANAGEMENT
LMR        Load Memory Management Register (privileged).
SMR        Store Memory Management Register (privileged).
RDVAL      Validate address for reading (privileged).
WRVAL      Validate address for writing (privileged).
MOVSUi     Move a value from Supervisor Space to User Space (privileged).
MOVUSi     Move a value from User Space to Supervisor Space (privileged).

MISCELLANEOUS
NOP        No Operation.
WAIT       Wait for interrupt.
DIA        Diagnose. Single-byte "Branch to Self" for hardware breakpointing.

CUSTOM SLAVE
CCAL0c     Custom Calculate.
CCAL1c     Custom Calculate.
CCAL2c     Custom Calculate.
CCAL3c     Custom Calculate.
CMOV0c     Custom Move.
CMOV1c     Custom Move.
CMOV2c     Custom Move.
CCMP0      Custom Compare.
CCV0ci     Custom Convert.
CCV1ci     Custom Convert.
CCV3ic     Custom Convert.
CCV4DQ     Custom Convert.
CCV5QD     Custom Convert.
LCSR       Load Custom Status Register.
SCSR       Store Custom Status Register.
CATST0     Custom Address/Test (privileged).
CATST1     Custom Address/Test (privileged).
LCR        Load Custom Register (privileged).
SCR        Store Custom Register (privileged).

The address of the operand is calculated according to the desired addressing mode. The calculation is done by taking the sum of up to three components:

• a displacement element in an instruction
• a pointer (that is, an address) in a register or in memory
• an index value in a register
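As a concrete illustration of that calculation, the minimal C sketch below simply adds the three components; the function and parameter names are invented for this example and are not NS32000 terminology, and components that a given mode does not use can be taken as zero.

    /* Illustrative sketch only: the effective address of an operand in memory
     * as the sum of up to three components, as described above.  The names
     * are invented for this example; unused components are simply zero. */
    unsigned long effective_address(long displacement,      /* encoded in the instruction      */
                                    unsigned long pointer,   /* held in a register or in memory */
                                    unsigned long index)     /* held in a register              */
    {
        return (unsigned long)displacement + pointer + index;
    }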

The nine addressing modes can also be divided into those that are standard for microprocessor architectures, and those that are particularly suited to the operations and data structures of high level languages.

Standard Addressing Modes

The following standard addressing modes are supported by the NS32000 architecture (see Figure A-3):

• Register
• Immediate
• Absolute
• Register Relative

Figure A-3   Standard Addressing Modes

REGISTER: In the register addressing mode, the operand is in one of the eight general purpose registers. In certain slave processor instructions, an auxiliary set of eight registers can be referenced instead.

IMMEDIATE: The immediate mode operand is in the instruction. The length of the immediate mode operand is specified by the operand length or by the basic instruction length.

ABSOLUTE: With absolute mode, the operand address is the value of a displacement in the instruction.

REGISTER RELATIVE: The register relative mode computes an effective address (the operand address) by adding a displacement given in the instruction to a pointer in a general-purpose register.


High Level Language Addressing Modes

In addition to the above standard addressing modes, the NS32000 architecture employs several modes designed to support the requirements of high level languages:

• Memory Mode
• Memory Relative
• External
• Top of Stack
• Scaled Index

MEMORY MODE: This addressing mode is identical to Register Relative, discussed above, except that the register used is one of the dedicated registers - PC, SP, SB, or FP. These registers point to data areas generally needed by high level languages.

MEMORY RELATIVE: The memory relative mode allows pointers located in memory to be used directly, without having to be loaded into registers. Memory relative mode is useful for handling address pointers and manipulating fields in a record. When this addressing mode is used, the instruction specifies two displacements. The first displacement is added to a designated special purpose register, and a longword is fetched from this address. The operand address is the sum of this value and the second displacement. In accessing records, the second displacement specifies the location of a field in the record pointed to by the longword. The exact size of this field is programmable.

EXTERNAL: The external addressing mode is unique to the NS32000 family, and supports the software module concept, which allows modules to be relocated without linkage editing. This mode can be used to access operands that are external to the currently executing module. Associated with each module is a linkage table, containing the absolute addresses of external variables. The external addressing mode specifies two displacements: the ordinal number of the external variable (that is, the linkage table entry to be used) and an offset to a sub-field of the referenced variable (for example, a sub-field of a Pascal record).

TOP OF STACK: In this addressing mode, also unique to the NS32000 family, the currently selected Stack Pointer (SP0 or SP1) specifies the location of the operand. Depending on the instruction, the SP will be incremented or decremented, allowing normal push and pop facilities. This addressing mode allows manipulation or accessing of an operand on the stack by all instructions. For instance, the TOS value can be added to the contents of a memory location, a register, or to itself, and the result saved on the stack. On most other microprocessors, in which top-of-stack addressing is limited to a very small number of instructions, these manipulations would require several instructions to achieve the same results. The great advantage of this addressing mode is that it allows quick reference, using a minimum number of bits, to intermediate values in arithmetic computations.

SCALED INDEX: This addressing mode computes the operand address from one of the general purpose registers and a second addressing mode. The register value is multiplied by one, two, four, or eight (index byte, index word, index double, or index quad). The effective address of the second addressing mode is then added to the multiplied register value to form the final operand address. The Scaled Index mode is used for addressing into arrays, when the elements of the array are bytes, words, longwords, floating point numbers or long floating point numbers.
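As a hedged illustration of what these modes are intended to support, the short C fragment below shows the record-field and array references described above; the type and variable names are invented for this sketch, and how any particular compiler actually chooses NS32000 addressing modes is implementation-dependent.

    /* Invented example types; only the access patterns matter here. */
    struct record {
        long key;
        long value;
    };

    long fetch(struct record *rec, long table[], int i)
    {
        long a = rec->value;   /* field reached through a pointer: the pattern
                                  memory relative addressing is designed for   */
        long b = table[i];     /* longword array element: the pattern scaled
                                  index addressing (scale factor 4) is designed for */
        return a + b;          /* an intermediate result a compiler could keep
                                  on the stack with top of stack addressing    */
    }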
Instruction Format

The NS32000 family architecture provides a variable-length instruction format in which the instructions are represented as a series of bytes. Figure A-4 shows the general format of an NS32000 instruction.

The basic instruction is one to three bytes long and contains the opcode and up to two 5-bit general addressing mode ("Gen") fields. Following the basic instruction field is a set of optional extensions, which may appear, depending on the instruction and the addressing modes selected.


The opcode specifies the operation to be performed, for example, ADD or MOV, and the number of operands to be used in the instruction. The specification of an operand length (B, W, D, F, or L) is appended to the opcode, for example, ADDW, MOVF. The length specification in integer instructions is encoded in the basic instruction as B = 00, W = 01, or D = 11; the length specification in floating point instructions is encoded in the basic opcode as F = 0 or L = 1.

The general addressing mode ("Gen") fields specify the addressing mode to be used to access the instruction's operands.
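The two-bit integer length encoding quoted above (B = 00, W = 01, D = 11) can be read back as an operand size in bytes. The small C helper below is only an illustration of that correspondence, not a decoder for the full instruction format.

    /* Maps the 2-bit integer length code described above to an operand size
     * in bytes; returns 0 for the encoding not listed above (10). */
    static int integer_operand_size(unsigned length_code)
    {
        switch (length_code & 0x3) {
        case 0x0: return 1;  /* B: byte        */
        case 0x1: return 2;  /* W: word        */
        case 0x3: return 4;  /* D: double word */
        default:  return 0;  /* 10 is not listed above for integer operands */
        }
    }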

Index bytes appear in the instruction format when either or both Gen fields specify Scaled Index mode. In this case, the Gen field specifies only the scale factor (1, 2, 4, or 8), and the index byte specifies which general purpose register to use as the index, and which addressing mode calculation to perform before indexing.

Figure A-4   General Instruction Format

Following index bytes come any displacements (addressing constants) or immediate values associated with the selected addressing modes. Each disp/imm field can have one or two displacements, or one immediate value. The size of a displacement field is encoded within the top bits of that field, with the remaining bits interpreted as a signed (two's complement) value. The size of an immediate value is determined from the opcode field.

Special Encodings

Two other special encodings, reg and quick, allow the very compact encoding of frequently used instructions. For example, there are quick forms of add, move, and compare instructions which encode a small integer operand (range from -8 to +7) in place of a second general addressing mode.

Some instructions require additional, "implied" immediates and/or displacements, apart from those associated with addressing modes. Any such extensions appear at the end of the instruction, in the order that they appear within the list of operands in the instruction definition.

Appendix B
UMAX 4.2 Command Summary

The following commands are provided as part of the basic Multimax software release and are supported by Encore.

GENERAL PURPOSE UTILITIES (Manual Section 1)

Name Description

adb general purpose interactive debugging program addbib create or extend a bibliographic database aoutdump output a description of an executable file apply apply a command to a set of arguments apropos locate commands by keyword lookup ar archive and library maintainer as assembler at execute commands at a later time awk pattern scanning and processing language basename strip filename affixes be arbitrary-precision arithmetic language biff be notified if mail arrives and who it is from binmail send or receive mail among users cal print calendar calendar electronic reminder service cat catenate and print cb C program beautifier cc C language compiler ccat uncompress files to standard output cd change working directory checkeq check typeset mathematics files checknr check nroff/troff files chfn change finger entry chgrp change file group chmod change file mode


General Purpose Utilites (cont.) chsh change default login shell ci check in RCS revisions clear clear terminal screen cmp compare two files CO check out RCS revisions col filter reverse line feeds colcrt filter nroff output for CRT previewing colrm remove columns from a file comm select or reject lines common to two sorted files compact compress files cp file copy cpio System V archiving utility crypt encode or decode a file csh a shell (command interpreter) with C-like syntax ctags create a tags file cu establish a connection using TIP date print and set the date dc desk calculator dd convert and copy a file deroff remove nroff, troff, tbl and eqn constructs df summarize disk free space diction find wordy phrases diff differential file and directory compare difT3 3-way differential file compare dis disassemble object code into assembly language du summarize disk usage echo echo arguments ed text editor edit the ex text editor, a display oriented version of ed egrep extended regular expression match eqn typeset mathematical expressions error analyze and disperse compiler error messages ex display oriented text editor expand expand tabs to spaces explain thesaurus for diction expr evaluate arguments as an expression eyacc modified yacc allowing much improved error recovery false provide false (non-zero) status fgrep fast fixed string match file determine file type find find files finger user information lookup program fmt simple text formatter fold fold long lines for finite width output device from who is my mail from? ftp file transfer program graph draw a graph grep search a file for a pattern groups show group memberships head give first few lines


General Purpose Utilities (cont.) help provide basic help to user hostid set or print identifier of current host system hostname set or print name of current host system ident identify files indent indent and format a C language program source indxbib build inverted index for a bibliography install install binaries join relational database operator kill terminate a process last indicate last logins of users and teletypes lastcomm show last commands executed in reverse order Id link editor leave remind you when you have to leave lex generator of lexical analysis programs lint a C program verifier In make links lock reserve a terminal login sign on look find lines in a sorted list lookbib find a reference in a bibliography lorder find ordering relation for an object library lpq examine a spooler queue lpr offline print lprm remove jobs from the line printer spooling queue Is list contents of directory m4 macro processor mail send and receive electronic mail make maintain program groups man look up commands in the online manual merge three-way file merge mesg permit or deny messages mkdir make a directory mkstr create an error message file from a C language program more file perusal for crt viewing mt magnetic tape manipulating program mv move or rename files neqn preprocessor for typesetting mathematics on terminals netstat show network status newaliases rebuild the data base for the mail aliases file nice run a command at low priority (sh only) nm print name list nohup run at low priority (csh only) nroff a text formatting program od octal, decimal, hex, or ASCII dump page file perusal filter for crt viewing pagesize print system page size passwd change login password plot graphics filters


General Purpose Utilities (cont.) pr print file print print to the line printer printenv print out the environment prmail print out mail in the post office prof display profile data ps print process status pti phototypesetter interpreter ptx generate a permuted index pwd print working directory name ranlib convert archives to random libraries rep remote file copy res change RCS file attributes rcsdiff compare RCS revisions rcsmerge merge RCS revisions red restricted form of the ed text editor refer find and insert literature references in documents reset reset the terminal mode bits to a known state rev reverse the lines of a file rlog print log messages and other information about RCS files rlogin remote login rm remove files rmdir remove directories roffbib run off a bibliographic database rsh remote shell ruptime show host status of local machines rwho who is logged in on local machines script make typescript of terminal session sdb symbolic debugger sed stream editor sh the Bourne shell, a command language size report the size of an object file sleep suspend execution for an interval soelim eliminate .so's from nroff input sort sort or merge files sortbib sort a bibliographic database spell find spelling errors spellin build a hashed spelling list spellout verify a hashed spelling list spline interpolate a smooth curve split split a file into pieces strings find the printable strings in a object, or other binary, file strip remove symbols and relocation bits stty set terminal options style analyze surface characteristics of a document su substitute user id temporarily sum sum and count blocks in a file sysline show system status in status line sysmon system statistics monitor t300 DASI300 plot filter - see plot t300s DASI300S plot filter - see plot t450 DASI 450 plot filter - see plot


General Purpose Utilities (cont.) tabs set terminal tabs tail deliver the last part of a file talk talk to another user tar tape archiver tbl format tables for nroff or troff tcontrol human readable plot filter - see plot tee branch operator for pipe fitting tek 4014 plot filter - see plot telnet user interface to the TELNET protocol test test for conditional command execution tftp trivial file transfer protocol interface thslOO ReGlS plot filter - see plot time time a command tip connect to a remote system tk paginator for the Tektronix 4014 touch change the date on which a file was last modified tr translate characters troff text formatting & typesetting program true provide true (zero) status tset terminal dependent initialization tsort topological sort tty get terminal name ul display underlining in text uncompact uncompress files unexpand unexpand spaces into tabs uniq report repeated lines in a file units conversion program uptime show how long system has been up users compact list of users who are on the system u compact list of system users - see users uucp unix to unix copy uudecode decode a binary file for transmission via mail uuencode encode a binary file for transmission via mail uulog log uucp and uux transactions uuname list system names known by uucp uusend send a file to a remote host uux unix to unix remote command execution vgrind grind nice listings of programs vi screen oriented (visual) display editor based on ex view read-only vi w who is logged on and what are they doing wait await completion of process wall write to all users wc word count what search files for SCCS header strings whatis describe what a command is whereis locate the source, binary, and or manual entry for a program which locate a program file including aliases and paths (csh only) who show who is on the system whoami print effective current user id whois query the network user database


General Purpose Utilities (cont.) windim dimension a window in TERMCAP write write to another user xstr extract strings from C programs to implement shared strings yacc yet another compiler-compiler yes be repetitively affirmative

SYSTEM ADMINISTRATION UTILITIES (Manual Section 8)

Name Description ac login accounting accton enable/disable command logging adduser procedure for adding new users badsect create files to contain bad disk sectors catman create files for the online manual cda crashdump analyzer chown change the owner of a file comsat biff server cron the clock daemon ctp Ethernet configuration test protocol utility devconfig device configuration program dmesg collect system diagnostic messages to form an error dump incremental file system dump dumpfs dump file system information fastboot reboot the system without checking the disks fasthalt halt the system without checking the disks format format disks fsck file system check ftpd DARPA Internet File Transfer Protocol server gettable get NIC format host tables from a host getty set terminal mode halt stop the processor htable convert NIC standard format host tables ifconfig configure network interface parameters init process control initialization lpc line printer control program lpd line printer daemon make node build a special file makekey generate an encryption key mkfs construct a file system mklost + found make a lost + found directory for fsck mkrnod build special files for remote nodes mis annex support (call) mount mount a file system newfs construct a new file system pac supply printer/plotter accounting information partition make logical disk partitions rc command script for auto-reboot and daemons


System Administration Utilities (cont.) rdump remote file system dump reboot bootstrapping procedures renice alter priority of running processes restore incremental file system restore rexecd remote execution server rlogind remote login server rmail handle uucp remote mail rmt remote magtape protocol module route direct network routing routed network daemon server rrestore remote file system restore rshd remote shell server rwhod system status server sa system accounting savedump save a core dump of the operating system sendmail send electronic mail over the internet shutdown close down the system at a given time swapon specify additional device for paging and swapping sync update the super block syslog Log system messages talkd talk server telnetd DARPA TELNET protocol server tftpd DARPA Trivial File Transfer Protocol server trpt transliterate protocol trace tunefs tune up an existing file system umount unmount a file system update periodically update the super block uuclean uucp spool directory clean-up uusnap show snapshot of the uucp system vipw edit the password file

USER-CONTRIBUTED SOFTWARE

The following commands and files are provided as part of the basic Multimax software release but are neither warranted nor supported by Encore Computer Corporation. They are provided on an as-is basis for the convenience of the customer.

Name Description advent adventure - an exploration game arithmetic provide drill in number facts backgammon the game banner print large banner on printer bed convert to binary coded decimal canfield, cfscores the solitaire card game canfield factor print prime factors fish play "Go Fish" fortune print a random, hopefully interesting, adage hangman computer version of the game hangman


User-Contributed Software (cont.) mille play Mille Bournes monop monopoly game number convert Arabic numerals to English primes print prime numbers quiz test your knowledge rain animated raindrops display snake, snscore display chase game trek trekkie game worm play the growing worm game worms animate worms on a display terminal wump the game of hunt-the-wumpus The file /etc/termcap contains the characteristics of many commonly used terminals. Only the termcap entries describing the HostStation 100/110, the VT100 and the Wyse 75 are supported by Encore.

SUPERCEDED SOFTWARE

The following obsolete 4.2BSD software is not part of the Multimax release.

Name        Description                                      Reason
clri        clear i-node                                     Superceded by fsck
dbx         debugging program                                Superceded by sdb
dcheck      file system directory consistency check          Superceded by fsck
icheck      file system check                                Superceded by fsck
mknod       build special file                               Superceded by make_node
ncheck      generate names from i-numbers                    Superceded by fsck
pstat       print system facts                               Superceded by sysmon
symorder    rearrange name list
tp          tape archive                                     Superceded by tar and cpio
vmstat      report virtual memory statistics                 Superceded by sysmon

Appendix C
UMAX V Command Summary

The following commands are provided as part of the UMAX V Multimax software release and are supported by Encore.

GENERAL PURPOSE UTILITIES

Name Description 4014 paginator for the Tektronix 4014 terminal acctcom search and print process accounting file(s) admin create and administer SCCS files ar archive and library maintainer for portable archives as common assembler asa interpret ASA carriage control chars at execute commands at a later time awk pattern scanning and processing language banner make posters basename deliver portions of path names batch execute commands at a later time be arbitrary-precision arithmetic language bdiff big diff bfs big file scanner bs a compiler/interpreter for modest-sized progs cal print calendar calendar reminder service cancel cancel requests to an LP line printer cat concatenate and print files cb C program beautifier cc C compiler cd change working directory ede change the delta commentary of an SCCS delta cflow generate C flow graph chgrp change group chmod change mode chown change owner


General Purpose Utilities (cont.) cmp compare two files col filter reverse line-feeds comb combine SCCS deltas comm select or reject lines common to two sorted files cp copy flies cpio copy file archives in and out cpp the C language preprocessor crontab user crontab file csplit context split ct spawn getty to a remote terminal ctrace C program debugger cu call another UNIX system cut cut out selected fields of each line of a file cxref generate C program cross-reference date print and set the date dc desk calculator dd convert and copy a file delta make a delta (change) to an SCCS file deroff remove nroff, troff\ eqn, and tbl constructs diff differential file comparator diff3 3-way differential file comparison diffmk mark differences between files dircmp directory comparison dis suppresses the use of symbolic names disable disable LP printers dirname deliver portions of path names du summarize disk usage dump dump selected parts of an object file echo echo arguments ed text editor edit text editor (variant of ex for casual users) enable enable LP printers egrep search a file for a pattern env set environment for command execution ex text editor expr evaluate arguments as an expression factor factor a number false provide truth values fgrep search a file for a pattern file determine file type find find files get get a version of an SCCS file getopt parse command options graph draw a graph greek select terminal filter grep search a file for a pattern hashcheck see spell hashmake see spell help ask for help hyphen find hyphenated words id print user and group IDs and names


General Purpose Utilities(cont.) ipcrm remove a message queue, semaphore set or shared memory id ipcs report inter-process communication facilities status join relational database operator kill terminate a process Id link editor for common object files lex generate programs for simple lexical tasks line read one line lint a C program checker In link files login sign on logname get login name lorder find ordering relation for an object library lp send requests to an LP line printer lpstat print LP status information Is list contents of directory m4 macro processor machid provide truth value about your processor type mail (rmail) send mail to users or read mail mailx interactive message processing system make maintain, update, and regenerate groups of programs makekey generate encryption key man print entries in this manual mesg permit or deny messages mkdir make a directory mv move files newform change the format of a text file newgrp log in to a new group news print news items nice run command at low priority nl line numbering filter nm print name list of common object file nohup run a command immune to hangups and quits nroff text formatting od octal, decimal, hex, or ASCII dump pack (peat, unpack) compress and expand files passwd change login password paste merge same lines of several files or subsequent lines of one file peat see pack Pg file perusal filter for soft-copy terminals pr print files prof display profile data prs print an SCCSfile ps report process status ptx permuted index pwd working directory name regemp regular expression compile red text editor rm remove files rmmail see mail rmdir remove directories rmdel remove a delta from an SCCSfile


General Purpose Utilities (cont.) rsh see sh sact print current SCCSfile editing activity sag system activity graph sar system activity reporter sccsdiff compare two versions of an SCCSfile sdb symbolic debugger sdiff side-by-side difference program sed stream editor sh (rsh) shell, the standard/restricted command programming language shl interactive - shell layer manager size print section sizes of common object files sleep suspend execution for an interval sno SNOBOL interpreter sort sort and/or merge files spell (hashmake, spellin, hashcheck) file spelling errors spline interpolate smooth curve split split a file into pieces strip strip symbol and line number information from a common object file stty set the options for a terminal su become super-user or another user sum print checksum and block count of a file swap swap administrative interface sync update the super block tabs set tabs on a terminal tail deliver the last part of a file tar tape file archiver tee pipe fitting test condition evaluation command time time a command timex time a command; report process data and system activity touch update access and modification times of a file tplot graphics filters tput query terminfo database tr translate characters troff typesetting true provide truth values tsort topological sort tty get the name of the terminal umask set file-creation mode mask uname print name of current UNIX system unget undo a previous get of an SCCS file uniq report repeated lines in a file units conversion program unpack see pack uucp (uulog,uuname) UNIX system to UNIX system copy uulog see uucp uuname see uucp uupick see uuto uustat uucp status inquiry and job control uuto (uupick) public UNIX-to-UNIX system file copy uux UNlx-to-UNix system command execution val validate SCCS file


General Purpose Utilities (cont.) vc version control vedit see vi vi {view, vedit) screen-oriented (visual) display editor based on ex wait await completion of process wc word count what identify SCCS files who who is on the system write write to another user xargs construct arg list(s) and execute command yacc yet another compiler compiler

SYSTEM ADMINISTRATION UTILITIES

Name Description accept allow LP requests acctcms command summary from per-process accounting records acctconl connect-time accounting acctcon2 connect-time accounting acctdisk overview of acctng and miscellaneous acctng commands acctdusg overview of acctng and miscellaneous acctng commands acctmerg merge or add total accounting files accton overview of acctng and miscellaneous acctng commands acctprcl process accounting acctprc2 process accounting acctsh (chargefee, ckpacct, dodisk, lastlogin,monacct, nulladm, prctmp, prdaily, prtacct, shutacct, startup, turnacct) shell procedures for acctng acctwtmp overview of acctng and miscellaneous acctng commands bcheckrc see brc brc (bcheckrc,rc,powerfait) system init shell scripts chargefee see acctsh checkall faster file system checking procedure chroot change root directory for a command ckpacct see acctsh clri clear i-node config configure a UNIX system cpset install object files in binary directories crash examine system images cron clock daemon dcopy copy file systems for optimal access time devnm device name df report number of free disk blocks dfsck see fsck diskusg generate disk accounting data by user ID errdead extract error records from dump errdemon error-logging daemon errpt process a report of logged errors errstop terminate the error-logging daemon ff list file names and statistics for a file system filesave (tapesave) daily/weekly UNIX file system backup fine fast incremental backup format format a disk


System Administration Utilities (cont.) free recover files from a backup tape fsck (dfsck) file system consistency check and interactive repair fsdb file system debugger fuser identify processes using a file or file structure fwtmp (wtmpfix) manipulate connect acctng records getty set terminal type, modes, speed, and line discipline grpek (pwck) group/password file checkers init (telinit) process control initialization install install commands killall kill all active processes labelit see uolcopy lastlogin see acctsh link (unlink) exercise link and unlink system calls lpadmin configure the LP spooling system lpmove start/stop the LP request scheduler and move requests lpsched start/stop the LP request scheduler and move requests lpshut start/stop the LP request scheduler and move requests mkfs construct a file system mknod build special file monacct see acctsh mount (umount) mount and dismount file system mvdir move a directory ncheck generate names from i-numbers nulladm see acctsh partdisk partition a disk powerfail see brc prctmp see acctsh prdaily see acctsh prfld see profiler prfdc see profiler prfpr see profiler prsnap see profiler prfstat see profiler profiler (prfld,prfstat,prfdc,prfsnap,prfpr) operating system profiler prtacct see acctsh pwck (grpek) password/group file checkers qasurvey quality assurance survey rc see brc reboot reboot the system reject prevent LP requests runacct run daily accounting sadp disk access profiler sar (sal, sa2, sadc) system activity report package setmnt establish mount table shutacct see acctsh shutdown terminate all processing startup see acctsh sysdef system definition tapesave see filesaue telinit (init) process control initialization tic terminfo compiler turnacct see acctsh


System Administration Utilities (cont.) umount see mount unlink see link uuclean uucp spool directory clean-up uusub monitor uucp network volcopy (labelit) copy file systems with label checking wall write to all users whodo who is doing what wtmpfix see fwtmp

TCP/IP NETWORKING UTILITIES

Name Description erpcd (bfs) expedited remote procedure call daemon and block file server ftp file transfer program ftpd file transfer program daemon hostid set or print identifier of current host system hostname set or print identifier of current host system ifconfig configure network interface parameters mkrnod build network special file mis Annex login server nets tat show network status nsh remote shell nshd remote shell daemon rep remote file copy rexecd remote command daemon rlogin remote login rlogind remote login daemon ruptime show host status of local machines rwho who is logged in on local machines telnet user interface to the TELNET protocol telnetd user interface to the TELNET protocol daemon tftp trivial file transfer protocol tftpd trivial file transfer protocol daemon

DISTRIBUTED, BUT NOT SUPPORTED, UTILITIES

Name Description 300 handle special functions of DASI 300 and 300s terminals 450 handle special functions of the DASI 450 terminal adb a debugger games a collection of computer games gath see send hp handle special functions of Hewlett-Packard 2640 and 2621 series terminals rjestat RJE status report and interactive status console

Appendix D
Optional Software Products

The following software product packages are optionally available from Encore.

FORTRAN PRODUCT

The following commands are provided as part of the Multimax Fortran 77 product and are supported by Encore.

FORTRAN UTILITIES (Manual Section 1)

Name        Description
efl         extended fortran language
f77         fortran 77 compiler
fpr         print Fortran file
fsplit      split multi-routine fortran file into separate files
ratfor      rational fortran dialect
struct      structure fortran programs

PASCAL PRODUCT

The following commands are provided as part of the Multimax Pascal product and are supported by Encore.

PASCAL UTILITIES (Manual Section 1)

Name Description pasmat Pascal-2 program formatter pb Pascal-2 program formatter pc Pascal-2 compiler pdb Pascal-2 source level debugger procref Pascal-2 procedure cross-reference generator


Pascal Utilities (cont.) prose simple text formatter pxref Pascal-2 cross-reference lister

EMACS PRODUCT

The following commands are provided as part of the Multimax Emacs product and are supported by Encore.

EMACS UTILITIES

Name Description emacs emacs editor dbadd emacs database manipulating program dbcreate emacs database manipulating program dblist emacs database listing program dbprint emacs database printing program filesort emacs sort program

Glossary of Multimax Terms

The following terms have special meaning in the context of Multimax hardware and software or are essential to the understanding of such meanings. Italicized words identify other entries in this Glossary.

address
A number used by the operating system and user software to identify a storage location. See also virtual address and physical address.

address space
The set of all possible addresses available to a process. Virtual address space refers to the set of all possible virtual addresses. Physical address space refers to the set of all possible physical addresses sent out on the Nanobus.

ALLY
A fully integrated software development and execution environment available from Encore. ALLY runs on nearly any computer or operating system and allows both programmers and non-programmers to create applications requiring the interactive display and manipulation of data in accounting, inventory control and reporting, medical and dental office management, personnel records and processing, engineering laboratory support, and so on.

Annex
This is the primary serial/parallel line concentrator for the Multimax system. Annexes are provided with multiple I/O ports for serial and parallel devices (such as terminals, modems, printers, personal computers, and workstations) and contain their own processor and memory. They are automatically downloaded from host Multimax systems (via the Ethernet) with front-end software designed to minimize character interrupts to the host.

cache memory
A small, high-speed memory placed between slower main memory and the processor. A cache increases effective memory transfer rates and processor speed. It contains copies of data recently used by the processor, and fetches several bytes of data from memory in anticipation that the processor will access the next sequential series of bytes.

closely-coupled
Describes a multicomputer architecture in which component processors share the same bus, but run separate operating systems and are assigned private blocks of memory. See also loosely-coupled, tightly-coupled.

direct mapping cache
A cache memory organization in which only one address comparison is needed to locate any data in the cache because any block of main memory data can be placed in only one possible position in the cache. Compare fully associative cache.

DPC
Multimax "Dual Processor Card." This card carries two primary Multimax processors (NS32032s), 32K bytes of high-speed cache memory, and bus interface and self-test logic.


ECC
"Error Correction Code." Designates a technique of encoding and checking extra bits associated with each data word so that any single-bit error in the data word can be identified and automatically corrected. Multimax Shared Memory Cards perform ECC operations on each 32-bit word, not only when it is accessed, but also with each memory refresh cycle (once every 2 seconds).

EMC
Multimax "Ethernet/Mass Storage Card." This card carries interface logic for a single Ethernet and for a Small Computer System Interconnect (SCSI) bus. It also carries an auxiliary processor, local memory, and self-test logic.

Ethernet
A network consisting of wide-band (10 MHz) coaxial cable to which participating nodes connect by means of electrically isolated transceivers. These transceivers assist in synchronizing transmissions by sensing the presence of contending signals on the Ethernet and requiring their own nodes to retry conflicting transmissions. Ethernet communications are accomplished by means of data packets, each of which carries the identity of both sender and intended receiver.

ETLB
"Extended Translation Lookaside Buffer." The portion of the extended memory mapping logic associated with each Multimax primary processor that performs the extended memory mapping function. Also known as XMMU. See MMU.

FPU
"Floating Point Unit." The logic associated with each primary Multimax processor that performs the transformations required when the processor executes floating point instructions.

fully associative cache
A cache memory organization in which any block of data from main memory can be placed anywhere in the cache. Address comparison must take place against each block in the cache to find any particular block. See also direct mapping cache.

interleaving
Assigning consecutive longword addresses alternately between two or more memory controllers. Interleaving speeds up memory transfers because each interleaved bank can begin processing the next sequential longword before the antecedent longword in the previous bank has been completed. Consecutive longwords can thus be accessed at Nanobus clock rates (80 ns) rather than at memory clock rates (320 ns). Since the two halves of a double longword are formed simultaneously in the two banks of any Shared Memory Card, transfers involving sequential double longwords can take advantage of the full Nanobus bandwidth (100 Mbytes/sec).

Multimax Shared Memory Cards provide two-way interleaving between memory banks. In addition, any two or four SMCs of the same size can be interleaved - with the result that eight-way interleaving is routinely available on Multimax systems containing at least four SMCs.

lock
A specialized form of semaphore that provides access to data structures for a single writer or multiple readers. Read/write locks prohibit write access until all pending reads are complete, and prohibit read access while any write is in progress.

longword
Four contiguous bytes (32 bits) starting on an addressable byte boundary. Bits are numbered from right to left, 0 through 31, and the address of the longword is the address of the byte containing bit 0.

loosely-coupled
Describes a multicomputer architecture in which component processors share neither bus, memory, nor operating system. Assemblages of computer systems interconnected on a network are loosely-coupled. See also closely-coupled, tightly-coupled.
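As a rough sketch of the interleaving entry above, the following C fragment shows bank selection under the single assumption that consecutive longwords rotate through the interleaved banks; it does not represent the actual Multimax memory-controller logic.

    /* Two-, four-, or eight-way interleaving as described above: consecutive
     * longword addresses rotate through the interleaved banks.  Illustrative
     * only; not the actual Multimax address decoding. */
    unsigned bank_for_address(unsigned long byte_address, unsigned num_banks)
    {
        unsigned long longword_number = byte_address / 4;   /* 4 bytes per longword  */
        return (unsigned)(longword_number % num_banks);     /* num_banks = 2, 4, or 8 */
    }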


memory management
The assemblage of hardware protocols that allow virtual addresses to be mapped into physical addresses.

MIPS
"Millions of Instructions Per Second." A measure of (typically integer-only) computer system performance. There is, unfortunately, no agreed-upon standard for the measurement of MIPS. The maximum number of instruction fetches per second is not a viable metric: obviously, a simple instruction (such as a no-operation) takes negligible time as compared to the typical instruction. For comparison purposes, Encore uses measures compatible with generally-accepted figures from industry analysts, and confirmed (relative to other known machines) by benchmarks. In this system, the NS32032 processor, operating at 10 MHz, is rated at 0.75 MIPS.

MMU
"Memory Management Unit." The logic (associated with each primary Multimax processor) that controls page mapping and protection. Associated with the MMU is an Extended Memory Management Unit (XMMU) - a custom design that extends the NS32032's physical address space from that definable in 24 bits to that definable in 32 bits. The XMMU implements virtual to physical mapping.

multicomputer
Describes computing systems that make use of more than one processor in a loosely-coupled or closely-coupled way. See also multiprocessor.

multiprocessing
Available only on multiprocessors, multiprocessing is an extension of multiprogramming and is characterized by the execution of multiple processes with actual, rather than simulated, simultaneity.

multiprocessor
Describes tightly-coupled computing systems like the Multimax that make use of more than one processor. See also multicomputer.

multiprogramming
Identifies the act of running multiple processes with simulated simultaneity. Multiprogramming implies multiplexing - that is, sharing the processor (or processors) by rapid, transparent switching among several processes or tasks. See also multiprocessing, multitasking, parallel programming.

multitasking
Identifies tactics that permit multiplexing several "simultaneous" tasks whose activity must be synchronized at certain points. See also multiprogramming, parallel programming, task.

Nanobus
The structure supporting communications between Nanobus cards. The Nanobus implies a precise communications protocol, and is physically embodied in the Multimax backplane. This backplane provides address, data, vector, and control lines that are one foot long - approximately the distance traveled by light in one nanosecond; hence the name.

page
A set of 512 contiguous bytes used as the unit of memory management and protection.

page fault
An exception generated when an executing process refers to a page which is not currently in physical memory. See paging.

paging
The action of bringing pages of an executing process into physical memory when referenced. When a process executes, all of its pages are said to reside in virtual memory. However, only the actively used pages need to reside in physical memory.

parallel programming
Identifies the tactics used by programmers on multiprocessor systems to maximize truly simultaneous execution, on separate processors, of multiple processes or tasks. See also multiprogramming, multitasking.

pended
Describes a bus structure that allows requests for information to be dissociated in time from the replies they generate. Pending allows a number of relatively slow devices (processors, for example) to communicate with other slow devices (main memory banks, for example) without compromising the bandwidth of a pathway (the Nanobus, for example) designed to accommodate higher speed transfers than any single device can manage by itself. When requests are pended, they are tagged with the requester's ID and sent to the recipient at the first opportunity. When the recipient responds at some later time, the response is tagged with the requester's ID. Neither participant in the dialog is aware that many other transactions (between other requesters and responders) may have intervened between the request and the response.


physical memory
The set of storage locations directly accessible in RAM or ROM within a given system. Synonymous with primary storage; contrasts with secondary storage and with virtual memory.

pipeline
A series of operations, performed one after another, in sequence, to achieve a larger result. Pipelined computation has great potential for parallel execution because each stage of the pipeline can run on its own dedicated processor. This eliminates context switching and allows multiple, independent computations to overlap in time.

primary storage
Synonymous with physical memory. Contrasts with secondary storage.

process
A major thread of control in an address space, able, like a task, to share memory and run in parallel with other control threads, but more general (and therefore more burdened with overhead) than a task. A process may consist of multiple tasks.

process priority
The priority assigned to a process for scheduling purposes.

quadword
Eight contiguous bytes (64 bits) starting on an addressable byte boundary. Bits are numbered from least to most significant, 0 to 63, and the address of the quadword is the address of the byte containing bit 0.

scatter/gather
The ability to transfer in one I/O operation data from non-contiguous pages in physical memory to contiguous blocks on disk (gather), or data from contiguous blocks on disk to non-contiguous pages in memory (scatter). This ability is required in virtual memory systems because of the scattering characteristics of virtual memory operations.

SCC
Multimax "System Control Card." This is the Multimax card that performs central system coordination and diagnostic functions. The SCC contains its own processor as well as local ROM and RAM. In addition, it contains non-volatile memory for preserving system state information across shut-downs and power failures. Finally, it contains interfaces that allow the connection of two external serial devices such as local and remote consoles.

secondary storage
Synonymous with disk or other high-speed mass storage. Contrasts with primary storage.

semaphore
A mechanism for synchronizing program execution by putting requesting processes to sleep until the requested resource is available.

SMC
Multimax "Shared Memory Card." This Multimax card carries two interleavable 2-Mbyte banks of dynamic memory and a control/diagnostic processor.

SMD
"Storage Module Drive." A bus and data protocol standard for interfacing between computers and mass storage devices such as hard disk drives.

SNA
"Systems Network Architecture." SNA defines a family of IBM protocols for communications between (and with) IBM host systems.

task
A minor thread of control in an address space, able, like a process, to share memory and run in parallel with other control threads, but less general (and therefore less burdened with overhead) than a process. Multiple tasks may go to make up a process.


TCP/IP
"Transport Control Protocol/Internet Protocol." TCP and IP have been the standard Department of Defense Internet communications protocols. They have also become the standard UNIX networking protocols.

tightly-coupled
Describes multiprocessor computing systems in which component processors share the same bus, operating system, and memory. See also closely-coupled, loosely-coupled.

thrashing
A situation in which pages are requested faster than they can be supplied - with the consequence that the requesting processor remains idle, waiting for service from secondary storage. See working set.

UMAX
Refers to either of the Multimax operating system environments - UMAX 4.2 or UMAX V - which are based on the Bell Laboratories' UNIX system.

virtual address
A 24-bit integer identifying a byte location in virtual address space. The memory management hardware translates virtual addresses to physical addresses.

virtual address space
The set of all possible virtual addresses that a process can use to identify the location of an instruction or data element. The virtual address space seen by the Multimax programmer is a linear array of over 16 million (2^24) byte locations.

virtual memory
The set of storage locations in physical memory and on disk that are referred to by virtual addresses. Virtual memory makes disk storage locations appear to the programmer to exist in physical memory.

Whetstone
A common FORTRAN benchmark program, originally developed by the British Ministry of Defense, which measures floating point performance of a computer system. Also, a shorthand notation for the results of the program, more accurately reported in "Whetstone Instructions per Second" or "Kilo Whetstone Instructions per Second." Two separate performance figures are quoted for most machines - one for single-precision floating point, another (typically smaller) for double-precision.

working set
The minimum amount of physical memory required by a process to prevent thrashing.

write back
A cache memory management technique, not used by Multimax DPCs, whereby data from a write operation to cache is copied into main memory only when the data in cache must be overwritten. This action results in temporary inconsistencies between cache and main memory. See also write through.

write through
A cache management technique whereby data from a write operation is copied in both cache and main memory. This procedure keeps cache and main memory always in step with one another. Multimax DPCs use the write through technique. See also write back.

X.25
A series of standard communications protocols developed by CCITT which define how computers are connected to public packet-switched networks. X.25 is rapidly becoming the standard international communications protocol.

XMMU
See ETLB, MMU.
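The write back and write through entries above describe two cache write policies. The generic C sketch below contrasts the bookkeeping involved; it is illustrative only and does not represent Multimax DPC hardware or UMAX code.

    /* Minimal, generic model of the two write policies defined above. */
    struct cache_line {
        unsigned long data;
        int dirty;      /* meaningful only for the write back policy */
    };

    /* Write through: cache and main memory are updated together, so they
     * always stay in step (the policy Multimax DPCs use). */
    void write_through(struct cache_line *line, unsigned long *memory_word,
                       unsigned long value)
    {
        line->data = value;
        *memory_word = value;
    }

    /* Write back: only the cache is updated now; main memory is brought up
     * to date later, when the line must be overwritten, so cache and memory
     * can disagree temporarily. */
    void write_back(struct cache_line *line, unsigned long value)
    {
        line->data = value;
        line->dirty = 1;    /* remember to copy this line to memory on replacement */
    }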

Index

a Boolean operators (NS32032) A-5 accept 4-4 branches (NS32032) A-6 address bus 2-5 bus tag (Dual Processor Card) 2-8 addressing modes (NS32032) A-12 bus throughput advantages of Multis p-6 maximizing 2-5 advantages of UMAX 3-1 C ALLY 3-6 C language under UMAX 3-4 Annex 2-13 cabinets for Multimax system 2-1 commands 3-6 cables and connectors 2-16 processor and memory 2-13 cache memory (Dual Processor Card) 2-8 remote editing protocol 3-8 calls (NS32032) A-6 self-tests 5-4 character handling (Annex) 2-13 specifications 2-14 clocks terminal performance 3-8 System Control Card 2-8 Annex advantages 2-13 close coupling vs. tight coupling 1-2 Annex character handling 2-13 Cluster Kit (Annex) 2-13 Annex Cluster Kit 2-13 coarse grained parallelism 4-1 applied parallelism hardware support 4-5 data partitioning 4-1 software support 4-5 functional partitioning 4-1 COFF under UMAX 3-4 pipelining 4-1 communications architecture on Multimax 1-6 applying Multis p-6 configurability of Multimax 1-4 arbiters (System Control Card) 2-8 connectors and cables 2-16 array operators (NS32032) A-6 console command interpreter (Multimax) 5-4 arrays (NS32032) A-4 Continuum availability of Multimax 5-1 Encore Computing 1-1,1-5 b cost of synchronization (parallel programming) backplane on Multimax system 2-3 4-2 band printers 2-16 CPU tag (Dual Processor Card) 2-8 BCD data types (NS32032) A-4 d binary-coded decimal data types (NS32032) A-4 data bus 2-5 bit field data types (NS32032) A-3 data partitioning (applied parallelism) 4-1 bit field operators (NS32032) A-5 data transfer rate bit operators (NS32032) A-5 Nanobus 2-3 block operators (NS32032) A-6 data types supported by NS32032 A-l Boolean data types (NS32032) A-3 devconfig 3-4


diagnostic processor (System Control Card) 2-6 h distributed peripheral control under UMAX 3-5 hardware overview DPC (see Dual Processor Card) Multimax 1-1 Dual Processor Card 2-8 hierarchical file system in UMAX 3-2 bus tag 2-8 high level languages in UMAX 3-3 cache memory 2-8 historical basis of Multis p-5 CPU tag 2-8 HostStationllO 2-15 floating point unit 2-8 i memory management 2-8 independent parallelism processor type 2-8 hardware support 4-3 time slice end 2-9 software support 4-3 e instruction format (NS32032) A-12 ECC (see error correcting code) instruction set (NS32032) A-9, A-10 EMC (see Ethernet/Mass Storage Card) integer data types (NS32032) A-2 Encore Computing Continuum 1-1,1-5 integer operators (NS32032) A-5 environmental monitors (System Control Card) interleaving (on SMC banks) 2-9 2-7 interlocked instruction 4-7 error correcting code (Shared Memory Card) 2-10 interlocked operations on Nanobus 2-6 Ethernet/Mass Storage Card 2-11 EMC control 2-12 jumps (NS32032) A-6 LAN control 2-12 maximum number 2-11 1 SCSI control 2-12 LAN control (Ethernet/Mass Storage Card) 2-12 example of parallelizing an application 4-12 listen 4-4 exec 4-6 locks on Shared Memory Card 2-10 expandability of Multimax 1-4 logical operators (NS32032) A-5 f m fault minimization on Multimax 5-1 maximizing bus throughput 2-5 fifth generation computers p-5 medium grained parallelism fixed-disk drive on Multimax 2-12 hardware support 4-7 floating point operators (NS32032) A-5 software support 4-6 floating point unit (Dual Processor Card) 2-8 synchronization primitives 4-8 floating-point data types (NS32032) A-2 memory management (Dual Processor Card) 2-8 fork 4-6 memory management support in UMAX 3-2 Fortran-77 under UMAX 3-4 minimizing faults on Multimax 5-1 FPU registers (NS32032) A-9 MMU registers (NS32032) A-8, A-9 front panel interface (System Control Card) Multi 2-8 technological basis p-5 front panel switches/indicators (Multimax) 5-2 Multi, defined p-5 fsk 3-5 Multimax functional overview communications architecture 1-6 Multimax 2-1 configurability 1-4 functional partitioning (applied parallelism) 4-1 console command interpreter 5-4 future evolution of Multis p-8 expandability 1-4 g fault minimization 5-1 fixed-disk drive 2-12 gateway computers 2-14 front panel switches/indicators 5-2 general purpose registers (NS32032) A-7 hardware overview 1-1 generalpurpose tools under UMAX 3-4 mass storage 2-12 grain size memory bandwidth 1-3 Multis p-7 microprocessor chip A-l grain size in parallel programming 4-2 performance 1-2 reliability 1-4, 5-1

lndex-2 Vfu'T,max Technical Summary Index

removable-disk drive 2-12 binary-coded decimal data types A-4 self-test capablities 5-2 bit field data types A-3 self-test sequence 5-2 bit field operators A-5 software product reliability 5-4 bit operators A-5 system backplane 2-3 block operators A-6 system cabinets 2-1 Boolean data types A-3 system diagram 2-4 Boolean operators A-5 system exerciser 5-4 branches A-6 system packaging 2-1 calls A-6 tape drive 2-13 data types supported A-2 terminal architecture 1-3 features A-l tight coupling 1-2 floating point data types A-2 Multimax availability 5-1 floating point operators A-5 Multimax functional overview 2-1 FPU registers A-9 Multimax software 1-5 general-purpose registers A-8 Multimax specifications 2-2 instruction format A-12 Multimax strategy 1-7 instruction set A-9, A-10 multiprocessing vs. multiprogramming 3-2 integer data types A-2 multiprogramming and Multis p-7 integer operators A-5 multiprogramming vs. multiprocessing 3-2 jumps A-6 Multis logical operators A-5 advantages p-6 MMU registers A-8, A-9 applying p-6 operators A-5 future evolution p-8 records A-4 grain size p-7 register manipulation operators A-6 historical basis p-5 register set A-6, A-8 synchronization p-7 special encodings A-l3 Multis and multiprogramming p-7 special pupose registers A-7 Multis vs.superminicomputers p-6 stacks A-4 multitasking library 4-11 standard addressing modes A-l 1 multithreading (UMAX) string operators A-6 read/write locks 3-7 strings A-4 semaphores 3-7 O spin locks 3-7 operators (NS32032) A-5 mutual exclusion 4-6 n P packaging Nanobus 2-3 Multimax system 2-1 address bus 2-5 parallel operation under UMAX 3-6 data bus 2-5 parallel programming data transfer rate 2-3 coarse grained parallelism 4-4 future Multimax arrays 1-7 fine grained parallelism 4-12 interlocked operatons 2-6 grain size 4-2 pended operation 2-5 independent parallelism 4-3 pipelined operation 2-6 medium grained parallelism 4-6 synchronous operation 2-5 memory allocation 4-7 vector bus 2-5 required features 4-2 Nanobus interface (System Control Card) 2-7 very coarse grained parallelism 4-3 network facilities under UMAX 3-5 parallel utility programs (UMAX) 3-8 NS32032 partition 3-5 addressing modes A-12 Pascal under UMAX 3-4 array operators A-6 pended operation on Nanobus 2-5 arrays A-4 performance of Multimax 1-2 BCD data types A-4 pipelined operation of Nanobus 2-6

Multimax Technical Summary lndex-3 Index

pipelining (applied parallelism) 4-1 monitors 4-10 process management in UMAX 3-3 read/write locks 4-9 processor chip (Multimax) A-l semaphores 4-8 processor type (Dual Processor Card) 2-8 spinlocks 4-8 r synchronous operation 2-5 sysboot 3-5 read 4-4 sysmon 3-5 records (NS32032) A-4 sysparam 3-5 recv 4-4 system administration facilities under UMAX 3-4 redirectable I/O in UMAX 3-3 system clocks (System Control Card) 2-8 register manipulation operators (NS32032) A-6 System Control Card 2-6 register set (NS32032) A-6, A-8 arbiters 2-8 reliability of Multimax 1-4, 5-1 diagnostic processor 2-6 remote editing protocol (Annex) 3-8 environmental monitors 2-7 removable-disk drive on Multimax 2-12- front panel interface 2-8 S Nanobus interface 2-7 SCC (see System Control Card) serial interface 2-8 SCSI control (Ethernet/Mass Storage Card) 2-12 shared memory & timers 2-6 self-test capabilities on Multimax 5-2 system clocks 2-8 self-test on DPCs and EMCs 5-3 system diagram of Multimax 2-4 self-test on SMCs 5-3 system exerciser on Multimax 5-4 self-test sequence on Multimax 5-2 system services under UMAX 3-4 self-tests on Annexes 5-4 t semaphores (UMAX multithreading) 3-7 tape drive on Multimax 2-13 send 4-4 technological basis of Multis p-5 serial interface (System Control Card) 2-8 terminal architecture on Multimax 1-3 share 4-7 terminal drivers in UMAX 3-8 shared memory & timers (SCC) 2-6 terminal performance on Annex 3-8 Shared Memory Card 2-9 test-and-set instruction 4-6 Error Correcting Code 2-10 text editors under UMAX 3-4 interleaving 2-9 tight coupling on Multimax 1-2 locks 2-10 tight coupling vs. close coupling 1-2 shells in UMAX 3-2 time slice end (Dual Processor Card) 2-9 SMC (see Shared Memory Card) SMC self-tests 5-3 u socket 4-4 UMAX software product reliability 5-4 advantages 3-1 special encodings (NS32032) A-13 Clanguage 3-4 special purpose registers (NS32032) A-7, A-8 caching 3-7 specifications COFF 3-4 Annex 2-14 distributed peripheral control 3-5 Multimax 2-2 fine-grain locking 3-7 spin locks (UMAX multithreading) 3-7 Fortran-77 3-4 stacks (NS32032) A-4 general purpose tools 3-4 standard addressing modes (NS32032) A-l 1 hierarchical file system 3-2 string operators (NS32032) A-6 high level languages 3-3 strings (NS32032) A-4 memory management support 3-2 superminicomputers vs. Multis p-6 multithreading 3-7 symmetrical multiprocessing under UMAX 3-7 network facilities 3-5 synchronization parallel operation 3-6 Multis p-7 parallel utility programs 3-8 synchronization primitives Pascal 3-4 barriers 4-10 performance 3-6 events 4-10 process management 3-3

lndex-4 Multimax Technical Summary Index

redirectable I/O 3-3 shells 3-2 symmetrical multiprocessing 3-7 system administration facilities 3-4 system services 3-4 terminal drivers 3-8 text editors 3-4 UMAX 4.2 3-1 UMAX V 3-1 UNIX AT&T System V 3-1 Berkeley 4.2BSD 3-1

V vector bus 2-5 very coarse grained parallelism hardware support 4—4 software support 4-4 w write 4-4

X X.25 gateways 3-8

