Sequent’s NUMA-Q™ Architecture

Overview

This document gives an overview of the path-breaking technologies introduced by Sequent’s® NUMA-Q™ architecture. It describes how current enterprise-class system architectures are driven by usage models such as on-line transaction processing (OLTP), decision support systems (DSS), and business communications. It also describes Sequent’s work to develop a common building block for all enterprise-class system architectures. This includes descriptions of the 4x Intel® Pentium® Pro processor SMP system (quad) building block, a Sequent-developed system interconnect for linking these quads, the architectures to which they can be applied, and the benefits realized from these applications.

Sequent’s new NUMA-Q (Non-Uniform Memory Access for Quads) architecture yields new levels of performance, availability, and manageability in enterprise-class systems. NUMA-Q is not so much a family of products as it is a quantum leap in symmetric multiprocessing (SMP) and clustered systems architectures, and the realization of highly available and manageable enterprise-class networked server architectures.

Platform architecture alternatives for the enterprise

Sequent’s target market is the commercial data center solving mission-critical problems. Systems in these centers have several characteristics in common. They need to be highly available (just minutes of downtime per year), highly reliable, capable of meeting ever-increasing performance demands, highly scalable, and integrated into a heterogeneous systems management environment. The primary applications, or usage models, found in data center computing today fall into three major categories:
■ OLTP: On-line transaction processing refers to the day-to-day management of business functions using a relational database.
■ DSS: Decision support systems refer to the extraction, analysis, and presentation of data from databases to enable decision-making based on operations.
■ Business communications: Refers to messaging, web servers, document retrieval, and workflow.
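The availability requirement quoted in this document ("just minutes of downtime per year") can be turned into the percentage figures often used for data center systems. The following is a minimal sketch of that arithmetic; the function names and the sample downtime values are illustrative assumptions, not Sequent figures.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability_from_downtime(downtime_minutes_per_year: float) -> float:
    """Return availability as the fraction of the year the system is up."""
    if not 0 <= downtime_minutes_per_year <= MINUTES_PER_YEAR:
        raise ValueError("downtime must be between 0 and one year of minutes")
    return 1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR


def downtime_from_availability(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for a target availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical downtime budgets, chosen only to illustrate the conversion.
    for minutes in (5, 30, 60):
        print(f"{minutes:>3} min/year -> {availability_from_downtime(minutes):.5%}")
```

Read the other way, a target such as 0.99999 ("five nines") leaves a budget of a little over five minutes per year, which is the scale the text has in mind.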

[Figure: Four fundamental architectures for enterprise-class computing. Panels: networked servers; large SMP systems; MPP, shared-nothing systems; clustered shared-disk SMP systems.]

Architecture       Usage model                         Pros               Cons
Networked servers  Small digital libraries             Inexpensive        Management, availability
SMP                DSS, OLTP, business communications  Easy to program    Limited in size by backplane
Clustered SMP      DSS, OLTP, business communications  High availability  Requires more management
MPP                DSS                                 Can be very large  Data skew problems

OLTP, DSS, and business communications systems designers currently have four architectural options for their computing platforms:
■ Small networked-servers: Multiple small standalone servers connected over a network.
■ Large SMP nodes: Many processors and resources running under one operating system.
■ Clustered SMP nodes: Multiple instances of an application running on separate nodes under separate instances of an operating system, but sharing some storage devices and data.
■ MPP (Massively Parallel Processor) systems: Many unique instances of an operating system and application, on separate nodes, usually without any shared resource, and communicating by passing messages.

Each of the usage models has different requirements with respect to I/O, memory, processor, and connectivity. Thus, each architecture has characteristics that can be a help or a hindrance depending on the usage model. The choice of architecture is, therefore, largely dependent on the usage model.

Networked servers

The networked servers model suggests that many large computing problems can be solved by a network of small servers. It is true that a collection of networked servers can be successfully and economically applied to some problems, such as a small World Wide Web service or a digital library for presentations and documents in large corporations. However, the networked-servers model is unsuitable for implementing large OLTP, DSS, and business communications applications. It forces an arbitrary distribution of data across servers, leading to the difficult problems of migrating processes or replicating data across a network of servers.

The primary difficulty with implementing a vast network of small servers is in management and availability. Most commodity server companies put more emphasis on cost than on reliability and manageability. As a result, networked-server solutions are often driven by a low-cost requirement and suffer availability and manageability problems.

SMP

Large single-node SMP systems have gained popularity because they are ideally suited to large DSS and OLTP applications. Data managed by an SMP system is centrally located, users share a pool of resources, and SMP systems are easy to manage. Additionally, single SMP nodes make it easy to measure peak performance and to project and plan for future performance needs. Another reason SMP has become the dominant enterprise architecture is that it provides a smooth migration path for sophisticated uniprocessor applications to high-performance multiprocessor systems.

One future drawback for large single-node SMP systems is that the number of processors will be increasingly limited by the size and speed of the backplane and the shared system bus. Physics is the largest contributor to future bandwidth limitations. As application performance continues to increase dramatically, computer system designers are forced to make a bus length/bus speed tradeoff: electrical signals already travel at close to the speed of light, and no amount of encouragement will speed them up! Large SMP system designs must include shorter backplanes and system buses to meet the needs of faster processors and I/O. The smaller backplanes, while faster, support fewer processors simply because of packaging constraints. This architectural limit will constrict the amount of I/O into and out of single-node SMP systems. In the future, however, DSS and business communications applications will continue to require increasing amounts of I/O. Another downside to large single-node SMP systems is that there are single points of failure, which can cause application interruptions.

Clustered SMP

The solution to the latter problem is the interconnection of single SMP systems into a cluster of nodes. When implemented to gain availability, clustering provides enough performance on one or more nodes, and access to common resources, to completely replace the unplanned loss of another node. In the event of a single node outage, the other nodes continue to operate and may automatically assume the load of the failed node in a period of minutes. Open-systems relational database management system (RDBMS) companies are evolving their software to support clustered environments to dramatically improve availability beyond what has been practical on traditional single-node SMP systems.

Clusters can also achieve far greater performance and scalability than a single SMP node. This “out of box scaling” is generally due to the increased number of users that can be connected, the increased I/O bandwidth, and the increased number of processors and amount of memory. It also requires that the application be running on all nodes simultaneously and that nodes communicate before making changes to shared data. The latter point is, in fact, what controls cluster scalability. It is also one of the many factors that limit the application of MPP to business problems. The industry has learned how to make 4-8 SMP nodes communicate effectively, while MPP architectures attempt to pass messages between hundreds of nodes. The immaturity of MPP message-passing software, and the associated overhead, limit MPP’s applicability to DSS and OLTP problems. “Out of box” clustered SMP performance is still improving, especially with the advent of software that can take advantage of reflective memory technology such as Sequent’s Scalable Data Interconnect (SDI).

The downside of clusters is that they require more thought in management and load balancing. The more nodes, the more complex the problem. The speed and latency of the message-passing interconnect are key to improving the scalability of clusters.

MPP

The one overwhelming advantage of MPP architectures is the ability to connect hundreds of processor/memory cells (individual nodes with their own copy of the OS and application). This is also the overwhelming disadvantage. For problems that require an enormous amount of I/O followed by localized computation, and where the intervening results and original data do not have to be shared across the pool of processors, MPP systems can offer satisfactory results. A video server is just such an application. However, for applications that need to scan large data sets in an unpredictable fashion (DSS) or applications that require many updates and, therefore, locking (OLTP), the cumbersome messaging of MPP becomes a bottleneck.

This is where SMP’s single large memory and processor pool excels. In an SMP system, message-passing between processors is implicit through shared memory and as such is orders of magnitude faster than MPP messaging. The uniformly short memory access latency of SMP systems makes optimizing performance quite straightforward. The “distributed everything” or “shared nothing” model of most MPP systems is a complex environment that is difficult to program in, which limits the range of applications where an MPP system is a suitable choice. MPP systems require software architectures that have not yet been proven in open systems production environments. Currently, MPP vendors are trying to make their processing nodes more powerful by turning them into SMP nodes. The MPP vendors are also beginning to add shared-disk capabilities to their systems, made easier by the advent of optical interconnects. An MPP system made up of many loosely coupled SMP nodes and shared-disk resources is just an SMP cluster. It is becoming apparent that shared-disk clusters are, in fact, the convergence point of MPP and SMP. The old “distributed everything” MPP model will soon be abandoned. This will leave just three effective architectures for implementing mission-critical solutions in open systems: networked-servers, SMP, and clusters.

Sequent’s new NUMA-Q architecture

A pervasive trend in the past ten years has been the widespread use of commodity technologies as building blocks for large systems. In the past, merchant microprocessors were the primary building block and were applied all the way from the desktop to the superserver. In the future, however, the microprocessor will be replaced by the 4x Pentium Pro processor quad. The same 4x Pentium Pro processor quads used to implement a network of servers can be used to build very large SMP systems.

To overcome the architectural limitations of virtually all the current approaches to building enterprise-class computing systems, Sequent launched a massive design project in 1992 with the following ambitious goals:
■ Creating a new set of CPU and memory-interconnect building blocks for enterprise-class systems beyond the year 2000.
■ Meeting the needs of the OLTP, DSS, and business communications markets in the same time period.
■ Leveraging higher-level integrated components from Intel and other component suppliers.
■ Building systems from these building blocks that meet availability and manageability needs.
■ Allowing OLTP, DSS, and business communications system designers to choose freely between networked-servers, SMP, and clustered SMP architectures.

In the process, two breakthrough technologies have been invented, the NUMA-Q architecture and the IQ-Link™ interconnect, that will revolutionize system construction in the next ten years. In simplest terms, a system based on the NUMA-Q architecture includes multiple 4x Pentium Pro quad SMP systems tied together with Sequent’s new IQ-Link interconnect technology to form a single large computing complex.

The building block: the 4x Pentium Pro processor quad

The Sequent NUMA-Q architecture leverages the new Intel 4x Pentium Pro processor SMP baseboard as a commodity building block for large systems. A four-processor Pentium Pro SMP system may at first glance appear to be a natural progression from the 4-processor Pentium systems available in the marketplace today. However, there is one important difference: 4x Pentium Pro processor systems use newer Intel bus logic that allows for third-party control of the processor bus. Third-party control is the “hook” required to permit the joining of multiple quads together to form a larger system.

Sequent partnered with Intel to design the 4x Pentium Pro SMP baseboard. However, we made some changes to the 4x quad to make it suitable for high-end, mission-critical, data center applications. These changes include:
■ On-Line Replacement/On-Line Insertion of redundant power supplies
■ Additional EMI shielding to meet large system requirements
■ Greater fault isolation capability
■ A 4x Pentium Pro daughter card
■ Reliability improvements
■ Removal of some unneeded PC logic
■ Design of a management and diagnostic processor (MDC)
■ Design of our own high-quality memory controller

The finished quad comes in its own rackmount box, with two PCI buses accommodating up to seven PCI boards per quad. Each quad includes four processors and between 512 megabytes (MB) and 4 gigabytes (GB) of memory. This is the new building block for enterprise-class systems, commercially hardened for mission-critical applications by Sequent.

In the quad, CPUs, memory, and I/O are uniquely arranged. Sequent has essentially pulled memory apart and put pieces of it near each processor. I/O is also closer to each processor, yielding several advantages. The primary advantage is that for all memory and I/O accesses that can be satisfied inside the quad, there is no need to go out and use bandwidth on the interconnect between quads. In today’s single-node SMP implementations, all memory and I/O accesses travel over a single shared bus. In the NUMA-Q architecture, many of these accesses are handled at the quad level. When a memory access does go out on the IQ-Link interconnect, it happens as fast in the new architecture as it does in Sequent’s current Symmetry® 5000 systems. Software applications do not have to change to accommodate the architecture.

The value of having memory and I/O in the quad near the processors is that the 500 MB/sec bus that links these together can operate independently of all the other quads until it makes a request that must be fulfilled outside the quad. The effective bus bandwidth of the system is now the sum of all the quads’ 500 MB/sec buses: 63 quads give roughly 32 GB/sec for a 252-processor system. With two PCI buses in each quad, each rated at 133 MB/sec, about half of this 32 GB/sec can be used for I/O.
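The aggregate bandwidth figures above follow from simple multiplication, and the sketch below reproduces them. The per-quad rates are the ones quoted in the text; the function name, the dictionary keys, and the use of decimal gigabytes (1 GB = 1000 MB) are assumptions made for illustration.

```python
PROCESSORS_PER_QUAD = 4
QUAD_BUS_MB_PER_SEC = 500   # per-quad SMP bus rate quoted in the text
PCI_BUSES_PER_QUAD = 2
PCI_BUS_MB_PER_SEC = 133    # per-PCI-bus rate quoted in the text


def aggregate_bandwidth(processors: int) -> dict:
    """Sum per-quad bus and PCI bandwidth for a system built from whole quads."""
    if processors <= 0 or processors % PROCESSORS_PER_QUAD:
        raise ValueError("processor count must be a positive multiple of 4")
    quads = processors // PROCESSORS_PER_QUAD
    bus_mb = quads * QUAD_BUS_MB_PER_SEC
    io_mb = quads * PCI_BUSES_PER_QUAD * PCI_BUS_MB_PER_SEC
    return {
        "quads": quads,
        "bus_gb_per_sec": bus_mb / 1000,  # decimal GB is an assumption
        "io_gb_per_sec": io_mb / 1000,
        "io_fraction": io_mb / bus_mb,    # share of bus bandwidth PCI can use
    }


if __name__ == "__main__":
    print(aggregate_bandwidth(252))
```

For 252 processors this gives 63 quads, 31.5 GB/sec of summed bus bandwidth (the "32 GB/sec" in the text after rounding), and an I/O share of a little over one half.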

[Figure: The quad. Labels: Mem, I/O, 500 MB/sec SMP bus.]

[Figure: The quad with an IQ-Link interconnect. Labels: Mem, I/O, IQ-Link, 500 MB/sec SMP bus.]

The IQ-Link: 1 GB/second SCI link

Engineers at Sequent reasoned that if the industry is at a point where the networked-servers, SMP, and clustered architectures can utilize the same building block to forge a solution, then perhaps one can also craft a common interconnect to suit all these architectures. Why not create a systematic way to manage the quads and the software running on them from a central location, regardless of the architecture? And when you design the quad, interconnect, and software programs, focus on achieving the highest availability possible, because it is key to all three architectures.

The interconnects for processor chips are well understood; they involve caches and buses in SMP systems. Interconnecting systems poses a much greater and less well understood challenge. Sequent has met the challenge of creating an interconnect for the new building blocks of the computer industry, an interconnect that goes beyond just building large SMP systems. Sequent has created an interconnect that can be used to build 252-processor SMP systems, support very large clusters of large SMP systems, or even network hundreds of servers together with unparalleled bandwidth and low latency, all without a backplane.

The Sequent-designed connection technology for linking multiple quads is called IQ-Link. These links can be made memory-coherent (as in creating a single large SMP system from multiple quads), or the links can be used strictly for fast, low-latency message-passing (as in the case of networked-server or cluster architectures). These attributes are also those needed to tie clusters of large SMP servers together to maximize performance. In order for a group of quads to run as a single SMP system (a node), the interconnect creates a single large contiguous coherent view of memory out of the distributed pieces of physical memory found in each quad. Because IQ-Link can provide this unified view of memory, with ranges of the address space parceled out to each quad, one instance of the OS and the applications runs simultaneously on all interconnected quads. The result is a very large, single-node SMP system.

When IQ-Link is used to create a large SMP system, it monitors the Pentium Pro quad processor bus and knows when it should respond to requests for specific memory locations (those outside the range of memory contained on this quad). IQ-Link examines its own large cache (L3) for this data, and if the data cannot be found there, it puts a request out to the portions of memory on other quads. All of this activity is transparent to the database and application software.

Some memory accesses are resolved quickly, when the data is found in the memory on the same quad that made the request. Other accesses are resolved more slowly, when the data has to be fetched from a different quad by IQ-Link. This type of architecture is often referred to as CC-NUMA: Cache-Coherent Non-Uniform Memory Access. The key to implementing CC-NUMA successfully is to make the fetches to memory in other quads so fast and so rare that the software can effectively ignore them. Poorly implemented interconnects will result in long latency, poor scaling of nodes, and disappointing performance.

The 1 GB/sec IQ-Link is not a backplane. It is in fact a daisy-chain connection between quads. It is this point-to-point nature of the IQ-Link that overcomes the length/speed design tradeoff discussed earlier. Indeed, each segment of the IQ-Link is limited in length to achieve high bandwidth, but taken altogether, multiple segments can connect a large number of processors while delivering an aggregate bandwidth of more than 1 GB/sec.

Leveraging key technologies

In NUMA-Q, Sequent is leveraging key developing technologies. SCI, the Scalable Coherent Interface, is the basis for the IQ-Link interconnect. Sequent has partnered with Vitesse Corporation to develop a 1 GB/sec data pump chip based on GaAs. This component is integral to the implementation and design of IQ-Link. Sequent’s own ASIC designs on IQ-Link are the key to achieving performance far superior to other SCI implementations.

PCI is evolving as the standard I/O bus, replacing VMEbus, for both desktop systems and servers. Fibre Channel is the I/O standard for disks and tapes and provides the reliability, availability, and serviceability that enterprise-class systems need. Lastly, the Pentium Pro processor is SMP-ready and allows third-party control of its bus. Sequent has leveraged all of these emerging technologies in the design and implementation of the NUMA-Q architecture.
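The read path described in the IQ-Link section (local memory first, then the IQ-Link’s own cache, then the quad that holds the data) can be sketched as a small model. This is an illustration under stated assumptions, not Sequent’s implementation: the class names, the dictionary-based memory, and the fill-on-miss policy are hypothetical, and writes, invalidation, and the rest of the coherence protocol are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Quad:
    """One quad: a slice of the global address space plus a remote-data cache."""
    quad_id: int
    base: int   # first address homed on this quad
    size: int   # number of addresses homed on this quad
    memory: Dict[int, int] = field(default_factory=dict)
    remote_cache: Dict[int, int] = field(default_factory=dict)  # stands in for the L3

    def owns(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


class System:
    """Hypothetical model of quads sharing one contiguous address space."""

    def __init__(self, quad_count: int, addresses_per_quad: int) -> None:
        # Each quad is handed the next contiguous range of addresses.
        self.quads: List[Quad] = [
            Quad(i, i * addresses_per_quad, addresses_per_quad)
            for i in range(quad_count)
        ]

    def home_of(self, address: int) -> Quad:
        for quad in self.quads:
            if quad.owns(address):
                return quad
        raise ValueError(f"address {address} is outside the address space")

    def read(self, requester_id: int, address: int) -> Tuple[int, str]:
        """Return (value, path), where path names how the read was satisfied."""
        requester = self.quads[requester_id]
        if requester.owns(address):
            return requester.memory.get(address, 0), "local memory"
        if address in requester.remote_cache:
            return requester.remote_cache[address], "remote cache"
        value = self.home_of(address).memory.get(address, 0)
        requester.remote_cache[address] = value  # keep a copy for next time
        return value, "remote quad"
```

In this model the first read of a remote address travels to the home quad, and a repeat read of the same address is served from the requester’s cache, which is the behavior that keeps remote fetches rare.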

[Figure: CC-NUMA SMP architecture. Quads, each with Mem, I/O, and an IQ-Link, joined by quad interconnects at 1 GB/sec.]
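The requirement that remote fetches be "so fast and so rare" that software can ignore them can be made concrete with a weighted average of local and remote latency. The sketch below is a simplified model, not a measurement: the function name is hypothetical, the two-microsecond remote figure is the best-case IQ-Link latency quoted in this document, and the local latency used in the example is a placeholder assumption.

```python
def average_access_time_us(remote_fraction: float,
                           local_latency_us: float,
                           remote_latency_us: float) -> float:
    """Weighted average of local and remote memory access latency."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return ((1.0 - remote_fraction) * local_latency_us
            + remote_fraction * remote_latency_us)


if __name__ == "__main__":
    LOCAL_US = 0.25   # placeholder assumption, not a Sequent figure
    REMOTE_US = 2.0   # best-case IQ-Link latency quoted in this document
    for fraction in (0.0, 0.05, 0.10, 0.50):
        avg = average_access_time_us(fraction, LOCAL_US, REMOTE_US)
        print(f"{fraction:.0%} remote -> {avg:.3f} microseconds on average")
```

The average stays close to the local figure only while the remote fraction is small, which is why keeping data near the processors that use it matters as much as the raw speed of the interconnect.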

Benefits of Sequent’s new architecture

Sequent’s new NUMA-Q architecture provides an enterprise with a number of important advantages over the next ten years, including:
■ Flexibility to use one set of components to create a variety of architectures.
■ High-end performance leadership.
■ Common set of availability and manageability tools for a variety of solutions.
■ Higher application availability.
■ Higher single-stream and system-wide performance, to support more users and greater throughput.
■ Vastly improved system management, reducing operator errors and training, simplifying system administration, and improving application availability.
■ Improved serviceability and reliability, reducing customer downtime.
■ Increased I/O capacity, permitting greater on-line storage and backup.
■ Increased memory capacity, permitting more users and faster processing.
■ Improved installability, reducing the time it takes to install a system and get it operational.
■ Headroom for growth of at least 30% per year for the next five years.
■ Maximum customer investment protection.
■ Binary compatibility with Sequent Symmetry 5000.

The NUMA-Q architecture will allow OLTP, DSS, and business communications system designers to build very large mission-critical solutions without the management nightmares of networked-server implementations, the backplane limitation of current SMP systems, and the programming intricacies of the MPP paradigm. In huge information servers composed of many distinct 4x Pentium Pro quad servers, the new Sequent IQ-Link provides full memory bandwidth communication directly between quads with a fraction of the overhead experienced with networked servers today. In large SMP systems, instead of limited scalability above 30 processors, a single OS instance can manage hundreds of processors. With an effective bus bandwidth of 32 GB/sec, interconnect bandwidth in excess of 1 GB/sec, and latencies as low as two microseconds, memory accesses over the IQ-Link to other quads are faster than those of today’s backplane-based SMP systems. This means that applications aren’t affected, and in fact never need to know that the backplane has disappeared.

[Figure: The future for the four fundamental architectures of enterprise-class computing. Panels: large SMP systems; networked servers; MPP, shared-nothing systems (processor/memory cells, P/M, shown crossed out); clustered shared-disk SMP systems.]

The data center will also see old-style shared-nothing MPP systems gradually evolve into shared-disk clusters of large SMP nodes. For the shared-disk clusters of SMP nodes, the performance of NUMA-Q architecture systems is greater and more predictable than that of MPPs, management is easier, and availability is higher.

In 1983, Sequent took the lead in leveraging the microprocessor as a building block for creating larger systems and in pioneering the industry move into symmetric multiprocessing. In 1997, Sequent will lead the industry to the next great step for computer architectures: applying one common building block and one common interconnect to all of the architectures needed in enterprise-class computing. In successfully leveraging the 4x Pentium Pro processor quad as a building block, and crafting an interconnect that requires no programming changes from the successful SMP model, Sequent is demonstrating the way forward.

Corporate headquarters: American headquarters: Sequent Computer Systems, Inc. 15450 SW Koll Parkway Beaverton, 97006-6063 (503) 626-5700 or (800) 257-9044 URL: http://www.sequent.com

European headquarters: Sequent Computer Systems, Ltd. Sequent House Unit 3, Weybridge Business Park Addlestone Road Weybridge, Surrey KT15 2UF England (44) 1932 851111

Asia/Pacific headquarters: Sequent Computer Systems (Singapore) Pte Ltd. 80 Robinson Road, #18-03 Singapore 068898 (65) 223-5455

With offices in: Australia, Austria, Brazil, Czech Republic, France, Germany, Hong Kong, India, Indonesia, Italy, Japan, Korea, Malaysia, Mexico, The Netherlands, New Zealand, Philippines, Poland, Russia, Singapore, Taiwan, Thailand, United Kingdom, and United States.

With distributors in: Bahrain, Brunei, Croatia, Czech Republic, Egypt, Greece, Hong Kong, Hungary, India, Japan, Korea, Kuwait, Malaysia, Mexico, Oman, People’s Republic of China, Philippines, Poland, Russia, Saudi Arabia, Slovenia, South Africa, Sri Lanka, Thailand, Ukraine, United Arab Emirates, and Yugoslavia/Serbia.

Sequent, Symmetry and WinServer are registered trademarks and NUMA-Q and IQ-Link are trademarks of Sequent Computer Systems, Inc. Intel and Pentium are registered trademarks of Intel Corporation.

Copyright ©1997 Sequent Computer Systems, Inc. All rights reserved. This document may not be copied in any form without written permission from Sequent Computer Systems, Inc. Information in this document is subject to change without notice. Printed in U.S.A.

PD-1124 6/97