
3 METRICS FOR TECHNOLOGY PERFORMANCE

A customer pays for value. We often associate functionality with value, and how well the product or service performs that function with performance. Though this is often true, it is always important to keep in mind that what we think is of value to the customer may not always be what the customer values. This is illustrated in Figure 3-1. In the context of information handling devices, performance is related to the specific function of the various types of information handling.

© 1982 by Sidney Harris – “What’s So Funny About ?”, William Kaufmann, Inc./ ICE, "Roadmaps of Packaging Technology" 16065

Figure 3-1.

INTEGRATED CIRCUIT ENGINEERING CORPORATION 3-1 Metrics for Technology Performance

INFORMATION HANDLING TECHNOLOGIES

Our society has always had a thirst for information, and technology has been fueling the revolution in how we access information since the Gutenberg press first churned out 42-line Bibles in 1454. Information is handled in four ways:

• processed
• transmitted
• stored
• interfaced with the physical world

Every task performed by every electronic device fits into one of these four categories. Figure 3-2 lists examples of each of these activities. Computers, communications and consumer products are huge markets for information handling.

Information Processing
• Image morphing
• Database searching
• Translation

Information Transmission
• Telephone transmission over twisted pair
• TV transmission over radio waves
• PDA to PC over IR link
• IDE controller to hard drive over SCSI

Information Storage
• Floppy drive
• Hard drive
• CD
• Magnetic tape

Information Interface
• CRT monitor
• Keyboard
• Mouse
• Speakers

Source: ICE, "Roadmaps of Packaging Technology" 22164

Figure 3-2. Four Information Operations

Information processing is the transformation of data into information. This is a pervasive task which happens in virtually all electronic systems, down to digital watches embedded in pens or rings. It only takes a few gates to process information, and when a 4-bit embedded processor costs 25¢ in high volume, it is only a matter of time before any device that has access to a battery or is plugged into the wall will be capable of information processing.


Information transmission is the transport of data from one location to another. Every electronic interconnect interface is devoted to either power or information transmission. This task spans from transistor to transistor on a chip, over 2-micron-long fine line aluminum wires, to 45-kilometer-long fiber-optic cables running from repeater station to repeater station, two miles deep in the Atlantic Ocean. At a scale appropriate to our human size, we are touched by information transmission on a daily basis over telephone lines, by electromagnetic waves to radios and by infrared links from our remote control to a TV.

Information storage is the placement of data in a static medium where it can be located and retrieved at a later time. This task spans the scale from on-chip registers that may be only 16 bits wide, through optical disk farms that may contain 10,000 CDs, each with 10Gbits of data. There are three currently used electronic media for information storage: semiconductor, in various forms of random access memory (RAM), such as dynamic (DRAM), static (SRAM), video (VRAM), flash, etc.; magnetic, such as magnetic tape, floppy disks and hard disks; and optical, such as compact disks (CDs) and digital video disks (DVDs).

Information interface refers to the transfer of information between the physical world and the electronic world, either as input or as output. The Man-Machine interface is a specialized case. The most common output interfaces encompass visual display devices such as CRT (cathode ray tube), LCD (liquid crystal display) and printers, or sound generation through speakers. Vibration is also popular as an output for pagers, transmitting one bit of information. The most popular input devices today are keyboards, mice, pen-touch screens and microphones for use with voice recognition. In addition to the Man-Machine interfaces, there is a whole universe of sensors and actuators that are used in monitor and control applications for automotive, home and industrial environments.

Though only information processing and transmission are discussed below, the packaging technologies used in all four applications are discussed throughout this book.

INFORMATION PROCESSING

The Migration from Super to Shirt Pocket

There has never been, and will never be, enough processing power available to the individual. The functions performed by super computers today will eventually be performed by personal computers and PDAs tomorrow. Those functions only dreamed of now will some day be performed by the leading-edge super computers.

Information processing, or computing power, is an intrinsic feature of every electronic device we use. We call some of these devices computers, such as a mainframe, server, personal computer or laptop. And some we do not recognize as computers, yet have information processing as their foundation, such as digital phones, cameras, personal digital assistants, TVs, washing machines, and sewing machines. Computers have become embedded into virtually every electronic device that plugs into the wall or is powered by a battery.

The performance of a computer is not the factor that classifies it as a mainframe or a microcomputer. Any table that listed the MIPS (Millions of Instructions Per Second) or FLOPS (Floating Point Operations Per Second) rating of a "typical" super computer would be out of date within a few years of its introduction.

For example, in 1982, a super computer was defined as a computer of about 20MegaFLOPS or higher. In 1989, the 33MHz 486DX had a peak speed of 27MIPS, roughly equivalent to 20MegaFLOPS, the threshold for super computer speed. Also in 1989, NEC introduced the SX-X, at 22GigaFLOPS, three orders of magnitude higher than the super computer threshold. This ever increasing trend in performance is shown in Figure 3-3. The performance of any computer family is a constantly increasing quantity.

[Plot: peak speed (millions of operations per second, log scale from 0.1 to 10^5) versus year, 1955 to 2000. Machines plotted: IBM 704, IBM 7090, IBM Stretch, IBM 7094, CDC 6600, IBM 360/90, CDC 7600, Illiac IV, Cray 1, Cyber 205, Cray X-MP, Cray 2, Cray Y-MP, Cray C90, Cray T3D, Cray T90, Cray T3E, Intel iPSC and TMC CM-1. Shown in parentheses: DEC PDP1, PDP6, IBM 360, DEC PDP11 and DEC VAX 780. Price labels: $5M and $12M.]

Source: Physics Today/ICE, "Roadmaps of Packaging Technology" 21981

Figure 3-3. Leading Edge Growth of Computer Technologies Since 1955

Rather than performance capability, a computer or information appliance is defined by its form factor. The form factor reflects the physical size of the product, the number of people it can serve and its shape. These factors also define general price ranges for each form factor. A variety of form factors have emerged for computer systems and information appliances over the last few decades (Figure 3-4). As electronics technology evolves, these form factors and market prices stay relatively the same. What changes is the performance capability of each product.

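[Chart: form factors ordered from largest to smallest: Super Computer, Mainframe Computer, Server, Workstation, PC, HDTV, Laptop, Fax, Notebook, Camera, Telephone, Calculator, Watch. Size runs from 100 inches down to 1 inch and price from $10M down to $1; relative performance is also indicated.]

Source: ICE, "Roadmaps of Packaging Technology" 15778A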

Figure 3-4. Form Factors for Electronic Systems

The real revolutions in new products are occurring in the largest form factors and in the smallest form factors. At the high end, the total processing capability available to run an individual program opens up new problems to simulation in a reasonable time period. For example, to simulate and predict the local weather for the next day, in less than one day's worth of computation time, requires an estimated performance of 100GigaFLOPS.

Higher levels of chip integration have allowed what was PC performance to migrate into the shirt pocket. This is seen in the new generation of "personal information managers" such as the Zaurus, the Pilot and the Wizard, as well as in notebooks such as those from Toshiba, Compaq, and IBM. This evolution of performance is illustrated in Figure 3-5.


[Plot: peak speed (millions of operations per second, log scale from 0.1 to 10^3) versus year, 1955 to 2000. Labels as extracted: DEC PDPS, Altair 8800, Apple II, Apollo WS, IBM PC, PC, Sun WS, Macintosh, Pentium PC and Pro PC. Price labels: $30K and $5K.]

Source: Physics Today/ICE, "Roadmaps of Packaging Technology" 21980

Figure 3-5. Functional/Affordable Growth of Computer Technology Since 1955

In between these two extreme form factors there is a steady migration of features and performance from the large systems into the smaller ones. Patrick Gelsinger, who led Intel's 80486 design team, has proposed a new law of computing,

"Every concept proven useful in mainframes or minicomputers has migrated onto the microprocessor."

We might even generalize this more and propose that,

"Any function performed by a large size computer will eventually be offered in the smallest computers."

Driving Forces on Computing Devices

The universal driving force for all these devices is more processing power, in a smaller volume, at lower price. This has often been summarized with the phrase, "Faster, smaller, cheaper." This march of ever increasing performance density per unit cost is propelled by finding engineering solutions that allow packing into each form factor more gates, able to switch at higher speeds, at a lower manufacturing cost.

At the high end, the high power dissipation is accepted, and adequate means of dealing with it are implemented. At the low end, where portability is key, low power consumption becomes another critical factor, and the driving force is more performance density per unit cost per watt.


When the information processing capability that can fit in a form factor reaches the level to encompass a new task, at an acceptable market price, a new product may be created. The digital watch was created when the sophisticated timing, alarm, and display functions could be integrated onto one chip. The smart card and hand calculator were made possible with the first generation microprocessors, which integrated about 2000 gates on one chip. These gates, switching at a few hundred kilohertz, provided enough processing power to perform floating point operations and transcendental calculations in less than a second.

Two revolutions will drive the need for more computing power in the devices used by the individual, such as desktop computers and information appliances, in the next few years: "social" interfaces and real time virtual reality displays.

Social interfaces take advantage of information processing to lower the technical barriers between users and computers or information appliances. Major elements of a social interface include voice recognition. With such features, users may be able to interact with a computer in plain English, without special training.

Virtual reality is a term used to describe the generation of a simulated environment that approximates our real world and responds to the user. This includes interfaces to all of our five senses. In its simpler form, it is a visual and audio medium that has 3D objects in a 3D world. Just creating the 3D images, with realistic shadowing and textures, responding in real time to the changing view of the user, is driving information processing. Some of the applications for virtual reality are listed in Figure 3-6.

• Training
• Real-time, on-line maintenance manuals
• Tele-operation of equipment
• Medical diagnosis
• Entertainment
• Architectural design
• Generic man-machine interface

Source: ICE, "Roadmaps of Packaging Technology" 22165

Figure 3-6. Applications for Virtual Reality


Implementing Information Processing

The chips that perform the information processing have evolved as well. The range of information processing devices includes:

• motherboard based CPU
• chip set based CPU
• single chip microprocessor unit (MPU)
• microcontroller unit (MCU)
• digital signal processor (DSP)

At the high end, the CPU (central processing unit) is usually composed of a chip set. For example, in the Fujitsu VP2000, the CPU is one or more boards, each containing over 100 ECL gate arrays, as shown in Figure 3-7.

Source: Fujitsu/ICE, "Roadmaps of Packaging Technology" 22166

Figure 3-7. Fujitsu VP2000 CPU Board and Cold Plate

In the case of servers and high end workstations, a chip set usually is composed of a CPU, a memory management unit (MMU), an integer processor unit (IPU), and associated level 2 SRAMs. An example, the Ross Hypersparc processor used in workstations, is shown in Figure 3-8.


Source: nChip/ICE, "Roadmaps of Packaging Technology" 16062

Figure 3-8. Ross/nCHIP RISC, Thin-Film MCM in Cofired Package

The personal computer revolution, which began around 1980, was enabled by the introduction of the single chip microprocessor. Early IBM PCs, such as the XT, were built around one. The single chip processor has increased in processing power, as shown in Figure 3-9. This is a direct result of the advance in the number of transistors that can be economically implemented on a chip, and the increase in operating clock frequency.

Two other types of chips have become an integral part of the information processing revolution: microcontroller units (MCUs) and digital signal processors (DSPs). A microcontroller typically integrates many of the functions of an MPU on one chip, but with lower performance. Where current MPUs operate with 32-bit and 64-bit data streams, current MCUs operate with 8-bit and 16-bit data streams. An MCU has a microprocessor at its core, with on-chip ROM, RAM and multiple I/O ports.

Microcontrollers are embedded in more products than all other processor chips. Figure 3-10 illustrates the increasing market volume for MCUs over MPUs. Most PDAs use MCUs as their CPU, because fewer chips are needed, and the MCU has sufficient processing power for simple functions. Figure 3-11 lists the CPUs for some common PDAs.


[Plot: performance (MIPS, log scale from 0.01 to 1,000) versus year, 1970 to 2000. Labeled data points:]

Processor      Clock     Performance (MIPS)
4004           108KHz    0.06
8008           200KHz    0.06
8080           2MHz      0.64
8085           5MHz      0.37
8086           10MHz     0.75
80286          12MHz     2.66
Intel 386DX    16MHz     6
Intel 386DX    20MHz     7
Intel 386DX    25MHz     8.5
Intel 386DX    30MHz     11.4
Intel 486DX    33MHz     27
Intel 486DX    50MHz     41
Pentium        66MHz     112
P6*            100MHz    300
P6             200MHz    600

*ICE estimate

Source: ICE, "Roadmaps of Packaging Technology" 21602B

Figure 3-9. Intel Microprocessor Clock Frequency

A DSP chip is a highly specialized chip which performs a numeric transform of an input data stream to result in an output data stream. The most common function is the Fast Fourier Transform (FFT). This operation takes a time domain data stream and converts it into a frequency spectrum. In this respect, a DSP is a hardware accelerator for some mathematical transforms.


[Chart: MCU and MPU units (millions of units) and revenue (millions of dollars) by year.]

                1991     1992     1993     1994     1995     1996 (EST)
Units (M)
  MCU           1,722    1,902    2,221    2,659    3,067    3,470
  MPU           136      143      167      170      212      245
Dollars ($M)
  MCU           4,850    5,245    6,560    8,275    10,735   11,615
  MPU           3,565    5,460    8,590    10,995   14,280   17,510

Source: WSTS/ICE, "Roadmaps of Packaging Technology" 20318C

Figure 3-10. Comparison of the MCU and MPU Markets
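
The unit and dollar figures in Figure 3-10 imply an average selling price per chip, which makes the contrast between MCU volume and MPU value concrete. The Python sketch below simply divides dollars by units for each year. The dictionary and function names are illustrative and do not come from the source.

```python
# Figure 3-10 data: units in millions, revenue in millions of dollars.
YEARS = [1991, 1992, 1993, 1994, 1995, 1996]  # 1996 is an estimate
UNITS_M = {
    "MCU": [1722, 1902, 2221, 2659, 3067, 3470],
    "MPU": [136, 143, 167, 170, 212, 245],
}
DOLLARS_M = {
    "MCU": [4850, 5245, 6560, 8275, 10735, 11615],
    "MPU": [3565, 5460, 8590, 10995, 14280, 17510],
}


def average_selling_price(kind):
    """Return {year: dollars per unit} for 'MCU' or 'MPU'."""
    return {
        year: DOLLARS_M[kind][i] / UNITS_M[kind][i]
        for i, year in enumerate(YEARS)
    }


if __name__ == "__main__":
    for kind in ("MCU", "MPU"):
        for year, asp in average_selling_price(kind).items():
            print(f"{kind} {year}: ${asp:.2f} per unit")
```

With these figures, the average MCU sold for a few dollars in 1995 while the average MPU sold for tens of dollars, which is why MCUs lead in units and MPUs lead in revenue.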

PDA                           CPU                      Clock Frequency
Casio Z-7000                                           7.5MHz
Apple Newton                  ARM610                   20MHz
Sharp ZR5000                  16bit custom             -
Sony Magic Link PIC-1000      Zilog Z85180             14.3MHz
Motorola Envoy Communicator   Motorola Dragon 68349    16MHz

Source: ICE, "Roadmaps of Packaging Technology" 22167

Figure 3-11. Microcontrollers Inside Popular PDAs

Typically, blocks of 1024 points of data are sampled. If this is performed for voice processing, the data stream may be the microphone voltage, sampled and digitized every 100 microseconds. A 1024 point data stream will represent 0.1sec of speech. When an FFT is performed on this data set, the output will be the amplitude and phase of the spectrum, from 10Hz, up to 5KHz, at 10Hz intervals. Many algorithms, such as voice recognition and signal recovery in the presence of noise, operate more efficiently in the frequency domain than in the time domain.
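
The sampling figures above can be checked with a short Python sketch. It derives the block duration, the spacing between spectral lines and the highest resolvable frequency from the 100 microsecond sample period and the 1024 point block, then runs a plain radix-2 FFT on a test tone. The exact values are 0.1024sec and about 9.77Hz, which the text rounds to 0.1sec and 10Hz. The function names and the 1KHz test tone are assumptions made for illustration, and the FFT is a textbook recursive version rather than the algorithm used inside any particular DSP.

```python
import cmath
import math

SAMPLE_PERIOD_S = 100e-6   # one sample every 100 microseconds (from the text)
BLOCK_SIZE = 1024          # points per block (from the text)


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrum_parameters(block_size=BLOCK_SIZE, sample_period_s=SAMPLE_PERIOD_S):
    """Return (block duration in s, bin spacing in Hz, highest frequency in Hz)."""
    duration = block_size * sample_period_s
    bin_spacing = 1.0 / duration
    nyquist = 0.5 / sample_period_s
    return duration, bin_spacing, nyquist


def dominant_frequency(samples, sample_period_s=SAMPLE_PERIOD_S):
    """Frequency (Hz) of the largest-magnitude bin below the Nyquist limit."""
    bins = fft(samples)
    half = len(samples) // 2
    peak = max(range(1, half), key=lambda k: abs(bins[k]))
    return peak / (len(samples) * sample_period_s)


if __name__ == "__main__":
    duration, spacing, nyquist = spectrum_parameters()
    print(f"block = {duration:.4f} s, bins every {spacing:.2f} Hz up to {nyquist:.0f} Hz")
    # Hypothetical 1 kHz test tone standing in for a microphone voltage.
    tone = [math.sin(2 * math.pi * 1000.0 * i * SAMPLE_PERIOD_S)
            for i in range(BLOCK_SIZE)]
    print(f"dominant frequency is about {dominant_frequency(tone):.1f} Hz")
```

Because the test tone does not fall exactly on a spectral line, the reported peak is the nearest line, within one bin spacing of 1KHz.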

DSPs are typically rated by how quickly they perform an FFT operation on a vector of 1024 data points, each of 16- or 32-bit size. For example, the ADSP-21060 DSP from Analog Devices can perform the FFT in 0.046 seconds. Two devices could work in parallel and provide real time, continuous display of an audio channel.

Applications for the FFT function have exploded in the past few years. They appear in the obvious applications, such as video processors, multi-media, audio, modem/fax chip sets, and wireless communications. In addition, they are being used as motor controllers. Every device with a motor will soon have a DSP chip. This includes disk drives, dishwashers, air conditioners, and autos. An example of the market breadth for DSP chips is illustrated in Figure 3-12, for the $2B market in 1995.

[Chart: DSP market ($M, 0 to 10,000) by year, 1991 to 2001.]

Year    DSP Market ($M)
1991    337
1992    447
1993    674
1994    998
1995    1,729
1996    2,460
1997    3,380
1998    4,400
1999    5,735
2000    7,440
2001    9,110

Source: ICE, "Roadmaps of Packaging Technology" 20435B

Figure 3-12. DSP Market Trends ($M)
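
One way to summarize the trend in Figure 3-12 is a compound annual growth rate. The Python sketch below computes it from the first and last values in the figure. The function name is illustrative, and the result is only a summary of the listed numbers, not a figure quoted by the source.

```python
# Figure 3-12 data: DSP market in millions of dollars, as listed in the figure.
DSP_MARKET_M = {
    1991: 337, 1992: 447, 1993: 674, 1994: 998, 1995: 1729, 1996: 2460,
    1997: 3380, 1998: 4400, 1999: 5735, 2000: 7440, 2001: 9110,
}


def compound_annual_growth_rate(series, start_year, end_year):
    """Return the compound annual growth rate between two years as a fraction."""
    years = end_year - start_year
    if years <= 0:
        raise ValueError("end_year must be later than start_year")
    return (series[end_year] / series[start_year]) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = compound_annual_growth_rate(DSP_MARKET_M, 1991, 2001)
    print(f"1991-2001 compound annual growth: {rate:.1%}")
```

Over the ten years listed, the values correspond to growth of a little under 40 percent per year.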

Just as microprocessor functions have become embedded on chips as cores, the DSP function has also become embedded on ASIC cores, as well as high end FPGAs (field programmable gate arrays).


For specialized operations, a DSP based chip can outperform the fastest MPUs. For example, TI introduced the Multimedia Video Processor (MVP). It is a four million transistor chip with four 32-bit DSPs, a 32-bit RISC processor with 100MFLOPs capability, a transfer controller, two video controllers, and 50Kbytes of SRAM. It is capable of 2,000MIPS when performing specialized video processing.

An Introduction To Performance Metrics

Performance, in general, is an ambiguous and poorly defined term for describing computer systems. More often than not, it is used as a sales and marketing tool to justify the price of a new product. It makes sense only when the specific test that is used to measure the performance is explicitly stated, and then only if the test is related to the intended use.

There are two different methods of quantifying performance: the actual measured execution time of running special, well defined reference programs, called benchmarks, or the rate at which well defined instructions, either integer or floating point, can be processed.

The execution speed of instructions will depend on the type of code that is running, and how well it has been optimized for the architecture of the computer. It is important to be aware of the conditions of the test run to rate the performance of a computer. Performance ratings can vary over a factor of 10 for the same computer because of variations in test conditions.

Benchmarks As Performance Metrics

There are a number of standardized programs that have been written and accepted by the industry as rulers for performance. Their variety reflects the different types of tasks a computer performs. The most common historical benchmarks are:

1. Linpack; solves 100 simultaneous linear equations with 100 unknowns, exercising matrix manipulation abilities.

2. Whetstone; most general test, composed of a collection of routines that include floating point and integer calculations, transcendental functions, array manipulation, and conditional jumps and loops.

3. ; non-numeric test to simulate a high-level program that contains memory assignments, control statements, and function calls.

4. TP1; used to simulate transaction analysis; fetches a record from a database, processes it, and rewrites it.


To evaluate the performance of a computer, these programs are run and the number of times they can be repeated per second is reported. The units are in Whetstones/sec, Linpacks/sec, etc. For example, the Sun 4/260 is rated at 19,000 /sec.
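
The idea of rating a machine by how many times per second it can repeat a reference program can be sketched in a few lines of Python. The harness below repeats a workload for a minimum wall-clock time and reports runs per second. The workload shown is a hypothetical stand-in, not Linpack, Whetstone or any other standard benchmark, and the function names are assumptions for illustration.

```python
import time


def runs_per_second(workload, min_seconds=0.2):
    """Repeat `workload` until at least `min_seconds` have elapsed and
    return the average number of completed runs per second."""
    runs = 0
    start = time.perf_counter()
    elapsed = 0.0
    while elapsed < min_seconds:
        workload()
        runs += 1
        elapsed = time.perf_counter() - start
    return runs / elapsed


def toy_workload():
    """Hypothetical stand-in for a benchmark kernel (floating point loop)."""
    total = 0.0
    for i in range(1, 1000):
        total += (i * 1.5) / (i + 2.0)
    return total


if __name__ == "__main__":
    print(f"{runs_per_second(toy_workload):,.0f} runs/sec")
```

The number reported depends on the machine, the interpreter and whatever else is running, which is the same sensitivity to test conditions that the text warns about.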

The final application will influence which is the most critical to use. For example, a computer for scientific work should be rated in Whetstones.

Figure 3-13 lists the performance ratings of various computer systems against the standard metrics.

COMPUTER                            CLOCK PERIOD (nsec)   MIPS    MFLOPS
Compaq Deskpro 486/25               50                    15      -
Intel i860                          25                    65      5.4
IBM 320 (RS/6000)                   50                    27.5    7.4
VAX 9000                            16                    30      125
Amdahl 5990-700 (Dual Processor)    10                    63      -
NEC SX-3                            2.9                   -       5,500

Source: ICE, "Roadmaps of Packaging Technology" 15781

Figure 3-13. Performance Ratings of Selected Computers

MIPS and FLOPS

A measure of intrinsic processing speed falls into two categories: the number of instructions (such as add, subtract, move, branch, and fetch) that are executed per second, and the number of floating point operations per second. A floating point operation, involving real numbers and operations such as multiply and divide, typically requires 0.1 to 10 instructions to execute, depending on the logic architecture and the code.

When reporting the speed in instructions per second, convention has adopted the units of MIPS, Millions of Instructions Per Second. When reporting the speed for Floating Point Operations Per Second, the units are FLOPS, often given in units of MegaFLOPS or GigaFLOPS.

If the user's applications will be mostly floating point operations, the FLOPS are a better metric for real world performance than the MIPS. The number of FLOPS at which a system operates can be maximized by the design of the logic architecture and optimized code. With a co-processor, common with super computers, the MegaFLOPS may be from 0.5 to 10 times the MIPS rating.


To realistically measure the performance of a computer, the MIPS or FLOPS rating should be measured while running a benchmark. The benchmark run should reflect the final intended applications. Even then, the way the benchmark is coded can greatly influence the measured speed. For example, a CRAY XMP-4, running a standard Linpack benchmark, will do 40MegaFLOPS. Running code optimized for matrix algorithm solutions on the Cray will allow the same Linpack to run at 800MegaFLOPS.

The MIPS rating is basically the number of instructions per cycle times the clock frequency. For most RISC-based systems, in which one instruction is performed per cycle, the MIPS is very nearly the clock frequency in MHz. For example, the Hyperstone computer, using a novel mixed RISC-CISC processor, has a 25MHz clock and executes one instruction per cycle with one processor. It is rated at 25MIPS.

Processor architecture has had a profound effect on the number of instructions executed per cycle. Figure 3-9 shows the evolution of the Intel processors. The 100MHz Pentium, for example, can execute 3 instructions per cycle and has a rating of 300MIPS.
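
The relationship between clock frequency, instructions per cycle and MIPS in the two examples above can be written as a one-line calculation. The Python sketch below is a peak estimate under that simple model only. The function name is illustrative, and real ratings depend on the code being run.

```python
def mips_rating(clock_mhz, instructions_per_cycle=1.0):
    """Peak MIPS estimate: instructions per cycle times clock frequency in MHz."""
    return clock_mhz * instructions_per_cycle


if __name__ == "__main__":
    # 25MHz clock, one instruction per cycle (the Hyperstone example).
    print(mips_rating(25))
    # 100MHz clock, three instructions per cycle (the second example in the text).
    print(mips_rating(100, 3))
```

These reproduce the 25MIPS and 300MIPS ratings quoted in the text.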

Some processors are still measured in VAX MIPS, where one VAX MIPS is the performance of a VAX 11/780. The VAX MIPS rating of a computer is how many times faster it executes the same code than a VAX 11/780. The VAX 11/780, introduced in 1977, is roughly a 1MIPS machine.

Because of the inherent ambiguities in quantifying performance, and the optimism of most marketing organizations, which has cast confusion in the popular trade journals, it has been said that MIPS really stands for "Meaningless Indication of Performance."

SPECmarks

The System Performance and Evaluation Cooperative (SPEC) is a group formed in 1989 to establish benchmarks specifically for RISC-based systems. The spec is updated periodically. It was first introduced in 1992, and results are referred to in SPEC92 units. There are two test suites, one consisting of programs dealing with integer operations, and a second suite of programs dealing with floating point operations.

The single performance number for each suite, in units of SPECmarks, is the geometric mean of the ratios of the rate at which each program runs to the rate at which it runs on a reference platform. For the 1992 test spec, the reference platform was the VAX 11/780. Thus, one SPECmark92 is one VAX MIPS, which is roughly one MIPS.
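
The geometric mean of per-program speed ratios described above can be sketched as follows. This Python example assumes the ratio for each program is the reference platform's execution time divided by the measured time. The function name and the timings are hypothetical, and the sketch omits the run rules of the actual SPEC suites.

```python
import math


def spec_ratio_mean(reference_seconds, measured_seconds):
    """Geometric mean of per-program speed ratios (reference time / measured time)."""
    if len(reference_seconds) != len(measured_seconds) or not reference_seconds:
        raise ValueError("need one measured time per reference time")
    log_sum = 0.0
    for ref, meas in zip(reference_seconds, measured_seconds):
        log_sum += math.log(ref / meas)
    return math.exp(log_sum / len(reference_seconds))


if __name__ == "__main__":
    # Hypothetical timings in seconds for a three-program suite.
    reference = [100.0, 200.0, 400.0]
    measured = [10.0, 10.0, 10.0]
    print(round(spec_ratio_mean(reference, measured), 2))
```

In this made-up case the per-program ratios are 10, 20 and 40, and their geometric mean is 20. A geometric mean keeps one unusually fast program from dominating the rating the way it would in an arithmetic mean.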


Ratings are reported in terms of the SPECint92 or SPECfp92, corresponding to the integer or the floating point test suites, respectively. A higher number means the processor completes the test programs in less time. For example, a PowerPC 601 operating at 100MHz has a 112 SPECint92 rating. This is roughly 112MIPS. A comparison of the SPECmark rating of a variety of workstation class processors is shown in Figure 3-14.

Chip Vendor     Chip                       Clock (MHz)   SPECint92   SPECfp92   SPECint95   SPECfp95
DEC             Alpha 21164                291           319.3       602.2      7.03        9.64
                                           333           405.9       518        8.08        12.1
                                           350           405.9       518        -           -
HP              PA-RISC 7200               120           169.6       270.5      4.41        7.45
IBM/Motorola    PowerPC 604                100           -           -          3.47        3.11
                                           133           156.8       144.8      -           -
Intel           Pentium                    133           149.3       116        3.9         3.28
                                           200           320.2       283.2      6.75        8.09
Fujitsu         SPARC64                    118           212.6       282.8      -           -
Ross            hyperCACHE (HyperSPARC)    150           -           -          4.05        4.89
Sun             UltraSPARC*                167           252         351        -           -
                                           200           332         505        -           -
                UltraSPARC II**            300           -           -          8.5         15
MIPS            R4400MC                    250           181         -          4.39        -

* Supplied by Sun Microelectronics
** Estimated performance, due out fall '96

Source: Computer Design/ICE, "Roadmaps of Packaging Technology" 21974

Figure 3-14. SPEC Benchmarks

SPECmarks provide an unambiguous scale with which to compare the performance of different processors. Figure 3-15 illustrates the neck and neck race between the Intel family and the PowerPC family of processors for SPECint92 performance. For the same clock frequency, the PowerPC performance is slightly ahead of the Intel processor.

The SPECmark test suites were updated in 1995, and the tests are listed in Figure 3-16. The reference platform was changed with this test spec. It is now the Sun Microsystems SPARCstation 10/40, a 40MHz Supersparc based workstation. A SPECmark95 of 10 means the computer will execute the test suite of programs 10 times faster than a SPARCstation 10 with a 40MHz processor. The SS10 is approximately 65 times faster than a VAX 11/780, so SPEC95 ratings for a computer will be about 65 times smaller than SPEC92 ratings. Partly because of the psychological impact of smaller numbers to represent higher performance, and because this test suite is still very new, most current test results are still reported in terms of the SPEC92 test suite. However, Figure 3-17 lists the SPEC95 ratings for various processors.
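
The rule of thumb above can be compared against the processors in Figure 3-14 that list both a SPECint92 and a SPECint95 rating. The Python sketch below computes the ratio of the two for each such row. The row selection and names are illustrative. Because the two suites contain different programs, the ratio is not expected to be constant, and the factor of 65 should be read as approximate.

```python
# (chip, clock in MHz, SPECint92, SPECint95) for the rows of Figure 3-14
# that list both ratings.
ROWS = [
    ("DEC Alpha 21164", 291, 319.3, 7.03),
    ("DEC Alpha 21164", 333, 405.9, 8.08),
    ("HP PA-RISC 7200", 120, 169.6, 4.41),
    ("Intel Pentium", 133, 149.3, 3.90),
    ("Intel", 200, 320.2, 6.75),
    ("MIPS R4400MC", 250, 181.0, 4.39),
]


def int92_to_int95_ratios(rows=ROWS):
    """Return [(label, SPECint92 / SPECint95)] for each row."""
    return [(f"{chip} @ {mhz}MHz", int92 / int95)
            for chip, mhz, int92, int95 in rows]


if __name__ == "__main__":
    for label, ratio in int92_to_int95_ratios():
        print(f"{label}: SPECint92 / SPECint95 = {ratio:.1f}")
```

For these rows the ratio falls between roughly 38 and 50, somewhat below the factor of 65 derived from the reference platforms alone.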


[Plot: SPECint92 (log scale, up to 1,000) versus date of first volume shipments, 1994 to 1998, for Intel and PowerPC processors. Labels as extracted: P54/P55, P6, P6+, P7??, 603, 604, 620, New 603 Core, New 604 Core, New 620 Core. Clock frequencies marked on the curves range from 80MHz to 200MHz.]

Source: MicroDesign Resources/ICE, "Roadmaps of Packaging Technology" 21975

Figure 3-15. Pentium Versus PowerPC Performance

Suite     Name            Comments
CINT95    099.go          AI game. Plays "Go."
          124.m88ksim     Motorola 88k chip simulator with test program.
          126.gcc         New version of GCC, compiles SPARC code.
          129.compress    Compresses/decompresses file in memory.
          130.li          Lisp interpreter.
          132.ijpeg       Graphic JPEG compression/decompression.
          134.perl        Perl code that manipulates strings, prime numbers.
          147.vortex      Data-base program.
CFP95     101.tomcatv     Mesh-generation program.
          102.swim        Shallow-water model (1024 x 1024 grid).
          103.su2cor      Quantum physics; Monte Carlo simulation.
          104.hydro2d     Astrophysics; Hydrodynamical Navier Stokes equations.
          107.mgrid       Multi-grid solver in 3-D potential field.
          110.applu       Parabolic/elliptic partial differential equations.
          125.turb3d      Simulates isotropic, homogeneous turbulence in cube program.
          141.apsi        Solves problems in distribution of pollutants.
          145.fppp        Quantum chemistry.
          146.wave5       Plasma physics; electromagnetic particle simulation.

Source: Computer Design/ICE, "Roadmaps of Packaging Technology" 21976

Figure 3-16. SPEC Benchmark Application Programs


Model Name                 CPU Type              SPECint_base95   SPECfp_base95   SPECint95   SPECfp95
IBM POWERserver 591        77MHz POWER2          3.67             11.20           -           12
DEC AlphaSta 600 5/300     300MHz 21164          7.33             11.59           7.33        12
DEC AlphaSta 600 5/266     266MHz 21164          6.43             10.64           6.43        11
HP K400 4-CPU              100MHz PA-7200        -                10.20           -           10
HP J210 MP                 120MHz PA-7200        -                9.91            -           10
HP K400 3-CPU              100MHz PA-7200        -                9.65            -           10
IBM POWERserver 39H        66.7MHz POWER2        3.28             9.44            -           10
HP K400 2-CPU              100MHz PA-7200        -                8.38            -           8
HP J200 MP                 100MHz PA-7200        -                8.12            -           8
HP J210                    120MHz PA-7200        4.37             7.54            4.37        7
DEC 3000 Model 900         275MHz 21064A         4.24             6.29            -           -
HP J200                    100MHz PA-7200        3.64             6.28            3.64        6
HP K400                    100MHz PA-7200        3.58             6.21            3.58        6
DEC AlphaSta 250 4/266     266MHz 21064A         4.18             5.78            4.18        5
DEC 3000 Model 700         225MHz 21064A         3.66             5.71            -           -
HP 735/125                 125MHz PA-7150        4.04             4.55            4.04        4
HP 735/99                  99MHz PA-7100         3.27             3.98            3.27        4
DEC 3000 Model 500         150MHz 21064          2.15             3.65            -           -
IBM POWERserver C20        120MHz PowerPC 604    3.85             3.50            -           -
HP 715/100                 100MHz PA-7100LC      2.89             3.47            -           -
IBM POWERstation 43P       133MHz PowerPC 604    4.45             3.31            -           -
IBM POWERserver C10        80MHz PowerPC 601     2.37             2.97            -           -
Intel 1110/133             Pentium Processor     3.64             2.37            3.68        3
Sun SPARCstation 20/71     75MHz SuperSPARCII    2.46             2.14            -           -
Sun SPARCstation 10/40*    40MHz SuperSPARC      1.0              1.0             -           -

Source: Computer Design/ICE, "Roadmaps of Packaging Technology" 21977

Figure 3-17. Representative Sample of SPEC95 Benchmarks

Subtle Factors Influencing Measured Performance

In general, "hard coding" a benchmark to be optimized for a particular computer's logic architecture, memory architecture, and peripheral access can improve the benchmark performance by 10x. It is common for different computers using the same microprocessor, operating at the same clock frequency, to be rated at different MIPS. This is due to the different tests that were run and how the memory and other peripherals were accessed by the code.


It is clear there can be ambiguities in using the MIPS of a machine to measure performance. The performance rating depends on a number of subtle factors, rarely stated explicitly:

1. The program being run.

2. The optimization of the code for the specific computer.

3. The logic design of the computer.

4. The memory architecture of the computer.

5. The use of other peripherals and their access times.

INFORMATION TRANSMISSION

Analog and Digital Networks

Two types of networks are in use today, analog and digital. All communication between computers uses digitally encoded information. The plain old telephone system (POTS) is still analog based in most locations. The first cellular phones were analog based; they are now being switched to digital encoding. Radio and TV transmission are still analog based. However, it is only a matter of time before TV is transmitted as digital, especially as it becomes integrated into the Information Superhighway. Only digitally encoded information can become part of the Information Superhighway.

In digital networks, the information is encoded as bits. In a wire or optical fiber, the bit stream is most commonly encoded as a voltage or intensity level. For wireless, and some leading edge, ultra high bandwidth fiber systems, a carrier frequency is modulated. The most common schemes are amplitude modulation (AM), frequency modulation (FM) or frequency shift keying (FSK), and phase shift keying (PSK) and its derivatives, such as quadrature phase shift keying (QPSK).

The information transmission rate is defined as how many bits per second are transported in the interconnect. Though the information may be encoded as digital bits, the actual signal that propagates is still an analog signal, representing a varying voltage level or light intensity level. Ultimately, analog voltage levels are measured and compared to a threshold to recover the digital information of high and low.

The use of digital communications allows a much more versatile network, where the information being transmitted carries with it its own address. This increases the efficiency of the switching system which routes the information to the correct end user. With the use of DSP, error correcting codes and compression, the absolute highest information density can be transmitted over the available bandwidth of the interconnect with digital signals. All networks will move toward digital encoding in the future.


In digital networks, there are four elements: the transmission medium that carries the information from point to point, the switching systems that route the information into the various transmission channels, and the transmitter and receiver. Often the switching system also acts as a repeater for receiving, routing and re-transmitting the digital signals.

Interconnect Media

Digital networks typically use four media for transmitting information: wire, such as printed circuit board traces, twisted pair, ribbon cable or coax cable; fiber-optic cable, either multimode or single mode; infrared (IR) through free space; and wireless, as radio waves, rf or microwaves. When using free space radio waves, to avoid interference between channels, the FCC has regulated what parts of the spectrum can be used for what purpose. No such restrictions are placed on wire or fiber-optic networks, since they are dedicated lines which do not radiate or interfere. IR is typically for communications between devices on one desktop, or at most within one room, and interference between devices can be easily controlled.

The maximum rate at which bits of information can flow in a channel was first described by Claude Shannon in 1948. It has since become known as Shannon's Law:

capacity (bits/sec) = BW x log2 (1+SNR)

The bandwidth is the span in frequency that is used by the signal. When a carrier frequency is used, as in rf transmission, the bandwidth is roughly the modulation frequency of the carrier. The bandwidth in frequency space is closely regulated by the FCC to minimize interference between users. In wire or fiber based digital networks the bandwidth is the highest sine wave frequency component present in the signal. This is related to the edge rate of the signal. It is approximately 0.35/rise time. In a typical digital system network, the bandwidth is about 5x the clock frequency. This is reviewed in detail in Chapter 7.

To increase the transmitted bit rate, either the bandwidth must be increased, which means the clock frequency increases, or the signal to noise ratio must increase. Both of these methods are being used to increase the carrying capacity of networks.
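As a rough illustration of Shannon's Law and the rise-time rule of thumb above, the following Python sketch evaluates both. The function names and the example channel (3KHz bandwidth with a 30 dB signal-to-noise ratio) are assumptions chosen for illustration, not values from the text; SNR is treated as a linear power ratio.

```python
import math


def snr_db_to_linear(snr_db):
    """Convert a signal-to-noise ratio in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def shannon_capacity_bps(bandwidth_hz, snr_linear):
    """Capacity (bits/sec) = BW x log2(1 + SNR), with SNR as a linear ratio."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def bandwidth_from_rise_time_hz(rise_time_s):
    """Rule of thumb from the text: bandwidth is approximately 0.35 / rise time."""
    return 0.35 / rise_time_s


# Hypothetical channel: 3 kHz of bandwidth and 30 dB SNR (assumed values).
capacity = shannon_capacity_bps(3_000.0, snr_db_to_linear(30.0))
print(f"capacity: {capacity / 1e3:.1f} Kbits/sec")

# Doubling the bandwidth doubles capacity; doubling SNR adds far less.
print(f"2x bandwidth: {shannon_capacity_bps(6_000.0, 1_000.0) / 1e3:.1f} Kbits/sec")
print(f"2x SNR:       {shannon_capacity_bps(3_000.0, 2_000.0) / 1e3:.1f} Kbits/sec")

# A signal with a 1 nsec rise time has roughly 350 MHz of bandwidth.
print(f"bandwidth: {bandwidth_from_rise_time_hz(1e-9) / 1e6:.0f} MHz")
```

Because capacity is linear in bandwidth but only logarithmic in SNR, the sketch shows why both levers are pursued: raising bandwidth pays off directly, while signal recovery improvements buy smaller but still useful gains on a fixed medium.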

There is a fundamental trade off in bit rate capacity and distance for an interconnect. As the length of an interconnect increases, effects such as attenuation decrease the signal to noise ratio, and reflections and distortions from impedance discontinuities decrease the bandwidth. The choice of the medium to use depends on the distance to be traveled and the required bandwidth. As digital signal processing technology becomes more sophisticated, the data carrying capacity of an interconnect will steadily increase. Figure 3-18 shows the variation of carrying capacity and distance for a variety of interconnect media.


[Figure 3-18 plot: transmitted bit rate (Mbits/sec, 0.001 to 1x10^5) versus length (meters, 1 to 1x10^5) on log-log axes, with curves for fiber-optic, coaxial cable, twisted pair with DSP, and twisted pair.]

Source: ICE, "Roadmaps of Packaging Technology" 22168

Figure 3-18. Maximum Data Transmission for Different Interconnect Media

For long distance and high capacity, optical fiber is the clear winner. All high bandwidth networks are going to use optical fiber. The longest span, the Trans Atlantic Telephone (TAT) 12/13 Network, between the U.S. and the U.K., is operational as of Spring 1995. It carries information at the rate of 10Gbits/sec across a span of 5,913 kilometers, with 133 repeaters, spaced every 45 kilometers. The reliability is rated at less than 1 ship based repair required on the entire network in 25 years. It consists of two cables, each with four single mode fibers, grouped in two pairs. Each pair carries 2.5Gbits/sec in each fiber, or 5Gbits/sec per pair. In each cable, one pair is designated the service pair and carries most of the information. The second pair is termed the restoration pair and is used for maintenance, and is available if there is a fault in the service pair. Between the two cables, there is a total carrying capacity of 10Gbits/sec.

Fiber-optic interconnects are becoming the medium of choice for very high speed, wide bandwidth local area networks (LANs) and wide area networks (WANs). For example, there are standards for the SONET (synchronous optical network) protocol, with specifications for 0.155, 0.622 and 2.5Gbits/sec. Gigabit Ethernet, also using fiber-optic cables, is rated at 1Gbit/sec.


Slowing the implementation of fiber-optic networks is the cost of the transmitters, receivers and cables, compared with coaxial cable. Fiber will be used only where the higher bandwidth justifies the cost.

At the other extreme, twisted pair has the lowest bandwidth for a fixed length. However, it is already installed in most homes as the medium for transmission from the nearby switch box to the home. This connection is termed the subscriber loop, designed for audio signal transmissions of about 3KHz in bandwidth. This bandwidth is limited by poor signal integrity as a result of its loose specifications with respect to twists, turns, and proximity to adjacent wires. Reflections and signal distortions over the maximum 500 meter length drastically reduce the bandwidth. When these systems were designed and installed, higher bandwidth needs were not envisioned.

The Information Superhighway has quickly made the audio bandwidth obsolete. When surfing the net from home on even a 14.4Kbits/sec modem, most users refer to the WWW as the "World Wide Wait". What constrains the use of higher data carrying capacity is the lower signal-to-noise ratio (SNR) at higher modulation frequencies on the subscriber loop. The advance from 9.6Kbits/sec to 14.4Kbits/sec and 28.8Kbits/sec was enabled by advances in signal recovery chips on the transmitting modem and on the receiving modem, enhancing the SNR.

U.S. Robotics and Rockwell have announced modems operating at 56Kbits/sec, both using advanced signal processing chips to recover the distorted analog signal from the noisy background. Both of these require extensive information processing. The Rockwell modem uses an MCM to package the DSP chips in a small volume.

To increase the data capacity of twisted pair for the 500 meters of the subscriber loop, digital transmission offers an advantage. A number of digital subscriber loop (DSL) technologies have emerged that are capable of transmitting digital information over the twisted pair lines at up to 6Mbits/sec, with advanced DSP chips to recover the digital information from the noise. Amati Communications and Pairgain Communications both have trial systems in place at 6Mbits/sec in subscriber loops.

Delivering the Information Superhighway to the home through a high bandwidth pipe is such a large market that there are four competing technologies: DSL on existing twisted pair lines, new cable modems using the existing cable TV lines, installing fiber-optic cables to the home, and wireless connections. All of these methods will be capable of delivering video bandwidth rates. They may all successfully be used.

The growth of the Information Superhighway is driving the need for higher data rates. In some cases, the traffic is increasing by a factor of 10 each year. An example is shown in Figure 3-19. The need to carry higher data rates will drive both the increasing data clocking frequency and more efficient algorithms and DSP chips to extract lower signals from higher noise. These functions require continual advances in information processing speeds, both in terms of the speed at which bits can be multiplexed, split and re-routed, encoded and decoded, and in the processing speeds of the DSP chips to keep up with the information flow rate. All the various network protocols that exist today drive this trend of higher processing speed and clock frequency.

[Figure 3-19 plot: number of hits (log scale, 1 to 100,000) from July 1991 to July 1994, plotted separately as logs per weekday and logs per weekend.]

Source: Computer/ICE, "Roadmaps of Packaging Technology" 21978

Figure 3-19. Web Client Growth From July 1991 to July 1994

Figure 3-20 is a forecast of the transmission rates over the next five years.

[Figure 3-20 plot: transmission rates (Mbits/s, 0.1 to 1,000) of Ethernet, Token Ring, SCSI, FDDI, ATM, HiPPI, and Fibre Channel compared with processor performance (MIPS, 1 to 10,000) versus year, 1970 to 2000.]

FDDI = fibre distributed data interface
ATM = asynchronous transfer mode
SCSI = small computer system interface
HiPPI = high performance parallel interface

Source: Computer Design/ICE, "Roadmaps of Packaging Technology" 21979

Figure 3-20. The Communications Bottleneck


Switching Networks

When two or more intelligent devices are connected together and communicate, the collection of devices is called a network. Networks are generically described by their physical size (how many users they support and how closely they are located) and by the information carrying capacity of the network. For each combination of size and information bandwidth, there are a variety of standard topologies and protocols. Some of these are illustrated in Figure 3-21. The AppleTalk network, a low cost local area network that interfaces computers, printers and other peripherals, is connected as a ring. An Ethernet network, used to connect computers to a server, can be either a ring or a star.

[Figure 3-21 diagram: star network, ring network, and distributed network, each built from servers, clients, and peripheral clients.]

Source: ICE, "Roadmaps of Packaging Technology" 22169

Figure 3-21. Network Topologies for Local Area Networks

In a network, i.e., when computers are communicating among themselves, the most common architecture is client-server. One computer acts as the coordinator for all the others. Information flows through the server. A server can handle from 2 to 200 clients depending on the processing power, network bandwidth and applications being run.


As a network expands, and servers are added, they can be networked to each other with routers. Because the client-server network typically uses dedicated wiring between computers that are located near each other, this is often called a local area network (LAN). Between servers, especially as the distance between them increases, the network is often termed a wide area network (WAN) to refer to the greater distances. LANs typically cover short distances, under 500 meters, and connect directly to end users. WANs are typically longer, from 1 kilometer to around the world, and typically connect servers, requiring higher bandwidths.

In the business environment, there are four general network sizes, roughly based on the number of users and their geographical locations:

A workgroup is a small collection of 2-20 users or peripherals, comprising a LAN, which are connected to a server. They are usually co-located in the same general area of a building, and use dedicated wiring.

A department is 20-200 users typically in the same building, connected by dedicated wiring and controlled by one or more servers. The servers would be directly connected, also over a LAN.

A campus, as its name suggests, originated to refer to university campuses, and is 200-2,000 users, spread over one or more buildings. The connections can be made by either dedicated wiring, leased lines or commercial phone lines. A campus network may be either a LAN or WAN.

An enterprise has come to mean an entire company which may be spread across a number of different buildings, some located in the same area, some remotely distributed across the country or the world. The size of this group can be from 2,000 to 20,000 or more. Enterprise wide communication has become the fastest growing market. It consists of a number of LANs connected by a WAN.
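The four size categories above can be expressed as a simple lookup. This is only a sketch: the function name is hypothetical, and because the ranges in the text share their endpoints (20, 200 and 2,000 users), assigning a boundary value to the smaller category is an assumption made here.

```python
def classify_network(users):
    """Map a user count to the business network sizes described in the text.

    Boundary values (20, 200, 2,000) are assigned to the smaller category;
    that tie-breaking rule is an assumption, not something the text states.
    """
    if users < 2:
        raise ValueError("a network needs at least two connected devices")
    if users <= 20:
        return "workgroup"
    if users <= 200:
        return "department"
    if users <= 2000:
        return "campus"
    return "enterprise"


for count in (8, 150, 1200, 15000):
    print(count, classify_network(count))
```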

The term intranet has been created to refer to enterprise wide communication. This market segment, because it is tied to the profitability of companies, has more money feeding it and is both a larger market and a faster growing market than the Internet. This is shown in Figure 3-22. An intranet is the internal communications backbone of a company. It is typically only accessible from within an enterprise WAN or LAN. To protect the security of internally proprietary information, there is a "firewall" separating the intranet from the public Internet. The communications security industry is devoted to keeping the firewall impenetrable to outside intruders.

Networks are also described in terms of their information bandwidth or bit rate carrying capability. A narrow band network is one with a low bit rate, typically less than 50Kbits/sec. A wide band network has a high information carrying capacity. With the Information Superhighway driving the need for ever more information to the individual, all networks are going toward requiring increasingly higher bandwidths.


[Figure 3-22 chart: intranet versus Internet server market in millions of dollars (0 to 14,000) by year, 1995 to 1999.]

Source: Zona Research Inc./ICE, "Roadmaps of Packaging Technology" 22170

Figure 3-22. Intranet Versus Internet Server Market 1995-1999

In addition to the encoding of information and transmission by wire, the bit stream has to be switched and routed to its final destination. This requires that the data stream be broken down into its constituent packets and routed to other transmission channels. This is accomplished using switches or routers. Typically N channels of high speed digital information come in and M channels come out. These operations require exactly the same technologies as for high speed information processing. The packaging of wide band switching networks is limited in the same way as high speed processing systems.

The requirements for these networks are also driving higher clock frequency, higher integration levels, and higher density, at lower cost and with shorter development cycle times.

PACKAGING TECHNOLOGY

"The semiconductor industry and circuit designers are exceeding the capabilities of the packaging industry to provide sufficiently advanced carrier, interconnection, and cabling technology to support higher density circuits and to use fully the capabilities of LSI chips containing 100 or more circuits on a die two-tenths of an inch square.....These connections consume space at an alarming rate, resulting in increased distances between adjacent circuit packages and increased signal delays."

- Robert Beall, Amdahl Corp., 1974


"VLSI is limited by interconnect and packaging technology."

- Dr. Craig Barrett, VP and GM, Intel, 1987

At every step along the road toward higher performance per volume per unit cost, IC technology leads the way and the packaging technology attempts to keep pace. It is the handful of chips themselves that performs the actual information processing.

With a fixed set of chips, there is nothing that the packaging can do that will increase the information processing capability of the system above the intrinsic capability of the chips. In this sense, the packaging does not add value to the product. It can only increase the size of the system, decrease the speed of the chips, and add to the system cost.

From the perspective of the packaging, it is not a question of whether the packaging will limit the system performance. It is a question of the magnitude of the penalty, in system size, speed, and cost, and how its detrimental impact can be contained at an acceptable level. Packaging and interconnect technology have in the past limited system performance, and will always do so in the future.

The chip technologies define the ultimate performance capability of a system. The packaging technologies are constraints to this ultimate performance. Engineering solutions are invented to always keep these constraints to a minimum.

Fueled by a $150 billion market (1996), IC technologies will advance at ever faster rates. This evolution is enabled by process development, increasing integration, and increasing yields for increasingly complex devices. Packaging technology must also advance at an ever more rapid pace just to maintain its second-place position.

Rao Tummala, former Director of Advanced Electronic Packaging Technology at IBM, says,

"Breakthroughs and progress are forged by blending sound technological fundamentals with artistic inspirations to create novel and unusual designs with extra leverage."

Breakthroughs in new designs, materials, and processes are needed as leverage against the engineering and fundamental constraints that must be overcome to keep the package's impact on system performance, cost and time to market at an acceptable level.

The following chapters describe the intrinsic capabilities of chip technologies, how they are constrained by the packaging technologies, and some of the potential engineering solutions available today and in the near future.


TOWARD HIGHER PERFORMANCE

Higher Performance Through Software

A Doctor, an Engineer, and a Programmer were arguing about whose profession was the oldest.

"God removed one of Adam's ribs and created Eve," the Doctor said, "a clear case of surgery."

"But before that," said the Engineer, "He created order out of chaos, obviously an engineering job."

"Sure," said the Programmer, "but who do you think created the chaos?"

Software

Software algorithms will always play an important role in increasing the information processing capabilities of computers. In fact, in the world of fluid flow simulation such as in aerodynamics, there has been as much progress made in speeding up the run times because of optimized algorithms, as in the increase in speed resulting from higher performance hardware.

The introduction of neural network programs, running on digital computers, may also provide a new software algorithm that solves a selected class of artificial intelligence problems more quickly than traditional numerical methods running on the same MIPS platform. Non-digital neural net chips, being developed by Cal Tech, for example, may be the core of the sixth generation computers. With their analog-like behavior, the packaging requirements for these chips may be even more stringent than for their all-digital counterparts.

Higher Performance Through Higher Clock Frequency

The equation for theoretical MIPS points out the two most important factors directly affecting performance in digital computers: the clock frequency and the number of instructions that can be executed per cycle, summarized in Figure 3-23.

The clock frequency is directly dependent on the chip technologies used, the interconnection topology, and the packaging technology used to implement the topology. The number of instructions executed per cycle depends on the architecture and the degree of concurrency built into the logic architecture. Figure 3-24 shows how these factors relate.
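The relationship between these two factors can be sketched numerically. The sketch below assumes the usual form of the theoretical MIPS equation, clock frequency in Hz multiplied by instructions per cycle and divided by one million; the function name and the sample machines are hypothetical and chosen only to show the two levers.

```python
def theoretical_mips(clock_hz, instructions_per_cycle):
    """Theoretical MIPS = clock frequency (Hz) x instructions per cycle / 1e6."""
    return clock_hz * instructions_per_cycle / 1e6


# Hypothetical baseline: 25 MHz clock, one instruction every two cycles.
baseline = theoretical_mips(25e6, 0.5)

# Lever 1: double the clock frequency (chip and packaging technology).
faster_clock = theoretical_mips(50e6, 0.5)

# Lever 2: double the instructions per cycle (architecture and concurrency).
more_concurrency = theoretical_mips(25e6, 1.0)

print(baseline, faster_clock, more_concurrency)
```

Either lever doubles the theoretical rating in this sketch, which is why the discussion treats clock frequency and concurrency as separate paths to higher performance.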


[Figure 3-23 diagram: MIPS branches into clock frequency and number of instructions/cycle. Labels beneath: chip technology, topology, logic architecture, logic, memory, and degree of concurrency (word length, pipelining, vector processing, array processing, DSP, multiple processors). Impact on packaging requirements: decreased wiring delay, increased number of gates, increased bandwidth in the CPU.]

Source: ICE, "Roadmaps of Packaging Technology" 15785

Figure 3-23. Impact of Higher Performance on Packaging Requirements

[Figure 3-24 plot: clock frequency (Hz, 10^2 to 10^14) versus MIPS (10^-2 to 10^6), with lines of constant instructions/cycle from 0.01 to 100; concurrency increases toward multiprocessors.]

Source: ICE, "Roadmaps of Packaging Technology" 15784

Figure 3-24. Performance, Concurrency, and Clock Frequency


The MIPS rating of any computer directly increases with an increasing clock frequency. This drives every system to use the highest practical clock frequency in each form factor. From the ENIAC in 1945 at about 100KHz, to the NEC SX-X at 330MHz in 1989, there has been a doubling in clock frequency about every four years (Figure 3-25). For the case of single-chip systems, this doubling occurs roughly every three years (Figure 3-26).
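The doubling period quoted above can be checked from the two endpoints given in the text. This is a minimal sketch; the function name is hypothetical, and it assumes steady exponential growth between the ENIAC (about 100KHz, 1945) and the NEC SX-X (330MHz, 1989).

```python
import math


def doubling_time_years(f_start_hz, f_end_hz, year_start, year_end):
    """Average years per doubling, assuming steady exponential growth."""
    doublings = math.log2(f_end_hz / f_start_hz)
    return (year_end - year_start) / doublings


# Endpoints taken from the text: ENIAC to NEC SX-X.
eniac_to_sxx = doubling_time_years(100e3, 330e6, 1945, 1989)
print(f"{eniac_to_sxx:.2f} years per doubling")
```

The result comes out a little under four years per doubling, consistent with the "about every four years" figure.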

[Figure 3-25 plot: clock frequency (100KHz to 1GHz, with the corresponding period from 1µs at 1MHz to 1ns at 1GHz) versus year, 1945 to 1995, with ENIAC, ILLIAC IV, and NEC SX-X marked.]

Source: ICE, "Roadmaps of Packaging Technology" 15786A

Figure 3-25. Clock Frequency Trends of Computer Systems

The fundamental limit of the clock cycle time depends on the delay per gate and the number of gates required by the logic architecture to be sequentially switched in one cycle. Figure 3-27 shows the intrinsic switching speeds for various device technologies and feature sizes. The highest end processors have historically all used bipolar, BiCMOS or CMOS technology, with the exception of the Cray 3, which uses GaAs integrated circuits. Current generation CMOS devices can switch as fast as current generation bipolar devices, but with higher integration levels.

Higher Performance Through Higher Total Gate Count

The new RISC (Reduced Instruction Set Computer) microprocessors, such as the Motorola 88000, Intel 80860, and MIPS Computer Systems R3000, take advantage of a change in logic architecture that allows most instructions to be executed in one clock cycle. To a large extent, this is accomplished by an optimized use of on-chip cache registers, minimizing the need to use main memory, and maximizing the use of pipelining. Pipelining allows normally idle parts of the CPU to start executing the next pieces of the problem while the main CPU is processing current pieces.

The price of this optimization is more gates required to implement the logic and to provide for the on-board cache. For example, the Intel 80386 has 70K gates. The RISC 80860 has 250K gates. Currently, over half the RISC microprocessors have been implemented in chip sets because of the need for more gates than can currently fit on one chip at an acceptable yield.


[Figure 3-26 plot: clock frequency (MHz, 0.1 to 1,000) versus year, 1970 to 2000, for the 4004, 8080, 8085, 8086, 80286, 80386, 80486, and 80860, with the process technologies PMOS, NMOS, CMOS, silicide, double metal, and salicide marked.]

Source: Intel/ICE, "Roadmaps of Packaging Technology" 14509

Figure 3-26. MPU Technology Trends

A major principle of is sequential processing. Starting with the fourth generation computers and strongly emphasized in the fifth generation computers, concurrent processing is adding information processing power to systems. This trend toward increasing the degree of concurrency in CPU designs can take the following forms.

1. Word length: number of bits that are processed simultaneously. The first commercial microprocessor, the Intel 4004, was a four bit per word machine. The 80386 is a 32 bit per word machine. Most super computers are 64-bit or 128-bit machines.

2. Pipelining: interleaving of multiple tasks so that while parts of the CPU are waiting to play their role in one task, they are used for another.

3. Vector processing: a large number of operands (of one word length each) are processed simultaneously, rather than just one pair as in a .

4. Array processing: a number of vectors are processed simultaneously.


5. Co-processor: a specialized processor for floating point calculations and some special functions.

6. Digital signal processor: a co-processor with specialized numerical algorithms, such as the Fast Fourier Transform, hardwired in.

7. Multiple processors: repeated processors that are complete and independently operate in parallel, using either their own memory or shared memory, or both.

[Figure 3-27 plot: switching time (nsec, .001 to 100) versus feature size (microns, .1 to 100) for CMOS, ECL, GaAs, and HEMT.]

Source: ICE, "Roadmaps of Packaging Technology" 12820A

Figure 3-27. Intrinsic Switching Speeds of Selected IC Technologies

As each of these features is added, more operations are made possible in a clock cycle. The ultimate extent of concurrency is when the entire CPU is duplicated and the number of instructions per cycle doubles, such as with parallel processors. Accessing these features, though, requires an ever increasing number of gates in the CPU.

The total gate counts in CPUs used at the high end are listed in Figure 3-28. Figure 3-29 shows the gate counts for selected microprocessors.


COMPUTER          NUMBER OF LOGIC GATES IN CPU
Cray 1            300K
Hitachi M688H     3.2M
Fujitsu M780      1.0M
ETA-10            5.7M (CMOS)
VAX 9000          1.4M
IBM 3081          360K

Source: ICE, "Roadmaps of Packaging Technology" 15787

Figure 3-28. Selected CPU Counts

[Figure 3-29 plot: number of gates per chip (100 to 10M) versus year, 1970 to 1996, for the 4004, 8080, 8085, 8086, 68000, 80286, 68020, 80386, 80486, 68040, Pentium, PowerPC 601, and Pentium Pro (MPU only). MPU increase ≈ 1.35/year.]

Source: ICE, "Roadmaps of Packaging Technology" 16175A

Figure 3-29. Selected Microprocessor Densities

Minimizing Detrimental Impact Of The Memory Topology

The two factors that most directly impact performance are clock speed and total number of gates in the CPU. A secondary factor that can limit performance is the bottleneck associated with memory access. The memory architecture is designed to allow retrieval of most of the data within one clock cycle.


Memory is partitioned into a number of levels depending on the access time. Cache is designed for no wait states. It is always placed as close as possible to the CPU to minimize time delays and wiring densities, which arise when handling the ever widening data buses.

In general, the packaging environment of the CPU is the most expensive real estate in the system. To minimize costs, the size of the cache, which is packaged in the CPU, is kept as small as possible without impairing performance. To achieve 200MHz operation, and keep the on chip cache size manageable, the Intel Pentium Pro places the L2 cache adjacent to the CPU in a dual cavity cofired ceramic package, as shown in Figure 3-30.

Courtesy of Intel
Source: ICE, "Roadmaps of Packaging Technology" 22188

Figure 3-30. Intel Pentium Pro Processor in Cofired Ceramic MCM

Alternatively, the logic architecture can be designed to use the wait-time between fetches effectively, as in the Intergraph Clipper, which uses a 75MHz clock and has two wait-states for its cache. The cost is more complex logic and more gates required to cover the overhead.

Minimizing Detrimental Impact Of The Package

The two technology-related factors that improve the performance the most are increasing the clock frequency and utilizing more total gates in the CPU. As the chip technology provides the capability for higher clock frequencies and higher gate counts, the packaging technology must keep up so as not to excessively reduce performance.


Once the intrinsic clock frequency is established by the chip technology and the logic architecture, the packaging will slow the clock down because of the introduction of off-chip delays. Independent of any novel packaging technology, the speed of light will always dictate the minimum off-chip delay.

Virtually all computers today use the same clock uniformly throughout their entire CPUs. In a synchronous clock scheme, gates that can contribute in one clock cycle cannot be physically farther away than the signal can travel in one cycle. This sets a limit on the actual physical size of a CPU.

The typical time of flight, TOF, for signals in an FR4 glass-epoxy PCB (printed circuit board) is 6 in/nsec. This limits the size of the CPU to about 12 inches on a side for each nanosecond of clock cycle. Figure 3-31 shows the maximum CPU sizes for the various clock frequencies. These are upper limits that do not consider factors such as gate delays and clock skew.
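The size limit can be sketched with the two figures quoted above: a propagation speed of 6 in/nsec in FR4 and a rule of thumb of about 12 inches of CPU side per nanosecond of clock cycle. The sketch encodes the 12 inch figure as a constant taken from the text rather than deriving it from the 6 in/nsec speed; the function and constant names are hypothetical.

```python
FR4_SPEED_IN_PER_NS = 6.0   # signal propagation speed in FR4, from the text
SIZE_RULE_IN_PER_NS = 12.0  # CPU side per nsec of clock cycle, from the text


def cycle_ns_from_mhz(clock_mhz):
    """Clock period in nanoseconds for a clock frequency in MHz."""
    return 1000.0 / clock_mhz


def travel_distance_in(cycle_ns, speed_in_per_ns=FR4_SPEED_IN_PER_NS):
    """Distance a signal covers in one clock cycle, in inches."""
    return speed_in_per_ns * cycle_ns


def max_cpu_side_in(cycle_ns):
    """Upper limit on CPU side length using the text's rule of thumb.

    Like the text, this ignores gate delays and clock skew.
    """
    return SIZE_RULE_IN_PER_NS * cycle_ns


for name, cycle_ns in (("3 nsec clock", 3.0), ("4 nsec clock", 4.0)):
    feet = max_cpu_side_in(cycle_ns) / 12.0
    print(f"{name}: at most {feet:.0f} ft on a side")
```

With a 3 nsec cycle the rule gives three feet, and with a 4 nsec cycle four feet, matching the limits the text cites for the fastest super computers.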

[Figure 3-31 plot: longest distance of the CPU (1ft to 1,000ft) versus year, 1945 to 1995, with ILLIAC IV and NEC SX-X marked.]

Source: ICE, "Roadmaps of Packaging Technology" 22345

Figure 3-31. CPU Size Trends of Computer Systems

In the fastest super computers, the finite speed of light is a driving force in the CPU topology. Two approaches are used. In the case of IBM, DEC, Fujitsu and NEC, the CPU fits on one or two very large boards. In the case of the NEC SX-X, with a 3nsec clock, the CPU board must be less than three feet on a side.

The computers in Cray Research's XMP and YMP families use three-dimensional packaging to make up in gates per cubic inch what they lack in gates per square inch. Rather than one very large board, the Cray CPU is contained on many very small boards, which are stacked close together and interconnected with controlled length discrete wiring. For the Cray 2, with a clock cycle time of 4nsec, all gates in the CPU are contained within a cylinder less than four feet in diameter, because of speed of light limitations.


An alternative path is to use partitioned clocks for different regions of the CPU. The on-chip clock rate is higher than the module rate, which is higher than the board rate. All the clocks are phase-locked together so there is still global synchronization. The importance of synchronization grows as the degree of concurrency increases, which is the main reason the CPU gate count is so high. This is a very complex design to implement.

Two very different strategies have emerged to enable the highest performance super computers. At one extreme is NEC's approach: build the fastest and most powerful CPU, and duplicate it as needed. Its SX-X chip has a clock frequency of 330MHz. The single CPU unit has a rating of 5.5GigaFLOPS or roughly 10,000MIPS. It contains about two million logic gates plus memory.

At the other extreme is the Connection Machine from Thinking Machines. It has 65,536 one-bit processors, packaged 16 to a chip. It is rated at 7,000MIPS with a clock frequency of about 1MHz. It has about five million gates in its CPU. However, these machines are optimized for very different types of tasks. While the NEC super computer is designed for general purpose, floating point intensive, scientific number crunching, the Thinking Machines' super computer is designed for text search, integer manipulation, and simultaneous operations. For the Connection Machine, the packaging driver is not clock frequency but rather providing interconnection to the 65,536 processors.
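To make the contrast concrete, the quoted ratings can be reduced to instructions per clock cycle. This is back-of-the-envelope arithmetic on the figures given above, not a benchmark; the `Machine` class and its field names are made up for this illustration, and MIPS ratings for such different architectures are not strictly comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    mips: float        # quoted rating, millions of instructions per second
    clock_mhz: float   # quoted clock frequency in MHz
    processors: int    # number of processing elements

    def instructions_per_cycle(self) -> float:
        """Aggregate instructions completed per clock cycle."""
        return self.mips / self.clock_mhz

    def instructions_per_cycle_per_processor(self) -> float:
        """Instructions per clock cycle for a single processing element."""
        return self.instructions_per_cycle() / self.processors


# Figures as quoted in the text (the Connection Machine clock is "about 1MHz").
sx_x = Machine("NEC SX-X (single CPU)", mips=10_000, clock_mhz=330, processors=1)
cm = Machine("Connection Machine", mips=7_000, clock_mhz=1, processors=65_536)

# 65,536 one-bit processors packaged 16 to a chip.
cm_chips = cm.processors // 16

if __name__ == "__main__":
    for m in (sx_x, cm):
        print(f"{m.name}: {m.instructions_per_cycle():.1f} instructions/cycle, "
              f"{m.instructions_per_cycle_per_processor():.3f} per processor")
    print(f"Connection Machine processor chips: {cm_chips}")
```

On these numbers the Connection Machine reaches a comparable rating through parallelism rather than clock speed, which is consistent with interconnection among its 4,096 processor chips, not clock frequency, dominating its packaging.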

In summary, the drive for higher information processing capability pushes the gate count and the intrinsic device switching speed. To minimize the detrimental impact on performance, the package should:

1. Allow the highest clock speed possible.
2. Allow as many gates as possible in the CPU.
3. Allow the chips to be as close together as possible.
4. Allow the memory to be as close as possible to the CPU.
5. Allow high-speed, wide data paths between logic and memory.
6. Keep costs low.
7. Not slow down the time to market.

Value To The Customer

As has been pointed out, there are five generic driving forces on electronic systems in the expanding market for computers, communications and consumer products: faster, denser, cheaper, lower power, NOW.


Performance is not always the attribute most highly valued by the customer. For example, in a pacemaker, most of the value lies in the reliability and physical size of the product. A pacemaker is still a highly sophisticated product that uses state-of-the-art technology. However, it does not push the technology envelope of number crunching farther out.

Performance is never the only design requirement for a system. It must always be balanced with other aspects to arrive at an acceptable product.

In general, the system design is based on balancing:

• value to the customer
• cost to develop
• cost to manufacture
• time to market
• technical risk
• business risk

This is diagrammed in Figure 3-32.

[Figure: diagram relating the six factors: value to the customer, cost to develop, cost to manufacture, technical risk, business risk, and time to market.]

Source: ICE, "Roadmaps of Packaging Technology" 15788

Figure 3-32. Product Design Tradeoffs

Product reliability is also an extremely important issue. In situations where life can be at stake, such as aircraft, life support, and communications, ultrahigh reliability is required and is a value-added feature. In general, though, reliability acts as a penalty: it adds no value if present, but detracts from the value if absent.


There are many right answers that result in products with equally high performance but that are arrived at using radically different design approaches. These paths reflect the different strengths in the infrastructure within each company.

Though it is tempting to look at performance as the design rationale for every system, care must be taken. Assuming that every design was implemented because it would offer the highest performance leads to false interpretations, and the wrong lessons can be learned from case studies. This is demonstrated in Figure 3-33. It is always a good policy to keep in mind what value means to each particular customer, as was emphasized in Figure 3-1.

Reproduced with special permission of Jerry Workman/ ICE, "Roadmaps of Packaging Technology" 16066

Figure 3-33.
