The PowerPC 604 RISC Microprocessor


The PowerPC 604 RISC microprocessor uses out-of-order and speculative execution techniques to extract instruction-level parallelism. Its nonblocking execution pipelines, fast branch misprediction recovery, and decoupled memory queues support speculative execution.

S. Peter Song, Marvin Denman, Joe Chang
Somerset Design Center
IEEE Micro, October 1994. 0740-7475/94/$04.00 © 1994 IEEE

The 604 microprocessor is the third member of the PowerPC family being developed jointly by Apple, IBM, and Motorola. Developed for use in desktop personal computers, workstations, and servers, this 32-bit implementation works with the software and bus in the PowerPC 601 and 603 microprocessors.1-3 While keeping the system interface compatible with the 601 microprocessor, we improved upon it by incorporating a phase-locked loop and an IEEE-Std 1149.1 boundary-scan (JTAG) interface on chip. In addition, an advanced machine organization delivers one and a half to two times the 601's integer performance.

Performance strategy

Processor performance depends on three factors: the number of instructions in a task, the number of cycles a processor takes to complete the task, and the processor's frequency.4,5 Our architecture, which we optimized to produce compact code while adhering to the reduced instruction set computer (RISC) philosophy, addresses the first factor. The high instruction execution rate and clock frequency address the other two factors. The 604 provides deep pipelines, multiple execution units, register renaming, branch prediction, speculative execution, and serialization.

Six-stage superscalar pipeline. As shown in Figure 1, this deep pipeline enables the 604 to achieve its 100-MHz design. The stages are:

Fetch. This stage translates an instruction fetch address and accesses the cache for up to four instructions.

Decode. Instruction decoding determines needed resources, such as source operands and an execution unit.

Dispatch. When the resources are available, dispatch sends instructions to a reservation station in the appropriate execution unit. A reservation station permits an instruction to be dispatched before all of its operands are available.6 As they become available, the reservation station forwards operands to the execution units. Dispatch can send up to four instructions in program order (in-order dispatch) to four of six execution units: two single-cycle integer units, a multicycle integer unit, a load/store unit, a floating-point unit, and a branch unit.

Issue/execute. In each execution unit, this stage issues one instruction from its reservation station and executes it to produce results. The instructions can execute out of program order (out-of-order execution) across the six execution units as well as within an execution unit that has an out-of-order-issue reservation station. Table 1 lists the latency and throughput of the execution stages.

Completion. An instruction is said to be finished when it passes the execute stage. A finished instruction can be completed 1) if it does not cause an exception condition and 2) when all instructions that appear earlier in program order complete. This is known as in-order completion.

Write back. This stage writes the results of completed instructions to the architectural state (or the state that is visible to the programmer). Bypass logic permits most instructions to complete and write back in one cycle.

Figure 1. Pipeline description. (The figure charts the stages each instruction class flows through: branch instructions pass through fetch, decode, predict, validate, and complete; integer instructions through fetch, decode, dispatch, execute, complete, and write back; load/store instructions through fetch, decode, dispatch, cache, align, complete, and write back; floating-point instructions through fetch, decode, dispatch, multiply, add, round/normalize, complete, and write back.)

Table 1. 604 execution timings.

  Instruction                              Latency   Throughput
  Most integer                                1          1
  Integer multiply (32x32)                    4          2
  Integer multiply (others)                   3          1
  Integer divide                             20         19
  Integer load                                2          1
  Floating-point load                         3          1
  Store                                       3          1
  Floating-point multiply-add                 3          1
  Single-precision floating-point divide     18         18
  Double-precision floating-point divide     31         31

Although some designs use even deeper pipelines to achieve higher clock frequencies than the 604 does, we felt that such a design point does not suit today's personal computers. It relies too heavily on one of, or a combination of, a very large on-chip cache, a wide data bus, or a fast memory system to deliver its performance. It would be less than competitive in today's cost-sensitive personal computer market.

Precise interrupts and register renaming. Most programmers expect a pipelined processor to behave as a nonpipelined processor, in which one instruction goes through the fetch to write-back stages before the next one begins. A processor meets that expectation if it supports precise interrupts, in which it stops at the first instruction that should not be processed. When it stops (to process an interrupt), the processor's state reflects the results of executing all instructions prior to the interrupt-causing instruction and none of the later instructions, including the interrupt-causing instruction. This is not a trivial problem to solve in multiple, out-of-order execution pipelines. An earlier instruction executing after a later instruction can change the processor's state to make later instruction processing illegal. Sohi gives a general overview of the design issues and solutions.7

The 604 uses a variant of the reorder buffer described by Smith and Pleszkun to implement precise interrupts.8 The 16-entry reorder buffer keeps track of instruction order as well as the state of the instructions. The dispatch stage assigns each instruction a reorder buffer entry as it is dispatched. When the instruction finishes execution, the execution unit records the instruction's execution status in the assigned reorder buffer entry. Since the reorder buffer is managed as a first-in/first-out queue, its examining order matches the instruction flow sequence. To enforce in-order completion, all prior instructions in the reorder buffer must complete before an instruction can be considered for completion. The reorder buffer examines four entries every cycle to allow the completion of up to four instructions per cycle.

Unlike Smith and Pleszkun's reorder buffer, the 604's reorder buffer does not store instruction results. Temporary buffers hold them until the instructions that generated them complete. At that time, the write-back stage copies the results to the architectural registers. The 604 renames registers to achieve this; instead of writing results directly to specified registers, they are written to rename buffers and later copied to specified registers. Since instructions can execute out of order, their results can also be produced and written out of order into the rename buffers. The results are, however, copied from the buffers to the specified registers in program order. Register renaming minimizes architectural resource dependencies, namely the output dependency (or write-after-write hazard) and antidependency (or write-after-read hazard), that would otherwise limit opportunities for out-of-order execution.9

Figure 2 (next page) depicts the format of a rename buffer entry. The 604 contains a 12-entry rename buffer for the general-purpose registers (GPRs) that are used for 32-bit integer operations. The 604 allocates a GPR rename buffer entry upon dispatch of an instruction that modifies a GPR. The dispatch stage writes a destination register number of the instruction to the Reg num field, sets a Rename valid bit, and clears the Result valid bit. When the instruction executes, the execution unit writes its result to the Result field and sets the Result valid bit. After the instruction completes, the write-back stage copies its result from the rename buffer entry to the GPR specified by the Reg num field, freeing the entry for reallocation. For a load-with-update instruction that modifies two GPRs, one for load data and another for address, the 604 allocates two rename buffer entries.

Figure 2. Rename buffer entry format.

Register renaming complicates the process of locating the source operands for an instruction since they can also reside in rename buffers. In dispatching

The 604's speculative execution strategy complements its branch prediction mechanisms. The strategy is to fetch and execute beyond two unresolved branch instructions. The results of these speculatively executed instructions reside in rename buffers and in other temporary registers. If the prediction is correct, the write-back stage copies the results of speculatively executed instructions to the specified registers after the instructions complete.

Upon detection of a branch misprediction, the 604 takes quick action to recover in one cycle. It selectively cancels the instructions that belong in the mispredicted path from the reservation stations, execution units, and memory queues. It also discards their results from the temporary buffers. In addition, the processor resumes its previous state to start executing from the correct path even before the mispredicted branch and its earlier instructions have completed. Since the 604 detects a branch misprediction many cycles
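The three performance factors discussed above combine into the familiar execution-time product: time = instruction count x cycles per instruction x cycle time. A minimal sketch with hypothetical workload numbers (only the 100-MHz clock comes from the text; the instruction count and CPI values are illustrative assumptions):

```python
# Iron-law sketch of the three performance factors. The instruction
# count and CPI values below are illustrative assumptions, not 604
# measurements; only the 100-MHz clock rate comes from the article.

def cpu_time_seconds(instruction_count, cpi, frequency_hz):
    # time = IC x CPI x Tc, with cycle time Tc = 1 / frequency
    return instruction_count * cpi / frequency_hz

# A 50M-instruction task on a 100-MHz part:
t_scalar = cpu_time_seconds(50_000_000, 1.5, 100_000_000)       # 0.75 s
# Superscalar execution that halves CPI halves runtime at the same clock:
t_superscalar = cpu_time_seconds(50_000_000, 0.75, 100_000_000)  # 0.375 s
```

Compact code attacks the first factor, the superscalar organization the second, and the deep pipeline the third.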
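The reservation-station behavior described in the Dispatch and Issue/execute stages can be sketched as a small software model. This is an illustrative sketch, not the 604's hardware; the names and data structures are assumptions:

```python
# Illustrative model of a reservation station: an instruction may be
# dispatched before all of its source operands are available; results
# forwarded by finishing instructions wake up waiting entries, and a
# ready entry issues, possibly out of program order.

class ReservationStation:
    def __init__(self):
        self.waiting = []  # entries kept in dispatch (program) order

    def dispatch(self, op, dest, sources, regfile):
        # Capture operands that are already available; None marks a
        # source still being produced by an in-flight instruction.
        operands = {src: regfile.get(src) for src in sources}
        self.waiting.append((op, dest, operands))

    def forward(self, reg, value):
        # A finishing instruction broadcasts its result to waiting entries.
        for _, _, operands in self.waiting:
            if reg in operands and operands[reg] is None:
                operands[reg] = value

    def issue(self):
        # Issue the oldest entry whose operands are all ready -- which is
        # out-of-order issue whenever an older entry is still stalled.
        for entry in self.waiting:
            if all(v is not None for v in entry[2].values()):
                self.waiting.remove(entry)
                return entry
        return None
```

For example, an add whose second operand is still in flight waits in the station until `forward` supplies the value, and only then issues.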
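The completion mechanism above — a 16-entry FIFO reorder buffer that tracks order and status but, unlike Smith and Pleszkun's design, holds no results — can be sketched as follows. The field names are assumptions for illustration:

```python
from collections import deque

# Sketch of the 604's completion mechanism: a 16-entry first-in/first-out
# reorder buffer that tracks instruction order and execution status and
# retires up to four instructions per cycle, strictly in program order.

class ReorderBuffer:
    def __init__(self, size=16, width=4):
        self.size = size        # 16 entries in the 604
        self.width = width      # up to 4 completions per cycle
        self.entries = deque()  # oldest (earliest in program order) first

    def dispatch(self, tag):
        if len(self.entries) == self.size:
            return False  # buffer full: dispatch must stall
        self.entries.append({"tag": tag, "finished": False, "exception": False})
        return True

    def finish(self, tag, exception=False):
        # The execution unit records status in the assigned entry.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["finished"] = True
                entry["exception"] = exception
                return

    def complete(self):
        # Examine the oldest entries each cycle; retire finished,
        # exception-free instructions in program order only.
        retired = []
        while (self.entries and len(retired) < self.width
               and self.entries[0]["finished"]
               and not self.entries[0]["exception"]):
            retired.append(self.entries.popleft()["tag"])
        return retired
```

Even if a later instruction finishes first, it cannot complete until every earlier instruction has — which is exactly what makes interrupts precise.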
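The rename-buffer lifecycle — allocate at dispatch, fill at execute, drain at write back — can be sketched with the Figure 2 fields (Reg num, Rename valid, Result valid, Result). The Python structure is an illustrative assumption, not a hardware description:

```python
# Sketch of the 604's 12-entry GPR rename buffer. Each entry mirrors
# the fields of Figure 2: a Reg num, a Rename valid bit, a Result
# valid bit, and a Result field.

class RenameBuffers:
    def __init__(self, n_entries=12, n_gprs=32):
        self.gprs = [0] * n_gprs  # architectural registers
        self.entries = [self._free_entry() for _ in range(n_entries)]

    @staticmethod
    def _free_entry():
        return {"reg": None, "rename_valid": False,
                "result_valid": False, "result": None}

    def allocate(self, reg):
        # Dispatch: record the destination register number, set Rename
        # valid, clear Result valid. Returns None when no entry is free.
        for i, entry in enumerate(self.entries):
            if not entry["rename_valid"]:
                entry.update(reg=reg, rename_valid=True,
                             result_valid=False, result=None)
                return i
        return None

    def execute(self, idx, value):
        # Execution unit writes the result and sets Result valid.
        self.entries[idx].update(result=value, result_valid=True)

    def write_back(self, idx):
        # Completion: copy the result to the GPR named by Reg num,
        # then free the entry for reallocation.
        entry = self.entries[idx]
        self.gprs[entry["reg"]] = entry["result"]
        self.entries[idx] = self._free_entry()
```

A load-with-update instruction would call `allocate` twice, once for the load-data register and once for the updated address register, matching the two entries the 604 reserves.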
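The selective cancellation described for misprediction recovery can be modeled with a tagging scheme. The tags are an assumed mechanism for illustration — the excerpt does not specify how the 604 identifies mispredicted-path instructions:

```python
# Illustrative model of selective cancellation on a branch misprediction.
# Each speculatively dispatched instruction carries the identity of the
# unresolved branch it depends on, so a misprediction discards only the
# wrong path's work while everything else keeps running.

class SpeculativeMachine:
    def __init__(self):
        self.in_flight = []  # (branch_tag, instruction); tag None = non-speculative

    def dispatch(self, instr, branch_tag=None):
        self.in_flight.append((branch_tag, instr))

    def resolve(self, branch_tag, mispredicted):
        if mispredicted:
            # Cancel mispredicted-path instructions from the queues and
            # discard their results from the temporary buffers.
            self.in_flight = [(tag, instr) for tag, instr in self.in_flight
                              if tag != branch_tag]
        else:
            # Correct prediction: the instructions become non-speculative
            # and their results can be written back at completion.
            self.in_flight = [(None if tag == branch_tag else tag, instr)
                              for tag, instr in self.in_flight]
```

Because results of speculative instructions sit only in rename buffers and other temporary registers, cancellation never has to undo the architectural state.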