Intel Itanium 2 Processor Architecture


Intel® Itanium® 2 Processor Architecture
www.intel.com/software/college

Intel, Itanium, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names may be claimed as the property of others. Copyright © 2004 Intel Corporation. All rights reserved.

Itanium® Processor Architecture: Selected Key Features

- 64-bit addressing, flat memory model
- Instruction-level parallelism (6-way)
- Large register files
- Automatic register stack engine
- Predication
- Software-pipelining support (register rotation + loop-control hardware)
- Sophisticated branch architecture
- Control and data speculation
- Powerful 64-bit integer architecture
- Advanced 82-bit floating-point architecture

Traditional Architectures: Limited Parallelism

Original source code is compiled into sequential machine code; the hardware must then parallelize that code across its multiple functional units at run time. Execution units are available but used inefficiently: today's processors are often 60% idle.

Intel® Itanium® Architecture: Explicit Parallelism

With the Itanium architecture, the compiler produces parallel machine code directly for the hardware's multiple functional units. Because the compiler views a wider scope of the program than run-time hardware can, it makes more efficient use of execution resources and increases parallel execution.
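The compiler-driven parallelism described above can be illustrated with a toy scheduler (a hypothetical Python sketch, not Itanium's actual algorithm): it walks a straight-line instruction sequence and splits it into groups containing no read-after-write (RAW) or write-after-write (WAW) dependencies, so every instruction in a group could issue in parallel.

```python
# Toy illustration (not Itanium's real scheduler): split a sequence of
# instructions into groups that contain no RAW or WAW dependencies, so
# every instruction within a group could issue in parallel.

def split_into_groups(instrs):
    """instrs: list of (dest, [sources]) tuples, in program order."""
    groups, current, written = [], [], set()
    for dest, srcs in instrs:
        raw = any(s in written for s in srcs)   # read-after-write
        waw = dest in written                   # write-after-write
        if raw or waw:                          # dependency: start a new group
            groups.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups

code = [
    ("r1", ["r8"]),        # ld  r1 = [r8]
    ("r2", ["r9"]),        # ld  r2 = [r9]   -- independent of the load above
    ("r3", ["r1", "r2"]),  # add r3 = r1, r2 -- RAW on r1 and r2
    ("r4", ["r3"]),        # add r4 = r3, 1  -- RAW on r3
]
groups = split_into_groups(code)
# The two independent loads share one group; each dependent add gets its own.
```

A real compiler must also respect issue width and functional-unit limits; this sketch only models the dependency rule.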
EPIC Instruction Parallelism

Source code is compiled into instruction groups (series of bundles) that contain no RAW or WAW dependencies; the instructions in a group are issued in parallel, depending on available resources. Each bundle holds 3 instructions plus a template: 3 x 41 bits + 5 bits = 128 bits. Up to 6 instructions can be executed per clock.

Instruction-Level Parallelism

- Instruction groups
  - No RAW or WAW dependencies
  - Delimited by 'stops' (;;) in assembly code
  - Instructions in a group are issued in parallel, depending on available resources

        instr 1      // 1st group
        instr 2 ;;   // 1st group
        instr 3      // 2nd group
        instr 4      // 2nd group

- Instruction bundles
  - 3 instructions and 1 template in a 128-bit bundle
  - Instruction dependencies are indicated by using 'stops'
  - Instruction groups can span multiple bundles

        { .mii
          ld4 r28=[r8]    // load
          add r9=2,r1     // int op.
          add r30=1,r1    // int op.
        }

A 128-bit bundle is laid out as instruction 2 (41 bits), instruction 1 (41 bits), instruction 0 (41 bits), and the template (5 bits). The template names the instruction types — for example Memory (M), Memory (M), Integer (I) for the MMI template — giving a flexible issue capability.

Large Register Set

- General registers GR0–GR127: 64 bits plus a NaT bit; GR0 reads as 0; 32 static (GR0–GR31) and 96 stacked (GR32–GR127)
- Floating-point registers FR0–FR127: 82 bits; FR0 reads as +0.0 and FR1 as +1.0; 32 static and 96 rotating
- Predicate registers PR0–PR63: PR0 is always 1; 16 static and 48 rotating
- Branch registers BR0–BR7: 64 bits
- Application registers AR0–AR127: 64 bits
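The bundle arithmetic (3 x 41 + 5 = 128) can be checked with a small packing sketch. The field positions used here (template in bits 0–4, slot 0 in bits 5–45, slot 1 in bits 46–86, slot 2 in bits 87–127) follow the IA-64 bundle layout; the slot values are arbitrary placeholders, not real instruction encodings.

```python
# Pack three 41-bit instruction slots and a 5-bit template into one
# 128-bit Itanium bundle. Field positions follow the IA-64 layout:
# template in bits 0-4, slot 0 in bits 5-45, slot 1 in bits 46-86,
# slot 2 in bits 87-127. Slot values below are placeholders only.

SLOT_BITS, TEMPLATE_BITS = 41, 5

def pack_bundle(template, slot0, slot1, slot2):
    assert template < (1 << TEMPLATE_BITS)
    assert all(s < (1 << SLOT_BITS) for s in (slot0, slot1, slot2))
    return (template
            | slot0 << TEMPLATE_BITS
            | slot1 << (TEMPLATE_BITS + SLOT_BITS)
            | slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS))

bundle = pack_bundle(0x00, 0x123, 0x456, 0x789)
# The whole bundle fits in 3 * 41 + 5 = 128 bits.
assert bundle.bit_length() <= 3 * SLOT_BITS + TEMPLATE_BITS
```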
Predication

- Predicate registers activate or deactivate instructions.
- Predicate registers are set by compare instructions.
  - Example: cmp.eq p1, p2 = r2, r3
- (Almost) all instructions can be predicated:

        (p1) ldfd f32=[r32],8
        (p2) fmpy.d f36=f6,f36

- Predication:
  - eliminates branching in if/else logic blocks
  - creates larger code blocks for optimization
  - simplifies start-up/shutdown of pipelined loops

Predication Code Example: Absolute Difference of Two Numbers

C code:

    if (r2 >= r3)
        r4 = r2 - r3;
    else
        r4 = r3 - r2;

Non-predicated pseudo-code:

    cmpGE r2, r3
    jump_zero P2
    P1:  sub r4 = r2, r3
         jump end
    P2:  sub r4 = r3, r2
    end: ...

Predicated assembly code:

    cmp.ge p1,p2 = r2,r3 ;;
    (p1) sub r4 = r2,r3
    (p2) sub r4 = r3,r2

Predication removes branches and enables parallel execution.

Register Stack

- General registers (GRs) 0–31 are global to all procedures.
- Stacked registers begin at GR32 and are local to each procedure.
- Each procedure's register stack frame varies from 0 to 96 registers.
- Only GRs implement a register stack; the FRs, PRs, and BRs are global to all procedures.
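The predicated absolute-difference sequence shown earlier can be modeled in Python: a compare sets two complementary predicates, and both subtractions are present in the instruction stream, each committing its result only when its predicate is true (a conceptual model, not a cycle-accurate one).

```python
# Model of the predicated absolute-difference sequence from the slide:
# the compare sets complementary predicates p1/p2, and both subtractions
# "execute", each committing its result only when its predicate is true.

def predicated_abs_diff(r2, r3):
    p1 = r2 >= r3          # cmp.ge p1, p2 = r2, r3
    p2 = not p1
    r4 = None
    if p1:                 # (p1) sub r4 = r2, r3
        r4 = r2 - r3
    if p2:                 # (p2) sub r4 = r3, r2
        r4 = r3 - r2
    return r4
```

Because p1 and p2 are complementary, both predicated subtractions can sit in the same instruction group on real hardware: only one of them ever commits.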
A register stack engine (RSE) completes the picture: upon stack overflow or underflow, a backing store transparently saves or restores the registers. The register stack optimizes the call/return mechanism.

Register Stack: Call, Alloc, and Return

- Call changes the frame to contain only the caller's output.
- Alloc sets the frame region to the desired size.
  - There are three architecture parameters: local, output, and rotating.
- Return restores the stack frame of the caller.

The caller's output registers overlap the callee's input registers, avoiding register spill/fill upon procedure call/return.

Alloc Semantics

    alloc r1 = ar.pfs, i, l, o, r

where i, l, o, and r are the sizes of the input, local, output, and rotating regions, and r1 receives a copy of AR.PFS:

- A new stack frame of size (i + l + o) is allocated on the general register stack.
- The previous function state (PFS) register is copied to the register specified by r1.
- This instruction may also resize the current stack frame.

Application Programming Model: Register Stack

(The original is an animated slide tracing the current frame marker (CFM) and previous frame marker (PFM) through a call: procedure A starts with inputs, locals, and outputs in GR32–GR52; the call leaves only A's outputs visible to B, renamed to start at GR32; B's alloc grows the frame to its own input, local, and output regions; and the return restores A's frame.)

Application Programming Model: Register Stack Engine

The stacked registers are mapped to a backing store in memory. The current procedure's frame occupies the top of the register stack; below it sit dirty registers, belonging to callers but not yet saved, and clean registers, whose contents have already been copied to the backing store. BSP and BSPSTORE point into the backing store, which holds the frames of older ancestors; register numbers and backing-store memory addresses grow in the same direction.
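The call/alloc/return renaming traced in the animated slide can be modeled with a toy register-stack class (a simplified sketch: it tracks only frame sizes and ignores the RSE and rotation). The sizes below mirror the slide's example: procedure A allocates 14 locals and 7 outputs, and procedure B allocates 7 inputs, 8 locals, and 4 outputs.

```python
# Toy model of the register stack (simplified: no RSE spills, no rotation).
# A call renames the caller's outputs to start at GR32 for the callee;
# alloc then grows the callee's frame; return restores the caller's frame.

class RegisterStack:
    def __init__(self):
        self.base = 0          # physical index of the current frame's GR32
        self.sol = 0           # size of locals (inputs + locals)
        self.sof = 0           # size of frame (locals + outputs)
        self.saved = []        # stand-in for AR.PFS state saved on call

    def alloc(self, inputs, locals_, outputs):
        self.sol = inputs + locals_
        self.sof = self.sol + outputs

    def call(self):
        self.saved.append((self.base, self.sol, self.sof))
        self.base += self.sol  # callee's GR32 = caller's first output
        self.sof -= self.sol   # callee initially sees only the outputs
        self.sol = 0

    def ret(self):
        self.base, self.sol, self.sof = self.saved.pop()

rs = RegisterStack()
rs.alloc(inputs=0, locals_=14, outputs=7)   # PROC A: frame of 21 registers
rs.call()                                    # A's 7 outputs become B's inputs
rs.alloc(inputs=7, locals_=8, outputs=4)     # PROC B: frame of 19 registers
rs.ret()                                     # back to A's 21-register frame
```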
Register Rotation

- Example: 8 general registers rotating.
- Floating-point and predicate registers also rotate.
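Register rotation can be sketched as a register file indexed through a rotating register base (rrb) that is decremented at the end of each software-pipelined loop iteration, so a value written to r32 in one iteration is read back as r33 in the next. This is a conceptual model using the slide's 8 rotating registers; real hardware rotates GRs, FRs, and PRs as separate groups.

```python
# Sketch of register rotation: a rotating register base (rrb) is
# decremented at the end of each software-pipelined loop iteration, so
# the value a stage wrote to r32 appears to the next iteration as r33.
# Eight rotating registers, as in the slide's example.

N_ROT = 8

class RotatingFile:
    def __init__(self):
        self.phys = [None] * N_ROT
        self.rrb = 0

    def _index(self, logical):        # logical register r32 + logical
        return (logical + self.rrb) % N_ROT

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def read(self, logical):
        return self.phys[self._index(logical)]

    def rotate(self):                 # loop branch decrements rrb (mod N_ROT)
        self.rrb = (self.rrb - 1) % N_ROT

rf = RotatingFile()
rf.write(0, "iter0-result")          # write r32 in iteration 0
rf.rotate()
assert rf.read(1) == "iter0-result"  # same value now visible as r33
```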