Loi De Moore, Loi D'amdhal, Quel Futur? La Vision D'intel

Total Page:16

File Type:pdf, Size:1020Kb

Loi De Moore, Loi D'amdhal, Quel Futur? La Vision D'intel LOI DE MOORE, LOI D’AMDHAL, QUEL FUTUR? LA VISION D’INTEL Philippe Thierry SrSrStaff Engineer Intel Corp. Marc Dollfus Responsable HPC secteur public et recherche Intel France Idris --88 janvier 2009 1 Copyright © 2008, Intel Corporation Legal Disclaimers Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/resources/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104. Results were derived using simulations run on an architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual performance* All dates and products specified are for planning purposes only and are subject to change without notice Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported. SPEC, SPECint2000, SPECfp2000, SPECint2006, SPECfp2006, SPECjbb, are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor series, not across different processor sequences. See http://www.intel.com/products/processor_number for details. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. All dates and products specified are for planning purposes only and are subject to change without notice Intel, Intel Xeon, Intel Core microarchitecture, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. * Other names and brands may be claimed as the property of others. Copyright © 2008 Intel Corporation. 2 Copyright © 2008, Intel Corporation 1 Intel in High Performance Computing : the big picture Leading Process Manufacturing Leading performance, performance/watt Advanced HPC R&D Dedicated, Broad SW renowned tools expertise portfolio Large scale clusters for test & optimization Defined HPC Platform application building platform blocks A long term commitment to HPC 3 Copyright © 2008, Intel Corporation Agenda: from the micromicro--archarch to the end users 4 Copyright © 2008, Intel Corporation 2 Agenda: from the micromicro--archarch to the end users 5 Copyright © 2008, Intel Corporation Agenda: from the micromicro--archarch to the end users 6 Copyright © 2008, Intel Corporation 3 What G. Moore predicted ?? • 1971 Intel® 4004 : 4/8 bits, 108 kHz, 2300 transistors • 1972 Intel® 8008 : 8 bits, 200 kHz, 3500 transistors • 1974 Intel® 8080 : 8 bits, 2 Mhz, 4500 transistors • 1978 Intel® 8086-8088 : 16 bits, 5 MHz, • 1982 Intel® 80286 : 16/32 bits, plus de 100.000 transistors • 1985 Intel® 80386 : 32 bits, 275.000 transistors • 1989 Intel® 80486 : 32 bits, 25 à 66 MHz, 1,6 millions de transistors • 1993 Intel® Intel® Pentium® : 32 bits, 75 à 133 MHz, 3,1 millions de transistors • 1995 Intel® Pentium® Pro : 32 bits, 150 à 200 MHz, 5,5 millions de transistors • 1997 Intel® Intel® Pentium® II : 32 bits, 7,5 millions de transistors • 1999 Intel® Pentium® III : 32 bits, 450 à 600 MHz, 9,5 millions de transistors • 1999 Intel® Celeron® : 32 bits, 9,5 millions de transistors • 2000 Intel® Pentium® 4 : 32 bits, 1,5 GHz, 42 millions de transistors 7 Copyright © 2008, Intel Corporation Not a simple linear function !! 8 Copyright © 2008, Intel Corporation 4 FOR simple circuits, the cost per component is nearly inversely proportional to the number of components, the result of the equivalent piece of semiconductor in the equivalent package containing more components. But as components are added , decreased yields more than compensate for the increased complexity, tending to raise the cost per component. Thus, there is a minimum cost at any given time in the evolution of the technology. … If we look ahead five years, a plot of costs suggests that the minimum cost per component might be expected in circuits with about 1,000 components per circuit In 1970, the manufacturing cost per component can be expected to be onlya tenth of the present cost. The complexityfor minimum component costs has increased at a rate of roughly a factor of two per year . Certainlyover the short termthis rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer. Electronics, Volume 3838,, Number 88,, April 1919,, 1965 9 Copyright © 2008, Intel Corporation Conclusion intermédiaire 11 • La loi de Moore n’a jamais signifiée – plus de performance – plus de hpc, plus de //ism, plus de scalabilité Juste « il existe un tjs un bon rapport prix / nb de transitor sur un circuit !! » Implicitement , le nb de transistor augmente et les capacités du dit circuit aussi ;-) Mais la seule (?) loi qui domine en HPC reste la loi d’Amdhal et assimilée 10 Copyright © 2008, Intel Corporation 5 However … 11 Copyright © 2008, Intel Corporation Latest steps nb transitor freq (khz) 1000 3,50E+03 100 3,00E+03 10 2,50E+03 1 0,1 2,00E+03 0,01 1,50E+03 0,001 1,00E+03 0,0001 0,00001 5,00E+02 0,000001 0,00E+00 1971 1973 1975 1977 1979 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 2009 12 Copyright © 2008, Intel Corporation 6 2007 Latest steps 2005 2003 2000 45nm 300mm 65nm 300mm 9090nmnm Dual Core 1000 300300mmmm 3,50E+03 100 130130nmnm 200200mmmm 3,00E+03 10 2,50E+03 1 0,1 2,00E+03 0,01 1,50E+03 0,001 1,00E+03 0,0001 0,00001 5,00E+02 0,000001 0,00E+00 1971 1973 1975 1977 1979 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 2009 13 Copyright © 2008, Intel Corporation 45 nm HiHi--kk Intel Processor Quad core Intel ® Xeon ® 53xx (Clovertown) Quad core Intel ® Xeon ® 54xx (Harpertown) 65 nm 45 nm Hi-k 22** 22** 22** 22** 143 mm 143 mm 107 mmmm 107 mmmm 582582mm Transistors 820820mm Transistors 8 MB Cache 12 MB Cache 14 *Source: Intel Note: die picture sizes are approximate Copyright © 2008, Intel Corporation 7 Industry’s First 45 nm HighHigh--KK + Metal Gate TransistTransistoror Improved Transistor Density ~~22xx Improved Transistor Switching Speed >>2200%% Reduced Transistor Switching Power ~~3300%% Reduction in gate oxide leakage power >>1100xx 65 nnmm 45 nm 15 Copyright © 2008, Intel Corporation HighHigh--kk + metal gate transistors 16 Copyright © 2008, Intel Corporation 8 High-k + metal gate transistors « The implementation of high-k and metal gate materials marks the biggest change in transistor technology since the introduction of polysilicon gate MOS transistors in the late 1960s », G. Moore, 2007 17 Copyright © 2008, Intel Corporation Intel Driving Moore’s Law into the Future 90nm Node 2003 6655nmnm Node Node 2005 4455nmnm Node Node 2007 3322nmnm Node Node 2009 2222nmnm Node Node 2011 1166nmnm Node Node 2013 25 nnmm 1111nmnm Node Node 2015 88nmnm Node 1155nnmm 2017 77nnmm 55nnmm 3nm 18 Copyright © 2008, Intel Corporation 9 Tick --TockTock Intel ®® Sandy Core™ Penryn Nehalem Westmere Bridge NEW Compaction/ NEW Compaction/ NEW Microarchitecture Derivative Microarchitecture Derivative Microarchitecture 65nm 45nm 4545nmnm 3232nmnm 3232nmnm Tock Tick Tock Tick Tock Montecito/ Montvale Tukwila Poulson 90 nm NEW NEW Microarchitecture Microarchitecture Forecast Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United StateStates as andnd other countries. All 19 products, dates, and figures are preliminary and are subject to change without notice. Copyright © 2008, Intel Corporation Intel® HPC Roadmap 2008 2009 2010 2011 1H 2H 1H 2H 1H 2H 1H 2H Montvale (2C/4T) Tukwila New u-arch Critical Mission Tigerton (4C) Dunnington (6C) Nehalem-EX Caneland w/ Clarksboro chipset EXpandable Harpertown (4C) Derivative New u-arch Wolfdale-DP (2C) Nehalem-EP Efficient Bensley w/Blackford Tylersburg-EP Performance Stoakley w/Seaburg Nehalem-EP Derivative Harpertown (4C) Wolfdale-DP (2C) Tylersburg-EN New u-arch ENtry 20 Copyright © 2008, Intel Corporation 10 Nehalem--EPEP ••44 cores ••8M8M onon--chipchip Shared Cache • 3-level cache hierarchy • 32k I-Cache + 32k D-cache • New 256k L2 cache per core Nehalem-EP • New shared last level cache • Inclusive Cache Policy Core Core Core Core ••SimultaneousSimultaneous MultiMulti--ThreadingThreading capability (SMT) ••QuickPathQuickPath interconnect 8M Shared Cache • Point-to-Point Memory Link • 2 links per CPU socket Controller Controller • 1 for connection to other socket • 1 for connection to chipset • Up to 6.4 GT/sec (12.8 GB/sec) in 3 DDR3 Two each direction per link ( Fully duplex) channels QuickPath ••IntegratedIntegrated
Recommended publications
  • Introduction to the Poulson (Intel 9500 Series) Processor Openvms Advanced Technical Boot Camp 2015 Keith Parris / September 29, 2015
    Introduction to the Poulson (Intel 9500 Series) Processor OpenVMS Advanced Technical Boot Camp 2015 Keith Parris / September 29, 2015 © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Information on Poulson from Intel’s ISSCC Paper © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Poulson information from Intel’s ISSCC Paper http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/itanium-poulson-isscc-paper.pdf 3 © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Poulson information from Intel’s ISSCC Paper http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/itanium-poulson-isscc-paper.pdf 4 © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Poulson information from Intel’s ISSCC Paper http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/itanium-poulson-isscc-paper.pdf 5 © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Poulson information from Intel’s ISSCC Paper • Intel presented a paper on Poulson at the International Solid-State Chips Conference (ISSCC) in July 2011. From this, we learned: • Poulson would be in a 32 nm process (2 process generations ahead from Tukwila, which was at 65 nm, skipping the 45 nm process) • The socket would be compatible with Tukwila • Poulson would have 8 cores, of a brand new core design • The front end (instruction fetch) would be decoupled from the back end (instruction execution) • Poulson could execute and retire as many as 12 instructions per cycle, double Tukwila’s 6 instructions http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/itanium-poulson-isscc-paper.pdf 6 © Copyright 2015 Hewlett-Packard Development Company, L.P.
    [Show full text]
  • Evolution of Microprocessor Performance
    EvolutionEvolution ofof MicroprocessorMicroprocessor PerformancePerformance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static & dynamic branch predication. Even with these improvements, the restriction of issuing a single instruction per cycle still limits the ideal CPI = 1 Multiple Issue (CPI <1) Multi-cycle Pipelined T = I x CPI x C (single issue) Superscalar/VLIW/SMT Original (2002) Intel Predictions 1 GHz ? 15 GHz to ???? GHz IPC CPI > 10 1.1-10 0.5 - 1.1 .35 - .5 (?) Source: John P. Chen, Intel Labs We next examine the two approaches to achieve a CPI < 1 by issuing multiple instructions per cycle: 4th Edition: Chapter 2.6-2.8 (3rd Edition: Chapter 3.6, 3.7, 4.3 • Superscalar CPUs • Very Long Instruction Word (VLIW) CPUs. Single-issue Processor = Scalar Processor EECC551 - Shaaban Instructions Per Cycle (IPC) = 1/CPI EECC551 - Shaaban #1 lec # 6 Fall 2007 10-2-2007 ParallelismParallelism inin MicroprocessorMicroprocessor VLSIVLSI GenerationsGenerations Bit-level parallelism Instruction-level Thread-level (?) (TLP) 100,000,000 (ILP) Multiple micro-operations Superscalar /VLIW per cycle Simultaneous Single-issue CPI <1 u Multithreading SMT: (multi-cycle non-pipelined) Pipelined e.g. Intel’s Hyper-threading 10,000,000 CPI =1 u uuu u u Chip-Multiprocessors (CMPs) u Not Pipelined R10000 e.g IBM Power 4, 5 CPI >> 1 uuuuuuu u AMD Athlon64 X2 u uuuuu Intel Pentium D u uuuuuuuu u u 1,000,000 u uu uPentium u u uu i80386 u i80286
    [Show full text]
  • Multiprocessing Contents
    Multiprocessing Contents 1 Multiprocessing 1 1.1 Pre-history .............................................. 1 1.2 Key topics ............................................... 1 1.2.1 Processor symmetry ...................................... 1 1.2.2 Instruction and data streams ................................. 1 1.2.3 Processor coupling ...................................... 2 1.2.4 Multiprocessor Communication Architecture ......................... 2 1.3 Flynn’s taxonomy ........................................... 2 1.3.1 SISD multiprocessing ..................................... 2 1.3.2 SIMD multiprocessing .................................... 2 1.3.3 MISD multiprocessing .................................... 3 1.3.4 MIMD multiprocessing .................................... 3 1.4 See also ................................................ 3 1.5 References ............................................... 3 2 Computer multitasking 5 2.1 Multiprogramming .......................................... 5 2.2 Cooperative multitasking ....................................... 6 2.3 Preemptive multitasking ....................................... 6 2.4 Real time ............................................... 7 2.5 Multithreading ............................................ 7 2.6 Memory protection .......................................... 7 2.7 Memory swapping .......................................... 7 2.8 Programming ............................................. 7 2.9 See also ................................................ 8 2.10 References .............................................
    [Show full text]
  • A 65 Nm 2-Billion Transistor Quad-Core Itanium Processor
    18 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 44, NO. 1, JANUARY 2009 A 65 nm 2-Billion Transistor Quad-Core Itanium Processor Blaine Stackhouse, Sal Bhimji, Chris Bostak, Dave Bradley, Brian Cherkauer, Jayen Desai, Erin Francom, Mike Gowan, Paul Gronowski, Dan Krueger, Charles Morganti, and Steve Troyer Abstract—This paper describes an Itanium processor imple- mented in 65 nm process with 8 layers of Cu interconnect. The 21.5 mm by 32.5 mm die has 2.05B transistors. The processor has four dual-threaded cores, 30 MB of cache, and a system interface that operates at 2.4 GHz at 105 C. High speed serial interconnects allow for peak processor-to-processor bandwidth of 96 GB/s and peak memory bandwidth of 34 GB/s. Index Terms—65-nm process technology, circuit design, clock distribution, computer architecture, microprocessor, on-die cache, voltage domains. I. OVERVIEW Fig. 1. Die photo. HE next generation in the Intel Itanium processor family T code named Tukwila is described. The 21.5 mm by 32.5 mm die contains 2.05 billion transistors, making it the first two billion transistor microprocessor ever reported. Tukwila combines four ported Itanium cores with a new system interface and high speed serial interconnects to deliver greater than 2X performance relative to the Montecito and Montvale family of processors [1], [2]. Tukwila is manufactured in a 65 nm process with 8 layers of copper interconnect as shown in the die photo in Fig. 1. The Tukwila die is enclosed in a 66 mm 66 mm FR4 laminate package with 1248 total landed pins as shown in Fig.
    [Show full text]
  • Course #: CSI 440/540 High Perf Sci Comp I Fall ‘09
    High Performance Computing Course #: CSI 440/540 High Perf Sci Comp I Fall ‘09 Mark R. Gilder Email: [email protected] [email protected] CSI 440/540 This course investigates the latest trends in high-performance computing (HPC) evolution and examines key issues in developing algorithms capable of exploiting these architectures. Grading: Your grade in the course will be based on completion of assignments (40%), course project (35%), class presentation(15%), class participation (10%). Course Goals Understanding of the latest trends in HPC architecture evolution, Appreciation for the complexities in efficiently mapping algorithms onto HPC architectures, Familiarity with various program transformations in order to improve performance, Hands-on experience in design and implementation of algorithms for both shared & distributed memory parallel architectures using Pthreads, OpenMP and MPI. Experience in evaluating performance of parallel programs. Mark R. Gilder CSI 440/540 – SUNY Albany Fall '08 2 Grades 40% Homework assignments 35% Final project 15% Class presentations 10% Class participation Mark R. Gilder CSI 440/540 – SUNY Albany Fall '08 3 Homework Usually weekly w/some exceptions Must be turned in on time – no late homework assignments will be accepted All work must be your own - cheating will not be tolerated All references must be sited Assignments may consist of problems, programming, or a combination of both Mark R. Gilder CSI 440/540 – SUNY Albany Fall '08 4 Homework (continued) Detailed discussion of your results is expected – the program is only a small part of the problem Homework assignments will be posted on the class website along with all of the lecture notes ◦ http://www.cs.albany.edu/~gilder Mark R.
    [Show full text]
  • Poulson: an 8 Core 32 Nm Next Generation Intel* Itanium* Processor
    Poulson: An 8 Core 32 nm Next Generation Intel® Itanium® Processor Steve Undy Intel® 1 I NTEL CONFIDENTIAL This slide MUST be used with any slides removed from this presentation Legal Disclaimer • INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLI ED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR I MPLI ED WARRANTY, RELATING TO SALE AND/ OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTI CULAR PURPOSE, MERCHANTABI LI TY, OR I NFRI NGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RI GHT. I NTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. • Intel may make changes to specifications and product descriptions at any time, without notice. • Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. • Configurations: [describe config + what test used + who did testing]. For more information go to http://www.intel.com/performance • Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document.
    [Show full text]
  • Itanium Processor
    IA-64 Microarchitecture --- Itanium Processor By Subin kizhakkeveettil CS B-S5….no:83 INDEX • INTRODUCTION • ARCHITECTURE • MEMORY ARCH: • INSTRUCTION FORMAT • INSTRUCTION EXECUTION • PIPELINE STAGES: • FLOATING POINT PERFORMANCE • INTEGER PERFORMANCE • CONCLUSION • REFERENCES Itanium Processor • First implementation of IA-64 • Compiler based exploitation of ILP • Also has many features of superscalar INTRODUCTION • Itanium is the brand name for 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). • Itanium's architecture differs dramatically from the x86 architectures (and the x86-64 extensions) used in other Intel processors. • The architecture is based on explicit instruction-level parallelism, in which the compiler makes the decisions about which instructions to execute in parallel Memory architecture • From 2002 to 2006, Itanium 2 processors shared a common cache hierarchy. They had 16 KB of Level 1 instruction cache and 16 KB of Level 1 data cache. The L2 cache was unified (both instruction and data) and is 256 KB. The Level 3 cache was also unified and varied in size from 1.5 MB to 24 MB. The 256 KB L2 cache contains sufficient logic to handle semaphore operations without disturbing the main arithmetic logic unit (ALU). • Main memory is accessed through a bus to an off-chip chipset. The Itanium 2 bus was initially called the McKinley bus, but is now usually referred to as the Itanium bus. The speed of the bus has increased steadily with new processor releases. The bus transfers 2x128 bits per clock cycle, so the 200 MHz McKinley bus transferred 6.4 GB/sand the 533 MHz Montecito bus transfers 17.056 GB/s.
    [Show full text]
  • Intel Developer Forum Day 1 News Disclosures from Shanghai
    Intel Corporation 2200 Mission College Blvd. P.O. Box 58119 Santa Clara, CA 95052-8119 News Fact Sheet CONTACTS: Claudine Mangano George Alfs Connie Brown 408-765-0146 408-219-7588 503-791-2367 [email protected] [email protected] [email protected] INTEL DEVELOPER FORUM DAY 1 NEWS DISCLOSURES FROM SHANGHAI April 2, 2008: Intel Corporation is holding its Intel Developer Forum in Shanghai on April 2-3. Below are brief summaries of each executive’s Day 1 keynote speech and news highlights. Anand Chandrasekher, “The Heart of a New Generation” Intel Senior Vice President and General Manager, Ultra Mobility Group Describing how the Internet continues to grow and flourish globally, Chandrasekher said that personal mobility is a key catalyst driving people to embrace the Internet in new places and ways. Chandrasekher formally introduced five Intel® Atom™ processors (formerly codenamed “Silverthorne”) and Intel Centrino® Atom™ processor technology for Mobile Internet Devices (MIDs), outlined a series of innovations in the new platform and highlighted the growing momentum in the MID software and device ecosystem. Also covered were the benefits of the new Atom Microarchitecture, including performance features, Internet and software compatibility, and the drastic reductions that Intel has made in power and packaging to achieve this new technology platform. Chandrasekher also offered insight into the company’s future silicon roadmap. Intel Launches Centrino Atom Processor Technology – Chandrasekher introduced Intel Centrino Atom Processor Technology, the company’s first-generation low-power platform for MIDs designed to enable the best Internet experience in a device that fits in your pocket.
    [Show full text]
  • Intel® Processor Architecture
    Intel® Processor Architecture January 2013 Software & Services Group, Developer Products Division Copyright © 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Agenda •Overview Intel® processor architecture •Intel x86 ISA (instruction set architecture) •Micro-architecture of processor core •Uncore structure •Additional processor features – Hyper-threading – Turbo mode •Summary Software & Services Group, Developer Products Division Copyright © 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 2 Intel® Processor Segments Today Architecture Target ISA Specific Platforms Features Intel® phone, tablet, x86 up to optimized for ATOM™ netbook, low- SSSE-3, 32 low-power, in- Architecture power server and 64 bit order Intel® mainstream x86 up to flexible Core™ notebook, Intel® AVX, feature set Architecture desktop, server 32 and 64bit covering all needs Intel® high end server IA64, x86 by RAS, large Itanium® emulation address space Architecture Intel® MIC accelerator for x86 and +60 cores, Architecture HPC Intel® MIC optimized for Instruction Floating-Point Set performance Software & Services Group, Developer Products Division Copyright © 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 3 Itanium® 9500 (Poulson) New Itanium Processor Poulson Processor • Compatible with Itanium® 9300 processors (Tukwila) Core Core 32MB Core Shared Core • New micro-architecture with 8 Last Core Level Core Cores Cache Core Core • 54 MB on-die cache Memory Link • Improved RAS and power Controllers Controllers management capabilities • Doubles execution width from 6 to 12 instructions/cycle 4 Intel 4 Full + 2 Half Scalable Width Intel • 32nm process technology Memory QuickPath Interface Interconnect • Launched in November 2012 (SMI) Compatibility provides protection for today’s Itanium® investment Software & Services Group, Developer Products Division Copyright © 2013, Intel Corporation.
    [Show full text]
  • Itanium and Vnuma
    Virtualisation Case Study: Itanium and vNUMA Matthew Chapman [email protected] 1 ITANIUM ➜ High-performance processor architecture ➜ Also known as IA-64 ➜ Joint venture between HP and Intel ➜ Not the same instruction set as IA-32/AMD64/EM64T ➜ Very good floating-point performance 2000: Itanium“Merced” 180nm,upto800Mhz,4MBcache 2002: ItaniumII“McKinley” 180nm,upto1Ghz,3MBcache 2003: ItaniumII“Madison” 130nm,upto1.6Ghz,6MBcache 2004: Itanium II “Madison 9M” 130 nm, up to 1.67Ghz, 9MB cache 2006: Itanium II “Montecito” 90 nm, dual-core, up to 1.6Ghz, 24MB cache 2008: “Tukwila” 65nm,quad-core,upto2.5Ghz? ITANIUM 2 EXPLICITLY PARALLEL INSTRUCTION-SET COMPUTING (EPIC) ➜ Goal to increase instructions per cycle ➜ Itanium can have similar performance to x86 at a lower clock speed ➜ Based on Very Long Instruction Word (VLIW) ➜ Explicit parallelism in instruction set ➜ Simplified instruction decode and issue ➜ Scheduling decisions made by compiler Memory Branch Branch Branch Integer Integer Integer Integer MI;I bundle Integer Memory Memory Memory Memory Floating−point Floating−point Floating−point Floating−point VLIW Instruction MFI bundle Integer Execution Units Execution Units EPIC Instructions EXPLICITLY PARALLEL INSTRUCTION-SET COMPUTING (EPIC) 3 EXAMPLE Load and add three numbers in assembly code: ld8 r4 = [r1] ld8 r5 = [r2] ld8 r6 = [r3] ;; add r7 = r4, r5 ;; add r8 = r6, r7 ;; EXAMPLE 4 Resulting instructions: MMI ld8 r4=[r1] ld8 r5=[r2] nop.i 0 M;MI; ld8 r6 = [r3] ;; add r7=r4,r5 // Arithmetic on M too nop.i 0 ;; M;MI add r8 = r6, r7
    [Show full text]
  • Chapter 29 Itanium Architecture
    Chapter 29 Itanium Architecture Chapter 29 - Itanium Architecture July 2003 Chapter 29 / Page 1 Chapter 29 Itanium Architecture INDEX Introduction 3 Overview of the Itanium Architecture 3 Main features of EPIC....................................................................................................... 3 The Itanium processor family (IPF) 5 HP-UX Itanium Releases 6 Differences between HP-UX for PA and IPF ................................................................... 6 What needs to be considered on an IPF system 8 How an IPF system boots.................................................................................................. 8 Boot disk layout and EFI................................................................................................... 8 How to configure the Console input/output devices....................................................... 10 The boot process ............................................................................................................. 13 How to mirror the root disk............................................................................................. 18 How to replace a failed disk............................................................................................ 21 The ARIES emulator 22 IPF vs. PA-RISC Terminology 22 Additional Information 23 July 2003 Chapter 29 / Page 2 Chapter 29 Itanium Architecture Introduction When PA-RISC was released, HP began designing the architecture to replace it. Several years into the project, HP determined that the economics
    [Show full text]
  • Impact of the New Generation of X86 on the Server Market
    Impact of the New Generation of x86 on the Server Market Gartner RAS Core Research Note G002012320, George J. Weiss, Jeffrey Hewitt, 3 June 2010, R3598 07042011 Intel and OEMs have driven performance and reliability of the Intel Xeon 7500 processor series (code-named Nehalem-EX) to a level overlapping what Gartner judges to be about 80% of the RISC/Itanium function/feature set, but at a much lower price point. The result will cause increased server technology consolidation toward a bipolar distribution: x86-based platforms and one primary market-share-leading RISC technology (IBM Power), complementing the prevailing legacy of mainframes. Key Findings • A bipolar technology distribution of one strong RISC (Power) and x86 will drive at least 70% to 80% of the market’s total revenue, while other technologies subsist with minor single-digit revenue share; beyond this decade, a unipolar market will exist in which x86 technology becomes the standard, predominant server technology. • Unix has receded by $4 billion in the past three years. Migrations from Unix to Linux or Windows continue, according to Gartner client discussions. • Users continue to optimize costs (such as through virtualization, lower hardware costs, open-source software and cloud computing) primarily on x86. • Of the three non-x86 technologies, IBM’s has had a recent track record of sustained market share growth. Recommendations • Calculate an approximate x86 total cost of ownership (TCO) equivalence or line item cost analysis for comparison as if these applications were to be hosted on x86 for similar performance/availability, when considering Unix upgrades. • Make the comparison to an approximately equivalent x86 configuration when evaluating price quotes.
    [Show full text]