AMD Presentation for Linux Kernel Summit

Richard A. Brunner, AMD Fellow
July 2005

Progress Report
Linux is moving into the mainstream with AMD64 technology
• AMD64+Linux is penetrating deeper into the Data Center
  – Demanding mainframe/UNIX functionality: 64-bit, NUMA, multi-core, and virtualization
  – Requiring solutions to infrastructure issues: more power management, security, and manageability
  – Requesting innovation without disruption: evolution as opposed to revolution, with a need to maintain compatibility and stability
  – Using servers and workstations as the proving ground: Linux must do well in these areas before they move to Linux on the desktop
• AMD continues the trend of openly providing early technical information on our products to the developer community for feedback.

Page 2 July 2005 Linux Kernel Summit

AMD’s Technology Roadmap

Technology Roadmap
[Figure: roadmap timeline with columns N=Today, (N+1), (N+2), (N+3)]

130nm to 90nm Performance & Power

65nm Progress

Post-45nm Research Begun and Processing

AMD’s Processor Roadmap

Introducing AMD64 Dual-Core Processor
• Two AMD Opteron™ CPU cores on a single die, each with 1MB L2 cache
• 90nm, ~205 million transistors*
  – Approximately same die size as 130nm single-core AMD Opteron processor*
• 95 watt power envelope fits into 90nm power infrastructure
• Retains compatibility with existing 32-bit and 64-bit x86-based software
• Introduced with “K8” Revision E core in April 2005
*Based on current revisions of the design
[Figure: die diagram labeled Core 0, 1-MB L2, Northbridge, 1-MB L2, Core 1]

Designed From The Start To Add Second Core
• Shared Northbridge
  – 3 HyperTransport™ technology links
  – Dual-channel (128 bit) DDR i/f
• AMD Opteron™ CPU with Direct Connect Architecture was designed as CMP from the start
  – Second port on SRI, request management, two APICs
• Two complete CPU cores
  – SMP model
  – Simpler, less-restrictive programming model than “logical core” approach
  – No need to “pause” one core to give other exclusive use of shared resources
[Figure: existing AMD64 processor design, labeled Core 0 and Core 1, each with a 1MB L2 cache, SRI, X-bar, DDR1 DRAM Interface, HyperTransport™ Links 0,1,2]

AMD Dual-Core Technology

AMD Athlon™ 64 X2 Dual-Core Processor, Desktop (Announced June 2005)
  Model #   Freq      L2 Cache
  4800+     2.4 GHz   1 MB + 1 MB
  4600+     2.4 GHz   512KB + 512KB
  4400+     2.2 GHz   1 MB + 1 MB
  4200+     2.2 GHz   512KB + 512KB
http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_8796_9240,00.html

AMD Opteron™ Processor Dual-Core Models, Server/Workstation (Announced on April 21, 2005)
  Freq      1-way       Up to 2-way   Up to 8-way
  1.8 GHz   Model 165   Model 265     Model 865
  2.0 GHz   Model 170   Model 270     Model 870
  2.2 GHz   Model 175   Model 275     Model 875
http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_9485_13041%5E13076,00.html

Dual-Core Performance/Watt
• SPECweb®99_SSL Secure Web Connections Example.
• Data Center rack space and power budgets are often fixed.
• Perf/Watt focus maximizes use of resources.
• Typical 48U Rack has 9 kVA of Power.

AMD Direct Connect Architecture + Dual-Core
[Figure: four Opteron 800 processors connected by 16x16 cHT links, each with DDR1 memory; PCI-E chipset and South Bridge; CORE 0 and CORE 1 shown in one processor]
  K8 REV   cHT (MHz)   cHT BW 16x16, 1-w/2-w (GB/s)   MEM (MHz)   DDR BW, 1-ch/2-ch (GB/s)
  CG       HT1-800     3.2/6.4                        DDR1-400    3.2/6.4
  E        HT1-1000    4.0/8.0                        DDR1-400    3.2/6.4
  F        HT1-1000    4.0/8.0                        DDR2

SSE3 Support
• AMD K8 Revision “E” and newer are designed to support SSE3
• Supports SSE3 instructions reported by CPUID.SSE3 feature flag
• Ten new SSE instructions and one new x87 instruction (13 total opcodes).
• Monitor/Mwait planned in 2007
• CMPXCHG16B planned in 2006
New instructions:
• ADDSUB[PD,PS] xmm1, xmm2/m128
  – Provides interleaved packed add and subtract
• FISTTP m16int/m32int/m64int
  – Like FISTP but with forced truncation
• HADD[PD,PS] xmm1, xmm2/m128
  – Horizontal Adds
• HSUB[PD,PS] xmm1, xmm2/m128
  – Horizontal Subtracts
• LDDQU xmm, m128
  – Special 128-bit Unaligned load
• MOV[D,SH,SL]DUP xmm1, xmm2/m64
  – Move and Duplicate some elements

Desktop/Workstation Roadmap

Server/Workstation Roadmap

Planned 2006 Processor Features
• Multi-Core capable
• DDR2 support
• RDTSCP – see next slide
• CMPXCHG16B – compare 16 bytes, exchange 16 bytes
• Correctable Machine-Check Exception Thresholding
• HW Virtualization support (AMD “Pacifica”)

RDTSCP: Read Serialized TSC Pair
• New instruction, similar to RDTSC:
  – Returns 64-bit TSC value in %edx:%eax
  – Is a serializing operation: prevents speculative reads of TSC
  – Returns TSC_AUX[31:0] MSR in %ecx at same time as TSC
    * OS initializes TSC_AUX to meaningful value
    * Atomicity ensures no context switch between read of TSC & TSC_AUX.
  – Availability determined by new extended CPUID feature flag
• Allows TSC and OS-supplied value (such as CPU number) to be read atomically in a serializing way in user mode.
  – TSC rates between CPUs in MP-system may vary
  – Linux can put CPU number in TSC_AUX so user-mode get-time-of-day knows which per-cpu adjustments to use to fix up TSC value.

Planned 2007 Processor Features
• Multi-core capable
• DDR3 support
• 1-GB pages – see next slide
• 48-bit Physical Addressing – see later slide
• Greater than 32-socket support
• P-state Invariant TSC (APIC Timer is already)
• P-state Fire-n-Forget
• Monitor/Mwait
• Shared L3-cache
• Further Virtualization extensions

1 Gigabyte Pages & 48-bit Physical Addresses
64-bit VA fields:
  bits 63-48   Sign-Extend
  bits 47-39   PML4-O: selects the PML4E in the Page Map Level 4 Table (located via CR3)
  bits 38-30   PDP-O: selects the PDP entry in the Page Dir Pointer Table
  bits 29-0    Offset
Physical Address fields:
  bits 47-30   Page PA
  bits 29-0    Offset
Plan is for Physical Address in PTEs to be 48 bits for all page sizes.

Virtualization Discussion

AMD Virtualization Directions
• AMD “Pacifica”: HW-Virtualization-Assist.
  Base features planned for launch in the 2006 generation
• Primary components of Architecture:
  – Host/guest management hardware support
  – Event Injection
    * Eliminates need for VMM code to emulate x86 exception delivery
    * Designed to reduce VMM development time significantly
  – Nested Page Tables
    * Designed to improve VMM performance, and reduce overhead
    * Helps reduce VMM complexity

Core “Pacifica” Architecture: VMRUN
• Virtualization based on Virtual Machine Run (VMRUN) instruction
• VMRUN executed by host causes the guest to run
• Guest runs until it exits back to the host
• Host resumes at the instruction following VMRUN
[Figure: Host instruction stream, VMCB data struct, Guest instruction stream, Intercepts]
Host instruction stream:
  while (1) {
      // Do World Switch
      rAX = &VMCB
      VMLOAD(rAX)
      while (running_VMM) {
          VMRUN(rAX)
          switch (exitcode) {
              // handle intercept
              // within VMM context
          }
      }
      VMSAVE(rAX)
  }

Core “Pacifica” Architecture: Intercepts
• Guest runs until:
  – It performs an action that causes an exit to the host
  – It explicitly executes the VMMCALL instruction
• The VMCB for a guest has settings that determine what actions cause the guest to exit to host
  – These intercepts can vary from guest to guest
  – Two kinds of intercepts
    * Exception & Interrupt Intercepts
    * Instruction Intercepts
  – Rich set of intercepts allows the host to customize each guest’s privileges
• Information about the intercepted event is put into the VMCB on exit

Nested Paging
• CPU maps each Guest_PA to Host_VA and then translates to Host_PA
• CPU builds compound gVA_to_hPA TLB entries (guarded by ASID)
• Far more efficient than “Shadow Page Tables”, all handled by CPU
  Guest Translation              Host Translation
  gPA = gen_PML4(gCR3, gVA);     hPA = hTRANS( hVA = gPA );  entry = MEMORY[ hPA ];
  gPA = gen_PDP(gVA, entry);     hPA = hTRANS( hVA = gPA );  entry = MEMORY[ hPA ];
  gPA = gen_PDE(gVA, entry);     hPA = hTRANS( hVA = gPA );  entry = MEMORY[ hPA ];
  gPA = gen_PTE(gVA, entry);     hPA = hTRANS( hVA = gPA );  entry = MEMORY[ hPA ];
  gPA = gen_PA(gVA, entry);      hPA = hTRANS( hVA = gPA );

Challenges / Issues

Multi-core Numbering
• Assume system has non-power-of-two number-of-cores in at least 1 processor due to design or retirement of bad core(s).
  – How to tell OS? How to keep “sanity” in core/processor bit masks?
• BIOS calculates “Rounded Number of Cores” (RNC):
  – RNC = 2^ceil( log2(Number_of_Cores) )
• BIOS assigns APIC IDs of each processor’s cores to an RNC-aligned block of IDs:
  – APIC_ID[ proc=i, core=j ] = RNC * (OFFSET + i) + j
• Example: 2-processor system
  – proc 0 has 3 cores
  – proc 1 has 4 cores
  – RNC = 4 on all cores
  Proc   Core   APIC ID
  0      0      0x8 = 4*(2+0) + 0
  0      1      0x9 = 4*(2+0) + 1
  0      2      0xA = 4*(2+0) + 2
  0      rsvd   0xB = 4*(2+0) + 3
  1      0      0xC = 4*(2+1) + 0
  1      1      0xD = 4*(2+1) + 1
  1      2      0xE = 4*(2+1) + 2
  1      3      0xF = 4*(2+1) + 3
Initial APIC ID: pppp … cccc
  – Want APIC_ID[M:0] to always specify core
  – Want APIC_ID[N:M+1] to always specify processor

Multi-core Numbering (cont)
• OS should use same process to discover topology of processors & cores.
• OS cannot assume that BSP’s CPUID.number_of_cores is same for all processors.
• OS can assume that RNC calculated on any processor is same for all processors.
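
The numbering scheme above can be sketched in a few lines of Python. This is only an illustration of the slide's two formulas, not BIOS or kernel code: the function names are hypothetical, OFFSET = 2 is taken from the slide's example, and using the largest per-processor core count as Number_of_Cores is an assumption made so that RNC comes out the same on every processor.

```python
def rounded_number_of_cores(number_of_cores):
    """RNC = 2^ceil(log2(Number_of_Cores)), using integer arithmetic only."""
    if number_of_cores < 1:
        raise ValueError("a processor has at least one core")
    return 1 << (number_of_cores - 1).bit_length()


def apic_id(proc, core, rnc, offset):
    """APIC_ID[proc=i, core=j] = RNC * (OFFSET + i) + j."""
    if not 0 <= core < rnc:
        raise ValueError("core index must fit inside the RNC-aligned block")
    return rnc * (offset + proc) + core


def decode_apic_id(apic, rnc, offset):
    """Split an APIC ID back into (proc, core), as an OS might when discovering topology."""
    core_bits = rnc.bit_length() - 1      # RNC is a power of two, so this is log2(RNC)
    core = apic & (rnc - 1)               # low bits APIC_ID[M:0] select the core
    proc = (apic >> core_bits) - offset   # remaining high bits select the processor
    return proc, core


# Slide example: proc 0 has 3 cores, proc 1 has 4 cores, OFFSET = 2.
cores_per_proc = [3, 4]
rnc = rounded_number_of_cores(max(cores_per_proc))  # assumption: largest core count
for proc, n_cores in enumerate(cores_per_proc):
    for core in range(n_cores):
        print(proc, core, hex(apic_id(proc, core, rnc, offset=2)))
```

Because RNC is a power of two, the core index always occupies the same low bits of the APIC ID, so the decode step needs only a mask and a shift, even when a processor has a retired core and its block has a reserved ID.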