ISSCC 2021 F5.5 Architecting Chiplet Solutions for High Volume Products

Samuel Naffziger
ISSCC Forum 2021 (42 slides)

Slide 2: Outline
• Trends and driving forces behind chiplet architecture
• Chiplet benefits
• Design challenges with chiplet integration – a case study
• Potential future directions

Slide 3: GPU and CPU Performance Trends
[Figure, left: "GPU Single Precision Floating Point Operations Per Second Trend", single-precision GFLOPS on a log axis from 1 GF to 100000 GF, 2006 to 2020. Right: "SPECint®_rate2006 2P Server Performance Trend Over Time", throughput performance ratio on a log axis from 1X to 100X, 2008 to 2020. See Endnotes.]

Slide 4: Moore’s Law Keeps Slowing
[Figure: process node timeline, 90nm, 65nm, 45nm, 32nm, 22nm, 14nm, 10/7nm, over 2004 to 2020. AMD internal estimate; see Endnotes.]

Slide 5: While Costs Continue to Increase
[Figure: "Cost Per Yielded mm² for a 250mm² Die", normalized cost per yielded mm² (axis 0 to 6.00) for 45nm, 32nm, 28nm, 20nm, 14/16nm, 7nm, 5nm. Source: AMD; see Endnotes.]

Slide 6: Die Size Trend
[Figure: "Die Size Increases Over Time in Server CPUs and GPUs", die size (mm²) on a log axis from 100 to 1000 with the reticle limit marked, 2006 to 2020; series: Server CPU, GPU. See Endnotes.]

Slide 7: Chiplet Architectures to Extend Performance Gains

Slide 8: Bigger Chips to Offset Technology Slowdown
If technology scaling only gives you (say) 1.5x more devices per 24 months, why not just make chips 1.33x bigger to get 2x transistors?
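The arithmetic behind that question, and the yield penalty shown in this slide's wafer example, can be checked with a short sketch. This is illustrative Python rather than anything from the talk: the function names are invented here, and the die counts are the hypothetical/academic figures quoted on the slide (395 chips with 362 good die, 192 chips with 162 good die), not real yield rates.

```python
def area_scale_for_target(target_gain: float, density_gain: float) -> float:
    """Die-area multiplier needed to reach target_gain transistors
    when process scaling alone supplies density_gain."""
    return target_gain / density_gain


def yield_loss(total_die: int, good_die: int) -> float:
    """Fraction of die on a wafer that are not good."""
    return 1.0 - good_die / total_die


# 2x transistors from 1.5x density needs about 1.33x the area.
scale = area_scale_for_target(2.0, 1.5)

# Wafer example from the slide (hypothetical numbers, not real yield rates).
small_die_loss = yield_loss(395, 362)  # smaller die: more candidates per wafer
large_die_loss = yield_loss(192, 162)  # larger die: fewer candidates per wafer

print(f"area multiplier: {scale:.2f}x")
print(f"yield loss, smaller die: {small_die_loss:.1%}")
print(f"yield loss, larger die:  {large_die_loss:.1%}")
```

In this example the larger die fits fewer times on the wafer and also loses a larger share of candidates to defects, so good die per wafer fall from 362 to 162.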
[Figure: server CPU die size (mm²) versus year, 2004 to 2020, log axis from 100 to 1000 with the reticle limit marked. Wafer examples: 395 chips, 362 good die (8% yield loss); 192 chips, 162 good die (16% yield loss). Hypothetical/academic example [1], not real yield rates.]
[1] Kannan, Enright Jerger, Loh, "Enabling Interposer-based Disintegration of Multi-core Processors," International Symposium on Microarchitecture (MICRO), 2015

Slide 9: Chiplets Background
• Alternative: build multiple smaller chips
• Historically, not needed for most markets
  – Except for the largest systems, Moore’s Law was sufficient to meet compute needs
• Chiplets not free
  – Additional area for interfaces, replicated logic
  – Higher packaging costs
  – Additional design effort, complexity
  – Past methodologies less suited for chiplets
• 2X device functionality costs > 2X silicon area
[Diagram: one 2X die versus two X die, repeated one generation later.]

Slide 10: Progression of Advanced Integration/Packaging
[Figure: Ceramic Substrate MCMs, Organic Substrate MCMs, Silicon Interposer [1]]
• Concept of partitioning systems into multiple chips is not new
• Evolution of packaging technology has changed the trade-offs in terms of cost, bandwidth, latency, energy, etc.
[1] Carsten Schulz, Wikimedia Commons

Slide 11: High-Level Approach to Chiplets
[Diagram: two flows, each running Wafer, Test, Assemble, Functional SoCs, with some die marked "X".]

Slide 12: A Case Study with AMD EPYC™ Server Processors

Slide 13: 1st Gen AMD EPYC Architecture
• MCM approach has many advantages
  – Higher yield, enables increased feature-set
  – Multi-product leverage
[Diagram: 1st Gen AMD EPYC processor built from four die (Die 0 to Die 3), each with CCXs, DDR and I/O, next to a hypothetical monolithic EPYC processor.]
• 1st Gen EPYC: 4 x 213mm² die/package = 852mm² Si/package*
• Hypothetical monolithic EPYC processor: ~777mm²*
  – Remove die-to-die Infinity Fabric™ PHYs and logic (4/die), duplicated logic, etc.
• 852mm² / 777mm² = ~10% MCM area overhead
• 32C die cost: 1.0X (monolithic) versus 0.59X¹ (MCM)
1. Based on AMD internal yield model using historical defect density data for mature technologies

Slide 14: What to do for an Encore?
• Leadership performance requires 7nm benefits
• Yet the cost of advanced technologies is increasing
• Gen1 architecture does not scale well to double core counts
• Innovation required
[Figure: "7nm Compute Efficiency Gains": 2X density¹, >1.25X frequency¹ (same power), 0.5X power¹ (same performance).]
[Figure: "Cost Per Yielded mm² for a 250mm² Die", normalized cost per yielded mm², axis 0 to 6.00.]
1. Based on June 8, 2018 AMD internal testing of same-architecture product ported from 14 to 7 nm technology with similar implementation flow/methodology, using performance from SGEMM.
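The area-overhead arithmetic on the 1st Gen EPYC slide, and the reason four small die can cost less than one large die, can be sketched as follows. The die areas come from the slide. The Poisson yield model and the defect density value are assumptions chosen purely for illustration: the talk does not describe AMD's internal yield model, so this sketch should not be expected to reproduce the 0.59X figure.

```python
import math

CHIPLET_AREA_MM2 = 213.0     # per-die area from the slide
CHIPLETS_PER_PACKAGE = 4
MONOLITHIC_AREA_MM2 = 777.0  # hypothetical monolithic die from the slide

# ASSUMPTION: illustrative defect density, not a figure from the talk.
DEFECTS_PER_MM2 = 0.001


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Textbook Poisson yield model: Y = exp(-A * D0)."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_cost(area_mm2: float, n_die: int, defects_per_mm2: float) -> float:
    """Relative silicon cost: area paid for per good die, times die count.
    Ignores packaging, test and wafer-edge effects."""
    return n_die * area_mm2 / poisson_yield(area_mm2, defects_per_mm2)


mcm_area = CHIPLET_AREA_MM2 * CHIPLETS_PER_PACKAGE    # 852 mm^2
area_overhead = mcm_area / MONOLITHIC_AREA_MM2 - 1.0  # roughly 10%

mono_cost = silicon_cost(MONOLITHIC_AREA_MM2, 1, DEFECTS_PER_MM2)
mcm_cost = silicon_cost(CHIPLET_AREA_MM2, CHIPLETS_PER_PACKAGE, DEFECTS_PER_MM2)

print(f"MCM silicon per package: {mcm_area:.0f} mm^2")
print(f"MCM area overhead: {area_overhead:.1%}")
print(f"relative MCM silicon cost under these assumptions: {mcm_cost / mono_cost:.2f}x")
```

Under these assumptions the MCM spends roughly 10% more total silicon, but the smaller die yield better, so the silicon cost per package falls. The size of the saving depends entirely on the assumed defect density.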

Slide 15: Chiplets to Maximize 7nm Benefits
• High-performance server and desktop processors are IO-heavy
• Analog devices and bump pitches for IO benefit very little from leading edge technology, and that technology is very costly
• CPU core + L3 on this die comprises ~56% of the area
  – These circuits see increased 7nm gains
• Remaining ~44% sees very little performance and density improvement from 7nm
• Solution: Partition the SOC, reserving the expensive leading-edge silicon for CPU cores while leaving the IO and memory interfaces in N-1 generation silicon
[Figure: prior generation Ryzen™ processor die, next to a 7nm CCD with labels L3, IFOP SerDes, SMU, DFx, Zen2 cores. 7nm CCD is ~86% CPU + L3.]

Slide 16: Chiplets Evolved – Hybrid Multi-die Architecture
[Diagram: Traditional Monolithic, 1st Gen EPYC CPU, 2nd Gen EPYC CPU]
• Use an Advanced Technology Where it is Needed Most
• Each IP in its Optimal Technology, 2nd Gen Infinity Fabric™ Connected
• Centralized I/O Die Improves NUMA
• Superior Technology for CPU Performance and Power

Slide 17: Connecting the Chiplets
[Diagram: theoretical interposer-based layout with four CCDs around the IOD on an interposer, next to the selected MCM approach.]
• Silicon interposers and bridges provide high wire density, but have limited reach
  – Only supports die edge connectivity which limits number of chiplets and cores that can be supported
• Performance goals required more Core Complex Die (CCDs) than can be tiled adjacent to the IOD
• Solution is to retain the on-package SerDes links for die-die connections

Slide 18: Package Routing Challenges
• Prior generation already consumed almost all package routing resources for memory and IO
• Connecting 9 chiplets in the same package requires innovation
[Diagram: 1st Gen AMD EPYC™ [1] package with Die 0 to Die 3, each with CCXs, DDR and I/O.]
[1] Beck ISSCC 2018

Slide 19: Under-CCD Routing
[Diagram: IOD in the center with DDR on either side, four CCDs above and four below, SERDES at the top and bottom edges.]
• Routing Infinity Fabric on Package (IFOP) SerDes links from IOD to the two-deep chiplets required sharing routing layers with off-package SerDes and competing with power delivery requirements

Slide 20: Zen vs. Zen 2 VDDM Distribution
• Dense SRAMs require a separate rail
• Zen VDDM distribution via package plane
• Zen 2 VDDM distribution via RDL only

Slide 21: Zen 2 VDDM Design Challenges
• RDL is more resistive than a dedicated package layer
• Therefore, we reduced overall VDDM current draw by ~80% compared to Zen
  – New, smaller, and distributed LDO design
  – Ensured sufficient routing porosity through the integrated LDOs to enable critical routing
• These improvements kept the IR drop to ≈10mV impact
[Figure: 4 VDDM LDOs inside the L3 [Singh ISSCC 2020]; Core+L2 blocks on either side of 4MB L3 slices; VDDM RDL spanning L2 and L3. Enables 80 IFOP package routed signals under the CCD.]

Slide 22: Package Integration, Server, and Desktop
• Bump pitch for 14nm and 7nm is 150µm and 130µm respectively
• Transitioned IOD from solder bumps to copper pillars, enabling a common interface for IOD+CCD
  – Conducive to tighter bump pitches (compact)
  – Enabled common die height after assembly
  – Higher max current (electromigration) limits
[Diagram: 2nd Gen AMD EPYC™ Server Processor with Zen2 CCDs, 128 total x16 SERDES, and 72 Data + 8 Clk/Ctl (total/CCD); 3rd Gen AMD Ryzen™ Processor with Zen2 CCDs, Infinity Fabric (die-to-die), IO Controllers and PHYs, 2 x DDR4 PHYs.]

Slide 23: Chiplet-enabled Socket Upgrades
• Chiplet architecture allows migration to “Zen 3” CCDs without disrupting the platform
• Re-use the client IOD
• Enable in-place upgrades to “Zen 3” for AMD Socket AM4 with ~19% IPC increase
[Diagram: 3rd Gen AMD Ryzen™ Processor and 4th Gen AMD Ryzen™ Processor (Zen3 CCDs), each with Infinity Fabric (die-to-die), IO Controllers and PHYs, 2 x DDR4 PHYs.]

Slide 24: Improving Memory Performance
• Server memory latency is a key factor in performance
• A goal for 2nd Gen was to improve on the 2017 1st Gen EPYC™ CPU design
• Non-Uniform-Memory-Access (NUMA) behaviors are a result of memory interfaces being distributed across die
• Significant delays from NUMA1 to NUMA2 impact performance for some applications
[Figure: prior generation EPYC™ 7001 Series Processors; 3 NUMA distances, 8 NUMA domains.]
NUMA Domain   Latency¹ (ns)
NUMA1         90
NUMA2         141
NUMA3         234
Avg.
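To put the NUMA latencies quoted above in relative terms, here is a small sketch. It is illustrative Python, not part of the talk: the dictionary and variable names are made up here, NUMA1 is simply used as the baseline, and the truncated "Avg." row is left out because its value and weighting are not given in the text.

```python
# Latencies (ns) quoted on the slide for the three 1st Gen EPYC NUMA distances.
numa_latency_ns = {"NUMA1": 90, "NUMA2": 141, "NUMA3": 234}

# Express each distance as a multiple of the NUMA1 latency.
baseline = numa_latency_ns["NUMA1"]
relative = {name: ns / baseline for name, ns in numa_latency_ns.items()}

for name, ratio in relative.items():
    print(f"{name}: {numa_latency_ns[name]} ns ({ratio:.2f}x NUMA1)")
```

The farthest distance costs 2.6 times the NUMA1 latency, which is the spread a centralized I/O die is meant to narrow.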