10-Multiprocx6.Pdf

Total Page:16

File Type:pdf, Size:1020Kb

10-Multiprocx6.Pdf Overview. • Mul3processor(OS((Background(and(Review)( - How(does(it(work?((Background)( - Scalability((Review)( • Mul3processor(Hardware( - Contemporary(systems((Intel,(AMD,(ARM,(Oracle/Sun)( - Experimental(and(Future(systems((Intel,(MS,(Polaris)( • OS(Design(for(Mul3processors( - Guidelines( Mul1processor.OS. - Design(approaches( • Divide(and(Conquer((Disco,(Tessela3on)( COMP9242(–(Advanced(Opera3ng(Systems( • Reduce(Sharing((K42,(Corey,(Linux,(FlexSC,(scalable(commuta3vity)( Ihor(Kuz(|([email protected]( T2/2019(Week(10( • No(Sharing((Barrelfish,(fos)( www.data61.csiro.au. 2((|( COMP9242(T2/2019(W10( Uniprocessor.OS. CPU. App1.App2. OS. Mul1processor.OS. OS.data. Applica1on.data. App4. Run.. Process.control.. FS.. App1. queue. blocks. structs. App2. App3. Memory. 3( COMP9242(T2/2019(W10( 4((|( COMP9242(T2/2019(W10( Mul1processor.OS. Mul1processor.OS. CPU. CPU. CPU. CPU. CPU. CPU. CPU. CPU. App1. App3. App4. App4. App1. App3. App4. App4. Key(design(challenges:( OS. OS. OS. OS. OS. OS. • Correctness(of((shared)(data(structures(OS. OS. • Scalability((performance(doesn’t(suffer)( OS.data. Applica1on.data. OS.data. Applica1on.data. App4. App4. Run.. Process.control.. FS.. App1. Run.. Process.control.. FS.. App1. queue. blocks. structs. App2. App3. queue. blocks. structs. App2. App3. Memory. Memory. 5((|( COMP9242(T2/2019(W10( 6((|( COMP9242(T2/2019(W10( Correctness.of.Shared.Data. Scalability. • Concurrency(control( Speedup(as(more(processors(added( - Locks( Ideal. - Semaphores( - T Transac3ons( S(N) 1 - Lockdfree(data(structures( = TN • We(know(how(to(do(this:( - In(the(applica3on( - In(the(OS( Speedup.(S). number.of.processors.(n). 7((|( COMP9242(T2/2019(W10( 8((|( COMP9242(T2/2019(W10( Scalability. Scalability.and.Serialisa1on. Processor.1. Processor.2. Processor.3. Speedup(as(more(processors(added( Parallel. Reality. Parallel( Parallel( Parallel( Program. Parallel( Parallel( Parallel( T Parallel( Parallel( Parallel( S(N) = 1 Parallel( Parallel( Parallel( TN Parallel( Parallel( Parallel( Serial( Parallel( Serial( Serial( speedup. Parallel( Parallel( Serial( Parallel( Parallel( Parallel( number.of.processors. Parallel( Parallel( Parallel( 9((|( COMP9242(T2/2019(W10( 10((|( COMP9242(T2/2019(W10( Scalability.and.Serialisa1on. Serialisa1on. Remember(Amdahl’s(law( - Serial((nondparallel)(por3on:(when(applica3on(not(running(on(all(cores( Where(does(serialisa3on(show(up?( - Applica3on((e.g.(access(shared(app(data)( - Serialisa3on(prevents(scalability( - OS((e.g.(performing(syscall(for(app)(How(much(3me(is(spent(in(OS?( T =1= (1− P)+ P 1 Sources.of.Serialisa1on. P TN = (1− P)+ Locking((explicit(serialisa3on)( N • Wai3ng(for(a(lock(!(stalls(self( T 1 • Lock(implementa3on:(( S(N) = 1 = • Atomic(opera3ons(lock(bus(!(stalls(everyone(wai3ng(for(memory(( T P • Cache(coherence(traffic(loads(bus(!(stalls(others(wai3ng(for(memory(( N (1− P)+ N Memory(access((implicit)( 1 - Rela3vely(high(latency(to(memory((!(stalls(self(( S(∞) → (1− P) Cache((implicit)( - Processor(stalled(while(cache(line(is(fetched(or(invalidated( - Affected(by(latency(of(interconnect( - Performance(depends(on(data(size((cache(lines)(and(conten3on((number(of(cores)( 11((|( COMP9242(T2/2019(W10( From(hgp://en.wikipedia.org/wiki/File:AmdahlsLaw.svg( 12((|( COMP9242(T2/2019(W10( More.CacheMrelated.Serialisa1on. False(sharing( - Unrelated(data(structs(share(the(same(cache(line( - Accessed(from(different(processors(( !(Cache(coherence(traffic(and(delay( Cache(line(bouncing( Mul1processor. - Shared(R/W(on(many(processors( - E.g:(bouncing(due(to(locks:(each(processor(spinning(on(a(lock(brings(it(into(its(own(cache( !(Cache(coherence(traffic(and(delay( Hardware. Cache(misses( - Poten3ally(direct(memory(access(!(stalls(self( - When(does(cache(miss(occur?( • Applica3on(accesses(data(for(the(first(3me,((Applica3on(runs(on(new(core(( • Cached(memory(has(been(evicted( • Cache(footprint(too(big,(another(app(ran,(OS(ran( 13((|( COMP9242(T2/2019(W10( 14( COMP9242(T2/2019(W10( Mul1MWhat?. Interes1ng.Proper1es.oF.Mul1processors. • Terminology:(( • Scale(and(Structure( - core,(die((chip),(package((module,(processor,(CPU)( - How(many(cores(and(processors(are(there( • Mul3processor,(SMP( - What(kinds(of(cores(and(processors(are(there( - >1(separate(processors,(connected(by(offdprocessor(interconnect( - How(are(they(organised((access(to(IO,(etc.)( • Mul3thread,(SMT( • Interconnect( - >1(hardware(threads(in(a(single(processing(core( - How(are(the(cores(and(processors(connected( • Mul3core,(CMP( - >1(processing(cores(in(a(single(die,(connected(by(onddie(interconnect( • Memory(Locality(and(Caches( • Mul3core(+(Mul3processor( - Where(is(the(memory( - >1(mul3core(dies(in(a(package((mul3dchip(module),(ondprocessor(interconnect(( - What(is(the(cache(architecture( - >1(mul3core(processors,(offdprocessor(interconnect( • Interprocessor(Communica3on( • Manycore( - How(do(cores(and(processors(send(messages(to(each(other( - Lots((>100)(of(cores( 15((|( COMP9242(T2/2019(W10( 16((|( COMP9242(T2/2019(W10( Contemporary.Mul1processor.Hardware. Scale.and.Structure. • Intel:(( • ARM(Cortex(A9(MPCore( - Nehalem,(Westmere:((10(core,(QPI( - Sandy(Bridge,(Ivy(Bridge:(5(core,(ring(bus,(integrated(GPU,(L3,(IO( - Haswell((Broadwell):(18+(core,(ring(bus,(transac3onal(memory,(slices((EP)( - Skylake((SP):(mesh(architecture( • AMD:( - K10((Opteron:(Barcelona,(Magny(Cours):(12(core,(Hypertransport( - Bulldozer,(Piledriver,(Steamroller((Opteron,(FX)( • 16(core,(Clustered(Mul3thread:(module(with(2(integer(cores( - Zen:(on(die(NUMA:(CPU(Complex((CCX)((4(core,(private(L3)( - Zen(2:(chiplets((2xCCX)(chiplets,(IO(die((incl(mem(controller)( • Oracle((Sun)(UltraSparc(T1,T2,T3,T4,T5((Niagara),(M5,M7( - T5:(16(cores,(8(threads/core((2(simultaneous),(crossbar,(8(sockets,(( - M8:(32(core,(8(threads,(on(chip(network,(8(sockets,(5GHz( • ARM(Cortex(A9,(A15(MPCore,(big.LITTLE,(DynamIQ( - 4(d8(cores,(big.LITTLE:(A7(+(A15,(dynamIQ:(A75(+(A55( 17((|( COMP9242(T2/2019(W10( 18((|( COMP9242(T2/2019(W10( From(hgp://www.arm.com/images/CortexdA9dMPdcore_Big.gif( ( Scale.and.Structure. Scale.and.Structure. • ARM(big.LITTLE( 19((|( COMP9242(T2/2019(W10( From(hgp://www.arm.com/images/Fig_1_CortexdA15_CCI_CortexdA7_System.jpg( 20((|( COMP9242(T2/2019(W10( From(hgps://developer.arm.com/d/media/developer/Other%20Images/dynamiqdimprovementsdoverdbigdligle.png( ( Scale.and.Structure. Memory.Locality.and.Caches. • Intel(Nehalem( • NUMA((NondUniform(Memory(Access)( From(www.dawnorhered.net/wpdcontent/uploads/2011/02/NehalemdEXdarchitectureddetailed.jpg( From(www.dawnorhered.net/wpdcontent/uploads/2011/02/NehalemdEXdarchitectureddetailed.jpg( 21((|( COMP9242(T2/2019(W10( 22((|( COMP9242(T2/2019(W10( ( ( Interconnect. Interconnect.(Latency). • AMD(Barcelona( ( 23((|( COMP9242(T2/2019(W10( From(www.sigops.org/sosp/sosp09/slides/baumanndslidesdsosp09.pdf( 24((|( COMP9242(T2/2019(W10( From(www.systems.ethz.ch/educa3on/pastdcourses/falld2010/aos/lectures/wk10dmul3core.pdf( “Victoria Falls” 128 threads Par:al'interconnect' FB DIMMFB FB DIMM DIMM FB FBDIMM DIMM FB FB DIMM DIMM FB DIMM 16 cores Interconnect.(Bandwidth). Interconnect. 65x (two sockets) MCU MCU MCU MCU • Oracle(Sparc(T2( MCU MCU MCU MCU UltraSPARC T2 L2$ L2$ L2$ L2$ L2$ L2$ L2$ L2$ processor L2$ L2$ L2$ L2$ L2$ L2$ L2$ L2$ 64 threads Node'0' Full Cross Bar eight cores Full Cross Bar 35x C0 C1 C2 C3 C4 C5 C6 C7 C0 C1 C2 C3 C4 C5 C6 C7 FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU SPU SPU SPU SPU SPU SPU SPU SPU UltraSPARC® T1 processor 32 threads Node'6' Node'7' eight cores NIU 14x Sys I/F PCIe (E-NET+) Buffer Switch Core UltraSPARC® IIIi NIU Sys I/F PCIe processor (Ethernet+) Buffer Switch Core 1x 2x 10 Power <100 W x8 @2. GHz Gigabit Ethernet 3GB/s( 6GB/s( 4GB/sd3GB/s( Unidirec3onal( 2x 10 Power <95 W x8 @ 2.0 GHz No'direct'link'between'node'0'and'7,'0'will'do'“2_hops”'to'access'7' Gigabit Ethernet 2004 2005 2006 2007 2008 10' 25((|( COMP9242(T2/2019(W10( From(hgps://www.usenix.org/conference/atc15/technicaldsession/presenta3on/lepers( 26((|( COMP9242(T2/2019(W10( From(Sun/Oracle( Interconnect. Interconnect/Structure/Memory. 27((|( COMP9242(T2/2019(W10( From(hgp://www.anandtech.com/show/8423/inteldxeonde5dversiond3dupdtod18dhaswelldepdcoresd/4( 28((|( COMP9242(T2/2019(W10( From(hgp://www.anandtech.com/show/8423/inteldxeonde5dversiond3dupdtod18dhaswelldepdcoresd/4( Experimental/Future/NonMmainstream.Mul1processor. Scale.and.Structure. Hardware. • Tilera.Tile64.(newest:.Mellanox.TILEMGx),(Intel(Polaris( • Microsor(Beehive( - Ring(bus,(no(cache(coherence( DDR2 Controller 0 DDR2 Controller 1 • Tilera((now(Mellanox)(Tile64,(TiledGx( SerDes SerDes PROCESSOR CACHE - PCIe 0 Reg File L2 CACHE 100(cores,(mesh(network( L-1I L-1D MAC/ P P P I-TLB D-TLB PHY • Intel(Polaris( 2 1 0 2D DMA - UART, MDN TDN 80(cores,(mesh(network( HPI, I2C, GbE 0 UDN IDN JTAG,SPI STN SWITCH • Intel(SCC( Flexible GbE 1 I/O Flexible - 48(cores,(mesh(network,(no(cache(coherency( I/O PCIe 1 • Intel(MIC((Mul3(Integrated(Core)(( MAC/ XAUI 1 PHY MAC/ - Knight’s(Corner/Landing(d(Xeon(Phi( PHY SerDes - 60+(cores,(ring(bus/mesh( SerDes DDR2 Controller 3 DDR2 Controller 2 29((|( COMP9242(T2/2019(W10( 30((|( COMP9242(T2/2019(W10( From(www.3lera.com/products/processors/TILE64(( Cache.and.Memory.and.IPC. Interprocessor.Communica1on. • Intel(SCC( • Beehive( 31((|( COMP9242(T2/2019(W10( From(techresearch.intel.com/spaw2/uploads/files/SCC_Plasorm_Overview.pdf( 32((|( COMP9242(T2/2019(W10( From(projects.csail.mit.edu/beehive/BeehiveV5.pdf( Interconnect. Skylake.SP. • Intel(MIC((Mul3(Integrated(Core)((Knight’s(Corner/Landing(d(Xeon(Phi)(
Recommended publications
  • Amd(Amd.Us)18Q1 点评 2018 年 07 月 30 日
    海外公司报告 | 公司动态研究 证券研究报告 AMD(AMD.US)18Q1 点评 2018 年 07 月 30 日 作者 AMD 7 年最佳,10 年翻身,重申买入,TP 上调至 何翩翩 分析师 23 美元 SAC 执业证书编号:S1110516080002 [email protected] 业绩超预期,7 年来最佳盈利季 雷俊成 分析师 SAC 执业证书编号:S1110518060004 AMD 18Q2 实现 7 年来最佳盈利季度,non-GAAP EPS 0.14 美元,营收 17.6 [email protected] 亿美元同比大涨 53%,均超过华尔街预期的 EPS 0.13 美元和营收 17.2 亿美 马赫 分析师 SAC 执业证书编号:S1110518070001 元。计算与图形业务同比大涨 64%至 10.9 亿美元好于市场预期的 10.6 亿, [email protected] 但受 Q2 区块链相关贡献进一步减弱带来该业务环比跌 3%。挖矿业务本季 董可心 联系人 营收占比从上季的 10%降低为 6%,公司进一步看淡下半年需求。EESC 业务 [email protected] 同比涨 37%至 6.7 亿美元,好于预期的 6.61 亿,EPYC 逐步进入放量阶段, 公司维持到年底会实现中单位数份额的预测。Q2 毛利率提升至 37%,Q3 指引营收 17 亿美元,同比增长 7%,略低于市场预期的 17.6 亿,毛利率提 相关报告 升至约 38%;全年指引营收增速保持 25%,我们认为公司指引基于 17Q3 的 1 《AMD(AMD.US)点评:EPYC“从 高基数较为保守,且区块链影响作为一次性业务逐渐消弭也会进一步减少 零到一”终实现,7nm 产品周期全方位 业绩不确定性,我们看好 EPYC 会在下半年至 Q4 迎来关键放量。 回归“传奇”;TP 上调至 22 美元,重 服务器市场 AMD 与 Intel“荣辱互见” 申买入》2018-06-20 2 《AMD(AMD.US)点评:公布 7nm 服务器市场 AMD 与 Intel“荣辱互见”,EPYC 服务器随着 Cisco、HPE 适配 GPU 加入 AI 计算抢滩战,Ryzen+EPYC 以及超级云计算客户的需求能见度提高,Q2 出货量和营收均环比提高超 50%,目前与 AMD 合作的 5 个云计算巨头成主要推动力。我们认为 AMD “双子星”仍是中流砥柱;TP 上调至 将继续通过单插槽服务器高核心数和低功耗打造性价比优势,下半年加速 18 美元,重申买入》2018-06-08 市场渗透蚕食 Intel 份额,进入明年则等待 7nm 的第二代 EPYC 面市,面对 3 《AMD(AMD.US)18Q1 点评:2018 已将 10nm Cannon Lake 量产时点延后至明年的 Intel,AMD 将终于实现制 开门红,业绩指引均超预期,Ryzen 继 程反超,加速量价齐升。“从零到一”抢占 20 亿美元以上的市场份额。 续扎实闪耀,EPYC 仍待升级放量,重 反观 Intel Q2 数据中心业务收入 55.5 亿美元,虽然在整体行业高景气度下 申买入》2018-04-30 同比增长 27%,但仍低于市场预期的 56.3 亿美元。业绩发布会上 Intel 更为 4 《2017 扭亏为盈业绩迎拐点,2018 明确消费级 10nm 产品会到 19 年下半年节日旺季才推向市场,让市场情绪 厚积待薄发,Ryzen+EPYC 继续双星闪 愈加悲观的同时也给了 AMD 足够的时间窗口。 耀,重申买入》2018-02-01 Ryzen 继续攻城略地,进一步打开笔记本市场 5 《AMD(AMD.US)点评:合作英特
    [Show full text]
  • Best Practice Guide Modern Processors
    Best Practice Guide Modern Processors Ole Widar Saastad, University of Oslo, Norway Kristina Kapanova, NCSA, Bulgaria Stoyan Markov, NCSA, Bulgaria Cristian Morales, BSC, Spain Anastasiia Shamakina, HLRS, Germany Nick Johnson, EPCC, United Kingdom Ezhilmathi Krishnasamy, University of Luxembourg, Luxembourg Sebastien Varrette, University of Luxembourg, Luxembourg Hayk Shoukourian (Editor), LRZ, Germany Updated 5-5-2021 1 Best Practice Guide Modern Processors Table of Contents 1. Introduction .............................................................................................................................. 4 2. ARM Processors ....................................................................................................................... 6 2.1. Architecture ................................................................................................................... 6 2.1.1. Kunpeng 920 ....................................................................................................... 6 2.1.2. ThunderX2 .......................................................................................................... 7 2.1.3. NUMA architecture .............................................................................................. 9 2.2. Programming Environment ............................................................................................... 9 2.2.1. Compilers ........................................................................................................... 9 2.2.2. Vendor performance libraries
    [Show full text]
  • AMD Introduces World's Most Powerful 16- Core
    November 7, 2019 AMD Introduces World’s Most Powerful 16- core Consumer Desktop Processor, the AMD Ryzen™ 9 3950X – AMD Ryzen™ 9 3950X rounds out 3rd Gen Ryzen desktop processor series, arriving November 25 – – New AMD Athlon™ 3000G processor to provide everyday users with unmatched performance per dollar, coming November 19 – SANTA CLARA, Calif., Nov. 07, 2019 (GLOBE NEWSWIRE) -- Today, AMD announced the release of the highly anticipated flagship 16-core AMD Ryzen 9 3950X processor, available worldwide November 25, 2019. AMD Ryzen 9 3950X processor brings the ultimate processor for gamers with effortless 1080P gaming in select titles1 and up to 2X more energy efficient processing power compared to the competition2 as the world’s fastest 16- core consumer desktop processor3. In addition, AMD also announced a significant performance uplift4 coming for mainstream desktop users with the new AMD Athlon 3000G, arriving November 19, 2019. “We are excited to bring the AMD Ryzen™ 9 3950X to market later this month, offering enthusiasts the most powerful 16-core desktop processor ever,” said Chris Kilburn, corporate vice president and general manager, client channel, AMD. “We are focused on offering the best solutions at every level of the market, including the AMD Athlon 3000G for everyday PC users that delivers great performance at an incredible price point.” AMD Ryzen 9 3950X: Fastest 16-core Consumer Desktop Processor Offering up to 22% performance increase over previous generations5, the AMD Ryzen 9 3950X offers faster 1080p gaming in select titles1 and content creation6 than the competition. Built on the industry-leading “Zen 2” architecture, the AMD Ryzen 9 3950X also excels in power efficiency3 with a TDP7 of 105W.
    [Show full text]
  • Take a Way: Exploring the Security Implications of AMD's Cache Way
    Take A Way: Exploring the Security Implications of AMD’s Cache Way Predictors Moritz Lipp Vedad Hadžić Michael Schwarz Graz University of Technology Graz University of Technology Graz University of Technology Arthur Perais Clémentine Maurice Daniel Gruss Unaffiliated Univ Rennes, CNRS, IRISA Graz University of Technology ABSTRACT 1 INTRODUCTION To optimize the energy consumption and performance of their With caches, out-of-order execution, speculative execution, or si- CPUs, AMD introduced a way predictor for the L1-data (L1D) cache multaneous multithreading (SMT), modern processors are equipped to predict in which cache way a certain address is located. Conse- with numerous features optimizing the system’s throughput and quently, only this way is accessed, significantly reducing the power power consumption. Despite their performance benefits, these op- consumption of the processor. timizations are often not designed with a central focus on security In this paper, we are the first to exploit the cache way predic- properties. Hence, microarchitectural attacks have exploited these tor. We reverse-engineered AMD’s L1D cache way predictor in optimizations to undermine the system’s security. microarchitectures from 2011 to 2019, resulting in two new attack Cache attacks on cryptographic algorithms were the first mi- techniques. With Collide+Probe, an attacker can monitor a vic- croarchitectural attacks [12, 42, 59]. Osvik et al. [58] showed that tim’s memory accesses without knowledge of physical addresses an attacker can observe the cache state at the granularity of a cache or shared memory when time-sharing a logical core. With Load+ set using Prime+Probe. Yarom et al. [82] proposed Flush+Reload, Reload, we exploit the way predictor to obtain highly-accurate a technique that can observe victim activity at a cache-line granu- memory-access traces of victims on the same physical core.
    [Show full text]
  • AMD Zen Rohin, Vijay, Brandon Outline
    AMD Zen Rohin, Vijay, Brandon Outline 1. History and Overview 2. Datapath Structure 3. Memory Hierarchy 4. Zen 2 Improvements History and Overview AMD History ● IBM production too large, forced Intel to license their designs to 3rd parties ● AMD fills the gap, produces clones for 15ish years - legal battles ensued ● K5 first in-house x86 chip in 1996 ● Added more features like out of order, L2 caches, etc ● Current CPUs are Zen* tomshardware.com/picturestory/71 3-amd-cpu-history.html Zen Brand ● Performance desktop and mobile computing ○ Athlon ○ Ryzen 3, Ryzen 5, Ryzen 7, Ryzen 9 ○ Ryzen Threadripper ● Server ○ EPYC https://en.wikichip.org/wiki/amd/microarchitectures/zen Zen History ● Aimed to replace two of AMD’s older chips ○ Excavator: high performance architecture ○ Puma: low power architecture https://en.wikichip.org/wiki/amd/microarchitectures/zen#Block_Diagram Zen Architecture ● Quad-core ● Fetch 4 instructions/cycle ● Op cache 2k instructions ● 168 physical integer registers ● 72 out of order loads ● Large shared L3 cache ● 2 threads per core https://www.slideshare.net/AMD/amd-epyc-microp rocessor-architecture Datapath Structure Fetch ● Decoupled branch predictor ○ Runs ahead of fetches ○ Successful predictions help latency and memory parallelism ○ Mispredictions incur power penalty ● 3 layer TLB ○ L0: 8 entries ○ L1: 64 entries ○ L2: 512 entries https://www.anandtech.com/show/10591/amd-zen-microarchiture-p art-2-extracting-instructionlevel-parallelism/3 Branch Predictor ● Perceptron: simple neural network ● Table of perceptrons, each a vector of weights ● Branch address used to access perceptron table ● Dot product between weight vector and branch history vector Perceptron Branch Predictor ● ~10% improve prediction rates over gshare predictor - (2, 2) correlating predictor ● Can utilize longer branch histories ○ Hardware requirements scale linearly whereas they scale exponentially for other predictors D.
    [Show full text]
  • Amd's Commitment To
    This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) such as AMD’s journey; the proposed transaction with Xilinx, Inc. including expectations, benefits and plans of the proposed transaction; total addressable markets; AMD’s technology roadmaps; the features, functionality, performance, availability, timing and expected benefits of future AMD products; AMD’s path forward in data center, PCs and gaming; and AMD’s 2021 financial outlook, long-term financial model and ability to drive shareholder returns, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward- looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD’s Securities and Exchange Commission filings, including but not limited to AMD’s most recent reports on Forms 10-K and 10-Q. AMD does not assume, and hereby disclaims, any obligation to update forward-looking statements made in this presentation, except as may be required by law.
    [Show full text]
  • AMD Announces World's Best Mobile Processors¹ in CES 2021 Keynote
    January 12, 2021 AMD Announces World’s Best Mobile Processors¹ In CES 2021 Keynote AMD Ryzen Threadripper PRO Processors, designed for the most demanding professional workloads, coming to retail channel SANTA CLARA, Calif., Jan. 12, 2021 (GLOBE NEWSWIRE) -- CES 2021 -- Today, AMD (NASDAQ: AMD) announced the full portfolio of AMD Ryzen™ 5000 Series Mobile Processors, bringing the highly-efficient and extremely powerful “Zen 3” core architecture to the laptop market. New AMD Ryzen 5000 Series Mobile Processors provide unprecedented levels of performance and incredible battery life for gamers, creators, and professionals. New laptops powered by Ryzen 5000 Series Mobile processors will be available from major PC manufacturers including ASUS, HP and Lenovo, starting in Q1 2021. Expanding its leadership client computing product portfolio featuring the “Zen 3” core, AMD also announced the AMD Ryzen PRO 5000 Series Mobile Processors, delivering enterprise- grade security and seamless manageability to commercial users. Throughout the course of 2021, AMD expects a broad portfolio of more than 150 consumer and commercial notebooks based on the Ryzen 5000 Series Mobile Processors. “As the PC becomes an even more essential part of how we work, play and connect, users demand more performance, security and connectivity,” said Saeid Moshkelani, senior vice president and general manager, Client business unit, AMD. “The new AMD Ryzen 5000 Series Desktop and Mobile Processors bring the best innovation AMD has to offer to consumers and professionals as we continue our commitment to delivering best-in-class experiences with instant responsiveness, incredible battery life and fantastic designs. With our PC partners, we are delivering top-quality performance and no-compromise solutions alongside our record-breaking growth in the notebook and desktop space in the previous year.” AMD Ryzen 5000 Series Mobile Processors Building upon the previous generation of leadership mobile processors, the Ryzen 5000 Series includes high-performance H- and ultra-mobile U-Series processors.
    [Show full text]
  • Cs433 Amd Zen 2
    CS433 AMD ZEN 2 Hyoungwook Nam (hn5) Anjana Suresh Kumar (anjanas3) Vibhor Dodeja (vdodeja2) Namrata Mantri (nmantri2) Table of Contents 1. Overview 2. Pipeline Structure 3. Memory Hierarchy 4. Security and Power 5. Takeaways Overview History of AMD's x86 microarchitectures (1) K5 - K7 (95~02) K8 (03~08) K10 (09~11) x86 frontend, RISC backend Introduced x86-64 ISA Up to 6 cores Superscalar, OoO, speculation Dual-core (Athlon 64 X2) Shared L3 cache SIMD, L2 cache (K6) Integrated memory controller GPU integrated APUs (Fusion) https://www.tomshardware.com/picturestory/713-amd-cpu-history.html History of AMD's x86 microarchitectures (2) Bulldozer (11~16) Zen (17 ~ ) Multi-core module (MCM) Simultaneous Multi-thread (SMT) Two cores per module Two threads per core Shared FP and L2 in a module Higher single-thread performance AMD Financial Analyst Day, May 2015 Multi-core Module (MCM) Structure of Zen Single EPYC Package Single Die (Chiplet) Multiple dies in a package, 2 core complexes (ccx) per die, and up to 4 cores per ccx. (~4c8t per die) Fully connected NUMA between dies with infinity fabric (IF) which also interconnects ccx. https://www.slideshare.net/AMD/amd-epyc-microprocessor-architecture Zen 2 Changes Over Zen 1 and Zen+ Dedicated IO chiplet using hybrid process - TSMC 7nm CPU cores + GF 14nm IO chiplet 2x more cores per package - up to 16 for consumer and 64 for server More ILP - Better predictor, wider execution, deeper window, etc. 2x Larger L3 and faster IF2 Extra security features against spectre attacks https://www.pcgamesn.com/amd/amd-zen-2-release-date-specs-performance
    [Show full text]
  • G292-Z22 HPC System - 2U up 8 X Gen3 GPU Server Features
    G292-Z22 HPC System - 2U UP 8 x Gen3 GPU Server Features Able to support up to 8 double slot GPGPU or co-processor cards, the G292 Series enables world-leading HPC within a 2U chassis. • Supports up to 8 x double slot GPU cards • AMD EPYC™ 7002 series processor family • 8-Channel RDIMM/LRDIMM DDR4, 8 x DIMMs • 2 x 10Gb/s SFP+ LAN ports (Mellanox® ConnectX-4 Lx) • 1 x Dedicated management port • 6 x 2.5" SATA and 2 x 2.5 NVMe hot-swap HDD/SSD bays • 2 x M.2 with PCIe Gen3 x4/x2 interface • 8 x PCIe Gen3 expansion slots for GPU cards • 2 x PCIe Gen4 x16 low-profile slots for add-on cards • Aspeed® AST2500 remote management controller • 2+0 2200W 80 PLUS Platinum power supply AMD EPYC™ 7002 Series Processor (Rome) The next generation of AMD EPYC has arrived, providing incredible compute, IO and bandwidth capability – designed to meet the huge demand for more compute in big data analytics, HPC and cloud computing. Built on 7nm advanced process technology, allowing for denser compute capabilities with lower power consumption Up to 64 core per CPU, built using Zen 2 high performance cores and AMD’s innovative chiplet architecture Supporting PCIe Gen 4.0 with a bandwidth of up to 64GB/s, twice of PCIe Gen 3.0 Embedded security protection to help defend your CPU, applications, and data NVIDIA® Tesla ® V100 Support GIGABYTE’s AMD EPYC server systems and motherboards are fully compatible and qualified to use with NVIDIA’s Tesla® V100 GPU, an advanced data center GPU built to accelerate AI, HPC, and graphics.
    [Show full text]
  • Amd Ryzen™ 5000 Series Processors the Fastest in the Game1
    QUICK REFERENCE GUIDE AMD RYZEN™ 5000 SERIES PROCESSORS THE FASTEST IN THE GAME1 THE FASTEST IN THE GAME1 THE LATEST TECHNOLOGIES BUILD WITH CONFIDENCE AMD Ryzen™ 5000 Series processors power the next generation of All AMD RyzenTM 5000 Series processors come with the full suite With AMD RyzenTM 5000 Series desktop processors, you can demanding games, providing one of a kind immersive experiences of Ryzen™ processor technologies designed to elevate your PC’s easily configure and customize your rig for the ultimate gaming and dominate any multithreaded task like 3D and video rendering, processing power including Precision Boost 2 and Precision Boost build. These processors drop-in ready with a BIOS update on and software compiling. With up to 16 cores, 32 threads, boost Overdrive3. Also get maximum graphics and storage bandwidth AMD 500 and 400 Series motherboards. You can easily tweak 2 clocks of up to 4.9GHz and up to 72MB of cache in select models, with these PCIe® 4.0 ready processors. That's not all, AMD and tune your processor with Ryzen™ Master and jump in the TM AMD Ryzen 5000 Series processors deliver the ultimate gaming RyzenTM 5000 Series features the exceptional 7nm architecture game faster with AMD StoreMI technology. performance. performance-per-watt. CORES/ UP TOMAX/BASE TOTAL PCIE® 4.0 LANES UNLOCKED FOR IN BOX THREADS TYPICAL TDP FREQUENCY2,4 CACHE WITH X570 CHIPSET OVERCLOCKING5 ARCHITECTURE COOLER (USABLE / TOTAL) AMD Ryzen™ 9 5950X 16/32 105W 4.9/3.4 72MB 36/44 Yes "Zen 3" - AMD Ryzen™ 9 3950X 16/32 105W 4.7/3.5 72MB
    [Show full text]
  • Amd Ryzentm 5000 Series Mobile Processors
    AMD RYZENTM 5000 SERIES MOBILE PROCESSORS ELEVATING BUSINESS COMPUTING BUILT FOR SUCCESS AMD Ryzen™ 5000 Series Mobile Processors offer the advanced technology you need to stay ahead of the competition. Built on up to 7nm “Zen 3” architecture AMD Ryzen™ is the only processor Up to 90% higher multi-thread for processor performance leadership and family with up to 8 high performance performance vs the competition.2 incredible battery life. X86 cores for ultrathin notebooks.1 POWERHOUSE PRODUCTIVITY Whether working from office, huddle room, home or hotel, stay productive everywhere with the winning performance offered by AMD Ryzen™ 5000 series mobile processors. AMD RyzenTM 7 5800U Performance +90% +49% +30% +23% Up to +2% +9% .......................................................................................................................................................................................................... Cinebench R20 Single Cinebench R20 Multi- GeekBench v5 PassMark 10 PCMark 10 Benchmark MS Office Overall Thread Thread Multi-Core CPU Mark PCMark 10 Apps Intel Core i7-1165G7 RyzenTM 7 5800U See endnote 2 AMD RyzenTM 5 5600U Performance +90% +43% Up to +29% +5% +19% +8% .......................................................................................................................................................................................................... Cinebench R20 Single Cinebench R20 Multi- GeekBench v5 PassMark 10 PCMark 10 Benchmark MS Office Overall Thread Thread Multi-Core CPU Mark PCMark 10 Apps
    [Show full text]
  • “架构+工艺”,Cpu 业务拉动业绩持续成长 ( )投资价值分析报告| Amd Amd.O 2019.10.10
    2: “架构+工艺”,CPU 业务拉动业绩持续成长 ( )投资价值分析报告| AMD AMD.O 2019.10.10 中信证券研究部 核心观点 AMD CPU 芯片新品架构设计进步迅速,作为纯芯片设计公司,所采用台积电 先进代工工艺历史上第一次阶段性领先竞争对手英特尔。随着有竞争力的新品 持续发布,在 PC 和服务器芯片领域,公司未来三年有望凭借 7nm、7nm+及 5nm 高性价比产品持续抢占份额、扩增营收,服务器 CPU 芯片市场份额有望 创历史新高,公司作为 CPU 和 GPU 双领域全球龙头公司,有望实现持续高速 成长,值得长期重点关注。 徐涛 ▍唯一兼具 CPU+独立 GPU 芯片厂商。公司成立至今经历了英特尔第二供货商、 首席电子分析师 IDM、Fabless+GlobalFoundries、Fabless+TSMC 四个阶段。历史上 2003 年 前后产品性能一度超越英特尔, 年收购 设计厂商 ,成为唯一兼 S1010517080003 2006 GPU ATI 具 CPU 与独立 GPU 设计能力的厂商。我们估测 2018 年公司 PC CPU 收入 22.93 亿美元(营收占比 36%,市场份额 13%);GPU 收入 18.32 亿美元(营收占比 28%,市场份额 18%);服务器收入 3.31 亿美元(营收占比 5%,市场份额 3.2%); 嵌入式与半定制等业务收入 20.19 亿美元(营收占比 31%)。 ▍高壁垒 640 亿美元 CPU+GPU 市场,市场第二名。公司各细分市场中,PC 端 CPU 市场 322 亿美元,服务器端 CPU 市场 166 亿美元,GPU 市场约 120 亿美 元,另有以游戏主机芯片为主的半定制芯片市场,约 34 亿美元。总体市场空间 郑泽科 巨大,公司在 CPU 与 GPU 市场长期为市场第二名,主要竞争对手是英特尔与 电子分析师 英伟达,AMD 在游戏主机芯片市场占据绝大部分份额。凭借近年来架构设计能 S1010517100002 力提升+拥抱台积电先进代工工艺,AMD 有望通过优势新品持续扩大市场份额。 ▍“架构+工艺”,CPU 业务拉动业绩成长。我们认为公司未来两年有望在 CPU 市 场率先提升份额,拉动公司成长。回顾历史,我们发现具有竞争力的新品发布 是市场份额变化重要因素;从设计角度,公司最新发布的 Zen2 架构通过务实创 新设计已实现对英特尔的反超;从制造角度,公司将采用业界顶尖的台积电 7nm,在制程上领先英特尔,后续还将采用 7nm+及 5nm;从时间角度,AMD 将具备领先英特尔至少半年上市的市场机遇期。公司采用灵活的 Fabless 模式, 胡叶倩雯 相较于英特尔 IDM 模式更加适应未来市场趋势,且合作伙伴台积电已将高性能 电子分析师 计算列为重要战略方向。公司 CEO 苏姿丰对产品与技术把握清晰,有望引领公 S1010517100004 司快速发展。 ▍风险因素:PC、服务器市场景气度低于预期;台积电先进制程受不确定性因素 影响演进及产能;公司核心管理层出现重大变化;英特尔先进工艺芯片超预期。 ▍投资建议:我们预计未来两年公司 CPU 业务将随新品发布持续提升份额,拉动 营收并提高毛利率,GPU 业务总体保持平稳增长。基于该假设,我们预测公司 2019/2020/2021 年 EPS 分别为 0.65/1.00/1.54 美元,按照 2020 年 35 倍 PE, 给予目标价 35 美元,首次覆盖,给予“买入”评级。 项目/年度 2017 2018 2019E 2020E 2021E 营业收入 百万美元 ( ) 5,381.00 6,475.00 7,022.85 9,540.25
    [Show full text]