A Bibliography of Publications in IEEE Micro

Total Page:16

File Type:pdf, Size:1020Kb

A Bibliography of Publications in IEEE Micro A Bibliography of Publications in IEEE Micro Nelson H. F. Beebe University of Utah Department of Mathematics, 110 LCB 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Tel: +1 801 581 5254 FAX: +1 801 581 4148 E-mail: [email protected], [email protected], [email protected] (Internet) WWW URL: http://www.math.utah.edu/~beebe/ 16 September 2021 Version 2.108 Title word cross-reference -Core [MAT+18]. -Cubes [YW94]. -D [ASX19, BWMS19, DDG+19, Joh19c, PZB+19, ZSS+19]. -nm [ABG+16, KBN16, TKI+14]. #1 [Kah93i]. 0.18-Micron [HBd+99]. 0.9-micron + [Ano02d]. 000-fps [KII09]. 000-Processor $1 [Ano17-58, Ano17-59]. 12 [MAT 18]. 16 + + [ABG+16]. 2 [DTH+95]. 21=2 [Ste00a]. 28 [BSP 17]. 024-Core [JJK 11]. [KBN16]. 3 [ASX19, Alt14e, Ano96o, + AOYS95, BWMS19, CMAS11, DDG+19, 1 [Ano98s, BH15, Bre10, PFC 02a, Ste02a, + + Ste14a]. 1-GHz [Ano98s]. 1-terabits DFG 13, Joh19c, LXB07, LX10, MKT 13, + MAS+07, PMM15, PZB+19, SYW+14, [MIM 97]. 10 [Loc03]. 10-Gigabit SCSR93, VPV12, WLF+08, ZSS+19]. 60 [Gad07, HcF04]. 100 [TKI+14]. < [BMM15]. > [BMM15]. 2 [Kir84a, Pat84, PSW91, YSMH91, ZACM14]. [WHCK18]. 3 [KBW95]. II [BAH+05]. ∆ 100-Mops [PSW91]. 1000 [ES84]. 11- + [Lyl04]. 11/780 [Abr83]. 115 [JBF94]. [MKG 20]. k [Eng00j]. µ + [AT93, Dia95c, TS95]. N [YW94]. x 11FO4 [ASD 05]. 12 [And82a]. [DTB01, Dur96, SS05]. 12-DSP [Dur96]. 1284 [Dia94b]. 1284-1994 [Dia94b]. 13 * [CCD+82]. [KW02]. 1394 [SB00]. 1394-1955 [Dia96d]. 1 2 14 [WD03]. 15 [FD04]. 15-Billion-Dollar [KR19a]. 30-Year [Dia96a]. 300 [Gre07d]. 157-nm [Ano01h]. 16 [JBF94, KS90]. 300-MHz [JBF94]. [DD05, GDLT86, HM93, Sho85]. 16-Bit 300-mm [Ano02c, HOHCV99]. 3171 [SZH82, De 83, NN81a, NN81b, mDTG81]. [BSC+90]. 32 16-Core [FJL+13, YMA+13]. 16-Kbit [CHH+98, KS90, Kur21, RY21, RDJ+13]. [HM93]. 16-nm [FME18]. 16-Way [AK00]. 32-Bit [BY07, Bor85a, CBLR86, GmDT83, 160 [RT92]. 17 [SS06]. 18 [Ano87d, KS07]. Isa83, Kir83a, MKOK88, Mye83c, Mye84b, 18-GHz [Ano87d]. 19 [AM08]. 196 NG87, Smo88b, YSMH91, Bor85b, KS90]. [CES+11]. 1984 [Jef84]. 1990s [Smi96b]. 32-nm [RDJ+13]. 32-Way [KAO05]. 360 1994 [Dia94b]. 1A [XLW+12]. [AB06]. 376 [PK88]. 386 [Ano88c]. 386-20 [Ano88c]. 386-Monopoly [Sla91a]. 390 2 [SAC+99]. 3D [Ano95b]. 3DNow [OFW99]. [Ano88c, Ano97-28, IKN+99, KSI+96, Lee96, 3rd [Pea95]. 3T1D [LCWB08]. MS03, PFC+02b, RMC04, Ste14b, SSB20]. 2.0 [Ano01c, Mat93b, VBC+21]. 2.0-GHz 4 [Ano99x, Ano03b, Ano03d, DP97, Das21, [Ano01c]. 2.5 [Ano03c]. 20 [Ano88c]. 200 GDES08, KSM99, PDT98, Pow94, Spr02b]. [IKNS88, NG87]. 200-MHz [NG87]. 2000 4-Bit [HYM+90]. 4-Gbps [Ano99-33, KY91, Mat98d, Mat00e]. 2008 [DP97, GDES08, PDT98]. 4.1 [Mat93c]. [ET09]. 2011 [FV12, HGPT12]. 2012 40-nm [Man09]. 4096-Core [ZSB21]. 46 [Bel13, FL13, Tor12]. 2013 [BCM+14]. 488 [NS81]. 49 [Fan96]. [Goo14, Mar14, Sco14]. 2014 [Ano14r, Gre15c, KT14, Mud15]. 2015 5 [Ano98z, BHM+00, HVS+07]. 5-GHz [Ano14a, Ste16]. 2016 [HVS+07]. 5.1 [Mat93b]. 5.5 [Ano15b, JQ17, Mar17, Wei17]. 2017 [Mat97c, Mat98e]. 500 [UAN+93]. 5000 [Ano16a, Ano16b, Ano17y, Bro17]. 2018 [RCC07]. 50th [Ano97l]. 511-Core [Ano17m, Ano17b, Dwa18, Gon18]. 2019 [DXT+18]. 520 [RHH+03]. 520-MHz [Ano19-29, Hil19]. 2020 [RHH+03]. 533 [Ano96k, Ano97-31]. [Ano20v, Ano20w, Ano20-35, Ano20-36, 533-MHz [Ano96k, Ano97-31]. 56 [Ano97c]. Ano20-37, Ano20-57, Jim21, Kul20, Kur20a]. 56-Kbps [Ano97c]. 2059 [Ang90]. 21 [AW10]. 2100 [Roe86]. 21164 [ERPR95]. 21264 [Kes99]. 21364 6 [Mat93d]. 6.0 [Das21, MBJ08]. 6.1 [MBL+02, WPM03]. 21st [Mat97d]. 6000 [OB91]. 601 [PVYU94]. 604 [IJ98, Sak99b, Sak00d, Ano14-34, Emm07c]. [SDC94]. 60X [AAWC94]. 6300 21st-Century [IJ98, Sak00d]. 22 [RE11]. [Han85, Mye85a]. 64 [Ano97w, Ano03d, 23 [BB12]. 24 [KZ13]. 25 [NN14]. 250 BCC+00, HMR+00, KKL+00, SCV01]. [HYM+90]. 250-MIPS [HYM+90]. 256-Bit 64-Bit [MMG+99]. 256-Kbyte [ASD+05]. 25nm [AT93, BHM+00, HL99, KM89, BBTV15]. [Ano03b]. 25th [Ano96p]. 26 [GH20, NS15]. 64-Core [DFG+13]. 64.0 [Das21]. 64K 27 [GV21]. 28-nm [CCA+19]. 286 [SKO89]. [Mye83b]. 6800 [MM05]. 68HC05 [Ano97u]. 286-Based [NC86]. 29 [Eec18d]. 2nd 6M [RMC04]. 6T [LCWB08]. 6th [Del91b, Luu90b, Pat90]. [DKyL+17, Kah91a]. 6th-Generation [DKyL+17]. 3 [Ano03d, HWG+09]. 3.0 [Ano96g, Mat93a]. 3.06 [Ano03b]. 30 780 [Abr83]. 796 [OL85]. 3 80 [Ano88c]. 802.11b [Ano02c]. 802.16 [ACA+20, CES17, GMC18, JLWL20, Joh19d, [Ano02e]. 80386 [EAA85]. 8085-Based KJT+11, KSK18, LHMH91, NGSW17, [CJ85]. 8086 [HF81]. 8088-based [Sho85]. OYK+17, PHB15, Ipe19]. Access 80960 [Rya88]. 82 [Mye82a]. 82460GX [Ano02e, Ano14p, HKS16, KMK01, LH12, [DGMM00]. 855 [JC84]. 870 [BCC+02]. LTQZ07, SZZ01, WSZS05, ZZ02, Gre01e, 88000 [Mel89]. 88000-RISC [Mel89]. Gre05f, Ste96b, Ste96f]. Access-Execute [HKS16]. Access-Interleaving [LTQZ07]. 90-nm [Ano03c]. 9000 [SGC94]. 91 Access-Mode [ZZ02]. Accessed [YYK+20]. [Mye91b]. 95 [Ano96t, Mat95d, Mat97d]. 97 Accessible [BMK+21a, BMK+21b]. [San97a]. 98 [Mat98d, Sca98]. 982S Accolade [Ste92d]. Accolades [Ste92d]. [SGC94]. 9th [Ste84a]. accommodate [SLM+97]. Accounting [EE10, JGC+11]. Accurate = [Ano87a]. [BdS98, Hin88, RPE10]. Achieve [Ano17f, Ano17e, Ano17c, Ano17d, LHN95]. A-Board [Alb09]. A-Changin [Mat08b]. Achieved [EM84]. Achieving A/V [GDES08]. A100 [CGG+21]. AAI [LLL+16, MBK+92, SIL+15, SRA+04]. [Ste08a]. Abolishing [Hau88c]. ACM [Ano97l, Ano16c, Ano17g]. Abstraction [NRS+08]. Abstractions acquisition [Jae82a, Jae82b, Jae82c, Jae83, [BMM15, MRJ+15, Sol19]. Abuse Tau84, Tau87]. Acrobat [Ano99x]. [HCPS03, Kir01, Ste01e]. Abusing [MSS15]. Acronym [War92d, Rob98b]. Act AC [GA86]. Academia [Eec17b]. [Ano99a, Ste84b, Pit91, Ste91e, Ste07e]. Academic [Gre14c, Gre97d]. Accelerate Action [Noy85]. Activation [CBJ10]. [ADJK20, HSR18, TONH96]. Accelerated Activations [RAG19]. Active [BCF+14, KMK+19, KBN16, LP21a, LP21b, [Ano97s, CMAS11, DRB+12, GGJ+96, ML05]. Accelerates [DDHS00]. GD01, LDF+13, Mye84a, Rob97a, WOM01, Accelerating [ABA+21a, ABA+21b, ZLTW13, ZHPR17, VBB95]. Activities ABC+20, Bea20, Bha18, ESG+05, GSLK11, [Kah91c, STL92]. Activity HLS+21, HKS16, KRD+20, KLM+15, Lee95, [Eng00h, RGH+10]. ACTORS [BBE+11]. Lee96, LLT+08, MSP+19, PCC+15, SJFM19, Acts [Gre19b, Ste06b]. Acyclic [ED18]. Ad SMQP10, Wal97]. Acceleration [Ano18a, Ano18c, Ano18l, Ano18k, Ano18u, [AMFFM+16, CKG+09, DWLN20, Ano18-29, Ano18z, Ano18-27, Ano18-30, EWW+19, Eec18c, ESCB13, EP19, GDN+17, Ano18-28, Ano18-33, Ano18-32]. Adams GHY+17, HLS+21, KCXmWH17, KS18, [Far88b]. Adaptation [ZZ05]. Adapter MKG+20, MAJ+18, NBS+18, SAC+21a, [Edd02]. Adapting [Bos03b, Hal91]. SAC+21b, SD21, WLF+08, Gre06a]. Adaptive Accelerator [BGRKR88, BDV+08, [FAWR+11, HL06, KJT+11, KMPS06, CDS+15, CG95, DXT+18, FM91, GCE+21a, QJP+08, RCC12, RSE01, SMR20, TS91]. GCE+21b, HKS16, HGS+17, mHP18, Adapts [CR95a]. ADAS [CPS+18]. Add JJK+11, KSA+19, KYG19, KJL+10, KIR19, [FBHN04, Ste89b, Ste92c]. Add-on [Ste89b]. KSB21, MAK19, MKM15, MMG+99, Add-Ons [Ste92c]. Adding [ENSD03]. OYS+11, PmWH08, PGW+20, PZK+18, Additional [Mye84b]. Address SSY97, SRWB15, VCS+19, WFW+21a, [Bha17, Bha18, CD97a, CD97b, OG01, WFW+21b, WPO+07, WWZ+08, Pri90]. PHB15, RLS11, SDG+21, WFA+10, YKG18, Accelerators Dv87, Mat95d]. Address-Correlated 4 [WFA+10]. Addressable affordable [Bos05b]. After [GGB+15, MC92, PCW15, Rob92]. [McG82, Mil88a, Gre05a]. Aftermath Addressing [AW03, Her93]. Adds [Gre19a]. Afterword [DM88b]. again [Ano98g]. Adjusting [Gre18a]. [Ano00g, Gre06b]. Against Adjustments [Gre20d]. Administration [Hau88b, Kar88a, PZL06, Ste85e, Ste86a, [Gre21b, Gre17d]. Administrative Ste88c, YFDV19, SWL11, Ste89f]. Age [Ano18t]. Adobe [Ano98z, Ano99x, Mat97c]. [DPY18, Gre09d, Gre15c, Gre99b]. Agenda adopt [Gre99c]. Adoption [Ano98u]. [Cha85b, Smi17]. Aggregation [AS91a]. ADSP [Roe86]. ADSP-2100 [Roe86]. Aggressive [Gre00b, KMPS06]. Agile Advance [Gre16b, Ste94f]. Advanced [BCJ+20, BC20, DTS20, GHS17, Joh20a, [BGRKR88, DG87, DG88, DG89, LWC +16, PGW+20, TGC+20, WDK+20]. HOHCV99, Her93, KKL+00, KM05, Mis93, Aging [WKK+14, YBNS15]. Agrees SYKM11, SF18, AHO+90, BKM+82, BT84, [Ste15b]. Ahead [Alb08, Ano98t, FH05, FMT91, Shl93, VS87, Ano97d, PJ91]. Gre10a, War90b, ZZ05, Yu96, Mat96b]. AI advanced-architecture [BKM+82]. [FOP+19, Gre19d, MD20, RCK+21, VCS+19]. Advances [Ano17l, INKM05, KOI95, Nic84, Aid [AJ83, Eng00m, LPL86, LMC+83, Ste98f, Ste08d, Ste08e, ZSS+19, Mat01a]. Mye82d, HC83a, LDCS09]. Aided Advancing [De 94, Yao85]. aids [Mat97a, BM85]. [Ano00g, Eng00a, LNLG20, Sak99c, Far84]. aimed [Ano03d]. Air [Ano02e]. Aladdin Advantage [Ste02d]. Advantages [SRWB15]. Alamos [Ano98q]. Algebra [MKRC97]. Advert [Ano09a]. [HGS+17, Ipe19]. Algebraic [Ber81, Dun82]. Advertisement Algol [AF82]. Algorithm [Den83, LHC+12, [Ano13a, Ano13d, Ano13b, Ano13c, Ano13e, LYP+18, LHN95, RGF96, RGF95]. Ano13g, Ano13i, Ano14e, Ano14f, Ano14g, Algorithm-Driven [LYP+18]. Ano14h, Ano14t, Ano14n, Ano14p, Ano14q, Algorithmic [SS16]. Algorithms Ano14u, Ano14-27, Ano14-29, Ano14-30, [AML05, Ano17l, KSA+19, KMG+03, Ano14-32, Ano14-33, Ano14-34, Ano14-38, KAK96, SG01b, SGP02, SZH82, TLYL04, Ano14-39, Ano15j, Ano15c, Ano15f, Ano15l, Lou91, MIM+97, TONH96]. Aliasing Ano15s, Ano15t, Ano15u, Ano15v, Ano15b, [SO02]. Aliasing-Free [SO02]. Alice Ano15-31, Ano15-29, Ano15-30, Ano15-32, [Ste14a, Ste14b]. Alignment [KYGW17]. Ano15-40, Ano15-34, Ano15-39, Ano15-35, alive [Rob00a]. All-Electronic [Ano97e]. Ano15-36, Ano16f, Ano16d, Ano16e, Ano16i, All-Level [HNR10]. All-Programmable Ano16j, Ano16p, Ano16q, Ano16s, Ano16v, [FME18]. Alliance Ano16t, Ano16w, Ano16y, Ano16-37, [Ano96m, Ano96s, Ano97t, Wil97]. Ano16-38, Ano16-41, Ano16-48, Ano17f, Allocation [Gre09f, PPBS03]. allure Ano17k, Ano17l, Ano17n, Ano17q, Ano17u, [Ano96n]. almost [DBDF97, Ste94f]. Alone Ano17v, Ano17-27, Ano17y, Ano17-28, [HABHW+18]. along [Gre14d]. Alpha Ano17-39, Ano17-44, Ano17-45, Ano17-48, [Ano03e, ERPR95, Kes99, McL93, MBL+02, Ano17-55, Ano14-28, Ano16h, Ano16u, WPM03]. Alpha-21264 [Kes99]. Alphorn Ano16-36, Ano16-40, Ano16-47, Ano17e]. [AGH+91]. Also [Ano94c, Mat95d]. Advice [Hil19, Mat07a]. A´ergia Alternative [DMMD11]. AFAN [GTF97]. Affair [ARS03, MK10, MSWP03, TC15]. [Kir90b]. Affect [BGK97]. Affects alternatives [VS87]. AltiVec [DDHS00]. [Ano00b, Eng00l]. Affiliate [Ano13d]. Always [BBC+15]. Always-on [BBC+15]. 5 AM [KBW95]. AM29000 [Joh87, Man92]. Ste06b]. Any [Ste08d, Ste08e, Ste94f]. Am9511 [FL84]. Amalgams [Mac93].
Recommended publications
  • Investigations on Hardware Compression of IBM Power9 Processors
    Investigations on hardware compression of IBM Power9 processors Jérome Kieffer, Pierre Paleo, Antoine Roux, Benoît Rousselle Outline ● The bandwidth issue at synchrotrons sources ● Presentation of the evaluated systems: – Intel Xeon vs IBM Power9 – Benchmarks on bandwidth ● The need for compression of scientific data – Compression as part of HDF5 – The hardware compression engine NX-gzip within Power9 – Gzip performance benchmark – Bitshuffle-LZ4 benchmark – Filter optimizations – Benchmark of parallel filtered gzip ● Conclusions – on the hardware – on the compression pipeline in HDF5 Page 2 HDF5 on Power9 18/09/2019 Bandwidth issue at synchrotrons sources Data analysis computer with the main interconnections and their associated bandwidth. Data reduction Upgrade to → Azimuthal integration 100 Gbit/s Data compression ! Figures from former generation of servers Kieffer et al. Volume 25 | Part 2 | March 2018 | Pages 612–617 | 10.1107/S1600577518000607 Page 3 HDF5 on Power9 18/09/2019 Topologies of Intel Xeon servers in 2019 Source: intel.com Page 4 HDF5 on Power9 18/09/2019 Architecture of the AC922 server from IBM featuring Power9 Credit: Thibaud Besson, IBM France Page 6 HDF5 on Power9 18/09/2019 Bandwidth measurement: Xeon vs Power9 Computer Dell R840 IBM AC922 Processor 4 Intel Xeon (12 cores) 2 IBM Power9 (16 cores) 2.6 GHz 2.7 GHz Cache (L3) 19 MB 8x 10 MB Memory channels 4x 6 DDR4 2x 8 DDR4 Memory capacity → 3TB → 2TB Memory speed theory 512 GB/s 340 GB/s Measured memory speed 160 GB/s 270 GB/s Interconnects PCIe v3 PCIe v4 NVlink2 & CAPI2 GP-GPU co-processor 2Tesla V100 PCIe v3 2Tesla V100 NVlink2 Interconnect speed 12 GB/s 48 GB/s CPU ↔ GPU Page 8 HDF5 on Power9 18/09/2019 Strength and weaknesses of the OpenPower architecture While amd64 is today’s de facto standard in HPC, it has a few competitors: arm64, ppc64le and to a less extend riscv and mips64.
    [Show full text]
  • Wind Rose Data Comes in the Form >200,000 Wind Rose Images
    Making Wind Speed and Direction Maps Rich Stromberg Alaska Energy Authority [email protected]/907-771-3053 6/30/2011 Wind Direction Maps 1 Wind rose data comes in the form of >200,000 wind rose images across Alaska 6/30/2011 Wind Direction Maps 2 Wind rose data is quantified in very large Excel™ spreadsheets for each region of the state • Fields: X Y X_1 Y_1 FILE FREQ1 FREQ2 FREQ3 FREQ4 FREQ5 FREQ6 FREQ7 FREQ8 FREQ9 FREQ10 FREQ11 FREQ12 FREQ13 FREQ14 FREQ15 FREQ16 SPEED1 SPEED2 SPEED3 SPEED4 SPEED5 SPEED6 SPEED7 SPEED8 SPEED9 SPEED10 SPEED11 SPEED12 SPEED13 SPEED14 SPEED15 SPEED16 POWER1 POWER2 POWER3 POWER4 POWER5 POWER6 POWER7 POWER8 POWER9 POWER10 POWER11 POWER12 POWER13 POWER14 POWER15 POWER16 WEIBC1 WEIBC2 WEIBC3 WEIBC4 WEIBC5 WEIBC6 WEIBC7 WEIBC8 WEIBC9 WEIBC10 WEIBC11 WEIBC12 WEIBC13 WEIBC14 WEIBC15 WEIBC16 WEIBK1 WEIBK2 WEIBK3 WEIBK4 WEIBK5 WEIBK6 WEIBK7 WEIBK8 WEIBK9 WEIBK10 WEIBK11 WEIBK12 WEIBK13 WEIBK14 WEIBK15 WEIBK16 6/30/2011 Wind Direction Maps 3 Data set is thinned down to wind power density • Fields: X Y • POWER1 POWER2 POWER3 POWER4 POWER5 POWER6 POWER7 POWER8 POWER9 POWER10 POWER11 POWER12 POWER13 POWER14 POWER15 POWER16 • Power1 is the wind power density coming from the north (0 degrees). Power 2 is wind power from 22.5 deg.,…Power 9 is south (180 deg.), etc… 6/30/2011 Wind Direction Maps 4 Spreadsheet calculations X Y POWER1 POWER2 POWER3 POWER4 POWER5 POWER6 POWER7 POWER8 POWER9 POWER10 POWER11 POWER12 POWER13 POWER14 POWER15 POWER16 Max Wind Dir Prim 2nd Wind Dir Sec -132.7365 54.4833 0.643 0.767 1.911 4.083
    [Show full text]
  • Vxworks Architecture Supplement, 6.2
    VxWorks Architecture Supplement VxWorks® ARCHITECTURE SUPPLEMENT 6.2 Copyright © 2005 Wind River Systems, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means without the prior written permission of Wind River Systems, Inc. Wind River, the Wind River logo, Tornado, and VxWorks are registered trademarks of Wind River Systems, Inc. Any third-party trademarks referenced are the property of their respective owners. For further information regarding Wind River trademarks, please see: http://www.windriver.com/company/terms/trademark.html This product may include software licensed to Wind River by third parties. Relevant notices (if any) are provided in your product installation at the following location: installDir/product_name/3rd_party_licensor_notice.pdf. Wind River may refer to third-party documentation by listing publications or providing links to third-party Web sites for informational purposes. Wind River accepts no responsibility for the information provided in such third-party documentation. Corporate Headquarters Wind River Systems, Inc. 500 Wind River Way Alameda, CA 94501-1153 U.S.A. toll free (U.S.): (800) 545-WIND telephone: (510) 748-4100 facsimile: (510) 749-2010 For additional contact information, please visit the Wind River URL: http://www.windriver.com For information on how to contact Customer Support, please visit the following URL: http://www.windriver.com/support VxWorks Architecture Supplement, 6.2 11 Oct 05 Part #: DOC-15660-ND-00 Contents 1 Introduction
    [Show full text]
  • Ray Tracing on the Cell Processor
    Ray Tracing on the Cell Processor Carsten Benthin† Ingo Wald Michael Scherbaum† Heiko Friedrich‡ †inTrace Realtime Ray Tracing GmbH SCI Institute, University of Utah ‡Saarland University {benthin, scherbaum}@intrace.com, [email protected], [email protected] Abstract band”) architectures1 to hide memory latencies, at exposing par- Over the last three decades, higher CPU performance has been allelism through SIMD units, and at multi-core architectures. At achieved almost exclusively by raising the CPU’s clock rate. Today, least for specialized tasks such as triangle rasterization, these con- the resulting power consumption and heat dissipation threaten to cepts have been proven as very powerful, and have made modern end this trend, and CPU designers are looking for alternative ways GPUs as fast as they are; for example, a Nvidia 7800 GTX offers of providing more compute power. In particular, they are looking 313 GFlops [16], 35 times more than a 2.2 GHz AMD Opteron towards three concepts: a streaming compute model, vector-like CPU. Today, there seems to be a convergence between CPUs and SIMD units, and multi-core architectures. One particular example GPUs, with GPUs—already using the streaming compute model, of such an architecture is the Cell Broadband Engine Architecture SIMD, and multi-core—become increasingly programmable, and (CBEA), a multi-core processor that offers a raw compute power CPUs are getting equipped with more and more cores and stream- of up to 200 GFlops per 3.2 GHz chip. The Cell bears a huge po- ing functionalities. Most commodity CPUs already offer 2–8 tential for compute-intensive applications like ray tracing, but also cores [1, 9, 10, 25], and desktop PCs with more cores can be built requires addressing the challenges caused by this processor’s un- by stacking and interconnecting smaller multi-core systems (mostly conventional architecture.
    [Show full text]
  • IBM System P5 Quad-Core Module Based on POWER5+ Technology: Technical Overview and Introduction
    Giuliano Anselmi Redpaper Bernard Filhol SahngShin Kim Gregor Linzmeier Ondrej Plachy Scott Vetter IBM System p5 Quad-Core Module Based on POWER5+ Technology: Technical Overview and Introduction The quad-core module (QCM) is based on the well-known POWER5™ dual-core module (DCM) technology. The dual-core POWER5 processor and the dual-core POWER5+™ processor are packaged with the L3 cache chip into a cost-effective DCM package. The QCM is a package that enables entry-level or midrange IBM® System p5™ servers to achieve additional processing density without increasing the footprint. Figure 1 shows the DCM and QCM physical views and the basic internal architecture. to I/O to I/O Core Core 1.5 GHz 1.5 GHz Enhanced Enhanced 1.9 MB MB 1.9 1.9 MB MB 1.9 L2 cache L2 L2 cache switch distributed switch distributed switch 1.9 GHz POWER5 Core Core core or CPU or core 1.5 GHz L3 Mem 1.5 GHz L3 Mem L2 cache L2 ctrl ctrl ctrl ctrl Enhanced distributed Enhanced 1.9 MB Shared Shared MB 1.9 Ctrl 1.9 GHz L3 Mem Ctrl POWER5 core or CPU or core 36 MB 36 MB 36 L3 cache L3 cache L3 DCM 36 MB QCM L3 cache L3 to memory to memory DIMMs DIMMs POWER5+ Dual-Core Module POWER5+ Quad-Core Module One Dual-Core-chip Two Dual-Core-chips plus two L3-cache-chips plus one L3-cache-chip Figure 1 DCM and QCM physical views and basic internal architecture © Copyright IBM Corp. 2006.
    [Show full text]
  • Power4 Focuses on Memory Bandwidth IBM Confronts IA-64, Says ISA Not Important
    VOLUME 13, NUMBER 13 OCTOBER 6,1999 MICROPROCESSOR REPORT THE INSIDERS’ GUIDE TO MICROPROCESSOR HARDWARE Power4 Focuses on Memory Bandwidth IBM Confronts IA-64, Says ISA Not Important by Keith Diefendorff company has decided to make a last-gasp effort to retain control of its high-end server silicon by throwing its consid- Not content to wrap sheet metal around erable financial and technical weight behind Power4. Intel microprocessors for its future server After investing this much effort in Power4, if IBM fails business, IBM is developing a processor it to deliver a server processor with compelling advantages hopes will fend off the IA-64 juggernaut. Speaking at this over the best IA-64 processors, it will be left with little alter- week’s Microprocessor Forum, chief architect Jim Kahle de- native but to capitulate. If Power4 fails, it will also be a clear scribed IBM’s monster 170-million-transistor Power4 chip, indication to Sun, Compaq, and others that are bucking which boasts two 64-bit 1-GHz five-issue superscalar cores, a IA-64, that the days of proprietary CPUs are numbered. But triple-level cache hierarchy, a 10-GByte/s main-memory IBM intends to resist mightily, and, based on what the com- interface, and a 45-GByte/s multiprocessor interface, as pany has disclosed about Power4 so far, it may just succeed. Figure 1 shows. Kahle said that IBM will see first silicon on Power4 in 1Q00, and systems will begin shipping in 2H01. Looking for Parallelism in All the Right Places With Power4, IBM is targeting the high-reliability servers No Holds Barred that will power future e-businesses.
    [Show full text]
  • SIMD Extensions
    SIMD Extensions PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC Contents Articles SIMD 1 MMX (instruction set) 6 3DNow! 8 Streaming SIMD Extensions 12 SSE2 16 SSE3 18 SSSE3 20 SSE4 22 SSE5 26 Advanced Vector Extensions 28 CVT16 instruction set 31 XOP instruction set 31 References Article Sources and Contributors 33 Image Sources, Licenses and Contributors 34 Article Licenses License 35 SIMD 1 SIMD Single instruction Multiple instruction Single data SISD MISD Multiple data SIMD MIMD Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism. History The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1] The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel.
    [Show full text]
  • REPORT Compaq Chooses SMT for Alpha Simultaneous Multithreading
    VOLUME 13, NUMBER 16 DECEMBER 6, 1999 MICROPROCESSOR REPORT THE INSIDERS’ GUIDE TO MICROPROCESSOR HARDWARE Compaq Chooses SMT for Alpha Simultaneous Multithreading Exploits Instruction- and Thread-Level Parallelism by Keith Diefendorff Given a full complement of on-chip memory, increas- ing the clock frequency will increase the performance of the As it climbs rapidly past the 100-million- core. One way to increase frequency is to deepen the pipeline. transistor-per-chip mark, the micro- But with pipelines already reaching upwards of 12–14 stages, processor industry is struggling with the mounting inefficiencies may close this avenue, limiting future question of how to get proportionally more performance out frequency improvements to those that can be attained from of these new transistors. Speaking at the recent Microproces- semiconductor-circuit speedup. Unfortunately this speedup, sor Forum, Joel Emer, a Principal Member of the Technical roughly 20% per year, is well below that required to attain the Staff in Compaq’s Alpha Development Group, described his historical 60% per year performance increase. To prevent company’s approach: simultaneous multithreading, or SMT. bursting this bubble, the only real alternative left is to exploit Emer’s interest in SMT was inspired by the work of more and more parallelism. Dean Tullsen, who described the technique in 1995 while at Indeed, the pursuit of parallelism occupies the energy the University of Washington. Since that time, Emer has of many processor architects today. There are basically two been studying SMT along with other researchers at Washing- theories: one is that instruction-level parallelism (ILP) is ton. Once convinced of its value, he began evangelizing SMT abundant and remains a viable resource waiting to be tapped; within Compaq.
    [Show full text]
  • POWER® Processor-Based Systems
    IBM® Power® Systems RAS Introduction to IBM® Power® Reliability, Availability, and Serviceability for POWER9® processor-based systems using IBM PowerVM™ With Updates covering the latest 4+ Socket Power10 processor-based systems IBM Systems Group Daniel Henderson, Irving Baysah Trademarks, Copyrights, Notices and Acknowledgements Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: Active AIX® POWER® POWER Power Power Systems Memory™ Hypervisor™ Systems™ Software™ Power® POWER POWER7 POWER8™ POWER® PowerLinux™ 7® +™ POWER® PowerHA® POWER6 ® PowerVM System System PowerVC™ POWER Power Architecture™ ® x® z® Hypervisor™ Additional Trademarks may be identified in the body of this document. Other company, product, or service names may be trademarks or service marks of others. Notices The last page of this document contains copyright information, important notices, and other information. Acknowledgements While this whitepaper has two principal authors/editors it is the culmination of the work of a number of different subject matter experts within IBM who contributed ideas, detailed technical information, and the occasional photograph and section of description.
    [Show full text]
  • Implementing Powerpc Linux on System I Platform
    Front cover Implementing POWER Linux on IBM System i Platform Planning and configuring Linux servers on IBM System i platform Linux distribution on IBM System i Platform installation guide Tips to run Linux servers on IBM System i platform Yessong Johng Erwin Earley Rico Franke Vlatko Kosturjak ibm.com/redbooks International Technical Support Organization Implementing POWER Linux on IBM System i Platform February 2007 SG24-6388-01 Note: Before using this information and the product it supports, read the information in “Notices” on page vii. Second Edition (February 2007) This edition applies to i5/OS V5R4, SLES10 and RHEL4. © Copyright International Business Machines Corporation 2005, 2007. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . vii Trademarks . viii Preface . ix The team that wrote this redbook. ix Become a published author . xi Comments welcome. xi Chapter 1. Introduction to Linux on System i platform . 1 1.1 Concepts and terminology . 2 1.1.1 System i platform . 2 1.1.2 Hardware management console . 4 1.1.3 Virtual Partition Manager (VPM) . 10 1.2 Brief introduction to Linux and Linux on System i platform . 12 1.2.1 Linux on System i platform . 12 1.3 Differences between existing Power5-based System i and previous System i models 13 1.3.1 Linux enhancements on Power5 / Power5+ . 14 1.4 Where to go for more information . 15 Chapter 2. Configuration planning . 17 2.1 Concepts and terminology . 18 2.1.1 Processor concepts .
    [Show full text]
  • IBM Powerpc 970 (A.K.A. G5)
    IBM PowerPC 970 (a.k.a. G5) Ref 1 David Benham and Yu-Chung Chen UIC – Department of Computer Science CS 466 PPC 970FX overview ● 64-bit RISC ● 58 million transistors ● 512 KB of L2 cache and 96KB of L1 cache ● 90um process with a die size of 65 sq. mm ● Native 32 bit compatibility ● Maximum clock speed of 2.7 Ghz ● SIMD instruction set (Altivec) ● 42 watts @ 1.8 Ghz (1.3 volts) ● Peak data bandwidth of 6.4 GB per second A picture is worth a 2^10 words (approx.) Ref 2 A little history ● PowerPC processor line is a product of the AIM alliance formed in 1991. (Apple, IBM, and Motorola) ● PPC 601 (G1) - 1993 ● PPC 603 (G2) - 1995 ● PPC 750 (G3) - 1997 ● PPC 7400 (G4) - 1999 ● PPC 970 (G5) - 2002 ● AIM alliance dissolved in 2005 Processor Ref 3 Ref 3 Core details ● 16(int)-25(vector) stage pipeline ● Large number of 'in flight' instructions (various stages of execution) - theoretical limit of 215 instructions ● 512 KB L2 cache ● 96 KB L1 cache – 64 KB I-Cache – 32 KB D-Cache Core details continued ● 10 execution units – 2 load/store operations – 2 fixed-point register-register operations – 2 floating-point operations – 1 branch operation – 1 condition register operation – 1 vector permute operation – 1 vector ALU operation ● 32 64 bit general purpose registers, 32 64 bit floating point registers, 32 128 vector registers Pipeline Ref 4 Benchmarks ● SPEC2000 ● BLAST – Bioinformatics ● Amber / jac - Structure biology ● CFD lab code SPEC CPU2000 ● IBM eServer BladeCenter JS20 ● PPC 970 2.2Ghz ● SPECint2000 ● Base: 986 Peak: 1040 ● SPECfp2000 ● Base: 1178 Peak: 1241 ● Dell PowerEdge 1750 Xeon 3.06Ghz ● SPECint2000 ● Base: 1031 Peak: 1067 Apple’s SPEC Results*2 ● SPECfp2000 ● Base: 1030 Peak: 1044 BLAST Ref.
    [Show full text]
  • IBM Power Systems Performance Report Apr 13, 2021
    IBM Power Performance Report Power7 to Power10 September 8, 2021 Table of Contents 3 Introduction to Performance of IBM UNIX, IBM i, and Linux Operating System Servers 4 Section 1 – SPEC® CPU Benchmark Performance 4 Section 1a – Linux Multi-user SPEC® CPU2017 Performance (Power10) 4 Section 1b – Linux Multi-user SPEC® CPU2017 Performance (Power9) 4 Section 1c – AIX Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8) 5 Section 1d – Linux Multi-user SPEC® CPU2006 Performance (Power7, Power7+, Power8) 6 Section 2 – AIX Multi-user Performance (rPerf) 6 Section 2a – AIX Multi-user Performance (Power8, Power9 and Power10) 9 Section 2b – AIX Multi-user Performance (Power9) in Non-default Processor Power Mode Setting 9 Section 2c – AIX Multi-user Performance (Power7 and Power7+) 13 Section 2d – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power8) 15 Section 2e – AIX Capacity Upgrade on Demand Relative Performance Guidelines (Power7 and Power7+) 20 Section 3 – CPW Benchmark Performance 19 Section 3a – CPW Benchmark Performance (Power8, Power9 and Power10) 22 Section 3b – CPW Benchmark Performance (Power7 and Power7+) 25 Section 4 – SPECjbb®2015 Benchmark Performance 25 Section 4a – SPECjbb®2015 Benchmark Performance (Power9) 25 Section 4b – SPECjbb®2015 Benchmark Performance (Power8) 25 Section 5 – AIX SAP® Standard Application Benchmark Performance 25 Section 5a – SAP® Sales and Distribution (SD) 2-Tier – AIX (Power7 to Power8) 26 Section 5b – SAP® Sales and Distribution (SD) 2-Tier – Linux on Power (Power7 to Power7+)
    [Show full text]