UltraSPARC T1 (Niagara)

Talk given in the seminar "Ausgewählte Themen in Hardwareentwurf und Optik" (Selected Topics in Hardware Design and Optics), HWS 06, Universität Mannheim
Jochen Kinzel

Contents
  Overview
  Core
  Crossbar
  Level 2 cache
  DRAM controller
  Floating-point unit
  Concluding remarks
  Outlook: Niagara II

Overview

The road to the UltraSPARC T1
  Original idea from Afara Websystems Inc.
  Acquired by Sun Microsystems Inc.
  Aug 2004: Niagara presented at Hot Chips 16
  Nov 2005: UltraSPARC T1 is finished
  Dec 2005: first servers with UltraSPARC T1
  Mar 2006: OpenSPARC T1 released

Facts
  64-bit multicore processor
  Available with 4, 6, or 8 cores
  Per core:
    4 threads
    16 KByte L1 I-cache, 8 KByte L1 D-cache
  3 MByte level 2 cache
  Max. 1.2 GHz
  72 W power consumption
  90 nm technology, 9 layers

Application areas
  Parallel applications fall into two classes:
    Centrally synchronous applications with structured data-flow parallelism
      Example: weather simulation
      Suitable systems: e.g. BlueGene from IBM
    Decentrally asynchronous applications with control-flow parallelism
      Examples: transaction systems, web server and database applications
      Suitable systems: e.g. Sun Fire T2000 from Sun with the UltraSPARC T1 processor

  (Figure; source: [1])

Core

The pipeline
  (Figure; source: [1])

Instruction Fetch Unit (IFU)
  Level 1 instruction cache
    16 KByte
    4-way set associative
    32-byte cache line
  Instruction Fill Queue (IFQ)
  Missed Instruction List (MIL)
  Also: Instruction Translation Lookaside Buffer

IFU and thread select
  (Figure; source: [1])

Thread Select (TS)
  2 instruction registers per thread
  Program counter (PC) per thread
  Every cycle: the thread to be processed is selected anew

Integer register file
  (Figure; source: [1])

Load/Store Unit (LSU)
  Interface between core and crossbar
  Level 1 data cache
    8 KByte
    4-way set associative
    16-byte cache line
    Write-through policy (!!!)
  Buffers 8 stores and 1 load per thread

Floating-Point Frontend Unit
  Decodes floating-point instructions
  Simple instructions are executed here
  Complex instructions are sent to the global floating-point unit
  Floating-point register file

Memory Management Unit
  Translation lookaside buffers: ITLB in the IFU, DTLB in the LSU
  Both hold 64 entries
  48-bit virtual address space
  40-bit physical address space
  Supports a second virtualization layer for up to eight concurrently running operating systems

The core also contains:
  Execute Unit
  Stream Processing Unit
  Multiplier Unit
  Trap Logic Unit

Crossbar

The crossbar
  The crossbar consists of two parts:
    the processor-cache crossbar (PCX) (figure; source: [1])
    the cache-processor crossbar (CPX) (figure; source: [1])

PCX: data flow
  Packet layout (source: [1]):
    Bits 123 to 104: Ctrl
    Bits 103 to 64: Addr.
    Bits 63 to 0: Data

PCX: control flow
  (Figure; source: [1])

Level 2 cache

Level 2 cache: overview
  4 banks
  Interleaved in 64-byte steps
  Cache line size: 64 bytes
  12-way set associative
  3 MByte in total, 768 KByte per bank
  Replacement: pseudo least recently used
  Directory structure for cache coherence with the level 1 caches
  (Figure; source: [1])

L2 tag, VAUD, data
  L2 tag:
    Holds 22 bits of the address
    Single-ported
  L2 VAUD bits: Valid, Allocated, Used, Dirty
  L2 data:
    Read access of 16 or 64 bytes
    Write access of 64 bytes

Miss, fill, and writeback buffers
  Miss buffer
    16 entries
    Holds instructions that missed in the cache
  Fill buffer
    8 entries
    Holds load instructions currently being processed by the DRAM controller
  Writeback buffer
    8 entries of 64 bytes each
    Holds dirty cache lines that are to be written back

Directory
  Directory for maintaining cache coherence
  One directory with 2048 entries per bank
    1024 for the L1 I-caches, 1024 for the L1 D-caches
  Ensures that L1 I-caches and L1 D-caches never hold the same data
  Sends invalidation packets to the L1 caches

DRAM controller

DRAM controller
  Supports DDR2 DIMMs
  Clock frequency: 125 MHz to 200 MHz
  Data transfer frequency: 250 or 400 MHz
  37-bit address width for up to 128 GByte of memory per controller
  Max. bandwidth: 23 GByte/s (at 200 MHz)

Floating-point unit

Floating-point unit
  (Figure; source: [1])

Input FIFO
  2 FIFOs with 8 entries each
  155 bits per entry
  At most one instruction per cycle moves from the FIFOs into the pipelines
  DIV instructions have the highest priority

Pipelines (1/2)
  Floating-point adder:
    4-stage pipeline
    Fixed latency of 4 or 5 cycles
    Throughput: 0.5 to 1 instructions per cycle
  Floating-point multiplier:
    6-stage pipeline
    Fixed latency of 7 cycles
    Throughput: 0.5 instructions per cycle

Pipelines (2/2)
  Floating-point divider:
    7-stage pipeline
    No fixed latency; it depends on the operands (!!!)
    Latency (single precision): 9 to 32 cycles
    Latency (double precision): 9 to 61 cycles

Output arbiter
  Sends results back to the cores via the CPX
  No FIFO queue, no buffer!
  DIV results have priority

Concluding remarks

Concluding remarks
  Pros
    Very low power dissipation
    High concurrency through 32 threads
    High data throughput
  Cons
    Only one floating-point unit
    Small caches
    Not suited to shared-memory multiprocessor systems

Outlook: Niagara II

Niagara II (1/2)
  First presented at Hot Chips 18
  Changes to the core:
    Still 8 cores
    Now 2 thread groups of 4 threads each per core
    Two execute units per core
    One FGU per core
    L1 I-cache now 8-way set associative

Niagara II (2/2)
  Further changes:
    L2 cache now 4 MByte
    Two 10 Gbit Ethernet ports
    PCI Express x8 port
    Four FB-DIMM controllers
  Manufactured in 65 nm technology
  Expected to reach the market in the second half of 2007

Any questions?

Deployment at Strato
  Strato AG is the second-largest web hosting company in Europe.
  "Sun Fire T2000 servers are used for applications with high data throughput, and Sun X4200 servers for compute-intensive applications with frequent floating-point calculations."
  "By deploying the Sun Fire T2000 Internet servers, energy demand was reduced by one tenth and floor space in the data center to one fifth."
  "Across our applications, whether web or e-mail servers, the performance of a single Sun Fire T2000 server with just one T1 processor matches that of a 32-processor system."

References
  [1] OpenSPARC T1 Micro Architecture Specification. http://opensparc-t1.sunsource.net/specs/OpenSPARCT1_Micro_Arch.pdf
  [2] UltraSPARC Architecture 2005. http://opensparc-t1.sunsource.net/specs/UA2005-current-draft-P-EXT.pdf
  [3] UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005. http://opensparc-t1.sunsource.net/specs/UST1-UASuppl-current-draft-P-EXT.pdf
  [4] A 32-way Multithreaded SPARC® Processor. Author: Poonacha Kongetira. http://www.hotchips.org/archives/hc16/3_Tue/14_HC16_Sess10_Pres2_bw.pdf
  [5] Niagara: A 32-way Multithreaded SPARC® Processor. Authors: Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun. http://ogun.stanford.edu/~kunle/publications/niagra_micro.pdf
  [6] Sun's Big Splash. Author: Linda Geppert. http://ogun.stanford.edu/~kunle/publications/niagra_spectrum.pdf
  [7] Sun stellt den UltraSPARC T1 mit CoolThreads vor. Author: Christian Vilsbeck. http://www.tecchannel.de/news/themen/server/433065/index.html
  [8] Sun stellt neuen Multi-Core Prozessor UltraSPARC T1 vor. http://de.sun.com/company/press-releases/2005/pm_087.html
  [9] Sun stellt mit Sun Fire T1000 und T2000 erste Systeme mit UltraSPARC T1 CoolThreads Technologie vor. http://de.sun.com/company/press-releases/2005/pm_096.html
  [10] Sun Newsflash: Sun gibt Design für den UltraSPARC T1 Prozessor unter GPL Lizenz frei. http://de.sun.com/company/press-releases/2006/pm_028.html
  [11] Sun will Veröffentlichung des Niagara-Chips vorziehen. Author: Stephen Shankland. http://www.zdnet.de/news/hardware/0,39023109,39132763,00.htm
  [12] The Real Story about Sun's CoolThreads (aka Niagara). http://h71028.www7.hp.com/ERC/cache/280124-0-0-0-121.html?ERL=true
  [13] AMD Opteron Processor Product Data Sheet. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/23932.pdf
  [14] FB-DIMM: Revolutionärer Speicher für Server. http://www.tecchannel.de/server/hardware/402269/
  [15] Niagara-2: A Highly Threaded Server-on-a-Chip. Author: Greg Grohoski. http://www.opensparc.net/pubs/preszo/06/HotChips06_09_ppt_master.pdf
  [16] HOT CHIPS: Sun Niagara 2 mit 64 Threads. Author: Christian Vilsbeck. http://www.tecchannel.de/news/themen/server/446213/
  [17] Sun's Niagara 2 doubles down with twice the threads. Author: Stephen Shankland. http://news.com.com/Suns+Niagara+2+doubles+down+with+twice+the+threads/2100-1006_3-6108880.html
  [18] Strato verbessert Energiebilanz um den Faktor zehn. http://apps.sun.de/sunrdb/servlets/GetFile/192/1658_AB_Strato.pdf
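
Worked check of the slide figures
  Several numbers quoted on the cache, DRAM controller, and PCX slides can be cross-checked against each other with a little arithmetic. The sketch below is only an illustration and is not taken from the talk or from [1]. All Python names (CacheGeometry, l2_tag_bits, l2_bank_of, pcx_pack, pcx_unpack) are hypothetical. It also assumes that the L2 bank number and set index come from the address bits directly above the 64-byte line offset; the slides state only the 64-byte interleaving, not the exact bit positions.

```python
from dataclasses import dataclass
from typing import Tuple

KIB = 1024


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    line_bytes: int

    @property
    def sets(self) -> int:
        # sets = capacity / (associativity * line size)
        return self.size_bytes // (self.ways * self.line_bytes)


# Figures as quoted on the slides.
L1_ICACHE = CacheGeometry(16 * KIB, ways=4, line_bytes=32)
L1_DCACHE = CacheGeometry(8 * KIB, ways=4, line_bytes=16)
L2_BANK = CacheGeometry(768 * KIB, ways=12, line_bytes=64)
L2_BANKS = 4
PHYS_ADDR_BITS = 40
DRAM_ADDR_BITS = 37


def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two, otherwise raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def l2_tag_bits() -> int:
    """Address bits left for the tag after offset, bank, and set index.

    ASSUMPTION: bank and index bits sit directly above the line offset.
    """
    offset_bits = log2_exact(L2_BANK.line_bytes)
    bank_bits = log2_exact(L2_BANKS)
    index_bits = log2_exact(L2_BANK.sets)
    return PHYS_ADDR_BITS - offset_bits - bank_bits - index_bits


def l2_bank_of(phys_addr: int) -> int:
    """Bank holding an address under 64-byte interleaving (same assumption)."""
    return (phys_addr // L2_BANK.line_bytes) % L2_BANKS


# PCX field widths from the slide: Ctrl [123:104], Addr [103:64], Data [63:0].
PCX_CTRL_BITS, PCX_ADDR_BITS, PCX_DATA_BITS = 20, 40, 64


def pcx_pack(ctrl: int, addr: int, data: int) -> int:
    """Pack the three fields into one 124-bit integer."""
    fields = ((ctrl, PCX_CTRL_BITS), (addr, PCX_ADDR_BITS), (data, PCX_DATA_BITS))
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value:#x} does not fit in {width} bits")
    return (ctrl << (PCX_ADDR_BITS + PCX_DATA_BITS)) | (addr << PCX_DATA_BITS) | data


def pcx_unpack(packet: int) -> Tuple[int, int, int]:
    """Split a packed PCX packet back into (ctrl, addr, data)."""
    data = packet & ((1 << PCX_DATA_BITS) - 1)
    addr = (packet >> PCX_DATA_BITS) & ((1 << PCX_ADDR_BITS) - 1)
    ctrl = (packet >> (PCX_ADDR_BITS + PCX_DATA_BITS)) & ((1 << PCX_CTRL_BITS) - 1)
    return ctrl, addr, data
```

  With the slide figures, both L1 caches come out at 128 sets and each L2 bank at 1024 sets. Under the stated assumption, 40 physical address bits minus 6 offset bits, 2 bank bits, and 10 index bits leave 22 tag bits, which agrees with the 22 bits given on the L2 tag slide. A 37-bit DRAM address covers 2^37 bytes, that is 128 GByte in binary units, and the three PCX fields add up to 20 + 40 + 64 = 124 bits, matching bit positions 123 down to 0.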